<p style="font-family: Arial; font-size:3.75em;color:purple; font-style:bold"><br>
Introduction to numpy:
</p><br>

<p style="font-family: Arial; font-size:1.25em;color:#2462C0; font-style:bold"><br>
Package for scientific computing with Python
</p><br>

Numerical Python, or "Numpy" for short, is a foundational package on which many of the most common data science packages are built.  Numpy provides us with high performance multi-dimensional arrays which we can use as vectors or matrices.  

The key features of numpy are:

- ndarrays: n-dimensional arrays of the same data type which are fast and space-efficient.  There are a number of built-in methods for ndarrays which allow for rapid processing of data without using loops (e.g., compute the mean).
- Broadcasting: a useful tool which defines implicit behavior between multi-dimensional arrays of different sizes.
- Vectorization: applies a numeric operation to every element of an ndarray at once, without an explicit Python loop.
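
To make the "without using loops" point concrete, here is a small sketch that computes a mean with a plain Python loop and then with the built-in ndarray method. The names `values`, `loop_mean`, `vec_mean` and `doubled` are made up for this illustration.

```python
import numpy as np

values = np.array([2.0, 4.0, 6.0, 8.0])

# Loop version: accumulate the total one element at a time
total = 0.0
for v in values:
    total += v
loop_mean = total / len(values)

# Vectorized version: one built-in method call, no explicit loop
vec_mean = values.mean()

# Vectorized arithmetic applies the operation to every element at once
doubled = values * 2

print(loop_mean, vec_mean)   # both are 5.0
print(doubled)               # [ 4.  8. 12. 16.]
```

Both approaches give the same answer; the vectorized form is shorter and, on large arrays, usually much faster.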

<b>Additional Recommended Resources:</b><br>
<a href="https://docs.scipy.org/doc/numpy/reference/">Numpy Documentation</a><br>




<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>

Getting started with ndarray<br><br></p>

**ndarrays** are time and space-efficient multidimensional arrays at the core of numpy.  Like the data structures in Week 2, let's get started by creating ndarrays using the numpy package.

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

How to create Rank 1 numpy arrays:
</p>

### Installing numpy package through pip and conda

In [1]:
! pip install numpy



In [1]:
! conda install numpy -y

Fetching package metadata .............
Solving package specifications: .

Package plan for installation in environment /home/nbuser/anaconda3_420:

The following NEW packages will be INSTALLED:

    mkl_fft:    1.0.4-py35h4414c95_1 
    mkl_random: 1.0.1-py35h4414c95_1 
    readline:   7.0-ha6073c6_4       

The following packages will be UPDATED:

    conda:      4.3.31-py35_0         --> 4.5.11-py35_0        
    conda-env:  2.6.0-h36134e3_1      --> 2.6.0-1              
    libgcc-ng:  7.2.0-h7cc24e2_2      --> 8.2.0-hdf63c60_1     
    numpy:      1.11.3-py35h1b885b7_9 --> 1.15.2-py35h1d66e8a_0
    numpy-base: 1.11.3-py35h3dfced4_9 --> 1.15.2-py35h81de0dd_0
    pycosat:    0.6.1-py35_1          --> 0.6.3-py35h14c3975_0 

Proceed ([y]/n)? ^C


In [2]:
import numpy as np

## Creating 1-D array

In [2]:
l = [1,2,3,4,5]
arr = np.array(l)
print(arr)
type(arr)

[1 2 3 4 5]


numpy.ndarray

In [12]:
# test the shape of the array we just created, it should have just one dimension (Rank 1)
print(arr.shape)

(5,)


In [13]:
# because this is a rank 1 array, we need only one index to access each element
print(arr[0], arr[1], arr[2]) 

1 2 3


In [3]:
arr[0] = 8           # ndarrays are mutable, here we change an element of the array
print(arr)

[8 2 3 4 5]


In [4]:
arr1 = np.array([1,2.0,'c'])  # ndarrays are homogeneous: mixed inputs are converted to one common type (here, strings)
print(arr1) 

['1' '2.0' 'c']
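
The conversion to a single type can be checked through the `dtype` attribute. The following is a minimal sketch; the variable names `ints_and_floats` and `with_string` are chosen only for this example.

```python
import numpy as np

ints_and_floats = np.array([1, 2.0])      # int and float -> every element becomes a float
with_string = np.array([1, 2.0, 'c'])     # one string -> every element becomes a string

print(ints_and_floats.dtype)       # float64
print(with_string.dtype.kind)      # U, the code for a unicode string type
```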


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

How to create a Rank 2 numpy array:</p>

A rank 2 **ndarray** is one with two dimensions.  Notice the format below of [ [row] , [row] ].  2 dimensional arrays are great for representing matrices which are often useful in data science.

In [5]:
arr_twod = np.array([[11,12,13],[21,22,23]])   # Create a rank 2 array

print(arr_twod)  # print the array

print("The shape is 2 rows, 3 columns: ", arr_twod.shape)  # rows x columns                   

print("Accessing elements [0,0], [0,1], and [1,0] of the ndarray: ", arr_twod[0, 0], ", ",arr_twod[0, 1],", ", arr_twod[1, 0])

[[11 12 13]
 [21 22 23]]
The shape is 2 rows, 3 columns:  (2, 3)
Accessing elements [0,0], [0,1], and [1,0] of the ndarray:  11 ,  12 ,  21


In [7]:
# create arrays of random numbers; the float examples are commented out, so only the integer examples run

np.random.seed(1)
#ex5 = np.random.rand(3,3)  # Random numbers from a uniform distribution between 0 & 1
#print(ex5)
##plt.hist(ex5)

#a1 = np.random.uniform(1,10,(2,2))  # Random numbers from a uniform distribution between 1 & 10
#print(a1)

#ex6 = np.random.randn(3,3) # Random numbers from a normal distribution 
#print(ex6)
#plt.hist(ex6)

#a2 = np.random.normal(0,2,(2,2))  # Random numbers from a normal distribution with mean 0 & standard deviation 2
#print(a2)

ex7 = np.random.randint(1,100,10)  # 10 random integers from 1 to 99 (the upper bound 100 is excluded)
print(ex7)

ex8 = np.random.randint(1,100,(3,3)) # random integers from the same range, arranged in a 3x3 array
print(ex8)


[38 13 73 10 76  6 80 65 17  2]
[[77 72  7]
 [26 51 21]
 [19 85 12]]


### Exercise 1 : Generate an array of 20 random samples for a dice throw 

In [8]:
dice_samples = np.random.randint(1,7,20)
dice_samples

array([5, 3, 5, 6, 3, 5, 2, 2, 1, 6, 2, 2, 6, 2, 2, 1, 5, 2, 1, 1])
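
NumPy 1.17 and later also provide a newer random interface, `np.random.default_rng`. The sketch below repeats the dice exercise with it. The seed value 1 and the name `rng` are arbitrary choices for this example, and the samples it draws will not match the `np.random.randint` output shown above.

```python
import numpy as np

rng = np.random.default_rng(1)              # seeded generator, so reruns give the same samples
dice_samples = rng.integers(1, 7, size=20)  # like randint, the upper bound 7 is excluded

print(dice_samples.shape)    # (20,)
print(dice_samples.min() >= 1 and dice_samples.max() <= 6)   # True
```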

# Basics of NumPy Arrays

## * Attributes of arrays
## * Indexing of arrays
## * Slicing of arrays
## * Reshaping of arrays


# 1. Attributes of Arrays: 
### Determining the size, shape, data type and dimension of array

In [9]:
x1 = np.random.randint(10, size=6)
x2 = np.random.randint(10, size=(3,4))
x3 = np.random.randint(10, size=(3,4,5))
print('X1 array is : ' ,x1)
print('**********')
print('X2 array is : ', x2)
print('**********')
print('X3 array is : ')
print(x3)

X1 array is :  [3 9 8 7 3 6]
**********
X2 array is :  [[5 1 9 3]
 [4 8 1 4]
 [0 3 9 2]]
**********
X3 array is : 
[[[0 4 9 2 7]
  [7 9 8 6 9]
  [3 7 7 4 5]
  [9 3 6 8 0]]

 [[2 7 7 9 7]
  [3 0 8 7 7]
  [1 1 3 0 8]
  [6 4 5 6 2]]

 [[5 7 8 4 4]
  [7 7 4 9 0]
  [2 0 7 1 7]
  [9 8 4 0 1]]]


In [26]:
print("dimension of x1 : ", x1.ndim)
print("Shape of x1 : ", x1.shape)
print("size of x1 : ", x1.size)
print("data type of x1 : ", x1.dtype)

dimension of x1 :  1
Shape of x1 :  (6,)
size of x1 :  6
data type of x1 :  int64


In [11]:
print("dimension of x2 : ", x2.ndim)
print("Shape of x2 : ", x2.shape)
print("size of x2 : ", x2.size)
print("data type of x2 : ", x2.dtype)

dimension of x2 :  2
Shape of x2 :  (3, 4)
size of x2 :  12
data type of x2 :  int64


In [10]:
print("dimension of x3 : ", x3.ndim)
print("Shape of x3 : ", x3.shape)
print("size of x3 : ", x3.size)
print("data type of x3 : ", x3.dtype)

dimension of x3 :  3
Shape of x3 :  (3, 4, 5)
size of x3 :  60
data type of x3 :  int64


<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>

Array Indexing
<br><br></p>

In [26]:
# Rank 2 array of shape (3, 4)
P = np.array([[11,12,13,14], [21,22,23,24], [31,32,33,34]], dtype='int32')
print(P)
P.shape

[[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]]


(3, 4)

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Normal indexing:
</p>

We can use a row index and a column index to retrieve or change any single element.

In [27]:
P[2,2]

33

In [28]:
P[2,2] = 50  # Normal indexing also works for assignment: set the element at row 2, column 2

In [29]:
P

array([[11, 12, 13, 14],
       [21, 22, 23, 24],
       [31, 32, 50, 34]], dtype=int32)

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Slice indexing:
</p>

Similar to the use of slice indexing with lists and strings, we can use slice indexing to pull out sub-regions of ndarrays.

Use array slicing to get a subarray consisting of the first 2 rows x 2 columns.

In [30]:
p_slice = P[0:2,0:2]
print(p_slice)
print(p_slice.shape)

[[11 12]
 [21 22]]
(2, 2)


In [31]:
print(p_slice[0,0],p_slice[1,1])

11 22


In [32]:
p_slice[0,1] = 60
p_slice

array([[11, 60],
       [21, 22]], dtype=int32)

In [33]:
P

array([[11, 60, 13, 14],
       [21, 22, 23, 24],
       [31, 32, 50, 34]], dtype=int32)
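
The change to `P` happens because basic slicing returns a view onto the same memory rather than a new array. A minimal sketch of the difference between a view and a copy, using `np.shares_memory` to check (the names `view` and `copy` are illustrative only):

```python
import numpy as np

P = np.array([[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34]])

view = P[0:2, 0:2]          # basic slicing: shares memory with P
copy = P[0:2, 0:2].copy()   # explicit copy: independent of P

print(np.shares_memory(P, view))   # True
print(np.shares_memory(P, copy))   # False

view[0, 0] = -1     # also changes P[0, 0]
copy[1, 1] = -2     # leaves P untouched
print(P[0, 0], P[1, 1])   # -1 22
```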

In [34]:
a_slice = P[:2, 1:3].copy()   # Creates an explicit copy
a_slice[0,0]=1000
print(P)
print(a_slice)

[[11 60 13 14]
 [21 22 23 24]
 [31 32 50 34]]
[[1000   13]
 [  22   23]]


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Use both normal indexing & slice indexing
</p>

We can use combinations of integer scalar and slice indexing to create different shaped matrices.

In [35]:
# Create a Rank 2 array of shape (3, 4)
an_array = np.array([[11,12,13,14], [21,22,23,24], [31,32,33,34]])
print(an_array)

[[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]]


In [36]:
# Using both integer scalar & slicing generates an array of lower rank
row_rank1 = an_array[1,:]    # Rank 1 view 
print(row_rank1, row_rank1.shape)  # notice only a single []

[21 22 23 24] (4,)


In [37]:
# Slicing alone: generates an array of the same rank as the an_array
row_rank2 = an_array[1:2, :]  # Rank 2 view 
print(row_rank2, row_rank2.shape)   # Notice the [[ ]]

[[21 22 23 24]] (1, 4)


In [52]:
#We can do the same thing for columns of an array:

print()
col_rank1 = an_array[:, 1]
col_rank2 = an_array[:, 1:2]

print(col_rank1, col_rank1.shape)  # Rank 1
print()
print(col_rank2, col_rank2.shape)  # Rank 2



[12 22 32] (3,)

[[12]
 [22]
 [32]] (3, 1)



<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Integer indexing:
</p>

Integer indexing selects arbitrary items from an array by their N-dimensional index.
We pass one integer array per dimension, and the arrays are paired up position by position to give the coordinates of each selected element.
It is like simple indexing, but with arrays of indices in place of single scalars.
This lets us quickly access and modify complicated subsets of an array's values.
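
As a small sketch of both uses, reading and modifying: the index arrays `rows` and `cols` below are example names, and the `+= 10` update is just an arbitrary change to make the effect visible.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

rows = np.array([0, 1, 2])
cols = np.array([0, 1, 0])

picked = x[rows, cols]     # elements at (0,0), (1,1), (2,0), returned as a new array
x[rows, cols] += 10        # assigning through the same indices modifies x in place

print(picked)   # [1 5 7]
print(x)        # rows are now [11 2 3], [4 15 6], [17 8 9]
```

Reading with integer arrays gives a copy, so `picked` keeps the old values, while assigning through the index expression updates `x` itself.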


In [39]:
x = np.array([[1,2,3], [4,5,6], [7,8,9]]) 
x

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [40]:
y = x[[0,1,2], [0,1,0]]         # selects the elements at (0,0), (1,1) and (2,0) 
print(y)

[1 5 7]


In [41]:
y[1] = 10

In [42]:
y

array([ 1, 10,  7])

In [43]:
x

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

### Exercise 2: Create a 3x3 array and write statements to retrieve its corner elements. 

In [44]:
x = np.array([[1,2,3], [4,5,6], [7,8,9]]) 
x

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [45]:
r = [0,0,2,2]
c = [0,2,0,2]
a = x[r,c]
print(a)

[1 3 7 9]


In [16]:
rows = np.array([[0,0],[2,2]])
print(rows)
print('-----------')
cols = np.array([[0,2],[0,2]]) 
print(cols)
print('-----------')
y = x[rows,cols] 
print(y)                # prints the corner elements of the array: (0,0), (0,2), (2,0) and (2,2)

[[0 0]
 [2 2]]
-----------
[[0 2]
 [0 2]]
-----------
[[1 3]
 [7 9]]


<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>
Boolean Indexing

<br><br></p>
<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Array Indexing for changing elements:
</p>

In [46]:
# create a 3x2 array
an_array = np.array([[11,12], [21, 22], [31, 32]])
print(an_array)

[[11 12]
 [21 22]
 [31 32]]


In [47]:
# create a filter which will be boolean values for whether each element meets this condition
filter1 = an_array < 15
filter1

array([[ True,  True],
       [False, False],
       [False, False]])

Notice that the filter is an ndarray of the same shape as an_array. It holds True wherever the corresponding element of an_array is less than 15 and False wherever it is 15 or greater.

In [48]:
# we can now select just those elements which meet that criteria
result=an_array[filter1]
result

array([11, 12])

In [49]:
result[0]

11

In [50]:
result[0] = 100

In [51]:
result

array([100,  12])

In [52]:
an_array

array([[11, 12],
       [21, 22],
       [31, 32]])
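
Selecting with a boolean filter returns a copy, which is why changing `result` leaves `an_array` alone. To change the original, assign through the filter directly. A minimal sketch, where replacing the matching values with 0 is an arbitrary choice for illustration:

```python
import numpy as np

an_array = np.array([[11, 12], [21, 22], [31, 32]])

selected = an_array[an_array < 15]   # boolean selection returns a copy
selected[0] = 100                    # does not touch an_array

an_array[an_array < 15] = 0          # assignment through the mask changes an_array

print(selected)   # [100  12]
print(an_array)   # first row is now [0 0]; the other rows are unchanged
```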

In [53]:
# Filter values which are more than 20 and less than 30
res = (an_array>20) & (an_array<30)
print (res)
a_slice = an_array[res]
print(a_slice)


[[False False]
 [ True  True]
 [False False]]
[21 22]


<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold">
Array Operations
<br></p>

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold">
Arithmetic Array Operations:

</p>

In [1]:
import numpy as np

In [3]:
x = np.array([[1,2],[3,4]], dtype=np.int64)  # np.int was removed in NumPy 1.24; use an explicit integer type
y = np.array([[1.0,2.0],[3.0,4.0]], dtype=np.float64)
print(x)
print()
print(y)


[[1 2]
 [3 4]]

[[1. 2.]
 [3. 4.]]


In [4]:
print('Addition : ')
print(x+y)
print('Subtraction : ')
print(x-y)
print('Multiplication : ')
print(x*y)
print('Division : ')
print(x/y)

Addition : 
[[2. 4.]
 [6. 8.]]
Subtraction : 
[[0. 0.]
 [0. 0.]]
Multiplication : 
[[ 1.  4.]
 [ 9. 16.]]
Division : 
[[1. 1.]
 [1. 1.]]


In [6]:
print(np.add(x,y))           # function forms of +, -, * and /; each works element-wise
print(np.subtract(x,y))
print(np.multiply(x,y))
print(np.divide(x,y))

[[2. 4.]
 [6. 8.]]
[[0. 0.]
 [0. 0.]]
[[ 1.  4.]
 [ 9. 16.]]
[[1. 1.]
 [1. 1.]]


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold">
Element-wise Functions: </p>

For example, let's compare two arrays element by element to get the larger value at each position.

In [7]:
# random array
np.random.seed(1)
x = np.random.randint(1,8,5)
x

array([6, 4, 5, 1, 2])

In [8]:
# another random array
np.random.seed(2)
y = np.random.randint(1,8,5)
y

array([1, 6, 1, 7, 4])

In [9]:
# element-wise maximum, minimum, and equality test between two arrays
print(np.maximum(x,y))
print(np.minimum(x,y))
print(np.equal(x,y))

[6 6 5 7 4]
[1 4 1 1 2]
[False False False False False]


<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold">
Statistical Methods
<br></p>

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold">
Basic Statistical Operations:
</p>

In [10]:
np.random.seed(1)
# setup a random 2 x 5 matrix
arr = np.random.randint(1,10,(2,5))
print(arr)
#dir(np.random)
arr.shape

[[6 9 6 1 1]
 [2 8 7 3 5]]


(2, 5)

In [11]:
# compute the mean for all elements
print(arr.mean())

4.8


In [12]:
# compute the means by row
print(arr.mean(axis = 1))


[4.6 5. ]


In [13]:
# compute the means by column
print(arr.mean(axis = 0))

[4.  8.5 6.5 2.  3. ]


In [14]:
# sum all the elements
print(arr.sum())   # sum of all elements
print('columns sum is : ', arr.sum(axis=0))    # colsum
print('row sum is : ', arr.sum(axis=1))    # row sum

48
columns sum is :  [ 8 17 13  4  6]
row sum is :  [23 25]
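
One way to remember the `axis` argument: the axis you name is the one that disappears from the shape. The sketch below rebuilds the same 2 x 5 values by hand, so it does not depend on the random seed, and applies a few other reductions the same way.

```python
import numpy as np

arr = np.array([[6, 9, 6, 1, 1],
                [2, 8, 7, 3, 5]])    # shape (2, 5)

print(arr.sum(axis=0).shape)   # (5,)  axis 0 is collapsed: one value per column
print(arr.sum(axis=1).shape)   # (2,)  axis 1 is collapsed: one value per row

print(arr.max(axis=1))         # [9 8]
print(arr.min(axis=0))         # [2 8 6 1 1]
print(arr.argmax(axis=1))      # [1 1]  position of the largest value in each row
```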


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold">
Finding Unique elements:
</p>

In [15]:
array = np.array([1,2,1,4,2,1,4,2])
print(np.unique(array, return_counts=True))

(array([1, 2, 4]), array([3, 3, 2]))


<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold">
Broadcasting:
<br></p>

Introduction to broadcasting. <br>
For more details, please see: <br>
https://docs.scipy.org/doc/numpy-1.10.1/user/basics.broadcasting.html

Broadcasting in NumPy follows a strict set of rules to determine the interaction between the two arrays:

Rule 1: If the two arrays differ in their number of dimensions, the shape of the one with fewer dimensions is padded with ones on its leading (left) side. <br>

Rule 2: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape. <br>

Rule 3: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
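
The three rules can be checked without doing any arithmetic by asking NumPy for the broadcast shape. This is a minimal sketch using `np.broadcast`; the array names and shapes are picked only to exercise each rule.

```python
import numpy as np

a = np.ones((2, 3))
b = np.arange(3)          # shape (3,)

# Rule 1: (3,) is padded on the left to (1, 3)
# Rule 2: the size-1 dimension is stretched to 2, giving (2, 3)
print(np.broadcast(a, b).shape)    # (2, 3)

c = np.ones((4, 1))
d = np.ones((1, 5))
print(np.broadcast(c, d).shape)    # (4, 5): each array is stretched along its size-1 dimension

# Rule 3: sizes 3 and 2 disagree and neither is 1, so NumPy raises ValueError
try:
    np.ones((2, 3)) + np.ones((2, 2))
    rule3_error = False
except ValueError:
    rule3_error = True
print(rule3_error)    # True
```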

In [18]:
a1 = np.array([[1,2,3],[4,5,6]])
print('shape of a1 : ',a1.shape)
print('dimension of a1 : ', a1.ndim)
print('array a1:',a1)

shape of a1 :  (2, 3)
dimension of a1 :  2
array a1: [[1 2 3]
 [4 5 6]]


In [19]:
a2 = np.array([1,2,1])
print('shape of a2:',a2.shape)
print('dimension of a2:', a2.ndim)
print('array a2:', a2)

shape of a2: (3,)
dimension of a2: 1
array a2: [1 2 1]


In [20]:
a3 = a1+a2
print(a3)

[[2 4 4]
 [5 7 7]]


In [21]:
start = np.zeros((4,3))
print(start)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


In [22]:
add_cols = np.array([[1],[2],[3],[4]])
add_cols

array([[1],
       [2],
       [3],
       [4]])

In [23]:
start+add_cols

array([[1., 1., 1.],
       [2., 2., 2.],
       [3., 3., 3.],
       [4., 4., 4.]])

In [39]:
# this will just broadcast in both dimensions
add_scalar = np.array([5])  
print(start+add_scalar)

[[5. 5. 5.]
 [5. 5. 5.]
 [5. 5. 5.]
 [5. 5. 5.]]


In [24]:
arr = np.ones((3,2))
print(arr.shape)
print(arr)

(3, 2)
[[1. 1.]
 [1. 1.]
 [1. 1.]]


In [25]:
arr2 = np.arange(3)
print(arr2.shape)
print(arr2)

(3,)
[0 1 2]


In [26]:
print(arr+arr2)

ValueError: operands could not be broadcast together with shapes (3,2) (3,) 
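
The error follows from Rule 1 and Rule 3: the shape (3,) is padded to (1, 3), and its last dimension (3) then disagrees with the last dimension of (3, 2). One possible fix, assuming the intent was to add one value per row, is to turn the 1-D array into a column with `np.newaxis`:

```python
import numpy as np

arr = np.ones((3, 2))
arr2 = np.arange(3)               # shape (3,)

# Reshape arr2 into a column of shape (3, 1); it can then be stretched across the 2 columns
col = arr2[:, np.newaxis]
result = arr + col

print(col.shape)    # (3, 1)
print(result)       # rows are [1. 1.], [2. 2.], [3. 3.]
```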

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold">
Reshaping array:
</p>

In [28]:
# grab values from 0 through 19 in an array
arr = np.arange(0,20,1)
print(arr)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]


In [32]:
# reshape to be a 4 x 5 matrix
new_arr=arr.reshape(4,5)
new_arr

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [33]:
new_arr.flatten()

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19])
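
Two related details are worth a quick sketch: `reshape` can infer one dimension when it is given as -1, and `flatten` always returns a copy, whereas `ravel` returns a view when it can. The names `inferred`, `flat_copy` and `flat_view` are illustrative.

```python
import numpy as np

arr = np.arange(20)

# -1 asks NumPy to infer that dimension from the array's size
inferred = arr.reshape(4, -1)
print(inferred.shape)    # (4, 5)

flat_copy = inferred.flatten()   # always a new array
flat_view = inferred.ravel()     # a view here, since the data is contiguous

print(np.shares_memory(inferred, flat_copy))   # False
print(np.shares_memory(inferred, flat_view))   # True
```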

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold">
Indexing using where():</p>

In [35]:
x_1 = np.array([1,2,3,4,5])

y_1 = np.array([11,22,33,44,55])


In [39]:
out = np.where(x_1%2!=0,x_1,y_1)
print(out)

[ 1 22  3 44  5]


In [40]:
np.random.seed(1)
mat = np.random.randint(1,10,25)
mat.reshape(5,5)

array([[6, 9, 6, 1, 1],
       [2, 8, 7, 3, 5],
       [6, 3, 5, 3, 5],
       [8, 8, 2, 8, 1],
       [7, 8, 7, 2, 1]])

In [89]:
np.where( mat > 5, 1000,-1)

array([1000, 1000, 1000,   -1,   -1,   -1, 1000, 1000,   -1,   -1, 1000,
         -1,   -1,   -1,   -1, 1000, 1000,   -1, 1000,   -1, 1000, 1000,
       1000,   -1,   -1])
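
`np.where` has two forms. With three arguments it picks between two values element by element, as in the cells above. With only a condition it returns the indices where the condition holds. A short sketch on a small hand-written array (the names `values`, `labels` and `positions` are examples):

```python
import numpy as np

values = np.array([6, 9, 6, 1, 1, 2, 8])

# Three-argument form: choose between two values element by element
labels = np.where(values > 5, 1000, -1)

# One-argument form: a tuple with one index array per dimension
(positions,) = np.where(values > 5)

print(labels)       # 1000 where the value is above 5, otherwise -1
print(positions)    # [0 1 2 6]
```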

<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold">
"any" or "all" conditionals:</p>

In [41]:
arr_bools = np.array([ True, False, True, True, False ])
b1 = np.array([ True, True, True, True, True ])
b2 =  np.array([ False, False, False, False, False ])

In [69]:
print(arr_bools.any())
print(b1.any())
print(b2.any()) 

True
True
False


In [17]:
print(arr_bools.all())
print(b1.all())
print(b2.all())

False
True
False
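
In practice `any` and `all` are often applied to the result of a comparison rather than to a hand-written boolean array, and they accept an `axis` argument as well. A minimal sketch, with `scores` as a made-up example array:

```python
import numpy as np

scores = np.array([[6, 9, 6],
                   [2, 8, 7]])

print((scores > 5).any())          # True: at least one element is above 5
print((scores > 5).all())          # False: 2 is not above 5
print((scores > 5).all(axis=1))    # [ True False]: only the first row is entirely above 5
```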


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold">
Merging and Splitting data sets:
</p>

In [42]:
K = np.random.randint(low=2,high=50,size=(2,2))
print(K)

print()
M = np.random.randint(low=2,high=50,size=(2,2))
print(M)

[[19 10]
 [26 15]]

[[49 44]
 [10 32]]


In [43]:
np.vstack((K,M))

array([[19, 10],
       [26, 15],
       [49, 44],
       [10, 32]])

In [44]:
np.hstack((K,M))

array([[19, 10, 49, 44],
       [26, 15, 10, 32]])

In [81]:
np.concatenate([K, M], axis = 0)

array([[19, 10],
       [26, 15],
       [49, 44],
       [10, 32]])

In [82]:
np.concatenate([K, M], axis = 1)

array([[19, 10, 49, 44],
       [26, 15, 10, 32]])

In [47]:
A = np.random.randint(1,21,size=(4,4))
A

array([[16, 16,  8, 20],
       [11, 15,  1,  2],
       [18, 14,  4,  1],
       [14,  7,  7,  3]])

In [50]:
B = np.split(A,4,axis=0)
B

[array([[16, 16,  8, 20]]),
 array([[11, 15,  1,  2]]),
 array([[18, 14,  4,  1]]),
 array([[14,  7,  7,  3]])]

In [88]:
C = np.hsplit(A,2)
C

[array([[16, 16],
        [11, 15],
        [18, 14],
        [14,  7]]), array([[ 8, 20],
        [ 1,  2],
        [ 4,  1],
        [ 7,  3]])]

In [89]:
D = np.vsplit(A,2)
D 

[array([[16, 16,  8, 20],
        [11, 15,  1,  2]]), array([[18, 14,  4,  1],
        [14,  7,  7,  3]])]
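
Splitting and stacking are inverse operations, which gives a handy sanity check. The sketch below uses a fixed 4 x 4 array built with `np.arange` instead of random values so the result is reproducible; the names `top`, `bottom`, `left` and `right` are illustrative.

```python
import numpy as np

A = np.arange(16).reshape(4, 4)

top, bottom = np.vsplit(A, 2)     # two (2, 4) blocks
left, right = np.hsplit(A, 2)     # two (4, 2) blocks

print(top.shape, left.shape)      # (2, 4) (4, 2)
print(np.array_equal(np.vstack((top, bottom)), A))   # True
print(np.array_equal(np.hstack((left, right)), A))   # True
```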