<p style="font-family: Arial; font-size:3.75em;color:purple; font-style:bold"><br>
Introduction to numpy:
</p><br>

<p style="font-family: Arial; font-size:1.25em;color:#2462C0; font-style:bold"><br>
Package for scientific computing with Python
</p><br>

Numerical Python, or "Numpy" for short, is a foundational package on which many of the most common data science packages are built.  Numpy provides high-performance multidimensional arrays, which we can use as vectors or matrices.

The key features of numpy are:

- ndarrays: n-dimensional arrays of the same data type which are fast and space-efficient.  There are a number of built-in methods for ndarrays which allow for rapid processing of data without using loops (e.g., compute the mean).
- Broadcasting: a set of rules that defines how arithmetic behaves between arrays of different shapes.
- Vectorization: applies numeric operations to whole ndarrays at once, without explicit Python loops.
- Input/Output: simplifies reading and writing of data from/to file.
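
The features listed above can be sketched in a few lines. This is only an illustration: the names `prices` and `quantities` and their values are hypothetical sample data, not part of the course material.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
prices = np.array([10.0, 20.0, 30.0])
quantities = np.array([1, 2, 3])

# Vectorization: one expression multiplies element by element, no Python loop
totals = prices * quantities

# The equivalent explicit loop, shown for comparison
totals_loop = np.array([p * q for p, q in zip(prices, quantities)])

# Broadcasting: the scalar 0.5 is stretched to match the shape of prices
half_prices = prices * 0.5

# Built-in method: the mean is computed without writing a loop
average_price = prices.mean()

print(totals)
print(half_prices)
print(average_price)
```

The vectorized expression and the loop give the same numbers; the vectorized form is shorter and is usually much faster on large arrays.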

<b>Additional Recommended Resources:</b><br>
<a href="https://docs.scipy.org/doc/numpy/reference/">Numpy Documentation</a><br>
<i>Python for Data Analysis</i> by Wes McKinney<br>
<i>Python Data Science Handbook</i> by Jake VanderPlas



<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>

Getting started with ndarray<br><br></p>

**ndarrays** are time- and space-efficient multidimensional arrays at the core of numpy.  As with the data structures in Week 2, let's get started by creating ndarrays using the numpy package.

In [28]:
import numpy as np 

an_array = np.array([3,3,333,3,333000])
print(type(an_array))

<class 'numpy.ndarray'>


In [29]:
print(an_array.shape)

(5,)


In [30]:
print(an_array[0],an_array[1],an_array[2])

3 3 333


In [31]:
an_array[0] = 158
print(an_array)

[   158      3    333      3 333000]


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

How to create a Rank 2 numpy array:</p>

A rank 2 **ndarray** is one with two dimensions.  Notice the format below of [ [row] , [row] ].  Two-dimensional arrays are well suited to representing matrices, which are common in data science.

In [32]:
another = np.array([[11,12,13],[14,15,16]])
print(another)

print("The shape is: ",another.shape)
print(f"Accessing elements {another[0][2]}, {another[1][1]}")

[[11 12 13]
 [14 15 16]]
The shape is:  (2, 3)
Accessing elements 13, 15
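
As a small aside, a rank 2 array can also be indexed with a single `[row, col]` pair instead of chained brackets. The sketch below rebuilds the same array; the names `rank`, `chained` and `combined` are introduced here only for illustration.

```python
import numpy as np

another = np.array([[11, 12, 13], [14, 15, 16]])

# ndim reports the rank (number of dimensions) of the array
rank = another.ndim

# Chained indexing first extracts row 0, then takes element 2 of that row
chained = another[0][2]

# A single [row, col] index reaches the same element in one step
combined = another[0, 2]

print(rank, chained, combined)
```

Both forms return the same element. The `[row, col]` form is the one used in the rest of this notebook.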


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

There are many ways to create numpy arrays:
</p>

Here we create a number of arrays with different shapes and different pre-filled values.  numpy has a number of built-in functions which help us quickly and easily create multidimensional arrays.

In [33]:
import numpy as np 

# create a 2x2 array of zeros

ex1 = np.zeros((2,2))
print(ex1)

[[0. 0.]
 [0. 0.]]


In [34]:
# creates a 2x2 array filled with 9.0

ex2 = np.full((2,2),9.0)
print(ex2)

[[9. 9.]
 [9. 9.]]


In [35]:
# create a 2x2 identity matrix (ones on the diagonal, zeros elsewhere)

ex3 = np.eye(2,2)
print(ex3)

[[1. 0.]
 [0. 1.]]


In [36]:
# create a 1x2 array of ones

ex4 = np.ones((1,2))
print(ex4)

[[1. 1.]]


In [37]:
# notice that the above ndarray (ex4) is actually rank 2: it is a 1x2 array
print(ex4.shape)

# which means we need two indices to access an element
print(ex4[0,1])

(1, 2)
1.0


In [38]:
# create a 2x2 array of random floats between 0 and 1
ex5 = np.random.random((2,2))
print(ex5)

[[0.86205712 0.55405483]
 [0.27562419 0.65471908]]
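
A few more creation helpers are worth knowing. The sketch below is an optional addition; the names `ex6`, `ex7` and `ex8` are hypothetical and simply continue the numbering used above.

```python
import numpy as np

# evenly spaced integers from 0 up to (but not including) 10, in steps of 2
ex6 = np.arange(0, 10, 2)

# 5 evenly spaced floats from 0 to 1, both endpoints included
ex7 = np.linspace(0, 1, 5)

# zeros with the same shape and dtype as an existing array
ex8 = np.zeros_like(ex6)

print(ex6)
print(ex7)
print(ex8)
```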


<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>
Slice indexing:
</p>

Similar to the use of slice indexing with lists and strings, we can use slice indexing to pull out sub-regions of ndarrays.

In [39]:
import numpy as np 

# Rank 2 array of shape (3,3)

an_array = np.array([[11,12,13],[21,22,23],[31,32,33]])
print(an_array)

[[11 12 13]
 [21 22 23]
 [31 32 33]]


In [40]:
a_slice = an_array[:2,1:3]
print(a_slice)

[[12 13]
 [22 23]]


In [41]:
print("Before: ", an_array[0,1]) 
a_slice[0,0] = 1000
print("After: ", an_array[0,1])

Before:  12
After:  1000
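
The result above shows that a slice is a view onto the same data as the original array. If an independent sub-array is wanted, one option is to call `.copy()` on the slice. The sketch below uses the hypothetical names `base`, `a_view` and `a_copy` so that it does not disturb the variables in this notebook.

```python
import numpy as np

base = np.array([[11, 12, 13], [21, 22, 23], [31, 32, 33]])

# A plain slice shares memory with the array it was taken from
a_view = base[:2, 1:3]

# .copy() allocates new memory, so changes no longer reach the original
a_copy = base[:2, 1:3].copy()
a_copy[0, 0] = 1000

print(base[0, 1])                       # the original element is unchanged
print(np.shares_memory(base, a_view))   # view and original share data
print(np.shares_memory(base, a_copy))   # the copy does not
```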


In [42]:
an_array = np.array([[11,12,13,14], [21,22,23,24], [31,32,33,34]])
print(an_array)

[[11 12 13 14]
 [21 22 23 24]
 [31 32 33 34]]


In [43]:
row_rank1 = an_array[1,:]
print(row_rank1, row_rank1.shape)

[21 22 23 24] (4,)


In [44]:
row_rank2 =  an_array[1:2,:]
print(row_rank2, row_rank2.shape)

[[21 22 23 24]] (1, 4)


In [45]:
col_rank1 = an_array[:,1]
col_rank2 = an_array[:,1:2]

print(col_rank1, col_rank1.shape)
print(col_rank2, col_rank2.shape)

[12 22 32] (3,)
[[12]
 [22]
 [32]] (3, 1)


In [46]:
an_array = np.array([[11,12,13], [21,22,23], [31,32,33], [41,42,43]])


print("Original array: ")
print(an_array)

Original array: 
[[11 12 13]
 [21 22 23]
 [31 32 33]
 [41 42 43]]


In [47]:
col_indices = np.array([0,1,2,0])
print("\n Col indices picked : ", col_indices)

row_indices = np.arange(4)
print("\n Row indices picked :", row_indices)


 Col indices picked :  [0 1 2 0]

 Row indices picked : [0 1 2 3]


In [48]:
for row, col in zip(row_indices, col_indices):
    print(row, " ", col)

0   0
1   1
2   2
3   0


In [49]:
print('Values in the array at those indices: ',an_array[row_indices, col_indices])

Values in the array at those indices:  [11 22 33 41]


In [50]:
an_array[row_indices, col_indices] += 10000
print("\nChanged Array: ")
print(an_array)


Changed Array: 
[[10011    12    13]
 [   21 10022    23]
 [   31    32 10033]
 [10041    42    43]]
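
One caveat with `+=` on index arrays: when the same index appears more than once, the element is still updated only once. The sketch below, using the hypothetical names `counts`, `counts_at` and `idx`, contrasts this with `np.add.at`, which accumulates every occurrence.

```python
import numpy as np

idx = np.array([0, 0, 2])   # index 0 appears twice

# In-place += with an index array: repeated indices are applied once
counts = np.zeros(3, dtype=np.int64)
counts[idx] += 1

# np.add.at performs an unbuffered update, so each occurrence counts
counts_at = np.zeros(3, dtype=np.int64)
np.add.at(counts_at, idx, 1)

print(counts)
print(counts_at)
```

In the notebook example above every (row, column) pair is distinct, so `+=` behaves as expected there.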


<p style="font-family: Arial; font-size:2.75em;color:purple; font-style:bold"><br>
Boolean Indexing

<br><br></p>
<p style="font-family: Arial; font-size:1.75em;color:#2462C0; font-style:bold"><br>

Array Indexing for changing elements:
</p>

In [51]:
an_array = np.array([[11,12], [21, 22], [31, 32]])
print(an_array)

[[11 12]
 [21 22]
 [31 32]]


In [52]:
mask = (an_array > 15)
print(mask)

[[False False]
 [ True  True]
 [ True  True]]


In [53]:
print(an_array[mask])

[21 22 31 32]


In [54]:
an_array[(an_array % 2 == 0)]

array([12, 22, 32])

In [55]:
an_array[an_array % 2 == 0] += 100
print(an_array)

[[ 11 112]
 [ 21 122]
 [ 31 132]]
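
Boolean conditions can also be combined. The sketch below assumes the original 3x2 values from the start of this section, stored under the hypothetical name `values`. Note that numpy uses `&`, `|` and `~` for element-wise and, or and not, and that each condition needs its own parentheses.

```python
import numpy as np

values = np.array([[11, 12], [21, 22], [31, 32]])

# and: greater than 15 AND even
mask = (values > 15) & (values % 2 == 0)
selected = values[mask]

# or: smaller than 12 OR larger than 31
either = values[(values < 12) | (values > 31)]

# not: everything the first mask rejected
inverted = values[~mask]

print(selected)
print(either)
print(inverted)
```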


In [56]:
ex1 = np.array([11,12])
print(ex1.dtype)

int64


In [58]:
ex2 = np.array([11.2,11.3,11.4])
print(ex2.dtype)

float64


In [60]:
ex3 = np.array([11.22, 11.55], dtype=np.int64)
print(ex3)

[11 11]


In [61]:
# passing an integer dtype forces floats into integers (the fractional part is truncated, not rounded)
ex4 = np.array([11.1,12.7], dtype=np.int64)
print(ex4.dtype)
print()
print(ex4)

int64

[11 12]
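
For an array that already exists, `astype` performs the same conversion. The sketch below is an illustration with hypothetical names and values; it includes a negative number to show that the conversion truncates toward zero, which differs from `np.floor`.

```python
import numpy as np

floats = np.array([11.1, 12.7, -1.7])

# astype drops the fractional part (truncation toward zero)
truncated = floats.astype(np.int64)

# np.floor always rounds down, so negative values differ from truncation
floored = np.floor(floats).astype(np.int64)

# np.round goes to the nearest integer before converting
rounded = np.round(floats).astype(np.int64)

print(truncated)
print(floored)
print(rounded)
```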


In [62]:
ex5 = np.array([11,12], dtype=np.float64)
print(ex5.dtype)
print(ex5)

float64
[11. 12.]


In [64]:
x = np.array([[111,112],[121,122]], dtype=np.int64)
y = np.array([[211.1,212.1],[221.1,222.1]], dtype=np.float64)

print(x)
print()
print(y)

[[111 112]
 [121 122]]

[[211.1 212.1]
 [221.1 222.1]]


In [67]:
print(x+y)
print()
print(np.add(x, y))

[[322.1 324.1]
 [342.1 344.1]]

[[322.1 324.1]
 [342.1 344.1]]


In [68]:
print(x - y)
print()
print(np.subtract(x, y))

[[-100.1 -100.1]
 [-100.1 -100.1]]

[[-100.1 -100.1]
 [-100.1 -100.1]]


In [69]:
print(x * y)
print()
print(np.multiply(x, y))

[[23432.1 23755.2]
 [26753.1 27096.2]]

[[23432.1 23755.2]
 [26753.1 27096.2]]


In [70]:
print(x / y)
print()
print(np.divide(x, y))

[[0.52581715 0.52805281]
 [0.54726368 0.54930212]]

[[0.52581715 0.52805281]
 [0.54726368 0.54930212]]


In [71]:
print(np.sqrt(x))

[[10.53565375 10.58300524]
 [11.         11.04536102]]


In [72]:
print(np.exp(x))

[[1.60948707e+48 4.37503945e+48]
 [3.54513118e+52 9.63666567e+52]]


In [78]:
arr = 10 * np.random.rand(2,5)
print(arr)

[[6.2566639  3.47999295 5.19174964 6.604775   5.60292397]
 [8.65908859 4.48873206 6.54811867 4.60384908 1.51454299]]


In [79]:
print(arr.mean())

5.295043685092532


In [81]:
print(arr.mean(axis=1))

[5.42722109 5.16286628]


In [82]:
print(arr.mean(axis=0))

[7.45787624 3.98436251 5.86993415 5.60431204 3.55873348]


In [83]:
print(arr.sum())

52.95043685092532


In [85]:
print(np.median(arr, axis=1))

[5.60292397 4.60384908]
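
Because the array above is random, the effect of `axis` can be hard to check by eye. The sketch below repeats the idea on a small fixed array (the name `scores` and its values are hypothetical): `axis=0` collapses the rows and gives one value per column, while `axis=1` collapses the columns and gives one value per row.

```python
import numpy as np

scores = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

# axis=0: one mean per column
col_means = scores.mean(axis=0)

# axis=1: one mean per row
row_means = scores.mean(axis=1)

# keepdims=True keeps the result two-dimensional, shape (2, 1),
# so it can be subtracted from the original row by row
row_means_2d = scores.mean(axis=1, keepdims=True)
centered = scores - row_means_2d

print(col_means)
print(row_means)
print(centered)
```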


In [86]:
unsorted = np.random.randn(10)

print(unsorted)

[ 0.80756201 -0.64140956 -0.45290314 -0.71298105 -1.04753172  0.86806139
  0.24390302 -0.40679205  0.04422168 -0.37626084]


In [87]:
sorted_arr = np.array(unsorted)
sorted_arr.sort()

print(sorted_arr)
print()
print(unsorted)

[-1.04753172 -0.71298105 -0.64140956 -0.45290314 -0.40679205 -0.37626084
  0.04422168  0.24390302  0.80756201  0.86806139]

[ 0.80756201 -0.64140956 -0.45290314 -0.71298105 -1.04753172  0.86806139
  0.24390302 -0.40679205  0.04422168 -0.37626084]


In [88]:
unsorted.sort() 

print(unsorted)

[-1.04753172 -0.71298105 -0.64140956 -0.45290314 -0.40679205 -0.37626084
  0.04422168  0.24390302  0.80756201  0.86806139]
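
The cells above sort in place with the `.sort()` method. As a hedged alternative, `np.sort` returns a sorted copy and leaves its input alone, and `np.argsort` returns the indices that would sort the array. The name `data` and its values are hypothetical.

```python
import numpy as np

data = np.array([3.0, 1.0, 2.0])

# np.sort returns a new sorted array; data itself is not modified
sorted_copy = np.sort(data)

# argsort gives the positions of the elements in sorted order
order = np.argsort(data)

print(sorted_copy)
print(data)
print(order)
```

Indexing with the result of `argsort`, as in `data[order]`, reproduces the sorted array, which is handy for reordering a second array in the same way.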


In [89]:
array = np.array([1,2,1,4,2,1,4,2])

print(np.unique(array))

[1 2 4]


In [90]:
s1 = np.array(['desk','chair','bulb'])
s2 = np.array(['lamp','bulb','chair'])
print(s1, s2)

['desk' 'chair' 'bulb'] ['lamp' 'bulb' 'chair']


In [91]:
print(np.intersect1d(s1, s2))

['bulb' 'chair']


In [92]:
print((np.union1d(s1,s2)))

['bulb' 'chair' 'desk' 'lamp']


In [93]:
print(np.setdiff1d(s1,s2))

['desk']


In [94]:
# np.isin gives the same result as np.in1d, which is deprecated in NumPy 2.0
print(np.isin(s1, s2))

[False  True  True]


In [95]:
import numpy as np

start = np.zeros((4,3))
print(start)

[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]


In [96]:
add_rows = np.array([1, 0, 2])
print(add_rows)

[1 0 2]


In [97]:
y = start + add_rows 
print(y)

[[1. 0. 2.]
 [1. 0. 2.]
 [1. 0. 2.]
 [1. 0. 2.]]


In [99]:
add_cols = np.array([[0,1,2,3]])
add_cols = add_cols.T

print(add_cols)

[[0]
 [1]
 [2]
 [3]]


In [100]:
y = start + add_cols 
print(y)

[[0. 0. 0.]
 [1. 1. 1.]
 [2. 2. 2.]
 [3. 3. 3.]]


In [101]:
add_scalar = np.array([1])  
print(start+add_scalar)

[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]


In [102]:
arrA = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
print(arrA)

[[ 1  2  3  4]
 [ 5  6  7  8]
 [ 9 10 11 12]]


In [105]:
arrB = [1,1,0,2]
print(arrB)

[1, 1, 0, 2]


In [106]:
print(arrA + arrB)

[[ 2  3  3  6]
 [ 6  7  7 10]
 [10 11 11 14]]
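
The broadcasting examples above all work because the shapes are compatible. As a rough sketch of the rule: shapes are compared from the last axis backwards, and two sizes are compatible when they are equal or one of them is 1. The code below assumes NumPy 1.20 or later for `np.broadcast_shapes`; the names `bad` and `fixed` are hypothetical.

```python
import numpy as np

start = np.zeros((4, 3))

# (4, 3) with (3,): trailing sizes match, so the row is repeated 4 times
shape_rows = np.broadcast_shapes(start.shape, (3,))

# (4, 3) with (4, 1): the size-1 axis is stretched to 3
shape_cols = np.broadcast_shapes(start.shape, (4, 1))

# (4, 3) with (4,): trailing sizes 3 and 4 differ and neither is 1
bad = np.array([0, 1, 2, 3])
try:
    start + bad
    failed = False
except ValueError:
    failed = True

# Adding a trailing axis turns shape (4,) into (4, 1), which broadcasts
fixed = start + bad[:, np.newaxis]

print(shape_rows, shape_cols, failed)
print(fixed)
```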


In [107]:
from numpy import arange
from timeit import Timer

size    = 1000000
timeits = 1000

In [108]:
nd_array = arange(size)
print(type(nd_array))

<class 'numpy.ndarray'>


In [111]:
time_numpy = Timer("nd_array.sum()", "from __main__ import nd_array")

print("Time taken by numpy nd_array: %f seconds" % (time_numpy.timeit(timeits)/timeits))

Time taken by numpy nd_array: 0.000866 seconds


In [113]:
a_list = list(range(size))
print(type(a_list))

<class 'list'>


In [114]:
timer_list = Timer("sum(a_list)", "from __main__ import a_list")

print("Time taken by list:  %f seconds" % (timer_list.timeit(timeits)/timeits))

Time taken by list:  0.012844 seconds
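
The same comparison can be written without `from __main__ import ...` by passing the objects through the `globals` argument of `timeit.timeit`. This is only a sketch: the array size and repeat count are reduced so that it runs quickly, and the timings will differ from machine to machine.

```python
import timeit

import numpy as np

size = 100_000   # smaller than above, chosen so the sketch runs fast
repeats = 20

nd_array = np.arange(size, dtype=np.int64)
a_list = list(range(size))

# globals= hands the named objects to the timed statement
numpy_time = timeit.timeit("nd_array.sum()",
                           globals={"nd_array": nd_array},
                           number=repeats) / repeats
list_time = timeit.timeit("sum(a_list)",
                          globals={"a_list": a_list},
                          number=repeats) / repeats

print("numpy: %f seconds per call" % numpy_time)
print("list:  %f seconds per call" % list_time)
```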


In [115]:
x = np.array([23.23,24.24])

In [None]:
np.save('an_array', x)

In [None]:
np.load('an_array.npy')

In [None]:
np.savetxt('array.txt', X=x, delimiter=',')

In [None]:
!cat array.txt

In [None]:
np.loadtxt('array.txt', delimiter=',')
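
The save and load cells above can be combined into a round trip that checks the data survives. The sketch below writes into a temporary directory so that no files are left behind; the file names match the ones used above, and the other names are hypothetical.

```python
import os
import tempfile

import numpy as np

x = np.array([23.23, 24.24])

with tempfile.TemporaryDirectory() as tmp_dir:
    # Binary .npy format: preserves dtype and shape exactly
    npy_path = os.path.join(tmp_dir, "an_array.npy")
    np.save(npy_path, x)
    loaded_binary = np.load(npy_path)

    # Plain text format: human-readable, one value per line here
    txt_path = os.path.join(tmp_dir, "array.txt")
    np.savetxt(txt_path, x, delimiter=",")
    loaded_text = np.loadtxt(txt_path, delimiter=",")

print(loaded_binary)
print(loaded_text)
```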

In [116]:
x2d = np.array([[1,1],[1,1]])
y2d = np.array([[2,2],[2,2]])

print(x2d.dot(y2d))
print()
print(np.dot(x2d, y2d))

[[4 4]
 [4 4]]

[[4 4]
 [4 4]]


In [117]:
a1d = np.array([9 , 9 ])
b1d = np.array([10, 10])

print(a1d.dot(b1d))
print()
print(np.dot(a1d, b1d))

180

180


In [118]:
print(x2d.dot(a1d))
print()
print(np.dot(x2d, a1d))

[18 18]

[18 18]
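
For reference, the `@` operator is another way to write these products, and it should not be confused with `*`, which multiplies element by element. The sketch below rebuilds the arrays from the cells above; the result names are hypothetical.

```python
import numpy as np

x2d = np.array([[1, 1], [1, 1]])
y2d = np.array([[2, 2], [2, 2]])
a1d = np.array([9, 9])
b1d = np.array([10, 10])

# Matrix product: same result as x2d.dot(y2d)
mat_product = x2d @ y2d

# Inner product of two vectors: same result as a1d.dot(b1d)
inner = a1d @ b1d

# Element-wise product, for contrast
elementwise = x2d * y2d

print(mat_product)
print(inner)
print(elementwise)
```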


In [119]:
ex1 = np.array([[11,12],[21,22]])

print(np.sum(ex1))         

66


In [120]:
print(np.sum(ex1, axis=0))  

[32 34]


In [121]:
print(np.sum(ex1, axis=1))  # sum across each row

[23 43]


In [122]:
x = np.random.randn(8)
print(x)

[ 1.93632965 -0.96406571  0.42713823  1.1416345  -0.83504279  1.01578105
 -0.99576346 -0.13871442]


In [124]:
y = np.random.randn(8)
print(y)

[-0.35259174  2.35745903  0.16069571 -1.15323242  0.02296956 -2.21933368
  0.0041397   0.54339494]


In [125]:
np.maximum(x, y)

array([1.93632965, 2.35745903, 0.42713823, 1.1416345 , 0.02296956,
       1.01578105, 0.0041397 , 0.54339494])

In [126]:
arr = np.arange(20)
print(arr)

[ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19]


In [127]:
arr.reshape(4,5)

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])
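
Two details of `reshape` are easy to miss, sketched below with the hypothetical names `grid` and `failed`. One dimension can be given as `-1` and numpy will infer it, and the total number of elements must stay the same. For a contiguous array like this one, the reshaped result is a view of the original data.

```python
import numpy as np

arr = np.arange(20)

# -1 asks numpy to infer the missing dimension: 20 / 4 = 5 columns
grid = arr.reshape(4, -1)

# The reshaped array shares memory with arr, so edits show up in both
grid[0, 0] = 99

# 20 elements cannot fill a 3x7 grid, so this raises ValueError
try:
    arr.reshape(3, 7)
    failed = False
except ValueError:
    failed = True

print(grid.shape)
print(arr[0])
print(failed)
```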

In [129]:
ex1 = np.array([[11,12],[21,22]])
print(ex1.T)

[[11 21]
 [12 22]]


In [130]:
x_1 = np.array([1,2,3,4,5])

y_1 = np.array([11,22,33,44,55])

condition = np.array([True, False, True, False, True])

In [131]:
out = np.where(condition, x_1, y_1)
print(out)

[ 1 22  3 44  5]


In [133]:
mat = np.random.rand(5,5)
print(mat)

[[0.06067781 0.42307795 0.54116435 0.78880277 0.88462729]
 [0.30836336 0.50830162 0.56996914 0.87648158 0.45625747]
 [0.60396913 0.93229564 0.31396327 0.06103815 0.38115893]
 [0.70053208 0.92532966 0.35026274 0.27514448 0.83194535]
 [0.01351465 0.55323222 0.90736933 0.10863495 0.11364972]]


In [134]:
np.where( mat > 0.5, 1000, -1)

array([[  -1,   -1, 1000, 1000, 1000],
       [  -1, 1000, 1000, 1000,   -1],
       [1000, 1000,   -1,   -1,   -1],
       [1000, 1000,   -1,   -1, 1000],
       [  -1, 1000, 1000,   -1,   -1]])

In [135]:
arr_bools = np.array([True, False, True, True, False])

In [136]:
arr_bools.any()

True

In [137]:
arr_bools.all()

False

In [138]:
y = np.random.normal(size=(1,5))[0]
print(y)

[-2.72629825  0.47715245  1.9260415   0.52062674 -1.12977659]


In [139]:
Z = np.random.randint(low=2,high=50,size=4)
print(Z)

[ 8  8 38 38]


In [140]:
np.random.permutation(Z)

array([ 8, 38,  8, 38])

In [141]:
np.random.uniform(size=4)

array([0.79027386, 0.29763482, 0.33040957, 0.62831344])

In [142]:
np.random.normal(size=4)

array([ 0.6353271 ,  0.32022528, -0.89396116,  0.62671478])
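
The random numbers above change on every run. When reproducible results are needed, one option is a seeded generator from `np.random.default_rng`. The sketch below is an alternative to the `np.random.*` functions used above, not a replacement for them; the seed value 42 is arbitrary and all names are hypothetical.

```python
import numpy as np

# Two generators built from the same seed produce the same sequence
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

draw_a = rng_a.integers(low=2, high=50, size=4)   # integers in [2, 50)
draw_b = rng_b.integers(low=2, high=50, size=4)
same = np.array_equal(draw_a, draw_b)

# Other distributions are methods on the same generator
uniform_draw = rng_a.uniform(size=4)      # floats in [0, 1)
normal_draw = rng_a.normal(size=4)        # standard normal values
shuffled = rng_a.permutation(draw_a)      # shuffled copy of draw_a

print(draw_a)
print(same)
```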

In [147]:
K = np.random.randint(low=2,high=50,size=(2,2))
print(K)

print("")

M = np.random.randint(low=2,high=50,size=(2,2))
print(M)

[[16  4]
 [36 49]]

[[37 39]
 [45  5]]


In [148]:
np.vstack((K,M))

array([[16,  4],
       [36, 49],
       [37, 39],
       [45,  5]])

In [149]:
np.hstack((K,M))

array([[16,  4, 37, 39],
       [36, 49, 45,  5]])

In [150]:
np.concatenate([K,M], axis=0)

array([[16,  4],
       [36, 49],
       [37, 39],
       [45,  5]])

In [151]:
np.concatenate([K,M], axis=1)

array([[16,  4, 37, 39],
       [36, 49, 45,  5]])
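
As a closing sketch, `np.stack` differs from the functions above in that it joins arrays along a new axis instead of an existing one. The values of `K` and `M` below are copied from the output shown earlier, since the originals were random; the other names are hypothetical.

```python
import numpy as np

K = np.array([[16, 4], [36, 49]])
M = np.array([[37, 39], [45, 5]])

# vstack and hstack extend an existing axis
tall = np.vstack((K, M))    # shape (4, 2)
wide = np.hstack((K, M))    # shape (2, 4)

# stack creates a new leading axis, giving a rank 3 array
stacked = np.stack((K, M))  # shape (2, 2, 2)

# Shapes must agree on every axis except the one being joined
try:
    np.vstack((K, np.array([1, 2, 3])))
    failed = False
except ValueError:
    failed = True

print(tall.shape, wide.shape, stacked.shape, failed)
```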