# Numpy

NumPy is the fundamental package for scientific computing with Python. It contains among other things:

- a powerful N-dimensional array object
- sophisticated (broadcasting) functions
- tools for integrating C/C++ and Fortran code
- useful linear algebra, Fourier transform, and random number capabilities

Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional container of generic data. Arbitrary data-types can be defined. This allows NumPy to seamlessly and speedily integrate with a wide variety of databases.

[Numpy.org](https://www.numpy.org/)

It is widely used in data science, as almost the entire PyData ecosystem relies on NumPy. Most of the time we can use NumPy arrays instead of Python lists, because arrays are more memory-efficient and faster for numerical work. For more on why you would want to use arrays instead of lists, check out this great [StackOverflow post](http://stackoverflow.com/questions/993984/why-numpy-instead-of-python-lists).

In this course, we will learn the NumPy basics for data analysis, such as vectors, arrays and matrices, before moving on to Pandas, which is more SQL-like.


## Numpy Arrays
NumPy arrays can have any number of dimensions, but in this course we mostly use two kinds: vectors and matrices. Vectors are strictly 1-d arrays and matrices are 2-d (note that a matrix can still have only one row or one column).
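
A quick way to tell vectors and matrices apart is to inspect an array's `ndim` (number of axes) and `shape` attributes. A minimal sketch; the variable names are only illustrative:

```python
import numpy as np

vector = np.array([1, 2, 3])             # 1-d array: a vector
matrix = np.array([[1, 2], [3, 4]])      # 2-d array: a 2 x 2 matrix
row_matrix = np.array([[1, 2, 3]])       # still 2-d, it just has a single row

for name, arr in [("vector", vector), ("matrix", matrix), ("row_matrix", row_matrix)]:
    # ndim counts the axes; shape gives the length along each axis
    print(name, arr.ndim, arr.shape)
```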

### NumPy Array from Python Lists

In [1]:
my_list = [1,2,3,4,5]
my_list

[1, 2, 3, 4, 5]

In [2]:
type(my_list)

list

In [4]:
import numpy as np
array1 = np.array(my_list)

In [5]:
type(array1)

numpy.ndarray

In [6]:
array1 # => this is a normal 1D vector

array([1, 2, 3, 4, 5])

In [7]:
#2D array => Matrix of 3x3
my_list2 = [[1,2,3],[4,5,6],[7,8,9]]
my_list2

[[1, 2, 3], [4, 5, 6], [7, 8, 9]]

In [8]:
array2 = np.array(my_list2)

In [9]:
array2

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [10]:
# Matrix shapes are written m x n. A 2 x 3 matrix dotted with a 3 x 2 matrix gives what shape? Answer: 2 x 2
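
The shape rule can be checked directly: for a matrix product the inner dimensions must match, and the result takes the outer ones. A small sketch using arrays of ones (any values would do):

```python
import numpy as np

a = np.ones((2, 3))     # 2 x 3
b = np.ones((3, 2))     # 3 x 2: inner dimensions (3 and 3) match
product = np.dot(a, b)  # result is 2 x 2: the outer dimensions
print(product.shape)

# Mismatched inner dimensions raise an error
try:
    np.dot(a, a)        # (2 x 3) dot (2 x 3): 3 != 2
except ValueError as err:
    print("ValueError:", err)
```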

### arange, zeros & ones, linspace, rand, randint, eye

In [11]:
list(range(0,10))

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

In [12]:
np.arange(0,10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [13]:
np.arange(0,10,2)

array([0, 2, 4, 6, 8])

In [16]:
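np.arange(1.02, 10.21, dtype=int) # the values are cast to int (1 to 10 here); NumPy's docs warn that this casting can give surprising results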

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

In [17]:
np.arange(1,10)

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [18]:
#zeros
np.zeros(3)

array([0., 0., 0.])

In [19]:
np.zeros((3,3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [20]:
#ones
np.ones(3)

array([1., 1., 1.])

In [21]:
np.ones((2,3))

array([[1., 1., 1.],
       [1., 1., 1.]])

In [22]:
#a dot product example => don't worry if you haven't seen this before!
np.dot(np.ones((2,3)),np.zeros((3,3)))

array([[0., 0., 0.],
       [0., 0., 0.]])

In [23]:
#identity matrix: any matrix multiplied by the identity matrix gives back the same matrix
array2

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [29]:
np.eye(3,dtype=int)

array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])

In [31]:
np.dot(array2,np.eye(3,dtype=int))

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
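
The identity property holds whichever side the identity matrix is on. The sketch below reuses the 3 x 3 matrix from above and uses the `@` operator, which for 2-d arrays computes the same matrix product as `np.dot`:

```python
import numpy as np

m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
identity = np.eye(3, dtype=int)

left = identity @ m    # I x M
right = m @ identity   # M x I
print(np.array_equal(left, m), np.array_equal(right, m))  # True True
```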

In [33]:
# np.linspace(start, stop, num) returns num evenly spaced numbers; both endpoints are included by default
np.linspace(0,10,4)

array([ 0.        ,  3.33333333,  6.66666667, 10.        ])

In [34]:
np.linspace(0,10,5)

array([ 0. ,  2.5,  5. ,  7.5, 10. ])
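
`np.arange` and `np.linspace` can produce the same values but are controlled differently: `arange` takes a step and excludes the stop value, while `linspace` takes a number of points and includes the stop value unless `endpoint=False` is passed. A short comparison:

```python
import numpy as np

by_step = np.arange(0, 1, 0.25)                 # step of 0.25, stop value 1 excluded
by_count = np.linspace(0, 1, 5)                 # 5 points, stop value 1 included
no_end = np.linspace(0, 1, 4, endpoint=False)   # 4 points, stop value excluded

print(by_step.tolist())   # [0.0, 0.25, 0.5, 0.75]
print(by_count.tolist())  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(no_end.tolist())    # [0.0, 0.25, 0.5, 0.75]
```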

In [49]:
#np.random.randint(low, high, size) draws random integers from low (inclusive) up to high (exclusive)
np.random.randint(0,100,40)

array([78, 75, 58, 78, 26, 90, 11, 78, 79, 96, 45,  4, 91, 34, 53, 32, 23,
       79, 69, 32, 54, 48, 76, 24, 96, 86,  4, 80, 61, 19, 42, 19, 39, 15,
        8, 40,  6, 58, 91, 13])

In [46]:
np.random.seed(35) #seeding makes the draws reproducible: this cell returns the same numbers every time it runs
np.random.randint(0,100,40)

array([73, 15, 55, 33, 63, 64, 11, 11, 56, 72, 57, 55, 94, 44, 91, 55, 56,
       64, 76, 61, 21,  3, 19, 19,  9, 45, 59, 88, 39, 69, 56,  6, 80, 94,
        1, 62, 74, 57, 94, 32])
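
Newer NumPy code (version 1.17 and later) often uses a `Generator` object instead of the global `np.random.seed`. A sketch of the same reproducibility idea; note that the Generator uses a different algorithm, so it does not reproduce the numbers printed above:

```python
import numpy as np

rng_a = np.random.default_rng(35)
rng_b = np.random.default_rng(35)

# integers(low, high, size): high is exclusive, like randint
draws_a = rng_a.integers(0, 100, 40)
draws_b = rng_b.integers(0, 100, 40)

print(np.array_equal(draws_a, draws_b))  # True: same seed, same sequence
```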

In [59]:
#rolling a die: high=7 is exclusive, so the result is between 1 and 6
np.random.randint(1,7,1)

array([1])

In [69]:
#random.rand => uniform distribution over [0, 1), so values are centered around 0.5
np.random.seed(30)
np.random.rand(3,3)

array([[0.64414354, 0.38074849, 0.66304791],
       [0.16365073, 0.96260781, 0.34666184],
       [0.99175099, 0.2350579 , 0.58569427]])

In [71]:
# random.randn => standard normal distribution => centered at 0 with standard deviation 1
np.random.randn(3,3)

array([[-0.26586486,  1.16724904,  2.83434509],
       [ 0.65194963, -1.3181097 ,  0.89078925],
       [ 1.11122629,  0.45579259, -1.7503454 ]])
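
With a large sample the difference between the two distributions is easy to see: `rand` values stay inside [0, 1) with a mean near 0.5, while `randn` values have a mean near 0 and a standard deviation near 1. The sample size and seed here are arbitrary choices:

```python
import numpy as np

np.random.seed(0)
uniform = np.random.rand(100_000)    # uniform over [0, 1)
normal = np.random.randn(100_000)    # standard normal

print(uniform.min() >= 0, uniform.max() < 1)
print(round(uniform.mean(), 2))                         # close to 0.5
print(round(normal.mean(), 2), round(normal.std(), 2))  # close to 0 and 1
```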

In [72]:
#array shape and reshape

In [74]:
np.random.randn(3,3).shape

(3, 3)

In [75]:
np.arange(25)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24])

In [76]:
arr = np.arange(25)

In [77]:
arr.shape

(25,)

In [79]:
arr.reshape(1,25).shape # a 2-d matrix with 1 row and 25 columns (a row vector)

(1, 25)

    Vector drawn as a column:
        [1
         2
         3
         4]

```python
np.array([1, 2, 3, 4]).reshape(1, 4)
```
    output (a single row):
        [[1 2 3 4]]
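
A 1-d array has no row or column orientation, so `.T` returns it unchanged; reshaping is how you get an explicit row or column. A minimal sketch on a 4-element array:

```python
import numpy as np

v = np.arange(4)           # shape (4,)
print(v.T.shape)           # (4,): transposing a 1-d array has no effect

row = v.reshape(1, 4)      # shape (1, 4): one row
col = v.reshape(4, 1)      # shape (4, 1): one column
print(row.shape, col.shape)
print(np.array_equal(row.T, col))  # True: on a 2-d array, .T swaps the axes
```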

In [82]:
array2.shape

(3, 3)

In [85]:
# the new shape must hold the same number of elements as the original (m*n = 9 here)
array2.reshape(1,9) # a 1 x 9 matrix: all values in a single row

array([[1, 2, 3, 4, 5, 6, 7, 8, 9]])

In [86]:
array2.reshape(9,1)

array([[1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8],
       [9]])

In [89]:
array2.reshape(4,2)

ValueError: cannot reshape array of size 9 into shape (4,2)
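
Two related helpers, shown on a fresh 3 x 3 matrix: passing `-1` for one dimension lets NumPy infer it from the total size, and `flatten()` returns a true 1-d copy (shape `(9,)`), unlike `reshape(1, 9)`, which stays 2-d:

```python
import numpy as np

m = np.arange(1, 10).reshape(3, 3)

col = m.reshape(-1, 1)     # -1 is inferred: 9 elements / 1 column = 9 rows
flat = m.flatten()         # 1-d copy of the data, shape (9,)
row2d = m.reshape(1, 9)    # still 2-d, shape (1, 9)
print(col.shape, flat.shape, row2d.shape)

try:
    m.reshape(4, 2)        # 8 slots for 9 elements: not allowed
except ValueError as err:
    print("ValueError:", err)
```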

### Numpy built-in methods

In [93]:
np.random.seed(838)
arr2 = np.random.randint(0,100,50)

In [95]:
arr2.shape

(50,)

In [96]:
arr2.max()

93

In [97]:
arr2.argmax()

0

In [101]:
arr2

array([93, 74, 83, 77, 62,  3, 81, 20, 37, 22, 47, 77, 46,  6, 91, 41, 22,
       91, 90, 84, 34, 80, 80, 66, 75, 81, 40, 30,  8, 83, 18, 68, 80, 31,
       92, 73, 33, 16, 12, 48, 22,  6, 73, 34, 59, 38, 25, 73, 63, 92])

In [99]:
arr2.min()

3

In [102]:
arr2.argmin()

5

In [103]:
arr2.mean()

53.6

In [104]:
arr2.std()

28.389434654462566

In [105]:
arr2.var()

805.96
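
On a 2-d array these methods work over all elements by default; the `axis` argument computes them per column (`axis=0`) or per row (`axis=1`) instead. Note that `argmax` without an axis returns a position in the flattened array. A small example:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.sum())          # 21: all elements
print(m.sum(axis=0))    # column totals: 5, 7, 9
print(m.max(axis=1))    # row maxima: 3, 6
print(m.argmax())       # 5: index of the maximum in the flattened array
print(np.isclose(m.std() ** 2, m.var()))  # True: variance is the squared standard deviation
```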

In [106]:
arr2

array([93, 74, 83, 77, 62,  3, 81, 20, 37, 22, 47, 77, 46,  6, 91, 41, 22,
       91, 90, 84, 34, 80, 80, 66, 75, 81, 40, 30,  8, 83, 18, 68, 80, 31,
       92, 73, 33, 16, 12, 48, 22,  6, 73, 34, 59, 38, 25, 73, 63, 92])

In [107]:
arr

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23, 24])

In [112]:
arr.reshape(25,1).dot(arr2.reshape(1,50)) # (25 x 1) dot (1 x 50) => a 25 x 50 matrix

array([[   0,    0,    0, ...,    0,    0,    0],
       [  93,   74,   83, ...,   73,   63,   92],
       [ 186,  148,  166, ...,  146,  126,  184],
       ...,
       [2046, 1628, 1826, ..., 1606, 1386, 2024],
       [2139, 1702, 1909, ..., 1679, 1449, 2116],
       [2232, 1776, 1992, ..., 1752, 1512, 2208]])
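
Multiplying an (n x 1) column by a (1 x m) row like this builds an outer product. `np.outer` gives the same result without the reshapes; a sketch on small arrays:

```python
import numpy as np

x = np.arange(4)                 # 0, 1, 2, 3
y = np.array([10, 20, 30])

via_dot = x.reshape(4, 1).dot(y.reshape(1, 3))   # (4 x 1) dot (1 x 3) -> 4 x 3
via_outer = np.outer(x, y)                       # same 4 x 3 result

print(via_dot.shape, np.array_equal(via_dot, via_outer))
```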

In [113]:
arr3 = np.random.randint(150,250,50)

In [114]:
arr2 + arr3 # element-wise addition of two arrays with the same shape (50,)

array([267, 252, 325, 234, 218, 222, 311, 173, 259, 177, 264, 261, 255,
       224, 328, 250, 199, 273, 314, 316, 278, 233, 293, 297, 240, 278,
       216, 223, 236, 267, 236, 296, 290, 227, 281, 278, 268, 251, 193,
       257, 271, 160, 306, 240, 217, 201, 231, 316, 294, 325])

In [115]:
arr3 - arr2

array([ 81, 104, 159,  80,  94, 216, 149, 133, 185, 133, 170, 107, 163,
       212, 146, 168, 155,  91, 134, 148, 210,  73, 133, 165,  90, 116,
       136, 163, 220, 101, 200, 160, 130, 165,  97, 132, 202, 219, 169,
       161, 227, 148, 160, 172,  99, 125, 181, 170, 168, 141])
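
Arithmetic between arrays of different shapes is also possible when the shapes are compatible under NumPy's broadcasting rules: comparing shapes from the right, each pair of dimensions must be equal or one of them must be 1. A sketch:

```python
import numpy as np

m = np.arange(6).reshape(2, 3)     # [[0, 1, 2], [3, 4, 5]], shape (2, 3)
row = np.array([10, 20, 30])       # shape (3,): added to every row
col = np.array([[100], [200]])     # shape (2, 1): added to every column

print((m + row).tolist())          # [[10, 21, 32], [13, 24, 35]]
print((m + col).tolist())          # [[100, 101, 102], [203, 204, 205]]

try:
    m + np.array([1, 2])           # shape (2,) vs last axis of length 3: incompatible
except ValueError as err:
    print("ValueError:", err)
```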

In [116]:
#for 1-d arrays, indexing and slicing work just like list indexing
arr2

array([93, 74, 83, 77, 62,  3, 81, 20, 37, 22, 47, 77, 46,  6, 91, 41, 22,
       91, 90, 84, 34, 80, 80, 66, 75, 81, 40, 30,  8, 83, 18, 68, 80, 31,
       92, 73, 33, 16, 12, 48, 22,  6, 73, 34, 59, 38, 25, 73, 63, 92])

In [117]:
arr2[0]

93

In [118]:
arr2[5]

3
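
One important difference from lists: slicing a NumPy array returns a *view* that shares memory with the original, so writing into the slice changes the original array. Call `.copy()` when you need an independent array. A sketch with throwaway data:

```python
import numpy as np

data = np.arange(10)
part = data[2:5]         # a view onto data[2], data[3], data[4]
part[0] = 99             # writes through to the original array
print(data[2])           # 99

safe = data[5:8].copy()  # an independent copy
safe[0] = -1
print(data[5])           # 5: the original is unchanged

nums = list(range(10))
nums_part = nums[2:5]    # slicing a list makes a new list
nums_part[0] = 99
print(nums[2])           # 2: the list is unchanged
```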

In [130]:
arr2[arr3.argmax()] # arr3.argmax() is 40, so this is arr2[40]

22

In [131]:
arr2[40]

22

In [120]:
arr3.argmax()

40

In [123]:
arr3

array([174, 178, 242, 157, 156, 219, 230, 153, 222, 155, 217, 184, 209,
       218, 237, 209, 177, 182, 224, 232, 244, 153, 213, 231, 165, 197,
       176, 193, 228, 184, 218, 228, 210, 196, 189, 205, 235, 235, 181,
       209, 249, 154, 233, 206, 158, 163, 206, 243, 231, 233])

In [124]:
arr3.max()

249

In [125]:
# filtering in numpy
#Filter numbers in arr2 that are more than 90
arr2[arr2 > 90]

array([93, 91, 91, 92, 92])

In [127]:
#filter numbers strictly between 40 and 60 in arr2: wrap each condition in parentheses and combine them with &
arr2[(arr2 > 40) & (arr2 < 60)]

array([47, 46, 41, 48, 59])
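
Python's `and`/`or` keywords expect a single True or False, so they fail on boolean arrays with more than one element; use `&`, `|` and `~` (or `np.logical_and` and friends) instead. The parentheses matter because `&` binds more tightly than `>` and `<`. A sketch on a few sample values:

```python
import numpy as np

values = np.array([47, 3, 59, 91, 41])

mask = (values > 40) & (values < 60)              # element-wise AND
same = np.logical_and(values > 40, values < 60)   # function form
print(values[mask].tolist())                      # [47, 59, 41]
print(values[~mask].tolist())                     # [3, 91]: ~ inverts the mask

try:
    values[(values > 40) and (values < 60)]
except ValueError as err:
    print("ValueError:", err)   # the truth value of a multi-element array is ambiguous
```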

In [129]:
# boolean filtering does not work on a plain Python list
my_list[my_list > 2]

TypeError: '>' not supported between instances of 'list' and 'int'
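
To filter a plain list, a list comprehension does the job; converting the list to an array first enables boolean filtering. A sketch using the same `my_list` values:

```python
import numpy as np

my_list = [1, 2, 3, 4, 5]

filtered_list = [x for x in my_list if x > 2]   # plain Python filtering
as_array = np.array(my_list)
filtered_array = as_array[as_array > 2]         # boolean filtering on an array

print(filtered_list)            # [3, 4, 5]
print(filtered_array.tolist())  # [3, 4, 5]
```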

In [134]:
arr2.sort() # sorts arr2 in place and returns None

In [135]:
arr2

array([ 3,  6,  6,  8, 12, 16, 18, 20, 22, 22, 22, 25, 30, 31, 33, 34, 34,
       37, 38, 40, 41, 46, 47, 48, 59, 62, 63, 66, 68, 73, 73, 73, 74, 75,
       77, 77, 80, 80, 80, 81, 81, 83, 83, 84, 90, 91, 91, 92, 92, 93])
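
When the original order must be kept, `np.sort` returns a sorted copy instead of sorting in place, and `np.argsort` returns the indices that would sort the array. A sketch on a small array:

```python
import numpy as np

values = np.array([93, 3, 47, 22])

sorted_copy = np.sort(values)       # new array; values keeps its order
order = np.argsort(values)          # positions of the elements in ascending order
descending = np.sort(values)[::-1]  # reverse the sorted copy for descending order

print(sorted_copy.tolist(), values.tolist())  # [3, 22, 47, 93] [93, 3, 47, 22]
print(order.tolist())                         # [1, 3, 2, 0]
print(descending.tolist())                    # [93, 47, 22, 3]
```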

In [137]:
# Create an array of all the even integers from 10 to 50
np.arange(10,51,2)

array([10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36, 38, 40, 42,
       44, 46, 48, 50])

In [139]:
# Create an array of 10 fives
np.ones(10) * 5

array([5., 5., 5., 5., 5., 5., 5., 5., 5., 5.])
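
`np.full(shape, fill_value)` creates a constant array directly. Unlike `np.ones(10) * 5`, which gives floats, it takes its dtype from the fill value when no `dtype` is passed, so an integer fill gives an integer array:

```python
import numpy as np

fives_float = np.ones(10) * 5   # float64 array, as in the cell above
fives_int = np.full(10, 5)      # integer array: dtype inferred from the fill value 5

print(fives_int.dtype.kind, fives_float.dtype.kind)   # i f
print(np.array_equal(fives_int, fives_float))         # True: same values
```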

In [140]:
# Create a 3x3 matrix with values ranging from 0 to 8
np.arange(9).reshape(3,3)

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [143]:
# Create the following matrix (values 0.01 to 1.00 in a 10 x 10 grid):
np.arange(1,101).reshape(10,10) / 100

array([[0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1 ],
       [0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.2 ],
       [0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3 ],
       [0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4 ],
       [0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5 ],
       [0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6 ],
       [0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.7 ],
       [0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.8 ],
       [0.81, 0.82, 0.83, 0.84, 0.85, 0.86, 0.87, 0.88, 0.89, 0.9 ],
       [0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.97, 0.98, 0.99, 1.  ]])
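
Another way to build the same 10 x 10 grid is to generate the 100 evenly spaced values with `np.linspace` and reshape them. Floating-point rounding can differ in the last digit, so the sketch compares with `np.allclose` rather than `==`:

```python
import numpy as np

by_division = np.arange(1, 101).reshape(10, 10) / 100
by_linspace = np.linspace(0.01, 1, 100).reshape(10, 10)

print(by_linspace.shape)                       # (10, 10)
print(np.allclose(by_division, by_linspace))   # True
```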

In [145]:
#create a 5 x 5 matrix with the values 1 to 25
arr25 = np.arange(1,26).reshape(5,5)

In [146]:
arr25

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25]])

In [None]:
# extract this sub-matrix from the matrix above

    array([[12, 13, 14, 15],
       [17, 18, 19, 20],
       [22, 23, 24, 25]])

In [154]:
arr25[2:,1:] #rows from index 2 (the 3rd row) onwards, columns from index 1 (the 2nd column) onwards

array([[12, 13, 14, 15],
       [17, 18, 19, 20],
       [22, 23, 24, 25]])

In [167]:
M = np.array([[1,4],[6,16]])
M

array([[ 1,  4],
       [ 6, 16]])

In [169]:
#any
if (M > 5).any():
    print("at least one element in M is more than 5")
else:
    print("No element larger than 5")

at least one element in M is more than 5


In [171]:
np.any([True, False])

True

In [173]:
#all
if (M>5).all():
    print("all elements in M are more than 5")
else:
    print("at least one element in M is not more than 5")

at least one element in M is not more than 5
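
`any` and `all` also accept an `axis` argument, and since `True` counts as 1, summing a boolean array counts the matches. A sketch with the same `M`:

```python
import numpy as np

M = np.array([[1, 4], [6, 16]])
big = M > 5                      # [[False, False], [True, True]]

print(big.any(axis=0).tolist())  # [True, True]: each column has a value above 5
print(big.all(axis=1).tolist())  # [False, True]: only the second row is all above 5
print(big.sum())                 # 2 elements are above 5
```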


In [183]:
#np.where(condition, x, y) => builds a new array, taking x where the condition is True and y where it is False
a = np.arange(10)
a

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [184]:
#boolean indexing keeps only the elements that satisfy the condition
a[a>5]

array([6, 7, 8, 9])

In [178]:
b = np.where(a > 5, a, a + 3) #np.where(condition, value if True, value if False)

In [180]:
b

array([3, 4, 5, 6, 7, 8, 6, 7, 8, 9])

In [185]:
np.where(b == 7) # with only a condition, np.where returns the indices where it is True

(array([4, 7]),)

In [187]:
b[4]

7

In [192]:
np.max(np.where(a % 2 == 0, a, a * 1.2)) # keep even numbers, scale odd ones by 1.2, then take the max (9 * 1.2)

10.799999999999999
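
On a 2-d array, `np.where(condition)` returns one index array per dimension (rows, then columns), and the three-argument form is handy for replacing values. A sketch on a small matrix:

```python
import numpy as np

m = np.array([[1, 7],
              [7, 3]])

rows, cols = np.where(m == 7)                    # row and column indices of each match
print(list(zip(rows.tolist(), cols.tolist())))   # [(0, 1), (1, 0)]

capped = np.where(m > 5, 5, m)                   # replace values above 5 with 5
print(capped.tolist())                           # [[1, 5], [5, 3]]
```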

In [None]:
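#Comparison operators => on plain values they return True or False; on arrays they compare element by element and return an array of booleans
# >, >=, <, <=, !=, ==
# Use them whenever you want to check a condition:
# in if/else statements, filters, np.where, or wherever else a condition is expected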
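
On arrays, a comparison gives one boolean per element, so it cannot be used directly as an `if` condition. `np.array_equal` (or `.all()` / `.any()` on the result) reduces it to a single answer. A sketch:

```python
import numpy as np

a = np.array([1, 5, 9])
b = np.array([1, 6, 9])

print((a == b).tolist())      # [True, False, True]: element by element
print(np.array_equal(a, b))   # False: one answer for the whole array

if (a <= b).all():            # reduce to a single True/False before branching
    print("every element of a is <= the matching element of b")
```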

### More exercises

In [194]:
arr25

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25]])

In [197]:
#Produce the output below from the matrix arr25 above
arr25[3,4] # indexing by row, column (m,n)

20

In [198]:
arr25[3][4] # chained indexing gives the same element, but arr25[3,4] does it in one step

20

In [201]:
#Produce the output below from the matrix arr25 above
arr25[0:3,1:2] #returned as a 2d matrix

array([[ 2],
       [ 7],
       [12]])

In [203]:
arr25[0:3,1] #single dimension vector

array([ 2,  7, 12])

In [204]:
arr25[:3,1:2] #the start index can be omitted when slicing from the first row

array([[ 2],
       [ 7],
       [12]])

In [205]:
#Produce the output below from the matrix arr25 above
arr25[4]

array([21, 22, 23, 24, 25])

In [209]:
#Produce the output below from the matrix arr25 above
arr25[3:,]

array([[16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25]])

In [210]:
arr25[3:]

array([[16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25]])

In [211]:
# Create an 8x8 matrix and fill it with a checkerboard pattern
z = np.zeros((8,8),dtype=int)
z

array([[0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0, 0, 0, 0]])

In [213]:
z[0::2,1::2] = 1 # even rows (0, 2, ...), odd columns (1, 3, ...)
z

array([[0, 1, 0, 1, 0, 1, 0, 1],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [0, 0, 0, 0, 0, 0, 0, 0],
       [0, 1, 0, 1, 0, 1, 0, 1],
       [0, 0, 0, 0, 0, 0, 0, 0]])

In [215]:
z[1::2,0::2] = 1 # odd rows, even columns
print(z)

[[0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]
 [0 1 0 1 0 1 0 1]
 [1 0 1 0 1 0 1 0]]
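
The same checkerboard can be built by repeating a 2 x 2 tile with `np.tile`. A sketch that checks it against the slicing approach:

```python
import numpy as np

tiled = np.tile(np.array([[0, 1], [1, 0]]), (4, 4))   # 2 x 2 block repeated 4 x 4 times

sliced = np.zeros((8, 8), dtype=int)
sliced[0::2, 1::2] = 1    # even rows, odd columns
sliced[1::2, 0::2] = 1    # odd rows, even columns

print(tiled.shape, np.array_equal(tiled, sliced))   # (8, 8) True
```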

In [218]:
# Create a 5x5 matrix with values 1,2,3,4 just below the diagonal
np.diag(1 + np.arange(4), k=-1) # k=-1 places the values on the diagonal just below the main one

array([[0, 0, 0, 0, 0],
       [1, 0, 0, 0, 0],
       [0, 2, 0, 0, 0],
       [0, 0, 3, 0, 0],
       [0, 0, 0, 4, 0]])
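
`np.diag` works in both directions: given a 1-d array it builds a matrix, and given a 2-d array it extracts a diagonal. Using the same offset `k=-1` recovers the values 1 to 4:

```python
import numpy as np

m = np.diag(1 + np.arange(4), k=-1)   # 5 x 5, values 1..4 just below the main diagonal
print(np.diag(m, k=-1).tolist())      # [1, 2, 3, 4]
print(np.diag(m).tolist())            # [0, 0, 0, 0, 0]: the main diagonal
```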

In [219]:
# Multiply a 5x3 matrix by a 3x2 matrix (real matrix product)
np.dot(np.ones((5,3)),np.ones((3,2)))

array([[3., 3.],
       [3., 3.],
       [3., 3.],
       [3., 3.],
       [3., 3.]])
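
`np.dot` (or the `@` operator) performs a true matrix product, whereas `*` multiplies element by element and needs matching (or broadcastable) shapes. A sketch contrasting the two on a 2 x 2 example:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

print((a @ b).tolist())   # [[19, 22], [43, 50]]: matrix product
print((a * b).tolist())   # [[5, 12], [21, 32]]: element-wise product

c = np.ones((5, 3)) @ np.ones((3, 2))   # same shapes as the np.dot cell above
print(c.shape)                           # (5, 2)
```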