# Numpy tutorial from Keith Galli
I remember Keith from the pandas tutorial I followed. He is a good instructor, so I am now going to follow his numpy tutorial to get started with numpy.

Click [here](https://www.youtube.com/watch?v=GB9ByFAIAH4) to check the video

In [1]:
import numpy as np

## The Basics

In [2]:
a = np.array([1,2,3])  # Defining basic array
print(a)

[1 2 3]


In [3]:
I_3 = np.array([[1,0,0],[0,1,0],[0,0,1]])    # Defining multi-dimensional array 
print(I_3)
print(I_3.ndim)             # Returns the number of dimensions of the array (2 in this case)

[[1 0 0]
 [0 1 0]
 [0 0 1]]
2


In [4]:
d = np.array([[1,2,3],[4,5,6],[7,8,9]])
print(np.shape(d))     # Returns the shape of the matrix (3✕3 in this case)
# You can achieve the same thing with d.shape

print(np.size(d))    # Returns the total number of elements in the matrix (9 in this case)

(3, 3)
9


In [5]:
d.dtype      # Returns the data type (int64 means each element takes 64 bits = 8 bytes)
d.itemsize   # Returns the size of each item in bytes
d.itemsize * d.size   # Returns the total size of the whole array in bytes
d.nbytes     # Returns the total size of the whole array in bytes (same as the line above)

72
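
As a quick illustration of how the data type drives memory use, here is a minimal sketch that builds the same 3 x 3 matrix with an explicit `dtype`. The name `d16` is only an illustrative choice and does not come from the video.

```python
import numpy as np

# Same values as d, but stored as 16-bit integers instead of the default integer type
d16 = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=np.int16)

print(d16.itemsize)  # 2 bytes per element
print(d16.nbytes)    # 9 elements * 2 bytes = 18
```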

## Accessing data from row and column

In [6]:
x = np.array([[1,2,3,4,5,6,7],[8,9,10,11,12,13,14]])
print(x[1,5])   # Prints out the element on 2nd row, 6th column 

13


In [7]:
print(x[0,:])    # Prints out the entire 1st row

[1 2 3 4 5 6 7]


In [8]:
print(x[:,4])   # Prints out all data of the 5th column

[ 5 12]


In [9]:
print(x[:,1:6:2]) 
# Prints out data from all rows, with columns sliced from the 2nd column up to the 6th (index 1 to 6, endpoint exclusive) in steps of 2

[[ 2  4  6]
 [ 9 11 13]]


In [10]:
print(x[:,2:7:2])

[[ 3  5  7]
 [10 12 14]]


In [11]:
print(x[1,2:7:2])

[10 12 14]


In [12]:
print(x[:,1:-1:2])       

[[ 2  4  6]
 [ 9 11 13]]


In [13]:
x[1,3] = 111   # Changing data from 2nd row, 4th column to 111
print(x)

[[  1   2   3   4   5   6   7]
 [  8   9  10 111  12  13  14]]


In [14]:
x[1,4:7] = 69   # Changing data of the 2nd row, from 5th column to 7th column, to 69
print(x)

[[  1   2   3   4   5   6   7]
 [  8   9  10 111  69  69  69]]


In [15]:
# Defining 3D array
xx = np.array([[[1,2],[3,4]],[[5,6],[7,8]]])   
print(xx)
print(xx.shape)

[[[1 2]
  [3 4]]

 [[5 6]
  [7 8]]]
(2, 2, 2)


Playing with 3D array for a bit. 


NOTE: Accessing and changing data in 3D arrays works the same way as in 2D arrays. You just need to think about one more dimension.

In [16]:
xx[:,1,:]

array([[3, 4],
       [7, 8]])

In [17]:
xx[0,1,0]

np.int64(3)

In [18]:
xx[:,:,1] = [[22,44],[66,88]]
xx

array([[[ 1, 22],
        [ 3, 44]],

       [[ 5, 66],
        [ 7, 88]]])

## Initializing different kinds of arrays

In [19]:
print(np.identity(3))     # An identity matrix of 3 x 3 dimension

[[1. 0. 0.]
 [0. 1. 0.]
 [0. 0. 1.]]


In [20]:
np.full((3,3),8)         # A 3 x 3 matrix filled with 8

array([[8, 8, 8],
       [8, 8, 8],
       [8, 8, 8]])

In [21]:
print(np.full_like(xx,201))   # An array with the same shape as xx, filled with 201

[[[201 201]
  [201 201]]

 [[201 201]
  [201 201]]]


In [22]:
np.ones((3,4))            # A 3 x 4 matrix filled with 1

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [23]:
np.zeros((3,3))           # A 3 x 3 null matrix 

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [24]:
np.random.rand(4,4)   # A 4 x 4 matrix filled with random values

array([[0.29884193, 0.56905103, 0.45962477, 0.06096826],
       [0.39925334, 0.38541513, 0.83636464, 0.46879662],
       [0.00786113, 0.31955397, 0.79989105, 0.53855364],
       [0.28218151, 0.51300127, 0.71691452, 0.60508821]])

In [25]:
np.random.random_sample(xx.shape)  # An array filled with random values, with the same shape as xx

array([[[0.85957266, 0.95453375],
        [0.22592289, 0.50300027]],

       [[0.67907086, 0.31331465],
        [0.12154666, 0.30732769]]])

In [26]:
np.random.randint(100,size=(4,4))  # A 4 x 4 matrix filled with random integers from 0 to 99 (100 is exclusive)

array([[56, 88, 94, 97],
       [20, 94,  1, 44],
       [34, 19, 72, 75],
       [25,  9, 76, 84]])

In [27]:
gg = np.array([[1,2,3]])
gg.repeat(3,axis=1)                 # Repeats each element of gg three times along the columns (with axis=0, it would repeat the whole row three times)

array([[1, 1, 1, 2, 2, 2, 3, 3, 3]])
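
To make the difference between the two axes concrete, here is a small sketch that runs `repeat` both ways on the same array. The names `rows` and `cols` are illustrative only.

```python
import numpy as np

gg = np.array([[1, 2, 3]])

rows = gg.repeat(3, axis=0)  # stacks three copies of the row
cols = gg.repeat(3, axis=1)  # repeats each element three times within the row

print(rows.shape)  # (3, 3)
print(cols.shape)  # (1, 9)
```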

## Basic Math


In [28]:
b = np.array([[1,2,3],[4,5,6]])

In [29]:
b + 10    

array([[11, 12, 13],
       [14, 15, 16]])

In [30]:
b - 10

array([[-9, -8, -7],
       [-6, -5, -4]])

In [31]:
b * 10

array([[10, 20, 30],
       [40, 50, 60]])

In [32]:
b / 10

array([[0.1, 0.2, 0.3],
       [0.4, 0.5, 0.6]])

In [33]:
b ** 2

array([[ 1,  4,  9],
       [16, 25, 36]])

In [34]:
np.sin(b)       # Every input is treated as radians (every trigonometric function in numpy works in radians, same goes for python's built-in math module)

array([[ 0.84147098,  0.90929743,  0.14112001],
       [-0.7568025 , -0.95892427, -0.2794155 ]])

In [35]:
1/np.tan(b)     # Cotangent of each element (numpy has no dedicated cot function)

array([[ 0.64209262, -0.45765755, -7.01525255],
       [ 0.86369115, -0.29581292, -3.436353  ]])

## Linear Algebra

In [36]:
# Multiplication of two matrices
ab = np.full((2,3),5)
ca = np.full((3,2),4)
np.matmul(ab,ca)

array([[60, 60],
       [60, 60]])
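
As a side note, the `@` operator is shorthand for `np.matmul`. The sketch below reuses `ab` and `ca` and also shows what happens when the inner dimensions do not match; the name `product` is illustrative.

```python
import numpy as np

ab = np.full((2, 3), 5)
ca = np.full((3, 2), 4)

product = ab @ ca       # same result as np.matmul(ab, ca)
print(product.shape)    # (2, 2): rows of ab by columns of ca

# Incompatible shapes raise a ValueError
try:
    ab @ ab             # (2, 3) @ (2, 3): inner dimensions 3 and 2 differ
except ValueError as err:
    print("shape mismatch:", err)
```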

In [37]:
# Determinant of a matrix
gs = np.array([[1,2,3],[4,5,6],[7,8,9]])
np.linalg.det(gs)

np.float64(-9.51619735392994e-16)

In [38]:
# Matrix power
np.linalg.matrix_power(gs,2)

array([[ 30,  36,  42],
       [ 66,  81,  96],
       [102, 126, 150]])

In [39]:
# Trace of a matrix (the sum of the elements across a diagonal)
# np.linalg.trace needs NumPy 2.0 or newer; np.trace(gs, offset=...) also works on older versions
print(np.linalg.trace(gs, offset=0))  # Across the main diagonal
print(np.linalg.trace(gs,offset=1))   # 1 step above main diagonal (2+6 = 8)
print(np.linalg.trace(gs,offset=-1))  # 1 step below main diagonal (4+8 = 12)

# offset = 0 --> Main Diagonal
# offset > 0 --> above main diagonal
# offset < 0 --> below main diagonal

15
8
12


In [40]:
# Inverse Matrix
np.linalg.inv(gs)

array([[-4.50359963e+15,  9.00719925e+15, -4.50359963e+15],
       [ 9.00719925e+15, -1.80143985e+16,  9.00719925e+15],
       [-4.50359963e+15,  9.00719925e+15, -4.50359963e+15]])

In [41]:
# Calculating the adjoint (adjugate) matrix and the cofactor matrix
np.linalg.det(gs)*np.linalg.inv(gs)                  # Adjoint matrix of gs
np.transpose(np.linalg.inv(gs)*np.linalg.det(gs))    # Cofactor matrix of gs (transpose of the adjoint)

## NOTE: In this case, the adjoint and cofactor matrices are the same because the result is symmetric. But that does not always happen.

array([[ 4.28571429, -8.57142857,  4.28571429],
       [-8.57142857, 17.14285714, -8.57142857],
       [ 4.28571429, -8.57142857,  4.28571429]])

WARNING! gs has no inverse because its determinant is 0. The inverse is the adjoint matrix divided by the determinant, so a division by zero should raise an error. However, because of the way computers store floating-point numbers, the computed determinant is a float that is very close to 0 but not exactly 0. That is why numpy does not raise an error when we calculate the inverse of gs, and why the inverse and adjoint values printed above are numerical noise rather than meaningful results.
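
One way to catch this before trusting an inverse is to check the rank of the matrix. This is only a sketch of one possible check, not something shown in the video; `rank` is an illustrative name.

```python
import numpy as np

gs = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

rank = np.linalg.matrix_rank(gs)
print(rank)                 # 2, which is less than 3, so gs is singular
print(rank == gs.shape[0])  # False: np.linalg.inv(gs) should not be trusted
```

A full-rank square matrix would give a rank equal to its number of rows.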

## Statistics


**An important breakdown first:**

In both pandas and numpy, the axis parameter is used in many functions. So let's break it down:

axis = 0 --> Works down the rows and returns one value per column. It's like going down the Y axis.

axis = 1 --> Works across the columns and returns one value per row. It's like going along the X axis.
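
A tiny sketch of the axis rule, using a hypothetical 2 x 3 array `m` so the sums are easy to check by hand:

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(m.sum(axis=0))  # [5 7 9]  -> one value per column
print(m.sum(axis=1))  # [ 6 15]  -> one value per row
```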

In [42]:
np.random.seed(43)                     # So that the array stays the same
fg = np.random.randint(1,20,size=(3,3))
fg

array([[ 5,  1, 18],
       [17, 18,  3],
       [15,  1,  4]])

In [43]:
np.min(fg,axis=0)    

array([5, 1, 3])

In [44]:
np.max(fg,axis=0)

array([17, 18, 18])

In [45]:
np.min(fg,axis=1)

array([1, 3, 1])

In [46]:
np.max(fg,axis=1)

array([18, 18, 15])

## Reorganizing Array

In [47]:
c = np.array([1,2,3,4,5,6,7,8])     # 1D Vector
dd = np.array([6,7,8,9,10,11,12,13])  # 1D Vector

In [48]:
# Reshaping arrays
c_reshaped = np.reshape(c,(2,4))     # Just provide the array and the new shape, it will do the magic. NOTE: The product of the new dimensions must equal the total number of elements (2 x 4 = 8 here).
c_reshaped

array([[1, 2, 3, 4],
       [5, 6, 7, 8]])

In [49]:
dd_reshaped = np.reshape(dd,(2,2,2)) 
dd_reshaped

array([[[ 6,  7],
        [ 8,  9]],

       [[10, 11],
        [12, 13]]])
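
The sketch below illustrates the rule about the number of elements. It also uses `-1`, which asks numpy to work out one dimension on its own. The name `auto` is illustrative.

```python
import numpy as np

c = np.array([1, 2, 3, 4, 5, 6, 7, 8])

auto = np.reshape(c, (2, -1))  # -1 lets numpy work out the missing dimension
print(auto.shape)              # (2, 4)

try:
    np.reshape(c, (3, 3))      # 3 * 3 = 9, but c has only 8 elements
except ValueError as err:
    print("cannot reshape:", err)
```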

In [50]:
# Stacking arrays
np.vstack([c,dd])       # Vertical Stacking

array([[ 1,  2,  3,  4,  5,  6,  7,  8],
       [ 6,  7,  8,  9, 10, 11, 12, 13]])

In [51]:
np.hstack([c,dd])      # Horizontal Stacking

array([ 1,  2,  3,  4,  5,  6,  7,  8,  6,  7,  8,  9, 10, 11, 12, 13])

## Miscellaneous Features 

In [52]:
np.arange(1,31)   # Creates a 1D array which has integer elements from 1 to 30 (the stop value 31 is exclusive)

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17,
       18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30])

In [53]:
# Importing a file 
filedata = np.genfromtxt("data.txt",dtype="int32",delimiter=",")
filedata

array([[  1,  13,  21,  11, 196,  75,   4,   3,  34,   6,   7,   8,   0,
          1,   2,   3,   4,   5],
       [  3,  42,  12,  33, 766,  75,   4,  55,   6,   4,   3,   4,   5,
          6,   7,   0,  11,  12],
       [  1,  22,  33,  11, 999,  11,   2,   1,  78,   0,   1,   2,   9,
          8,   7,   1,  76,  88]], dtype=int32)

In [54]:
filedata > 50    # Returns a boolean array showing which elements of filedata are greater than 50

array([[False, False, False, False,  True,  True, False, False, False,
        False, False, False, False, False, False, False, False, False],
       [False, False, False, False,  True,  True, False,  True, False,
        False, False, False, False, False, False, False, False, False],
       [False, False, False, False,  True, False, False, False,  True,
        False, False, False, False, False, False, False,  True,  True]])

## Boolean Masking and Advanced Indexing

In numpy, we can also use lists for indexing. The syntax is like this:

for 1D arrays:

array[[element index]]

for 2D arrays:

array[[row index],[column index]]

NOTE: If the index is not specific (if it uses :), do not wrap it in [square brackets]

In [55]:
c[[0,1]]          # Returns the first and the second element from the 1D array

array([1, 2])

In [56]:
df = np.array([[1,2,3],[11,22,33],[111,222,333]])
df[[1,2]]                # Returns the second and the third row from the 2D matrix

array([[ 11,  22,  33],
       [111, 222, 333]])

In [57]:
dd_reshaped[[1,1,0]]     # Returns the second, second and the first 2D matrix from the 3D array

array([[[10, 11],
        [12, 13]],

       [[10, 11],
        [12, 13]],

       [[ 6,  7],
        [ 8,  9]]])

We can also select individual elements by giving an index list for each dimension.

In [58]:
df[[0,1,2],[1,1,1]]

array([  2,  22, 222])

In [59]:
dd_reshaped[[1],[1],[0]]

array([12])

Because of this feature, we can do something pretty cool, like printing all of the numbers in filedata which are greater than 50

In [60]:
filedata[filedata>50]

array([196,  75, 766,  75,  55, 999,  78,  76,  88], dtype=int32)

Now let's see some more cool **boolean masking** features

In [61]:
filedata

array([[  1,  13,  21,  11, 196,  75,   4,   3,  34,   6,   7,   8,   0,
          1,   2,   3,   4,   5],
       [  3,  42,  12,  33, 766,  75,   4,  55,   6,   4,   3,   4,   5,
          6,   7,   0,  11,  12],
       [  1,  22,  33,  11, 999,  11,   2,   1,  78,   0,   1,   2,   9,
          8,   7,   1,  76,  88]], dtype=int32)

In [62]:
np.any(filedata>50,axis=0)     # Checks whether any integer in a column is greater than 50

array([False, False, False, False,  True,  True, False,  True,  True,
       False, False, False, False, False, False, False,  True,  True])

In [63]:
np.all(filedata>50,axis=0)     # Checks whether all integers in a column are greater than 50

array([False, False, False, False,  True, False, False, False, False,
       False, False, False, False, False, False, False, False, False])

In [64]:
filedata[(filedata>50) & (filedata<105)]  # Returns those integers which are greater than 50 and less than 105

array([75, 75, 55, 78, 76, 88], dtype=int32)

In [65]:
filedata[(filedata>=50) & (filedata<=105)]  # Returns all of the integers which are in between 50 and 105 (inclusive)

array([75, 75, 55, 78, 76, 88], dtype=int32)

In [66]:
filedata[~(filedata>50) & (filedata<105)]  # ~ applies only to (filedata>50), so this returns the integers which are not greater than 50 (all of them are also less than 105)

array([ 1, 13, 21, 11,  4,  3, 34,  6,  7,  8,  0,  1,  2,  3,  4,  5,  3,
       42, 12, 33,  4,  6,  4,  3,  4,  5,  6,  7,  0, 11, 12,  1, 22, 33,
       11, 11,  2,  1,  0,  1,  2,  9,  8,  7,  1], dtype=int32)
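
Because `~` binds more tightly than `&`, where the parentheses go changes the result. Here is a minimal sketch on a hypothetical three-element array `v` that compares negating the first condition with negating the whole condition:

```python
import numpy as np

v = np.array([10, 60, 200])

first_only = v[~(v > 50) & (v < 105)]  # ~ applies to (v > 50) only
whole = v[~((v > 50) & (v < 105))]     # ~ applies to the combined condition

print(first_only)  # [10]
print(whole)       # [ 10 200]
```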

### QUIZ #1

In [67]:
n = 5 
p1 = np.random.randint(0,20,size=(n,n))
p1[[0,-1]] = 1
p1[:,[0,-1]] =1
p1[1:-1,1:-1] = 0
p1[n//2,n//2] = 9
print(p1)
## NOTE: n must be an odd number. Otherwise, 9 won't be in the center.

[[1 1 1 1 1]
 [1 0 0 0 1]
 [1 0 9 0 1]
 [1 0 0 0 1]
 [1 1 1 1 1]]


### QUIZ #2

In [68]:
q2 = np.arange(1,31).reshape(6,5)
q2

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20],
       [21, 22, 23, 24, 25],
       [26, 27, 28, 29, 30]])

The first goal is to index the array in such a way that it prints 11,12,16,17

In [69]:
q2[2:4,0:2]

array([[11, 12],
       [16, 17]])

The second goal is to index the array in such a way that it prints 2,8,14,20

In [70]:
# My way
(q2[[0,1,2,3]].reshape(20))[[1,7,13,19]]

array([ 2,  8, 14, 20])

In [71]:
# Keith's Answer
q2[[0,1,2,3],[1,2,3,4]]

array([ 2,  8, 14, 20])

The third goal is to index the array in such a way that it prints 4,5,24,25,29,30

In [72]:
# My way
q2[[0,4,5]][:,3:]

array([[ 4,  5],
       [24, 25],
       [29, 30]])

In [73]:
# Keith's way
q2[[0,4,5],3:]

array([[ 4,  5],
       [24, 25],
       [29, 30]])

# Some other important functions 

I got these from ChatGPT. Here is the [link](https://chatgpt.com/share/68597277-6054-8011-9781-7ff959317ac2) of the conversation

In [74]:
np.random.seed(43)
p = np.random.randint(0,10,size=(3,3))
p

array([[4, 0, 1],
       [5, 0, 3],
       [1, 2, 7]])

In [75]:
q = np.random.randint(0,10,size=(3,3))
q

array([[0, 3, 2],
       [9, 1, 2],
       [2, 3, 5]])

**.flatten(), .ravel()**

Used to turn an array into 1D

In [76]:
p.flatten()    # Returns a copy
p.ravel()      # Returns a view when possible (faster, but changes to a view affect the original)

array([4, 0, 1, 5, 0, 3, 1, 2, 7])
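
The copy versus view difference can be checked directly. This is a small sketch using `np.shares_memory`; the names `flat` and `rav` are illustrative, and the view behaviour shown assumes a contiguous array like `p`.

```python
import numpy as np

p = np.array([[4, 0, 1], [5, 0, 3], [1, 2, 7]])

flat = p.flatten()
rav = p.ravel()

print(np.shares_memory(p, flat))  # False: flatten always copies
print(np.shares_memory(p, rav))   # True here: p is contiguous, so ravel returns a view

rav[0] = 99
print(p[0, 0])   # 99, the change shows up in p
print(flat[0])   # 4, the copy is unaffected
```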

**.concatenate()**

It is just like vstack and hstack. For 2D arrays like these, axis=0 acts like vstack and axis=1 acts like hstack

In [77]:
np.concatenate([p,q],axis=1)
np.concatenate([p,q],axis=0)

array([[4, 0, 1],
       [5, 0, 3],
       [1, 2, 7],
       [0, 3, 2],
       [9, 1, 2],
       [2, 3, 5]])

**.sum()**

Returns the sum of the elements along columns or rows (axis=0 --> one sum per column, axis=1 --> one sum per row, no axis defined --> whole array)

In [78]:
p.sum(axis=1)

array([ 5,  8, 10])

In [79]:
p.sum(axis=0)

array([10,  2, 11])

**.mean()**

Returns the mean of the entire array or the columns or the rows (no axis defined --> entire array, axis=0 --> column, axis=1 --> row)

In [80]:
p.mean()        # When no axis is defined, it will flatten the entire array into a 1D array and then calculate the mean

np.float64(2.5555555555555554)

In [81]:
p.mean(axis=0)  # Returns the mean of each column  

array([3.33333333, 0.66666667, 3.66666667])

In [82]:
p.mean(axis=1)  # Returns the mean of each row

array([1.66666667, 2.66666667, 3.33333333])

**.std()**

Returns the standard deviation of the entire array or the columns or the rows (no axis defined --> entire array, axis=0 --> column, axis=1 --> row)

In [83]:
p.std()   # When no axis is defined, it will flatten the entire array into a 1D array and then calculate the standard deviation

np.float64(2.2662308949301266)

In [84]:
p.std(axis=0)  # Returns the standard deviation of each column  

array([1.69967317, 0.94280904, 2.49443826])

In [85]:
p.std(axis=1)  # Returns the standard deviation of each row  

array([1.69967317, 2.05480467, 2.62466929])

NOTE: By default, the .std function calculates the population standard deviation, which means it divides by n, not (n-1). To get the sample standard deviation (the usual estimate of the population standard deviation from a sample), we need to use a keyword argument named ddof (Delta Degrees of Freedom) and set the value of ddof to 1.

In [86]:
p.std(axis=0,ddof=1)

array([2.081666  , 1.15470054, 3.05505046])
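
To see what `ddof` changes, the sketch below recomputes both versions by hand for the first column of `p` and compares them with `.std()`. The variable names are illustrative.

```python
import numpy as np

col = np.array([4, 5, 1])  # first column of p
n = col.size
squared_dev = (col - col.mean()) ** 2

population_std = np.sqrt(squared_dev.sum() / n)    # divides by n
sample_std = np.sqrt(squared_dev.sum() / (n - 1))  # divides by n - 1

print(np.isclose(population_std, col.std()))       # True
print(np.isclose(sample_std, col.std(ddof=1)))     # True
```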

**.unravel_index()**

It is a very important function. If an array was flattened into a 1D array, what would be the index of the x-th element in the original array? This function returns exactly that.

Let's say we have an array which has the shape (6,7,8). We want to know the index of the 100th element here.

In [87]:
np.unravel_index(99,(6,7,8))    # It is 99 considering that the indexing starts from 0

(np.int64(1), np.int64(5), np.int64(3))

That's how we can get the index of the element
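
As a sanity check, the sketch below builds an array whose values equal their own flat positions, looks up the returned index, and then goes back the other way with `np.ravel_multi_index`. The names `shape`, `idx`, and `big` are illustrative.

```python
import numpy as np

shape = (6, 7, 8)
idx = np.unravel_index(99, shape)  # (1, 5, 3)

# Each element of big holds its own flat position
big = np.arange(6 * 7 * 8).reshape(shape)
print(big[idx])  # 99

# The inverse operation: multi-dimensional index back to flat index
print(np.ravel_multi_index((1, 5, 3), shape))  # 99
```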

**.isin(element_arr,test_arr)**

This function allows us to check whether some specified elements(test_arr) are present in the original array(element_arr). 
It returns a boolean array, where the specified elements are True and the rest are False

In [88]:
o = np.array([1,23,49,10,2,3,6])
np.isin(o,[10,49])

o = o[~np.isin(o,[10,49])]   # Removes 10 and 49 from the original array
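
Since the cell above shows no output, here is a short sketch that prints the mask and the filtered result. It also uses the `invert=True` keyword of `np.isin`, which gives the same result as negating the mask with `~`. The name `mask` is illustrative.

```python
import numpy as np

o = np.array([1, 23, 49, 10, 2, 3, 6])

mask = np.isin(o, [10, 49])
print(mask)      # [False False  True  True False False False]
print(o[~mask])  # [ 1 23  2  3  6]

# invert=True flips the mask inside isin itself
print(o[np.isin(o, [10, 49], invert=True)])  # [ 1 23  2  3  6]
```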