# **Ten days of rigorous FREE data science training with Python**


**Duration: 01st March 2024 to 09th March 2024.**

**Lecture Time: 07:30-09:00 AM EST**

**Organized by: https://www.facebook.com/LearnPythonR4Datascience**

**Please like, share, and follow my Facebook page for more interesting content.**

**Please note that the Zoom link is already posted on my Facebook page.**

## **What will be covered in this free data science training?**

**NUMPY--------------------------------------------- 01st & 2nd March**

**SCIPY(self-study)--------------------------------- 3rd March (No class)**

**PYTORCH(basic introduction)----------------------- 4th March**

**PANDAS-------------------------------------------- 5-7th March**

**MATPLOTLIB---------------------------------------- 8th and 9th March**


**NOTE:**

**Recorded Zoom lectures will be uploaded to my Facebook page a few hours after the lecture.**

**The Python script will also be available on my GitHub account a few hours after the lecture.**



# WELCOME TO NUMPY

In [None]:
#!pip install numpy ## Uncomment and run this line once if the numpy module is not installed yet
import numpy as np  ## How to import numpy
import matplotlib.pyplot as plt

# WHY NUMPY?

NumPy is the lingua franca of data science. It is used to represent and manipulate multidimensional arrays, and it provides the foundation for other widely used data science libraries such as Pandas, Matplotlib, and PyTorch. ML/DL/NLP is all about manipulating numbers: the input (images, videos, text, etc.) is converted into numbers, which are then manipulated to produce an output (again numbers).

We cannot perform an element-wise operation on a Python list without writing an explicit "for loop" (or an equivalent such as a list comprehension), but we can perform element-wise operations on a NumPy array without one.

Comparison of NumPy arrays with Python lists:
- All elements of a NumPy array **must** be of one (same) data type, while the elements of a Python list can be of different data types.
- NumPy arrays are memory efficient.
- Computations with NumPy arrays are much faster than the equivalent Python loops.
- We can perform element-wise operations on the elements of a NumPy array without writing an explicit "for loop".
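
The first point above can be checked directly. The following is a minimal sketch (the variable names are illustrative and not part of the lecture code): a list that mixes integers and floats is converted to a single data type when it becomes an array, and every element then occupies the same number of bytes.

```python
import numpy as np

# A Python list may mix integers and floats freely.
mixed_list = [1, 2.5, 3]

# np.array() upcasts every element to one common data type (float64 here).
mixed_array = np.array(mixed_list)
print(mixed_array.dtype)     # float64

# Each element occupies the same fixed number of bytes.
print(mixed_array.itemsize)  # 8
```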

In [None]:
a = [1, 5, 10, 12]

## What if we want to apply the same operation to all the elements of the above Python list? Let us say we want to multiply
## each element in the above list by 5

a*5  ##won't multiply but repeat the same list five times and create a bigger list. This is not what we wanted.

In [None]:
## Element wise operation
## We have to write an explicit for loop to multiply each element with 5
storage_list = []
for i in a:
    storage_list.append(5*i)
print(storage_list)

In [None]:
## List comprehension: We can perform an element-wise operation on a Python list using a list comprehension. List
## comprehensions are more concise than the traditional verbose "for loop", but a list comprehension is still a lot of code
## to write for performing an element-wise operation
[i*5 for i in a]

In [None]:
## Another method to perform element wise operation

list(map(lambda x: x*5, a)) ## map() applies the function to every item of a, which must be iterable

In [None]:
b = np.array([1,5,10,12])
b*5
## Just see how quickly you can perform elementwise operation on numpy's array

In [None]:
## Let us suppose we want to multiply the elements at the respective positions in the following two lists of equal length

x_list = [10, 15, 20]
y_list = [3,   4,  5]

x_list*y_list  ## raises a TypeError: we cannot do an element-wise operation on two lists with *

In [None]:
new_list = []
for j, k in zip(y_list, x_list):
    new_list.append(j*k)
print(new_list) ## Without NumPy we need a loop (or a comprehension) to multiply (divide/add/subtract) the elements at the
## respective positions in two lists.

In [None]:
x_array = np.array(x_list)
y_array = np.array(y_list)

x_array*y_array ## After converting the lists to numpy array we can do elementwise multiplication, addition, subtraction, division

## Arguments of the method np.array()

**object**: This is the primary argument and can take different forms:

- If object is a Python list or tuple, it creates a 1-D array with the elements from the list or tuple.
- If object is a nested list or tuple, it creates a multi-dimensional array where each nested list or tuple represents a row or axis.
- If object is a scalar value, it creates a 0-D array containing only that value.
- If object is an existing array, it creates a new array with the same content and properties.

**dtype**: Specifies the data type of the elements in the array. It can be specified using a string or a NumPy data type object, such as int, float, bool, etc. If not provided, NumPy will infer the data type based on the elements in the object.

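**copy**: A Boolean flag that determines whether a new copy of the data is made. If set to True (default), a new, independent array is always created. If set to False, NumPy avoids copying whenever it can, so when the input is already an array the result shares its data with the original (in NumPy 2.0 and later, copy=False raises an error if a copy cannot be avoided). Let us explore this copy versus view concept!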

**ndmin**: Specifies the minimum number of dimensions that the resulting array should have. By default, ndmin is 0, meaning the array will have as few dimensions as possible. If ndmin is set to a higher value, additional dimensions of size 1 will be added to the front of the shape, if needed.
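
The arguments described above can be tried on small inputs. This is a minimal sketch with illustrative variable names (none of them come from the lecture code); `np.shares_memory` is used only to check whether two arrays use the same underlying data.

```python
import numpy as np

# dtype: request a specific element type instead of letting NumPy infer it.
as_float = np.array([1, 2, 3], dtype="float32")
print(as_float.dtype)     # float32

# A scalar object gives a 0-D array.
scalar_arr = np.array(7)
print(scalar_arr.ndim)    # 0

# ndmin: ones are prepended to the shape until it has ndmin dimensions.
at_least_2d = np.array([1, 2, 3], ndmin=2)
print(at_least_2d.shape)  # (1, 3)

# copy=True always gives independent data.
source = np.array([1, 5, 10, 12])
independent = np.array(source, copy=True)
print(np.shares_memory(source, independent))  # False
```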

In [None]:
b = np.array([1,5,10,12])
c = np.array(b, copy = True)  
print(b)
print(c)
## When you are creating the copy, the changes made in one array won't be reflected in the copied (other) array. 
## The copies are independent.

In [None]:
c[0] = 999 ## Modifying the copy c does not change the original array b, because c is a copy
## and therefore independent
print(b)
print(c)

In [None]:
d = np.array(b, copy = False)  


In [None]:
d[0] = 999 ## d shares its data with the original array b (no copy was made). Therefore, changes made through d
## are reflected in the original array b

In [None]:
print(b)
print(d)

### Attributes of numpy array

**shape**: Returns a tuple representing the dimensions of the array. For a 1-D array, it gives the length of the array. For a 2-D array, it provides the number of rows and columns, and so on.

**dtype**: Returns the data type of the elements in the array. For example, int64, float32, etc.

**ndim**: Returns the **number of dimensions** of the array.

**size**: Returns the **total number of elements** in the array.

**itemsize**: Returns the size in bytes of each element in the array.

**nbytes**: Returns the total number of bytes used by the array's data.

**T**: Returns the transpose of the array. It is equivalent to np.transpose(arr).


**These two are actually methods, not attributes, of arrays:**

**flatten()**: Returns a 1-D copy of the array, collapsing all dimensions.

**tolist()**: Converts the array to a Python list.
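
Two relationships between these attributes and methods are worth checking by hand. The sketch below uses a hypothetical array named `grid`: it shows that `nbytes` equals `size` times `itemsize`, and that `flatten()` returns a copy rather than a view.

```python
import numpy as np

grid = np.arange(6, dtype="int32").reshape(2, 3)

# nbytes is the element count multiplied by the bytes per element.
print(grid.size, grid.itemsize, grid.nbytes)  # 6 4 24

# T swaps the axes, so a (2, 3) array becomes (3, 2).
print(grid.T.shape)  # (3, 2)

# flatten() returns a copy: editing it leaves the original untouched.
flat = grid.flatten()
flat[0] = 99
print(grid[0, 0])    # 0
```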


In [None]:
one_d = np.array([5, 85, 95, 100])
two_d = np.array([[1,2,3,4],[9,10,11,12]], dtype = "float32")

In [None]:
print(type(one_d))
print(type(two_d))

In [None]:
one_d

In [None]:
two_d

In [None]:
print(one_d.size)
print(two_d.size)

In [None]:
print(one_d.ndim)
print(two_d.ndim)

In [None]:
print(one_d.shape) ## How many elements we have in each dimension?
print(two_d.shape) ## How many elements we have in each dimension?

In [None]:
print(one_d.dtype)
print(two_d.dtype)

In [None]:
print(one_d.itemsize)
print(two_d.itemsize)

In [None]:
print(one_d.nbytes)
print(two_d.nbytes)

### Transpose

In [None]:
two_d.shape

In [None]:
two_d.T


### Flattening to list

In [None]:
np.arange(1,51, 1).reshape(5,10).flatten(order = "C")

In [None]:
np.arange(0, 10).reshape(5,2).flatten(order = "F")

In [None]:
print(type(one_d.tolist()))
print(two_d.tolist()) ## converts to a nested list: one inner list per row (no flattening happens)

In [None]:
two_d.flatten().tolist() ##Chaining of methods is very very important

In [None]:
np.random.randn()

### Element-wise operations

In [None]:
one_d * 2 

In [None]:
two_d * 2

### Array with zeros, ones or some other given number

In [None]:
zero_2d = np.zeros(shape = (5,5), dtype = "float")

In [None]:
print(zero_2d)

In [None]:
np.ones(shape = (10,2), dtype = "int64")

In [None]:
np.full(shape = (3,2), fill_value = 250, dtype = "int16") ## we can use shape to create higher order arrays

In [None]:

# Create a 1D array with random numbers (follows uniform distribution) between 0 and 1
arr1 = np.random.rand(25)
print(arr1)



# Create a 2D array (3 rows, 4 columns) with random integers from 1 to 99 (the upper bound 100 is excluded)
arr2 = np.random.randint(1, 100, size=(3, 4)) ##discrete uniform distribution
#print(arr2)


In [None]:
# Create a 2D array with random numbers drawn uniformly from the interval [0, 10)
arr3 = np.random.uniform(0, 10, size=(3, 3))
print(arr3)

In [None]:
# Generate an array of 100 random numbers from a normal distribution with mean 0 and standard deviation 1
normal_array = np.random.normal(loc=0, scale=1, size=100)
print(normal_array)


In [None]:
range_array = np.arange(2,51, 2) ## Similar to Python's built-in range function
print(range_array)

In [None]:
np.linspace(0, 100, 4)  ## 4 equally spaced samples from 0 to 100 (both endpoints included)

## Descriptive statistics

<img src = "stats_function.png" style = "width:880px;height:250px">

**Reference: McKinney, W. (2012). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O'Reilly Media, Inc.**

### Mean
np.mean()
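
Most of the statistics functions in this section accept an `axis` argument. A minimal sketch of what it does, on a hypothetical 2 x 3 array named `small`: `axis=0` collapses the rows (one result per column) and `axis=1` collapses the columns (one result per row).

```python
import numpy as np

small = np.array([[1, 2, 3],
                  [4, 5, 6]])

# axis=0 collapses the rows: one result per column.
col_means = small.mean(axis=0)
print(col_means)     # [2.5 3.5 4.5]

# axis=1 collapses the columns: one result per row.
row_means = small.mean(axis=1)
print(row_means)     # [2. 5.]

# No axis: a single value over all elements.
print(small.mean())  # 3.5
```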

In [None]:
arr = np.array([1, 2, 3, 4, 5])
mean_value = np.mean(arr)
print(mean_value)

In [None]:
Two_dim_array = np.arange(0, 100).reshape((20,5), order = "F") 
print(Two_dim_array)

In [None]:
Two_dim_array.mean(axis = 0) 

In [None]:
Two_dim_array.mean(axis = 1) 

### Median
np.median()

In [None]:
median_value = np.median(arr)
print(median_value)

In [None]:
np.median(Two_dim_array, axis = 0)

In [None]:
np.median(Two_dim_array, axis = 1)

### Minimum value and Maximum value
np.min() and np.max()

In [None]:
min_value = np.min(arr)
max_value = np.max(arr)
print(min_value) 
print(max_value) 

In [None]:
np.min(Two_dim_array, axis = 0)

In [None]:
np.max(Two_dim_array, axis = 1)

### Sum
np.sum()

In [None]:
sum_value = np.sum(arr)
print(sum_value)


In [None]:
np.sum(Two_dim_array, axis = 0)

In [None]:
np.cumsum(Two_dim_array, axis = 1)

In [None]:
np.cumprod(Two_dim_array, axis = 0)

### Variance and Standard Deviation
np.var() and np.std()
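
By default, `np.var()` and `np.std()` compute the population versions, dividing by n. The sketch below (with an illustrative array named `data`) shows how the `ddof` argument switches to the sample versions, which divide by n - 1.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5])

# Default (ddof=0): divide the sum of squared deviations by n.
population_var = np.var(data)
# ddof=1: divide by n - 1, the usual sample variance.
sample_var = np.var(data, ddof=1)

print(population_var)  # 2.0
print(sample_var)      # 2.5
```

The same `ddof` argument is accepted by `np.std()`.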

In [None]:
variance_value = np.var(arr)
std_deviation_value = np.std(arr)
print(variance_value)       
print(std_deviation_value)

## Indexing and Slicing

### 1-D array indexing: Works similar to indexing a Python list

In [None]:
arr = np.array([1, 2, 3, 4, 5])
print(arr[1])  
print(arr[-2])

### Multi-dimensional array indexing: 
You can access elements in a multi-dimensional array by providing indices for **each dimension** separated by **commas**

In [None]:
two_darr = np.array([[1, 2, 3], [4, 5, 6],[7,8,9]])
print(two_darr)

In [None]:
print(two_darr[0,2]) 


In [None]:
print(two_darr[1, 2])

In [None]:
Identity_matrix = np.eye(5) ## identity matrix is a square matrix
print(Identity_matrix)

In [None]:
Identity_matrix[2,2]

In [None]:
three_dim = np.arange(0, 30).reshape((3, 2, 5))
print(three_dim)

In [None]:
three_dim[1,0,1]

### 1-D array slicing: 
Allows you to extract a portion of a 1-D array by specifying the start, stop, and step size
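
One difference from Python lists is easy to miss: a NumPy slice is a view on the original data, not a copy. A minimal sketch with illustrative names (`values`, `window`, `safe`):

```python
import numpy as np

values = np.array([10, 20, 30, 40, 50])

# start:stop:step, where stop is excluded.
print(values[1:4])   # [20 30 40]
# A negative step walks backwards; [::-1] reverses the array.
print(values[::-1])  # [50 40 30 20 10]

# Unlike a list slice, a NumPy slice is a view on the same data.
window = values[1:4]
window[0] = -1
print(values)        # [10 -1 30 40 50]

# Call .copy() when an independent slice is needed.
safe = values[1:4].copy()
safe[0] = 0
```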

In [None]:
arr = np.array([1, 2, 3, 4, 5])
print(arr)

In [None]:
print(arr[1:4:2])     


In [None]:
print(arr[0:3:1])      


In [None]:
print(arr[2:])     


In [None]:
print(arr[0:5:2])

### Multi-dimensional array slicing: 
Allows you to extract subarrays from multi-dimensional arrays using a similar syntax for each dimension
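
A small sketch of two details of multi-dimensional slicing, using a hypothetical array named `table`: an integer index removes a dimension while a slice keeps it, and a step of -1 reverses the order along an axis.

```python
import numpy as np

table = np.array([[1, 2, 3],
                  [4, 5, 6]])

# An integer index drops that dimension; a slice keeps it.
col_1d = table[:, 1]     # shape (2,)
col_2d = table[:, 1:2]   # shape (2, 1)
print(col_1d.shape, col_2d.shape)  # (2,) (2, 1)

# A step of -1 on the first axis reverses the row order.
rows_reversed = table[::-1, :]
print(rows_reversed.tolist())      # [[4, 5, 6], [1, 2, 3]]
```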

<img src = "twd_slicing.png" style = "width:680px;height:350px">

**Reference: McKinney, W. (2012). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O'Reilly Media, Inc.**

In [None]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)

In [None]:
print(arr[: , 1])          


In [None]:
print(arr[1, 0:2])        


In [None]:
print(arr[:2 , ::2]) 

In [None]:
print(arr)

In [None]:
arr[0::-1,:]

In [None]:
arr[:,1]

In [None]:
three_dim_array = np.arange(1,31).reshape((5,3,2))
print(three_dim_array)

In [None]:
three_dim_array[2,0,0]

## Boolean indexing and slicing

### 1-D array boolean indexing: 
You can use a Boolean array to select elements from a 1-D array that satisfy a condition

In [None]:
arr = np.array([1, 2, 3, 4, 5])
print(arr)

In [None]:
condition = np.array([True, False, True, False, True])
print(condition)

In [None]:
selected_elements = arr[condition]
print(selected_elements) 

### Multi-dimensional array boolean indexing: 
Boolean indexing can be applied to multi-dimensional arrays as well. In this case, the Boolean array must have the same shape as the original array.
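
As a minimal sketch (the names `grid`, `mask`, and `picked` are illustrative), note that selecting with a 2-D mask returns the matching values as a 1-D array, and that the same mask can be used to assign new values in place.

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

# A comparison produces a Boolean mask with the same shape as the array.
mask = grid > 3
print(mask.shape)     # (2, 3)

# Indexing with the mask returns the selected values as a 1-D array.
picked = grid[mask]
print(picked)         # [4 5 6]

# The mask can also be used on the left-hand side of an assignment.
grid[mask] = 0
print(grid.tolist())  # [[1, 2, 3], [0, 0, 0]]
```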

In [None]:
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
condition = np.array([[False, True, False], [True, False, True]])
print(condition)



In [None]:
selected_elements = arr[condition]
print(selected_elements)

### Boolean Slicing:
You can also use Boolean conditions to slice an array and select the elements that satisfy the condition

In [None]:
arr = np.array([1, 2, 3, 4, 5])
condition = arr % 2 ==0
print(condition)


In [None]:
selected_elements = arr[condition]
print(selected_elements)

In [None]:
normal_dist = np.random.normal(0, 3, 10000) ## 10,000 samples from a normal distribution with mean 0 and standard deviation 3
print(normal_dist)


In [None]:
plt.hist(normal_dist)

In [None]:
boolean_values = (normal_dist > 0)

np.sum(boolean_values)  ## counts the positive samples, because each True counts as 1

In [None]:
median_value = np.percentile(normal_dist, 50)
first_quartile = np.percentile(normal_dist, 25)
third_quartile = np.percentile(normal_dist, 75)

In [None]:
normal_dist[boolean_values]  ##returns all the positive samples 

### Combining multiple boolean conditions

- use of **&** and **|**
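
A minimal sketch of combining conditions, with an illustrative array named `numbers`. The `~` operator (negation) is an addition here and is not mentioned in the list above. Each comparison needs its own parentheses because `&` and `|` bind more tightly than `>`, `<`, and `==`.

```python
import numpy as np

numbers = np.array([1, 2, 3, 4, 5])

# & is element-wise AND.
between = numbers[(numbers > 1) & (numbers < 5)]
print(between)   # [2 3 4]

# | is element-wise OR.
extremes = numbers[(numbers == 1) | (numbers == 5)]
print(extremes)  # [1 5]

# ~ negates a Boolean mask.
not_even = numbers[~(numbers % 2 == 0)]
print(not_even)  # [1 3 5]
```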

In [None]:
arr[((arr > 1) & (arr < 5))] ## combining multiple boolean condition

## Fancy indexing

With fancy indexing, you can pass an array-like object of **indices**, such as a **list or an array of indices**, to access specific elements or subsets of elements in a more flexible way.
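
A minimal sketch of how the index lists are interpreted, using a hypothetical 3 x 4 array named `matrix`: row and column lists are paired position by position, a single list selects whole rows, and the result is a copy rather than a view.

```python
import numpy as np

matrix = np.arange(12).reshape(3, 4)

# Row and column index lists are paired up position by position:
# this picks the elements at (0, 1), (1, 3) and (2, 0).
picked = matrix[[0, 1, 2], [1, 3, 0]]
print(picked)         # [1 7 8]

# A single index list selects whole rows, in the order given.
rows = matrix[[2, 0]]
print(rows.tolist())  # [[8, 9, 10, 11], [0, 1, 2, 3]]

# Fancy indexing returns a copy, so editing the result leaves matrix alone.
rows[0, 0] = -1
print(matrix[2, 0])   # 8
```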

In [None]:
Identity_matrix = np.eye(6)
print(Identity_matrix)

In [None]:
Identity_matrix[[3,4,5], [2,1,5]] 

In [None]:
## Let us suppose you want to access all the elements at the diagonal of identity matrix

row_index = np.arange(0,6)
col_index = np.arange(0, 6)
print(row_index)
print(col_index)

Identity_matrix[row_index, col_index] 

In [None]:
Identity_matrix[[0,1,2,3,4,5], [0,1, 3, 2,5, 4]] ## could be boolean, could be list, could be array

### Accessing and modifying values in arrays

In [None]:
simple_array = np.array([5, 10, 12,13])
print(simple_array)

In [None]:
simple_array[[1,3]] = np.array([20, 26])

In [None]:
simple_array

## Basic Linear algebra operations

### Matrix Addition and Subtraction

In [None]:
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

print(A)
print(B)

In [None]:
# Matrix addition
C = A + B
print(C)

# Matrix subtraction
D = A - B
print(D)

In [None]:
print(A)
print(B)

### Matrix Multiplication
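
NumPy distinguishes element-wise multiplication (`*`) from the matrix product. The sketch below shows the difference using the `@` operator, which gives the same result as `np.dot` for 2-D arrays. The matrices `P` and `Q` are illustrative stand-ins for any two 2 x 2 matrices.

```python
import numpy as np

P = np.array([[1, 2], [3, 4]])
Q = np.array([[5, 6], [7, 8]])

# * multiplies entries at matching positions.
print((P * Q).tolist())  # [[5, 12], [21, 32]]

# @ is the matrix product: rows of P times columns of Q.
print((P @ Q).tolist())  # [[19, 22], [43, 50]]
```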

In [None]:
# Element-wise multiplication
C = A * B
print(C)

# Matrix multiplication (dot product)
D = np.dot(A, B)
# or: D = A.dot(B)
print(D)

### Matrix inversion
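
An inverse can be checked by multiplying it with the original matrix, which should give the identity matrix up to floating-point rounding. The sketch below uses illustrative names (`S`, `singular`) and also shows that a matrix with determinant 0 has no inverse.

```python
import numpy as np

S = np.array([[1.0, 2.0], [3.0, 4.0]])
S_inv = np.linalg.inv(S)

# A matrix times its inverse should give the identity matrix.
# np.allclose tolerates the small floating-point rounding error.
print(np.allclose(S @ S_inv, np.eye(2)))  # True

# A matrix with determinant 0 has no inverse; NumPy raises LinAlgError.
singular = np.array([[1.0, 2.0], [2.0, 4.0]])
try:
    np.linalg.inv(singular)
    invertible = True
except np.linalg.LinAlgError:
    invertible = False
print(invertible)  # False
```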

In [None]:
A = np.array([[1, 2], [3, 4]])

# Matrix inversion
B = np.linalg.inv(A)
print(B)

### Matrix Determinant

In [None]:
A = np.array([[1, 2], [3, 4]])

# Matrix determinant
det = np.linalg.det(A)
print(det)

## Broadcasting
Broadcasting is a set of rules by which NumPy lets you apply binary operations (e.g., addition, subtraction, multiplication, etc.) between arrays of different sizes and shapes.

### Binary operations with a scalar and arrays of any size (or shape)

In [None]:
array_x = np.arange(1,25)

10 / array_x

In [None]:
array_z = np.arange(1,26).reshape((5,5))
print(array_z)

10 / array_z

### For arrays of the same size, binary operations are performed on an element-by-element basis

In [None]:
a = np.array([0, 1, 2])
b = np.array([5, 5, 5])

a + b

In [None]:
q = np.arange(25).reshape((5,5))
print(q)

r = np.arange(25, 50).reshape((5,5))
print(r)

In [None]:
q + r

### Binary operations with arrays of different dimensions

In [None]:
## Let us suppose we want to add a one-dimensional array to a two-dimensional array

In [None]:
c = np.array([1,2, 3, 4])
print(c)

In [None]:
d = np.arange(12).reshape((3,4))
print(d)

In [None]:
d + c

### Rules of Broadcasting

**Rule 1**: If the two arrays differ in their **number of dimensions**, the shape of the one with fewer dimensions is padded with ones on its leading (left) side.

**Rule 2**: If the shape of the two arrays does not match in any dimension, the array with shape equal to 1 in that dimension is stretched to match the other shape.

**Rule 3**: If in any dimension the sizes disagree and neither is equal to 1, an error is raised.
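
The three rules can be tested on shapes alone, without building any arrays. This sketch assumes NumPy 1.20 or later, where `np.broadcast_shapes` is available; the flag `compatible` is an illustrative name.

```python
import numpy as np

# Compatible: (4,) is padded to (1, 4) by rule 1, then stretched by rule 2.
print(np.broadcast_shapes((3, 4), (4,)))    # (3, 4)

# Compatible: (1, 4) and (4, 1) are both stretched to (4, 4).
print(np.broadcast_shapes((1, 4), (4, 1)))  # (4, 4)

# Incompatible: trailing sizes 2 and 3 disagree and neither is 1 (rule 3).
try:
    np.broadcast_shapes((3, 2), (3,))
    compatible = True
except ValueError:
    compatible = False
print(compatible)  # False
```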

<img src = "broadcasting.png" style = "width:680px;height:350px">

**Reference: McKinney, W. (2012). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O'Reilly Media, Inc.**

In [None]:
row_vector = np.arange(4).reshape((1,4))

print(row_vector)

col_vector = np.arange(4).reshape((4,1))
print(col_vector)

In [None]:
row_vector + col_vector

In [None]:
M = np.ones((3, 2))
a = np.arange(3)



In [None]:
M + a

The shapes of the arrays are as follows:

- M.shape is (3, 2)
- a.shape is (3,)

Rule 1 tells us that we must pad the shape of "a" with ones on its left side:

- M.shape remains (3, 2)
- a.shape becomes (1, 3)

By rule 2, the first dimension of "a" is then stretched to match that of M:

- M.shape remains (3, 2)
- a.shape becomes (3, 3)

Now we hit rule 3: the final shapes do not match, so these two arrays are incompatible, and attempting the operation M + a raises a ValueError.
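
If the intent is to add one value of "a" to each row of M, a common way around this incompatibility is to turn "a" into a column of shape (3, 1) with `np.newaxis`. This is a sketch of that workaround, not something taken from the lecture; `a_column` and `result` are illustrative names.

```python
import numpy as np

M = np.ones((3, 2))
a = np.arange(3)

# a[:, np.newaxis] has shape (3, 1), which broadcasts against (3, 2).
a_column = a[:, np.newaxis]
result = M + a_column
print(result.shape)     # (3, 2)
print(result.tolist())  # [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
```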

<img src = "broadcasting3d.png" style = "width:680px;height:350px">

**Reference: McKinney, W. (2012). Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O'Reilly Media, Inc.**

## vstack and hstack

**vstack (Vertical Stack)**:

The vstack function stacks arrays vertically (row-wise): the input arrays are placed one on top of the other. It takes a sequence of arrays as input and returns a single array as output.

**hstack (Horizontal Stack)**:

The hstack function stacks arrays horizontally (column-wise): the input arrays are placed side by side. It takes a sequence of arrays as input and returns a single array as output.
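
With 2-D inputs the shape requirements become visible. A minimal sketch with illustrative arrays (`top`, `bottom`, `side`): vstack needs matching column counts, hstack needs matching row counts, and for 2-D inputs both give the same result as `np.concatenate` along axis 0 and axis 1.

```python
import numpy as np

top = np.array([[1, 2], [3, 4]])  # shape (2, 2)
bottom = np.array([[5, 6]])       # shape (1, 2)
side = np.array([[7], [8]])       # shape (2, 1)

# vstack needs matching column counts and adds rows.
stacked_v = np.vstack((top, bottom))
print(stacked_v.shape)  # (3, 2)

# hstack needs matching row counts and adds columns.
stacked_h = np.hstack((top, side))
print(stacked_h.shape)  # (2, 3)

# For 2-D inputs these match np.concatenate along axis 0 and axis 1.
same_v = np.array_equal(stacked_v, np.concatenate((top, bottom), axis=0))
same_h = np.array_equal(stacked_h, np.concatenate((top, side), axis=1))
print(same_v, same_h)   # True True
```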

In [None]:
# Create two arrays
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

# Vertically stack the arrays
result = np.vstack((array1, array2))

print(result)


In [None]:
array3 = np.arange(12).reshape((4,3))

result2 = np.vstack((array1, array2, array3))

print(result2)


In [None]:
#Create two arrays
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

# Horizontally stack the arrays
result3 = np.hstack((array1, array2))

print(result3)

### Sorting arrays

In [None]:
x = np.array([2, 1, 4, 3, 5])
print(x)
np.sort(x)

A related function is **argsort**, which instead returns the **indices** that would sort the array.
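
A minimal sketch of why these indices are useful, with illustrative arrays `scores` and `labels`: indexing with the argsort result reproduces the sorted array, and the same indices can reorder a second array in step with the first.

```python
import numpy as np

scores = np.array([2, 1, 4, 3, 5])
order = np.argsort(scores)
print(order)          # [1 0 3 2 4]

# Indexing with the argsort result reproduces the sorted array.
print(scores[order])  # [1 2 3 4 5]

# The same indices can reorder a second, parallel array.
labels = np.array(["b", "a", "d", "c", "e"])
print(labels[order].tolist())  # ['a', 'b', 'c', 'd', 'e']
```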

In [None]:
x = np.array([2, 1, 4, 3, 5])
i = np.argsort(x)
print(i)

### Sorting along rows and columns

In [None]:
# Set the seed value
seed_value = 42
np.random.seed(seed_value)

# Generate a random array
random_array = np.random.rand(25).reshape((5,5))
print(random_array)

In [None]:
np.sort(random_array, axis = 0)  ## axis = 0 sorts the values within each column; use axis = 1 to sort within each row