<a href="https://colab.research.google.com/github/Yuri-Njathi/python_intro/blob/master/python_for_ml/python_for_ml.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

1. Pandas basics 🐼

2. Numpy 🖼 & Tensor basics 🖼,🖼,🖼

# 1. Pandas basics 🐼
a. Installation

```
pip install pandas

```

b. Importing Pandas

```
import pandas as pd
```

c. Data Structures : 2 types

- Series - 1D labeled array

```
#create a simple series

s = pd.Series([1,3,5,7])

print(s)

#accessing values

print(s[1]) #3
```

- DataFrame - 2D table with labelled axes (rows, columns).

```
# simple dataframe
data = {'column1':['value1','value2','value3'],'count':[1,2,3]}

df = pd.DataFrame(data)

print(df)
```

```
# Access columns
print(df['column1'])
```

```
#Access rows using loc and iloc
print(df.loc[0]) #by label (index)

print(df.iloc[1]) #by position
```
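With the default integer index, `loc[0]` and `iloc[0]` return the same row, which can hide the difference between them. The sketch below rebuilds the same DataFrame with string index labels (`'a'`, `'b'`, `'c'`, chosen here purely for illustration) so that label-based and position-based access look different.

```python
import pandas as pd

# Same data as above; the string index labels are an assumption for illustration
df = pd.DataFrame(
    {'column1': ['value1', 'value2', 'value3'], 'count': [1, 2, 3]},
    index=['a', 'b', 'c'],
)

by_label = df.loc['b']     # row whose index label is 'b'
by_position = df.iloc[1]   # second row, whatever its label is

print(by_label['count'], by_position['count'])  # 2 2
```

Both lookups return the same row here because `'b'` happens to be the label at position 1.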

d. Reading Data

Pandas supports reading data from various formats such as CSV, Excel and JSON.

```
pd.read_csv('.csv') #csv

pd.read_excel('.xlsx') #excel

pd.read_json('.json') #json

```

e. Basic Operations on DataFrames

```
# view data #
df.head() #first 5 rows
df.tail() #last 5 rows
df.shape #(number of rows, number of columns)
df.info() #dataframe summary
df.describe() #summary statistics for numeric columns

```

```
# select columns #

df['column name'] # single column

df[['column 1','column 2']] #multiple columns

# selecting rows #

df.iloc[0] #first row by position

df.loc[0] #first row by index label

# filtering rows based on conditions:  #

df[df['Age'] > 30]
```
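Filtering works by building a boolean mask and passing it back to the DataFrame. A minimal sketch, assuming a small hypothetical table with `Name` and `Age` columns (the values are made up for illustration):

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'Name': ['Yuri', 'Njathi', 'Ada'], 'Age': [25, 35, 45]})

mask = df['Age'] > 30        # boolean Series: False, True, True
over_30 = df[mask]           # keeps only rows where the mask is True

# Combine conditions with & (and) or | (or); each condition needs parentheses
in_range = df[(df['Age'] > 30) & (df['Age'] < 40)]

print(over_30['Name'].tolist())   # ['Njathi', 'Ada']
print(in_range['Name'].tolist())  # ['Njathi']
```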

f. Adding or dropping data

```
df['new column'] = df['Age'] + 10 # create new column based on another

df = df.drop('new column',axis=1) #axis=1 means dropping columns

df = df.drop(0) # drops first row

# drop() returns a new DataFrame by default; look up the inplace parameter to modify df directly
```
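The difference between reassigning the result of `drop` and using `inplace=True` can be sketched as follows, assuming a hypothetical DataFrame with a single `Age` column:

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({'Age': [25, 35, 45]})
df['new column'] = df['Age'] + 10

dropped = df.drop('new column', axis=1)      # returns a new DataFrame
print('new column' in df.columns)            # True: df itself is unchanged

df.drop('new column', axis=1, inplace=True)  # modifies df and returns None
print('new column' in df.columns)            # False
```

Reassigning (`df = df.drop(...)`) and `inplace=True` end in the same state; reassignment is often easier to read and to chain.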

g. Handling missing data

Pandas makes it easy to handle missing values.

```
df.isnull().sum() #count missing values in each column

df.fillna(0) #replace missing with 0

df.dropna() #drop rows that contain missing values
```
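A small worked example of the three calls above, assuming hypothetical `Age` and `Height` columns that each contain one missing value:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one gap per column
df = pd.DataFrame({'Age': [25, np.nan, 45], 'Height': [100, 120, np.nan]})

missing_per_column = df.isnull().sum()  # Age: 1, Height: 1
filled = df.fillna(0)                   # new DataFrame with gaps replaced by 0
complete_rows = df.dropna()             # keeps only rows with no missing values

print(missing_per_column)
print(len(complete_rows))  # 1
```

Neither `fillna` nor `dropna` changes `df` itself; both return a new DataFrame.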

h. Sorting data by column

```
df.sort_values(by='Age',ascending=False)
```

i. Grouping and Aggregation

```
# Group data #
df.groupby('Gender')['Age'].mean() # group by gender and calculate mean of age
```
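To make the grouping concrete, here is a minimal sketch with hypothetical `Gender` and `Age` values. The `agg` call with several functions is an addition for illustration, not part of the original snippet.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    'Gender': ['F', 'M', 'F', 'M'],
    'Age': [30, 40, 50, 20],
})

mean_age = df.groupby('Gender')['Age'].mean()
print(mean_age['F'], mean_age['M'])  # 40.0 30.0

# Several aggregates at once: one row per group, one column per function
summary = df.groupby('Gender')['Age'].agg(['mean', 'max', 'count'])
print(summary)
```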

j. Merging dataframes

```
df1 = pd.DataFrame({'ID':[1,2],'Name':['Yuri','Njathi']})
df2 = pd.DataFrame({'ID':[1,2,5],'height':[100,120,140]})

merge_df = pd.merge(df1,df2,on='ID',how='inner')
```
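The `how` argument decides which IDs survive the merge. This sketch reuses `df1` and `df2` from above and contrasts `how='inner'` with `how='outer'`; the outer variant is added here only for comparison.

```python
import pandas as pd

df1 = pd.DataFrame({'ID': [1, 2], 'Name': ['Yuri', 'Njathi']})
df2 = pd.DataFrame({'ID': [1, 2, 5], 'height': [100, 120, 140]})

inner = pd.merge(df1, df2, on='ID', how='inner')  # only IDs present in both: 1, 2
outer = pd.merge(df1, df2, on='ID', how='outer')  # every ID from either side: 1, 2, 5

print(inner)
print(outer)  # the row for ID 5 has NaN in 'Name'
```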

k. Pivot Tables (optional)
```
# Summarize data
df.pivot_table(index='Category',columns='SubCategory',values='value',aggfunc='sum')

```
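A worked example may help here. The data below is hypothetical, with column names matching the call above; each cell of the result is the sum of `value` for one `Category`/`SubCategory` pair.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    'Category': ['A', 'A', 'B', 'B', 'A'],
    'SubCategory': ['x', 'y', 'x', 'y', 'x'],
    'value': [1, 2, 3, 4, 5],
})

table = df.pivot_table(index='Category', columns='SubCategory',
                       values='value', aggfunc='sum')
print(table)
print(table.loc['A', 'x'])  # 6, because category A / subcategory x has values 1 and 5
```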

l. Working with dates (optional)

Pandas can parse date strings into datetime values and extract components such as the year and month.

```
df['Date'] = pd.to_datetime(df['Date']) #convert column to datetime

df['Year'] = df['Date'].dt.year #extract specific components via the .dt accessor
df['Month'] = df['Date'].dt.month

```
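A minimal runnable sketch of the date handling, assuming a hypothetical `Date` column of ISO-formatted strings. Datetime components are read through the `.dt` accessor of a datetime column.

```python
import pandas as pd

# Hypothetical dates for illustration
df = pd.DataFrame({'Date': ['2024-01-15', '2023-12-31']})

df['Date'] = pd.to_datetime(df['Date'])  # strings -> datetime64 values
df['Year'] = df['Date'].dt.year
df['Month'] = df['Date'].dt.month

print(df['Year'].tolist())   # [2024, 2023]
print(df['Month'].tolist())  # [1, 12]
```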

m. Writing dataframes to files

```
df.to_csv('output.csv', index=False)

df.to_excel('output.xlsx',index=False)
```
n. Advanced features
```
df['new column'] = df['Age'].apply(lambda x:x*2) #apply a function to every value

df['Name'].str.upper() #string methods are available through .str

```

# 2. Numpy 🖼 & Tensor basics 🖼,🖼,🖼

NumPy provides arrays and matrices, along with fast numerical operations on them.

Torch tensors are similar to NumPy arrays but add capabilities such as GPU acceleration and automatic differentiation.



a. Installation

In [1]:
pip install numpy torch



b. Import NumPy and Torch

In [2]:
import numpy as np
import torch

c. Instantiating numpy arrays and tensors

- from a list
- from a list of lists to multidimensional tensors
- using built-in functions

In [3]:
#from list
#numpy
n = np.array([1,2,3])
#tensors
t = torch.tensor([1,2,3])
print(n,t)

[1 2 3] tensor([1, 2, 3])


In [4]:
#from a list of lists to multidimensional tensors
l_of_l = [[1,2],[3,4],[5,6]]
arr_2d, tensor_2d = np.array(l_of_l), torch.tensor(l_of_l)
print(arr_2d, tensor_2d)

[[1 2]
 [3 4]
 [5 6]] tensor([[1, 2],
        [3, 4],
        [5, 6]])


In [5]:
#using built-in functions
#numpy and torch
n_zeros, t_zeros = np.zeros((2,3)), torch.zeros((2,3)) #2x3 array of zeros
print(n_zeros, t_zeros)

[[0. 0. 0.]
 [0. 0. 0.]] tensor([[0., 0., 0.],
        [0., 0., 0.]])


In [6]:
n_ones, t_ones = np.ones((2,3)), torch.ones((2,3)) #2x3 array of ones
print(n_ones, t_ones)

[[1. 1. 1.]
 [1. 1. 1.]] tensor([[1., 1., 1.],
        [1., 1., 1.]])


In [7]:
n_arange , t_arange = np.arange(0,10,2), torch.arange(0,10,2) # [0,2,4,6,8]
print(n_arange , t_arange)

[0 2 4 6 8] tensor([0, 2, 4, 6, 8])


In [8]:
n_linspace, t_linspace = np.linspace(0,1,5), torch.linspace(0,1,5) #5 values equally spaced between 0 and 1
print(n_linspace, t_linspace)

[0.   0.25 0.5  0.75 1.  ] tensor([0.0000, 0.2500, 0.5000, 0.7500, 1.0000])


## Array operations

NumPy and Torch both support element-wise operations: an operator is applied to each pair of corresponding elements.

In [9]:
n_a, t_a = np.array([1,2,3]), torch.tensor([1,2,3])

n_b, t_b = np.array([4,5,6]), torch.tensor([4,5,6])

print(n_a,n_b,'\n',t_a,t_b)

[1 2 3] [4 5 6] 
 tensor([1, 2, 3]) tensor([4, 5, 6])


In [10]:
#Arithmetic
print('Numpy addition :',n_a + n_b,'\nTensor addition :',t_a + t_b) # [5 7 9]

print('\nNumpy multiplication :',n_a * n_b,'\nTensor multiplication :',t_a * t_b) # [4 10 18]

print('\nNumpy division :',n_a / n_b,'\nTensor division :',t_a / t_b) # [0.25 0.4 0.5]

Numpy addition : [5 7 9] 
Tensor addition : tensor([5, 7, 9])

Numpy multiplication : [ 4 10 18] 
Tensor multiplication : tensor([ 4, 10, 18])

Numpy division : [0.25 0.4  0.5 ] 
Tensor division : tensor([0.2500, 0.4000, 0.5000])


In [11]:
#Math functions
print("Square root : ",np.sqrt(n_a),torch.sqrt(t_a))

print("\nSum : ",np.sum(n_a),torch.sum(t_a))

print("\nMean : ",np.mean(n_a),torch.mean(t_a.float())) # torch.mean needs a floating-point tensor, hence .float()

Square root :  [1.         1.41421356 1.73205081] tensor([1.0000, 1.4142, 1.7321])

Sum :  6 tensor(6)

Mean :  2.0 tensor(2.)


Indexing, slicing and 2D arrays

In [12]:
#indexing
n_arr, t_tensor = np.array([1,2,3,4,5,6]), torch.tensor([1,2,3,4,5,6])
print(n_arr[0],t_tensor[0])

1 tensor(1)


In [13]:
#slicing
print(n_arr[1:4],t_tensor[1:4])

[2 3 4] tensor([2, 3, 4])


In [14]:
print(n_arr[::2],t_tensor[::2])

[1 3 5] tensor([1, 3, 5])


In [15]:
#2D arrays/matrices
arr_2d , tensor_2d = np.array([[1,2,3],[4,5,6],[7,8,9]]), torch.tensor([[1,2,3],[4,5,6],[7,8,9]])

print(arr_2d[1,2], tensor_2d[1,2]) #row 1, column 2

print(arr_2d[:,0], tensor_2d[:,0]) #first column

6 tensor(6)
[1 4 7] tensor([1, 4, 7])


### Array reshaping

In [16]:
arr,tensor = np.array([[1,2,3],[4,5,6]]),torch.tensor([[1,2,3],[4,5,6]])

print("Reshape : ",arr.reshape(3,2),'\n',tensor.view(3,2)) # tensor.reshape(3,2) also works

Reshape :  [[1 2]
 [3 4]
 [5 6]] 
 tensor([[1, 2],
        [3, 4],
        [5, 6]])
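
Reshaping only rearranges the existing elements, so the new shape must hold the same number of them (here 2×3 = 3×2 = 6). As a small sketch beyond the cell above, passing `-1` for one dimension lets the library infer it from the element count:

```python
import numpy as np
import torch

arr = np.array([[1, 2, 3], [4, 5, 6]])
tensor = torch.tensor([[1, 2, 3], [4, 5, 6]])

# -1 means "work this dimension out from the total number of elements"
print(arr.reshape(-1, 2).shape)     # (3, 2)
print(tensor.reshape(3, -1).shape)  # torch.Size([3, 2])

# Flattening to 1D is the same idea with a single -1
print(arr.reshape(-1))              # [1 2 3 4 5 6]
```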


Transposing

In [17]:
arr.T

array([[1, 4],
       [2, 5],
       [3, 6]])

In [18]:
tensor.T

tensor([[1, 4],
        [2, 5],
        [3, 6]])

## Random arrays and tensors

In [19]:
n_rand, t_rand = np.random.rand(3,2),torch.rand(3,2)
print(n_rand,'\n\n',t_rand)

[[0.06456617 0.06539721]
 [0.68494241 0.49279963]
 [0.6786188  0.01347687]] 

 tensor([[0.4625, 0.0312],
        [0.8097, 0.9764],
        [0.5198, 0.9463]])


In [20]:
n_rand, t_rand = np.random.randn(3,2),torch.randn(3,2) #normal distribution
print(n_rand,'\n\n',t_rand)

[[ 0.84616778 -0.7121355 ]
 [-0.14440635  1.08120263]
 [-0.1732879  -0.94086885]] 

 tensor([[-1.1997,  0.5570],
        [ 2.0523, -0.8392],
        [ 2.0337, -0.1858]])
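
The random values above change on every run. A small sketch of how seeding makes them reproducible, using `np.random.seed` and `torch.manual_seed`; the seed value `0` is arbitrary.

```python
import numpy as np
import torch

np.random.seed(0)
a = np.random.rand(3, 2)
np.random.seed(0)          # resetting the seed replays the same sequence
b = np.random.rand(3, 2)

torch.manual_seed(0)
c = torch.rand(3, 2)
torch.manual_seed(0)
d = torch.rand(3, 2)

print(np.array_equal(a, b), torch.equal(c, d))  # True True
```

NumPy and Torch keep separate random generators, so each library needs its own seed.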


## Automatic Differentiation (Tensor Specific)

One of PyTorch's most powerful features is automatic differentiation, which computes the gradients needed to train deep learning models.



In [21]:
# Creating a tensor with gradients
x = torch.tensor(3.0,requires_grad=True)
y = x**2
y.backward() #Compute the gradient of y wrt x
print(x.grad) #gradient = dy/dx

tensor(6.)
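
In the cell above, y = x² gives dy/dx = 2x, which is 6 at x = 3. The same mechanism handles several inputs at once. A minimal sketch, with the function z = x·y + y² chosen here purely for illustration:

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)

z = x * y + y ** 2   # z = 2*3 + 3^2 = 15
z.backward()         # fills x.grad and y.grad

# dz/dx = y = 3 and dz/dy = x + 2y = 8
print(x.grad, y.grad)  # tensor(3.) tensor(8.)
```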


## Moving Tensors to GPU (Tensor Specific)

In [22]:
if torch.cuda.is_available():
  t_gpu = t.to('cuda')
  print(t_gpu)

tensor([1, 2, 3], device='cuda:0')
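
A common pattern, sketched here as an assumption rather than as part of the original notebook, is to pick the device once and fall back to the CPU when no GPU is available, so the same code runs on either machine:

```python
import torch

# Use the GPU if one is available, otherwise stay on the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

t = torch.tensor([1, 2, 3]).to(device)
result = (t * 2).cpu()   # move back to the CPU, e.g. before converting to NumPy

print(result)  # tensor([2, 4, 6])
```

Tensors that take part in the same operation must live on the same device, which is why the device is chosen once and reused.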
