
In [1]:
import numpy as np
import pandas as pd

# **1. Series**

### **1. Create Series:**

In [2]:
# Using a list
list_1=['a','b','c','d']
labels_1=[1,2,3,4]

series_1 = pd.Series(data=list_1, index=labels_1)
series_1

1    a
2    b
3    c
4    d
dtype: object

In [3]:
# Using a numpy array
arr_1 = np.array([1,2,3,4])

series_2 = pd.Series(arr_1)
series_2

0    1
1    2
2    3
3    4
dtype: int64

In [4]:
# Using a dictionary
dict_1={'name':'Jai', 'surname':'Pawar', 'age':20}

series_3 = pd.Series(dict_1)
series_3

name         Jai
surname    Pawar
age           20
dtype: object

### **2. Access data in series**

In [5]:
series_3['name']

'Jai'

In [6]:
# Get data type
series_2.dtype

dtype('int64')
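
Square brackets look values up by label here. As a minimal sketch, the explicit `.loc` (label-based) and `.iloc` (position-based) accessors make the intent clearer; the name `person` is a hypothetical stand-in for `series_3`:

```python
import pandas as pd

# Hypothetical Series with the same contents as series_3 above
person = pd.Series({'name': 'Jai', 'surname': 'Pawar', 'age': 20})

by_label = person.loc['surname']       # look up by index label
by_position = person.iloc[0]           # look up by integer position
subset = person.loc[['name', 'age']]   # a list of labels returns a Series

print(by_label, by_position)
print(subset)
```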

### **3. Math operations**

In [7]:
series_2 + series_2

0    2
1    4
2    6
3    8
dtype: int64

In [8]:
series_2 / series_2

0    1.0
1    1.0
2    1.0
3    1.0
dtype: float64

In [9]:
# Pass them into numpy functions
np.exp(series_2)

0     2.718282
1     7.389056
2    20.085537
3    54.598150
dtype: float64

The main difference between a Series and a NumPy array is that Series operations align values by index label rather than by position:

In [10]:
series_4=pd.Series({4:5,5:6,6:7,7:8})
series_2 + series_4

0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
5   NaN
6   NaN
7   NaN
dtype: float64

Every result is NaN because the two Series share no index labels: series_2 is labelled 0 to 3 and series_4 is labelled 4 to 7, so no pair of values lines up.
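
To see alignment with a partial overlap, here is a small sketch. The names `left` and `right` are hypothetical. It also shows `Series.add` with `fill_value`, which treats a label missing on one side as the given value instead of producing NaN:

```python
import pandas as pd

left = pd.Series([1, 2, 3, 4])                    # labels 0, 1, 2, 3
right = pd.Series({2: 10, 3: 20, 4: 30, 5: 40})   # labels 2, 3, 4, 5

# Only labels 2 and 3 exist on both sides; every other label becomes NaN
aligned = left + right

# fill_value=0 substitutes 0 for the side that lacks a label
filled = left.add(right, fill_value=0)

print(aligned)
print(filled)
```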

In [11]:
# Assign names to series
series_5 = pd.Series({1:2, 3:4}, name='yeehaw')
series_5.name

'yeehaw'

# **2. DataFrame**

### **1. Creating DataFrames**

In [12]:
# Using a numpy array (random integers in [10, 50), so values differ on every run)
arr_2 = np.random.randint(10, 50, size=(2, 3))
df_1 = pd.DataFrame(arr_2, index=['A', 'B'], columns=['C', 'D', 'E'])
df_1

    C   D   E
A  49  10  22
B  48  10  43


In [13]:
# Using multiple Series
dict_3 = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']), 'two': pd.Series([4, 5, 6, 7], index=['a', 'b', 'c', 'd'])}
df_2 = pd.DataFrame(dict_3)
df_2

   one  two
a  1.0    4
b  2.0    5
c  3.0    6
d  NaN    7


In [14]:
# Using a dictionary
# from_dict() takes 3 params: data (supplied below); orient (default 'columns', as below;
# pass 'index' to use the dict keys as row labels instead); and columns (list of column
# labels to use when orient='index')
df_3 = pd.DataFrame.from_dict(dict([('A',[1,2,3]),('B',[4,5,6])]))
df_3

   A  B
0  1  4
1  2  5
2  3  6
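
The cell above uses the default orientation. As a sketch of the other orientation, passing `orient='index'` makes each dictionary key a row label, and `columns` then names the columns. The labels `'x'`, `'y'`, `'z'` and the names `data` and `df_rows` are made up for illustration:

```python
import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6]}

# Each key becomes a row; the 'columns' list supplies the column labels
df_rows = pd.DataFrame.from_dict(data, orient='index', columns=['x', 'y', 'z'])

print(df_rows)
```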


In [15]:
# Shape of df
df_1.shape

(2, 3)

### **2. Accessing/Editing data**

In [16]:
# Accessing a column using a label
df_1['C']

A    49
B    48
Name: C, dtype: int64

In [18]:
# Accessing multiple columns using a list of labels
df_1[['C','D']]

    C   D
A  49  10
B  48  10


In [19]:
# Accessing a row as a series
df_1.loc['A']

C    49
D    10
E    22
Name: A, dtype: int64

In [21]:
# Accessing a row by integer position
df_1.iloc[1]

C    48
D    10
E    43
Name: B, dtype: int64

In [22]:
# Accessing rows and columns
df_1.loc[['A','B'], ['D','E']]

    D   E
A  10  22
B  10  43
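
Both accessors also accept slices, with one difference worth knowing: `.loc` slices include the end label, while `.iloc` slices exclude the stop position. A minimal sketch on a hypothetical frame `df` shaped like `df_1`:

```python
import pandas as pd

# Hypothetical frame with the same labels as df_1
df = pd.DataFrame({'C': [49, 48], 'D': [10, 10], 'E': [22, 43]}, index=['A', 'B'])

by_label = df.loc['A':'B', 'C':'D']   # label slices include both endpoints
by_position = df.iloc[0:1, 0:2]       # position slices stop before the end

print(by_label)
print(by_position)
```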


In [31]:
# Create a new column
# (this cell was re-run after row 'F' was added in the next cell, so 'F' appears in the output)
df_1['Total']=df_1['C']+df_1['D']+df_1['E']
df_1

      C     D     E  Total
A  49.0  10.0  22.0   81.0
B  48.0  10.0  43.0  101.0
F  44.0  45.0  46.0  135.0


In [28]:
# Create a new row
# Note: DataFrame.append() was removed in pandas 2.0, so this cell needs pandas < 2.0
dict_2 = {'C':44, 'D':45, 'E': 46}
new_row = pd.Series(dict_2, name='F')
df_1 = df_1.append(new_row)
df_1

      C     D     E  Total
A  49.0  10.0  22.0   81.0
B  48.0  10.0  43.0  101.0
F  44.0  45.0  46.0    NaN
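
On pandas 2.0 and later, where `DataFrame.append` no longer exists, one way to add a row is `pd.concat`. The sketch below assumes a fresh frame `df` with hard-coded values rather than the random `df_1`:

```python
import pandas as pd

# Hypothetical frame standing in for df_1
df = pd.DataFrame({'C': [49, 48], 'D': [10, 10], 'E': [22, 43]}, index=['A', 'B'])
new_row = pd.Series({'C': 44, 'D': 45, 'E': 46}, name='F')

# to_frame().T turns the Series into a one-row DataFrame labelled 'F'
df = pd.concat([df, new_row.to_frame().T])

print(df)
```

Column dtypes may differ from what the `append` call above produced, so the printed values can appear as integers rather than floats.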


In [32]:
# Delete columns
df_1.drop('Total', axis=1, inplace=True)
df_1

      C     D     E
A  49.0  10.0  22.0
B  48.0  10.0  43.0
F  44.0  45.0  46.0


In [34]:
# Delete rows
df_1.drop('B', axis=0, inplace=True)
df_1

      C     D     E
A  49.0  10.0  22.0
F  44.0  45.0  46.0
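
The two `drop` cells modify `df_1` in place. As an alternative sketch, leaving out `inplace=True` returns a new frame and keeps the original intact, and the `columns=` and `index=` keywords can replace the `axis` argument. The names below are hypothetical:

```python
import pandas as pd

df = pd.DataFrame({'C': [49, 44], 'D': [10, 45], 'E': [22, 46]}, index=['A', 'F'])

trimmed = df.drop(columns=['E'])   # equivalent to df.drop('E', axis=1)
without_a = df.drop(index=['A'])   # equivalent to df.drop('A', axis=0)

print(trimmed)
print(without_a)
print(df.shape)   # the original frame is unchanged
```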


In [36]:
# Set a column as index
df_1['Sex'] = ['Men', 'Women']
df_1.set_index('Sex', inplace=True)
df_1

          C     D     E
Sex                    
Men    49.0  10.0  22.0
Women  44.0  45.0  46.0


In [37]:
# Replace index with numbers
df_1.reset_index(inplace=True)
df_1

     Sex     C     D     E
0    Men  49.0  10.0  22.0
1  Women  44.0  45.0  46.0
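
By default `reset_index` moves the old index back into a regular column, as the output above shows. A small sketch of the `drop=True` variant, which discards the old index instead, using a hypothetical two-column frame:

```python
import pandas as pd

df = pd.DataFrame({'Sex': ['Men', 'Women'], 'C': [49.0, 44.0]})
indexed = df.set_index('Sex')                # 'Sex' becomes the index

restored = indexed.reset_index()             # 'Sex' returns as a column
discarded = indexed.reset_index(drop=True)   # 'Sex' is thrown away

print(restored)
print(discarded)
```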
