## Data Manipulation and Analysis with Pandas

Data manipulation and analysis are key tasks in any data science or data analysis project. Pandas provides a wide range of functions for both, making it easier to clean, transform, and extract insights from data.

In this lesson, we will cover several data manipulation and analysis techniques using Pandas: handling missing values, renaming columns, and changing data types.

In [15]:
import pandas as pd

In [16]:
data = pd.read_csv('Income.csv')

In [17]:
data.head(5)

Unnamed: 0,ID,Income,Age
0,1,10000.0,22.0
1,2,13000.0,30.0
2,3,5000.0,19.0
3,4,4000.0,20.0
4,5,4300.0,


In [18]:
data.dtypes

ID          int64
Income    float64
Age       float64
dtype: object

In [19]:
# Handling Missing Values
data.isnull()

Unnamed: 0,ID,Income,Age
0,False,False,False
1,False,False,False
2,False,False,False
3,False,False,False
4,False,False,True
5,False,False,False
6,False,False,False
7,False,False,False
8,False,False,False
9,False,False,False


In [20]:
data.isnull().any()

ID        False
Income     True
Age        True
dtype: bool

In [21]:
data.isnull().sum()

ID        0
Income    5
Age       3
dtype: int64
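
Since `Income.csv` is not reproduced here, the following is a minimal sketch on a small, hypothetical DataFrame (the `sample` frame and its values are assumptions made for illustration, not data from the file). It shows how `isnull()`, `any()`, and `sum()` relate: `isnull()` builds a Boolean mask, and the other two calls summarize that mask per column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; values are illustrative, not taken from Income.csv
sample = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "Income": [10000.0, np.nan, 5000.0, np.nan],
    "Age": [22.0, 30.0, np.nan, 20.0],
})

missing_mask = sample.isnull()        # Boolean frame, True where a value is missing
has_missing = missing_mask.any()      # per column: is at least one value missing?
missing_counts = missing_mask.sum()   # per column: number of missing values
missing_share = missing_mask.mean()   # per column: fraction of missing values

print(missing_counts)
```

Taking the mean of the mask is a convenient way to see what fraction of each column is missing, which can help when deciding whether to fill or drop.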

In [22]:
dataFill = data.fillna(0)
dataFill

Unnamed: 0,ID,Income,Age
0,1,10000.0,22.0
1,2,13000.0,30.0
2,3,5000.0,19.0
3,4,4000.0,20.0
4,5,4300.0,0.0
5,6,45000.0,33.0
6,7,20000.0,51.0
7,8,43000.0,28.0
8,9,48000.0,45.0
9,10,29000.0,44.0


In [23]:
# Fill missing values in Income with the column mean
data['Income_FillNA'] = data['Income'].fillna(data['Income'].mean())

In [24]:
data

Unnamed: 0,ID,Income,Age,Income_FillNA
0,1,10000.0,22.0,10000.0
1,2,13000.0,30.0,13000.0
2,3,5000.0,19.0,5000.0
3,4,4000.0,20.0,4000.0
4,5,4300.0,,4300.0
5,6,45000.0,33.0,45000.0
6,7,20000.0,51.0,20000.0
7,8,43000.0,28.0,43000.0
8,9,48000.0,45.0,48000.0
9,10,29000.0,44.0,29000.0
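
The choice of fill strategy affects later statistics, so it can help to compare a few options side by side. The sketch below uses a hypothetical `sample` frame (an assumption for illustration, not the contents of `Income.csv`) and contrasts filling with zero, filling with the column mean or median, and dropping incomplete rows.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; values are illustrative, not taken from Income.csv
sample = pd.DataFrame({
    "Income": [10000.0, np.nan, 5000.0, 3000.0],
    "Age": [22.0, 30.0, np.nan, 20.0],
})

# Option 1: replace every missing value with a constant
filled_zero = sample.fillna(0)

# Option 2: replace missing values with each column's own mean
# (fillna accepts a Series that maps column name -> fill value)
filled_mean = sample.fillna(sample.mean())

# Option 3: fill a single column with its median, which is less sensitive to outliers
income_median_filled = sample["Income"].fillna(sample["Income"].median())

# Option 4: drop any row that still contains a missing value
dropped = sample.dropna()

print(filled_mean)
```

Filling with zero pulls the column mean down, whereas filling with the mean leaves it unchanged; which one is appropriate depends on what a missing value means in the data.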


In [26]:
data.dtypes

ID                 int64
Income           float64
Age              float64
Income_FillNA    float64
dtype: object

In [29]:
# Renaming Column

data = data.rename(columns={"Income_FillNA": "Salary (Income)"})

In [31]:
data

Unnamed: 0,ID,Income,Age,Salary (Income)
0,1,10000.0,22.0,10000.0
1,2,13000.0,30.0,13000.0
2,3,5000.0,19.0,5000.0
3,4,4000.0,20.0,4000.0
4,5,4300.0,,4300.0
5,6,45000.0,33.0,45000.0
6,7,20000.0,51.0,20000.0
7,8,43000.0,28.0,43000.0
8,9,48000.0,45.0,48000.0
9,10,29000.0,44.0,29000.0
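
One detail of `rename` worth knowing: it returns a new DataFrame, and by default it silently ignores keys that match no column. The sketch below illustrates this on a hypothetical two-row frame (the `sample` data and the misspelled key are assumptions for illustration).

```python
import pandas as pd

# Hypothetical sample data; values are illustrative
sample = pd.DataFrame({"ID": [1, 2], "Income_FillNA": [10000.0, 13000.0]})

# rename returns a new DataFrame; the original keeps its column names
renamed = sample.rename(columns={"Income_FillNA": "Salary (Income)"})

# A key that matches no column (note the typo) is ignored by default
unchanged = sample.rename(columns={"Incom_FillNA": "Salary (Income)"})

# errors="raise" turns a mistyped column name into a KeyError
try:
    sample.rename(columns={"Incom_FillNA": "Salary (Income)"}, errors="raise")
    typo_detected = False
except KeyError:
    typo_detected = True

print(list(renamed.columns))
```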


In [33]:
# Change the data type from float to int (safe here because the missing values were filled)
data['ValueNew'] = data["Salary (Income)"].astype(int)

In [34]:
data

Unnamed: 0,ID,Income,Age,Salary (Income),ValueNew
0,1,10000.0,22.0,10000.0,10000
1,2,13000.0,30.0,13000.0,13000
2,3,5000.0,19.0,5000.0,5000
3,4,4000.0,20.0,4000.0,4000
4,5,4300.0,,4300.0,4300
5,6,45000.0,33.0,45000.0,45000
6,7,20000.0,51.0,20000.0,20000
7,8,43000.0,28.0,43000.0,43000
8,9,48000.0,45.0,48000.0,48000
9,10,29000.0,44.0,29000.0,29000
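
The conversion above works because the filled column contains no missing values. A column that still holds `NaN`, such as `Age`, cannot be cast to a plain `int`. The following sketch, using a hypothetical `ages` Series (an assumption for illustration), shows the failure and two possible workarounds: pandas' nullable `Int64` dtype, or filling before casting.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; values are illustrative
ages = pd.Series([22.0, 30.0, np.nan, 20.0], name="Age")

# Casting a float column containing NaN to int raises an error
try:
    ages.astype(int)
    conversion_failed = False
except ValueError:
    conversion_failed = True

# Workaround 1: the nullable integer dtype keeps the gap as <NA>
ages_nullable = ages.astype("Int64")

# Workaround 2: fill the gap first, then cast to a plain int
ages_filled = ages.fillna(0).astype(int)

print(ages_nullable)
```

The nullable dtype preserves the information that a value is missing, while filling first replaces it with a placeholder that later calculations will treat as real data.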
