<a href="https://colab.research.google.com/github/SrinathMLOps/MLPractise/blob/main/pandas_df_methods_colab.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# 🧪 Pandas DataFrame Methods: Before & After with Code
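This notebook demonstrates common pandas `df.` methods on a small sample dataset. Each section rebuilds the original DataFrame (BEFORE), applies one method, and shows the result (AFTER). Unless the result is assigned back to `df`, methods such as `dropna()`, `rename()`, and `drop()` return a new object and leave `df` unchanged.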

In [19]:
import pandas as pd

# Sample dataset
data = {
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, None, 45, 30],
    "Gender": ["F", "M", "M", "M", "F"],
    "Salary": [50000, 60000, 70000, 80000, None],
    "Department": ["HR", "IT", "IT", "Finance", "HR"]
}
df = pd.DataFrame(data)
df

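|   | Name | Age | Gender | Salary | Department |
|---|---|---|---|---|---|
| 0 | Alice | 25.0 | F | 50000.0 | HR |
| 1 | Bob | 30.0 | M | 60000.0 | IT |
| 2 | Charlie | NaN | M | 70000.0 | IT |
| 3 | David | 45.0 | M | 80000.0 | Finance |
| 4 | Eve | 30.0 | F | NaN | HR |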


## 🔹 Method: `df.info()`
```python
df.info()
```

In [20]:
# BEFORE
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, None, 45, 30],
    'Gender': ['F', 'M', 'M', 'M', 'F'],
    'Salary': [50000, 60000, 70000, 80000, None],
    'Department': ['HR', 'IT', 'IT', 'Finance', 'HR']
}
df = pd.DataFrame(data)
df

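|   | Name | Age | Gender | Salary | Department |
|---|---|---|---|---|---|
| 0 | Alice | 25.0 | F | 50000.0 | HR |
| 1 | Bob | 30.0 | M | 60000.0 | IT |
| 2 | Charlie | NaN | M | 70000.0 | IT |
| 3 | David | 45.0 | M | 80000.0 | Finance |
| 4 | Eve | 30.0 | F | NaN | HR |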


In [21]:
# AFTER APPLYING df.info()
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 5 columns):
 #   Column      Non-Null Count  Dtype  
---  ------      --------------  -----  
 0   Name        5 non-null      object 
 1   Age         4 non-null      float64
 2   Gender      5 non-null      object 
 3   Salary      4 non-null      float64
 4   Department  5 non-null      object 
dtypes: float64(2), object(3)
memory usage: 332.0+ bytes
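`df.info()` prints its report instead of returning it. As a small sketch, assuming the same sample data, the report can be captured as a string through the `buf` parameter (here an `io.StringIO`), and `df.count()` gives the same per-column non-null counts as a Series:

```python
import io

import pandas as pd

# Same sample data as the notebook
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, None, 45, 30],
    "Gender": ["F", "M", "M", "M", "F"],
    "Salary": [50000, 60000, 70000, 80000, None],
    "Department": ["HR", "IT", "IT", "Finance", "HR"],
})

# Write the info() report into an in-memory text buffer instead of stdout
buffer = io.StringIO()
df.info(buf=buffer)
report = buffer.getvalue()

# Non-null counts per column as a Series (Age and Salary have 4 each)
non_null = df.count()
print(non_null)
```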


## 🔹 Method: `df.fillna()`
```python
df['Age'].fillna(df['Age'].mean(), inplace=True)
df['Salary'].fillna(df['Salary'].median(), inplace=True)
```

In [22]:
# BEFORE
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, None, 45, 30],
    'Gender': ['F', 'M', 'M', 'M', 'F'],
    'Salary': [50000, 60000, 70000, 80000, None],
    'Department': ['HR', 'IT', 'IT', 'Finance', 'HR']
}
df = pd.DataFrame(data)
df

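|   | Name | Age | Gender | Salary | Department |
|---|---|---|---|---|---|
| 0 | Alice | 25.0 | F | 50000.0 | HR |
| 1 | Bob | 30.0 | M | 60000.0 | IT |
| 2 | Charlie | NaN | M | 70000.0 | IT |
| 3 | David | 45.0 | M | 80000.0 | Finance |
| 4 | Eve | 30.0 | F | NaN | HR |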


In [23]:
# AFTER APPLYING df.fillna()
# Assign back: fillna(..., inplace=True) on df['col'] is chained assignment (FutureWarning; no effect in pandas 3.0)
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['Salary'] = df['Salary'].fillna(df['Salary'].median())
df

|   | Name | Age | Gender | Salary | Department |
|---|---|---|---|---|---|
| 0 | Alice | 25.0 | F | 50000.0 | HR |
| 1 | Bob | 30.0 | M | 60000.0 | IT |
| 2 | Charlie | 32.5 | M | 70000.0 | IT |
| 3 | David | 45.0 | M | 80000.0 | Finance |
| 4 | Eve | 30.0 | F | 65000.0 | HR |
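The pandas warning about chained assignment suggests the dict form `df.fillna({col: value})` as an alternative. A minimal sketch, assuming the same sample data, that fills both columns in one call and stores the result under the illustrative name `filled`, so the original `df` keeps its missing values:

```python
import pandas as pd

# Same sample data as the notebook
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, None, 45, 30],
    "Gender": ["F", "M", "M", "M", "F"],
    "Salary": [50000, 60000, 70000, 80000, None],
    "Department": ["HR", "IT", "IT", "Finance", "HR"],
})

# One fill value per column; fillna returns a new DataFrame
filled = df.fillna({
    "Age": df["Age"].mean(),          # mean of 25, 30, 45, 30 -> 32.5
    "Salary": df["Salary"].median(),  # median of 50k, 60k, 70k, 80k -> 65000.0
})
print(filled)
```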


## 🔹 Method: `df.dropna()`
```python
df.dropna()
```

In [24]:
# BEFORE
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, None, 45, 30],
    'Gender': ['F', 'M', 'M', 'M', 'F'],
    'Salary': [50000, 60000, 70000, 80000, None],
    'Department': ['HR', 'IT', 'IT', 'Finance', 'HR']
}
df = pd.DataFrame(data)
df

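|   | Name | Age | Gender | Salary | Department |
|---|---|---|---|---|---|
| 0 | Alice | 25.0 | F | 50000.0 | HR |
| 1 | Bob | 30.0 | M | 60000.0 | IT |
| 2 | Charlie | NaN | M | 70000.0 | IT |
| 3 | David | 45.0 | M | 80000.0 | Finance |
| 4 | Eve | 30.0 | F | NaN | HR |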


In [25]:
# AFTER APPLYING df.dropna()
df.dropna()

|   | Name | Age | Gender | Salary | Department |
|---|---|---|---|---|---|
| 0 | Alice | 25.0 | F | 50000.0 | HR |
| 1 | Bob | 30.0 | M | 60000.0 | IT |
| 3 | David | 45.0 | M | 80000.0 | Finance |
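By default `dropna()` removes any row with at least one missing value, which here drops both Charlie and Eve. A short sketch, assuming the same sample data, of the `subset`, `how`, and `axis` parameters that make the rule narrower or switch it to columns:

```python
import pandas as pd

# Same sample data as the notebook
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, None, 45, 30],
    "Gender": ["F", "M", "M", "M", "F"],
    "Salary": [50000, 60000, 70000, 80000, None],
    "Department": ["HR", "IT", "IT", "Finance", "HR"],
})

# Only look at Salary: Eve is dropped, Charlie (missing Age) is kept
salary_known = df.dropna(subset=["Salary"])

# how="all" drops a row only if every value is missing, so nothing is dropped here
all_missing_only = df.dropna(how="all")

# axis=1 drops columns that contain any missing value (Age and Salary)
complete_columns = df.dropna(axis=1)

print(salary_known)
print(complete_columns.columns.tolist())
```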


## 🔹 Method: `df['Gender'].value_counts()`
```python
df['Gender'].value_counts()
```

In [26]:
# BEFORE
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, None, 45, 30],
    'Gender': ['F', 'M', 'M', 'M', 'F'],
    'Salary': [50000, 60000, 70000, 80000, None],
    'Department': ['HR', 'IT', 'IT', 'Finance', 'HR']
}
df = pd.DataFrame(data)
df

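|   | Name | Age | Gender | Salary | Department |
|---|---|---|---|---|---|
| 0 | Alice | 25.0 | F | 50000.0 | HR |
| 1 | Bob | 30.0 | M | 60000.0 | IT |
| 2 | Charlie | NaN | M | 70000.0 | IT |
| 3 | David | 45.0 | M | 80000.0 | Finance |
| 4 | Eve | 30.0 | F | NaN | HR |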


In [27]:
# AFTER APPLYING df['Gender'].value_counts()
df['Gender'].value_counts()

| Gender | count |
|---|---|
| M | 3 |
| F | 2 |
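`value_counts()` returns raw counts sorted from most to least frequent. A minimal sketch, assuming the same sample data, of two commonly used parameters: `normalize=True` for proportions and `dropna=False` to count missing values as their own category:

```python
import pandas as pd

# Same sample data as the notebook
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, None, 45, 30],
    "Gender": ["F", "M", "M", "M", "F"],
    "Salary": [50000, 60000, 70000, 80000, None],
    "Department": ["HR", "IT", "IT", "Finance", "HR"],
})

# Share of each value instead of the count: M 0.6, F 0.4
shares = df["Gender"].value_counts(normalize=True)

# Missing values are skipped by default; dropna=False adds a NaN entry
age_counts = df["Age"].value_counts(dropna=False)

print(shares)
print(age_counts)
```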


## 🔹 Method: `df['Department'].unique()`
```python
df['Department'].unique()
```

In [28]:
# BEFORE
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, None, 45, 30],
    'Gender': ['F', 'M', 'M', 'M', 'F'],
    'Salary': [50000, 60000, 70000, 80000, None],
    'Department': ['HR', 'IT', 'IT', 'Finance', 'HR']
}
df = pd.DataFrame(data)
df

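|   | Name | Age | Gender | Salary | Department |
|---|---|---|---|---|---|
| 0 | Alice | 25.0 | F | 50000.0 | HR |
| 1 | Bob | 30.0 | M | 60000.0 | IT |
| 2 | Charlie | NaN | M | 70000.0 | IT |
| 3 | David | 45.0 | M | 80000.0 | Finance |
| 4 | Eve | 30.0 | F | NaN | HR |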


In [29]:
# AFTER APPLYING df['Department'].unique()
df['Department'].unique()

array(['HR', 'IT', 'Finance'], dtype=object)
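`unique()` returns the distinct values in order of first appearance (not sorted). When only the number of distinct values is needed, `nunique()` returns it directly; a sketch assuming the same sample data, including how missing values are treated:

```python
import pandas as pd

# Same sample data as the notebook
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, None, 45, 30],
    "Gender": ["F", "M", "M", "M", "F"],
    "Salary": [50000, 60000, 70000, 80000, None],
    "Department": ["HR", "IT", "IT", "Finance", "HR"],
})

departments = df["Department"].unique()      # order of first appearance
n_departments = df["Department"].nunique()   # number of distinct values -> 3

# nunique() skips NaN by default; dropna=False counts it as a value
n_ages = df["Age"].nunique()                       # 25, 30, 45 -> 3
n_ages_with_nan = df["Age"].nunique(dropna=False)  # plus NaN -> 4

print(departments, n_departments, n_ages, n_ages_with_nan)
```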

## 🔹 Method: `df.describe()`
```python
df.describe()
```

In [30]:
# BEFORE
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, None, 45, 30],
    'Gender': ['F', 'M', 'M', 'M', 'F'],
    'Salary': [50000, 60000, 70000, 80000, None],
    'Department': ['HR', 'IT', 'IT', 'Finance', 'HR']
}
df = pd.DataFrame(data)
df

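|   | Name | Age | Gender | Salary | Department |
|---|---|---|---|---|---|
| 0 | Alice | 25.0 | F | 50000.0 | HR |
| 1 | Bob | 30.0 | M | 60000.0 | IT |
| 2 | Charlie | NaN | M | 70000.0 | IT |
| 3 | David | 45.0 | M | 80000.0 | Finance |
| 4 | Eve | 30.0 | F | NaN | HR |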


In [31]:
# AFTER APPLYING df.describe()
df.describe()

|   | Age | Salary |
|---|---|---|
| count | 4.0 | 4.0 |
| mean | 32.5 | 65000.0 |
| std | 8.660254 | 12909.944487 |
| min | 25.0 | 50000.0 |
| 25% | 28.75 | 57500.0 |
| 50% | 30.0 | 65000.0 |
| 75% | 33.75 | 72500.0 |
| max | 45.0 | 80000.0 |
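The `std` row of `describe()` is the sample standard deviation, which divides by n - 1 rather than n, and `count` is 4 because the missing value is skipped. A small check that recomputes it by hand for `Age`, assuming the same sample values:

```python
import math

import pandas as pd

ages = pd.Series([25, 30, None, 45, 30])

values = ages.dropna().tolist()                 # describe() ignores the missing Age
n = len(values)                                 # 4
mean = sum(values) / n                          # 32.5
squared = sum((v - mean) ** 2 for v in values)  # 225.0
sample_std = math.sqrt(squared / (n - 1))       # sqrt(75), about 8.660254
population_std = math.sqrt(squared / n)         # sqrt(56.25) = 7.5

print(sample_std, ages.std())             # Series.std() uses ddof=1 by default
print(population_std, ages.std(ddof=0))   # ddof=0 gives the population version
```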


## 🔹 Method: `df.rename()`
```python
df.rename(columns={'Salary': 'AnnualSalary'})
```

In [32]:
# BEFORE
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, None, 45, 30],
    'Gender': ['F', 'M', 'M', 'M', 'F'],
    'Salary': [50000, 60000, 70000, 80000, None],
    'Department': ['HR', 'IT', 'IT', 'Finance', 'HR']
}
df = pd.DataFrame(data)
df

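|   | Name | Age | Gender | Salary | Department |
|---|---|---|---|---|---|
| 0 | Alice | 25.0 | F | 50000.0 | HR |
| 1 | Bob | 30.0 | M | 60000.0 | IT |
| 2 | Charlie | NaN | M | 70000.0 | IT |
| 3 | David | 45.0 | M | 80000.0 | Finance |
| 4 | Eve | 30.0 | F | NaN | HR |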


In [33]:
# AFTER APPLYING df.rename()
df.rename(columns={'Salary': 'AnnualSalary'})

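|   | Name | Age | Gender | AnnualSalary | Department |
|---|---|---|---|---|---|
| 0 | Alice | 25.0 | F | 50000.0 | HR |
| 1 | Bob | 30.0 | M | 60000.0 | IT |
| 2 | Charlie | NaN | M | 70000.0 | IT |
| 3 | David | 45.0 | M | 80000.0 | Finance |
| 4 | Eve | 30.0 | F | NaN | HR |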
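`rename()` returns a new DataFrame, so the original `df` still has a `Salary` column unless the result is assigned back. A sketch assuming the same sample data; it also shows that `columns=` accepts a function applied to every label (`renamed` and `lower` are illustrative names):

```python
import pandas as pd

# Same sample data as the notebook
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, None, 45, 30],
    "Gender": ["F", "M", "M", "M", "F"],
    "Salary": [50000, 60000, 70000, 80000, None],
    "Department": ["HR", "IT", "IT", "Finance", "HR"],
})

renamed = df.rename(columns={"Salary": "AnnualSalary"})
print(df.columns.tolist())       # original still has "Salary"
print(renamed.columns.tolist())  # new frame has "AnnualSalary"

# A callable mapper is applied to every column label
lower = df.rename(columns=str.lower)
print(lower.columns.tolist())
```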


## 🔹 Method: `df.drop()`
```python
df.drop('Department', axis=1)
```

In [34]:
# BEFORE
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, None, 45, 30],
    'Gender': ['F', 'M', 'M', 'M', 'F'],
    'Salary': [50000, 60000, 70000, 80000, None],
    'Department': ['HR', 'IT', 'IT', 'Finance', 'HR']
}
df = pd.DataFrame(data)
df

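|   | Name | Age | Gender | Salary | Department |
|---|---|---|---|---|---|
| 0 | Alice | 25.0 | F | 50000.0 | HR |
| 1 | Bob | 30.0 | M | 60000.0 | IT |
| 2 | Charlie | NaN | M | 70000.0 | IT |
| 3 | David | 45.0 | M | 80000.0 | Finance |
| 4 | Eve | 30.0 | F | NaN | HR |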


In [35]:
# AFTER APPLYING df.drop()
df.drop('Department', axis=1)

|   | Name | Age | Gender | Salary |
|---|---|---|---|---|
| 0 | Alice | 25.0 | F | 50000.0 |
| 1 | Bob | 30.0 | M | 60000.0 |
| 2 | Charlie | NaN | M | 70000.0 |
| 3 | David | 45.0 | M | 80000.0 |
| 4 | Eve | 30.0 | F | NaN |
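The `columns=` and `index=` keywords are an equivalent, more explicit way to say which axis `drop()` works on than `axis=1`. A sketch assuming the same sample data, dropping a column and then a row by its index label:

```python
import pandas as pd

# Same sample data as the notebook
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, None, 45, 30],
    "Gender": ["F", "M", "M", "M", "F"],
    "Salary": [50000, 60000, 70000, 80000, None],
    "Department": ["HR", "IT", "IT", "Finance", "HR"],
})

no_dept = df.drop(columns=["Department"])  # same as df.drop("Department", axis=1)
no_eve = df.drop(index=[4])                # drop the row whose index label is 4

print(no_dept.columns.tolist())
print(no_eve["Name"].tolist())
print(df.shape)  # df itself is unchanged: (5, 5)
```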


## 🔹 Method: `df.replace()`
```python
df.replace({'M': 'Male', 'F': 'Female'})
```

In [36]:
# BEFORE
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, None, 45, 30],
    'Gender': ['F', 'M', 'M', 'M', 'F'],
    'Salary': [50000, 60000, 70000, 80000, None],
    'Department': ['HR', 'IT', 'IT', 'Finance', 'HR']
}
df = pd.DataFrame(data)
df

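|   | Name | Age | Gender | Salary | Department |
|---|---|---|---|---|---|
| 0 | Alice | 25.0 | F | 50000.0 | HR |
| 1 | Bob | 30.0 | M | 60000.0 | IT |
| 2 | Charlie | NaN | M | 70000.0 | IT |
| 3 | David | 45.0 | M | 80000.0 | Finance |
| 4 | Eve | 30.0 | F | NaN | HR |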


In [37]:
# AFTER APPLYING df.replace()
df.replace({'M': 'Male', 'F': 'Female'})

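|   | Name | Age | Gender | Salary | Department |
|---|---|---|---|---|---|
| 0 | Alice | 25.0 | Female | 50000.0 | HR |
| 1 | Bob | 30.0 | Male | 60000.0 | IT |
| 2 | Charlie | NaN | Male | 70000.0 | IT |
| 3 | David | 45.0 | Male | 80000.0 | Finance |
| 4 | Eve | 30.0 | Female | NaN | HR |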
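`df.replace({'M': 'Male', 'F': 'Female'})` searches every column, so a cell containing exactly `'M'` or `'F'` in any other column would change too. Nesting the mapping under a column name limits it to that column; a sketch assuming the same sample data, with `Series.map` shown as another option when every value has a mapping:

```python
import pandas as pd

# Same sample data as the notebook
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, None, 45, 30],
    "Gender": ["F", "M", "M", "M", "F"],
    "Salary": [50000, 60000, 70000, 80000, None],
    "Department": ["HR", "IT", "IT", "Finance", "HR"],
})

# {column: {old: new}} only touches the Gender column
gender_only = df.replace({"Gender": {"M": "Male", "F": "Female"}})

# map() returns NaN for any value missing from the dict
mapped = df["Gender"].map({"M": "Male", "F": "Female"})

print(gender_only)
```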


## 🔹 Method: `df.astype()`
```python
df['Age'] = df['Age'].fillna(0).astype(int)
```

In [38]:
# BEFORE
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, None, 45, 30],
    'Gender': ['F', 'M', 'M', 'M', 'F'],
    'Salary': [50000, 60000, 70000, 80000, None],
    'Department': ['HR', 'IT', 'IT', 'Finance', 'HR']
}
df = pd.DataFrame(data)
df

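|   | Name | Age | Gender | Salary | Department |
|---|---|---|---|---|---|
| 0 | Alice | 25.0 | F | 50000.0 | HR |
| 1 | Bob | 30.0 | M | 60000.0 | IT |
| 2 | Charlie | NaN | M | 70000.0 | IT |
| 3 | David | 45.0 | M | 80000.0 | Finance |
| 4 | Eve | 30.0 | F | NaN | HR |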


In [39]:
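# AFTER APPLYING df.astype()
df['Age'] = df['Age'].fillna(0).astype(int)
df

|   | Name | Age | Gender | Salary | Department |
|---|---|---|---|---|---|
| 0 | Alice | 25 | F | 50000.0 | HR |
| 1 | Bob | 30 | M | 60000.0 | IT |
| 2 | Charlie | 0 | M | 70000.0 | IT |
| 3 | David | 45 | M | 80000.0 | Finance |
| 4 | Eve | 30 | F | NaN | HR |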
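Filling missing ages with `0` makes the cast to `int` possible, but it also records Charlie as 0 years old, which pulls down statistics such as the mean. If the gap should be kept, pandas' nullable integer dtype `"Int64"` (capital I) can hold integers alongside `<NA>`. A sketch assuming the same `Age` values:

```python
import pandas as pd

ages = pd.Series([25, 30, None, 45, 30])  # float64 with one NaN

ages_filled = ages.fillna(0).astype(int)  # NaN -> 0, then a NumPy integer dtype
ages_nullable = ages.astype("Int64")      # integers, missing value kept as <NA>

print(ages_filled.mean())    # 26.0, pulled down by the placeholder 0
print(ages_nullable.mean())  # 32.5, missing value skipped
```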

## 🔹 Method: `df.sort_values()`
```python
df.sort_values('Salary')
```

In [40]:
# BEFORE
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, None, 45, 30],
    'Gender': ['F', 'M', 'M', 'M', 'F'],
    'Salary': [50000, 60000, 70000, 80000, None],
    'Department': ['HR', 'IT', 'IT', 'Finance', 'HR']
}
df = pd.DataFrame(data)
df

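|   | Name | Age | Gender | Salary | Department |
|---|---|---|---|---|---|
| 0 | Alice | 25.0 | F | 50000.0 | HR |
| 1 | Bob | 30.0 | M | 60000.0 | IT |
| 2 | Charlie | NaN | M | 70000.0 | IT |
| 3 | David | 45.0 | M | 80000.0 | Finance |
| 4 | Eve | 30.0 | F | NaN | HR |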


In [41]:
# AFTER APPLYING df.sort_values()
df.sort_values('Salary')

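|   | Name | Age | Gender | Salary | Department |
|---|---|---|---|---|---|
| 0 | Alice | 25.0 | F | 50000.0 | HR |
| 1 | Bob | 30.0 | M | 60000.0 | IT |
| 2 | Charlie | NaN | M | 70000.0 | IT |
| 3 | David | 45.0 | M | 80000.0 | Finance |
| 4 | Eve | 30.0 | F | NaN | HR |

The rows were already in ascending `Salary` order, so the result matches the input. Eve's missing salary stays at the end because `na_position='last'` is the default.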
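Because the sample is already sorted by `Salary`, the parameters of `sort_values()` are easier to see with a different order. A sketch assuming the same sample data, showing descending order, moving missing values to the front, and sorting by two keys with a separate direction for each:

```python
import pandas as pd

# Same sample data as the notebook
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, None, 45, 30],
    "Gender": ["F", "M", "M", "M", "F"],
    "Salary": [50000, 60000, 70000, 80000, None],
    "Department": ["HR", "IT", "IT", "Finance", "HR"],
})

# Highest salary first; the missing salary is still placed last
desc = df.sort_values("Salary", ascending=False)

# Put rows with a missing salary at the top instead
nan_first = df.sort_values("Salary", na_position="first")

# Department A-Z, then Salary high-to-low within each department
by_dept = df.sort_values(["Department", "Salary"], ascending=[True, False])

print(desc["Name"].tolist())
print(nan_first["Name"].tolist())
print(by_dept["Name"].tolist())
```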


## 🔹 Method: `df.groupby()`
```python
df.groupby('Gender')['Salary'].mean()
```

In [42]:
# BEFORE
import pandas as pd
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, None, 45, 30],
    'Gender': ['F', 'M', 'M', 'M', 'F'],
    'Salary': [50000, 60000, 70000, 80000, None],
    'Department': ['HR', 'IT', 'IT', 'Finance', 'HR']
}
df = pd.DataFrame(data)
df

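|   | Name | Age | Gender | Salary | Department |
|---|---|---|---|---|---|
| 0 | Alice | 25.0 | F | 50000.0 | HR |
| 1 | Bob | 30.0 | M | 60000.0 | IT |
| 2 | Charlie | NaN | M | 70000.0 | IT |
| 3 | David | 45.0 | M | 80000.0 | Finance |
| 4 | Eve | 30.0 | F | NaN | HR |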


In [43]:
# AFTER APPLYING df.groupby()
df.groupby('Gender')['Salary'].mean()

| Gender | Salary |
|---|---|
| F | 50000.0 |
| M | 70000.0 |
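The mean for `F` is 50000.0 because Eve's missing salary is skipped, so only Alice's value is averaged. A sketch assuming the same sample data that makes this visible by computing several aggregations at once with `agg()`, and comparing against `size()`, which counts rows including missing values:

```python
import pandas as pd

# Same sample data as the notebook
df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, None, 45, 30],
    "Gender": ["F", "M", "M", "M", "F"],
    "Salary": [50000, 60000, 70000, 80000, None],
    "Department": ["HR", "IT", "IT", "Finance", "HR"],
})

# "count" counts non-missing salaries per group (F: 1, M: 3)
summary = df.groupby("Gender")["Salary"].agg(["mean", "count"])

# size() counts all rows per group, missing or not (F: 2, M: 3)
rows_per_group = df.groupby("Gender").size()

print(summary)
print(rows_per_group)
```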


✔️ Now you're ready to master these functions in real-world data analysis workflows.