# 📘 Important Pandas DataFrame Methods

In this session, we’ll explore powerful DataFrame methods in Pandas with real-world use cases, using student data (name, marks, attendance) as our example.

---

## 🔧 1. `sort_values()`

**👉 What it does:**  
Sorts the DataFrame based on the values of one or more columns.

**📌 Use case:**  
Sort students by marks (highest to lowest).


In [16]:
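import pandas as pd
import numpy as np

# None and np.nan both end up as NaN in a numeric column
data = {
    'Name': ['Aman', 'Neha', 'Ravi', 'Neha', 'Anjali', "Jay"],
    'Marks': [85, 92, None, 92, 75, np.nan],
    'Attendance': [95, 88, None, 95, 95, np.nan]
}

df = pd.DataFrame(data)
# Highest marks first; rows with missing marks are placed last by default
df.sort_values(by='Marks', ascending=False)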

     Name  Marks  Attendance
1    Neha   92.0        88.0
3    Neha   92.0        95.0
0    Aman   85.0        95.0
4  Anjali   75.0        95.0
2    Ravi    NaN         NaN
5     Jay    NaN         NaN
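
Sorting on more than one column breaks ties in the first column. The sketch below rebuilds the student data under the assumed name `students` and ranks by marks, then by attendance; `na_position` makes the placement of missing marks explicit.

```python
import numpy as np
import pandas as pd

# Assumed stand-in for the student data used in this section
students = pd.DataFrame({
    'Name': ['Aman', 'Neha', 'Ravi', 'Neha', 'Anjali', 'Jay'],
    'Marks': [85, 92, np.nan, 92, 75, np.nan],
    'Attendance': [95, 88, np.nan, 95, 95, np.nan],
})

# Highest marks first; equal marks are ordered by highest attendance.
ranked = students.sort_values(
    by=['Marks', 'Attendance'],
    ascending=[False, False],
    na_position='last',  # keep rows with missing marks at the bottom
)
print(ranked)
```

`sort_values()` returns a new DataFrame, so `students` keeps its original order unless the result is assigned back.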


## 🔧 2. `set_index()` and `reset_index()`
**👉 What it does:**
- `set_index()`: Uses a column (like "Name") as the row index.
- `reset_index()`: Restores the default integer index and moves the current index into a regular column.

**📌 Use case:**
Make "Name" the index for better row identification.

In [17]:
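# Uncomment to make 'Name' the row index:
# df = df.set_index('Name')
# print(df)

# With set_index() commented out, df still has its default integer index,
# so reset_index() copies that index into a new 'index' column.
df = df.reset_index()
df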

   index    Name  Marks  Attendance
0      0    Aman   85.0        95.0
1      1    Neha   92.0        88.0
2      2    Ravi    NaN         NaN
3      3    Neha   92.0        95.0
4      4  Anjali   75.0        95.0
5      5     Jay    NaN         NaN
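
A round trip through both methods may make the behaviour clearer. This is a minimal sketch on an assumed three-student subset (`students`); it also shows `drop=True`, which discards the old index instead of turning it into a column.

```python
import pandas as pd

# Assumed subset of the student data, kept small for readability
students = pd.DataFrame({
    'Name': ['Aman', 'Neha', 'Anjali'],
    'Marks': [85.0, 92.0, 75.0],
})

# Use 'Name' as the row labels, then look a student up by name.
by_name = students.set_index('Name')
print(by_name.loc['Neha', 'Marks'])

# reset_index() turns the 'Name' index back into an ordinary column.
restored = by_name.reset_index()

# After sorting, drop=True renumbers rows 0..n-1 without adding an 'index' column.
renumbered = students.sort_values('Marks').reset_index(drop=True)
print(renumbered)
```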


## 🔧 3. `isnull()` and `notnull()`
**👉 What it does:**
`isnull()` returns a Boolean DataFrame that is `True` wherever a value is missing (`NaN`); `notnull()` returns the opposite.

**📌 Use case:**
Find out if any marks or attendance values are missing.

In [21]:
print(df.isnull())         # True where a value is missing
# df.isnull().sum()   # Count of missing values per column
print(df.notnull())        # True where a value is present

   index   Name  Marks  Attendance
0  False  False  False       False
1  False  False  False       False
2  False  False   True        True
3  False  False  False       False
4  False  False  False       False
5  False  False   True        True
   index  Name  Marks  Attendance
0   True  True   True        True
1   True  True   True        True
2   True  True  False       False
3   True  True   True        True
4   True  True   True        True
5   True  True  False       False
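
The Boolean tables above are usually summarised rather than read cell by cell. A small sketch, with the student data rebuilt under the assumed name `students`, counts missing values per column and pulls out the incomplete rows.

```python
import numpy as np
import pandas as pd

# Assumed stand-in for the student data used in this section
students = pd.DataFrame({
    'Name': ['Aman', 'Neha', 'Ravi', 'Neha', 'Anjali', 'Jay'],
    'Marks': [85, 92, np.nan, 92, 75, np.nan],
    'Attendance': [95, 88, np.nan, 95, 95, np.nan],
})

# True counts as 1, so sum() gives the number of missing values per column.
missing_per_column = students.isnull().sum()
print(missing_per_column)

# any(axis=1) flags rows with at least one missing value.
incomplete = students[students.isnull().any(axis=1)]
print(incomplete)
```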


## 🔧 4. `dropna()`
**👉 What it does:**
Removes rows that contain at least one missing value (the default behaviour).

**📌 Use case:**
Remove students with incomplete records.

In [22]:
df.dropna()

   index    Name  Marks  Attendance
0      0    Aman   85.0        95.0
1      1    Neha   92.0        88.0
3      3    Neha   92.0        95.0
4      4  Anjali   75.0        95.0
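
Dropping every row with any gap can be too strict. The sketch below uses a hypothetical variation of the data (Ravi has attendance but no marks) to show how `subset` and `how` narrow the rule; the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical variation of the student data: Ravi has attendance but no
# marks, while Jay is missing both values.
students = pd.DataFrame({
    'Name': ['Aman', 'Neha', 'Ravi', 'Jay'],
    'Marks': [85, 92, np.nan, np.nan],
    'Attendance': [95, 88, 80, np.nan],
})

# Drop a row only when 'Marks' is missing; other columns are ignored.
has_marks = students.dropna(subset=['Marks'])

# Drop a row only when every listed column is missing.
has_any_score = students.dropna(subset=['Marks', 'Attendance'], how='all')

print(has_marks)
print(has_any_score)
```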


## 🔧 5. `fillna()`
**👉 What it does:**
Replaces missing values with a fixed value or a computed one, such as the column mean.

**📌 Use case:**
Fill missing marks with the average mark and missing attendance with 100.

In [24]:
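# mean() skips NaN, so the average is taken over the four known marks (86.0)
df['Marks'] = df['Marks'].fillna(df['Marks'].mean())
# Missing attendance is replaced with a fixed value
df['Attendance'] = df['Attendance'].fillna(100)
df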

   index    Name  Marks  Attendance
0      0    Aman   85.0        95.0
1      1    Neha   92.0        88.0
2      2    Ravi   86.0       100.0
3      3    Neha   92.0        95.0
4      4  Anjali   75.0        95.0
5      5     Jay   86.0       100.0
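
The same fill can be written as one call by passing a dictionary of column names to fill values. A minimal sketch, with the data rebuilt under the assumed name `students`:

```python
import numpy as np
import pandas as pd

# Assumed stand-in for the student data used in this section
students = pd.DataFrame({
    'Name': ['Aman', 'Neha', 'Ravi', 'Neha', 'Anjali', 'Jay'],
    'Marks': [85, 92, np.nan, 92, 75, np.nan],
    'Attendance': [95, 88, np.nan, 95, 95, np.nan],
})

# One fill value per column: the mean mark and a fixed attendance of 100.
filled = students.fillna({
    'Marks': students['Marks'].mean(),
    'Attendance': 100,
})
print(filled)
```

Because `fillna()` returns a new DataFrame, `students` still contains its `NaN` values afterwards.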


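## 🔧 6. `drop_duplicates()`
**👉 What it does:**
Removes rows that exactly repeat an earlier row across all columns (or across the columns given in `subset`).

**📌 Use case:**
Remove repeated entries of a student. Note that nothing is removed in the output below: the two "Neha" rows differ in `index` and `Attendance`, so they are not exact duplicates.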

In [25]:
df.drop_duplicates()

   index    Name  Marks  Attendance
0      0    Aman   85.0        95.0
1      1    Neha   92.0        88.0
2      2    Ravi   86.0       100.0
3      3    Neha   92.0        95.0
4      4  Anjali   75.0        95.0
5      5     Jay   86.0       100.0
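
To treat two rows as the same student when only the name matches, the comparison can be limited with `subset`. This sketch rebuilds the data under the assumed name `students`; whether a repeated name really means the same student is an assumption about the data.

```python
import numpy as np
import pandas as pd

# Assumed stand-in for the student data used in this section
students = pd.DataFrame({
    'Name': ['Aman', 'Neha', 'Ravi', 'Neha', 'Anjali', 'Jay'],
    'Marks': [85, 92, np.nan, 92, 75, np.nan],
    'Attendance': [95, 88, np.nan, 95, 95, np.nan],
})

# No two rows match on every column, so nothing counts as a full duplicate.
print(students.duplicated().sum())

# Compare on 'Name' only and keep the first row seen for each name.
first_per_name = students.drop_duplicates(subset='Name')

# keep='last' keeps the most recent row for each name instead.
last_per_name = students.drop_duplicates(subset='Name', keep='last')

print(first_per_name)
print(last_per_name)
```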


## 🔧 7. `value_counts()`
**👉 What it does:**
Counts how many times each unique value appears in a column.

**📌 Use case:**
Check how many times a student name is repeated.

In [26]:
df['Name'].value_counts()

Name
Neha      2
Aman      1
Ravi      1
Anjali    1
Jay       1
Name: count, dtype: int64
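
Two options are often useful alongside the plain count: `normalize=True` for proportions and `dropna=False` to include missing values. A short sketch, with the data rebuilt under the assumed name `students`:

```python
import numpy as np
import pandas as pd

# Assumed stand-in for the student data used in this section
students = pd.DataFrame({
    'Name': ['Aman', 'Neha', 'Ravi', 'Neha', 'Anjali', 'Jay'],
    'Marks': [85, 92, np.nan, 92, 75, np.nan],
})

name_counts = students['Name'].value_counts()

# Names that appear more than once
repeated = name_counts[name_counts > 1].index.tolist()
print(repeated)

# Share of rows per name instead of raw counts
name_share = students['Name'].value_counts(normalize=True)

# NaN is skipped by default; dropna=False counts it as its own value.
marks_counts = students['Marks'].value_counts()
marks_counts_with_nan = students['Marks'].value_counts(dropna=False)
print(marks_counts_with_nan)
```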

## 🔧 8. `apply()`
**👉 What it does:**
Applies a function to each value of a column (Series), or to each row or column of a DataFrame depending on `axis`.

**📌 Use case:**
Add bonus marks (+5) to each student.

In [28]:
df['Bonus'] = df['Marks'].apply(lambda x: x + 5)
df

   index    Name  Marks  Attendance  Bonus
0      0    Aman   85.0        95.0   90.0
1      1    Neha   92.0        88.0   97.0
2      2    Ravi   86.0       100.0   91.0
3      3    Neha   92.0        95.0   97.0
4      4  Anjali   75.0        95.0   80.0
5      5     Jay   86.0       100.0   91.0
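
For simple arithmetic like `+ 5`, plain column arithmetic gives the same result without `apply()`. Row-wise `apply()` becomes useful when the function needs several columns at once. The sketch below uses an assumed three-student subset, and the attendance rule in `bonus_if_regular` is hypothetical, invented only to illustrate `axis=1`.

```python
import pandas as pd

# Assumed subset of the student data, kept small for readability
students = pd.DataFrame({
    'Name': ['Aman', 'Neha', 'Anjali'],
    'Marks': [85.0, 92.0, 75.0],
    'Attendance': [95.0, 88.0, 95.0],
})

# Vectorized equivalent of apply(lambda x: x + 5)
students['Bonus'] = students['Marks'] + 5

def bonus_if_regular(row):
    """Hypothetical rule: add 5 marks only when attendance is at least 90."""
    if row['Attendance'] >= 90:
        return row['Marks'] + 5
    return row['Marks']

# axis=1 passes each row to the function as a Series.
students['ConditionalBonus'] = students.apply(bonus_if_regular, axis=1)
print(students)
```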


## 🔧 9. `map()`
**👉 What it does:**
Transforms each value of a column (Series) using a function or a dictionary lookup.

**📌 Use case:**
Convert marks into grades (A, B, C).

In [31]:
def get_grade(m):
    if m >= 90:
        return 'A'
    elif m >= 80:
        return 'B'
    else:
        return 'C'

df['Grade'] = df['Marks'].map(get_grade)
df

   index    Name  Marks  Attendance  Bonus Grade
0      0    Aman   85.0        95.0   90.0     B
1      1    Neha   92.0        88.0   97.0     A
2      2    Ravi   86.0       100.0   91.0     B
3      3    Neha   92.0        95.0   97.0     A
4      4  Anjali   75.0        95.0   80.0     C
5      5     Jay   86.0       100.0   91.0     B
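
The grading above works because missing marks were filled earlier. If `NaN` were still present, `get_grade` would quietly return 'C' for it, since every comparison with `NaN` is false. The sketch below shows that pitfall, the `na_action='ignore'` option, and a dictionary lookup; the series `marks` and the remark labels are assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

def get_grade(m):
    if m >= 90:
        return 'A'
    elif m >= 80:
        return 'B'
    else:
        return 'C'

# Assumed sample with one missing mark
marks = pd.Series([92.0, 85.0, np.nan], name='Marks')

# NaN fails both comparisons, so the missing mark is graded 'C'.
naive = marks.map(get_grade)

# na_action='ignore' skips NaN and leaves it as NaN in the result.
safe = marks.map(get_grade, na_action='ignore')

# Dictionary lookup with hypothetical labels; 'C' is left out on purpose,
# and values without a matching key become NaN.
remarks = {'A': 'Excellent', 'B': 'Good'}
labels = naive.map(remarks)

print(naive.tolist())
print(safe.tolist())
print(labels.tolist())
```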
