In [None]:
import pandas as pd
import matplotlib.pyplot as plt


In [None]:
df = pd.read_csv("/content/Placement_Dataset.csv")
df.head()


Unnamed: 0,sl_no,gender,ssc_p,ssc_b,hsc_p,hsc_b,hsc_s,degree_p,degree_t,workex,etest_p,specialisation,mba_p,status,salary
0,1,M,67.0,Others,91.0,Others,Commerce,58.0,Sci&Tech,No,55.0,Mkt&HR,58.8,Placed,270000.0
1,2,M,79.33,Central,78.33,Others,Science,77.48,Sci&Tech,Yes,86.5,Mkt&Fin,66.28,Placed,200000.0
2,3,M,65.0,Central,68.0,Central,Arts,64.0,Comm&Mgmt,No,75.0,Mkt&Fin,57.8,Placed,250000.0
3,4,M,56.0,Central,52.0,Central,Science,52.0,Sci&Tech,No,66.0,Mkt&HR,59.43,Not Placed,
4,5,M,85.8,Central,73.6,Central,Commerce,73.3,Comm&Mgmt,No,96.8,Mkt&Fin,55.5,Placed,425000.0


In [None]:
df.isnull().sum()

Unnamed: 0,0
sl_no,0
gender,0
ssc_p,0
ssc_b,0
hsc_p,0
hsc_b,0
hsc_s,0
degree_p,0
degree_t,0
workex,0
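
When a frame has many columns, it can help to list only the columns that actually contain missing values, along with their share of rows. The sketch below uses a small stand-in frame (`df_demo` and its values are hypothetical, not read from the placement CSV) to show one way of doing that:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in frame; only the column names echo the dataset.
df_demo = pd.DataFrame({
    "gender": ["M", "F", "M", "M"],
    "status": ["Placed", "Placed", "Not Placed", "Placed"],
    "salary": [270000.0, 200000.0, np.nan, 425000.0],
})

missing_counts = df_demo.isna().sum()   # number of NaN values per column
missing_share = df_demo.isna().mean()   # fraction of rows that are NaN

# Keep only the columns that have at least one missing value.
summary = pd.DataFrame({"missing": missing_counts, "share": missing_share})
summary = summary[summary["missing"] > 0]
print(summary)
```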


A practical guide to **when to use `mean`, `median`, or `mode`** to fill missing data (`NaN`) in a column such as `salary`:

---

## ✅ **1. Mean (Average)**

### ➤ **Use Mean When:**

* Data is **numeric and continuous** (e.g., salary, age).
* Values are **roughly symmetrically distributed** (not skewed).
* There are **no extreme outliers** (like a ₹10 lakh salary among ₹20k–₹50k ones).

```python
# Assign the result back; chained fillna(..., inplace=True) is deprecated.
df['salary'] = df['salary'].fillna(df['salary'].mean())
```

### ❌ **Avoid Mean When:**

* Data is **skewed** (e.g., a few very large values pull the average up).
* There are **many outliers**.
* Data is **not continuous** (e.g., categories).
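
A quick way to see whether the mean is a safe choice is to compare it with the median and look at the skewness. The sketch below uses made-up salary values (not taken from the placement dataset) with one large outlier:

```python
import pandas as pd

# Hypothetical salaries with a single extreme value.
salaries = pd.Series([20000, 25000, 30000, 35000, 50000, 1000000])

mean_value = salaries.mean()
median_value = salaries.median()
skewness = salaries.skew()  # sample skewness; > 0 means a long right tail

print(f"mean={mean_value:.0f}, median={median_value:.0f}, skew={skewness:.2f}")
```

Here the mean lands far above most of the values while the median stays among them, which is the situation where mean imputation is best avoided.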

---

## ✅ **2. Median (Middle Value)**

### ➤ **Use Median When:**

* Data is **numeric**.
* Data is **skewed or has outliers**.
* You want a **robust central value**.

```python
# Assign the result back; chained fillna(..., inplace=True) is deprecated.
df['salary'] = df['salary'].fillna(df['salary'].median())
```

### ❌ **Avoid Median When:**

* Data is **categorical** or **non-numeric**.
* You need the filled column to keep the **same mean** as the observed values (filling with the median shifts the mean when data is skewed).
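
The trade-off in the last point can be checked on a toy series. This is a minimal sketch with made-up numbers: filling with the mean leaves the column mean unchanged, while filling with the median moves it.

```python
import numpy as np
import pandas as pd

# Hypothetical values: three small numbers, one outlier, one missing entry.
s = pd.Series([10.0, 12.0, 14.0, 100.0, np.nan])

filled_mean = s.fillna(s.mean())      # observed mean is 34.0
filled_median = s.fillna(s.median())  # observed median is 13.0

print("mean after mean fill:  ", filled_mean.mean())
print("mean after median fill:", filled_median.mean())
```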

---

## ✅ **3. Mode (Most Frequent Value)**

### ➤ **Use Mode When:**

* Data is **categorical** (e.g., gender, city).
* You want to fill with the **most common value**.
* Data is numeric but **discrete with repeated values** (e.g., ratings or counts).

```python
# mode() returns a Series (there can be ties), so take its first entry.
df['city'] = df['city'].fillna(df['city'].mode()[0])
```

### ❌ **Avoid Mode When:**

* Data has **no clear most frequent value** (many unique values).
* You want to preserve **numeric distribution** (then use mean/median).
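
Two edge cases are worth guarding against when using the mode: ties (several values share the top frequency) and columns that are entirely missing (the mode is empty, so `mode()[0]` raises an error). The helper below, `fill_with_mode`, is a hypothetical name and only a sketch of one way to handle both:

```python
import numpy as np
import pandas as pd

def fill_with_mode(col: pd.Series) -> pd.Series:
    """Fill NaN with the most frequent value; leave the column as-is if none exists."""
    modes = col.mode(dropna=True)  # sorted Series; more than one entry on ties
    if modes.empty:                # every value is missing
        return col
    return col.fillna(modes.iloc[0])  # on ties, the first sorted value is used

# Hypothetical data: "Pune" and "Delhi" are tied at two occurrences each.
city = pd.Series(["Pune", "Delhi", "Pune", np.nan, "Delhi", np.nan])
filled_city = fill_with_mode(city)

all_missing = pd.Series([np.nan, np.nan])
still_missing = fill_with_mode(all_missing)
print(filled_city.tolist())
```

Picking the first tied value is an arbitrary choice; with many ties or mostly unique values, the mode carries little information.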

---

## 📊 Summary Table:

| Method     | Best For                    | Avoid When                                |
| ---------- | --------------------------- | ----------------------------------------- |
| **Mean**   | Numeric, no outliers        | Skewed or outlier-heavy data              |
| **Median** | Numeric, skewed data        | Non-numeric                               |
| **Mode**   | Categorical/frequent values | High uniqueness / numeric averages needed |
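
The table can be turned into a rough decision rule. The sketch below is one possible encoding, not a standard recipe: the function name `choose_fill_value` and the skewness cutoff of `1.0` are assumptions chosen for illustration.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def choose_fill_value(col: pd.Series, skew_threshold: float = 1.0):
    """Return (strategy, value) following the summary table above."""
    observed = col.dropna()
    if observed.empty:
        return "none", None                      # nothing to learn a fill value from
    if not is_numeric_dtype(col):
        return "mode", observed.mode().iloc[0]   # categorical data
    if abs(observed.skew()) > skew_threshold:    # assumed rule-of-thumb cutoff
        return "median", observed.median()       # skewed numeric data
    return "mean", observed.mean()               # roughly symmetric numeric data

# Hypothetical columns to exercise each branch.
symmetric = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, None])
skewed = pd.Series([1.0, 2.0, 3.0, 4.0, 1000.0, None])
labels = pd.Series(["a", "b", "a", None])

for name, col in [("symmetric", symmetric), ("skewed", skewed), ("labels", labels)]:
    print(name, choose_fill_value(col))
```

A rule like this is only a starting point; looking at a histogram of the column is usually a better basis for the choice.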

---



In [None]:
# Assign the result back; chained fillna(..., inplace=True) is deprecated.
df['salary'] = df['salary'].fillna(df['salary'].median())

In [None]:
df.head()

Unnamed: 0,sl_no,gender,ssc_p,ssc_b,hsc_p,hsc_b,hsc_s,degree_p,degree_t,workex,etest_p,specialisation,mba_p,status,salary
0,1,M,67.0,Others,91.0,Others,Commerce,58.0,Sci&Tech,No,55.0,Mkt&HR,58.8,Placed,
1,2,M,79.33,Central,78.33,Others,Science,77.48,Sci&Tech,Yes,86.5,Mkt&Fin,66.28,Placed,
2,3,M,65.0,Central,68.0,Central,Arts,64.0,Comm&Mgmt,No,75.0,Mkt&Fin,57.8,Placed,
3,4,M,56.0,Central,52.0,Central,Science,52.0,Sci&Tech,No,66.0,Mkt&HR,59.43,Not Placed,
4,5,M,85.8,Central,73.6,Central,Commerce,73.3,Comm&Mgmt,No,96.8,Mkt&Fin,55.5,Placed,


In [None]:
ds = pd.read_csv("/content/Placement_Dataset.csv")
ds.head()

Unnamed: 0,sl_no,gender,ssc_p,ssc_b,hsc_p,hsc_b,hsc_s,degree_p,degree_t,workex,etest_p,specialisation,mba_p,status,salary
0,1,M,67.0,Others,91.0,Others,Commerce,58.0,Sci&Tech,No,55.0,Mkt&HR,58.8,Placed,270000.0
1,2,M,79.33,Central,78.33,Others,Science,77.48,Sci&Tech,Yes,86.5,Mkt&Fin,66.28,Placed,200000.0
2,3,M,65.0,Central,68.0,Central,Arts,64.0,Comm&Mgmt,No,75.0,Mkt&Fin,57.8,Placed,250000.0
3,4,M,56.0,Central,52.0,Central,Science,52.0,Sci&Tech,No,66.0,Mkt&HR,59.43,Not Placed,
4,5,M,85.8,Central,73.6,Central,Commerce,73.3,Comm&Mgmt,No,96.8,Mkt&Fin,55.5,Placed,425000.0


In [None]:
ds['salary'].fillna(ds['salary'].median(), inplace=True)
ds.head()

The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  ds['salary'].fillna(ds['salary'].median(), inplace=True)


Unnamed: 0,sl_no,gender,ssc_p,ssc_b,hsc_p,hsc_b,hsc_s,degree_p,degree_t,workex,etest_p,specialisation,mba_p,status,salary
0,1,M,67.0,Others,91.0,Others,Commerce,58.0,Sci&Tech,No,55.0,Mkt&HR,58.8,Placed,270000.0
1,2,M,79.33,Central,78.33,Others,Science,77.48,Sci&Tech,Yes,86.5,Mkt&Fin,66.28,Placed,200000.0
2,3,M,65.0,Central,68.0,Central,Arts,64.0,Comm&Mgmt,No,75.0,Mkt&Fin,57.8,Placed,250000.0
3,4,M,56.0,Central,52.0,Central,Science,52.0,Sci&Tech,No,66.0,Mkt&HR,59.43,Not Placed,265000.0
4,5,M,85.8,Central,73.6,Central,Commerce,73.3,Comm&Mgmt,No,96.8,Mkt&Fin,55.5,Placed,425000.0
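
The pandas warning above names two alternatives to calling `fillna(..., inplace=True)` on a selected column. The sketch below shows both on a small stand-in frame (`raw` and its values are hypothetical; only the column names `status` and `salary` come from the dataset):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the placement data.
raw = pd.DataFrame({
    "status": ["Placed", "Placed", "Not Placed", "Placed"],
    "salary": [270000.0, 200000.0, np.nan, 250000.0],
})
median_salary = raw["salary"].median()

# Option 1: assign the filled column back to the frame.
ds_a = raw.copy()
ds_a["salary"] = ds_a["salary"].fillna(median_salary)

# Option 2: call fillna on the frame with a {column: value} mapping.
ds_b = raw.fillna({"salary": median_salary})

print(ds_a)
```

Both forms avoid modifying an intermediate copy, so they should keep working once the pandas 3.0 behavior described in the warning takes effect.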


In [None]:
ds1 = pd.read_csv("/content/Placement_Dataset.csv")
ds1.head()

Unnamed: 0,sl_no,gender,ssc_p,ssc_b,hsc_p,hsc_b,hsc_s,degree_p,degree_t,workex,etest_p,specialisation,mba_p,status,salary
0,1,M,67.0,Others,91.0,Others,Commerce,58.0,Sci&Tech,No,55.0,Mkt&HR,58.8,Placed,270000.0
1,2,M,79.33,Central,78.33,Others,Science,77.48,Sci&Tech,Yes,86.5,Mkt&Fin,66.28,Placed,200000.0
2,3,M,65.0,Central,68.0,Central,Arts,64.0,Comm&Mgmt,No,75.0,Mkt&Fin,57.8,Placed,250000.0
3,4,M,56.0,Central,52.0,Central,Science,52.0,Sci&Tech,No,66.0,Mkt&HR,59.43,Not Placed,
4,5,M,85.8,Central,73.6,Central,Commerce,73.3,Comm&Mgmt,No,96.8,Mkt&Fin,55.5,Placed,425000.0


In [None]:
ds1['salary'].fillna(ds1['salary'].median(), inplace=True)
ds1.head()

The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  ds1['salary'].fillna(ds1['salary'].median(), inplace=True)


Unnamed: 0,sl_no,gender,ssc_p,ssc_b,hsc_p,hsc_b,hsc_s,degree_p,degree_t,workex,etest_p,specialisation,mba_p,status,salary
0,1,M,67.0,Others,91.0,Others,Commerce,58.0,Sci&Tech,No,55.0,Mkt&HR,58.8,Placed,270000.0
1,2,M,79.33,Central,78.33,Others,Science,77.48,Sci&Tech,Yes,86.5,Mkt&Fin,66.28,Placed,200000.0
2,3,M,65.0,Central,68.0,Central,Arts,64.0,Comm&Mgmt,No,75.0,Mkt&Fin,57.8,Placed,250000.0
3,4,M,56.0,Central,52.0,Central,Science,52.0,Sci&Tech,No,66.0,Mkt&HR,59.43,Not Placed,265000.0
4,5,M,85.8,Central,73.6,Central,Commerce,73.3,Comm&Mgmt,No,96.8,Mkt&Fin,55.5,Placed,425000.0


In [None]:
from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


In [None]:
!mkdir -p /content/drive/MyDrive/ml-daily-notebooks/Day01_HandlingMissingValues


In [None]:
!cp "/content/drive/MyDrive/Colab Notebooks/4.3 Handling Missing Values.ipynb" "/content/drive/MyDrive/ml-daily-notebooks/Day01_HandlingMissingValues/notebook.ipynb"


cp: cannot stat '/content/drive/MyDrive/Colab Notebooks/4.3 Handling Missing Values.ipynb': No such file or directory
