# 🧮 What Is Interpolation?

## 📘 Definition

**Interpolation** is a mathematical technique for **estimating missing values**  
from the **known data points** on either side of them.

In simple words:  
> "Interpolation is a way to work out a missing value that lies between two known points."

---

## 💡 Example

| Index (X) | Y |
|:----------:|:----------:|
| 10 | 100 |
| 11 | ? |
| 12 | 300 |

Here the value at index 11 is missing.  
Interpolation says:  
👉 "Draw a straight line between 100 and 300,  
and the value at index 11 lands on that line, exactly halfway: **200**."

So mathematically:

Y = Y1 + ((X - X1) / (X2 - X1)) × (Y2 - Y1)

Y = 100 + ((11 - 10) / (12 - 10)) × (300 - 100)  
✅ Y = 200  
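
The formula above can be sketched as a small Python function. The name `linear_interpolate` and its argument order are just illustrative choices, not part of any library:

```python
def linear_interpolate(x, x1, y1, x2, y2):
    """Estimate y at x on the straight line through (x1, y1) and (x2, y2)."""
    if x1 == x2:
        raise ValueError("x1 and x2 must be different")
    return y1 + ((x - x1) / (x2 - x1)) * (y2 - y1)


# Same numbers as the table above: index 11 lies halfway between 10 and 12
y_11 = linear_interpolate(11, 10, 100, 12, 300)
print(y_11)  # 200.0
```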

---

## 🤔 Why Not Just Use Average?

Now the question comes up:  
> "If we can fill the gaps with the average (mean), why bother with interpolation?"

Good question!  
Let's compare 👇  

| Method | Description | Effect on the Data |
|:-------|:-------------|:--------------|
| **Mean/Median Imputation** | Fills missing values with the column's mean or median | Every missing value gets the same number, so the trend or pattern in the data is **lost**. |
| **Linear Interpolation** | Fills each missing value from the nearest known points on both sides, along a straight line | The data's **natural trend** is preserved, with smooth transitions between points. |

---

## 📈 Picture It

- Suppose your stock prices are:  
  100, **?**, 300, **?**, 500  

  If you fill the gaps with the average (here 300),  
  the pattern breaks: every gap gets the same value (300).

  But if you interpolate,  
  it treats 100 → 300 → 500 as points on a line  
  and fills the gaps with **realistic** values, 200 and 400 😎  
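
To see the difference in code, here is a minimal sketch using the same five prices. The variable names (`prices`, `mean_filled`, `interpolated`) are made up for this illustration:

```python
import numpy as np
import pandas as pd

# Same toy series as above, with the two gaps marked as NaN
prices = pd.Series([100, np.nan, 300, np.nan, 500])

# Mean imputation: every gap gets the mean of the known values (300)
mean_filled = prices.fillna(prices.mean())

# Linear interpolation: each gap is filled from its two neighbors
interpolated = prices.interpolate(method="linear")

print(mean_filled.tolist())   # [100.0, 300.0, 300.0, 300.0, 500.0]
print(interpolated.tolist())  # [100.0, 200.0, 300.0, 400.0, 500.0]
```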

---

## 🧠 When to Use What

| Situation | Best Method |
|:-----------|:------------|
| Data is continuous (e.g., temperature, price, sensor readings) | ✅ **Interpolation** |
| Data is categorical (e.g., gender, city) | 🚫 Interpolation not possible |
| Few missing values and random distribution | Mean/Median may work |
| Sequential or time-based data | 🔥 **Interpolation is best** |

---

## 🐼 Pandas Shortcut

```python
df['column_name'] = df['column_name'].interpolate(method='linear')
```

```python
import pandas as pd

df = pd.read_csv("synthetic_dataset.csv")

# Count the missing values in each column
print(df.isnull().sum())
```

```
Category    2748
Price        174
Rating      2050
Stock       1352
Discount     392
dtype: int64
```

# Linear Interpolation Explained 

## Formula

Y = Y1 + ((X - X1) / (X2 - X1)) * (Y2 - Y1)

---

## Example

| Index (X) | Price (Y) |
|:----------:|:----------:|
| 16 | 9319.0 |
| 17 | ? |
| 18 | 2066.0 |

**Given:**
- X1 = 16  
- Y1 = 9319.0  
- X2 = 18  
- Y2 = 2066.0  
- X  = 17  (where the value is missing)

---

### Step-by-Step Calculation

Y = 9319.0 + ((17 - 16) / (18 - 16)) * (2066.0 - 9319.0)  
Y = 9319.0 + (0.5) * (-7253.0)  
Y = 9319.0 - 3626.5  
✅ **Y = 5692.5**

---

## Final Table

| Index | Price |
|:------:|:------:|
| 16 | 9319.0 |
| 17 | 5692.5 👈 (Interpolated Value) |
| 18 | 2066.0 |
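
As a quick sanity check, the same three rows can be rebuilt as a small pandas Series. This is only a sketch: `price` and `filled` are hypothetical names, and the Series stands in for the real `Price` column:

```python
import numpy as np
import pandas as pd

# Rows 16 to 18 from the example, with the missing price as NaN
price = pd.Series([9319.0, np.nan, 2066.0], index=[16, 17, 18])

# Linear interpolation fills index 17 with the midpoint of its neighbors
filled = price.interpolate(method="linear")
print(filled.loc[17])  # 5692.5
```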

---

## 🐼 Using Pandas Interpolation in Python

> ⚠️ Interpolation only applies to **numeric columns** (int, float).

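```python
# Syntax: interpolate() returns a new DataFrame, so assign the result back.
# Do not pass inplace=True here: with inplace=True the call returns None,
# and assigning None would wipe out the columns.
df[['column_1', 'column_2', 'column_3']] = df[['column_1', 'column_2', 'column_3']].interpolate(
    method='linear',
    axis=0,  # fill down each column (the default); axis=1 would fill across each row
)
```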


```python
# Assigning the result back modifies the Price column of the original DataFrame
df[['Price']] = df[['Price']].interpolate(method='linear')
print(df.loc[17, 'Price'])  # Price at index 17 was missing before interpolation

# Rating is interpolated into a new DataFrame, so df itself is left unchanged
new_df = df[['Rating']].interpolate(method="linear")
print(new_df[['Rating']].head())  # Rating at indices 2 and 4 has been interpolated
print(df.head())  # Price is filled in df, but Rating still has NaN because only new_df was interpolated
```

```
5692.5
     Rating
0  1.870322
1  4.757798
2  3.124941
3  1.492085
4  2.217504
  Category   Price    Rating         Stock  Discount
0      NaN  5548.0  1.870322           NaN       0.0
1      NaN  3045.0  4.757798           NaN      38.0
2      NaN  4004.0       NaN      In Stock       0.0
3      NaN  4808.0  1.492085           NaN      33.0
4      NaN  1817.0       NaN  Out of Stock      23.0
```

# 🧮 Linear Interpolation When Multiple Values Are Missing 

## 📊 Example Data

| Index (X) | Y |
|:----------:|:----------:|
| 16 | 9319 |
| 17 | NaN |
| 18 | NaN |
| 19 | NaN |
| 20 | 2066 |

---

## 💡 Concept

When more than one consecutive value is missing, **linear interpolation** uses the **same straight-line equation** for every missing point.  
It assumes a **straight line** between the start (X1, Y1) and the end (X2, Y2), and computes the corresponding Y for each missing X.

The formula stays the same:

**Y = Y1 + ((X - X1) / (X2 - X1)) × (Y2 - Y1)**

Now `X` simply takes each missing point in turn (17, 18, 19).

---

## 🧩 Step-by-Step Interpolation

**Given:**
- X1 = 16  
- Y1 = 9319  
- X2 = 20  
- Y2 = 2066  

---

### 🔹 For X = 17:
Y = 9319 + ((17 - 16) / (20 - 16)) × (2066 - 9319)  
Y = 9319 + (1 / 4) × (-7253)  
Y = 9319 - 1813.25  
✅ **Y = 7505.75**

---

### 🔹 For X = 18:
Y = 9319 + ((18 - 16) / (20 - 16)) × (2066 - 9319)  
Y = 9319 + (2 / 4) × (-7253)  
Y = 9319 - 3626.5  
✅ **Y = 5692.5**

---

### 🔹 For X = 19:
Y = 9319 + ((19 - 16) / (20 - 16)) × (2066 - 9319)  
Y = 9319 + (3 / 4) × (-7253)  
Y = 9319 - 5439.75  
✅ **Y = 3879.25**

---

## ✅ Final Table After Interpolation

| Index (X) | Interpolated Y |
|:----------:|:--------------:|
| 16 | 9319.00 |
| 17 | 7505.75 |
| 18 | 5692.50 |
| 19 | 3879.25 |
| 20 | 2066.00 |
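
The three hand calculations can also be checked in one go with NumPy's `np.interp`, which evaluates the same straight line at several X values. This is a small verification sketch; the variable names are illustrative:

```python
import numpy as np

known_x = [16, 20]      # indices with known values
known_y = [9319, 2066]  # the known values at those indices
missing_x = [17, 18, 19]

# np.interp evaluates the line through the known points at each missing index
estimates = np.interp(missing_x, known_x, known_y)
print(estimates.tolist())  # [7505.75, 5692.5, 3879.25]
```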

---

## 🧠 Short Explanation

As you can see, linear interpolation assumes a **straight line** between the known points (16, 9319) and (20, 2066).  
It then fills each missing point (17, 18, 19) in **equal steps** along that line, here 1813.25 per index.

In other words, the values decrease perfectly evenly:  
9319 → 7505.75 → 5692.5 → 3879.25 → 2066 😎  

---

## 🐼 Pandas Tip

If you're working in Python, a single command does all of this:

```python
df['Y'] = df['Y'].interpolate(method='linear')
```
