# **Interpolation:**
- Interpolation fills missing values (NaN) in a DataFrame or Series by estimating them from the known data points around them.

**Common Use:**
- Filling gaps in a time series or numeric dataset when the missing values can reasonably be estimated from nearby values.

**How it works:**
- By default, pandas takes the known values before and after a gap and fills it along a straight line between them (linear interpolation), treating the rows as equally spaced.
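
To make the straight-line idea concrete, here is a minimal sketch that computes the estimates by hand and compares them with pandas. The series `s` and the helper names are made up for illustration and are not part of the employee dataset used later.

```python
import pandas as pd

# Hypothetical series with a two-value gap between 10 and 40
s = pd.Series([10.0, None, None, 40.0])

# Manual linear estimate: the gaps sit 1/3 and 2/3 of the way
# between the known neighbours on either side.
left, right = s.iloc[0], s.iloc[3]
manual = [left + (right - left) * k / 3 for k in (1, 2)]

filled = s.interpolate()  # method="linear" is the default

print(manual)
print(filled.tolist())
```

Both approaches should give 20 and 30 for the two missing positions.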
## **Usage:**
1. Time-series data
2. Numeric data with trends
3. Keeping rows that would otherwise be dropped because of missing values
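
As a small illustration of the third point, the sketch below compares `dropna()` with `interpolate()` on a made-up DataFrame (the `readings` name and its values are assumptions for this example only).

```python
import pandas as pd

# Hypothetical data: five rows, two of them incomplete
readings = pd.DataFrame({"value": [1.0, None, 3.0, None, 5.0]})

dropped = readings.dropna()      # discards the two incomplete rows
filled = readings.interpolate()  # keeps every row and estimates the gaps

print(len(dropped), len(filled))
print(filled)
```

Dropping leaves three rows, while interpolating keeps all five.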

In [1]:
import pandas as pd
df = pd.read_excel("files/employee.xlsx")
df

Unnamed: 0,Serial,Emp_ID,Designation,Department,Age,Salary
0,1,1101,Manager,Accounts,50.0,200000.0
1,2,1107,Officer,IT,30.0,80000.0
2,3,1203,Officer,HR,28.0,
3,4,1005,Manager,HR,45.0,120000.0
4,5,2123,Office Boy,Accounts,27.0,45000.0
5,6,2451,Accountant,,34.0,100000.0
6,7,1111,Accountant,Accounts,,110000.0
7,8,1001,Officer,IT,25.0,75000.0
8,9,1234,Manager,IT,23.0,
9,10,2156,Engineer,Production,45.0,89000.0


## **Types of interpolation (via the `method` parameter):**
### 1. **method='linear' (Default):**
- Fills missing values by connecting the neighbouring known values with a straight line. It assumes a linear trend and ignores the index, treating the rows as equally spaced.

In [2]:
df["Salary"] = df["Salary"].interpolate(method="linear")
df

Unnamed: 0,Serial,Emp_ID,Designation,Department,Age,Salary
0,1,1101,Manager,Accounts,50.0,200000.0
1,2,1107,Officer,IT,30.0,80000.0
2,3,1203,Officer,HR,28.0,100000.0
3,4,1005,Manager,HR,45.0,120000.0
4,5,2123,Office Boy,Accounts,27.0,45000.0
5,6,2451,Accountant,,34.0,100000.0
6,7,1111,Accountant,Accounts,,110000.0
7,8,1001,Officer,IT,25.0,75000.0
8,9,1234,Manager,IT,23.0,82000.0
9,10,2156,Engineer,Production,45.0,89000.0
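
The cell above fills only `Salary`, so the missing `Age` value stays NaN. One possible way to fill every numeric column at once is sketched below on a small made-up DataFrame (`staff` and its values are assumptions, not the contents of `employee.xlsx`). Note that linear interpolation depends on row order, so this is only meaningful when neighbouring rows are actually related.

```python
import pandas as pd

# Hypothetical stand-in for the employee table
staff = pd.DataFrame({
    "Department": ["IT", "HR", "IT", "HR"],
    "Age": [30.0, None, 40.0, 25.0],
    "Salary": [80000.0, 90000.0, None, 120000.0],
})

# Interpolate only the numeric columns; text columns are left untouched
numeric_cols = staff.select_dtypes(include="number").columns
staff[numeric_cols] = staff[numeric_cols].interpolate(method="linear")

print(staff)
```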


In [18]:
data_with_linear = pd.Series([None,20,None,45,None,None,89])
data_with_linear = data_with_linear.interpolate(method="linear")
data_with_linear  # the first row stays NaN: by default interpolation only fills forward, and there is no known value before it

0          NaN
1    20.000000
2    32.500000
3    45.000000
4    59.666667
5    74.333333
6    89.000000
dtype: float64
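
If the leading NaN should be filled as well, one option is the `limit_direction` parameter of `interpolate()`. The sketch below reuses the same values as the cell above. With `limit_direction="both"`, a leading gap has no earlier value to draw a line from, so pandas fills it with the first known value.

```python
import pandas as pd

s = pd.Series([None, 20, None, 45, None, None, 89])

# Default behaviour: only gaps after a known value are filled
forward_only = s.interpolate(method="linear")

# Fill in both directions; the leading NaN takes the first known value
both = s.interpolate(method="linear", limit_direction="both")

print(forward_only.iloc[0], both.iloc[0])
```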

### 2. **method='time'**
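- For time-series data with a DatetimeIndex. Values are weighted by the actual time elapsed between index entries rather than by row position. In the example below, 2024-07-03 lies exactly halfway (71 of 142 days) between its two neighbours, so the result happens to match plain linear interpolation.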

In [3]:
time_data = pd.DataFrame(
    {
        "Names": ["Zahid", "Ali", "Shaham", "Ahmed", "Nehaad"],
        "Marks": [65, 70, None, 86, 54],
    },
    index=pd.to_datetime(["2024-02-02", "2024-04-23", "2024-07-03", "2024-09-12", "2025-02-01"]),
)
time_data

Unnamed: 0,Names,Marks
2024-02-02,Zahid,65.0
2024-04-23,Ali,70.0
2024-07-03,Shaham,
2024-09-12,Ahmed,86.0
2025-02-01,Nehaad,54.0


In [4]:
time_data["Marks"] = time_data["Marks"].interpolate(method="time")
time_data

Unnamed: 0,Names,Marks
2024-02-02,Zahid,65.0
2024-04-23,Ali,70.0
2024-07-03,Shaham,78.0
2024-09-12,Ahmed,86.0
2025-02-01,Nehaad,54.0
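
To see where `method="time"` and `method="linear"` actually differ, here is a small sketch with deliberately uneven dates. The series and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical readings: the gap is 1 day after the first value
# and 9 days before the last one.
idx = pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-11"])
s = pd.Series([0.0, None, 100.0], index=idx)

by_position = s.interpolate(method="linear")  # ignores the dates
by_time = s.interpolate(method="time")        # weights by elapsed time

print(by_position.iloc[1], by_time.iloc[1])
```

Linear interpolation places the missing value halfway (50), while time-based interpolation places it one tenth of the way (10), because only 1 of the 10 days has elapsed.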


### 3. **method='index':**
- Interpolates based on the index values (numeric or datetime), not just position.

In [9]:
data_with_index = pd.Series([10,15,None,40], index=[0,1,5,6])
data_with_index

0    10.0
1    15.0
5     NaN
6    40.0
dtype: float64

In [10]:
data_with_index = data_with_index.interpolate(method="index")
data_with_index

0    10.0
1    15.0
5    35.0
6    40.0
dtype: float64
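
For comparison, the sketch below runs `method="linear"` and `method="index"` on the same series as above, to show how much the index spacing matters.

```python
import pandas as pd

s = pd.Series([10, 15, None, 40], index=[0, 1, 5, 6])

by_position = s.interpolate(method="linear")  # treats rows as equally spaced
by_index = s.interpolate(method="index")      # uses index values 1, 5, 6 as x

print(by_position.loc[5], by_index.loc[5])
```

By position the gap is the midpoint of 15 and 40 (27.5). By index it sits four fifths of the way from index 1 to index 6, which gives 35.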

### 4. **method='nearest'**
- Fills missing values using the nearest known value (either before or after). This method requires SciPy to be installed.

In [12]:
data_with_nearest = pd.Series([10,20,None,45,None,None,89])
data_with_nearest

0    10.0
1    20.0
2     NaN
3    45.0
4     NaN
5     NaN
6    89.0
dtype: float64

In [16]:
data_with_nearest = data_with_nearest.interpolate(method="nearest")
print(data_with_nearest)
# Index 2 is equally close to 20 and 45 and is filled with 20 here; index 4 takes 45 (its nearest neighbour) and index 5 takes 89.

0    10.0
1    20.0
2    20.0
3    45.0
4    45.0
5    89.0
6    89.0
dtype: float64

### 5. **method='polynomial'**
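- Fits a polynomial to the known values to estimate the missing ones. You must specify the `order`, and SciPy must be installed. With `order=1` the result matches linear interpolation, as the first example shows.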

In [24]:
data_with_polynomial = pd.Series([10,20,None,45,None,None,89])
data_with_polynomial = data_with_polynomial.interpolate(method="polynomial", order=1)
data_with_polynomial

0    10.000000
1    20.000000
2    32.500000
3    45.000000
4    59.666667
5    74.333333
6    89.000000
dtype: float64

In [26]:
# with order = 2
data_with_polynomial = pd.Series([10,20,None,45,None,None,89])
data_with_polynomial = data_with_polynomial.interpolate(method="polynomial", order=2)
data_with_polynomial

0    10.000000
1    20.000000
2    31.848485
3    45.000000
4    58.909091
5    73.575758
6    89.000000
dtype: float64

### 6. **method='spline'**
- Fills missing values using spline interpolation (a smooth curve fit). It also requires `order` and SciPy.

In [None]:
data_with_spline = pd.Series([10,20,None,45,None,None,89])
data_with_spline = data_with_spline.interpolate(method="spline", order=2)
data_with_spline

0    10.000000
1    20.000000
2    32.075758
3    45.000000
4    58.500000
5    73.257576
6    89.000000
dtype: float64

### 7. **method='pad' (Forward Fill / ffill) OR data.ffill()**
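- Carries the last known value forward to fill missing values. The example calls `ffill()` directly, because recent pandas versions (2.1 onward) deprecate passing `method='pad'` or `method='ffill'` to `interpolate()`.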

In [30]:
data_with_forward_fill = pd.Series([10,20,None,45,None,None,89])
data_with_forward_fill = data_with_forward_fill.ffill()
data_with_forward_fill

0    10.0
1    20.0
2    20.0
3    45.0
4    45.0
5    45.0
6    89.0
dtype: float64

### 8. **method='bfill' (Backward Fill) OR data.bfill()**
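- Fills missing values using the next known value (from below). The example calls `bfill()` directly, the form recommended since pandas 2.1 deprecated `interpolate(method='bfill')`.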

In [32]:
data_with_backward_fill = pd.Series([10,20,None,45,None,None,89])
data_with_backward_fill = data_with_backward_fill.bfill()
data_with_backward_fill

0    10.0
1    20.0
2    45.0
3    45.0
4    89.0
5    89.0
6    89.0
dtype: float64
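
Forward and backward fill can stretch a single value across a long gap. Both `ffill()` and `bfill()` accept a `limit` argument that caps how many consecutive NaNs are filled. A minimal sketch using the same values as above:

```python
import pandas as pd

s = pd.Series([10, 20, None, 45, None, None, 89])

# Fill at most one consecutive NaN in each direction
ff = s.ffill(limit=1)  # index 5 stays NaN
bf = s.bfill(limit=1)  # index 4 stays NaN

print(ff.tolist())
print(bf.tolist())
```

The remaining NaNs mark positions that are more than one step away from a known value in the fill direction.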