## `fillna()` method

In Pandas, the `fillna()` method replaces missing values (`NaN`, "Not a Number") in a DataFrame. Missing data is common in real datasets, and `fillna()` lets you handle it by substituting a constant, a different value per column, or a neighboring value propagated forward or backward. Here is an overview of its parameters and common usage patterns:

**Syntax**:
```python
DataFrame.fillna(value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)
```

**Parameters**:
- `value`: The value to use for filling missing data: a scalar, a dict, a Series, or a DataFrame. A scalar fills every missing value in the DataFrame. A dict or Series maps column labels to fill values, so each column can get its own value; columns that are not listed are not filled.
- `method`: The method for filling missing values by propagating existing values, used instead of `value` (deprecated since pandas 2.1 in favor of the equivalent `DataFrame.ffill()` and `DataFrame.bfill()` methods). Options are:
  - `'pad'` or `'ffill'`: Forward fill, which fills each missing value with the last non-missing value before it along the chosen axis.
  - `'backfill'` or `'bfill'`: Backward fill, which fills each missing value with the next non-missing value after it along the chosen axis.
- `axis`: The axis along which missing values are filled when `method` is used. `axis=0` (or `'index'`, the default) fills down each column using values from other rows; `axis=1` (or `'columns'`) fills across each row using values from other columns.
- `inplace`: If True, the DataFrame is modified in place and the method returns `None`. If False (default), the method returns a new DataFrame with the missing values filled.
- `limit`: If `method` is given, the maximum number of consecutive missing values to forward or backward fill; a longer gap is only partially filled. If `method` is not given, the maximum number of missing values to fill along the axis (that is, per column with the default `axis=0`). Must be greater than 0 if given.
- `downcast`: A dict mapping column labels to data types, or the string `'infer'`, used to downcast the filled result to a smaller type where possible (for example, `float64` to `int64`). This parameter is deprecated in pandas 2.x.
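A minimal sketch of how `axis` changes the direction of filling. It calls `DataFrame.ffill()` rather than `fillna(method='ffill')`, since the `method` parameter is deprecated in newer pandas; `ffill()` takes the same `axis` and `limit` arguments. The DataFrame is the one used in the examples below.

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [5, None, None, 8],
    'C': [9, 10, None, None],
})

# axis=0 (default): values move down each column, from earlier rows to later rows
down_columns = df.ffill(axis=0)

# axis=1: values move across each row, from left columns to right columns
across_rows = df.ffill(axis=1)

print(down_columns)
print(across_rows)
```

Row 1's gap in column 'B' is filled with 5.0 (from row 0) when filling down columns, but with 2.0 (from column 'A') when filling across rows.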

**Examples**:
Assume we have the following DataFrame with missing values (NaN):

```python
import pandas as pd

data = {
    'A': [1, 2, None, 4],
    'B': [5, None, None, 8],
    'C': [9, 10, None, None]
}

df = pd.DataFrame(data)
```

```
     A    B     C
0  1.0  5.0   9.0
1  2.0  NaN  10.0
2  NaN  NaN   NaN
3  4.0  8.0   NaN
```

**1. Fill Missing Values with a Specific Value**
```python
# Fill all missing values with a specific value (e.g., 0)
df_filled = df.fillna(0)
```

Output:
```
     A    B     C
0  1.0  5.0   9.0
1  2.0  0.0  10.0
2  0.0  0.0   0.0
3  4.0  8.0   0.0
```
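The output still shows `1.0`, `2.0`, and so on because a numeric column containing `None` is stored as `float64`, and filling the gaps does not change the dtype. A short sketch, assuming integer columns are wanted after filling, converts them explicitly with `astype()`:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [5, None, None, 8],
    'C': [9, 10, None, None],
})

# None in a numeric column is stored as NaN, which forces a float64 dtype
print(df.dtypes)

df_filled = df.fillna(0)

# Filling does not change the dtype: the columns are still float64
print(df_filled.dtypes)

# Convert explicitly once no NaN is left, if integer columns are wanted
df_int = df_filled.astype('int64')
print(df_int)
```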

**2. Forward Fill (Pad) Missing Values Along Columns**
```python
# Forward fill missing values along columns
df_filled = df.fillna(method='ffill', axis=0)
```

Output:
```
     A    B     C
0  1.0  5.0   9.0
1  2.0  5.0  10.0
2  2.0  5.0  10.0
3  4.0  8.0  10.0
```

**3. Backward Fill (Backfill) Missing Values Along Rows**
```python
# Backward fill missing values along rows
df_filled = df.fillna(method='bfill', axis=1)
```

Output:
```
     A     B     C
0  1.0   5.0   9.0
1  2.0  10.0  10.0
2  NaN   NaN   NaN
3  4.0   8.0   NaN
```

**4. Fill Missing Values with Column-Specific Values**
```python
# Fill missing values with specific values for each column
values_to_fill = {'A': 100, 'B': 200, 'C': 300}
df_filled = df.fillna(value=values_to_fill)
```

Output:
```
       A      B      C
0    1.0    5.0    9.0
1    2.0  200.0   10.0
2  100.0  200.0  300.0
3    4.0    8.0  300.0
```
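The dictionary does not have to cover every column. In this sketch (same DataFrame, with a hypothetical partial mapping), only 'A' and 'C' are listed, so the gaps in 'B' are left as `NaN`:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [5, None, None, 8],
    'C': [9, 10, None, None],
})

# Only 'A' and 'C' get fill values; 'B' is not listed and keeps its NaNs
df_partial = df.fillna(value={'A': 100, 'C': 300})
print(df_partial)
```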

**5. Fill at Most 1 Consecutive Missing Value**
```python
# Fill at most 1 consecutive missing value
df_filled = df.fillna(method='ffill', limit=1)
```

Output:
```
     A    B     C
0  1.0  5.0   9.0
1  2.0  5.0  10.0
2  2.0  NaN  10.0
3  4.0  8.0   NaN
```


Let's look more closely at backward fill along rows (`axis=1`), using a DataFrame whose missing values are concentrated in the later rows:

```python
import pandas as pd

data = {
    'A': [1, 2, None, None],
    'B': [5, None, None, None],
    'C': [9, None, None, None]
}

df = pd.DataFrame(data)

# Backward fill missing values along rows
df_filled = df.fillna(method='bfill', axis=1)
```

Output:
```
     A    B    C
0  1.0  5.0  9.0
1  2.0  NaN  NaN
2  NaN  NaN  NaN
3  NaN  NaN  NaN
```

Explanation:
1. We start by creating a DataFrame (`df`) in which several cells in columns 'A', 'B', and 'C' are missing. They are written as `None` in the dictionary, but pandas stores them as `NaN` because the columns are numeric.

2. Next, we use the `fillna()` method with `method='bfill'` to perform a backward fill along rows (`axis=1`). Each missing value is filled with the nearest non-missing value to its right in the same row.

3. Let's go through the filling process row by row:

   - Row 0: There are no missing values, so nothing is filled.
   - Row 1: Columns 'B' and 'C' are missing, and there is no non-missing value to their right in this row, so both stay `NaN`. The 5.0 above column 'B' belongs to row 0 and is not used, because `axis=1` only looks within the row.
   - Row 2: All three cells are missing, so there is nothing to fill from and the row stays `NaN`.
   - Row 3: Same as row 2; every cell stays `NaN`.

4. As a result, `df_filled` is identical to `df` here: no missing value in this DataFrame has a non-missing value to its right in the same row.

Keep in mind that backward fill (`bfill`) only copies values leftward from later cells in the same row. A missing cell can be filled, even in the first column, as long as some cell to its right holds a value. A missing cell in the last column, or one followed only by missing cells, can never be filled by `bfill`. Forward fill (`ffill`) has the mirror-image limitation for missing cells at the start of a row.
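A minimal sketch contrasting `bfill` and `ffill` along rows, on a small hypothetical two-row DataFrame with one leading gap and one trailing gap. It uses the `DataFrame.bfill()` and `DataFrame.ffill()` methods, which behave like `fillna(method=...)` with the same `axis`:

```python
import pandas as pd

# Row 0 has a gap at the start, row 1 has a gap at the end
df = pd.DataFrame({
    'A': [None, 1.0],
    'B': [2.0, 3.0],
    'C': [4.0, None],
})

# bfill copies values leftward: row 0's leading gap is filled, row 1's trailing gap is not
back = df.bfill(axis=1)

# ffill copies values rightward: row 1's trailing gap is filled, row 0's leading gap is not
fwd = df.ffill(axis=1)

print(back)
print(fwd)
```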


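By contrast, `bfill` does fill a missing cell in the first column when a later cell in the same row holds a value. Here row 2 has a value in column 'B':

```python
data = {
    'A': [1, 2, None, None],
    'B': [5, None, 4, None],
    'C': [9, None, None, None]
}

df = pd.DataFrame(data)
```

```
     A    B    C
0  1.0  5.0  9.0
1  2.0  NaN  NaN
2  NaN  4.0  NaN
3  NaN  NaN  NaN
```

```python
# Backward fill missing values along rows
df_filled = df.fillna(method='bfill', axis=1)
```

Output:
```
     A    B    C
0  1.0  5.0  9.0
1  2.0  NaN  NaN
2  4.0  4.0  NaN
3  NaN  NaN  NaN
```

Adding values to the last row of columns 'A' and 'C' lets the gap in row 3, column 'B', be filled from column 'C':

```python
# Same data with values added in the last row of columns 'A' and 'C'
data = {
    'A': [1, 2, None, 3],
    'B': [5, None, 4, None],
    'C': [9, None, None, 7]
}

df = pd.DataFrame(data)
```

```
     A    B    C
0  1.0  5.0  9.0
1  2.0  NaN  NaN
2  NaN  4.0  NaN
3  3.0  NaN  7.0
```

```python
# Backward fill missing values along rows
df_filled = df.fillna(method='bfill', axis=1)
```

Output:
```
     A    B    C
0  1.0  5.0  9.0
1  2.0  NaN  NaN
2  4.0  4.0  NaN
3  3.0  7.0  7.0
```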


#### 4. Fill Missing Values with Column-Specific Values

The same dictionary of fill values, applied to a DataFrame (`df1`) whose gaps are mostly at the bottom of each column:

```python
data = {
    'A': [1, 2, None, None],
    'B': [5, None, None, None],
    'C': [9, None, None, None]
}

df1 = pd.DataFrame(data)
```

```
     A    B    C
0  1.0  5.0  9.0
1  2.0  NaN  NaN
2  NaN  NaN  NaN
3  NaN  NaN  NaN
```

```python
# Fill missing values with specific values for each column
values_to_fill = {'A': 100, 'B': 200, 'C': 300}
df_filled = df1.fillna(value=values_to_fill)
```

Output:
```
       A      B      C
0    1.0    5.0    9.0
1    2.0  200.0  300.0
2  100.0  200.0  300.0
3  100.0  200.0  300.0
```

#### 5. Fill at Most 1 Consecutive Missing Value

```python
# Fill at most 1 consecutive missing value
df_filled = df1.fillna(method='ffill', limit=1)
```

Output:
```
     A    B    C
0  1.0  5.0  9.0
1  2.0  5.0  9.0
2  2.0  NaN  NaN
3  NaN  NaN  NaN
```

With `limit=2`, the forward fill reaches one row further in each column:

```python
# Fill at most 2 consecutive missing values
df_filled = df1.fillna(method='ffill', limit=2)
```

Output:
```
     A    B    C
0  1.0  5.0  9.0
1  2.0  5.0  9.0
2  2.0  5.0  9.0
3  2.0  NaN  NaN
```
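`limit` also applies when filling with a plain value instead of a method. In that case it caps how many missing values are filled in each column (along the default `axis=0`), counting from the top, whether or not they are consecutive. A sketch on the original example DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [5, None, None, 8],
    'C': [9, 10, None, None],
})

# No method: at most one NaN per column is replaced with 0, the first one from the top
df_limited = df.fillna(0, limit=1)
print(df_limited)
```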


### More examples of using the `fillna()` method in Pandas

Assume we again start from the following DataFrame with missing values (NaN):

```python
import pandas as pd

data = {
    'A': [1, 2, None, 4],
    'B': [5, None, None, 8],
    'C': [9, 10, None, None]
}

df = pd.DataFrame(data)
```

```
     A    B     C
0  1.0  5.0   9.0
1  2.0  NaN  10.0
2  NaN  NaN   NaN
3  4.0  8.0   NaN
```

**6. Fill Missing Values with Column Means**
```python
# Fill missing values with column means
column_means = df.mean()
df_filled = df.fillna(value=column_means)
```

Output:
```
          A    B     C
0  1.000000  5.0   9.0
1  2.000000  6.5  10.0
2  2.333333  6.5   9.5
3  4.000000  8.0   9.5
```
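`df.mean()` returns a Series indexed by column label, and `fillna()` matches that index against the columns, just as it does with the dictionary in example 4. A short sketch that checks the numbers: `mean()` skips `NaN` by default, so column 'A' averages only 1.0, 2.0, and 4.0, giving 7/3:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [5, None, None, 8],
    'C': [9, 10, None, None],
})

# One mean per column, computed over the non-missing values only
column_means = df.mean()
print(column_means)

# Each column is filled with its own mean, matched by column label
df_filled = df.fillna(value=column_means)
print(df_filled)
```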

**7. Fill Missing Values with Column Medians**
```python
# Fill missing values with column medians
column_medians = df.median()
df_filled = df.fillna(value=column_medians)
```

Output:
```
     A    B     C
0  1.0  5.0   9.0
1  2.0  6.5  10.0
2  2.0  6.5   9.5
3  4.0  8.0   9.5
```
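Medians are a common choice when a column contains extreme values, because a single outlier pulls the mean much further than the median. A sketch with a hypothetical column `X` containing one outlier and one gap:

```python
import pandas as pd

# Hypothetical column with one extreme value (1000) and one gap
df_outlier = pd.DataFrame({'X': [1, 2, 3, 1000, None]})

# Mean of 1, 2, 3, 1000 is 251.5, pulled up by the outlier
mean_filled = df_outlier.fillna(df_outlier.mean())

# Median of 1, 2, 3, 1000 is 2.5, close to the typical values
median_filled = df_outlier.fillna(df_outlier.median())

print(mean_filled)
print(median_filled)
```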

**8. Fill Missing Values with Forward Fill and Backward Fill**
```python
# Use a combination of forward fill and backward fill
df_filled = df.fillna(method='ffill').fillna(method='bfill')
```

Output:
```
     A    B     C
0  1.0  5.0   9.0
1  2.0  5.0  10.0
2  2.0  5.0  10.0
3  4.0  8.0  10.0
```
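In the example above, the trailing backward fill has nothing left to do, because every gap already has a non-missing value above it. The combination matters when a column starts with missing values, which forward fill alone cannot reach. A sketch with a hypothetical DataFrame whose column 'A' begins with a gap, using the `ffill()` and `bfill()` methods:

```python
import pandas as pd

# Column 'A' begins with a gap; column 'B' ends with gaps
df = pd.DataFrame({
    'A': [None, 2, None, 4],
    'B': [5, None, None, None],
})

# Forward fill alone leaves the leading gap in 'A'
forward_only = df.ffill()

# A following backward fill covers the leading gap with the next value, 2.0
both = df.ffill().bfill()

print(forward_only)
print(both)
```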

**9. Fill Missing Values with Linear Interpolation**
```python
# Fill missing values using linear interpolation
df_filled = df.interpolate()
```

Output:
```
     A    B     C
0  1.0  5.0   9.0
1  2.0  6.0  10.0
2  3.0  7.0  10.0
3  4.0  8.0  10.0
```

**10. Modify DataFrame In Place**
```python
# Fill missing values in the original DataFrame
df.fillna(0, inplace=True)
```

Output:
```
     A    B     C
0  1.0  5.0   9.0
1  2.0  0.0  10.0
2  0.0  0.0   0.0
3  4.0  8.0   0.0
```
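With `inplace=True`, `fillna()` returns `None`, so assigning its result (for example `df = df.fillna(0, inplace=True)`) replaces the DataFrame with `None`, a common mistake. A minimal sketch of both patterns, using a hypothetical second DataFrame `df2` for the reassignment style:

```python
import pandas as pd

df = pd.DataFrame({
    'A': [1, 2, None, 4],
    'B': [5, None, None, 8],
    'C': [9, 10, None, None],
})

# In place: df itself is modified and the call returns None
result = df.fillna(0, inplace=True)
print(result)
print(df)

# Not in place: keep the returned copy by reassigning it
df2 = pd.DataFrame({'A': [1, None, 3]})
df2 = df2.fillna(0)
print(df2)
```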

These examples demonstrate different strategies for filling missing values in a DataFrame: constant and per-column values, forward and backward fill, and column statistics with `fillna()`, plus linear interpolation with the separate `interpolate()` method. Choose a strategy based on what the missing values mean in your data and how the filled values will be used in later analysis or modeling.


#### How linear interpolation filled the missing values

Explanation:
1. We start with the same DataFrame (`df`) as before. The missing entries are written as `None` in the dictionary, but pandas stores them as `NaN`.

2. Next, we call `interpolate()`, a separate DataFrame method rather than a `fillna()` option. With its defaults (`method='linear'`, `axis=0`), it estimates each missing value from the non-missing values around it in the same column, treating the rows as equally spaced.

3. The interpolation runs column by column, not row by row:

   - Column 'A' (`1, 2, NaN, 4`): the gap in row 2 lies between 2.0 (row 1) and 4.0 (row 3), so it is filled with the midpoint, 3.0.
   - Column 'B' (`5, NaN, NaN, 8`): the gaps in rows 1 and 2 lie between 5.0 (row 0) and 8.0 (row 3). The line through those points rises by (8.0 - 5.0) / 3 = 1.0 per row, giving 6.0 and 7.0.
   - Column 'C' (`9, 10, NaN, NaN`): there is no non-missing value after the gap, so there is nothing to interpolate toward. By default pandas fills these trailing values with the last valid value, 10.0, rather than extending the trend to 11.0 and 12.0.

4. As a result, `df_filled` contains no missing values: interior gaps hold linearly interpolated estimates, and the trailing gaps in column 'C' repeat the last known value.
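The arithmetic for columns 'B' and 'C' can be checked with a small sketch. It uses NumPy's `np.interp` (not part of the original example) to read values off the straight line through the known points; beyond the last known point, `np.interp` returns the last known value by default, which matches what happened in column 'C':

```python
import numpy as np
import pandas as pd

b = pd.Series([5, None, None, 8], dtype='float64')
c = pd.Series([9, 10, None, None], dtype='float64')

# pandas: default linear interpolation, positions treated as equally spaced
b_filled = b.interpolate()
c_filled = c.interpolate()

# By hand: line through (0, 5.0) and (3, 8.0), evaluated at positions 1 and 2
b_by_hand = np.interp([1, 2], [0, 3], [5.0, 8.0])

# By hand: positions 2 and 3 lie beyond the last known point (1, 10.0)
c_by_hand = np.interp([2, 3], [0, 1], [9.0, 10.0])

print(b_filled.tolist(), b_by_hand.tolist())
print(c_filled.tolist(), c_by_hand.tolist())
```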

Linear interpolation works well when values change roughly linearly between observations. It may not capture more complex relationships between data points; in such cases, other interpolation methods available through the `method` parameter of `interpolate()`, such as `'polynomial'` (which also needs an `order`) or `'cubic'`, might be more suitable. These methods rely on SciPy being installed.