# Handling Missing Data

# 1. Identifying Missing Data 
Before handling missing data, you need to identify it. You can use:

# isnull():
Returns a DataFrame of the same shape as the original, with True for missing values and False for non-missing values.

# notnull(): 
Returns the inverse of isnull(): True for non-missing values and False for missing values.

# info(): 
Gives a summary of the DataFrame, including the count of non-null entries.

# Example DataFrame
Let's consider a dataset that contains information about students, including their names, ages, scores, and grades. Some values are missing.

In [None]:
import pandas as pd
import numpy as np

# Example DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [23, np.nan, 22, 24, np.nan],
    'Score': [85, 90, np.nan, 75, 95],
    'Grade': ['A', 'A', 'B', np.nan, 'A']
}

df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)


In [None]:
# Identifying missing data
print("\nMissing Data (True for missing values):")
print(df.isnull())
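
A per-column count is often easier to read than the full Boolean mask. The sketch below rebuilds the same student DataFrame so that it runs on its own, then sums the mask returned by isnull(). The names missing_per_column and rows_with_missing are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [23, np.nan, 22, 24, np.nan],
    'Score': [85, 90, np.nan, 75, 95],
    'Grade': ['A', 'A', 'B', np.nan, 'A'],
})

# True counts as 1 when summed, so this is the number of missing values per column
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Select the rows that contain at least one missing value
rows_with_missing = df[df.isnull().any(axis=1)]
print(rows_with_missing)

# notnull() is the element-wise complement of isnull()
masks_are_complementary = (df.notnull() == ~df.isnull()).all().all()
print(masks_are_complementary)
```

info() reports the same information from the other direction: it lists the non-null count for each column.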

# 2. Dropping Missing Values
You can remove rows or columns with missing data using:

# dropna(): 
Removes missing values based on specific conditions:

# axis=0:
Drops rows with missing values.

# axis=1: 
Drops columns with missing values.

# thresh:
Requires a certain number of non-null values to keep the row or column.

In [None]:
# 2. Dropping Missing Values
# Drop rows with any missing values
df_dropped_rows = df.dropna()

# Drop columns with any missing values
df_dropped_columns = df.dropna(axis=1)

# Keep only rows with at least 2 non-null values
df_thresh = df.dropna(thresh=2)
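
To see how these options differ, it helps to compare what each call keeps. The following is a small sketch on the same student data, rebuilt here so the block is self-contained. It also uses the subset parameter of dropna(), which is not covered above and restricts the check to the listed columns. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [23, np.nan, 22, 24, np.nan],
    'Score': [85, 90, np.nan, 75, 95],
    'Grade': ['A', 'A', 'B', np.nan, 'A'],
})

# Rows with no missing values at all (only Alice's row is complete)
complete_rows = df.dropna()

# Columns with no missing values at all (only 'Name' is complete)
complete_columns = df.dropna(axis=1)

# Rows with at least 4 non-null values, i.e. fully populated rows here
at_least_four = df.dropna(thresh=4)

# Rows where 'Score' is present, regardless of the other columns
has_score = df.dropna(subset=['Score'])

print(len(complete_rows), list(complete_columns.columns),
      len(at_least_four), len(has_score))
```

Every row in this dataset has at least three non-null values, so thresh=2 keeps all five rows, while thresh=4 keeps only the complete one.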

# 3. Filling Missing Values
Instead of dropping missing values, you can fill them with appropriate values using:

# fillna():
Fills missing values with a value you specify, such as a constant or a per-column statistic (mean, median, or mode). Forward filling (ffill) and backward filling (bfill) instead propagate neighboring values into the gaps.

In [None]:
# Fill missing values with a constant
df_filled_constant = df.fillna(0)

# Fill numeric columns with their column mean (non-numeric columns are left unchanged)
df_filled_mean = df.fillna(df.mean(numeric_only=True))

# Forward fill (fillna(method='ffill') is deprecated since pandas 2.1)
df_ffill = df.ffill()

# Backward fill
df_bfill = df.bfill()
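
Different columns usually call for different fill values. As a minimal sketch, the block below passes a dictionary to fillna() so that each column gets its own statistic. The choice of median for Age, mean for Score, and mode for Grade is an assumption made for illustration, not a recommendation.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [23, np.nan, 22, 24, np.nan],
    'Score': [85, 90, np.nan, 75, 95],
    'Grade': ['A', 'A', 'B', np.nan, 'A'],
})

# One fill value per column; the statistic chosen for each column is illustrative
fill_values = {
    'Age': df['Age'].median(),       # median of the known ages
    'Score': df['Score'].mean(),     # mean of the known scores
    'Grade': df['Grade'].mode()[0],  # most frequent grade
}

df_filled = df.fillna(value=fill_values)
print(df_filled)
```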


# 1. Forward Fill (ffill)
Forward fill propagates the last valid (non-missing) value forward to the next missing value(s). In other words, it replaces each missing value with the previous non-null value in the same column.

Use case: Forward fill is helpful when you want to assume that the previous known value should continue until a new value is encountered (e.g., filling in daily stock prices by carrying the last known price forward if some dates are missing).

In [None]:
data = {'Temperature': [30, np.nan, np.nan, 28, 27, np.nan, 26]}
df = pd.DataFrame(data)

In [None]:
# Forward fill
df_ffill = df.ffill()
print(df_ffill)
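
Carrying a value forward across a long gap can be misleading. The sketch below repeats the temperature example and then uses the limit parameter of ffill() to cap how many consecutive missing values are filled. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Temperature': [30, np.nan, np.nan, 28, 27, np.nan, 26]})

# Unlimited forward fill: every gap takes the last known value
filled = df.ffill()
print(filled['Temperature'].tolist())
# [30.0, 30.0, 30.0, 28.0, 27.0, 27.0, 26.0]

# Fill at most one consecutive missing value; the second gap in a run stays NaN
limited = df.ffill(limit=1)
print(limited['Temperature'].tolist())
# [30.0, 30.0, nan, 28.0, 27.0, 27.0, 26.0]
```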

# 2. Back Fill (bfill)
Back fill fills missing values by propagating the next valid (non-missing) value backward. It replaces each missing value with the next non-null value in the same column.

Use case: Back fill is helpful when you want to assume that the next available value was already applicable during the period of missing values (e.g., filling missing inventory counts by using the next available inventory count as an estimate).

In [None]:
# Backward fill
df_bfill = df.bfill()
print(df_bfill)
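
Back fill cannot fill a gap at the end of a column, because there is no later value to copy, just as forward fill cannot fill a gap at the start. The sketch below uses a small made-up Series to show both cases, and one possible workaround of chaining the two fills.

```python
import numpy as np
import pandas as pd

# Made-up values with gaps at the start, in the middle, and at the end
s = pd.Series([np.nan, 1.0, np.nan, 3.0, np.nan])

forward = s.ffill()       # the leading NaN has no previous value, so it remains
backward = s.bfill()      # the trailing NaN has no next value, so it remains
both = s.ffill().bfill()  # forward fill first, then back fill what is left

print(forward.tolist())   # [nan, 1.0, 1.0, 3.0, 3.0]
print(backward.tolist())  # [1.0, 1.0, 3.0, 3.0, nan]
print(both.tolist())      # [1.0, 1.0, 1.0, 3.0, 3.0]
```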

# 4. Interpolating Missing Values
You can also use interpolation to estimate missing values based on existing data:

# interpolate(): 
Fills missing values using linear interpolation by default; other methods can be selected with the method parameter.

In [None]:
df_interpolated = df.interpolate()
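
With the default linear method, each gap is filled with evenly spaced values between its two known neighbors. The sketch below applies this to the temperature data, rebuilt here so that the block runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Temperature': [30, np.nan, np.nan, 28, 27, np.nan, 26]})

# Default method is linear: missing points are spaced evenly between known neighbors
df_interpolated = df.interpolate()
print(df_interpolated)

# The two gaps between 30 and 28 become 30 - 2/3 and 30 - 4/3;
# the single gap between 27 and 26 becomes their midpoint, 26.5
expected = [30, 30 - 2 / 3, 30 - 4 / 3, 28, 27, 26.5, 26]
print(np.allclose(df_interpolated['Temperature'], expected))
```

Unlike forward or back fill, interpolation produces values that never appeared in the original data, which suits quantities that change gradually.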

# 5. Replacing Missing Values
You can replace missing values with other values using:

# replace(): 
This method can be useful when you want to replace specific values, including NaN.

In [None]:
df_replaced = df.replace(np.nan, 0)  # Replace NaN with 0
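
replace() is also handy in the opposite direction: turning placeholder values into NaN so that the other tools in this notebook can see them. In the sketch below, the DataFrame raw and the placeholders -999 and 'N/A' are made up for illustration; real datasets use their own conventions.

```python
import numpy as np
import pandas as pd

# Made-up data where missing entries were recorded as placeholder values
raw = pd.DataFrame({
    'Score': [85, -999, 90],
    'Grade': ['A', 'N/A', 'B'],
})

# Nested dict: for each column, map its placeholder to NaN
cleaned = raw.replace({
    'Score': {-999: np.nan},
    'Grade': {'N/A': np.nan},
})

print(cleaned.isnull().sum())
```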

# 6. Custom Functions
You can define a custom function to fill missing values based on specific logic. For instance, if the score is missing, you might want to set it based on a rule.

In [None]:
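# df was reassigned to the temperature data earlier, so rebuild the student DataFrame
df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [23, np.nan, 22, 24, np.nan],
    'Score': [85, 90, np.nan, 75, 95],
    'Grade': ['A', 'A', 'B', np.nan, 'A']
})

def fill_custom_score(row):
    if pd.isnull(row['Score']):
        return 80  # Custom logic for filling missing scores
    return row['Score']

df['Score'] = df.apply(fill_custom_score, axis=1)
print("\nDataFrame after applying custom function to fill Score:")
print(df)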
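
A row-wise apply() works, but a rule that depends on another column can often be written without a Python-level loop. The sketch below fills a missing Score from the student's Grade. The mapping default_by_grade is a hypothetical rule invented for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva'],
    'Age': [23, np.nan, 22, 24, np.nan],
    'Score': [85, 90, np.nan, 75, 95],
    'Grade': ['A', 'A', 'B', np.nan, 'A'],
})

# Hypothetical rule: a default score for each grade
default_by_grade = {'A': 90, 'B': 80}

# map() builds a Series of defaults aligned with df's index;
# fillna() uses it only where Score is missing
df['Score'] = df['Score'].fillna(df['Grade'].map(default_by_grade))
print(df)
```

Only Charlie's score is missing, and his grade is B, so it is filled with 80. A row with both Score and Grade missing would stay NaN under this rule.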
