
In [4]:
import pandas as pd
import numpy as np

In [5]:
from google.colab import files
uploaded = files.upload()


Saving crime_and_incarceration.csv to crime_and_incarceration (1).csv


In [6]:
import io
# Read the uploaded bytes directly; a re-upload is saved as 'crime_and_incarceration (1).csv'
df = pd.read_csv(io.BytesIO(next(iter(uploaded.values()))))
df.head()

| | jurisdiction | includes_jails | year | prisoner_count | crime_reporting_change | crimes_estimated | state_population | violent_crime_total | murder_manslaughter | rape_legacy | rape_revised | robbery | agg_assault | property_crime_total | burglary | larceny | vehicle_theft | Unnamed: 17 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | FEDERAL | False | 2001 | 149852 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 1 | ALABAMA | False | 2001 | 24741 | False | False | 4468912.0 | 19582.0 | 379.0 | 1369.0 | NaN | 5584.0 | 12250.0 | 173253.0 | 40642.0 | 119992.0 | 12619.0 | NaN |
| 2 | ALASKA | True | 2001 | 4570 | False | False | 633630.0 | 3735.0 | 39.0 | 501.0 | NaN | 514.0 | 2681.0 | 23160.0 | 3847.0 | 16695.0 | 2618.0 | NaN |
| 3 | ARIZONA | False | 2001 | 27710 | False | False | 5306966.0 | 28675.0 | 400.0 | 1518.0 | NaN | 8868.0 | 17889.0 | 293874.0 | 54821.0 | 186850.0 | 52203.0 | NaN |
| 4 | ARKANSAS | False | 2001 | 11489 | False | False | 2694698.0 | 12190.0 | 148.0 | 892.0 | NaN | 2181.0 | 8969.0 | 99106.0 | 22196.0 | 69590.0 | 7320.0 | NaN |


# Checking for Missing Values

In [7]:
# Check missing values in each column
print("📊 Missing values count per column:")
missing_counts = df.isnull().sum()
missing_counts[missing_counts > 0]


📊 Missing values count per column:


| Column | Missing values |
|---|---|
| crime_reporting_change | 17 |
| crimes_estimated | 17 |
| state_population | 17 |
| violent_crime_total | 17 |
| murder_manslaughter | 17 |
| rape_legacy | 67 |
| rape_revised | 617 |
| robbery | 17 |
| agg_assault | 17 |
| property_crime_total | 17 |
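Raw counts are easier to judge as a share of all rows. A minimal sketch on a small made-up frame (the column names `a`, `b` and `empty` are hypothetical): `isnull().mean()` gives the fraction of missing values per column, and `dropna(axis=1, how="all")` removes columns that contain no data at all. In this dataset the header-less column that pandas names `Unnamed: 17` appears to be such a column, since none of the imputation methods compared at the end manage to fill it.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; 'empty' plays the role of an all-blank column like 'Unnamed: 17'
toy = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, np.nan, 5.0, 6.0],
    "empty": [np.nan] * 4,
})

# Share of missing values per column, as a percentage of rows
missing_pct = toy.isnull().mean() * 100
print(missing_pct[missing_pct > 0])

# Columns with no data at all carry no information and can be dropped up front
toy_trimmed = toy.dropna(axis=1, how="all")
print(list(toy_trimmed.columns))
```

Applied to `df`, the same `dropna(axis=1, how="all")` call would drop `Unnamed: 17` before any imputation, so it would no longer appear in every summary.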


# Handling Missing Values

**Mean Imputation: (Preserves each column's mean, but is sensitive to outliers)**

In [8]:
# Impute numeric columns with their column mean
df_mean_imputed = df.copy()
for col in df_mean_imputed.select_dtypes(include=[np.number]).columns:
    # Assign the result back; chained fillna(..., inplace=True) is deprecated
    df_mean_imputed[col] = df_mean_imputed[col].fillna(df_mean_imputed[col].mean())

print("✅ Mean imputation applied to numeric columns.")


✅ Mean imputation applied to numeric columns.
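The comparison at the end shows `crime_reporting_change` and `crimes_estimated` still missing 17 values after mean imputation. A likely reason: a True/False column that also contains NaN is stored with `object` dtype, so `select_dtypes(include=[np.number])` never selects it. The toy frame below is made up to mirror that situation, and it also shows the same imputation written without a loop, by passing a Series of column means to `fillna`.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: a numeric count, a True/False flag with a gap,
# and an all-empty column
toy = pd.DataFrame({
    "robbery": [100.0, np.nan, 300.0],
    "crimes_estimated": [False, np.nan, True],
    "empty": [np.nan, np.nan, np.nan],
})

# The flag column is stored as object because a bool column cannot hold NaN
print(toy.dtypes)

numeric_cols = toy.select_dtypes(include=[np.number]).columns
print(list(numeric_cols))

# Loop-free mean imputation: fillna with a Series fills each column by name
toy_mean = toy.copy()
toy_mean[numeric_cols] = toy_mean[numeric_cols].fillna(toy_mean[numeric_cols].mean())
print(toy_mean.isnull().sum())
```

The all-empty column also stays missing, because the mean of a column with no values is itself NaN.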


**Median Imputation: (For robustness against outliers)**

In [9]:
# Impute numeric columns with their column median
df_median_imputed = df.copy()
for col in df_median_imputed.select_dtypes(include=[np.number]).columns:
    # Assign the result back; chained fillna(..., inplace=True) is deprecated
    df_median_imputed[col] = df_median_imputed[col].fillna(df_median_imputed[col].median())

print("✅ Median imputation applied to numeric columns.")


✅ Median imputation applied to numeric columns.
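The dataset mixes jurisdictions of very different sizes (the preview shows a population of 633,630 for Alaska and 5,306,966 for Arizona), so raw counts are likely skewed. A small example with made-up numbers shows how a single extreme value pulls the mean far from typical values while the median stays put:

```python
import numpy as np
import pandas as pd

# Hypothetical counts with one extreme value
s = pd.Series([10.0, 12.0, 11.0, np.nan, 1000.0])

mean_fill = s.mean()      # (10 + 12 + 11 + 1000) / 4 = 258.25
median_fill = s.median()  # middle of 10, 11, 12, 1000 -> (11 + 12) / 2 = 11.5

print(s.fillna(mean_fill).tolist())
print(s.fillna(median_fill).tolist())
```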


**Mode Imputation: (Also works for categorical columns, unlike mean and median)**

In [10]:
# Impute all columns (including categorical) with their most frequent value
df_mode_imputed = df.copy()
for col in df_mode_imputed.columns:
    mode_val = df_mode_imputed[col].mode()
    # mode() is empty when a column has no non-missing values, so skip those
    if not mode_val.empty:
        df_mode_imputed[col] = df_mode_imputed[col].fillna(mode_val[0])

print("✅ Mode imputation applied to all columns.")


✅ Mode imputation applied to all columns.
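In the comparison, mode imputation is the only method that fills `crime_reporting_change` and `crimes_estimated`, yet `Unnamed: 17` stays fully missing. The toy series here are made up to show both effects: `mode()` skips NaN and counts the remaining values (so it works on True/False flags stored as `object`), and for a column with no values at all it returns an empty Series, which the loop above skips.

```python
import numpy as np
import pandas as pd

# Hypothetical flag column with a gap, and a column with no data
flag = pd.Series([False, False, True, np.nan], dtype=object)
empty = pd.Series([np.nan, np.nan, np.nan])

# mode() ignores NaN by default, so the most frequent flag value is False
print(flag.mode().tolist())

# For an all-NaN column there is nothing to count, so mode() is empty
print(empty.mode().empty)

flag_filled = flag.fillna(flag.mode()[0])
print(flag_filled.tolist())
```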

**Forward Fill & Backward Fill:**

In [11]:
# Forward fill (fills each missing value with the value from the previous row)
df_forward_filled = df.ffill()

# Backward fill (fills each missing value with the value from the next row)
df_backward_filled = df.bfill()

print("✅ Forward fill and backward fill methods applied.")


✅ Forward fill and backward fill methods applied.
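The preview suggests rows are ordered by year and then jurisdiction (FEDERAL, ALABAMA, ALASKA, ... for 2001), so a plain `ffill()` copies values from whichever jurisdiction happens to sit in the row above. A minimal sketch of a group-wise alternative, on made-up rows: sort by jurisdiction and year, then forward fill within each jurisdiction with `groupby(...).ffill()`.

```python
import numpy as np
import pandas as pd

# Hypothetical rows ordered by year, then jurisdiction
toy = pd.DataFrame({
    "jurisdiction": ["FEDERAL", "ALABAMA", "FEDERAL", "ALABAMA"],
    "year": [2001, 2001, 2002, 2002],
    "robbery": [np.nan, 5584.0, np.nan, np.nan],
})

# Plain forward fill: FEDERAL 2002 inherits ALABAMA's 2001 value from the row above
plain = toy.ffill()

# Group-wise forward fill: only earlier years of the same jurisdiction are used
ordered = toy.sort_values(["jurisdiction", "year"])
ordered["robbery"] = ordered.groupby("jurisdiction")["robbery"].ffill()
grouped = ordered.sort_index()  # restore the original row order

print(plain)
print(grouped)
```

The trade-off: group-wise filling leaves more gaps, since a jurisdiction with no figures at all (as FEDERAL appears to be in the preview) stays empty, but every filled value comes from the same jurisdiction.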


# Compare Missing Value Handling (Before vs After)

In [13]:
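def missing_summary(data, label):
    """Display the missing-value count of every column that still has gaps."""
    counts = data.isnull().sum()
    print(f"\n🔎 Missing summary for {label}:")
    display(counts[counts > 0])

# Compare the original data with each imputed copy
missing_summary(df, "Original Data")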
missing_summary(df_mean_imputed, "Mean Imputed")
missing_summary(df_median_imputed, "Median Imputed")
missing_summary(df_mode_imputed, "Mode Imputed")
missing_summary(df_forward_filled, "Forward Filled")
missing_summary(df_backward_filled, "Backward Filled")


🔎 Missing summary for Original Data:


| Column | Missing values |
|---|---|
| crime_reporting_change | 17 |
| crimes_estimated | 17 |
| state_population | 17 |
| violent_crime_total | 17 |
| murder_manslaughter | 17 |
| rape_legacy | 67 |
| rape_revised | 617 |
| robbery | 17 |
| agg_assault | 17 |
| property_crime_total | 17 |



🔎 Missing summary for Mean Imputed:


| Column | Missing values |
|---|---|
| crime_reporting_change | 17 |
| crimes_estimated | 17 |
| Unnamed: 17 | 816 |



🔎 Missing summary for Median Imputed:


| Column | Missing values |
|---|---|
| crime_reporting_change | 17 |
| crimes_estimated | 17 |
| Unnamed: 17 | 816 |



🔎 Missing summary for Mode Imputed:


| Column | Missing values |
|---|---|
| Unnamed: 17 | 816 |



🔎 Missing summary for Forward Filled:


| Column | Missing values |
|---|---|
| crime_reporting_change | 1 |
| crimes_estimated | 1 |
| state_population | 1 |
| violent_crime_total | 1 |
| murder_manslaughter | 1 |
| rape_legacy | 1 |
| rape_revised | 613 |
| robbery | 1 |
| agg_assault | 1 |
| property_crime_total | 1 |



🔎 Missing summary for Backward Filled:


| Column | Missing values |
|---|---|
| rape_legacy | 51 |
| Unnamed: 17 | 816 |
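The six summaries are easier to compare side by side. A minimal sketch, with a hypothetical helper `missing_table` and made-up stand-ins for the notebook's DataFrames: it builds one column of missing counts per strategy and keeps only the columns that have gaps under at least one strategy.

```python
import numpy as np
import pandas as pd

def missing_table(frames):
    """Return one column of missing-value counts per strategy, keeping only rows with gaps."""
    table = pd.DataFrame({label: frame.isnull().sum() for label, frame in frames.items()})
    return table[table.sum(axis=1) > 0]

# Hypothetical stand-ins for df and two of the imputed copies
original = pd.DataFrame({"a": [1.0, np.nan, 3.0], "b": [np.nan, np.nan, 6.0], "c": [1, 2, 3]})
frames = {
    "Original": original,
    "Mean": original.fillna(original.mean()),
    "Forward fill": original.ffill(),
}
summary = missing_table(frames)
print(summary)
```

With the notebook's own variables this would be called as `missing_table({"Original Data": df, "Mean Imputed": df_mean_imputed, "Forward Filled": df_forward_filled})`, adding the other imputed copies in the same way.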
