# Goal
This notebook analyses district-wise crimes committed in India, with the aim of predicting future crime hotspots and examining how crime depends on socioeconomic factors.

### Problem Statement
Prediction of crime, prognosis and patrol route map forecasting:


Predict crime hotspots by analysing historical crime data, socio-economic factors, and
environmental variables. Also generate dynamic patrol routes by considering predicted crime
hotspots, traffic conditions, and priority areas.

#### Data Loading and Joining
I will join the yearly datasets into one table per category to make the analysis easier.

In [48]:
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
import json
import requests

I need to check whether district_wise_crimes_committed (IPC) is the aggregate of crimes against SCs, STs and children, so I import these four datasets for 2001-2012 and compare them. If it is not, I will assume that the first dataset counts crimes against the general and OBC population.

In [2]:
dis_w_cr_co_1_12_IPC = pd.read_csv(r"AggregrateData/01_District_wise_crimes_committed_IPC_2001_2012.csv")
dis_w_cr_co_13_IPC = pd.read_csv(r"D:\Data Science\Intern\Code\AggregrateData\01_District_wise_crimes_committed_IPC_2013.csv")
dis_w_cr_co_14_IPC = pd.read_csv(r"D:\Data Science\Intern\Code\AggregrateData\01_District_wise_crimes_committed_IPC_2014.csv")
dis_w_cr_co_1_12_SC = pd.read_csv(r"AggregrateData/02_01_District_wise_crimes_committed_against_SC_2001_2012.csv")
dis_w_cr_co_13_SC = pd.read_csv(r"D:\Data Science\Intern\Code\AggregrateData\02_01_District_wise_crimes_committed_against_SC_2013.csv")
dis_w_cr_co_14_SC = pd.read_csv(r"D:\Data Science\Intern\Code\AggregrateData\02_01_District_wise_crimes_committed_against_SC_2014.csv")
dis_w_cr_co_1_12_ST = pd.read_csv(r"AggregrateData/02_District_wise_crimes_committed_against_ST_2001_2012.csv")
dis_w_cr_co_13_ST = pd.read_csv(r"D:\Data Science\Intern\Code\AggregrateData\02_District_wise_crimes_committed_against_ST_2013.csv")
dis_w_cr_co_14_ST = pd.read_csv(r"D:\Data Science\Intern\Code\AggregrateData\02_District_wise_crimes_committed_against_ST_2014.csv")
dis_w_cr_co_1_12_children = pd.read_csv(r"AggregrateData/03_District_wise_crimes_committed_against_children_2001_2012.csv")
dis_w_cr_co_13_children = pd.read_csv(r"D:\Data Science\Intern\Code\AggregrateData\03_District_wise_crimes_committed_against_children_2013.csv")

To settle this, I compare the murder counts for the year 2001 in Adilabad district, Andhra Pradesh (row 0 of each dataset).

In [3]:
# Compare row 0: IPC murders vs. the sum of SC, ST and children murders
parts = dis_w_cr_co_1_12_SC.at[0, "Murder"] + dis_w_cr_co_1_12_ST.at[0, "Murder"] + dis_w_cr_co_1_12_children.at[0, "Murder"]
if dis_w_cr_co_1_12_IPC.at[0, "MURDER"] == parts:
    print("It is the aggregate of the 3")
else:
    print("It is count of crimes against general population")

It is count of crimes against general population
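
The comparison above looks at a single row only. A broader check, sketched below on invented toy frames, joins the tables on state, district and year and compares every matched row. The helper name `is_sum_of_parts` and all values are hypothetical, and the sketch assumes the key columns are already spelled identically in every table, which may not hold for the real files.

```python
import pandas as pd

# Toy stand-ins for the real tables (values are invented for illustration).
ipc = pd.DataFrame({
    "STATE/UT": ["X", "X"],
    "DISTRICT": ["A", "B"],
    "YEAR": [2001, 2001],
    "MURDER": [10, 7],
})
sc = pd.DataFrame({
    "STATE/UT": ["X", "X"],
    "DISTRICT": ["A", "B"],
    "Year": [2001, 2001],
    "Murder": [1, 2],
})
st = sc.assign(Murder=[0, 1])
children = sc.assign(Murder=[2, 0])


def is_sum_of_parts(total, parts, col="Murder"):
    """Return True if total's MURDER equals the summed `col` of all parts on every matched row."""
    keys = ["STATE/UT", "DISTRICT", "Year"]
    merged = total.rename(columns={"YEAR": "Year", "MURDER": "total"})[keys + ["total"]]
    for i, part in enumerate(parts):
        part = part[keys + [col]].rename(columns={col: f"part_{i}"})
        merged = merged.merge(part, on=keys, how="inner")
    if merged.empty:
        # No rows could be matched, so the hypothesis cannot be confirmed.
        return False
    part_cols = [c for c in merged.columns if c.startswith("part_")]
    return bool((merged["total"] == merged[part_cols].sum(axis=1)).all())


print(is_sum_of_parts(ipc, [sc, st, children]))
```

On the toy data the IPC counts exceed the sum of the three parts, so the function returns `False`, mirroring the conclusion reached for row 0.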


Since the datasets have different numbers of columns, we will start by aligning the columns by crime type and keeping those common to all years.

In [4]:
# Common columns, mapped from the 2001-2013 names (keys) to the 2014 names (values)
column_mapping = {
    'STATE/UT': 'States/UTs',
    'DISTRICT': 'District',
    'YEAR': 'Year',
    'MURDER': 'Murder',
    'ATTEMPT TO MURDER': 'Attempt to commit Murder',
    'CULPABLE HOMICIDE NOT AMOUNTING TO MURDER': 'Culpable Homicide not amounting to Murder',
    'RAPE': 'Rape',
    'CUSTODIAL RAPE': 'Custodial Rape',
    'OTHER RAPE': 'Rape other than Custodial',
    'KIDNAPPING & ABDUCTION': 'Kidnapping & Abduction_Total',
    'DACOITY': 'Dacoity',
    'PREPARATION AND ASSEMBLY FOR DACOITY': 'Making Preparation and Assembly for committing Dacoity',
    'ROBBERY': 'Robbery',
    'BURGLARY': 'Criminal Trespass/Burglary',
    'THEFT': 'Theft',
    'AUTO THEFT': 'Auto Theft',
    'OTHER THEFT': 'Other Thefts',
    'RIOTS': 'Unlawful Assembly',
    'CRIMINAL BREACH OF TRUST': 'Criminal Breach of Trust',
    'CHEATING': 'Cheating',
    'COUNTERFIETING': 'Forgery',
    'ARSON': 'Arson',
    'HURT/GREVIOUS HURT': 'Grievous Hurt',
    'DOWRY DEATHS': 'Dowry Deaths',
    'ASSAULT ON WOMEN WITH INTENT TO OUTRAGE HER MODESTY': 'Assault on Women with intent to outrage her Modesty',
    'INSULT TO MODESTY OF WOMEN': 'Insult to the Modesty of Women',
    'CRUELTY BY HUSBAND OR HIS RELATIVES': 'Cruelty by Husband or his Relatives',
    'IMPORTATION OF GIRLS FROM FOREIGN COUNTRIES': 'Importation of Girls from Foreign Country',
    'CAUSING DEATH BY NEGLIGENCE': 'Causing Death by Negligence',
    'OTHER IPC CRIMES': 'Other IPC crimes',
    'TOTAL IPC CRIMES': 'Total Cognizable IPC crimes'
}

swapped_column_mapping = {v: k for k, v in column_mapping.items()}

# Rename the relevant columns in the 2014 IPC data to match the 2001-2013 names
dis_w_cr_co_14_IPC.rename(columns= swapped_column_mapping, inplace = True)
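
Inverting a dictionary with a comprehension silently drops entries when two keys share the same value, so it can be worth confirming that the mapping is one-to-one before relying on the swapped version. A minimal sketch, using a shortened hypothetical mapping rather than the full `column_mapping`; `invert_checked` is an illustrative name:

```python
from collections import Counter

# Shortened, hypothetical mapping for illustration only.
mapping = {
    "STATE/UT": "States/UTs",
    "DISTRICT": "District",
    "MURDER": "Murder",
}


def invert_checked(d):
    """Invert a dict, raising if any value occurs more than once."""
    duplicates = [v for v, n in Counter(d.values()).items() if n > 1]
    if duplicates:
        raise ValueError(f"values are not unique: {duplicates}")
    return {v: k for k, v in d.items()}


swapped = invert_checked(mapping)
print(swapped["Murder"])  # MURDER
```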

In [5]:
dis_w_cr_co_1_12_IPC.columns

Index(['STATE/UT', 'DISTRICT', 'YEAR', 'MURDER', 'ATTEMPT TO MURDER',
       'CULPABLE HOMICIDE NOT AMOUNTING TO MURDER', 'RAPE', 'CUSTODIAL RAPE',
       'OTHER RAPE', 'KIDNAPPING & ABDUCTION',
       'KIDNAPPING AND ABDUCTION OF WOMEN AND GIRLS',
       'KIDNAPPING AND ABDUCTION OF OTHERS', 'DACOITY',
       'PREPARATION AND ASSEMBLY FOR DACOITY', 'ROBBERY', 'BURGLARY', 'THEFT',
       'AUTO THEFT', 'OTHER THEFT', 'RIOTS', 'CRIMINAL BREACH OF TRUST',
       'CHEATING', 'COUNTERFIETING', 'ARSON', 'HURT/GREVIOUS HURT',
       'DOWRY DEATHS', 'ASSAULT ON WOMEN WITH INTENT TO OUTRAGE HER MODESTY',
       'INSULT TO MODESTY OF WOMEN', 'CRUELTY BY HUSBAND OR HIS RELATIVES',
       'IMPORTATION OF GIRLS FROM FOREIGN COUNTRIES',
       'CAUSING DEATH BY NEGLIGENCE', 'OTHER IPC CRIMES', 'TOTAL IPC CRIMES'],
      dtype='object')

In [6]:
# now keeping the common features in all the 3 datasets
columns_swapped_list = list(column_mapping.keys())
dis_w_cr_co_1_12_IPC = dis_w_cr_co_1_12_IPC[columns_swapped_list]
dis_w_cr_co_13_IPC = dis_w_cr_co_13_IPC[columns_swapped_list]
dis_w_cr_co_14_IPC = dis_w_cr_co_14_IPC[columns_swapped_list]

In [7]:
dis_w_cr_co_IPC = pd.concat([dis_w_cr_co_1_12_IPC, dis_w_cr_co_13_IPC, dis_w_cr_co_14_IPC], ignore_index=True)

In [8]:
dis_w_cr_co_1_12_SC.columns

Index(['STATE/UT', 'DISTRICT', 'Year', 'Murder', 'Rape',
       'Kidnapping and Abduction', 'Dacoity', 'Robbery', 'Arson', 'Hurt',
       'Prevention of atrocities (POA) Act',
       'Protection of Civil Rights (PCR) Act', 'Other Crimes Against SCs'],
      dtype='object')

In [9]:
dis_w_cr_co_13_SC.columns

Index(['STATE/UT', 'DISTRICT', 'Year', 'Murder', 'Rape',
       'Kidnapping and Abduction', 'Dacoity', 'Robbery', 'Arson', 'Hurt',
       'Protection of Civil Rights (PCR) Act',
       'Prevention of atrocities (POA) Act', 'Other Crimes Against SCs'],
      dtype='object')

In [10]:
dis_w_cr_co_14_SC.columns

Index(['States/UTs', 'District', 'Year',
       'Protection of Civil Rights Act, 1955', 'POA_Murder',
       'POA_Attempt to commit Murder', 'POA_Rape',
       'POA_Attempt to commit Rape',
       'POA_Assault on women with intent to outrage her Modesty',
       'POA_Sexual Harassment', 'POA_Assault on women with intent to Disrobe',
       'POA_Voyeurism', 'POA_Stalking', 'POA_Other Sexual Harassment',
       'POA_Insult to the Modesty of women',
       'POA_Kidnapping & Abduction_GrandTotal',
       'POA_Kidnaping & Abduction_Total',
       'POA_Kidnaping & Abduction in order to Murder',
       'POA_Kidnapping for Ransom',
       'POA_Kidnapping & Abduction of Women to compel her for marriage',
       'POA_Other Kidnapping', 'POA_Dacoity', 'POA_Dacoity with Murder',
       'POA_Other Dacoity', 'POA_Robbery', 'POA_Arson', 'POA_Grievous Hurt',
       'POA_Hurt', 'POA_Acid attack', 'POA_Attempt to Acid Attack',
       'POA_Riots', 'POA_Other IPC crimes',
       'POA_SC / ST (Prevention o

In [11]:
dis_w_cr_co_13_SC["Total crimes against SCs"] = dis_w_cr_co_13_SC.drop(columns = ['STATE/UT', 'DISTRICT', 'Year']).sum(axis = 1)
dis_w_cr_co_1_12_SC["Total crimes against SCs"] = dis_w_cr_co_1_12_SC.drop(columns = ['STATE/UT', 'DISTRICT', 'Year']).sum(axis = 1)

In [12]:
# SC column mapping
dis_w_cr_co_14_SC.rename(columns = {'States/UTs' : 'STATE/UT',
                                    'District' : 'DISTRICT',
                                    "Protection of Civil Rights Act, 1955":'Protection of Civil Rights (PCR) Act'},                                    
                                    inplace=True)

In [13]:
# Adding up features in the 2014 dataset accordingly for it to match the features of 2001 to 2013 datasets
feature_mapping = {
    'Murder': ['POA_Murder', 'IPC_Murder'],
    'Rape': ['POA_Rape', 'IPC_Rape'],
    'Kidnapping and Abduction': ['POA_Kidnapping & Abduction_GrandTotal', 'IPC_Kidnaping & Abduction'],
    'Dacoity': ['POA_Dacoity', 'IPC_Dacoity'],
    'Robbery': ['POA_Robbery', 'IPC_Robbery'],
    'Arson': ['POA_Arson', 'IPC_Arson'],
    'Hurt': ['POA_Grievous Hurt', 'IPC_Grievous Hurt'],
    'Prevention of atrocities (POA) Act': ['POA_Grievous Hurt', 'POA_Arson', 'POA_Robbery', 'POA_Dacoity', 'POA_Rape', 'POA_Murder', 'POA_Kidnapping & Abduction_GrandTotal','POA_Assault on women with intent to outrage her Modesty', 'POA_Insult to the Modesty of women', 'POA_Sexual Harassment', 'POA_Assault on women with intent to Disrobe', 'POA_Voyeurism', 'POA_Stalking', 'POA_Other Sexual Harassment'],
    'Other Crimes Against SCs': ['Other SLL Crime against SCs', 'Manual Scavengers and Construction of Dry Latrines (P) Act, 1993']
}

for features_name, related_features in feature_mapping.items():
    dis_w_cr_co_14_SC[features_name] = dis_w_cr_co_14_SC[related_features].sum(axis = 1)
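
The loop above builds each target column as the row-wise sum of a group of source columns. Selecting `df[cols]` raises a `KeyError` if any listed column is absent, so a guard that names the missing columns can help surface spelling mismatches early. This is a sketch on an invented toy frame, and `add_grouped_sums` is a hypothetical helper name.

```python
import pandas as pd

# Toy frame with invented numbers; column names mimic the 2014 layout.
df = pd.DataFrame({
    "POA_Murder": [1, 0],
    "IPC_Murder": [2, 3],
    "POA_Rape": [0, 1],
    "IPC_Rape": [1, 1],
})

groups = {
    "Murder": ["POA_Murder", "IPC_Murder"],
    "Rape": ["POA_Rape", "IPC_Rape"],
}


def add_grouped_sums(frame, mapping):
    """Return a copy of `frame` with one summed column per mapping entry."""
    out = frame.copy()
    for target, sources in mapping.items():
        missing = [c for c in sources if c not in frame.columns]
        if missing:
            raise KeyError(f"{target}: missing source columns {missing}")
        out[target] = frame[sources].sum(axis=1)
    return out


result = add_grouped_sums(df, groups)
print(result[["Murder", "Rape"]])
```

Unlike the in-place loop, this version only reads from the original columns, so a target column cannot accidentally feed into a later sum.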


In [14]:
important_features_SC = dis_w_cr_co_1_12_SC.columns
dis_w_cr_co_14_SC = dis_w_cr_co_14_SC[important_features_SC]
dis_w_cr_co_14_SC.head()

Unnamed: 0,STATE/UT,DISTRICT,Year,Murder,Rape,Kidnapping and Abduction,Dacoity,Robbery,Arson,Hurt,Prevention of atrocities (POA) Act,Protection of Civil Rights (PCR) Act,Other Crimes Against SCs,Total crimes against SCs
0,Andhra Pradesh,Anantapur,2014,3,1,0,0,0,0,0,14,0,0,170
1,Andhra Pradesh,Chittoor,2014,2,1,1,0,0,3,0,18,0,0,118
2,Andhra Pradesh,Cuddapah,2014,4,5,2,0,0,0,0,17,0,0,262
3,Andhra Pradesh,East Godavari,2014,0,4,0,0,0,0,25,74,6,0,178
4,Andhra Pradesh,Guntakal Railway,2014,0,0,0,0,0,0,0,0,0,0,0


In [15]:
dis_w_cr_co_SC_final = pd.concat([dis_w_cr_co_1_12_SC, dis_w_cr_co_13_SC, dis_w_cr_co_14_SC], ignore_index=True)
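
The 2001-2012 and 2013 SC tables list the PCR and POA columns in a different order. `pd.concat` aligns columns by name rather than by position, so the difference in order is harmless, as this toy example with invented values illustrates:

```python
import pandas as pd

# Two toy frames with the same columns in a different order (values invented).
a = pd.DataFrame({"Year": [2012], "POA": [5], "PCR": [1]})
b = pd.DataFrame({"Year": [2013], "PCR": [2], "POA": [7]})

combined = pd.concat([a, b], ignore_index=True)
print(combined)
```

Columns that exist in only one frame would be filled with `NaN` in the other rows, which is why the column names are harmonised before concatenating.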

Similarly for district wise crimes against STs

In [16]:
dis_w_cr_co_13_ST.columns

Index(['STATE/UT', 'DISTRICT', 'Year', 'Murder', 'Rape',
       'Kidnapping Abduction', 'Dacoity', 'Robbery', 'Arson', 'Hurt',
       'Protection of Civil Rights (PCR) Act',
       'Prevention of atrocities (POA) Act', 'Other Crimes Against STs'],
      dtype='object')

In [17]:
dis_w_cr_co_14_ST.columns

Index(['States/UTs', 'District', 'Year',
       'Protection of Civil Rights Act, 1955', 'POA_Murder',
       'POA_Attempt to commit Murder', 'POA_Rape',
       'POA_Attempt to commit Rape',
       'POA_Assault on women with intent to outrage her Modesty',
       'POA_Sexual Harassment', 'POA_Assault on women with intent to Disrobe',
       'POA_Voyeurism', 'POA_Stalking', 'POA_Other Sexual Harassment',
       'POA_Insult to the Modesty of women',
       'POA_Kidnapping & Abduction_GrandTotal',
       'POA_Kidnaping & Abduction_Total',
       'POA_Kidnaping & Abduction in order to Murder',
       'POA_Kidnapping for Ransom',
       'POA_Kidnapping & Abduction of Women to compel her for marriage',
       'POA_Other Kidnapping', 'POA_Dacoity', 'POA_Dacoity with Murder',
       'POA_Other Dacoity', 'POA_Robbery', 'POA_Arson', 'POA_Grievous Hurt',
       'POA_Hurt', 'POA_Acid attack', 'POA_Attempt to Acid Attack',
       'POA_Riots', 'POA_Other IPC crimes',
       'POA_SC / ST (Prevention o

In [18]:
dis_w_cr_co_13_ST["Total crimes against STs"] = dis_w_cr_co_13_ST.drop(columns = ['STATE/UT', 'DISTRICT', 'Year']).sum(axis=1)
dis_w_cr_co_1_12_ST["Total crimes against STs"] = dis_w_cr_co_1_12_ST.drop(columns = ['STATE/UT', 'DISTRICT', 'Year']).sum(axis=1)

In [19]:
dis_w_cr_co_14_ST.rename(columns = {'States/UTs' : 'STATE/UT',
                                    'District' : 'DISTRICT',
                                    "Protection of Civil Rights Act, 1955":'Protection of Civil Rights (PCR) Act'},                                    
                                    inplace=True)

In [20]:
# Doing the same as we did for the SC dataset
feature_mapping = {
    'Murder': ['POA_Murder', 'IPC_Murder'],
    'Rape': ['POA_Rape', 'IPC_Rape'],
    'Kidnapping Abduction': ['POA_Kidnapping & Abduction_GrandTotal', 'IPC_Kidnaping & Abduction'],
    'Dacoity': ['POA_Dacoity', 'IPC_Dacoity'],
    'Robbery': ['POA_Robbery', 'IPC_Robbery'],
    'Arson': ['POA_Arson', 'IPC_Arson'],
    'Hurt': ['POA_Grievous Hurt', 'IPC_Grievous Hurt'],
    'Prevention of atrocities (POA) Act': [
        'POA_Assault on women with intent to outrage her Modesty',
        'POA_Insult to the Modesty of women',
        'POA_Sexual Harassment',
        'POA_Assault on women with intent to Disrobe',
        'POA_Voyeurism',
        'POA_Stalking',
        'POA_Other Sexual Harassment', 'POA_Murder', 'POA_Rape', 'POA_Kidnapping & Abduction_GrandTotal',
        'POA_Dacoity', 'POA_Robbery', 'POA_Arson', 'POA_Grievous Hurt'
    ],
    'Other Crimes Against STs': [
        'Other SLL Crime against STs',
        'Total IPC Crimes against STs',
        'Manual Scavengers and Construction of Dry Latrines (P) Act, 1993'
    ],
    'Total IPC Crimes against STs': ['IPC_Other IPC crimes']
}

for feature_name, related_features in feature_mapping.items():
    dis_w_cr_co_14_ST[feature_name] = dis_w_cr_co_14_ST[related_features].sum(axis=1)


In [21]:
dis_w_cr_co_14_ST = dis_w_cr_co_14_ST[dis_w_cr_co_13_ST.columns]

In [22]:
dis_w_cr_co_ST_final = pd.concat([dis_w_cr_co_1_12_ST, dis_w_cr_co_13_ST, dis_w_cr_co_14_ST], ignore_index=True)

In [23]:
dis_w_cr_co_ST_final.columns

Index(['STATE/UT', 'DISTRICT', 'Year', 'Murder', 'Rape',
       'Kidnapping Abduction', 'Dacoity', 'Robbery', 'Arson', 'Hurt',
       'Protection of Civil Rights (PCR) Act',
       'Prevention of atrocities (POA) Act', 'Other Crimes Against STs',
       'Total crimes against STs'],
      dtype='object')

In [24]:
dis_w_cr_co_children_final = pd.concat([dis_w_cr_co_1_12_children, dis_w_cr_co_13_children], ignore_index=True)

In [25]:
dis_w_cr_co_IPC.columns

Index(['STATE/UT', 'DISTRICT', 'YEAR', 'MURDER', 'ATTEMPT TO MURDER',
       'CULPABLE HOMICIDE NOT AMOUNTING TO MURDER', 'RAPE', 'CUSTODIAL RAPE',
       'OTHER RAPE', 'KIDNAPPING & ABDUCTION', 'DACOITY',
       'PREPARATION AND ASSEMBLY FOR DACOITY', 'ROBBERY', 'BURGLARY', 'THEFT',
       'AUTO THEFT', 'OTHER THEFT', 'RIOTS', 'CRIMINAL BREACH OF TRUST',
       'CHEATING', 'COUNTERFIETING', 'ARSON', 'HURT/GREVIOUS HURT',
       'DOWRY DEATHS', 'ASSAULT ON WOMEN WITH INTENT TO OUTRAGE HER MODESTY',
       'INSULT TO MODESTY OF WOMEN', 'CRUELTY BY HUSBAND OR HIS RELATIVES',
       'IMPORTATION OF GIRLS FROM FOREIGN COUNTRIES',
       'CAUSING DEATH BY NEGLIGENCE', 'OTHER IPC CRIMES', 'TOTAL IPC CRIMES'],
      dtype='object')

In [26]:
dis_w_cr_co_IPC.describe()

Unnamed: 0,YEAR,MURDER,ATTEMPT TO MURDER,CULPABLE HOMICIDE NOT AMOUNTING TO MURDER,RAPE,CUSTODIAL RAPE,OTHER RAPE,KIDNAPPING & ABDUCTION,DACOITY,PREPARATION AND ASSEMBLY FOR DACOITY,...,ARSON,HURT/GREVIOUS HURT,DOWRY DEATHS,ASSAULT ON WOMEN WITH INTENT TO OUTRAGE HER MODESTY,INSULT TO MODESTY OF WOMEN,CRUELTY BY HUSBAND OR HIS RELATIVES,IMPORTATION OF GIRLS FROM FOREIGN COUNTRIES,CAUSING DEATH BY NEGLIGENCE,OTHER IPC CRIMES,TOTAL IPC CRIMES
count,10678.0,10678.0,10678.0,10678.0,10678.0,10678.0,10678.0,10678.0,10678.0,10678.0,...,10678.0,10678.0,10678.0,10678.0,10678.0,10678.0,10678.0,10678.0,10678.0,10678.0
mean,2007.698539,88.008616,80.406818,9.616595,58.34838,0.04233,58.30605,93.787226,12.801461,7.105263,...,24.791159,714.011051,20.180371,113.561528,27.417307,209.258288,0.175501,232.668665,2217.561154,5515.511332
std,4.047144,323.658451,317.038205,58.25192,216.304175,1.898937,216.01486,400.204564,55.51693,43.270258,...,96.303616,2958.280266,98.273572,459.092501,167.800303,906.342476,2.228812,969.079851,8217.397429,19397.649619
min,2001.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,2004.0,18.0,10.0,0.0,8.0,0.0,8.0,10.0,1.0,0.0,...,2.0,36.0,1.0,10.0,0.0,11.0,0.0,6.0,260.0,861.0
50%,2008.0,37.0,28.0,2.0,22.0,0.0,22.0,28.0,3.0,0.0,...,8.0,179.0,5.0,34.0,2.0,50.0,0.0,75.0,752.0,2159.0
75%,2011.0,66.0,58.0,6.0,45.0,0.0,44.0,64.0,9.0,2.0,...,19.0,498.0,16.0,85.0,12.0,144.0,0.0,185.0,1644.75,4078.0
max,2014.0,7601.0,7964.0,1616.0,5076.0,189.0,5076.0,12361.0,1319.0,1263.0,...,2830.0,60488.0,2469.0,10001.0,4970.0,23564.0,83.0,16076.0,127869.0,272423.0


## Time series analysis of all crimes

In [29]:
crime_features_ipc = dis_w_cr_co_IPC.columns.drop(['STATE/UT', 'DISTRICT', 'YEAR',])

# Create one subplot per crime type (x-axes are not shared)
fig = make_subplots(rows=len(crime_features_ipc), cols=1, subplot_titles=crime_features_ipc, shared_xaxes=False)

# Populate subplots with line charts for each feature
for i, feature in enumerate(crime_features_ipc):
    crime_by_year = dis_w_cr_co_IPC.groupby("YEAR")[feature].sum().reset_index()
    trace = px.line(crime_by_year, x='YEAR', y=feature, markers=True, line_shape='linear', title=f'{feature} Over the Years')
    fig.add_trace(trace['data'][0], row=i+1, col=1)

# Update layout
fig.update_layout(height=len(crime_features_ipc) * 300, title_text="IPC Crime Time Series Over the Years")

# Show the plot
fig.show()
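
The same group-by-year aggregation is repeated for each dataset in this section. A small helper could compute all yearly totals in one call. The sketch below uses invented numbers and a hypothetical function name `yearly_totals`, and leaves the plotting out.

```python
import pandas as pd

# Toy district-level records (values invented).
records = pd.DataFrame({
    "STATE/UT": ["X", "X", "Y", "Y"],
    "DISTRICT": ["A", "A", "B", "B"],
    "YEAR": [2001, 2002, 2001, 2002],
    "MURDER": [3, 4, 1, 2],
    "THEFT": [10, 12, 5, 6],
})


def yearly_totals(frame, year_col, id_cols):
    """Sum every crime column per year, ignoring the identifier columns."""
    crime_cols = [c for c in frame.columns if c not in id_cols and c != year_col]
    return frame.groupby(year_col)[crime_cols].sum().reset_index()


totals = yearly_totals(records, "YEAR", ["STATE/UT", "DISTRICT"])
print(totals)
```

Each crime column of the result can then be passed to a line chart, one subplot per column.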


In [30]:
dis_w_cr_co_children_final.columns

Index(['STATE/UT', 'DISTRICT', 'Year', 'Murder', 'Rape',
       'Kidnapping and Abduction', 'Foeticide', 'Abetment of suicide',
       'Exposure and abandonment', 'Procuration of minor girls',
       'Buying of girls for prostitution', 'Selling of girls for prostitution',
       'Prohibition of child marriage act', 'Other Crimes', 'Total',
       'Infanticid', 'Other murder'],
      dtype='object')

In [33]:
crime_features_children = dis_w_cr_co_children_final.columns.drop(['STATE/UT', 'DISTRICT', 'Year',])

# Create one subplot per crime type (x-axes are not shared)
fig = make_subplots(rows=len(crime_features_children), cols=1, subplot_titles=crime_features_children, shared_xaxes=False)

# Populate subplots with line charts for each feature
for i, feature in enumerate(crime_features_children):
    crime_by_year = dis_w_cr_co_children_final.groupby("Year")[feature].sum().reset_index()
    trace = px.line(crime_by_year, x='Year', y=feature, markers=True, line_shape='linear', title=f'{feature} Over the Years')
    fig.add_trace(trace['data'][0], row=i+1, col=1)

# Update layout
fig.update_layout(height=len(crime_features_children) * 300, title_text="Crimes against children Time Series Over the Years")

# Show the plot
fig.show()

In [34]:
dis_w_cr_co_SC_final.columns

Index(['STATE/UT', 'DISTRICT', 'Year', 'Murder', 'Rape',
       'Kidnapping and Abduction', 'Dacoity', 'Robbery', 'Arson', 'Hurt',
       'Prevention of atrocities (POA) Act',
       'Protection of Civil Rights (PCR) Act', 'Other Crimes Against SCs',
       'Total crimes against SCs'],
      dtype='object')

In [35]:
crime_features_SC = dis_w_cr_co_SC_final.columns.drop(['STATE/UT', 'DISTRICT', 'Year',])

# Create one subplot per crime type (x-axes are not shared)
fig = make_subplots(rows=len(crime_features_SC), cols=1, subplot_titles=crime_features_SC, shared_xaxes=False)

# Populate subplots with line charts for each feature
for i, feature in enumerate(crime_features_SC):
    crime_by_year = dis_w_cr_co_SC_final.groupby("Year")[feature].sum().reset_index()
    trace = px.line(crime_by_year, x='Year', y=feature, markers=True, line_shape='linear', title=f'{feature} Over the Years')
    fig.add_trace(trace['data'][0], row=i+1, col=1)

# Update layout
fig.update_layout(height=len(crime_features_SC) * 300, title_text="Crimes against SC Time Series Over the Years")

# Show the plot
fig.show()

In [36]:
dis_w_cr_co_ST_final.columns

Index(['STATE/UT', 'DISTRICT', 'Year', 'Murder', 'Rape',
       'Kidnapping Abduction', 'Dacoity', 'Robbery', 'Arson', 'Hurt',
       'Protection of Civil Rights (PCR) Act',
       'Prevention of atrocities (POA) Act', 'Other Crimes Against STs',
       'Total crimes against STs'],
      dtype='object')

In [37]:
crime_features_ST = dis_w_cr_co_ST_final.columns.drop(['STATE/UT', 'DISTRICT', 'Year',])

# Create one subplot per crime type (x-axes are not shared)
fig = make_subplots(rows=len(crime_features_ST), cols=1, subplot_titles=crime_features_ST, shared_xaxes=False)

# Populate subplots with line charts for each feature
for i, feature in enumerate(crime_features_ST):
    crime_by_year = dis_w_cr_co_ST_final.groupby("Year")[feature].sum().reset_index()
    trace = px.line(crime_by_year, x='Year', y=feature, markers=True, line_shape='linear', title=f'{feature} Over the Years')
    fig.add_trace(trace['data'][0], row=i+1, col=1)

# Update layout
fig.update_layout(height=len(crime_features_ST) * 300, title_text="Crimes against ST Time Series Over the Years")

# Show the plot
fig.show()

## Geospatial analysis of all crimes

1) IPC crimes

In [38]:
dis_w_cr_co_IPC.columns

Index(['STATE/UT', 'DISTRICT', 'YEAR', 'MURDER', 'ATTEMPT TO MURDER',
       'CULPABLE HOMICIDE NOT AMOUNTING TO MURDER', 'RAPE', 'CUSTODIAL RAPE',
       'OTHER RAPE', 'KIDNAPPING & ABDUCTION', 'DACOITY',
       'PREPARATION AND ASSEMBLY FOR DACOITY', 'ROBBERY', 'BURGLARY', 'THEFT',
       'AUTO THEFT', 'OTHER THEFT', 'RIOTS', 'CRIMINAL BREACH OF TRUST',
       'CHEATING', 'COUNTERFIETING', 'ARSON', 'HURT/GREVIOUS HURT',
       'DOWRY DEATHS', 'ASSAULT ON WOMEN WITH INTENT TO OUTRAGE HER MODESTY',
       'INSULT TO MODESTY OF WOMEN', 'CRUELTY BY HUSBAND OR HIS RELATIVES',
       'IMPORTATION OF GIRLS FROM FOREIGN COUNTRIES',
       'CAUSING DEATH BY NEGLIGENCE', 'OTHER IPC CRIMES', 'TOTAL IPC CRIMES'],
      dtype='object')

In [57]:
dis_w_cr_co_IPC["STATE/UT"].unique()

array(['ANDHRA PRADESH', 'ARUNACHAL PRADESH', 'ASSAM', 'BIHAR',
       'CHHATTISGARH', 'GOA', 'GUJARAT', 'HARYANA', 'HIMACHAL PRADESH',
       'JAMMU & KASHMIR', 'JHARKHAND', 'KARNATAKA', 'KERALA',
       'MADHYA PRADESH', 'MAHARASHTRA', 'MANIPUR', 'MEGHALAYA', 'MIZORAM',
       'NAGALAND', 'ODISHA', 'PUNJAB', 'RAJASTHAN', 'SIKKIM',
       'TAMIL NADU', 'TRIPURA', 'UTTAR PRADESH', 'UTTARAKHAND',
       'WEST BENGAL', 'A & N ISLANDS', 'CHANDIGARH', 'D & N HAVELI',
       'DAMAN & DIU', 'DELHI UT', 'LAKSHADWEEP', 'PUDUCHERRY',
       'Andhra Pradesh', 'Arunachal Pradesh', 'Assam', 'Bihar',
       'Chhattisgarh', 'Goa', 'Gujarat', 'Haryana', 'Himachal Pradesh',
       'Jammu & Kashmir', 'Jharkhand', 'Karnataka', 'Kerala',
       'Madhya Pradesh', 'Maharashtra', 'Manipur', 'Meghalaya', 'Mizoram',
       'Nagaland', 'Odisha', 'Punjab', 'Rajasthan', 'Sikkim',
       'Tamil Nadu', 'Tripura', 'Uttar Pradesh', 'Uttarakhand',
       'West Bengal', 'A&N Islands', 'Chandigarh', 'D&N Haveli',
     

For the map to work, all state names need the same capitalisation, so the fully upper-case names are converted to title case (first letter of each word capitalised).

In [68]:
def capsToCamel(input_string):
    if any(c.islower() for c in input_string):
        return input_string
    else:
        words = input_string.split(" ")
        final_string = " ".join(word.capitalize() for word in words)
        return final_string

dis_w_cr_co_IPC["STATE/UT"] = dis_w_cr_co_IPC["STATE/UT"].apply(capsToCamel)


In [69]:
dis_w_cr_co_IPC["STATE/UT"].unique()

array(['Andhra Pradesh', 'Arunachal Pradesh', 'Assam', 'Bihar',
       'Chhattisgarh', 'Goa', 'Gujarat', 'Haryana', 'Himachal Pradesh',
       'Jammu & Kashmir', 'Jharkhand', 'Karnataka', 'Kerala',
       'Madhya Pradesh', 'Maharashtra', 'Manipur', 'Meghalaya', 'Mizoram',
       'Nagaland', 'Odisha', 'Punjab', 'Rajasthan', 'Sikkim',
       'Tamil Nadu', 'Tripura', 'Uttar Pradesh', 'Uttarakhand',
       'West Bengal', 'A & N Islands', 'Chandigarh', 'D & N Haveli',
       'Daman & Diu', 'Delhi Ut', 'Lakshadweep', 'Puducherry',
       'A&N Islands', 'D&N Haveli', 'Delhi UT', 'Telangana'], dtype=object)
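
`str.capitalize` lowercases everything after the first letter, which is why 'DELHI UT' becomes 'Delhi Ut', and spacing variants such as 'A & N Islands' and 'A&N Islands' remain distinct values. One possible way to collapse these variants is sketched below. `normalize_state` and the `ACRONYMS` set are hypothetical and not part of the notebook.

```python
import re

# Words that should stay upper-case after normalisation (an assumption).
ACRONYMS = {"UT"}


def normalize_state(name):
    """Title-case a state name, unify spacing around '&', keep acronyms upper-case."""
    name = re.sub(r"\s*&\s*", " & ", name.strip())
    words = []
    for word in name.split():
        upper = word.upper()
        if upper in ACRONYMS or len(word) == 1:
            # Acronyms and single letters (as in "A & N") stay upper-case.
            words.append(upper)
        else:
            words.append(word.capitalize())
    return " ".join(words)


for raw in ["DELHI UT", "Delhi UT", "A&N Islands", "A & N ISLANDS", "Tamil Nadu"]:
    print(raw, "->", normalize_state(raw))
```

With this normalisation both spellings of each union territory map to a single value, so fewer special cases are needed in the later cleaning steps.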

Some states and union territories are missing from our GeoJSON file, so I will remove them from the data.

In [70]:
removable_states = ['A & N Islands', 'D & N Haveli', 'Daman & Diu', 'A&N Islands', 'D&N Haveli']
dis_w_cr_co_IPC = dis_w_cr_co_IPC[~dis_w_cr_co_IPC["STATE/UT"].isin(removable_states)]
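
To find which state names have no counterpart in the GeoJSON file, the names stored under `properties.NAME_1` (the key used for the map later in this notebook) can be compared with the names in the data. The sketch below uses a tiny inline GeoJSON-like dict with made-up features instead of the real file, and `unmatched_names` is a hypothetical helper.

```python
# A tiny stand-in for the real GeoJSON (geometry omitted, feature list invented).
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"NAME_1": "Delhi"}, "geometry": None},
        {"type": "Feature", "properties": {"NAME_1": "Goa"}, "geometry": None},
    ],
}


def unmatched_names(names, geo, key="NAME_1"):
    """Return the names that do not appear under properties[key] in any feature."""
    known = {feature["properties"][key] for feature in geo["features"]}
    return sorted(set(names) - known)


print(unmatched_names(["Delhi", "Goa", "Delhi UT", "A&N Islands"], geojson))
# ['A&N Islands', 'Delhi UT']
```

Names returned by such a check are candidates either for renaming (if the GeoJSON uses a different spelling) or for removal (if the region is absent from the file).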

In [71]:
dis_w_cr_co_IPC["STATE/UT"].unique()

array(['Andhra Pradesh', 'Arunachal Pradesh', 'Assam', 'Bihar',
       'Chhattisgarh', 'Goa', 'Gujarat', 'Haryana', 'Himachal Pradesh',
       'Jammu & Kashmir', 'Jharkhand', 'Karnataka', 'Kerala',
       'Madhya Pradesh', 'Maharashtra', 'Manipur', 'Meghalaya', 'Mizoram',
       'Nagaland', 'Odisha', 'Punjab', 'Rajasthan', 'Sikkim',
       'Tamil Nadu', 'Tripura', 'Uttar Pradesh', 'Uttarakhand',
       'West Bengal', 'Chandigarh', 'Delhi Ut', 'Lakshadweep',
       'Puducherry', 'Delhi UT', 'Telangana'], dtype=object)

In [73]:
# Changing values of states to match geojson file
value_mapping = {
    "Jammu & Kashmir":"Jammu and Kashmir",
    "Delhi Ut":"Delhi",
    "Delhi UT":"Delhi"
}
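# Apply the mapping to the state column only, leaving district names untouched
dis_w_cr_co_IPC["STATE/UT"] = dis_w_cr_co_IPC["STATE/UT"].replace(value_mapping)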

In [74]:
dis_w_cr_co_IPC["STATE/UT"].unique()

array(['Andhra Pradesh', 'Arunachal Pradesh', 'Assam', 'Bihar',
       'Chhattisgarh', 'Goa', 'Gujarat', 'Haryana', 'Himachal Pradesh',
       'Jammu and Kashmir', 'Jharkhand', 'Karnataka', 'Kerala',
       'Madhya Pradesh', 'Maharashtra', 'Manipur', 'Meghalaya', 'Mizoram',
       'Nagaland', 'Odisha', 'Punjab', 'Rajasthan', 'Sikkim',
       'Tamil Nadu', 'Tripura', 'Uttar Pradesh', 'Uttarakhand',
       'West Bengal', 'Chandigarh', 'Delhi', 'Lakshadweep', 'Puducherry',
       'Telangana'], dtype=object)

In [79]:
sum_crime_type

Unnamed: 0,STATE/UT,MURDER
0,Andhra Pradesh,70830
1,Arunachal Pradesh,1992
2,Assam,36474
3,Bihar,96178
4,Chandigarh,578
5,Chhattisgarh,28006
6,Delhi,14616
7,Goa,1114
8,Gujarat,32034
9,Haryana,25118


In [80]:
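sum_crime_type = dis_w_cr_co_IPC.groupby(["STATE/UT"])["MURDER"].sum().reset_index()

# px.choropleth expects the GeoJSON as a parsed dict, so load the file first
with open("Json_Files/india_state.geojson", encoding="utf-8") as f:
    india_states = json.load(f)

fig = px.choropleth(
    sum_crime_type,
    geojson=india_states,
    locations="STATE/UT",
    featureidkey="properties.NAME_1",
    color="MURDER",
    hover_name="STATE/UT",
    color_continuous_scale="Viridis",
)

# Zoom the map to the plotted states instead of showing the whole world
fig.update_geos(fitbounds="locations", visible=False)
fig.show()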