# **Observation 2**
## This observation compares how quickly different states resolve domestic violence cases, measured as the time from filing to decision. I also study how this resolution time changed for cases filed between 2010 and 2014.

## **Importing relevant modules and files**

In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import matplotlib.pyplot as plt
import plotly.express as px

keys= pd.read_csv("/kaggle/input/keys-precog/act_key.csv")
act_sections=pd.read_csv("/kaggle/input/acts-sections/acts_sections.csv")
case_2010=pd.read_csv("/kaggle/input/precog-cases/cases_2010.csv")



In [2]:
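# Load the 2011-2014 case files, keyed as 'case_2011' ... 'case_2014'.
# Note: the name `dict` shadows the Python built-in; it is kept because later cells use it.
dict = {}
for i in range(1,5):
    dict['case_201%s' %i]=pd.read_csv("/kaggle/input/precog-cases/cases_201%s.csv" %i)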

In [3]:
states=pd.read_csv("/kaggle/input/keys-precog/cases_state_key.csv")

In [4]:
# state names
states=states[['state_code','state_name']]
# display(states)
states.drop_duplicates(keep="first", inplace=True)
states=states.reset_index()[['state_code','state_name']]
display(states)

Unnamed: 0,state_code,state_name
0,2,Andhra Pradesh
1,29,Telangana
2,6,Assam
3,8,Bihar
4,27,Chandigarh
5,18,Chhattisgarh
6,32,DNH at Silvasa
7,31,Diu and Daman
8,30,Goa
9,17,Gujarat


## **Getting all ActIDs whose description contains "Domestic Violence"**

In [5]:
ActIDs=keys[keys.act_s.str.contains('Domestic Violence',na=False)][['act_s','act']]
ActIDs

Unnamed: 0,act_s,act
81,10.Protection of Women from Domestic Violence Act,81.0
159,12 of Domestic Violence Act,159.0
163,12 of Protection of Women from Domestic Viole...,163.0
164,12 of Protection of Women from Domestic Violen...,164.0
165,12 of the Protection of Women from Domestic Vi...,165.0
...,...,...
29341,u/sec.12 and 20(3) of Protection of Women from...,29341.0
29343,"u/sec.12, 18, 18(e), 19 20, 22 of Protection ...",29343.0
29344,"u/sec.12,18, 19,20, 22 and 23 of the Protectio...",29344.0
29363,"u/sec.20,23 and 18 of Protection of woman from...",29363.0
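The match above is case-sensitive, so a description written as "domestic violence" would be skipped. The following is a minimal sketch of a case-insensitive match on made-up rows; `toy_keys` and its contents are hypothetical and only mimic the `act_s` and `act` columns used above.

```python
import pandas as pd

# Hypothetical rows that mimic the layout of act_key.csv (act_s = description, act = ID)
toy_keys = pd.DataFrame({
    "act_s": [
        "12 of Domestic Violence Act",
        "u/s 12 of protection of women from domestic violence act",
        "Indian Penal Code",
        None,
    ],
    "act": [1.0, 2.0, 3.0, 4.0],
})

# case=False ignores capitalisation; na=False treats missing descriptions as non-matches
mask = toy_keys["act_s"].str.contains("domestic violence", case=False, na=False)
toy_act_ids = toy_keys.loc[mask, ["act_s", "act"]]
print(toy_act_ids)
```

Whether the real key file contains lower-case variants is not known from the output shown here, so this is a check worth running rather than a confirmed fix.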


## **Finding all CaseIDs that include this Act**

In [6]:
CaseIDs=act_sections[act_sections.act.isin(ActIDs.act)][['ddl_case_id','act']]
CaseIDs


Unnamed: 0,ddl_case_id,act
2197019,23-02-01-220701003002015,13710.0
3018822,23-09-03-220400001242018,13710.0
3990342,23-25-05-220600000492020,17548.0
4839044,01-31-01-209500000382016,13707.0
4841471,01-31-01-209500000032018,13707.0
...,...,...
76809971,13-17-03-203500007252016,5620.0
76809972,13-64-06-205700000132018,5620.0
76809973,13-65-03-203507006192017,13707.0
76810536,23-05-01-220400003232018,17548.0


## **First trying to handle a single year's cases (2010)**
### Selecting all the Case IDs that were filed in 2010 and extracting the relevant columns-- CaseID, State Code, Filing Date and Decision Date

In [7]:
cases=case_2010[case_2010.ddl_case_id.isin(CaseIDs.ddl_case_id)][['ddl_case_id','state_code','date_of_filing','date_of_decision']]
cases

Unnamed: 0,ddl_case_id,state_code,date_of_filing,date_of_decision
5761,01-01-08-201916000802010,1,2010-08-30,2015-10-01
10012,01-02-02-201903002042010,1,2010-04-26,2014-06-11
10283,01-02-02-201903004762010,1,2010-09-16,2016-12-22
10998,01-02-02-203003005282010,1,2010-07-09,2018-10-12
16314,01-02-05-201913000772010,1,2010-04-06,2013-11-18
...,...,...,...,...
4274863,30-02-05-203700000222010,30,2010-09-01,2011-04-16
4274864,30-02-05-203700000232010,30,2010-10-08,2011-07-09
4274865,30-02-05-203700000242010,30,2010-10-08,2011-08-20
4274867,30-02-05-203700000262010,30,2010-11-24,2013-02-28


In [8]:
cases=cases.merge(states, on="state_code")
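The default inner merge silently drops any case whose `state_code` is missing from the key. A small sketch, on hypothetical miniature frames, of how such rows could be surfaced with the `how`, `validate` and `indicator` options of `DataFrame.merge`:

```python
import pandas as pd

# Hypothetical miniature versions of the `cases` and `states` frames
toy_cases = pd.DataFrame({
    "ddl_case_id": ["a", "b", "c"],
    "state_code": [1, 1, 99],  # 99 has no entry in the key
})
toy_states = pd.DataFrame({
    "state_code": [1, 2],
    "state_name": ["Maharashtra", "Andhra Pradesh"],
})

# how="left" keeps unmatched cases; validate="m:1" raises if a state_code repeats in the key;
# indicator=True adds a `_merge` column that flags rows without a match
merged = toy_cases.merge(toy_states, on="state_code", how="left",
                         validate="m:1", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]
print(unmatched)
```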

## **Finding duration as the difference between the decision year and the filing year**
### Here, I dropped all rows containing NA (cases without a decision date), since predicting when the ongoing cases will end is not helpful

In [9]:
cases['duration']=pd.DatetimeIndex(cases['date_of_decision']).year-pd.DatetimeIndex(cases['date_of_filing']).year
cases=cases.dropna()
print(cases)

                   ddl_case_id  state_code date_of_filing date_of_decision  \
0     01-01-08-201916000802010           1     2010-08-30       2015-10-01   
1     01-02-02-201903002042010           1     2010-04-26       2014-06-11   
2     01-02-02-201903004762010           1     2010-09-16       2016-12-22   
3     01-02-02-203003005282010           1     2010-07-09       2018-10-12   
4     01-02-05-201913000772010           1     2010-04-06       2013-11-18   
...                        ...         ...            ...              ...   
6254  30-02-05-203700000222010          30     2010-09-01       2011-04-16   
6255  30-02-05-203700000232010          30     2010-10-08       2011-07-09   
6256  30-02-05-203700000242010          30     2010-10-08       2011-08-20   
6257  30-02-05-203700000262010          30     2010-11-24       2013-02-28   
6258  32-01-01-208700000202010          32     2010-10-04       2011-09-08   

          state_name  duration  
0        Maharashtra       5.0
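The duration above is a difference of calendar years, so it depends on where in the year the two dates fall. As a hedged alternative, the sketch below computes elapsed days from the full dates. The first two rows are copied from the 2010 table above; the third row (a pending case) and the column names `duration_years` and `duration_days` are assumptions made for illustration.

```python
import pandas as pd

# Two filing/decision pairs from the 2010 table, plus one hypothetical pending case
toy = pd.DataFrame({
    "date_of_filing": ["2010-08-30", "2010-11-24", "2010-05-01"],
    "date_of_decision": ["2015-10-01", "2013-02-28", None],
})

filed = pd.to_datetime(toy["date_of_filing"], errors="coerce")
decided = pd.to_datetime(toy["date_of_decision"], errors="coerce")

# Calendar-year difference, as in the cell above
toy["duration_years"] = decided.dt.year - filed.dt.year
# Elapsed days between the two dates
toy["duration_days"] = (decided - filed).dt.days

# Drop only the rows that have no decision date
toy = toy.dropna(subset=["duration_days"])
print(toy)
```

For the second case the year difference is 3, although a little over two years actually elapsed, so day-based durations give a finer picture when states are compared.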

## **Grouping entries by state_name and finding the mean and max time for resolution**

In [10]:
grouped= cases.groupby(['state_name']).agg({'duration': ['mean','max']})
display(grouped)

state_name,duration (mean),duration (max)
Andhra Pradesh,3.846154,7.0
Assam,2.707071,8.0
Bihar,6.238095,10.0
Chandigarh,3.280702,8.0
DNH at Silvasa,1.0,1.0
Delhi,6.219178,9.0
Goa,2.0,8.0
Haryana,3.717073,7.0
Himachal Pradesh,7.0,8.0
Jharkhand,4.5,8.0


## **Same as above, except calculating only the mean time per state for plotting**

In [11]:
for_graph= cases.groupby(['state_name']).agg({'duration': ['mean']})
for_graph = for_graph.rename(columns={'mean': 'mean_0'})
display(for_graph)

state_name,duration (mean_0)
Andhra Pradesh,3.846154
Assam,2.707071
Bihar,6.238095
Chandigarh,3.280702
DNH at Silvasa,1.0
Delhi,6.219178
Goa,2.0
Haryana,3.717073
Himachal Pradesh,7.0
Jharkhand,4.5


## **Plotting the distribution of times using box plots across the states of India**

In [12]:
fig = px.box(cases,y="duration",x="state_name",title="Distribution of case duration (years) by state")
fig.show()

## **Doing the same for other years**

In [13]:
for i in range(1,5):
    year_cases=dict['case_201%s' %i]
    # keep only the domestic violence cases and the columns needed for the duration
    case=year_cases[year_cases.ddl_case_id.isin(CaseIDs.ddl_case_id)][['ddl_case_id','state_code','date_of_filing','date_of_decision']].copy()
    # duration in calendar years; undecided cases give NaN and are dropped
    case['duration']=pd.DatetimeIndex(case['date_of_decision']).year-pd.DatetimeIndex(case['date_of_filing']).year
    case=case.dropna()
    case=case.merge(states, on="state_code")
    case= case.groupby(['state_name']).agg({'duration': ['mean']})
    case = case.rename(columns={'mean': 'mean_%s'%i})
    # inner merge: a state is kept only if it has decided cases in every year
    for_graph=for_graph.merge(case, on='state_name')

#taking transpose for easier plotting

for_graph = for_graph.T  # Transpose dataframe

display(for_graph)

Unnamed: 0,state_name,Andhra Pradesh,Assam,Bihar,Chandigarh,Delhi,Goa,Haryana,Himachal Pradesh,Jharkhand,Karnataka,...,Maharashtra,Manipur,Orissa,Punjab,Rajasthan,Tamil Nadu,Telangana,Uttar Pradesh,Uttarakhand,West Bengal
duration,mean_0,3.846154,2.707071,6.238095,3.280702,6.219178,2.0,3.717073,7.0,4.5,3.15,...,2.232143,6.0,6.0,4.57377,6.411111,6.333333,3.253623,6.687861,5.209677,6.540541
duration,mean_1,2.261538,1.711111,6.266667,2.45122,5.577869,1.844444,3.032967,5.8,3.212121,3.337662,...,2.773657,4.125,5.0,3.780142,4.8,4.818182,2.354839,6.088235,3.693878,5.479592
duration,mean_2,3.0,1.479871,4.12,2.25,4.549422,1.9375,2.621921,4.823529,3.083333,1.758713,...,2.410941,3.571429,4.302326,3.108475,4.12766,3.444444,2.493939,4.721035,2.850467,4.834254
duration,mean_3,2.111111,1.354239,2.169014,2.625,3.848684,1.723404,2.294118,2.4,3.894737,1.173732,...,2.104661,1.74,3.05036,2.324176,3.393346,2.942029,2.363128,3.426298,1.847926,4.017544
duration,mean_4,1.866667,1.196011,2.12234,1.863469,2.825404,1.345133,1.803606,1.292248,2.08,0.946687,...,1.846063,0.810651,2.087838,1.744413,2.296344,1.746269,2.046083,2.270678,1.585925,2.688222
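The year-by-state table can also be built in one step instead of merging one year at a time. The sketch below assumes the yearly frames have already been stacked into a single frame (for example with `pd.concat`) and joined to state names; the data and the `filing_year` column are hypothetical.

```python
import pandas as pd

# Hypothetical decided cases from several filing years, already joined to state names
toy = pd.DataFrame({
    "state_name": ["Goa", "Goa", "Goa", "Bihar", "Bihar", "Bihar"],
    "date_of_filing": ["2010-03-01", "2011-06-15", "2011-09-30",
                       "2010-01-10", "2010-12-31", "2011-07-04"],
    "duration": [2.0, 1.0, 3.0, 6.0, 8.0, 4.0],
})
toy["filing_year"] = pd.to_datetime(toy["date_of_filing"]).dt.year

# Rows are filing years, columns are states, cells are mean durations
by_year = toy.pivot_table(index="filing_year", columns="state_name",
                          values="duration", aggfunc="mean")
print(by_year)
```

Unlike the chain of inner merges, a pivot keeps a state that lacks data in some year and marks the gap as NaN, and the result needs no transpose before plotting.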


## **Removing the multi-index from rows**

In [14]:
for_graph=for_graph.reset_index()
for_graph

state_name,level_0,level_1,Andhra Pradesh,Assam,Bihar,Chandigarh,Delhi,Goa,Haryana,Himachal Pradesh,...,Maharashtra,Manipur,Orissa,Punjab,Rajasthan,Tamil Nadu,Telangana,Uttar Pradesh,Uttarakhand,West Bengal
0,duration,mean_0,3.846154,2.707071,6.238095,3.280702,6.219178,2.0,3.717073,7.0,...,2.232143,6.0,6.0,4.57377,6.411111,6.333333,3.253623,6.687861,5.209677,6.540541
1,duration,mean_1,2.261538,1.711111,6.266667,2.45122,5.577869,1.844444,3.032967,5.8,...,2.773657,4.125,5.0,3.780142,4.8,4.818182,2.354839,6.088235,3.693878,5.479592
2,duration,mean_2,3.0,1.479871,4.12,2.25,4.549422,1.9375,2.621921,4.823529,...,2.410941,3.571429,4.302326,3.108475,4.12766,3.444444,2.493939,4.721035,2.850467,4.834254
3,duration,mean_3,2.111111,1.354239,2.169014,2.625,3.848684,1.723404,2.294118,2.4,...,2.104661,1.74,3.05036,2.324176,3.393346,2.942029,2.363128,3.426298,1.847926,4.017544
4,duration,mean_4,1.866667,1.196011,2.12234,1.863469,2.825404,1.345133,1.803606,1.292248,...,1.846063,0.810651,2.087838,1.744413,2.296344,1.746269,2.046083,2.270678,1.585925,2.688222


## **Removing the extra columns introduced due to removal of multi-index**

In [18]:
for_graph=for_graph.drop(columns=['level_0','level_1'])
display(for_graph)

state_name,Andhra Pradesh,Assam,Bihar,Chandigarh,Delhi,Goa,Haryana,Himachal Pradesh,Jharkhand,Karnataka,...,Maharashtra,Manipur,Orissa,Punjab,Rajasthan,Tamil Nadu,Telangana,Uttar Pradesh,Uttarakhand,West Bengal
0,3.846154,2.707071,6.238095,3.280702,6.219178,2.0,3.717073,7.0,4.5,3.15,...,2.232143,6.0,6.0,4.57377,6.411111,6.333333,3.253623,6.687861,5.209677,6.540541
1,2.261538,1.711111,6.266667,2.45122,5.577869,1.844444,3.032967,5.8,3.212121,3.337662,...,2.773657,4.125,5.0,3.780142,4.8,4.818182,2.354839,6.088235,3.693878,5.479592
2,3.0,1.479871,4.12,2.25,4.549422,1.9375,2.621921,4.823529,3.083333,1.758713,...,2.410941,3.571429,4.302326,3.108475,4.12766,3.444444,2.493939,4.721035,2.850467,4.834254
3,2.111111,1.354239,2.169014,2.625,3.848684,1.723404,2.294118,2.4,3.894737,1.173732,...,2.104661,1.74,3.05036,2.324176,3.393346,2.942029,2.363128,3.426298,1.847926,4.017544
4,1.866667,1.196011,2.12234,1.863469,2.825404,1.345133,1.803606,1.292248,2.08,0.946687,...,1.846063,0.810651,2.087838,1.744413,2.296344,1.746269,2.046083,2.270678,1.585925,2.688222
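The two cells above (reset the index, then drop the new columns) can be collapsed into one call. A small sketch on a hypothetical frame that has the same two-level row index as the transposed `for_graph`:

```python
import pandas as pd

# Hypothetical frame with a two-level row index like the transposed `for_graph`
idx = pd.MultiIndex.from_tuples([("duration", "mean_0"), ("duration", "mean_1")])
toy = pd.DataFrame({"Assam": [2.7, 1.7], "Goa": [2.0, 1.8]}, index=idx)
toy.columns.name = "state_name"

# drop=True discards the index levels instead of turning them into columns
flat = toy.reset_index(drop=True)
print(flat)
```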


## **Plotting all states together using Plotly**

In [19]:
label_x=['2010','2011','2012','2013','2014']

In [20]:
#plotting using plotly
fig = px.line(for_graph, x = label_x, y = for_graph.columns, title="Changes in resolution time over the years").update_layout(xaxis_title="years", yaxis_title="Avg Resolution time")
fig.show()
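Plotly Express also accepts long-form data, which keeps the year as a real column rather than a separate label list. The sketch below only shows the reshaping step with pandas; the wide frame and the names `filing_year` and `mean_duration` are hypothetical.

```python
import pandas as pd

# Hypothetical wide frame: one row per filing year, one column per state
wide = pd.DataFrame(
    {"Assam": [2.7, 1.7, 1.5], "Goa": [2.0, 1.8, 1.9]},
    index=pd.Index([2010, 2011, 2012], name="filing_year"),
)

# Long form: one row per (year, state) pair
long_df = wide.reset_index().melt(id_vars="filing_year",
                                  var_name="state_name",
                                  value_name="mean_duration")
print(long_df)
```

A frame in this shape could then be passed as `px.line(long_df, x="filing_year", y="mean_duration", color="state_name")`.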

## **Conclusion**
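### The mean resolution time falls steadily from 2010 to 2014 filings in most states, which may reflect increasing sensitisation and progress on gender grievance issues. Because only decided cases are counted, part of the decline can come from later filings whose slower cases were still pending and were therefore dropped.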

In [21]:
for_graph.columns

Index(['Andhra Pradesh', 'Assam', 'Bihar', 'Chandigarh', 'Delhi', 'Goa',
       'Haryana', 'Himachal Pradesh', 'Jharkhand', 'Karnataka', 'Kerala',
       'Madhya Pradesh', 'Maharashtra', 'Manipur', 'Orissa', 'Punjab',
       'Rajasthan', 'Tamil Nadu', 'Telangana', 'Uttar Pradesh', 'Uttarakhand',
       'West Bengal'],
      dtype='object', name='state_name')

## **Plotting a few states at a time for better clarity**

In [28]:
# plot the states five at a time (the last group holds the remaining two states)
for start in range(0, len(for_graph.columns), 5):
    fig = px.line(for_graph, x=label_x, y=for_graph.columns[start:start+5]).update_layout(xaxis_title="years", yaxis_title="Avg Resolution time")
    fig.show()

## **Some Observations**
1. Delhi: Started with one of the highest resolution times and declined steadily, but still had the highest time for cases filed in 2014
2. Bihar: Also among the highest; its decline was erratic, with the time dropping sharply between 2011 and 2013
3. Madhya Pradesh & Maharashtra: These neighbouring states show a very similar pattern of decline, with their lines staying close together
4. Manipur: One of the worst-performing states for cases filed in 2010; it improved impressively, to a mean resolution time of less than a year for cases filed in 2014
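These observations can be put into numbers by computing the relative change between the first and last filing year. A minimal sketch using the 2010 and 2014 means for three states, copied from the table above; the frame and variable names are hypothetical.

```python
import pandas as pd

# Mean durations (years) for 2010 and 2014 filings, copied from the table above
means = pd.DataFrame(
    {"Delhi": [6.219178, 2.825404],
     "Bihar": [6.238095, 2.122340],
     "Manipur": [6.0, 0.810651]},
    index=[2010, 2014],
)

# Relative change from the first to the last filing year, in percent
pct_change = ((means.loc[2014] - means.loc[2010]) / means.loc[2010] * 100).round(1)
print(pct_change.sort_values())
```

Sorting the result ranks states by how much they improved, which is harder to read off a chart with many overlapping lines.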

## **Performance of states plotted against each other across the years**

In [34]:
pd.options.plotting.backend = "plotly"
for_graph.T.plot(title="performance over the years").update_layout(xaxis_title="states", yaxis_title="Avg Resolution time")

## **Conclusions**
### Overall, most states resolved cases faster with each filing year from 2010 to 2014, and the trend suggests that this improvement may continue.
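Since rows without a decision date are dropped before averaging, it is worth checking how many cases that removes in each filing year. The sketch below is one possible check on made-up data; the `pending` and `filing_year` columns are hypothetical and not part of the original files.

```python
import pandas as pd

# Hypothetical cases: a missing decision date means the case was still pending
toy = pd.DataFrame({
    "date_of_filing": ["2010-02-01", "2010-07-15", "2014-03-10",
                       "2014-08-20", "2014-11-05"],
    "date_of_decision": ["2013-05-01", "2016-01-20", "2015-02-11", None, None],
})
toy["filing_year"] = pd.to_datetime(toy["date_of_filing"]).dt.year
toy["pending"] = toy["date_of_decision"].isna()

# Share of cases per filing year that dropna() would remove
pending_share = toy.groupby("filing_year")["pending"].mean()
print(pending_share)
```

If the pending share grows with the filing year in the real data, the later means are based mostly on quickly resolved cases and should be compared with earlier years cautiously.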