# **Observation 3**
## This observation compares the distribution of verdicts in cases pertaining to Child Labour and examines whether the gender of the petitioner's advocate (the one filing the case) plays an influential role. I also tried to check whether the type of case (whether the victim being present or giving a statement increased the chances of conviction) made a difference, but there weren't enough datapoints in these categories for that analysis.

## **Importing relevant modules and files**

In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# import matplotlib.pyplot as plt
import plotly.express as px

keys= pd.read_csv("/kaggle/input/keys-precog/act_key.csv")
act_sections=pd.read_csv("/kaggle/input/acts-sections/acts_sections.csv")



## **Reading case CSVs and concatenating them**

In [2]:
case=pd.read_csv("/kaggle/input/precog-cases/cases_2010.csv")
for i in range(1,3):
    case_year=pd.read_csv("/kaggle/input/precog-cases/cases_201%s.csv" %i)
    case=pd.concat([case, case_year])
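The loop above grows `case` one year at a time. A minimal sketch of an alternative, assuming the yearly files share the same columns: collect the frames in a list and concatenate once. The in-memory CSV strings and their IDs are made-up stand-ins for the real files, and `ignore_index=True` renumbers the rows, which the original cell does not do.

```python
import io
import pandas as pd

# Hypothetical in-memory stand-ins for the yearly CSV files (values are made up).
yearly_csvs = {
    2010: "ddl_case_id,disp_name\nA-2010,4\n",
    2011: "ddl_case_id,disp_name\nB-2011,19\n",
    2012: "ddl_case_id,disp_name\nC-2012,32\n",
}

# Read every year first, then concatenate a single time.
frames = [pd.read_csv(io.StringIO(text)) for text in yearly_csvs.values()]

# ignore_index=True gives one clean 0..n-1 index instead of repeated per-file indices.
all_cases = pd.concat(frames, ignore_index=True)
print(all_cases)
```

With the real data, the list would be built from `pd.read_csv` calls on the three file paths used above.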
    

## **Getting all ActIDs whose description contains "Child Labour"**

In [3]:
ActIDs=keys[keys.act_s.str.contains('Child Labour',na=False)][['act_s','act']]
ActIDs

Unnamed: 0,act_s,act
4438,Child Labour (Prohibition and Control) Act 1986,4438.0
4439,Child Labour (Prohibition and Regulation) Act,4439.0
4440,Child Labour (Prohibition and Regulation) Act ...,4440.0
4441,"Child Labour (Prohibition and Regulation) Act,...",4441.0
4442,Child Labour (Prohibition and Regulation) Rules,4442.0
4443,"Child Labour (Prohibition and Regultion) Act, ...",4443.0
4444,Child Labour (Restriction & Regulation) Act 1986,4444.0
4445,Child Labour ACt 1986,4445.0
4446,Child Labour Act,4446.0
4447,Child Labour Act \,4447.0


## **Finding all CaseIDs that include this Act**

In [4]:
CaseIDs=act_sections[act_sections.act.isin(ActIDs.act)][['ddl_case_id','act']]
CaseIDs


Unnamed: 0,ddl_case_id,act
4850505,13-06-03-205800104282014,4439.0
4851792,22-10-05-203900000782018,4439.0
4851793,22-10-05-203900000562019,4439.0
4851794,22-10-05-203900000772018,4439.0
4851795,22-10-05-203900000572019,4439.0
...,...,...
76560476,22-11-06-203900000332018,4439.0
76562370,11-05-05-224600000262015,4439.0
76768008,03-02-14-201200005812012,4457.0
76777723,03-02-14-201200003222010,4457.0


## **Reading Disposition id vs Disposition name file**
### Selecting the relevant columns and truncating to the head of the document (since there are 51 possible verdicts)

In [5]:
disp_name=pd.read_csv("/kaggle/input/keys-precog/disp_name_key.csv")
disp_name=disp_name[['disp_name','disp_name_s']]
disp_name=disp_name.head(51)
disp_name

Unnamed: 0,disp_name,disp_name_s
0,1,258 crpc
1,2,abated
2,3,absconded
3,4,acquitted
4,5,allowed
5,6,appeal accepted
6,7,award
7,8,bail granted
8,9,bail refused
9,10,bail rejected


## **Taking all cases on child labour and getting their corresponding dispositions**

In [6]:
cases=case[case.ddl_case_id.isin(CaseIDs.ddl_case_id)][['ddl_case_id','disp_name']]
#dropping disp_var_missing
cases = cases[cases.disp_name != 26]
cases

Unnamed: 0,ddl_case_id,disp_name
502168,01-23-15-200305001922010,32
502171,01-23-15-200305001952010,24
1028187,03-02-14-201200003222010,19
1547653,04-04-05-202905003832010,12
1577581,04-05-34-202900008852010,4
...,...,...
6178949,26-08-02-202102946072016,14
6205947,26-10-02-202120361772016,19
6211695,26-11-02-202100896322016,34
6211863,26-11-02-202100921062016,37


## **Grouping by dispositions to obtain frequency of each verdict**

In [7]:
frequencies = cases.groupby('disp_name').count()
# frequencies = cases.groupby(['disp_name']).agg({'ddl_case_id': ['sum']})
frequencies=frequencies.rename(columns={'ddl_case_id': 'freq'})
frequencies = frequencies.reset_index()
display(frequencies)

Unnamed: 0,disp_name,freq
0,1,2
1,2,4
2,3,26
3,4,105
4,5,13
5,7,3
6,11,1
7,12,1
8,13,1
9,14,1


## Final frequency of each verdict

In [8]:
#merging
frequencies=frequencies.merge(disp_name, on='disp_name')
#select needed columns
frequencies=frequencies[['disp_name_s','freq']]
frequencies


Unnamed: 0,disp_name_s,freq
0,258 crpc,2
1,abated,4
2,absconded,26
3,acquitted,105
4,allowed,13
5,award,3
6,cancelled,1
7,closed,1
8,committed,1
9,compounded,1
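Because the merge is an inner join against a key table truncated to 51 rows, any disposition code without a match would silently disappear from `frequencies`. A small sketch of how this could be checked, using toy frames (the code 99 and its count are made up for illustration): a left merge with `indicator=True` marks rows that found no name.

```python
import pandas as pd

# Toy frequency table; code 99 is a made-up code that has no entry in the key.
toy_freq = pd.DataFrame({"disp_name": [3, 4, 99], "freq": [26, 105, 2]})
toy_key = pd.DataFrame({"disp_name": [3, 4],
                        "disp_name_s": ["absconded", "acquitted"]})

# how="left" keeps every frequency row; indicator=True adds a `_merge` column
# that says whether a matching name was found in the key.
checked = toy_freq.merge(toy_key, on="disp_name", how="left", indicator=True)

# Rows that an inner merge would have dropped.
unmatched = checked[checked["_merge"] == "left_only"]
print(unmatched[["disp_name", "freq"]])
```

If `unmatched` is empty on the real data, the inner merge used above loses nothing.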


## **Plotting frequency of each verdict**

In [19]:
fig = px.bar(frequencies, x = 'disp_name_s', y = 'freq',title='verdict vs frequency')
fig.show()

## **Conclusion**: Most cases ended in acquittal or dismissal, which highlights the sad state of affairs. When filing a case, a petitioner would have sufficient proof of the defendant being responsible for some form of child labour. Acquittal or dismissal shows how these cases are often neglected.

## **Seeing how the verdict changes based on gender of petitioner advocate**
### The initial plan was to observe this bias through the gender of the judge, but there weren't enough datapoints, as mentioned earlier. The motivation here was to observe any gender-related bias that might exist.

## Extracting relevant columns from the case database into a new dataframe

In [10]:
cases=case[case.ddl_case_id.isin(CaseIDs.ddl_case_id)][['ddl_case_id','female_adv_pet','disp_name']]
#dropping disp_var_missing
cases = cases[cases.disp_name != 26]
cases

Unnamed: 0,ddl_case_id,female_adv_pet,disp_name
502168,01-23-15-200305001922010,-9999,32
502171,01-23-15-200305001952010,-9999,24
1028187,03-02-14-201200003222010,-9998,19
1547653,04-04-05-202905003832010,-9999,12
1577581,04-05-34-202900008852010,-9999,4
...,...,...,...
6178949,26-08-02-202102946072016,-9999,14
6205947,26-10-02-202120361772016,-9999,19
6211695,26-11-02-202100896322016,-9999,34
6211863,26-11-02-202100921062016,-9999,37


## Initialising a dict to store 3 different dataframes for 3 different categories-- males, females and unknown

In [11]:
dict={}
dict['undef']=cases[(cases.female_adv_pet==-9999) | (cases.female_adv_pet==-9998)][['ddl_case_id','disp_name']]
dict['males']=cases[(cases.female_adv_pet==0)][['ddl_case_id','disp_name']]
dict['females']=cases[(cases.female_adv_pet==1)][['ddl_case_id','disp_name']]
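The same three-way split can also be kept in a single dataframe by mapping the codes to labels. This is only a sketch on toy data (the IDs and values are made up), and it assumes the coding used in the cell above: 0 for male, 1 for female, and -9999 or -9998 for unknown.

```python
import pandas as pd

# Toy stand-in for the `cases` dataframe; IDs and values are made up.
toy_cases = pd.DataFrame({
    "ddl_case_id": ["a", "b", "c", "d", "e"],
    "female_adv_pet": [-9999, 0, 1, -9998, 0],
    "disp_name": [4, 4, 19, 32, 19],
})

# Same coding as the cell above: 0 = male, 1 = female, -9999/-9998 = unknown.
gender_labels = {0: "males", 1: "females", -9999: "undef", -9998: "undef"}
toy_cases["adv_gender"] = toy_cases["female_adv_pet"].map(gender_labels)

# Number of cases per category, useful for judging how uneven the groups are.
group_sizes = toy_cases["adv_gender"].value_counts()
print(group_sizes)
```

Any code outside the mapping would become `NaN` here, which makes unexpected values easy to spot.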

## Defining a function to modify the dataframes following the same steps as above

In [12]:
def for_plotting(string):
    # grouping by the disposition name
    dict[string]= dict[string].groupby('disp_name').count()
    # renaming column and resetting index after grouping and to handle multi-index
    dict[string]=dict[string].rename(columns={'ddl_case_id': 'freq'})
    dict[string] = dict[string].reset_index()
    # merging with disp_name and choosing relevant columns to obtain name instead of codes
    dict[string]=dict[string].merge(disp_name, on='disp_name')
    dict[string]=dict[string][['disp_name_s','freq']]

## Calling the above function for the 3 dataframes

In [13]:
for_plotting('undef')
for_plotting('males')
for_plotting('females')


## Plotting all 3 in one graph for comparison

In [15]:
#alternative code for plotting
import plotly.graph_objects as go

f1 = go.Figure(
    data = [
        go.Bar(x=dict['undef']['disp_name_s'], y=dict['undef']['freq'], name="undef"),
        go.Bar(x=dict['males']['disp_name_s'], y=dict['males']['freq'], name="males"),
        go.Bar(x=dict['females']['disp_name_s'], y=dict['females']['freq'], name="females"),
    ],
    layout = {"xaxis": {"title": "verdicts"}, "yaxis": {"title": "frequency"}, "title": "verdict vs frequency by gender of petitioner advocate"}
)
f1

### Clear takeaways from this plot:
1. The number of undefined entries is much larger than the number of males or females, which points to problems with the data (missing names) and with the neural network used for name segregation (unknown gender).
2. After acquitted, convicted is more frequent than dismissed in the case of male petitioner advocates.


## Plotting all categories one at a time for comparison within categories

## Females

In [16]:
f1 = go.Figure(
    data = [
        go.Bar(x=dict['females']['disp_name_s'], y=dict['females']['freq'], name="females"),
    ],
    layout = {"xaxis": {"title": "verdicts"}, "yaxis": {"title": "frequency"}, "title": "verdict vs frequency (female petitioner advocates)"}
)
f1

## Unknown

In [17]:
f1 = go.Figure(
    data = [
        go.Bar(x=dict['undef']['disp_name_s'], y=dict['undef']['freq'], name="undef"),
#         go.Bar(x=dict['males']['disp_name_s'], y=dict['males']['freq'], name="males"),
#         go.Bar(x=dict['females']['disp_name_s'], y=dict['females']['freq'], name="females"),
    ],
    layout = {"xaxis": {"title": "verdicts"}, "yaxis": {"title": "frequency"}, "title": "verdict vs frequency (petitioner advocate gender unknown)"}
)
f1

## Males

In [18]:
f1 = go.Figure(
    data = [
#         go.Bar(x=dict['undef']['disp_name_s'], y=dict['undef']['freq'], name="undef"),
        go.Bar(x=dict['males']['disp_name_s'], y=dict['males']['freq'], name="males"),
#         go.Bar(x=dict['females']['disp_name_s'], y=dict['females']['freq'], name="females"),
    ],
    layout = {"xaxis": {"title": "verdicts"}, "yaxis": {"title": "frequency"}, "title": "verdict vs frequency (male petitioner advocates)"}
)
f1

### Surprisingly, more cases with male petitioner advocates ended in acquittal of the accused than cases with female petitioner advocates. However, this could also be due to the large gap in the number of cases in each category.
### The unknown category has comparable numbers for dismissed, acquitted, and convicted, with the first two still dominating, which shows the neglect these cases receive.
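Since the categories differ so much in size, raw counts are hard to compare across them. One possible way to account for this, sketched here on made-up toy data rather than the real cases, is to convert counts into within-category shares with `pd.crosstab(..., normalize="index")`. The column names `adv_gender` and `disp_name_s` in the toy frame are assumptions for illustration.

```python
import pandas as pd

# Made-up toy data: one row per case, with advocate category and verdict name.
toy = pd.DataFrame({
    "adv_gender": ["males", "males", "males", "males", "females", "females"],
    "disp_name_s": ["acquitted", "acquitted", "convicted", "dismissed",
                    "acquitted", "dismissed"],
})

# normalize="index" divides each row by its total, so every category sums to 1
# and the verdict shares become comparable despite different group sizes.
shares = pd.crosstab(toy["adv_gender"], toy["disp_name_s"], normalize="index")
print(shares)
```

In this toy example both categories have the same acquittal share even though the male category has twice as many acquittals in absolute terms, which is the effect the caveat above describes.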