# **Observation 1**
I will be observing the relationship between **gender and the tenure of judges with respect to their post**.  
I am studying 5 posts sampled from the entire judicial hierarchy. 
The posts, in decreasing order of power, are:
1. chief judicial magistrate
2. district and sessions court
3. civil court
4. civil judge senior division
5. civil judge junior division

## **Importing the necessary modules and csv files**

In [25]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt

judges= pd.read_csv("/kaggle/input/judges-clean/judges_clean.csv")


## **Seeing a small sample of data to better understand the column names, date formats and data**

In [26]:
judges.head()

Unnamed: 0,ddl_judge_id,state_code,dist_code,court_no,judge_position,female_judge,start_date,end_date
0,1,1,1,1,chief judicial magistrate,0 nonfemale,20-09-2013,20-02-2014
1,2,1,1,1,chief judicial magistrate,0 nonfemale,31-10-2013,20-02-2014
2,3,1,1,1,chief judicial magistrate,0 nonfemale,21-02-2014,31-05-2016
3,4,1,1,1,chief judicial magistrate,0 nonfemale,01-06-2016,06-06-2016
4,5,1,1,1,chief judicial magistrate,0 nonfemale,06-06-2016,07-07-2018
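
The sample shows that dates are stored as `dd-mm-yyyy` strings and that gender is a label such as `0 nonfemale`. A minimal sketch of parsing both explicitly is given below. It uses two rows copied from the sample plus one hypothetical row with a missing end date; the `sample` frame and the `is_female` column are names I introduce only for illustration.

```python
import pandas as pd

# Two rows copied from the sample above plus one hypothetical row
# (third entry) with a missing end date.
sample = pd.DataFrame({
    "female_judge": ["0 nonfemale", "0 nonfemale", "1 female"],
    "start_date": ["20-09-2013", "01-06-2016", "06-06-2016"],
    "end_date": ["20-02-2014", "06-06-2016", None],
})

# An explicit format removes any day/month ambiguity;
# errors="coerce" turns unparseable or missing values into NaT.
for col in ["start_date", "end_date"]:
    sample[col] = pd.to_datetime(sample[col], format="%d-%m-%Y", errors="coerce")

# Boolean flag derived from the text label (illustrative column name).
sample["is_female"] = sample["female_judge"].eq("1 female")

print(sample.dtypes)
```

With an explicit format, `01-06-2016` is always read as 1 June rather than 6 January.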


## **Creating a new Dataframe (handling only females at first)**
### with only the required data -- Judge ID, Judge position and Tenure



In [27]:
# choosing only the female judges (.copy() avoids SettingWithCopyWarning on the assignments below)
female_judges = judges[judges.female_judge == '1 female'][['ddl_judge_id','judge_position','start_date','end_date']].copy()
# extracting the calendar year of each date (dates are stored as dd-mm-yyyy)
female_judges['end'] = pd.DatetimeIndex(female_judges['end_date']).year
female_judges['start'] = pd.DatetimeIndex(female_judges['start_date']).year
# dropping every row that has a missing value (e.g. no end date)
female_judges = female_judges.dropna()
# female_judges['end'] = female_judges['end'].fillna(2022).astype(np.int64)
# creating a new column for tenure: end year minus start year, in whole calendar years
female_judges['tenure'] = female_judges['end'] - female_judges['start']
# dropping unnecessary columns
female_judges = female_judges.drop(columns=['end', 'end_date', 'start', 'start_date'])
display(female_judges)

Unnamed: 0,ddl_judge_id,judge_position,tenure
13,14,civil judge junior division,2.0
20,21,civil judge junior division,1.0
42,43,district and sessions court,1.0
84,85,civil judge senior division,2.0
93,94,chief judicial magistrate,2.0
...,...,...,...
98471,98472,criminal cases,0.0
98473,98474,criminal cases,9.0
98474,98475,criminal cases,1.0
98475,98476,criminal cases,1.0
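
Tenure here is the difference between calendar years, so a posting from September 2013 to February 2014 counts as 1 year. As a rough sketch of a finer-grained alternative (not what this notebook uses), the tenure can also be measured in days. The two rows are copied from the data sample; the column names `tenure_year_diff`, `tenure_days` and `tenure_years` are my own.

```python
import pandas as pd

# Two start/end pairs copied from the data sample shown earlier.
rows = pd.DataFrame({
    "start_date": ["20-09-2013", "21-02-2014"],
    "end_date": ["20-02-2014", "31-05-2016"],
})
start = pd.to_datetime(rows["start_date"], format="%d-%m-%Y")
end = pd.to_datetime(rows["end_date"], format="%d-%m-%Y")

# The notebook's measure: difference between calendar years.
rows["tenure_year_diff"] = end.dt.year - start.dt.year

# Alternative measure: elapsed days, converted to fractional years.
rows["tenure_days"] = (end - start).dt.days
rows["tenure_years"] = rows["tenure_days"] / 365.25

print(rows)
```

The first row spans 153 days (about 0.42 years), yet its year difference is 1, so short tenures can look longer under the year-difference measure.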


## **Finding the most common posts**

In [28]:
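# note: 'sum' adds up the ddl_judge_id values in each group; 'count' would give the number of judges
temp=female_judges.groupby(['judge_position']).agg({'ddl_judge_id': ['sum']})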
temp.columns = ['freq']
temp = temp.reset_index()
display(temp.sort_values('freq',ascending=False))

Unnamed: 0,judge_position,freq
92,district and sessions court,169865260
75,chief judicial magistrate,160826121
81,civil judge junior division,139283663
82,civil judge senior division,116844272
79,civil court,83840797
...,...,...
100,"fcj courts, kadapa",23757
13,2-additional civil judge junior division,23259
112,"jcj court, puttaparthy",22541
72,cantonment court,12782
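
The `freq` values above come from summing `ddl_judge_id`, so they grow with the size of the IDs as well as with the number of judges. A small sketch of the difference between summing IDs and counting rows is given below, using the first five female rows shown earlier; `toy`, `by_sum` and `by_count` are illustrative names.

```python
import pandas as pd

# The first five female rows from the output above.
toy = pd.DataFrame({
    "ddl_judge_id": [14, 21, 43, 85, 94],
    "judge_position": [
        "civil judge junior division",
        "civil judge junior division",
        "district and sessions court",
        "civil judge senior division",
        "chief judicial magistrate",
    ],
})

# Summing the identifiers: depends on how large the IDs happen to be.
by_sum = toy.groupby("judge_position")["ddl_judge_id"].sum()

# Counting rows: the number of judges recorded in each position.
by_count = toy.groupby("judge_position").size()

print(by_sum.sort_values(ascending=False))
print(by_count.sort_values(ascending=False))
```

Here the junior division has the most judges (2) but only the third-largest ID sum (35), so the two rankings need not agree.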


## **Grouping of attributes (position and tenure) to obtain count of judges**

In [29]:
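# note: 'sum' adds up the ddl_judge_id values in each group; 'count' would give the number of judges
grouped_multiple = female_judges.groupby(['judge_position', 'tenure']).agg({'ddl_judge_id': ['sum']})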
grouped_multiple.columns = ['freq']
grouped_multiple = grouped_multiple.reset_index()
display(grouped_multiple)

Unnamed: 0,judge_position,tenure,freq
0,1-additional chief judicial magistrate,0.0,170792
1,1-additional chief judicial magistrate,3.0,41305
2,1-additional civil judge and judicial magistrate,0.0,138062
3,1-additional civil judge and judicial magistrate,1.0,109727
4,1-additional civil judge and judicial magistrate,2.0,27432
...,...,...,...
574,wac,1.0,124301
575,wac,2.0,147637
576,wakf tribunal,0.0,35687
577,womens court,0.0,97490


## **Initialising a dictionary for desired plot**
### Keys are positions and values are lists of [tenure, frequency] pairs: the first number is the tenure and the second is the frequency

In [30]:
#initialising a dictionary "graphing"
graphing={}
for ind in grouped_multiple.index:
    graphing[grouped_multiple['judge_position'][ind]]=[]

for ind in grouped_multiple.index:
    graphing[grouped_multiple['judge_position'][ind]].append([grouped_multiple['tenure'][ind],grouped_multiple['freq'][ind]])

for ind in grouped_multiple.index:
    graphing[grouped_multiple['judge_position'][ind]]=np.array(graphing[grouped_multiple['judge_position'][ind]])
# display(graphing)
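
The three loops above can probably be condensed into a single dictionary comprehension. The sketch below uses a small hypothetical table in place of `grouped_multiple` (the `freq` values are made up) and the illustrative name `graphing_alt`.

```python
import pandas as pd

# Hypothetical stand-in for grouped_multiple; the freq values are invented.
grouped = pd.DataFrame({
    "judge_position": ["civil court", "civil court", "wac"],
    "tenure": [0.0, 1.0, 1.0],
    "freq": [5, 3, 2],
})

# One 2-column array per position: column 0 is tenure, column 1 is frequency.
graphing_alt = {
    position: rows[["tenure", "freq"]].to_numpy()
    for position, rows in grouped.groupby("judge_position")
}

print(graphing_alt["civil court"])
```

Each value has the same layout as in `graphing`, so slices such as `[:,0]` and `[:,1]` work unchanged.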

## **Collecting Data for Male judges**
### Using the same steps as above with different variable names, except that missing end dates are filled with 2022 instead of being dropped

In [33]:
# males

# choosing only the male judges (.copy() avoids SettingWithCopyWarning on the assignments below)
male_judges = judges[judges.female_judge == '0 nonfemale'][['ddl_judge_id','judge_position','start_date','end_date']].copy()
# extracting the calendar year of each date
male_judges['end'] = pd.DatetimeIndex(male_judges['end_date']).year
male_judges['start'] = pd.DatetimeIndex(male_judges['start_date']).year
# unlike the female cell, missing end dates are treated as 2022 instead of being dropped
male_judges['end'] = male_judges['end'].fillna(2022).astype(np.int64)
# tenure in whole calendar years
male_judges['tenure'] = male_judges['end'] - male_judges['start']
# dropping unnecessary columns
male_judges = male_judges.drop(columns=['end', 'end_date', 'start', 'start_date'])
display(male_judges)

Unnamed: 0,ddl_judge_id,judge_position,tenure
0,1,chief judicial magistrate,1
1,2,chief judicial magistrate,1
2,3,chief judicial magistrate,2
3,4,chief judicial magistrate,0
4,5,chief judicial magistrate,2
...,...,...,...
98467,98468,district and sessions court,0
98469,98470,district and sessions court,4
98470,98471,criminal cases,2
98472,98473,criminal cases,6


In [34]:
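# note: 'sum' adds up the ddl_judge_id values in each group; 'count' would give the number of judges
grouped_multiple_m = male_judges.groupby(['judge_position', 'tenure']).agg({'ddl_judge_id': ['sum']})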
grouped_multiple_m.columns = ['freq']
grouped_multiple_m = grouped_multiple_m.reset_index()
display(grouped_multiple_m)

Unnamed: 0,judge_position,tenure,freq
0,1-9th a.d.j,4,41980
1,1-additional additional district judge,1,43925
2,1-additional additional district judge,2,43895
3,1-additional chief judicial magistrate,0,1800043
4,1-additional chief judicial magistrate,1,1154156
...,...,...,...
1702,wakf tribunal,5,81063
1703,womens court,0,194985
1704,womens court,3,97495
1705,x additional special court for prevention of c...,0,103102


In [35]:
graphing_m={}
for ind in grouped_multiple_m.index:
    graphing_m[grouped_multiple_m['judge_position'][ind]]=[]

for ind in grouped_multiple_m.index:
    graphing_m[grouped_multiple_m['judge_position'][ind]].append([grouped_multiple_m['tenure'][ind],grouped_multiple_m['freq'][ind]])

for ind in grouped_multiple_m.index:
    graphing_m[grouped_multiple_m['judge_position'][ind]]=np.array(graphing_m[grouped_multiple_m['judge_position'][ind]])
# print(graphing_m)
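
Since the female and male cells repeat the same steps, one shared helper could keep the two groups on identical rules. The sketch below is one possible shape, not the code used in this notebook. `tenure_table` and `fill_end_year` are hypothetical names, the toy rows are partly invented, and unlike the female cell it only drops rows whose tenure is missing.

```python
import pandas as pd

def tenure_table(judges, gender_label, fill_end_year=None):
    """Return judge ID, position and year-difference tenure for one gender label."""
    cols = ["ddl_judge_id", "judge_position", "start_date", "end_date"]
    subset = judges.loc[judges["female_judge"] == gender_label, cols].copy()
    start = pd.to_datetime(subset["start_date"], format="%d-%m-%Y", errors="coerce").dt.year
    end = pd.to_datetime(subset["end_date"], format="%d-%m-%Y", errors="coerce").dt.year
    if fill_end_year is not None:
        # Treat a missing end date as "still serving" in the given year.
        end = end.fillna(fill_end_year)
    subset["tenure"] = end - start
    # Rows whose tenure is still missing are dropped.
    return subset.dropna(subset=["tenure"])[["ddl_judge_id", "judge_position", "tenure"]]

# Toy data: dates taken from the sample, gender labels and the missing end date invented.
toy = pd.DataFrame({
    "ddl_judge_id": [1, 2, 3],
    "judge_position": ["chief judicial magistrate"] * 3,
    "female_judge": ["0 nonfemale", "1 female", "1 female"],
    "start_date": ["20-09-2013", "31-10-2013", "21-02-2014"],
    "end_date": ["20-02-2014", None, "31-05-2016"],
})

females = tenure_table(toy, "1 female")
males = tenure_table(toy, "0 nonfemale")
females_filled = tenure_table(toy, "1 female", fill_end_year=2022)
print(females, males, females_filled, sep="\n\n")
```

Calling the helper with the same `fill_end_year` for both genders applies the same treatment of missing end dates to each group.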

## **Plotting using Plotly for females**

In [31]:
#alternative code for plotting
import plotly.graph_objects as go

x1 = graphing['chief judicial magistrate'][:,0]
y1 = graphing['chief judicial magistrate'][:,1]
x2 = graphing['district and sessions court'][:,0]
y2 = graphing['district and sessions court'][:,1]
x3 = graphing['civil court'][:,0]
y3 = graphing['civil court'][:,1]
x4 = graphing['civil judge senior division'][:,0]
y4 = graphing['civil judge senior division'][:,1]
x5 = graphing['civil judge junior division'][:,0]
y5 = graphing['civil judge junior division'][:,1]

f1 = go.Figure(
    data = [
        go.Scatter(x=x1, y=y1, name="CJM"),
        go.Scatter(x=x2, y=y2, name="DnSC"),
        go.Scatter(x=x3, y=y3, name="CC"),
        go.Scatter(x=x4, y=y4, name="Senior CJ"),
        go.Scatter(x=x5, y=y5, name="Junior CJ"),
    ],
    layout = {"xaxis": {"title": "tenure"}, "yaxis": {"title": "frequency"}, "title": "tenure vs frequency of female judges"}
)
f1

## **Plotting using Plotly for males**

In [39]:
#alternative code for plotting
import plotly.graph_objects as go

x1 = graphing_m['chief judicial magistrate'][:,0]
y1 = graphing_m['chief judicial magistrate'][:,1]
x2 = graphing_m['district and sessions court'][:,0]
y2 = graphing_m['district and sessions court'][:,1]
x3 = graphing_m['civil court'][:,0]
y3 = graphing_m['civil court'][:,1]
x4 = graphing_m['civil judge senior division'][:,0]
y4 = graphing_m['civil judge senior division'][:,1]
x5 = graphing_m['civil judge junior division'][:,0]
y5 = graphing_m['civil judge junior division'][:,1]

f1 = go.Figure(
    data = [
        go.Scatter(x=x1, y=y1, name="CJM"),
        go.Scatter(x=x2, y=y2, name="DnSC"),
        go.Scatter(x=x3, y=y3, name="CC"),
        go.Scatter(x=x4, y=y4, name="Senior CJ"),
        go.Scatter(x=x5, y=y5, name="Junior CJ"),
    ],
    layout = {"xaxis": {"title": "tenure"}, "yaxis": {"title": "frequency"}, "title": "tenure vs frequency of male judges"}
)
f1

## **Conclusion**
### For **females**, we can see that the frequency falls off with tenure fastest for the higher positions (District and Sessions Court, Chief Judicial Magistrate), at a moderate rate for Civil Judges, and slowest for Civil Courts. So the drop in a female judge's tenure is related to her post.
### However, for **males**, apart from the District and Sessions Court, the drop is slower and spread over a wider range of tenures. 
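
Because the positions differ a lot in size, comparing how fast the curves drop may be easier once each position's frequencies are turned into shares that sum to 1. The sketch below shows one way to do that; the numbers are invented for illustration and `share` is my own column name.

```python
import pandas as pd

# Invented frequencies for two positions, purely to illustrate the normalisation.
counts = pd.DataFrame({
    "judge_position": ["civil court"] * 3 + ["chief judicial magistrate"] * 3,
    "tenure": [0, 1, 2, 0, 1, 2],
    "freq": [50, 30, 20, 80, 15, 5],
})

# Divide each frequency by the total for its own position.
position_total = counts.groupby("judge_position")["freq"].transform("sum")
counts["share"] = counts["freq"] / position_total

print(counts)
```

In this toy example the second position loses most of its share after tenure 0, which is the kind of steep drop described above.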

## **Plotting men vs women using Plotly**
### Plotting *tenure vs frequency* for each position separately with males and females being on the same graph for efficient comparison

In [37]:
positions = ["chief judicial magistrate", "district and sessions court", "civil court", "civil judge senior division", "civil judge junior division"]

for pos in positions:
    # column 0 holds the tenure, column 1 the frequency
    x1 = graphing[pos][:,0]
    y1 = graphing[pos][:,1]
    x2 = graphing_m[pos][:,0]
    y2 = graphing_m[pos][:,1]
    fig = go.Figure(
        data = [
            go.Scatter(x=x1, y=y1, name="females"),
            go.Scatter(x=x2, y=y2, name="males"),
        ],
        layout = {"xaxis": {"title": "tenure"}, "yaxis": {"title": "frequency"}, "title": pos}
    )
    fig.show()

# **Conclusions**
## Apart from the blatant gap in frequencies, the female curve drops close to zero almost 3 years before the male curve in almost every case, and the later tenure years (8+ years) are generally dominated by males. These observations show the gender gap in the judiciary.
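
The point where a curve "drops close to zero" is read off the plots by eye. One possible way to put a number on it is to take the longest tenure that still holds at least a chosen share of the judges. The sketch below assumes a 1% cut-off and uses invented counts; `last_tenure_above` and `min_share` are hypothetical names.

```python
import numpy as np

def last_tenure_above(tenures, counts, min_share=0.01):
    """Longest tenure whose share of all judges is at least min_share."""
    tenures = np.asarray(tenures, dtype=float)
    counts = np.asarray(counts, dtype=float)
    shares = counts / counts.sum()
    return tenures[shares >= min_share].max()

# Invented counts per tenure year, for illustration only.
female_cutoff = last_tenure_above([0, 1, 2, 3, 4], [600, 300, 95, 5, 0])
male_cutoff = last_tenure_above([0, 1, 2, 3, 4, 5, 6], [500, 250, 120, 70, 40, 15, 5])

print(female_cutoff, male_cutoff, male_cutoff - female_cutoff)
```

Applied to the real per-position counts, the difference between the two cut-offs would give a figure to set against the "almost 3 years" estimate.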
