# *Data Preprocessing*

*The data has been preprocessed to extract the subset of features needed for analysis.*

*The selected features include:*

1. State Code
2. District Code
3. Court Number
4. Judge Position
5. Defendant's Gender
6. Gender of Defendant's Advocate
7. Gender of Petitioner's Advocate
8. Case Type
9. Case Purpose
10. Case Completion Time
11. Judge's Gender
12. Judge's Experience

*By focusing on these specific features, we have obtained a subset of data that can be used for further analysis and classification tasks.*

*To obtain enough "acquitted" and "convicted" cases, a subset of cases from 2010 to 2015 has been selected. This time range was chosen so that both dispositions are represented by a sufficient number of cases, which limits class imbalance in the later classification task.*

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# You can write up to 20GB to the current directory (/kaggle/working/) that gets preserved as output when you create a version using "Save & Run All" 
# You can also write temporary files to /kaggle/temp/, but they won't be saved outside of the current session

/kaggle/input/court-data/judges_clean/judges_clean.csv
/kaggle/input/court-data/acts_sections/acts_sections.csv
/kaggle/input/court-data/cases/cases/cases_2015.csv
/kaggle/input/court-data/cases/cases/cases_2012.csv
/kaggle/input/court-data/cases/cases/cases_2018.csv
/kaggle/input/court-data/cases/cases/cases_2013.csv
/kaggle/input/court-data/cases/cases/cases_2017.csv
/kaggle/input/court-data/cases/cases/cases_2010.csv
/kaggle/input/court-data/cases/cases/cases_2014.csv
/kaggle/input/court-data/cases/cases/cases_2016.csv
/kaggle/input/court-data/cases/cases/cases_2011.csv
/kaggle/input/court-data/keys/keys/type_name_key.csv
/kaggle/input/court-data/keys/keys/cases_district_key.csv
/kaggle/input/court-data/keys/keys/act_key.csv
/kaggle/input/court-data/keys/keys/disp_name_key.csv
/kaggle/input/court-data/keys/keys/purpose_name_key.csv
/kaggle/input/court-data/keys/keys/cases_state_key.csv
/kaggle/input/court-data/keys/keys/section_key.csv
/kaggle/input/court-data/keys/keys/cases_court_

## *Cases from 2010-2015*

In [2]:
cases_2010 = pd.read_csv("/kaggle/input/court-data/cases/cases/cases_2010.csv")
cases_2010.drop(columns=["year", "cino", "female_petitioner", "date_first_list", "date_last_list", "date_next_list"], inplace=True)
cases_2010.head()

Unnamed: 0,ddl_case_id,state_code,dist_code,court_no,judge_position,female_defendant,female_adv_def,female_adv_pet,type_name,purpose_name,disp_name,date_of_filing,date_of_decision
0,01-01-01-200308002162010,1,1,1,chief judicial magistrate,0 male,0,-9998,790,5228.0,42,2010-12-13,2011-06-19
1,01-01-01-200707000172010,1,1,1,chief judicial magistrate,-9998 unclear,-9999,0,2587,3627.0,42,2010-02-25,2010-11-21
2,01-01-01-200707000182010,1,1,1,chief judicial magistrate,-9998 unclear,-9999,0,2587,3627.0,42,2010-02-25,2010-11-21
3,01-01-01-200707000192010,1,1,1,chief judicial magistrate,-9998 unclear,-9999,0,2587,3627.0,42,2010-02-25,2010-11-21
4,01-01-01-200707000202010,1,1,1,chief judicial magistrate,-9998 unclear,-9999,0,2587,3627.0,42,2010-02-25,2010-11-21


In [3]:
cases_2011 = pd.read_csv("/kaggle/input/court-data/cases/cases/cases_2011.csv")
cases_2011.drop(columns=["year", "cino", "female_petitioner", "date_first_list", "date_last_list", "date_next_list"], inplace=True)
cases_2011.drop(cases_2011[cases_2011["disp_name"] != 19].index, inplace=True)
cases_2011.head()

Unnamed: 0,ddl_case_id,state_code,dist_code,court_no,judge_position,female_defendant,female_adv_def,female_adv_pet,type_name,purpose_name,disp_name,date_of_filing,date_of_decision
300,01-01-01-203008000702011,1,1,1,chief judicial magistrate,-9998 unclear,-9999,0,4628,677.0,19,2011-03-10,2013-06-12
468,01-01-01-203008002472011,1,1,1,chief judicial magistrate,-9998 unclear,0,-9998,4628,4126.0,19,2011-09-05,2015-07-06
474,01-01-01-203008002532011,1,1,1,chief judicial magistrate,0 male,0,0,4628,677.0,19,2011-09-20,2012-01-25
524,01-01-01-203008003042011,1,1,1,chief judicial magistrate,0 male,0,-9998,4628,4126.0,19,2011-12-14,2016-02-25
547,01-01-01-203407016602011,1,1,1,chief judicial magistrate,0 male,-9998,0,5039,4343.0,19,2011-08-25,2014-02-07


In [4]:
cases_2012 = pd.read_csv("/kaggle/input/court-data/cases/cases/cases_2012.csv")
cases_2012.drop(columns=["year", "cino", "female_petitioner", "date_first_list", "date_last_list", "date_next_list"], inplace=True)
cases_2012.drop(cases_2012[cases_2012["disp_name"] != 19].index, inplace=True)
cases_2012.head()

Unnamed: 0,ddl_case_id,state_code,dist_code,court_no,judge_position,female_defendant,female_adv_def,female_adv_pet,type_name,purpose_name,disp_name,date_of_filing,date_of_decision
357,01-01-01-203008000982012,1,1,1,chief judicial magistrate,0 male,0,-9998,5036,737.0,19,2012-05-19,2014-07-04
375,01-01-01-203008001182012,1,1,1,chief judicial magistrate,0 male,-9998,0,5036,4501.0,19,2012-07-03,2015-04-29
411,01-01-01-203008001562012,1,1,1,chief judicial magistrate,0 male,0,0,5036,4501.0,19,2012-07-25,2015-06-04
457,01-01-01-203008002032012,1,1,1,chief judicial magistrate,0 male,0,0,5036,4501.0,19,2012-09-18,2014-03-14
516,01-01-01-203308002662012,1,1,1,chief judicial magistrate,0 male,-9999,0,3292,5383.0,19,2012-02-17,2012-02-17


In [5]:
cases_2013 = pd.read_csv("/kaggle/input/court-data/cases/cases/cases_2013.csv")
cases_2013.drop(columns=["year", "cino", "female_petitioner", "date_first_list", "date_last_list", "date_next_list"], inplace=True)
cases_2013.drop(cases_2013[cases_2013["disp_name"] != 19].index, inplace=True)
cases_2013.head()

Unnamed: 0,ddl_case_id,state_code,dist_code,court_no,judge_position,female_defendant,female_adv_def,female_adv_pet,type_name,purpose_name,disp_name,date_of_filing,date_of_decision
356,01-01-01-203008000612013,1,1,1,chief judicial magistrate,0 male,1,-9998,5238.0,4512.0,19,2013-04-03,2017-09-19
364,01-01-01-203008000692013,1,1,1,chief judicial magistrate,0 male,0,-9998,5238.0,754.0,19,2013-04-16,2016-12-27
375,01-01-01-203008000802013,1,1,1,chief judicial magistrate,0 male,-9999,-9998,5238.0,754.0,19,2013-05-07,2016-12-27
464,01-01-01-203008001712013,1,1,1,chief judicial magistrate,-9998 unclear,0,-9998,5238.0,4085.0,19,2013-08-30,2015-05-27
553,01-01-01-203008007322013,1,1,1,chief judicial magistrate,0 male,-9999,0,5238.0,5148.0,19,2013-09-20,2013-09-20


In [6]:
cases_2014 = pd.read_csv("/kaggle/input/court-data/cases/cases/cases_2014.csv")
cases_2014.drop(columns=["year", "cino", "female_petitioner", "date_first_list", "date_last_list", "date_next_list"], inplace=True)
cases_2014.drop(cases_2014[cases_2014["disp_name"] != 19].index, inplace=True)
cases_2014.head()

Unnamed: 0,ddl_case_id,state_code,dist_code,court_no,judge_position,female_defendant,female_adv_def,female_adv_pet,type_name,purpose_name,disp_name,date_of_filing,date_of_decision
123,01-01-01-201908001282014,1,1,1,chief judicial magistrate,-9998 unclear,-9999,0,1907.0,809.0,19,2014-04-25,2014-05-20
447,01-01-01-203008000092014,1,1,1,chief judicial magistrate,0 male,0,-9998,5274.0,4823.0,19,2014-01-03,2015-12-05
476,01-01-01-203008000382014,1,1,1,chief judicial magistrate,0 male,0,0,5274.0,4823.0,19,2014-02-13,2014-10-08
479,01-01-01-203008000412014,1,1,1,chief judicial magistrate,0 male,0,-9998,5274.0,809.0,19,2014-02-21,2015-10-14
494,01-01-01-203008000562014,1,1,1,chief judicial magistrate,0 male,0,-9998,5274.0,809.0,19,2014-03-06,2017-03-30


In [7]:
cases_2015 = pd.read_csv("/kaggle/input/court-data/cases/cases/cases_2015.csv")
cases_2015.drop(columns=["year", "cino", "female_petitioner", "date_first_list", "date_last_list", "date_next_list"], inplace=True)
cases_2015.drop(cases_2015[cases_2015["disp_name"] != 19].index, inplace=True)
cases_2015.head()

Unnamed: 0,ddl_case_id,state_code,dist_code,court_no,judge_position,female_defendant,female_adv_def,female_adv_pet,type_name,purpose_name,disp_name,date_of_filing,date_of_decision
525,01-01-01-203008000092015,1,1,1,chief judicial magistrate,1 female,0,0,5506.0,922.0,19,2015-01-12,2015-03-07
538,01-01-01-203008000222015,1,1,1,chief judicial magistrate,1 female,-9999,0,5506.0,5446.0,19,2015-01-23,2015-06-30
575,01-01-01-203008000592015,1,1,1,chief judicial magistrate,0 male,0,-9998,5506.0,5446.0,19,2015-03-16,2019-01-22
593,01-01-01-203008000772015,1,1,1,chief judicial magistrate,0 male,0,-9998,5506.0,5446.0,19,2015-04-13,2018-10-23
606,01-01-01-203008000902015,1,1,1,chief judicial magistrate,-9998 unclear,1,-9998,5506.0,5446.0,19,2015-05-25,2017-07-27
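
*The six loading cells above repeat the same steps: read one yearly file, drop the unused columns and, for 2011 to 2015, keep only rows with `disp_name == 19`. A minimal sketch of how this could be factored into one helper is shown below. The names `prepare_year` and `UNUSED_COLUMNS` and the small `demo` frame are hypothetical and not part of the original notebook.*

```python
import pandas as pd

# Columns the notebook drops from every yearly file.
UNUSED_COLUMNS = ["year", "cino", "female_petitioner",
                  "date_first_list", "date_last_list", "date_next_list"]

def prepare_year(df, keep_disp=None):
    """Drop unused columns; optionally keep only one disposition code."""
    out = df.drop(columns=UNUSED_COLUMNS)
    if keep_disp is not None:
        out = out[out["disp_name"] == keep_disp]
    return out

# Tiny stand-in for one yearly CSV (made-up values, for illustration only).
demo = pd.DataFrame({
    "ddl_case_id": ["a", "b", "c"],
    "year": [2011, 2011, 2011],
    "cino": ["x", "y", "z"],
    "female_petitioner": [0, 1, 0],
    "date_first_list": ["2011-01-01"] * 3,
    "date_last_list": ["2011-02-01"] * 3,
    "date_next_list": ["2011-03-01"] * 3,
    "disp_name": [19, 42, 19],
})

filtered = prepare_year(demo, keep_disp=19)  # as done for 2011 to 2015
unfiltered = prepare_year(demo)              # as done for 2010
print(filtered)
```

*With the real data, the helper would be applied to the result of `pd.read_csv` for each yearly file.*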


*Concatenating the data frames from all six years*

In [8]:
cases_6yrs = pd.concat([cases_2010, cases_2011, cases_2012, cases_2013, cases_2014, cases_2015], axis=0)
cases_6yrs.head()

Unnamed: 0,ddl_case_id,state_code,dist_code,court_no,judge_position,female_defendant,female_adv_def,female_adv_pet,type_name,purpose_name,disp_name,date_of_filing,date_of_decision
0,01-01-01-200308002162010,1,1,1,chief judicial magistrate,0 male,0,-9998,790.0,5228.0,42,2010-12-13,2011-06-19
1,01-01-01-200707000172010,1,1,1,chief judicial magistrate,-9998 unclear,-9999,0,2587.0,3627.0,42,2010-02-25,2010-11-21
2,01-01-01-200707000182010,1,1,1,chief judicial magistrate,-9998 unclear,-9999,0,2587.0,3627.0,42,2010-02-25,2010-11-21
3,01-01-01-200707000192010,1,1,1,chief judicial magistrate,-9998 unclear,-9999,0,2587.0,3627.0,42,2010-02-25,2010-11-21
4,01-01-01-200707000202010,1,1,1,chief judicial magistrate,-9998 unclear,-9999,0,2587.0,3627.0,42,2010-02-25,2010-11-21
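
*With its default arguments, `pd.concat` keeps the row labels of each input frame, so the same label may appear once per year in the combined frame. This matters for any later `drop(df[condition].index)` call, because `drop` removes every row that carries a matching label, not only the rows that met the condition. The sketch below illustrates the effect on made-up data. Whether labels actually repeat in `cases_6yrs` depends on the yearly files, so this is a caution rather than a confirmed defect.*

```python
import pandas as pd

# Two small frames whose row labels overlap, like two yearly files.
a = pd.DataFrame({"case_duration": [10, -5]}, index=[0, 1])
b = pd.DataFrame({"case_duration": [7, 30]}, index=[0, 1])
combined = pd.concat([a, b], axis=0)  # row labels are now 0, 1, 0, 1

# Label-based drop: removes every row labelled 1,
# including the valid 30-day row from the second frame.
by_label = combined.drop(combined[combined["case_duration"] <= 0].index)

# Boolean mask: removes only the rows that meet the condition.
by_mask = combined[~(combined["case_duration"] <= 0)]

print(len(by_label), len(by_mask))
```

*Passing `ignore_index=True` to `pd.concat`, or filtering with a boolean mask as in `by_mask`, avoids the problem.*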


In [9]:
import gc
import warnings

warnings.simplefilter('ignore')

# Free the per-year frames now that they are combined in cases_6yrs
del cases_2010, cases_2011, cases_2012, cases_2013, cases_2014, cases_2015
gc.collect()

0

*To ensure a clean and accurate prediction process, any instances with missing values (NaN) in the dataset have been dropped.*

In [10]:
cases_6yrs.dropna(inplace=True)

*The case filing and decision dates are converted to datetime columns so that the duration between them can be computed. Values that cannot be parsed become `NaT` because of `errors='coerce'`.*

In [11]:
cases_6yrs['date_of_decision'] = pd.to_datetime(cases_6yrs['date_of_decision'], errors='coerce')
cases_6yrs['date_of_filing'] = pd.to_datetime(cases_6yrs['date_of_filing'], errors='coerce')
cases_6yrs.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 4930096 entries, 0 to 10475841
Data columns (total 13 columns):
 #   Column            Dtype         
---  ------            -----         
 0   ddl_case_id       object        
 1   state_code        int64         
 2   dist_code         int64         
 3   court_no          int64         
 4   judge_position    object        
 5   female_defendant  object        
 6   female_adv_def    int64         
 7   female_adv_pet    int64         
 8   type_name         float64       
 9   purpose_name      float64       
 10  disp_name         int64         
 11  date_of_filing    datetime64[ns]
 12  date_of_decision  datetime64[ns]
dtypes: datetime64[ns](2), float64(2), int64(6), object(3)
memory usage: 526.6+ MB


In [12]:
cases_6yrs['case_duration'] = (cases_6yrs['date_of_decision'] - cases_6yrs['date_of_filing']).dt.days
cases_6yrs.drop(columns=['date_of_filing'], inplace=True)
cases_6yrs.drop(cases_6yrs[cases_6yrs['case_duration'] <= 0].index, inplace=True)

cases_6yrs.head()

Unnamed: 0,ddl_case_id,state_code,dist_code,court_no,judge_position,female_defendant,female_adv_def,female_adv_pet,type_name,purpose_name,disp_name,date_of_decision,case_duration
0,01-01-01-200308002162010,1,1,1,chief judicial magistrate,0 male,0,-9998,790.0,5228.0,42,2011-06-19,188.0
1,01-01-01-200707000172010,1,1,1,chief judicial magistrate,-9998 unclear,-9999,0,2587.0,3627.0,42,2010-11-21,269.0
2,01-01-01-200707000182010,1,1,1,chief judicial magistrate,-9998 unclear,-9999,0,2587.0,3627.0,42,2010-11-21,269.0
3,01-01-01-200707000192010,1,1,1,chief judicial magistrate,-9998 unclear,-9999,0,2587.0,3627.0,42,2010-11-21,269.0
4,01-01-01-200707000202010,1,1,1,chief judicial magistrate,-9998 unclear,-9999,0,2587.0,3627.0,42,2010-11-21,269.0
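
*The duration computation above can be checked on a few rows. The sketch below reuses the filing and decision dates of the first two rows shown in the 2010 preview and adds one deliberately invalid date (made up) to show how `errors="coerce"` propagates into a missing duration. The frame name `demo` is hypothetical.*

```python
import pandas as pd

demo = pd.DataFrame({
    "date_of_filing": ["2010-12-13", "2010-02-25", "not a date"],
    "date_of_decision": ["2011-06-19", "2010-11-21", "2011-01-01"],
})

# Unparseable strings become NaT instead of raising an error.
for col in ["date_of_filing", "date_of_decision"]:
    demo[col] = pd.to_datetime(demo[col], errors="coerce")

# Subtracting two datetime columns gives timedeltas; .dt.days extracts whole days.
demo["case_duration"] = (demo["date_of_decision"] - demo["date_of_filing"]).dt.days
print(demo)
```

*The first two durations match the values in the table above (188 and 269 days). The third is `NaN`, and a comparison such as `case_duration <= 0` is false for `NaN`, so such rows are not removed by that filter.*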


*Rows with an unknown gender for the defendant or for either advocate are removed, and the `female_defendant` column is then converted from string labels to integers.*

In [13]:
cases_6yrs.drop(cases_6yrs[(cases_6yrs["female_adv_def"] != 0) & (cases_6yrs["female_adv_def"] != 1)].index, inplace=True)
cases_6yrs.drop(cases_6yrs[(cases_6yrs["female_adv_pet"] != 0) & (cases_6yrs["female_adv_pet"] != 1)].index, inplace=True)

cases_6yrs.drop(cases_6yrs[(cases_6yrs["female_defendant"] != "0 male") & (cases_6yrs["female_defendant"] != "1 female")].index, inplace=True)
cases_6yrs["female_defendant"] = np.where(cases_6yrs["female_defendant"] == "1 female", 1, 0)
cases_6yrs

Unnamed: 0,ddl_case_id,state_code,dist_code,court_no,judge_position,female_defendant,female_adv_def,female_adv_pet,type_name,purpose_name,disp_name,date_of_decision,case_duration
34,01-01-01-201908000292010,1,1,1,chief judicial magistrate,0,0,1,1429.0,3006.0,30,2010-12-01,279.0
41,01-01-01-201908000382010,1,1,1,chief judicial magistrate,0,0,1,1429.0,3734.0,25,2011-10-15,565.0
54,01-01-01-201908000512010,1,1,1,chief judicial magistrate,0,0,0,1429.0,3280.0,30,2013-08-13,1216.0
71,01-01-01-201908000702010,1,1,1,chief judicial magistrate,0,0,1,1429.0,509.0,25,2015-07-27,1886.0
88,01-01-01-201908000892010,1,1,1,chief judicial magistrate,0,0,0,1429.0,1963.0,25,2012-04-09,655.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...
10474642,32-01-01-208710000082015,32,1,1,principal district and sessions court,0,0,0,5956.0,5446.0,19,2016-08-11,400.0
10474643,32-01-01-208710000092015,32,1,1,principal district and sessions court,0,0,0,5956.0,922.0,19,2017-12-19,897.0
10474649,32-01-01-208710000152015,32,1,1,principal district and sessions court,0,0,0,5956.0,5446.0,19,2016-05-10,265.0
10474657,32-01-01-208710000232015,32,1,1,principal district and sessions court,0,0,0,5956.0,5446.0,19,2016-09-09,270.0
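
*The same gender filter can also be written with boolean masks and `isin`, which does not depend on row labels. This is a sketch on made-up rows that only reuses the column names and the value codes visible in the previews above (`0`, `1`, `-9998`, `-9999`, `"0 male"`, `"1 female"`, `"-9998 unclear"`).*

```python
import numpy as np
import pandas as pd

demo = pd.DataFrame({
    "female_defendant": ["0 male", "-9998 unclear", "1 female", "0 male"],
    "female_adv_def": [0, 0, 1, -9999],
    "female_adv_pet": [1, 0, 0, 0],
})

# Keep only rows where all three gender fields hold a known value.
known = (
    demo["female_adv_def"].isin([0, 1])
    & demo["female_adv_pet"].isin([0, 1])
    & demo["female_defendant"].isin(["0 male", "1 female"])
)
clean = demo[known].copy()

# Encode the string label as an integer flag: 1 = female, 0 = male.
clean["female_defendant"] = np.where(clean["female_defendant"] == "1 female", 1, 0)
print(clean)
```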


## *Judges and Cases Relational data*

In [14]:
judges_keys = pd.read_csv("/kaggle/input/court-data/keys/keys/judge_case_merge_key.csv")
judges_keys.drop(columns=["ddl_filing_judge_id"], inplace=True)

judges_keys.head()

Unnamed: 0,ddl_case_id,ddl_decision_judge_id
0,01-01-01-201900000022018,5.0
1,01-01-01-201900000032017,5.0
2,01-01-01-201900000032018,94.0
3,01-01-01-201900000042016,5.0
4,01-01-01-201900000042018,156.0


*The cases are merged with the judge keys on the common key `ddl_case_id`. Rows with missing values after the merge, including cases without a decision judge id, are then dropped.*

In [15]:
cases = pd.merge(cases_6yrs, judges_keys, on='ddl_case_id', how='left')
cases.dropna(inplace=True)
cases

Unnamed: 0,ddl_case_id,state_code,dist_code,court_no,judge_position,female_defendant,female_adv_def,female_adv_pet,type_name,purpose_name,disp_name,date_of_decision,case_duration,ddl_decision_judge_id
2,01-01-01-201908000512010,1,1,1,chief judicial magistrate,0,0,0,1429.0,3280.0,30,2013-08-13,1216.0,51.0
5,01-01-01-201908000952010,1,1,1,chief judicial magistrate,0,0,0,1429.0,1963.0,22,2012-08-07,769.0,50.0
6,01-01-01-201908001112010,1,1,1,chief judicial magistrate,0,1,0,1429.0,3280.0,30,2013-09-26,1164.0,52.0
7,01-01-01-201908001122010,1,1,1,chief judicial magistrate,0,0,0,1429.0,3734.0,51,2014-06-17,1426.0,3.0
8,01-01-01-201908001162010,1,1,1,chief judicial magistrate,0,0,0,1429.0,1963.0,25,2013-02-12,932.0,50.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
425167,29-08-17-202100002432015,29,8,17,additional jmfc,0,0,0,956.0,3258.0,19,2018-05-29,1055.0,97561.0
425176,29-10-01-202100000022015,29,10,1,district and sessions court,0,0,0,956.0,5446.0,19,2018-02-23,1122.0,97951.0
425177,29-10-01-203400000572015,29,10,1,district and sessions court,0,0,0,6043.0,5446.0,19,2017-01-10,699.0,97951.0
425178,29-10-01-203400000922015,29,10,1,district and sessions court,0,0,0,6043.0,5446.0,19,2017-02-10,690.0,97951.0
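
*A left merge followed by `dropna` keeps only the cases for which the key table supplies a non-missing judge id. The sketch below shows this on made-up rows. `validate="many_to_one"` is an optional pandas check, not used in the original cell, that raises an error if a case id appears more than once in the key table.*

```python
import pandas as pd

cases_demo = pd.DataFrame({
    "ddl_case_id": ["c1", "c2", "c3"],
    "case_duration": [10, 20, 30],
})
keys_demo = pd.DataFrame({
    "ddl_case_id": ["c1", "c3", "c4"],
    "ddl_decision_judge_id": [5.0, None, 7.0],
})

# Left merge keeps every case; cases without a key row get NaN for the judge id.
merged = pd.merge(cases_demo, keys_demo, on="ddl_case_id",
                  how="left", validate="many_to_one")

# dropna removes c2 (no key row) and c3 (key row with a missing judge id).
kept = merged.dropna()
print(kept)
```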


In [16]:
del cases_6yrs
gc.collect()

0

## *Judges data*

In [17]:
judges = pd.read_csv("/kaggle/input/court-data/judges_clean/judges_clean.csv")
judges.drop(columns=["state_code", "dist_code", "court_no", "judge_position", "end_date"], inplace=True)
judges.head()

Unnamed: 0,ddl_judge_id,female_judge,start_date
0,1,0 nonfemale,20-09-2013
1,2,0 nonfemale,31-10-2013
2,3,0 nonfemale,21-02-2014
3,4,0 nonfemale,01-06-2016
4,5,0 nonfemale,06-06-2016


*Merging the two dataframes on the judge id (`ddl_decision_judge_id` in the cases data, `ddl_judge_id` in the judges data).*

In [18]:
cases = pd.merge(cases, judges, how='left', left_on='ddl_decision_judge_id', right_on='ddl_judge_id')
cases.head()

Unnamed: 0,ddl_case_id,state_code,dist_code,court_no,judge_position,female_defendant,female_adv_def,female_adv_pet,type_name,purpose_name,disp_name,date_of_decision,case_duration,ddl_decision_judge_id,ddl_judge_id,female_judge,start_date
0,01-01-01-201908000512010,1,1,1,chief judicial magistrate,0,0,0,1429.0,3280.0,30,2013-08-13,1216.0,51.0,51,0 nonfemale,10-06-2013
1,01-01-01-201908000952010,1,1,1,chief judicial magistrate,0,0,0,1429.0,1963.0,22,2012-08-07,769.0,50.0,50,0 nonfemale,01-10-2011
2,01-01-01-201908001112010,1,1,1,chief judicial magistrate,0,1,0,1429.0,3280.0,30,2013-09-26,1164.0,52.0,52,0 nonfemale,24-09-2013
3,01-01-01-201908001122010,1,1,1,chief judicial magistrate,0,0,0,1429.0,3734.0,51,2014-06-17,1426.0,3.0,3,0 nonfemale,21-02-2014
4,01-01-01-201908001162010,1,1,1,chief judicial magistrate,0,0,0,1429.0,1963.0,25,2013-02-12,932.0,50.0,50,0 nonfemale,01-10-2011


*The judge's start date is converted to datetime so that the judge's experience can be computed as the number of days between the start date and the date of decision.*

In [19]:
cases['start_date'] = pd.to_datetime(cases['start_date'], errors='coerce')
cases['judge_experience'] = (cases['date_of_decision'] - cases['start_date']).dt.days
cases.drop(columns=['date_of_decision', 'start_date'], inplace=True)
cases.drop(cases[cases['judge_experience'] <= 0].index, inplace=True)

cases.head()

Unnamed: 0,ddl_case_id,state_code,dist_code,court_no,judge_position,female_defendant,female_adv_def,female_adv_pet,type_name,purpose_name,disp_name,case_duration,ddl_decision_judge_id,ddl_judge_id,female_judge,judge_experience
1,01-01-01-201908000952010,1,1,1,chief judicial magistrate,0,0,0,1429.0,1963.0,22,769.0,50.0,50,0 nonfemale,575
2,01-01-01-201908001112010,1,1,1,chief judicial magistrate,0,1,0,1429.0,3280.0,30,1164.0,52.0,52,0 nonfemale,2
3,01-01-01-201908001122010,1,1,1,chief judicial magistrate,0,0,0,1429.0,3734.0,51,1426.0,3.0,3,0 nonfemale,116
4,01-01-01-201908001162010,1,1,1,chief judicial magistrate,0,0,0,1429.0,1963.0,25,932.0,50.0,50,0 nonfemale,764
5,01-01-01-201908001252010,1,1,1,chief judicial magistrate,0,0,0,1429.0,3280.0,30,806.0,50.0,50,0 nonfemale,652
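
*The `start_date` strings in the judges preview look day-first (for example `20-09-2013` and `31-10-2013`). Without an explicit format, `pd.to_datetime` may read ambiguous strings such as `01-10-2011` month-first. The value 575 in the first row above equals the number of days from 10 January 2011 to 2012-08-07, which suggests this happened here. The sketch below assumes the column is in DD-MM-YYYY format and parses it explicitly. That assumption should be confirmed against the dataset documentation before the experience values are relied on.*

```python
import pandas as pd

# Start and decision dates of two rows from the merged preview above.
start = pd.Series(["01-10-2011", "24-09-2013"])
decision = pd.to_datetime(pd.Series(["2012-08-07", "2013-09-26"]))

# Explicit day-first format (assumption: start_date is DD-MM-YYYY).
parsed = pd.to_datetime(start, format="%d-%m-%Y", errors="coerce")

experience = (decision - parsed).dt.days
print(experience.tolist())
```

*Under this assumption the first judge has 311 days of experience at the decision date rather than 575. The second value is unchanged because `24-09-2013` can only be read day-first.*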


In [20]:
cases.drop(cases[(cases["female_judge"] != "0 nonfemale") & (cases["female_judge"] != "1 female")].index, inplace=True)
cases["female_judge"] = np.where(cases["female_judge"] == "1 female", 1, 0)

cases

Unnamed: 0,ddl_case_id,state_code,dist_code,court_no,judge_position,female_defendant,female_adv_def,female_adv_pet,type_name,purpose_name,disp_name,case_duration,ddl_decision_judge_id,ddl_judge_id,female_judge,judge_experience
1,01-01-01-201908000952010,1,1,1,chief judicial magistrate,0,0,0,1429.0,1963.0,22,769.0,50.0,50,0,575
2,01-01-01-201908001112010,1,1,1,chief judicial magistrate,0,1,0,1429.0,3280.0,30,1164.0,52.0,52,0,2
3,01-01-01-201908001122010,1,1,1,chief judicial magistrate,0,0,0,1429.0,3734.0,51,1426.0,3.0,3,0,116
4,01-01-01-201908001162010,1,1,1,chief judicial magistrate,0,0,0,1429.0,1963.0,25,932.0,50.0,50,0,764
5,01-01-01-201908001252010,1,1,1,chief judicial magistrate,0,0,0,1429.0,3280.0,30,806.0,50.0,50,0,652
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
33223,29-08-17-202100002432015,29,8,17,additional jmfc,0,0,0,956.0,3258.0,19,1055.0,97561.0,97561,0,586
33224,29-10-01-202100000022015,29,10,1,district and sessions court,0,0,0,956.0,5446.0,19,1122.0,97951.0,97951,1,426
33225,29-10-01-203400000572015,29,10,1,district and sessions court,0,0,0,6043.0,5446.0,19,699.0,97951.0,97951,1,17
33226,29-10-01-203400000922015,29,10,1,district and sessions court,0,0,0,6043.0,5446.0,19,690.0,97951.0,97951,1,48


In [21]:
cases.drop(columns=["ddl_decision_judge_id","ddl_judge_id"], inplace=True)

## *Disposition data*

In [22]:
disp_key = pd.read_csv("/kaggle/input/court-data/keys/keys/disp_name_key.csv")
disp_key = disp_key.sort_values(by='disp_name', ascending=True, ignore_index=True)
disp_key.drop(columns=['year', 'count'], inplace=True)
disp_key.drop_duplicates(subset='disp_name_s', inplace=True, ignore_index=True)
disp_key

Unnamed: 0,disp_name,disp_name_s
0,1,258 crpc
1,2,abated
2,3,absconded
3,4,acquitted
4,5,allowed
5,6,appeal accepted
6,7,award
7,8,bail granted
8,9,bail order
9,9,bail refused


*Merging the two dataframes on the common key `disp_name`, so that each numeric disposition code is replaced by its name in `disp_name_s`.*

In [23]:
cases = pd.merge(cases, disp_key, how='left', on='disp_name')
cases.drop(columns=['disp_name'], inplace=True)
cases

Unnamed: 0,ddl_case_id,state_code,dist_code,court_no,judge_position,female_defendant,female_adv_def,female_adv_pet,type_name,purpose_name,case_duration,female_judge,judge_experience,disp_name_s
0,01-01-01-201908000952010,1,1,1,chief judicial magistrate,0,0,0,1429.0,1963.0,769.0,0,575,dismissed
1,01-01-01-201908001112010,1,1,1,chief judicial magistrate,0,1,0,1429.0,3280.0,1164.0,0,2,judgement
2,01-01-01-201908001122010,1,1,1,chief judicial magistrate,0,0,0,1429.0,3734.0,1426.0,0,116,withdrawn
3,01-01-01-201908001162010,1,1,1,chief judicial magistrate,0,0,0,1429.0,1963.0,932.0,0,764,disposed-otherwise
4,01-01-01-201908001252010,1,1,1,chief judicial magistrate,0,0,0,1429.0,3280.0,806.0,0,652,judgement
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
30777,29-08-17-202100002432015,29,8,17,additional jmfc,0,0,0,956.0,3258.0,1055.0,0,586,convicted
30778,29-10-01-202100000022015,29,10,1,district and sessions court,0,0,0,956.0,5446.0,1122.0,1,426,convicted
30779,29-10-01-203400000572015,29,10,1,district and sessions court,0,0,0,6043.0,5446.0,699.0,1,17,convicted
30780,29-10-01-203400000922015,29,10,1,district and sessions court,0,0,0,6043.0,5446.0,690.0,1,48,convicted


*Filtering the data to keep only cases whose disposition is `convicted` or `acquitted`.*

In [24]:
cases.drop(cases[(cases["disp_name_s"] != "convicted") & (cases["disp_name_s"] != "acquitted")].index, inplace=True)
cases

Unnamed: 0,ddl_case_id,state_code,dist_code,court_no,judge_position,female_defendant,female_adv_def,female_adv_pet,type_name,purpose_name,case_duration,female_judge,judge_experience,disp_name_s
11,01-01-01-203008000082010,1,1,1,chief judicial magistrate,0,0,0,4018.0,3280.0,1827.0,0,326,convicted
15,01-01-01-203008000402010,1,1,1,chief judicial magistrate,0,1,0,4018.0,3280.0,1850.0,0,379,convicted
16,01-01-01-203008000472010,1,1,1,chief judicial magistrate,0,0,0,4018.0,3280.0,1849.0,0,228,acquitted
24,01-01-01-203008000892010,1,1,1,chief judicial magistrate,0,0,0,4018.0,3006.0,1873.0,0,458,acquitted
27,01-01-01-203008000992010,1,1,1,chief judicial magistrate,0,0,0,4018.0,509.0,1330.0,0,39,acquitted
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
30777,29-08-17-202100002432015,29,8,17,additional jmfc,0,0,0,956.0,3258.0,1055.0,0,586,convicted
30778,29-10-01-202100000022015,29,10,1,district and sessions court,0,0,0,956.0,5446.0,1122.0,1,426,convicted
30779,29-10-01-203400000572015,29,10,1,district and sessions court,0,0,0,6043.0,5446.0,699.0,1,17,convicted
30780,29-10-01-203400000922015,29,10,1,district and sessions court,0,0,0,6043.0,5446.0,690.0,1,48,convicted


In [25]:
cases["disp_name_s"].value_counts()

convicted    13447
acquitted     7472
Name: disp_name_s, dtype: int64
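
*The class shares implied by these counts can be computed directly. The sketch below uses the two counts printed above. On the real data, `cases["disp_name_s"].value_counts(normalize=True)` should give the same proportions.*

```python
import pandas as pd

# Counts copied from the value_counts() output above.
counts = pd.Series({"convicted": 13447, "acquitted": 7472})

# Share of each class in the final dataset.
shares = counts / counts.sum()
print(shares.round(3))
```

*Roughly 64% of the remaining cases are convictions, so the two classes are not evenly split, and stratified splitting or class weights may be worth considering in the classification step.*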

*Encoding the disposition as a binary label: 1 for `convicted`, 0 for `acquitted`.*

In [26]:
cases["disp_name_s"] = np.where(cases["disp_name_s"] == "convicted", 1, 0)

In [27]:
cases.to_csv("/kaggle/working/cases_convicted_acquitted.csv", index=False)