# Clean Slate: Exploring dispositions in the datasets
> Prepared by [Laura Feeney](https://github.com/laurafeeney) for Code for Boston's [Clean Slate project](https://github.com/codeforboston/clean-slate).

## Purpose
This notebook explores duplicate charges, dispositions, and moves from one court to another in the Suffolk, Northwestern, and Middlesex datasets.

-----

### Step 0
Import libraries and load the processed data.

-----

In [1]:
import datetime
import glob
import os
from datetime import date

import numpy as np
import pandas as pd
import regex as re

pd.set_option("display.max_rows", 200)

In [2]:
# Processed data for the Suffolk (suff), Northwestern (nw), and Middlesex (ms) districts, with expungeability flags.

suff = pd.read_csv('../../data/processed/merged_suff.csv', encoding='cp1252',
                    dtype={'Analysis notes':str, 'extra_criteria':str, 'Expungeable': str}) 

nw = pd.read_csv('../../data/processed/merged_nw.csv', encoding='cp1252',
                    dtype={'Analysis notes':str, 'extra_criteria':str, 'Expungeable': str}) 

ms = pd.read_csv('../../data/processed/merged_ms.csv', encoding='cp1252',
                    dtype={'Analysis notes':str, 'extra_criteria':str, 'Expungeable': str}, low_memory=False) 

In [3]:
print("Suffolk")
# Drop CMR offenses, then count the remaining unique people
suff = suff.loc[suff['CMRoffense']=='no'].copy()
print('Suffolk: Number of unique people after dropping CMRs:', suff['Person ID'].nunique())
total_people_suff = suff['Person ID'].nunique()

print("\nNorthwestern")
# Drop CMR offenses, then count the remaining unique people
nw = nw.loc[nw['CMRoffense']=='no'].copy()
print('NW: Number of unique people after dropping CMRs:', nw['Person ID'].nunique())

# Drop NW records where the person was 21 or older at the time of the offense
nw=nw.loc[nw['Age at Offense']<21].copy()
print('NW: Number of unique people under 21 after dropping CMRs:', nw['Person ID'].nunique())
total_people_nw = nw['Person ID'].nunique()

print("\nMiddlesex - no person ID, only case numbers")
# Drop CMR offenses, then count the remaining unique cases (Middlesex has no person ID)
ms = ms.loc[ms['CMRoffense']==False].copy()
print('MS: Number of unique cases after dropping CMRs:', ms['Case Number'].nunique())

# Keep only Middlesex cases heard in Juvenile Court
ms = ms.loc[ms['JuvenileC']==True].copy()
print('MS: Number of unique cases after dropping CMRs, in Juvenile Court:', ms['Case Number'].nunique())
total_people_ms = ms['Case Number'].nunique()  # counts cases, not people: Middlesex has no person ID

Suffolk
Suffolk: Number of unique people after dropping CMRs: 90719

Northwestern
NW: Number of unique people after dropping CMRs: 19686
NW: Number of unique people under 21 after dropping CMRs: 2854

Middlesex - no person ID, only case numbers
MS: Number of unique cases after dropping CMRs: 163727
MS: Number of unique cases after dropping CMRs, in Juvenile Court: 5816
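
The filters above compare `CMRoffense` to the string `'no'` for Suffolk and Northwestern but to the boolean `False` for Middlesex. One way to put both encodings on a common footing is sketched here. `to_bool_flag` and the toy frames are hypothetical and not part of the project code, and the real columns may hold other values, which this mapping would turn into missing values.

```python
import pandas as pd


def to_bool_flag(series):
    """Map 'yes'/'no' strings and True/False values onto True/False.

    Hypothetical helper: any value outside the mapping becomes NaN.
    """
    mapping = {"yes": True, "no": False, "true": True, "false": False}
    return series.astype(str).str.strip().str.lower().map(mapping)


# Toy data mimicking the two encodings seen in the filters above.
toy_suffolk = pd.DataFrame({"CMRoffense": ["no", "yes", "no"]})
toy_middlesex = pd.DataFrame({"CMRoffense": [False, True, False]})

suffolk_flags = to_bool_flag(toy_suffolk["CMRoffense"])
middlesex_flags = to_bool_flag(toy_middlesex["CMRoffense"])

# Keep only the rows that are not CMR offenses.
suffolk_kept = toy_suffolk.loc[suffolk_flags.eq(False)]
middlesex_kept = toy_middlesex.loc[middlesex_flags.eq(False)]
print(len(suffolk_kept), len(middlesex_kept))
```

With a shared encoding, the same filter expression could be applied to all three datasets.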


In [4]:
# The Suffolk DA's office lists only one juvenile court, and it does not appear among the courts in this dataset.
# https://www.suffolkdistrictattorney.com/about-the-office/contact-directions
# "Some of these courts have juvenile sessions for offenders under the age of 18; courts without a juvenile 
# session send their cases to the Boston Juvenile Court downtown."

print("Suffolk DA courts \n", suff['Court'].value_counts(), "\n")
print("Northwestern DA courts \n", nw['Court'].value_counts(), "\n")
print("Middlesex DA courts \n", ms['Court Location'].value_counts())

Suffolk DA courts 
 BMC     66433
DOR     65212
ROX     44288
CHE     35929
WROX    23391
SUP     18585
EBOS    13853
SBO     13571
BRI     11710
CHA      7853
Name: Court, dtype: int64 

Northwestern DA courts 
 Belchertown District Court    3275
Greenfield District Court     1634
Northampton District Court    1288
Hadley Juvenile Court         1158
Orange District Court          799
Greenfield Juvenile Court      616
Orange Juvenile Court          401
Belchertown Juvenile Court     367
Hampshire Superior Court       178
Franklin Superior Court         87
Name: Court, dtype: int64 

Middlesex DA courts 
 LOWJU    6577
CAMJU    3392
FRAJU    3079
WALJU    1232
Name: Court Location, dtype: int64
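
The Northwestern court names spell out the court type, so juvenile sessions can be flagged by name. The sketch below assumes that every juvenile court has the word "Juvenile" in its name, which holds for the names printed above but has not been checked against the full data. `toy_nw` is made-up illustration data.

```python
import pandas as pd

# Toy rows using court names from the Northwestern output above.
toy_nw = pd.DataFrame({
    "Court": [
        "Hadley Juvenile Court",
        "Greenfield District Court",
        "Orange Juvenile Court",
        "Hampshire Superior Court",
        "Belchertown District Court",
    ]
})

# Literal substring match (regex=False), assuming the naming convention holds.
is_juvenile = toy_nw["Court"].str.contains("Juvenile", regex=False)
juvenile_rows = toy_nw.loc[is_juvenile]

# Row counts by court type.
rows_by_type = is_juvenile.map({True: "Juvenile", False: "Other"}).value_counts()
print(rows_by_type)
```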


In [5]:
print("Suffolk Dispositions if offense is expungeable")
a = suff['Disposition'].loc[suff['Expungeable']=="Yes"].value_counts(dropna=False).rename_axis('unique_values').to_frame('counts')
b = suff['Disposition'].loc[suff['Expungeable']=="Yes"].value_counts(dropna=False, normalize = True).rename_axis('unique_values').to_frame('percent')*100
disp_stats = pd.concat([a, b], axis=1)

disp_stats['cumulative percent'] = disp_stats.percent.cumsum()
disp_stats

Suffolk Dispositions if offense is expungeable


unique_values,counts,percent,cumulative percent
Dismissed,111818,51.918059,51.918059
Plea,44568,20.693306,72.611364
NaN,36160,16.789399,89.400763
Continued w/o Finding,8526,3.958695,93.359458
Nole Prosequi,6169,2.86432,96.223778
Pre Trial Probation,3422,1.588864,97.812642
Verdict - Jury Trial,1814,0.842256,98.654898
Verdict - Bench Trial,1730,0.803254,99.458152
Convert to Civil,681,0.316194,99.774346
Filed w/o Change of Plea,276,0.128149,99.902495


In [6]:
print("Northwestern Dispositions if offense is expungeable")
a = nw['Disposition'].loc[nw['Expungeable']=="Yes"].value_counts(dropna=False).rename_axis('unique_values').to_frame('counts')
b = nw['Disposition'].loc[nw['Expungeable']=="Yes"].value_counts(dropna=False, normalize = True).rename_axis('unique_values').to_frame('percent')*100
disp_stats = pd.concat([a, b], axis=1)

disp_stats['cumulative percent'] = disp_stats.percent.cumsum()
disp_stats

Northwestern Dispositions if offense is expungeable


unique_values,counts,percent,cumulative percent
Dismissed at Request of Comm,1442,19.502299,19.502299
Continued w/o Finding,1237,16.729781,36.23208
c276s87 finding,1000,13.524479,49.756559
Nolle Prosequi,845,11.428185,61.184744
Not Responsible,762,10.305653,71.490398
Responsible,512,6.924533,78.414931
Guilty,330,4.463078,82.878009
Dismissed,275,3.719232,86.597241
Dismissed on Payment,239,3.232351,89.829592
Dismissed Prior to Arraignment,230,3.11063,92.940222


In [7]:
print("Middlesex Dispositions if offense is expungeable")
a = ms['Disposition Description'].loc[ms['Expungeable']=="Yes"].value_counts(dropna=False).rename_axis('unique_values').to_frame('counts')
b = ms['Disposition Description'].loc[ms['Expungeable']=="Yes"].value_counts(dropna=False, normalize = True).rename_axis('unique_values').to_frame('percent')*100
disp_stats = pd.concat([a, b], axis=1)

disp_stats['cumulative percent'] = disp_stats.percent.cumsum()
disp_stats

Middlesex Dispositions if offense is expungeable


unique_values,counts,percent,cumulative percent
DISMISSED W/O PREJUDICE,2769,28.324468,28.324468
PRE-TRIAL PROBATION,2558,26.166121,54.490589
CONTINUED W/O FINDING,1430,14.62766,69.118249
DISMISSED PRIOR TO ARRAIGNMENT,549,5.615794,74.734043
DELINQUENT CHANGE OF PLEA,477,4.879296,79.613339
NOT RESPONSIBLE,297,3.038052,82.651391
DISMISSED BY COURT (PRIOR TO ARRAIGNMENT),255,2.608429,85.25982
NOLLE PROSEQUI,156,1.595745,86.855565
DISMISSED W/O PREJUDICE LACK OF PROSECUTION,150,1.53437,88.389935
GUILTY CHANGE OF PLEA,143,1.462766,89.8527
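
Cells 5 through 7 repeat the same steps for each district, changing only the DataFrame and the disposition column. A minimal sketch of a shared helper follows. `disposition_stats` and the toy data are hypothetical and not part of the project code. The sketch assumes `Expungeable` holds the string "Yes" for expungeable offenses, as in the cells above.

```python
import pandas as pd


def disposition_stats(df, disposition_col, expungeable_col="Expungeable"):
    """Counts, percent, and cumulative percent of dispositions
    among rows flagged as expungeable (hypothetical helper)."""
    dispositions = df.loc[df[expungeable_col] == "Yes", disposition_col]
    # dropna=False keeps missing dispositions as their own row.
    stats = (
        dispositions.value_counts(dropna=False)
        .rename_axis("unique_values")
        .to_frame("counts")
    )
    stats["percent"] = stats["counts"] / stats["counts"].sum() * 100
    stats["cumulative percent"] = stats["percent"].cumsum()
    return stats


# Toy data for illustration only.
toy = pd.DataFrame({
    "Disposition": ["Dismissed", "Dismissed", "Plea", None, "Guilty"],
    "Expungeable": ["Yes", "Yes", "Yes", "Yes", "No"],
})
toy_stats = disposition_stats(toy, "Disposition")
print(toy_stats)
```

Under these assumptions, the three cells would reduce to calls such as `disposition_stats(suff, 'Disposition')` and `disposition_stats(ms, 'Disposition Description')`.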
