# Clean Slate: Exploring dispositions in the datasets
> Prepared by [Laura Feeney](https://github.com/laurafeeney) for Code for Boston's [Clean Slate project](https://github.com/codeforboston/clean-slate).

## Purpose
This notebook explores duplicate charges, dispositions, and moves from one court to another in the Suffolk, Northwestern, and Middlesex datasets.

-----

### Step 0
Import data, programs, etc.

-----

In [1]:
import pandas as pd
pd.set_option("display.max_rows", 200)
import numpy as np
import regex as re
import glob, os
import datetime 
from datetime import date 


In [2]:
# Processed individual-level data with expungeability flags from the Suffolk (suff), Northwestern (nw), and Middlesex (ms) DA offices.

suff = pd.read_csv('../data/processed/merged_suff.csv', encoding='cp1252',
                    dtype={'Analysis notes':str, 'extra_criteria':str, 'Expungeable': str}) 

nw = pd.read_csv('../data/processed/merged_nw.csv', encoding='utf8',
                    dtype={'Analysis notes':str, 'extra_criteria':str, 'Expungeable': str}) 

ms = pd.read_csv('../data/processed/merged_ms.csv', encoding='cp1252',
                    dtype={'Analysis notes':str, 'extra_criteria':str, 'Expungeable': str}, low_memory=False) 

In [3]:
print("Suffolk")
#drop CMR offenses, remaining unique people
suff = suff.loc[suff['CMRoffense']=='no'].copy()
print('Suffolk: Number of unique people after dropping CMRs:', suff['Person ID'].nunique())
total_people_suff = suff['Person ID'].nunique()

print("\nNorthwestern")
#drop CMR offenses, remaining unique people
nw = nw.loc[nw['CMRoffense']=='no'].copy()
print('NW: Number of unique people after dropping CMRs:', nw['Person ID'].nunique())

#keep only people under 21 at the time of offense in NW (rows with a missing age are also dropped)
nw=nw.loc[nw['Age at Offense']<21].copy()
print('NW: Number of unique people under 21 after dropping CMRs:', nw['Person ID'].nunique())
total_people_nw = nw['Person ID'].nunique()

print("\nMiddlesex - no person ID, only case numbers")
#drop CMR offenses, remaining unique cases
ms = ms.loc[ms['CMRoffense']==False].copy()
print('MS: Number of unique cases after dropping CMRs:', ms['Case Number'].nunique())

#keep only Juvenile Court cases in MS
ms = ms.loc[ms['JuvenileC']==True].copy()
print('MS: Number of unique cases after dropping CMRs, in Juvenile Court:', ms['Case Number'].nunique())
total_people_ms = ms['Case Number'].nunique()  # counts cases, not people: Middlesex has no person ID

Suffolk
Suffolk: Number of unique people after dropping CMRs: 90719

Northwestern
NW: Number of unique people after dropping CMRs: 19686
NW: Number of unique people under 21 after dropping CMRs: 2854

Middlesex - no person ID, only case numbers
MS: Number of unique cases after dropping CMRs: 163727
MS: Number of unique cases after dropping CMRs, in Juvenile Court: 5816
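
One detail worth checking in the under-21 filter: a comparison such as `nw['Age at Offense'] < 21` evaluates to `False` when the age is missing, so rows without an age are dropped together with adults. The sketch below uses a made-up toy frame (not drawn from the DA files; only the column names mirror the NW data) to count how many people fall into each group.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names mirror the NW file, values are invented.
toy = pd.DataFrame({
    'Person ID': [1, 1, 2, 3, 4],
    'Age at Offense': [17, 17, 25, np.nan, 20],
})

under_21 = toy['Age at Offense'] < 21        # NaN < 21 evaluates to False
missing_age = toy['Age at Offense'].isna()

# Unique people kept, dropped for a missing age, and dropped for being 21 or older
n_kept = toy.loc[under_21, 'Person ID'].nunique()
n_missing = toy.loc[missing_age, 'Person ID'].nunique()
n_adult = toy.loc[~under_21 & ~missing_age, 'Person ID'].nunique()

print('kept:', n_kept, '| missing age:', n_missing, '| 21 or older:', n_adult)
```

Running the same three counts on `nw` before the filter would show how much of the drop from 19686 to 2854 people, if any, comes from missing ages rather than from age itself.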


In [4]:
# In Suffolk, only one juvenile court is listed, and it does not appear on this list.
# https://www.suffolkdistrictattorney.com/about-the-office/contact-directions
# "Some of these courts have juvenile sessions for offenders under the age of 18; courts without a juvenile 
# session send their cases to the Boston Juvenile Court downtown."

print("Suffolk DA courts \n", suff['Court'].value_counts(), "\n")
print("Northwestern DA courts \n", nw['Court'].value_counts(), "\n")
print("Middlesex DA courts \n", ms['Court Location'].value_counts())

Suffolk DA courts 
 BMC     66433
DOR     65212
ROX     44288
CHE     35929
WROX    23391
SUP     18585
EBOS    13853
SBO     13571
BRI     11710
CHA      7853
Name: Court, dtype: int64 

Northwestern DA courts 
 Belchertown District Court    3275
Greenfield District Court     1634
Northampton District Court    1288
Hadley Juvenile Court         1158
Orange District Court          799
Greenfield Juvenile Court      616
Orange Juvenile Court          401
Belchertown Juvenile Court     367
Hampshire Superior Court       178
Franklin Superior Court         87
Name: Court, dtype: int64 

Middlesex DA courts 
 LOWJU    6577
CAMJU    3392
FRAJU    3079
WALJU    1232
Name: Court Location, dtype: int64


In [5]:
print("Suffolk Disposition Reasons")
a = suff['Description Disposition Reason'].value_counts(dropna=False).rename_axis('unique_values').to_frame('counts')
b = suff['Description Disposition Reason'].value_counts(dropna=False, normalize = True).rename_axis('unique_values').to_frame('percent')
disp_stats_sf = pd.concat([a, b], axis=1)

disp_stats_sf['cumulative percent'] = disp_stats_sf.percent.cumsum()
disp_stats_sf.style.format({ 'counts' : '{:,}', 'percent' : '{:,.1%}', 'cumulative percent' : '{:,.1%}'})


Suffolk Disposition Reasons


| unique_values | counts | percent | cumulative percent |
|---|---|---|---|
| NaN | 89897 | 29.9% | 29.9% |
| DWOP - no victim | 24104 | 8.0% | 37.9% |
| Dismissed Upon Payment of Court Costs | 20732 | 6.9% | 44.8% |
| Dismissed by Commonwealth | 20166 | 6.7% | 51.5% |
| Guilty - Committed | 16291 | 5.4% | 56.9% |
| Guilty - Probation | 13041 | 4.3% | 61.2% |
| Dismissed to Subsequent Indictment | 12522 | 4.2% | 65.4% |
| CWF (ASF) | 12063 | 4.0% | 69.4% |
| Dismissed for Agreed Plea | 10479 | 3.5% | 72.9% |
| Dismissed WO Prosecution | 9751 | 3.2% | 76.1% |
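
The counts / percent / cumulative percent table is rebuilt with the same few lines for each district. A small helper could hold that logic in one place. This is only a sketch: `disposition_stats` is a hypothetical name, not part of the project, and it derives the percent from the counts instead of calling `value_counts` twice, which should give the same numbers.

```python
import pandas as pd

def disposition_stats(series: pd.Series) -> pd.DataFrame:
    """Return counts, share, and running share of each value, NaN included."""
    stats = (series.value_counts(dropna=False)
                   .rename_axis('unique_values')
                   .to_frame('counts'))
    stats['percent'] = stats['counts'] / stats['counts'].sum()
    stats['cumulative percent'] = stats['percent'].cumsum()
    return stats

# Hypothetical toy input, not taken from the DA files.
toy = pd.Series(['Dismissed', 'Dismissed', None, 'Guilty'])
stats = disposition_stats(toy)
print(stats)
```

With such a helper, a call like `disposition_stats(suff['Description Disposition Reason'])` would stand in for the three lines that build `a`, `b`, and the concatenated frame.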


In [6]:
print("Suffolk Dispositions (missing main dispo reason)")
x = suff.loc[suff['Description Disposition Reason'].isnull()]
a = x['Disposition'].value_counts(dropna=False).rename_axis('unique_values').to_frame('counts')
b = x['Disposition'].value_counts(dropna=False, normalize = True).rename_axis('unique_values').to_frame('percent')
disp_stats_sf2 = pd.concat([a, b], axis=1)

disp_stats_sf2['cumulative percent'] = disp_stats_sf2.percent.cumsum()
disp_stats_sf2.style.format({ 'counts' : '{:,}', 'percent' : '{:,.1%}', 'cumulative percent' : '{:,.1%}'})

Suffolk Dispositions (missing main dispo reason)


| unique_values | counts | percent | cumulative percent |
|---|---|---|---|
| NaN | 51151 | 56.9% | 56.9% |
| Continued w/o Finding | 13105 | 14.6% | 71.5% |
| Nole Prosequi | 12303 | 13.7% | 85.2% |
| Dismissed | 7333 | 8.2% | 93.3% |
| Pre Trial Probation | 4630 | 5.2% | 98.5% |
| Plea | 508 | 0.6% | 99.0% |
| Filed w/o Change of Plea | 322 | 0.4% | 99.4% |
| Diversion | 141 | 0.2% | 99.6% |
| Convert to Civil | 120 | 0.1% | 99.7% |
| General Continuance | 105 | 0.1% | 99.8% |
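
The first row of this table counts charges where both the disposition reason and the disposition are missing. A compact way to see all four combinations at once is to cross-tabulate the two `isna()` masks. The sketch below runs on invented toy data; only the column names come from the Suffolk file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; values are invented for illustration.
toy = pd.DataFrame({
    'Description Disposition Reason': ['Guilty', np.nan, np.nan, np.nan],
    'Disposition': ['Plea', 'Dismissed', np.nan, np.nan],
})

reason_missing = toy['Description Disposition Reason'].isna().rename('reason missing')
disp_missing = toy['Disposition'].isna().rename('disposition missing')

# 2x2 table of missingness: rows are the reason, columns the disposition
print(pd.crosstab(reason_missing, disp_missing))

both_missing = int((reason_missing & disp_missing).sum())
reason_only_missing = int((reason_missing & ~disp_missing).sum())
```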


In [7]:
pd.crosstab(suff['Description Disposition Reason'], suff['Disposition'], dropna=False)

| Description Disposition Reason | Bound Over | Continued w/o Finding | Convert to Civil | Dismissed | Diversion | Filed w/o Change of Plea | General Continuance | No True Bill | Nole Prosequi | Pending | Plea | Pre Trial Probation | Verdict - Bench Trial | Verdict - Jury Trial |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CWF (ASF) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 12063 | 0 | 0 | 0 |
| DWOP - no evidence | 0 | 0 | 0 | 6275 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| DWOP - no police | 0 | 0 | 0 | 1489 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| DWOP - no victim | 0 | 0 | 0 | 24104 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| DWOP - no witness | 0 | 0 | 0 | 4570 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Delinquent | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| Delinquent - Committed | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 150 | 0 | 0 | 0 |
| Delinquent - Filed | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 16 | 0 | 0 | 0 |
| Delinquent - Fine | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 10 | 0 | 0 | 0 |
| Delinquent - Probation | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 42 | 0 | 0 | 0 |
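
In the rows shown, each disposition reason has a single non-zero column, which suggests that a reason may determine the disposition. Scanning a wide crosstab by eye is error-prone, so one option is to count distinct dispositions per reason and list any reason with more than one. The sketch below uses invented toy pairings (the `Delinquent` split is made up for illustration and is not a finding from the Suffolk data).

```python
import pandas as pd

# Hypothetical toy data; the pairings are invented to exercise the check.
toy = pd.DataFrame({
    'Description Disposition Reason': [
        'CWF (ASF)', 'CWF (ASF)', 'DWOP - no victim', 'Delinquent', 'Delinquent',
    ],
    'Disposition': [
        'Plea', 'Plea', 'Dismissed', 'Plea', 'Verdict - Jury Trial',
    ],
})

# Number of distinct dispositions observed for each reason (NaN is ignored)
n_dispositions = toy.groupby('Description Disposition Reason')['Disposition'].nunique()

# Reasons that map to more than one disposition
ambiguous = n_dispositions[n_dispositions > 1]
print(ambiguous)
```

An empty `ambiguous` result on `suff` would confirm that every reason maps to exactly one disposition.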


In [8]:
print("Northwestern Dispositions")
a = nw['Disposition'].value_counts(dropna=False).rename_axis('unique_values').to_frame('counts')
b = nw['Disposition'].value_counts(dropna=False, normalize = True).rename_axis('unique_values').to_frame('percent')
disp_stats_nw = pd.concat([a, b], axis=1)

disp_stats_nw['cumulative percent'] = disp_stats_nw.percent.cumsum()
disp_stats_nw.style.format({ 'counts' : '{:,}', 'percent' : '{:,.1%}', 'cumulative percent' : '{:,.1%}'})

Northwestern Dispositions


| unique_values | counts | percent | cumulative percent |
|---|---|---|---|
| Continued w/o Finding | 1812 | 18.5% | 18.5% |
| Dismissed at Request of Comm | 1808 | 18.4% | 36.9% |
| Nolle Prosequi | 1437 | 14.7% | 51.6% |
| c276s87 finding | 1322 | 13.5% | 65.1% |
| Not Responsible | 762 | 7.8% | 72.8% |
| Guilty | 517 | 5.3% | 78.1% |
| Responsible | 512 | 5.2% | 83.3% |
| Dismissed | 345 | 3.5% | 86.9% |
| NaN | 319 | 3.3% | 90.1% |
| Dismissed Prior to Arraignment | 288 | 2.9% | 93.1% |


In [9]:
x = suff.loc[suff['Disposition'].notnull()]
x['Description Disposition Reason'].value_counts(dropna=False)

NaN                                      38746
DWOP - no victim                         24104
Dismissed Upon Payment of Court Costs    20732
Dismissed by Commonwealth                20166
Guilty - Committed                       16291
Guilty - Probation                       13041
Dismissed to Subsequent Indictment       12522
CWF (ASF)                                12063
Dismissed for Agreed Plea                10479
Dismissed WO Prosecution                  9751
Dismissed by Court                        9516
Guilty - Fine                             7493
Dismissed Prior to Arraignment            7180
Dismissed for Community Service           6809
DWOP - no evidence                        6275
Not Guilty                                4819
DWOP - no witness                         4570
Guilty - Suspended Sentence               4107
Guilty                                    4087
Dismissed WO Prejudice                    3099
Guilty - Filed                            2729
Guilty - Spli

In [10]:
print("Middlesex Dispositions")
a = ms['Disposition Description'].value_counts(dropna=False).rename_axis('unique_values').to_frame('counts')
b = ms['Disposition Description'].value_counts(dropna=False, normalize = True).rename_axis('unique_values').to_frame('percent')
disp_stats_ms = pd.concat([a, b], axis=1)

disp_stats_ms['cumulative percent'] = disp_stats_ms.percent.cumsum()
disp_stats_ms.style.format({ 'counts' : '{:,}', 'percent' : '{:,.1%}', 'cumulative percent' : '{:,.1%}'})

Middlesex Dispositions


| unique_values | counts | percent | cumulative percent |
|---|---|---|---|
| DISMISSED W/O PREJUDICE | 4287 | 30.0% | 30.0% |
| PRE-TRIAL PROBATION | 3402 | 23.8% | 53.8% |
| CONTINUED W/O FINDING | 2025 | 14.2% | 68.0% |
| DISMISSED PRIOR TO ARRAIGNMENT | 745 | 5.2% | 73.2% |
| DELINQUENT CHANGE OF PLEA | 694 | 4.9% | 78.1% |
| NOT RESPONSIBLE | 401 | 2.8% | 80.9% |
| NOLLE PROSEQUI | 342 | 2.4% | 83.3% |
| DISMISSED BY COURT (PRIOR TO ARRAIGNMENT) | 339 | 2.4% | 85.7% |
| DISMISSED W/O PREJUDICE LACK OF PROSECUTION | 310 | 2.2% | 87.9% |
| GUILTY CHANGE OF PLEA | 221 | 1.5% | 89.4% |
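
The three offices label the same outcome differently: the outputs show `Nole Prosequi` (Suffolk), `Nolle Prosequi` (Northwestern), and `NOLLE PROSEQUI` (Middlesex), with similar case differences for `Continued w/o Finding`. Before comparing districts, it may help to normalize case and whitespace and apply a small spelling map. The sketch below is one possible approach; `normalize_disposition` and `SPELLING_FIXES` are hypothetical names, and the map covers only the one misspelling visible in the outputs.

```python
import pandas as pd

# Hypothetical corrections map; extend only after inspecting the real labels.
SPELLING_FIXES = {'nole prosequi': 'nolle prosequi'}

def normalize_disposition(series: pd.Series) -> pd.Series:
    """Lower-case, trim, and collapse whitespace, then fix known misspellings."""
    cleaned = (series.str.strip()
                     .str.lower()
                     .str.replace(r'\s+', ' ', regex=True))
    return cleaned.replace(SPELLING_FIXES)

# Labels copied from the three outputs, plus a missing value.
labels = pd.Series([
    'Nole Prosequi', 'Nolle Prosequi', 'NOLLE PROSEQUI',
    ' Continued w/o Finding', None,
])
normalized = normalize_disposition(labels)
print(normalized)
```

Missing values pass through unchanged, so the `dropna=False` counts keep the same NaN totals after normalization.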


In [11]:
with pd.ExcelWriter('../data/processed/dispositions.xlsx') as writer:  
    disp_stats_ms.to_excel(writer, sheet_name='Middlesex')
    disp_stats_nw.to_excel(writer, sheet_name='Northwestern')
    disp_stats_sf.to_excel(writer, sheet_name='Suffolk')
    disp_stats_sf2.to_excel(writer, sheet_name='Suffolk-addtl')