# Clean Slate: Estimating offenses eligible for expungement under varying conditions
> Prepared by [Laura Feeney](https://github.com/laurafeeney) for Code for Boston's [Clean Slate project](https://github.com/codeforboston/clean-slate).

## Summary
This notebook takes partially processed data from the Middlesex DA and attempts to estimate how many individuals may be eligible for expungement under varying conditions.

The dataset contains no information that identifies specific individuals across multiple cases. We can see which charges were heard in Juvenile court, but we have no other indicator of age.

We can therefore only provide a count and percentage of incidents heard in Juvenile court that are expungeable.
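
As a rough illustration of why the unit of analysis matters, the sketch below counts cases and people on invented records. The `person_id` values are purely hypothetical: no such field exists in the Middlesex data, which is exactly why only case-level counts are reported in this notebook.

```python
# Hypothetical records as (person_id, case_number) pairs.
# person_id is an assumption for illustration; the real data has no such field.
records = [
    ("p1", "C-001"),
    ("p1", "C-002"),  # the same person appears in a second case
    ("p2", "C-003"),
    ("p2", "C-003"),  # a second charge within the same case
]

n_cases = len({case for _, case in records})
n_people = len({person for person, _ in records})

print("distinct cases:", n_cases)
print("distinct people:", n_people)
```

Counting by case treats `p1` as two separate units, so case counts can exceed the number of people involved.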

### Original Questions

1. How many people (under age 21) are eligible for expungement today? This would be people with only **one charge** that is not part of the list of ineligible offenses (per section 100J). 


2. How many people (under age 21) would be eligible based on only having **one incident** (which could include multiple charges) that are not part of the list of ineligible offenses?
 - How many people (under age 21) would be eligible based on only having **one incident** if only sex-based offenses or murder were excluded from expungement?
 

3. How many people (under age 21) would be eligible based on who has **not been found guilty** (given current offenses that are eligible for expungement)?
 - How many people (under age 21) would be eligible based on who has **not been found guilty** for all offenses except for murder or sex-based offenses?

-----

### Step 0
Import data, programs, etc.

-----

In [1]:
import pandas as pd
pd.set_option("display.max_rows", 200)
import numpy as np
import regex as re
import glob, os
import datetime 
from datetime import date 
from collections import defaultdict, Counter

In [2]:
# processed individual-level data from MS district with expungability.

ms = pd.read_csv('../../data/processed/merged_ms.csv', encoding='utf8',
                    dtype={'Analysis notes':str, 'extra_criteria':str, 'Expungeable': str}, low_memory=False) 

ms_original = ms.copy()  # keep an unmodified copy; plain assignment would only alias ms

In [3]:
ms.loc[ms.Expungeable =='m', 'Expungeable'] = None 
print("Middlesex Expungement Counts")
a = ms['Expungeable'].value_counts(dropna=False).rename_axis('count').to_frame('counts')
b = ms['Expungeable'].value_counts(dropna=False, normalize = True).rename_axis('percent').to_frame('percent')
exp_stats = pd.concat([a, b], axis=1)
exp_stats.style.format({ 'counts' : '{:,}', 'percent' : '{:,.1%}'})

Middlesex Expungement Counts


Unnamed: 0,counts,percent
Yes,273094,69.6%
No,112554,28.7%
NA - CMR,5047,1.3%
Attempt,1519,0.4%
,391,0.1%


In [4]:
ms['offenses_per_case']=ms.groupby('Case Number')['Case Number'].transform('count')
ms.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 392605 entries, 0 to 392604
Data columns (total 16 columns):
 #   Column                   Non-Null Count   Dtype  
---  ------                   --------------   -----  
 0   Case Number              392605 non-null  object 
 1   Offense Date             392578 non-null  object 
 2   Court Location           392605 non-null  object 
 3   Charge                   392605 non-null  object 
 4   Charge/Crime Type        392605 non-null  object 
 5   Disposition Description  392605 non-null  object 
 6   CMRoffense               392605 non-null  bool   
 7   Chapter                  392605 non-null  object 
 8   Section                  389306 non-null  object 
 9   Paragraph                309266 non-null  object 
 10  JuvenileC                392605 non-null  bool   
 11  years_since_offense      392605 non-null  int64  
 12  sex                      392605 non-null  int64  
 13  murder                   360336 non-null  float64
 14  Expu

In [5]:
# Only indication of juvenile is if tried in juvenile court. Looks like no cases are heard in 2 courts (presumably would get 
# a different case number)
ms['juvenile'] = ms.groupby('Case Number')['JuvenileC'].transform('max')
pd.crosstab(ms['JuvenileC'], ms['juvenile'])

juvenile,False,True
JuvenileC,Unnamed: 1_level_1,Unnamed: 2_level_1
False,378306,0
True,0,14299


## Step 0.5 - Prepare data

- Drop CMR offenses
- Prepare dates, date since offense
- Generate indicators for incidents, and code incidents as expungeable, sex-related etc
- Generate indicator for found guilty / not found guilty 

**CMR**: Many offenses are violations of the Code of Massachusetts Regulations (CMR) rather than criminal offenses. These include some driving or boating infractions (e.g., not having headlights on) and not having a hunting or fishing license. Per conversations with Sana, we drop all CMR offenses.
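
The incident-level indicators listed in the preparation steps above rely on taking a group-wise `min` or `max` over boolean columns. A minimal sketch on invented data follows; the column names mirror the notebook, but the rows and the helper columns `charge_ok`, `case_ok`, `charge_sex`, and `case_sex` are made up for illustration.

```python
import pandas as pd

# Toy charges for two cases; values are invented.
toy = pd.DataFrame({
    "Case Number": ["A", "A", "B", "B"],
    "Expungeable": ["Yes", "No", "Yes", "Attempt"],
    "sex": [0, 0, 0, 1],
})

# "min" over booleans behaves like "all":
# one ineligible charge makes the whole case ineligible.
toy["charge_ok"] = toy["Expungeable"].isin(["Yes", "Attempt"])
toy["case_ok"] = toy.groupby("Case Number")["charge_ok"].transform("min")

# "max" over booleans behaves like "any":
# one sex-related charge flags the whole case.
toy["charge_sex"] = toy["sex"] == 1
toy["case_sex"] = toy.groupby("Case Number")["charge_sex"].transform("max")

print(toy[["Case Number", "case_ok", "case_sex"]])
```

Case A is marked ineligible because of its single `No` charge, while case B is flagged as sex-related because of one charge. `transform` keeps one row per charge, so the case-level flag can be used directly in row filters.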

In [6]:
### dates ###
# The file source said, "The following is data from our Damion Case Management
# System pertaining to prosecution statistics for the time period from 
# January 1, 2014, through January 1, 2020."

reference_date = datetime.date(2020, 9, 1) # fixed reference date; using date.today() would not be stable across runs

ms['Offense Date'] = pd.to_datetime(ms['Offense Date']).dt.date
ms = ms[~ms['Offense Date'].isnull()]

offenses_2014_2019 = ms['Offense Date'].loc[
    (ms['Offense Date'] >= datetime.date(2014, 1, 1)) & 
    (ms['Offense Date'] <= datetime.date(2019, 12, 31))].count()

Percent_14_19 = "{:.1%}".format(offenses_2014_2019/ms['Offense Date'].count())
             
print(Percent_14_19, 'of offenses are between Jan 1 2014 and Dec 31 2019')

print("The earliest offense date is", min(ms['Offense Date']))
print("The max offense date is", max(ms['Offense Date']), "\n")

print(ms['years_since_offense'].describe())

93.9% of offenses are between Jan 1 2014 and Dec 31 2019
The earliest offense date is 1951-06-30
The max offense date is 2019-12-30 

count    392578.000000
mean          3.766933
std           2.452424
min           0.000000
25%           2.000000
50%           4.000000
75%           5.000000
max          69.000000
Name: years_since_offense, dtype: float64
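
The `years_since_offense` column arrives precomputed in the merged file, so its exact definition is not shown here. One plausible definition, offered only as an assumption, is the number of completed years between the offense date and the fixed reference date. The helper name `whole_years_between` is hypothetical.

```python
import datetime

reference_date = datetime.date(2020, 9, 1)

def whole_years_between(offense_date, ref=reference_date):
    """Completed years from offense_date to ref (an assumed definition)."""
    years = ref.year - offense_date.year
    # Subtract one if the anniversary has not yet been reached in the ref year.
    if (ref.month, ref.day) < (offense_date.month, offense_date.day):
        years -= 1
    return years

earliest = whole_years_between(datetime.date(1951, 6, 30))
latest = whole_years_between(datetime.date(2019, 12, 30))
print(earliest, latest)
```

Under this assumed rule, the earliest and latest offense dates printed above give 69 and 0 years, which agrees with the reported maximum and minimum of `years_since_offense`. That agreement does not confirm the upstream method.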


In [7]:
# CMR offenses -- Drop all CMR offenses and Drop CMR-related columns

print(f'There are {ms.shape[0]} total offenses including CMR.')

ms = ms.loc[ms['CMRoffense'] == False]
ms = ms.drop(columns = ['CMRoffense'])

print(f'After we drop CMR, there are {ms.shape[0]} total offenses.')

# Check that the 'expungeable' column no longer has CMRs 

print("Middlesex Expungement Counts")
a = ms['Expungeable'].value_counts(dropna=False).rename_axis('count').to_frame('counts')
b = ms['Expungeable'].value_counts(dropna=False, normalize = True).rename_axis('percent').to_frame('percent')
exp_stats = pd.concat([a, b], axis=1)
exp_stats.style.format({ 'counts' : '{:,}', 'percent' : '{:,.1%}'})

There are 392578 total offenses including CMR.
After we drop CMR, there are 387531 total offenses.
Middlesex Expungement Counts


Unnamed: 0,counts,percent
Yes,273067,70.5%
No,112554,29.0%
Attempt,1519,0.4%
,391,0.1%


In [8]:
# Data prep.
# The only incident identifier is Case Number; all charges in a case share the same offense date.

# If an incident includes one offense that is not expungeable, we mark the entire incident as not expungeable.
# Attempts *are not* considered expungeable in this version.
ms['Exp'] = ms['Expungeable']=="Yes"
ms['Inc_Expungeable_Attempts_Not'] = ms.groupby(['Case Number'])['Exp'].transform('min')

# If an incident includes one offense that is not expungeable, we mark the entire incident as not expungeable.
# Attempts *are* considered expungeable in this version.
ms['ExpAtt'] = (ms['Expungeable']=="Yes") | (ms['Expungeable']=="Attempt")
ms['Inc_Expungeable_Attempts_Are'] = ms.groupby(['Case Number'])['ExpAtt'].transform('min')

# If an incident includes an offense that is a murder and/or sex crime, we code the whole incident as regarding
# murder and/or sex.
ms['sm'] = (ms['sex'] == 1) | (ms['murder'] ==1)
ms['Incident_Murder_Sex'] = ms.groupby(['Case Number'])['sm'].transform('max')

#unneeded calculation columns
ms = ms.drop(columns=['Exp', 'ExpAtt', 'sm'])

In [9]:
sorted(ms['Disposition Description'].unique())

['BOUND OVER/PROBABLE CAUSE FOUND',
 'CONTINUED W/O FINDING',
 'CONTINUED W/O FINDING LESSER OFFENSE',
 'CWOF AFTER JURY TRIAL',
 'DECLINED JURISDICTION',
 'DELINQUENT BENCH TRIAL',
 'DELINQUENT CHANGE OF PLEA',
 'DELINQUENT CHANGE OF PLEA LESSER OFFENSE',
 'DELINQUENT JURY TRIAL',
 'DISMISSED BY COURT (POST-ARRAIGNMENT)',
 'DISMISSED BY COURT (PRIOR TO ARRAIGNMENT)',
 'DISMISSED BY FINES',
 'DISMISSED ON COMMUNITY SERVICE',
 'DISMISSED ON COURT COSTS',
 'DISMISSED ON RESTITUTION',
 'DISMISSED OUTRIGHT AT REQUEST OF CW (POST-ARR)',
 'DISMISSED OUTRIGHT AT REQUEST OF CW (PRIOR TO ARR)',
 'DISMISSED PRIOR TO ARRAIGNMENT',
 'DISMISSED PURSUANT TO ACCORD AND SATISFACTION',
 'DISMISSED W/O  PREJUDICE',
 'DISMISSED W/O PREJUDICE LACK OF PROSECUTION',
 'DISMISSED WITH PREJUDICE LACK OF PROSECUTION',
 'DIVERSION PASS',
 'FILED WITHOUT CHANGE OF PLEA',
 'GENERAL CONTINUANCE',
 'GENERAL CONTINUANCE LESSER OFFENSE',
 'GUILTY BENCH TRIAL',
 'GUILTY BENCH TRIAL LESSER INCLUDED',
 'GUILTY CHANGE OF 

In [10]:
guilty_dispos = ['DELINQUENT BENCH TRIAL', 'DELINQUENT CHANGE OF PLEA', 
                'DELINQUENT CHANGE OF PLEA LESSER OFFENSE', 'DELINQUENT JURY TRIAL',
                'GUILTY BENCH TRIAL', 'GUILTY BENCH TRIAL LESSER INCLUDED',
                'GUILTY CHANGE OF PLEA', 'GUILTY CHANGE OF PLEA LESSER OFFENSE', 
                'GUILTY FILED', 'GUILTY FINES', 'GUILTY JURY TRIAL', 
                'GUILTY JURY TRIAL LESSER INCLUDED', 
                'Guilty Jury Trial (and Bench) Lesser Included', 'RESPONSIBLE']

ms['guilty'] = ms['Disposition Description'].isin(guilty_dispos)

# there are no 'missing' values for guilty or dispo description, so no need to recode missing as 2 or -1 as in Suff & NW
assert ms['guilty'].count() == len(ms) == ms['Disposition Description'].count()

ms['Incident_Guilty'] = ms.groupby(['Case Number', 'Offense Date'])['guilty'].transform('max')

## Step 1
Summary stats

In [11]:
# distribution of # of charges
print(ms['offenses_per_case'].describe())

print('\n cutting off top 1%: \n', ms['offenses_per_case'].loc[ms['offenses_per_case']< ms['offenses_per_case'].quantile(.99)].describe())


count    387531.000000
mean          4.923764
std           9.074454
min           1.000000
25%           2.000000
50%           3.000000
75%           5.000000
max         176.000000
Name: offenses_per_case, dtype: float64

 cutting off top 1%: 
 count    383569.000000
mean          4.182087
std           4.531870
min           1.000000
25%           2.000000
50%           3.000000
75%           5.000000
max          43.000000
Name: offenses_per_case, dtype: float64


In [12]:
### No indicator for unique individuals. The only proxy is case number, so any person-level estimates are likely to be substantial overestimates.

# offenses and incidents (cases)
Nu_tot = ms['Charge'].count()
Incidents_tot = ms['Case Number'].nunique()

one_off = ms[ (ms['offenses_per_case']==1)]['Case Number'].nunique()

In [13]:
# offenses related to sex or murder
Nu_sex = ms[ms['sex'] == 1]['sex'].count()
Nu_murder = ms[ms['murder'] == 1]['murder'].count()
Nu_sex_murder = ms[ms['Incident_Murder_Sex']  == 1]['Incident_Murder_Sex'].count()
Nu_sex_murder_inc = ms[ms['Incident_Murder_Sex']  == 1].groupby(['Case Number']).ngroups

In [14]:
# Juvenile stats
Number_Cases_Juvenile = ms[ms['juvenile']==True]['Case Number'].nunique()

Number_Off_Juvenile = ms[ms['juvenile']==True]['Case Number'].count()

Juvenile_one_off = ms[ (ms['offenses_per_case']==1) & 
                      (ms['juvenile']==True)]['Case Number'].nunique()



In [15]:
offenses = ['Total offenses', Nu_tot, '','']
total = ['Total incidents (cases)', Incidents_tot, '','' ]

oneoff = ['Incidents with a single offense', one_off, '{:,.2%}'.format(one_off/Nu_tot), '{:,.2%}'.format(one_off/Incidents_tot)]

juv_header = ['Juvenile stats', 0, '', '']
juvenile_off = ['Total juvenile offenses', Number_Off_Juvenile, '{:,.2%}'.format(Number_Off_Juvenile/Nu_tot), '']
juvenile_inc = ['Total juvenile incidents', Number_Cases_Juvenile, '', '{:,.2%}'.format(Number_Cases_Juvenile/Incidents_tot)]
juv_one = ['Juvenile incidents with a single offense', Juvenile_one_off, '{:,.2%}'.format(Juvenile_one_off/Nu_tot), '{:,.2%}'.format(Juvenile_one_off/Incidents_tot)]

sm_header = ['Sex and murder stats (all ages)', 0, '', '']
sex_offenses = ['Sex offenses', Nu_sex, '{:,.2%}'.format(Nu_sex / Nu_tot),'']
murder = ['Murder offenses', Nu_murder, '{:,.2%}'.format(Nu_murder / Nu_tot),'']
sex_murder = ['Incidents with sex or murder', Nu_sex_murder_inc, '', '{:,.2%}'.format(Nu_sex_murder_inc / Incidents_tot)]



stats = [offenses, total, oneoff, juv_header, juvenile_off, juvenile_inc, juv_one, sm_header, sex_offenses, murder, sex_murder]
statsdf = pd.DataFrame(stats, columns = ['Question', 'Number', '% total offenses', '% total incidents'])
statsdf = statsdf.set_index('Question')

statsdf.style.format({'Number' : '{:,}'})


Unnamed: 0_level_0,Number,% total offenses,% total incidents
Question,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
Total offenses,387531,,
Total incidents (cases),163711,,
Incidents with a single offense,64399,16.62%,39.34%
Juvenile stats,0,,
Total juvenile offenses,14280,3.68%,
Total juvenile incidents,5816,,3.55%
Juvenile incidents with a single offense,2696,0.70%,1.65%
Sex and murder stats (all ages),0,,
Sex offenses,4884,1.26%,
Murder offenses,947,0.24%,


**Dispositions and Guilty**
The following sheet was used to decide which dispositions to code as found guilty versus not found guilty:
https://docs.google.com/spreadsheets/d/1axzGGxgQFPwpTw7EbBlC519L43fOkqC5/edit#gid=487812267

In [16]:
print("Top 10 dispositions - all cases")

a = ms['Disposition Description'].value_counts().rename_axis('Dispositions').to_frame('counts')
b = ms['Disposition Description'].value_counts(normalize=True).rename_axis('Dispositions').to_frame('percent')
disp_stats = pd.concat([a, b], axis=1)

disp_stats['cumulative percent'] = disp_stats.percent.cumsum()
disp_stats[0:10].style.format({ 'counts' : '{:,}', 'percent' : '{:,.1%}', 'cumulative percent' : '{:,.1%}'})

Top 10 dispositions - all cases


Unnamed: 0_level_0,counts,percent,cumulative percent
Dispositions,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
DISMISSED W/O PREJUDICE,102609,26.5%,26.5%
GUILTY CHANGE OF PLEA,47112,12.2%,38.6%
CONTINUED W/O FINDING,42796,11.0%,49.7%
NOT RESPONSIBLE,37150,9.6%,59.3%
DISMISSED BY FINES,31844,8.2%,67.5%
RESPONSIBLE,23818,6.1%,73.6%
NOLLE PROSEQUI,22359,5.8%,79.4%
PRE-TRIAL PROBATION,20293,5.2%,84.6%
DISMISSED W/O PREJUDICE LACK OF PROSECUTION,11428,2.9%,87.6%
DISMISSED ON COURT COSTS,10689,2.8%,90.3%


In [17]:
a = ms['Disposition Description'].loc[ms['juvenile']==True].value_counts().rename_axis('Dispositions').to_frame('counts')
b = ms['Disposition Description'].loc[ms['juvenile']==True].value_counts(normalize=True).rename_axis('Dispositions').to_frame('percent')
disp_stats = pd.concat([a, b], axis=1)

disp_stats['cumulative percent'] = disp_stats.percent.cumsum()
print('top 10 dispositions for all cases in juvenile court')
disp_stats[0:10].style.format({ 'counts' : '{:,}', 'percent' : '{:,.1%}', 'cumulative percent' : '{:,.1%}'})

top 10 dispositions for all cases in juvenile court


Unnamed: 0_level_0,counts,percent,cumulative percent
Dispositions,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
DISMISSED W/O PREJUDICE,4287,30.0%,30.0%
PRE-TRIAL PROBATION,3402,23.8%,53.8%
CONTINUED W/O FINDING,2025,14.2%,68.0%
DISMISSED PRIOR TO ARRAIGNMENT,745,5.2%,73.2%
DELINQUENT CHANGE OF PLEA,694,4.9%,78.1%
NOT RESPONSIBLE,401,2.8%,80.9%
NOLLE PROSEQUI,342,2.4%,83.3%
DISMISSED BY COURT (PRIOR TO ARRAIGNMENT),339,2.4%,85.7%
DISMISSED W/O PREJUDICE LACK OF PROSECUTION,310,2.2%,87.9%
GUILTY CHANGE OF PLEA,221,1.5%,89.4%


In [18]:
print('Guilty dispositions')

a = ms['guilty'].value_counts(normalize=True).rename_axis('Found Guilty').to_frame('Percent')
b = ms['Incident_Guilty'].value_counts(normalize=True).rename_axis('Found Guilty').to_frame('Percent')
guilty_stats = pd.concat([a, b], keys=['Offenses', 'Incidents'])
guilty_stats.style.format({ 'Percent' : '{:,.1%}'})

Guilty dispositions


Unnamed: 0_level_0,Unnamed: 1_level_0,Percent
Unnamed: 0_level_1,Found Guilty,Unnamed: 2_level_1
Offenses,False,78.1%
Offenses,True,21.9%
Incidents,False,67.9%
Incidents,True,32.1%
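
The cell above calls `value_counts` on the row-level `Incident_Guilty` column, which weights each incident by its number of charges. If a share per incident is wanted instead, one option is to de-duplicate by case first. The sketch below uses invented rows and only a subset of `guilty_dispos`.

```python
import pandas as pd

# Subset of the notebook's guilty_dispos list, for illustration only.
guilty_subset = ["GUILTY CHANGE OF PLEA", "DELINQUENT CHANGE OF PLEA", "RESPONSIBLE"]

toy = pd.DataFrame({
    "Case Number": ["A", "A", "B"],
    "Disposition Description": [
        "NOLLE PROSEQUI", "GUILTY CHANGE OF PLEA", "PRE-TRIAL PROBATION",
    ],
})
toy["guilty"] = toy["Disposition Description"].isin(guilty_subset)
toy["Incident_Guilty"] = toy.groupby("Case Number")["guilty"].transform("max")

# Row-weighted: case A counts twice because it has two charges.
row_weighted = toy["Incident_Guilty"].mean()

# One row per case: each incident counts once.
per_incident = toy.drop_duplicates("Case Number")["Incident_Guilty"].mean()

print(f"row-weighted: {row_weighted:.1%}, per incident: {per_incident:.1%}")
```

In this toy data the two shares differ (two of three rows versus one of two cases), which shows how the choice of denominator can move the headline number.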


## Question 1

Original: 1. How many people (under age 21) are eligible for expungement today? This would be people with only **one charge** that is not part of the list of ineligible offenses (per section 100J). 

What we can answer: 
- How many cases heard in Juvenile court include only one offense, where that charge is not on the list of ineligible offenses from section 100J?
----
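
The restated question can be sketched as a three-part row filter followed by a count of distinct case numbers. The data below is invented; only the column names follow the notebook. Note that a `!= 'No'` comparison keeps rows whose `Expungeable` value is missing or `Attempt`, since those are not equal to `'No'`.

```python
import pandas as pd

# Toy data: four juvenile-court cases (A, B, C, E) and one adult case (D).
toy = pd.DataFrame({
    "Case Number": ["A", "B", "B", "C", "D", "E"],
    "Expungeable": ["Yes", "Yes", "Yes", "No", "Yes", None],
    "juvenile": [True, True, True, True, False, True],
})
toy["offenses_per_case"] = (
    toy.groupby("Case Number")["Case Number"].transform("count")
)

juvenile_cases = toy.loc[toy["juvenile"], "Case Number"].nunique()

mask = (
    (toy["offenses_per_case"] == 1)
    & (toy["Expungeable"] != "No")  # missing values pass this test
    & toy["juvenile"]
)
eligible = toy.loc[mask, "Case Number"].nunique()

print(f"{eligible} of {juvenile_cases} juvenile cases ({eligible / juvenile_cases:.2%})")
```

Case E, whose eligibility is unknown, is counted as eligible here. Filtering with `== 'Yes'` instead would exclude it, so the choice of comparison affects the estimate.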

In [19]:
def date_range(x):
    greater3 = x.loc[(x['years_since_offense'] > 3)]['Case Number'].nunique()
    greater7 = x.loc[(x['years_since_offense'] > 7)]['Case Number'].nunique()

    print(greater3, "occurred more than 3 years before", reference_date)
    print(greater7, "occurred more than 7 years before", reference_date)
    
def eligible_juvs(y):
    People_eligible = y['Case Number'].nunique()
    pct_juv = '{:.2%}'.format(People_eligible/Number_Cases_Juvenile)
    return People_eligible , pct_juv
    
def eligible_all_ages(y):
    People_eligible = y['Case Number'].nunique()
    pct_tot = '{:.2%}'.format(People_eligible/Incidents_tot)
    return People_eligible, pct_tot

In [26]:
x = ms.loc[
    (ms['offenses_per_case']==1) &
    (ms['Expungeable'] != 'No') &
    (ms['juvenile'] == True)
]

q1 = 'q1', 'Incidents with a single offense: no offense ineligible', eligible_juvs(x)
print(q1)

#all ages
x = ms.loc[
    (ms['offenses_per_case']==1) &
    (ms['Expungeable'] != 'No') 
]
q1 = q1, eligible_all_ages(x)
      
#date_range(x)

#print(x['Disposition Description'].value_counts(dropna=False)[0:10])

('q1', 'Incidents with a single offense: no offense ineligible', (1676, '28.82%'))


## Question 2
Original: How many people (under age 21) would be eligible based on only having one incident (which could include multiple charges) that are not part of the list of ineligible offenses?


*We cannot answer this -- we do not have a person-level identifier or any proxy for an identifier.*

- How many incidents heard in juvenile court have no offenses on the list of ineligible offenses?


In [27]:
x = ms.loc[
    (ms['Inc_Expungeable_Attempts_Are'] == True) &
    (ms['juvenile'] == True)
]

q2 = 'q2', 'Incidents: no offenses ineligible', eligible_juvs(x)
print(q2)

#date_range(x)
print(x['Disposition Description'].value_counts(dropna=False)[0:10])

#all ages
x = ms.loc[
    (ms['Inc_Expungeable_Attempts_Are'] == True) 
]
q2 = q2, eligible_all_ages(x)

('q2', 'Incidents: no offenses ineligible', (3210, '55.19%'))
PRE-TRIAL PROBATION                            2185
DISMISSED W/O  PREJUDICE                       2167
CONTINUED W/O FINDING                          1166
DISMISSED PRIOR TO ARRAIGNMENT                  460
DELINQUENT CHANGE OF PLEA                       329
NOT RESPONSIBLE                                 194
DISMISSED BY COURT (PRIOR TO ARRAIGNMENT)       172
DISMISSED BY FINES                              112
GUILTY CHANGE OF PLEA                            99
DISMISSED W/O PREJUDICE LACK OF PROSECUTION      94
Name: Disposition Description, dtype: int64


## Verdict: Disposition / guilty
Original Q3: How many people (under age 21) would be eligible based on who has not been found guilty (given current offenses that are eligible for expungement)?

Guilty is defined above in Step 0.5.

*Because we do not have an individual identifier, this is just a subset of the previous question. It removes any incident in which at least one offense had a disposition indicating guilt (these appear to be mostly delinquent change of plea or guilty change of plea). If we had an identifier linking individuals across offenses, this criterion might increase the number of people eligible for expungement, because it would waive the single offense/incident criterion. Here, it reduces the number of eligible incidents, because it restricts them to those with no guilty finding.*

- How many incidents have no offenses ineligible and no offenses with a guilty verdict

In [28]:
x = ms.loc[
    (ms['Inc_Expungeable_Attempts_Are']) &
    (ms['juvenile'] == True) &
    (ms['Incident_Guilty'] != True)
]

q3 = 'q3', 'Incidents: no offenses ineligible, no guilty dispositions', eligible_juvs(x)
print(q3)

#date_range(x)

x['Disposition Description'].value_counts(dropna=False)[0:10]

#all ages
x = ms.loc[
    (ms['Inc_Expungeable_Attempts_Are']) &
    (ms['Incident_Guilty'] != True)
]
q3 = q3, eligible_all_ages(x)

('q3', 'Incidents: no offenses ineligible, no guilty dispositions', (2969, '51.05%'))


## Sex or murder related
Original: How many people (under age 21) would be eligible based on only having one incident if only sex-based offenses or murder were excluded from expungement?


*We cannot answer this -- we do not have a person-level identifier or any proxy for an identifier.*

- Incidents heard in juvenile court are counted as eligible when no charges are related to sex or murder

In [29]:
x = ms.loc[
    (ms['Incident_Murder_Sex'] == False) &
    (ms['juvenile'] == True)
]

q4 = 'q4', 'Incidents: no offenses related to sex or murder', eligible_juvs(x)
print(q4) 

date_range(x)

x['Disposition Description'].value_counts(dropna=False)[0:10]

#all ages
x = ms.loc[
    (ms['Incident_Murder_Sex'] == False) 
]
q4 = q4, eligible_all_ages(x)

('q4', 'Incidents: no offenses related to sex or murder', (5682, '97.70%'))
3195 occurred more than 3 years before 2020-09-01
25 occurred more than 7 years before 2020-09-01


### No Sex or Murder and not found Guilty
Original: How many people (under age 21) would be eligible based on who has not been found guilty for all offenses except for murder or sex-based offenses?

*Because we do not have an individual identifier, this is just a subset of Question 2b. It removes any incident in which at least one offense had a disposition indicating guilt (these appear to be mostly delinquent change of plea or guilty change of plea). If we had an identifier linking individuals across offenses, this criterion might increase the number of people eligible for expungement, because it would waive the single offense/incident criterion. Here, it reduces the number of eligible incidents, because it restricts them to those with no guilty finding.*

- Incidents where no charges are related to sex or murder and no offenses have a guilty disposition

In [31]:
x = ms.loc[
    (ms['Incident_Murder_Sex'] == False) &
    (ms['juvenile'] == True) &
    (ms['Incident_Guilty'] != True)
]

q5 = 'q5','Incidents: no offenses related to sex or murder, no guilty dispositions', eligible_juvs(x)
print(q5)

#date_range(x)

x['Disposition Description'].value_counts(dropna=False)[0:10]

# all ages
x = ms.loc[
    (ms['Incident_Murder_Sex'] == False) &
    (ms['Incident_Guilty'] != True)
]
q5 = q5, eligible_all_ages(x)

('q5', 'Incidents: no offenses related to sex or murder, no guilty dispositions', (5201, '89.43%'))


In [34]:
a = [q1, q2, q3, q4, q5]


ans = pd.DataFrame(a , columns = ['A', 'B'])
ans[['q', 'Question', 'Juv']] = pd.DataFrame(ans['A'].tolist())  
ans[['# Juv Incidents', '% Juv']] = pd.DataFrame(ans['Juv'].tolist()) 
ans[['# All Age Incidents', '% All Ages']] = pd.DataFrame(ans['B'].tolist())  
ans = ans[['q', 'Question', '# Juv Incidents', '% Juv', '# All Age Incidents', '% All Ages']].set_index('q')
ans.style.format({'# Juv Incidents':'{:,}', '# All Age Incidents':'{:,}'})

Unnamed: 0_level_0,Question,# Juv Incidents,% Juv,# All Age Incidents,% All Ages
q,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
q1,Incidents with a single offense: no offense ineligible,1676,28.82%,42416,25.91%
q2,Incidents: no offenses ineligible,3210,55.19%,101433,61.96%
q3,"Incidents: no offenses ineligible, no guilty dispositions",2969,51.05%,74088,45.26%
q4,Incidents: no offenses related to sex or murder,5682,97.70%,159892,97.67%
q5,"Incidents: no offenses related to sex or murder, no guilty dispositions",5201,89.43%,118333,72.28%
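
As a quick consistency check, the percentage columns in the table above can be re-derived from the incident counts and the two denominators reported in Step 1 (5,816 juvenile incidents and 163,711 incidents overall). The dictionary layout below is just one convenient arrangement, not the notebook's own structure.

```python
# Denominators from the Step 1 summary table.
juvenile_incidents = 5816
all_incidents = 163711

# (juvenile count, all-ages count) per question, copied from the final table.
counts = {
    "q1": (1676, 42416),
    "q2": (3210, 101433),
    "q3": (2969, 74088),
    "q4": (5682, 159892),
    "q5": (5201, 118333),
}

shares = {
    q: (f"{juv / juvenile_incidents:.2%}", f"{total / all_incidents:.2%}")
    for q, (juv, total) in counts.items()
}

for q, (pct_juv, pct_all) in shares.items():
    print(q, pct_juv, pct_all)
```

The printed shares can be compared line by line with the `% Juv` and `% All Ages` columns.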
