# Problem Set 3 (24 points)

Use the same `sentencing_cleaned` data from Problem Set 2 for this assignment. 

In Problem Set 2, you investigated one form of disparity in the US criminal justice system: probation versus incarceration.

Here, you'll investigate a second type of disparity---the length of a defendant's sentence---and also investigate the disparities faced by defendants sentenced by the same judge for the same crime. 

As a reminder, the codebook is available at this link: https://datacatalog.cookcountyil.gov/api/views/tg8v-tm6u/files/8597cdda-f7e1-44d1-b0ce-0a4e43f8c980?download=true&filename=CCSAO%20Data%20Glossary.pdf

# 0. Load packages and imports

In [1]:
## basic functionality
import pandas as pd
import numpy as np
import re
import os
import plotnine
from plotnine import *

## repeated printouts
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"

## Suppress traceback 
%xmode Minimal 

Exception reporting mode: Minimal


## 1.1 Filter to defendants who were incarcerated and construct a sentence length variable (10 points)

**Part A:**

- Filter to sentences that involve incarceration (same logic as in Problem Set 2: incarceration is indicated by `COMMITMENT_TYPE` == "Illinois Department of Corrections")
- Filter out non-numeric sentence lengths (e.g., Term, Pounds, or Dollars)
- Filter to Black or White defendants

### Part A

In [2]:
## Load data 

# check the path
os.getcwd()

# read in the data 
sent_df = pd.read_csv("pset2_inputdata/sentencing_cleaned.csv")

# For reference, view info and head of df
sent_df.info()
sent_df.head(5)

'/Users/maggiesullivan/Documents/DS_1/PS/PS3'



<class 'pandas.core.frame.DataFrame'>
RangeIndex: 135165 entries, 0 to 135164
Data columns (total 52 columns):
 #   Column                             Non-Null Count   Dtype  
---  ------                             --------------   -----  
 0   CASE_ID                            135165 non-null  int64  
 1   CASE_PARTICIPANT_ID                135165 non-null  int64  
 2   RECEIVED_DATE                      135165 non-null  object 
 3   OFFENSE_CATEGORY                   135165 non-null  object 
 4   PRIMARY_CHARGE_FLAG                135165 non-null  bool   
 5   CHARGE_ID                          135165 non-null  int64  
 6   CHARGE_VERSION_ID                  135165 non-null  int64  
 7   DISPOSITION_CHARGED_OFFENSE_TITLE  135165 non-null  object 
 8   CHARGE_COUNT                       135165 non-null  int64  
 9   DISPOSITION_DATE                   135165 non-null  object 
 10  DISPOSITION_CHARGED_CHAPTER        135165 non-null  object 
 11  DISPOSITION_CHARGED_ACT            1342

Unnamed: 0,CASE_ID,CASE_PARTICIPANT_ID,RECEIVED_DATE,OFFENSE_CATEGORY,PRIMARY_CHARGE_FLAG,CHARGE_ID,CHARGE_VERSION_ID,DISPOSITION_CHARGED_OFFENSE_TITLE,CHARGE_COUNT,DISPOSITION_DATE,...,simplified_offense_derived,is_black_derived,is_hisp_derived,is_white_derived,is_other_derived,is_male_derived,age_derived,sentenceymd_derived,sentenceym_derived,judgeid_derived
0,149765331439,175691153649,8/15/1984 12:00:00 AM,PROMIS Conversion,True,50510062193,112898098217,FIRST DEGREE MURDER,1,12/17/2014 12:00:00 AM,...,Homicide,True,False,False,False,True,27.0,2014-10-16,2014-10-01,judge_40
1,150065796098,162105612284,8/23/1984 12:00:00 AM,PROMIS Conversion,True,50792360681,113332130159,FIRST DEGREE MURDER,1,8/6/2014 12:00:00 AM,...,Homicide,True,False,False,False,True,30.0,2014-08-06,2014-08-01,judge_310
2,154954734978,225758446387,6/8/2001 12:00:00 AM,PROMIS Conversion,True,54885211141,174293345821,VIO BAIL BOND/CLASS 1,1,12/2/2013 12:00:00 AM,...,PROMIS Conversion,False,True,False,False,True,38.0,2013-12-02,2013-12-01,judge_162
3,155222744754,217349881776,1/31/2001 12:00:00 AM,PROMIS Conversion,True,53899906462,280120721775,POSS AMT CON SUB EXCEPT (A)/(D),1,9/10/2012 12:00:00 AM,...,Narcotics,True,False,False,False,False,33.0,2012-09-10,2012-09-01,judge_331
4,155327892699,217212381455,2/6/2001 12:00:00 AM,PROMIS Conversion,True,53938518259,164877860811,DUI LIC SUSPENDED OR REVOKED (EFFECTIVE 4-13-2...,1,9/19/2014 12:00:00 AM,...,PROMIS Conversion,False,False,True,False,True,49.0,2014-09-19,2014-09-01,judge_314


In [5]:
##Filter to sentences that involve incarceration 
incarceration = sent_df.loc[sent_df.COMMITMENT_TYPE == "Illinois Department of Corrections"].copy()

# check to see this worked
incarceration[['COMMITMENT_TYPE', 'COMMITMENT_TERM']].sample(5)
len(incarceration)

Unnamed: 0,COMMITMENT_TYPE,COMMITMENT_TERM
9362,Illinois Department of Corrections,6.0
50873,Illinois Department of Corrections,30.0
15266,Illinois Department of Corrections,4.0
100593,Illinois Department of Corrections,3.0
60298,Illinois Department of Corrections,2.0


68241

In [120]:
# Filter out non-numeric sentence lengths (e.g., Term, Pounds, or Dollars)
incarceration_clean = incarceration[~incarceration.COMMITMENT_UNIT.isin(['Pounds', 'Term', "Dollars"])].copy()

# check original incarceration data
commitment_lengths = incarceration.groupby('COMMITMENT_UNIT').size().reset_index().copy()
commitment_lengths

# check incarceration_clean 
commitment_lengths_clean = incarceration_clean.groupby('COMMITMENT_UNIT').size().reset_index().copy()
commitment_lengths_clean

Unnamed: 0,COMMITMENT_UNIT,0
0,Days,67
1,Dollars,4
2,Hours,1
3,Months,7728
4,Natural Life,71
5,Pounds,1
6,Term,10
7,Weeks,1
8,Year(s),60358


Unnamed: 0,COMMITMENT_UNIT,0
0,Days,67
1,Hours,1
2,Months,7728
3,Natural Life,71
4,Weeks,1
5,Year(s),60358


In [122]:
# View the RACE values
incarceration_clean.groupby('RACE').size().copy()

# Filter for Black and White participants 
incarceration_clean2 = incarceration_clean[(incarceration_clean['is_white_derived'] == True) | 
                        (incarceration_clean['is_black_derived'] == True)].copy()

# Check to see this worked
incarceration_clean2[['RACE','is_white_derived', 'is_black_derived']].sample(15)

RACE
ASIAN                                   7
American Indian                        33
Asian                                 218
Biracial                                6
Black                               49840
HISPANIC                              880
Unknown                                71
White                                7876
White [Hispanic or Latino]           8722
White/Black [Hispanic or Latino]      404
dtype: int64

Unnamed: 0,RACE,is_white_derived,is_black_derived
44238,Black,False,True
125727,Black,False,True
48031,Black,False,True
33309,Black,False,True
81416,Black,False,True
17570,Black,False,True
61835,Black,False,True
17245,White,True,False
7391,Black,False,True
15713,Black,False,True


### Part B


**Part B**: Then, follow the instructions in the codebook (combining `COMMITMENT_TERM` with `COMMITMENT_UNIT`) to create a standard sentence length in days column (`senlength_derived`) that measures the sentence in days. To simplify, you can assume that:

- 1 hour = 1/24th of a day
- 1 year = 365 days
- 1 month = 30.5 days
- 1 week = 7 days
- Natural life = difference between the age of 100 and the defendant's age at incident (cleaned; if missing, code to age 20); note that this is a simplification since age at incident != age at sentencing 

Print the following columns for an example of each type (e.g., an example of originally hours; an example of natural life): `COMMITMENT_TERM`, `COMMITMENT_UNIT`, `age_derived` and your new standardized sentence length column

Print the summary of that column (`senlength_derived`) using the .describe() command

**Concepts tested and resources**: there are many approaches but a couple ways are:
- np.select covered in the slides and this activity notebook: https://github.com/rebeccajohnson88/PPOL564_slides_activities/blob/main/activities/fall_22/solutions/01_pandas_datacleaning_solutions.ipynb
- writing a function that takes in one row as an argument and has a series of if, elif, else conditions where different commitment_units are translated into days. To execute this function, you can use the .apply function but apply it with axis = 1 (row-wise). Resources for that include: (1) the activity notebook on user-defined functions (https://github.com/rebeccajohnson88/PPOL564_slides_activities/blob/main/activities/fall_22/solutions/02_functions_part1_solutions.ipynb); (2) the activity notebook covering apply (same as above)
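
As a point of comparison with the row-wise `.apply` approach, here is a minimal sketch of the `np.select` route on a toy frame. The rows are invented for illustration, and the sketch assumes `COMMITMENT_TERM` can be cast to float; it is one possible vectorized version, not the reference solution.

```python
import numpy as np
import pandas as pd

# Toy rows (invented for illustration), one per commitment unit
toy = pd.DataFrame({
    "COMMITMENT_TERM": [1.0, 2.0, 2.0, 18.0, 4.0, 0.0],
    "COMMITMENT_UNIT": ["Hours", "Days", "Weeks", "Months", "Year(s)", "Natural Life"],
    "age_derived": [22.0, 29.0, 23.0, 40.0, 35.0, 48.0],
})

term = toy["COMMITMENT_TERM"].astype(float)
unit = toy["COMMITMENT_UNIT"]

# Each condition is paired with the choice at the same position
conditions = [
    unit == "Hours",
    unit == "Days",
    unit == "Weeks",
    unit == "Months",
    unit == "Year(s)",
    unit == "Natural Life",
]
choices = [
    term / 24,                         # 1 hour = 1/24 of a day
    term,                              # already in days
    term * 7,                          # 1 week = 7 days
    term * 30.5,                       # 1 month = 30.5 days
    term * 365,                        # 1 year = 365 days
    (100 - toy["age_derived"]) * 365,  # natural life: years remaining until age 100
]

# Units not listed above fall through to NaN rather than a silent guess
toy["senlength_derived"] = np.select(conditions, choices, default=np.nan)
print(toy)
```

Because every choice is computed for the whole column at once, this avoids a Python-level loop over rows, which tends to matter on a dataset of this size.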

**Hint on output**: see GitHub issue for the summary stats we get from running .describe()

In [124]:
## combine COMMITMENT_TERM with COMMITMENT_UNIT to create a standard sentence length in days column (senlength_derived)
incarceration_clean3 = incarceration_clean2.copy()

# First, clean the AGE_AT_INCIDENT column: code any NaN as 20
incarceration_clean3['age_derived'] = incarceration_clean3.AGE_AT_INCIDENT.fillna(20)

# Check to see this worked
incarceration_clean3.loc[incarceration_clean3.age_derived == 20,
                         ['AGE_AT_INCIDENT', 'age_derived']].head()

Unnamed: 0,AGE_AT_INCIDENT,age_derived
5,,20.0
12,,20.0
69,,20.0
189,20.0,20.0
215,20.0,20.0


In [125]:
# Next, define a function that converts one row's COMMITMENT_TERM into days based on its COMMITMENT_UNIT

def find_sentlength(one_row):
    '''
    Convert one sentence's length to days.

    Parameters:
        one_row (pd.Series): one row of the dataframe; must contain
            COMMITMENT_TERM, COMMITMENT_UNIT, and age_derived

    Returns:
        new_len (float): sentence length in days; for an unrecognized unit,
            COMMITMENT_TERM is returned unchanged
    '''
    if one_row["COMMITMENT_UNIT"] == "Hours":
        new_len = float(one_row['COMMITMENT_TERM']) * (1/24)
        return new_len
    elif one_row['COMMITMENT_UNIT'] == "Days":
        new_len = float(one_row['COMMITMENT_TERM'])
        return new_len
    elif one_row['COMMITMENT_UNIT'] == "Weeks":
        new_len = float(one_row['COMMITMENT_TERM']) * (7)
        return new_len
    elif one_row['COMMITMENT_UNIT'] == "Months":
        new_len = float(one_row['COMMITMENT_TERM']) * (30.5)
        return new_len
    elif one_row['COMMITMENT_UNIT'] == "Year(s)":
        new_len = float(one_row['COMMITMENT_TERM']) * (365)
        return new_len
    elif one_row["COMMITMENT_UNIT"] == "Natural Life":
        new_len = (100 - float(one_row['age_derived'])) * (365)
        return new_len
    else:
        return one_row['COMMITMENT_TERM']

In [126]:
# Finally run the function and create a new column "sentlength_derived"
incarceration_clean3['sentlength_derived'] = incarceration_clean3.apply(find_sentlength, axis = 1).copy()

# Check this worked (1)
incarceration_clean3[['COMMITMENT_TERM', 'COMMITMENT_UNIT', 'sentlength_derived']].sample(10)

Unnamed: 0,COMMITMENT_TERM,COMMITMENT_UNIT,sentlength_derived
77400,1.0,Year(s),365.0
96100,2.0,Year(s),730.0
21754,4.0,Year(s),1460.0
78639,2.0,Year(s),730.0
65964,1.0,Year(s),365.0
33190,4.0,Year(s),1460.0
4081,4.0,Year(s),1460.0
89948,2.0,Year(s),730.0
74334,1.0,Year(s),365.0
70686,18.0,Months,549.0


In [127]:
# Check this worked (2)
incarceration_clean3[['COMMITMENT_UNIT','sentlength_derived']].groupby(['COMMITMENT_UNIT']).count().reset_index().copy()

Unnamed: 0,COMMITMENT_UNIT,sentlength_derived
0,Days,56
1,Hours,1
2,Months,6801
3,Natural Life,59
4,Weeks,1
5,Year(s),51371


In [129]:
## print an example of each type of commitment unit and its sentlength_derived value
incarc_df = incarceration_clean3.copy()

incarc_df[['COMMITMENT_UNIT','COMMITMENT_TERM', 'age_derived','sentlength_derived']].groupby('COMMITMENT_UNIT').sample(1)


Unnamed: 0,COMMITMENT_UNIT,COMMITMENT_TERM,age_derived,sentlength_derived
132842,Days,2.0,29.0,2.0
92475,Hours,1.0,22.0,0.041667
126434,Months,30.0,57.0,915.0
39700,Natural Life,0.0,48.0,18980.0
15310,Weeks,2.0,23.0,14.0
15868,Year(s),7.0,62.0,2555.0


In [130]:
## Use the .describe() command to find summary stats of the sentlength_derived column
incarc_df.sentlength_derived.describe()

count     58289.000000
mean       1396.720826
std        2062.874778
min           0.000000
25%         366.000000
50%         915.000000
75%        1460.000000
max      147825.000000
Name: sentlength_derived, dtype: float64

## 1.2 Examine disparities in length within the same judge and offense category: constructing matched pairs (14 points)



### Part A 

Keep the above ~58k row dataset subsetting only to sentences involving incarceration. Then, further subset the rows to:
- Those sentenced by `judge_21` (`judgeid_derived` == "judge_21")
- `simplified_offense_derived` == "Narcotics"

Use `shape` to print the dimensions of the resulting dataframe

**Concepts and resources**: row subsetting using logical conditions; see above resources
 

In [131]:
## your code here to filter rows and check the shape
incarceration_clean.shape
incarc_df.shape

(68226, 52)

(58289, 53)

In [132]:
# subset the rows to those sentenced by judgeid_derived = judge_21
incarc_ju21 = incarc_df[incarc_df.judgeid_derived == "judge_21"].copy()

In [133]:
# subset to rows with simplified_offense_derived == "Narcotics"
incarc_ju21_narc = incarc_ju21[incarc_ju21.simplified_offense_derived == "Narcotics"].reset_index().copy()

In [134]:
# check the shape
incarc_ju21_narc.shape

(53, 54)

### Part B

For each defendant sentenced by judge_21, you want to construct "matched groups" of defendants who:

- Are the same exact age and
- Are the same gender but 
- Differ in race from the focal defendant

Write a user-defined function to find any/all matched defendants for each focal defendant of judge 21. You can structure the function in various ways but one way is to write a function similar to the class example where we find similar crimes to a focal crime for one focal crime; in this case, we want to:

- Iterate over unique defendants sentenced by judge 21 (use `CASE_PARTICIPANT_ID` to identify each unique defendant)
- Find other defendants in the judge 21 pool who (1) have a different race from that focal defendant but (2) the same gender and age 
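
An alternative to iterating over defendants is a self-merge: join the pool to itself on age and gender, then keep only the pairs whose race differs. The sketch below uses a toy pool with invented rows; the `_focal` and `_match` suffixes are hypothetical naming choices, and this is a swapped-in technique rather than the function-based approach the assignment describes.

```python
import pandas as pd

# Toy pool (invented rows); column names mirror the dataset
pool = pd.DataFrame({
    "CASE_PARTICIPANT_ID": [1, 2, 3, 4, 5],
    "age_derived": [29.0, 29.0, 29.0, 32.0, 29.0],
    "is_male_derived": [True, True, True, True, False],
    "is_black_derived": [False, True, True, True, True],
    "sentlength_derived": [365.0, 730.0, 365.0, 1460.0, 365.0],
})

cols = ["CASE_PARTICIPANT_ID", "age_derived", "is_male_derived",
        "is_black_derived", "sentlength_derived"]

# Pair every defendant with every defendant of the same age and gender
pairs = pool[cols].merge(
    pool[cols],
    on=["age_derived", "is_male_derived"],
    suffixes=("_focal", "_match"),
)

# Keep only pairs whose race differs; this also drops self-pairs
pairs = pairs[pairs["is_black_derived_focal"] != pairs["is_black_derived_match"]]
print(pairs[["CASE_PARTICIPANT_ID_focal", "CASE_PARTICIPANT_ID_match"]])
```

In the toy pool, defendant 1 is matched with defendants 2 and 3, and each of those is matched back to defendant 1; defendants 4 and 5 have no one of the same age and gender, so they drop out.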

In [52]:
#Find a sample to test approach 
incarc_ju21_narc.CASE_PARTICIPANT_ID.sample()

41448    760456238817
Name: CASE_PARTICIPANT_ID, dtype: int64

In [135]:
# Preview a random sample of rows to inspect the columns available for matching
incarc_ju21_narc.sample(5)

Unnamed: 0,index,CASE_ID,CASE_PARTICIPANT_ID,RECEIVED_DATE,OFFENSE_CATEGORY,PRIMARY_CHARGE_FLAG,CHARGE_ID,CHARGE_VERSION_ID,DISPOSITION_CHARGED_OFFENSE_TITLE,CHARGE_COUNT,...,is_black_derived,is_hisp_derived,is_white_derived,is_other_derived,is_male_derived,age_derived,sentenceymd_derived,sentenceym_derived,judgeid_derived,sentlength_derived
8,41448,399913908091,760456238817,5/31/2013 12:00:00 AM,Narcotics,True,369868334009,688385217366,POSSESSION OF CANNABIS,1,...,False,False,True,False,True,29.0,2014-05-02,2014-05-01,judge_21,365.0
33,61940,407703520450,784728112643,7/14/2014 12:00:00 AM,Narcotics,True,382243565708,711736341399,[POSSESSION OF CONTROLLED SUBSTANCE WITH INTEN...,1,...,True,False,False,False,True,32.0,2014-12-01,2014-12-01,judge_21,1460.0
11,44894,401217830722,764493293552,8/5/2013 12:00:00 AM,Narcotics,True,371969358733,692322053374,POSSESSION OF A CONTROLLED SUBSTANCE,1,...,False,False,True,False,True,34.0,2014-01-13,2014-01-01,judge_21,365.0
9,43109,400554781879,762395776638,7/2/2013 12:00:00 AM,Narcotics,True,370883552776,690303048134,POSSESSION OF A CONTROLLED SUBSTANCE,1,...,True,False,False,False,True,22.0,2014-02-04,2014-02-01,judge_21,365.0
27,57587,406029670900,779496588747,4/17/2014 12:00:00 AM,Narcotics,True,379730229337,706970678144,OBTAIN SUBSTANCE BY FRAUD/SUBQ,1,...,False,False,True,False,False,28.0,2014-09-11,2014-09-01,judge_21,730.0


In [136]:
#Define example focal defendant
focal_defendant_ex = incarc_ju21_narc[incarc_ju21_narc.CASE_PARTICIPANT_ID == 760456238817]

#Print
focal_defendant_ex 

Unnamed: 0,index,CASE_ID,CASE_PARTICIPANT_ID,RECEIVED_DATE,OFFENSE_CATEGORY,PRIMARY_CHARGE_FLAG,CHARGE_ID,CHARGE_VERSION_ID,DISPOSITION_CHARGED_OFFENSE_TITLE,CHARGE_COUNT,...,is_black_derived,is_hisp_derived,is_white_derived,is_other_derived,is_male_derived,age_derived,sentenceymd_derived,sentenceym_derived,judgeid_derived,sentlength_derived
8,41448,399913908091,760456238817,5/31/2013 12:00:00 AM,Narcotics,True,369868334009,688385217366,POSSESSION OF CANNABIS,1,...,False,False,True,False,True,29.0,2014-05-02,2014-05-01,judge_21,365.0


In [140]:
#Define focal group for this example focal defendant
focal_group_ex = incarc_ju21_narc[incarc_ju21_narc.age_derived.isin(focal_defendant_ex['age_derived']) &
                                 incarc_ju21_narc.is_male_derived.isin(focal_defendant_ex['is_male_derived']) &
                                 ~incarc_ju21_narc.is_black_derived.isin(focal_defendant_ex['is_black_derived'])].copy()

# Print - from a manual review, this should yield just 805461688906
focal_group_ex[['CASE_PARTICIPANT_ID',"is_male_derived",'age_derived', 'is_black_derived']] 


Unnamed: 0,CASE_PARTICIPANT_ID,is_male_derived,age_derived,is_black_derived
48,805461688906,True,29.0,True


In [159]:
def matchmaker(one_id):
    '''
    Find the matches for one focal defendant: other defendants in
    incarc_ju21_narc who
        are the same exact age and
        are the same gender but
        differ in race from the focal defendant

    Parameters:
        one_id (int): CASE_PARTICIPANT_ID of the focal defendant
    Returns:
        matches (pd.DataFrame): one row per matched defendant, plus the focal
            defendant's id, race, and sentence length (empty if no matches)
    '''
    fd = incarc_ju21_narc[incarc_ju21_narc.CASE_PARTICIPANT_ID == one_id].copy()
    focal_group = incarc_ju21_narc[incarc_ju21_narc.age_derived.isin(fd['age_derived']) &
                               incarc_ju21_narc.is_male_derived.isin(fd['is_male_derived']) &
                               ~incarc_ju21_narc.is_black_derived.isin(fd['is_black_derived'])].copy()
    # .copy() makes matches an independent dataframe, so the column assignments
    # below do not trigger a SettingWithCopyWarning
    matches = focal_group[['CASE_PARTICIPANT_ID',"is_male_derived",'age_derived', 'is_black_derived',
                          'sentlength_derived', 'RACE']].copy()
    matches['focal_id'] = one_id
    matches['focal_race'] = fd.RACE.iloc[0]
    matches['focal_senlength'] = fd.sentlength_derived.iloc[0]
    return matches

In [155]:
incarc_ju21_narc.columns.unique()

Index(['index', 'CASE_ID', 'CASE_PARTICIPANT_ID', 'RECEIVED_DATE',
       'OFFENSE_CATEGORY', 'PRIMARY_CHARGE_FLAG', 'CHARGE_ID',
       'CHARGE_VERSION_ID', 'DISPOSITION_CHARGED_OFFENSE_TITLE',
       'CHARGE_COUNT', 'DISPOSITION_DATE', 'DISPOSITION_CHARGED_CHAPTER',
       'DISPOSITION_CHARGED_ACT', 'DISPOSITION_CHARGED_SECTION',
       'DISPOSITION_CHARGED_CLASS', 'DISPOSITION_CHARGED_AOIC',
       'CHARGE_DISPOSITION', 'CHARGE_DISPOSITION_REASON', 'SENTENCE_JUDGE',
       'SENTENCE_COURT_NAME', 'SENTENCE_COURT_FACILITY', 'SENTENCE_PHASE',
       'SENTENCE_DATE', 'SENTENCE_TYPE', 'CURRENT_SENTENCE_FLAG',
       'COMMITMENT_TYPE', 'COMMITMENT_TERM', 'COMMITMENT_UNIT',
       'LENGTH_OF_CASE_in_Days', 'AGE_AT_INCIDENT', 'RACE', 'GENDER',
       'INCIDENT_CITY', 'INCIDENT_BEGIN_DATE', 'INCIDENT_END_DATE',
       'LAW_ENFORCEMENT_AGENCY', 'LAW_ENFORCEMENT_UNIT', 'ARREST_DATE',
       'FELONY_REVIEW_DATE', 'FELONY_REVIEW_RESULT', 'ARRAIGNMENT_DATE',
       'UPDATED_OFFENSE_CATEGORY', 'is

In [160]:
matchmaker(one_id = incarc_ju21_narc.CASE_PARTICIPANT_ID.unique()[0])


Unnamed: 0,CASE_PARTICIPANT_ID,is_male_derived,age_derived,is_black_derived,sentlength_derived,RACE,focal_id,focal_race,focal_senlength
29,780425400115,True,21.0,True,1460.0,Black,203605700713,White,1095.0


In [163]:
all_match = [matchmaker(one_id) for one_id in incarc_ju21_narc.CASE_PARTICIPANT_ID.unique()]

**Part B**: using the results from Part A, use `pd.concat` or another approach to create a dataframe that compares the (1) race and sentence length for the focal defendant to (2) the sentence length for other defendants. Using this dataframe, show this comparison for focal defendant: `CASE_PARTICIPANT_ID` == `808109112733`
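
A minimal sketch of the stacking step, using two toy match tables with invented values shaped like the output of a per-defendant matching function. Passing `ignore_index=True` is an optional choice shown here as an assumption: it gives the stacked frame fresh row labels, whereas the default keeps each table's original labels, which can repeat.

```python
import pandas as pd

# Toy per-defendant match tables (invented values); row labels deliberately overlap
m1 = pd.DataFrame(
    {"CASE_PARTICIPANT_ID": [11, 12],
     "sentlength_derived": [730.0, 365.0],
     "focal_id": [1, 1],
     "focal_race": ["Black", "Black"],
     "focal_senlength": [2190.0, 2190.0]},
    index=[14, 17],
)
m2 = pd.DataFrame(
    {"CASE_PARTICIPANT_ID": [1],
     "sentlength_derived": [2190.0],
     "focal_id": [11],
     "focal_race": ["White"],
     "focal_senlength": [730.0]},
    index=[14],
)

# Stack the tables row-wise; ignore_index=True renumbers rows 0..n-1
stacked = pd.concat([m1, m2], ignore_index=True)

# Pull out the comparison rows for one focal defendant
one_focal = stacked[stacked["focal_id"] == 1]
print(one_focal)
```

Filtering on `focal_id` after stacking returns every match for that focal defendant alongside the focal defendant's own race and sentence length.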

In [164]:
# create a new df that compares the race and sentence length for the focal def to that of other defs
all_match_df = pd.concat(all_match)

# test to see this worked
all_match_df.head()

# filter for defendant 808109112733
all_match_df[all_match_df.focal_id == 808109112733]

Unnamed: 0,CASE_PARTICIPANT_ID,is_male_derived,age_derived,is_black_derived,sentlength_derived,RACE,focal_id,focal_race,focal_senlength
29,780425400115,True,21.0,True,1460.0,Black,203605700713,White,1095.0
25,778820978039,True,33.0,True,8760.0,Black,738433538059,White,1825.0
32,784727452038,True,32.0,True,1460.0,Black,750050286216,White,2190.0
33,784728112643,True,32.0,True,1460.0,Black,750050286216,White,2190.0
34,784728961993,True,32.0,True,1460.0,Black,750050286216,White,2190.0


Unnamed: 0,CASE_PARTICIPANT_ID,is_male_derived,age_derived,is_black_derived,sentlength_derived,RACE,focal_id,focal_race,focal_senlength
14,768307912970,True,24.0,False,730.0,White,808109112733,Black,2190.0
17,769939231128,True,24.0,False,730.0,White,808109112733,Black,2190.0
22,774967571640,True,24.0,False,365.0,White,808109112733,Black,2190.0


**Part C**: group by the focal defendant's race and find the proportion of that defendant's matches who had a LONGER sentence than the focal defendant
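
The proportion in Part C relies on the fact that the mean of a boolean column equals the share of `True` values. A minimal sketch on toy pairs (invented values; `match_longer` is a hypothetical column name):

```python
import pandas as pd

# Toy matched pairs (invented values): one row per focal/match pair
pairs = pd.DataFrame({
    "focal_race": ["Black", "Black", "Black", "White", "White"],
    "focal_senlength": [2190.0, 2190.0, 365.0, 730.0, 730.0],
    "sentlength_derived": [730.0, 2555.0, 730.0, 2190.0, 365.0],
})

# True where the matched defendant's sentence is longer than the focal defendant's
pairs["match_longer"] = pairs["sentlength_derived"] > pairs["focal_senlength"]

# Mean of a boolean column = proportion of True values within each group
prop_longer = pairs.groupby("focal_race")["match_longer"].mean()
print(prop_longer)
```

In this toy data, two of the three pairs with a Black focal defendant have a longer-sentenced match (2/3), and one of the two pairs with a White focal defendant does (1/2). Note that the unit here is the pair, so a focal defendant with many matches contributes more rows than one with a single match.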

In [169]:
# create a boolean Series flagging matches whose sentlength_derived > the focal sentence length
test = all_match_df.sentlength_derived > all_match_df.focal_senlength

# check to see this worked
test

# use this object to create a new column 
all_match_df['sentlength_higher_tf'] = test

# check to see this worked 
all_match_df.head()

# group by race and find the proportion of people who had a LONGER sentence than the focal defendant
all_match_df.groupby('focal_race').agg({"sentlength_higher_tf": "mean"}).reset_index().copy()

29     True
25     True
32    False
33    False
34    False
      ...  
17    False
22    False
14     True
17     True
22    False
Length: 62, dtype: bool

Unnamed: 0,CASE_PARTICIPANT_ID,is_male_derived,age_derived,is_black_derived,sentlength_derived,RACE,focal_id,focal_race,focal_senlength,sentlength_higher_tf
29,780425400115,True,21.0,True,1460.0,Black,203605700713,White,1095.0,True
25,778820978039,True,33.0,True,8760.0,Black,738433538059,White,1825.0,True
32,784727452038,True,32.0,True,1460.0,Black,750050286216,White,2190.0,False
33,784728112643,True,32.0,True,1460.0,Black,750050286216,White,2190.0,False
34,784728961993,True,32.0,True,1460.0,Black,750050286216,White,2190.0,False


Unnamed: 0,focal_race,sentlength_higher_tf
0,Black,0.483871
1,White,0.451613


**Part D**: write 1-2 lines commenting on the results from Part C. What other defendant or offense-level characteristics would you like to match on to investigate claims about racial disparities? 


Response: From this data, we can see that when the focal defendant was Black, 48 percent of matched White defendants (same gender and age) received longer sentences. Meanwhile, when the focal defendant was White, 45 percent of matched Black defendants received longer sentences. For further analysis, it would also be interesting to match on the type of narcotics charge (e.g., possession of methamphetamine vs. cannabis) or on the month and/or year they were convicted, to see whether the judge's sentencing trends changed over time.

In [168]:
incarc_ju21_narc.DISPOSITION_CHARGED_OFFENSE_TITLE.unique()

array(['[POSSESSION OF CONTROLLED SUBSTANCE WITH INTENT TO DELIVER/ DELIVERY OF A CONTROLLED SUBSTANCE]',
       'POSSESSION OF A CONTROLLED SUBSTANCE',
       '[POSSESSION OF CONTROLLED SUBSTANCE WITH INTENT TO DELIVER/DELIVERY OF A CONTROLLED SUBSTANCE]',
       'POSSESSION OF CANNABIS',
       '[POSSESSION OF CANNABIS WITH INTENT TO DELIVER/DELIVERY OF CANNABIS]',
       'OBTAIN SUBSTANCE BY FRAUD/SUBQ', 'METHAMPHETAMINE POSSESSION'],
      dtype=object)