# COGS 108 - Data Checkpoint

# Names

- Allison Bhavsar
- Jon Chang
- Mikaela Grenion
- Nathan Nakamura
- Tilak Patel

<a id='research_question'></a>
# Research Question

What is the relationship between race and reason for wrongful conviction among exonerees in the United States?

# Dataset(s)

This dataset comes from the National Registry of Exonerations, which tracks all known exonerations in the United States from 1989 to the present. It contains 2765 observations (exonerees), each with the exoneree's name, age, race, sex, location, and the crimes they were convicted of, plus 15 variables for all types of exonerations.

- Dataset Name: The National Registry of Exonerations
- Link to the dataset: https://www.law.umich.edu/special/exoneration/Pages/detaillist.aspx
- Number of observations: 2765

# Setup

In [1]:
import numpy as np
import pandas as pd
df = pd.read_excel('data/publicspreadsheet.xlsx')
df

Unnamed: 0,Last Name,First Name,Age,Race,Sex,State,County,Tags,Worst Crime Display,List Add'l Crimes Recode,...,*,FC,MWID,F/MFE,P/FA,OM,ILD,Posting Date,OM Tags,Occurred
0,Abbitt,Joseph,31.0,Black,Male,North Carolina,Forsyth,CV;#IO,Child Sex Abuse,Sexual Assault;#Kidnapping;#Burglary/Unlawful ...,...,,,MWID,,,,,2011-09-01,,1991
1,Abdal,Warith Habib,43.0,Black,Male,New York,Erie,IO,Sexual Assault,Robbery,...,,,MWID,F/MFE,,OM,,2011-08-29,OF;#WH;#NW;#WT,1982
2,Abernathy,Christopher,17.0,White,Male,Illinois,Cook,CIU;#CV;#H;#IO,Murder,Rape;#Robbery,...,,FC,,,P/FA,OM,,2015-02-13,OF;#WH;#NW;#INT,1984
3,Abney,Quentin,32.0,Black,Male,New York,New York,CV,Robbery,,...,,,MWID,,,,,2019-05-13,,2005
4,Acero,Longino,35.0,Hispanic,Male,California,Santa Clara,NC;#P,Sex Offender Registration,,...,,,,,,,ILD,2011-08-29,,1994
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2760,Zawacki,Richard,38.0,White,Male,Indiana,Huntington,CV;#NC,Child Sex Abuse,,...,,,,,P/FA,,,2017-08-08,,2000
2761,Zimmer,Walter,40.0,White,Male,Ohio,Cuyahoga,CDC;#H;#IO,Manslaughter,Attempted Murder;#Robbery;#Assault;#Kidnapping...,...,*,,,F/MFE,P/FA,OM,,2013-10-04,FA;#WH;#NW;#PJ,1997
2762,Zimmerman,Evan,53.0,White,Male,Wisconsin,Dodge,H;#IO,Murder,,...,*,FC,,,,,ILD,2011-08-29,,2000
2763,Zinkiewicz,Tyrone,38.0,White,Male,Ohio,Montgomery,CV;#NC,Other Nonviolent Felony,,...,,,,,P/FA,OM,,2014-10-25,PR;#WH,1988


# Data Cleaning


In [2]:
# Dropped columns irrelevant to our research question and dropped columns with identifying information like names
df2 = df.drop(["Last Name", "First Name", "Sex", "County", "State", "Posting Date", "Tags", "OM Tags", "*", "List Add'l Crimes Recode"], axis=1)

# Dropped rows where race information is unknown or unspecific (either "Don't Know" or "Other")
df2 = df2[(df2['Race'] != "Don't Know") & (df2['Race'] != 'Other')]
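
The row filter above can also be written with `isin`, which makes it easy to report how many rows were removed. This is a minimal sketch on a made-up toy table, not the registry data; the names `toy`, `excluded_races`, `filtered`, and `n_dropped` are hypothetical and do not appear in the notebook.

```python
import pandas as pd

# Toy stand-in for the registry data; these rows are invented for illustration.
toy = pd.DataFrame({
    "Race": ["Black", "White", "Don't Know", "Hispanic", "Other"],
    "Age": [31.0, 17.0, 40.0, 35.0, 28.0],
})

# Labels we treat as unusable for a race-based analysis (same two as above).
excluded_races = ["Don't Know", "Other"]

# Keep only the rows whose Race is not in the excluded list.
filtered = toy[~toy["Race"].isin(excluded_races)]

# Report how many rows the filter removed.
n_dropped = len(toy) - len(filtered)
print(f"Dropped {n_dropped} of {len(toy)} rows")
print(filtered["Race"].value_counts())
```

Recording the number of dropped rows alongside the filter helps document how much of the sample the cleaning step excludes.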

In [3]:
# Renamed acronyms with clearer names
df2 = df2.rename(columns={"FC" : "False Confession", "MWID": "Witness Misidentification", "F/MFE": "False Evidence", "P/FA": "Perjury", "OM": "Offical Misconduct", "ILD": "Legal Defense"})

In [4]:
# Converted exoneration reason columns to 0/1 indicators (NaN becomes 0, the acronym marking a reason becomes 1)
df2.update(df2[["DNA", "False Confession", "Witness Misidentification","False Evidence","Perjury","Offical Misconduct","Legal Defense"]].fillna(0))
df2 = df2.replace({"DNA": 1,"FC" :1,"MWID":1, "F/MFE": 1, "P/FA": 1, "OM": 1, "ILD": 1})
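
An alternative way to build the same 0/1 indicators is to test each reason column for missing values. The sketch below assumes that every reason column holds either its own acronym or NaN, which is what the preview of the raw data suggests but is not verified here. The toy rows and the names `toy` and `reason_cols` are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy stand-in: each reason column holds its acronym or NaN (invented rows).
toy = pd.DataFrame({
    "Race": ["Black", "White", "Hispanic"],
    "False Confession": [np.nan, "FC", np.nan],
    "Perjury": ["P/FA", np.nan, np.nan],
})

reason_cols = ["False Confession", "Perjury"]

# A non-missing cell means the factor was recorded for that case, so
# notna() yields True/False and astype(int) turns that into 1/0.
toy[reason_cols] = toy[reason_cols].notna().astype(int)

print(toy)
```

Restricting the conversion to the reason columns means a matching string in an unrelated column cannot be replaced by accident.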

In [5]:
df2

Unnamed: 0,Age,Race,Worst Crime Display,Sentence,Convicted,Exonerated,DNA,False Confession,Witness Misidentification,False Evidence,Perjury,Offical Misconduct,Legal Defense,Occurred
0,31.0,Black,Child Sex Abuse,Life,1995,2009,1,0,1,0,0,0,0,1991
1,43.0,Black,Sexual Assault,20 to Life,1983,1999,1,0,1,1,0,1,0,1982
2,17.0,White,Murder,Life without parole,1987,2015,1,1,0,0,1,1,0,1984
3,32.0,Black,Robbery,20 to Life,2006,2012,0,0,1,0,0,0,0,2005
4,35.0,Hispanic,Sex Offender Registration,2 years and 4 months,1994,2006,0,0,0,0,0,0,1,1994
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2760,38.0,White,Child Sex Abuse,4 years,2000,2001,0,0,0,0,1,0,0,2000
2761,40.0,White,Manslaughter,50 years,1998,2011,1,0,0,1,1,1,0,1997
2762,53.0,White,Murder,Life,2001,2005,1,1,0,0,0,0,1,2000
2763,38.0,White,Other Nonviolent Felony,5 to 15 years,1988,1992,0,0,0,0,1,1,0,1988


To get the data into a usable format, we dropped columns for variables unrelated to our research question and columns with identifying information such as names. We also dropped rows where race was recorded as "Other" or "Don't Know", since race is central to our research question. Finally, we renamed the exoneration reason columns to make their meanings clearer and changed their values to 1s and 0s to make these variables easier to analyze.
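
As an illustration of why 0/1 columns are convenient, the sum of an indicator column within a group is a count and the mean is a proportion. The sketch below uses invented rows rather than the registry data, and the names `toy`, `reason_cols`, `counts`, and `shares` are hypothetical.

```python
import pandas as pd

# Toy stand-in for the cleaned table; these rows are invented for illustration.
toy = pd.DataFrame({
    "Race": ["Black", "Black", "White", "White"],
    "False Confession": [0, 1, 1, 1],
    "Witness Misidentification": [1, 1, 0, 1],
})

reason_cols = ["False Confession", "Witness Misidentification"]

# Sum of a 0/1 column within a group = number of cases with that reason.
counts = toy.groupby("Race")[reason_cols].sum()

# Mean of a 0/1 column within a group = share of cases with that reason.
shares = toy.groupby("Race")[reason_cols].mean()

print(counts)
print(shares)
```

Because one case can have several reasons marked, the shares within a group do not need to add up to 1.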