## Police Shootings Database
#### *Caleb Davis*

The CORGIS Police Shootings database is missing the most recent reports, so we'll need to update it by grabbing the latest CSV and cleaning it. The string formatting for dates and the way certain columns handle missing values seem a little unintuitive, so I'll try to clean those up as well.

First, we'll take a look at the raw CSV file I found on the [Washington Post GitHub](https://github.com/washingtonpost/data-police-shootings):

In [1]:
import csv
import pandas as pd

In [2]:
pd.read_csv('raw/fatal-police-shootings-data.csv')

Unnamed: 0,id,name,date,manner_of_death,armed,age,gender,race,city,state,signs_of_mental_illness,threat_level,flee,body_camera,longitude,latitude,is_geocoding_exact
0,3,Tim Elliot,2015-01-02,shot,gun,53.0,M,A,Shelton,WA,True,attack,Not fleeing,False,-123.122,47.247,True
1,4,Lewis Lee Lembke,2015-01-02,shot,gun,47.0,M,W,Aloha,OR,False,attack,Not fleeing,False,-122.892,45.487,True
2,5,John Paul Quintero,2015-01-03,shot and Tasered,unarmed,23.0,M,H,Wichita,KS,False,other,Not fleeing,False,-97.281,37.695,True
3,8,Matthew Hoffman,2015-01-04,shot,toy weapon,32.0,M,W,San Francisco,CA,True,attack,Not fleeing,False,-122.422,37.763,True
4,9,Michael Rodriguez,2015-01-04,shot,nail gun,39.0,M,H,Evans,CO,False,attack,Not fleeing,False,-104.692,40.384,True
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6134,6695,Dustin Roy Black,2021-03-13,shot,gun,24.0,M,W,Austin,AR,False,attack,Not fleeing,False,-91.988,34.981,True
6135,6700,,2021-03-13,shot,undetermined,,M,,Monahans,TX,False,undetermined,,False,,,True
6136,6701,Douglas William Stroble,2021-03-13,shot,gun,25.0,M,,Wasilla,AK,False,attack,Not fleeing,False,,,True
6137,6697,Kelly Shannon Bowen,2021-03-14,shot,gun and vehicle,51.0,M,W,Reidsville,GA,False,attack,Other,False,-82.110,32.100,True


This data is already formatted pretty well, but a few additions would be useful, such as integer dates. Let's clean it up a bit and get it ready for CORGIS. (This code is heavily based on kafura's original script [here](https://github.com/corgis-edu/corgis/blob/master/source/police-shootings/clean-police-shootings.py); I slightly modified some of the string formatting.)
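
As a small illustration of the integer-date idea, the sketch below splits one date string in the `YYYY-MM-DD` form the raw file uses into integer parts plus a `MM/DD/YYYY` string. The helper name `split_iso_date` is hypothetical and not part of the original script; it goes through `datetime.strptime` so that a malformed date raises an error instead of slipping through.

```python
from datetime import datetime


def split_iso_date(date_str):
    """Return (month, day, year, full) for a 'YYYY-MM-DD' string.

    Hypothetical helper, for illustration only.
    """
    parsed = datetime.strptime(date_str, "%Y-%m-%d")  # raises ValueError on bad input
    full = parsed.strftime("%m/%d/%Y")  # zero-padded month and day
    return parsed.month, parsed.day, parsed.year, full


print(split_iso_date("2015-01-02"))
```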

In [29]:
with open('raw/fatal-police-shootings-data.csv', 'r') as rawFile:
    with open('police_shootings-corgis.csv', 'w', encoding='utf-8', newline='') as cleanFile:
        reader = csv.reader(rawFile)
        writer = csv.writer(cleanFile, quotechar='"', quoting=csv.QUOTE_NONNUMERIC, lineterminator='\n')
        
        nameRow = next(reader)
        outRow = [None] * 16
        
        for inpRow in reader:
            # inpRow[0] is the record id, which we don't carry over
            # (the header row was already consumed by next(reader) above)
            
            # We'll organize the data about the victim first
            outRow[0] = str(inpRow[1]) if inpRow[1] != "" else "Unknown" # Name
            
            outRow[1] = int(inpRow[5]) if inpRow[5] != "" else 0 # Age, places 0 if unknown
            
            gender = {"M":"Male", "F":"Female"}
            outRow[2] = gender.get(inpRow[6], "Unknown") # Gender
            
            race = {
                "A":"Asian",
                "B":"African American",
                "W":"White",
                "H":"Hispanic",
                "N":"Native American",
                "O":"Other"
            }
            outRow[3] = race.get(inpRow[7], "Unknown") # Race
            
            
            # Next we'll add the data about the incident Setting
            year, month, day = inpRow[2].split('-')
            
            outRow[4] = int(month) # Month as int
            outRow[5] = int(day) # Day as int
            outRow[6] = int(year) # Year as int
            
            outRow[7] = "{}/{}/{}".format(month, day, year) # full date as string
            
            outRow[8] = str(inpRow[8]) # City
            outRow[9] = str(inpRow[9]) # State
            
            
            # Next we'll add data about the incident Factors
            outRow[10] = str(inpRow[4]) if inpRow[4] != "" else "unknown" # Armed
            # bool() is True for any non-empty string (including "False"),
            # so compare the text of the field instead
            outRow[11] = inpRow[10].strip().lower() == "true" # Sign of mental illness
            outRow[12] = str(inpRow[11]) if inpRow[11] != "" else "unknown" # Threat Level
            outRow[13] = str(inpRow[12]) if inpRow[12] != "" else "unknown" # Flee
            outRow[14] = str(inpRow[3]) # manner of death
            outRow[15] = inpRow[13].strip().lower() == "true" # body camera
            
            # And write to the cleaned up csv
            
            writer.writerow(outRow)
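
One detail worth checking when converting the True/False columns: `csv.reader` yields every field as a string, and Python's `bool()` returns `True` for any non-empty string, including `"False"`. A minimal sketch of a text-based conversion follows. `parse_bool` is a hypothetical helper name, and it assumes the raw file spells the values as `True` and `False`, as the preview of the raw data suggests.

```python
def parse_bool(text):
    """Convert a CSV field to a bool by comparing its text (hypothetical helper)."""
    return text.strip().lower() == "true"


# bool() only tests whether the string is empty, so it cannot tell these apart
print(bool("False"), parse_bool("False"))
print(bool("True"), parse_bool("True"))
```

Anything other than some casing of `true` (including an empty field) maps to `False` here, which may or may not be the right default for missing values.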

Now that we have our cleaned-up file, let's see how it looks when loaded straight into pandas:

In [36]:
df = pd.read_csv('police_shootings-corgis.csv', header=None)
df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
0,Tim Elliot,53,Male,Asian,1,2,2015,01/02/2015,Shelton,WA,gun,True,attack,Not fleeing,shot,False
1,Lewis Lee Lembke,47,Male,White,1,2,2015,01/02/2015,Aloha,OR,gun,False,attack,Not fleeing,shot,False
2,John Paul Quintero,23,Male,Hispanic,1,3,2015,01/03/2015,Wichita,KS,unarmed,False,other,Not fleeing,shot and Tasered,False
3,Matthew Hoffman,32,Male,White,1,4,2015,01/04/2015,San Francisco,CA,toy weapon,True,attack,Not fleeing,shot,False
4,Michael Rodriguez,39,Male,Hispanic,1,4,2015,01/04/2015,Evans,CO,nail gun,False,attack,Not fleeing,shot,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6134,Dustin Roy Black,24,Male,White,3,13,2021,03/13/2021,Austin,AR,gun,False,attack,Not fleeing,shot,False
6135,Unknown,0,Male,Unknown,3,13,2021,03/13/2021,Monahans,TX,undetermined,False,undetermined,unknown,shot,False
6136,Douglas William Stroble,25,Male,Unknown,3,13,2021,03/13/2021,Wasilla,AK,gun,False,attack,Not fleeing,shot,False
6137,Kelly Shannon Bowen,51,Male,White,3,14,2021,03/14/2021,Reidsville,GA,gun and vehicle,False,attack,Other,shot,False


Great, it looks like our new CSV is formatted the way we need it for CORGIS! We will add the column headers to the metadata file as follows:
```
Victim.Name
Victim.Age
Victim.Gender
Victim.Race

Setting.Date.Month
Setting.Date.Day
Setting.Date.Year
Setting.Date.Full
Setting.Location.City
Setting.Location.State

Factors.Armed
Factors.Mental-Illness
Factors.Threat-Level
Factors.Fleeing
Factors.Manner
Factors.Body-Cam
```
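
To show how these names line up with the headerless rows, here is a minimal sketch using only the standard library. It is not how CORGIS itself consumes the metadata file: the `HEADERS` list simply repeats the names above, and the sample row is built by hand from the first record of the raw file shown earlier.

```python
import csv
import io

# The column names listed above, in output order
HEADERS = [
    "Victim.Name", "Victim.Age", "Victim.Gender", "Victim.Race",
    "Setting.Date.Month", "Setting.Date.Day", "Setting.Date.Year",
    "Setting.Date.Full", "Setting.Location.City", "Setting.Location.State",
    "Factors.Armed", "Factors.Mental-Illness", "Factors.Threat-Level",
    "Factors.Fleeing", "Factors.Manner", "Factors.Body-Cam",
]

# One headerless row in the quoted-strings style the cleaning script writes
sample = io.StringIO(
    '"Tim Elliot",53,"Male","Asian",1,2,2015,"01/02/2015",'
    '"Shelton","WA","gun",True,"attack","Not fleeing","shot",False\n'
)

# fieldnames supplies the header that the file itself lacks
records = list(csv.DictReader(sample, fieldnames=HEADERS))
first = records[0]
print(first["Victim.Name"], first["Setting.Date.Full"], first["Factors.Armed"])
```

Every value comes back as a string with `csv.DictReader`, so ages and dates would still need converting before any arithmetic.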



In [34]:
meta = pd.read_csv('police_shootings-meta.csv', header=None)
meta

Unnamed: 0,0,1,2,3,4
0,Name,Police Shootings,,,
1,Version,0.0.2,,,
2,Author,Caleb Davis (caldavis@udel.edu),,,
3,Created,03/22/21,,,
4,Data File,police_shootings-corgis.csv,,,
5,Overview,“The Washington Post is compiling a database o...,,,
6,Data Source,Washington Post Github: https://github.com/was...,,,
7,Description,https://www.washingtonpost.com/national/how-th...,,,
8,Tags,"""violence, crime, violent, police, shootings, ...",,,
9,Row,Shootings,,,


With these two files, CORGIS will be able to produce a proper dataframe with headers:

In [41]:
df.columns = meta[1][15:]
df

1,Victim.Name,Victim.Age,Victim.Gender,Victim.Race,Setting.Date.Month,Setting.Date.Day,Setting.Date.Year,Setting.Date.Full,Setting.Location.City,Setting.Location.State,Factors.Armed,Factors.Mental-Illness,Factors.Threat-Level,Factors.Fleeing,Factors.Manner,Factors.Body-Cam
0,Tim Elliot,53,Male,Asian,1,2,2015,01/02/2015,Shelton,WA,gun,True,attack,Not fleeing,shot,False
1,Lewis Lee Lembke,47,Male,White,1,2,2015,01/02/2015,Aloha,OR,gun,False,attack,Not fleeing,shot,False
2,John Paul Quintero,23,Male,Hispanic,1,3,2015,01/03/2015,Wichita,KS,unarmed,False,other,Not fleeing,shot and Tasered,False
3,Matthew Hoffman,32,Male,White,1,4,2015,01/04/2015,San Francisco,CA,toy weapon,True,attack,Not fleeing,shot,False
4,Michael Rodriguez,39,Male,Hispanic,1,4,2015,01/04/2015,Evans,CO,nail gun,False,attack,Not fleeing,shot,False
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
6134,Dustin Roy Black,24,Male,White,3,13,2021,03/13/2021,Austin,AR,gun,False,attack,Not fleeing,shot,False
6135,Unknown,0,Male,Unknown,3,13,2021,03/13/2021,Monahans,TX,undetermined,False,undetermined,unknown,shot,False
6136,Douglas William Stroble,25,Male,Unknown,3,13,2021,03/13/2021,Wasilla,AK,gun,False,attack,Not fleeing,shot,False
6137,Kelly Shannon Bowen,51,Male,White,3,14,2021,03/14/2021,Reidsville,GA,gun and vehicle,False,attack,Other,shot,False
