In [1]:
suppressMessages(library('readxl'))
suppressMessages(library('janitor'))
suppressMessages(library('tidyverse'))
suppressMessages(library('data.table'))

## Load data

The data comes from the Invisible Institute's [Citizens Police Data Project](https://data.cpdp.co/), which obtained it through a series of FOIA requests to the Chicago Police Department. The files are in the project's GitHub repo [here](https://github.com/invinst/chicago-police-data/tree/master/data/unified_data) and [here](https://github.com/invinst/chicago-police-data/tree/master/data/context_data) (for the complaint categories). The following datasets are loaded:

- `complaints.complaints`: provides information on each complaint including complaint date and closed date
- `complaints.accused`: provides information on the specific allegations under each complaint, including the names of the officer(s) accused, the complaint category and the outcome of the complaint
- `complaint.categories`: categories of complaints (for example, if the complaint is a citizen or departmental complaint)
- `officer.filed.complaints`: provides the IDs of the minority of complaints that are made by officers rather than by civilians (up to 2016)
- `final.profiles`: provides a roster of the officers who were active over the period, with appointment date and resignation date

Notes on the raw data:
- The earliest `complaint_date` (date the complaint was filed) in the data is in October 1967 (though older data is less reliably recorded); the most recent is in March 2018
- Each complaint (aka incident) is uniquely identified by `cr_id`
- Each officer is uniquely identified by `link_UID`
- The data includes pending complaints
- The data includes complaints made by civilians as well as other officers  
- The data includes complaints investigated by the Chicago Police Department's Bureau of Internal Affairs as well as Chicago's civilian review agency (currently COPA, or the Civilian Office of Police Accountability). The two agencies have different [jurisdictions](https://www.chicagocopa.org/investigations/jurisdiction/) when it comes to the types of complaints that are investigated by each
- There can be more than one allegation under a given complaint. For example, an incident can have several allegations against multiple officers. However, [according to the Invisible Institute](http://how.cpdp.works/en/articles/1889809-why-is-this-information-imperfect), an incident _cannot_ contain more than one allegation against the same police officer. In other words, a given police officer can have a maximum of one allegation under a given complaint. Also, multiple officers named under the same complaint may have their allegations labeled under the same complaint category:
> "Due to limitations in the data systems used by the CPD and its oversight agencies, most complaints are given a single complaint category, typically the most serious allegation. This means that if one officer is accused of excessive force and two fellow officers are accused of not reporting the excessive force, all three officers may have a complaint marked as excessive force."

More information is available in the Invisible Institute's [Data Dictionary](https://github.com/invinst/chicago-police-data/blob/master/data/unified_data/data-dictionary/data-dictionary.yaml) on GitHub.
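
The one-allegation-per-officer-per-complaint rule described above can be checked directly once the data is loaded. The following is a minimal sketch on a toy table: the rows are invented for illustration, `toy.accused` is a hypothetical name, and only the column names `cr_id` and `link_UID` come from the data.

```r
library(data.table)

# Toy stand-in for complaints.accused; the values are made up for illustration
toy.accused <- data.table(
  cr_id    = c("1001", "1001", "1002", "1003", "1003"),
  link_UID = c(11, 12, 11, 13, 13)
)

# Count allegations per (complaint, officer) pair
pair.counts <- toy.accused[, .(n_allegations = .N), by = .(cr_id, link_UID)]

# Pairs that would violate the one-allegation-per-officer rule
violations <- pair.counts[n_allegations > 1]
print(violations)
```

Running the same two steps on `complaints.accused` would show whether any officer appears more than once under a single `cr_id` in the raw file.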

In [2]:
complaints.accused <- fread('../input/chicago/complaints-accused.csv.gz')
complaints.complaints <- fread('../input/chicago/complaints-complaints.csv.gz')
complaint.categories <- read_excel('../input/chicago/Complaint Categories.xlsx') %>% clean_names()
officer.filed.complaints <- fread('../input/chicago/officer-filed-complaints__2017-09.csv.gz')
final.profiles <- fread('../input/chicago/final-profiles.csv.gz')

### Merge datasets

In [3]:
complaints.accused.merge <- merge(complaints.accused,
                                  complaints.complaints %>% select(cr_id, complaint_date, closed_date),
                                  by = 'cr_id') %>%
    merge(final.profiles %>% select(link_UID,
                                    appointed_date,
                                    resignation_date,
                                    officer_gender = gender,
                                    officer_race = race,
                                    officer_birthyear = birth_year),
          by = 'link_UID')

In [4]:
complaints.accused.merge.categories <- merge(complaints.accused.merge %>% filter(complaint_code != ''),
                                             complaint.categories %>% select(x111_0, description, category, citizen_dept),
                                             by.x = 'complaint_code',
                                             by.y = 'x111_0')
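
Both `merge()` calls above use the default inner join, so allegations with no matching officer profile or complaint code are dropped without warning. The sketch below shows one way to count such rows. It uses invented toy tables (`toy.accused` and `toy.profiles` are hypothetical names, not objects from this notebook).

```r
library(data.table)

# Toy tables; the values are made up for illustration
toy.accused <- data.table(
  cr_id    = c("1001", "1002", "1003", "1004"),
  link_UID = c(11, 12, 13, 99)   # officer 99 has no profile
)
toy.profiles <- data.table(
  link_UID       = c(11, 12, 13),
  appointed_date = c("1995-06-05", "2001-02-26", "2010-11-01")
)

# Default merge() keeps only rows with a match in both tables (inner join)
toy.merged <- merge(toy.accused, toy.profiles, by = "link_UID")

# Rows of toy.accused with no matching profile (data.table anti-join)
toy.unmatched <- toy.accused[!toy.profiles, on = "link_UID"]

# Number of allegations lost in the merge
n.dropped <- nrow(toy.accused) - nrow(toy.merged)
print(n.dropped)
print(toy.unmatched)
```

Comparing row counts before and after each merge in the notebook would show how many allegations are lost at each step.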

### Exclude officer-filed complaints and pending complaints and filter time period to 2007-2017

We chose to run our analysis for the 2007-2017 time period because it offers a robust ten-year time frame, which we also use for NYC, and because 2017 is the last full year in the Invisible Institute's database.

Note that the officer-filed complaints dataset, `officer.filed.complaints`, only records complaints through 2016, so a small number of officer-filed complaints made after that may remain in our data.

In [5]:
complaints.accused.merge.filter <- complaints.accused.merge.categories %>%
    filter(complaint_date >= '2007-01-01' & complaint_date <= '2017-12-31') %>%
    merge(officer.filed.complaints %>% mutate(cr_id = as.character(cr_id), officer_filed = 1),
          by = 'cr_id',
          all.x = TRUE) %>%
    filter(is.na(officer_filed)) %>%
    select(-officer_filed) %>%
    filter(!is.na(closed_date)) # per correspondence with the Invisible Institute, complaints with no closed date are pending
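
The three exclusions above (date window, officer-filed complaints, pending complaints) can be traced step by step on a toy table. This is only a sketch under stated assumptions: the rows are invented, the `toy.*` names are hypothetical, and it assumes the officer-filed file stores `cr_id` as an integer, which is why the notebook casts it to character before joining. It uses a data.table anti-join rather than the flag-and-filter merge above; the two approaches should keep the same rows.

```r
library(data.table)

# Toy stand-in for the merged allegations; the values are made up for illustration
toy.allegations <- data.table(
  cr_id          = c("1001", "1002", "1003", "1004"),
  complaint_date = c("2006-12-31", "2007-01-01", "2012-05-17", "2017-12-31"),
  closed_date    = c("2007-03-01", "2007-09-15", NA, "2018-02-01")
)

# Assumption: officer-filed IDs arrive as integers, so cast them to match the join key
toy.officer.filed <- data.table(cr_id = 1002L)
toy.officer.filed[, cr_id := as.character(cr_id)]

# 1. Keep complaints filed in 2007-2017 (ISO-formatted dates compare correctly as strings)
in.window <- toy.allegations[complaint_date >= "2007-01-01" & complaint_date <= "2017-12-31"]

# 2. Drop officer-filed complaints (anti-join on cr_id)
not.officer.filed <- in.window[!toy.officer.filed, on = "cr_id"]

# 3. Drop pending complaints (no closed date)
toy.kept <- not.officer.filed[!is.na(closed_date)]
print(toy.kept)
```

In this toy example each of the first three complaints is removed by a different rule, and only the last one survives all three filters.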

Department-facing complaints are labeled "DEPT" and civilian-facing complaints are labeled "CITIZEN" (per the methodology in [this paper](https://journals.sagepub.com/doi/full/10.1177/2378023119879798)).

In [6]:
table(complaints.accused.merge.filter$citizen_dept)


      ? CITIZEN    DEPT 
    945   37270   16518 
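
To put the counts above in proportion, the share of each label can be computed from the printed table. The sketch below re-enters the three counts by hand (the `counts` vector is a hypothetical helper, not an object from this notebook); on the real data, `prop.table(table(...))` would give the same proportions directly.

```r
# Counts copied from the table output above
counts <- c("?" = 945, CITIZEN = 37270, DEPT = 16518)

# Total number of allegations remaining after the filters
total <- sum(counts)

# Percentage of allegations under each label, rounded to one decimal place
shares <- round(100 * counts / total, 1)
print(total)
print(shares)
```

Rows labeled "?" are neither "CITIZEN" nor "DEPT", so they are dropped by the filter in the next cell along with the "DEPT" rows.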

In [7]:
complaints.accused.merge.filter.citizen <- complaints.accused.merge.filter %>% filter(citizen_dept == 'CITIZEN') 

### Save data

In [9]:
write_csv(complaints.accused.merge.filter.citizen, '../output/chicago_clean.csv')