In [1]:
suppressMessages(library('tidyverse'))
suppressMessages(library('lubridate'))

## Load data

Data from 2016 onwards is released by the Philadelphia Police Department on [OpenDataPhilly](https://www.opendataphilly.org/dataset/police-complaints), the city's open data website. The following datasets are loaded:

- `complaints.against.police`: the [Complaints Against Police](https://www.opendataphilly.org/dataset/police-complaints/resource/e7477284-0045-4f37-8aeb-182616f736e8) dataset, which provides information about civilian complaints alleging police misconduct
- `complaints.against.police.findings`: the [CAP Findings](https://www.opendataphilly.org/dataset/police-complaints/resource/7f7d472f-c49c-4364-b6e0-3a079e6b7d7f) dataset, which provides information about the police officer(s) involved in each complaint, and the status of the allegations

Notes on the raw data:

- The earliest `date_received` (date the complaint was received) in the data is in January 2016; the most recent is in March 2021. The department posts data for the past five years on a rolling basis
- Each complaint is uniquely identified by `complaint_id`
- Each officer is uniquely identified by `officer_id`. A small number of these are `NA` and are omitted from our analysis
- The data includes pending complaints
- In Philadelphia, the internal affairs bureau investigates the vast majority of civilian complaints (unlike in other cities like NYC, where a civilian oversight agency has a broad jurisdiction). The data includes civilian-filed complaints that are investigated by internal affairs
- There can be more than one allegation under a given complaint. For example, a single incident can produce several allegations against multiple officers. An incident _can_, and often does, contain more than one allegation against the same police officer (this is where the Philadelphia data differs from the Chicago data)
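The notes above can be checked with a few lines of base R. The sketch below runs on two small made-up tables that only borrow the column names from the real data; the IDs, dates, and allegation labels are invented for illustration, and the same checks could be pointed at the loaded tables instead.

```r
# Toy stand-ins for the two tables (made-up values, not real records)
complaints <- data.frame(
  complaint_id  = c("16-0001", "16-0002", "16-0003"),
  date_received = as.Date(c("2016-01-05", "2018-07-19", "2021-03-30")),
  stringsAsFactors = FALSE
)
findings <- data.frame(
  complaint_id = c("16-0001", "16-0001", "16-0002", "16-0003"),
  officer_id   = c(101, 101, 202, NA),
  allegations_investigated = c("Allegation A", "Allegation B",
                               "Allegation A", "Allegation C"),
  stringsAsFactors = FALSE
)

# One row per complaint: TRUE when no complaint_id is repeated
ids_unique <- !any(duplicated(complaints$complaint_id))

# Earliest and latest date_received
date_span <- range(complaints$date_received)

# Number of findings rows with no officer ID
n_missing_officer <- sum(is.na(findings$officer_id))

# Allegations per complaint/officer pair; a count above 1 means the same
# officer faces several allegations within one complaint
known <- findings[!is.na(findings$officer_id), ]
per_pair <- aggregate(allegations_investigated ~ complaint_id + officer_id,
                      data = known, FUN = length)
```

In `per_pair`, the `allegations_investigated` column holds the count for each pair, so filtering it for values greater than 1 lists the officers with repeated allegations inside a single complaint.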

More information on each field in the data is available on the metadata pages for [Complaints Against Police](https://metadata.phila.gov/#home/datasetdetails/5a3827b4b9464c55711a0816/representationdetails/5a3827dbb954635579423e0f/) and [Complaints Against Police Findings](https://metadata.phila.gov/#home/datasetdetails/5a3827b4b9464c55711a0816/representationdetails/5a3827b6b9464c55711a081a/).

(We will load the OpenDataPhilly data from the API and save a static copy for future reference.)
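Because the published data rolls forward over time, a later run of the API call may return different rows than the static copy. One possible pattern, sketched here as an assumption rather than what this notebook does, is to read the saved file when it exists and call the API only when it does not. `load_or_fetch` and `fake_fetch` are hypothetical names introduced for illustration.

```r
# Hypothetical helper (not part of the original notebook): return the cached
# CSV at `path` if it exists; otherwise call `fetch()`, save the result to
# `path`, and return it.
load_or_fetch <- function(path, fetch) {
  if (file.exists(path)) {
    return(read.csv(path, stringsAsFactors = FALSE))
  }
  data <- fetch()
  write.csv(data, path, row.names = FALSE)
  data
}

# Demonstration with a stand-in fetch function and a temporary file,
# so the sketch runs without network access
calls <- 0
fake_fetch <- function() {
  calls <<- calls + 1
  data.frame(complaint_id = c("a", "b"), stringsAsFactors = FALSE)
}
cache_path <- tempfile(fileext = ".csv")

first  <- load_or_fetch(cache_path, fake_fetch)  # fetches and writes the cache
second <- load_or_fetch(cache_path, fake_fetch)  # reads the cached copy
```

In real use, `fetch` would be a function that reads from the API URL, and `path` would be the location of the static copy.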

In [2]:
url.cap <- 'https://phl.carto.com/api/v2/sql?q=SELECT+*+FROM+ppd_complaints&filename=ppd_complaints&format=csv&skipfields=cartodb_id,the_geom,the_geom_webmercator'
url.cap.findings <- 'https://phl.carto.com/api/v2/sql?q=SELECT+*+FROM+ppd_complaint_disciplines&filename=ppd_complaint_disciplines&format=csv&skipfields=cartodb_id,the_geom,the_geom_webmercator'
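The two URLs differ only in the table name, which appears once in the SQL query and once in the `filename` parameter. As a sketch, they can be assembled from that name with a small helper; `carto_csv_url` is a hypothetical function introduced here, and it reproduces only the parameters already present in the URLs above.

```r
# Hypothetical helper: build the CSV export URL for one table, using the
# same endpoint and query parameters as the hard-coded URLs
carto_csv_url <- function(table) {
  paste0(
    "https://phl.carto.com/api/v2/sql",
    "?q=SELECT+*+FROM+", table,
    "&filename=", table,
    "&format=csv",
    "&skipfields=cartodb_id,the_geom,the_geom_webmercator"
  )
}

url.cap.rebuilt          <- carto_csv_url("ppd_complaints")
url.cap.findings.rebuilt <- carto_csv_url("ppd_complaint_disciplines")
```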

In [3]:
complaints.against.police <- read_csv(url.cap)
complaints.against.police.findings <- read_csv(url.cap.findings)

write_csv(complaints.against.police, '../input/philly/complaints_against_police.csv')
write_csv(complaints.against.police.findings, '../input/philly/complaints.against.police.findings.csv')


── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────
cols(
  complaint_id = col_character(),
  date_received = col_date(format = ""),
  district_occurrence = col_character(),
  general_cap_classification = col_character(),
  summary = col_character()
)

── Column specification ───────────────────────────────────────────────────────────────────────────────────────────────
cols(
  complaint_id = col_character(),
  officer_id = col_double(),
  po_race = col_character(),
  po_sex = col_character(),
  po_assigned_unit = col_character(),
  allegations_investigated = col_character(),
  investigative_findings = col_character(),
  disciplinary_findings = col_character()
)


Because the Philadelphia Police Department publishes only a trailing five-year window, complaints received before 2016 have since been dropped from OpenDataPhilly as newer data was released. Data from April 2015 to December 2015 comes from [Sam Learner](https://www.samlearner.com/), who previously collected it for a data visualization [story](https://pudding.cool/2020/10/police-misconduct/) on police misconduct investigations in Philadelphia, published by The Pudding.

Note that Learner also used pre-2015 data from Philly Declaration's [Police Accountability Project](https://github.com/PhillyDeclaration/Philadelphia-Police-Accountability-Project) in his report. We are not able to include that data because police officers in the earlier dataset are identified by first and last initial rather than by a unique ID number.

(We will load the data from Learner's GitHub [repo](https://github.com/sdl60660/philly_police_complaints), filter it to include only 2015 data, and save a static copy for future reference)

In [4]:
# Complaint-level records from Learner's repo
complaints.old <- read.csv('https://raw.githubusercontent.com/sdl60660/philly_police_complaints/master/raw_data/ppd_complaints.csv',
                           stringsAsFactors = FALSE)

# Officer-level findings; drop rows with no officer ID
complaint.disciplines.old <- read.csv('https://raw.githubusercontent.com/sdl60660/philly_police_complaints/master/raw_data/ppd_complaint_disciplines.csv',
                                      stringsAsFactors = FALSE) %>%
  filter(!is.na(officer_id))

# Attach complaint details to each finding, parse the date, and keep 2015 only
complaints.old.merge <- merge(complaint.disciplines.old, complaints.old,
                              by = 'complaint_id', all.x = TRUE) %>%
  mutate(date_received = mdy(date_received)) %>%
  filter(year(date_received) == 2015)

write_csv(complaints.old.merge, '../input/philly/complaints_2015.csv')
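The merge above is a left join (`all.x`), so a findings row with no matching complaint record is kept but has a missing `date_received`. The year filter then removes it, because `filter()` drops rows where the condition evaluates to `NA`. The base-R sketch below shows that behavior on made-up rows; the `month/day/year` string format is an assumption inferred from the use of `mdy()`, and `as.Date()` stands in for it here.

```r
# Made-up rows that borrow only the column names from the real data
old_findings <- data.frame(
  complaint_id = c("15-0100", "15-0100", "14-0042", "15-0999"),
  officer_id   = c(11, 12, 13, 14),
  stringsAsFactors = FALSE
)
old_complaints <- data.frame(
  complaint_id  = c("15-0100", "14-0042"),
  date_received = c("6/14/2015", "11/2/2014"),
  stringsAsFactors = FALSE
)

# Left join: every findings row is kept, including "15-0999",
# which has no complaint record and therefore no date
joined <- merge(old_findings, old_complaints, by = "complaint_id", all.x = TRUE)

# Base-R stand-in for lubridate::mdy(), assuming month/day/year strings
joined$date_received <- as.Date(joined$date_received, format = "%m/%d/%Y")

# Keep 2015 only; rows with a missing date are excluded explicitly
is_2015 <- !is.na(joined$date_received) &
  format(joined$date_received, "%Y") == "2015"
only_2015 <- joined[is_2015, ]
```

Counting `is.na(joined$date_received)` before filtering gives the number of findings rows that are lost because they have no complaint record.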

### Combine 2015 data with data from 2016 onwards

In [5]:
# Join complaints to officer-level findings; drop rows with no officer ID
complaints.against.police.merge <- merge(complaints.against.police,
                                         complaints.against.police.findings,
                                         by = 'complaint_id') %>%
  filter(!is.na(officer_id))

# Drop the columns that exist only in the 2015 data, then stack the two tables
complaints.against.police.merge.2015 <- rbind(
  complaints.against.police.merge,
  complaints.old.merge %>%
    select(-officer_initials, -officer_complaint_id, -discipline_id,
           -shortened_summary, -po_district_number)
)
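`rbind()` on data frames needs both tables to have the same column names, which is why the columns found only in the 2015 data are dropped first. A quick way to confirm which columns differ before stacking is sketched below on made-up one-row tables; the values are invented and only the column names echo the real data.

```r
# Made-up one-row tables; the older one has an extra column
# and a different column order
new_rows <- data.frame(
  complaint_id = "16-0001", officer_id = 101, po_sex = "M",
  stringsAsFactors = FALSE
)
old_rows <- data.frame(
  officer_id = 11, complaint_id = "15-0100", po_sex = "F",
  officer_initials = "J.D.",
  stringsAsFactors = FALSE
)

# Columns present in one table but not the other
extra_in_old   <- setdiff(names(old_rows), names(new_rows))
missing_in_old <- setdiff(names(new_rows), names(old_rows))

# Keep only the shared columns, in the newer table's order, then stack
old_aligned <- old_rows[, names(new_rows), drop = FALSE]
combined <- rbind(new_rows, old_aligned)
```

If `missing_in_old` were non-empty, the older table would lack a column the newer one has, and the stacking step would need a decision about how to fill it.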

### Save data

In [6]:
write_csv(complaints.against.police.merge.2015, '../output/philly_clean.csv')