In [1]:
suppressMessages(library('tidyverse'))
suppressMessages(library('lubridate'))

## Load data

Data is from the New York Civil Liberties Union's [NYPD Misconduct Complaint Database](https://www.nyclu.org/en/campaigns/nypd-misconduct-database), obtained through FOIA requests by the NYCLU to NYC's Civilian Complaint Review Board (CCRB), which is responsible for civilian oversight of the city's police. GitHub repo [here](https://github.com/new-york-civil-liberties-union/NYPD-Misconduct-Complaint-Database-Updated).

Notes on the raw data:

- The earliest `ReceivedDate` (date on which the CCRB received the complaint) in the data is in January 1967 (though older data is less reliably recorded); the most recent is in February 2021
- Each complaint (aka incident) is uniquely identified by `ComplaintID`
- Each officer is uniquely identified by `OfficerID`
- The data does not include pending complaints
- The data includes only CCRB-investigated complaints made by civilians. CCRB's [jurisdiction](https://www1.nyc.gov/site/ccrb/about/frequently-asked-questions-faq.page) covers four types of misconduct: force that is excessive or unnecessary; abuse of authority; discourtesy; and offensive language, collectively referred to as "FADO"
- There can be more than one allegation under a given complaint. For example, an incident can have several allegations against multiple officers. Each allegation is uniquely identified by `AllegationID`. An incident _can_, and often does, contain more than one allegation against the same police officer (note that this is where the NYC data differs from the Chicago data)

More information on each field in the data is available in the [CCRB Filespecs](https://github.com/new-york-civil-liberties-union/NYPD-Misconduct-Complaint-Database-Updated/blob/main/CCRB%20Filespecs%2004.20.2021.xlsx) on the NYCLU's GitHub.
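
The ID structure described in the notes above can be sanity-checked with a few `dplyr` calls. The sketch below is illustrative only: the `toy` tibble and its values are hypothetical stand-ins for the real file, and only the column names `AllegationID`, `ComplaintID`, and `OfficerID` come from the data.

```r
suppressMessages(library(dplyr))

# Toy data (hypothetical values) mimicking the ID structure described above
toy <- tibble(
  AllegationID = 1:5,
  ComplaintID  = c(100, 100, 100, 101, 102),
  OfficerID    = c(7, 7, 8, 7, 9)
)

# Each allegation should appear exactly once
allegation_ids_unique <- !any(duplicated(toy$AllegationID))

# Allegations per complaint-officer pair; n_allegations > 1 means the same
# officer faces several allegations within a single complaint
per_pair <- toy %>% count(ComplaintID, OfficerID, name = "n_allegations")

# Distinct counts at each level of the hierarchy
n_complaints <- n_distinct(toy$ComplaintID)
n_officers   <- n_distinct(toy$OfficerID)
```

Running the same checks on the full `complaints` table once it is loaded would show how often a single complaint carries several allegations against one officer.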

In [2]:
complaints <- read_csv('../input/nyc/CCRB Complaint Database Raw 04.20.2021.csv')

── Column specification ────────────────────────────────────────────────
cols(
  .default = col_character(),
  AllegationID = col_double(),
  OfficerID = col_double(),
  ShieldNo = col_double(),
  DaysOnForce = col_double(),
  ComplaintID = col_double(),
  ImpactedAge = col_double()
)
ℹ Use `spec()` for the full column specifications.

### Format dates and filter time period to 2007-2017

We ran our analysis on the 2007-2017 period because it offers a robust eleven-year window (January 1, 2007 through December 31, 2017), which we also use for Chicago. In addition, because the NYCLU database includes only closed cases, it necessarily contains fewer cases for recent years such as 2020 and 2021: many complaints received in those years had not yet been closed when the data was compiled.
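
As a minimal sketch of how the date window behaves at its edges, the example below parses month/day/year strings and keeps only dates inside the period. The `toy` tibble and its date strings are hypothetical, and the exact string format of the raw file is an assumption; `between()` is used here as an equivalent, inclusive alternative to a pair of `>=` and `<=` comparisons.

```r
suppressMessages(library(dplyr))
suppressMessages(library(lubridate))

# Toy data (hypothetical values): dates on and just outside both boundaries
toy <- tibble(
  ComplaintID  = 1:5,
  ReceivedDate = c("12/31/2006", "01/01/2007", "06/15/2012",
                   "12/31/2017", "01/01/2018")
)

toy_filtered <- toy %>%
  # Parse month/day/year strings into Date objects
  mutate(ReceivedDate = mdy(ReceivedDate)) %>%
  # between() is inclusive on both ends, so both boundary dates are kept
  filter(between(ReceivedDate, ymd("2007-01-01"), ymd("2017-12-31")))

# Earliest and latest dates that survive the filter
date_range <- range(toy_filtered$ReceivedDate)
```

Both boundary dates are retained, while the day before and the day after the window are dropped. Rows whose date fails to parse become `NA` and are also removed by `filter()`.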

In [3]:
# Parse the month/day/year date strings into Date objects
complaints <- complaints %>% mutate(ReceivedDate = mdy(ReceivedDate), 
                                    IncidentDate = mdy(IncidentDate), 
                                    CloseDate = mdy(CloseDate)) 

In [4]:
# Keep complaints received from 2007-01-01 through 2017-12-31, inclusive
complaints.filtered <- complaints %>% 
    filter(ReceivedDate >= ymd('2007-01-01'), ReceivedDate <= ymd('2017-12-31'))

### Save data 

In [5]:
write_csv(complaints.filtered, '../output/nyc_clean.csv')