Import, tidy, and analyze the NYPD Shooting Incident dataset. Make sure the project is reproducible and includes both visualization and analysis. You may use the data for any analysis that interests you, but include at least two visualizations and one model. Identify any possible sources of bias in the data and in your analysis.

In [1]:
library(tidyverse)
library(lubridate)

-- Attaching core tidyverse packages ------------------------ tidyverse 2.0.0 --
v dplyr     1.1.4     v readr     2.1.5
v forcats   1.0.0     v stringr   1.5.1
v ggplot2   3.5.1     v tibble    3.2.1
v lubridate 1.9.3     v tidyr     1.3.1
v purrr     1.0.2     
-- Conflicts ------------------------------------------ tidyverse_conflicts() --
x dplyr::filter() masks stats::filter()
x dplyr::lag()    masks stats::lag()
i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors


In [2]:
source_url <- 'https://data.cityofnewyork.us/api/views/833y-fsy8/rows.csv?accessType=DOWNLOAD'

incident_df <- read.csv(source_url)
summary(incident_df)

  INCIDENT_KEY        OCCUR_DATE         OCCUR_TIME            BORO          
 Min.   :  9953245   Length:28562       Length:28562       Length:28562      
 1st Qu.: 65439914   Class :character   Class :character   Class :character  
 Median : 92711254   Mode  :character   Mode  :character   Mode  :character  
 Mean   :127405824                                                           
 3rd Qu.:203131993                                                           
 Max.   :279758069                                                           
                                                                             
 LOC_OF_OCCUR_DESC     PRECINCT     JURISDICTION_CODE LOC_CLASSFCTN_DESC
 Length:28562       Min.   :  1.0   Min.   :0.0000    Length:28562      
 Class :character   1st Qu.: 44.0   1st Qu.:0.0000    Class :character  
 Mode  :character   Median : 67.0   Median :0.0000    Mode  :character  
                    Mean   : 65.5   Mean   :0.3219                      
           

In [3]:
glimpse(incident_df)

Rows: 28,562
Columns: 21
$ INCIDENT_KEY            <int> 244608249, 247542571, 84967535, 202853370, 270~
$ OCCUR_DATE              <chr> "05/05/2022", "07/04/2022", "05/27/2012", "09/~
$ OCCUR_TIME              <chr> "00:10:00", "22:20:00", "19:35:00", "21:00:00"~
$ BORO                    <chr> "MANHATTAN", "BRONX", "QUEENS", "BRONX", "BROO~
$ LOC_OF_OCCUR_DESC       <chr> "INSIDE", "OUTSIDE", "", "", "", "", "", "", "~
$ PRECINCT                <int> 14, 48, 103, 42, 83, 23, 113, 77, 48, 49, 73, ~
$ JURISDICTION_CODE       <int> 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ LOC_CLASSFCTN_DESC      <chr> "COMMERCIAL", "STREET", "", "", "", "", "", ""~
$ LOCATION_DESC           <chr> "VIDEO STORE", "(null)", "", "", "", "MULTI DW~
$ STATISTICAL_MURDER_FLAG <chr> "true", "true", "false", "false",

In [23]:


# Tabulate the categorical location and race columns
desc_counts <- lapply(
  incident_df[, c("LOC_CLASSFCTN_DESC", "LOCATION_DESC", "PERP_RACE", "VIC_RACE", "LOC_OF_OCCUR_DESC")],
  table
)

print(desc_counts)

$LOC_CLASSFCTN_DESC

                 (null)  COMMERCIAL    DWELLING     HOUSING       OTHER 
      25596           2         208         243         460          59 
PARKING LOT  PLAYGROUND      STREET     TRANSIT     VEHICLE 
         15          41        1886          23          29 

$LOCATION_DESC

                                             (null)                       ATM 
                    14977                      1711                         1 
                     BANK            BAR/NIGHT CLUB         BEAUTY/NAIL SALON 
                        3                       668                       119 
              CANDY STORE               CHAIN STORE                CHECK CASH 
                        7                         7                         1 
        CLOTHING BOUTIQUE           COMMERCIAL BLDG                DEPT STORE 
                       14                       304                         9 
           DOCTOR/DENTIST                DRUG STORE       DRY 
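
The tables above show two different placeholders for missing values: an empty string and the literal text `(null)`. A minimal sketch of folding both into `NA` follows, assuming that both placeholders really do mean "not recorded". `toy_df` and `na_clean_df` are hypothetical names, and the toy rows only stand in for `incident_df`.

```r
library(dplyr)

# Toy rows standing in for incident_df (illustrative values only)
toy_df <- tibble(
  LOC_CLASSFCTN_DESC = c("STREET", "", "(null)", "HOUSING"),
  LOCATION_DESC      = c("(null)", "", "BAR/NIGHT CLUB", "")
)

# Assumption: "" and "(null)" both mean "missing", so map both to NA
# in every character column
na_clean_df <- toy_df %>%
  mutate(across(where(is.character), ~ na_if(na_if(.x, ""), "(null)")))

# Tabulate again, keeping NA visible as its own category
table(na_clean_df$LOC_CLASSFCTN_DESC, useNA = "ifany")
```

Applying the same `mutate()` to `incident_df` should make the share of missing values explicit, which is relevant when discussing bias in the location and race columns.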

```r
# Modify, reorder, and select columns in a pipeline
# ('df' and its column names are placeholders, not columns of incident_df)
cleaned_df <- df %>%
  # Rename 'category' to 'type' and 'value' to 'score'
  rename(type = category, score = value) %>%

  # Remove a column completely (must come before the positive select below)
  select(-bad_column) %>%

  # Reorder columns: put 'type' first, followed by 'id', 'date', and 'score'
  select(type, id, date, score) %>%

  # Keep only rows where 'score' is at least 15
  filter(score >= 15)
```

- [ ] Keep cleaning, renaming, removing
- [ ] Aggregate by day for over-time viz
- [ ] Aggregate by month, boro
- [ ] Figure out a model?
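
The two aggregation items above could be sketched roughly as follows. This is only one possible approach, assuming `OCCUR_DATE` is in `month/day/year` format as the `glimpse()` output suggests. `toy_df`, `dated_df`, `daily_counts`, and `monthly_boro_counts` are hypothetical names, and the toy rows are illustrative rather than real records.

```r
library(dplyr)
library(lubridate)

# Toy rows standing in for incident_df (illustrative values only)
toy_df <- tibble(
  OCCUR_DATE = c("05/05/2022", "05/05/2022", "05/27/2022", "07/04/2022"),
  BORO       = c("MANHATTAN", "BRONX", "BRONX", "BRONX")
)

# Parse the date and derive the first day of its month
dated_df <- toy_df %>%
  mutate(
    date  = mdy(OCCUR_DATE),
    month = floor_date(date, unit = "month")
  )

# One row per day, for an over-time plot
daily_counts <- dated_df %>%
  count(date, name = "incidents")

# One row per month and borough
monthly_boro_counts <- dated_df %>%
  count(month, BORO, name = "incidents")
```

Either summary table can then be passed to `ggplot()`, for example with `geom_line()` over `date` or `month`.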

In [9]:
# make a nicer datetime column
clean_incident_df <- incident_df %>%
  mutate(
    datetime = as.POSIXct(paste(OCCUR_DATE, OCCUR_TIME), format = "%m/%d/%Y %H:%M:%S")
  )

glimpse(clean_incident_df)

Rows: 28,562
Columns: 22
$ INCIDENT_KEY            <int> 244608249, 247542571, 84967535, 202853370, 270~
$ OCCUR_DATE              <chr> "05/05/2022", "07/04/2022", "05/27/2012", "09/~
$ OCCUR_TIME              <chr> "00:10:00", "22:20:00", "19:35:00", "21:00:00"~
$ BORO                    <chr> "MANHATTAN", "BRONX", "QUEENS", "BRONX", "BROO~
$ LOC_OF_OCCUR_DESC       <chr> "INSIDE", "OUTSIDE", "", "", "", "", "", "", "~
$ PRECINCT                <int> 14, 48, 103, 42, 83, 23, 113, 77, 48, 49, 73, ~
$ JURISDICTION_CODE       <int> 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0~
$ LOC_CLASSFCTN_DESC      <chr> "COMMERCIAL", "STREET", "", "", "", "", "", ""~
$ LOCATION_DESC           <chr> "VIDEO STORE", "(null)", "", "", "", "MULTI DW~
$ STATISTICAL_MURDER_FLAG <chr> "true", "true", "false", "false",
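
Once a datetime column exists, it may be worth checking that every row parsed and deriving features such as hour and weekday. The sketch below uses the first three date and time values shown in the `glimpse()` output. The explicit `tz = "America/New_York"` is an assumption (the source does not state a time zone), and `toy_df` and `parsed_df` are hypothetical names.

```r
library(dplyr)
library(lubridate)

# First three OCCUR_DATE / OCCUR_TIME values from the glimpse() output
toy_df <- tibble(
  OCCUR_DATE = c("05/05/2022", "07/04/2022", "05/27/2012"),
  OCCUR_TIME = c("00:10:00", "22:20:00", "19:35:00")
)

parsed_df <- toy_df %>%
  mutate(
    datetime = as.POSIXct(
      paste(OCCUR_DATE, OCCUR_TIME),
      format = "%m/%d/%Y %H:%M:%S",
      tz = "America/New_York"  # assumed time zone, not stated in the data
    ),
    hour    = hour(datetime),               # hour of day, 0 to 23
    weekday = wday(datetime, label = TRUE)  # ordered factor of day names
  )

# Number of rows whose datetime failed to parse (NA)
sum(is.na(parsed_df$datetime))
```

A nonzero count here would point to rows whose date or time does not match the expected format.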

In [7]:
#| label: my_chunk_name
#| echo: false

# use 'total' rather than 'sum' to avoid shadowing base::sum()
total <- 5 + 5
print(total)

[1] 10


In [8]:
#| echo: true

name <- 'cody'

paste('The name is',name)
