# MPA 5830 - Task 04

# Chicago Bird Collisions

[Winger et al., 2019](https://royalsocietypublishing.org/doi/10.1098/rspb.2019.0364#d3e550) examined nocturnal flight-calling behavior and vulnerability to artificial light in migratory birds.

> "Understanding interactions between biota and the built environment is increasingly important as human modification of the landscape expands in extent and intensity. For migratory birds, collisions with lighted structures are a major cause of mortality, but the mechanisms behind these collisions are poorly understood. Using 40 years of collision records of passerine birds, we investigated the importance of species' behavioral ecologies in predicting rates of building collisions during nocturnal migration through Chicago, IL and Cleveland, OH, USA. "

> "One of the few means to examine species-specific dynamics of social biology during nocturnal bird migration is through the study of short vocalizations made in flight by migrating birds. Many species of birds, especially passerines (order Passeriformes), produce such vocal signals during their nocturnal migrations. These calls (hereafter, ‘flight calls’) are hypothesized to function as important social cues for migrating birds that may aid in orientation, navigation and other decision-making behaviors... not all nocturnally migratory species make flight calls, raising the possibility that different lineages of migratory birds vary in the degree to which social cues and collective decisions are important for accomplishing migration."

I have only uploaded the raw and tamed Chicago data-set as it is the most complete, but you can access the full raw data [here](https://datadryad.org/resource/doi:10.5061/dryad.8rr0498). 

Each row in the `bird_collisions.csv` data-set accounts for a single observation of a bird collision. You can aggregate by species/genus, time, or other factors.
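
As a minimal sketch of what such an aggregation could look like, the chunk below counts rows per genus and per year. It runs on a small made-up tibble (`toy`, with invented values) rather than the real file, so the names and numbers here are illustrative assumptions only.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for bird_collisions; the rows are invented for illustration
toy <- tibble::tibble(
  genus = c("Melospiza", "Melospiza", "Zonotrichia", "Catharus"),
  date  = as.Date(c("2001-05-03", "2001-10-12", "2002-10-12", "2002-05-20"))
)

# One row per collision, so counting rows counts collisions.
# Collisions per genus, most frequent first:
by_genus <- toy %>% count(genus, sort = TRUE)

# Collisions per calendar year, derived from the date column:
by_year <- toy %>% count(year = year(date))
```

The same pattern should extend to other groupings (for example `month(date)` or `wday(date)`) by swapping the expression inside `count()`.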

h/t to [Data is Plural 2019/04/10](https://docs.google.com/spreadsheets/d/1wZhPLMCHKJvwOkP4juclhjFgqIY8fQFMemwKL2c64vk/edit#gid=0)

### Important Notes (Spoilers)

An important point from the authors, though something of a spoiler:
> From 2000 to 2018, D.E.W. and M.H. recorded data on the status of night-time lighting at McCormick Place during pre-dawn walks to collect collisions by recording the proportion of the 17 window bays that were illuminated... We used this index to test whether building lighting influenced the number of collisions and whether the influence of light levels on collisions counts varied across the sets of species with different flight-calling behavior or habitat preferences.

The `locality` column (`bird_collisions$locality`) indicates whether a collision was recorded at McCormick Place (`MP`) or elsewhere in Chicago (`CHI`). If you use `dplyr::filter` to keep only the `MP` records, you can `dplyr::left_join` the light data to the bird collision data and look at the effect of light on bird collisions from 2000 onward.
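
The filter-then-join idea can be sketched on toy data as follows. The tibbles `collisions` and `lights` are hypothetical stand-ins with invented values; only the column names `date`, `locality`, and `light_score` are taken from the data dictionary.

```r
library(dplyr)

# Toy stand-ins for bird_collisions and mp_light (invented values)
collisions <- tibble::tibble(
  date     = as.Date(c("2000-03-06", "2000-03-06", "2000-03-08", "2000-03-10")),
  locality = c("MP", "CHI", "MP", "MP")
)
lights <- tibble::tibble(
  date        = as.Date(c("2000-03-06", "2000-03-08")),
  light_score = c(3L, 15L)
)

# Keep only McCormick Place rows, then attach the light score for each date.
# left_join() keeps every collision row; dates without a light record get NA.
mp_joined <- collisions %>%
  filter(locality == "MP") %>%
  left_join(lights, by = "date")
```

In this toy case the collision on 2000-03-10 has no matching light record, so its `light_score` is `NA`; rows like that could later be dropped with `filter(!is.na(light_score))`.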

## Get the data!

In [1]:
library(tidyverse)

readr::read_csv(
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-30/bird_collisions.csv"
    ) -> bird_collisions 

readr::read_csv(
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-30/mp_light.csv"
    ) -> mp_light 

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.3.5     ✔ purrr   0.3.4
✔ tibble  3.1.6     ✔ dplyr   1.0.7
✔ tidyr   1.1.4     ✔ stringr 1.4.0
✔ readr   2.0.2     ✔ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Rows: 69695 Columns: 8

── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (7): genus, species, locality, family, flight_call, habitat, stratum
date (1): date

ℹ Use `spec()`

### Citations

When using this data, please cite the original publication:

> Winger BM, Weeks BC, Farnsworth A, Jones AW, Hennen M, Willard DE (2019) Nocturnal flight-calling behavior predicts vulnerability to artificial light in migratory birds. Proceedings of the Royal Society B 286(1900): 20190364. https://doi.org/10.1098/rspb.2019.0364

If using the data alone, please cite the [Dryad data package](https://cran.r-project.org/web/packages/rdryad/rdryad.pdf):

> Winger BM, Weeks BC, Farnsworth A, Jones AW, Hennen M, Willard DE (2019) Data from: Nocturnal flight-calling behavior predicts vulnerability to artificial light in migratory birds. Dryad Digital Repository. https://doi.org/10.5061/dryad.8rr0498


### Data Dictionary

#### `bird_collisions.csv` 
|variable    |class     |description |
|:-----------|:---------|:-----------|
|genus       | factor | Bird Genus          |
|species     | factor | Bird species           |
|date        | date    | Date of collision death (ymd)           |
|locality    | factor | MP or CHI - recording at either McCormick Place or greater Chicago area           |
|family      | factor | Bird Family          |
|flight_call | factor | Does the bird use a flight call - yes or no           |
|habitat     | factor | Open, Forest, Edge - their habitat affinity          |
|stratum     | factor  | Typical occupied stratum - ground/low or canopy/upper           |

#### `mp_light.csv` 
|variable    |class  |description |
|:-----------|:------|:-----------|
|date        | date | Date of light recording  (ymd)        |
|light_score | integer | Number of windows lit at McCormick Place, Chicago - higher = more light          |

## Now the questions ...

### Question (a) 
Does the number of bird collisions vary by month? By day of the month? By day of the week? By year? 

Show your code and then answer, in words after your code chunk, which year, month, day of the month, and day of the week had the most bird collisions.

### Question (b)
Which locality has had the most hits per year: McCormick Place or the greater Chicago area?

### Question (c)
Now filter the bird collision data to keep only records from McCormick Place. Then join the resulting data frame to the `mp_light` data-set so that bird collision records and light-score records are matched up correctly by date. Eliminate any rows of data where `light_score` is missing. Save the resulting data-set in RData format as `birds.df`.

### Question (d) 
Now we want to know if the distribution of bird collisions differs by how brightly lit (a higher `light_score` indicates more brightness) or dimly lit (lower `light_score`) the windows of McCormick Place were. 

To answer this question, we want to do two things. 

* First, convert `light_score` into a grouped, ordinal variable with four groups (essentially creating quartiles).
* Second, count the number of fatalities that fall within each of these four `light_score` groups.
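
One possible way to build quartile-style groups is `dplyr::ntile()`, sketched below on an invented vector of scores. This is an illustration under assumptions, not the required approach: `toy`, `light_group`, and the values are hypothetical, and `ntile()` splits tied values across neighboring groups, which may matter for real `light_score` data with many ties (`cut()` with `quantile()` breaks is an alternative).

```r
library(dplyr)

# Toy light scores, one row per collision (invented values)
toy <- tibble::tibble(light_score = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L))

# ntile() ranks the scores and assigns them to 4 roughly equal-sized bins;
# wrapping the result in an ordered factor makes the grouping ordinal.
grouped <- toy %>%
  mutate(
    light_group = factor(ntile(light_score, 4), levels = 1:4, ordered = TRUE)
  )

# Number of rows (collisions) falling in each of the four groups
counts <- grouped %>% count(light_group)
```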

# Philadelphia Parking Violations

These data come from Philly's open data portal and were used in one `tidytuesday` iteration. These particular data are for 2017. Use these data to answer the questions that follow.

In [None]:
readr::read_csv(
  "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-12-03/tickets.csv"
  ) -> tickets


## Problem (a) 
Make a proper date-time column called `tixdt`, and then from this extract the month, day of the month, day of the week, hour, and minute. Month and day of the week should be fully labelled. Save the resulting data-set as `tix.df`, and make sure you also save it in the RData format as `tix.df.RData` to the `data` folder. 
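
A rough sketch of the parsing and extraction steps, using `lubridate` on two invented timestamps, might look like this. The input column name `issued` and the `tix_*` output names are hypothetical, and the sketch assumes the timestamps are stored as "YYYY-MM-DD HH:MM:SS" text; check the actual column name and format in `tickets` before adapting it.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for the tickets data (invented timestamps, hypothetical column)
toy <- tibble::tibble(issued = c("2017-12-06 12:29:00", "2017-10-16 18:03:00"))

parsed <- toy %>%
  mutate(
    tixdt      = ymd_hms(issued),                          # proper date-time
    tix_month  = month(tixdt, label = TRUE, abbr = FALSE), # full month name
    tix_mday   = mday(tixdt),                              # day of the month
    tix_wday   = wday(tixdt, label = TRUE, abbr = FALSE),  # full weekday name
    tix_hour   = hour(tixdt),
    tix_minute = minute(tixdt)
  )
```

With `label = TRUE, abbr = FALSE`, `month()` and `wday()` return ordered factors with full names in the session's locale. An object can then be written to disk in RData format with base R's `save()`, for example `save(tix.df, file = "data/tix.df.RData")`, assuming a `data` folder exists in the working directory.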

## Problem (b)

Which month, day of the month, day of the week, hour, and minute were most likely to see a ticket issued? You will need one calculation for each of these.

## Problem (c)

What combination of hour and minute was most likely to see a ticket issued? What combination of hour and minute was least likely to see a ticket issued?