# Exercises for Practice

## Exercise 01 

The data below come from [tidytuesday](https://github.com/rfordatascience/tidytuesday/tree/master/data/2019/2019-09-10) and provide information on accidents at theme parks. More of these data are [available here](https://ridesdatabase.org/saferparks/data/). They record where and when each accident occurred, along with some details about the injured party.

In [None]:
library(readr)

read_csv(
    "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-09-10/saferparks.csv"
    ) -> safer_parks

|variable             |class     |description |
|:--------------------|:---------|:-----------|
|acc_id               |double    | Unique ID |
|acc_date             |character | Accident Date |
|acc_state            |character | Accident State |
|acc_city             |character | Accident City |
|fix_port             |character |.           |
|source               |character | Source of injury report |
|bus_type             |character | Business type |
|industry_sector      |character | Industry sector |
|device_category      |character | Device category |
|device_type          |character | Device type |
|tradename_or_generic |character | Common name of the device |
|manufacturer         |character | Manufacturer of device |
|num_injured          |double    | Num injured |
|age_youngest         |double    | Youngest individual injured |
|gender               |character | Gender of individual injured |
|acc_desc             |character | Description of accident |
|injury_desc          |character | Injury description |
|report               |character | Report URL |
|category             |character | Category of accident |
|mechanical           |double    | Mechanical failure (binary NA/1) |
|op_error             |double    | Operator error (binary NA/1)|
|employee             |double    | Employee error (binary NA/1)|
|notes                |character | Additional notes| 

Working with the `safer_parks` data, complete the following tasks. 

### Problem (a)
Using `acc_date`, create a new date variable called `idate` that is a proper date column generated via `{lubridate}`.

In [None]:
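library(tidyverse)
library(lubridate)  # provides mdy(), month(), wday(); attached by tidyverse only from version 2.0.0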

In [None]:
head(safer_parks)

In [None]:
safer_parks %>%
    mutate(
        idate = mdy(acc_date)
    ) -> safer_parks

head(safer_parks$idate)
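
Before overwriting a column, it can help to confirm how `mdy()` treats the strings it receives. The sketch below uses three made-up strings (they are not rows from `safer_parks`) and assumes `acc_date` follows a month/day/year layout. Strings that cannot be parsed come back as `NA`, so counting the `NA` values is a quick sanity check.

```r
library(lubridate)

# Hypothetical sample values, not taken from safer_parks
sample_dates <- c("6/12/2010", "07/04/2015", "not a date")

# mdy() warns about strings it cannot parse and returns NA for them
parsed <- suppressWarnings(mdy(sample_dates))

class(parsed)       # "Date"
sum(is.na(parsed))  # number of strings that failed to parse
```

Running the same `sum(is.na(...))` check on `safer_parks$idate` would show how many accident dates, if any, did not parse.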

### Problem (b)
Now create new columns for (i) the month of the accident, and (ii) the day of the week. These should not be abbreviated (i.e., we should see the values as "Monday" instead of "Mon", and "July" instead of "Jul").

What month had the highest number of accidents? 

What day of the week had the highest number of accidents? 

In [None]:
safer_parks %>%
    mutate(
        month = month(idate, label = TRUE, abbr = FALSE),
        day = wday(idate, label = TRUE, abbr = FALSE) 
    ) -> safer_parks

In [None]:
head(safer_parks)

In [None]:
safer_parks %>%
    group_by(month) %>%
    tally() %>%
    arrange(-n)

July had the most accidents (1702).

In [None]:
safer_parks %>%
    group_by(day) %>%
    tally() %>%
    arrange(-n)

In terms of days of the week, Saturdays had the most accidents (2070).
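
The `group_by()` + `tally()` + `arrange()` pattern used above can also be written more compactly. The following is a minimal sketch on a toy tibble (the `toy` data are invented for illustration) showing `count(sort = TRUE)` and `slice_max()` to pull out only the most frequent value.

```r
library(dplyr)

# Toy data standing in for the `day` column created above
toy <- tibble(
  day = c("Saturday", "Sunday", "Saturday", "Monday", "Saturday", "Sunday")
)

# count(sort = TRUE) is equivalent to group_by() + tally() + arrange(desc(n))
day_counts <- toy %>%
  count(day, sort = TRUE)

# Keep only the most frequent value (ties, if any, are all returned)
top_day <- day_counts %>%
  slice_max(n, n = 1)

top_day
```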

### Problem (c)
What if you look at days of the week by month? Does the same day of the week show up with the most accidents regardless of month or do we see some variation? 

In [None]:
safer_parks %>%
    group_by(month, day) %>%
    tally() %>%
    group_by(month) %>%
    pivot_wider(
        names_from = day,
        values_from = n
    )

Saturday is the most common day for accidents in every month.
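
Reading the busiest day off a wide table by eye is error-prone, so one option is to let the code pick the top day within each month. This is a sketch on invented toy data, not output from `safer_parks`; the column names `month` and `day` mirror the ones created in Problem (b).

```r
library(dplyr)

# Toy data: one row per accident
toy <- tibble(
  month = c("June", "June", "June", "July", "July", "July", "July"),
  day   = c("Saturday", "Saturday", "Sunday", "Friday", "Friday", "Saturday", "Friday")
)

top_by_month <- toy %>%
  count(month, day) %>%     # accidents per month/day combination
  group_by(month) %>%
  slice_max(n, n = 1) %>%   # busiest day within each month (ties are kept)
  ungroup()

top_by_month
```

If every month shows the same day in the result, the pattern does not vary by month.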


### Problem (d)
What were the five dates with the highest number of accidents?

In [None]:
safer_parks %>%
    group_by(idate) %>%
    tally() %>%
    arrange(-n) %>%
    slice_max(n, n = 5)  # top five dates; ties at the cutoff are kept

### Problem (e)
Using the Texas injury data, answer the following question: What ride was the safest? [Hint: For each ride (`ride_name`) you will need to calculate the number of days between accidents. The ride with the highest number of days is the safest.] 

In [None]:
read_csv(
  "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-09-10/tx_injuries.csv"
  ) -> tx_injuries


|variable          |class     |description |
|:-----------------|:---------|:-----------|
|injury_report_rec |double    | Unique Record ID |
|name_of_operation |character | Company name |
|city              |character | City |
|st                |character | State (all TX) |
|injury_date       |character | Injury date - note there are some different formats |
|ride_name         |character | Ride Name |
|serial_no         |character | Serial number of ride |
|gender            |character | Gender of the injured individual |
|age               |character | Age of the injured individual |
|body_part         |character | Body part injured |
|alleged_injury    |character | Alleged injury - type of injury |
|cause_of_injury   |character | Approximate cause of the injury (free text) |
|other             |character | Anecdotal information in addition to cause of injury |

Note that this approach assumes each ride was in operation for the same amount of time. If that is not true, the estimates will be unreliable.
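
The variable table above notes that `injury_date` mixes formats, and `mdy()` returns `NA` for anything that is not month/day/year. The sketch below shows one possible way to handle two formats at once. It assumes, purely for illustration, that some entries are m/d/Y strings and others are spreadsheet-style serial day counts (days since 1899-12-30); the three sample values are invented, and the real column should be inspected before adopting this.

```r
library(lubridate)

# Hypothetical raw values: an m/d/Y string, a serial day count, and a missing marker
raw <- c("2/12/2013", "41321", "n/a")

# Entries made up only of digits are treated as serial day counts (an assumption)
is_serial <- grepl("^[0-9]+$", raw)

parsed <- rep(as.Date(NA), length(raw))
parsed[is_serial]  <- as.Date(as.numeric(raw[is_serial]), origin = "1899-12-30")
parsed[!is_serial] <- suppressWarnings(mdy(raw[!is_serial]))

parsed
```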

In [None]:
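tx_injuries %>%
  # mdy() returns NA (with a warning) for entries in other formats
  mutate(date = mdy(injury_date)) %>%
  group_by(ride_name) %>%
  arrange(date, .by_group = TRUE) %>%
  mutate(
    # days elapsed since the previous reported injury on the same ride
    tspan = interval(lag(date), date),
    tspan.days = as.duration(tspan) / ddays(1)
  ) %>%
  ungroup() %>%
  select(date, ride_name, tspan, tspan.days) %>%
  arrange(desc(tspan.days))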
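
The hint asks for the number of days between accidents for each ride. To compare rides directly, the row-level gaps can be collapsed to one row per ride. Below is a minimal sketch on invented toy data; `gap_days`, `n_reports`, and `longest_gap` are illustrative names, not columns of `tx_injuries`.

```r
library(dplyr)
library(lubridate)

# Toy data: two rides with a handful of injury dates each
toy <- tibble(
  ride_name = c("A", "A", "A", "B", "B"),
  date = ymd(c("2013-01-01", "2013-01-11", "2013-03-01",
               "2014-05-01", "2014-05-03"))
)

gaps <- toy %>%
  arrange(ride_name, date) %>%
  group_by(ride_name) %>%
  # days since the previous report on the same ride (NA for the first report)
  mutate(gap_days = as.numeric(date - lag(date))) %>%
  summarise(
    n_reports   = n(),
    longest_gap = max(gap_days, na.rm = TRUE),
    .groups     = "drop"
  ) %>%
  arrange(desc(longest_gap))

gaps
```

A ride with a single report has no gap to measure, so `max()` would return `-Inf` with a warning for it; such rides may need to be filtered out or handled separately.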


## Exercise 02
These data (see below) come from this story: [The next generation: The space race is dominated by new contenders](https://www.economist.com/graphic-detail/2018/10/18/the-space-race-is-dominated-by-new-contenders). You have data on space missions over time, with dates of the launch, the launching agency/country, type of launch vehicle, and so on. 


| variable    | definition                               |
| ----------- | ---------------------------------------- |
| tag         | Harvard or [COSPAR][cospar] id of launch |
| JD          | [Julian Date][jd] of launch              |
| launch_date | date of launch                           |
| launch_year | year of launch                           |
| type        | type of launch vehicle                   |
| variant     | variant of launch vehicle                |
| mission     | space mission                            |
| agency      | launching agency                         |
| state_code  | launching agency's state                 |
| category    | success (O) or failure (F)               |
| agency_type | type of agency                           |

In [None]:
read_csv(
  "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-01-15/launches.csv"
  ) -> launches

### Problem (a) 
Create a new column called `date` that stores `launch_date` as a proper date field, parsed with `ymd()` from `{lubridate}`.

In [None]:
launches %>%
  mutate(
      date = ymd(launch_date)
  ) -> lau.df

In [None]:
lau.df %>%
    select(date) %>%
    head()

### Problem (b) 
Creating columns as needed, calculate and show the number of launches first by year, then by month, and then by day of the week. The result should be arranged in descending order of the number of launches. 

In [None]:
lau.df %>%
  mutate(
    year = year(date),
    month = month(date, abbr = FALSE, label = TRUE),
    day = day(date),
    dow = wday(date, abbr = FALSE, label = TRUE)
  ) -> lau.df

In [None]:
lau.df %>%
  filter(!is.na(year)) %>%
  count(year, sort = TRUE) 

In [None]:
lau.df %>%
  filter(!is.na(month)) %>%
  count(month, sort = TRUE) 

In [None]:
lau.df %>%
  filter(!is.na(day)) %>%
  count(day, sort = TRUE) 

In [None]:
lau.df %>%
  filter(!is.na(dow)) %>%
  count(dow, sort = TRUE) 

### Problem (c) 
How many launches were successful `(O)` versus failed `(F)` by country and year? The countries of interest are those with `state_code` values of "CN", "F", "J", "RU", "SU", "US". You do not need to arrange your results in any order.

In [None]:
lau.df %>%
  filter(
    !is.na(date),
    state_code %in% c("CN", "F", "J", "RU", "SU", "US")
    ) %>%
  count(state_code, year, category)
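
The long output above has one row per country, year, and outcome. To see successes and failures side by side, the counts can be spread into columns. This is a sketch on invented toy data that reuses the `state_code`, `year`, and `category` column names; `values_fill = 0` fills in combinations with no launches.

```r
library(dplyr)
library(tidyr)

# Toy data: one row per launch
toy <- tibble(
  state_code = c("US", "US", "US", "SU", "SU"),
  year       = c(1960, 1960, 1961, 1960, 1960),
  category   = c("O", "F", "O", "O", "O")
)

outcomes <- toy %>%
  count(state_code, year, category) %>%
  # one column per outcome; missing combinations become 0 instead of NA
  pivot_wider(names_from = category, values_from = n, values_fill = 0)

outcomes
```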