# Exercises for Practice

## Exercise 01

Using the `cmhflights` data from last week, unite the three columns `Year`, `Month`, and `DayofMonth` into a single column named `date_of_flight`. The three fields should be separated by "-", as in, for example, `2017-1-9`.

In [None]:
library(tidyverse)

In [None]:
load("data/cmhflights_01092017.RData")

In [None]:
cmhflights %>%
    select(
        Year, Month, DayofMonth, OriginCityName, DestCityName
    ) -> cmh

head(cmh)

In [None]:
cmh %>%
    unite(
        col = "date_of_flight",
        c("Year", "Month", "DayofMonth"),
        sep = "-",
        remove = FALSE
    ) -> cmh

In [None]:
cmh %>%
    head()
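
As a quick check on what `remove = FALSE` does, here is a minimal sketch on a toy tibble. The `toy` data and the names `toy_kept` and `toy_dropped` are made up for illustration and are not part of `cmhflights`.

```r
library(tidyverse)

# Toy stand-in for the three date columns (values are illustrative only)
toy <- tibble(
    Year = c(2017L, 2017L),
    Month = c(1L, 1L),
    DayofMonth = c(9L, 10L)
)

# remove = FALSE keeps the original columns next to the united one
toy %>%
    unite(
        col = "date_of_flight",
        c("Year", "Month", "DayofMonth"),
        sep = "-",
        remove = FALSE
    ) -> toy_kept

# The default (remove = TRUE) drops the columns that were united
toy %>%
    unite(
        col = "date_of_flight",
        c("Year", "Month", "DayofMonth"),
        sep = "-"
    ) -> toy_dropped

names(toy_kept)
names(toy_dropped)
```

Keeping the originals is convenient here because `Year`, `Month`, and `DayofMonth` remain available for later grouping or filtering.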

## Exercise 02

Sticking with `cmhflights`, separate `OriginCityName` into two new columns, `origin_city` and `origin_state`. Do the same for `DestCityName`, calling the new columns `destination_city` and `destination_state`, respectively. Both city columns should display only the name of the city, while both state columns should display only the abbreviated state name (for example, "CA" or "OH").


In [None]:
cmh %>%
    separate(
        col = OriginCityName,
        into = c("origin_city", "origin_state"),
        sep = ", ",
        remove = FALSE
        ) %>%
    separate(
        col = DestCityName,
        into = c("destination_city", "destination_state"),
        sep = ", ",
        remove = FALSE
        ) -> cmh 

In [None]:
cmh %>%
    head()
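
For comparison, recent versions of `tidyr` (1.3.0 and later, which is an assumption about your installation) also offer `separate_wider_delim()` for the same job. The sketch below uses a made-up `toy` tibble rather than the real flight data.

```r
library(tidyverse)

# Toy stand-in for OriginCityName (values are illustrative only)
toy <- tibble(
    OriginCityName = c("Columbus, OH", "Los Angeles, CA")
)

# Split on ", " into city and state; cols_remove = FALSE keeps the
# original column, much like remove = FALSE in separate()
toy %>%
    separate_wider_delim(
        cols = OriginCityName,
        delim = ", ",
        names = c("origin_city", "origin_state"),
        cols_remove = FALSE
    ) -> toy_split

toy_split
```

Either function should work for this exercise; `separate()` is the one used in the solution above.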

## Exercise 03

Tidy the `weather` data so that the resulting data set, called `wdf`, has the days (the `d1`-`d31` columns) as rows and `TMIN` and `TMAX` as columns.

In [None]:
read.delim(
    file = "http://stat405.had.co.nz/data/weather.txt",
    stringsAsFactors = FALSE
    ) -> weather

weather

The end result should be as shown below:

| id          | date     | TMIN | TMAX |
| :--         | :--      | :--  | :--  |
| MX000017004 | 2010-1-1 | NA   | NA   |
| MX000017004 | 2010-1-2 | NA   | NA   |
| MX000017004 | 2010-1-3 | NA   | NA   |
| MX000017004 | 2010-1-4 | NA   | NA   |

In [None]:
weather %>%
    group_by(id, year, month, element) %>%
    pivot_longer(
        names_to = "day",
        values_to = "temperature",
        cols = 5:35
    ) -> wdf

In [None]:
head(wdf)

In [None]:
wdf %>%
    mutate(
        day = stringr::str_remove_all(day, "d")
        ) %>%
    unite(
        col = "date",
        c("year", "month", "day"),
        sep = "-"
        ) -> wdf

In [None]:
glimpse(wdf)

In [None]:
library(stringr)

In [None]:
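wdf %>%
    mutate(
        # Drops every "T" from element, so "TMAX" becomes "MAX" and "TMIN"
        # becomes "MIN"; the wide columns built from new_element carry those names
        new_element = str_remove_all(element, "T")
        ) %>%
    ungroup() %>%
    select(-element) -> wdf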

In [None]:
head(wdf)

In [None]:
tail(wdf)

In [None]:
wdf %>%
    group_by(id, date) %>%
    pivot_wider(
        names_from = new_element,
        values_from = temperature
     ) -> wdf_wide

In [None]:
head(wdf_wide)

In [None]:
tail(wdf_wide)

In [None]:
wdf_wide
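
To see the long-then-wide round trip in isolation, here is a minimal sketch on a toy tibble shaped like `weather`. The `toy` data, the station id `S1`, and the two day columns are made up for illustration.

```r
library(tidyverse)

# Toy stand-in for weather: one station, two elements, two day columns
toy <- tibble(
    id = "S1",
    element = c("TMAX", "TMIN"),
    d1 = c(30, 15),
    d2 = c(NA, 14)
)

toy %>%
    # Stack the day columns into rows: one row per id, element, and day
    pivot_longer(
        cols = d1:d2,
        names_to = "day",
        values_to = "temperature"
    ) %>%
    # Spread the elements back out: one column per element
    pivot_wider(
        names_from = element,
        values_from = temperature
    ) -> toy_wide

toy_wide
```

The result has one row per day, with `TMAX` and `TMIN` side by side, which is the shape the exercise asks for.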

Just for fun, chaining it all together (this version keeps `element` as is, so the wide columns are named `TMAX` and `TMIN`) ...

In [None]:
read.delim(
    file = "http://stat405.had.co.nz/data/weather.txt",
    stringsAsFactors = FALSE
    ) %>%
    group_by(id, year, month, element) %>%
    pivot_longer(
        names_to = "day",
        values_to = "temperature",
        cols = 5:35
    ) %>%
    mutate(
        day = stringr::str_remove_all(day, "d")
        ) %>%
    unite(
        col = "date",
        c("year", "month", "day"),
        sep = "-"
        ) %>%
    group_by(id, date) %>%
    pivot_wider(
        names_from = element,
        values_from = temperature
     ) -> wdf_wide

head(wdf_wide)

I don't like all these rows that show `NA` for `TMAX` and `TMIN`, so let us `filter(...)` them out.

In [None]:
wdf_wide %>%
    filter(
        !is.na(TMAX) & !is.na(TMIN)
        ) -> wdf_wide_clean

In [None]:
head(wdf_wide_clean)
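
An alternative worth knowing is `tidyr::drop_na()`, which removes rows with a missing value in any of the named columns. The sketch below compares the two approaches on a made-up `toy_wide` tibble; the dates and temperatures are illustrative, not taken from the real file.

```r
library(tidyverse)

# Toy stand-in for wdf_wide (values are illustrative only)
toy_wide <- tibble(
    date = c("2010-1-1", "2010-1-30", "2010-2-2"),
    TMAX = c(NA, 27.8, 27.3),
    TMIN = c(NA, 14.5, NA)
)

# Keep rows where both TMAX and TMIN are present
toy_wide %>%
    filter(
        !is.na(TMAX) & !is.na(TMIN)
        ) -> by_filter

# Same idea: drop rows with NA in TMAX or TMIN
toy_wide %>%
    drop_na(TMAX, TMIN) -> by_drop_na

by_filter
by_drop_na
```

Note that both versions drop a row when either value is missing, not only when both are.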