diff --git a/01-Visualize.Rmd b/01-Visualize.Rmd
index 64cfd64..0d1c474 100755
--- a/01-Visualize.Rmd
+++ b/01-Visualize.Rmd
@@ -94,7 +94,7 @@ Make a density plot of `budget` colored by `clean_test`.
 
 ## Your Turn 7
 
-Make a density plot of `clean_test` colored by `clean_test`.
+Make a barchart of `clean_test` colored by `clean_test`.
 
 ```{r}
 
diff --git a/03-Tidy.Rmd b/03-Tidy.Rmd
index 0b189b2..23f1503 100755
--- a/03-Tidy.Rmd
+++ b/03-Tidy.Rmd
@@ -5,7 +5,7 @@ editor_options:
   chunk_output_type: inline
 ---
 
-<!-- This file by Amelia McNamara is licensed under a Creative Commons Attribution 4.0 International License, adapted from the orignal work at https://github.com/rstudio/master-the-tidyverse by RStudio. -->
+<!-- This file by Charlotte Wickham is licensed under a Creative Commons Attribution 4.0 International License, adapted from the orignal work at https://github.com/rstudio/master-the-tidyverse by RStudio. -->
 
 ```{r setup}
 library(tidyverse)
diff --git a/cheatsheets/dplyr-data-transformation.pdf b/cheatsheets/dplyr-data-transformation.pdf
old mode 100755
new mode 100644
index 27646dc..22c6003
Binary files a/cheatsheets/dplyr-data-transformation.pdf and b/cheatsheets/dplyr-data-transformation.pdf differ
diff --git a/cheatsheets/ggplot2-data-visualization.pdf b/cheatsheets/ggplot2-data-visualization.pdf
old mode 100755
new mode 100644
index 51998d8..0eab6ca
Binary files a/cheatsheets/ggplot2-data-visualization.pdf and b/cheatsheets/ggplot2-data-visualization.pdf differ
diff --git a/cheatsheets/lubridate-dates-times.pdf b/cheatsheets/lubridate-dates-times.pdf
old mode 100755
new mode 100644
index 4a71263..f183fa0
Binary files a/cheatsheets/lubridate-dates-times.pdf and b/cheatsheets/lubridate-dates-times.pdf differ
diff --git a/cheatsheets/purrr-iterate.pdf b/cheatsheets/purrr-iterate.pdf
old mode 100755
new mode 100644
index d90a621..db4d751
Binary files a/cheatsheets/purrr-iterate.pdf and b/cheatsheets/purrr-iterate.pdf differ
diff --git a/cheatsheets/rstudio-ide.pdf b/cheatsheets/rstudio-ide.pdf
old mode 100755
new mode 100644
index dd2c750..486f7fd
Binary files a/cheatsheets/rstudio-ide.pdf and b/cheatsheets/rstudio-ide.pdf differ
diff --git a/cheatsheets/stringr-strings.pdf b/cheatsheets/stringr-strings.pdf
old mode 100755
new mode 100644
index d7157bd..5286cc1
Binary files a/cheatsheets/stringr-strings.pdf and b/cheatsheets/stringr-strings.pdf differ
diff --git a/slides/05-Data-Types.pdf b/slides/05-Data-Types.pdf
old mode 100755
new mode 100644
index 020fca5..041a4b4
Binary files a/slides/05-Data-Types.pdf and b/slides/05-Data-Types.pdf differ
diff --git a/slides/06-Iteration.pdf b/slides/06-Iteration.pdf
old mode 100755
new mode 100644
index 7d152ee..25be1aa
Binary files a/slides/06-Iteration.pdf and b/slides/06-Iteration.pdf differ
diff --git a/solutions/01-Visualize-solutions.Rmd b/solutions/01-Visualize-solutions.Rmd
new file mode 100644
index 0000000..c87d9bc
--- /dev/null
+++ b/solutions/01-Visualize-solutions.Rmd
@@ -0,0 +1,172 @@
+---
+title: "Visualization - solutions"
+output: html_notebook
+editor_options: 
+  chunk_output_type: inline
+---
+
+<!-- This file by Amelia McNamara is licensed under a Creative Commons Attribution 4.0 International License, adapted from the orignal work at https://github.com/rstudio/master-the-tidyverse by RStudio. -->
+
+## Setup
+
+The first chunk in an R Notebook is usually titled "setup," and by convention includes the R packages you want to load. Remember, in order to use an R package you have to run some `library()` code every session. Execute these lines of code to load the packages. 
+
+```{r setup}
+library(ggplot2)
+library(fivethirtyeight)
+```
+
+## Bechdel test data
+
+We're going to start by playing with data collected by the website FiveThirtyEight on movies and [the Bechdel test](https://en.wikipedia.org/wiki/Bechdel_test). 
+
+To begin, let's just preview our data. There are a couple ways to do that. One is just to type the name of the data and execute it like a piece of code. 
+
+```{r}
+bechdel
+```
+
+Notice that you can page through to see more of the dataset. 
+
+Sometimes, people prefer to see their data in a more spreadsheet-like format, and RStudio provides a way to do that. Go to the Console and type `View(bechdel)` to see the data preview. 
+
+(An aside-- `View` is a special function. Since it makes something happen in the RStudio interface, it doesn't work properly in R Notebooks. Most R functions have names that start with lowercase letters, so the uppercase "V" is there to remind you of its special status.)
+
+
+
+## Consider
+What relationship do you expect to see between movie budget (budget) and domestic gross(domgross)?
+
+## Your Turn 1
+
+Run the code on the slide to make a graph. Pay strict attention to spelling, capitalization, and parentheses!
+
+```{r}
+ggplot(data = bechdel) +
+  geom_point(mapping = aes(x = budget, y = domgross))
+```
+
+## Your Turn 2
+
+Add `color`, `size`, `alpha`, and `shape` aesthetics to your graph. Experiment.  
+
+```{r}
+ggplot(data = bechdel) +
+  geom_point(mapping = aes(x = budget, y = domgross, color=clean_test))
+
+ggplot(bechdel) + 
+  geom_point(mapping = aes(x = budget, y = domgross, size=clean_test))
+ggplot(bechdel) + 
+  geom_point(mapping = aes(x = budget, y = domgross, shape=clean_test))
+ggplot(bechdel) + 
+  geom_point(mapping = aes(x = budget, y = domgross, alpha=clean_test))
+
+```
+
+## Set vs map
+
+```{r}
+ggplot(bechdel) + 
+    geom_point(mapping = aes(x = budget, y = domgross), color="blue")
+```
+
+## Your Turn 3
+
+Replace this scatterplot with one that draws boxplots. Use the cheatsheet. Try your best guess.
+
+```{r}
+ggplot(data = bechdel) + geom_point(aes(x = clean_test, y = budget))
+
+ggplot(data = bechdel) + geom_boxplot(aes(x = clean_test, y = budget))
+```
+
+## Your Turn 4
+
+Make a histogram of the `budget` variable from `bechdel`.
+
+```{r}
+ggplot(bechdel) + 
+  geom_histogram(aes(x=budget))
+```
+
+## Your Turn 5
+Try to find a better binwidth for `budget`.
+
+```{r}
+ggplot(data = bechdel) +
+  geom_histogram(mapping = aes(x = budget), binwidth=10000000)
+```
+
+## Your Turn 6
+
+Make a density plot of `budget` colored by `clean_test`.
+
+```{r}
+ggplot(data = bechdel) +
+  geom_density(mapping = aes(x = budget))
+
+ggplot(data = bechdel) +
+  geom_density(mapping = aes(x = budget, color=clean_test))
+```
+
+
+## Your Turn 7
+
+Make a barchart of `clean_test` colored by `clean_test`.
+
+```{r}
+ggplot(data=bechdel) +
+  geom_bar(mapping = aes(x = clean_test, fill = clean_test))
+```
+
+
+## Your Turn 8
+
+Predict what this code will do. Then run it.
+
+```{r}
+ggplot(bechdel) + 
+  geom_point(aes(budget, domgross)) +
+  geom_smooth(aes(budget, domgross))
+```
+
+## global vs local
+
+```{r}
+ggplot(data = bechdel, mapping = aes(x = budget, y = domgross)) +
+  geom_point(mapping = aes(color = clean_test)) +
+  geom_smooth()
+```
+
+```{r}
+ggplot(data = bechdel, mapping = aes(x = budget, y = domgross)) +
+  geom_point(mapping = aes(color = clean_test)) +
+  geom_smooth(data = filter(bechdel, clean_test == "ok"))
+```
+
+## Your Turn
+
+What does `getwd()` return?
+
+```{r}
+getwd()
+```
+
+## Your Turn 9
+
+Save the last plot and then locate it in the files pane. If you run your `ggsave()` code inside this notebook, the image will be saved in the same directory as your .Rmd file (likely, project -> code), but if you run `ggsave()` in the Console it will be in your working directory. 
+
+```{r}
+ggsave("my-plot.png")
+```
+
+***
+
+# Take aways
+
+You can use this code template to make thousands of graphs with **ggplot2**.
+
+```{r eval = FALSE}
+ggplot(data = <DATA>) +
+  <GEOM_FUNCTION>(mapping = aes(<MAPPINGS>))
+```
\ No newline at end of file
diff --git a/solutions/02-Transform-Solutions.Rmd b/solutions/02-Transform-Solutions.Rmd
new file mode 100644
index 0000000..2645232
--- /dev/null
+++ b/solutions/02-Transform-Solutions.Rmd
@@ -0,0 +1,357 @@
+---
+title: "Transform Data - solutions"
+output: html_notebook
+editor_options: 
+  chunk_output_type: inline
+---
+
+<!-- This file by Amelia McNamara is licensed under a Creative Commons Attribution 4.0 International License, adapted from the orignal work at https://github.com/rstudio/master-the-tidyverse by RStudio. -->
+
+```{r setup}
+library(dplyr)
+library(babynames)
+library(nycflights13)
+library(skimr)
+```
+
+## Babynames
+
+```{r}
+babynames
+skim(babynames)
+skim_with(integer = list(p25 = NULL, p75=NULL))
+```
+
+
+## Your Turn 1
+Run the skim_with() command, and then try skimming babynames again to see how the output is different
+```{r}
+skim(babynames)
+```
+
+## Select
+
+```{r}
+select(babynames, name, prop)
+```
+
+## Your Turn 2
+
+Alter the code to select just the `n` column:
+
+```{r}
+select(babynames, n)
+```
+
+## Consider
+
+Which of these is NOT a way to select the `name` and `n` columns together?
+
+```{r}
+select(babynames, -c(year, sex, prop))
+select(babynames, name:n)
+select(babynames, starts_with("n"))
+select(babynames, ends_with("n"))
+```
+
+## Your Turn 3
+
+Show:
+
+* All of the names where prop is greater than or equal to 0.08  
+* All of the children named "Sea"  
+* All of the names that have a missing value for `n`  
+
+```{r}
+filter(babynames, prop >= 0.08)
+filter(babynames, name == "Sea")
+filter(babynames, is.na(n))
+```
+
+## Your Turn 4
+
+Use Boolean operators to alter the code below to return only the rows that contain:
+
+* Girls named Sea  
+* Names that were used by exactly 5 or 6 children in 1880  
+* Names that are one of Acura, Lexus, or Yugo
+
+```{r}
+filter(babynames, name == "Sea", sex == "F")
+filter(babynames, n == 5 | n == 6, year == 1880)
+filter(babynames, name %in% c("Acura", "Lexus", "Yugo"))
+```
+
+## Arrange
+
+```{r}
+arrange(babynames, n)
+```
+
+## Your Turn 5
+
+Arrange babynames by `n`. Add `prop` as a second (tie breaking) variable to arrange on. Can you tell what the smallest value of `n` is?
+
+```{r}
+arrange(babynames, n, prop)
+```
+
+## desc
+
+```{r}
+arrange(babynames, desc(n))
+```
+
+## Your Turn 6
+
+Use `desc()` to find the names with the highest prop.
+Then, use `desc()` to find the names with the highest n.
+
+```{r}
+arrange(babynames, desc(prop))
+arrange(babynames, desc(n))
+```
+
+## Steps and the pipe
+
+```{r}
+babynames %>%
+  filter(year == 2015, sex == "M") %>%
+  select(name, n) %>%
+  arrange(desc(n))
+```
+
+## Your Turn 7
+
+Use `%>%` to write a sequence of functions that: 
+
+1. Filter babynames to just the girls that were born in 2015  
+2. Select the `name` and `n` columns  
+3. Arrange the results so that the most popular names are near the top.
+
+```{r}
+babynames %>% 
+  filter(year == 2015, sex == "F") %>% 
+  select(name, n) %>% 
+  arrange(desc(n))
+```
+
+## Your Turn 8
+
+1. Trim `babynames` to just the rows that contain your `name` and your `sex`  
+2. Trim the result to just the columns that will appear in your graph (not strictly necessary, but useful practice)  
+3. Plot the results as a line graph with `year` on the x axis and `prop` on the y axis
+
+```{r}
+babynames %>% 
+  filter(name == "Amelia", sex == "F") %>%
+  select(year, prop) %>%
+  ggplot() +
+    geom_line(mapping = aes(year, prop))
+```
+
+## Your Turn 9
+
+Use summarise() to compute three statistics about the data:
+
+1. The first (minimum) year in the dataset  
+2. The last (maximum) year in the dataset  
+3. The total number of children represented in the data
+
+```{r}
+babynames %>% 
+  summarise(first = min(year), 
+            last = max(year), 
+            total = sum(n))
+```
+
+## Your Turn 10
+
+Extract the rows where `name == "Khaleesi"`. Then use `summarise()` and a summary functions to find:
+
+1. The total number of children named Khaleesi
+2. The first year Khaleesi appeared in the data
+
+```{r}
+babynames %>% 
+  filter(name == "Khaleesi") %>% 
+  summarise(total = sum(n), first = min(year))
+```
+
+## Toy data for transforming
+
+```{r}
+# Toy dataset to use
+pollution <- tribble(
+       ~city,   ~size, ~amount, 
+  "New York", "large",      23,
+  "New York", "small",      14,
+    "London", "large",      22,
+    "London", "small",      16,
+   "Beijing", "large",      121,
+   "Beijing", "small",      56
+)
+```
+
+## Summarize
+
+```{r}
+pollution %>% 
+ summarise(mean = mean(amount), sum = sum(amount), n = n())
+```
+
+```{r}
+pollution %>% 
+  group_by(city) %>%
+  summarise(mean = mean(amount), sum = sum(amount), n = n())
+```
+
+
+## Your Turn 11
+
+Use `group_by()`, `summarise()`, and `arrange()` to display the ten most popular names. Compute popularity as the total number of children of a single gender given a name.
+
+```{r}
+babynames %>%
+  group_by(name, sex) %>% 
+  summarise(total = sum(n)) %>% 
+  arrange(desc(total))
+```
+
+## Your Turn 12
+
+Use grouping to calculate and then plot the number of children born each year over time.
+
+```{r}
+babynames %>%
+  group_by(year) %>% 
+  summarise(n_children = sum(n)) %>% 
+  ggplot() +
+    geom_line(mapping = aes(x = year, y = n_children))
+```
+
+## Ungroup
+
+```{r}
+babynames %>%
+  group_by(name, sex) %>% 
+  summarise(total = sum(n)) %>% 
+  arrange(desc(total))
+```
+
+## Mutate
+
+```{r}
+babynames %>%
+  mutate(percent = round(prop*100, 2))
+```
+
+## Your Turn 13
+
+Use `min_rank()` and `mutate()` to rank each row in `babynames` from largest `n` to lowest `n`.
+
+```{r}
+babynames %>% 
+  mutate(rank = min_rank(desc(prop)))
+```
+
+## Your Turn 14
+
+Compute each name's rank _within its year and sex_. 
+Then compute the median rank _for each combination of name and sex_, and arrange the results from highest median rank to lowest.
+
+```{r}
+babynames %>% 
+  group_by(year, sex) %>% 
+  mutate(rank = min_rank(desc(prop))) %>% 
+  group_by(name, sex) %>% 
+  summarise(score = median(rank)) %>% 
+  arrange(score)
+```
+
+## Flights data
+```{r}
+flights
+skim(flights)
+```
+
+## Toy data
+
+```{r}
+band <- tribble(
+   ~name,     ~band,
+  "Mick",  "Stones",
+  "John", "Beatles",
+  "Paul", "Beatles"
+)
+
+instrument <- tribble(
+    ~name,   ~plays,
+   "John", "guitar",
+   "Paul",   "bass",
+  "Keith", "guitar"
+)
+
+instrument2 <- tribble(
+    ~artist,   ~plays,
+   "John", "guitar",
+   "Paul",   "bass",
+  "Keith", "guitar"
+)
+```
+
+## Mutating joins
+
+```{r}
+band %>% left_join(instrument, by = "name")
+```
+
+## Your Turn 15
+
+Which airlines had the largest arrival delays? Complete the code below.
+
+1. Join `airlines` to `flights`
+2. Compute and order the average arrival delays by airline. Display full names, no codes.
+
+```{r}
+flights %>% 
+  drop_na(arr_delay) %>%
+  left_join(airlines, by = "carrier") %>%
+  group_by(name) %>%
+  summarise(delay = mean(arr_delay)) %>%
+  arrange(delay)
+```
+
+## Different names
+
+```{r}
+band %>% left_join(instrument2, by = c("name" = "artist"))
+```
+
+## Your Turn 16
+
+How many airports in `airports` are serviced by flights originating in New York (i.e. flights in our dataset?) Notice that the column to join on is named `faa` in the **airports** data set and `dest` in the **flights** data set.
+
+
+```{r}
+airports %>%
+  semi_join(flights, by = c("faa" = "dest")) %>%
+  distinct(faa)
+```
+
+***
+
+# Take aways
+
+* Extract variables with `select()`  
+* Extract cases with `filter()`  
+* Arrange cases, with `arrange()`  
+
+* Make tables of summaries with `summarise()`  
+* Make new variables, with `mutate()`  
+* Do groupwise operations with `group_by()`
+
+* Connect operations with `%>%`  
+
+* Use `left_join()`, `right_join()`, `full_join()`, or `inner_join()` to join datasets
+* Use `semi_join()` or `anti_join()` to filter datasets against each other
diff --git a/solutions/03-Tidy-Solutions.Rmd b/solutions/03-Tidy-Solutions.Rmd
new file mode 100644
index 0000000..6cbdd2b
--- /dev/null
+++ b/solutions/03-Tidy-Solutions.Rmd
@@ -0,0 +1,158 @@
+---
+title: "Tidy -- Solutions"
+output:
+  github_document: 
+    df_print: tibble
+  html_document:
+    df_print: paged
+---
+
+<!-- This file by Charlotte Wickham is licensed under a Creative Commons Attribution 4.0 International License, adapted from the orignal work at https://github.com/rstudio/master-the-tidyverse by RStudio. -->
+
+
+```{r setup}
+library(tidyverse)
+library(babynames)
+
+# Toy data
+cases <- tribble(
+  ~Country, ~"2011", ~"2012", ~"2013",
+      "FR",    7000,    6900,    7000,
+      "DE",    5800,    6000,    6200,
+      "US",   15000,   14000,   13000
+)
+pollution <- tribble(
+       ~city,   ~size, ~amount,
+  "New York", "large",      23,
+  "New York", "small",      14,
+    "London", "large",      22,
+    "London", "small",      16,
+   "Beijing", "large",     121,
+   "Beijing", "small",     121
+)
+bp_systolic <- tribble(
+  ~ subject_id,  ~ time_1, ~ time_2, ~ time_3,
+             1,       120,      118,      121,
+             2,       125,      131,       NA,
+             3,       141,       NA,       NA 
+)
+bp_systolic2 <- tribble(
+  ~ subject_id,  ~ time, ~ systolic,
+             1,       1,        120,
+             1,       2,        118,
+             1,       3,        121,
+             2,       1,        125,
+             2,       2,        131,
+             3,       1,        141
+)
+```
+
+## Tidy and untidy data
+
+`table1` is tidy:
+```{r}
+table1 
+```
+
+For example, it's easy to add a rate column with `mutate()`:
+```{r}
+table1 %>%
+  mutate(rate = cases/population)
+```
+
+`table2` isn't tidy, the count column really contains two variables:
+```{r}
+table2
+```
+
+It makes it very hard to manipulate.
+
+## Your Turn 1
+
+Is `bp_systolic` tidy?
+
+```{r}
+bp_systolic2 
+```
+
+## Your Turn 2
+
+Using `bp_systolic2` with `group_by()`, and `summarise()`:
+
+* Find the average systolic blood pressure for each subject
+* Find the last time each subject was measured
+
+```{r}
+bp_systolic2 %>% 
+  group_by(subject_id) %>%
+  summarise(avg_bp = mean(systolic),
+    last_time = max(time))
+```
+
+## Your Turn 3
+
+On a sheet of paper, draw how the cases data set would look if it had the same values grouped into three columns: **country**, **year**, **n**
+
+-----------------------------
+   country     year   cases  
+------------- ------ --------
+ Afghanistan   1999    745   
+
+ Afghanistan   2000    2666  
+
+   Brazil      1999   37737  
+
+   Brazil      2000   80488  
+
+    China      1999   212258 
+
+    China      2000   213766 
+-----------------------------
+
+## Your Turn 4
+
+Use `gather()` to reorganize `table4a` into three columns: **country**, **year**, and **cases**.
+
+```{r}
+table4a %>%
+  gather(key = "year", 
+    value = "cases", -country) %>%
+  arrange(country)
+```
+
+## Your Turn 5
+
+On a sheet of paper, draw how `pollution` would look if it had the same values grouped into three columns: **city**, **large**, **small**
+
+--------------------------
+   city     large   small 
+---------- ------- -------
+ Beijing     121     121  
+
+  London     22      16   
+
+ New York    23      14   
+--------------------------
+
+## Your Turn 6
+
+Use `spread()` to reorganize `table2` into four columns: **country**, **year**, **cases**, and **population**.
+
+```{r}
+table2 %>%
+  spread(key = type, value = count)
+```
+
+***
+
+# Take Aways
+
+Data comes in many formats but R prefers just one: _tidy data_.
+
+A data set is tidy if and only if:
+
+1. Every variable is in its own column
+2. Every observation is in its own row
+3. Every value is in its own cell (which follows from the above)
+
+What is a variable and an observation may depend on your immediate goal.
\ No newline at end of file
diff --git a/solutions/04-Case-Study-Solutions.Rmd b/solutions/04-Case-Study-Solutions.Rmd
new file mode 100644
index 0000000..842ed7d
--- /dev/null
+++ b/solutions/04-Case-Study-Solutions.Rmd
@@ -0,0 +1,211 @@
+---
+title: 'Case Study: Friday the 13th Effect (Solution)'
+output:
+  html_document:
+    df_print: paged
+  github_document: 
+    df_print: tibble  
+---
+
+<!-- This file by Charlotte Wickham is licensed under a Creative Commons Attribution 4.0 International License. -->
+
+```{r setup}
+library(fivethirtyeight)
+library(tidyverse)
+```
+
+## Task
+
+Reproduce this figure from fivethirtyeight's article [*Some People Are Too Superstitious To Have A Baby On Friday The 13th*](https://fivethirtyeight.com/features/some-people-are-too-superstitious-to-have-a-baby-on-friday-the-13th/):
+
+![](resources/bialik-fridaythe13th-2.png)
+
+## Data
+
+In the `fivethiryeight` package there are two datasets containing birth data, but for now let's just work with one `US_births_1994_2003`.  Note that since we have data from 1994-2003, our results may differ somewhat from the figure based on 1994-2014.
+
+## Your Turn 1 
+
+With your neighbour, brainstorm the steps needed to get the data in a form ready to make the plot.
+
+```{r}
+US_births_1994_2003
+```
+
+## Some overviews of the data
+
+Whole time series:
+```{r}
+ggplot(US_births_1994_2003, aes(x = date, y = births)) +
+  geom_line()
+```
+There is so much fluctuation it's really hard to see what is going on.
+
+Let's try just looking at one year:
+```{r}
+US_births_1994_2003 %>%
+  filter(year == 1994) %>%
+  ggplot(mapping = aes(x = date, y = births)) +
+    geom_line()
+```
+Strong weekly pattern accounts for most variation.
+
+## Strategy
+
+Use the figure as a guide for what the data should like to make the final plot.  We want to end up with something like:
+
+---------------------------
+ day_of_week   avg_diff_13 
+------------- -------------
+     Mon         -2.686    
+
+    Tues         -1.378    
+
+     Wed         -3.274    
+     
+     ...          ...
+     
+---------------------------     
+
+There is more than one way to get there, but we
+ll roughly follow this strategy:
+
+* Get just the data for the 6th, 13th, and 20th
+* Calculate variable of interest:
+    * (For each month/year):
+        * Find average births on 6th and 20th
+        * Find percentage difference between births on 13th and average births on 6th and 20th
+    
+    * Average percent difference by day of the week
+* Create plot
+
+## Your Turn 2
+
+Extract just the 6th, 13th and 20th of each month:
+
+```{r}
+US_births_1994_2003 %>%
+  select(-date) %>% 
+  filter(date_of_month %in% c(6, 13, 20)) 
+```
+
+## Your Turn 3
+
+Which arrangement is tidy?
+
+**Option 1:**
+
+-----------------------------------------------------
+ year   month   date_of_month   day_of_week   births 
+------ ------- --------------- ------------- --------
+ 1994     1           6            Thurs      11406  
+
+ 1994     1          13            Thurs      11212  
+
+ 1994     1          20            Thurs      11682  
+-----------------------------------------------------
+
+**Option 2:**
+
+----------------------------------------------------
+ year   month   day_of_week     6      13      20   
+------ ------- ------------- ------- ------- -------
+ 1994     1        Thurs      11406   11212   11682 
+----------------------------------------------------
+
+(**Hint:** think about our next step *"Find the percent difference between the 13th and the average of the 6th and 12th"*. In which layout will this be easier using our tidy tools?)
+
+**Solution**: Option 2, since then we can easily use `mutate()`.
+
+## Your Turn 4
+
+Tidy the filtered data to have the days in columns.
+
+```{r}
+US_births_1994_2003 %>%
+  select(-date) %>% 
+  filter(date_of_month %in% c(6, 13, 20)) %>%
+  spread(date_of_month, births)
+```
+
+## Your Turn 5
+
+Now use `mutate()` to add columns for:
+
+* The average of the births on the 6th and 20th
+* The percentage difference between the number of births on the 13th and the average of the 6th and 20th
+
+```{r}
+US_births_1994_2003 %>%
+  select(-date) %>% 
+  filter(date_of_month %in% c(6, 13, 20)) %>%
+  spread(date_of_month, births) %>%
+  mutate(
+    avg_6_20 = (`6` + `20`)/2,
+    diff_13 = (`13` - avg_6_20) / avg_6_20 * 100
+  )
+```
+
+## A little additional exploring
+
+Now we have a percent difference between the 13th and the 6th and 20th of each month, it's probably worth exploring a little (at the very least to check our calculations seem reasonable).
+
+To make it a little easier let's assign our current data to a variable
+```{r}
+births_diff_13 <- US_births_1994_2003 %>%
+  select(-date) %>% 
+  filter(date_of_month %in% c(6, 13, 20)) %>%
+  spread(date_of_month, births) %>%
+  mutate(
+    avg_6_20 = (`6` + `20`)/2,
+    diff_13 = (`13` - avg_6_20) / avg_6_20 * 100
+  )
+```
+
+Then take a look
+```{r}
+births_diff_13 %>% 
+  ggplot(mapping = aes(day_of_week, diff_13)) +
+    geom_point()
+```
+
+Looks like we are on the right path.  There's a big outlier one Monday
+```{r}
+births_diff_13 %>%
+  filter(day_of_week == "Mon", diff_13 > 10)
+```
+
+Seem's to be driven but a particularly low number of births on the 6th of Sep 1999. Maybe a holiday effect? Labour Day was of the 6th of Sep that year.
+
+## Your Turn 6
+
+Summarize each day of the week to have mean of diff_13.
+
+Then, recreate the fivethirtyeight plot.
+
+```{r}
+US_births_1994_2003 %>%
+  select(-date) %>% 
+  filter(date_of_month %in% c(6, 13, 20)) %>%
+  spread(date_of_month, births) %>%
+  mutate(
+    avg_6_20 = (`6` + `20`)/2,
+    diff_13 = (`13` - avg_6_20) / avg_6_20 * 100
+  ) %>%
+  group_by(day_of_week) %>%
+  summarise(avg_diff_13 = mean(diff_13)) %>%
+  ggplot(aes(x = day_of_week, y = avg_diff_13)) +
+    geom_bar(stat = "identity")
+```
+
+## Extra Challenges
+
+* If you wanted to use the `US_births_2000_2014` data instead, what would you need to change in the pipeline?  How about using both `US_births_1994_2003` and `US_births_2000_2014`?
+
+* Try not removing the `date` column. At what point in the pipeline does it cause problems? Why?
+
+* Can you come up with an alternative way to investigate the Friday the 13th effect?  Try it out!
+
+## Takeaways
+
+The power of the tidyverse comes from being able to easily combine functions that do simple things well.  
\ No newline at end of file
diff --git a/solutions/06-Iterate-solutions.Rmd b/solutions/06-Iterate-solutions.Rmd
new file mode 100644
index 0000000..8bbb371
--- /dev/null
+++ b/solutions/06-Iterate-solutions.Rmd
@@ -0,0 +1,164 @@
+---
+title: "Iteration (solutions)"
+output:
+  html_document:
+    df_print: paged
+  github_document: 
+    df_print: tibble  
+---
+
+<!-- This file by Charlotte Wickham is licensed under a Creative Commons Attribution 4.0 International License, adapted from the orignal work at https://github.com/rstudio/master-the-tidyverse by RStudio. -->
+
+```{r setup}
+library(tidyverse)
+
+# Toy data
+set.seed(1000)
+exams <- list(
+  student1 = round(runif(10, 50, 100)),
+  student2 = round(runif(10, 50, 100)),
+  student3 = round(runif(10, 50, 100)),
+  student4 = round(runif(10, 50, 100)),
+  student5 = round(runif(10, 50, 100))
+)
+
+extra_credit <- list(0, 0, 10, 10, 15)
+```
+
+## Your Turn 1
+
+What kind of object is `mod`?  Why are models stored as this kind of object?
+
+```{r}
+mod <- lm(price ~ carat + cut + color + clarity, data = diamonds)
+# View(mod)
+```
+
+`mod` is a list.  A list is used because we need to store lots of heterogeneous information.
+
+## Quiz
+
+What's the difference between a list and an **atomic** vector?
+
+Atomic vectors are: "logical", "integer", "numeric" (synonym "double"), "complex", "character" and "raw" vectors.
+
+Lists can hold data of different types and different lengths, we can even put lists inside other lists.
+
+## Your Turn 2
+
+Here is a list:
+
+```{r}
+a_list <- list(num = c(8, 9), 
+            log = TRUE,    
+            cha = c("a", "b", "c"))
+```
+
+Here are two subsetting commands. Do they return the same values? Run the code chunk above, _and then_ run the code chunks below to confirm
+
+```{r}
+a_list["num"]
+```
+
+```{r}
+a_list$num
+```
+
+## Your Turn 3
+
+What will each of these return? Run the code chunks to confirm.
+
+```{r}
+vec <- c(-2, -1, 0, 1, 2)
+abs(vec)
+```
+
+`abs()` returns the absolute value of each element.
+
+```{r, error = TRUE}
+lst <- list(-2, -1, 0, 1, 2)
+abs(lst)
+```
+
+Out intent might be to take the absolute value of each element, but we get an error, because `abs()` doens't know how to handle a list.
+
+## Your Turn 4
+
+Run the code in the chunks. What does it return?
+
+```{r}
+list(student1 = mean(exams$student1),
+     student2 = mean(exams$student2),
+     student3 = mean(exams$student3),
+     student4 = mean(exams$student4),
+     student5 = mean(exams$student5))
+```
+
+This chunk manually iterates over the elements of `exams` taking the mean of each element, and returning the results in a list.
+
+```{r}
+library(purrr)
+map(exams, mean)
+```
+
+This does the exact same thing, but automatically.
+
+
+## Your Turn 5
+
+Calculate the variance (`var()`) of each student’s exam grades.
+
+```{r}
+exams %>% map(var)
+```
+
+## Your Turn 6
+
+Calculate the max grade (max())for each student. Return the result as a vector.
+
+```{r}
+exams %>% map_dbl(max)
+```
+
+## Your Turn 7
+
+Write a function that counts the best exam twice and then takes the average. Use it to grade all of the students.
+
+1. Write code that solves the problem for a real object  
+2. Wrap the code in `function(){}` to save it  
+3. Add the name of the real object as the function argument 
+
+```{r}
+double_best <- function(x) {
+  (sum(x) + max(x)) / (length(x) + 1)
+}
+
+exams %>%
+  map_dbl(double_best)
+```
+
+### Your Turn 8
+
+Compute a final grade for each student, where the final grade is the average test score plus any `extra_credit` assigned to the student. Return the results as a double (i.e. numeric) vector.
+
+```{r}
+exams %>% 
+  map2_dbl(extra_credit, function(x, y) mean(x) + y)
+```
+
+
+***
+
+# Take Aways
+
+Lists are a useful way to organize data, but you need to arrange manually for functions to iterate over the elements of a list.
+
+You can do this with the `map()` family of functions in the purrr package.
+
+To write a function, 
+
+1. Write code that solves the problem for a real object  
+2. Wrap the code in `function(){}` to save it  
+3. Add the name of the real object as the function argument 
+
+This sequence will help prevent bugs in your code (and reduce the time you spend correcting bugs). 
diff --git a/solutions/07-Model-Solutions.Rmd b/solutions/07-Model-Solutions.Rmd
new file mode 100644
index 0000000..67320a7
--- /dev/null
+++ b/solutions/07-Model-Solutions.Rmd
@@ -0,0 +1,125 @@
+---
+title: "Model (solutions)"
+output:
+  html_document:
+    df_print: paged
+  github_document: 
+    df_print: tibble  
+---
+
+```{r setup, message=FALSE}
+library(tidyverse)
+library(modelr)
+library(broom)
+
+wages <- heights %>% filter(income > 0)
+```
+
+## Your Turn 1
+
+Fit the model on the slide and then examine the output. What does it look like?
+
+```{r}
+mod_e <- lm(log(income) ~ education, data = wages)
+mod_e
+```
+
+## Your Turn 2
+
+Use a pipe to model `log(income)` against `height`. Then use broom and dplyr functions to extract:
+
+1. The **coefficient estimates** and their related statistics 
+2. The **adj.r.squared** and **p.value** for the overall model
+
+```{r}
+mod_h <- wages %>% lm(log(income) ~ height, data = .)
+mod_h %>%
+  tidy()
+
+mod_h %>% 
+  glance() %>% 
+  select(adj.r.squared, p.value)
+```
+
+## Your Turn 3
+
+Model `log(income)` against `education` _and_ `height`. Do the coefficients change?
+
+```{r}
+mod_eh <- wages %>% 
+  lm(log(income) ~ education + height, data = .)
+
+mod_eh %>% 
+  tidy()
+```
+
+## Your Turn 4
+
+Model `log(income)` against `education` and `height` and `sex`. Can you interpret the coefficients?
+
+```{r}
+mod_ehs <- wages %>% 
+  lm(log(income) ~ education + height + sex, data = .)
+
+mod_ehs %>% 
+  tidy()
+```
+
+## Your Turn 5
+
+Use a broom function and ggplot2 to make a line graph of `height` vs `.fitted` for our heights model, `mod_h`.
+
+_Bonus: Overlay the plot on the original data points._
+
+```{r}
+mod_h <- wages %>% lm(log(income) ~ height, data = .)
+
+mod_h %>% 
+  augment(data = wages) %>% 
+  ggplot(mapping = aes(x = height, y = .fitted)) +
+    geom_point(mapping = aes(y = log(income)), alpha = 0.1) +
+    geom_line(color = "blue")
+```
+
+## Your Turn 6
+
+Repeat the process to make a line graph of `height` vs `.fitted` colored by `sex` for model mod_ehs. Are the results interpretable? Add `+ facet_wrap(~education)` to the end of your code. What happens?
+
+```{r}
+mod_ehs <- wages %>% lm(log(income) ~ education + height + sex, data = .)
+
+mod_ehs %>% 
+  augment(data = wages) %>% 
+  ggplot(mapping = aes(x = height, y = .fitted, color = sex)) +
+    geom_line() +
+    facet_wrap(~ education)
+```
+
+## Your Turn 7
+
+Use one of `spread_predictions()` or `gather_predictions()` to make a line graph of `height` vs `pred` colored by `model` for each of mod_h, mod_eh, and mod_ehs. Are the results interpretable? 
+
+Add `+ facet_grid(sex ~ education)` to the end of your code. What happens?
+
+```{r warning = FALSE, message = FALSE}
+mod_h <- wages %>% lm(log(income) ~ height, data = .)
+mod_eh <- wages %>% lm(log(income) ~ education + height, data = .)
+mod_ehs <- wages %>% lm(log(income) ~ education + height + sex, data = .)
+
+wages %>% 
+  gather_predictions(mod_h, mod_eh, mod_ehs) %>% 
+  ggplot(mapping = aes(x = height, y = pred, color = model)) +
+    geom_line() +
+    facet_grid(sex ~ education)
+```
+
+***
+
+# Take Aways
+
+* Use `glance()`, `tidy()`, and `augment()` from the **broom** package to return model values in a data frame.
+
+* Use `add_predictions()` or `gather_predictions()` or `spread_predictions()` from the **modelr** package to visualize predictions.
+
+* Use `add_residuals()` or `gather_residuals()` or `spread_residuals()` from the **modelr** package to visualize residuals.
+
diff --git a/solutions/08-Organize-Solutions.Rmd b/solutions/08-Organize-Solutions.Rmd
new file mode 100644
index 0000000..ff70215
--- /dev/null
+++ b/solutions/08-Organize-Solutions.Rmd
@@ -0,0 +1,143 @@
+---
+title: "Organize with List Columns"
+output:
+  html_document:
+    df_print: paged
+  github_document: 
+    df_print: tibble  
+---
+
+<!-- This file by Charlotte Wickham is licensed under a Creative Commons Attribution 4.0 International License, adapted from the orignal work at https://github.com/rstudio/master-the-tidyverse by RStudio. -->
+
+```{r setup}
+library(tidyverse)
+library(gapminder)
+library(broom)
+
+nz <- gapminder %>%
+  filter(country == "New Zealand")
+us <- gapminder %>%
+  filter(country == "United States")
+```
+
+## Your turn 1
+
+How has life expectancy changed in other countries?
+Make a line plot of lifeExp vs. year grouped by country.  
+Set alpha to 0.2, to see the results better.
+
+```{r}
+gapminder %>% 
+  ggplot(mapping = aes(x = year, y = lifeExp, group = country)) +
+    geom_line(alpha = 0.2)
+```
+
+## Quiz
+
+How is a data frame/tibble similar to a list?
+
+```{r}
+gapminder_sm <- gapminder[1:5, ]
+```
+
+It is a list!  Columns are like elements of a list
+
+You can extract them with `$` of `[[`
+```{r}
+gapminder_sm$country
+gapminder_sm[["country"]]
+```
+
+Or get a new smaller list with `[`:
+```{r}
+gapminder_sm["country"]
+```
+
+## Quiz
+
+If one of the elements of a list can be another list,
+can one of the columns of a data frame be another list?
+
+**Yes!**.
+
+```{r}
+tibble(
+  num = c(1, 2, 3),
+  cha = c("one", "two", "three"),
+  listcol = list(1, c("1", "two", "FALSE"), FALSE)
+)
+```
+
+And we call it a **list column**.
+
+## Your turn 2
+
+Run this chunk:
+```{r}
+gapminder_nested <- gapminder %>%
+  group_by(country) %>%
+  nest()
+
+fit_model <- function(df) lm(lifeExp ~ year, data = df)
+
+gapminder_nested <- gapminder_nested %>% 
+  mutate(model = map(data, fit_model))
+
+get_rsq <- function(mod) glance(mod)$r.squared
+
+gapminder_nested <- gapminder_nested %>% 
+  mutate(r.squared = map_dbl(model, get_rsq))
+```
+
+Then filter `gapminder_nested` to find the countries with r.squared less than 0.5.  
+
+```{r}
+gapminder_nested %>% 
+  filter(r.squared < 0.5)
+```
+
+## Your Turn 3
+
+Edit the code in the chunk provided to instead find and plot countries with a slope above 0.6 years/year.
+
+```{r}
+get_slope <- function(mod) {
+  tidy(mod) %>% filter(term == "year") %>% pull(estimate)
+}
+
+# Add new column with r-sqaured
+gapminder_nested <- gapminder_nested %>% 
+  mutate(slope = map_dbl(model, get_slope))
+
+# filter out low r-squared countries
+big_slope <- gapminder_nested %>% 
+  filter(slope > 0.6)
+
+# unnest and plot result
+unnest(big_slope, data) %>%
+  ggplot(aes(x = year, y = lifeExp)) + 
+    geom_line(aes(color = country))
+```
+
+## Your Turn 4
+
+**Challenge:**
+
+1. Create your own copy of `gapminder_nested` and then add one more list column: `output` which contains the output of `augment()` for each model.
+
+2.  Plot the residuals against time for the countries with small r-squared.
+
+```{r}
+charlotte_gapminder <- gapminder_nested
+
+charlotte_gapminder %>%
+  mutate(output = model %>% map(augment)) %>%
+  unnest(output) %>%
+  filter(r.squared < 0.5) %>%
+  ggplot() + 
+    geom_line(aes(year, .resid, color = country))
+
+```
+
+# Take away
+