# Models and Maps

## Models

Let's again consider the car dataset from the second notebook.

In that notebook we plotted *qsec* as a function of *hp*. However, we might be interested in a better model. Let's load the data.

In [None]:
library(tidyverse)

data(mtcars)

mtcars_tbl <- as_tibble(rownames_to_column(mtcars,var='model'))

str(mtcars_tbl)

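Now let's fit three different linear models with `lm` from the `stats` package [[lm]](https://www.rdocumentation.org/packages/stats/versions/3.4.3/topics/lm).

The first model will be `qsec ~ wt`, and the second `qsec ~ hp`. Let's combine both variables into a third model, `qsec ~ hp / wt`. Note that `/` in a model formula is the nesting operator rather than arithmetic division: `hp / wt` expands to `hp + hp:wt`, a main effect of `hp` plus an `hp`-by-`wt` interaction.

`summary` prints a summary of each fitted model.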

In [None]:
lm1_model <- function(data) lm(qsec ~ wt,      data=data)
lm2_model <- function(data) lm(qsec ~ hp,      data=data)
lm3_model <- function(data) lm(qsec ~ hp / wt, data=data)

summary(lm1_model(mtcars_tbl))
summary(lm2_model(mtcars_tbl))
summary(lm3_model(mtcars_tbl))
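Because formula operators do not behave like arithmetic, it can help to check which terms a formula produces before fitting it. The sketch below uses only base R; `expand_terms` is a hypothetical helper introduced here for illustration and is not part of the original notebook.

```r
# Hypothetical helper: list the terms that a model formula expands to.
# terms() only parses the formula, so no data or fitting is needed.
expand_terms <- function(f) attr(terms(f), "term.labels")

expand_terms(qsec ~ hp / wt)     # nesting: hp plus the hp:wt interaction
expand_terms(qsec ~ hp + wt)     # two additive main effects
expand_terms(qsec ~ hp * wt)     # both main effects plus their interaction
expand_terms(qsec ~ I(hp / wt))  # a literal ratio, protected by I()
```

If the intent were a literal power-to-weight ratio as a single predictor, the `I()` form would be the one to use.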

One can add an arbitrary number of terms to these models. There are also plenty of other model types in R libraries that one might want to use.

## Nesting

Let's say we want to fit the same models separately for each group of cars defined by the number of cylinders (`cyl`).

This requires iterating over the groups, so we first split the data into chunks that can be iterated over.

To do this we can use the `nest` function [[nest]](http://tidyr.tidyverse.org/reference/nest.html).

In [None]:
mtcars_nested <- mtcars_tbl %>%
    # Convert cyl into a factor
    mutate(cyl=as.factor(cyl)) %>%
    # Group by cyl
    group_by(cyl) %>%
    # Nest the data
    nest()

print(mtcars_nested)

This produces a `tibble` in which the data for each group is stored in a list-column named *data*.
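To make the structure of the nested result more concrete, here is a small self-contained sketch that rebuilds a nested tibble directly from `mtcars` and looks inside it. The variable names `nested` and `first_group` are chosen for this illustration only.

```r
library(dplyr)
library(tidyr)

nested <- as_tibble(mtcars) %>%
    group_by(cyl) %>%
    nest()

# The data column is a list holding one tibble per group
is.list(nested$data)

# Pull out the rows that belong to the first group
first_group <- nested$data[[1]]
print(first_group)

# The grouping variable stays in the outer tibble, not in the nested ones
"cyl" %in% names(first_group)
```

Each element of the list can be used like any other tibble, which is what makes it possible to pass the elements one by one to a model-fitting function.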

## Maps

### Example 1: Running linear models on groups

Now that we have our list to iterate over, we can use `map` to do the iteration.

`map` is provided by the `purrr` package. It has variants, such as `map_dbl` and `map_chr`, that differ in the type of value they return.

Here each fit is returned as an S3 object of class `lm`, which cannot be stored in an atomic vector, so we use plain `map`, which collects the outputs into a list [[map]](http://purrr.tidyverse.org/reference/map.html).

In [None]:
# Map each data to model, pipe resulting fits to summary-function
map(mtcars_nested$data,lm3_model) %>%
    map(summary)
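The difference between `map` and its typed variants can be sketched as follows. This is a minimal illustration on a freshly nested copy of `mtcars`; the names `fits`, `n_cars` and `mean_qsec` are introduced here and do not appear elsewhere in the notebook.

```r
library(dplyr)
library(tidyr)
library(purrr)

nested <- as_tibble(mtcars) %>%
    group_by(cyl) %>%
    nest()

# map() always returns a list, so it can hold arbitrary objects such as lm fits
fits <- map(nested$data, function(d) lm(qsec ~ hp, data = d))
class(fits[[1]])

# map_int() and map_dbl() return atomic vectors and raise an error
# if the function returns anything other than a single integer or double
n_cars <- map_int(nested$data, nrow)
mean_qsec <- map_dbl(nested$data, function(d) mean(d$qsec))

# Label the results with the cylinder count of each group
setNames(n_cars, nested$cyl)
setNames(mean_qsec, nested$cyl)
```

The typed variants are convenient when each group yields a single number, while plain `map` is the safe choice for anything more complex.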

A more *tidyverse*-style approach is to use `map` inside `mutate` and store the fits in new columns. This makes it easy to run multiple models and keep their results next to the data.

In [None]:
mtcars_nested <- mtcars_nested %>%
    mutate(
        model1=map(data, lm1_model),
        model2=map(data, lm2_model),
        model3=map(data, lm3_model)
    )

# Check structure
print(mtcars_nested)

The `broom` package provides the functions `tidy` and `glance`, which return the coefficients and the fit statistics of a model as tidy tibbles [[broom vignette]](https://cran.r-project.org/web/packages/broom/vignettes/broom.html).

In [None]:
library(broom)

tidy(mtcars_nested$model3[[1]])

glance(mtcars_nested$model3[[1]])

Let's use `tidy` to get the model parameters.

In [None]:
mtcars_nested <- mtcars_nested %>%
    mutate(
        model1_coefs=map(model1,tidy),
        model2_coefs=map(model2,tidy),
        model3_coefs=map(model3,tidy)
    )

print(mtcars_nested)

Let's limit ourselves to model 3, as it is the most interesting, and use `unnest` to unnest its coefficients.

In [None]:
mtcars_model3 <- mtcars_nested %>%
    select(cyl,model3_coefs) %>%
    unnest(model3_coefs)

print(mtcars_model3)
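The same nest, map and unnest pattern also works with `glance`, which gives one row of fit statistics per model instead of one row per coefficient. The sketch below is self-contained and assumes the `broom` package is installed; `fit_quality` is a name made up for this example.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

fit_quality <- as_tibble(mtcars) %>%
    group_by(cyl) %>%
    nest() %>%
    mutate(
        # Fit the third model within each cylinder group
        model = map(data, function(d) lm(qsec ~ hp / wt, data = d)),
        # One-row tibble of fit statistics per model
        quality = map(model, glance)
    ) %>%
    select(cyl, quality) %>%
    unnest(quality) %>%
    # Keep only a few of the statistics that glance() reports
    select(cyl, r.squared, adj.r.squared, AIC)

print(fit_quality)
```

With only a handful of cars in each group, these statistics should be read with caution.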

### Example 2: Getting summaries of subgroups

Let's say we want to store statistics calculated from the `iris` dataset alongside the data. Let's nest the data first.

In [None]:
iris_nested <- as_tibble(iris) %>%
    group_by(Species) %>%
    nest()

print(iris_nested)

Now the data belonging to each species is stored in the *data* column. However, we cannot simply call `summarize` on the nested tibble, because the summary would not be computed on the `tibble`s stored inside *data*. Instead, we define a function that acts on a single data `tibble` and use `map` to apply it to every element of the list in which the data `tibble`s are stored.

In [None]:
iris_statistics <- function(tbl) {
    return(as_tibble(tbl %>%
        summarize(
            Petal.Length_mean=mean(Petal.Length),
            Petal.Width_mean=mean(Petal.Width),
            Petal.Length_var=var(Petal.Length),
            Petal.Width_var=var(Petal.Width),
            Petal_cor=cor(Petal.Length,Petal.Width)))
    )
}

as_tibble(iris) %>%
    group_by(Species) %>%
    iris_statistics()

On nested data the function is used with:

In [None]:
iris_nested <- iris_nested %>%
    mutate(statistics=map(data,iris_statistics))

print(iris_nested)

Now our statistics are stored in the variable `statistics`. They are not that easy to access, though. Let's use `unnest` to reverse the nesting in the `statistics`-variable.

In [None]:
iris_nested <- iris_nested %>%
    unnest(statistics)

print(iris_nested)
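For a simple summary like this one, the nested route and an ordinary grouped `summarize` should give the same numbers. The following sketch checks that for a single statistic; the names `nested_means` and `direct_means` are introduced only for this comparison.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Nested route: summarize each per-species tibble, then unnest the result
nested_means <- as_tibble(iris) %>%
    group_by(Species) %>%
    nest() %>%
    mutate(stats = map(data, function(d) {
        summarize(d, Petal.Length_mean = mean(Petal.Length))
    })) %>%
    select(Species, stats) %>%
    unnest(stats) %>%
    ungroup() %>%
    arrange(Species)

# Direct route: grouped summarize without any nesting
direct_means <- as_tibble(iris) %>%
    group_by(Species) %>%
    summarize(Petal.Length_mean = mean(Petal.Length))

all.equal(nested_means$Petal.Length_mean, direct_means$Petal.Length_mean)
```

The nested route pays off mainly when the per-group result is not a plain number, for example a fitted model, or when the results should stay next to the data they were computed from.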

# Exercise time:

Do this exercise with the `storms` dataset initialized below, which is a subset of the NOAA Atlantic hurricane database [[storms]](http://dplyr.tidyverse.org/reference/storms.html).

1. Group the dataset based on `name`. Nest the data.
2. Use `map` to calculate the minimum pressure, maximum wind speed and maximum category for each storm. Store these in the nested object and unnest them into columns.
3. Plot a scatterplot with x-axis showing minimum pressure, y-axis showing maximum wind speed and colour showing maximum category.

In [None]:
data(storms)

str(storms)

# Solutions:

## 1.

In [None]:
storms_nested <- storms %>%
    mutate(name=as.factor(name)) %>%
    group_by(name) %>%
    nest()

print(storms_nested)

## 2.

In [None]:
# Summarize the observations of a single storm into one row
storm_stats <- function(storm_data) {
    storm_data %>%
        summarize(min_pressure=min(pressure),max_wind=max(wind),max_category=max(category))
}

storms_nested <- storms_nested %>%
    mutate(stats=map(data,storm_stats)) %>%
    unnest(stats)

print(storms_nested)

## 3.

In [None]:
storms_nested %>%
    ggplot(aes(x=min_pressure,y=max_wind,color=max_category)) +
    geom_point() +
    scale_x_reverse() +
    labs(x='Minimum pressure [mbar]',y='Maximum wind speed [knots]',color='Storm category')