# Class 12: Multiple regression and count and binary response variables

This class notebook lets you practice the basics of multiple regression, log-linear regression (count response) and logistic regression (binary response). The datasets are small and simple compared to the project datasets, so you can learn the concepts of data modelling and hypothesis testing without the additional burden of cleaning and manipulating large, messy datasets.

Everything you need to complete these analyses is covered in today's lecture and the accompanying example notebooks. The example notebooks work through the lecture examples in more detail, so use them as a reference when completing the analyses in this notebook.

## Imports

In [1]:
import warnings
import numpy as np
import pandas as pd
import seaborn as sns
from statsmodels.formula.api import glm, ols
from statsmodels.genmod.families import Poisson, Binomial

warnings.filterwarnings("ignore")

## Two categorical explanatory variables. Numerical response variable.

### Does the petal length of iris species depend on where they grow?

<div>
<img src="attachment:51518iris img1.png" width='50%' title=""/>
</div>

In the lecture we examined the relationship between iris sepal length and species and site of growth.

Now examine the relationship between iris **petal** length and species and site of growth.

The data collected by the researchers are in the file `../Datasets/iris.csv`.

<div class="alert alert-warning">

Use the [iris1.ipynb](iris1.ipynb) notebook to help you answer this question.
</div>

- Read in the data and use an appropriate graph to visually examine the relationship between petal length and species and site. 

- Write the model formula for the relationship between petal length and species and site of growth.
- Fit the model and test the null hypothesis.

- Can you simplify your model by removing terms from the formula?
- Fit your simpler model and test if it is preferred over the more complicated model.

- Report the outcome of the test as you would in a scientific report or paper.
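
As a rough sketch of the fit-and-compare steps above, the cell below fits a model with an interaction and a simpler additive model, then compares them with an F-test. The data are simulated placeholders, not the measurements in `../Datasets/iris.csv`, and the column names `petal_length`, `species` and `site` are assumptions, so check them against the real file.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

# Simulated placeholder data, NOT the real iris measurements.
# Column names and group labels are assumptions for illustration.
rng = np.random.default_rng(0)
species = np.repeat(["setosa", "versicolor", "virginica"], 20)
site = np.tile(["north", "south"], 30)
base = pd.Series(species).map({"setosa": 1.5, "versicolor": 4.3, "virginica": 5.5})
demo = pd.DataFrame({
    "species": species,
    "site": site,
    "petal_length": base.to_numpy() + rng.normal(0, 0.3, size=60),
})

# Full model: both main effects plus their interaction.
full = ols("petal_length ~ species * site", data=demo).fit()
# Simpler model: main effects only (interaction term removed).
additive = ols("petal_length ~ species + site", data=demo).fit()

# F-test comparing the nested models; the simpler model is listed first.
comparison = anova_lm(additive, full)
print(comparison)
```

The second row of the table holds the F statistic and p-value for the removed term. A large p-value would suggest the simpler model describes the data about as well as the more complicated one.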

> Write your conclusion here.

## One categorical and one numerical explanatory variable. Numerical response variable.

### Were Neanderthal brains smaller than early modern human brains?

<div>
<img src="attachment:faces.jpg" width='50%' title=""/>
</div>

Did Neanderthals have smaller brains than early modern humans, once differences in body size are taken into account? The data in `../Datasets/neanderthals.csv` contain body mass (in kg) and brain mass (in g) of Neanderthal and early modern human specimens.

<div class="alert alert-warning">

Use the [iris2.ipynb](iris2.ipynb) notebook to help you answer this question.
</div>

- Read in the data and use an appropriate graph to visually examine the relationship between brain mass and body mass for the two species.

- Write the model formula for the relationship between brain mass and body mass for the two species.
- Fit the model.

- Can you simplify your model by removing terms from the formula? Start with the interaction term first.
- Fit your simpler model and test if it is preferred over the more complicated model.

- Report the outcome of the test as you would in a scientific report or paper.
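
One possible way to test whether the interaction term is needed is sketched below. It uses simulated placeholder data rather than `../Datasets/neanderthals.csv`, and the column names `brain_mass`, `body_mass` and `species` are assumptions. The comparison uses the `compare_f_test` method of the fitted results, which may differ from the approach shown in the example notebook.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Simulated placeholder data, NOT the real specimens.
# Column names and species labels are assumptions for illustration.
rng = np.random.default_rng(1)
n = 40
demo = pd.DataFrame({
    "species": np.repeat(["neanderthal", "modern"], n // 2),
    "body_mass": rng.uniform(55, 85, size=n),
})
demo["brain_mass"] = 900 + 7 * demo["body_mass"] + rng.normal(0, 40, size=n)

# Separate slope and intercept for each species.
interaction = ols("brain_mass ~ body_mass * species", data=demo).fit()
# Common slope, separate intercepts (interaction term removed).
parallel = ols("brain_mass ~ body_mass + species", data=demo).fit()

# F-test of the simpler (restricted) model against the fuller one.
f_value, p_value, df_diff = interaction.compare_f_test(parallel)
print(f"F = {f_value:.2f}, df = {df_diff:.0f}, p = {p_value:.3f}")
```

If the interaction can be dropped, the `species` coefficient in the simpler model estimates the difference in brain mass between the two groups at any given body mass.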

> Write your conclusion here.

## One numerical explanatory variable. Count response variable.

### How much DEET stops mosquitoes biting?

<div>
<img src="attachment:bite.jpeg" width='50%' title=""/>
</div>

Scientists carried out a clinical trial to investigate the effectiveness of DEET in preventing mosquito bites. DEET was applied to the arms of 52 volunteers. The dose each volunteer received, in units of mg/cm<sup>2</sup>, varied. A cage of 60 mosquitoes was placed on the arm of each volunteer and the number of bites was recorded.

The data are in the file `../Datasets/bites.csv`.

<div class="alert alert-warning">

Use the [cancer_clusters.ipynb](cancer_clusters.ipynb) notebook to help you answer this question.
</div>

- Read in the data and use an appropriate graph to visually examine the relationship between DEET dose and the number of bites.

- Write the model formula for the relationship between bites received and DEET dose.
- Fit the model.

- Remake the regression plot with the correct log-linear model, not seaborn's linear model.
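
The sketch below shows one way the fitted curve of a log-linear (Poisson) model could be computed over a grid of doses. The data are simulated placeholders, not `../Datasets/bites.csv`, and the column names `dose` and `bites` are assumptions.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import glm
from statsmodels.genmod.families import Poisson

# Simulated placeholder data, NOT the real trial results.
# Column names and the generating coefficients are assumptions.
rng = np.random.default_rng(2)
dose = rng.uniform(2.0, 5.0, size=52)
demo = pd.DataFrame({
    "dose": dose,
    "bites": rng.poisson(np.exp(3.0 - 0.8 * dose)),
})

# Log-linear model: log(mean bites) is a linear function of dose.
fit = glm("bites ~ dose", data=demo, family=Poisson()).fit()

# Predicted mean number of bites over an evenly spaced grid of doses.
grid = pd.DataFrame({
    "dose": np.linspace(demo["dose"].min(), demo["dose"].max(), 100)
})
grid["mean_bites"] = fit.predict(grid)
print(grid.head())
```

Drawing `grid` as a line on top of a scatter plot of the raw data, for example with `sns.scatterplot` followed by `sns.lineplot`, gives a regression plot that follows the exponential curve rather than a straight line.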

- What is the mathematical formula that relates the mean number of bites to dose, i.e., what is the formula of the regression line?

> Write your regression line formula here

- What is the minimum dose of DEET that results in fewer than one bite on average?
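
For a log-linear model the regression line has the form mean = exp(b0 + b1 × dose), so the mean falls below one exactly when b0 + b1 × dose is negative. The sketch below works through that arithmetic with hypothetical coefficients. The values of `intercept` and `slope` are made up for illustration, so substitute the estimates from your own fitted model.

```python
import math

# Hypothetical coefficients for illustration only; replace them with
# the intercept and slope estimated from your own fitted model.
intercept = 3.0   # assumed value of b0
slope = -0.8      # assumed value of b1 (negative: bites fall as dose rises)

def mean_bites(dose):
    """Regression line of a log-linear model: mean = exp(b0 + b1 * dose)."""
    return math.exp(intercept + slope * dose)

# mean < 1  <=>  b0 + b1 * dose < 0  <=>  dose > -b0 / b1  (because b1 < 0)
threshold = -intercept / slope
print(f"Mean bites drop below one when dose exceeds {threshold:.2f} mg/cm^2")
```

The direction of the inequality flips when dividing by `slope` because the slope is negative. With a positive slope the same algebra would give a maximum dose instead.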

## Two numerical explanatory variables. Binary response variable.

### Geographic determinants of Adelaïde Warbler presence on Puerto Rican islands

<div>
<img src="attachment:reinita-mariposera-adelaide-warbler.jpg" width='30%' title=""/>
</div>

Scientists have long been interested in how island geography affects the presence or absence of bird species. This dataset comprises 50 islands off the coast of Puerto Rico. For each island we have its area in km<sup>2</sup>, its distance from the mainland and whether the Adelaïde Warbler is present or absent.
 
We are interested in how an island's area and its distance from the mainland affect the probability of finding the Adelaïde Warbler on that island.

The data are in the file `../Datasets/warbler.csv`. The variable "area" is the area of each island in km<sup>2</sup> and the variable "isolation" is the distance of each island from the mainland in km.

<div class="alert alert-warning">

Use the [Adelaide_Warbler.ipynb](Adelaide_Warbler.ipynb) notebook to help you answer this question.
</div>

- Read in the data.
- Transform the variable "incidence" into a 0/1 binary variable.
- Use appropriate graphs to visually examine the relationships between island area, island isolation and warbler presence. You'll need one graph for island area and another graph for island isolation.
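
A minimal sketch of the 0/1 transformation is shown below. It assumes the "incidence" column holds the text labels `"present"` and `"absent"`. That is a guess, so inspect the unique values in the real file before reusing the mapping.

```python
import pandas as pd

# Placeholder rows; the labels "present" and "absent" are an assumption
# about how incidence is coded in the real file.
demo = pd.DataFrame({"incidence": ["present", "absent", "present", "absent"]})

# Map each label to 1 (warbler present) or 0 (warbler absent).
demo["incidence"] = demo["incidence"].map({"absent": 0, "present": 1})
print(demo["incidence"].tolist())  # → [1, 0, 1, 0]
```

Any label missing from the mapping becomes `NaN`, so counting missing values with `demo["incidence"].isna().sum()` after the transformation is a quick check that every row was converted.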

- Write the model formula for the relationships between island area, island isolation and warbler presence. Don't forget the interaction between area and isolation.
- Fit the model.

- Can you simplify your model by removing terms from the formula? Start with the interaction term first.
- Fit your simpler model and test if it is preferred over the more complicated model.

- Report the outcome of the tests as you would in a scientific report or paper.
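
One common way to compare nested logistic regression models is a likelihood-ratio (change in deviance) test, sketched below. The data are simulated placeholders, not `../Datasets/warbler.csv`. The chi-squared comparison is an assumption about the testing approach, and the example notebook may do this step differently.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2
from statsmodels.formula.api import glm
from statsmodels.genmod.families import Binomial

# Simulated placeholder data, NOT the real island survey.
# The generating coefficients are assumptions for illustration.
rng = np.random.default_rng(3)
n = 50
demo = pd.DataFrame({
    "area": rng.uniform(0.1, 9.0, size=n),
    "isolation": rng.uniform(2.0, 10.0, size=n),
})
linear_predictor = (0.5 + 0.3 * demo["area"] - 0.3 * demo["isolation"]).to_numpy()
demo["incidence"] = rng.binomial(1, 1 / (1 + np.exp(-linear_predictor)))

# Full model with the area-by-isolation interaction.
full = glm("incidence ~ area * isolation", data=demo, family=Binomial()).fit()
# Simpler model with the interaction term removed.
reduced = glm("incidence ~ area + isolation", data=demo, family=Binomial()).fit()

# Likelihood-ratio test: the drop in deviance is compared with a
# chi-squared distribution on the difference in residual degrees of freedom.
lr_stat = reduced.deviance - full.deviance
df_diff = reduced.df_resid - full.df_resid
p_value = chi2.sf(lr_stat, df_diff)
print(f"chi2 = {lr_stat:.2f}, df = {df_diff:.0f}, p = {p_value:.3f}")
```

The same pattern can be repeated to test each main effect in turn, dropping one term at a time from the simpler model.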

> Write your conclusion here.