# Homework 5 - Investigating Mammalian Fecundity and Conservation

## Logistics

**Due date**: The homework is due 11:59pm on Tuesday, February 11.

You will submit your work on [MarkUs](https://markus-ds.teach.cs.toronto.edu).
To submit your work:

1. Download this file (`Homework_5.ipynb`) from JupyterHub. (See [our JupyterHub Guide](../../../../guides/jupyterhub_guide.ipynb) for detailed instructions.)
2. Submit this file to MarkUs under the **hw5** assignment. (See [our MarkUs Guide](../../../../guides/markus_guide.ipynb) for detailed instructions.)
All homeworks will take place in a Jupyter notebook (like this one). When you are done, you will download this notebook and submit it to MarkUs.
We've included submission instructions at the end of this notebook.


## Introduction

In this assignment, we will ask a question about how mammalian reproductive strategies relate to their conservation risk. We will combine information from two different datasets to investigate this question.

### Data science question for the week:

This week, we will be combining data on mammalian ecology to compute a metric that we will call "maximum lifetime fecundity". This value estimates a species' __reproductive potential__. That is, *how prolific is each species at producing new offspring*?

As biologists, we may have an intuition that species that can reproduce more quickly may be more resilient to extinction as the environment changes. We will see how this metric relates to extinction risk, as calculated from IUCN data, to ask a targeted question about mammalian conservation:

**_Is there a difference in extinction risk between species with higher reproductive potential (greater maximum lifetime fecundity) vs. species with lower reproductive potential (smaller maximum lifetime fecundity)?_**

## Problem 1. Calculate maximum lifetime fecundity across mammals

### Problem 1a. Read in Mammalian life history data

Please open the file `fecundity.csv` and read all the lines into Python's memory. Assign the header to the variable `lh_header` and the rest of the data to `lh_data`. You may wish to examine the header and first couple lines of the data file.
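
As a rough illustration of the general pattern (separating a header line from the remaining lines of a text file), here is a minimal sketch. It writes a tiny stand-in file first so that it runs on its own. The file name `toy.csv`, its two columns, and its values are invented for this example and are not taken from `fecundity.csv`.

```python
import os
import tempfile

# Invented toy contents: NOT the real layout or values of fecundity.csv.
toy_text = "species,longevity_m\nSpecies_a,120\nSpecies_b,36\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy.csv")

    # Create the stand-in file so the example is self-contained.
    with open(toy_path, "w") as out_file:
        out_file.write(toy_text)

    # Read every line of the file into a list of strings.
    with open(toy_path) as in_file:
        lines = in_file.readlines()

# The first line is the header; everything after it is data.
toy_header = lines[0]
toy_data = lines[1:]

print(toy_header.strip())
print(len(toy_data), "data lines")
```

Note that each line read this way still ends with a newline character, which is why the header is stripped before printing.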

In [1]:
# Write your code here


### Problem 1b. Calculate our lifetime fecundity metric

We will now estimate a new measurement that we will call `max_lifetime_fecundity`. Use the following formula to calculate this metric:

$$ \frac{longevity - maturity}{interbirth} \times litter\_size $$

This will be computed using the following columns:

- `maturity_d`: How long it takes for the average individual to grow to sexual maturity. This is the earliest age at which an individual can reproduce. This is measured in *days* as the interval between birth and the time when the individual first reproduces.
 
- `longevity_m`: The maximum observed lifespan of an individual within each species, expressed in *months*.

- `interbirth_d`: How long adult females wait, on average, between giving birth and becoming pregnant again, expressed in *days*.

- `litter_size_ind`: How many babies females within each species have at one time, on average. The units are individuals, or number of offspring.

#### Instructions

Create an empty dictionary and assign it to a variable called `max_fecund`.

Loop over the lines of data, split each line into columns using a comma delimiter, and apply the above formula to the relevant columns. Keep in mind that you will need to handle missing data, which is expressed in this data file as an empty string, `""`. Since you will not be able to calculate the metric unless all four measurements are present, use exception handling to skip over any line that is missing one or more of them.

Once you calculate the metric for each line, store the name of the species represented on that line as the key in `max_fecund`, with the metric you calculate as the value.
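
To see why exception handling helps with missing values, recall that `float("")` raises a `ValueError`. The sketch below shows the skip-on-error pattern on a few invented lines. The three-column layout, the species names, and the simple product being computed are placeholders, not the real data or the fecundity formula.

```python
# Invented toy lines: name, then two numeric fields (some left empty).
toy_lines = ["Species_a,2.5,4\n", "Species_b,,3\n", "Species_c,1.0,\n"]

products = {}

for line in toy_lines:
    fields = line.strip().split(",")
    try:
        # float("") raises ValueError, so an empty field jumps to except.
        first = float(fields[1])
        second = float(fields[2])
    except ValueError:
        # At least one value is missing: skip this line entirely.
        continue
    # Only reached when both conversions succeeded.
    products[fields[0]] = first * second

print(products)
```

Only `Species_a` ends up in the dictionary, because the other two lines each have an empty field.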

#### A note on units (important!!!)

The three measurements relating to time (`'maturity_d'`, `'longevity_m'`, and `'interbirth_d'`) are expressed in two different units (days and months). As you loop over the lines and calculate this metric from the values in each column, convert each of these columns so that they are expressed in years.
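
As a worked example of the conversion, here is a small sketch using made-up numbers for a single hypothetical species. It assumes 365 days per year and 12 months per year. If a different days-per-year convention was given in lecture, use that instead.

```python
# Assumed conversion factors (not specified in this assignment).
DAYS_PER_YEAR = 365
MONTHS_PER_YEAR = 12

# Made-up measurements for one hypothetical species.
maturity_days = 730.0       # time to sexual maturity, in days
longevity_months = 240.0    # maximum lifespan, in months
interbirth_days = 365.0     # interval between births, in days
litter_size = 2.0           # offspring per litter

# Convert every time measurement to years before combining them.
maturity_years = maturity_days / DAYS_PER_YEAR          # 2.0
longevity_years = longevity_months / MONTHS_PER_YEAR    # 20.0
interbirth_years = interbirth_days / DAYS_PER_YEAR      # 1.0

# Apply the formula: (longevity - maturity) / interbirth * litter_size
example_metric = (longevity_years - maturity_years) / interbirth_years * litter_size

print(example_metric)  # 36.0
```

Mixing units (for example, subtracting days from months) would still produce a number, but that number would be meaningless, which is why the conversion comes first.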


In [2]:
# Write your code here


### Problem 1c. Interpret

Please explain what our maximum fecundity metric is measuring **(2pts)**. In what units is it expressed **(2pts)**? Can you think of any reasons the metric might be inaccurate **(optional)**?

**WRITE YOUR RESPONSE HERE.**

## Problem 2. IUCN Conservation Risk

Next, we will read in the IUCN conservation risk data and use it to classify whether species are or are not at risk.

### Problem 2a. Read in IUCN Data

Read in the IUCN data. Assign the header (first line) to the variable `iucn_header`. Assign the rest of the data to `iucn_data`. You may wish to examine the header and first few lines to familiarize yourself with how the dataset is structured.

In [3]:
# Write your code here


### Problem 2b. Define IUCN Risk Levels

Next, we will want to order the IUCN risk levels according to their severity. We will use the following numbering scheme (defined in lecture):

![](images/iucn.svg)

First, run the code below to create a dictionary, assigned to the variable `iucn_map`, that maps each IUCN level to a numbered level, expressed as a Python integer.

Next, create an empty dictionary and assign it to the variable `at_risk`. Loop over the lines of the file, splitting them using a comma (the delimiting character for this dataset). Next, we will want to know if the species for each row is at significant conservation risk. **We will consider any species at IUCN level "VU" or above (numerically, level 3 or above) to be at risk**. Look up the numeric expression of the IUCN level using `iucn_map`. Add the species name for each line to `at_risk` as the key. If the IUCN level is at or above 3, set the value as `True`. If it is below 3, set it to `False`. 

As an example of how things should look, let's consider humans, _Homo sapiens_. _Homo sapiens_ is at IUCN level "LC", or 1 in our numeric mapping scheme. Humans would be stored in `at_risk` as follows:

```python
{"Homo_sapiens": False}
```

On the other hand, the bat species _Acerodon jubatus_ is at level "EN", or 4 in our numbered scheme, and so would be assigned the value `True` when added to the dictionary. Our dictionary with both species would appear as follows: 

```python
{"Homo_sapiens": False, "Acerodon_jubatus": True}
```
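
The lookup-and-compare step for these two species can be sketched as follows. The two-column row layout (`species,level`) and the name `toy_at_risk` are assumptions made for this example. The real IUCN file may have more columns or a different column order, so check its header first.

```python
# Same mapping as the one provided in this notebook.
iucn_map = {'LC': 1, 'NT': 2, 'VU': 3, 'EN': 4, 'CR': 5, 'EW': 6, 'EX': 7, 'DD': 0}

# Toy rows in an assumed "species,level" layout.
toy_rows = ["Homo_sapiens,LC\n", "Acerodon_jubatus,EN\n"]

toy_at_risk = {}

for row in toy_rows:
    species, level = row.strip().split(",")
    # Translate the level code into its number, then compare to the threshold.
    toy_at_risk[species] = iucn_map[level] >= 3

print(toy_at_risk)  # {'Homo_sapiens': False, 'Acerodon_jubatus': True}
```

The comparison `iucn_map[level] >= 3` already evaluates to `True` or `False`, so it can be stored directly as the value.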




In [5]:
# Run this cell to define iucn_map

iucn_map = {'LC': 1,'NT': 2,'VU': 3,'EN': 4,'CR': 5,'EW': 6,'EX': 7,'DD': 0}

In [6]:
# Write your code here


## Problem 3. Put everything together

Here, we will want to calculate whether species that are at risk tend to have more or fewer offspring throughout their lifespan. To do this, we can calculate the mean maximum lifetime fecundity separately for species that are and are not at risk. First, copy the `calc_mean()` function that you defined in last week's homework and paste it below.
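
As a reminder of what a mean function does, here is a minimal sketch. The name `toy_mean` is hypothetical and the body is not necessarily how your own `calc_mean()` was written. It also assumes the list is non-empty, since dividing by a length of zero would raise an error.

```python
def toy_mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


print(toy_mean([2.0, 4.0, 9.0]))  # 5.0
```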

In [7]:
# Write your code here


### Problem 3a. Partition data according to threatened status

In this problem, you will follow a similar procedure to last week's homework, where you compared risk between Democrat- and Republican-controlled states. Create two empty lists and assign them, respectively, to the variables `threat_fecund` and `unthreat_fecund`. Next, loop over `max_fecund`. Look up whether each species is threatened by checking `at_risk`. If it is, append its maximum lifetime fecundity (stored as the value in `max_fecund`) to `threat_fecund`, and if not, append it to `unthreat_fecund`.

HINT: one new hiccup this week is that some of the species in `max_fecund` do not have entries in `at_risk`. This is because they were not present in the original IUCN data. As a result, you will want to use exception handling to skip over any instances where looking up a species from `max_fecund` in `at_risk` fails. 
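
Looking up a key that is not in a dictionary raises a `KeyError`, and that is the exception to catch here. The sketch below shows the pattern on two small invented dictionaries. All names and values are placeholders rather than real species data.

```python
# Invented toy dictionaries: Species_b has a value but no status entry.
toy_values = {"Species_a": 12.0, "Species_b": 3.5, "Species_c": 8.0}
toy_status = {"Species_a": True, "Species_c": False}

flagged = []
unflagged = []

for species, value in toy_values.items():
    try:
        # Raises KeyError when the species has no entry in toy_status.
        is_flagged = toy_status[species]
    except KeyError:
        # No status available for this species: skip it.
        continue
    if is_flagged:
        flagged.append(value)
    else:
        unflagged.append(value)

print(flagged)    # [12.0]
print(unflagged)  # [8.0]
```

`Species_b` is left out of both lists because its lookup fails.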

In [8]:
# Write your code here


### Problem 3b. Compare mean lifetime fecundity between threatened and unthreatened species

Calculate the mean lifetime fecundity for each of the lists populated in the previous problem. Assign the results to `mean_threat_fecund` and `mean_unthreat_fecund`, respectively.

In [9]:
# Write your code here


In [10]:
print(mean_threat_fecund)
print(mean_unthreat_fecund)

### Problem 3c. Interpret

1. Do threatened species or non-threatened species tend to have more potential offspring over their lifespan **(1 pt)**? 

2. Please speculate on why this might be the case **(2 pts)**.

**WRITE YOUR RESPONSE HERE.**