# Live Lecture 20

- **Driver**: 2nd alphabetically by first name
- **Proposer**: 3rd alphabetically by first name
- **Reporter**: 1st alphabetically by first name

## Visualization Case Study II: Trends in Global Data

Gapminder is a foundation, based in Sweden, that aims to improve public awareness of basic facts about global socioeconomic conditions. As part of its efforts, it collects detailed statistics on life expectancy, population, and GDP, in some cases going back many years. 

In this case study, we'll look at an excerpt of the Gapminder data. This excerpt has been packaged and made available via Jenny Bryan's [`gapminder` repo](https://github.com/jennybc/gapminder) on GitHub. 

Here's some familiar code to grab the data. 

In [None]:
import urllib.request
import pandas as pd
def retrieve_data(url):
    """
    Retrieve a file from the specified url and save it in a local file 
    called data.csv. 
    """
    
    # download the raw bytes from the url
    filedata = urllib.request.urlopen(url) 
    to_write = filedata.read()
    
    # write to file
    with open("data.csv", "wb") as f:
        f.write(to_write)
        
retrieve_data("https://philchodrow.github.io/PIC16A/datasets/gapminder.csv")
gapminder = pd.read_csv("data.csv")
gapminder

## Part A (10 minutes)

We are going to visualize long-term trends on a *per-continent* basis. As a warmup, write a simple function called `continent_trend()` that returns the mean value of a user-specified `metric` on a user-specified `continent` in each year. For example: 

```python
continent_trend('pop', 'Asia')
```
```

year	pop
--- 	---
1952	4.228356e+07
1957	4.735699e+07
1962	5.140476e+07
1967	5.774736e+07
1972	6.518098e+07
1977	7.225799e+07
1982	7.909502e+07
1987	8.700669e+07
1992	9.494825e+07
1997	1.025238e+08
2002	1.091455e+08
2007	1.155138e+08
```


You should use the double bracket notation `df[[column]]` where needed to ensure that your result is displayed as a data frame, not a series. 

Comments and docstrings are not needed. 
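As a small illustration of the double bracket notation mentioned above, here is a sketch on a made-up data frame. The names `toy`, `value`, and `other` are hypothetical and are not part of the Gapminder data.

```python
import pandas as pd

# Toy data, invented for illustration only (not Gapminder values)
toy = pd.DataFrame(
    {"value": [1.0, 2.0], "other": [10, 20]},
    index=pd.Index([2000, 2005], name="year"),
)

as_series = toy["value"]    # single brackets: a pandas Series
as_frame = toy[["value"]]   # double brackets: a one-column DataFrame

print(type(as_series).__name__)
print(type(as_frame).__name__)
```

In a notebook, the data frame version is displayed as a formatted table, which is the form shown in the example output above.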

In [None]:
# your code here



## Part B (15 minutes)

Write a function called `trend_lines()` which accepts a user-specified metric and a user-specified axis. This function should plot a different trend line for each of the five continents in the data. You should use your function from Part A. 

For example, 

```python
from matplotlib import pyplot as plt

fig, ax = plt.subplots(1)
trend_lines('lifeExp', ax)
```

<figure class="image" style="width:70%">
  <img src="https://philchodrow.github.io/PIC16A/live_lectures/trend_lines_example.png" alt="A plot with five trend lines, all generally sloping upwards. The lines are not labeled or otherwise annotated." width="400px">
</figure>

It's not necessary to attend to labels etc. in this part. 

**Hint:** If you call the `plot()` method of an axis on a data frame whose index is numerical, the index will be automatically plotted on the horizontal axis. In particular, you don't need to worry about plotting the years on the horizontal axis; `matplotlib` will handle that for you.  

Comments and docstrings are not needed. While it is possible to solve this problem with `apply`, a simple `for`-loop is also fine. 
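The hint about numerical indices can be checked on a tiny made-up data frame. This is only a sketch: the name `toy` and its values are hypothetical, and it deliberately does not use the Gapminder data.

```python
import pandas as pd
from matplotlib import pyplot as plt

# Toy data, invented for illustration only; the index plays the role of the years
toy = pd.DataFrame(
    {"value": [1.0, 2.5, 4.0]},
    index=pd.Index([1990, 1995, 2000], name="year"),
)

fig, ax = plt.subplots(1)

# Passing the data frame directly: the index supplies the horizontal coordinates
lines = ax.plot(toy)

print(lines[0].get_xdata())
```

The printed horizontal coordinates are the index values, so no separate column of years needs to be passed to the plotting call.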

In [None]:
# your code here



## Part C (10 minutes)

Add labels to your lines and automated titles to your plots. Label the horizontal axis appropriately. Add a legend; it's ok if it overlaps the data. You should be able to make an informative visualization like this: 

```python
fig, ax = plt.subplots(1, 3, figsize = (12, 3))
trend_lines('lifeExp', ax[0])
trend_lines('pop', ax[1])
trend_lines('gdpPercap', ax[2])
ax[0].legend()
plt.tight_layout()
```

<figure class="image" style="width:70%">
  <img src="https://philchodrow.github.io/PIC16A/live_lectures/trend_lines_example_2.png" alt="A three-paneled plot, each with five trend lines corresponding to continents gradually sloping upwards. The first panel gives life expectancies, the second population, and the third GDP per capita." width="800px">
</figure>
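The `matplotlib` calls involved in labeling can be sketched on made-up data. The frames `group_a` and `group_b` and all label strings below are hypothetical placeholders; this is one possible way to label lines and axes, not the only one.

```python
import pandas as pd
from matplotlib import pyplot as plt

# Toy data, invented for illustration only (not Gapminder values)
years = [1990, 1995, 2000]
group_a = pd.DataFrame({"value": [1.0, 2.0, 3.0]}, index=years)
group_b = pd.DataFrame({"value": [2.0, 2.5, 3.5]}, index=years)

fig, ax = plt.subplots(1)

# The label keyword attaches a name to each line for use in the legend
ax.plot(group_a, label="Group A")
ax.plot(group_b, label="Group B")

# Title and horizontal axis label can be set in a single call
ax.set(title="value", xlabel="Year")

# The legend collects the labels given to the lines above
legend = ax.legend()
```

Because the title is an ordinary string argument, it can be generated from a variable rather than typed by hand, which is what makes the titles "automated."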

In [None]:
# your code here

