# Global Studies - Methods Track

## Tutorial 4: Data Visualization with Google Colab  

**Date:** October 2020

**Author:** Pedro V Hernández Serrano

**License:** [Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/)  


![](https://github.com/MaastrichtU-IDS/jupyter-workshop/blob/master/images/um-landpage.png?raw=1)

To get started, insert your name and try running the **code cell** below (by pressing the ► button, or clicking on the cell and pressing ctrl+enter on your keyboard):

In [None]:
name = '' #@param {type:"string"}
# The \x1b[...m sequences are ANSI escape codes that highlight the greeting
print('\x1b[6;30;43m' + 'Hello ' + name + ', are you ready??!!!' + '\x1b[0m')
print("You've successfully run some Python code")
print("Congratulations!")

## Notebook for Data Visualization
Data visualization is the creation and study of the visual representation of data, meaning “information that has been abstracted in some schematic form, including attributes or variables for the units of information” - Michael Friendly (2008).

“The great fun of information visualization is that it gives you clues to answers to questions you didn’t know you had” - Prof. Ben Shneiderman, University of Maryland.

Data visualization will help you to:

- Explore (Finding unknown)
- Analyze (Test a hypothesis)
- Present (Tell a story)

**Note:** You are expected to have watched and completed the videos from week 9: [Methods Lecture 7.1: VizLab Fundamentals](https://canvas.maastrichtuniversity.nl/courses/2365/modules/items/74389) and the [Methods Lecture 7.2: VizLab Examples](https://canvas.maastrichtuniversity.nl/courses/2365/modules/items/75677) before you continue. 

---
## Creating Plots and Visuals with a Programming Language?

Programming languages let people create reusable libraries (or packages) that tackle specific tasks, such as building software, running statistical analyses or, more impressively, creating visualizations. A recent scientific example: Dr. Katie Bouman led the team that "took a picture" of a black hole for the first time in history, using Python libraries to achieve this. Check out the story here: [How-imaging-a-blackhole-gives-us-one-more-reason-to-embrace-python](https://analyticsindiamag.com/how-imaging-a-blackhole-gives-us-one-more-reason-to-embrace-python-for-larger-datasets/)

The examples given in this notebook have been adapted from the [Medium announcement article](https://medium.com/@plotlygraphs/introducing-plotly-express-808df010143d) introducing [Plotly Express](https://plotly.express): a library based on Plotly.py for rapid data exploration and figure generation.

**Note:** Since Plotly Express is an external library, it must be imported (and sometimes installed) before you can use it. But no worries!! Installing and importing things in a programming environment is quite simple!

---
## Importing the Plotly library

Step 1: Add a new code cell   
Step 2: Copy this text:    `import plotly.express as px`  
Step 3: Paste it in the new cell.  
Step 4: Execute - no response is a good response 🤠
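
If the import fails, the library is probably not installed in your environment. The following is a minimal sketch, using only Python's standard library, of how you could check this before importing. The helper name `is_installed` is made up for this example and is not part of Plotly.

```python
import importlib.util

def is_installed(package_name):
    """Return True if Python can locate a top-level package with this name."""
    return importlib.util.find_spec(package_name) is not None

if is_installed("plotly"):
    print("plotly found: `import plotly.express as px` should work")
else:
    # In Colab or Jupyter, a leading "!" runs a shell command from a code cell
    print("plotly not found: run `!pip install plotly` in a new cell, then retry the import")
```

If the check reports that the library is missing, install it in a new cell and then repeat the import step.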

---
## The Data: Global flows

For this notebook, we will be using the **[Gapminder example](https://www.gapminder.org/fw/world-health-chart/)**, a dataset that is already contained in the Plotly library (how convenient!). The dataset contains information on countries' life expectancy, population and GDP per capita per year. 

This dataset became famous because it is so often used to illustrate the power of data visualization in conferences, presentations, dashboards and infographics. 
We will use it in the same way, walking through different tasks.

Below is a video explaining the **Gapminder** data in an amazing real-life data visualization exercise (click play ►)   
Here is a [link to the video](https://www.youtube.com/watch?v=jbkSRLYSojo) as well

In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo('jbkSRLYSojo')

By executing the commands in the following cells, you will be able to see part of the data:

*Here you read the data and save it in a variable called `data`*

In [None]:
data = px.data.gapminder()

*Here you execute the variable `data`, showing a chunk of the content*

In [None]:
data

This command will show you the columns contained in the dataset

In [None]:
data.columns

---
## Creating plots and graphics

In the following section, we will walk through several activities, each corresponding to a different graphical representation. You can always go back to the [data-to-viz.com](https://www.data-to-viz.com/) website, reviewed in one of the lectures, to learn about the plots in more detail.

### Activity 1: **Plotting a scatter plot**

The following command takes a "cut" of the data containing only the year 2007 and, as above, saves that chunk in a variable called `data2007`

In [None]:
data2007 = data.query("year == 2007")
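
As a rough mental model, the query keeps only the rows whose `year` equals 2007 and drops the rest. The sketch below mimics that idea in plain Python on a tiny table. The rows, the country names "A" and "B", and the helper `cut_by_year` are all made up for illustration; they are not real Gapminder values and not part of Plotly or pandas.

```python
# Made-up rows for illustration only; these are NOT real Gapminder values
rows = [
    {"country": "A", "year": 2002, "lifeExp": 70.0},
    {"country": "A", "year": 2007, "lifeExp": 71.5},
    {"country": "B", "year": 2002, "lifeExp": 60.0},
    {"country": "B", "year": 2007, "lifeExp": 62.0},
]

def cut_by_year(table, year):
    """Keep only the rows whose 'year' equals the requested year."""
    return [row for row in table if row["year"] == year]

rows2007 = cut_by_year(rows, 2007)
for row in rows2007:
    print(row)
```

With the real dataset, the same cut can also be written in pandas as `data[data["year"] == 2007]`.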

The following command will:
1. Use the `data2007` cut
2. Take the `gdpPercap` as X axis and `lifeExp` as Y axis value
3. Render a scatter plot.  
**Note:** you can hover your cursor on the plot and explore individual data elements

In [None]:
px.scatter(data2007, x="gdpPercap", y="lifeExp")

📚**YOUR TURN**: Reproduce the above example in new **code cells**. 
1. Create a cut of the **year 1997**
2. Create a scatter plot of that new chunk
3. Use the **quantity** `pop` (population) instead of the **quantity** `lifeExp` (Life Expectancy)

### Activity 2: **Adding Color and Size**

The following command is very similar to the last one, except that it contains **new parameters**: `color`, `size` and `size_max`.   
Execute the cell ► and observe what happens.   
**Note:** You can scroll down the values and also use the zoom option in the plot

In [None]:
px.scatter(data2007, x="gdpPercap", y="lifeExp", color="country", size="pop", size_max=50)

📚**YOUR TURN**: Reproduce the above example in new **code cells**. 
1. Use the previous cut of the **year 1997**
2. Create a scatter plot including the new parameters
3. Use the **category** `continent` instead of the **category** `country`
4. Change the `size_max` value to **100** instead of **50**

**Extra**: Curious about which point is which country? Add a `hover_name="country"` parameter and you can easily identify any point: are there any "outliers"?... just mouse over the point you're interested in!

### Activity 3: **Understanding a Graphic**

Execute the following code cell. You will notice that there are even more parameters now; every time a new parameter is added to the function call, the resulting plot changes

In [None]:
px.scatter(data2007, x="gdpPercap", y="lifeExp", color="continent", size="pop", 
           size_max=50, hover_name="country", facet_col="continent", log_x=False, trendline="ols")
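
The `trendline="ols"` parameter asks for an ordinary least squares line, that is, the straight line that minimizes the squared vertical distances to the points (Plotly Express relies on the `statsmodels` package for this). The sketch below shows the underlying calculation in plain Python. The function `ols_line` and the four points are made up for illustration and are not how Plotly implements it.

```python
def ols_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of x and y divided by the variation of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up points that lie exactly on the line y = 2x + 1
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
slope, intercept = ols_line(xs, ys)
print(slope, intercept)
```

Real data never falls exactly on a line, so the fitted line summarizes the overall direction of the points rather than passing through each of them.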

📚**YOUR TURN**: Reproduce the above example in new code cells.   
Change the value `log_x=False` to `log_x=True` and answer the following in a text cell:    
1. What changes can you observe?   
2. Is there any visual difference in the trends per continent?   
3. Which country in Asia has the lowest GDP per capita and life expectancy?
4. Which countries have the highest life expectancy in Europe and Africa, and what is the difference in years?
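
As background for the questions above, a logarithmic axis places values according to their logarithm, so multiplying a value by 10 always moves it the same distance along the axis. The sketch below illustrates this with three made-up GDP per capita values (not taken from the dataset).

```python
import math

# Made-up GDP per capita values, each 10 times larger than the previous one
gdp_values = [300, 3_000, 30_000]

# Position of each value on a log10 axis
positions = [math.log10(gdp) for gdp in gdp_values]

for gdp, position in zip(gdp_values, positions):
    print(gdp, "->", round(position, 2))

# Equal ratios (x10) turn into equal steps along the log axis
steps = [b - a for a, b in zip(positions, positions[1:])]
print("steps:", [round(step, 2) for step in steps])
```

On a linear axis the first two values would sit almost on top of each other compared with the third; on a log axis they are evenly spaced.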

### Activity 4: **Quantity over time**

The following command takes a "cut" of the original data containing only one country. I have selected Mexico and called it `data_mexico`

In [None]:
data_mexico = data.query("country == 'Mexico' ")

The following command will:
1. Use the `data_mexico` cut
2. Take the `year` as X axis and `pop` (Population) as Y axis value
3. Render a vertical bar plot.  

In [None]:
px.bar(data_mexico, x='year', y='pop')

📚**YOUR TURN**: Reproduce the above example in new **code cells**. 
1. Create a cut of a country. **Use your home country**
2. Create a bar plot of that new chunk
3. Execute and download the plot as a *png* (use the camera icon in the plot's toolbar)

### Activity 5: **Test your knowledge on Pie charts**

Two pie charts are presented   
1. Explain in a new text cell what is wrong about these graphics.
2. Propose a solution for better data storytelling using more suitable graphics.

**Pie chart 1: Global vegan cheese market** 

![](https://64.media.tumblr.com/a937c9fdc10e2a880c85a08349d56998/tumblr_qf0idiOVVJ1sgh0voo1_1280.png)

**Pie chart 2: Advantages of working from home** 

![](https://64.media.tumblr.com/2e8a439b671e1ab092a862ee09bfb4f3/tumblr_q8q6uu5FGa1sgh0voo1_1280.jpg)

### Bonus Activity 6: **Sankey diagram**

Reproduce the full example presented in the second video, **Methods Lecture 7.2: VizLab Examples**. The video walks you through the creation and visualization of a Sankey diagram representing global flows, in this case the export and import flows of avocados among selected countries.

Once you have created your own Sankey diagram, **save the image and insert it in this notebook** using the command for including images (for example, the Markdown syntax `![](image-url)` in a text cell).

In [None]:
from IPython.display import YouTubeVideo
YouTubeVideo('ik_3FTWTihg')

### Bonus Activity 7: **Animated plot**

Execute the following code cell. You will notice a new play/stop button inside the graph: click on it and enjoy the visuals!

What would your **data story** be? (Reflect on this and discuss it in the tutorial)

In [None]:
px.scatter(data, 
           x="gdpPercap", 
           y="lifeExp",
           size="pop", size_max=60, 
           color="continent", hover_name="country",
           animation_frame="year", 
           animation_group="country", log_x=True, range_x=[100,100000], range_y=[25,90],
           labels=dict(pop="Population", gdpPercap="GDP per Capita", lifeExp="Life Expectancy"))
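
One way to think about `animation_frame="year"` is that the rows are split into one group per year, and each group becomes one frame of the animation, while `animation_group="country"` tells Plotly which points represent the same country from frame to frame. The sketch below mimics the grouping step in plain Python. The rows are made up for illustration (not real Gapminder values), and this is a mental model rather than Plotly's actual implementation.

```python
from collections import defaultdict

# Made-up rows for illustration only; these are NOT real Gapminder values
rows = [
    {"country": "A", "year": 1952, "lifeExp": 40.0},
    {"country": "B", "year": 1952, "lifeExp": 50.0},
    {"country": "A", "year": 1957, "lifeExp": 42.0},
    {"country": "B", "year": 1957, "lifeExp": 53.0},
]

# One frame per distinct year, similar in spirit to animation_frame="year"
frames = defaultdict(list)
for row in rows:
    frames[row["year"]].append(row)

for year in sorted(frames):
    countries = [row["country"] for row in frames[year]]
    print(year, countries)
```

Pressing play in the plot then amounts to showing these per-year groups one after another.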


## Congratulations! 

You have completed the Data Visualization week. You should be proud of yourself! 🎉🎉🎉