## Refactored's Visual Wizards Workshop - World Welfare and GDP

Hans Rosling, a renowned statistician, presented a TED talk in 2006 about world welfare and GDP data. You can watch Hans' original talk below.

In [None]:
from IPython.lib.display import YouTubeVideo
YouTubeVideo(id='hVimVzgtD6w', width=900, height=550)

In this workshop we will visualize a dataset that contains the Gross Domestic Product (GDP) per capita, Population, Child Mortality Rate (children under the age of 5) and Life Expectancy (average lifespan) of more than 180 countries, over the span of 1950 to 2017.

We will work to recreate some of the visualizations Hans had created.

This data was sourced from the world demographic data repositories of the United Nations (UN) and from "Our World in Data" (https://ourworldindata.org), a website which collates such demographic data.

Before we begin we would like to understand a few terms -
1. Gross Domestic Product (GDP): The Gross Domestic Product is the total value (in dollars or any other currency) of goods produced and services provided within a country.
2. GDP Per Capita: The GDP per capita is the value derived when the GDP of the country is divided by its population. The number signifies the approximate amount of money (in dollars or other currency) in terms of products or services that is available for each individual within the country.
3. Child Mortality Rate (MR): The Mortality Rate is given here as the percentage of children who die before the age of 5.
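
The derived quantities defined above reduce to simple arithmetic. Below is a minimal sketch using made-up figures for a hypothetical country. The function names and the numbers are illustrative assumptions and are not part of pyvizwizards, and the mortality percentage is assumed to be taken relative to live births.

```python
def gdp_per_capita(gdp, population):
    """Return the GDP of a country divided by its population."""
    return gdp / population


def child_mortality_rate(deaths_under_5, live_births):
    """Return deaths of children under 5 as a percentage of live births (assumed base)."""
    return 100 * deaths_under_5 / live_births


# Made-up figures for a hypothetical country (not real data)
gdp = 500_000_000_000      # total GDP in dollars
population = 50_000_000    # number of people
print(gdp_per_capita(gdp, population))

deaths_under_5 = 4_000     # children who died before the age of 5
live_births = 100_000      # children born alive
print(child_mortality_rate(deaths_under_5, live_births))
```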

Child Mortality Rate and Life Expectancy are two of the measures which convey the quality, availability and affordability of food, water, healthcare etc., within a country. When these metrics are combined with the GDP per capita of a country, we can understand whether the economic condition of the country is supportive of basic survival. 

We will now analyze the collected data by building visualizations using the 'pyvizwizards' library. The pyvizwizards library is created by the Refactored team and has functions which can be used to load, select and visualize the above datasets. It is based on Plotly and Plotly Express - an interactive plotting library that helps visualize multi-dimensional data.

As a prerequisite to building visualizations using Python and the pyvizwizards module, there are a couple of things we need to do:

### How to execute code in a Jupyter Notebook?

Just click on any code cell and press <b>"Control+Enter"</b> (+ means press the keys together) to execute the code in the cell.

### Kernel status

When we run code, it may sometimes take a long time to finish. In such a situation, be patient and check the status of the kernel before running the next piece of code. The circle in the top right corner of this notebook shows whether the kernel is busy running code, or is idle and ready to execute some code.

<b>Kernel is idle when the circle is hollow and looks like a ring</b>

<img src="kernel_idle.PNG">
<br>

<b>Kernel is busy when the circle is filled, as shown below</b>

<img src="kernel_busy.PNG">

### Installing required libraries

The pyvizwizards library can be used by simply writing the command "import pyvizwizards as pv" and executing it in a code cell. 'import pyvizwizards' imports the module into this notebook's working environment and 'as pv' is the alias that we assign in order to access everything inside the pyvizwizards module. Any function that you see below needs to be called with the 'pv' alias.

```python
# the import statement
import pyvizwizards as pv

# sample function call statement
pv.function_name()
```

Note that the import statement needs to be executed only once through the notebook. However, functions may have to be called repeatedly to get various results.

Now, in order for 'pyvizwizards' to work, we must ensure we have plotly and plotly express installed in this environment. Please run each of the code cells given below.

In [None]:
!pip install plotly

In [None]:
!pip install plotly_express


### Step 1: Loading the Data

The first step in our exercise is to load the data. There are two different datasets - the Life Expectancy dataset and the Mortality Rate dataset. They can be loaded using the functions given below.

```python
# import statement when starting code execution for the first time in the notebook
import pyvizwizards as pv

# load life expectancy dataset using this method - function takes no arguments
pv.load_le_data()

# load mortality rate dataset using this method - function takes no arguments
pv.load_mr_data()
```

The above functions return the dataframe as their result. You need to store this result (which is the entire dataset you want to load) in an object which you can later use in your analysis. To do so, create a new object and assign it the value the function returns. This is simply done as below:

```python
my_dataframe = pv.load_le_data()
```

### Step 2: Understanding the Data

The next step is to view and understand the data using a few functions:

#### To view the whole dataset

```python
# just type the name of the data frame object into which you had loaded the data
my_dataframe
```

#### To view only the first or last few rows

```python
# To view the first 5 rows of the dataset use the head function
my_dataframe.head()

# To view the last 5 rows of the dataset use the tail function
my_dataframe.tail()
```

#### To view the type of data in each column

```python
# To see the type of object and data type details of each column, use the info function
my_dataframe.info()
```

### Step 3: Selecting the Data

Let us first try to build a visualization with the datasets we have just loaded. Copy and paste the code given below, replacing only the 'dataframe' parameter with the name of the dataframe you created above.

```python
# For mortality rate dataset
pv.gdptrend_mr_plot(dataframe)

# For life expectancy dataset
pv.bar_le_plot(dataframe)
```

If you see cluttered graphs, that is because the dataset you chose to visualize is too deep and too wide, i.e. it contains far too many data points. Hence, we need to filter/select the data that we want to focus on in order to answer specific questions.

We can now select a slice of the dataset using selector functions:

#### Life Expectancy Data Selector Function - le_data_selector(dframe, c_list, y_range):

The le_data_selector() function can be used to select a portion of the huge life expectancy dataset. This function accepts 3 parameters - the dataframe object which contains the life expectancy dataset, the list of countries we want to filter data for, and the time range, in years, that we would like to filter the dataset by.

```python
# below is an example of how to use the le_data_selector function
pv.le_data_selector(dataframe,['Country1','Country2','Country3',...],[Starting year, Ending year])
```

#### Mortality Rate Data Selector Function - mr_data_selector(dframe, c_list, y_range):

The mr_data_selector() function can be used to select a portion of the huge mortality rate dataset. Similar to the life expectancy data selector function, this function also accepts 3 parameters - the dataframe object which contains the mortality rate dataset, the list of countries we want to filter data for, and the time range, in years, that we would like to filter the dataset by.

```python
# below is an example of how to use the mr_data_selector function
pv.mr_data_selector(dataframe,['Country1','Country2','Country3',...],[Starting year, Ending year])
```

The above functions return a filtered dataset as the result, which can then be used for focused analysis. These functions should be used repeatedly to create a new dataframe for each question we will be answering.
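
To make the idea of selecting a slice concrete, here is a minimal sketch of the kind of filtering a selector performs. It is not the pyvizwizards implementation: the function name `select_rows`, the column names 'country' and 'year', and the choice to include both end years are assumptions made for illustration, and a plain list of dictionaries stands in for the dataframe.

```python
def select_rows(rows, countries, year_range):
    """Keep rows whose country is in `countries` and whose year lies within `year_range`."""
    start_year, end_year = year_range
    wanted = set(countries)
    return [
        row for row in rows
        if row["country"] in wanted and start_year <= row["year"] <= end_year
    ]


# Tiny made-up table (not real data); each dictionary plays the role of one row
rows = [
    {"country": "India", "year": 1985, "value": 1.0},
    {"country": "India", "year": 1995, "value": 2.0},
    {"country": "Brazil", "year": 1995, "value": 3.0},
    {"country": "France", "year": 2000, "value": 4.0},
    {"country": "Brazil", "year": 2020, "value": 5.0},
]

# Keep only India and Brazil between 1990 and 2015
subset = select_rows(rows, ["India", "Brazil"], [1990, 2015])
for row in subset:
    print(row)
```

Only the rows that match both the country list and the year range survive, which is why a plot built on the filtered result is far less cluttered.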

### Step 4: Visualizing the Data

Data Visualizations help us understand information, identify trends and answer questions quickly. We will now create various visualizations in order to answer certain questions.

### Q1 - Showcase the contribution of each continent to World GDP. Which continent contributes the highest in the years -
#### a. 1987
#### b. 1997
#### c. 2007

In the above scenario, we are to showcase the contribution of each continent to World GDP. If we imagine the total World GDP to be a huge pie, we then need to show each continent as a slice of that pie. Hence, when we want to visualize the breakdown of a total, a pie chart is the way to go.

We can create a simple pie chart using the following function:

```python
pv.pie_le_plot(dataframe, year)
```

The pie_le_plot uses the life expectancy dataset and accepts two parameters - the dataframe object that contains the dataset and the year for which you want to visualize the total GDP breakup.
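
The slices of such a pie are percentage shares of a total. The sketch below shows one plausible way to derive them from per-capita data: multiply GDP per capita by population to get each country's total GDP, sum by continent, and divide by the overall total. How pie_le_plot computes its slices internally is not documented here, so the function name, the column names and the figures are all illustrative assumptions.

```python
from collections import defaultdict


def continent_gdp_shares(rows):
    """Return each continent's percentage share of the summed GDP."""
    totals = defaultdict(float)
    for row in rows:
        # total GDP of a country = GDP per capita x population
        totals[row["continent"]] += row["gdp_per_capita"] * row["population"]
    world_total = sum(totals.values())
    return {continent: 100 * total / world_total for continent, total in totals.items()}


# Made-up figures for a single year (not real data)
rows = [
    {"continent": "A", "gdp_per_capita": 10_000, "population": 1_000},
    {"continent": "A", "gdp_per_capita": 20_000, "population": 500},
    {"continent": "B", "gdp_per_capita": 5_000, "population": 4_000},
    {"continent": "C", "gdp_per_capita": 40_000, "population": 1_000},
]

shares = continent_gdp_shares(rows)
largest = max(shares, key=shares.get)  # continent with the biggest slice
print(shares)
print(largest)
```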

### Q2 - What are the trends in GDP per capita of each of the country groups given below, in the year range of 1990-2015?
#### a. United Kingdom, Germany, France
#### b. India, China, Brazil
#### c. Zimbabwe, Nigeria, Afghanistan

For the above question, GDP per capita is the main metric whose change is to be observed over time for each individual country. The trends are drawn for each country within the group and then compared with each other to make some observations. In order to visualize the change in a specific metric over time, a trend line (or line plot) is one of the better charts to build.

We can create a simple line plot using the following function:

```python
pv.gdptrend_mr_plot(dataframe)
```

The gdptrend_mr_plot utilizes the mortality rate dataset and builds the line plot using a single parameter - the dataframe object that contains the dataset.

#### Bonus: Q2 - In the above trend plot, compare the GDP trends of countries within your group. Do you observe any interesting patterns? Why are these patterns created? You may conduct external research to understand and answer why these interesting patterns arise.

### Q3 - Compare the Life Expectancy of countries with each other in the below given groups, for the year 2007
#### a. United Kingdom, India, Zimbabwe
#### b. Germany, China, Nigeria
#### c. France, Brazil, Afghanistan

For the above question, Life Expectancy is the main metric, which is to be observed for each individual country and then compared across countries. Whenever we are required to compare the same attribute or quality across multiple categories/entities, we can use the bar chart. The length of the bar under each category is representative of the value the specific metric holds. This allows for easy visual comparison.

We can create a simple bar chart using the following function:

```python
pv.bar_le_plot(dataframe)
```

The bar_le_plot utilizes the life expectancy dataset and builds the bar chart using a single parameter - the dataframe object that contains the dataset.

#### Bonus: Q3 - In the above bar plot, compare the Life Expectancy across countries within your group. Do you observe any interesting patterns? Why are these patterns created? You may conduct external research to understand and answer why these interesting patterns arise.

### Q4 - Present a colorful visualization of GDP per capita between the years 1990 and 2015, for countries in the below given group, on a map
#### 'Afghanistan','China','India','United States','United Kingdom','Brazil','Zimbabwe', 'Namibia',
#### 'Nigeria','Somalia','Thailand','France','Germany','Mexico','Australia','New Zealand'

For this question, we are required to create a map chart. Map charts are helpful in visualizing data that is changing on a geographical scale - from country to country/region to region.

We can create a map chart by using the following function:

```python
pv.map_le_plot(dataframe)
```

Note that this function uses the life expectancy dataset. This is because geographical data about country locations (contained in a variable called 'iso_alpha') is only available in the life expectancy dataset.

#### Bonus: Q4 - In the above map, compare the GDP per capita across countries using their color code. Do you observe any interesting patterns? Why are these patterns created? You may conduct external research to understand and answer why these interesting patterns arise.

### Q5 - How has life expectancy changed over the years (starting from 1952 to 2007), in countries, across various continents? Build an animated plot to visualize the change in life expectancy over the years for the below given countries
#### a. For Asia - India, China, Afghanistan, Thailand, Japan
#### b. For Americas - United States, Canada, Mexico, Brazil, Peru
#### c. For Africa - Nigeria, Somalia, Namibia, South Africa, Zimbabwe

For this animated plot we use a bubble chart. Bubble charts can be used to visualize multi-dimensional (up to 4 dimensions) data.
* 1st dimension - X-axis
* 2nd dimension - Y-axis
* 3rd dimension - Size of the Bubble
* 4th dimension - Color of the Bubble

In this case, the life expectancy dataset will be used to build this visualization. The chart will have GDP per capita on the X-axis, Life Expectancy in years on the Y-axis, the population of the country as the size of the bubble, and the bubbles will be color coded by country - so it is easy to track the movement of a bubble (a country) over time.

We can use the below function to create the animated plot:

```python
pv.animated_le_plot(dataframe)
```

The animated_le_plot() function accepts only 1 parameter - the dataframe object which contains the dataset that we want to visualize.

#### Bonus: Q5 - For your group, write down and present the trends that you have observed for Life Expectancy in countries over the years. Which countries are progressing? Which countries are getting worse? Why is it happening? Conduct external research to find suitable arguments and justifications to support your presentation.

### Data Bloopers - The Magical Somalia

In [None]:
import pyvizwizards as pv

mr_df = pv.load_mr_data()

funny_df = pv.mr_data_selector(mr_df,['Afghanistan','United States','Brazil','Zimbabwe','Namibia','Nigeria','Somalia','Thailand','Mexico'],[1990,2015])

pv.animated_mr_plot(funny_df)