# World Development in Numbers

Hans Rosling's been called the Jedi master of data visualization, dubbed a statistics guru, and introduced as the man in whose hands data sings. When Time magazine included him in its [2012 list of the world's 100 most influential people](http://www.time.com/time/specials/packages/article/0,28804,2111975_2111976_2112170,00.html), it said his "stunning renderings of the numbers … have moved millions of people worldwide to see themselves and our planet in new ways".

In [None]:
%%html

<div style="max-width:854px">
<div style="position:relative;height:0;padding-bottom:56.25%">
<iframe src="https://embed.ted.com/talks/lang/en/hans_rosling_asia_s_rise_how_and_when" 
    width="854" 
    height="480" 
    style="position:absolute;left:0;top:0;width:100%;height:100%"
    frameborder="0" 
    scrolling="no" 
    allowfullscreen>
</iframe>
</div>
</div>

In the above video, Hans Rosling took us through 200 years of global development. Plotting life expectancy against income for every country since 1810, Hans showed how our world is radically different from what most of us imagine.

In this notebook, we analyze the countries, populations, health, and wealth of our world, using only a few lines of SQL and a handful of plotting calls.

## Preparations

The following modules will be used:

* `asqlcell` for analytical SQL capabilities.
* `plotly` for data visualization.

Installation is as simple as:

In [None]:
%pip install asqlcell plotly --upgrade

Simply import these modules:

In [None]:
import asqlcell
import plotly.express as px

Now we are ready to proceed with the data analysis.

## Data

The data comes from the [Gapminder World](https://www.kaggle.com/datasets/tklimonova/gapminder-datacamp-2007) dataset on Kaggle. A gzip-compressed copy of the same data is included in the project.

Instead of using Pandas, we can use SQL statements to inspect data:

In [None]:
%%sql inspect

SELECT *
FROM 'gapminder.csv.gz'
LIMIT 20

Here `%%sql` is a cell magic that tells the notebook to execute the cell as a SQL statement. Magics are special commands that add functionality that is not straightforward to achieve with the plain Jupyter notebook interface.

The result set is stored in a dataframe named `inspect` and rendered as a table view above. The column names are quite straightforward.
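
For readers who want to peek at the same file without the SQL magic, here is a minimal sketch using only the Python standard library. The helper name `peek_csv_gz` is hypothetical and is not part of `asqlcell`. It assumes the file is a UTF-8 CSV with a header row.

```python
import csv
import gzip
from itertools import islice


def peek_csv_gz(path, limit=20):
    """Return up to `limit` rows of a gzip-compressed CSV as dictionaries.

    Hypothetical helper for illustration; assumes a UTF-8 file with a header row.
    """
    # Text mode ("rt") decompresses and decodes on the fly
    with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # islice stops after `limit` rows, similar in spirit to LIMIT 20
        return list(islice(reader, limit))
```

Calling `peek_csv_gz("gapminder.csv.gz")` should return roughly the same first 20 rows as the query above, except that every value comes back as a string, because the `csv` module does no type inference.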

## Observation

First, let us get the number of countries per continent. Each country appears once per recorded year, so we use the `COUNT` function with a `DISTINCT` clause to count every country only once:

In [None]:
%%sql country_count_by_continent

SELECT
    continent,
    COUNT(DISTINCT country) AS count
FROM 'gapminder.csv.gz'
GROUP BY continent
ORDER BY count DESC

Because the result set is already stored in a dataframe named `country_count_by_continent`, it can be plotted directly:

In [None]:
px.bar(country_count_by_continent,
       x='continent',
       y='count')
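
To see why `DISTINCT` matters in the count query, the following sketch compares `COUNT(*)` with `COUNT(DISTINCT country)` on a handful of made-up rows. It uses the standard-library `sqlite3` module purely as a stand-in SQL engine, and it is not the engine behind `%%sql`. The table name `gapminder` and the rows are illustrative assumptions.

```python
import sqlite3

# Made-up toy rows: each country appears once per recorded year
rows = [
    ("Australia", "Oceania", 2002),
    ("Australia", "Oceania", 2007),
    ("New Zealand", "Oceania", 2002),
    ("New Zealand", "Oceania", 2007),
    ("Japan", "Asia", 2002),
    ("Japan", "Asia", 2007),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gapminder (country TEXT, continent TEXT, year INTEGER)")
conn.executemany("INSERT INTO gapminder VALUES (?, ?, ?)", rows)

counts = conn.execute(
    """
    SELECT continent,
           COUNT(*) AS row_count,                  -- counts every (country, year) row
           COUNT(DISTINCT country) AS country_count -- counts each country once
    FROM gapminder
    GROUP BY continent
    ORDER BY country_count DESC
    """
).fetchall()
conn.close()

for continent, row_count, country_count in counts:
    print(continent, row_count, country_count)
# Oceania 4 2
# Asia 2 1
```

Without `DISTINCT`, the count would grow with the number of recorded years rather than the number of countries.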

Next, we'd like to query health (average life expectancy), wealth (average GDP per capita), and total population for each continent in the year 2007.

In [None]:
%%sql health_wealth_by_continent

SELECT
    continent,
    AVG(life_exp) AS health,
    AVG(gdp_cap) AS wealth,
    SUM(population) AS population
FROM 'gapminder.csv.gz'
WHERE year=2007
GROUP BY continent

A scatter chart can be built as follows:

In [None]:
px.scatter(health_wealth_by_continent,
           x='health',
           y='wealth',
           size='population',
           color='continent')

The chart shows Oceania and Europe combining high average life expectancy with high average GDP per capita.
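
One caveat when reading these averages: `AVG(life_exp)` gives every country equal weight, regardless of its population. The sketch below, using made-up numbers for two hypothetical countries, shows how far a simple mean and a population-weighted mean can differ.

```python
# Made-up (country, life_exp, population) rows for one hypothetical continent
rows = [
    ("Smallland", 80.0, 1_000_000),
    ("Bigland", 60.0, 9_000_000),
]

# What AVG(life_exp) computes: every country counts equally
simple_mean = sum(life_exp for _, life_exp, _ in rows) / len(rows)

# Alternative: weight each country by its population
total_population = sum(population for _, _, population in rows)
weighted_mean = (
    sum(life_exp * population for _, life_exp, population in rows) / total_population
)

print(simple_mean)    # 70.0
print(weighted_mean)  # 62.0
```

If a per-person figure is preferred, an expression such as `SUM(life_exp * population) / SUM(population)` could replace `AVG(life_exp)` in the query. Whether that is the better measure depends on the question being asked.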

Let's drill down into Oceania and look at how health has changed there:

In [None]:
%%sql health_of_oceania

SELECT
    year,
    life_exp AS health,
    country
FROM 'gapminder.csv.gz'
WHERE continent='Oceania'

A line chart is a good way to compare the Oceania countries year by year:

In [None]:
px.line(health_of_oceania, x='year', y='health', color='country')

Last but not least, we would also like to know how wealth (total GDP) and health (life expectancy) change over time for each country:

In [None]:
%%sql health_wealth_by_year

SELECT
    life_exp AS health,
    gdp_cap * population AS wealth,
    country,
    year,
    population
FROM 'gapminder.csv.gz'
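
Note that `wealth` in this query is total GDP (`gdp_cap * population`), not GDP per capita as in the continent-level query. A short sketch with made-up numbers for two hypothetical countries shows how the two measures can rank countries differently.

```python
# Made-up (country, gdp_cap, population) rows for two hypothetical countries
rows = [
    ("Richville", 40_000.0, 5_000_000),
    ("Largeland", 8_000.0, 100_000_000),
]

# Same arithmetic as `gdp_cap * population AS wealth` in the query
wealth = {country: gdp_cap * population for country, gdp_cap, population in rows}

richest_per_capita = max(rows, key=lambda row: row[1])[0]
largest_economy = max(wealth, key=wealth.get)

print(richest_per_capita)  # Richville
print(largest_economy)     # Largeland
```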

Let's recreate the Gapminder animation as an animated bubble chart with Plotly Express:

In [None]:
px.scatter(health_wealth_by_year,
           x="wealth",
           y="health",
           animation_frame="year",
           animation_group="country",
           size="population",
           color="country",
           hover_name="country",
           width=900,
           height=600)