<a href="https://colab.research.google.com/github/clemvnt/training-datamining-mds/blob/master/20200415_05_Berlin.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Goal**

Rank the top 10 richest regions and countries by GDP per capita (wealth created per inhabitant) and by current GDP (total wealth created by the country).

**Data**

Current GDP and GDP per capita by country, aggregated by region

**Sources**

* World Bank national accounts data
* OECD National Accounts data files 

**Notes**

The top 10 by current GDP includes the G8 countries. If the European Union were included in this ranking, it would be the second-largest economy, after the USA.

The top 10 by GDP per capita is made up of smaller countries; the USA is the only country from the current GDP top 10 that also appears in it.
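
To make the contrast between the two rankings concrete, here is a minimal sketch with made-up figures (the three economies and all their numbers are hypothetical, not World Bank data). GDP per capita is current GDP divided by population, so a small economy can lead one ranking and close the other.

```python
import pandas as pd

# Hypothetical figures for illustration only (not World Bank data).
toy = pd.DataFrame({
    'Country': ['Large', 'Medium', 'Small'],
    'Current GDP': [20_000e9, 2_000e9, 70e9],
    'Population': [330e6, 60e6, 0.6e6],
})

# GDP per capita = current GDP / population.
toy['GDP / Capita'] = toy['Current GDP'] / toy['Population']

# Rank the same three economies by each indicator.
by_total = toy.sort_values('Current GDP', ascending=False)['Country'].tolist()
by_capita = toy.sort_values('GDP / Capita', ascending=False)['Country'].tolist()

print(by_total)
print(by_capita)
```

With these invented numbers, the smallest economy comes last by current GDP and first by GDP per capita.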

**Step 1 : Import dependencies**

In [0]:
import pandas as pd
from pandas_datareader import wb
import plotly.graph_objects as go

pd.options.display.float_format = '{:.0f}'.format
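
The `float_format` option only changes how floats are displayed, not the stored values. A small sketch of its effect, reusing two rounded figures from the outputs further down (the `formatter` name is introduced here for illustration):

```python
import pandas as pd

# Same formatter as in the cell above: fixed-point notation, zero decimals.
formatter = '{:.0f}'.format

# Large values are printed in full instead of scientific notation.
print(formatter(2774314967156.0))
# Decimals are dropped from the display.
print(formatter(6609.4))

# Registering the formatter applies it to every float pandas displays.
pd.options.display.float_format = formatter
print(pd.DataFrame({'value': [2774314967156.0, 6609.4]}))
```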

**Step 2 : Get the association between the country and the region**



In [0]:
countries = wb.get_countries()
countries = countries[['name', 'region']]
countries

| | name | region |
|---|---|---|
| 0 | Aruba | Latin America & Caribbean |
| 1 | Afghanistan | South Asia |
| 2 | Africa | Aggregates |
| 3 | Angola | Sub-Saharan Africa |
| 4 | Albania | Europe & Central Asia |
| ... | ... | ... |
| 299 | Sub-Saharan Africa excluding South Africa and ... | Aggregates |
| 300 | Yemen, Rep. | Middle East & North Africa |
| 301 | South Africa | Sub-Saharan Africa |
| 302 | Zambia | Sub-Saharan Africa |
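
The listing mixes real countries with rows whose region is `Aggregates` (for example `Africa`). The sketch below shows one way to tell them apart; it uses a hand-built sample taken from the rows shown above rather than a live call to the World Bank API, and the variable names are illustrative.

```python
import pandas as pd

# Hand-built sample mirroring the 'name' and 'region' columns shown above.
sample = pd.DataFrame({
    'name': ['Aruba', 'Afghanistan', 'Africa', 'Angola',
             'Albania', 'South Africa', 'Zambia'],
    'region': ['Latin America & Caribbean', 'South Asia', 'Aggregates',
               'Sub-Saharan Africa', 'Europe & Central Asia',
               'Sub-Saharan Africa', 'Sub-Saharan Africa'],
})

# Rows tagged 'Aggregates' are groupings, not individual countries.
is_aggregate = sample['region'] == 'Aggregates'
print(sample.loc[is_aggregate, 'name'].tolist())

# Number of actual countries per region in the sample.
per_region = sample.loc[~is_aggregate, 'region'].value_counts()
print(per_region)
```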


**Step 3 : Get indicators**


In [0]:
indicators = wb.download(indicator=['NY.GDP.PCAP.CD', 'NY.GDP.MKTP.CD'], country='all', start=2018, end=2018)
indicators = indicators.reset_index()
indicators = indicators[['country', 'NY.GDP.PCAP.CD', 'NY.GDP.MKTP.CD']]
indicators

| | country | NY.GDP.PCAP.CD | NY.GDP.MKTP.CD |
|---|---|---|---|
| 0 | Arab World | 6609 | 2774314967156 |
| 1 | Caribbean small states | 9991 | 73523538155 |
| 2 | Central Europe and the Baltics | 15929 | 1632912822080 |
| 3 | Early-demographic dividend | 3582 | 11638859227437 |
| 4 | East Asia & Pacific | 11143 | 25942413437360 |
| ... | ... | ... | ... |
| 259 | Virgin Islands (U.S.) | | |
| 260 | West Bank and Gaza | 3199 | 14615900000 |
| 261 | Yemen, Rep. | 944 | 26914402224 |
| 262 | Zambia | 1540 | 26720073436 |
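
The `reset_index()` call in the cell above is what makes `country` available as a regular column. A minimal sketch of that step, assuming the downloaded frame is indexed by `(country, year)`; the `raw` frame here is hand-built from two rows of the output above (rounded values), not fetched from the API:

```python
import pandas as pd

# Hand-built stand-in for the downloaded frame, assuming a (country, year) index.
index = pd.MultiIndex.from_tuples(
    [('Yemen, Rep.', '2018'), ('Zambia', '2018')],
    names=['country', 'year'])
raw = pd.DataFrame(
    {'NY.GDP.PCAP.CD': [944.0, 1540.0],
     'NY.GDP.MKTP.CD': [26914402224.0, 26720073436.0]},
    index=index)

# reset_index() turns both index levels into ordinary columns.
flat = raw.reset_index()
print(flat.columns.tolist())

# Keep only the country name and the two indicators, as in the cell above.
flat = flat[['country', 'NY.GDP.PCAP.CD', 'NY.GDP.MKTP.CD']]
print(flat)
```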


**Step 4 : Format a master table**

1. Associate countries with regions
1. Reorder & rename columns
1. Clean up the data
1. Reshape into one row per region, country and indicator

In [0]:
# 1. Associate each country with its region
master_table = pd.merge(indicators, countries, left_on='country', right_on='name')

# 2. Reorder & rename columns
master_table = master_table[['region', 'country', 'NY.GDP.PCAP.CD', 'NY.GDP.MKTP.CD']]
master_table.columns = ['Region', 'Country', 'GDP / Capita', 'Current GDP']

# 3. Clean up: drop aggregates, keep rows with at least one positive indicator, replace missing values with 0
master_table = master_table[master_table['Region'] != 'Aggregates']
master_table = master_table[(master_table['GDP / Capita'] > 0) | (master_table['Current GDP'] > 0)]
master_table = master_table.fillna(0)

# 4. Reshape to long format: one row per (Region, Country, Indicator)
master_table = pd.melt(master_table, id_vars=['Region', 'Country'], value_vars=['GDP / Capita', 'Current GDP'], var_name='Indicator', value_name='Value')
master_table = master_table.set_index(['Region', 'Country', 'Indicator'])
master_table = master_table.sort_index()

master_table

| Region | Country | Indicator | Value |
|---|---|---|---|
| East Asia & Pacific | American Samoa | Current GDP | 636000000 |
| East Asia & Pacific | American Samoa | GDP / Capita | 11467 |
| East Asia & Pacific | Australia | Current GDP | 1433904348500 |
| East Asia & Pacific | Australia | GDP / Capita | 57374 |
| East Asia & Pacific | Brunei Darussalam | Current GDP | 13567351175 |
| ... | ... | ... | ... |
| Sub-Saharan Africa | Uganda | GDP / Capita | 643 |
| Sub-Saharan Africa | Zambia | Current GDP | 26720073436 |
| Sub-Saharan Africa | Zambia | GDP / Capita | 1540 |
| Sub-Saharan Africa | Zimbabwe | Current GDP | 31000519447 |
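
The reshaping can be checked on a two-row sample. The sketch below rebuilds the long format from rounded values shown above, then reads it back through the three-level index; `wide`, `long_table` and `per_capita` are names introduced for this illustration.

```python
import pandas as pd

# Two countries in wide format (one column per indicator), rounded values.
wide = pd.DataFrame({
    'Region': ['East Asia & Pacific', 'Sub-Saharan Africa'],
    'Country': ['Australia', 'Zambia'],
    'GDP / Capita': [57374.0, 1540.0],
    'Current GDP': [1433904348500.0, 26720073436.0],
})

# melt() stacks the two indicator columns into (Indicator, Value) pairs.
long_table = pd.melt(wide, id_vars=['Region', 'Country'],
                     value_vars=['GDP / Capita', 'Current GDP'],
                     var_name='Indicator', value_name='Value')
long_table = long_table.set_index(['Region', 'Country', 'Indicator']).sort_index()

# Cross-section: every row for one indicator, whatever the region or country.
per_capita = long_table.xs('GDP / Capita', level='Indicator')
print(per_capita)

# Single lookup with a full (Region, Country, Indicator) key.
print(long_table.loc[('Sub-Saharan Africa', 'Zambia', 'Current GDP'), 'Value'])
```

Each country contributes one row per indicator, so the two-country sample yields four rows.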


**Step 5 : Visualize data with a chart**

In [0]:
table = master_table.reset_index()
# Regions: unweighted mean over the countries of each region, highest first
gdp_per_capita_per_region = table[table['Indicator'] == 'GDP / Capita'][['Region', 'Value']].groupby('Region').mean().sort_values('Value', ascending=False)
current_gdp_per_region = table[table['Indicator'] == 'Current GDP'][['Region', 'Value']].groupby('Region').mean().sort_values('Value', ascending=False)
# Countries: the 10 highest values for each indicator
gdp_per_capita_per_country = table[table['Indicator'] == 'GDP / Capita'][['Country', 'Value']].sort_values('Value', ascending=False).head(10)
current_gdp_per_country = table[table['Indicator'] == 'Current GDP'][['Country', 'Value']].sort_values('Value', ascending=False).head(10)

data = [
  go.Bar(x=gdp_per_capita_per_region.index, y=gdp_per_capita_per_region['Value']),
  go.Bar(x=current_gdp_per_region.index, y=current_gdp_per_region['Value'], visible=False),
  go.Bar(x=gdp_per_capita_per_country['Country'], y=gdp_per_capita_per_country['Value'], visible=False),
  go.Bar(x=current_gdp_per_country['Country'], y=current_gdp_per_country['Value'], visible=False),
]

layout = go.Layout(
  title='Top 10 richest regions & countries',
  updatemenus=[
    dict(showactive=True, type="buttons", active=0, buttons=[
      {'label': 'GDP / Capita per region', 'method': 'update', 'args': [{'visible': [True, False, False, False]}]},
      {'label': 'Current GDP per region', 'method': 'update', 'args': [{'visible': [False, True, False, False]}]},
      {'label': 'GDP / Capita per country', 'method': 'update', 'args': [{'visible': [False, False, True, False]}]},
      {'label': 'Current GDP per country', 'method': 'update', 'args': [{'visible': [False, False, False, True]}]}
    ])
  ]
)

go.Figure(data, layout)
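
Each button above carries a `visible` mask with a single `True`, at the position of the trace it should show. As a sketch, the four dictionaries could be generated instead of written by hand; `make_buttons` is a hypothetical helper, not part of plotly, and it only builds plain Python dictionaries.

```python
def make_buttons(labels):
    """Build one 'update' button per label; button i shows only trace i."""
    buttons = []
    for i, label in enumerate(labels):
        # One-hot mask: True at position i, False everywhere else.
        visible = [j == i for j in range(len(labels))]
        buttons.append({'label': label, 'method': 'update',
                        'args': [{'visible': visible}]})
    return buttons


# Labels in the same order as the traces in `data`.
labels = [
    'GDP / Capita per region',
    'Current GDP per region',
    'GDP / Capita per country',
    'Current GDP per country',
]
buttons = make_buttons(labels)
print(buttons[1]['args'][0]['visible'])
```

The resulting list could then be passed as `buttons=make_buttons(labels)` in the `updatemenus` entry, which keeps the masks in sync if a trace is added or removed.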