# GDP per capita Bar Chart Race

Create a bar chart race showing the evolution of GDP per capita at current market prices by NUTS 2 or NUTS 3 region for the period 2010-2019.

@author: Sergio Pereira

@email: <sergio@sergiopereira.gal>

## Libraries

This project uses [Pandas](https://pandas.pydata.org/), [NumPy](https://numpy.org/) and [Bar_Chart_Race](https://www.dexplo.org/bar_chart_race/).

In [15]:
# Import libraries

import pandas as pd
import numpy as np
import bar_chart_race as bcr

## Data sources

This project uses two datasets downloaded from Eurostat, the statistical office of the European Union:

- Eurostat [GDP at current market prices by NUTS 3 regions](https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=nama_10r_3gdp&lang=en)
- Eurostat [Average annual population to calculate regional GDP data](https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=nama_10r_3gdp&lang=en)

A third file holds the NUTS classification (Country, NUT1, NUT2, NUT3) for each entity in the previous files. It only covers Spain and Portugal, and was created manually from this Eurostat file: [Area by NUTS 3 region](https://appsso.eurostat.ec.europa.eu/nui/show.do?dataset=reg_area3&lang=en)

In [16]:
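# Read GDP data (file downloaded from Eurostat)
gdp = pd.read_csv('gdp.csv', decimal='.', encoding='unicode_escape')
# Treat missing values as 0 and convert to float; assigning the result back
# avoids in-place modification of a column selection
gdp['Value'] = gdp['Value'].fillna('0').astype(float)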

In [17]:
# Read average annual population to calculate regional GDP per capita
pop = pd.read_csv('popul.csv', decimal='.', encoding='unicode_escape')
# Treat missing values as 0 and convert to float; assigning the result back
# avoids in-place modification of a column selection
pop['Value'] = pop['Value'].fillna('0').astype(float)

In [18]:
# Read the classification file, with all the values for GEO and the level (Country, NUT1, NUT2, NUT3) for Spain and Portugal
clas = pd.read_csv('clasif.csv')


## Data Processing

Merge the input files into a single dataset, clean it, and reshape it into the format needed to generate the bar chart race.

In [19]:
# Merge GDP and population on region and year; the 'Value' column shared by
# both files gets the suffixes _x (GDP) and _y (population)
df = pd.merge(gdp, pop, how='left', on=['GEO', 'TIME'])
df = df.rename(columns={'Value_x': 'GDP', 'Value_y': 'Population'})
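
To illustrate what the merge step does, here is a minimal sketch on made-up data. The frames `gdp_toy` and `pop_toy` and all their values are hypothetical stand-ins for the Eurostat extracts; only the column names `GEO`, `TIME` and `Value` come from the cells above. It shows how the shared `Value` column is suffixed, and how a left merge keeps GDP rows that have no matching population row.

```python
import pandas as pd

# Hypothetical stand-ins for the two Eurostat extracts (values are made up)
gdp_toy = pd.DataFrame({
    'GEO': ['Galicia', 'Galicia', 'Norte'],
    'TIME': [2010, 2011, 2010],
    'Value': [56000.0, 55000.0, 49000.0],
})
pop_toy = pd.DataFrame({
    'GEO': ['Galicia', 'Galicia'],
    'TIME': [2010, 2011],
    'Value': [2750.0, 2745.0],
})

# Left merge: every GDP row is kept, unmatched rows get NaN for population
merged = pd.merge(gdp_toy, pop_toy, how='left', on=['GEO', 'TIME'])
merged = merged.rename(columns={'Value_x': 'GDP', 'Value_y': 'Population'})
print(merged)
```

In this sketch the `Norte` row has no population counterpart, so its `Population` is `NaN` rather than 0. Such rows pass a `Population == 0` filter and are only removed later, when rows with a missing GDP per capita are dropped.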

In [20]:
# Exclude rows with Population = 0, to avoid division by zero
# (.copy() makes the result independent, so adding a column later is safe)
df = df.loc[df['Population'] != 0].copy()

In [21]:
# Add column for GDP per capita
df['GDP/cap'] = (df['GDP'] / df['Population'] * 1000).round(2)
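
The factor of 1000 in the cell above is a unit conversion. As a rough sketch, assuming GDP is expressed in million euro and population in thousand persons (an assumption: the units depend on how the Eurostat extracts were configured, and the notebook does not state them), the ratio is in thousand euro per person and multiplying by 1000 gives euro per person. The frame `toy` and its values are made up for illustration.

```python
import pandas as pd

# Hypothetical values: GDP in million euro, population in thousand persons
toy = pd.DataFrame({
    'GDP': [56000.0, 1200.0],
    'Population': [2750.0, 80.0],
})

# million EUR / thousand persons = thousand EUR per person; x1000 -> EUR per person
toy['GDP/cap'] = (toy['GDP'] / toy['Population'] * 1000).round(2)
print(toy)
```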

In [22]:
# Merge with the classification file to add the level (Type) of each GEO value
df = pd.merge(df,clas, how='left', left_on=['GEO'], right_on=['Value'])

In [23]:
# Create a subset with the rows for the desired levels. In this case, selected only Country and NUT2
sub = df[df['Type'].isin(['Country','NUT2'])].copy()

In [24]:
# Replace infinite values with NaN
sub.replace([np.inf, -np.inf], np.nan, inplace=True)
# Drop rows with NaN
sub.dropna(subset=['GDP/cap'],inplace=True)
# Remove duplicates
sub.drop_duplicates(inplace=True)
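
The cleaning sequence above can be sketched on a small made-up frame. The name `toy` and its values are hypothetical; the example only shows the order of operations: infinite values become `NaN`, rows without a usable GDP per capita are dropped, and exact duplicate rows are removed.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: one valid, one infinite, one missing, one exact duplicate
toy = pd.DataFrame({
    'GEO': ['A', 'B', 'C', 'A'],
    'TIME': [2010, 2010, 2010, 2010],
    'GDP/cap': [20000.0, np.inf, np.nan, 20000.0],
})

clean = (
    toy.replace([np.inf, -np.inf], np.nan)  # infinities become NaN
       .dropna(subset=['GDP/cap'])          # drop rows without a usable value
       .drop_duplicates()                   # keep one copy of identical rows
)
print(clean)
```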

In [25]:
# Select only the necessary columns
sub = sub[['GEO','TIME','GDP/cap']]

In [26]:
# Pivot the table into the wide format required by Bar_Chart_Race:
# one row per year (TIME), one column per region (GEO)
df = sub.pivot_table(values='GDP/cap', index='TIME', columns='GEO')
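
As an illustration of the reshaping step, the sketch below pivots a small long-format table into the wide layout, with one row per period and one column per category. The frame `long_toy` and its values are made up; only the column names match the notebook.

```python
import pandas as pd

# Hypothetical long-format data: one row per (region, year)
long_toy = pd.DataFrame({
    'GEO': ['Galicia', 'Norte', 'Galicia', 'Norte'],
    'TIME': [2010, 2010, 2011, 2011],
    'GDP/cap': [20000.0, 15000.0, 20500.0, 15200.0],
})

# Wide format: index = year, columns = region, cells = GDP per capita
wide = long_toy.pivot_table(values='GDP/cap', index='TIME', columns='GEO')
print(wide)
```

Note that `pivot_table` aggregates with the mean by default, so any remaining duplicate (region, year) pairs would be averaged silently rather than raising an error.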

## Generate the graph

The bar_chart_race library can generate an output video file in mp4 format ([FFmpeg](https://www.ffmpeg.org/) must be installed).

In [27]:
# Generate the output: an mp4 file with 20 categories
bcr.bar_chart_race(df = df, 
                   n_bars = 20, 
                   period_length=1500,
                   figsize=(10, 7),
                   dpi=300,
                   sort='desc',
                   title='GDP per capita, NUT2 Regions 2010-2019 (Spain)',
                   filename = 'gdp.mp4')




(c) Sergio Pereira 2022