# GDP and Population

We start by importing the modules we need for our analysis.

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import ipywidgets as widgets
from matplotlib_venn import venn2

# autoreload modules when code is run
%load_ext autoreload
%autoreload 2

# For this project we use the eurostat module. Run the next line if you haven't installed it yet; if it is already installed, you can comment the line out with a #.
%pip install eurostat
# user written modules
import dataproject as dp

Note: you may need to restart the kernel to use updated packages.


Our .py file (dataproject) is set up so that the program fetches its data directly from the Eurostat website. We start by looking at the raw data so we can see how it needs to be cleaned.

In [4]:
print(dp.df.head())

  freq        unit na_item geo\TIME_PERIOD  1975  1976  1977  1978  1979  \
0    A  CLV05_MEUR     B1G              AT   NaN   NaN   NaN   NaN   NaN   
1    A  CLV05_MEUR     B1G              BA   NaN   NaN   NaN   NaN   NaN   
2    A  CLV05_MEUR     B1G              BE   NaN   NaN   NaN   NaN   NaN   
3    A  CLV05_MEUR     B1G              BG   NaN   NaN   NaN   NaN   NaN   
4    A  CLV05_MEUR     B1G              CH   NaN   NaN   NaN   NaN   NaN   

   1980  ...      2013      2014      2015      2016      2017      2018  \
0   NaN  ...  251167.1  252879.7  255017.1  259996.4  266016.9  272985.2   
1   NaN  ...    9118.0    9224.4    9358.7    9658.1    9978.2   10355.7   
2   NaN  ...  308495.4  313684.3  320657.4  323677.5  328736.7  334655.5   
3   NaN  ...   25127.0   25373.6   26129.3   26806.7   27691.2   28654.5   
4   NaN  ...  381505.9  390405.6  396747.1  405030.0  410565.5  422877.6   

       2019      2020      2021      2022  
0  277001.4  259083.6  269392.5  283659.7 

As we can see, the raw data is hard to read and does not tell us much yet: it mixes several units and national-accounts items, and many of the early years are empty. We therefore need to clean it.

We start by keeping only the rows we want. This is done in our .py file, where we select gross domestic product at market prices (B1GQ) in chain-linked volumes with 2015 as reference year, in million euro (CLV15_MEUR).
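
The filtering code lives in `dataproject.py`, which is not shown in this notebook. A minimal sketch of the row filter, assuming the raw frame has the `unit` and `na_item` columns printed above; the three toy rows and their values are hypothetical:

```python
import pandas as pd

# Hypothetical toy rows mimicking the raw Eurostat layout (values are made up)
raw = pd.DataFrame({
    "freq": ["A", "A", "A"],
    "unit": ["CLV05_MEUR", "CLV15_MEUR", "CLV15_MEUR"],
    "na_item": ["B1G", "B1G", "B1GQ"],
    "geo\\TIME_PERIOD": ["AT", "AT", "AT"],
    "2021": [1.0, 2.0, 3.0],
})

# Keep only GDP at market prices (B1GQ) in chain-linked 2015 volumes, million euro
mask = (raw["unit"] == "CLV15_MEUR") & (raw["na_item"] == "B1GQ")
gdp_rows = raw.loc[mask].reset_index(drop=True)
print(gdp_rows)
```

Combining both conditions in one boolean mask keeps the selection explicit and avoids chained indexing.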

After that, we remove the columns we don't need and rename the remaining ones so that their names describe the data (for example, geo\TIME_PERIOD becomes Country_code).
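
Dropping and renaming could be sketched as follows. This assumes, as in the toy frame (hypothetical values), that the year labels are strings and that the constant `freq` column and all years before 2012 are dropped, which matches the columns of the cleaned table printed further down:

```python
import pandas as pd

# Hypothetical filtered rows (made-up values) with the raw Eurostat column names
filtered = pd.DataFrame({
    "freq": ["A", "A"],
    "unit": ["CLV15_MEUR", "CLV15_MEUR"],
    "na_item": ["B1GQ", "B1GQ"],
    "geo\\TIME_PERIOD": ["AT", "BE"],
    "2011": [1.0, 2.0],
    "2012": [3.0, 4.0],
})

# Keep the descriptive columns and the years from 2012 onwards; 'freq' is dropped
year_cols = [c for c in filtered.columns if c.isdigit() and int(c) >= 2012]
clean = filtered[["unit", "na_item", "geo\\TIME_PERIOD"] + year_cols]

# Give the country column a descriptive name
clean = clean.rename(columns={"geo\\TIME_PERIOD": "Country_code"})
print(clean)
```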

Some rows turn out to be aggregates of European areas (such as the EU or the euro area) rather than single countries. Since we are only interested in individual countries, we remove these rows.
We are now left with this dataset:

In [13]:
print(dp.gdp.head())

         unit na_item Country_code      2012      2013      2014      2015  \
0  CLV15_MEUR    B1GQ           AL   9768.40   9866.20  10041.30  10264.10   
1  CLV15_MEUR    B1GQ           AT 338486.50 338572.80 340811.70 344269.20   
2  CLV15_MEUR    B1GQ           BA  13695.80  14017.60  14179.30  14791.10   
3  CLV15_MEUR    B1GQ           BE 400181.00 402018.80 408364.80 416701.40   
4  CLV15_MEUR    B1GQ           BG  44117.00  43869.70  44293.90  45812.30   

       2016      2017      2018      2019      2020      2021      2022  
0  10604.40  11007.60  11450.00  11689.00  11282.10       NaN       NaN  
1 351118.30 359048.50 367756.80 373337.10 349242.10 365156.50 383403.20  
2  15270.70  15766.10  16369.80  16842.20  16334.40  17541.80  18225.30  
3 421979.70 428814.00 436502.40 446283.80 422356.80 448263.60 462119.80  
4  47204.90  48508.80  49811.20  51822.60  49771.20  53571.00  55371.70  
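
A sketch of removing the aggregates. It assumes that single countries use two-letter codes while most Eurostat aggregates use longer codes such as `EU27_2020` or `EA19`; the set of two-letter aggregates to exclude is an illustrative assumption, not a complete list, so the actual code in `dataproject.py` may differ:

```python
import pandas as pd

# Hypothetical codes: two countries plus typical Eurostat aggregate codes
gdp = pd.DataFrame({
    "Country_code": ["AT", "EA19", "EU27_2020", "BE", "EA"],
    "2012": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Assumption: countries have exactly two uppercase letters; a few two-letter
# aggregates are excluded explicitly (illustrative list, not exhaustive)
TWO_LETTER_AGGREGATES = {"EA", "EU"}
is_country = (
    gdp["Country_code"].str.fullmatch(r"[A-Z]{2}")
    & ~gdp["Country_code"].isin(TWO_LETTER_AGGREGATES)
)
countries_only = gdp.loc[is_country].reset_index(drop=True)
print(countries_only["Country_code"].tolist())  # ['AT', 'BE']
```

An explicit list of aggregate codes is more transparent than a pattern, but the pattern keeps working if Eurostat adds new longer aggregate codes.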


Next, we need population data for the countries. This time we filter directly when downloading from the Eurostat website, so we only receive the data we need.

In [14]:
print(dp.population.head())

  Country_code       2012       2013       2014       2015       2016  \
0           AD   78115.00   76246.00        NaN        NaN        NaN   
1           AL 2903008.00 2897770.00 2892394.00 2885796.00 2875592.00   
2           AM 3274285.00        NaN        NaN 3010598.00 2998577.00   
3           AT 8408121.00 8451860.00 8507786.00 8584926.00 8700471.00   
4           AZ 9235085.00 9356483.00 9477119.00 9593038.00 9705643.00   

        2017       2018       2019        2020        2021       2022  
0        NaN        NaN   76177.00         NaN         NaN        NaN  
1 2876591.00 2870324.00 2862427.00  2845955.00  2829741.00        NaN  
2 2986151.00 2972732.00 2965269.00  2959694.00  2963251.00        NaN  
3 8772865.00 8822267.00 8858775.00  8901064.00  8932664.00 8978929.00  
4 9809981.00 9898085.00 9981457.00 10067108.00 10119133.00        NaN  
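
The population table contains countries that are not in the cleaned GDP table, for example AD, AM and AZ in the head above. Plain Python set operations show which codes the two sources share; the shared set is what an inner merge on `Country_code` keeps. The sketch uses only the five codes visible in each head, so it illustrates the mechanics rather than the full overlap (`venn2` from matplotlib_venn, imported at the top, can draw such an overlap as a diagram):

```python
# Country codes visible in the two .head() outputs above (first five rows only)
gdp_codes = {"AL", "AT", "BA", "BE", "BG"}
pop_codes = {"AD", "AL", "AM", "AT", "AZ"}

# Codes present in both tables would survive an inner merge
shared = gdp_codes & pop_codes

# Codes missing from gdp_codes would be dropped by an inner merge
only_pop = pop_codes - gdp_codes

print(sorted(shared))    # ['AL', 'AT']
print(sorted(only_pop))  # ['AD', 'AM', 'AZ']
```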


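Next, we convert both datasets to "long" format (one row per country and year), which makes it possible to merge them. The merged dataset is called "inner"; as the name suggests, it keeps only the countries and years present in both datasets.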
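
A minimal sketch of the reshaping and merging with `DataFrame.melt` and `pd.merge`, using a few values copied from the two heads above (AL and AT, plus AD, which only appears in the population table). It assumes the year labels are strings, hence the conversion to integers:

```python
import pandas as pd

# Small wide tables with values copied from the heads above
gdp_wide = pd.DataFrame({
    "unit": ["CLV15_MEUR", "CLV15_MEUR"],
    "na_item": ["B1GQ", "B1GQ"],
    "Country_code": ["AL", "AT"],
    "2012": [9768.40, 338486.50],
    "2013": [9866.20, 338572.80],
})
pop_wide = pd.DataFrame({
    "Country_code": ["AD", "AL", "AT"],
    "2012": [78115.0, 2903008.0, 8408121.0],
    "2013": [76246.0, 2897770.0, 8451860.0],
})

# Wide -> long: one row per country and year
gdp_long = gdp_wide.melt(id_vars=["Country_code", "unit", "na_item"],
                         var_name="year", value_name="GDP")
pop_long = pop_wide.melt(id_vars="Country_code", var_name="year",
                         value_name="Population")

# Year labels came from column names (strings here); make them integers
gdp_long["year"] = gdp_long["year"].astype(int)
pop_long["year"] = pop_long["year"].astype(int)

# Inner merge keeps only country-years present in both tables (AD is dropped)
inner = pd.merge(gdp_long, pop_long, on=["Country_code", "year"], how="inner")
inner = inner.sort_values(["year", "Country_code"]).reset_index(drop=True)
inner = inner[["Country_code", "year", "unit", "na_item", "GDP", "Population"]]
print(inner)
```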

In the inner dataset we then delete all rows where GDP or population is missing (NaN). Next, we use GDP and population to calculate GDP per capita for each country and year.
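
GDP is reported in million euros (CLV15_MEUR), so it has to be scaled to euros before dividing by the head count. For Albania in 2012 this gives 9768.40 × 1,000,000 / 2,903,008 ≈ 3364.92 euros, which matches the first row of the output below. A minimal sketch using values from the tables above (AL 2021 has a missing GDP value, so that row is dropped):

```python
import numpy as np
import pandas as pd

# Long-format rows with values from the GDP and population tables above
inner = pd.DataFrame({
    "Country_code": ["AL", "AT", "AL"],
    "year": [2012, 2012, 2021],
    "GDP": [9768.40, 338486.50, np.nan],
    "Population": [2903008.0, 8408121.0, 2829741.0],
})

# Drop country-years where either GDP or population is missing
inner = inner.dropna(subset=["GDP", "Population"]).reset_index(drop=True)

# GDP is in million euros, so scale to euros before dividing by population
inner["GDP/Cap"] = inner["GDP"] * 1_000_000 / inner["Population"]
print(inner.round(2))
```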

In [15]:
print(dp.inner.head())

  Country_code  year        unit na_item       GDP  Population  GDP/Cap
0           AL  2012  CLV15_MEUR    B1GQ   9768.40  2903008.00  3364.92
1           AT  2012  CLV15_MEUR    B1GQ 338486.50  8408121.00 40257.09
2           BA  2012  CLV15_MEUR    B1GQ  13695.80  3839265.00  3567.30
3           BE  2012  CLV15_MEUR    B1GQ 400181.00 11075889.00 36130.82
4           BG  2012  CLV15_MEUR    B1GQ  44117.00  7327224.00  6020.97


Finally, we check that every column has the same number of non-missing values, so that the data is complete for the scatter plots.

In [16]:
dp.inner.count()

Country_code    396
year            396
unit            396
na_item         396
GDP             396
Population      396
GDP/Cap         396
dtype: int64
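
`DataFrame.count()` counts the non-missing values in each column, so equal counts show that the columns are equally filled but not that they are free of NaN: two columns can each miss a value in different rows. Comparing the counts with the number of rows closes that gap. A minimal sketch on a made-up frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: each column misses a value, but in different rows
df = pd.DataFrame({"GDP": [1.0, np.nan], "Population": [np.nan, 2.0]})

counts = df.count()
equal_counts = counts.nunique() == 1        # True: both columns have 1 value
complete = bool((counts == len(df)).all())  # False: neither column is complete
print(equal_counts, complete)  # True False
```

On the notebook's data, the same check would be `(dp.inner.count() == len(dp.inner)).all()`.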

The first plot is an interactive plot where you can follow the development in GDP per capita for each country over the period 2012-2022.

In [2]:
def plot_e(inner, Country_code):
    # Select the chosen country and plot its GDP per capita over time
    I = inner['Country_code'] == Country_code
    ax = inner.loc[I, :].plot(x='year', y='GDP/Cap', style='-o', legend=False)
    ax.set_xlim(inner['year'].min(), inner['year'].max())
    ax.set_xticks(np.arange(inner['year'].min(), inner['year'].max() + 1))
    ax.set_ylabel('GDP per capita in euros')
    ax.set_title(f"GDP per capita 2012-2022 for {Country_code}")
    plt.show()

widgets.interact(plot_e,
    inner=widgets.fixed(dp.inner),
    Country_code=widgets.Dropdown(description='Country_code',
                                  options=dp.inner.Country_code.unique(),
                                  value='DK')
    );

Next, we create an interactive scatterplot of GDP per capita and population for a selected year.

In [3]:
def plot_f(inner, year):
    # Select one year and convert population to millions to match the axis label
    I = inner['year'] == year
    data = inner.loc[I, :].copy()
    data['Population'] = data['Population'] / 1e6
    ax = data.plot(x='GDP/Cap', y='Population', style='o', legend=False)
    ax.set_ylabel('Population in millions')
    ax.set_xlabel('GDP per capita in euros')
    ax.set_title(f"Scatterplot of GDP per capita and Population for {year}")
    plt.subplots_adjust(left=0.2, right=1, top=0.9, bottom=0.1)
    plt.show()

year_widget = widgets.Dropdown(options=dp.inner['year'].unique(), value=2022, description='Year:')
widgets.interact(plot_f, inner=widgets.fixed(dp.inner), year=year_widget);

We now compute some standard summary statistics for the data: count, mean, standard deviation, minimum, quartiles and maximum.
We also set the display format for floats to two decimals to make the output easier to read.

In [19]:
pd.options.display.float_format = '{:.2f}'.format
dp.inner.describe()

          year         GDP   Population   GDP/Cap
count   396.00      396.00       396.00    396.00
mean   2016.93   455078.59  16427120.83  28464.50
std       3.14   732122.54  23181705.27  22191.51
min    2012.00     3353.70    319575.00   3364.92
25%    2014.00    36851.28   2076912.75  11794.94
50%    2017.00   177109.70   6981901.50  20282.00
75%    2020.00   452793.88  11412821.50  40758.39
max    2022.00  3261011.60  83614362.00  98633.75


From the above calculations we can see that the mean GDP per capita across all countries and years is 28,464.50 euros, with a standard deviation of 22,191.51 euros.
The GDP per capita values vary quite a bit, from a minimum of 3,364.92 euros to a maximum of 98,633.75 euros. The median (20,282.00 euros) lies well below the mean, which indicates that a relatively small number of very rich countries pulls the average up.

The scatterplot of population against GDP per capita shows that the countries with a high GDP per capita tend to have relatively small populations. Of course, this does not mean that a high GDP per capita leads to a small population; if there is a causal link at all, it is more likely to run the other way around.
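
One way to put a number on the pattern in the scatterplot is a correlation coefficient for a single year. The sketch below uses made-up numbers, not Eurostat data; on the real data the same call would be applied to the rows of `dp.inner` for the chosen year. A correlation only measures association and says nothing about the direction of causality:

```python
import pandas as pd

# Hypothetical cross-section for one year (made-up numbers, not Eurostat data)
year_slice = pd.DataFrame({
    "GDP/Cap": [90000.0, 60000.0, 30000.0, 10000.0],
    "Population": [0.5e6, 5e6, 40e6, 80e6],
})

# Pearson correlation between GDP per capita and population
r = year_slice["GDP/Cap"].corr(year_slice["Population"])
print(round(r, 2))
```

A negative value means that higher GDP per capita goes together with smaller populations in that year; a value near zero would suggest no linear relationship.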

The interactive plot of GDP per capita for each country over 2012-2022 shows a general growth in GDP per capita, with the exception of 2020, which is most likely due to the COVID-19 pandemic.
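
The 2020 dip can be quantified as year-over-year growth in GDP per capita within each country. A minimal sketch using a few values copied from the GDP and population tables above (Austria 2019-2021 and Albania 2019-2020); on the full data the same grouping would run on `dp.inner`:

```python
import pandas as pd

# GDP (million euros) and population for AT and AL, copied from the tables above
data = pd.DataFrame({
    "Country_code": ["AT", "AT", "AT", "AL", "AL"],
    "year": [2019, 2020, 2021, 2019, 2020],
    "GDP": [373337.10, 349242.10, 365156.50, 11689.00, 11282.10],
    "Population": [8858775.0, 8901064.0, 8932664.0, 2862427.0, 2845955.0],
})
data["GDP/Cap"] = data["GDP"] * 1_000_000 / data["Population"]

# Year-over-year growth (%) computed within each country; first year is NaN
data = data.sort_values(["Country_code", "year"])
previous = data.groupby("Country_code")["GDP/Cap"].shift()
data["growth_%"] = (data["GDP/Cap"] / previous - 1) * 100
print(data[["Country_code", "year", "growth_%"]].round(1))
```

Shifting within each country group ensures that the growth for one country is never computed against another country's previous value.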