diff --git a/lectures/datasets/mpd2020.xlsx b/lectures/datasets/mpd2020.xlsx
new file mode 100644
index 000000000..d5076da25
Binary files /dev/null and b/lectures/datasets/mpd2020.xlsx differ
diff --git a/lectures/long_run_growth.md b/lectures/long_run_growth.md
index 1948322f6..4546e43ae 100644
--- a/lectures/long_run_growth.md
+++ b/lectures/long_run_growth.md
@@ -3,6 +3,8 @@ jupytext:
   text_representation:
     extension: .md
     format_name: myst
+    format_version: 0.13
+    jupytext_version: 1.14.1
 kernelspec:
   display_name: Python 3 (ipykernel)
   language: python
@@ -11,7 +13,10 @@ kernelspec:
 
 # Long Run Growth
 
-```{index} single: Introduction to Economics
+```{admonition} Lecture IN-WORK
+:class: warning
+
+This lecture is still **under construction**
 ```
 
 ```{contents} Contents
@@ -20,47 +25,362 @@ kernelspec:
 
 ## Overview
 
-This lecture is about how different economies grow over the long run.
+This lecture looks at different growth trajectories across countries over the long term.
+
+While some countries have experienced rapid long-term growth that has lasted a hundred years, others have not.
+
+First, let us import the packages needed to explore what the data says about long run growth.
+
+```{code-cell} ipython3
+import pandas as pd
+import os
+import matplotlib as mpl
+import matplotlib.pyplot as plt
+import numpy as np
+from matplotlib.lines import Line2D
+```
 
-As we will see, some countries have had very different growth experiences
-since the end of WWII.
++++ {"user_expressions": []}
 
-References:
+A project initiated by [Angus Maddison](https://en.wikipedia.org/wiki/Angus_Maddison) has collected many historical time series for studying economic growth.
 
-* https://www.imf.org/en/Publications/fandd/issues/Series/Back-to-Basics/gross-domestic-product-GDP
-* https://www.stlouisfed.org/open-vault/2019/march/what-is-gdp-why-important
-* https://wol.iza.org/articles/gross-domestic-product-are-other-measures-needed
+We can use the [Maddison Historical Statistics](https://www.rug.nl/ggdc/historicaldevelopment/maddison/) to look at many different countries, including some with data dating back to the first century.
 
+```{tip}
+The data can be downloaded from [this webpage](https://www.rug.nl/ggdc/historicaldevelopment/maddison/) by clicking on the `Latest Maddison Project Release`. In this lecture we use the [Maddison Project Database 2020](https://www.rug.nl/ggdc/historicaldevelopment/maddison/releases/maddison-project-database-2020) in `Excel` format. The code we use here assumes you have downloaded that file and will teach you how to use [pandas](https://pandas.pydata.org) to import that data into a DataFrame.
+```
 
-One drawback of focusing on GDP growth is that it makes no allowance for
-depletion and degradation of natural resources.
+**TODO:** This is using locally imported data, should we fetch this file as part of the lecture?
 
+```{code-cell} ipython3
+data = pd.read_excel("datasets/mpd2020.xlsx", sheet_name='Full data')
+data
+```
 
-GDP per capita is gross domestic product divided by population.
++++ {"user_expressions": []}
 
-GDP is the sum of gross value added by all resident producers in the economy
-plus any product taxes and minus any subsidies not included in the value of
-the products.
+We can see that this dataset contains GDP per capita (gdppc) and population (pop) for many countries and years.
 
-We use World Bank data on GPD per capita in current U.S. dollars.
+Let's look at how many and which countries are available in this dataset
 
-We require the following imports.
+```{code-cell} ipython3
+data.country.unique()
+```
 
 ```{code-cell} ipython3
-import pandas as pd
-import os
-import matplotlib as mpl
-import matplotlib.pyplot as plt
-import numpy as np
+len(data.country.unique())
+```
+
++++ {"user_expressions": []}
+
+We can now explore some of the 169 countries that are available.
+
+Let's now loop over each country to understand which years are available for each country
+
+```{code-cell} ipython3
+cntry_years = []
+for cntry in data.country.unique():
+    cy_data = data[data.country == cntry]['year']
+    ymin, ymax = cy_data.min(), cy_data.max()
+    cntry_years.append((cntry, ymin, ymax))
+cntry_years = pd.DataFrame(cntry_years, columns=['country', 'Min Year', 'Max Year']).set_index('country')
+cntry_years
+```
+
+```{code-cell} ipython3
+cntry_years.loc['Australia']
+```
+
++++ {"user_expressions": []}
+
+Let us now reshape the original data into some convenient variables to enable quicker access to countries' time series data.
+
+We can build a useful mapping between country codes and country names in this dataset
+
+```{code-cell} ipython3
+code_to_name = data[['countrycode','country']].drop_duplicates().reset_index(drop=True).set_index(['countrycode'])
+```
+
++++ {"user_expressions": []}
+
+Then we can quickly focus on GDP per capita (gdppc)
+
+```{code-cell} ipython3
+data
+```
+
+```{code-cell} ipython3
+gdppc = data.set_index(['countrycode','year'])['gdppc']
+gdppc = gdppc.unstack('countrycode')
+```
+
+```{code-cell} ipython3
+gdppc
+```
+
++++ {"user_expressions": []}
+
+Looking at the United Kingdom we can first confirm we are using the correct country code
+
+```{code-cell} ipython3
+code_to_name.loc['GBR']
+```
+
++++ {"user_expressions": []}
+
+and then use that code to access and plot the data
+
+```{code-cell} ipython3
+fig = plt.figure(dpi=110)
+gdppc['GBR'].plot(ax = fig.gca())
+```
+
+We can see that the data is non-continuous for long periods in the early part of this millennium, so we could choose to interpolate to
get a continuous line plot.
+
+```{code-cell} ipython3
+fig = plt.figure(dpi=110)
+cntry = 'GBR'
+gdppc[cntry].interpolate().plot(
+    ax = fig.gca(),
+    title = f'GDP per Capita ({cntry})',
+    ylabel = 'International $\'s',
+    xlabel = 'Year'
+);
+```
+
++++ {"user_expressions": []}
+
+:::{note}
+[International Dollars](https://en.wikipedia.org/wiki/International_dollar) are a hypothetical unit of currency that has the same purchasing power parity that the U.S. Dollar has in the United States at any given time. They are also known as Geary–Khamis dollars (GK Dollars).
+:::
+
+As you can see from this chart, economic growth started in earnest in the 18th Century and continued for the next two hundred years.
+
+How does this compare with other countries' growth trajectories? Let's look at the United States (USA), United Kingdom (GBR), and China (CHN)
+
+```{code-cell} ipython3
+fig = plt.figure(dpi=110)
+ax = fig.gca()
+cntry = ['USA', 'GBR', 'CHN']
+gdppc[cntry].plot(
+    ax = ax,
+    title = 'GDP per Capita',
+    ylabel = 'International $\'s',
+    xlabel = 'Year'
+)
+
+# Build Custom Legend
+legend_elements = [Line2D([0], [0], color='blue', lw=4, label=code_to_name.loc['USA']['country']),
+                   Line2D([0], [0], color='orange', lw=4, label=code_to_name.loc['GBR']['country']),
+                   Line2D([0], [0], color='green', lw=4, label=code_to_name.loc['CHN']['country'])]
+ax.legend(handles=legend_elements, loc='center right', bbox_to_anchor=(1.4,0.5));
+
+#TODO: Define Styles for Countries to match colors and line styles (@mmcky)
+```
+
++++ {"user_expressions": []}
+
+This dataset has been carefully curated to enable cross-country comparisons.
+
+Let's compare the growth trajectories of Australia (AUS) and Argentina (ARG)
+
+```{code-cell} ipython3
+fig = plt.figure(dpi=110)
+gdppc[['AUS', 'ARG']].plot(ax = fig.gca())
+```
+
++++ {"user_expressions": []}
+
+As you can see, the countries had similar GDP per capita levels, with divergence starting around 1940.
Australia's growth experience is both more continuous and less volatile post 1940.
+
++++ {"user_expressions": []}
+
+## The Industrialized World
+
+Now we can look at total Gross Domestic Product (GDP) rather than focusing on GDP per capita (as a proxy for living standards).
+
+```{code-cell} ipython3
+data = pd.read_excel("datasets/mpd2020.xlsx", sheet_name='Full data')
+data.set_index(['countrycode', 'year'], inplace=True)
+data['gdp'] = data['gdppc'] * data['pop']
+gdp = data['gdp'].unstack('countrycode')
+```
+
++++ {"user_expressions": []}
+
+### Early Industrialization (1820 to 1940)
+
++++ {"user_expressions": []}
+
+Gross Domestic Product
+
+```{code-cell} ipython3
+cntry = ['DEU', 'SUN', 'USA', 'GBR', 'FRA', 'JPN', 'CHN']
+start_year, end_year = (1820,1940)
+fig = plt.figure(dpi=110)
+gdp[cntry].loc[start_year:end_year].interpolate().plot(
+    ax=fig.gca(),
+);
+```
+
++++ {"user_expressions": []}
+
+GDP per Capita
+
+```{code-cell} ipython3
+cntry = ['DEU', 'SUN', 'USA', 'GBR', 'FRA', 'JPN', 'CHN']
+start_year, end_year = (1820,1940)
+fig = plt.figure(dpi=110)
+gdppc[cntry].loc[start_year:end_year].interpolate().plot(
+    ax=fig.gca()
+);
+```
+
++++ {"user_expressions": []}
+
+## The Modern Era (1970 to 2018)
+
++++ {"user_expressions": []}
+
+Gross Domestic Product (GDP)
+
+```{code-cell} ipython3
+cntry = ['DEU', 'SUN', 'USA', 'GBR', 'FRA', 'JPN', 'CHN']
+start_year, end_year = (1970, 2018)
+fig = plt.figure(dpi=110)
+gdp[cntry].loc[start_year:end_year].interpolate().plot(ax=fig.gca())
+plt.savefig(f"plot-for-tom-gdp-{start_year}-to-{end_year}.png", dpi=200)
+```
+
++++ {"user_expressions": []}
+
+GDP per Capita
+
+```{code-cell} ipython3
+cntry = ['DEU', 'SUN', 'USA', 'GBR', 'FRA', 'JPN', 'CHN']
+start_year, end_year = (1970, 2018)
+fig = plt.figure(dpi=110)
+gdppc[cntry].loc[start_year:end_year].interpolate().plot(
+    ax=fig.gca()
+);
 ```
+
++++ {"user_expressions": []}
+
+
+---
+
+## Other Interesting Plots
+
+Here is a collection of interesting plots
that could be linked to interesting stories
+
+Looking at China's GDP per capita levels from 1500 through to the 1970s shows a long period of declining GDP per capita levels from the 1700s to the early 20th century. (Closed Border / Inward Looking Domestic Focused Policies?)
+
+```{code-cell} ipython3
+fig = plt.figure(dpi=110)
+gdppc['CHN'].loc[1500:1980].interpolate().plot(ax=fig.gca());
+```
+
++++ {"user_expressions": []}
+
+China (CHN) then followed a very similar growth story from the 1980s through to the present day.
+
+```{code-cell} ipython3
+fig = plt.figure(dpi=110)
+gdppc[['CHN', 'GBR']].interpolate().plot(ax = fig.gca())
+```
+
+```{code-cell} ipython3
+
+```
+
++++ {"user_expressions": []}
+
+## Regional Analysis
+
+```{code-cell} ipython3
+data = pd.read_excel("datasets/mpd2020.xlsx", sheet_name='Regional data', header=(0,1,2), index_col=0)
+data.columns = data.columns.droplevel(level=2)
+```
+
+```{code-cell} ipython3
+regionalgdppc = data['gdppc_2011'].copy()
+regionalgdppc.index = pd.to_datetime(regionalgdppc.index, format='%Y')
+```
+
+```{code-cell} ipython3
+regionalgdppc.interpolate(method='time', inplace=True)
+```
+
+```{code-cell} ipython3
+worldgdppc = regionalgdppc['World GDP pc']
+```
+
+```{code-cell} ipython3
+fig = plt.figure(dpi=110)
+ax = worldgdppc.plot(
+    ax = fig.gca(),
+    title='World GDP per capita',
+    xlabel='Year',
+    ylabel='2011 US$',
+)
+```
+
+```{code-cell} ipython3
+fig = plt.figure(dpi=110)
+regionalgdppc[['Western Offshoots', 'Sub-Sahara Africa']].plot(ax = fig.gca())
+```
+
+```{code-cell} ipython3
+fig = plt.figure(dpi=200)
+line_styles = ['-', '--', ':', '-.', '.', 'o'] # TODO: Improve this
+ax = regionalgdppc.plot(ax = fig.gca(), style=line_styles)
+plt.legend(loc='center left', bbox_to_anchor=(1.0, 0.5))
+```
+
+```{code-cell} ipython3
+
+```
+
+```{code-cell} ipython3
+
+```
+
+```{code-cell} ipython3
+
+```
+
+```{code-cell} ipython3
+
+```
+
+```{code-cell} ipython3
+
+```
+
+```{code-cell} ipython3
+
+```
+
+```{code-cell} ipython3
+
+```
+
++++ {"user_expressions": []}
+
+# Prior Work using World Bank Data (@aakash)
+
++++
+
 The following code reads in the data into a pandas data frame.
 
 ```{code-cell} ipython3
 wbi = pd.read_csv("datasets/GDP_per_capita_world_bank.csv")
 ```
 
++++ {"user_expressions": []}
 
 ## Comparison of GDP between different Income Groups
 
@@ -117,11 +437,13 @@ ax.set_xlabel("year")
 ax.set_ylabel("GDP per capita (current US$) ")
 ```
 
++++ {"user_expressions": []}
+
 ### Plot for Upper middle and lower middle income groups
 
 Now, we compare the time-series graphs of GDP per capita for upper middle and lower middle income group countries, taking one country from each group. China and Pakistan was chosen as they are from the same region. On analysing the graph, the difference is quite striking from 90s onwards. But also expected, as during that time China opened up for trade and labour.
 
-It can be concluded that, further inspection reveals the economies are vastly different in the present time, unlike what the previous graph was suggesting.
+It can be concluded that, further inspection reveals the economies are vastly different in the present time, unlike what the previous graph was suggesting.
 
 ```{code-cell} ipython3
 # China, Pakistan (Upper middle income and lower middle income)
@@ -160,11 +482,11 @@ ax.set_xlabel("year")
 ax.set_ylabel("GDP per capita (current US$) ")
 ```
 
-
++++ {"user_expressions": []}
 
 ## Histogram comparison between 1960, 1990, 2020
 
-We compare histograms of the **log** of GDP per capita for the years 1960, 1990 and 2020 for around 170 countries. The years have been chosen to give sufficient time gap between the histograms. We see that the overall plot is shifting towards right, denoting the upward trend in GDP per capita worldwide. And also, the overall distribution is becoming more Gaussian. Which indicates that the economies have gotten more uniform over the years.
Economic disparities are getting lesser possibly because of globalisation, technological advancements, better use of resources etc.
+We compare histograms of the **log** of GDP per capita for the years 1960, 1990 and 2020 for around 170 countries. The years have been chosen to give sufficient time gap between the histograms. We see that the overall plot is shifting towards right, denoting the upward trend in GDP per capita worldwide. And also, the overall distribution is becoming more Gaussian. Which indicates that the economies have gotten more uniform over the years. Economic disparities are getting lesser possibly because of globalisation, technological advancements, better use of resources etc.
 
 ```{code-cell} ipython3
 def get_log_hist(data, years):
@@ -180,3 +502,7 @@ def get_log_hist(data, years):
 wbiall = wbi.drop(['Country Name' , 'Indicator Name', 'Indicator Code'], axis=1)
 get_log_hist(wbiall, ['1960', '1990', '2020'])
 ```
+
+```{code-cell} ipython3
+
+```
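
The per-country loop that builds `cntry_years` in the patch above could probably be written as a single `groupby`. The following is a minimal sketch and is not part of the patch: it assumes a DataFrame with the same `country` and `year` column names as the `Full data` sheet, it runs on a few made-up rows rather than the Maddison data, and `year_range` is a hypothetical helper name.

```python
import pandas as pd


def year_range(df):
    """Return the first and last available year for each country.

    Hypothetical helper: assumes `df` has `country` and `year` columns,
    matching the column names used in the patch.
    """
    # One grouped aggregation replaces the explicit loop over countries
    out = df.groupby('country')['year'].agg(['min', 'max'])
    # Use the same column labels as the loop version in the patch
    return out.rename(columns={'min': 'Min Year', 'max': 'Max Year'})


# Made-up sample rows for illustration only (not Maddison data)
sample = pd.DataFrame({
    'country': ['Australia', 'Australia', 'Argentina', 'Argentina', 'Argentina'],
    'year': [1820, 2018, 1800, 1900, 2018],
})

cntry_years = year_range(sample)
print(cntry_years.loc['Australia'])
```

Grouping once avoids filtering the full DataFrame separately for every country. The result is indexed by `country` with the same column labels as the loop version, although the rows come out sorted by country name rather than in order of first appearance.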