# Scientific Python Final Project
Topic: Examining the potential barriers to structural transformation in the Thai economy

Phakphum Jatupitpornchan

## Introduction

One main reason many countries remain poor is that a large share of their labor force works in the agricultural sector, which is significantly less productive than other sectors. Structural transformation is usually defined as the transition from an economy dominated by agriculture to one dominated by industry and services. This process is often associated with economic growth and development.

In this project, I look for suggestive evidence on which factors might hinder structural transformation in the Thai economy. The purpose of this exercise is to identify potential barriers that can be studied in more depth in the future.

I use data from the World Development Indicators (WDI) and the UNDP Human Development Index to make international comparisons. I also use data from the Thai Labour Force Survey to make spatial comparisons within the country and to examine other dimensions of the problem.

Unfortunately, I found that Pyreadstat cannot read files on remote drives[^1]. Therefore, you will have to download the data to your local machine to run the code. The data can be downloaded from the following link: https://drive.google.com/file/d/1O9mBmdLpJLvWrlo2EA0FgxixG82dwaAO/view?usp=sharing

I apologize in advance, since the total file size is quite large (1.16 GB).

[^1]: https://stackoverflow.com/questions/74214114/loading-a-sav-file-in-google-collab

## Results from International Data

First, I import the data. The data from the World Development Indicators (WDI) contains country-level statistics on GDP per capita, PPP (constant 2017 international $), and the employment share in agriculture from 1992 to 2021. The data from the UNDP Human Development Index contains the mean years of schooling among the population aged 25 years and older from 1992 to 2019.

Then, I pre-process the data by reshaping them into a more usable format and merging them together.

In [1]:
## Import libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
## Import data
# Forward slashes avoid the invalid "\P" and "\H" escape sequences and work on every OS
data_WB = pd.read_excel("Data_Final_Project/P_Data_Extract_From_World_Development_Indicators.xlsx")
data_UNDP = pd.read_csv("Data_Final_Project/HDR21-22_Composite_indices_complete_time_series.csv")


In [3]:
data_WB.head()

Unnamed: 0,Country Name,Country Code,Series Name,Series Code,1992 [YR1992],1993 [YR1993],1994 [YR1994],1995 [YR1995],1996 [YR1996],1997 [YR1997],...,2012 [YR2012],2013 [YR2013],2014 [YR2014],2015 [YR2015],2016 [YR2016],2017 [YR2017],2018 [YR2018],2019 [YR2019],2020 [YR2020],2021 [YR2021]
0,Afghanistan,AFG,Employment in agriculture (% of total employme...,SL.AGR.EMPL.ZS,64.771185,65.059005,65.186009,65.169112,65.061514,65.187699,...,50.416688,47.697315,44.798594,44.593516,44.337137,43.989031,44.4536,45.01604,45.983408,46.587823
1,Afghanistan,AFG,"GDP per capita, PPP (constant 2017 internation...",NY.GDP.PCAP.PP.KD,..,..,..,..,..,..,...,2122.830759,2165.340915,2144.449634,2108.714173,2101.422187,2096.093111,2060.698973,2079.921861,1968.341002,1516.273265
2,Albania,ALB,Employment in agriculture (% of total employme...,SL.AGR.EMPL.ZS,53.198792,52.987349,52.735871,52.413215,52.203394,51.75494,...,45.995631,44.198027,42.257063,41.283525,40.040852,38.078346,37.285732,36.416856,36.190744,35.640848
3,Albania,ALB,"GDP per capita, PPP (constant 2017 internation...",NY.GDP.PCAP.PP.KD,3264.836677,3598.827815,3921.634093,4471.623506,4908.956329,4400.334211,...,11228.005157,11361.307891,11586.873945,11878.495523,12291.901997,12771.054137,13317.1842,13653.248783,13278.434516,14596.015558
4,Algeria,DZA,Employment in agriculture (% of total employme...,SL.AGR.EMPL.ZS,23.967045,23.853056,23.734053,23.556856,23.372809,23.075537,...,10.755027,10.753295,9.746314,8.834767,8.535155,10.16129,10.092127,9.798442,10.023791,10.033098


In [4]:
data_UNDP.head()

Unnamed: 0,iso3,country,mys_1992,mys_1993,mys_1994,mys_1995,mys_1996,mys_1997,mys_1998,mys_1999,...,mys_2012,mys_2013,mys_2014,mys_2015,mys_2016,mys_2017,mys_2018,mys_2019,mys_2020,mys_2021
0,AFG,Afghanistan,1.067586,1.115817,1.164047,1.212277,1.251383,1.290489,1.329594,1.3687,...,2.209473,2.261614,2.313755,2.365896,2.46366,2.561425,2.659189,2.756953,2.854718,2.98507
1,AGO,Angola,,,,,,,,3.3935,...,3.909642,3.950166,3.99069,4.70404,5.417391,5.417391,5.417391,5.417391,5.417391,5.417391
2,ALB,Albania,7.350875,7.348996,7.347118,7.345239,7.627026,7.908813,8.190599,8.472386,...,10.02511,10.196281,10.370374,10.547439,10.727528,10.910692,11.096983,11.286455,11.286455,11.286455
3,AND,Andorra,,,,,,,,,...,10.587085,10.616062,10.64504,10.57328,10.5561,10.555773,10.555446,10.55512,10.55512,10.55512
4,ARE,United Arab Emirates,6.357381,6.656119,6.954857,7.253594,7.499132,7.74467,7.990207,8.235745,...,10.169965,10.338129,10.506293,10.674456,10.84262,12.0554,12.484,12.69403,12.69403,12.69403
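
The UNDP preview above is in wide format, with one `mys_YYYY` column per year. Before merging, it would typically be reshaped so that each row is one country-year. The following is a minimal sketch of that step on a toy frame built from the first two years of two rows in the preview. The names `undp_wide` and `undp_long` are illustrative assumptions, not the names used in the project.

```python
import pandas as pd

# Toy frame mimicking the UNDP layout in the preview (values copied from it).
undp_wide = pd.DataFrame({
    "iso3": ["AFG", "ALB"],
    "country": ["Afghanistan", "Albania"],
    "mys_1992": [1.067586, 7.350875],
    "mys_1993": [1.115817, 7.348996],
})

# Select only the mean-years-of-schooling columns.
mys_cols = [c for c in undp_wide.columns if c.startswith("mys_")]

# Turn the mys_YYYY columns into rows: one row per country-year.
undp_long = undp_wide.melt(
    id_vars=["iso3", "country"],
    value_vars=mys_cols,
    var_name="variable",
    value_name="mys",
)

# Extract the year from the former column name, e.g. "mys_1992" -> 1992.
undp_long["year"] = (
    undp_long["variable"].str.replace("mys_", "", regex=False).astype(int)
)
undp_long = (
    undp_long.drop(columns="variable")
    .sort_values(["iso3", "year"])
    .reset_index(drop=True)
)
print(undp_long)
```

In long format, the schooling data can be joined to other country-year data on the ISO code and the year.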


In [None]:
### Pivot the WB data. Have each Series as a column.
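
The pivot described in the comment above could look roughly like the sketch below. It runs on a toy frame that copies a few values from the WDI preview, and it assumes that `..` marks a missing value, as the preview suggests. The names `wb_wide`, `wb_long`, `wb_panel`, `agri_emp_share`, and `gdp_pc_ppp` are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Toy frame mimicking the WDI layout: one row per country-series, one column per year.
wb_wide = pd.DataFrame({
    "Country Name": ["Afghanistan", "Afghanistan", "Albania", "Albania"],
    "Country Code": ["AFG", "AFG", "ALB", "ALB"],
    "Series Code": ["SL.AGR.EMPL.ZS", "NY.GDP.PCAP.PP.KD"] * 2,
    "1992 [YR1992]": [64.771185, "..", 53.198792, 3264.836677],
    "1993 [YR1993]": [65.059005, "..", 52.987349, 3598.827815],
})

id_cols = ["Country Name", "Country Code", "Series Code"]
year_cols = [c for c in wb_wide.columns if "[YR" in c]

# Step 1: wide -> long, one row per country-series-year.
wb_long = wb_wide.melt(
    id_vars=id_cols, value_vars=year_cols, var_name="year", value_name="value"
)
wb_long["year"] = wb_long["year"].str.slice(0, 4).astype(int)

# Step 2: ".." (assumed missing-value marker) becomes NaN.
wb_long["value"] = pd.to_numeric(wb_long["value"], errors="coerce")

# Step 3: pivot so that each series becomes its own column.
wb_panel = (
    wb_long.pivot(
        index=["Country Name", "Country Code", "year"],
        columns="Series Code",
        values="value",
    )
    .reset_index()
    .rename(columns={
        "SL.AGR.EMPL.ZS": "agri_emp_share",
        "NY.GDP.PCAP.PP.KD": "gdp_pc_ppp",
    })
)
wb_panel.columns.name = None
print(wb_panel)
```

The result has one row per country-year, which is the shape needed to merge with other country-year data on the country code and year.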

## Illustrative Example of Analysis on One Year of Microdata (Can be skipped)

Since I will have to loop through all years of the Thai Labour Force Survey data that I have, the code might be hard to follow. Therefore, I first illustrate the analysis on one year of data. The example is also used to build the functions that will be applied to all years of data.

## Analysis from All Years of Microdata

I loop through each year of the Thai Labour Force Survey data in this section. For each year, the data are pre-processed, tabulated, and stored in dictionaries. The values in the dictionaries are then combined into dataframes and visualized.
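
The loop-and-collect pattern described above can be sketched as follows. This is only an illustration under stated assumptions: the column names `region`, `in_agri`, and `weight`, the function `tabulate_year`, and the toy data are all hypothetical stand-ins, and the step that reads each survey file from disk is omitted.

```python
import pandas as pd


def tabulate_year(df):
    """Weighted share of workers in agriculture by region (hypothetical columns)."""
    totals = (
        df.assign(agri_w=df["in_agri"] * df["weight"])
        .groupby("region")[["agri_w", "weight"]]
        .sum()
    )
    return totals["agri_w"] / totals["weight"]


# Toy stand-in for the yearly survey files; in practice each frame would be
# read from the downloaded data for that year.
yearly_data = {
    2010: pd.DataFrame({
        "region": ["North", "North", "South"],
        "in_agri": [1, 0, 1],
        "weight": [2.0, 2.0, 1.0],
    }),
    2011: pd.DataFrame({
        "region": ["North", "North", "South"],
        "in_agri": [0, 0, 1],
        "weight": [1.0, 3.0, 2.0],
    }),
}

# Tabulate each year and store the result in a dictionary keyed by year.
results = {}
for year, df in yearly_data.items():
    results[year] = tabulate_year(df)

# Combine the per-year Series into one dataframe: rows are regions, columns are years.
shares = pd.DataFrame(results)
shares.index.name = "region"
shares.columns.name = "year"
print(shares)
```

Keying the dictionary by year keeps each tabulation separate until the end, so a single year can be re-run or inspected without repeating the whole loop.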

## Conclusion