## More Commands with Pandas
## Econ 148

This demo introduces pandas using an economics dataset. We will use data from the Economic Transformation Database (ETD), which presents internationally comparable sectoral data on employment and productivity in Africa, Asia, and Latin America. Feel free to explore the data further at https://www.wider.unu.edu/database/etd-economic-transformation-database.

Kruse, H., E. Mensah, K. Sen, and G. J. de Vries (2022). “A manufacturing renaissance? Industrialization trends in the developing world”, IMF Economic Review. DOI: 10.1057/s41308-022-00183-7

License: The GGDC/UNU-WIDER Economic Transformation Database is licensed under a Creative Commons Attribution 4.0 International License.

In [None]:
import pandas as pd

In [None]:
ETDdf = pd.read_csv("ETD.csv", thousands=',')
ETDdf
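The `thousands=','` argument tells `read_csv` to treat commas inside numbers as thousands separators. The sketch below shows the difference on a tiny in-memory CSV. The rows are made up for illustration and are not ETD values.

```python
import io
import pandas as pd

# Made-up rows for illustration only; these are not values from the ETD.
raw = 'country,var,Agriculture\nA,EMP,"1,234"\nB,EMP,"5,678"\n'

# Without thousands=',' the column is read as text such as '1,234'.
plain = pd.read_csv(io.StringIO(raw))

# With thousands=',' the commas are dropped and the column is numeric.
parsed = pd.read_csv(io.StringIO(raw), thousands=',')

print(plain['Agriculture'].tolist())
print(parsed['Agriculture'].tolist())
```

If a column that should be numeric comes back as text, a missing `thousands` argument is one possible cause worth checking.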

### Let's make a subset of Southeast Asian countries

In [None]:
SEA_countries = [
    'Brunei', 'Cambodia', 'Indonesia', 'Laos', 'Malaysia',
    'Myanmar', 'Philippines', 'Singapore', 'Thailand',
    'Timor-Leste', 'Viet Nam']

In [None]:
df_se_asia = ETDdf[ETDdf['country'].isin(SEA_countries)]
df_se_asia 
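Note that `isin` silently ignores any name in the list that does not appear in the data, for example because of a different spelling. A minimal sketch of how one might check for that, using a toy DataFrame with made-up values (the names `toy`, `wanted`, and `missing` are hypothetical):

```python
import pandas as pd

# Toy stand-in for ETDdf; the numbers are made up for illustration only.
toy = pd.DataFrame({
    'country': ['Indonesia', 'Indonesia', 'Kenya', 'Thailand'],
    'var': ['EMP', 'VA', 'EMP', 'EMP'],
    'Agriculture': [100.0, 5.0, 40.0, 60.0],
})
wanted = ['Indonesia', 'Thailand', 'Laos']

# Keep only rows whose country is in the list.
subset = toy[toy['country'].isin(wanted)]

# Names we asked for that never appear in the data.
missing = sorted(set(wanted) - set(toy['country']))

print(subset['country'].unique())
print(missing)
```

Running the same check against `SEA_countries` and `ETDdf['country']` would show which of the listed countries are actually covered by the dataset.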


### Subset again, keeping only the employment ('EMP') rows

In [None]:
df_se_asia_emp = df_se_asia[df_se_asia['var'] == 'EMP']
df_se_asia_emp

In [None]:
average_agriculture_employment = df_se_asia_emp.groupby('country')['Agriculture'].mean().sort_values()
average_agriculture_employment
# Note: the result is a Series indexed by country

In [None]:
# Reuse the employment subset created above instead of filtering again
average_agriculture_employment_df = df_se_asia_emp.groupby('country')['Agriculture'].mean().reset_index()
average_agriculture_employment_df
# Note: reset_index() turns the Series into a DataFrame with 'country' as a regular column
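To make the Series versus DataFrame distinction concrete, here is a small sketch on a toy DataFrame with made-up values. It also shows `as_index=False`, an alternative way to get a DataFrame straight from `groupby`.

```python
import pandas as pd

# Toy employment rows; the numbers are made up for illustration only.
toy_emp = pd.DataFrame({
    'country': ['Indonesia', 'Indonesia', 'Thailand', 'Thailand'],
    'Agriculture': [100.0, 120.0, 60.0, 40.0],
})

# Selecting one column and aggregating gives a Series indexed by country.
as_series = toy_emp.groupby('country')['Agriculture'].mean()

# reset_index() moves 'country' out of the index and returns a DataFrame.
as_frame = as_series.reset_index()

# as_index=False keeps 'country' as a regular column from the start.
as_frame_alt = toy_emp.groupby('country', as_index=False)['Agriculture'].mean()

print(type(as_series).__name__)
print(type(as_frame).__name__)
print(list(as_frame_alt.columns))
```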

### We can apply many different aggregation methods after 'groupby'

In [None]:
df_se_asia_emp.groupby('country')['Agriculture'].min()

In [None]:
df_se_asia_emp.groupby('country')['Agriculture'].max()

In [None]:
df_se_asia_emp.groupby('country')['Agriculture'].nunique()

In [None]:
df_se_asia_emp.groupby('country')['Agriculture'].quantile([0.25, 0.5, 0.75])
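Instead of running one cell per statistic, several aggregations can be requested at once with `agg`. The quantile result has a two-level index (country, quantile), and `unstack()` reshapes it into one row per country. A sketch on toy data with made-up values:

```python
import pandas as pd

# Toy employment rows; the numbers are made up for illustration only.
toy_emp = pd.DataFrame({
    'country': ['Indonesia'] * 3 + ['Thailand'] * 3,
    'Agriculture': [100.0, 120.0, 140.0, 40.0, 60.0, 80.0],
})
grouped = toy_emp.groupby('country')['Agriculture']

# One table with a column per aggregation.
summary = grouped.agg(['min', 'max', 'mean', 'nunique'])

# quantile() with a list returns a Series with a (country, quantile) index;
# unstack() pivots the quantile level into columns.
quartiles = grouped.quantile([0.25, 0.5, 0.75]).unstack()

print(summary)
print(quartiles)
```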

### We can select multiple columns after 'groupby'

Here, let's ask for three columns: Agriculture, Manufacturing, and Total.

In [None]:
SEA_employment = df_se_asia_emp.groupby('country')[['Agriculture', 'Manufacturing', 'Total']].mean()
SEA_employment

In [None]:
SEA_employment['Agr_Percent'] = (SEA_employment['Agriculture'] / SEA_employment['Total']) * 100
SEA_employment['Manu_Percent'] = (SEA_employment['Manufacturing'] / SEA_employment['Total']) * 100
SEA_employment
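One thing to keep in mind: these percentages are computed from employment levels that were already averaged across years, so they are shares of average employment, not the average of the yearly shares. The two can differ when total employment changes over time. A minimal sketch with one hypothetical country and made-up values:

```python
import pandas as pd

# One hypothetical country observed in two years; values are made up.
toy = pd.DataFrame({
    'country': ['A', 'A'],
    'Agriculture': [50.0, 60.0],
    'Total': [100.0, 300.0],
})

# Approach used above: average the levels first, then take the share.
means = toy.groupby('country')[['Agriculture', 'Total']].mean()
share_of_means = means['Agriculture'] / means['Total'] * 100

# Alternative: compute the share in each year, then average the shares.
yearly_share = toy['Agriculture'] / toy['Total'] * 100
mean_of_shares = yearly_share.groupby(toy['country']).mean()

print(share_of_means)
print(mean_of_shares)
```

Which version is appropriate depends on the question being asked. The first weights years with more total employment more heavily, while the second treats every year equally.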