# Goal of this practice

In this practice, we will learn several pandas DataFrame operations for transforming data. We will use World Bank data that gives the yearly GDP of each country since 1960. Many tasks can be formulated with this data; for now we will focus on inspecting it and computing basic statistics.

# Import pandas

In [None]:
import pandas as pd

data_path = "../resources/gdp.csv"


# Load the data

Before loading a data file, it is worth looking at its raw contents. If the file is small, open it in a simple text editor; if it is large, use the `head` or `less` commands in a **terminal** to read the first few lines. This tells us whether some rows need to be skipped before loading. For example, the following code throws an error because the file does not start with the expected header row.


In [None]:
# Expected to fail: the first lines of the file are metadata, not the header row
gdp_df = pd.read_csv(data_path)

Let's open a terminal and change to `Module3/resources`. If we execute `head -n 5 gdp.csv` (i.e., print the first 5 lines), we see that the first 4 lines are metadata, so we need to skip these rows.
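
If a terminal is not at hand, the same peek can be done from Python. The sketch below is a rough equivalent of `head -n 5`; the helper name `peek` and the file `sample_gdp.csv` are hypothetical, and the sample content only mimics the layout described above (made-up values). With the real data you would call `peek(data_path)`.

```python
import itertools
import tempfile
from pathlib import Path


def peek(path, n=5):
    """Return the first n lines of a text file, roughly like `head -n`."""
    # utf-8-sig also strips a byte-order mark if the file starts with one
    with open(path, encoding="utf-8-sig") as f:
        return [line.rstrip("\n") for line in itertools.islice(f, n)]


# Hypothetical sample: metadata lines first, the real header only on line 5
sample = (
    '"Data Source","World Development Indicators"\n'
    "\n"
    '"Last Updated Date","2020-01-01"\n'
    "\n"
    '"Country Name","Country Code","1960","1961"\n'
    '"Aruba","ABW",1.0,2.0\n'
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "sample_gdp.csv"
    path.write_text(sample, encoding="utf-8")
    first_lines = peek(path, n=5)

for line in first_lines:
    print(line)
```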

In [None]:
# Skip the first 4 lines (metadata) so that the header row is read as the column names
gdp_df = pd.read_csv(data_path, skiprows=4)
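
To see why `skiprows` matters, here is a minimal sketch on a hypothetical in-memory miniature of the file (made-up rows, read through `io.StringIO` instead of a path). Without `skiprows`, pandas treats the first metadata line as the header and then fails on the first row that has more fields; with `skiprows=4`, the blank lines are counted too and line 5 becomes the header.

```python
import io

import pandas as pd

# Hypothetical miniature: two metadata lines, each followed by a blank line,
# then the real header on line 5
raw = (
    '"Data Source","World Development Indicators"\n'
    "\n"
    '"Last Updated Date","2020-01-01"\n'
    "\n"
    '"Country Name","Country Code","1960","1961"\n'
    '"Aruba","ABW",1.0,2.0\n'
    '"Chad","TCD",3.0,4.0\n'
)

# Line 1 (2 fields) is taken as the header, so line 5 (4 fields) cannot be parsed
try:
    pd.read_csv(io.StringIO(raw))
    failed = False
except pd.errors.ParserError as err:
    failed = True
    print("ParserError:", err)

# Skipping the first 4 lines makes line 5 the header row
df = pd.read_csv(io.StringIO(raw), skiprows=4)
print(df)
```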

## Inspect the data frame

In [None]:
# write your code here

# Task: print first two rows of data
print(gdp_df.head(2))

# Task: What is the shape (rows, columns) of the data?
print(gdp_df.shape)

# Task: print the column names
print(gdp_df.columns)

# Task: show the data types
print(gdp_df.dtypes)

# Remove unnecessary columns

In [None]:
# Task: drop the 'Country Code', 'Indicator Name' and 'Indicator Code' columns with the drop function
gdp_df_minimal = gdp_df.drop(columns=['Country Code', 'Indicator Name', 'Indicator Code'])
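
Note that `drop` returns a new DataFrame rather than modifying the original, which is why the result is assigned to a new name. A minimal sketch on a hypothetical toy frame that reuses a few of the same column names:

```python
import pandas as pd

# Hypothetical one-row frame with some of the same column names
df = pd.DataFrame(
    {"Country Name": ["A"], "Country Code": ["AAA"], "Indicator Code": ["X"], "1960": [1.0]}
)

# drop() returns a new DataFrame; df itself still has all four columns
minimal = df.drop(columns=["Country Code", "Indicator Code"])

print(list(df.columns))
print(list(minimal.columns))
```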

In [None]:
gdp_df_minimal.columns

In [None]:
gdp_df_minimal['Country Name']

## Get country wise stats

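The pandas `describe` method computes summary statistics (count, mean, standard deviation, min, quartiles, max) for each numerical column. Right now each country is a row, so we need to transpose the data to make each country a column.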

In [None]:
# Use 'Country Name' as the index (previously a default numeric index)
gdp_transpose = gdp_df_minimal.set_index('Country Name')

# Transpose so that the index (countries) becomes the columns and the columns (years) become the rows
gdp_transpose = gdp_transpose.transpose()


In [None]:
# apply the describe function
gdp_transpose.describe()
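
The effect of the transpose can be checked on a small hypothetical frame shaped like `gdp_df_minimal` (made-up values, one row per country). `describe()` on the wide frame summarises each year; after transposing (`.T` is shorthand for `.transpose()`), it summarises each country.

```python
import pandas as pd

# Hypothetical toy data: one row per country, one column per year
wide = pd.DataFrame(
    {"Country Name": ["A", "B"], "2000": [1.0, 10.0], "2001": [3.0, 20.0]}
).set_index("Country Name")

# describe() works column by column, so this gives per-year statistics
per_year = wide.describe()

# After transposing, countries are columns and describe() gives per-country statistics
per_country = wide.T.describe()
print(per_country)
```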

In [None]:
gdp_transpose.columns

## Repeat the above exercise, this time focusing only on the last 10 years of data


In [None]:
gdp_df_minimal.head()

In [None]:
gdp_df_minimal = gdp_df_minimal.set_index('Country Name')

In [None]:
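# Remove the empty 'Unnamed: 64' column, which pandas creates from an empty last field on each line of the file
del gdp_df_minimal['Unnamed: 64']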
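
A column like `'Unnamed: 64'` typically appears when every line of a CSV ends with a trailing comma: pandas names the empty header field `Unnamed: <position>`. A minimal sketch on a hypothetical in-memory CSV, with one way to drop all such columns by name instead of hard-coding the position:

```python
import io

import pandas as pd

# Hypothetical CSV where every line ends with a trailing comma
raw = (
    '"Country Name","1960","1961",\n'
    '"Aruba",1.0,2.0,\n'
    '"Chad",3.0,4.0,\n'
)
df = pd.read_csv(io.StringIO(raw))
before = list(df.columns)
print(before)  # the empty last field becomes "Unnamed: 3"

# Keep every column whose name does not start with "Unnamed"
df = df.loc[:, ~df.columns.str.startswith("Unnamed")]
print(list(df.columns))
```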

In [None]:
gdp_df_minimal.head()

In [None]:
# Check whether all values in column 2019 are NaN
print(gdp_df_minimal['2019'].isnull().all())  # prints True if all values are NaN

# Remove column 2019, since it contains no data (optional for this exercise)
gdp_df_minimal = gdp_df_minimal.drop(columns=['2019'])
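
An alternative, sketched here on a hypothetical toy frame, is to let pandas find every column that has no data at all with `dropna(axis="columns", how="all")`. On the real frame this would presumably also catch an empty `'Unnamed: 64'` column; whether that is desirable depends on the analysis, since it silently removes any fully empty year.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: "2019" has no values yet, "2018" is only partly missing
toy = pd.DataFrame(
    {"2017": [1.0, 2.0], "2018": [1.5, np.nan], "2019": [np.nan, np.nan]},
    index=pd.Index(["A", "B"], name="Country Name"),
)

# how="all" drops a column only when every value in it is NaN,
# so the partly missing "2018" column is kept
cleaned = toy.dropna(axis="columns", how="all")
print(list(cleaned.columns))  # ['2017', '2018']
```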



In [None]:
gdp_last_10years_df = gdp_df_minimal.iloc[:, -10:]

gdp_last_10years_df.head()
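
`iloc[:, -10:]` selects the last 10 columns by position, so it relies on the years being in order and on no extra columns sitting at the end (which is why the empty columns are removed first). A minimal sketch on hypothetical data comparing it with an explicit label-based selection; the label version assumes every column name is a year string.

```python
import pandas as pd

# Hypothetical toy frame with one column per year, 2005 to 2018
years = [str(y) for y in range(2005, 2019)]
toy = pd.DataFrame([list(range(len(years)))], columns=years, index=["A"])

# Positional: the last 10 columns, whatever their labels are
by_position = toy.iloc[:, -10:]

# Label-based: the 10 latest years, independent of column order
latest = max(int(c) for c in toy.columns)
wanted = [str(y) for y in range(latest - 9, latest + 1)]
by_label = toy[wanted]

print(list(by_position.columns))
```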

In [None]:
gdp_transpose2 = gdp_last_10years_df.transpose()
gdp_transpose2.describe()

## Extract the following information from this data

In [None]:
# Task: which country has the maximum GDP in 2011?

ignore_cols = [
    'World', 'Arab World', 'Caribbean small states', 'Central Europe and the Baltics', 'East Asia & Pacific', 
    'East Asia & Pacific (excluding high income)', 'Euro area', 'Europe & Central Asia', 
    'Europe & Central Asia (excluding high income)',
    'European Union', 'Fragile and conflict affected situations', 'Heavily indebted poor countries (HIPC)', 
    'Latin America & Caribbean', 'Latin America & Caribbean (excluding high income)', 
    'Least developed countries: UN classification', 'Middle East & North Africa', 
    'Middle East & North Africa (excluding high income)', 'North America', 'OECD members', 'Other small states',
    'Pacific island small states', 'Small states', 'South Asia', 'Sub-Saharan Africa', 
    'Sub-Saharan Africa (excluding high income)', 'High income', 'Low & middle income', 'Low income', 
    'Lower middle income', 'Middle income', 'Upper middle income', 'Post-demographic dividend', 'IDA & IBRD total',
    'IBRD only', 'Late-demographic dividend'
    
]

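# Drop the aggregate columns (World, regions, income groups) so that only countries remain;
# this list may not cover every aggregate in the file
gdp_trans_2 = gdp_transpose.drop(columns=ignore_cols)

# The country with the highest GDP in 2011
print(gdp_trans_2.loc['2011'].idxmax())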
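
Removing aggregates matters because a total such as `World` is always larger than any single country, so `idxmax` would return it. A minimal sketch on hypothetical values; `errors="ignore"` is shown as an option for the case where a name in the ignore list does not exist in a particular download, which would otherwise raise a `KeyError`.

```python
import pandas as pd

# Hypothetical GDP values for one year: two countries plus one aggregate
gdp_2011 = pd.DataFrame(
    {"World": [100.0], "Country X": [30.0], "Country Y": [25.0]}, index=["2011"]
)

# Without removing aggregates, the maximum is the aggregate itself
print(gdp_2011.loc["2011"].idxmax())  # World

# errors="ignore" skips names in the list that are not present ("Euro area" here)
countries_only = gdp_2011.drop(columns=["World", "Euro area"], errors="ignore")
print(countries_only.loc["2011"].idxmax())  # Country X
```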



In [None]:
# Task: get the average GDP of France from 2001 to 2010
france_gdp = gdp_trans_2['France']

# Select the years 2001 to 2010 by label and average them
france_gdp_sub = france_gdp.loc['2001':'2010']
print(france_gdp_sub.mean())
print(f"{france_gdp_sub.mean()/1e12:.3} Trillion")
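
Label slicing with `.loc` includes both endpoints, so `'2001':'2010'` covers all 10 years; positional slicing with `.iloc` excludes the stop position. A minimal sketch on a hypothetical series indexed by year strings, like `france_gdp`:

```python
import pandas as pd

# Hypothetical yearly series indexed by year strings (values are made up)
years = [str(y) for y in range(1999, 2013)]
s = pd.Series(range(len(years)), index=years, dtype=float)

# Label slicing includes both endpoints: 2001 through 2010 is 10 values
decade = s.loc["2001":"2010"]
print(len(decade), decade.index[0], decade.index[-1], decade.mean())

# Positional slicing stops before position 11, so only 2001 to 2009 are included
print(len(s.iloc[2:11]))
```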



In [None]:
# Task: show the country-wise percent change in GDP with the `pct_change()` function
print(gdp_transpose.head())
gdp_transpose.pct_change()
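
Because countries are columns in `gdp_transpose`, `pct_change()` works down each column, comparing each year with the previous one: (value in year t minus value in year t-1) divided by value in year t-1. A minimal sketch on hypothetical values without missing data, checked against the same formula written by hand:

```python
import pandas as pd

# Hypothetical GDP for two countries in three consecutive years
toy = pd.DataFrame(
    {"A": [100.0, 110.0, 121.0], "B": [50.0, 40.0, 60.0]},
    index=["2016", "2017", "2018"],
)

# Year-over-year change within each column
change = toy.pct_change()

# The same computation by hand; the first row is NaN because there is no previous year
manual = toy / toy.shift(1) - 1
print(change)
```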