# Pandas

This section is all about the pandas package. Are you ready? Let's go!

You will find some small tasks in the sections below. Most of the code is hidden, and you can only see the output.

> Try to figure it out by yourself, or search for references. Being able to search for and find the information you need is an important skill that will benefit you throughout your career.





## Set up the environment

### Task: Import pandas under the alias pd

In [None]:
import pandas as pd

## Data Creation

### Task
Find the nominal GDP data [here](https://en.wikipedia.org/wiki/List_of_countries_by_GDP_(nominal)). For the top **10** countries, create a DataFrame containing each country's name, its GDP for 2021 as reported by the IMF in 2022 (in millions of US dollars), and its population (you can find the population data [here](https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population)).

In [None]:
# GDP: IMF figures in millions of US dollars; Population: number of people
df = pd.DataFrame({'Country':['United States', 'China', 'Japan', 'Germany', 'India', 'United Kingdom', 'France', 'Canada', 'Italy', 'Brazil'],
                   'GDP': [25346805, 19911593, 4912147, 4256540, 3534743, 3376003, 2936702, 2221218, 2058330, 1833274],
                   'Population':[332943701, 1425881285, 125502000, 83695430, 1417945080, 67081234, 67874000, 38856839, 58906742, 214956683]})

In [None]:
df

|   | Country | GDP | Population |
|---|---------|-----|------------|
| 0 | United States | 25346805 | 332943701 |
| 1 | China | 19911593 | 1425881285 |
| 2 | Japan | 4912147 | 125502000 |
| 3 | Germany | 4256540 | 83695430 |
| 4 | India | 3534743 | 1417945080 |
| 5 | United Kingdom | 3376003 | 67081234 |
| 6 | France | 2936702 | 67874000 |
| 7 | Canada | 2221218 | 38856839 |
| 8 | Italy | 2058330 | 58906742 |
| 9 | Brazil | 1833274 | 214956683 |
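Since the task asks for the top 10 economies, the rows should already be in descending GDP order. Here is a minimal sketch of checking that, and of re-deriving the ranking with `DataFrame.nlargest`, using the same values as the cell above (the variable name `top10` is just for illustration):

```python
import pandas as pd

# Same values as the notebook: GDP in US$ millions, Population in people
df = pd.DataFrame({
    'Country': ['United States', 'China', 'Japan', 'Germany', 'India',
                'United Kingdom', 'France', 'Canada', 'Italy', 'Brazil'],
    'GDP': [25346805, 19911593, 4912147, 4256540, 3534743,
            3376003, 2936702, 2221218, 2058330, 1833274],
    'Population': [332943701, 1425881285, 125502000, 83695430, 1417945080,
                   67081234, 67874000, 38856839, 58906742, 214956683],
})

# True if every GDP value is <= the one before it
print(df['GDP'].is_monotonic_decreasing)

# nlargest picks the 10 biggest GDP rows, sorted largest first
top10 = df.nlargest(10, 'GDP')
print(top10['Country'].tolist() == df['Country'].tolist())
```

If you typed the rows in a different order, `nlargest` (or `sort_values('GDP', ascending=False)`) gives you the ranked view without re-entering the data.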


In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 3 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Country     10 non-null     object
 1   GDP         10 non-null     int64 
 2   Population  10 non-null     int64 
dtypes: int64(2), object(1)
memory usage: 368.0+ bytes
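`df.info()` prints the number of rows, the columns, and their types. You can also get the same facts as Python values. Here is a small sketch that rebuilds the frame so the snippet runs on its own (`numeric_cols` is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['United States', 'China', 'Japan', 'Germany', 'India',
                'United Kingdom', 'France', 'Canada', 'Italy', 'Brazil'],
    'GDP': [25346805, 19911593, 4912147, 4256540, 3534743,
            3376003, 2936702, 2221218, 2058330, 1833274],
    'Population': [332943701, 1425881285, 125502000, 83695430, 1417945080,
                   67081234, 67874000, 38856839, 58906742, 214956683],
})

# (rows, columns)
print(df.shape)

# Names of the columns pandas treats as numeric (Country is text, so it is excluded)
numeric_cols = df.select_dtypes(include='number').columns.tolist()
print(numeric_cols)
```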


## Data Accessing

### Task: List the countries and their GDP

In [None]:
df[['Country', 'GDP']]

|   | Country | GDP |
|---|---------|-----|
| 0 | United States | 25346805 |
| 1 | China | 19911593 |
| 2 | Japan | 4912147 |
| 3 | Germany | 4256540 |
| 4 | India | 3534743 |
| 5 | United Kingdom | 3376003 |
| 6 | France | 2936702 |
| 7 | Canada | 2221218 |
| 8 | Italy | 2058330 |
| 9 | Brazil | 1833274 |
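Note the double brackets in `df[['Country', 'GDP']]`: the inner brackets form a list of column labels, and selecting with a list returns a DataFrame. A single label returns a Series instead. Here is a quick sketch of the difference (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['United States', 'China', 'Japan', 'Germany', 'India',
                'United Kingdom', 'France', 'Canada', 'Italy', 'Brazil'],
    'GDP': [25346805, 19911593, 4912147, 4256540, 3534743,
            3376003, 2936702, 2221218, 2058330, 1833274],
})

gdp_series = df['GDP']    # one label   -> pandas Series (1-D)
gdp_frame = df[['GDP']]   # list of labels -> pandas DataFrame (2-D)

print(type(gdp_series).__name__)  # Series
print(type(gdp_frame).__name__)   # DataFrame
print(gdp_frame.shape)            # (10, 1)
```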


### Task: List the countries and their population

In [None]:
df[['Country', 'Population']]

|   | Country | Population |
|---|---------|------------|
| 0 | United States | 332943701 |
| 1 | China | 1425881285 |
| 2 | Japan | 125502000 |
| 3 | Germany | 83695430 |
| 4 | India | 1417945080 |
| 5 | United Kingdom | 67081234 |
| 6 | France | 67874000 |
| 7 | Canada | 38856839 |
| 8 | Italy | 58906742 |
| 9 | Brazil | 214956683 |


### Task: List the first country with its GDP and population

In [None]:
df.loc[0]

Country       United States
GDP                25346805
Population        332943701
Name: 0, dtype: object
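`df.loc[0]` selects the row whose index *label* is 0. With the default 0 to 9 index, this is also the first row. After sorting, labels and positions diverge, and `iloc` is the positional counterpart. Here is a small sketch (the names `by_pop` and `by_name` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['United States', 'China', 'Japan', 'Germany', 'India',
                'United Kingdom', 'France', 'Canada', 'Italy', 'Brazil'],
    'Population': [332943701, 1425881285, 125502000, 83695430, 1417945080,
                   67081234, 67874000, 38856839, 58906742, 214956683],
})

# Sorting keeps the original labels but reorders the rows
by_pop = df.sort_values('Population', ascending=False)
print(by_pop.iloc[0]['Country'])   # China: first row by position
print(by_pop.loc[0, 'Country'])    # United States: row whose label is 0

# Using country names as the index lets you look rows up by name
by_name = df.set_index('Country')
print(by_name.loc['India', 'Population'])  # 1417945080
```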

## Data Selection

### Task: Show the countries whose GDP is more than 5 trillion dollars
> (Hint: the data source reports GDP in millions of US dollars, so 5 trillion dollars corresponds to GDP > 5,000,000.)




In [None]:
df[df['GDP'] > 5000000]

|   | Country | GDP | Population |
|---|---------|-----|------------|
| 0 | United States | 25346805 | 332943701 |
| 1 | China | 19911593 | 1425881285 |
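To avoid miscounting zeros when converting trillions into the source's unit (millions), you can name the conversion factor. Here is a minimal sketch. The constant `MILLIONS_PER_TRILLION` and the helper `countries_above` are hypothetical names for illustration. It also shows `DataFrame.query`, where `@threshold` refers to a Python variable:

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['United States', 'China', 'Japan', 'Germany', 'India',
                'United Kingdom', 'France', 'Canada', 'Italy', 'Brazil'],
    'GDP': [25346805, 19911593, 4912147, 4256540, 3534743,
            3376003, 2936702, 2221218, 2058330, 1833274],
})

MILLIONS_PER_TRILLION = 1_000_000  # 1 trillion dollars = 1,000,000 million dollars


def countries_above(frame, trillions):
    """Names of countries whose GDP (in US$ millions) exceeds `trillions` trillion dollars."""
    threshold = trillions * MILLIONS_PER_TRILLION
    return frame.loc[frame['GDP'] > threshold, 'Country'].tolist()


above_5t = countries_above(df, 5)
above_3t = countries_above(df, 3)
print(above_5t)   # ['United States', 'China']
print(above_3t)

# The same 5-trillion filter written with query()
threshold = 5 * MILLIONS_PER_TRILLION
via_query = df.query('GDP > @threshold')['Country'].tolist()
print(via_query)
```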


### Task: Show the countries whose GDP is more than 3 trillion dollars

In [None]:
df[df['GDP'] > 3000000]

|   | Country | GDP | Population |
|---|---------|-----|------------|
| 0 | United States | 25346805 | 332943701 |
| 1 | China | 19911593 | 1425881285 |
| 2 | Japan | 4912147 | 125502000 |
| 3 | Germany | 4256540 | 83695430 |
| 4 | India | 3534743 | 1417945080 |
| 5 | United Kingdom | 3376003 | 67081234 |


### Task: Show the countries whose population is more than 1 billion

In [None]:
df[df['Population'] > 1000000000]

|   | Country | GDP | Population |
|---|---------|-----|------------|
| 1 | China | 19911593 | 1425881285 |
| 4 | India | 3534743 | 1417945080 |


### Task: Show the countries whose population is less than 300 million

In [None]:
df[df['Population'] < 300000000]

|   | Country | GDP | Population |
|---|---------|-----|------------|
| 2 | Japan | 4912147 | 125502000 |
| 3 | Germany | 4256540 | 83695430 |
| 5 | United Kingdom | 3376003 | 67081234 |
| 6 | France | 2936702 | 67874000 |
| 7 | Canada | 2221218 | 38856839 |
| 8 | Italy | 2058330 | 58906742 |
| 9 | Brazil | 1833274 | 214956683 |


### Task: Show the countries whose population is between 300 million and 500 million

In [None]:
df[(df['Population'] > 300000000) & (df['Population'] < 500000000)]

|   | Country | GDP | Population |
|---|---------|-----|------------|
| 0 | United States | 25346805 | 332943701 |
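Each comparison in the combined filter needs its own parentheses because `&` binds more tightly than `>` and `<` in Python. `Series.between` offers a shorter alternative. Note that it includes both endpoints by default, unlike the strict `>` and `<` above. Here is a small sketch comparing the two (the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['United States', 'China', 'Japan', 'Germany', 'India',
                'United Kingdom', 'France', 'Canada', 'Italy', 'Brazil'],
    'Population': [332943701, 1425881285, 125502000, 83695430, 1417945080,
                   67081234, 67874000, 38856839, 58906742, 214956683],
})

pop = df['Population']

# Strict bounds: 300 million < population < 500 million
strict = df[(pop > 300_000_000) & (pop < 500_000_000)]

# between() is inclusive on both ends by default
inclusive = df[pop.between(300_000_000, 500_000_000)]

print(strict['Country'].tolist())     # ['United States']
print(inclusive['Country'].tolist())  # ['United States']
```

With this data both give the same answer, because no country sits exactly on a boundary.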


## Data Manipulation

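### Task: Create a new column called "GDP per capita" from the columns "GDP" and "Population"

> Note: because GDP is in millions of US dollars, the result is in millions of dollars per person. For example, 0.076129 for the United States means about $76,129 per person.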

In [None]:
df['GDP per capita'] = df['GDP'] / df['Population']
df

|   | Country | GDP | Population | GDP per capita |
|---|---------|-----|------------|----------------|
| 0 | United States | 25346805 | 332943701 | 0.076129 |
| 1 | China | 19911593 | 1425881285 | 0.013964 |
| 2 | Japan | 4912147 | 125502000 | 0.03914 |
| 3 | Germany | 4256540 | 83695430 | 0.050857 |
| 4 | India | 3534743 | 1417945080 | 0.002493 |
| 5 | United Kingdom | 3376003 | 67081234 | 0.050327 |
| 6 | France | 2936702 | 67874000 | 0.043267 |
| 7 | Canada | 2221218 | 38856839 | 0.057164 |
| 8 | Italy | 2058330 | 58906742 | 0.034942 |
| 9 | Brazil | 1833274 | 214956683 | 0.008529 |
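If you prefer GDP per capita in plain dollars, scale by one million before dividing. You can then rank the countries. Here is a minimal sketch. The column name `GDP per capita (USD)` is an illustrative choice and is not part of the task:

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['United States', 'China', 'Japan', 'Germany', 'India',
                'United Kingdom', 'France', 'Canada', 'Italy', 'Brazil'],
    'GDP': [25346805, 19911593, 4912147, 4256540, 3534743,
            3376003, 2936702, 2221218, 2058330, 1833274],
    'Population': [332943701, 1425881285, 125502000, 83695430, 1417945080,
                   67081234, 67874000, 38856839, 58906742, 214956683],
})

# GDP is in US$ millions, so multiply by 1,000,000 to get dollars per person
df['GDP per capita (USD)'] = df['GDP'] * 1_000_000 / df['Population']

# Highest GDP per capita first
ranked = df.sort_values('GDP per capita (USD)', ascending=False)
print(ranked.set_index('Country')['GDP per capita (USD)'].round(0))
```

The ranking shows that a large total GDP does not imply a high GDP per capita: China and India rank near the bottom.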


## Data Understanding

### Task: For each of the three numeric columns, show the following statistics:


1.   Count
1.   Max
1.   Min
1.   Mean
1.   Median
1.   Quantiles
     1.   25% quantile
     1.   50% quantile
     1.   75% quantile
1.   Variance
1.   Standard deviation (std)
1.   Total




In [None]:
df.describe()

|   | GDP | Population | GDP per capita |
|---|-----|------------|----------------|
| count | 10.0 | 10.0 | 10.0 |
| mean | 7038736.0 | 383364300.0 | 0.037681 |
| std | 8371837.0 | 554569300.0 | 0.023292 |
| min | 1833274.0 | 38856840.0 | 0.002493 |
| 25% | 2400089.0 | 67279430.0 | 0.019209 |
| 50% | 3455373.0 | 104598700.0 | 0.041203 |
| 75% | 4748245.0 | 303446900.0 | 0.050725 |
| max | 25346800.0 | 1425881000.0 | 0.076129 |
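`describe()` covers count, mean, std, min, the quartiles, and max. It does not report variance or the total, and it labels the median only as "50%". You can get every item in the task list in one call with `DataFrame.agg`, plus `DataFrame.quantile` for the quartiles. Here is a sketch (`numeric`, `stats`, and `quantiles` are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['United States', 'China', 'Japan', 'Germany', 'India',
                'United Kingdom', 'France', 'Canada', 'Italy', 'Brazil'],
    'GDP': [25346805, 19911593, 4912147, 4256540, 3534743,
            3376003, 2936702, 2221218, 2058330, 1833274],
    'Population': [332943701, 1425881285, 125502000, 83695430, 1417945080,
                   67081234, 67874000, 38856839, 58906742, 214956683],
})
df['GDP per capita'] = df['GDP'] / df['Population']

# Keep only the numeric columns so text columns do not break the reductions
numeric = df.select_dtypes(include='number')

# One row per statistic, one column per numeric column
stats = numeric.agg(['count', 'max', 'min', 'mean', 'median', 'var', 'std', 'sum'])
print(stats)

# 25%, 50% and 75% quantiles (linear interpolation, as in describe())
quantiles = numeric.quantile([0.25, 0.5, 0.75])
print(quantiles)
```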


In [None]:
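# numeric_only=True skips the Country text column; recent pandas versions raise an error without it
df.median(numeric_only=True)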

GDP               3.455373e+06
Population        1.045987e+08
GDP per capita    4.120348e-02
dtype: float64

In [None]:
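# numeric_only=True skips the Country text column; recent pandas versions raise an error without it
df.var(numeric_only=True)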

GDP               7.008765e+13
Population        3.075471e+17
GDP per capita    5.425327e-04
dtype: float64

In [None]:
df.sum()

Country           United StatesChinaJapanGermanyIndiaUnited King...
GDP                                                        70387355
Population                                               3833642994
GDP per capita                                             0.376813
dtype: object
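In the output above, `sum()` also "adds" the Country column by joining the strings together, which is not a meaningful total. Passing `numeric_only=True` restricts the sum to the numeric columns. Here is a small sketch (`totals` is an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({
    'Country': ['United States', 'China', 'Japan', 'Germany', 'India',
                'United Kingdom', 'France', 'Canada', 'Italy', 'Brazil'],
    'GDP': [25346805, 19911593, 4912147, 4256540, 3534743,
            3376003, 2936702, 2221218, 2058330, 1833274],
    'Population': [332943701, 1425881285, 125502000, 83695430, 1417945080,
                   67081234, 67874000, 38856839, 58906742, 214956683],
})

# Sum only the numeric columns; Country is left out
totals = df.sum(numeric_only=True)
print(totals)
```

Summing GDP per capita is also of limited use as a "total". The combined GDP divided by the combined population is the more meaningful aggregate figure.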