# Pandas II

#### Written for the QuantEcon Indian Summer Workshop (August 2022)
#### Author: [Shu Hu](https://shu-hu.com/intro.html)

We start with some imports:

In [1]:
!pip install --upgrade pandas-datareader



In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Let's consider the following dataframe ``df``.

In [3]:
url = "https://datascience.quantecon.org/assets/data/bball.csv"
df = pd.read_csv(url)

In [4]:
df

Unnamed: 0,Year,Player,Team,TeamName,Games,Pts,Assist,Rebound
0,2015,Curry,GSW,Warriors,79,30.1,6.7,5.4
1,2016,Curry,GSW,Warriors,79,25.3,6.6,4.5
2,2017,Curry,GSW,Warriors,51,26.4,6.1,5.1
3,2015,Durant,OKC,Thunder,72,28.2,5.0,8.2
4,2016,Durant,GSW,Warriors,62,25.1,4.8,8.3
5,2017,Durant,GSW,Warriors,68,26.4,5.4,6.8
6,2015,Ibaka,OKC,Thunder,78,12.6,0.8,6.8
7,2016,Ibaka,ORL,Magic,56,15.1,1.1,6.8
8,2016,Ibaka,TOR,Raptors,23,14.2,0.7,6.8


### Exercise 1 (Merge vs Group)

Consider the two dataframes ``df1`` and ``df2``, obtained by slicing ``df``.

In [5]:
df1 = df[0:3]
df2 = df[3:]

In [6]:
df1

Unnamed: 0,Year,Player,Team,TeamName,Games,Pts,Assist,Rebound
0,2015,Curry,GSW,Warriors,79,30.1,6.7,5.4
1,2016,Curry,GSW,Warriors,79,25.3,6.6,4.5
2,2017,Curry,GSW,Warriors,51,26.4,6.1,5.1


In [7]:
df2

Unnamed: 0,Year,Player,Team,TeamName,Games,Pts,Assist,Rebound
3,2015,Durant,OKC,Thunder,72,28.2,5.0,8.2
4,2016,Durant,GSW,Warriors,62,25.1,4.8,8.3
5,2017,Durant,GSW,Warriors,68,26.4,5.4,6.8
6,2015,Ibaka,OKC,Thunder,78,12.6,0.8,6.8
7,2016,Ibaka,ORL,Magic,56,15.1,1.1,6.8
8,2016,Ibaka,TOR,Raptors,23,14.2,0.7,6.8


### Exercise 1.1 (Merge: concat)

Concatenate ``df1`` and ``df2`` together.
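As a hint, here is a minimal sketch of row-wise concatenation on two small made-up frames (not ``df1`` and ``df2``). The names ``top``, ``bottom`` and ``stacked`` are illustrative only. ``pd.concat`` stacks the frames and keeps each frame's original index labels.

```python
import pandas as pd

# Two toy frames with the same columns (illustrative data, not from bball.csv)
top = pd.DataFrame({"Player": ["A", "A"], "Pts": [10.0, 12.0]}, index=[0, 1])
bottom = pd.DataFrame({"Player": ["B"], "Pts": [8.0]}, index=[2])

# Stack the rows of both frames; the original index labels are preserved
stacked = pd.concat([top, bottom])
print(stacked)
```

The same pattern should carry over to any list of frames that share column names.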

### Exercise 1.2 (Group)

Group the dataframe ``df`` by column ``Player`` and apply the ``sum()`` function to the resulting groups.
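The grouping pattern can be sketched on a small made-up frame (``toy`` and ``totals`` are illustrative names). Here the numeric columns are selected before summing. On the full ``df``, a bare ``sum()`` may, depending on the pandas version, also try to aggregate text columns such as ``Team``.

```python
import pandas as pd

# Toy frame (illustrative values, not from bball.csv)
toy = pd.DataFrame({
    "Player": ["A", "A", "B"],
    "Games": [10, 20, 5],
    "Pts": [1.5, 2.5, 4.0],
})

# Select the numeric columns explicitly, then sum within each player
totals = toy.groupby("Player")[["Games", "Pts"]].sum()
print(totals)
```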

### Exercise 1.3 (Group)

Group the dataframe ``df`` by column ``Player`` and apply the ``mean()`` function to the resulting groups.
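One thing to watch for: averaging a text column is not meaningful, and recent pandas versions raise an error when asked to do so. A minimal sketch on made-up data (``toy`` and ``averages`` are illustrative names) that sidesteps this with ``numeric_only=True``:

```python
import pandas as pd

# Toy frame with one text column and one numeric column (illustrative values)
toy = pd.DataFrame({
    "Player": ["A", "A", "B"],
    "Team": ["X", "Y", "Z"],
    "Pts": [10.0, 20.0, 8.0],
})

# numeric_only=True skips the text column "Team" instead of trying to average it
averages = toy.groupby("Player").mean(numeric_only=True)
print(averages)
```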

### Exercise 2

Please read the following QuantEcon lecture before starting:
- https://python-programming.quantecon.org/pandas.html

From the lecture, we know that we can use [pandas-datareader](https://pandas-datareader.readthedocs.io/en/latest/) to access online data.

In [11]:
from pandas_datareader import wb

### Exercise 2.1

Use the [wb.search()](https://pandas-datareader.readthedocs.io/en/latest/readers/world-bank.html?highlight=search#pandas_datareader.wb.search) method to find the indicator for gross domestic product (GDP) per capita in constant 2015 US$.

(Hint: use keywords, such as ``GDP`` or ``capita``.)
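As far as the pandas-datareader documentation describes it, ``wb.search()`` treats its argument as a regular expression and matches it against indicator names, ignoring case by default. The sketch below imitates that matching offline on a tiny hand-made table. The ids ``EX.1`` to ``EX.3`` are hypothetical placeholders, not real World Bank ids, and ``indicators``, ``pattern`` and ``matches`` are illustrative names.

```python
import pandas as pd

# Hand-made stand-in for a table of indicators (ids are hypothetical)
indicators = pd.DataFrame({
    "id": ["EX.1", "EX.2", "EX.3"],
    "name": [
        "GDP per capita (constant 2015 US$)",
        "GDP per capita (current US$)",
        "Population, total",
    ],
})

# Regex: "gdp", then anything, then "capita", then anything, then "constant"
pattern = r"gdp.*capita.*constant"
matches = indicators[indicators["name"].str.contains(pattern, case=False, regex=True)]
print(matches)
```

With a live connection, the analogous call would be along the lines of ``wb.search('gdp.*capita.*const')``; narrowing the pattern step by step helps keep the result list short.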

### Exercise 2.2
Use the id you obtained in Exercise 2.1 to acquire GDP per capita data
- for the United States (``US``), Australia (``AU``) and India (``IN``)
- from 2000 to 2022,

and store the data in a dataframe called ``dat``.

### Exercise 2.3

Calculate the average GDP per capita over the period for each of these three countries.

(Hint: use ``.groupby`` as in Exercise 1.)
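Frames returned by ``wb.download`` are, to the best of our knowledge, indexed by a ``(country, year)`` MultiIndex with one column per indicator. Under that assumption, grouping on the index level rather than on a column is one way to proceed. The frame ``toy_dat``, its numbers and the column name ``EX.GDP`` below are made up for illustration.

```python
import pandas as pd

# Toy stand-in for `dat`: (country, year) index, one indicator column.
# The numbers and the column name "EX.GDP" are made up for illustration.
index = pd.MultiIndex.from_product(
    [["A", "B"], ["2001", "2000"]], names=["country", "year"]
)
toy_dat = pd.DataFrame({"EX.GDP": [110.0, 100.0, 55.0, 45.0]}, index=index)

# Average over years within each country by grouping on the index level
country_means = toy_dat.groupby(level="country").mean()
print(country_means)
```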

### Exercise 2.4

Plot the GDP per capita from 2000 to 2022 as time series for the three countries.
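Plotting one line per country is easiest when each country is its own column. A minimal sketch of that reshaping step, again on a made-up frame with an assumed ``(country, year)`` MultiIndex and a hypothetical column name ``EX.GDP``:

```python
import pandas as pd

# Toy stand-in for `dat` (made-up numbers, hypothetical column name)
index = pd.MultiIndex.from_product(
    [["A", "B"], ["2001", "2000"]], names=["country", "year"]
)
toy_dat = pd.DataFrame({"EX.GDP": [110.0, 100.0, 55.0, 45.0]}, index=index)

# Move countries into columns so each column is one time series
wide = toy_dat["EX.GDP"].unstack(level="country")
# Convert year labels to integers and put them in chronological order
wide.index = wide.index.astype(int)
wide = wide.sort_index()
print(wide)
# wide.plot() would then draw one line per country (requires matplotlib)
```

Sorting by year matters because downloaded data is not guaranteed to arrive in chronological order.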

### Exercise 3 (Climate change vs GDP growth)

Next, let's compare GDP per capita with the share of droughts, floods and extreme temperatures around the world.

### Exercise 3.1

Use the [wb.search()](https://pandas-datareader.readthedocs.io/en/latest/readers/world-bank.html?highlight=search#pandas_datareader.wb.search) method to find the indicator for the share of droughts, floods and extreme temperatures.

### Exercise 3.2 

Acquire the GDP per capita and the share of extreme weather for **ALL** available countries in 2009, using the ``id``s from Exercises 2.1 and 3.1.

Store the acquired data in a dataframe called ``df`` and name the columns ``gdp`` and ``eweather``, respectively.

Here, a higher value in the ``eweather`` column means that the corresponding country experienced more extreme weather.
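The renaming step can be sketched as follows on a made-up frame. The ids ``EX.GDP`` and ``EX.WEATHER`` are hypothetical placeholders for the ids found in Exercises 2.1 and 3.1, and ``raw`` and ``tidy`` are illustrative names. Dropping rows with missing values is an optional extra that assumes some countries lack one of the two indicators.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a two-indicator result; ids and values are hypothetical
raw = pd.DataFrame(
    {"EX.GDP": [100.0, 50.0, np.nan], "EX.WEATHER": [0.5, np.nan, 2.0]},
    index=pd.Index(["A", "B", "C"], name="country"),
)

# Give the columns short names, then keep only countries with both values
tidy = raw.rename(columns={"EX.GDP": "gdp", "EX.WEATHER": "eweather"}).dropna()
print(tidy)
```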

### Exercise 3.3 

Use the [statsmodels](https://www.statsmodels.org/stable/regression.html) package to assess the relation between ``gdp`` and ``eweather`` using ordinary least squares regression.

In [25]:
import statsmodels.formula.api as sm
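A minimal sketch of the formula interface on made-up data, constructed so that ``gdp = 2 + 3 * eweather`` holds exactly. The sketch uses the alias ``smf`` rather than ``sm`` only to avoid confusion with ``statsmodels.api``; ``toy`` and ``results`` are illustrative names.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Toy data built to satisfy gdp = 2 + 3 * eweather exactly (not real data)
toy = pd.DataFrame({"eweather": [0.0, 1.0, 2.0, 3.0]})
toy["gdp"] = 2.0 + 3.0 * toy["eweather"]

# Regress gdp on eweather; the formula adds an intercept automatically
results = smf.ols("gdp ~ eweather", data=toy).fit()
print(results.params)
```

On real data, ``results.summary()`` reports the coefficient estimates together with standard errors and fit statistics.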