# A. Assignment details

- What: Download, manipulate, and merge two or more of the datasets listed below
- How: Create functions in a Python file to do this
- Show: Show how to use that Python file in a notebook

More details here: reports/assignments/milestone_1/extracredit_explainer.md


## 1.1 Download data

In [66]:
from io import BytesIO

import pandas as pd
import requests

def download_worldbank(indicator, countries, date_start, date_end):
    """Download one World Bank indicator for the given countries and year range."""
    url_base = 'http://api.worldbank.org/v2/'  # Base URL for the World Bank API
    country_codes = ';'.join(countries)  # Join the country codes with ';' into one string
    # Build the URL with the start and end dates
    url = url_base + f'country/{country_codes}/indicator/{indicator}?date={date_start}:{date_end}&per_page=30000'
    # Alternative: uncomment the next line to ignore the start/end dates
    # url = url_base + f'country/{country_codes}/indicator/{indicator}?per_page=30000'

    response = requests.get(url)  # Download data from the URL
    response.raise_for_status()  # Raise on an HTTP error instead of parsing an error page
    df = pd.read_xml(BytesIO(response.content))  # Parse the XML response into a DataFrame
    return df  # Return the DataFrame
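
The URL that `download_worldbank` requests can be checked without touching the network. The sketch below factors the string-building step into a hypothetical helper, `build_worldbank_url`. It is not part of the original function, and it assumes the same base URL and query parameters used above.

```python
def build_worldbank_url(indicator, countries, date_start, date_end,
                        url_base="http://api.worldbank.org/v2/"):
    """Return the request URL for one indicator and a list of country codes.

    Hypothetical helper: it mirrors the string built inside download_worldbank.
    """
    country_codes = ";".join(countries)  # several codes are joined with ';'
    return (
        f"{url_base}country/{country_codes}/indicator/{indicator}"
        f"?date={date_start}:{date_end}&per_page=30000"
    )


# Two countries in one request; no download happens here.
url = build_worldbank_url("NE.TRD.GNFS.ZS", ["US", "JP"], 2019, 2023)
print(url)
```

Printing the URL before downloading makes it easy to paste into a browser and inspect the raw response when a call returns an unexpected table.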

In [89]:
df = download_worldbank("NE.TRD.GNFS.ZS", ["US"], 2019, 2023)
df.info()
df

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 8 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   indicator        5 non-null      object 
 1   country          5 non-null      object 
 2   countryiso3code  5 non-null      object 
 3   date             5 non-null      int64  
 4   value            5 non-null      float64
 5   unit             0 non-null      float64
 6   obs_status       0 non-null      float64
 7   decimal          5 non-null      int64  
dtypes: float64(3), int64(2), object(3)
memory usage: 452.0+ bytes


| | indicator | country | countryiso3code | date | value | unit | obs_status | decimal |
|---|---|---|---|---|---|---|---|---|
| 0 | Trade (% of GDP) | United States | USA | 2023 | 24.899363 | | | 0 |
| 1 | Trade (% of GDP) | United States | USA | 2022 | 26.89169 | | | 0 |
| 2 | Trade (% of GDP) | United States | USA | 2021 | 25.213656 | | | 0 |
| 3 | Trade (% of GDP) | United States | USA | 2020 | 23.079778 | | | 0 |
| 4 | Trade (% of GDP) | United States | USA | 2019 | 26.258481 | | | 0 |


In [90]:
df_1 = download_worldbank("SP.POP.TOTL", ["US"], 2019, 2023)
df_1.info()
df_1

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 8 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   indicator        5 non-null      object 
 1   country          5 non-null      object 
 2   countryiso3code  5 non-null      object 
 3   date             5 non-null      int64  
 4   value            5 non-null      int64  
 5   unit             0 non-null      float64
 6   obs_status       0 non-null      float64
 7   decimal          5 non-null      int64  
dtypes: float64(2), int64(3), object(3)
memory usage: 452.0+ bytes


| | indicator | country | countryiso3code | date | value | unit | obs_status | decimal |
|---|---|---|---|---|---|---|---|---|
| 0 | Population, total | United States | USA | 2023 | 334914895 | | | 0 |
| 1 | Population, total | United States | USA | 2022 | 333271411 | | | 0 |
| 2 | Population, total | United States | USA | 2021 | 332048977 | | | 0 |
| 3 | Population, total | United States | USA | 2020 | 331526933 | | | 0 |
| 4 | Population, total | United States | USA | 2019 | 328329953 | | | 0 |


## 1.2 Manipulate data

In [91]:
df = df.query("date in [2020, 2022]")[['countryiso3code', 'date', 'value']].dropna()
df.tail(2)

| | countryiso3code | date | value |
|---|---|---|---|
| 1 | USA | 2022 | 26.89169 |
| 3 | USA | 2020 | 23.079778 |


In [92]:
df_1 = df_1.query("date in [2020, 2022]")[['countryiso3code', 'date', 'value']].dropna()
df_1.tail(2)

| | countryiso3code | date | value |
|---|---|---|---|
| 1 | USA | 2022 | 333271411 |
| 3 | USA | 2020 | 331526933 |


In [93]:
df = df.rename({"countryiso3code": 'country', "value": "Trade (% of GDP)"}, axis=1)  # 'date' already has the right name
df.head(2)

| | country | date | Trade (% of GDP) |
|---|---|---|---|
| 1 | USA | 2022 | 26.89169 |
| 3 | USA | 2020 | 23.079778 |


In [94]:
df_1 = df_1.rename({"countryiso3code": 'country', "value": "Population, total"}, axis=1)  # 'date' already has the right name
df_1.head(2)

| | country | date | Population, total |
|---|---|---|---|
| 1 | USA | 2022 | 333271411 |
| 3 | USA | 2020 | 331526933 |


In [95]:
df = df.set_index(['country', 'date'])
df.head(2)

| country | date | Trade (% of GDP) |
|---|---|---|
| USA | 2022 | 26.89169 |
| USA | 2020 | 23.079778 |


In [96]:
df_1 = df_1.set_index(['country', 'date'])
df_1.head(2)

| country | date | Population, total |
|---|---|---|
| USA | 2022 | 333271411 |
| USA | 2020 | 331526933 |


In [97]:
df.sort_index().tail(2)


| country | date | Trade (% of GDP) |
|---|---|---|
| USA | 2020 | 23.079778 |
| USA | 2022 | 26.89169 |


In [98]:
df_1.sort_index().tail(2)

| country | date | Population, total |
|---|---|---|
| USA | 2020 | 331526933 |
| USA | 2022 | 333271411 |
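
Because the assignment asks for functions in a Python file, the filter, rename, and index steps in this section could be bundled into one reusable function. The sketch below is one possible way to do that. `tidy_indicator` is a hypothetical name, and the small input table is a stand-in built from values in the outputs above, not a fresh download.

```python
import pandas as pd


def tidy_indicator(df, value_name, years):
    """Filter to `years`, keep the key columns, and index by (country, date).

    Hypothetical helper that bundles the query, rename, and set_index steps.
    """
    out = df[df["date"].isin(years)][["countryiso3code", "date", "value"]].dropna()
    out = out.rename(columns={"countryiso3code": "country", "value": value_name})
    return out.set_index(["country", "date"]).sort_index()


# Stand-in for a downloaded table (assumed shape: the three columns used above).
raw = pd.DataFrame({
    "countryiso3code": ["USA", "USA", "USA"],
    "date": [2022, 2021, 2020],
    "value": [26.89169, 25.213656, 23.079778],
})
trade = tidy_indicator(raw, "Trade (% of GDP)", [2020, 2022])
print(trade)
```

Unlike the notebook cells, this version returns the sorted frame, so the ordering persists in later steps.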


## 1.3 Merge data

In [99]:
df_merge = pd.merge(
    df,
    df_1,
    left_index=True,
    right_index=True,
    how='inner',
)
df_merge.tail(2)

| country | date | Trade (% of GDP) | Population, total |
|---|---|---|---|
| USA | 2022 | 26.89169 | 333271411 |
| USA | 2020 | 23.079778 | 331526933 |
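
The assignment allows merging more than two datasets, so the same index-on-index join can be folded over a list of frames. The sketch below assumes every frame is already indexed by `(country, date)`. `merge_indicators` is a hypothetical helper, and the `validate="one_to_one"` check is an added safeguard that is not in the cell above.

```python
from functools import reduce

import pandas as pd


def merge_indicators(frames):
    """Inner-join any number of frames that share a (country, date) index."""
    return reduce(
        lambda left, right: pd.merge(
            left, right,
            left_index=True, right_index=True,
            how="inner",
            validate="one_to_one",  # raise if an index key is duplicated
        ),
        frames,
    )


# Stand-in frames built from the values shown in the outputs above.
index = pd.MultiIndex.from_tuples(
    [("USA", 2020), ("USA", 2022)], names=["country", "date"]
)
trade = pd.DataFrame({"Trade (% of GDP)": [23.079778, 26.89169]}, index=index)
population = pd.DataFrame({"Population, total": [331526933, 333271411]}, index=index)

merged = merge_indicators([trade, population])
print(merged)
```

An inner join keeps only the country-year pairs present in every frame, which matches the `how = 'inner'` choice in the cell above.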


## Global datasets


* **Penn World Table (PWT):** Provides purchasing power parity and national income accounts data.
    * **Link:** [http://www.ggdc.net/pwt](http://www.ggdc.net/pwt)

* **UNCTADstat (UNCTAD):** Offers data on trade, investment, and development.
    * **Link:** [https://unctadstat.unctad.org/](https://unctadstat.unctad.org/)

* **FAOSTAT (FAO):** Provides data on food, agriculture, forestry, and related areas.
    * **Link:** [http://www.fao.org/faostat/en/#data](http://www.fao.org/faostat/en/#data)

* **ILOSTAT (ILO):** Offers labor statistics, including employment and wages.
    * **Link:** [https://ilostat.ilo.org/](https://ilostat.ilo.org/)

* **Federal Reserve Economic Data (FRED):** Economic and financial data from the Federal Reserve.
    * **Link:** [https://fred.stlouisfed.org/](https://fred.stlouisfed.org/)

* **Bank for International Settlements (BIS):** Data and statistics on banking, financial markets, and the global economy.
    * **Link:** [https://www.bis.org/statistics/index.htm](https://www.bis.org/statistics/index.htm)

* **International Monetary Fund (IMF) Data:** A variety of macroeconomic and financial data.
    * **Link:** [https://www.imf.org/en/data](https://www.imf.org/en/data)

* **Eurostat:** Provides a wide range of statistics on the European Union, including economic, social, and demographic data.
    * **Link:** [https://ec.europa.eu/eurostat/](https://ec.europa.eu/eurostat/)

* **World Bank Data:** Comprehensive data on development indicators across countries.
    * **Link:** [https://data.worldbank.org/](https://data.worldbank.org/)

* **Varieties of Democracy (V-Dem):** Datasets measuring various aspects of democracy across countries.
    * **Link:** [https://v-dem.net/](https://v-dem.net/)

* **OECD Data:** Statistics and indicators from the Organisation for Economic Co-operation and Development, covering a broad range of topics.
    * **Link:** [https://data.oecd.org/](https://data.oecd.org/)

* **Quality of Government (QoG):** Datasets on governance, institutions, and quality of government.
    * **Link:** [https://qog.gu.se/](https://qog.gu.se/)

* **IPCC Data:** Climate change data and scenarios from the Intergovernmental Panel on Climate Change.
    * (Note: Access to data varies, often through specific reports or data portals linked within the IPCC website)
    * **Link:** [https://www.ipcc.ch/](https://www.ipcc.ch/)

* **Our World in Data:** Research and data on global development, poverty, health, and other topics.
    * **Link:** [https://ourworldindata.org/](https://ourworldindata.org/)

* **UN Comtrade:** United Nations Commodity Trade Statistics Database.
    * **Link:** [https://comtrade.un.org/](https://comtrade.un.org/)


## Academic datasets

* **Dallas Fed Global Economic Indicators (DGEI):** [DGEI](https://www.dallasfed.org/research/international/dgei)
* **Dallas Fed International House Price Database:** [House Price](https://www.dallasfed.org/research/international/houseprice#data)
* **New York Fed r* (Natural Rate of Interest):** [r*](https://www.newyorkfed.org/research/policy/rstar)
* **Chinn-Ito Index (KAOPEN):** [KAOPEN](https://web.pdx.edu/~ito/Chinn-Ito_website.htm)
* **Metrick-Schmelzing Paper and Database (Long-Term Real Rates):** [Real Rates](https://som.yale.edu/centers/program-on-financial-stability/metrick-schmelzing-paper-and-database)
* **Yale Program on Financial Stability COVID-19 Tracker:** [COVID-19 Tracker](https://som.yale.edu/centers/program-on-financial-stability/covid-19-tracker)
* **IMF Financial Integration:** [Financial Integration](https://www.imf.org/en/Publications/WP/Issues/2017/05/10/International-Financial-Integration-in-the-Aftermath-of-the-Global-Financial-Crisis-44906)
* **IMF Macroprudential Policy Survey:** [Macroprudential](https://www.elibrary-areaer.imf.org/Macroprudential/Pages/Home.aspx)
* **Nancy Xu's Risk Aversion Index:** [Risk Aversion](https://www.nancyxu.net/risk-aversion-index)
* **Jorda-Schularick-Taylor Macrohistory Database:** [Macrohistory](https://www.nber.org/research/data/jorda-schularick-taylor-macrohistory)
* **Central Bank Independence (CBI) Data:** [CBI Data](https://sites.google.com/site/carogarriga/cbi-data-1)
* **Global Inflation Data:** [Inflation Data](https://www.worldbank.org/en/research/brief/inflation-database)

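## U.S. datasets

* **Surveys of Consumers (University of Michigan):** [Survey tables](https://www.sca.isr.umich.edu/tables.html)
* **Bureau of Economic Analysis (BEA):** [BEA data](https://www.bea.gov/data)
* **Job Openings and Labor Turnover Survey (BLS):** [JOLTS](https://www.bls.gov/jlt/)
* **ADP National Employment Report:** [ADP Employment Report](https://adpemploymentreport.com/)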

## Japan datasets

* **Economy Watchers Survey (Cabinet Office):** [Watchers Survey](https://www5.cao.go.jp/keizai3/watcher.html)
* **Consumption Trend Index (ESRI, Cabinet Office):** [Consumption Index](https://www.esri.cao.go.jp/en/stat/shouhi/shouhi-e.html)
* **Prefectural Accounts (ESRI, Cabinet Office):** [Prefectural Accounts](https://www.esri.cao.go.jp/jp/sna/sonota/kenmin/kenmin_top.html)
* **Insurance Statistics (General Insurance Association of Japan):** [Insurance Stats](https://www.sonpo.or.jp/en/statistics/index.html)
* **FSA Policy Response (Financial Services Agency):** [FSA Response](https://www.fsa.go.jp/news/r1/20200313-2.html)
* **Franchise Industry Data (Japan Franchise Association):** [Franchise Data](https://www.jfnet.or.jp/data/data_c.html)
* **Consumption Trends of Foreign Visitors (MLIT, Japan Tourism Agency):** [Visitor Spending](https://www.mlit.go.jp/kankocho/tokei_hakusyo/gaikokujinshohidoko.html)
* **Tax and Stamp Revenues (Ministry of Finance):** [Tax Revenues](https://www.mof.go.jp/tax_policy/reference/taxes_and_stamp_revenues/data.htm)
* **Life Insurance Statistics (Life Insurance Association of Japan):** [Life Insurance](https://www.seiho.or.jp/english/statistics/)
* **Economic Growth (Tokyo Foundation for Policy Research):** [GDP](https://www.tkfd.or.jp/research/detail.php?id=2983)
* **Consumer Price Index (Statistics Bureau of Japan):** [CPI](https://www.stat.go.jp/english/data/cpi/1588.html#his)
* **Bank Financial Statements (Financial Services Agency):** [Bank Statements](https://www.fsa.go.jp/status/ginkou_kessan/index.html)
* **Budget Revenue and Expenditure (Ministry of Finance):** [Budget Data](https://www.mof.go.jp/policy/budget/report/revenue_and_expenditure/index.htm)
* **Trade Statistics (Japan Customs):** [Trade Stats](https://www.customs.go.jp/toukei/info/index_e.htm)
* **Banking Statistics (Japanese Bankers Association):** [Banking Stats](https://www.zenginkyo.or.jp/stats/year2-02/)
* **Household Expenditure Survey (Statistics Bureau):** [Household Spending](https://www.stat.go.jp/data/kakei/longtime/index.html#time)
* **Economic Statistics (Federation of Economic Organizations):** [Econ Stats](https://www.zenkeijikyo.or.jp/statistics)
* **Regional Banks:** [Regional Banks data](https://www.chiginkyo.or.jp/news_topics/)
* **Japan Economic Policy Uncertainty Index:** [Policy Uncertainty](https://www.policyuncertainty.com/japan_monthly.html)