# <center> Econ 373: Computational Economics (with Python) </center>
## <center> Homework 4 (group) </center>

# <font color='red'>Names:
Kogta, Shivani;
Peng, Yuanhang;
Ponsot, Gabriel;
Rodriguez, Alex;
Shen, Weijia</font>

# <font color='red'>Instructions:</font>
- Save all of your code to a .ipynb file (jupyter notebook file) and name it as **username_hw4.ipynb**, where username is the username of the group member who submits the homework. 
    - **You should remove any test cells/code that is outside of functions.**
    - Submit only username_hw4.ipynb file
- For each question, your file should contain a function labeled **q#** with input/output requirements specified below. 
    - The input refers to the arguments passed to the function. 
    - The output refers to what is returned by the function.
    - We may require output to file or screen within a function, but if that is the case it will be clearly specified.
    - Your functions may call other functions or classes that you create, but they have to be included in the file (i.e., the file that you submit should be self-contained).
    - If your function calls on functions from other libraries, you need to load them within the function (e.g., if you use the os library you should assume that it has been installed on the computer but it has not been imported before calling your function).
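
As a minimal sketch of the last requirement, a function can import what it needs inside its own body, so it still runs in a fresh kernel where nothing has been imported yet. The name `example_q` and the path it uses are hypothetical and not part of the assignment.

```python
def example_q():
    # Hypothetical function: the import sits inside the body, so the call
    # works even if no import cell has been run beforehand.
    import os

    # os.path.basename returns the final component of a path
    return os.path.basename("/tmp/data/file.csv")


print(example_q())  # file.csv
```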
    

## Grading

- We will run your file by clicking Kernel --> Restart and Run All. Your file should be able to reproduce all the results stored in your jupyter file. 

- We may also run your code by specifying q#(arg) in an empty cell. It should reproduce your stored results. 

Each question is graded on a 3-point scale, plus 1 point for following the instructions:
- 0 -- no or minimal work submitted (e.g., minor modification of the 'starting point')
- 1 -- some work done but there are errors running/executing the code or results are mostly incomplete
- 2 -- code runs, but results are either somewhat incomplete, incorrect, or there is clear room for improvement (e.g., no comments in the code, graphs are not labelled, etc.) 
- 3 -- all results complete and correct with clear commented code 

In [1]:
#libraries that will be used in this HW 
import os
import shutil as sh
import pandas as pd
#you can add other libraries as needed

In [2]:
# To find your working directory:
%pwd 
# Code in case you want to change your working directory:  %cd
# for example: %cd "C:\Users\\Purdue\ComputationalEconomicsECON320\Week3_4\Group\"
%cd "/Users/shivanikogta/Downloads/ECON373/"
# Define your data folder here:
datafolder = '/Users/shivanikogta/Downloads/ECON373/'
# Please use an absolute path

/Users/shivanikogta/Downloads/ECON373


In [3]:
%pwd 

'/Users/shivanikogta/Downloads/ECON373'

# Question 1
Write a function called **q1** to merge the gdppercapita data with the life expectancy data for the year 2000. 

- **Input**: none
- **Output**: a data frame with 3 columns: country, gdppercapita, life expectancy

<font color='red'> Your column names should be exactly: ['Country', 'GDP Per Capita', 'Life Expectancy']  </font>

*Make sure that life_expectancy_years.csv and gdppercapita.csv are in the data folder.*
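
When two frames share a column name other than the merge key, `pd.merge` appends `_x` to the left frame's column and `_y` to the right frame's column by default. The sketch below uses made-up toy data (not the homework files) to show which suffix belongs to which frame; the frame names and values are assumptions for illustration only.

```python
import pandas as pd

# Made-up toy data: both frames have a '2000' column besides the key
left = pd.DataFrame({"country": ["A", "B"], "2000": [1.0, 2.0]})
right = pd.DataFrame({"country": ["A", "B"], "2000": [10.0, 20.0]})

# Default suffixes: '_x' marks the left frame, '_y' marks the right frame
merged = pd.merge(left, right, on="country")
print(list(merged.columns))  # ['country', '2000_x', '2000_y']

# Explicit suffixes make the origin of each column easier to read
named = pd.merge(left, right, on="country", suffixes=("_left", "_right"))
print(list(named.columns))  # ['country', '2000_left', '2000_right']
```

Passing `suffixes` explicitly is one way to avoid mixing up the two columns when they are renamed later.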

In [4]:
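def q1():
    import pandas as pd  # loaded inside the function, as the instructions require

    # Load the data from the data folder using read_csv
    life_expectancy = pd.read_csv(datafolder + 'life_expectancy_years.csv')  # life expectancy by country and year
    gdppercapita = pd.read_csv(datafolder + 'gdppercapita.csv')  # GDP per capita by country and year

    # Merge on 'country'. GDP is the left frame, so its year columns get the
    # '_gdp' suffix; life expectancy is the right frame and gets '_life'.
    merged_data = pd.merge(gdppercapita, life_expectancy, on='country', suffixes=('_gdp', '_life'))

    # Keep the year 2000 only, in the required order (country, GDP, life expectancy)
    merged_data_2000 = merged_data[['country', '2000_gdp', '2000_life']].copy()

    # Rename the columns
    merged_data_2000.columns = ['Country', 'GDP Per Capita', 'Life Expectancy']

    return merged_data_2000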

# Question 2
Write a function called **q2** to find the country with the highest average life expectancy between 1900 and 1950. 

- **Input**: none
- **Output**: country name.

*Make sure that life_expectancy_years.csv is in the data folder.*
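
The general pattern here is a row-wise mean over a set of year columns, followed by `idxmax` to pick the label of the largest value. A minimal sketch on made-up toy data (the countries and numbers are assumptions, not taken from the homework file):

```python
import pandas as pd

# Made-up toy data with two year columns
toy = pd.DataFrame(
    {
        "country": ["A", "B", "C"],
        "1900": [30.0, 40.0, 35.0],
        "1901": [32.0, 41.0, 50.0],
    }
)

# Index by country so that idxmax returns a country name, then average across years
row_means = toy.set_index("country")[["1900", "1901"]].mean(axis=1)

best = row_means.idxmax()
print(best)  # C
```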

In [5]:
def q2():
    import pandas as pd  # loaded inside the function, as the instructions require

    # Load the data from the data folder using read_csv
    life_expectancy = pd.read_csv(datafolder + 'life_expectancy_years.csv')

    # Column labels for the years 1900 to 1950 (inclusive)
    year_columns = [str(year) for year in range(1900, 1951)]

    # Average life expectancy for each country over those years, indexed by country name
    average_life_expectancy = life_expectancy.set_index('country')[year_columns].mean(axis=1)

    # Find the country with the highest average life expectancy
    country_name = average_life_expectancy.idxmax()

    return country_name

# Question 3
Write a function called **q3** to find the correlation between GDP growth and the lag 1 housing price index between 1980 and 2010.

- **Input**: none
- **Output**: calculated number

*Make sure that FRED_GDP.csv and USSTHPI.csv are in the data folder.*
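
A lag-1 series is the same series shifted down by one period, and period-over-period growth can be computed with `pct_change`. The sketch below illustrates both on a made-up quarterly frame; the column names `gdp` and `hpi` and all values are assumptions, not the contents of the FRED files.

```python
import pandas as pd

# Made-up quarterly data on a quarter-start date index
toy = pd.DataFrame(
    {
        "gdp": [100.0, 110.0, 121.0, 127.05, 139.755],
        "hpi": [50.0, 52.0, 51.0, 55.0, 58.0],
    },
    index=pd.date_range("2000-01-01", periods=5, freq="QS"),
)

# Growth rate: (current / previous) - 1; the first row has no previous value, so it is NaN
toy["gdp_growth"] = toy["gdp"].pct_change()

# Lag 1: each row holds the previous quarter's value; the first row is NaN
toy["lag_hpi"] = toy["hpi"].shift(1)

# Series.corr drops pairs with missing values before computing the Pearson correlation
corr = toy["gdp_growth"].corr(toy["lag_hpi"])
print(toy[["gdp_growth", "lag_hpi"]])
print(corr)
```

Computing the lag before restricting the sample to a date range keeps the first in-range observation from losing its lagged value, provided earlier data exist.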

In [6]:
def q3():
    import pandas as pd  # loaded inside the function, as the instructions require

    # Read GDP data
    gdp = pd.read_csv(datafolder+"FRED_GDP.csv")
    gdp['DATE'] = pd.to_datetime(gdp['DATE'])
    gdp.set_index('DATE', inplace=True)
    
    # Create lagged GDP
    gdp['LAG_GDP'] = gdp['GDP'].shift(1)
    
    # Calculate GDP growth
    gdp['GDP_Growth'] = (gdp['GDP'] / gdp['LAG_GDP']) - 1
    
    # Get the relevant subset of GDP data (between 1980 and 2010)
    gdp_subset = gdp.loc['1980-01-01':'2010-12-31']
    
    # Read HPI data
    hpi = pd.read_csv(datafolder+"USSTHPI.csv")
    hpi['DATE'] = pd.to_datetime(hpi['DATE'])
    hpi.set_index('DATE', inplace=True)
    
    # Create lagged HPI
    hpi['LAG_HPI'] = hpi['USSTHPI'].shift(1)
    
    # Get the relevant subset of HPI data (between 1980 and 2010)
    hpi_subset = hpi.loc['1980-01-01':'2010-12-31']
    
    # Merge or combine two datasets
    merged_data = pd.concat([gdp_subset['GDP_Growth'], hpi_subset['LAG_HPI']], axis=1)
    
    # Find the correlation between GDP growth and lagged HPI
    correlation = merged_data['GDP_Growth'].corr(merged_data['LAG_HPI'])
    
    return correlation
