In [None]:
# pip install otter-grader

In [None]:
pip install openpyxl

In [None]:
# Initialize Otter
# If you need to install Otter, please uncomment and run the previous cell
import otter
grader = otter.Notebook("ps2.ipynb")

# Econ 144 – Problem Set 2

In [None]:
import numpy as np
import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt

## Problem 1. Efficient Markets Hypothesis I

In this problem, we make a first attempt to answer the question of whether the stock market efficiently uses information in valuing stocks. 

The Efficient Markets Hypothesis (EMH) maintains that current stock prices fully reflect all available information. An implication of this hypothesis is that returns in the current period should not be systematically related to information known in earlier periods. Otherwise, we could use this information to predict stock returns, thus violating the EMH. As an analyst at an investment management company, you have been tasked with examining the validity of the EMH.

You have data on the universe of CRSP common stocks (your "estimation universe"). The data includes monthly observations of returns and several of the financial ratios most commonly used by academic researchers. Financial ratios can be grouped into seven broad categories: *Capitalization, Efficiency, Financial Soundness, Liquidity, Profitability, Solvency, Valuation*. Your dataset includes one financial ratio from each of these categories:

1. **totdebt_invcap** (total debt/invested capital): total debt (long-term and current) as a fraction of invested capital (category: capitalization)
2. **sale_invcap** (sales/invested capital): sales per dollar of invested capital (category: efficiency)
3. **fcf_ocf** (free cash flow/operating cash flow): free cash flow as a fraction of operating cash flow, where free cash flow is defined as the difference between operating cash flow and capital expenditures (category: financial soundness)
4. **cash_ratio** (cash ratio): cash and short-term investments as a fraction of current liabilities (category: liquidity)
5. **roe** (return on equity): net income as a fraction of average book equity based on most recent two periods, where book equity is defined as the sum of total parent stockholders' equity and deferred taxes and investment tax credit (category: profitability)
6. **de_ratio** (total debt/equity): total liabilities to shareholders’ equity (common and preferred) (category: solvency)
7. **bm** (book/market): book value of equity as a fraction of market value of equity (category: valuation)

Other variables in the dataset:

1. **permno**: CRSP company identifier
2. **date**: observation date
3. **ticker**: company ticker
4. **price**: stock price
5. **secret**: net simple return of the security from end of previous month to end of observation month
6. **vwretd**: return of CRSP value-weighted index (including dividends)
7. **vwretx**: return of CRSP value-weighted index (excluding dividends)

The data for this exercise is in the file `finratios.xlsx`. The file contains 13 monthly observations for the stocks in the estimation universe from October 2022 through October 2023. 


In [None]:
rawdata = pd.read_excel('finratios.xlsx')
rawdata.head()

For this problem, we will work with a single cross section (October 2023) of the data that contains a subset of the financial ratios in the dataset. We need to shift the financial ratios by a month so that we will be regressing month $t$ returns on month $t-1$ ratios, i.e., we would like to determine whether we can use information known at month $t-1$ to predict returns between month $t-1$ and month $t$.

In [None]:
data = pd.DataFrame({'permno'  : rawdata['permno'],
                     'retdate' : rawdata['date'],
                     'secret'  : rawdata['secret'],
                     'dkr'     : rawdata['totdebt_invcap'].shift(1),
                     'bm'      : rawdata['bm'].shift(1),
                     'sale'    : rawdata['sale_invcap'].shift(1),
                     'de'      : rawdata['de_ratio'].shift(1)
                   })
data = data.dropna(subset=['secret','dkr','bm','sale','de']) # drop rows with NaN values
data = data.loc[data['retdate'] == 20231031] # select the October 2023 cross section
data.describe()
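
The lag above shifts each ratio column down by one row. That lines up month $t-1$ ratios with month $t$ returns only if the rows are sorted by stock and date and each stock has consecutive months. As a minimal sketch, the following compares a plain shift with a per-stock shift on a made-up two-stock panel (the frame `toy` and its values are illustrative assumptions, not part of `finratios.xlsx`):

```python
import pandas as pd

# Hypothetical toy panel: two stocks, three months each (values are made up).
toy = pd.DataFrame({
    'permno': [1, 1, 1, 2, 2, 2],
    'date':   [20230831, 20230930, 20231031] * 2,
    'bm':     [0.5, 0.6, 0.7, 1.5, 1.6, 1.7],
})
toy = toy.sort_values(['permno', 'date'])

# Plain shift: the first row of stock 2 inherits the last ratio of stock 1.
toy['bm_lag_plain'] = toy['bm'].shift(1)

# Grouped shift: the lag never crosses from one stock to the next.
toy['bm_lag_by_stock'] = toy.groupby('permno')['bm'].shift(1)

print(toy)
```

The two versions agree for every row except the first observation of each stock, so they give the same result for the last cross section of a sorted, gap-free panel.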

<!-- BEGIN QUESTION -->

**Question 1.a.**
Plot a scatter diagram of stock returns (`secret`) against the (lagged) total debt/invested capital ratio (`dkr`). Put the stock returns on the vertical axis and `dkr` on the horizontal axis. What do you notice? 

*Hint: Think about OLS assumption (A3).*



<!--
BEGIN QUESTION
name: q1_a
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 1.b.**
Regress `secret` on `dkr` and a constant. Based on the regression results, what is your initial evaluation of the EMH with respect to the total debt/invested capital ratio?
    
<!--
BEGIN QUESTION
name: q1_b
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 1.c.**
Run the following code and recreate the scatter plot from the previous question. Describe what this code is doing. Comment on the differences between this scatter plot and the previous scatter plot. How many observations were dropped by running the provided code? Do you feel this is a significant number? 

<!--
BEGIN QUESTION
name: q1_c
manual: true
-->

In [None]:
data = data.loc[(data['secret'] <= np.quantile(data['secret'], 0.995)) & \
                (data['secret'] >= np.quantile(data['secret'], 0.005)) & \
                (data['dkr']    <= np.quantile(data['dkr'], 0.995)) & \
                (data['dkr']    >= np.quantile(data['dkr'], 0.005)) & \
                (data['sale']   <= np.quantile(data['sale'], 0.995)) & \
                (data['sale']   >= np.quantile(data['sale'], 0.005)) & \
                (data['bm']     <= np.quantile(data['bm'], 0.995)) & \
                (data['bm']     >= np.quantile(data['bm'], 0.005)) & \
                (data['de']     <= np.quantile(data['de'], 0.995)) & \
                (data['de']     >= np.quantile(data['de'], 0.005))]
data.describe()

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 1.d.**
Re-regress `secret` on `dkr` and a constant. Comment on the differences in the coefficient estimates and the standard errors between this regression and the previous regression -- do they make sense, given the visual evidence? Based on the regression results, has your initial evaluation of the EMH (with respect to the total debt/invested capital ratio) changed?
    
<!--
BEGIN QUESTION
name: q1_d
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 1.e.**
Now regress `secret` on `dkr`, the sales/invested capital ratio (`sale`), and a constant. Based on the results of this regression and the previous regression, what is the sign of the correlation between `dkr` and `sale`? Alternatively, is there not enough information to determine the sign of the correlation?
    
<!--
BEGIN QUESTION
name: q1_e
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 1.f.**
Based on the regression results, evaluate the EMH (with respect to the total debt/invested capital and sales/invested capital ratios).
    
<!--
BEGIN QUESTION
name: q1_f
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 1.g.**
Finally, regress `secret` on `dkr`, `sale`, the book/market ratio (`bm`), the total debt/equity ratio (`de`), and a constant. 

Based on the regression results, evaluate the EMH (with respect to all four ratios).

<!--
BEGIN QUESTION
name: q1_g
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 1.h.**
Set up and evaluate (i.e., run) a formal hypothesis test to support your conclusion in the previous question. What is your null hypothesis? Do you reject your null hypothesis?

<!--
BEGIN QUESTION
name: q1_h
manual: true
-->

_Type your answer here, replacing this text._

## Problem 2. Efficient Markets Hypothesis II

In this problem, we continue with our analysis of the EMH using the dataset from Problem 1. This time, however, we will make use of the entire dataset, not just a single cross section.


In [None]:
from linearmodels.panel import PanelOLS
from linearmodels.panel import PooledOLS

In [None]:
rawdata = pd.read_excel('finratios.xlsx')
rawdata.head()

<!-- BEGIN QUESTION -->

**Question 2.a.**
Run the following code and describe what this code is doing. 

<!--
BEGIN QUESTION
name: q2_a
manual: true
-->

In [None]:
data = pd.DataFrame({'permno'  : rawdata['permno'],
                     'retdate' : rawdata['date'],
                     'secret'  : rawdata['secret']-rawdata['vwretd'],
                     'dkr'     : rawdata['totdebt_invcap'].shift(1),
                     'bm'      : rawdata['bm'].shift(1),
                     'sale'    : rawdata['sale_invcap'].shift(1),
                     'de'      : rawdata['de_ratio'].shift(1)
                   })
data = data.dropna(subset=['secret','dkr','bm','sale','de']) # drop rows with NaN values
data = data.loc[data['retdate'] > 20221115]
data = data.loc[(data['secret'] <= np.quantile(data['secret'], 0.995)) & \
                (data['secret'] >= np.quantile(data['secret'], 0.005)) & \
                (data['dkr']    <= np.quantile(data['dkr'], 0.995)) & \
                (data['dkr']    >= np.quantile(data['dkr'], 0.005)) & \
                (data['sale']   <= np.quantile(data['sale'], 0.995)) & \
                (data['sale']   >= np.quantile(data['sale'], 0.005)) & \
                (data['bm']     <= np.quantile(data['bm'], 0.995)) & \
                (data['bm']     >= np.quantile(data['bm'], 0.005)) & \
                (data['de']     <= np.quantile(data['de'], 0.995)) & \
                (data['de']     >= np.quantile(data['de'], 0.005))]
data.info()

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 2.b.**
Run the following code and describe what this code is doing. 

<!--
BEGIN QUESTION
name: q2_b
manual: true
-->

In [None]:
permno = pd.DataFrame({'permno' : data['permno'].unique()})
nco = len(permno)
for i in range(nco):
    pdata = data.loc[data['permno'] == permno['permno'][i]]
    ndt = len(pdata)
    if ndt < 12:
        data = data.loc[data['permno'] != permno['permno'][i]]
data.info()

_Type your answer here, replacing this text._

<!-- END QUESTION -->
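
Per-group row counts can also be computed without an explicit loop. The following is a minimal sketch on a made-up panel. The frame `toy`, the threshold `min_obs`, and the values are illustrative assumptions.

```python
import pandas as pd

# Hypothetical toy panel: stock 1 has 3 months, stock 2 only 2 (values are made up).
toy = pd.DataFrame({
    'permno':  [1, 1, 1, 2, 2],
    'retdate': [20230831, 20230930, 20231031, 20230930, 20231031],
    'secret':  [0.01, -0.02, 0.03, 0.00, 0.01],
})

min_obs = 3  # illustrative threshold

# Count non-missing rows per stock and broadcast the count back to every row.
n_obs = toy.groupby('permno')['retdate'].transform('count')
kept = toy.loc[n_obs >= min_obs]

print(kept)
```

On a large panel this is typically much faster than filtering the frame once per stock.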

<!-- BEGIN QUESTION -->

**Question 2.c.**
For each cross section in the dataset, regress `secret` on `dkr`, `sale`, `bm`, `de`, and a constant. 

Are the estimates for the slope coefficients consistently statistically significant or consistently not statistically significant?

<!--
BEGIN QUESTION
name: q2_c
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 2.d.**
Run the following code and describe what this code is doing. 

<!--
BEGIN QUESTION
name: q2_d
manual: true
-->

In [None]:
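# Sorted list of the distinct return dates in the sample
dates = pd.DataFrame({'retdate' : np.sort(data['retdate'].unique())})
ndt = len(dates)
for i in range(ndt):
    data.insert(6+i,"m"+str(i+1),np.zeros(len(data)),True)
    data.loc[data['retdate'] == dates['retdate'][i], "m"+str(i+1)] = 1
data.set_index(['permno','retdate'], inplace=True)
data.info()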

_Type your answer here, replacing this text._

<!-- END QUESTION -->
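
pandas also offers a built-in way to create indicator columns from a categorical variable. A minimal sketch on made-up dates (the frame `toy` and the prefix `m` are illustrative assumptions):

```python
import pandas as pd

# Hypothetical date column with three distinct months (values are made up).
toy = pd.DataFrame({'retdate': [20230831, 20230930, 20231031, 20230831]})

# One indicator column per distinct date; dtype=float gives 0.0/1.0 entries.
dums = pd.get_dummies(toy['retdate'], prefix='m', dtype=float)

print(dums)
print(dums.sum(axis=1))   # every row has exactly one indicator equal to 1
```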

<!-- BEGIN QUESTION -->

**Question 2.e.**
Run the following code and describe what this code is doing. Why have we not included `m1` in the dummy variable list? Do you believe the standard errors are correct in this regression?

<!--
BEGIN QUESTION
name: q2_e
manual: true
-->

In [None]:
exog_vars = ["dkr","sale","bm","de"]
mnth_dums = ["m2","m3","m4","m5","m6","m7","m8","m9","m10","m11","m12"]
exog = sm.add_constant(data[exog_vars + mnth_dums])
mod = PooledOLS(data['secret'], exog)
res = mod.fit(cov_type='robust')
print(res)

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 2.f.**
Run the following code and describe what this code is doing. What type of variables are we controlling for in this regression? Do you believe the standard errors are correct in this regression? Are they bigger or smaller than the standard errors from the previous regression? Why?

<!--
BEGIN QUESTION
name: q2_f
manual: true
-->

In [None]:
exog_vars = ["dkr","sale","bm","de"]
exog = sm.add_constant(data[exog_vars])
mod = PanelOLS(data['secret'], exog, time_effects=True)
res = mod.fit(cov_type='clustered', cluster_entity=True)
print(res)

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 2.g.**
Based on the regression results, evaluate the EMH (with respect to all four ratios). Is your conclusion different from what you concluded in Problem 1 (based on a single cross section)? Are you surprised? Comment.

<!--
BEGIN QUESTION
name: q2_g
manual: true
-->

_Type your answer here, replacing this text._

## Problem 3. An AR(2) Process

Suppose $\{ x_t \}$ follows the AR(2) model

$x_t = 2.5 + 1.3 x_{t-1}- 0.4 x_{t-2} + \epsilon_t$

where $\epsilon_t$ is iid with $\mathbb{E}[\epsilon_t] = 0$ and $\sigma_{\epsilon}^2 = 9$.
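
A general AR(2) process $x_t = c + \phi_1 x_{t-1} + \phi_2 x_{t-2} + \epsilon_t$ is stationary when both roots of $1 - \phi_1 z - \phi_2 z^2 = 0$ lie outside the unit circle. The sketch below checks this condition numerically for made-up coefficients, deliberately not the ones in this problem. The values of `c`, `phi1`, and `phi2` are illustrative assumptions.

```python
import numpy as np

# Illustrative coefficients (assumptions, not those of Problem 3).
c, phi1, phi2 = 1.0, 0.5, 0.3

# Roots of 1 - phi1*z - phi2*z**2; np.roots expects the highest power first.
roots = np.roots([-phi2, -phi1, 1.0])
is_stationary = bool(np.all(np.abs(roots) > 1))

# Under stationarity, the unconditional mean solves mu = c + phi1*mu + phi2*mu.
mu = c / (1 - phi1 - phi2)

print(roots, is_stationary, mu)
```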


<!-- BEGIN QUESTION -->

**Question 3.a.**
Show that this process is stationary.

<!--
BEGIN QUESTION
name: q3_a
manual: true
-->

_Type or upload an image of your written answer here._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 3.b.**
Does this process have an $\mbox{MA}(\infty)$ representation? Explain.

<!--
BEGIN QUESTION
name: q3_b
manual: true
-->

_Type or upload an image of your written answer here._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 3.c.**
Compute the mean and variance of $x_t$.

<!--
BEGIN QUESTION
name: q3_c
manual: true
-->

_Type or upload an image of your written answer here._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 3.d.**
Compute the first three autocovariances of $x_t$.

<!--
BEGIN QUESTION
name: q3_d
manual: true
-->

_Type or upload an image of your written answer here._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 3.e.**
Compute the first three autocorrelations of $x_t$.

<!--
BEGIN QUESTION
name: q3_e
manual: true
-->

_Type or upload an image of your written answer here._

<!-- END QUESTION -->
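
For a stationary AR(2) process, the Yule-Walker equations give $\rho_1 = \phi_1/(1-\phi_2)$ and $\rho_k = \phi_1 \rho_{k-1} + \phi_2 \rho_{k-2}$ for $k \geq 2$, and the variance is $\gamma_0 = \sigma_{\epsilon}^2 / (1 - \phi_1 \rho_1 - \phi_2 \rho_2)$. A minimal sketch of the recursion, using made-up parameter values (`phi1`, `phi2`, and `sigma2` are illustrative assumptions, not the values in this problem):

```python
import numpy as np

# Illustrative parameters (assumptions, not those of Problem 3).
phi1, phi2, sigma2 = 0.5, 0.3, 1.0

# Autocorrelations at lags 0..3 from the Yule-Walker recursion.
rho = np.empty(4)
rho[0] = 1.0
rho[1] = phi1 / (1 - phi2)
for k in range(2, 4):
    rho[k] = phi1 * rho[k - 1] + phi2 * rho[k - 2]

# Variance, then autocovariances gamma_k = gamma_0 * rho_k.
gamma0 = sigma2 / (1 - phi1 * rho[1] - phi2 * rho[2])
gamma = gamma0 * rho

print(rho)
print(gamma)
```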

<!-- BEGIN QUESTION -->

**Question 3.f.**
Suppose $x_t = 102.3$ and $x_{t-1} = 98$. Compute the one-step-ahead forecast $\mathbb{E}_t[x_{t+1}]$, using the fact that $\mathbb{E}_t[\epsilon_{t+1}] = 0$.

<!--
BEGIN QUESTION
name: q3_f
manual: true
-->

_Type or upload an image of your written answer here._

## Problem 4. Constructing ARMA Models

In this problem, we use a monthly UK house price series from the file `UKHP.xlsx`. There are a total of 327 monthly observations running from January 1991 through March 2018. The objective of this problem is to build an ARMA model for the percentage house price changes. Recall that there are three stages involved: identification, estimation, and diagnostic checking. 

In [None]:
import statsmodels.tsa.api as smt

In [None]:
data = pd.read_excel('UKHP.xlsx',index_col=0)
data['dhp'] = data['Average House Price']. \
                  transform(lambda x: (x-x.shift(1))/x.shift(1)*100)
data = data.dropna()
data.head()

<!-- BEGIN QUESTION -->

**Question 4.a.**
Plot the data series for the monthly percentage change in the average house price between February 1991 and March 2018.

<!--
BEGIN QUESTION
name: q4_a
manual: true
-->

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 4.b.**
The first stage of constructing an ARMA model is carried out by looking at autocorrelation and partial autocorrelation functions to identify any structure in the data. 

Run the following code to generate the autocorrelation and partial autocorrelation functions. Test each of the individual correlation coefficients for significance, and test all 6 jointly using the Box-Pierce and Ljung-Box tests.

_**Note:** we drop the first value of both the acf and pacf series using **acf[1:]** and **pacf[1:]** because the **acf** function returns coefficients starting at lag 0._
<!--
BEGIN QUESTION
name: q4_b
manual: true
-->

In [None]:
acf,q,p = smt.acf(data['dhp'],nlags=6,qstat=True)
pacf = smt.pacf(data['dhp'],nlags=6)

correlogram = pd.DataFrame({'acf'  : acf[1:],
                            'pacf' : pacf[1:],
                            'Q'    : q
                           })
correlogram

_Type your answer here, replacing this text._

<!-- END QUESTION -->
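
To make the test statistics concrete, the sketch below computes them by hand on simulated white noise. With $T$ observations and $m$ lags, the Box-Pierce statistic is $Q = T \sum_{k=1}^{m} \hat{\rho}_k^2$ and the Ljung-Box statistic is $Q^* = T(T+2) \sum_{k=1}^{m} \hat{\rho}_k^2/(T-k)$. The series `x_sim` and the other names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
x_sim = rng.normal(size=300)     # simulated white noise (an assumption)
T, m = len(x_sim), 6

# Sample autocorrelations at lags 1..m.
xc = x_sim - x_sim.mean()
denom = np.sum(xc ** 2)
rho_hat = np.array([np.sum(xc[k:] * xc[:-k]) / denom for k in range(1, m + 1)])

# Individual test: |rho_k| beyond 1.96/sqrt(T) is significant at the 5% level.
band = 1.96 / np.sqrt(T)
significant = np.abs(rho_hat) > band

# Joint tests: both are asymptotically chi-squared with m degrees of freedom
# under the null of no autocorrelation (5% critical value for m = 6: 12.59).
q_bp = T * np.sum(rho_hat ** 2)
q_lb = T * (T + 2) * np.sum(rho_hat ** 2 / (T - np.arange(1, m + 1)))

print(rho_hat, significant)
print(q_bp, q_lb)
```

The Ljung-Box statistic always exceeds the Box-Pierce statistic in finite samples, because each term is scaled by $(T+2)/(T-k) > 1$.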

<!-- BEGIN QUESTION -->

**Question 4.c.**
Here we plot the autocorrelation and partial autocorrelation functions. Based on the visual evidence, do you think an AR(p), MA(q), or ARMA(p,q) model might best fit the data? Explain.


<!--
BEGIN QUESTION
name: q4_c
manual: true
-->

In [None]:
sm.graphics.tsa.plot_acf(data['dhp'], lags=20)
plt.show()
sm.graphics.tsa.plot_pacf(data['dhp'], lags=20)
plt.show()

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 4.d.**
As a next step in the identification stage, and to get familiar with some of the available tools, we fit an AR(2) -- i.e., an ARIMA(2,0,0) -- model to our data.

Are the estimated coefficients for the intercept and the autoregressive components statistically significant? Is the autoregressive process implied by this model stationary? Explain.

In theory, the output table would be discussed in a similar fashion to the simple linear regression model. However, in reality it is very difficult to interpret the parameter estimates in the sense of, for example, saying "a 1 unit increase in $X$ leads to a $\beta_1$ unit increase in $Y$." Because the construction of ARMA models is not based on any economic or financial theory, we often do not even try to interpret the individual parameter estimates. Instead, we focus on the plausibility of the model as a whole, and try to determine whether it describes the data well and produces accurate forecasts.

<!--
BEGIN QUESTION
name: q4_d
manual: true
-->

In [None]:
res = smt.ARIMA(data['dhp'], order=(2,0,0)).fit()
print(res.summary())

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 4.e.**
Now fit an MA(2) -- i.e., an ARIMA(0,0,2) -- model to our data.

Are the estimated coefficients for the intercept and the moving average components statistically significant? Is the moving average process implied by this model invertible? Explain.

<!--
BEGIN QUESTION
name: q4_e
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 4.f.**
Finally, fit an ARMA(1,1) -- i.e., an ARIMA(1,0,1) -- model to our data.

Are the estimated coefficients for the intercept, autoregressive, and moving average components statistically significant? Of the three models we have estimated so far, which model is the "best" model for fitting our time series? Explain.

<!--
BEGIN QUESTION
name: q4_f
manual: true
-->

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 4.g.**
We can now try to obtain the information criteria for many ARMA($p$,$q$) models simultaneously. 

Run the following code to compute the information criteria for all possible ARMA($p$,$q$) models from ARMA(0,0) to ARMA(5,5).

Based on these results, which model is the "best" model for fitting our time series? Explain.

<!--
BEGIN QUESTION
name: q4_g
manual: true
-->

In [None]:
res1 = smt.arma_order_select_ic(data['dhp'], \
                                max_ar=5, max_ma=5, \
                                ic=['aic','bic','hqic'], \
                                trend='c')

print('AIC')
print(res1.aic)
print(' ')
print('BIC')
print(res1.bic)
print(' ')
print('HQIC')
print(res1.hqic)

_Type your answer here, replacing this text._

## Problem 5. Forecasting

Suppose, based on our results from the previous problem, we select the AR(2) (i.e., the ARMA(2,0)) model for the house price changes. We would like to evaluate this model in terms of how well it forecasts changes in housing prices.

First we choose an "in sample" period for estimation and an "out of sample" period for testing. 

In [None]:
import statsmodels.tsa.api as smt
from statsmodels.graphics.tsaplots import plot_predict
from statsmodels.tsa.arima.model import ARIMA
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.metrics import mean_absolute_percentage_error

In [None]:
data = pd.read_excel('UKHP.xlsx',index_col=0)
data['dhp'] = data['Average House Price']. \
                  transform(lambda x: (x-x.shift(1))/x.shift(1)*100)
data = data.drop(columns=['Average House Price'])
data = data.dropna()
data.tail()

In [None]:
data_insample = data['1991-02-01':'2015-12-01']
data_insample.tail()

In [None]:
data_outsample = data['2016-01-01':'2018-03-01']
data_outsample.head()

<!-- BEGIN QUESTION -->

**Question 5.a.**
Estimate the ARMA(2,0) model and display the estimation results.

<!--
BEGIN QUESTION
name: q5_a
manual: true
-->

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 5.b.**
Now that we have estimated the model, we can produce forecasts for the period 2016-01-01 through 2018-03-01. 

Run the following code to plot the actual series, the out-of-sample predicted series, and the 95% confidence interval for the out-of-sample predicted series.

Based on the visual evidence, comment on the accuracy of the forecast (in your opinion).

<!--
BEGIN QUESTION
name: q5_b
manual: true
-->

In [None]:
model = smt.ARIMA(data['dhp'], order=(2,0,0))
res = model.fit()

fig, ax = plt.subplots()
ax = data.loc['2016-01-01':].plot(ax=ax)
fig = plot_predict(res,'2016-01-01','2018-03-01',dynamic=False,ax=ax)
plt.show()

_Type your answer here, replacing this text._

<!-- END QUESTION -->

<!-- BEGIN QUESTION -->

**Question 5.c.**
For a more quantitative measure of the accuracy of a forecast we use a few of the measures discussed in class: root mean squared error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE).

Estimate an ARMA(4,2) model (chosen based on the AIC results from problem 4) and compare the accuracy of the forecasts to the accuracy of the ARMA(2,0) model -- using RMSE, MAE, and MAPE. Which model is "better"? Explain.

<!--
BEGIN QUESTION
name: q5_c
manual: true
-->

In [None]:
pred = res.predict('2016-01-01','2018-03-01',dynamic=False)
rmse = np.sqrt(mean_squared_error(data_outsample['dhp'],pred))
mae  = mean_absolute_error(data_outsample['dhp'],pred)
mape = mean_absolute_percentage_error(data_outsample['dhp'],pred)
print('RMSE: {}'.format(rmse))
print('MAE:  {}'.format(mae))
print('MAPE: {}'.format(mape))

_Type your answer here, replacing this text._

<!-- END QUESTION -->
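
The three accuracy measures are simple functions of the forecast errors. The sketch below writes out the definitions on made-up numbers. The arrays `y_true` and `y_hat` are illustrative assumptions.

```python
import numpy as np

# Made-up actual values and forecasts (assumptions for illustration).
y_true = np.array([0.5, -0.2, 1.0, 0.4])
y_hat  = np.array([0.4,  0.1, 0.7, 0.4])

err = y_true - y_hat
rmse_toy = np.sqrt(np.mean(err ** 2))       # root mean squared error
mae_toy  = np.mean(np.abs(err))             # mean absolute error
mape_toy = np.mean(np.abs(err / y_true))    # a fraction, not a percentage

print(rmse_toy, mae_toy, mape_toy)
```

Because MAPE divides by the actual value, it becomes very large when actual values are close to zero, which can happen with a series of percentage changes. Note also that `mean_absolute_percentage_error` in scikit-learn returns a fraction rather than a value on a 0-100 scale.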

<!-- BEGIN QUESTION -->

**Question 5.d.**
Exponential smoothing is another approach to forecasting (distinct from the ARMA approach) that uses only a linear combination of previous values of the time series to forecast future values.

The following code produces one-step-ahead forecasts based on exponential smoothing, where $\alpha$ is the smoothing constant.

Can you find an $\alpha$ that produces more accurate forecasts than either of the ARMA models from above? Carefully justify your answer.

<!--
BEGIN QUESTION
name: q5_d
manual: true
-->

In [None]:
# Simple exponential smoothing
alpha = 0.25
forecast = np.zeros(len(data_outsample))
actual = np.zeros(len(data_outsample))
j = 0
for d, row1 in data_outsample.iterrows():
    data_insample = data[data.index < d]
    firstrow = True
    actual[j] = row1['dhp']
    for i, row2 in data_insample.iterrows():
        if firstrow:
            forecast[j] = row2['dhp']
            firstrow = False
        else:
            forecast[j] = (1-alpha)*forecast[j] + alpha*row2['dhp']
    # print('sample size: ', len(data_insample))
    # print('forecast dhp:', forecast[j])
    # print('actual dhp:  ', actual[j])
    j = j + 1

plt.figure()
plt.plot(actual, label='Actual DHP')
plt.plot(forecast, label='Forecast DHP')
plt.legend()
plt.show()

_Type your answer here, replacing this text._

<!-- END QUESTION -->
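
The smoothing recursion $s_t = (1-\alpha) s_{t-1} + \alpha x_t$, started at $s_0 = x_0$, can also be obtained from pandas. The sketch below checks the equivalence on a made-up series. The names `alpha_toy` and `hist` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

alpha_toy = 0.25
hist = pd.Series([1.0, 2.0, 0.5, 1.5, 3.0])   # made-up history, oldest first

# Explicit recursion: start at the first observation, then update.
s = hist.iloc[0]
for v in hist.iloc[1:]:
    s = (1 - alpha_toy) * s + alpha_toy * v

# Same recursion via pandas; adjust=False selects the recursive form.
s_ewm = hist.ewm(alpha=alpha_toy, adjust=False).mean().iloc[-1]

print(s, s_ewm)
```

The last smoothed value is the one-step-ahead forecast for the next period.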

<!-- BEGIN QUESTION -->

**Question 5.e.**
Exponential smoothing with a half-life is a common approach used in risk forecasting.

The following code produces one-step-ahead forecasts based on exponential smoothing with a half-life.

Can you find a half-life that produces more accurate forecasts than either of the ARMA models and simple exponential smoothing? Carefully justify your answer.

<!--
BEGIN QUESTION
name: q5_e
manual: true
-->

In [None]:
# Exponential smoothing using a halflife
halflife = 1
delta = np.power(0.5,1/halflife)
t = len(data)-1
forecast = np.zeros(len(data_outsample))
actual = np.zeros(len(data_outsample))
j = 0
for d, row1 in data_outsample.iterrows():
    data_insample = data[data.index < d]
    actual[j] = row1['dhp']
    wghtsum = 0
    sumwght = 0
    t = len(data_insample)-1
    for i, row2 in data_insample.iterrows():
        sumwght = sumwght + np.power(delta,t)
        wghtsum = wghtsum + np.power(delta,t)*row2['dhp']
        t = t-1
    forecast[j] = wghtsum/sumwght
    # print('sample size:  ', len(data_insample))
    # print('forecast dhp: ', forecast[j])
    # print('actual dhp:   ', actual[j])
    j = j + 1
    
plt.figure()
plt.plot(actual, label='Actual DHP')
plt.plot(forecast, label='Forecast DHP')
plt.legend()
plt.show()

_Type your answer here, replacing this text._
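
With a half-life $h$, the decay factor is $\delta = 0.5^{1/h}$, so an observation that is $h$ periods old receives half the weight of the most recent one. The sketch below computes the normalized weighted average on a made-up series and compares it with the pandas equivalent. The names `halflife_toy`, `delta_toy`, and `hist` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

halflife_toy = 3
delta_toy = 0.5 ** (1 / halflife_toy)
hist = pd.Series([1.0, 2.0, 0.5, 1.5, 3.0])   # made-up history, oldest first

# Weight delta**age, where age = 0 for the most recent observation.
age = np.arange(len(hist))[::-1]
w = delta_toy ** age
wavg = np.sum(w * hist.to_numpy()) / np.sum(w)

# pandas equivalent: adjust=True normalizes by the sum of the weights.
wavg_ewm = hist.ewm(halflife=halflife_toy, adjust=True).mean().iloc[-1]

print(wavg, wavg_ewm)
```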

## Submission

Make sure you have run all cells in your notebook in order before running the cell below, so that all images/graphs appear in the output. The cell below will generate a PDF file for you to submit. **Please save before exporting!**

In [None]:
# Save your notebook first, then run this cell to export your submission.
grader.to_pdf(pagebreaks=False, display_link=True)