# Problem Statement

1. Calculate $ c_{0}, c_{1}, c_{2}, c_{3}, r_{1}, r_{2}, r_{3} $ for the series given in Exercise 2.1.
2. Make a graph of $ r_{k} $, k = 0, 1, 2, 3

3. My interpretation:
    - For the time series (TS) in Exercise 2.1, find the estimates of both
        1. $c_k$ - autocovariance (ACov) $(\hat{\gamma}_{k})$
        2. $r_k$ - autocorrelation (ACor) $(\hat{\rho}_{k})$
    - Do this for k = 0, 1, 2, 3
    - Graph the autocorrelations against k
        - x-axis : k
        - y-axis : $r_k$
---
- 2.4.1 is using python on book's data
- 2.4.1 is using python on [LEW test data](https://www.itl.nist.gov/div898/handbook/eda/section3/eda35c.htm)
- 2.4.1 is using pandas on [LEW test data](https://www.itl.nist.gov/div898/handbook/eda/section3/eda35c.htm)

# Questions + Further Exploration
1. [LEW dataset](https://www.itl.nist.gov/div898/handbook/eda/section3/eda35c.htm)

# TODOs

1. [x] Imports + Load Data

2. [ ] (2.1.12) **Estimate the ACov @ lag k :** $c_k = \hat{\gamma}_{k} = \dfrac{1}{N} \sum_{t=1}^{N - k} (z_{t} - \bar{z})(z_{t+k} - \bar{z}), \quad k = 0, 1, 2, \ldots, K$ (gamma hat)

3. [ ] (2.1.11) **Estimate the ACor @ lag k :** $r_k = \hat{\rho}_{k} = \dfrac{c_k}{c_0}$ (rho hat)

- [ ] Graph $ r_k $

## 1. Imports + Load Data

In [2]:
import pandas as pd

In [5]:
test_data = pd.DataFrame([1, 2, 3])
# test_data

In [6]:
data = pd.read_csv("lewDataset.csv")
# data

In [None]:
# another way to show the lag
# z_t = data
# df = pd.concat([data.shift(2), data.shift(1), z_t], axis = 1)
# df.columns=["zt-2", "zt-1", "zt"]
# df

In [19]:
k = range(0, len(data))

df = pd.concat([data, data.shift(-1), data.shift(-2), data.shift(-3)], axis = 1)
df.columns=["zt", "zt+1", "zt+2", "zt+3"]
# print(df)

3


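- Use the `shift` method on the DataFrame to build the lagged columns (a sliding window): `shift(-k)` places $z_{t+k}$ in the same row as $z_t$, and the last k rows become `NaN`.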
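
To see what `shift(-k)` does before applying it to the LEW data, here is a minimal sketch on a made-up five-point series (the values and the names `toy` and `lagged` are illustrative assumptions, not part of the exercise). It also suggests why the `.sum()` calls used later effectively stop at $t = N - k$: pandas skips the `NaN` rows that the shift leaves at the bottom.

```python
import pandas as pd

# Hypothetical toy series, used only to illustrate the alignment.
toy = pd.Series([10, 20, 30, 40, 50], name="zt")

# shift(-k) moves every value up by k rows, so row t holds z_{t+k}.
lagged = pd.concat([toy, toy.shift(-1), toy.shift(-2)], axis=1)
lagged.columns = ["zt", "zt+1", "zt+2"]
print(lagged)

# The last k rows of a shifted column are NaN; Series.sum() skips NaN by
# default (skipna=True), so the product below has only N - k = 3 terms.
products = lagged["zt"] * lagged["zt+2"]
n_terms = int(products.notna().sum())
total = products.sum()
print(n_terms, total)
```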

## Estimate the ACov @ lag k


\begin{align}
\hat{\gamma}_{k} = \frac{1}{N} \sum_{t=1}^{N - k} (z_{t} - \bar{z})(z_{t+k} - \bar{z}),
\quad k = 0, 1, 2, \ldots, K
\end{align}

where

\begin{align}
\bar{z} = \frac{1}{N} \sum_{t=1}^{N} z_t
\end{align}
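
The estimator can also be written directly from the formula, without building shifted columns. The sketch below is one possible implementation, assuming NumPy is available; the function name `autocovariance` and the short `sample` series are hypothetical and used only for illustration.

```python
import numpy as np

def autocovariance(z, max_lag):
    """Return [c_0, ..., c_max_lag] using the 1/N estimator (2.1.12)."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    dev = z - z.mean()  # z_t - z_bar
    # For lag k, pair dev[t] with dev[t + k] for t = 0 .. n - k - 1,
    # then divide by N (not N - k), as in the formula.
    return np.array(
        [np.sum(dev[: n - k] * dev[k:]) / n for k in range(max_lag + 1)]
    )

# Hypothetical sample values, not the book's series.
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
c = autocovariance(sample, max_lag=3)
r = c / c[0]  # r_k = c_k / c_0 (2.1.11)
print(c)
print(r)
```

Dividing by N at every lag, rather than by N - k, follows the formula as stated; the slices `dev[: n - k]` and `dev[k:]` play the same role as the `zt` and `zt+k` columns.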



In [8]:
N = len(df)
print(N)
sample_mean = df["zt"].mean()
print(sample_mean)
print(df)

200
-177.435
      zt   zt+1   zt+2   zt+3
0   -213 -564.0  -35.0  -15.0
1   -564  -35.0  -15.0  141.0
2    -35  -15.0  141.0  115.0
3    -15  141.0  115.0 -420.0
4    141  115.0 -420.0 -360.0
..   ...    ...    ...    ...
195 -385  198.0 -218.0 -536.0
196  198 -218.0 -536.0   96.0
197 -218 -536.0   96.0    NaN
198 -536   96.0    NaN    NaN
199   96    NaN    NaN    NaN

[200 rows x 4 columns]


In [9]:
new_df = df - sample_mean
new_df

Unnamed: 0,zt,zt+1,zt+2,zt+3
0,-35.565,-386.565,142.435,162.435
1,-386.565,142.435,162.435,318.435
2,142.435,162.435,318.435,292.435
3,162.435,318.435,292.435,-242.565
4,318.435,292.435,-242.565,-182.565
...,...,...,...,...
195,-207.565,375.435,-40.565,-358.565
196,375.435,-40.565,-358.565,273.435
197,-40.565,-358.565,273.435,
198,-358.565,273.435,,


In [13]:
zt_vs_zt = (new_df["zt"] * new_df["zt"]).sum() / N
zt_vs_zt1 = (new_df["zt"] * new_df["zt+1"]).sum() / N
zt_vs_zt2 = (new_df["zt"] * new_df["zt+2"]).sum() / N
zt_vs_zt3 = (new_df["zt"] * new_df["zt+3"]).sum() / N

c_ks = pd.DataFrame([zt_vs_zt, zt_vs_zt1, zt_vs_zt2, zt_vs_zt3], columns=["cks"])
c_ks

Unnamed: 0,cks
0,76528.565775
1,-23517.595646
2,-56657.944042
3,59285.855337


In [14]:
# c_0 ("c nought"): the lag-0 autocovariance
c_not = c_ks.loc[0, "cks"]
c_not

76528.56577499998

In [15]:
rho_hat_k = c_ks / c_not
rho_hat_k.columns=["r_ks"]
rho_hat_k

Unnamed: 0,r_ks
0,1.0
1,-0.307305
2,-0.74035
3,0.774689
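
The remaining TODO is the graph of $r_k$. A minimal sketch, assuming matplotlib is installed; the four $r_k$ values are re-entered from the output above so the block runs on its own (in the notebook, `rho_hat_k["r_ks"]` could be passed instead). The stem-style layout is one choice among several, not something the exercise prescribes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line inside Jupyter
import matplotlib.pyplot as plt

lags = [0, 1, 2, 3]
r_k = [1.0, -0.307305, -0.74035, 0.774689]  # copied from the r_ks output above

fig, ax = plt.subplots()
ax.vlines(lags, 0, r_k)        # one vertical bar per lag
ax.plot(lags, r_k, "o")        # marker at each r_k
ax.axhline(0, linewidth=0.8)   # reference line at zero
ax.set_xticks(lags)
ax.set_xlabel("lag k")
ax.set_ylabel("r_k")
ax.set_title("Sample autocorrelation, k = 0..3")
# In Jupyter the figure renders inline; elsewhere, save it with fig.savefig(...).
```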


In [None]:
def acov(N, z_bar, df):
    """Given the lagged data, estimate the autocovariance at each lag

    Parameters:
    N -- int, number of observations
    z_bar -- float, sample mean of the series
    df -- pd DataFrame with columns "zt", "zt+1", ... built with shift(-k)

    Return:
    sums -- pd Series, sum over t of (z_t - z_bar)(z_{t+k} - z_bar) per lag
    c_k -- pd Series, sums / N (the autocovariance estimates)
    """

    # Center every column on the sample mean
    centered = df - z_bar

    # Multiply each lagged column by the centered "zt" column, row by row,
    # then sum; the NaN rows left by the shift are skipped by sum()
    sums = centered.mul(centered["zt"], axis=0).sum()

    return sums, sums / N

In [None]:
gamma_hat_k,  gamma_hat_kn= acov(N, sample_mean, df)
gamma_hat_k

In [None]:
gamma_hat_kn

## Estimate the ACor @ lag k

In [None]:
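def acor(c_k, c_0):
    """Given the autocovariances, calculate the autocorrelation

    Parameters:
    c_k -- autocovariance estimate(s) at lag k (scalar, pd Series, or pd DataFrame)
    c_0 -- float, autocovariance estimate at lag 0

    Return:
    autocorrelation -- same type as c_k
    """

    rho_hat_k = c_k / c_0

    return rho_hat_k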

In [None]:
# c_ks["cks"] holds c_0..c_3 (In [13]); c_not is c_0 (In [14])
acor(c_ks["cks"], c_not)