# Problem Statement

1. On the assumption that $\rho_j = 0$ for $j > 2$, obtain the following :
    - Approximate standard errors for $r_1$, $r_2$, and $r_j$, $j > 2$.
    - The approximate correlation between $r_3$ and $r_5$.

2. My interpretation : 
    - The problem primarily deals w/ section 2.1.6. Given a TS, we want to check whether $\rho_k = 0$ beyond a certain lag (distance apart). *q* is the lag beyond which $\rho_k$ is 0 (or tends to 0).
    - At what lag k is there no longer a relationship between observations k steps apart, i.e. where do the AutoCors die out?
---
# Questions/Further Exploration

1. Differences between the following expressions (notebook [here](https://github.com/Brinkley97/random_code/blob/main/foilVsNpSum.ipynb)) : 
    - gamma_k = np.sum(z[t] - z_bar * z[t + k] - z_bar)
    - gamma_k = z[t] - z_bar * z[t + k] - z_bar
    - gamma_k = (z[t] - z_bar) * (z[t + k] - z_bar) 
    - gamma_k = np.sum(z[t] - z_bar) * (z[t + k] - z_bar) 
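The four expressions above differ mainly in operator precedence and in what `np.sum` receives. The following is a minimal sketch on a short, hypothetical series (the values and the names `no_parens`, `with_parens`, `summed_scalar`, `summed_products` are illustrative only, not taken from the linked notebook):

```python
import numpy as np

z = np.array([200.0, 202.0, 208.0, 204.0])  # hypothetical short series, for illustration only
z_bar = z.mean()
t, k = 0, 1

# Without parentheses, * binds tighter than -, so this is z[t] - (z_bar * z[t + k]) - z_bar
no_parens = z[t] - z_bar * z[t + k] - z_bar

# With parentheses, this is the product of the two deviations from the mean
with_parens = (z[t] - z_bar) * (z[t + k] - z_bar)

# np.sum of a single scalar returns that scalar, so it changes nothing here
summed_scalar = np.sum(z[t] - z_bar * z[t + k] - z_bar)

# To sum the products over every valid t, pass array slices instead of one index t
dev = z - z_bar
summed_products = np.sum(dev[:len(z) - k] * dev[k:])

print(no_parens, with_parens, summed_scalar, summed_products)
```

Only the parenthesized product matches the term inside the sum in (2.1.12). With a scalar index `t`, `np.sum(z[t] - z_bar) * (z[t + k] - z_bar)` gives the same value as `with_parens`, because summing one number leaves it unchanged.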

2. Causality (a) vs Correlation (b) vs both (c)
    - Usage of car vs public transportation (a) bc as I use my car, my usage of pt decreases
    - Usage of wifi vs cellular data (
    - Distance from wifi (c); (a) bc as I move away, the effect is losing connection & (b) at some distance (lag k), I will lose connection
    - iPhone weekly time screen data (b) bc I can see the time of how each app correlate w/ another
    - Nutrition, exercise, environment, sleep, mental state (b) as I'm able to see the relationship between all, 2/5, 4/5
    - How's my communication w/ my parents when I'm home vs when I'm at school?
        - The relationship changes when k distance apart
3. $\rho^{2}_{v}$ in 2.1.15 implies that $\rho$ must be a list indexed at v? 
    - If it's a scalar, 
        - why index at v?
        - why take the square then sum that squared scalar v times?
---

# TODO

- [x] Load Data

- [x] (2.1.12) **Estimate the ACov @ lag k :** $\hat{\gamma}_k$ (gamma hat) $= \frac{1}{N} \sum_{t=1}^{N-k} (z_t - \bar{z})(z_{t+k} - \bar{z})$, $k = 0, 1, 2, \ldots, K$

- [x] (2.1.12) Set : $c_k = \hat{\gamma}_k$

- [x] (2.1.11) **Estimate the ACor @ lag k :** $\hat{\rho}_k$ (rho hat) $= c_k / c_0$

- [x] (2.1.11) Set : $r_k = \hat{\rho}_k$

- (Prob Statement) $\rho_j = 0$ for $j > 2$

- (2.1.15) **Approximate the Variance of the Estimated ACor Coefficient @ lag k :** var[$r_k$] $\simeq \frac{1}{N} \left(1 + 2 \sum_{v=1}^{q} \rho_v^2\right)$, $k > q$

---

In [1]:
import numpy as np

## Load Data

In [2]:
z = [200, 202, 208, 204, 204, 207, 207, 204, 202, 199, 201, 198, 200, 
        202, 203, 205, 207, 211, 204, 206, 203, 203, 201, 198, 200, 206, 
        207, 206, 200, 203, 203, 200, 200, 195, 202, 204, 207, 206, 200]

In [3]:
N = len(z)
k = 3
zbar = np.mean(z)
N, k, zbar

(39, 3, 203.02564102564102)

## Estimate AutoCov @ lag k

- A scalar: the average of how the deviation from the mean @ t moves together w/ the deviation from the mean @ t + k
- I have the value @ t and the value @ t + k; the estimated AutoCov is between these two
- At k = 0 it is the variance (spread) of the series: if wide, then less confident but if narrow, more confident
- **Motivation :** if we want to draw from this distribution, we'll know our confidence level
- Estimated bc we only observe a finite sample of N values, not the full process, so the true $\gamma_k$ is unknown
- **Estimate the ACov @ lag k :** $\hat{\gamma}_k$ (gamma hat) $= \frac{1}{N} \sum_{t=1}^{N-k} (z_t - \bar{z})(z_{t+k} - \bar{z})$, $k = 0, 1, 2, \ldots, K$

- Set : $ c_k $ = $ \hat{\gamma}_{k} $

In [4]:
def est_autocov(data, lag_k, sample_mean): 
    ck = 0
    N = len(data)
    
    for t in range(N - lag_k):
        ck += (data[t] - sample_mean) * (data[t + lag_k] - sample_mean)
        
    return ck/N

In [5]:
est_autocov(z, k, zbar)

-1.4589591867698377
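The loop in `est_autocov` can also be written with NumPy slicing. This is a minimal sketch, assuming the same 1/N divisor as (2.1.12); the name `est_autocov_vec` is hypothetical and not part of the original notebook:

```python
import numpy as np

def est_autocov_vec(data, lag_k):
    """Vectorized estimate of the autocovariance c_k at lag_k (1/N divisor, as in the loop)."""
    x = np.asarray(data, dtype=float)
    n = len(x)
    dev = x - x.mean()  # deviations from the sample mean
    # Pair each deviation at time t with the deviation at time t + lag_k, then sum the products
    return np.sum(dev[:n - lag_k] * dev[lag_k:]) / n
```

Calling `est_autocov_vec(z, k)` should agree with `est_autocov(z, k, zbar)` up to floating-point rounding, since both sum the same N - k products and divide by N.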

## Estimate AutoCor @ lag k

- The relationship between two values of the same var that are k apart: one @ t (value 1) and one @ t + k (value 2)
- **Motivations :** I can determine the lag beyond which $\nexists$ a relationship between the values @ t and t + k
- Estimated bc we only observe a finite sample, so the true $\rho_k$ is unknown
- **Estimate the AutoCor @ lag k :** $\hat{\rho}_k$ (rho hat) $= c_k / c_0$
    - AutoCov @ lag k / AutoCov @ lag 0

- Set : $ r_k $ to  $ \hat{\rho}{_k} $

In [6]:
def est_autocor(data, lag_k, sample_mean):
    ck = est_autocov(data, lag_k, sample_mean)
    print("ck : ", ck)
    cnot = est_autocov(data, 0, sample_mean)
    print("cnot : ", cnot)
    rho_hat_k = ck/cnot 
    return rho_hat_k

In [10]:
print(k)
rk = est_autocor(z, k, zbar)
print(rk)

3
ck :  -1.4589591867698377
cnot :  10.742932281393824
-0.13580642124093772


## (Prob Statement) $\rho{_j}$ = 0 for *j* > 2


In [None]:
j = 2
K = 10  # highest lag to estimate (an assumed value; the problem statement does not fix K)

# est_acors_list[i] holds the estimated autocor at lag i, for i = 0..K
c0 = est_autocov(z, 0, zbar)
est_acors_list = [est_autocov(z, lag, zbar) / c0 for lag in range(K + 1)]

# work on a copy so the raw estimates stay available
r_k = list(est_acors_list)

# under the problem's assumption, every autocor beyond lag j is 0
for idx in range(len(r_k)) :
    if idx > j :
        r_k[idx] = 0
len(r_k), r_k

## Approximate the Variance of the Estimated AutoCor Coefficient @ lag k

- **Approximate the Variance of the Estimated AutoCor Coefficient @ lag k :** var[$r_k$] $\simeq \frac{1}{N} \left(1 + 2 \sum_{v=1}^{q} \rho_v^2\right)$, $k > q$
- Think : Get close enough to the spread of $r_k$ at lags where $\nexists$ a relationship between our variables 

- When k > q, compute the approximate variance of the estimated autocor coefficient @ some distance (lag k), where :
    - q reps the last lag (distance apart) before the autocors die out, which means $\nexists$ a relationship between t and t + k for k > q

- Cases :
    1. k > q, $\rho_k$ has died out, so approx the spread of the estimated autocor coefficient @ lag k w/ (2.1.15)
    2. k = q, (2.1.15) requires k > q, so it does not cover this lag
    3. k < q, $\rho_k$ has not died out yet so keep original value
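As a worked instance of (2.1.15), taking q = 2 from the problem statement, the sum has exactly two terms. So $\rho_v$ is read as a sequence indexed by lag v, not as one scalar squared repeatedly:

$$
\operatorname{var}[r_k] \simeq \frac{1}{N}\left(1 + 2\sum_{v=1}^{2}\rho_v^2\right) = \frac{1}{N}\left(1 + 2\rho_1^2 + 2\rho_2^2\right), \qquad k > 2
$$

The approximate standard error of $r_k$ for $k > 2$ is the square root of this quantity. In practice the unknown $\rho_1$ and $\rho_2$ are replaced by the estimates $r_1$ and $r_2$, which is an assumption of this sketch rather than something stated in these notes.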

In [None]:
# q is the last lag before the values of rk die out : every lag after q is treated as 0, so q can be stated as "die out at"

In [None]:
# (2.1.15) for k > q : var[r_k] ~= 1/N * (1 + 2 * sum of rho_v^2 for v = 1..q)
# the sum runs over lags 1..q (the autocors that have NOT died out), so rho is indexed by lag v
# the true rho_v are unknown, so the estimates r_v stand in for them


def approx_variance_of_autcor(q, lag_k, data, sample_mean):
    # (2.1.15) is only stated for lags beyond the die-out lag q
    if lag_k <= q:
        raise ValueError("(2.1.15) only applies when lag_k > q")

    n_obs = len(data)
    c0 = est_autocov(data, 0, sample_mean)

    # accumulate r_v^2 over v = 1..q
    sum_sq = 0
    for v in range(1, q + 1):
        r_v = est_autocov(data, v, sample_mean) / c0
        sum_sq += np.square(r_v)

    return 1/n_obs * (1 + 2 * sum_sq)

In [None]:
# die out at
die_at_q = 2
approx_var = approx_variance_of_autcor(die_at_q, k, z, zbar)
# approximate variance and standard error of r_k for k > q
approx_var, np.sqrt(approx_var)
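The problem statement also asks for the approximate correlation between $r_3$ and $r_5$. The sketch below assumes Bartlett's large-lag covariance approximation, cov[$r_k$, $r_{k+s}$] $\simeq \frac{1}{N} \sum_{v} \rho_v \rho_{v+s}$ for $k > q$, which is not written out in these notes and is taken here as an assumption. Dividing it by the variance from (2.1.15) cancels the 1/N factor. The function names `large_lag_sum` and `approx_corr_large_lag` are hypothetical.

```python
def large_lag_sum(rho_pos, s):
    """Sum over v of rho_v * rho_{v+s}, with rho_0 = 1, rho_{-v} = rho_v, rho_v = 0 for |v| > q.

    rho_pos lists the autocorrelations at lags 1..q, so q = len(rho_pos).
    """
    q = len(rho_pos)

    def rho(v):
        v = abs(v)  # autocorrelations are symmetric in the lag
        if v == 0:
            return 1.0
        return rho_pos[v - 1] if v <= q else 0.0

    # Terms outside this range are zero because one of the two factors has died out
    return sum(rho(v) * rho(v + s) for v in range(-q - abs(s), q + abs(s) + 1))


def approx_corr_large_lag(rho_pos, s):
    """Approximate correlation between r_k and r_{k+s} for k > q (assumed large-lag formulas)."""
    # With s = 0 the sum reduces to 1 + 2 * sum(rho_v^2), i.e. N times (2.1.15)
    return large_lag_sum(rho_pos, s) / large_lag_sum(rho_pos, 0)
```

For q = 2 and s = 2 the numerator reduces to $\rho_1^2 + 2\rho_2$ and the denominator to $1 + 2\rho_1^2 + 2\rho_2^2$. Given estimates of the first two autocorrelations, `approx_corr_large_lag([r_1, r_2], 2)` would give the approximate correlation between $r_3$ and $r_5$ under these assumptions.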