# Testing Code with Another Dataset
Reference: [NIST/SEMATECH e-Handbook of Statistical Methods](https://www.itl.nist.gov/div898/handbook/eda/section3/eda35c.htm)

---

In [2]:
import numpy as np
import pandas as pd

## Load Data

In [3]:
data = pd.read_csv("lewDataset.csv")
data

Unnamed: 0,measurements
0,-213
1,-564
2,-35
3,-15
4,141
...,...
195,-385
196,198
197,-218
198,-536
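
The `Unnamed: 0` column in the output above is typically what pandas produces when a CSV stores a saved row index in an unnamed first column. A minimal sketch of one way to avoid it, assuming `lewDataset.csv` has that layout; the in-memory `csv_text` below is a hypothetical stand-in built from the first five rows shown above, not the real file:

```python
import io
import pandas as pd

# Hypothetical stand-in for lewDataset.csv: the first (unnamed) column is a saved row index.
csv_text = ",measurements\n0,-213\n1,-564\n2,-35\n3,-15\n4,141\n"

# index_col=0 uses the first column as the index instead of loading it as data.
sample = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(sample.columns.tolist())
```

With the real file, the equivalent call would be `pd.read_csv("lewDataset.csv", index_col=0)`, provided the file really has that layout.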


In [40]:
z = data['measurements']
N = len(z)  # number of observations; needed by the tuple on the last line
k = 3
zbar = np.mean(z)
z, N, k, zbar

(0     -213
 1     -564
 2      -35
 3      -15
 4      141
       ... 
 195   -385
 196    198
 197   -218
 198   -536
 199     96
 Name: measurements, Length: 200, dtype: int64,
 200,
 3,
 -177.435)

## Estimate AutoCov @ lag k

- A scalar: the average of how two values of the series, taken k steps apart, vary together around the mean
- The two values are $ z_t $ and $ z_{t+k} $, so this is the estimated AutoCov of the series with itself at lag k
- If the spread is wide, we are less confident; if it is narrow, we are more confident
- **Motivation :** if we want to draw from this distribution, we'll know our confidence level
- Estimated because we only have a finite sample of N observations, and working with the full data can be too computationally expensive (space and time)
- **Estimate the ACov @ lag k :** $ \hat{\gamma}_{k} = \frac{1}{N} \sum_{t=1}^{N - k} (z_{t} - \bar{z})(z_{t+k} - \bar{z}) $ (gamma hat), for $ k = 0, 1, 2, \ldots, K $

- Set : $ c_k = \hat{\gamma}_{k} $
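
The estimator above can also be written without an explicit loop. This is a minimal sketch, assuming NumPy. The name `autocov_vectorized` is illustrative and not part of the notebook, and the sample is only the first five measurements shown earlier, so its mean differs from the full-series `zbar`. Unlike the notebook's function, this version computes the mean from the values it is given.

```python
import numpy as np

def autocov_vectorized(values, lag_k):
    """Sketch of c_k = (1/N) * sum_{t=1}^{N-k} (z_t - zbar)(z_{t+k} - zbar)."""
    z = np.asarray(values, dtype=float)
    n = len(z)
    dev = z - z.mean()  # deviations from the sample mean
    # Pair each deviation with the one lag_k steps later; divide by N, not N - lag_k.
    return float(np.dot(dev[: n - lag_k], dev[lag_k:])) / n

first_five = [-213, -564, -35, -15, 141]  # first five measurements only
c0 = autocov_vectorized(first_five, 0)
c1 = autocov_vectorized(first_five, 1)
print(c0, c1)
```

At lag 0 the estimate reduces to the population variance of the sample, which gives a quick sanity check against `np.var`.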

In [41]:
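def est_autocov(data, lag_k, sample_mean):
    """Estimate the autocovariance c_k of `data` at lag `lag_k`."""
    ck = 0
    N = len(data)

    # t is 0-based here, so range(N - lag_k) covers t = 1, ..., N - k in the formula
    for t in range(N - lag_k):
        ck += (data[t] - sample_mean) * (data[t + lag_k] - sample_mean)
        print("ck : ", t, ck)  # debug: running sum after each term
    return ck / N  # divide by N (not N - lag_k), as in the formula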

In [42]:
est_autocov(z, k, zbar)

ck :  0 -5777.0007749999995
ck :  1 -128872.82655
ck :  2 -87219.847325
ck :  3 -126620.89309999999
ck :  4 -184755.97887499997
ck :  5 -73503.46964999997
ck :  6 -34556.02042499997
ck :  7 11736.073800000027
ck :  8 153042.94802500005
ck :  9 159877.39725000004
ck :  10 244964.93647500005
ck :  11 368071.4957000001
ck :  12 365839.5999250001
ck :  13 493879.4591500001
ck :  14 583179.6483750001
ck :  15 591382.3176000001
ck :  16 744605.466825
ck :  17 778402.04605
ck :  18 821646.155275
ck :  19 968444.2645
ck :  20 968749.698725
ck :  21 1062581.41295
ck :  22 1164780.592175
ck :  23 1164370.1364000002
ck :  24 1293842.5256250002
ck :  25 1328107.4098500002
ck :  26 1378097.889075
ck :  27 1508333.1483
ck :  28 1499700.282525
ck :  29 1616714.45175
ck :  30 1674565.490975
ck :  31 1692909.2301999999
ck :  32 1836359.254425
ck :  33 1828469.5136499999
ck :  34 1921558.097875
ck :  35 2009746.2421
ck :  36 2018899.2913249999
ck :  37 2154165.20555
ck :  38 2168371.0247750003
...

59285.855336625

## Estimate AutoCor @ lag k

- The relationship between two values of the same variable that are k steps apart: one at t and one at t + k
- **Motivation :** it lets us find the lag beyond which there is no longer a relationship between the values at t and t + k
- Estimated because we only have a finite sample of N observations, and working with the full data can be too computationally expensive (space and time)
- **Estimate the AutoCor @ lag k :** $ \hat{\rho}_{k} = c_k / c_0 $ (rho hat)
    - AutoCov at lag k / AutoCov at lag 0, so $ \hat{\rho}_{0} = 1 $ by construction

- Set : $ r_k = \hat{\rho}_{k} $
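
To see how the ratio behaves across several lags at once, here is a minimal sketch assuming NumPy. The function name `autocor_all_lags` and the alternating `toy` series are illustrative only; they are not the LEW data and not part of the notebook.

```python
import numpy as np

def autocor_all_lags(values, max_lag):
    """Sketch: r_k = c_k / c_0 for k = 0, 1, ..., max_lag."""
    z = np.asarray(values, dtype=float)
    n = len(z)
    dev = z - z.mean()  # deviations from the sample mean
    # c_k for each lag, every one normalised by N as in the autocovariance estimate.
    c = np.array([np.dot(dev[: n - k], dev[k:]) / n for k in range(max_lag + 1)])
    return c / c[0]

# Illustrative alternating series: neighbours always move in opposite directions.
toy = [1, -1, 1, -1, 1, -1, 1, -1]
r = autocor_all_lags(toy, 2)
print(r)  # r[0] is 1.0 because it is c_0 / c_0
```

For this toy series the lag-1 value is negative and the lag-2 value is positive, which matches the intuition that values one step apart disagree and values two steps apart agree.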

In [7]:
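def est_autocor(data, lag_k, sample_mean):
    """Estimate the autocorrelation r_k = c_k / c_0 of `data` at lag `lag_k`."""
    ck = est_autocov(data, lag_k, sample_mean)  # autocovariance at lag k
    print("ck : ", ck)
    cnot = est_autocov(data, 0, sample_mean)  # c_0: autocovariance at lag 0
    print("cnot : ", cnot)
    rho_hat_k = ck / cnot
    return rho_hat_k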

In [8]:
rk = est_autocor(z, k, zbar)
print(rk)

ck :  76528.56577500004
cnot :  76528.56577500004
1.0
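
The output above shows `ck` equal to `cnot`, which is what a lag of 0 produces, so it looks like this cell was run with `k = 0` rather than `k = 3`. As a rough cross-check, the lag-3 value can be worked out from the two autocovariances printed in this notebook. This assumes the In [42] result is $ c_3 $ (since `k = 3` was set in In [40]) and that `cnot` is $ c_0 $:

```python
# Values copied from the notebook outputs; nothing is recomputed from the data here.
c3 = 59285.855336625      # result of est_autocov(z, 3, zbar) in In [42]
c0 = 76528.56577500004    # "cnot" printed above (autocovariance at lag 0)

r3 = c3 / c0              # r_k = c_k / c_0 with k = 3
print(round(r3, 4))
```

Under those assumptions the lag-3 autocorrelation comes out at roughly 0.77.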
