# Time Series Analysis - Intro 

## Goal: How does the stock market evolve over time?

- 2 major components: 
    - Holt-Winters Exponential Smoothing model
    - ARIMA
        - Basically, a linear regression of the series on its own past values
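To make "basically linear regression" concrete, here is a minimal sketch that assumes the simplest case: an AR(1) model (one autoregressive term, no differencing, no moving-average terms) fitted by ordinary least squares in plain Python. The helper `fit_ar1` and the simulation parameters `c = 0.5` and `phi = 0.7` are hypothetical. Full ARIMA implementations often estimate parameters by maximum likelihood rather than this simple regression.

```python
import random

def fit_ar1(series):
    """Fit x_t = c + phi * x_{t-1} + noise by ordinary least squares.

    This is simple linear regression where the single feature is the
    previous value of the series. Returns (c, phi).
    """
    x = series[:-1]  # lagged values (the "feature")
    y = series[1:]   # current values (the "target")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    phi = cov_xy / var_x
    c = mean_y - phi * mean_x
    return c, phi

# Simulate an AR(1) process with hypothetical parameters, then recover them
random.seed(0)
true_c, true_phi = 0.5, 0.7
series = [0.0]
for _ in range(5000):
    series.append(true_c + true_phi * series[-1] + random.gauss(0, 1))

c_hat, phi_hat = fit_ar1(series)
print(f"c ~ {c_hat:.2f}, phi ~ {phi_hat:.2f}")
```

The estimates should land close to the true values because, for a stationary AR(1), regressing today's value on yesterday's is a consistent estimator.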

## Information to consider when trading the market
- "Information" is a broad word
    - Pretty much everything in the world is information.
    - Investing is a large-scale engineering problem, not just a data science problem.

1. Technical data (price/volume)
2. Fundamental data (quarterly reports)
3. Exogenous data (other public data, e.g., Twitter comments, news, etc.)
4. Insider information (non-public company info, e.g., knowledge of an upcoming product rollout)

---

## EMH - Efficient Market Hypothesis
*You can't out-perform the market - Stocks trade at their fair price*

### 3 forms of EMH
1. Weak
    - Future prices can't be predicted using historical prices.
    - Can't use technical analysis to predict returns
2. Semi-Strong
    - Stock prices instantly adjust to all public information.
    - Even with perfect technical and fundamental data, you still can't beat the market.
3. Strong
    - Even with all information (public and private), still can't beat the market.

### Is the EMH True?
- Of the three forms, the strong form is the least likely to be correct and the weak form is the most plausible.
- The EMH isn't settled fact: people say they *believe* the EMH to be true, which signals it's a matter of belief rather than proof.

### Testing the EMH
- The idea is to test a form of the EMH using statistical hypothesis testing.
- Take the EMH as the null hypothesis, compute a test statistic from the data, and check how improbable its observed value would be if the null hypothesis were true.
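As one concrete example of such a test for the weak form, the sketch below uses the lag-1 autocorrelation of returns as the test statistic. Under the null hypothesis that returns are iid, the sample autocorrelation is approximately normal with mean 0 and variance 1/n for large n, so a large |z| is evidence against the weak form. The simulated series and the 0.3 "momentum" coefficient are hypothetical. In practice, tests such as Ljung-Box or variance-ratio tests check several lags at once.

```python
import math
import random

def lag1_autocorr(x):
    """Sample autocorrelation of x at lag 1."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

def weak_form_test(returns):
    """Two-sided test of H0: returns are iid (no lag-1 autocorrelation).

    Under H0, r1 is approximately N(0, 1/n), so z = r1 * sqrt(n) is
    approximately standard normal.
    """
    r1 = lag1_autocorr(returns)
    z = r1 * math.sqrt(len(returns))
    p_value = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|)
    return r1, z, p_value

random.seed(1)
iid_returns = [random.gauss(0, 0.01) for _ in range(2000)]

# Hypothetical predictable series: each return partly repeats the previous one
momentum_returns = [0.0]
for _ in range(1999):
    momentum_returns.append(0.3 * momentum_returns[-1] + random.gauss(0, 0.01))

for name, rets in [("iid", iid_returns), ("momentum", momentum_returns)]:
    r1, z, p = weak_form_test(rets)
    print(f"{name}: r1={r1:.3f}, z={z:.2f}, p={p:.4g}")
```

A tiny p-value for the momentum series means its returns would be very improbable under the weak form, while the iid series should usually give no such evidence.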

---

## EMH and the Relationship to Time Series Analysis

- In many machine learning tasks (e.g., image classification), you might expect a model to reach 90-99% accuracy.
- In finance, you're not going to make a perfect prediction of the future.
- In this domain, it's more important to observe and improve the model itself than to chase prediction accuracy.
- ML tends to care only about predictive accuracy.
- Modeling isn't just about predictions; it also tells you what you know and what you don't know.
- It's important to know when you *can't* know something (as shown by modeling), so you don't waste time trying to predict it!

---

# Random Walk Hypothesis

- More mathematical than philosophical
- A random walk is a special case of an ARIMA model
- In the previous notebook, the price simulation assumed a random walk.
- If stock prices follow a random walk, then they are unpredictable.
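In ARIMA notation (AR order, differencing order, MA order), a random walk is ARIMA(0, 1, 0): no autoregressive terms, one difference, and no moving-average terms. Differencing the price once leaves only the random step:

```math
p_t = p_{t-1} + e_t \quad\Longleftrightarrow\quad \Delta p_t = p_t - p_{t-1} = e_t
```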

## Simple Demo of a Random Walk
- Basically, walking in 1 dimension: at each step, you flip a coin and move one unit left or right. Because every step is a 50/50 coin flip, a guess about the direction of the next step is correct only 50% of the time.

```math
\begin{aligned}
p &= \text{price} \\
p_0 &= \text{some initial value} \\
p_1 &= p_0 + e_1, \quad e_1 \in \{-1, +1\} \\
p_2 &= p_1 + e_2 \\
&\ \vdots \\
p_N &= p_{N-1} + e_N
\end{aligned}
```
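The recursion above can be simulated in a few lines of Python. This sketch follows the coin-flip description exactly (steps of -1 or +1 with equal probability); the function name `random_walk`, the starting value of 100, and the seed are illustrative choices, not taken from the earlier notebook.

```python
import random

def random_walk(p0, n_steps, seed=None):
    """Simulate p_t = p_{t-1} + e_t, where each e_t is -1 or +1 with equal probability."""
    rng = random.Random(seed)
    prices = [p0]
    for _ in range(n_steps):
        step = rng.choice([-1, 1])  # the "coin flip"
        prices.append(prices[-1] + step)
    return prices

path = random_walk(p0=100, n_steps=10, seed=42)
print(path)
```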

<br/>

## Is the Random Walk Hypothesis Correct?

- The hypothesis assumes the log returns are iid (independent and identically distributed).
- But in previous notebooks, we observed **volatility clustering**: large moves tend to be followed by large moves, and small moves by small moves.
- Changing volatility means the log returns are not identically distributed (their variance changes over time).
- If volatility at one moment is related to nearby volatilities, then the log returns are also not independent.
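A common way to see volatility clustering is to compare the autocorrelation of returns with the autocorrelation of *squared* returns. The sketch below simulates returns from a GARCH(1,1)-style process, a standard volatility-clustering model that these notes don't cover, with hypothetical parameters `omega`, `alpha`, and `beta`, and compares it against iid returns. Both series should show returns that look roughly uncorrelated, but only the clustered series should show clearly positive autocorrelation in squared returns, which is what breaks the iid assumption.

```python
import math
import random

def autocorr(x, lag=1):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    den = sum((v - mean) ** 2 for v in x)
    return num / den

rng = random.Random(7)
n = 10_000

# iid returns: constant volatility, every day drawn from the same distribution
iid = [rng.gauss(0, 0.01) for _ in range(n)]

# GARCH(1,1)-style returns with hypothetical parameters
omega, alpha, beta = 1e-6, 0.15, 0.7
var = omega / (1 - alpha - beta)  # start at the long-run variance
clustered = []
for _ in range(n):
    r = math.sqrt(var) * rng.gauss(0, 1)
    clustered.append(r)
    # A large move today raises tomorrow's variance: volatility clustering
    var = omega + alpha * r ** 2 + beta * var

results = {}
for name, rets in [("iid", iid), ("clustered", clustered)]:
    squared = [v ** 2 for v in rets]
    results[name] = (autocorr(rets), autocorr(squared))
    print(f"{name:>9}: acf(r) = {results[name][0]:+.3f}, acf(r^2) = {results[name][1]:+.3f}")
```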

---

# The Naive Forecast

- In time series analysis, the simplest baseline is the naive forecast
    - That is, taking the last known value and extending it into the future (e.g., if a stock's price was $13 yesterday, we predict $13 for today, plus or minus some margin of error)
- NOTE: The naive forecast can look deceptively good when plotted against actual prices (it's just the price series shifted by one step), so a bad model whose forecasts mostly copy the last price can look good too.
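A minimal sketch of the naive forecast with a margin of error, assuming the series behaves like a random walk: the point forecast is the last value, and the h-step-ahead uncertainty grows like sqrt(h) because h independent steps add their variances. The function name, the 1.96 multiplier (roughly a 95% interval under a normal approximation), and the example prices are assumptions for illustration.

```python
import math
import statistics

def naive_forecast(history, horizon=1, z=1.96):
    """Repeat the last observed value, with a random-walk error band.

    sigma is estimated from the historical one-step changes; the
    h-step-ahead band is +/- z * sigma * sqrt(h).
    """
    last = history[-1]
    diffs = [b - a for a, b in zip(history, history[1:])]
    sigma = statistics.stdev(diffs)
    out = []
    for h in range(1, horizon + 1):
        half_width = z * sigma * math.sqrt(h)
        out.append((last, last - half_width, last + half_width))
    return out

prices = [12.5, 12.8, 13.1, 12.9, 13.0]  # hypothetical closing prices
for point, lo, hi in naive_forecast(prices, horizon=3):
    print(f"{point:.2f}  [{lo:.2f}, {hi:.2f}]")
```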

- Example of a bad situation
    - Suppose you claim to beat the naive forecast
    - Is it for in-sample (training) data, or out-of-sample (testing) data?
    - Often, as training accuracy keeps rising, test accuracy starts to get worse!
    - This is usually because of overfitting the model to the noise in the training data
    - Saying you got 80% training and 75% test accuracy is meaningless in isolation (without a baseline).
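One way to keep the comparison honest is to report a model's error relative to the naive forecast, computed only on out-of-sample data. The sketch below walks forward through the test portion, giving the model only the data available before each time step. `relative_mse`, the 70/30 split, and `drift_model` are hypothetical names and choices for illustration. A ratio below 1 means the model beats the naive baseline on test data; passing the naive forecast itself gives exactly 1.

```python
import random

def mse(errors):
    return sum(e * e for e in errors) / len(errors)

def relative_mse(series, forecast_fn, train_frac=0.7):
    """Out-of-sample MSE of forecast_fn divided by the naive forecast's MSE.

    forecast_fn(history) must return a one-step-ahead prediction.
    """
    split = int(len(series) * train_frac)
    model_err, naive_err = [], []
    for t in range(split, len(series)):
        history = series[:t]  # only data available before time t
        model_err.append(series[t] - forecast_fn(history))
        naive_err.append(series[t] - history[-1])
    return mse(model_err) / mse(naive_err)

def drift_model(history):
    # Hypothetical model: last value plus the average historical change
    diffs = [b - a for a, b in zip(history, history[1:])]
    return history[-1] + sum(diffs) / len(diffs)

rng = random.Random(3)
prices = [100.0]
for _ in range(1000):
    prices.append(prices[-1] + rng.gauss(0, 1))

print(relative_mse(prices, lambda h: h[-1]))  # naive vs. itself
print(relative_mse(prices, drift_model))
```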

## Naive Forecast with Relation to Random Walks
- If the data follows a random walk, then the naive forecast is the **best forecast** (it minimizes the expected squared forecast error)
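A quick check on a simulated coin-flip walk: for a random walk without drift, the expected next price given the past is simply the current price, so nothing built from past prices can do better on average. With ±1 steps, every naive one-step error is exactly ±1, so its MSE is exactly 1. A hypothetical 5-period moving-average forecast lags behind the walk and, in expectation, has an MSE of about 2.2 in this setup. The helper names and the seed are illustrative.

```python
import random

def simulate_walk(n, seed):
    """Coin-flip random walk starting at 0 with steps of -1 or +1."""
    rng = random.Random(seed)
    p = [0]
    for _ in range(n):
        p.append(p[-1] + rng.choice([-1, 1]))
    return p

def one_step_mse(prices, predict, start):
    """Mean squared one-step-ahead error, predicting prices[t] from prices[:t]."""
    errs = [(prices[t] - predict(prices[:t])) ** 2 for t in range(start, len(prices))]
    return sum(errs) / len(errs)

prices = simulate_walk(5000, seed=0)
naive = one_step_mse(prices, lambda h: h[-1], start=10)
moving_avg = one_step_mse(prices, lambda h: sum(h[-5:]) / 5, start=10)
print(f"naive MSE = {naive:.3f}, 5-period moving average MSE = {moving_avg:.3f}")
```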

<br/>

---