<a href="https://colab.research.google.com/github/Cullen-hub/Alpha-Factor-Research/blob/main/Alpha_Research_Project.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [1]:
#Imports
!pip install yfinance ta shap xgboost
import ta
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Collecting ta
  Downloading ta-0.11.0.tar.gz (25 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: ta
  Building wheel for ta (setup.py) ... done
  Created wheel for ta: filename=ta-0.11.0-py3-none-any.whl size=29412 sha256=44b63ea70539dd3e9509ed1c9a9f401b75ff1fab21fdf45c4af6fa30d0157e34
  Stored in directory: /root/.cache/pip/wheels/a1/d7/29/7781cc5eb9a3659d032d7d15bdd0f49d07d2b24fec29f44bc4
Successfully built ta
Installing collected packages: ta
Successfully installed ta-0.11.0


# **The Purpose Of These Imports:**

## Library 1: yfinance

*   Access historical financial data, such as stock price data from Yahoo Finance
*   Useful for downloading time series data for stocks, ETFs and other assets

## Library 2: ta

*   Provides indicators for technical analysis, extracting meaningful features from raw data for modelling.
*   Available indicators cover momentum (e.g. RSI), volume, volatility and trend.

## Library 3: shap

*   SHapley Additive exPlanations (SHAP) is a game-theoretic approach to explaining the output of any machine learning model.
*  It treats the features of a machine learning model (e.g. momentum, RSI, volatility) as players in a game, where the payout is the model's prediction.
*  Helps identify which features contribute most to the overall prediction, making the model easier to interpret.

## Library 4: xgboost

*   A specialised library for gradient-boosted decision trees. It uses gradient descent to improve predictions after each iteration.
*   Well suited to tabular data and scales well to large datasets.
*   It is written in C++ for speed and efficiency.








# **What Data?**

In [21]:
#df_adj= yf.download('KO', 'MCD', period = "5y", auto_adjust = True, interval = "1d", group_by = 'ticker')
#df_adj = df_adj.add_prefix("Adj_")

df_raw= yf.download('MCD', period = "5y", auto_adjust = False)
df_raw = df_raw.add_prefix("Raw_")

#Keeping raw and adjusted values to help flag M&A deals and spin-offs later, so as not to confuse the model
#df = pd.merge(df_raw, df_adj, left_index = True, right_index = True)
#df = df.drop(columns = ['Raw_Adj Close'])

df_new = yf.Ticker('KO')
df_new = df_new.history(period = "5y")
df_new = df_new.add_prefix("New_")

df_raw.head()


[*********************100%***********************]  1 of 1 completed


Price,Raw_Adj Close,Raw_Close,Raw_High,Raw_Low,Raw_Open,Raw_Volume
Ticker,Raw_MCD,Raw_MCD,Raw_MCD,Raw_MCD,Raw_MCD,Raw_MCD
Date,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
2020-08-19,186.882538,209.509995,211.169998,208.639999,210.220001,2711100
2020-08-20,187.212631,209.880005,210.990005,208.100006,208.119995,2084900
2020-08-21,188.720108,211.570007,212.190002,209.259995,209.960007,3484800
2020-08-24,189.656693,212.619995,213.5,210.929993,212.259995,2463900
2020-08-25,189.683456,212.649994,214.190002,212.220001,213.960007,2302400


In [16]:
print(type(df_new['New_Close']))

<class 'pandas.core.series.Series'>


# **Cleaning Data:**
### Managing missing data
*   Interpolation
*   NA drop

Let's see what differences occur after applying the two different ways of managing missing values.


In [3]:
#Interpolation
df = df_adj.interpolate(method = 'linear', limit_direction = 'forward')

In [8]:
#NA Drop
df = df_adj.dropna(how = 'any')
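To make the difference between the two approaches concrete, here is a minimal sketch on a made-up price series. The values and the name `toy` are illustrative only and are not taken from the downloaded data.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices with two missing days (illustrative values only)
toy = pd.Series(
    [10.0, np.nan, 14.0, np.nan, 18.0],
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# Linear interpolation fills each gap with a value on the straight line
# between the neighbouring observations, so the row count is unchanged
filled = toy.interpolate(method="linear", limit_direction="forward")

# Dropping NA removes the incomplete rows instead, so the series gets shorter
dropped = toy.dropna()

print(filled.tolist())  # [10.0, 12.0, 14.0, 16.0, 18.0]
print(len(dropped))     # 3
```

Interpolation keeps every date but invents prices that were never observed, while dropping rows keeps only real observations at the cost of gaps in the calendar.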


# **Adjusted Values**

At the moment, with the raw Close, High and Low values, we may see abrupt changes in return at certain periods. This may happen after splits, dividend payouts, spin-offs or mergers and acquisitions, to name a few.

For splits, dividends and spin-offs, the total value of an investor's holding typically remains the same. However, the raw close may show a large decrease or increase, which in turn is reflected in the daily return. This can misrepresent actual market conditions and cause mistakes in my ML model.

Unlike the other corporate actions mentioned, which have to be adjusted for because they give a false impression of the market value, M&A actually influences real market value. The adjustments for M&A are not about correcting any values but about making the time series continuous.

In order to better reflect the market, I will add new columns for the adjusted close, high, and low daily values to be used in later calculations and models.

yfinance retrieves its data from Yahoo Finance's servers, and Yahoo Finance in turn uses data providers such as Bloomberg for market information. This means that if a data provider reports that a company has decided to pay a dividend, for example, adjusted close values are prepared so that the return for the period doesn't drop drastically.

Adjustments for splits and dividends are already handled by yfinance. With auto adjust enabled, the close, open, high and low values are adjusted automatically and no additional columns are required for the adjusted results. yfinance doesn't account for spin-offs or M&A, which can leave gaps in the data, and Yahoo doesn't back-adjust historical prices for them. For this reason I have decided to turn off auto adjust and keep the raw values as well as the adjusted values, to allow for more flexible analysis later if required. However, the adjusted values will be the ones used to train my model later.

Calculation:

$$\text{Return from raw close} = \frac{P_t - P_{t-1}}{P_{t-1}}$$

$$\text{Return from adjusted close} = \frac{AdjP_t - AdjP_{t-1}}{AdjP_{t-1}}$$


---

## Splits

A company can increase the number of its shares whilst maintaining the same market value using splits.

For example, the company could use a 3-for-1 split, meaning that every share an investor currently holds in the company becomes three shares.

To maintain the total market value, the value of a single share drops proportionally to account for the new number in circulation. Hence, for a 3-for-1 split, each share is now worth a third of its original value, as each original share has become three shares that together still have the same value.

The closing price at the end of a period may now appear significantly lower compared to the previous period (before the split). If we were to calculate the normal return over the period, it would show a significant loss of 2/3 of the original value of the investment. However, we know that the actual value of an investor's holding hasn't changed, so we use adjusted close, high, and low values, which show a return closer to 0. This smooths out spikes that do not represent the market, preventing my model from later picking up misrepresentative data that would hinder it.

Example:
- 2-for-1 split means each share becomes two shares. Split ratio = 2, Adjustment factor (AF) = $\frac{1}{\text{Split ratio}} = \frac{1}{2}$
- Original close before split = £50
- Close after split = £25
- Investors have twice the number of shares but their total value is unchanged.
- Raw simple return = $\frac{25-50}{50}$ = $\frac{-25}{50}$ = -0.5
- Adjusted close corrected real return = $\frac{25-25}{25}$ = $\frac{0}{25}$ = 0

Calculation:

$$\text{New Price} = \text{Old Price} \times \text{Adjustment Factor (AF)} $$

<br>

$$\text{Adjusted Price} = \text{Historical Close Price} \times \text{AF}$$


The adjustment is retrospective, so every historical close before the split is scaled using the Adjusted Price formula with the price at that period, to ensure a smooth transition.
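The 2-for-1 example can be sketched in a few lines of plain Python. The price list and the names `raw_close`, `split_index` and `simple_returns` are made up for illustration; this is not how Yahoo Finance implements its adjustment, only the arithmetic described above.

```python
# Hypothetical raw closes around a 2-for-1 split (values are illustrative)
raw_close = [48.0, 50.0, 25.0, 26.0]   # the split takes effect on the third day
split_index = 2                        # position of the first post-split close
split_ratio = 2
af = 1 / split_ratio                   # adjustment factor from the example above

# Back-adjust: scale every close before the split, leave later closes untouched
adj_close = [p * af if i < split_index else p for i, p in enumerate(raw_close)]


def simple_returns(prices):
    """Period-on-period simple returns, (P_t - P_{t-1}) / P_{t-1}."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


raw_ret = simple_returns(raw_close)
adj_ret = simple_returns(adj_close)
print(raw_ret[1])  # -0.5, the artificial drop on the split day
print(adj_ret[1])  # 0.0 after back-adjustment
```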

 ---

## Dividends

Dividends are payments made to shareholders by a company out of its profits.

After a dividend is paid, the share price typically falls to the pre-dividend price minus the dividend paid per share. Even though the return for that period appears to be a loss, investors have received the difference in cash, so the total value of their investment is unchanged and they are not worse off.


Example:
- Closing price before dividend payout = £50
- Dividend payout per share = £2
- Closing price after dividend payout = £50 - £2 = £48
- Raw simple return = $\frac{48 - 50}{50} = \frac{-2}{50} = -0.04$
- Adjusted close smooths so return = 0


Calculation:

$$\text{AdjClose} = \text{Close} - \text{Dividend per share}$$

As before, the adjustment is applied to all historical periods before the dividend payment to ensure a smooth transition. Therefore, in my last example, all the earlier adjusted close prices will be 2 less than the raw close price.

---

## Spin-offs

Spin-offs occur when part of a firm splits from the main company and becomes its own separate company.

Investors who hold shares in the parent company before the split are usually compensated with shares in the spun-off firm, so that their investment value doesn't change. As part of the parent firm separates, the value of the parent firm's shares decreases because part of the company's assets are gone.

The new shares of the spun-off firm gain value as it becomes its own company.

Example:

- Parent company share price before spin-off = £100
- Spin-off company share value = £20
- Parent company share price after spin-off = £80
- Raw simple return = $\frac{80 - 100}{100} = \frac{-20}{100} = -0.2$
- By adding spin-off and parent company shares we see that investors still have £100 total.
- Adjusted close return = 0

The adjustment factor is used like before to scale the historical close values for parent company:

<br>

$$AF = \frac{\text{(Close Value before spin - Value of spin share given per parent share)}}{\text{Close Value before spin }}$$

$$AdjClose = \text{Historical Close Price}\times AF$$

<br>

---

## Mergers And Acquisitions

Mergers occur when two separate firms agree to join together and become one single new firm.

Acquisitions occur when one firm buys another, gaining that firm's resources.

In a merger the two companies may choose to operate under a single ticker, with the old tickers retired and replaced by the new one. The return on the individual shares will appear to crash to 0 as they no longer exist, but investors are compensated with shares in the new combined company proportional to what they owned before, possibly with similar value.

In an acquisition, the shares of the company being acquired may rise in price as the acquirer buys them at a premium. However, the acquirer's shares may fall in price if the firm is taking on more debt from the other company. On the other hand, the share price may rise if investors expect the firm to be more valuable now that it has access to more resources, assets and skills.


Merger Example:

- Company A and B decide to merge to form Company C. Both companies' close prices have to be adjusted.

- Company A exchange ratio - each share of A will be the equivalent of 1.0 share of C.

- Company B exchange ratio - each share of B will be the equivalent of 0.5 shares of C.

- Company A most recent close price before merger = £82

- Company B most recent close price before merger = £42

- The share price of Company C after the merger begins at £85.

- We then work out what investors' old shares in the previous individual companies would be worth now.

- For Company A shareholder - $1.0  \times 85 = 85$
- For Company B shareholder - $0.5 \times 85 = 42.5$

- Adjustment Factor Calculations:

<br>

$$AF_A = \frac{85}{A Close_{-1}} = \frac{85}{82} = 1.0365...$$

$$AF_B = \frac{42.5}{B Close_{-1}} = \frac{42.5}{42} = 1.0119...$$

<br>

- As with other corporate actions, these adjustment factors are used to scale the entire pre-merger history so that the transition is smooth.

<br>

$$AdjClose_t^A = Close_t^A \times AF_A$$

$$AdjClose_t^B = Close_t^B \times AF_B$$

- Where t is all time periods before the merger.
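The merger arithmetic can be checked with a short sketch. The exchange ratios and prices are the ones from the example; the three-day price history for Company B (`history_b`) is hypothetical and only there to show the scaling step.

```python
# Figures from the merger example above
price_c = 85.0                          # first close of the combined Company C
exchange_ratio = {"A": 1.0, "B": 0.5}   # shares of C received per old share
last_close = {"A": 82.0, "B": 42.0}     # final closes before the merger

# Adjustment factor = value received per old share / last pre-merger close
af = {k: exchange_ratio[k] * price_c / last_close[k] for k in last_close}

# Scale a pre-merger history by its factor (hypothetical closes for Company B)
history_b = [40.0, 41.0, 42.0]
adj_history_b = [p * af["B"] for p in history_b]

print(round(af["A"], 4), round(af["B"], 4))  # 1.0366 1.0119
print(round(adj_history_b[-1], 6))           # 42.5
```

After scaling, the last adjusted close of B equals the value of the C shares received per B share, so the series joins the new ticker without a jump.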

<br>

Acquisition Example:
- Company A buys Company B at a 50% premium (acquisition)
- Company B's share price jumps from £40 to £60 (premium of £20)
- Company A's share price falls from £100 to £90 (the market believes Company A has taken on more risk and overpaid)
- If tickers combine, adjusted closes are recalculated so the new combined stock continues smoothly.

<br>

Acquisition Calculation:

All-cash deal:
- Company A's close values don't need to be adjusted because the market price will move up or down based on sentiment. This is natural market movement, so it doesn't need to be corrected.

- Company B's share price will jump to the price Company A pays for Company B, which is at a premium.

- For Company B the historical closing prices are scaled to the acquisition price.

$$AF_B = \frac{\text{Acquisition price}}{\text{Price of B before acquisition}}$$

<br>

Stock-for-stock deal:

- Shareholders of Company B will be issued shares in Company A, as Company B's ticker is retired after the acquisition.

$$AF_B = \frac{\text{Value of shares received}}{\text{Price of B before deal}}$$





# **Adding Returns**
Adding a return column will help me assess the performance of the asset/stock and predict future price direction and returns.
### Types Of Return:


*   Simple return

*   log return



---



### Simple Return:

The percentage change in the asset's price over a specified period.

$$R_t = \frac{P_t - P_{t-1}}{P_{t-1}} = \frac{P_t}{P_{t-1}} -1$$

Pros:


*  Straightforward to calculate
*  Useful in short-term analysis of stock performance
* Useful when working with smaller price changes
* Easier to interpret as a percentage gain or loss

Cons:


* Simple returns over a time series can be misleading, as a single period's return does not capture a stock's overall performance.
*  Simple returns don't compound additively, which leads to inaccuracies in cumulative return calculations (you cannot sum the values to gauge the overall performance of a stock).

---

### Log Return:
Log returns are continuously compounded returns, calculated using the natural logarithm.

$$r_t = \ln\left(\frac{P_t}{P_{t-1}}\right)$$

Pros:


*   Useful in long-term stock performance analysis where increments in values are larger.
*   Better working with continuous time series data, especially when conducting mathematical modelling.
*  Incorporates compounding effects naturally.
*   Additive over time, which makes it much easier to aggregate returns across periods and calculate a stock's overall direction.


Cons:


* Gives almost the same values as the less complex simple return when price changes within the period are small, so the extra step adds little there.
*  Harder to interpret intuitively
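The additivity point can be checked numerically. The sketch below uses a made-up price path (the values in `prices` are illustrative) and only the standard library.

```python
import math

# Hypothetical closing prices (illustrative values)
prices = [100.0, 110.0, 99.0, 108.9]

simple = [b / a - 1 for a, b in zip(prices, prices[1:])]
log = [math.log(b / a) for a, b in zip(prices, prices[1:])]

total_simple = prices[-1] / prices[0] - 1          # true return over the whole period
sum_simple = sum(simple)                           # naive sum of simple returns
compounded = math.prod(1 + r for r in simple) - 1  # simple returns must be compounded
from_logs = math.exp(sum(log)) - 1                 # log returns can simply be summed

print(round(total_simple, 4))  # 0.089
print(round(sum_simple, 4))    # 0.1
print(round(compounded, 4))    # 0.089
print(round(from_logs, 4))     # 0.089
```

Summing the simple returns overstates the result here, while summing the log returns and converting back recovers the true period return.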



In [9]:
df['simple return'] = (df['Adj_Close']/df['Adj_Close'].shift(1)) -1
df['log return'] = np.log(df['Adj_Close']/df['Adj_Close'].shift(1))
df.head()


Price,Adj_Close,Adj_High,Adj_Low,Adj_Open,Adj_Volume,simple return,log return
Ticker,Adj_KO,Adj_KO,Adj_KO,Adj_KO,Adj_KO,Unnamed: 6_level_1,Unnamed: 7_level_1
Date,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2,Unnamed: 7_level_2
2020-08-19,40.704441,41.572323,40.57555,41.348907,17371000,,
2020-08-20,40.687267,40.902089,40.463854,40.618526,11308700,-0.000422,-0.000422
2020-08-21,40.627113,40.816158,40.360736,40.678672,14734100,-0.001478,-0.00148
2020-08-24,41.220028,41.228619,40.54119,40.756012,9257700,0.014594,0.014489
2020-08-25,41.168461,41.288761,40.945048,41.245797,7827800,-0.001251,-0.001252


# **Adding Alpha Factor Features**
Alpha ($\alpha$) measures the excess return of an investment relative to a benchmark index or market average, after adjusting for risk. It represents the skill of an investment strategy in generating returns above what would be expected based on its exposure to market risk.

For this project I am focusing on just alpha instead of including beta, hence the equation I will be using for alpha is slightly simplified:

$$ \alpha_t = R_{p,t} - R_{b,t} $$
$R_p = $ Future return

$R_b = $ Benchmark return
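A minimal sketch of this simplified alpha on made-up prices is shown below. The series `stock` and `bench` are hypothetical, and aligning the next day's return with the current row via `shift(-1)` is one assumed reading of "future return", not a confirmed detail of the final model.

```python
import pandas as pd

# Hypothetical closing prices for a stock and its benchmark (illustrative values)
stock = pd.Series([100.0, 102.0, 101.0, 104.03])
bench = pd.Series([200.0, 202.0, 204.02, 206.0602])

# Next-period simple return, stored on the row where the prediction would be made
stock_fwd = stock.pct_change().shift(-1)
bench_fwd = bench.pct_change().shift(-1)

# Simplified alpha: future return of the asset minus the benchmark return
alpha = stock_fwd - bench_fwd
print(alpha.round(4).tolist())  # [0.01, -0.0198, 0.02, nan]
```

The last row is NA because no future return is known yet, so it would have to be dropped before training.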



---



## Positive Alpha
- A positive alpha means that the strategy or asset is outperforming the benchmark on a risk-adjusted basis.

- This could be due to a market mispricing or inefficiency.

- A positive alpha is a good find, as we can capitalise on the mispricing by buying the stock while it is priced under its fair value.

- Identifying this early is crucial: the more investors identify the positive alpha, the more will also buy, driving the price up until the alpha returns to 0. At this point the stock is priced at its fair value relative to the benchmark.

- In the long run alpha will tend to zero, the point of equilibrium.

- Timing matters - waiting too long to see whether the alpha remains significantly and consistently positive (and hence is less likely to be an artefact of noise in the model) risks losing the opportunity. Buying too quickly risks acting on noise rather than a true signal.


### Example:

The S&P 500 benchmark implies that, given the risk profile of Coke, the stock (KO) should be expected to earn an annual return of 7%.

My model looks over a recent time frame and finds that the actual expected return of KO, based on alpha factors like momentum, volatility and sentiment, is 10%.

This indicates a positive alpha,
Alpha = +3%.
The stock is therefore outperforming relative to the benchmark, providing a good opportunity to invest.

---

## Negative Alpha
- A negative alpha indicates underperformance and suggests that the benchmark is overvaluing the stock.

- Alerts investors to potentially sell their stock to reduce their exposure, as the annual returns are potentially less than the benchmark suggests. Investors may take a short position while the alpha remains negative.

- The reward for investing may be less than the risk you incur. In this case, the stock's mispricing works against you instead of in your favour, like with a positive alpha.

### Example
The benchmark says that, given the risk profile of Coke, the annual return on KO is 6%.

My model suggests that the return on the strategy on KO is only 3% annually.

This indicates a negative alpha, Alpha = -3%
Therefore, underperforming relative to the benchmark, it is not an attractive investment for a long position.

---
### Key Vocab

Excess Return - The additional return earned by an investment beyond a reference point, usually a benchmark index.

Benchmark - A benchmark index like the S&P 500 reflects the relative success of the overall market or a specific market segment.

Market Average - The average return or performance of a broad set of investments within the market.

Skill of investment strategy - The ability of a trading strategy to consistently generate returns beyond what is explained by the benchmark, market movements, or chance (consistently producing positive alpha).

Market Risk Exposure (Systematic risk) - The sensitivity of an investment's returns to overall market fluctuations (quantified by beta).




In [10]:
#Momentum indicators
df['ROC'] = ta.momentum.ROCIndicator(df['Adj_Close'], window = 12).roc()
df['RSI'] = ta.momentum.RSIIndicator(df['Adj_Close'], window = 14).rsi()
df['STO'] = ta.momentum.StochasticOscillator(df['Adj_High'], df['Adj_Low'], df['Adj_Close'], window = 14, smooth_window = 3).stoch()

#Trend indicators
df['SMA'] = ta.trend.SMAIndicator(df['Adj_Close'], window = 20).sma_indicator()
df['EMA'] = ta.trend.EMAIndicator(df['Adj_Close'], window = 20).ema_indicator()
df['MACD'] = ta.trend.MACD(df['Adj_Close'], window_slow = 26, window_fast = 12).macd()
df['ADX'] = ta.trend.ADXIndicator(df['Adj_High'], df['Adj_Low'], df['Adj_Close'], window = 14).adx()

#Volatility indicators
df['BOL'] = ta.volatility.BollingerBands(df['Adj_Close'], window = 20, window_dev = 2).bollinger_wband()
df['ATR'] = ta.volatility.AverageTrueRange(df['Adj_High'], df['Adj_Low'], df['Adj_Close'], window = 14).average_true_range()

#Volume indicators
df['OBV'] = ta.volume.OnBalanceVolumeIndicator(df['Adj_Close'], df['Adj_Volume']).on_balance_volume()
df['VWAP'] = ta.volume.VolumeWeightedAveragePrice(df['Adj_High'], df['Adj_Low'], df['Adj_Close'], df['Adj_Volume'], window = 14).volume_weighted_average_price()

df.head(20)

ValueError: Data must be 1-dimensional, got ndarray of shape (1255, 1) instead
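A likely cause of this error is the two-level (Price, Ticker) column header shown in the earlier outputs: selecting `df['Adj_Close']` from such a frame returns a one-column DataFrame with shape (rows, 1) instead of a 1-dimensional Series. The sketch below reproduces this on a toy frame with made-up values and shows one possible fix, dropping the ticker level. It is an illustration under that assumption, not a confirmed diagnosis of the notebook state.

```python
import pandas as pd

# Toy frame that mimics the two-level (Price, Ticker) columns shown earlier;
# the values are made up
cols = pd.MultiIndex.from_tuples(
    [("Adj_Close", "Adj_KO"), ("Adj_High", "Adj_KO")], names=["Price", "Ticker"]
)
toy = pd.DataFrame([[40.7, 41.6], [40.6, 40.9]], columns=cols)

# Selecting the top level returns a one-column DataFrame (2-D), not a Series
print(type(toy["Adj_Close"]).__name__)   # DataFrame
print(toy["Adj_Close"].shape)            # (2, 1)

# Dropping the ticker level leaves single-level columns, so selection is 1-D
flat = toy.copy()
flat.columns = flat.columns.droplevel("Ticker")
print(type(flat["Adj_Close"]).__name__)  # Series
```

With single-level columns, each indicator class receives the 1-dimensional input it expects.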

# **Explaining Indicators**

## Window
- The window value (n) indicates the number of previous rows the indicator uses to calculate its value. Therefore, roughly the first n rows of each indicator column will be filled with NA values. In the following steps I may choose to remove all rows containing these NA values to avoid complications later when training my model.



---
---



## Momentum Indicators
#### Rate Of Change (ROC)


*  About: Rate of change, also known simply as momentum, measures the percentage change in price between the current period and an earlier one. The gap between the two periods is set by the window value, so in my case the rate of change is calculated between values 12 rows apart.

* Output: Returns a pandas Series whose values oscillate above and below zero as the rate of change switches between positive and negative.

* Purpose: Used to indicate high volatility and strong momentum, which is helpful when determining the size of your position in order to manage risk. It can be used in my future model to predict returns.

* Calculation:

$$ \text{ROC} = \frac{\text{Price}_t - \text{Price}_{t-n}}{\text{Price}_{t-n}} \times 100 $$

Price$_t = $ current closing price

Price$_{t-n} = $ closing price n periods ago

ROC = 0 means no change

+ROC = Price increase since $ t - n$

-ROC =  Price decrease since $t - n$

Range = Unbounded
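The ROC formula maps directly onto a pandas `shift`. The helper name `rate_of_change` and the sample prices are made up for this sketch, which uses a window of 2 so the numbers are easy to follow by hand.

```python
import pandas as pd


def rate_of_change(close: pd.Series, window: int = 12) -> pd.Series:
    """Percentage change between each close and the close `window` rows earlier."""
    earlier = close.shift(window)
    return (close - earlier) / earlier * 100


close = pd.Series([100.0, 102.0, 101.0, 105.0, 110.0])  # illustrative prices
roc = rate_of_change(close, window=2)
print(roc.round(4).tolist())  # [nan, nan, 1.0, 2.9412, 8.9109]
```

The first `window` rows are NA because there is no earlier price to compare against.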

---

#### Relative Strength Index (RSI)
*  About: The Relative Strength Index compares the magnitude of recent gains and losses over a specified time period to measure the speed and change of price movements of a security/stock.

* Output: Returns a pandas.Series with values ranging between 0 and 100.

*   Purpose: It is used to identify overbought or oversold conditions in the trading of an asset/stock. Overbought refers to when the stock price has risen a lot over a short space of time, typically indicated by a high RSI value above 70. On the other hand, oversold refers to when the stock price has fallen a lot over a short period, typically indicated by an RSI value below 30.

* Calculation:

Gains = Positive price changes (Losses are 0 on these days)

Losses = Absolute values of negative price changes (Gains are 0 on these days)

The simple average method is used over the initial window (14 rows):

 $$ \text{AvgGain} = \frac{\sum(\text{gains over n periods})}{n}$$

$$  \text{AvgLoss} = \frac{\sum(\text{|loss over n periods|})}{n}$$

$$ \text{Relative Strength (RS)} = \frac{\text{AvgGain}}{\text{AvgLoss}} $$

$$ \text{RSI} = 100 - \frac{100}{1 + \text{RS}} $$

- Range = 0 to 100

- Large Gains, Small Losses = RSI approaches 100

- Small Gains, Large Losses = RSI approaches 0

After the first window the method changes to Wilder's smoothing method:

- This method avoids sharp spikes in RSI values as it gently updates the previous average with the new data.

- Similar to EMA but with a slightly slower decay rate, so it gives a larger weighting to more recent price changes while still giving some consideration to the rest of the price history. This reduces the reaction to short-term noise and helps to identify future trend direction.

- Calculation:

n = window size (14)

$$\text{AvgGain}_{\text{new}} = \frac{(\text{AvgGain}_{\text{old}} \times (n -1)) + \text{Gain}_{t}}{n} $$

$$\text{AvgLoss}_{\text{new}} = \frac{(\text{AvgLoss}_{\text{old}} \times (n -1)) + \text{Loss}_{t}}{n} $$

The rest of the steps are the same as the simple average.
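The two-stage calculation can be sketched in plain Python as follows. The function name `wilder_rsi` and the sample prices are made up, and returning 100 when there are no losses in the average is a convention assumed here. The output follows the description above and is not guaranteed to match the `ta` library value for value.

```python
def wilder_rsi(closes, window=14):
    """RSI as described above: a simple average seeds the first value,
    then Wilder's smoothing updates the averages one change at a time."""
    changes = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(c, 0.0) for c in changes]
    losses = [max(-c, 0.0) for c in changes]

    # Seed: simple averages over the first `window` price changes
    avg_gain = sum(gains[:window]) / window
    avg_loss = sum(losses[:window]) / window

    rsi = []
    for i in range(window, len(changes) + 1):
        if i > window:  # Wilder update with the newest change
            avg_gain = (avg_gain * (window - 1) + gains[i - 1]) / window
            avg_loss = (avg_loss * (window - 1) + losses[i - 1]) / window
        if avg_loss == 0:
            rsi.append(100.0)  # assumed convention: no losses puts RSI at its upper bound
        else:
            rs = avg_gain / avg_loss
            rsi.append(100 - 100 / (1 + rs))
    return rsi


values = wilder_rsi([1.0, 2.0, 3.0, 2.0, 3.0], window=3)
print([round(v, 2) for v in values])  # [66.67, 77.78]
```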

---

#### Stochastic Oscillator (STO)


*   About: The Stochastic Oscillator tells you the position of the closing price of a stock in relation to the highest and lowest prices over the most recent period (typically a 14-day period).

*   Purpose: Used in ML models to provide information regarding the momentum of the price, i.e. whether it is trending high or low at the current price. It also helps identify points where market sentiment may have changed, possibly causing strong buying or selling pressure that pushes the price.



* Calculation:

Raw %K calculation:

$$ \%K_t = \frac{C_t - L_n}{H_n - L_n} \times 100$$

- $C_t =$ close price today
- $L_n = $ Lowest low in the past n periods
- $H_n =$ Highest high in the past n periods
- $n =$ Lookback window (14 days)

- Range = 0 to 100
- High value = Close to the highest high
- Low value = Close to lowest low
- Typically fluctuates a lot due to reacting to small changes, hence creating noise.

To combat this noise, mathematical smoothing is applied using a moving average, such as an SMA.

Smoothed %K (Slow %K) Calculation:

$$\text{Slow } \%K_t = \frac{\%K_t + \%K_{t-1} + ... + \%K_{t - (m-1)}}{m}$$

- m = smooth window (3 days)
- Provides more stable readings; however, there is lag, as quick changes in direction are detected more slowly when we average across the last 3 days.
- Later I may decide to change the smoothing window to 1 to get just the raw %K and have my ML model handle the noise.
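Both the raw and the smoothed %K can be written with pandas rolling windows. The function name `stochastic` and the toy prices are made up; the sketch uses short windows so the values can be verified by hand.

```python
import pandas as pd


def stochastic(high, low, close, window=14, smooth_window=3):
    """Return (raw %K, slow %K) following the two formulas above."""
    lowest_low = low.rolling(window).min()
    highest_high = high.rolling(window).max()
    raw_k = (close - lowest_low) / (highest_high - lowest_low) * 100
    slow_k = raw_k.rolling(smooth_window).mean()  # SMA of raw %K
    return raw_k, slow_k


# Illustrative prices only
high = pd.Series([10.0, 11.0, 12.0, 13.0])
low = pd.Series([8.0, 9.0, 10.0, 11.0])
close = pd.Series([9.0, 11.0, 10.0, 13.0])

raw_k, slow_k = stochastic(high, low, close, window=2, smooth_window=2)
print(raw_k.round(2).tolist())   # [nan, 100.0, 33.33, 100.0]
print(slow_k.round(2).tolist())  # [nan, nan, 66.67, 66.67]
```

The smoothed line reacts one step later than the raw line, which is the lag described in the notes above.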


---
---


## Trend Indicators
#### Simple Moving Average (SMA)

*   About: Calculates the average closing price over the set window (20 days), recomputed for each consecutive 20-day window in the price history.
*   Purpose: Used to smooth out short-term price fluctuations and help identify the direction of medium/long-term price trends.

* Output: pandas.Series where each period's SMA is stored in the row of the last date in the window, hence the first 19 rows will contain NA values. I will look to remove these rows later if necessary.

* Calculations:

$$SMA_t = \frac{P_t + P_{t-1} + ... + P_{t-n+1}}{n}$$

$SMA_t =$ Simple moving average at time $t$

$P_t =$ closing price at time $t$

$n = $ window length (20 days)

Range = Unbounded

---

#### Exponential Moving Average (EMA)


*   About: Starts by calculating the SMA for the first window and then uses the EMA calculation for the rest of the history. The exponential part doesn't come in the form of raising $e$ to a power but instead uses a smoothing factor, so the weighting decays exponentially over time. This puts a higher weighting on more recent pricing data.

*   Purpose: Similarly to SMA, EMA is used to identify trends, but it weighs recent price history more heavily than older data. This enables EMA to detect trend changes more easily than SMA, as it is more responsive to recent market movement. This is very useful for short-term strategies in highly volatile markets.

* Calculation:

$$\alpha = \frac{2}{n + 1}$$


$$EMA_t = \alpha  \cdot P_t + (1-\alpha) \cdot EMA_{t-1}$$

$n =$ Window length (20 days)

$P_t =$ Price of close at time t

$EMA_{t-1} = $ EMA from the previous period

$\alpha =$ Smoothing constant

The window length only affects the first SMA calculation and the smoothing constant $\alpha$; from that point on we only look back one period to find the previous EMA value.

Range = Unbounded
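The recursion is short enough to write out by hand. The sketch below seeds the EMA with the SMA of the first window, as described above; the function name `ema` and the sample prices are made up, and the `ta` library may initialise its series differently.

```python
def ema(prices, window=20):
    """EMA seeded with the SMA of the first `window` prices."""
    if len(prices) < window:
        return []
    alpha = 2 / (window + 1)                   # smoothing constant
    values = [sum(prices[:window]) / window]   # first value: plain SMA
    for p in prices[window:]:
        # New EMA = alpha * price + (1 - alpha) * previous EMA
        values.append(alpha * p + (1 - alpha) * values[-1])
    return values


print(ema([1.0, 2.0, 3.0, 4.0, 5.0], window=3))  # [2.0, 3.0, 4.0]
```

With a window of 3 the smoothing constant is 0.5, so each new value is the midpoint between the latest price and the previous EMA.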

---

#### Moving Average Convergence Divergence (MACD)

*   About: MACD shows the relationship between two moving averages of prices, one short-term (12 days) and one long-term (26 days).

*   Purpose: Helps determine if momentum is becoming weaker or stronger over time.

* Calculation:
$$MACD_t = EMA_{\text{fast}}(P_t) - EMA_{\text{slow}}(P_t)$$

$P_t =$ Closing price at time $t$

$EMA_{\text{fast}} =$ Short-term exponential moving average (12 days)

$EMA_{\text{slow}} = $ Long-term exponential moving average (26 days)

- Positive MACD: the short-term EMA is above the long-term EMA, meaning recent closing prices are rising faster (or falling more slowly) than the longer-term average. This indicates upward momentum.

- Negative MACD: the short-term EMA is below the long-term EMA, meaning recent prices are falling while the long-term average is still rising, or falling faster than it. This indicates downward momentum.

- The magnitude of MACD shows the size of the difference between the short-term and long-term trends.

Range = Unbounded
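A minimal sketch of the MACD line using pandas exponentially weighted means is shown below. The function name `macd_line` and the test series are made up; `span=n` gives the same smoothing constant $2/(n+1)$ used for the EMA, while the `adjust=False` start-up behaviour is an assumption of this sketch.

```python
import pandas as pd


def macd_line(close: pd.Series, window_fast: int = 12, window_slow: int = 26) -> pd.Series:
    """Fast EMA minus slow EMA of the closing price."""
    ema_fast = close.ewm(span=window_fast, adjust=False).mean()
    ema_slow = close.ewm(span=window_slow, adjust=False).mean()
    return ema_fast - ema_slow


rising = pd.Series([float(x) for x in range(1, 61)])       # steadily rising prices
falling = pd.Series([float(x) for x in range(60, 0, -1)])  # steadily falling prices

print(macd_line(rising).iloc[-1] > 0)   # True: fast EMA sits above slow EMA
print(macd_line(falling).iloc[-1] < 0)  # True: fast EMA sits below slow EMA
```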

---

#### Average Directional Index (ADX)

*   About: Measures the strength of a trend, regardless of direction.

*   Purpose: Helps avoid false trend signals from other indicators and provides information about trend intensity.

* Calculation:
$$\text{Directional Movement (DM)}$$

$$
 \text{+DM}_t= \left\{
  \begin{array}{ll}
   H_t - H_{t-1} & \text{if } (H_t - H_{t-1}) > (L_{t-1} - L_t) \ \text{and } (H_t - H_{t-1}) > 0\\
   0 & \text{otherwise}
  \end{array}
  \right.
$$

<br>

$$
 \text{-DM}_t= \left\{
  \begin{array}{ll}
   L_{t-1} - L_t & \text{if } (L_{t-1} - L_{t}) > (H_{t} - H_{t-1}) \ \text{and } (L_{t-1} - L_{t}) > 0\\
   0 & \text{otherwise}
  \end{array}
  \right.
$$

<br>

$$\text{True Range (TR)}$$
$$TR_t = \max(H_t - L_t, |H_t - C_{t-1}|, |L_t - C_{t-1}|)$$


<br>

$$\text{Smooth} \ TR,  \ \text{+DM}, \ \text{-DM} : \text{Wilder's smoothing}$$

$$ TR^{(s)}_t = TR^{(s)}_{t-1} - \frac{TR^{(s)}_{t-1}}{N} + TR_t$$

$$ \text{+DM}^{(s)}_t = \text{+DM}^{(s)}_{t-1} - \frac{\text{+DM}^{(s)}_{t-1}}{N} + \text{+DM}_t$$

$$ \text{-DM}^{(s)}_t = \text{-DM}^{(s)}_{t-1} - \frac{\text{-DM}^{(s)}_{t-1}}{N} + \text{-DM}_t$$

<br>

$$\text{Directional Indicators (+DI and -DI)}$$

$$\text{+DI}_t = 100 \times \frac{\text{+DM}^{(s)}_t}{TR^{(s)}_t}$$

$$\text{-DI}_t = 100 \times \frac{\text{-DM}^{(s)}_t}{TR^{(s)}_t}$$

<br>

$$\text{Directional Index (DX)}$$

$$DX_t = 100 \times \frac{|(\text{+DI}_t) - (\text{-DI}_t)|}{(\text{+DI}_t) + (\text{-DI}_t)}$$

<br>

$$ADX_t = \frac{\sum^{t}_{i = t - N + 1}DX_i}{N}$$


* Range = 0 to 100

* Strong Trend >= 25

* Weak/No Trend <= 20

---
---



## Volatility Indicators

#### Bollinger Bands Width (BOL)

*   About: Bollinger Bands are made up of three bands:
 * Middle Band - SMA over set window (20)
 * Upper Band - SMA + (standard deviation $\times$ multiplier),
 * Lower Band - SMA - (standard deviation $\times$ multiplier)

*   Purpose: Calculates the volatility.
 The wider the bands, the greater the volatility; the narrower the bands, the lower the volatility. This can allow traders to identify suitable strategies that work better in periods of high volatility or low volatility.

* Calculations:
$$\text{Bollinger Band Width (BOL)} = \frac{UB - LB}{MB} $$

- Multiplier is typically set to 2 and represents the number of standard deviations away the bands are placed. In the class the parameter window_dev represents the multiplier.

Range = Positive ($0 \to \infty$)
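The band width can be sketched with pandas rolling statistics. The function name `bollinger_width` and the sample prices are made up, and using the population standard deviation (`ddof=0`) is an assumption of this sketch.

```python
import pandas as pd


def bollinger_width(close: pd.Series, window: int = 20, window_dev: float = 2) -> pd.Series:
    """(upper band - lower band) / middle band."""
    mb = close.rolling(window).mean()        # middle band: SMA
    sd = close.rolling(window).std(ddof=0)   # rolling standard deviation (assumed ddof=0)
    ub = mb + window_dev * sd                # upper band
    lb = mb - window_dev * sd                # lower band
    return (ub - lb) / mb


close = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])  # illustrative prices
width = bollinger_width(close, window=3)
print(round(width.iloc[2], 4))  # 1.633
```

The BOL values in the output table further down (for example 9.25) appear to be on a percentage scale, i.e. this ratio multiplied by 100.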

---

#### Average True Range (ATR)

*   About: ATR measures the average magnitude of price movements over a specific time period
*   Purpose: Helps traders adjust their position sizes in the market based on volatility.

* Calculation:

$$TR_t = \max(H_t - L_t, |H_t - C_{t-1}|, |L_t - C_{t-1}|)$$

$$ATR_t = \frac{TR_t + TR_{t-1}+ ... + TR_{t-n+1}}{n}$$

- n = window period (14 day)

Range = Unbounded
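The two formulas can be sketched in plain Python as a simple moving average of the true range. The function name `average_true_range` and the prices are made up. This follows the averaging formula above; library implementations may apply Wilder's smoothing instead, so values can differ.

```python
def average_true_range(highs, lows, closes, window=14):
    """Simple moving average of the true range over `window` bars."""
    # True range needs the previous close, so it starts at the second bar
    true_ranges = [
        max(h - l, abs(h - prev_c), abs(l - prev_c))
        for h, l, prev_c in zip(highs[1:], lows[1:], closes[:-1])
    ]
    return [
        sum(true_ranges[i - window:i]) / window
        for i in range(window, len(true_ranges) + 1)
    ]


# Illustrative prices only
highs = [10.0, 12.0, 13.0, 12.0]
lows = [9.0, 10.0, 11.0, 10.0]
closes = [9.5, 11.0, 12.0, 11.0]

print(average_true_range(highs, lows, closes, window=2))  # [2.25, 2.0]
```

On the second bar the gap up from the previous close (12 versus 9.5) is larger than the bar's own high-low range, which is why the true range is used rather than high minus low alone.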

---
---

## Volume Indicators

#### On-Balance Volume (OBV)

*   About: OBV accumulates volume by adding the day's volume when the price closes higher than the previous close, and subtracting the day's volume when it closes lower.

*   Purpose: Used to confirm price trends and measure the flow of money in and out of a stock.

* Calculation:

$$OBV_t = OBV_{t-1} + \left\{
  \begin{array}{ll}
    V_t & \text{if} \ C_t > C_{t-1}\\
    -V_t & \text{if} \  C_t < C_{t-1} \\
    0 & \text{if} \ C_t = C_{t-1}
  \end{array}
  \right.$$

$V_t =$ Trading volume at time $t$

$C_t =$ Closing price at time $t$

- Rising OBV & Rising price = bullish confirmation

- Rising OBV & falling price = possible accumulation or bullish divergence

Range = Unbounded
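The piecewise rule translates into a short loop. The function name `on_balance_volume` and the data are made up, and starting the running total at 0 is an assumption of this sketch (libraries may start from the first day's volume instead).

```python
def on_balance_volume(closes, volumes):
    """Running total that adds volume on up days and subtracts it on down days."""
    obv = [0]  # assumed starting value
    for prev_close, close, volume in zip(closes, closes[1:], volumes[1:]):
        if close > prev_close:
            obv.append(obv[-1] + volume)
        elif close < prev_close:
            obv.append(obv[-1] - volume)
        else:
            obv.append(obv[-1])  # unchanged close: volume is ignored
    return obv


closes = [10.0, 11.0, 11.0, 10.5]   # illustrative prices
volumes = [100, 200, 150, 300]      # illustrative volumes
print(on_balance_volume(closes, volumes))  # [0, 200, 200, -100]
```

Only the direction and slope of OBV carry information; the absolute level depends on where the running total starts.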

---

#### Volume Weighted Average Price (VWAP)


*   About: VWAP is the volume-weighted average price of a stock over a specific period. It is the average price at which the asset has traded, weighted by trading volume.

*   Purpose: Used to help investors see the average price other investors are paying for a stock and to identify market trends. It can be used as a benchmark in algorithmic trading to judge whether the average market price is at the stock's fair value or not, which may reveal opportunities to invest.

* Calculation:

$$ VWAP_t = \frac{\sum^{t}_{i=t-n+1}P_i \cdot V_i}{\sum^{t}_{i=t-n+1}V_i}$$

$P_i = $ Price at time $i$ = $\frac{(High + Low + Close)}{3}$

$V_i =$ Trading volume at time $i$

$n =$ Window length (14 days)

Range = Unbounded
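A minimal sketch of VWAP over a rolling window is given below. Restricting the sums to the most recent `window` rows is an assumption based on the `window = 14` argument passed in the code cell; the function name `rolling_vwap` and the data are made up.

```python
import pandas as pd


def rolling_vwap(high, low, close, volume, window=14):
    """Volume-weighted average of the typical price over the last `window` rows."""
    typical = (high + low + close) / 3
    price_volume = (typical * volume).rolling(window).sum()
    return price_volume / volume.rolling(window).sum()


# Illustrative data only
high = pd.Series([11.0, 13.0, 16.0])
low = pd.Series([9.0, 11.0, 14.0])
close = pd.Series([10.0, 12.0, 15.0])
volume = pd.Series([100.0, 300.0, 100.0])

vwap = rolling_vwap(high, low, close, volume, window=2)
print(vwap.tolist())  # [nan, 11.5, 12.75]
```

The second value sits closer to 12 than to 10 because three times as much volume traded at the higher typical price.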






In [None]:
# Data cleaning (many of the indicators produce NA values in their first window)

df = df.dropna(how = 'any')
df.head(20)

Date,Open,High,Low,Close,Volume,Dividends,Stock Splits,simple return,log return,ROC,RSI,STO,SMA,EMA,MACD,ADX,BOL,ATR,OBV,VWAP
2020-09-21 00:00:00-04:00,43.0956,43.329486,42.108084,42.52388,17514800,0.0,0.0,-0.026957,-0.027327,-3.326095,47.300865,19.860138,42.947611,43.023191,0.478796,0.0,9.24819,0.864692,-45655800,43.493211
2020-09-22 00:00:00-04:00,42.541214,43.34682,42.506563,43.017647,13034600,0.0,0.0,0.011612,0.011545,-0.828047,51.80843,33.748013,43.037492,43.022663,0.419607,0.0,8.461958,0.862946,-32621200,43.531885
2020-09-23 00:00:00-04:00,43.078271,43.242856,41.726931,41.761581,17121200,0.0,0.0,-0.029199,-0.029634,-4.780092,41.973179,1.126365,43.067148,42.90256,0.268253,27.700545,7.961464,0.909588,-49742400,43.439909
2020-09-24 00:00:00-04:00,41.761594,42.53255,41.198537,42.186054,16788100,0.0,0.0,0.010164,0.010113,-1.437007,45.723102,29.305979,43.107286,42.834321,0.180475,27.224248,7.366384,0.939904,-32954300,43.302435
2020-09-25 00:00:00-04:00,41.752922,42.238016,41.562348,42.203369,12603400,0.0,0.0,0.00041,0.00041,-2.143114,45.876752,29.819824,43.145283,42.77423,0.111027,26.781972,6.757069,0.92103,-20350900,43.183844
2020-09-28 00:00:00-04:00,42.714462,43.060958,42.567199,42.688473,11215700,0.0,0.0,0.011494,0.011429,-0.642155,50.135735,44.215967,43.138792,42.766063,0.094049,25.232854,6.790106,0.916498,-9135200,43.175484
2020-09-29 00:00:00-04:00,42.783753,42.931016,42.367956,42.376617,12426600,0.0,0.0,-0.007305,-0.007332,-3.415629,47.545525,34.961215,43.129597,42.728973,0.054798,24.019414,6.868625,0.891253,-21561800,43.144
2020-09-30 00:00:00-04:00,42.679807,42.982991,42.350634,42.76643,15755900,0.0,0.0,0.009199,0.009157,-2.64247,50.956336,46.529458,43.157509,42.73254,0.054518,22.821213,6.630998,0.87276,-5805900,43.110778
2020-10-01 00:00:00-04:00,42.827067,43.017641,42.090761,42.601845,17577300,0.0,0.0,-0.003848,-0.003856,-3.66308,49.493019,41.645159,43.088255,42.720093,0.040547,22.022586,6.485709,0.876626,-23383200,43.019548
2020-10-02 00:00:00-04:00,42.238015,42.948337,42.056105,42.757763,13610100,0.0,0.0,0.00366,0.003653,-2.815523,50.93063,46.272252,43.057302,42.723681,0.041578,21.32367,6.492628,0.877741,-9773100,42.939795


In [None]:
rel_change = (df['simple return'].shift(1) - df['simple return']) / df['simple return']  # relative day-on-day change in the simple return
df['Corporate Action'] = rel_change.abs() > 0.1  # flag changes larger than 10% as possible corporate actions