# Table of Contents

1. [Imports](#Imports)<br>
2. [Data Import](#DataImport)<br>
3. [Financial Indicators](#FinancialIndicators)<br>
    3.1 [Simple Moving Average](#SimpleMovingAverage)<br>
    3.2 [Moving Average Convergence Divergence](#MovingAverageConvergenceDivergence)<br>
    3.3 [Stochastic Oscillator](#StochasticOscillator)<br>
    3.4 [Accumulation/Distribution Line](#Accumulation/DistributionLine)<br>
    3.5 [Bollinger Bands](#BollingerBands)<br>
    3.6 [On Balance Volume](#OnBalanceVolume)<br>

## Imports <a class="anchor" id="Imports"></a>

In [1]:
# pip install yfinance

In [6]:
from scipy import stats
import yfinance as yf
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pandas_datareader import DataReader
from datetime import datetime
plt.style.use('fivethirtyeight')
import operator

## Data Import <a class="anchor" id="DataImport"></a>

In [135]:
import numpy as np
import statsmodels.api as sm
from datetime import datetime
from dateutil.relativedelta import relativedelta
ticker='ADBE'
start = datetime(2017,10,1)
end = datetime(2021,4,1)
df = DataReader(ticker,  'yahoo', start, end)

In [136]:
df

Date,High,Low,Open,Close,Volume,Adj Close
2017-10-02,150.479996,147.520004,149.789993,147.940002,2341700,147.940002
2017-10-03,148.800003,147.990005,148.479996,148.600006,1808200,148.600006
2017-10-04,148.460007,146.600006,148.210007,147.949997,2885600,147.949997
2017-10-05,150.449997,147.710007,148.490005,150.250000,2413800,150.250000
2017-10-06,151.360001,149.529999,149.960007,151.119995,2282800,151.119995
...,...,...,...,...,...,...
2021-03-26,469.769989,449.049988,450.250000,469.089996,3614400,469.089996
2021-03-29,472.000000,462.500000,469.029999,469.320007,3141900,469.320007
2021-03-30,469.089996,461.500000,462.579987,465.459991,2436300,465.459991
2021-03-31,482.410004,468.700012,469.700012,475.369995,3155900,475.369995


## Financial Indicators <a class="anchor" id="FinancialIndicators"></a>

### Simple Moving Average <a class="anchor" id="SimpleMovingAverage"></a>

The Simple Moving Average (SMA) is a staple of technical analysis and is used in countless technical indicators. In a Simple Moving Average, each value in the time period carries equal weight, and values outside of the time period are not included in the average. By contrast, the Exponential Moving Average (EMA) is a cumulative calculation that includes all data: past values have a diminishing contribution to the average, while more recent values have a greater contribution, which makes the EMA more responsive to changes in the data. The function below computes a 20-period SMA of the adjusted close.
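To make the difference between the two averages concrete, here is a minimal sketch on made-up prices. The toy series and the 3-period window are illustrative assumptions, not part of the notebook's data.

```python
import pandas as pd

# Hypothetical toy prices: flat, then a jump on the last bar
prices = pd.Series([10.0, 10.0, 10.0, 10.0, 20.0])

# Simple moving average: equal weight on the last 3 values only
sma = prices.rolling(window=3).mean()

# Exponential moving average: recursive, so every past value still contributes
ema = prices.ewm(span=3, adjust=False).mean()

comparison = pd.DataFrame({"price": prices, "SMA_3": sma, "EMA_3": ema})
print(comparison)
```

On the last bar the EMA moves further toward the new price than the SMA does, and the SMA is undefined until a full window of data is available.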

In [137]:
def SMA(df, periods=20):
    df["SMA"] = df["Adj Close"].rolling(window=periods).mean()
    return df

In [138]:
SMA(df)

Date,High,Low,Open,Close,Volume,Adj Close,SMA
2017-10-02,150.479996,147.520004,149.789993,147.940002,2341700,147.940002,
2017-10-03,148.800003,147.990005,148.479996,148.600006,1808200,148.600006,
2017-10-04,148.460007,146.600006,148.210007,147.949997,2885600,147.949997,
2017-10-05,150.449997,147.710007,148.490005,150.250000,2413800,150.250000,
2017-10-06,151.360001,149.529999,149.960007,151.119995,2282800,151.119995,
...,...,...,...,...,...,...,...
2021-03-26,469.769989,449.049988,450.250000,469.089996,3614400,469.089996,448.563501
2021-03-29,472.000000,462.500000,469.029999,469.320007,3141900,469.320007,448.551001
2021-03-30,469.089996,461.500000,462.579987,465.459991,2436300,465.459991,448.494501
2021-03-31,482.410004,468.700012,469.700012,475.369995,3155900,475.369995,449.840500


### Moving Average Convergence Divergence <a class="anchor" id="MovingAverageConvergenceDivergence"></a>

The Moving Average Convergence Divergence (MACD) is the difference between two Exponential Moving Averages (here with spans of 12 and 26 periods). The Signal line is an Exponential Moving Average of the MACD.
The MACD signals trend changes and indicates the start of the new trend direction. High values indicate overbought conditions, low values indicate oversold conditions. Divergence with the price indicates an end to the current trend, especially if the MACD is at extremely high or low values. When the MACD line crosses above the signal line a buy signal is generated. When the MACD crosses below the signal line, a sell signal is generated. To confirm the signal, the MACD should be above zero for a buy, and below zero for a sell.
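The crossover rule described above can be sketched as follows. This is only an illustration: the helper name `macd_with_signal`, the 9-period signal span, and the toy price path are assumptions, and the `buy`/`sell` columns flag raw crossovers without the zero-line confirmation.

```python
import pandas as pd


def macd_with_signal(close, fast=12, slow=26, signal=9):
    """Return the MACD line, its signal line, and crossover flags for a price Series."""
    ema_fast = close.ewm(span=fast, adjust=False).mean()
    ema_slow = close.ewm(span=slow, adjust=False).mean()
    macd = ema_fast - ema_slow
    signal_line = macd.ewm(span=signal, adjust=False).mean()

    # True where the MACD is above its signal line
    above = macd > signal_line
    prev_above = above.shift(1, fill_value=False)

    return pd.DataFrame({
        "MACD": macd,
        "Signal": signal_line,
        "buy": above & ~prev_above,    # MACD crosses above the signal line
        "sell": ~above & prev_above,   # MACD crosses below the signal line
    })


# Hypothetical price path: a steady decline followed by a steady rise
toy_close = pd.Series(list(range(50, 30, -1)) + list(range(30, 60)), dtype=float)
out = macd_with_signal(toy_close)
print(out[out["buy"] | out["sell"]])
```

On this toy path the first buy flag can only appear after the price has turned upward, because the MACD stays below its signal line throughout the decline.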

In [139]:
def MACD(df):
    exp1 = df["Adj Close"].ewm(span=12, adjust=False).mean()
    exp2 = df["Adj Close"].ewm(span=26, adjust=False).mean()
    macd = exp1 - exp2
    df["MACD"] = macd
#     exp3 = macd.ewm(span=9, adjust=False).mean()    
    return df

In [140]:
MACD(df)

Date,High,Low,Open,Close,Volume,Adj Close,SMA,MACD
2017-10-02,150.479996,147.520004,149.789993,147.940002,2341700,147.940002,,0.000000
2017-10-03,148.800003,147.990005,148.479996,148.600006,1808200,148.600006,,0.052650
2017-10-04,148.460007,146.600006,148.210007,147.949997,2885600,147.949997,,0.041447
2017-10-05,150.449997,147.710007,148.490005,150.250000,2413800,150.250000,,0.215674
2017-10-06,151.360001,149.529999,149.960007,151.119995,2282800,151.119995,,0.419120
...,...,...,...,...,...,...,...,...
2021-03-26,469.769989,449.049988,450.250000,469.089996,3614400,469.089996,448.563501,-3.046080
2021-03-29,472.000000,462.500000,469.029999,469.320007,3141900,469.320007,448.551001,-1.523269
2021-03-30,469.089996,461.500000,462.579987,465.459991,2436300,465.459991,448.494501,-0.620745
2021-03-31,482.410004,468.700012,469.700012,475.369995,3155900,475.369995,449.840500,0.883976


### Stochastic Oscillator <a class="anchor" id="StochasticOscillator"></a>

The Stochastic Oscillator measures where the close is in relation to the recent trading range. The values range from zero to 100. %D values over 75 indicate an overbought condition; values under 25 indicate an oversold condition. When the Fast %D crosses above the Slow %D, it is a buy signal; when it crosses below, it is a sell signal. The Raw %K is generally considered too erratic to use for crossover signals.
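As a small worked example of the definition above, the sketch below computes a raw %K for one hypothetical window and classifies a %D reading with the 75/25 thresholds. The helper names, the toy numbers, and the 3-value average used for %D are assumptions for illustration only.

```python
def percent_k(close, highs, lows):
    """Raw %K: where the close sits inside the window's high-low range (0 to 100)."""
    highest_high = max(highs)
    lowest_low = min(lows)
    return (close - lowest_low) / (highest_high - lowest_low) * 100


def zone(percent_d, upper=75, lower=25):
    """Classify a %D reading using the 75/25 thresholds quoted in the text."""
    if percent_d > upper:
        return "overbought"
    if percent_d < lower:
        return "oversold"
    return "neutral"


# Hypothetical 3-bar window: highs, lows, and the latest close
k = percent_k(close=14.0, highs=[12.0, 13.0, 15.0], lows=[10.0, 11.0, 12.0])

# %D is a moving average of %K; here two earlier made-up %K values are averaged with k
d = sum([60.0, 70.0, k]) / 3

print(k, d, zone(d))
```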

In [141]:
def calculate_k(df):
    adj_close = df["Adj Close"]
    highest_hi = df['High'].rolling(window=10).max()
    lower_lo = df["Low"].rolling(window=10).min()
    df['per_k_stoch_10'] = (adj_close - lower_lo)/(highest_hi - lower_lo)*100
    return df

def calculate_d(df):
    df['per_d_stoch_10'] = df['per_k_stoch_10'].rolling(window=10).mean()
    return df

def stochastic_oscillator(df):
    df = calculate_k(df)
    df = calculate_d(df)
    return df

In [142]:
stochastic_oscillator(df)

Date,High,Low,Open,Close,Volume,Adj Close,SMA,MACD,per_k_stoch_10,per_d_stoch_10
2017-10-02,150.479996,147.520004,149.789993,147.940002,2341700,147.940002,,0.000000,,
2017-10-03,148.800003,147.990005,148.479996,148.600006,1808200,148.600006,,0.052650,,
2017-10-04,148.460007,146.600006,148.210007,147.949997,2885600,147.949997,,0.041447,,
2017-10-05,150.449997,147.710007,148.490005,150.250000,2413800,150.250000,,0.215674,,
2017-10-06,151.360001,149.529999,149.960007,151.119995,2282800,151.119995,,0.419120,,
...,...,...,...,...,...,...,...,...,...,...
2021-03-26,469.769989,449.049988,450.250000,469.089996,3614400,469.089996,448.563501,-3.046080,97.875023,65.088580
2021-03-29,472.000000,462.500000,469.029999,469.320007,3141900,469.320007,448.551001,-1.523269,92.085074,69.240505
2021-03-30,469.089996,461.500000,462.579987,465.459991,2436300,465.459991,448.494501,-0.620745,80.685141,70.529972
2021-03-31,482.410004,468.700012,469.700012,475.369995,3155900,475.369995,449.840500,0.883976,84.097560,70.713875


### Accumulation/Distribution Line <a class="anchor" id="Accumulation/DistributionLine"></a> 

The Accumulation/Distribution Line is similar to the On Balance Volume (OBV), which sums the volume times +1/-1 based on whether the close is higher than the previous close. The Accumulation/Distribution indicator, however, multiplies the volume by the close location value (CLV). The CLV is based on the movement of the issue within a single bar and ranges from -1 (close at the low) to +1 (close at the high), with 0 when the close sits at the midpoint of the bar's range.
The Accumulation/Distribution Line is interpreted by looking for a divergence in the direction of the indicator relative to price. If the Accumulation/Distribution Line is trending upward it indicates that the price may follow. Also, if the Accumulation/Distribution Line becomes flat while the price is still rising (or falling) then it signals an impending flattening of the price.
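A quick worked example of the CLV and the running total built from it, on made-up bars. The helper names are hypothetical, returning 0 for a bar with no range is an assumption, and this sketch accumulates from the first bar onward.

```python
def close_location_value(high, low, close):
    """CLV for one bar; treated as 0 when high == low (an assumption of this sketch)."""
    if high == low:
        return 0.0
    return ((close - low) - (high - close)) / (high - low)


def ad_line(bars):
    """Cumulative sum of CLV * volume over (high, low, close, volume) tuples."""
    total = 0.0
    line = []
    for high, low, close, volume in bars:
        total += close_location_value(high, low, close) * volume
        line.append(total)
    return line


# Hypothetical bars: close at the high, close three quarters up, close at the low
toy_bars = [
    (10.0, 8.0, 10.0, 100),
    (10.0, 8.0, 9.5, 200),
    (10.0, 8.0, 8.0, 50),
]
print(ad_line(toy_bars))
```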

In [143]:
df['seq'] = [a for a in range(1, len(df)+1)]

In [144]:
df['Date'] = df.index

In [145]:
df

Date,High,Low,Open,Close,Volume,Adj Close,SMA,MACD,per_k_stoch_10,per_d_stoch_10,seq,Date
2017-10-02,150.479996,147.520004,149.789993,147.940002,2341700,147.940002,,0.000000,,,1,2017-10-02
2017-10-03,148.800003,147.990005,148.479996,148.600006,1808200,148.600006,,0.052650,,,2,2017-10-03
2017-10-04,148.460007,146.600006,148.210007,147.949997,2885600,147.949997,,0.041447,,,3,2017-10-04
2017-10-05,150.449997,147.710007,148.490005,150.250000,2413800,150.250000,,0.215674,,,4,2017-10-05
2017-10-06,151.360001,149.529999,149.960007,151.119995,2282800,151.119995,,0.419120,,,5,2017-10-06
...,...,...,...,...,...,...,...,...,...,...,...,...
2021-03-26,469.769989,449.049988,450.250000,469.089996,3614400,469.089996,448.563501,-3.046080,97.875023,65.088580,877,2021-03-26
2021-03-29,472.000000,462.500000,469.029999,469.320007,3141900,469.320007,448.551001,-1.523269,92.085074,69.240505,878,2021-03-29
2021-03-30,469.089996,461.500000,462.579987,465.459991,2436300,465.459991,448.494501,-0.620745,80.685141,70.529972,879,2021-03-30
2021-03-31,482.410004,468.700012,469.700012,475.369995,3155900,475.369995,449.840500,0.883976,84.097560,70.713875,880,2021-03-31


In [146]:
df.set_index("seq", inplace = True)

In [147]:
df

seq,High,Low,Open,Close,Volume,Adj Close,SMA,MACD,per_k_stoch_10,per_d_stoch_10,Date
1,150.479996,147.520004,149.789993,147.940002,2341700,147.940002,,0.000000,,,2017-10-02
2,148.800003,147.990005,148.479996,148.600006,1808200,148.600006,,0.052650,,,2017-10-03
3,148.460007,146.600006,148.210007,147.949997,2885600,147.949997,,0.041447,,,2017-10-04
4,150.449997,147.710007,148.490005,150.250000,2413800,150.250000,,0.215674,,,2017-10-05
5,151.360001,149.529999,149.960007,151.119995,2282800,151.119995,,0.419120,,,2017-10-06
...,...,...,...,...,...,...,...,...,...,...,...
877,469.769989,449.049988,450.250000,469.089996,3614400,469.089996,448.563501,-3.046080,97.875023,65.088580,2021-03-26
878,472.000000,462.500000,469.029999,469.320007,3141900,469.320007,448.551001,-1.523269,92.085074,69.240505,2021-03-29
879,469.089996,461.500000,462.579987,465.459991,2436300,465.459991,448.494501,-0.620745,80.685141,70.529972,2021-03-30
880,482.410004,468.700012,469.700012,475.369995,3155900,475.369995,449.840500,0.883976,84.097560,70.713875,2021-03-31


In [148]:
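def accumulation_distribution(df):
    # Running A/D total, indexed like df (expects the integer 'seq' index set above)
    values = pd.Series(index=df.index, dtype=float)

    first_idx = df.index.values[0]

    for idx in df.index.values:
        today = df.loc[idx]
        close, high, low, volume = today["Close"], today["High"], today["Low"], today["Volume"]
        # Close location value: -1 at the bar's low, +1 at the bar's high
        CLV = ((close - low) - (high - close)) / (high - low)

        # The first row is seeded with 0; later rows add CLV * volume to the previous total
        values[idx] = values[idx-1] + CLV * volume if idx != first_idx else 0

    df['a/d'] = values
    return df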

In [149]:
accumulation_distribution(df)

  


seq,High,Low,Open,Close,Volume,Adj Close,SMA,MACD,per_k_stoch_10,per_d_stoch_10,Date,a/d
1,150.479996,147.520004,149.789993,147.940002,2341700,147.940002,,0.000000,,,2017-10-02,0.000000e+00
2,148.800003,147.990005,148.479996,148.600006,1808200,148.600006,,0.052650,,,2017-10-03,9.152727e+05
3,148.460007,146.600006,148.210007,147.949997,2885600,147.949997,,0.041447,,,2017-10-04,2.218417e+06
4,150.449997,147.710007,148.490005,150.250000,2413800,150.250000,,0.215674,,,2017-10-05,4.279842e+06
5,151.360001,149.529999,149.960007,151.119995,2282800,151.119995,,0.419120,,,2017-10-06,5.963861e+06
...,...,...,...,...,...,...,...,...,...,...,...,...
877,469.769989,449.049988,450.250000,469.089996,3614400,469.089996,448.563501,-3.046080,97.875023,65.088580,2021-03-26,2.187164e+08
878,472.000000,462.500000,469.029999,469.320007,3141900,469.320007,448.551001,-1.523269,92.085074,69.240505,2021-03-29,2.200856e+08
879,469.089996,461.500000,462.579987,465.459991,2436300,465.459991,448.494501,-0.620745,80.685141,70.529972,2021-03-30,2.201915e+08
880,482.410004,468.700012,469.700012,475.369995,3155900,475.369995,449.840500,0.883976,84.097560,70.713875,2021-03-31,2.201063e+08


In [150]:
df.set_index("Date", inplace = True)

In [151]:
df

Date,High,Low,Open,Close,Volume,Adj Close,SMA,MACD,per_k_stoch_10,per_d_stoch_10,a/d
2017-10-02,150.479996,147.520004,149.789993,147.940002,2341700,147.940002,,0.000000,,,0.000000e+00
2017-10-03,148.800003,147.990005,148.479996,148.600006,1808200,148.600006,,0.052650,,,9.152727e+05
2017-10-04,148.460007,146.600006,148.210007,147.949997,2885600,147.949997,,0.041447,,,2.218417e+06
2017-10-05,150.449997,147.710007,148.490005,150.250000,2413800,150.250000,,0.215674,,,4.279842e+06
2017-10-06,151.360001,149.529999,149.960007,151.119995,2282800,151.119995,,0.419120,,,5.963861e+06
...,...,...,...,...,...,...,...,...,...,...,...
2021-03-26,469.769989,449.049988,450.250000,469.089996,3614400,469.089996,448.563501,-3.046080,97.875023,65.088580,2.187164e+08
2021-03-29,472.000000,462.500000,469.029999,469.320007,3141900,469.320007,448.551001,-1.523269,92.085074,69.240505,2.200856e+08
2021-03-30,469.089996,461.500000,462.579987,465.459991,2436300,465.459991,448.494501,-0.620745,80.685141,70.529972,2.201915e+08
2021-03-31,482.410004,468.700012,469.700012,475.369995,3155900,475.369995,449.840500,0.883976,84.097560,70.713875,2.201063e+08


### Bollinger Bands <a class="anchor" id="BollingerBands"></a> 

Bollinger Bands consist of three lines. The middle band is a simple moving average (generally 20 periods) of the typical price (TP). The upper and lower bands are F standard deviations (generally 2) above and below the middle band. The bands widen and narrow when the volatility of the price is higher or lower, respectively.
Bollinger Bands do not, in themselves, generate buy or sell signals; they are an indicator of overbought or oversold conditions. When the price is near the upper or lower band it indicates that a reversal may be imminent. The middle band becomes a support or resistance level. The upper and lower bands can also be interpreted as price targets. When the price bounces off of the lower band and crosses the middle band, then the upper band becomes the price target.

In [152]:
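def BBANDS(df):
    # Typical price, then its 20-period rolling mean and standard deviation
    df["TP"] = (df["High"] + df["Low"] + df["Close"])/3
    df["Midband"] = df["TP"].rolling(window= 20).mean()
    df["Std"] = df["TP"].rolling(window= 20).std()
    
    # Bands sit 2 standard deviations above and below the middle band
    df["Upperband"] = df["Midband"] + (df["Std"]*2)
    df["Lowerband"] = df["Midband"] - (df["Std"]*2)
    
    # drop() returns a copy: the returned frame omits 'Std' and 'TP',
    # but both columns stay on the caller's df and are used as features later
    df = df.drop(['Std', 'TP'], axis = 1)
    return df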

In [153]:
BBANDS(df)

Date,High,Low,Open,Close,Volume,Adj Close,SMA,MACD,per_k_stoch_10,per_d_stoch_10,a/d,Midband,Upperband,Lowerband
2017-10-02,150.479996,147.520004,149.789993,147.940002,2341700,147.940002,,0.000000,,,0.000000e+00,,,
2017-10-03,148.800003,147.990005,148.479996,148.600006,1808200,148.600006,,0.052650,,,9.152727e+05,,,
2017-10-04,148.460007,146.600006,148.210007,147.949997,2885600,147.949997,,0.041447,,,2.218417e+06,,,
2017-10-05,150.449997,147.710007,148.490005,150.250000,2413800,150.250000,,0.215674,,,4.279842e+06,,,
2017-10-06,151.360001,149.529999,149.960007,151.119995,2282800,151.119995,,0.419120,,,5.963861e+06,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2021-03-26,469.769989,449.049988,450.250000,469.089996,3614400,469.089996,448.563501,-3.046080,97.875023,65.088580,2.187164e+08,449.109333,470.083888,428.134778
2021-03-29,472.000000,462.500000,469.029999,469.320007,3141900,469.320007,448.551001,-1.523269,92.085074,69.240505,2.200856e+08,449.127666,470.170704,428.084628
2021-03-30,469.089996,461.500000,462.579987,465.459991,2436300,465.459991,448.494501,-0.620745,80.685141,70.529972,2.201915e+08,448.997666,469.578142,428.417191
2021-03-31,482.410004,468.700012,469.700012,475.369995,3155900,475.369995,449.840500,0.883976,84.097560,70.713875,2.201063e+08,450.085333,473.784151,426.386515


### On Balance Volume <a class="anchor" id="OnBalanceVolume"></a> 

The On Balance Volume (OBV) is a cumulative total of the up and down volume. When the close is higher than the previous close, the volume is added to the running total, and when the close is lower than the previous close, the volume is subtracted from the running total.
To interpret the OBV, look for the OBV to move with the price or precede price moves. If the price moves before the OBV, then it is a non-confirmed move. A series of rising peaks, or falling troughs, in the OBV indicates a strong trend. If the OBV is flat, then the market is not trending.
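The running total described above can be sketched in a few lines of pandas. The function name `on_balance_volume`, the toy data, and the convention of starting the total at 0 on the first row are assumptions of this sketch.

```python
import numpy as np
import pandas as pd


def on_balance_volume(close, volume):
    """Cumulative volume, signed by the direction of the close-to-close change."""
    # +1 when the close rose, -1 when it fell, 0 when unchanged or on the first row
    direction = np.sign(close.diff()).fillna(0)
    return (direction * volume).cumsum()


# Hypothetical closes and volumes
toy_close = pd.Series([10.0, 11.0, 10.5, 10.5, 12.0])
toy_volume = pd.Series([100, 200, 150, 300, 250])

toy_obv = on_balance_volume(toy_close, toy_volume)
print(toy_obv.tolist())
```

Volume is added on up days, subtracted on down days, and ignored when the close is unchanged.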

In [154]:
def obv(df):
    # Note: this stores the absolute day-over-day change in volume in "OBV",
    # not the signed cumulative total described in the text above
    df['seq'] = [a for a in range(1, len(df)+1)]
    df['Date'] = df.index
    df.set_index("seq", inplace = True)
    for index in df[:-1].index:
        df.loc[index+1, "OBV"] = abs(df.loc[index+1, "Volume"] - df.loc[index, "Volume"])
    df.set_index("Date", inplace = True)    
    return df

In [155]:
obv(df)

Date,High,Low,Open,Close,Volume,Adj Close,SMA,MACD,per_k_stoch_10,per_d_stoch_10,a/d,TP,Midband,Std,Upperband,Lowerband,OBV
2017-10-02,150.479996,147.520004,149.789993,147.940002,2341700,147.940002,,0.000000,,,0.000000e+00,148.646667,,,,,
2017-10-03,148.800003,147.990005,148.479996,148.600006,1808200,148.600006,,0.052650,,,9.152727e+05,148.463338,,,,,533500.0
2017-10-04,148.460007,146.600006,148.210007,147.949997,2885600,147.949997,,0.041447,,,2.218417e+06,147.670003,,,,,1077400.0
2017-10-05,150.449997,147.710007,148.490005,150.250000,2413800,150.250000,,0.215674,,,4.279842e+06,149.470001,,,,,471800.0
2017-10-06,151.360001,149.529999,149.960007,151.119995,2282800,151.119995,,0.419120,,,5.963861e+06,150.669998,,,,,131000.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2021-03-26,469.769989,449.049988,450.250000,469.089996,3614400,469.089996,448.563501,-3.046080,97.875023,65.088580,2.187164e+08,462.636658,449.109333,10.487277,470.083888,428.134778,412900.0
2021-03-29,472.000000,462.500000,469.029999,469.320007,3141900,469.320007,448.551001,-1.523269,92.085074,69.240505,2.200856e+08,467.940002,449.127666,10.521519,470.170704,428.084628,472500.0
2021-03-30,469.089996,461.500000,462.579987,465.459991,2436300,465.459991,448.494501,-0.620745,80.685141,70.529972,2.201915e+08,465.349996,448.997666,10.290238,469.578142,428.417191,705600.0
2021-03-31,482.410004,468.700012,469.700012,475.369995,3155900,475.369995,449.840500,0.883976,84.097560,70.713875,2.201063e+08,475.493337,450.085333,11.849409,473.784151,426.386515,719600.0


# Hypothesis

We set the label to 1 if the adjusted close 30 trading days in the future is at least 5% above the current adjusted close, and 0 otherwise.
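To illustrate how a forward-looking label of this kind is built, here is a toy sketch. The helper name `forward_label`, the 2-row look-ahead, and the prices are illustrative assumptions. Rows without enough future data get no label.

```python
import pandas as pd


def forward_label(close, window, threshold):
    """1.0 if the price `window` rows ahead is at least `threshold` higher, else 0.0."""
    future = close.shift(-window)
    label = (future >= close * (1 + threshold)).astype(float)
    # The last `window` rows have no future price, so they cannot be labelled
    label[future.isna()] = float("nan")
    return label


# Hypothetical closing prices
toy_close = pd.Series([100.0, 102.0, 106.0, 104.0, 112.0])
toy_label = forward_label(toy_close, window=2, threshold=0.05)
print(toy_label.tolist())
```

The trailing unlabelled rows are the ones a later `dropna()` removes.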

In [156]:
def _produce_prediction(data, window):
    """
    Function that produces the 'truth' values
    At a given row, it looks 'window' rows ahead to see if the price rose by at least 5% (1) or not (0)
    :param data: DataFrame with an 'Adj Close' column
    :param window: number of days, or rows to look ahead to see what the price did
    """
    
    prediction = (data.shift(-window)['Adj Close'] >= data['Adj Close']+ data['Adj Close']*0.05 )
    prediction = prediction.iloc[:-window]
    data['pred'] = prediction.astype(int)
    
    return data

df = _produce_prediction(df, window=30)

In [157]:
df

Date,High,Low,Open,Close,Volume,Adj Close,SMA,MACD,per_k_stoch_10,per_d_stoch_10,a/d,TP,Midband,Std,Upperband,Lowerband,OBV,pred
2017-10-02,150.479996,147.520004,149.789993,147.940002,2341700,147.940002,,0.000000,,,0.000000e+00,148.646667,,,,,,1.0
2017-10-03,148.800003,147.990005,148.479996,148.600006,1808200,148.600006,,0.052650,,,9.152727e+05,148.463338,,,,,533500.0,1.0
2017-10-04,148.460007,146.600006,148.210007,147.949997,2885600,147.949997,,0.041447,,,2.218417e+06,147.670003,,,,,1077400.0,1.0
2017-10-05,150.449997,147.710007,148.490005,150.250000,2413800,150.250000,,0.215674,,,4.279842e+06,149.470001,,,,,471800.0,1.0
2017-10-06,151.360001,149.529999,149.960007,151.119995,2282800,151.119995,,0.419120,,,5.963861e+06,150.669998,,,,,131000.0,1.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2021-03-26,469.769989,449.049988,450.250000,469.089996,3614400,469.089996,448.563501,-3.046080,97.875023,65.088580,2.187164e+08,462.636658,449.109333,10.487277,470.083888,428.134778,412900.0,
2021-03-29,472.000000,462.500000,469.029999,469.320007,3141900,469.320007,448.551001,-1.523269,92.085074,69.240505,2.200856e+08,467.940002,449.127666,10.521519,470.170704,428.084628,472500.0,
2021-03-30,469.089996,461.500000,462.579987,465.459991,2436300,465.459991,448.494501,-0.620745,80.685141,70.529972,2.201915e+08,465.349996,448.997666,10.290238,469.578142,428.417191,705600.0,
2021-03-31,482.410004,468.700012,469.700012,475.369995,3155900,475.369995,449.840500,0.883976,84.097560,70.713875,2.201063e+08,475.493337,450.085333,11.849409,473.784151,426.386515,719600.0,


In [158]:
df = df.dropna()

In [159]:
df

Date,High,Low,Open,Close,Volume,Adj Close,SMA,MACD,per_k_stoch_10,per_d_stoch_10,a/d,TP,Midband,Std,Upperband,Lowerband,OBV,pred
2017-10-27,177.580002,174.020004,174.020004,177.330002,2806400,177.330002,158.928000,6.886955,99.150527,81.218427,3.032851e+07,176.310003,158.585667,10.956188,180.498043,136.673292,225200.0,0.0
2017-10-30,177.479996,174.449997,177.000000,176.029999,2124200,176.029999,160.332500,7.112184,94.733256,86.012964,3.041965e+07,175.986664,159.952667,11.349380,182.651427,137.253907,682200.0,0.0
2017-10-31,176.729996,174.520004,176.160004,175.160004,2656500,175.160004,161.660500,7.138193,91.523650,90.583513,2.930176e+07,175.470001,161.303000,11.515836,184.334673,138.271328,532300.0,0.0
2017-11-01,176.940002,174.699997,176.589996,176.250000,2002900,176.250000,163.075500,7.164175,89.893606,92.163918,3.007073e+07,175.963333,162.717667,11.490766,185.699198,139.736135,653600.0,0.0
2017-11-02,181.479996,175.800003,178.020004,180.940002,3262300,180.940002,164.610000,7.477019,95.279780,91.866896,3.271274e+07,179.406667,164.214500,11.623316,187.461133,140.967868,1259400.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2021-02-11,497.000000,491.079987,494.529999,496.619995,1609200,496.619995,476.182500,4.340119,92.491099,81.589963,2.261772e+08,494.899994,476.017333,13.035749,502.088832,449.945835,155300.0,0.0
2021-02-12,499.359985,491.760010,495.160004,498.839996,1450200,498.839996,477.961000,5.027315,97.137181,89.074622,2.274290e+08,496.653330,477.533000,13.601341,504.735682,450.330318,159000.0,0.0
2021-02-16,506.510010,497.600006,500.029999,501.640015,1845900,501.640015,480.139001,5.731788,85.659623,90.801506,2.272570e+08,501.916677,479.571334,14.063760,507.698855,451.443814,395700.0,0.0
2021-02-17,495.549988,485.510010,495.410004,491.230011,2112000,491.230011,481.875502,5.387978,44.111189,85.968044,2.275515e+08,490.763336,481.252335,13.226226,507.704787,454.799882,266100.0,0.0


In [160]:
df['pred'].value_counts()

0.0    462
1.0    370
Name: pred, dtype: int64

In [161]:
df.columns

Index(['High', 'Low', 'Open', 'Close', 'Volume', 'Adj Close', 'SMA', 'MACD',
       'per_k_stoch_10', 'per_d_stoch_10', 'a/d', 'TP', 'Midband', 'Std',
       'Upperband', 'Lowerband', 'OBV', 'pred'],
      dtype='object')

In [162]:
X = df[['Volume',  'SMA', 'MACD',
       'per_k_stoch_10', 'per_d_stoch_10', 'a/d', 'TP', 'Midband', 'Std',
       'Upperband', 'Lowerband', 'OBV']].values

In [163]:
y = df['pred'].values

In [164]:
df.isna().sum() 

High              0
Low               0
Open              0
Close             0
Volume            0
Adj Close         0
SMA               0
MACD              0
per_k_stoch_10    0
per_d_stoch_10    0
a/d               0
TP                0
Midband           0
Std               0
Upperband         0
Lowerband         0
OBV               0
pred              0
dtype: int64

## Model Training

In [165]:
import math
import matplotlib.pyplot as plt
import keras
import pandas as pd
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
from keras.layers import *
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split
from keras.callbacks import EarlyStopping

In [166]:
from sklearn import preprocessing
from sklearn.svm import SVC
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import train_test_split, GridSearchCV, TimeSeriesSplit

In [167]:
from sklearn.preprocessing import MinMaxScaler
scaler=MinMaxScaler(feature_range=(0,1))
X=scaler.fit_transform(X)
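The cell above fits the scaler on every row before the train/test split, so the test rows influence the scaling. One common alternative, sketched here on made-up data, is to split chronologically first and fit the scaler on the training part only. The `_demo` names and the toy array are assumptions, and this is not the procedure that produced the results shown in this notebook.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical feature matrix: 10 time-ordered rows, 2 trending features
X_demo = np.arange(20, dtype=float).reshape(10, 2)

# Chronological 75/25 split, no shuffling
split = int(len(X_demo) * 0.75)
X_train_demo, X_test_demo = X_demo[:split], X_demo[split:]

# Fit on the training rows only, then reuse the same min/max for the test rows
scaler_demo = MinMaxScaler(feature_range=(0, 1))
X_train_scaled = scaler_demo.fit_transform(X_train_demo)
X_test_scaled = scaler_demo.transform(X_test_demo)

print(X_train_scaled.max(), X_test_scaled.max())
```

With trending data the scaled test values can fall outside the 0 to 1 range, which is expected when the scaler has not seen the test period.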

In [168]:
X

array([[0.19113464, 0.        , 0.5903074 , ..., 0.00473606, 0.        ,
        0.02775239],
       [0.1323253 , 0.00418434, 0.596301  , ..., 0.01065207, 0.00170965,
        0.08419584],
       [0.17821244, 0.00814076, 0.59699313, ..., 0.01527646, 0.0047055 ,
        0.06568189],
       ...,
       [0.10833434, 0.95696345, 0.55956699, ..., 0.90365711, 0.9268575 ,
        0.04881061],
       [0.1312736 , 0.96213689, 0.5504178 , ..., 0.90367341, 0.93673961,
        0.03280389],
       [0.10465337, 0.96491354, 0.53554434, ..., 0.90408341, 0.9419223 ,
        0.03807771]])

In [169]:
y

array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 0., 1., 1., 1., 1., 0., 1., 1., 1., 1.,
       1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 1., 1., 0., 1.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 1., 0., 1., 1., 1., 1.,
       1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0.

In [170]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42, shuffle=False)

In [171]:
tscv = TimeSeriesSplit(n_splits=3)
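For readers unfamiliar with `TimeSeriesSplit`, this small sketch prints the folds it produces for 8 hypothetical samples. Each fold trains on an expanding prefix and validates on the block that follows, so validation data is always later than training data.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Hypothetical data: 8 time-ordered samples with a single feature
X_demo = np.arange(8).reshape(-1, 1)

folds = [
    (train_idx.tolist(), test_idx.tolist())
    for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X_demo)
]

for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```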

In [172]:
def _train_svm(X_train, y_train, X_test, y_test):
    svm = SVC()
    
    # Grid of C, gamma, and kernel values to search over
    params_svm = {'C': [0.1, 1, 10, 100, 1000, 10000], 
              'gamma': [1, 0.1, 0.01, 0.001, 0.0001],
              'kernel': ['rbf']}
    # Use grid search with time-series cross-validation to test every combination
    svm_gs = GridSearchCV(svm, params_svm , cv=tscv, verbose=5)
    # Fit model to training data
    svm_gs.fit(X_train, y_train)
    
    # Save best model
    svm_best = svm_gs.best_estimator_
     
    # Print the best parameter combination found by the search
    print(svm_gs.best_params_)
    
    prediction = svm_best.predict(X_test)

    print(classification_report(y_test, prediction))
    print(confusion_matrix(y_test, prediction))
    
    return svm_best
svm_model = _train_svm(X_train, y_train, X_test, y_test) 

Fitting 3 folds for each of 30 candidates, totalling 90 fits
[CV 1/3] END .....................C=0.1, gamma=1, kernel=rbf; total time=   0.0s
[CV 2/3] END .....................C=0.1, gamma=1, kernel=rbf; total time=   0.0s
[CV 3/3] END .....................C=0.1, gamma=1, kernel=rbf; total time=   0.0s
[CV 1/3] END ...................C=0.1, gamma=0.1, kernel=rbf; total time=   0.0s
[CV 2/3] END ...................C=0.1, gamma=0.1, kernel=rbf; total time=   0.0s
[CV 3/3] END ...................C=0.1, gamma=0.1, kernel=rbf; total time=   0.0s
[CV 1/3] END ..................C=0.1, gamma=0.01, kernel=rbf; total time=   0.0s
[CV 2/3] END ..................C=0.1, gamma=0.01, kernel=rbf; total time=   0.0s
[CV 3/3] END ..................C=0.1, gamma=0.01, kernel=rbf; total time=   0.0s
[CV 1/3] END .................C=0.1, gamma=0.001, kernel=rbf; total time=   0.0s
[CV 2/3] END .................C=0.1, gamma=0.001, kernel=rbf; total time=   0.0s
[CV 3/3] END .................C=0.1, gamma=0.001

In [173]:
svm_classifier = SVC(C = 100, gamma = 0.1, kernel = 'rbf')

In [174]:
svm_classifier.fit(X_train, y_train)

SVC(C=100, gamma=0.1)

In [175]:
prediction = svm_classifier.predict(X_test)

In [176]:
print(classification_report(y_test, prediction))
print(confusion_matrix(y_test, prediction))

              precision    recall  f1-score   support

         0.0       0.73      0.82      0.77       127
         1.0       0.65      0.53      0.59        81

    accuracy                           0.71       208
   macro avg       0.69      0.67      0.68       208
weighted avg       0.70      0.71      0.70       208

[[104  23]
 [ 38  43]]
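The figures in the classification report can be recomputed by hand from the confusion matrix, which helps when reading the two outputs together. The sketch below assumes scikit-learn's layout (rows are true classes, columns are predicted classes) and uses hypothetical variable names.

```python
# Confusion matrix from the output above: rows = true class, columns = predicted class
cm = [[104, 23],
      [38, 43]]

tn, fp = cm[0]   # true 0 predicted as 0 / as 1
fn, tp = cm[1]   # true 1 predicted as 0 / as 1
total = tn + fp + fn + tp

precision_1 = tp / (tp + fp)
recall_1 = tp / (tp + fn)
f1_1 = 2 * precision_1 * recall_1 / (precision_1 + recall_1)

precision_0 = tn / (tn + fn)
recall_0 = tn / (tn + fp)
f1_0 = 2 * precision_0 * recall_0 / (precision_0 + recall_0)

accuracy = (tp + tn) / total

print(round(precision_0, 2), round(recall_0, 2), round(f1_0, 2))
print(round(precision_1, 2), round(recall_1, 2), round(f1_1, 2))
print(round(accuracy, 2))
```

Recall for class 1 is the weakest figure: the classifier misses 38 of the 81 positive cases in the test period.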


In [177]:
prediction

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
       1., 1., 1., 1., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
       0., 0., 0., 0.])

In [134]:
len(prediction)

193

PCA reduces the number of dimensions in the feature space. At the same time, it may affect the performance of the SVM model by changing the feature space drastically. PCA is lossy: the discarded low-variance components may still carry information that helps separate the classes.
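A minimal sketch of that loss on synthetic data: three features are reduced to two components, and the retained variance and reconstruction error show what is discarded. The random data and the `_demo` names are assumptions for illustration, not results from this notebook.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical features: two independent columns plus a third that is
# almost a linear combination of them (small added noise)
base = rng.normal(size=(200, 2))
third = base[:, 0] + base[:, 1] + 0.05 * rng.normal(size=200)
X_demo = np.column_stack([base, third])

# Keep 2 of the 3 directions
pca = PCA(n_components=2).fit(X_demo)
X_reduced = pca.transform(X_demo)

# Share of total variance kept, and what is lost when mapping back
retained = pca.explained_variance_ratio_.sum()
X_back = pca.inverse_transform(X_reduced)
reconstruction_error = np.mean((X_demo - X_back) ** 2)

print(X_reduced.shape, retained, reconstruction_error)
```

Even when almost all variance is retained, the reconstruction is not exact. Whether the dropped direction matters for classification depends on the data, so it is worth comparing model scores with and without the PCA step.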