##### An introduction to Support Vector Machines with a Python example focused on trading

Like any other machine learning algorithm, a support vector machine (SVM) takes data as input, attempts to find and recognize patterns, and then tells us what it learned.  Support vector machines fall into the category of supervised learning, which means that they learn a function that maps a given input to an output.  More specifically, an SVM is a classification algorithm.

Before we can start implementing trading algorithms and seeking alpha, let's figure out how an SVM works.
#### Maximal Margin Classifier
The support vector machine algorithm comes from the maximal margin classifier.  The **maximal margin classifier** uses the distance from a given decision boundary to classify an input.  The greater the distance, or *margin*, the better the classifier is at handling the data. On a Cartesian plane, the boundary can be thought of as a line.  In three-dimensional space, it is a plane, but after that it becomes hard to conceptualize. This boundary is better thought of as a **hyperplane**, specifically one of dimension $p-1$, where $p$ is the dimension of the data points.
Our boundary, or hyperplane, is known as a separating hyperplane, because it is used to separate the data points into the desired categories. In general, there are many hyperplanes that can separate a given data set, but the one we care about is the *maximal margin hyperplane* or the *optimal separating hyperplane*.  This is the separating hyperplane whose distance to the nearest data point in the training set is as large as possible.  By using this hyperplane to classify a data point from the test set, we have the maximal margin classifier.
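
To make the idea of a margin concrete, here is a minimal sketch in NumPy that measures the margin of two candidate separating lines on a handful of made-up points. The data, the two candidate lines, and the helper names `signed_distances` and `margin` are assumptions for illustration; a real maximal margin classifier searches over all hyperplanes rather than comparing two by hand.

```python
import numpy as np

def signed_distances(X, w, b):
    """Signed distance of each row of X to the hyperplane w.x + b = 0."""
    return (X @ w + b) / np.linalg.norm(w)

def margin(X, y, w, b):
    """Distance from the hyperplane to its closest point.

    Labels y are +1/-1, so the result is negative if any point
    sits on the wrong side of the hyperplane.
    """
    return np.min(y * signed_distances(X, w, b))

# Toy, linearly separable data (made up for illustration).
X = np.array([[1.0, 1.0], [2.0, 2.5], [0.5, 2.0],    # class +1
              [4.0, 4.5], [5.0, 4.0], [4.5, 6.0]])   # class -1
y = np.array([1, 1, 1, -1, -1, -1])

# Two candidate separating lines, each given as (w, b).
candidates = {
    "x1 + x2 = 6.5": (np.array([-1.0, -1.0]), 6.5),
    "x1 = 3": (np.array([-1.0, 0.0]), 3.0),
}

margins = {name: margin(X, y, w, b) for name, (w, b) in candidates.items()}
best = max(margins, key=margins.get)

for name, m in margins.items():
    print(f"{name}: margin = {m:.3f}")
print("wider margin:", best)
```

Both lines separate the two classes, but the diagonal one leaves more room around its closest point, so it is the better of the two by the maximal margin criterion.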

In [None]:
import matplotlib.pyplot as plt
import numpy as np

# 100 random points with three coordinates; only the first two are plotted
points = np.random.random((100, 3))

plt.scatter(points[:, 0], points[:, 1])
plt.show()

Now the maximal margin classifier works, to a degree.  If you have a data set which cannot be separated by a hyperplane, you can no longer use it.  Sometimes you may run into data that has more than two categories, which makes a single linear boundary useless.

![]("img/svm meme 1.jpg")

At this point, you have to consider your options.  
1. You can base your classifier on the separating hyperplane as explained earlier.  But the hyperplane doesn't exist...so you have no classifier.
2. Consider a classifier that isn't perfect, but works some or most of the time.

#### Support Vector Classifier
I like the second option too.  By using a classifier that isn't perfect, you can at least handle most observations, and introduce a level of adaptation to the model when it is presented with new data.

This evolution of the maximal margin classifier is known as the **support vector classifier** (SVC), or the soft margin classifier.  Instead of being exact and not very robust in its classification, the SVC allows some observations to be on the wrong side of the margin and/or hyperplane (which is where the "soft" comes from), for the sake of getting the classification mostly correct.

Without getting into too much math, the algorithm determines which side of the hyperplane an observation will lie on by finding a solution to an optimization problem that uses a tuning parameter, the width of the margin (which it tries to maximize) and slack variables.

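The tuning parameter is used to control the bias-variance tradeoff by acting as a budget for margin violations.  When it is small, few violations are tolerated, the margin is narrow, and the classifier fits the training data closely.  In other words, low bias, high variance.  A larger tuning parameter is the opposite.  It allows more observations to be on the wrong side of the margin, which widens the margin and leads to higher bias and lower variance.  Be aware that some libraries, scikit-learn among them, define their `C` parameter the other way around, as a penalty on violations, so there a large `C` means a narrow margin.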

Slack variables in particular are pretty cool.  They allow data points to be on the wrong side of the margin or hyperplane.  They are also used to transform inequalities into equalities.  The values that the slack variables take on also tell us about the position of a given data point. If the slack variable for a given data point is equal to 0, then that data point is on the right side of the margin.  If the slack variable is greater than 0 but less than 1, the data point is on the wrong side of the margin, but on the right side of the hyperplane.  If the slack variable is greater than 1, the data point is on the wrong side of the hyperplane.
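
The three cases can be checked numerically. The sketch below assumes the common scaling in which the margin edges sit at $w \cdot x + b = \pm 1$, so that the slack of a point is $\max(0, 1 - y(w \cdot x + b))$; the hyperplane, the points, and the helper names `slack` and `describe` are made up for illustration.

```python
import numpy as np

def slack(X, y, w, b):
    """Slack xi_i = max(0, 1 - y_i * (w.x_i + b)) for labels y_i in {+1, -1}."""
    return np.maximum(0.0, 1.0 - y * (X @ w + b))

def describe(xi):
    """Translate a slack value into the position of the data point."""
    if xi == 0:
        return "right side of the margin"
    if xi < 1:
        return "wrong side of the margin, right side of the hyperplane"
    if xi == 1:
        return "exactly on the hyperplane"
    return "wrong side of the hyperplane"

# Assumed hyperplane x1 = 0, with margin edges at x1 = -1 and x1 = +1.
w, b = np.array([1.0, 0.0]), 0.0
X = np.array([[2.0, 0.0], [0.5, 1.0], [-0.5, 0.0], [-3.0, 2.0]])
y = np.array([1, 1, 1, -1])

xi = slack(X, y, w, b)
labels = [describe(v) for v in xi]

for point, v, label in zip(X, xi, labels):
    print(point, f"slack = {v:.1f}:", label)
```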

The main reason this optimization matters is its effect on the hyperplane. The only data points that affect the hyperplane, and in turn how data points are classified, are those that lie on the margin or on the wrong side of it.  If a data point is on the right side of the margin, it has no effect on the hyperplane.  The classifier gets its name from the former data points, as they are known as **support vectors**.

#### Finally, Support Vector Machines
Support vector machines build on the optimization in support vector classifiers by enlarging the feature space using **kernels**.

Kernels, like the previous optimization, use a fair bit of math.  Put simply, a kernel tells us how similar two data points are.  Kernels allow data to be processed in simpler terms, as opposed to doing the work in a higher-dimensional space.  More specifically, a kernel computes the inner product that a pair of data points would have in the enlarged feature space, without ever constructing that space.  By using kernels instead of explicitly enlarging the feature space, the algorithm can be much more efficient.  It uses one function to compare pairs of data points, as opposed to computing every new feature from the original features in the data set.
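
A small numerical check shows what "one function instead of an enlarged feature space" means. The sketch below uses the degree-2 polynomial kernel $(1 + x \cdot z)^2$ on two made-up 2-D points and compares it with the inner product of an explicit six-dimensional feature map; the names `poly_kernel` and `phi` are chosen here for illustration.

```python
import numpy as np

def poly_kernel(x, z, degree=2):
    """Polynomial kernel (1 + x.z)^degree, computed in the original space."""
    return (1.0 + x @ z) ** degree

def phi(x):
    """Explicit degree-2 feature map for a 2-D point (6 features)."""
    x1, x2 = x
    r2 = np.sqrt(2.0)
    return np.array([1.0, r2 * x1, r2 * x2, x1 ** 2, x2 ** 2, r2 * x1 * x2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

k_direct = poly_kernel(x, z)      # one dot product in 2 dimensions
k_explicit = phi(x) @ phi(z)      # dot product in the enlarged 6-D space

print(k_direct, k_explicit)       # the two values agree up to rounding
```

The kernel reaches the same number without ever building the six features, and the saving grows quickly with the dimension of the data and the degree of the polynomial.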

Many different kernels exist, including the RBF kernel, graph kernels, the linear kernel, and the polynomial kernel.  For example, the linear kernel compares a pair of data points using their inner product, which is essentially their bivariate correlation.  The polynomial kernel fits an SVC in a higher-dimensional space built from polynomials of the original features. A support vector classifier is the same as an SVM with a polynomial kernel of degree 1.
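
To see the difference a kernel can make, the following sketch fits scikit-learn's `SVC` with a linear and an RBF kernel on a synthetic data set of two concentric rings, which no straight line can separate. The data set, the parameter values, and the use of scikit-learn are assumptions made for this illustration, and the scores are training accuracies, not estimates of out-of-sample performance.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not separable by any line in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

scores = {}
for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, C=1.0)
    clf.fit(X, y)
    scores[kernel] = clf.score(X, y)   # accuracy on the training data

for kernel, score in scores.items():
    print(f"{kernel}: training accuracy = {score:.2f}")
```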

Basically, the main goal of the Support Vector Machine is to construct a hyperplane, which it then uses to classify data.  Despite generally being categorized as a classification algorithm, there is an extension of the Support Vector Machine used for regression, known as *Support Vector Regression*.

#### Support Vector Machines for Trading

Before I get into this application, know that this is by no means advice on how/what you should trade.  That's on you.

We'll start by gathering our data.

We'll use a time period going back about five years, October 28, 2014 to October 28, 2019.  The stocks that we will get price data for are the components of the Dow Jones Industrial Average.

Yahoo Finance used to be really easy to get data from, but most packages no longer work, so we'll also create a web scraper in the process.

The first thing we'll do is import all of the packages we'll need and then use the `requests` package to scrape the contents of [this page](https://finance.yahoo.com/quote/%5EDJI/components?p=%5EDJI) on Yahoo Finance.  This page contains the names of all of the companies that make up the Dow Jones Industrial Average, as well as their tickers.

In [1]:
from bs4 import BeautifulSoup
import datetime
import json
import numpy as np
import pandas as pd
import requests
import time
import warnings
warnings.simplefilter('ignore')


Dow_Page = requests.get('https://finance.yahoo.com/quote/%5EDJI/components?p=%5EDJI')
Dow_Content = Dow_Page.content

Next, we'll use BeautifulSoup4 to make the information in `Dow_Content` searchable.

In [2]:
# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(Dow_Content, "html.parser")

data = list(soup.find_all("td", {"class": "Py(10px) Ta(start) Pend(10px)"}))

The lines above parse the data gathered from the webpage and search for the bit of HTML code that corresponds to the table on the page.  You can find it by right-clicking on that area of the page and inspecting the element; with a little investigation you can find the class name used above.

There will be two types of lines that the search will come across:
1. Lines containing the ticker
2. Lines containing the company name with no ticker

We don't care for the latter, so when the loop finds them, it ignores them and moves on.  A few string operations to trim the extra fat and we have our ticker.  Each ticker is then added to a list for safekeeping.

In [3]:
Ticker_List = []
for i in data:
    TempData = str(i)
    if "title" in TempData:
        TempData = TempData[TempData.find("title"):]
        TempData = TempData[TempData.find(">")+1:TempData.find("<")]
        Ticker_List.append(TempData)
    else:
        continue
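
To see what the two `find` slices do, here is the same logic applied to two sample cells. The HTML strings are hypothetical stand-ins written for this example, not markup copied from Yahoo Finance, and `extract_ticker` is a name introduced here.

```python
def extract_ticker(cell_html):
    """Return the link text of a table cell, or None if it has no title attribute."""
    if "title" not in cell_html:
        return None
    # Drop everything before the title attribute ...
    tail = cell_html[cell_html.find("title"):]
    # ... then keep the text between the next '>' and the following '<'.
    return tail[tail.find(">") + 1:tail.find("<")]

# Hypothetical cells: one with a ticker link, one with only a company name.
sample_cells = [
    '<td class="Py(10px) Ta(start) Pend(10px)">'
    '<a href="/quote/AAPL" title="Apple Inc.">AAPL</a></td>',
    '<td class="Py(10px) Ta(start) Pend(10px)">Apple Inc.</td>',
]

tickers = [t for t in map(extract_ticker, sample_cells) if t is not None]
print(tickers)
```

String slicing like this is brittle: it depends on `title` being the last attribute before the link text, so it is worth re-checking whenever the page layout changes.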

Yahoo Finance uses a Unix timestamp in its URL, so we make use of the `time` package to convert our start and end dates to the desired format.  `time.mktime` takes either a `struct_time` (more about that [here](https://docs.python.org/2/library/time.html#time.struct_time)) or a tuple of 9 time fields, and interprets it as local time.  We don't really care for anything past the date here.

In [4]:
Start_Date = int(time.mktime((2014,10,28,4,0,0,0,0,0)))
End_Date = int(time.mktime((2019,10,28,4,0,0,0,0,0)))
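
Because `time.mktime` works in local time, the numbers above depend on the time zone of the machine running the notebook. If you would rather have the same timestamps everywhere, one alternative (not what the cell above does) is to build them in UTC with `calendar.timegm`. The helper name `utc_timestamp` is an assumption for this sketch.

```python
import calendar
import datetime

def utc_timestamp(year, month, day):
    """Unix timestamp for midnight UTC on the given date."""
    return calendar.timegm(datetime.date(year, month, day).timetuple())

start_utc = utc_timestamp(2014, 10, 28)
end_utc = utc_timestamp(2019, 10, 28)

print(start_utc, end_utc)
# Convert back to check the round trip
print(datetime.datetime.fromtimestamp(start_utc, tz=datetime.timezone.utc))
```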

The `ScrapeYahoo` function takes three arguments:
1. ticker, a string representing a given stock
2. start, a Unix timestamp representing the start date
3. end, a Unix timestamp representing the end date

It combines these with the base url for Yahoo Finance and gets the data from the desired web page.  Instead of processing it like we did earlier, we parse the JSON data from the page.  Yahoo Finance uses cookies now, and simply using the HTML code will throw an error.
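
As an aside, the query string can also be assembled with the standard library instead of string concatenation, which takes care of escaping characters such as the `^` in `^DJI`. This is a sketch of an alternative, not the function used in this article; `build_chart_url` is a hypothetical helper and the timestamps are example values.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://query1.finance.yahoo.com/v8/finance/chart/"

def build_chart_url(ticker, start, end, interval="1d"):
    """Build the chart URL for one ticker and a start/end Unix timestamp."""
    query = urlencode({"period1": int(start), "period2": int(end), "interval": interval})
    # quote() percent-encodes characters that are not safe in a URL path
    return BASE_URL + quote(ticker, safe="") + "?" + query

url = build_chart_url("AAPL", 1414454400, 1572220800)
index_url = build_chart_url("^DJI", 1414454400, 1572220800)

print(url)
print(index_url)
```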

The lines after that parse the content of the JSON data.  Something that helped a lot while I was initially exploring the dataset was the `keys()` method for Python dictionaries.  It made traversing the JSON data much easier.  You can read about it [here](https://www.programiz.com/python-programming/methods/dictionary/keys).
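
Here is a sketch of that kind of exploration. The dictionary below is a tiny mock that only mirrors the nesting used by `ScrapeYahoo`; its values are invented, and the real response contains more keys than shown. `key_paths` is a helper name made up for this example.

```python
# Mock payload with the same nesting that ScrapeYahoo relies on (values invented).
mock_page_data = {
    "chart": {
        "result": [{
            "timestamp": [1414454400, 1414540800],
            "indicators": {"quote": [{
                "open": [1.0, 1.1], "high": [1.2, 1.3], "low": [0.9, 1.0],
                "close": [1.1, 1.2], "volume": [100, 200],
            }]},
        }]
    }
}

def key_paths(node, prefix=""):
    """Yield an index path for every key, descending into dicts and first list items."""
    if isinstance(node, dict):
        for key, value in node.items():
            path = f"{prefix}['{key}']"
            yield path
            yield from key_paths(value, path)
    elif isinstance(node, list) and node:
        yield from key_paths(node[0], f"{prefix}[0]")

# One level at a time with keys() ...
top_keys = list(mock_page_data["chart"]["result"][0].keys())
print(top_keys)

# ... or every path at once.
paths = list(key_paths(mock_page_data))
for path in paths:
    print(path)
```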

In [8]:
def ScrapeYahoo(ticker, start, end):
    
    # Form the URL to be scraped
    Base_Url = 'https://query1.finance.yahoo.com/v8/finance/chart/'
    Scrape_Url = Base_Url + ticker + "?period1=" + str(start) + "&period2=" + str(end) + "&interval=1d"
    
    # Get data from page
    r = requests.get(Scrape_Url)
    Page_Data = r.json()
    
    # Pull out the parts of the JSON that hold the timestamps and the price series
    Result = Page_Data['chart']['result'][0]
    Quote = Result['indicators']['quote'][0]
    
    # Compile data into a DataFrame indexed by ISO date
    Stock_df = pd.DataFrame()
    Stock_df['DateTime'] = Result['timestamp']
    Stock_df['DateTime'] = Stock_df['DateTime'].apply(lambda x: datetime.datetime.fromtimestamp(x).date().isoformat())
    for Column in ('Open', 'High', 'Low', 'Close', 'Volume'):
        Stock_df[Column] = Quote[Column.lower()]
    Stock_df = Stock_df.set_index("DateTime")
    
    # Add data to the global dictionary containing all values
    Stock_Data[ticker] = Stock_df
    

The `Stock_Data` dictionary will hold our parsed data.  The keys in the dictionary will be the tickers of the stocks.  For each stock, the function `ScrapeYahoo` will create a DataFrame containing open, high, low, close, and volume data.

In [36]:
Stock_Data = {}

for i in Ticker_List:
    ScrapeYahoo(i, Start_Date, End_Date)
    #print(i + " done")
    time.sleep(0.5)

UTX done
MRK done
NKE done
WMT done
CVX done
PG done
CAT done
AXP done
DIS done
BA done
VZ done
KO done
JPM done
IBM done
INTC done
CSCO done
JNJ done
WBA done
TRV done
UNH done
XOM done
AAPL done
HD done
V done
PFE done
DOW done
MCD done
GS done
MMM done
MSFT done


In [37]:
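# Plain assignment would only add a second name for the same dictionary,
# so copy each DataFrame to keep an untouched backup of the raw data
CopyOfStockData = {ticker: df.copy() for ticker, df in Stock_Data.items()}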

We have historical price data, now what?  Recall that the support vector machine is a classification algorithm.  We're going to attempt to classify price movements into *buy* and *sell* signals with the help of technical analysis.

Technical analysis is a methodology that uses past data to forecast the future direction of price.  In general, technical indicators use price data and volume in their calculations. The motivation for the indicators chosen comes from the papers listed in the references section at the end of the article.

One very important thing to pay attention to before moving on: **look-ahead bias**.
We already have all of the closing data, which is what will be used for calculations.  In a real-world scenario, the most recent value you have when deciding what to do today is the previous day's close.  We have to make sure our calculations don't take in data that had not yet occurred at the time.
To do this, we will *lag* the data. That is, we shift each series forward by one row, so that every date holds the previous day's values.
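
A quick sketch of the lag on a few closing prices, using pandas' `shift`. The four values are rounded from the first ticker's data for readability, and the column name `Next Close (leaks)` is invented here to show the mistake to avoid.

```python
import pandas as pd

prices = pd.DataFrame(
    {"Close": [106.27, 105.85, 106.34, 107.00]},
    index=["2014-10-28", "2014-10-29", "2014-10-30", "2014-10-31"],
)

# shift(1): each row now holds the previous day's close (safe to use as a feature)
prices["Close Shifted"] = prices["Close"].shift(1)

# shift(-1): each row holds the NEXT day's close, which would leak the future
prices["Next Close (leaks)"] = prices["Close"].shift(-1)

print(prices)
```

The first row of `Close Shifted` is `NaN` because there is no earlier day to draw from; any indicator computed from the shifted series inherits that one-day delay.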

We will make use of the `talib` library to perform the technical analysis calculations.

In [22]:
import talib as ta
from talib import MA_Type

In [17]:
for i in Ticker_List:
    
    Stock_Data[i]['High Shifted']=Stock_Data[i]['High'].shift(1)
    Stock_Data[i]['Low Shifted'] = Stock_Data[i]['Low'].shift(1)
    Stock_Data[i]['Close Shifted'] = Stock_Data[i]['Close'].shift(1)
    
    Stock_Data[i]['Upper BBand'], Stock_Data[i]['Middle BBand'],Stock_Data[i]['Lower BBand']= ta.BBANDS(Stock_Data[i]['Close Shifted'],
                                                                                                       timeperiod=20,)
    
    Stock_Data[i]['RSI'] = ta.RSI(np.array(Stock_Data[i]['Close Shifted']), timeperiod=14)

    Stock_Data[i]['Macd'], Stock_Data[i]['Macd Signal'],Stock_Data[i]['Macd Hist'] = ta.MACD(Stock_Data[i]['Close Shifted'], fastperiod=12, slowperiod=26, 
                                                               signalperiod=9)

    Stock_Data[i]['Momentum'] = ta.MOM(Stock_Data[i]['Close Shifted'],timeperiod=12)
    
# NEED TO BE NORMALIZED    

| DateTime | Open | High | Low | Close | Volume |
|---|---|---|---|---|---|
| 2014-10-28 | 104.959999 | 106.519997 | 104.860001 | 106.269997 | 4173500 |
| 2014-10-29 | 106.720001 | 107.269997 | 105.419998 | 105.849998 | 4060700 |
| 2014-10-30 | 105.209999 | 106.589996 | 104.699997 | 106.339996 | 2890500 |
| 2014-10-31 | 107.830002 | 107.949997 | 106.980003 | 107.000000 | 4461100 |
| 2014-11-03 | 107.419998 | 107.500000 | 106.029999 | 106.300003 | 4587600 |
| 2014-11-04 | 107.000000 | 107.290001 | 106.330002 | 106.879997 | 3992400 |
| 2014-11-05 | 107.550003 | 108.440002 | 107.080002 | 107.930000 | 6698900 |
| 2014-11-06 | 107.900002 | 108.830002 | 107.779999 | 108.580002 | 3581300 |
| 2014-11-07 | 108.489998 | 109.080002 | 107.949997 | 109.080002 | 3253500 |
| 2014-11-10 | 109.050003 | 109.230003 | 108.669998 | 109.000000 | 3523700 |


In [50]:
test

| DateTime | Open | High | Low | Close | Volume | High Shifted | Low Shifted | Close Shifted | Upper BBand | Middle BBand | Lower BBand | RSI | Macd | Macd Signal | Macd Hist | Momentum |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2014-10-28 | 104.959999 | 106.519997 | 104.860001 | 106.269997 | 4173500 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2014-10-29 | 106.720001 | 107.269997 | 105.419998 | 105.849998 | 4060700 | 106.519997 | 104.860001 | 106.269997 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2014-10-30 | 105.209999 | 106.589996 | 104.699997 | 106.339996 | 2890500 | 107.269997 | 105.419998 | 105.849998 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2014-10-31 | 107.830002 | 107.949997 | 106.980003 | 107.000000 | 4461100 | 106.589996 | 104.699997 | 106.339996 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2014-11-03 | 107.419998 | 107.500000 | 106.029999 | 106.300003 | 4587600 | 107.949997 | 106.980003 | 107.000000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2014-11-04 | 107.000000 | 107.290001 | 106.330002 | 106.879997 | 3992400 | 107.500000 | 106.029999 | 106.300003 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2014-11-05 | 107.550003 | 108.440002 | 107.080002 | 107.930000 | 6698900 | 107.290001 | 106.330002 | 106.879997 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2014-11-06 | 107.900002 | 108.830002 | 107.779999 | 108.580002 | 3581300 | 108.440002 | 107.080002 | 107.930000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2014-11-07 | 108.489998 | 109.080002 | 107.949997 | 109.080002 | 3253500 | 108.830002 | 107.779999 | 108.580002 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2014-11-10 | 109.050003 | 109.230003 | 108.669998 | 109.000000 | 3523700 | 109.080002 | 107.949997 | 109.080002 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |


#### References
1. https://pdfs.semanticscholar.org/4d9f/4d308e318eb65f02bd12d2abc37ce7493698.pdf
2. https://doi.org/10.1016/j.jfds.2018.04.003
3. https://blog.quantinsti.com/trading-using-machine-learning-python-svm-support-vector-machine/
4. https://www.northinfo.com/documents/719.pdf
5. https://dspace.unia.es/bitstream/handle/10334/3924/0881_Caki.pdf?sequence=5
6. http://gide.unileon.es/admin/UploadFolder/journal_of_forecasting.pdf

In [141]:
from bs4 import BeautifulSoup
import datetime
import json
import numpy as np
import pandas as pd
import requests
import talib as ta
from talib import MA_Type
import time
import warnings
warnings.simplefilter('ignore')


Dow_Page = requests.get('https://finance.yahoo.com/quote/%5EDJI/components?p=%5EDJI')
Dow_Content = Dow_Page.content

soup = BeautifulSoup(Dow_Content, "html.parser")

data = list(soup.find_all("td", {"class": "Py(10px) Ta(start) Pend(10px)"}))

Ticker_List = []
for i in data:
    TempData = str(i)
    if "title" in TempData:
        TempData = TempData[TempData.find("title"):]
        TempData = TempData[TempData.find(">")+1:TempData.find("<")]
        Ticker_List.append(TempData)
    else:
        continue

Start_Date = int(time.mktime((2014,10,28,4,0,0,0,0,0)))
End_Date = int(time.mktime((2019,10,28,4,0,0,0,0,0)))

Stock_Data = {}

for i in Ticker_List:
    ScrapeYahoo(i, Start_Date, End_Date)
    #print(i + " done") Uncomment to check progress of function
    time.sleep(0.5)
    
for i in Ticker_List:
    
    Stock_Data[i]['High Shifted']=Stock_Data[i]['High'].shift(1)
    Stock_Data[i]['Low Shifted'] = Stock_Data[i]['Low'].shift(1)
    Stock_Data[i]['Close Shifted'] = Stock_Data[i]['Close'].shift(1)
    
    Stock_Data[i]['Upper BBand'], Stock_Data[i]['Middle BBand'],Stock_Data[i]['Lower BBand']= ta.BBANDS(Stock_Data[i]['Close Shifted'],
                                                                                                       timeperiod=20,)
    
    Stock_Data[i]['RSI'] = ta.RSI(np.array(Stock_Data[i]['Close Shifted']), timeperiod=14)

    Stock_Data[i]['Macd'], Stock_Data[i]['Macd Signal'],Stock_Data[i]['Macd Hist'] = ta.MACD(Stock_Data[i]['Close Shifted'], fastperiod=12, slowperiod=26, 
                                                               signalperiod=9)

    Stock_Data[i]['Momentum'] = ta.MOM(Stock_Data[i]['Close Shifted'],timeperiod=12)

