# Support Vector Machine (SVM)
- very popular and widely used
- with a suitable kernel (e.g. the Gaussian RBF), implicitly operates in a very high, even infinite, dimensional feature space
- defines a margin boundary between the data points in multidimensional space

Goal: to find a flat boundary, or hyperplane, that leads to a homogeneous partition of the data, i.e. points of the same class fall on the same side.

Good separation is achieved by the hyperplane that has the largest distance to the nearest training data point of any class (the margin), since
in general the larger the margin, the lower the generalization error of the classifier.
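
To make the margin idea concrete, here is a minimal sketch that measures the distance from a candidate hyperplane to its nearest training point. The toy points, the helper `margin`, and the two candidate boundaries are illustrative assumptions, not part of the original notes.

```python
import numpy as np

def margin(w, b, X, y):
    """Distance from the hyperplane w.x + b = 0 to the nearest point.

    Returns None if the hyperplane does not separate the two classes.
    """
    w = np.asarray(w, dtype=float)
    scores = X @ w + b
    if not np.all(y * scores > 0):
        return None
    return float(np.min(np.abs(scores)) / np.linalg.norm(w))

# Hypothetical toy data: class -1 sits at x1 = 0, class +1 sits at x1 = 2
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 0.0], [2.0, 1.0]])
y = np.array([-1, -1, 1, 1])

centered = margin([1.0, 0.0], -1.0, X, y)  # boundary at x1 = 1.0
shifted = margin([1.0, 0.0], -0.5, X, y)   # boundary at x1 = 0.5

print(centered, shifted)  # 1.0 0.5
```

Both boundaries separate the classes, but the centered one keeps every point at least 1.0 away, which is the hyperplane an SVM would prefer on this data.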

### Non-Linearly Separable Problems
When no hyperplane separates the classes in the original feature space, we can use a kernel function to transform the problem into a linearly separable one.
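
As a rough illustration of such a transformation, the sketch below uses made-up one-dimensional data in which one class sits between the points of the other. No single threshold on `x` separates them, but a threshold on the added feature `x ** 2` does. The data and the helper `separable_by_threshold` are assumptions for illustration only.

```python
import numpy as np

def separable_by_threshold(x, y):
    """Return True if some threshold on x puts each class on its own side."""
    xs = np.sort(x)
    thresholds = (xs[:-1] + xs[1:]) / 2.0  # midpoints between sorted values
    for t in thresholds:
        side = np.where(x > t, 1, -1)
        if np.all(side == y) or np.all(side == -y):
            return True
    return False

# Hypothetical data: the outer points are class +1, the inner points class -1
x = np.array([-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = np.where(np.abs(x) >= 2, 1, -1)

print(separable_by_threshold(x, y))       # False: not separable in 1-D
print(separable_by_threshold(x ** 2, y))  # True: separable on the new feature
```

In the two-dimensional space `(x, x ** 2)` the separating threshold is a horizontal line, i.e. a flat boundary.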

SVMs with non-linear kernels (implicitly) add dimensions to the data in order to create separation.
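
A hedged sketch of this effect with scikit-learn: on synthetic concentric circles, an SVM with a linear kernel cannot separate the classes, while one with an RBF kernel can. The dataset (`make_circles`), its parameters, and the default `SVC` settings are assumptions chosen for illustration. They are unrelated to the stock data used later in this notebook.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic data: an inner circle of one class surrounded by a ring of the other
X_toy, y_toy = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.25, random_state=0)

# Same estimator, two kernels
linear_acc = SVC(kernel="linear").fit(X_tr, y_tr).score(X_te, y_te)
rbf_acc = SVC(kernel="rbf").fit(X_tr, y_tr).score(X_te, y_te)

print(f"linear kernel accuracy: {linear_acc:.3f}")
print(f"RBF kernel accuracy:    {rbf_acc:.3f}")
```

The exact scores depend on the random seed and the scikit-learn version, so none are quoted here.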

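#### Kernel Trick
The process of adding new features that express mathematical relationships between the measured characteristics. In practice the kernel computes inner products in the expanded feature space directly from the original features, so the new features never have to be constructed explicitly.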

This allows the SVM to learn concepts that are not explicitly measured in the original dataset.
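
A small sketch of the idea, assuming a degree-2 polynomial kernel as the example (these notes do not name that kernel here). The inner product of two explicitly expanded feature vectors equals a simple function of the original vectors. `phi` and `poly2_kernel` are hypothetical helper names.

```python
import numpy as np

def phi(v):
    """Explicit degree-2 feature map for a 2-D vector (illustrative)."""
    x1, x2 = v
    return np.array([x1 ** 2, np.sqrt(2.0) * x1 * x2, x2 ** 2])

def poly2_kernel(u, v):
    """Homogeneous polynomial kernel of degree 2: (u . v)^2."""
    return float(np.dot(u, v) ** 2)

u = np.array([1.0, 2.0])
v = np.array([3.0, 4.0])

explicit = float(np.dot(phi(u), phi(v)))  # inner product in the expanded space
implicit = poly2_kernel(u, v)             # same quantity from the original features

print(np.isclose(explicit, implicit))
```

Both routes give 121 up to floating-point round-off, but the kernel route never builds the three new features.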

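#### Gaussian RBF Kernel

A popular default non-linear kernel. General properties of SVMs, whichever kernel is used:

* can be used for regression as well as classification
* not overly influenced by noisy data
* easier to use than neural networks
* several model parameters (and the choice of kernel) to tune
* black box model: the fitted model is hard to interpret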
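
For reference, the Gaussian RBF kernel named in the heading above is commonly written as K(x, z) = exp(-gamma * ||x - z||^2). The sketch below computes it with NumPy on three made-up points. The helper name `rbf_kernel_matrix` and the value `gamma=0.5` are assumptions for illustration.

```python
import numpy as np

def rbf_kernel_matrix(X, gamma):
    """Gaussian RBF kernel: K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off
    return np.exp(-gamma * sq_dists)

# Hypothetical points: squared distances are 1, 4 and 5
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = rbf_kernel_matrix(X, gamma=0.5)

print(np.round(K, 4))
```

Each point has similarity 1 with itself, and similarity decays towards 0 as points move apart. `gamma` controls how fast it decays, and it is one of the parameters that has to be tuned, for example together with `C` in scikit-learn's `SVC(kernel="rbf")`.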


In [1]:
import numpy as np
import pandas as pd
from datetime import datetime
from sklearn.linear_model import LogisticRegression
import pandas_datareader.data as web
from sklearn.svm import LinearSVC
from sklearn.metrics import confusion_matrix


In [13]:
stock_symbol="AAPL"
start_date="01/01/2017"
end_date="12/31/2019"
lags = 5

In [14]:
df = web.DataReader(stock_symbol, "yahoo", start_date,end_date)
df.head()

| Date | High | Low | Open | Close | Volume | Adj Close |
|---|---|---|---|---|---|---|
| 2017-01-03 | 116.330002 | 114.760002 | 115.800003 | 116.150002 | 28781900.0 | 110.691154 |
| 2017-01-04 | 116.510002 | 115.75 | 115.849998 | 116.019997 | 21118100.0 | 110.567276 |
| 2017-01-05 | 116.860001 | 115.809998 | 115.919998 | 116.610001 | 22193600.0 | 111.129562 |
| 2017-01-06 | 118.160004 | 116.470001 | 116.779999 | 117.910004 | 31751900.0 | 112.368469 |
| 2017-01-09 | 119.43 | 117.940002 | 117.949997 | 118.989998 | 33561900.0 | 113.397697 |


In [15]:
# Create a new DataFrame holding today's adjusted close and the volume
tslag = pd.DataFrame(index=df.index)
tslag["Today"]= df["Adj Close"]
tslag["Volume"] = df["Volume"]

In [16]:
# Create the shifted lag series of prior trading period close values
for i in range(lags):
    tslag[f"Lag{i + 1}"] = df["Adj Close"].shift(i + 1)

# Create the returns DataFrame
dfret = pd.DataFrame(index=tslag.index)
dfret["Volume"] = tslag["Volume"]
dfret["Today"] = tslag["Today"].pct_change() * 100.0  # daily return in percent
dfret.head()

| Date | Volume | Today |
|---|---|---|
| 2017-01-03 | 28781900.0 | NaN |
| 2017-01-04 | 21118100.0 | -0.111914 |
| 2017-01-05 | 22193600.0 | 0.508547 |
| 2017-01-06 | 31751900.0 | 1.114831 |
| 2017-01-09 | 33561900.0 | 0.91594 |


In [17]:
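# Create the lagged percentage returns columns
for i in range(lags):
    dfret[f"Lag{i + 1}"] = tslag[f"Lag{i + 1}"].pct_change() * 100.0

# Direction: +1 for an up day, -1 for a down day (np.sign gives 0 for a flat day)
dfret["Direction"] = np.sign(dfret["Today"])

# The shifts leave NaN values in the first rows, so drop the first `lags` rows.
# Lag5 still has one NaN in the first remaining row, which is harmless here
# because only Lag1 to Lag4 are used as features below.
dfret.drop(dfret.index[:lags], inplace=True)
dfret.head()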


| Date | Volume | Today | Lag1 | Lag2 | Lag3 | Lag4 | Lag5 | Direction |
|---|---|---|---|---|---|---|---|---|
| 2017-01-10 | 24462100.0 | 0.100859 | 0.91594 | 1.114831 | 0.508547 | -0.111914 | NaN | 1.0 |
| 2017-01-11 | 27588600.0 | 0.537314 | 0.100859 | 0.91594 | 1.114831 | 0.508547 | -0.111914 | 1.0 |
| 2017-01-12 | 27086200.0 | -0.417564 | 0.537314 | 0.100859 | 0.91594 | 1.114831 | 0.508547 | -1.0 |
| 2017-01-13 | 26111900.0 | -0.176084 | -0.417564 | 0.537314 | 0.100859 | 0.91594 | 1.114831 | -1.0 |
| 2017-01-17 | 34439800.0 | 0.806462 | -0.176084 | -0.417564 | 0.537314 | 0.100859 | 0.91594 | 1.0 |
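
To see why the lag columns line up this way, and why each extra lag adds one more leading NaN, here is a small sketch on a made-up price series. The prices and the `toy` frame are assumptions for illustration. Only `shift` and `pct_change` are taken from the cells above.

```python
import pandas as pd

# Hypothetical prices: +10%, -10%, +10% from one period to the next
prices = pd.Series([100.0, 110.0, 99.0, 108.9])

today = prices.pct_change() * 100.0          # return of the current period
lag1 = prices.shift(1).pct_change() * 100.0  # return of the previous period

toy = pd.DataFrame({"Price": prices, "Today": today, "Lag1": lag1})
print(toy.round(2))
```

`Lag1` on a given row repeats the `Today` value of the row before it. `Today` starts with one NaN and `Lag1` with two, so a lag of k periods starts with k + 1 NaN values.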


In [18]:
# Features: the first four lagged returns; target: the direction of today's return
X = dfret[["Lag1", "Lag2", "Lag3", "Lag4"]]
y = dfret["Direction"]

# Chronological split: train on data before start_test, test on the rest
start_test = datetime(2018, 4, 1)
X_train = X[X.index < start_test]
X_test = X[X.index >= start_test]
y_train = y[y.index < start_test]
y_test = y[y.index >= start_test]

# We use a linear support vector machine as the machine learning model
model = LinearSVC()

# Train
model.fit(X_train, y_train)

# Predict
predictions = model.predict(X_test)

# Score (mean accuracy on the test set)
model.score(X_test, y_test)




0.5339366515837104

In [19]:
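# Confusion matrix. scikit-learn's signature is confusion_matrix(y_true, y_pred);
# because the predictions are passed first here, rows are predicted classes
# and columns are actual classes (-1 first, then +1).
confusion_matrix(predictions, y_test)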

array([[ 66,  78],
       [128, 170]], dtype=int64)
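
As a rough way to read this matrix, the sketch below recomputes the accuracy from it and compares it with a naive baseline that always predicts the more frequent actual class. It assumes the orientation produced by the call above (rows are predictions, columns are actual values). The baseline comparison is an added illustration, not part of the original analysis.

```python
import numpy as np

# Confusion matrix copied from the output above.
# Rows are predicted classes, columns are actual classes, ordered (-1, +1),
# because the call above passes the predictions as the first argument.
cm = np.array([[66, 78],
               [128, 170]])

total = cm.sum()
accuracy = np.trace(cm) / total         # correct predictions lie on the diagonal
actual_counts = cm.sum(axis=0)          # column sums: actual down days, actual up days
baseline = actual_counts.max() / total  # always predict the more frequent class

print(f"accuracy:          {accuracy:.4f}")  # 0.5339
print(f"majority baseline: {baseline:.4f}")  # 0.5611
```

The recomputed accuracy matches the score reported earlier. On this test period, always predicting an up day would have scored slightly higher than the fitted model.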