Regression Training and Testing - Practical Machine Learning Tutorial with Python p.4
========

Original video on Sentdex's Youtube Channel: https://www.youtube.com/watch?v=r4mwkS2T9aI

Support his awesome content: https://www.pythonprogramming.net/


In [41]:
import pandas as pd
import quandl
import math
# The imports below were added during the video; we gather them at the top for clarity
import numpy as np
from sklearn import preprocessing, model_selection, svm
# Note: sklearn.cross_validation has been removed; train_test_split now lives in sklearn.model_selection
from sklearn.linear_model import LinearRegression


Code from last time: 

In [42]:
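# Download Alphabet (GOOGL) daily prices from Quandl's WIKI dataset
df = quandl.get('WIKI/GOOGL')
df = df[['Adj. Open', 'Adj. High', 'Adj. Low', 'Adj. Close', 'Adj. Volume']].copy()
# Percent spread between high and close, and percent change from open to close
df['HL_PCT'] = (df['Adj. High'] - df['Adj. Close']) / df['Adj. Close'] * 100
df['PCT_change'] = (df['Adj. Close'] - df['Adj. Open']) / df['Adj. Open'] * 100
df = df[['Adj. Close', 'HL_PCT', 'PCT_change', 'Adj. Volume']].copy()
forecast_col = 'Adj. Close'
# Treat missing values as outliers instead of dropping the rows
df.fillna(-99999, inplace=True)
# Forecast horizon: 1% of the dataset's length, rounded up
forecast_out = int(math.ceil(0.01 * len(df)))
# Label each row with the close price forecast_out rows later
df['label'] = df[forecast_col].shift(-forecast_out)
df.dropna(inplace=True)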
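To see what the label construction does, here is a minimal sketch on a hypothetical 10-row price series (not the GOOGL data). It uses a 20% horizon instead of 1% so the shift is easy to see. shift(-forecast_out) pulls each future close up onto the current row. The last forecast_out rows have no future value, so dropna removes them.

```python
import math

import pandas as pd

# Hypothetical 10-row series standing in for df['Adj. Close']
prices = pd.DataFrame({"Adj. Close": [float(p) for p in range(10, 20)]})

# Forecast horizon: 20% of the rows here (the tutorial uses 1%), rounded up
forecast_out = int(math.ceil(0.2 * len(prices)))  # 2 rows ahead

# shift(-n) moves values n rows up, so each row is labeled with the close n rows later
prices["label"] = prices["Adj. Close"].shift(-forecast_out)

# The last forecast_out rows have no future close (NaN) and are dropped
prices.dropna(inplace=True)
print(prices.head(3))
```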

Now we define X and y. By convention, features are written as an uppercase X and labels as a lowercase y.

In [43]:
X = np.array(df.drop(columns=['label']))  # features: every column except the label
y = np.array(df['label'])                 # label: the future price
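Here is a small sketch of the same X/y split on a hypothetical four-row frame that uses the tutorial's column names. It calls df.drop(columns=[...]). That form works in current pandas, which no longer accepts the axis as a positional argument, as in drop(['label'], 1).

```python
import numpy as np
import pandas as pd

# Hypothetical frame with the same column layout as the tutorial's df
df = pd.DataFrame({
    "Adj. Close": [10.0, 11.0, 12.0, 13.0],
    "HL_PCT": [1.2, 0.8, 1.5, 0.9],
    "PCT_change": [0.5, -0.3, 0.7, 0.1],
    "Adj. Volume": [1000.0, 1200.0, 900.0, 1100.0],
    "label": [12.0, 13.0, 14.0, 15.0],
})

# Features: every column except the label
X = np.array(df.drop(columns=["label"]))
# Labels: the future price we want to predict
y = np.array(df["label"])

print(X.shape, y.shape)  # (4, 4) (4,)
```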

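Here, we feed X through the preprocessor to scale it. preprocessing.scale standardizes each feature column to zero mean and unit variance, so features on very different scales (such as price and volume) become comparable. Make sure to always apply the same scaling to training and testing data in the future.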

In [44]:
X = preprocessing.scale(X)
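Below is a minimal sketch of what preprocessing.scale does, run on a hypothetical 3x2 array. The second part shows StandardScaler, scikit-learn's class-based equivalent. It learns the mean and standard deviation from one dataset (typically the training set) and reuses them on new data. That is a common alternative to scaling everything at once, not what the video does.

```python
import numpy as np
from sklearn import preprocessing
from sklearn.preprocessing import StandardScaler

# Hypothetical raw features on very different scales (price vs. volume)
X = np.array([[10.0, 1000.0],
              [20.0, 3000.0],
              [30.0, 5000.0]])

X_scaled = preprocessing.scale(X)

# Each column now has mean 0 and (population) standard deviation 1
print(X_scaled.mean(axis=0))
print(X_scaled.std(axis=0))

# StandardScaler remembers the statistics, so new rows get the same transformation
scaler = StandardScaler().fit(X)
X_new = scaler.transform(np.array([[20.0, 3000.0]]))  # the column means map to 0
print(X_new)
```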

Now we create our training and testing sets, holding out 20% of the data (test_size=0.2) for testing. Notice that this code is a little different from the video. Remember the deprecated cross_validation module above? Its train_test_split function now lives in model_selection.

In [45]:
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.2)
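Here is a sketch of train_test_split on hypothetical toy arrays. With 10 samples and test_size=0.2, 2 rows go to the test set and 8 to training, and each row of X stays paired with its y after the shuffle. The tutorial does not use the random_state argument. Setting it fixes the shuffle, so the split, and therefore the score, is reproducible between runs.

```python
import numpy as np
from sklearn import model_selection

# Hypothetical data: 10 samples, 2 features; row i is [2i, 2i + 1] and its label is i
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10, dtype=float)

# Hold out 20% for testing; random_state makes the shuffle reproducible
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.2, random_state=42)

print(X_train.shape, X_test.shape)  # (8, 2) (2, 2)
```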

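Next, we choose a model, starting with linear regression. (Strictly speaking this is a regressor, not a classifier, but we keep the variable name clf.) The n_jobs parameter sets how many jobs (CPU cores) scikit-learn may use in parallel, and n_jobs=-1 uses all available processors. n_jobs=0 is not a meaningful value. Also, LinearRegression only gains a speedup from parallelism on sufficiently large problems with more than one target, so with our single label column this setting makes little difference.

In [46]:
clf = LinearRegression(n_jobs=-1)

Now we fit (train) the model on the training features and labels:

In [47]:
clf.fit(X_train, y_train)

LinearRegression(copy_X=True, fit_intercept=True, n_jobs=-1, normalize=False)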
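To see what fitting actually learns, here is a sketch on hypothetical noise-free data generated from y = 3*x1 - 2*x2 + 5. After fit, LinearRegression exposes the learned weights as coef_ and the bias as intercept_, and these should recover the generating values.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noise-free data generated from y = 3*x1 - 2*x2 + 5
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 5

clf = LinearRegression(n_jobs=-1)
clf.fit(X, y)

# The fitted parameters recover the generating coefficients
print(clf.coef_)       # approximately [ 3. -2.]
print(clf.intercept_)  # approximately 5.0
```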

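Now, let's see how well our model performs. We store the result in a variable called accuracy, but for a regressor score() returns the coefficient of determination, R², not the squared error. A value of 1.0 is a perfect fit, and the value can even be negative for a model that does worse than always predicting the mean.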

In [48]:
accuracy = clf.score(X_test, y_test)
print(accuracy)

0.978068511571935
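Because score() returns R², you can check it by hand. This sketch uses hypothetical noisy linear data. It computes R² = 1 - SS_res / SS_tot from the model's predictions and compares the result with clf.score(). The two agree.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical noisy linear data
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 1))
y = 2 * X[:, 0] + rng.normal(scale=0.5, size=100)

clf = LinearRegression().fit(X, y)
y_pred = clf.predict(X)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(r2_manual, clf.score(X, y))  # the two values match
```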


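Let's see how a support vector machine does. We replace our model clf with support vector regression (svm.SVR). Depending on your scikit-learn version, you may get a warning about the default value of gamma changing unless you pass gamma='auto'.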

In [49]:
clf = svm.SVR(gamma='auto')

Then we repeat the fit and score steps from above:

In [50]:
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(accuracy)

0.7793939605247766
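What does gamma='auto' mean? For the RBF kernel, which SVR uses by default, gamma controls how far each training point's influence reaches. gamma='auto' is 1 / n_features. 'scale', the default since scikit-learn 0.22, is 1 / (n_features * X.var()). The sketch below uses hypothetical random data to check that passing those values explicitly gives the same predictions.

```python
import numpy as np
from sklearn import svm

# Hypothetical data: 40 samples, 4 features
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 4))
y = X[:, 0] + 0.5 * X[:, 1]

# gamma='auto' is defined as 1 / n_features
auto = svm.SVR(gamma="auto").fit(X, y)
explicit = svm.SVR(gamma=1.0 / X.shape[1]).fit(X, y)

# gamma='scale' is defined as 1 / (n_features * X.var())
scale = svm.SVR(gamma="scale").fit(X, y)
scale_explicit = svm.SVR(gamma=1.0 / (X.shape[1] * X.var())).fit(X, y)

print(np.allclose(auto.predict(X), explicit.predict(X)))
print(np.allclose(scale.predict(X), scale_explicit.predict(X)))
```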


Let's try a polynomial kernel in our SVM model by adding kernel='poly':

In [51]:
clf = svm.SVR(gamma='auto', kernel='poly')
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(accuracy)

0.6723980636857032
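To compare models more systematically, the sketch below fits LinearRegression and SVR with several kernels on the same hypothetical split. It then ranks them by R² on the test set. The data is synthetic and linear, so this ranking says nothing about the GOOGL results above. Swap in the tutorial's X_train, X_test, y_train and y_test to compare on the real data.

```python
import numpy as np
from sklearn import model_selection, svm
from sklearn.linear_model import LinearRegression

# Hypothetical linear data with a little noise
rng = np.random.default_rng(3)
X = rng.normal(size=(200, 3))
y = 4 * X[:, 0] - X[:, 1] + rng.normal(scale=0.3, size=200)

X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.2, random_state=0)

# One linear regression plus one SVR per kernel
models = {"linear_regression": LinearRegression()}
for kernel in ["linear", "poly", "rbf", "sigmoid"]:
    models[f"svr_{kernel}"] = svm.SVR(kernel=kernel, gamma="auto")

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # R^2 on held-out data

# Print from best to worst
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:18s} {score:.3f}")
```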


Link to the scikit-learn documentation for LinearRegression:

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html


That's it for this lesson!
----
On to the next video: https://www.youtube.com/watch?v=QLVMqwpOLPk