<img src="http://imgur.com/1ZcRyrc.png" style="float: left; margin: 20px; height: 55px">

# Evaluating SVR on Multiple Datasets


---

In this lab you will fit SVM regressors to several datasets and compare them with linear regression, with and without regularisation.

First, use the bike share data:

**Bike share data**

    resource-datasets/bikeshare
    
    regression to predict casual


Then, use any of the following datasets:

**Auto**

    resource-datasets/auto_stats

    regression to predict mpg

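**Boston housing data**

    from sklearn.datasets import load_boston

    regression to predict MEDV

Note: `load_boston` was deprecated in scikit-learn 1.0 and removed in 1.2, so this option requires an older scikit-learn version.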

**Wine data**

    resource-datasets/wine_data
    
    regression to predict the quality

```python
import pandas as pd
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import seaborn as sns

plt.style.use('fivethirtyeight')
%config InlineBackend.figure_format = 'retina'
%matplotlib inline
```

```python
# load_boston was removed in scikit-learn 1.2; this import needs an older version
from sklearn.datasets import load_boston
from sklearn.svm import SVR, LinearSVR
from sklearn.linear_model import LinearRegression, Ridge, Lasso
from sklearn.model_selection import (cross_val_score, cross_val_predict,
                                     train_test_split, GridSearchCV)
from sklearn import metrics
from sklearn.preprocessing import StandardScaler
```

### 1. Load the dataset

- Are there any missing values? Impute or clean if so.
- Select a regression target and predictors.
- Rescale the data.

```python
# Read in bikeshare data
bikeshare = pd.read_csv('../../../../resource-datasets/bikeshare/bikeshare.csv')
bikeshare.head()
```

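| Unnamed: 0 | instant | dteday | season | yr | mnth | hr | holiday | weekday | workingday | weathersit | temp | atemp | hum | windspeed | casual | registered | cnt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 2011-01-01 | 1 | 0 | 1 | 0 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.81 | 0.0 | 3 | 13 | 16 |
| 1 | 2 | 2011-01-01 | 1 | 0 | 1 | 1 | 0 | 6 | 0 | 1 | 0.22 | 0.2727 | 0.8 | 0.0 | 8 | 32 | 40 |
| 2 | 3 | 2011-01-01 | 1 | 0 | 1 | 2 | 0 | 6 | 0 | 1 | 0.22 | 0.2727 | 0.8 | 0.0 | 5 | 27 | 32 |
| 3 | 4 | 2011-01-01 | 1 | 0 | 1 | 3 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.75 | 0.0 | 3 | 10 | 13 |
| 4 | 5 | 2011-01-01 | 1 | 0 | 1 | 4 | 0 | 6 | 0 | 1 | 0.24 | 0.2879 | 0.75 | 0.0 | 0 | 1 | 1 |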
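
The three steps listed for this section (check for missing values, pick a target and predictors, rescale) could be approached roughly as follows. This is a minimal sketch, not the lab's reference solution: `prepare_features` is a hypothetical helper, and `demo` is a small synthetic frame standing in for the real `bikeshare` DataFrame, reusing a few of its column names. In the preview rows `cnt` equals `casual + registered`, so the sketch assumes both should be kept out of the predictors when predicting `casual`.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler


def prepare_features(df, target, drop_cols=()):
    """Hypothetical helper: impute, select numeric predictors, and rescale."""
    # Report only the columns that contain missing values
    missing = df.isnull().sum()
    print(missing[missing > 0])

    y = df[target]
    # Keep numeric predictors only; drop the target and any leaking columns
    X = df.drop(columns=[target, *drop_cols]).select_dtypes(include='number')
    # Median imputation is one simple choice among several
    X = X.fillna(X.median())

    # Standardise each predictor to zero mean and unit variance
    X_scaled = pd.DataFrame(StandardScaler().fit_transform(X),
                            columns=X.columns, index=X.index)
    return X_scaled, y


# Synthetic stand-in data (NOT the real bikeshare values)
rng = np.random.default_rng(0)
casual = rng.integers(0, 50, size=50)
registered = rng.integers(0, 100, size=50)
demo = pd.DataFrame({
    'temp': rng.uniform(0, 1, size=50),
    'hum': rng.uniform(0, 1, size=50),
    'casual': casual,
    'registered': registered,
    'cnt': casual + registered,
})
demo.loc[3, 'hum'] = np.nan  # inject one missing value for illustration

X_demo, y_demo = prepare_features(demo, target='casual',
                                  drop_cols=['registered', 'cnt'])
```

On the real data the call would look like `prepare_features(bikeshare, target='casual', drop_cols=['registered', 'cnt'])`. The `Unnamed: 0` and `instant` columns look like row counters and are probably worth dropping as well, though that is a judgment call.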


### 2. Build an SVM regressor on the data

See the documentation for details about the [SVM regressor](http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html#sklearn.svm.SVR).

- Initialize and train a linear SVM with the default settings. What is the mean R² score with 5-fold cross-validation?
- Repeat using a radial basis function (rbf) kernel. Compare the scores. Which model scores higher?
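
One way to set up the linear-versus-rbf comparison is sketched below. It is an illustration under stated assumptions: the data come from `make_regression` as a synthetic stand-in for the lab's predictors and target, and the linear model is `SVR(kernel='linear')` with otherwise default settings. `LinearSVR` is an alternative that tends to scale better to large datasets. Which kernel scores higher depends on the data, so no expected result is shown.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the lab's X and y (assumption, not real data)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Scaling inside the pipeline means it is re-fitted on each training fold
models = {
    'linear': make_pipeline(StandardScaler(), SVR(kernel='linear')),
    'rbf': make_pipeline(StandardScaler(), SVR(kernel='rbf')),
}

mean_r2 = {}
for name, model in models.items():
    # 5-fold cross-validated R² for each kernel
    scores = cross_val_score(model, X, y, cv=5, scoring='r2')
    mean_r2[name] = scores.mean()
    print(f'{name}: mean R2 = {scores.mean():.3f} (std {scores.std():.3f})')
```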

### 3. Tune the SVM regressors with grid search

- Check the documentation to see which parameters can be tuned with each kernel.
- Create a further train-test split to obtain a hold-out validation set.
- Cross-validate the scores on the training set.
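
The tuning step might be sketched like this, again on synthetic `make_regression` data rather than the lab's dataset. The grid values are arbitrary illustrative choices. In `SVR`, `C` and `epsilon` apply to every kernel, while `gamma` only affects the `rbf`, `poly` and `sigmoid` kernels, which is why the sketch uses a separate grid per kernel.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the lab's X and y (assumption, not real data)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# Hold out 20% of the rows; the grid search never sees them
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

pipe = Pipeline([('scale', StandardScaler()), ('svr', SVR())])

# One sub-grid per kernel so gamma is only searched where it has an effect
param_grid = [
    {'svr__kernel': ['linear'], 'svr__C': [0.1, 1, 10, 100]},
    {'svr__kernel': ['rbf'], 'svr__C': [0.1, 1, 10, 100],
     'svr__gamma': ['scale', 0.01, 0.1]},
]

search = GridSearchCV(pipe, param_grid, cv=5, scoring='r2')
search.fit(X_train, y_train)

print('best params:', search.best_params_)
print('best CV R2:', round(search.best_score_, 3))

# Score the refitted best model once on the hold-out set
test_r2 = search.score(X_test, y_test)
print('hold-out R2:', round(test_r2, 3))
```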

### 4. Compare to linear regression with and without regularisation

- Grid search for the optimal regularisation parameters.
- Cross-validate the scores.
- Examine the R² scores.
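
A possible shape for the linear baseline comparison is shown below. It assumes synthetic `make_regression` data and an arbitrary log-spaced `alpha` grid. Plain `LinearRegression` has no regularisation strength to tune, so it is only cross-validated, while `Ridge` and `Lasso` are grid-searched over `alpha`.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the lab's X and y (assumption, not real data)
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

alphas = np.logspace(-3, 3, 13)  # illustrative grid, not a recommendation
results = {}

# Unregularised baseline: nothing to tune, so cross-validate directly
baseline = Pipeline([('scale', StandardScaler()), ('model', LinearRegression())])
results['linear'] = cross_val_score(baseline, X, y, cv=5, scoring='r2').mean()

# Regularised models: search over the regularisation strength alpha
for name, estimator in [('ridge', Ridge()), ('lasso', Lasso(max_iter=10000))]:
    pipe = Pipeline([('scale', StandardScaler()), ('model', estimator)])
    search = GridSearchCV(pipe, {'model__alpha': alphas}, cv=5, scoring='r2')
    search.fit(X, y)
    results[name] = search.best_score_
    print(name, 'best alpha:', search.best_params_['model__alpha'])

for name, score in results.items():
    print(f'{name}: mean CV R2 = {score:.3f}')
```

These mean R² values can then be set beside the SVR scores obtained with the same number of folds.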

### 5. Perform the steps above with a different dataset