## SVM and kernels

The kernel concept is used in a variety of ML algorithms (e.g. Kernel PCA, Gaussian Processes, kNN, ...).

In this task you will examine kernels for the SVM algorithm, applied to rather simple artificial datasets.

To be clear: we will work on classification problems throughout the notebook.

In [0]:
import numpy as np
from sklearn.datasets import make_moons
import matplotlib.pyplot as plt
from sklearn.metrics import roc_auc_score
from mlxtend.plotting import plot_decision_regions

Let's generate our dataset and take a look at it.

In [0]:
moons_points, moons_labels = make_moons(n_samples=500, noise=0.2, random_state=42)
plt.scatter(moons_points[:, 0], moons_points[:, 1], c=moons_labels)

## 1.1 Pure models.
First, let's try to solve this case with good old Logistic Regression and a simple (linear kernel) SVM classifier.

Train LR and SVM classifiers (choose the parameters by hand; no CV or intensive grid search is needed) and plot their decision regions. Calculate one classification metric of your choice.

Describe the results in one or two sentences.
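As a hedged illustration of the metric step, the sketch below computes accuracy and ROC AUC for a linear SVM on the same kind of moons data. The variable names and the value `C=1.0` are assumptions made for this example only. Note that scikit-learn metrics take the true labels first, and that ROC AUC can be computed either from hard predictions or from the continuous output of `decision_function`.

```python
from sklearn.datasets import make_moons
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=42)

# C=1.0 is an arbitrary illustrative value, not a tuned one.
clf = SVC(kernel="linear", C=1.0).fit(X, y)

# Argument order matters: true labels first, predictions or scores second.
acc = accuracy_score(y, clf.predict(X))
auc_from_labels = roc_auc_score(y, clf.predict(X))
auc_from_scores = roc_auc_score(y, clf.decision_function(X))

print("accuracy:", acc)
print("ROC AUC from hard labels:", auc_from_labels)
print("ROC AUC from decision scores:", auc_from_scores)
```

Ranking-based metrics such as ROC AUC are usually more informative when they are given scores rather than 0/1 predictions, because hard labels collapse the ROC curve to a single operating point.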

_Tip:_ to plot the classifiers' decisions you could either follow the sklearn examples ([this](https://scikit-learn.org/stable/auto_examples/neural_networks/plot_mlp_alpha.html#sphx-glr-auto-examples-neural-networks-plot-mlp-alpha-py) or any other) and work with matplotlib yourself, or use the great [mlxtend](https://github.com/rasbt/mlxtend) package (see their examples for details).
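If you prefer the matplotlib route, one possible approach is to predict the class on a dense grid and draw it with `contourf`. The helpers `decision_grid` and `plot_regions` below are hypothetical names introduced for this sketch; they are not part of scikit-learn or mlxtend.

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression


def decision_grid(clf, X, step=0.05, margin=0.5):
    """Predict the class on a regular grid that covers the data."""
    x_min, x_max = X[:, 0].min() - margin, X[:, 0].max() + margin
    y_min, y_max = X[:, 1].min() - margin, X[:, 1].max() + margin
    xx, yy = np.meshgrid(np.arange(x_min, x_max, step),
                         np.arange(y_min, y_max, step))
    zz = clf.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
    return xx, yy, zz


def plot_regions(clf, X, y):
    """Draw filled decision regions with the data points on top."""
    import matplotlib.pyplot as plt  # imported here so the grid helper works without it
    xx, yy, zz = decision_grid(clf, X)
    plt.contourf(xx, yy, zz, alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], c=y, edgecolors="k")
    plt.show()


X, y = make_moons(n_samples=500, noise=0.2, random_state=42)
clf = LogisticRegression().fit(X, y)
xx, yy, zz = decision_grid(clf, X)
```

In a notebook, calling `plot_regions(clf, X, y)` should display the picture. The grid step trades resolution for speed, since the classifier is evaluated at every grid point.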

In [0]:
def plot_decisions(X, y, clf):
    clf.fit(X, y)
    plot_decision_regions(X, y, clf=clf)
    plt.show()

In [0]:
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

lr = LogisticRegression(penalty = 'l2', C = 1.5) # add some params
svm = SVC(kernel='linear', C = 3) # here too

names = ['Logistic Regression', 'SVM']
for name, clf in zip(names, [lr, svm]):
    print(name)
    plot_decisions(moons_points, moons_labels, clf)
    # roc_auc_score expects the true labels first, then the predictions
    print(name + ' score = ' + str(roc_auc_score(moons_labels, clf.predict(moons_points))))

## 1.2 Kernel trick

Now use different kernels (`poly`, `rbf`, `sigmoid`) with SVC to get better results. Play with the `degree` parameter and others.

For each kernel, estimate the optimal parameters, plot the decision regions, and calculate the metric you chose earlier.

Write a couple of sentences on:

* What happened to the classification quality?
* How did the decision boundary change for each kernel?
* What `degree` did you choose and why?

__Note__:
In this part we will use [GridSearchCV](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html), which fits the selected model with each combination of hyperparameters from a grid and evaluates every combination using cross-validation over the provided dataset.
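To make the mechanics concrete, here is a minimal sketch of a grid search for a single kernel. The grid values are illustrative assumptions, not recommended settings, and the variable names are chosen only for this example.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.2, random_state=42)

# Hypothetical grid: 3 x 3 = 9 candidate settings for the RBF kernel.
param_grid = {"C": [0.1, 1, 10], "gamma": [0.1, 1, 10]}

# Each candidate is scored by 3-fold cross-validated accuracy.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=3, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

After fitting, `best_estimator_` holds the model refitted on the full dataset with the best parameters, so it can be passed directly to a plotting helper.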

In [0]:
from sklearn.model_selection import GridSearchCV

# <YOUR CODE HERE>
parameters_grid = {
    # describe the parameter grid as pairs - parameter_name: parameter range
}

parameters_grid_poly = {
    # describe the parameter grid as pairs - parameter_name: parameter range
}

# use SVC() and the necessary kernel
svm_p = 
svm_r = 
svm_s = 

# use grid search with 3-fold cross validation and the necessary parameter grid
grid_cv_r = GridSearchCV()
grid_cv_s = GridSearchCV()
grid_cv_p = GridSearchCV()

grid_cv_r.fit(moons_points, moons_labels)
grid_cv_s.fit(moons_points, moons_labels)
grid_cv_p.fit(moons_points, moons_labels)

# print best params and score

In [0]:
# plot the decision boundaries for SVM with the used kernels

## 1.3 Simpler solution (of a kind)
What if we could use Logistic Regression to successfully solve this task?

Feature generation can help here. Different feature generation techniques are used in real life; a couple of them will be covered in additional lectures.

In this particular case, simple `PolynomialFeatures` ([link](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html)) can save the day.

Generate the set of new features, train LR on them, plot the decision regions, and calculate the metric.

* Compare the SVM results with this solution (quality, type of decision boundary).
* What degree of `PolynomialFeatures` did you use? Compare it with the corresponding SVM kernel parameter.
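The link between polynomial features and the polynomial kernel can be checked numerically. The sketch below assumes 2-D inputs and degree 2, and uses a hand-written feature map `phi` (a hypothetical helper for this example) whose inner products reproduce the kernel $(x \cdot y + 1)^2$.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel


def phi(X):
    """Explicit degree-2 feature map for 2-D inputs matching (x.y + 1)^2."""
    x1, x2 = X[:, 0], X[:, 1]
    s = np.sqrt(2.0)
    return np.column_stack(
        [np.ones(len(X)), s * x1, s * x2, x1 ** 2, x2 ** 2, s * x1 * x2]
    )


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))

# Kernel values computed without building the features explicitly.
K_kernel = polynomial_kernel(X, X, degree=2, gamma=1.0, coef0=1.0)
# The same values as plain inner products in the expanded feature space.
K_explicit = phi(X) @ phi(X).T

print(np.allclose(K_kernel, K_explicit))
```

`PolynomialFeatures` produces the same monomials but without the $\sqrt{2}$ scaling, so LR on polynomial features and an SVM with a `poly` kernel work in closely related, though not identically weighted, feature spaces.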

__Note__:
In this part we will use the scikit-learn [Pipeline](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html) class, which chains a sequence of preprocessing steps with a final estimator. It is very useful for preprocessing steps whose parameters are estimated during the training phase and then reused unchanged during the testing phase. Moreover, pipelines make grid search and cross-validation simpler.
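As a rough sketch of how a pipeline combines with grid search: parameters of individual steps are addressed as `<step name>__<parameter name>`. The step names, grid values, and `max_iter=1000` below are assumptions made for this example.

```python
from sklearn.datasets import make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

X, y = make_moons(n_samples=500, noise=0.2, random_state=42)

pipe = Pipeline(steps=[
    ("poly", PolynomialFeatures()),
    ("regression", LogisticRegression(max_iter=1000)),
])

# Illustrative grid: the feature degree and the LR regularization strength.
param_grid = {
    "poly__degree": [1, 2, 3, 4],
    "regression__C": [0.1, 1, 10],
}

search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

Because the feature expansion lives inside the pipeline, it is re-fitted on each training fold, which keeps the cross-validation estimate free of leakage from the held-out fold.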

In [0]:
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline

lr = LogisticRegression(<set the parameters as you wish>)
poly = PolynomialFeatures(degree = 3)

pipeline = Pipeline(steps = [('poly', poly), ('regression', lr)])
pipeline.fit(moons_points, moons_labels)
plot_decision_regions(moons_points, moons_labels, clf=pipeline)

# roc_auc_score expects the true labels first, then the predictions
print('score = ' + str(roc_auc_score(moons_labels, pipeline.predict(moons_points))))

## 1.4 Bonus area: Harder problem

Let's make this task a bit more challenging by upgrading the dataset:

In [0]:
from sklearn.datasets import make_circles

circles_points, circles_labels = make_circles(n_samples=500, noise=0.06, random_state=42)

plt.figure(figsize=(5, 5))
plt.scatter(circles_points[:, 0], circles_points[:, 1], c=circles_labels)

And even more:

In [0]:
points = np.vstack((circles_points*2.5 + 0.5, moons_points))
labels = np.hstack((circles_labels, moons_labels + 2)) # + 2 keeps the moons classes distinct from the circles classes

plt.figure(figsize=(5, 5))
plt.scatter(points[:, 0], points[:, 1], c=labels)

Now do your best using all the approaches above!

Tune LR with generated features and an SVM with an appropriate kernel of your choice. You may add some of your favorite models to demonstrate their (and your) strength. Again, plot the decision regions and calculate the metric.

Explain the results in a few sentences.

__Note__:
Don't forget that we are now dealing with multi-class classification, so we need to select a multi-class strategy (one-vs-rest or one-vs-one) and adjust our models accordingly.

The simplest way to do so is to use the scikit-learn [multiclass](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.multiclass) module.
You may use [OneVsRestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html) or [OneVsOneClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsOneClassifier.html#sklearn.multiclass.OneVsOneClassifier) classes respectively to wrap your target model class.

The resulting model can then be fitted on a multi-class dataset and used to predict its classes.
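A minimal sketch of the wrapping step, assuming the four-class dataset is rebuilt as above and using arbitrary SVC settings (`C=1.0`, `gamma="scale"`), might look like this. It fits on the integer labels directly rather than on a binarized label matrix.

```python
import numpy as np
from sklearn.datasets import make_circles, make_moons
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Rebuild the combined four-class dataset.
moons_X, moons_y = make_moons(n_samples=500, noise=0.2, random_state=42)
circles_X, circles_y = make_circles(n_samples=500, noise=0.06, random_state=42)
X = np.vstack((circles_X * 2.5 + 0.5, moons_X))
y = np.hstack((circles_y, moons_y + 2))

# One binary SVC is trained per class; the settings here are illustrative.
ovr = OneVsRestClassifier(SVC(kernel="rbf", C=1.0, gamma="scale"))
ovr.fit(X, y)

print(len(ovr.estimators_))
print(ovr.score(X, y))
```

When such a wrapper is passed to `GridSearchCV`, the parameters of the inner model are addressed through the `estimator__` prefix, for example `estimator__C`.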

In [0]:
### YOUR CODE HERE
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize

labels_bin = label_binarize(labels, classes=[0, 1, 2, 3])

parameters_grid = {
}

parameters_grid_poly = {
}

parameters_grid_sigmoid = {
}

svm_p = OneVsRestClassifier()
svm_r = OneVsRestClassifier()
svm_s = OneVsRestClassifier()

# use grid search with 3-fold cross-validation and 'accuracy' score for best model selection
grid_cv_r = 
grid_cv_s = 
grid_cv_p = 

grid_cv_r.fit(points, labels_bin)
grid_cv_s.fit(points, labels_bin)
grid_cv_p.fit(points, labels_bin)

print('rbf: best params: ' + str(grid_cv_r.best_params_) + ' score: ' + str(grid_cv_r.best_score_))
print('sigmoid: best params: ' + str(grid_cv_s.best_params_) + ' score: ' + str(grid_cv_s.best_score_))
print('poly: best params: ' + str(grid_cv_p.best_params_) + ' score: ' + str(grid_cv_p.best_score_))

In [0]:
# plot the best results

Logistic Regression supports multi-class targets by design in scikit-learn, so we just need to set the proper multi-class strategy type.

We advise selecting the 'multinomial' type, so that the loss is computed over the entire probability distribution instead of over several separate 'one-versus-rest' losses.
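The sketch below shows a single logistic regression model producing a probability distribution over all four classes. It assumes a scikit-learn version in which the default `lbfgs` solver fits a multinomial model for multi-class targets, so it does not pass `multi_class` explicitly; the degree and `max_iter` values are illustrative.

```python
import numpy as np
from sklearn.datasets import make_circles, make_moons
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Rebuild the combined four-class dataset.
moons_X, moons_y = make_moons(n_samples=500, noise=0.2, random_state=42)
circles_X, circles_y = make_circles(n_samples=500, noise=0.06, random_state=42)
X = np.vstack((circles_X * 2.5 + 0.5, moons_X))
y = np.hstack((circles_y, moons_y + 2))

model = Pipeline(steps=[
    ("poly", PolynomialFeatures(degree=3)),
    ("regression", LogisticRegression(max_iter=5000)),
])
model.fit(X, y)

# One row per sample, one column per class; each row sums to 1.
proba = model.predict_proba(X[:5])
print(proba.shape)
print(proba.sum(axis=1))
```

Since the pipeline accepts integer class labels directly, no label binarization is needed for this model.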

In [0]:
# Fit logistic regression
poly = PolynomialFeatures()
lr = LogisticRegression(multi_class='multinomial', solver = 'saga', max_iter = 5000)
pipeline = # select pipeline

In [0]:
parameters_grid = {
    # set up parameter grid for the logistic regression pipeline
}

# use grid search with 3-fold cross-validation and 'accuracy' score for best model selection


In [0]:
print(grid_cv.best_params_)
print(grid_cv.best_score_)
plot_decisions(points, labels, grid_cv.best_estimator_)

**Conclusion**: 

Try to describe the obtained results.



