# In this notebook:

## Creating non-linearly separable synthetic datasets

## Binary classification methods with 2 features for easy visualisation

- linear classification methods: logistic regression, linear SVM
- non-linear methods: decision trees, polynomial-kernel SVM, RBF-kernel SVM

## Evaluating model performance:
- training/test set split
- generalisation error
- metrics: accuracy, F1 score, average precision (`average_precision_score`), ROC AUC (using classification reports and ROC plots)
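All of the metrics listed above are available in `sklearn.metrics`. The sketch below computes them on a tiny hand-made set of labels and scores (the numbers are invented for illustration and are not taken from this notebook). Note that accuracy and F1 are computed from hard class labels, whereas `average_precision_score` and `roc_auc_score` expect continuous scores such as predicted probabilities.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, average_precision_score,
                             roc_auc_score, classification_report)

# toy labels and scores, made up purely for illustration
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0])
y_score = np.array([0.1, 0.3, 0.2, 0.6, 0.8, 0.7, 0.4, 0.05])  # estimated P(class 1)
y_pred = (y_score >= 0.5).astype(int)  # hard labels at a 0.5 threshold

acc = accuracy_score(y_true, y_pred)           # uses hard labels
f1 = f1_score(y_true, y_pred)                  # uses hard labels
ap = average_precision_score(y_true, y_score)  # uses scores
auc = roc_auc_score(y_true, y_score)           # uses scores

print(f"accuracy={acc:.3f}  F1={f1:.3f}  AP={ap:.3f}  AUC={auc:.3f}")
print(classification_report(y_true, y_pred))
```

Here one negative is scored above the threshold and one positive below it, so accuracy is 0.75 and F1 is 2/3, while the ranking-based AUC (14 of 15 positive/negative pairs ordered correctly) is higher.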

In [None]:
from commonFunctions import *

## what happens when the dataset is not _linearly separable_?

In [None]:
# see https://scikit-learn.org/stable/modules/generated/sklearn.datasets.make_classification.html
# and https://scikit-learn.org/stable/datasets/index.html#sample-generators

X, CL = makeDataset(kind='circle', balanced=False, unbalance=0.8)
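`makeDataset` comes from the notebook's own `commonFunctions` module, so its internals are not shown here. As a rough, self-contained stand-in, the sketch below builds a comparable dataset directly with scikit-learn's `make_circles` and then subsamples one class. The subsampling step, and reading `unbalance=0.8` as "about 80% of points in the majority class", are assumptions for illustration, not a description of what `makeDataset` actually does.

```python
import numpy as np
from sklearn.datasets import make_circles

# two concentric rings: class 0 = outer ring (radius 1), class 1 = inner ring (radius 0.5)
X, CL = make_circles(n_samples=1000, noise=0.05, factor=0.5, random_state=0)

# hypothetical imbalance step: keep every outer-ring point but only a random
# ~25% of the inner-ring points, so class 0 ends up as roughly 80% of the data
rng = np.random.default_rng(0)
keep = (CL == 0) | (rng.random(len(CL)) < 0.25)
X_unbal, CL_unbal = X[keep], CL[keep]

counts = np.bincount(CL_unbal)
print(X_unbal.shape, counts, counts[0] / counts.sum())
```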

Again, let us first visualise the data:

In [None]:
plt.figure(figsize=(10,6))
sns.scatterplot(x=X[:,0], y=X[:,1], hue=CL)

Can we still learn a meaningful linear model? First, split the data into training and test sets and scale the features.

In [None]:
XTrain, XTest, CLTrain, CLTest = train_test_split(X, CL, test_size=0.33, random_state=10)
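Because this dataset is deliberately unbalanced, a purely random split can leave the test set with a noticeably different class ratio from the training set. `train_test_split` accepts a `stratify` argument for this case. The sketch below uses a separate toy dataset from `make_classification` (not the notebook's data) to show that stratifying by the labels keeps the minority-class fraction essentially identical in both splits; stratifying is a suggestion here, not something the original notebook does.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# an imbalanced toy dataset: roughly 80% class 0, 20% class 1
X, CL = make_classification(n_samples=600, n_features=2, n_informative=2,
                            n_redundant=0, weights=[0.8], random_state=0)

# stratify=CL keeps the class proportions (almost) identical in both splits
XTrain, XTest, CLTrain, CLTest = train_test_split(
    X, CL, test_size=0.33, random_state=10, stratify=CL)

print("class-1 fraction  overall:", CL.mean(), " train:", CLTrain.mean(), " test:", CLTest.mean())
```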

In [None]:
XTrainScaled, XTestScaled = scale(XTrain, XTest)
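The notebook's `scale` helper is defined in `commonFunctions` and is not shown. A common way to implement this step, sketched below with scikit-learn's `StandardScaler` under a hypothetical name `scale_train_test`, is to learn the mean and standard deviation from the training set only and apply that same transformation to the test set, so no information from the test data leaks into preprocessing. The random data are placeholders.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def scale_train_test(XTrain, XTest):
    """Hypothetical helper: standardise features using training-set statistics only."""
    scaler = StandardScaler().fit(XTrain)  # learn per-feature mean/std from training data
    return scaler.transform(XTrain), scaler.transform(XTest)

rng = np.random.default_rng(0)
XTrain = rng.normal(loc=5.0, scale=3.0, size=(200, 2))
XTest = rng.normal(loc=5.0, scale=3.0, size=(100, 2))
XTrainScaled, XTestScaled = scale_train_test(XTrain, XTest)

print(XTrainScaled.mean(axis=0), XTrainScaled.std(axis=0))  # 0 and 1 on the training set
print(XTestScaled.mean(axis=0))                              # near 0, but not exactly
```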

In [None]:
plotTrainTest(XTrainScaled, CLTrain, XTestScaled, CLTest)

## trying logistic regression (logit)

In [None]:
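# fit logistic regression on the scaled features
clf, CL_pred_Train, CL_pred_Test = logit(XTrainScaled, CLTrain, XTestScaled, CLTest)

# plotLinearFitTrainTest(clf, XTrainScaled, CLTrain, XTestScaled, CLTest)

# the model was trained on scaled features, so plot and score it on scaled features too
plotContourFitTrainTestAlternate(clf, XTrainScaled, CLTrain, XTestScaled, CLTest)

evaluationReport(CLTrain, CL_pred_Train, CLTest, CL_pred_Test)

plotROC(clf, XTestScaled, CLTest)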
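Logistic regression separates the classes with a straight line, `w1*x1 + w2*x2 + b = 0`, while (assuming the `'circle'` data are roughly concentric rings around the origin) the true boundary is a circle. One way to see this, sketched below on `make_circles` data rather than the notebook's dataset, is to hand-craft the extra feature `x1^2 + x2^2`: in that enlarged feature space the rings become linearly separable and the same linear model succeeds. The helper `add_radius` is a hypothetical name introduced for this illustration; the kernel SVMs later in the notebook achieve a similar effect implicitly.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, CL = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)
XTrain, XTest, CLTrain, CLTest = train_test_split(X, CL, test_size=0.33, random_state=10)

def add_radius(X):
    """Hypothetical helper: append the squared distance from the origin, x1^2 + x2^2."""
    return np.column_stack([X, (X ** 2).sum(axis=1)])

# same model (scaling + logistic regression), with and without the extra feature
raw = make_pipeline(StandardScaler(), LogisticRegression()).fit(XTrain, CLTrain)
feat = make_pipeline(StandardScaler(), LogisticRegression()).fit(add_radius(XTrain), CLTrain)

acc_raw = raw.score(XTest, CLTest)
acc_feat = feat.score(add_radius(XTest), CLTest)
print(f"raw features: {acc_raw:.2f}   with radius feature: {acc_feat:.2f}")
```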

## Decision trees

In [None]:
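from sklearn import tree
import graphviz

# trees split on one feature at a time, so feature scaling is not needed;
# random_state makes the fitted tree reproducible
clf = tree.DecisionTreeClassifier(random_state=0)
clf = clf.fit(XTrain, CLTrain)

# predict with the tree itself (otherwise the report would reuse the logit predictions)
CL_pred_Train = clf.predict(XTrain)
CL_pred_Test = clf.predict(XTest)

plotContourFitTrainTest(clf, XTrain, CLTrain, XTest, CLTest)

evaluationReport(CLTrain, CL_pred_Train, CLTest, CL_pred_Test)

plotROC(clf, XTest, CLTest)

# render the fitted tree with Graphviz
dot = tree.export_graphviz(clf, out_file=None)
graph = graphviz.Source(dot)
graph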
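A `DecisionTreeClassifier` with default settings keeps splitting until every leaf is pure, so on data without duplicate points it fits the training set perfectly; the gap between training and test scores is the generalisation error listed at the top. A common remedy is to cap the tree's size with `max_depth` (or `min_samples_leaf`). The sketch below, on separate `make_circles` data with an arbitrarily chosen `max_depth=3`, compares the two settings and prints the shallow tree with `export_text`, a plain-text alternative to Graphviz.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, CL = make_circles(n_samples=400, noise=0.1, factor=0.5, random_state=0)
XTrain, XTest, CLTrain, CLTest = train_test_split(X, CL, test_size=0.33, random_state=10)

results = {}
for depth in [None, 3]:  # None = grow until every leaf is pure
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(XTrain, CLTrain)
    results[depth] = (clf.get_depth(), clf.score(XTrain, CLTrain), clf.score(XTest, CLTest))
    print(f"max_depth={depth}: depth={results[depth][0]}, "
          f"train acc={results[depth][1]:.2f}, test acc={results[depth][2]:.2f}")

# text rendering of the last fitted (shallow) tree
print(export_text(clf, feature_names=["x1", "x2"]))
```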


## let's try an SVM with a polynomial kernel

In [None]:
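# SVM with a polynomial kernel, trained on the scaled features
clf, CL_pred_Train, CL_pred_Test = SVM(XTrainScaled, CLTrain, XTestScaled, CLTest, kernel='poly')

evaluationReport(CLTrain, CL_pred_Train, CLTest, CL_pred_Test)

plotContourFitTrainTest(clf, XTrainScaled, CLTrain, XTestScaled, CLTest)

# evaluate the ROC curve on the same (scaled) features the model was trained on
plotROC(clf, XTestScaled, CLTest)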
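In scikit-learn the polynomial kernel is `K(x, x') = (gamma * <x, x'> + coef0) ** degree`, with defaults `degree=3` and `coef0=0`. With `coef0=0` and an odd degree the decision function is an odd polynomial plus an intercept, which cannot enclose a region around the origin, so it cannot separate concentric rings; setting `coef0=1` (or using `degree=2`) brings in the squared terms that describe a circle. Whether the notebook's `SVM` helper passes these defaults through is an assumption; the sketch below, on separate `make_circles` data that are already centred at the origin, compares the three settings directly with `sklearn.svm.SVC`.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# concentric rings centred on the origin (already on a unit scale)
X, CL = make_circles(n_samples=400, noise=0.05, factor=0.5, random_state=0)
XTrain, XTest, CLTrain, CLTest = train_test_split(X, CL, test_size=0.33, random_state=10)

settings = {
    "degree=3, coef0=0 (default)": dict(degree=3, coef0=0.0),
    "degree=3, coef0=1": dict(degree=3, coef0=1.0),
    "degree=2, coef0=0": dict(degree=2, coef0=0.0),
}
scores = {}
for name, params in settings.items():
    clf = SVC(kernel="poly", **params).fit(XTrain, CLTrain)
    scores[name] = clf.score(XTest, CLTest)
    print(f"{name}: test accuracy = {scores[name]:.2f}")
```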

## and the RBF kernel

In [None]:
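# SVM with an RBF (Gaussian) kernel, trained on the scaled features
clf, CL_pred_Train, CL_pred_Test = SVM(XTrainScaled, CLTrain, XTestScaled, CLTest, kernel='rbf')

evaluationReport(CLTrain, CL_pred_Train, CLTest, CL_pred_Test)

plotContourFitTrainTest(clf, XTrainScaled, CLTrain, XTestScaled, CLTest)

# evaluate the ROC curve on the same (scaled) features the model was trained on
plotROC(clf, XTestScaled, CLTest)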
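The RBF kernel is `K(x, x') = exp(-gamma * ||x - x'||^2)`: a larger `gamma` makes each support vector's influence more local (a wigglier boundary), while `C` trades margin width against training errors. Rather than fixing these by hand, they can be chosen by cross-validation on the training set alone, keeping the test set for the final estimate of generalisation error. The sketch below does this with `GridSearchCV` on separate `make_circles` data; the grid values are arbitrary choices for illustration.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, CL = make_circles(n_samples=400, noise=0.1, factor=0.5, random_state=0)
XTrain, XTest, CLTrain, CLTest = train_test_split(X, CL, test_size=0.33, random_state=10)

# scaling inside the pipeline is re-fitted on each CV training fold, avoiding leakage
pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": [0.1, 1, 10]}

# 5-fold cross-validation on the training set only; the test set stays untouched
search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy").fit(XTrain, CLTrain)
test_acc = search.score(XTest, CLTest)

print("best parameters:", search.best_params_)
print(f"CV accuracy: {search.best_score_:.2f}, test accuracy: {test_acc:.2f}")
```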