# Fundamentals of Machine Learning (CSCI-UA.473)

## Lab 4: Logistic Regression Recap, Multi-class Classification and Support Vector Machines

In [None]:
import numpy as np
import pandas as pd
from sklearn import datasets
# Install autograd:
#!conda install -c conda-forge autograd

import autograd.numpy as numpy
import autograd.numpy.random as npr

from autograd import grad,elementwise_grad

import scipy.optimize

import matplotlib.pyplot as plt
from tqdm import tqdm
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
from sklearn import model_selection

# For the plots in this notebook we will use a widget to get interactive and better looking plots! 
# follow these instructions to set up matplotlib widget for Jupyter Lab
# https://github.com/matplotlib/jupyter-matplotlib#installation
# Specifically you will need to install the widget using : conda install -c conda-forge ipympl
# To enable the widget run the following two commands :
# jupyter nbextension install --py --symlink --sys-prefix --overwrite ipympl
#jupyter nbextension enable --py --sys-prefix ipympl
%matplotlib widget

## Part 1: Logistic Regression & Different Metrics

We will now implement and train another logistic regression model using scikit-learn. The goal is a model that, given a new data point, estimates the probability that the tumor is malignant.

Helper method below copied from: [Helper Method](https://stackoverflow.com/questions/38105539/how-to-convert-a-scikit-learn-dataset-to-a-pandas-dataset)

In [None]:
# General helper method to convert sci-kit datasets to Pandas DataFrame.
def sklearn_to_df(sklearn_dataset):
    df = pd.DataFrame(sklearn_dataset.data, columns=sklearn_dataset.feature_names)
    df['target'] = pd.Series(sklearn_dataset.target)
    return df

Let's eyeball the data in a quick and hacky manner. It is always good practice to see what the data looks like. **Only the training data, of course!**

In [None]:
cancer_dataset = datasets.load_breast_cancer() # Load the data and convert to a pandas dataframe
df = sklearn_to_df(cancer_dataset)

print(df.head()) # Print out the first five data points.


Let's gather a few summary statistics about our data. Again, always a good practice. 

In [None]:
N = len(df) # The number of data points.
print('N = {:d} data points'.format(N))

# Give a barplot of each class.
plt.figure()
# Sort the counts by class label so that each bar lines up with its entry in target_names.
plt.bar([0,1], df['target'].value_counts().sort_index(), color = ['r', 'b'], tick_label = cancer_dataset.target_names)
plt.ylabel('Count')
plt.title('Cancer Dataset: Class Counts');

This dataset is unbalanced because there are more examples of benign tumors than malignant ones.  This is typical of many real-life datasets, where we are often limited in how many training examples we have for each class.  Let's split our data into a training and validation set.  We'll use an 80/20 split.

In [None]:
# Split the data into training and validation sets.  DO NOT TOUCH THE VALIDATION DATA UNTIL EVALUATION!!
train_data, val_data = model_selection.train_test_split(df, test_size = 0.2) # 0.2 is 20% validation data.

# Split the features from the class labels.
X_train = train_data.drop('target', axis = 1) # We drop the target from the features.  
X_val  = val_data.drop('target', axis = 1)  # Note that this does not operate inplace.
 
y_train = train_data['target']
y_val  = val_data['target']

Now that our data is loaded and split we can train a logistic regression model.  For the optimization we use the "liblinear" solver.  Many other solvers are also available, such as Newton-CG.  For more information, see the scikit-learn documentation: [Solvers](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html).

In [None]:
# Now fit a logistic regression model.
model = LogisticRegression(solver = 'liblinear', class_weight = 'balanced')
model.fit(X_train, y_train);

The model is trained so we can validate it on our validation set.  The scikit-learn `metrics` module contains many useful functions for this purpose.  We try out a few of them below. 

Let us first briefly explain some of these metrics we will use.  

#### Accuracy 
Accuracy is the percentage of all examples in our validation set that are classified correctly.  

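#### Confusion Matrix
The confusion matrix, in the layout returned by `metrics.confusion_matrix` (rows are actual classes, columns are predicted classes), is the following matrix:
$$
C = \begin{bmatrix}
\text{Actual 0, Predict 0} & \text{Actual 0, Predict 1}\\
\text{Actual 1, Predict 0} & \text{Actual 1, Predict 1}
\end{bmatrix}
$$
Notice that the diagonal entries are the examples that are correctly classified.  In this dataset label 0 is malignant and label 1 is benign.

#### Precision Score
The precision score for the malignant class is the percentage 
$$
\text{Precision } = \frac{C_{00}}{C_{00} + C_{10}}.
$$
So it is the percentage of tumors predicted to be malignant that really are malignant.  

#### Recall Score 
The recall score for the malignant class is the percentage
$$
\text{Recall } = \frac{C_{00}}{C_{00} + C_{01}}.
$$
So it is the percentage of actual malignant tumors that we classify correctly. Precision and recall are two different quantities: the first is normalized by the number of malignant predictions, the second by the number of malignant tumors. Note that `metrics.precision_score` and `metrics.recall_score` treat label 1 (benign) as the positive class unless `pos_label = 0` is passed.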

**Using multiple evaluation metrics helps give a better picture of how well our classifier is doing.**
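
To make these definitions concrete, here is a minimal sketch that computes the metrics by hand with NumPy. The labels are made-up toy values (not the cancer data), the matrix uses rows for actual classes and columns for predicted classes, and class 0 is treated as the class of interest; all of these are assumptions for illustration only.

```python
import numpy as np

# Hypothetical toy labels: 0 = malignant, 1 = benign.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 0, 1, 1, 1, 1, 1])

# Confusion matrix with rows = actual class, columns = predicted class.
C = np.zeros((2, 2), dtype=int)
for actual, predicted in zip(y_true, y_pred):
    C[actual, predicted] += 1

# Fraction of all examples on the diagonal.
accuracy = np.trace(C) / C.sum()

# Treat class 0 as the class of interest.
precision_0 = C[0, 0] / (C[0, 0] + C[1, 0])  # correct among predicted 0
recall_0 = C[0, 0] / (C[0, 0] + C[0, 1])     # correct among actual 0

# Matthews correlation coefficient for the binary case.
tp, fn, fp, tn = (int(v) for v in (C[0, 0], C[0, 1], C[1, 0], C[1, 1]))
mcc = (tp * tn - fp * fn) / np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(C)
print(accuracy, precision_0, recall_0, mcc)
```

The MCC uses all four entries of the matrix, which is why it is often reported alongside accuracy on unbalanced data.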

In [None]:
pred = model.predict(X_val)

# See the percentage of examples that are correctly classified.
accuracy = metrics.accuracy_score(y_val, pred) 
print("Accuracy = {:0.1f}%".format(accuracy * 100))

# The matrix of predictions and true values for each class.
conf_matrix = metrics.confusion_matrix(y_val, pred)
print("Confusion matrix = ")
print(conf_matrix)

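# Precision score.  Malignant is label 0, so we make it the positive class.
precision = metrics.precision_score(y_val, pred, pos_label = 0)
print("Precision = {:0.1f}%".format(100 * precision))

# Recall score, again with malignant (label 0) as the positive class.
recall = metrics.recall_score(y_val, pred, pos_label = 0)
print("Recall    = {:0.1f}%".format(100 * recall))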

# MCC Score.
mcc = metrics.matthews_corrcoef(y_val, pred)
print("MCC = {:0.2f}".format(mcc))

### We can also plot the ROC curve and use it to estimate thresholds

In [None]:
pred_probs = model.predict_proba(X_val)
metrics.RocCurveDisplay.from_predictions(y_val, pred_probs[:,1])

# or alternatively
metrics.RocCurveDisplay.from_estimator(model, X_val, y_val)
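
As a rough illustration of what the ROC curve plots, the sketch below sweeps a decision threshold over a handful of made-up scores and records the false positive rate and true positive rate at each threshold. The helper `roc_points` and the toy data are hypothetical; this is not how scikit-learn implements it internally.

```python
import numpy as np

# Hypothetical scores (predicted probability of class 1) and true labels.
y_true = np.array([0, 0, 1, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.9])

def roc_points(y_true, scores):
    """Return (fpr, tpr) arrays with one point per distinct threshold."""
    thresholds = np.sort(np.unique(scores))[::-1]  # from strictest to loosest
    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)
    fpr, tpr = [0.0], [0.0]  # threshold above every score: nothing predicted positive
    for t in thresholds:
        pred = scores >= t
        tpr.append(np.sum(pred & (y_true == 1)) / n_pos)
        fpr.append(np.sum(pred & (y_true == 0)) / n_neg)
    return np.array(fpr), np.array(tpr)

fpr, tpr = roc_points(y_true, scores)

# Area under the curve via the trapezoid rule.
auc = np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2)
print(auc)
```

Each point on the curve corresponds to one threshold, so picking an operating point on the curve amounts to picking a threshold for `predict_proba`.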

### If we have highly imbalanced classes, we can plot the PR curve as well

In [None]:
metrics.PrecisionRecallDisplay.from_predictions(y_val, pred_probs[:,1])

# or alternatively
metrics.PrecisionRecallDisplay.from_estimator(model, X_val, y_val)

## Part 2 : Multiclass Classification

Multiclass classification is simply the case where we have more than 2 classes.  Instead of modeling the class label as a Bernoulli random variable, we can model it as a categorical (single-trial multinomial) random variable, for example.  The example we'll consider here is the classic Iris dataset by Fisher (a prominent early statistician).  Let's load the data below.

In [None]:
# Load the data and convert it into a Pandas dataframe.
iris_data = datasets.load_iris()
df = sklearn_to_df(iris_data)

# Print the first 5 examples to see how the data looks.
print("N = ", len(df))
print(df.head())

### Split the data into train and test

We'll use an 80/20 split for our training/test sets. We will not touch the test set. For validation purposes we will use cross-validation, which is particularly useful here since we only have 150 examples in our dataset. We will use $k$-fold cross-validation with $k=10$.  This works by splitting the training data into 10 disjoint sets.  We leave out one of these sets, train our model on the remaining 9, and treat the left-out set as a validation set.  We repeat this 10 times, once per left-out set, and average the validation scores.  If $k = N$, where $N$ is the number of samples, this is also called leave-one-out cross-validation (LOOCV).  Note that $k$-fold cross-validation is more expensive than using a separate validation set because we have to train our model $k$ times.

Cross-validation will become more useful when we have a hyperparameter to control during training and want to see which value of the hyperparameter will give the best results.
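
The fold bookkeeping described above can be sketched in a few lines of NumPy. The helper `kfold_indices` is a hypothetical name, and the contiguous, unshuffled folds are an assumption made for simplicity; in the lab itself the splitting is done by scikit-learn.

```python
import numpy as np

def kfold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous, disjoint folds."""
    folds = np.array_split(np.arange(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        # Everything that is not in the held-out fold is used for training.
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

# With 120 training examples (80% of 150) and k = 10, each fold holds 12 points.
splits = list(kfold_indices(120, 10))
for train_idx, val_idx in splits[:2]:
    print(len(train_idx), len(val_idx), val_idx[:3])
```

Every example lands in exactly one validation fold, which is why averaging the 10 scores uses all of the training data for validation without ever validating on a point the model was fit to.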

In [None]:
# Split the data.  DO NOT TOUCH THE TEST DATA FROM HERE ON!!
train_data, test_data = model_selection.train_test_split(df, test_size = 0.2) # 0.2 is 20% test data.

# Split the features from the class labels.
X_train = train_data.drop('target', axis = 1) # We drop the target from the features.  
X_test  = test_data.drop('target', axis = 1)  # Note that this does not operate inplace.
 
y_train = train_data['target']
y_test  = test_data['target']

Below we compare 2 logistic regression models using cross-validation.  The first model is trained on all of the features whereas the second model is only trained on "sepal length" and "sepal width".

In [None]:
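# Get the 2 different models.
# With the 'newton-cg' solver, scikit-learn fits a multinomial (softmax) model for multiclass targets by default;
# the explicit multi_class argument is deprecated in recent releases, so we omit it.
model1 = LogisticRegression(solver = 'newton-cg')
model2 = LogisticRegression(solver = 'newton-cg')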

X_train2 = X_train[['sepal length (cm)', 'sepal width (cm)']]

# k-fold cross-validation.
k = 10
splitter = model_selection.KFold(k)

# Compute cross-validation scores.
cv_scores1 = model_selection.cross_val_score(model1, X_train, y_train, cv = splitter)
cv_scores2 = model_selection.cross_val_score(model2, X_train2, y_train, cv = splitter)

# Print the means of the scores.
print("Model 1 CV score = ", np.mean(cv_scores1))
print("Model 2 CV score = ", np.mean(cv_scores2))

### Learning curves

We can also look to see how our two models perform when given additional training examples.  A learning curve shows the tradeoff between the number of samples we use to train and the model's predictive accuracy.  We plot the learning curves for our two models below.

In [None]:
fig, (ax1, ax2) = plt.subplots(1,2, figsize = (10,4))

train_sizes1, train_scores1, test_scores1 = model_selection.learning_curve(model1, X_train, y_train, cv = splitter)
train_sizes2, train_scores2, test_scores2 = model_selection.learning_curve(model2, X_train2, y_train, cv = splitter)

train_scores_mean1 = np.mean(train_scores1, axis = 1)
test_scores_mean1  = np.mean(test_scores1, axis = 1)
train_scores_std1  = np.std(train_scores1, axis = 1)
test_scores_std1   = np.std(test_scores1, axis = 1)

train_scores_mean2 = np.mean(train_scores2, axis = 1)
test_scores_mean2  = np.mean(test_scores2, axis = 1)
train_scores_std2  = np.std(train_scores2, axis = 1)
test_scores_std2   = np.std(test_scores2, axis = 1)

ax1.set_title('Logistic Regression Model 1')
ax1.plot(train_sizes1, train_scores_mean1, 'o-', label = 'Training')
ax1.plot(train_sizes1, test_scores_mean1, 's-', label = 'Validation')
ax1.fill_between(train_sizes1, train_scores_mean1 - train_scores_std1, train_scores_mean1 + train_scores_std1, alpha = 0.1)
ax1.fill_between(train_sizes1, test_scores_mean1 - test_scores_std1, test_scores_mean1 + test_scores_std1, alpha = 0.1)
ax1.set_xlabel('Training Examples')
ax1.set_ylabel('Score')
ax1.grid()
ax1.legend()

ax2.set_title('Logistic Regression Model 2')
ax2.plot(train_sizes2, train_scores_mean2, 'o-', label = 'Training')
ax2.plot(train_sizes2, test_scores_mean2, 's-', label = 'Validation')
ax2.fill_between(train_sizes2, train_scores_mean2 - train_scores_std2, train_scores_mean2 + train_scores_std2, alpha = 0.1)
ax2.fill_between(train_sizes2, test_scores_mean2 - test_scores_std2, test_scores_mean2 + test_scores_std2, alpha = 0.1)
ax2.set_xlabel('Training Examples')
ax2.set_ylabel('Score')
ax2.grid()
ax2.legend()

plt.tight_layout();

### Evaluate the models.

Finally, we actually evaluate our two models on our test set.  The first model has a higher cross-validation score so we expect it to do better on the test set.

In [None]:
model1.fit(X_train, y_train)
model2.fit(X_train2, y_train)

pred1 = model1.predict(X_test)

X_test2 = X_test[['sepal length (cm)', 'sepal width (cm)']]
pred2 = model2.predict(X_test2)

# Get the percent of correctly classified results.
acc1 = metrics.accuracy_score(pred1, y_test)
acc2 = metrics.accuracy_score(pred2, y_test)

print("Logistic regression model accuracy (all features) = {:0.1f}%".format(100 * acc1))
print("Logistic regression model accuracy (2 features)   = {:0.1f}%".format(100 * acc2))

As we'd expect, training with more features gives us a more flexible model.  Excluding the remaining features is equivalent to forcing their weights to be 0.  Thus, training on two features solves a constrained version of the minimization problem, whose optimal training loss can be no lower than that of the unconstrained problem that uses all of the features.
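
The nesting argument (dropping features is the same as pinning their weights to 0) can be checked numerically. The sketch below uses ordinary least squares on synthetic data as a stand-in for logistic regression, since it has a closed-form solution; the data, the true weights, and the helper `train_loss` are all made up for illustration. Note that the guarantee is about the training loss, not the test accuracy.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression data with 4 features (hypothetical weights).
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0]) + 0.1 * rng.normal(size=100)

def train_loss(X, y):
    """Mean squared training error of the least-squares fit."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.mean((X @ w - y) ** 2)

loss_all = train_loss(X, y)         # all 4 features
loss_two = train_loss(X[:, :2], y)  # only the first 2 features

print(loss_all, loss_two)
```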

## Part 3 : Support Vector Machines
We will now play around with the support vector machine. We will first compare it to a standard logistic regression model. Then we will see how it works on datasets which are not linearly separable. 

Import some necessary packages

In [None]:
# Start by importing the packages we'll need.
import numpy as np
from sklearn import datasets
from sklearn.svm import LinearSVC
from sklearn.linear_model import LogisticRegression
from sklearn import model_selection
from sklearn import metrics
import matplotlib.pyplot as plt
%matplotlib inline

### Implement Linear SVMs
We will implement a linear SVM for the task of breast cancer classification. 

First load the dataset and split it into training and validation set. 

In [None]:
# We'll set a random seed first.
np.random.seed(19)

# First load the data.
data = datasets.load_breast_cancer()
X = data.data   # The features.
y = data.target # The targets.

# Print the dataset sizes.
print('Shape of X = ', X.shape)
print('Shape of y = ', y.shape)

# Split the data into a training and validation set.
X_train, X_val, y_train, y_val = model_selection.train_test_split(X, y, test_size = 0.33, random_state = 7)

Now train a linear SVM using scikit-learn. For comparison, also train a logistic regression model with scikit-learn.

In [None]:
# Train the linear SVM.

svm = LinearSVC(dual = False) # Uses the squared-hinge loss function when fitting the model.

svm.fit(X_train, y_train)

# NOTE : We can also try the plain hinge loss by specifying loss = 'hinge' and setting dual = True.
# The squared hinge loss (the default) penalizes large margin violations more severely and is smoother than the hinge loss.
# With dual = False the solver also does not depend on a random state.

# Now evaluate it on the validation points.
y_pred = svm.predict(X_val)

acc = metrics.accuracy_score(y_val, y_pred)
print('Linear SVM validation accuracy = {:0.1f}%'.format(100*acc))

# Compare to a simple logistic regression model.

lr = LogisticRegression(solver = 'liblinear')
lr.fit(X_train, y_train)
y_pred = lr.predict(X_val)

acc = metrics.accuracy_score(y_val, y_pred)
print('Logistic Regression validation accuracy = {:0.1f}%'.format(100*acc))

As we can see, a basic linear SVM already does very well and is comparable to logistic regression.

Let's compare logistic regression with SVM for binary classification.  Suppose that our data has $n$ features and we have trained a logistic regression model with weight vector $\theta_{\mathrm{LR}}\in \mathbb{R}^{n+1}$ as well as a linear SVM with parameters $w_{\mathrm{SVM}}\in \mathbb{R}^n$, and $b_{\mathrm{SVM}}\in \mathbb{R}$.  Assume that the optimal parameters have been found in both cases.  If the data is linearly separable then is it true that $\theta_{\mathrm{LR}} = (b_{\mathrm{SVM}}, w_{\mathrm{SVM}})$? 

If not then why?

**They will not be the same in general.**

Consider the dataset $X = (-5, -1, 1, 2)^T$ and $y = (-1, -1, 1, 1)^T$ (equivalently $y = (0,0,1,1)^T$ for logistic regression).  The SVM parameters will just be $(b_{\mathrm{SVM}}, w_{\mathrm{SVM}}) = (0,1)$.  However, the logistic regression parameters will be slightly different.  Recall the gradient of the loss function for logistic regression is
$$
\nabla_{\theta}J(\theta) = -\sum_{i=1}^4 (y_i - h(X_i;\theta))X_i
$$
where
$$
h(x;\theta) = \frac{1}{1 + \exp(-\theta^Tx)}
$$
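Plugging $\theta = (b_{\mathrm{SVM}}, w_{\mathrm{SVM}}) = (0, 1)$ and the labels $y \in \{0, 1\}$ into the component of the gradient that corresponds to the weight $w$, we get
$$
\frac{\partial J}{\partial w}\bigg|_{(b_{\mathrm{SVM}}, w_{\mathrm{SVM}})} = -\left( \frac{5}{1+e^{5}} + \frac{1}{1+e^{1}} + \frac{e^{-1}}{1+e^{-1}} +  \frac{2e^{-2}}{1 + e^{-2}} \right) \approx -0.81 \neq 0
$$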
Since the gradient is non-zero we know that $(b_{\mathrm{SVM}}, w_{\mathrm{SVM}})$ is not the optimal value for the logistic regression loss function and hence cannot be $\theta_{\mathrm{LR}}$.
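
The computation can be double-checked numerically. The sketch below evaluates the gradient of the summed logistic loss at the SVM solution $(b, w) = (0, 1)$ for the four-point dataset, using labels in $\{0, 1\}$. The function names are hypothetical helpers, not part of any library.

```python
import math

# 1D dataset from the text, with labels in {0, 1} for logistic regression.
X = [-5.0, -1.0, 1.0, 2.0]
y = [0.0, 0.0, 1.0, 1.0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_grad(b, w):
    """Gradient of the summed logistic loss with respect to (b, w)."""
    residuals = [yi - sigmoid(b + w * xi) for xi, yi in zip(X, y)]
    grad_b = -sum(residuals)
    grad_w = -sum(r * xi for r, xi in zip(residuals, X))
    return grad_b, grad_w

# Evaluate at the SVM solution (b, w) = (0, 1).
grad_b, grad_w = logistic_grad(b=0.0, w=1.0)
print(grad_b, grad_w)
```

Both components are non-zero, and the negative sign of the $w$ component says that the logistic loss would still decrease if $w$ were increased beyond 1.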

### Case of non-linearly separable dataset

If the data is linearly separable, then a linear SVM should be able to achieve 100% accuracy on it.  However, this is rarely the case in practice: even the breast cancer dataset was not exactly linearly separable.  We'll use a synthetic dataset to illustrate this.  This data is drawn from a bi-modal Gaussian mixture model.

In [None]:
def sample_bimodal_data(N):
    """
    Input:
        N : the number of data points

    Output:
        X, y : the features and targets of shapes (N,2) and (N, )
    """
    
    # The two modes and covariances.
    mu1 = np.asarray([-1, 0])
    mu2 = np.asarray([1, 0])
    
    cov1 = 2.5 * np.identity(2)
    cov2 = 1.5 * np.identity(2)
    
    N1 = N//2   # Number of points in first class.
    N2 = N - N1 # Number of points in second class.
    
    # Sample the random points.
    X1 = np.random.multivariate_normal(mu1, cov1, N1)
    X2 = np.random.multivariate_normal(mu2, cov2, N2)
    Y1 = np.zeros(N1)
    Y2 = np.ones(N2)
    
    # Combine the data.
    X = np.vstack((X1, X2))
    Y = np.concatenate((Y1, Y2), axis = None)
    
    return X,Y

In [None]:
# Plot the sample data.
N = 100
X,Y = sample_bimodal_data(N)

plt.figure(1)
plt.scatter(X[:N//2, 0], X[:N//2, 1], label = 'Class 0')
plt.scatter(X[N//2:, 0], X[N//2:, 1], label = 'Class 1') # Class 1 starts right after the first N//2 points.
plt.legend()
plt.xlabel(r'$x_1$')
plt.ylabel(r'$x_2$')
plt.title('Sample Data');

Increasing the factor in front of the covariances or shifting the centers of the two distributions closer to each other will cause the data to overlap more, making it harder to classify. Let's try that! 

#### Using slack variables and the penalty parameter C

Since the data is not perfectly linearly separable, we need slack variables, which let the SVM tolerate points that fall inside the margin or on the wrong side of the boundary.  The parameter $C$ controls how heavily those violations are penalized.  Let's start by training a single model with a fixed value of $C$.

In [None]:
# First get the data and split it into training and validation sets.

N = 1000
X, Y = sample_bimodal_data(N)

# Use a 70/30 split
X_train, X_val, Y_train, Y_val = model_selection.train_test_split(X, Y, test_size = 0.30, random_state = 981)

In [None]:
# Define the SVM model; C sets the penalty on the slack variables.
svm = LinearSVC(C = 1e-1, dual = False)
svm.fit(X_train, Y_train)
svmpred = svm.predict(X_val)
acc = metrics.accuracy_score(Y_val, svmpred)
print('SVM accuracy = {:0.1f}%'.format(100*acc))

plt.figure(2)
I = svmpred == 0
plt.scatter(X_val[I, 0], X_val[I, 1], label = 'Predicted class 0')
I = svmpred == 1
plt.scatter(X_val[I, 0], X_val[I, 1], label = 'Predicted class 1')
plt.legend()
plt.show()

**Experiment with different values of `mu1` and `mu2` and show that the SVM accuracy gets worse as the two means get closer.**
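
One possible way to set up this experiment is sketched below. To keep the sketch free of extra dependencies it does not fit an SVM; instead it uses a fixed linear rule (threshold $x_1$ at the midpoint 0) as a stand-in for the learned boundary, which is an assumption and only roughly what a linear classifier would find here. The helper `sample_two_gaussians` and the parameter `half_gap` are hypothetical names; to run the real experiment, fit `LinearSVC` on each sampled dataset instead.

```python
import numpy as np

def sample_two_gaussians(n, half_gap, rng):
    """Two isotropic Gaussians centred at (-half_gap, 0) and (+half_gap, 0)."""
    n0 = n // 2
    n1 = n - n0
    X0 = rng.multivariate_normal([-half_gap, 0.0], 2.5 * np.identity(2), n0)
    X1 = rng.multivariate_normal([half_gap, 0.0], 1.5 * np.identity(2), n1)
    X = np.vstack((X0, X1))
    y = np.concatenate((np.zeros(n0), np.ones(n1)))
    return X, y

rng = np.random.default_rng(0)
accuracies = {}
for half_gap in [2.0, 1.0, 0.25]:
    X, y = sample_two_gaussians(20000, half_gap, rng)
    # Stand-in linear rule: predict class 1 whenever x1 > 0.
    pred = (X[:, 0] > 0).astype(float)
    accuracies[half_gap] = np.mean(pred == y)

for half_gap, acc in accuracies.items():
    print('half gap = {:0.2f}  accuracy = {:0.1f}%'.format(half_gap, 100 * acc))
```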


Let's train some models with different values of $C$ and compare them using cross-validation.

In [None]:
# Define the different SVM models to use
svm_1 = LinearSVC(C = 10, dual = False)
svm_2 = LinearSVC(C = 1, dual = False)
svm_3 = LinearSVC(C = 1e-3, dual = False)
svm_4 = LinearSVC(C = 1e-7, dual = False)

split = model_selection.KFold(5)
# Get the CV scores.
cv_1 = model_selection.cross_val_score(svm_1, X_train, Y_train, cv = split)
cv_2 = model_selection.cross_val_score(svm_2, X_train, Y_train, cv = split)
cv_3 = model_selection.cross_val_score(svm_3, X_train, Y_train, cv = split)
cv_4 = model_selection.cross_val_score(svm_4, X_train, Y_train, cv = split)

# Print the average scores.
print('C = 10    CV average score = {:0.1f}%'.format(np.mean(cv_1) * 100))
print('C = 1     CV average score = {:0.1f}%'.format(np.mean(cv_2) * 100))
print('C = 1e-3  CV average score = {:0.1f}%'.format(np.mean(cv_3) * 100))
print('C = 1e-7  CV average score = {:0.1f}%'.format(np.mean(cv_4) * 100))

We see that the model performs slightly differently for different values of $C$, the weight that penalizes the slack variables $\zeta_i$ in the soft-margin objective:

$$
\min_{w,b,\zeta} \frac{1}{2}w^Tw + C\sum_{i=1}^n \zeta_i,\quad \text{ such that }\quad y_i(w^Tx_i + b) \ge 1 - \zeta_i,\quad \zeta_i \ge 0
$$
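
To see how $C$ enters, the sketch below evaluates this objective for a fixed, hand-picked $(w, b)$ on a tiny made-up 1D dataset with labels in $\{-1, +1\}$. For a given $(w, b)$ the smallest feasible slack is $\zeta_i = \max(0, 1 - y_i(w^Tx_i + b))$. The function name `soft_margin_objective` and the data are hypothetical, and no optimization is performed.

```python
import numpy as np

def soft_margin_objective(w, b, X, y, C):
    """Primal soft-margin objective; y must hold labels in {-1, +1}."""
    margins = y * (X @ w + b)
    slack = np.maximum(0.0, 1.0 - margins)  # smallest feasible zeta_i
    return 0.5 * w @ w + C * np.sum(slack), slack

# Hypothetical 1D toy data with one point (x = -0.5) inside the margin.
X = np.array([[-2.0], [-0.5], [1.0], [3.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
w = np.array([1.0])
b = 0.0

obj_small_C, slack = soft_margin_objective(w, b, X, y, C=0.1)
obj_large_C, _ = soft_margin_objective(w, b, X, y, C=10.0)

print(slack)
print(obj_small_C, obj_large_C)
```

With a small $C$ the margin violation barely changes the objective, so the optimizer can afford a wide margin; with a large $C$ the same violation dominates, pushing the solution towards fitting every training point.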

See the scikit-learn [documentation](https://scikit-learn.org/stable/modules/svm.html) for more details.  We can also plot a curve of the validation score for many different $C$ values which can be helpful for determining the optimal hyperparameter.

In [None]:
# Get the C values we want to look at.
C = 1/(2**np.arange(0, 20)) # 1, 1/2, ..., 2^-19 (about 1.9e-6)

k = 10 # Kfold CV.
cv_scores = np.zeros(len(C))
split = model_selection.KFold(k)
for i in range(len(C)):
    svm = LinearSVC(C = C[i], dual = False)
    cv_scores[i] = np.mean(model_selection.cross_val_score(svm, X_train, Y_train, cv = split))

plt.figure(2)
plt.semilogx(C, cv_scores, 'b-x')
plt.xlabel(r'$C$')
plt.ylabel(r'Score')
plt.title(r'{:d}-Fold CV Score for Linear SVM'.format(k))
plt.grid();

### Another non-linearly separable dataset

In [None]:
def gen_data1(N):
    """
    Input:
        N : the number of data points

    Output:
        X, y : the features and targets of shapes (N,2) and (N, )
    """
    N1 = N//2
    N2 = N - N1
    t = np.linspace(0, 2*np.pi, N1)
    
    X1 = np.zeros((N1, 2))
    X1[:,0] = 4*np.cos(t) + 0.1*np.random.randn(N1)
    X1[:,1] = 4*np.sin(t) + 0.1*np.random.randn(N1)
    y1 = np.zeros(N1)
    
    X2 = np.random.randn(2*N2)
    X2 = X2.reshape((N2, 2))
    y2 = np.ones(N2)

    # Combine the data.
    X = np.vstack((X1, X2))
    y = np.concatenate((y1, y2), axis = None) # axis = None means that arrays flattened before use
    
    return X,y

In [None]:
# Plot the data.
N = 1000
X, Y = gen_data1(N)

plt.figure(3)
plt.scatter(X[:N//2, 0], X[:N//2, 1], label = 'Class 0')
plt.scatter(X[N//2:, 0], X[N//2:, 1], label = 'Class 1') # Class 1 starts right after the first N//2 points.
plt.legend()
plt.xlabel(r'$x_1$')
plt.ylabel(r'$x_2$')
plt.title('Sample Data');

In [None]:
# Train a single linear SVM on the raw (x1, x2) features.
# Use a 70/30 split
X_train, X_val, Y_train, Y_val = model_selection.train_test_split(X, Y, test_size = 0.3, random_state = 981)
svm = LinearSVC(C = 1e+10, dual = False)
svm.fit(X_train, Y_train)
svmpred = svm.predict(X_val)
acc = metrics.accuracy_score(Y_val, svmpred)
print('SVM accuracy = {:0.1f}%'.format(100*acc))

plt.figure(2)
I = svmpred == 0
plt.scatter(X_val[I, 0], X_val[I, 1], label = 'Predicted class 0')
I = svmpred == 1
plt.scatter(X_val[I, 0], X_val[I, 1], label = 'Predicted class 1')
plt.legend()
plt.show()

Here the data is not linearly separable, although we can very clearly see some separation. If we transform the data to polar coordinates and look only at the radius, then we are able to linearly separate the data. We will revisit this in the next lecture when we talk about kernel SVMs, which are much more flexible models and can handle a wider array of datasets.

In [None]:
def cart2pol(x, y):
    rho = np.sqrt(x**2 + y**2)
    phi = np.arctan2(y, x)
    return (rho, phi)

pX = np.vstack(cart2pol(X[:, 0], X[:, 1])).T
print(pX.shape)
plt.figure(4)
plt.scatter(pX[:N//2, 0], pX[:N//2, 1], label = 'Class 0')
plt.scatter(pX[N//2:, 0], pX[N//2:, 1], label = 'Class 1') # Class 1 starts right after the first N//2 points.
plt.legend()
plt.xlabel(r'$p_1$ (radius)')
plt.ylabel(r'$p_2$ (angle)')
plt.title('Sample Data')
plt.show()

In [None]:
# Train the same linear SVM on the polar-coordinate features (radius, angle).
# Use a 70/30 split
X_train, X_val, Y_train, Y_val = model_selection.train_test_split(pX, Y, test_size = 0.3, random_state = 981)
svm = LinearSVC(C = 1e+10, dual = False)
svm.fit(X_train, Y_train)
svmpred = svm.predict(X_val)
acc = metrics.accuracy_score(Y_val, svmpred)
print('SVM accuracy = {:0.1f}%'.format(100*acc))