<a href="https://colab.research.google.com/github/MaralAminpour/ML-BME-Course-UofA-Fall-2023/blob/main/Week-3-Classification-models/3.2-Classification-models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Binary Classification Models

We will go through a series of examples creating different types of binary classification models for the same dataset. A binary classifier predicts one of two labels. Usually, we'll convert our labels to 0 and 1. We will use the example of heart failure data.

In [1]:
# Imports needed for this notebook
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns

# Note that if you are using Anaconda, GraphViz won't be installed by default.
# You will need to install graphviz and python-graphviz.
import graphviz

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, accuracy_score, recall_score, classification_report, roc_curve, auc
from sklearn.linear_model import Perceptron, LogisticRegression, SGDClassifier
from sklearn.svm import LinearSVC, SVC
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn.ensemble import RandomForestClassifier

First, let's load the data we'll need for all the examples.

In [2]:
# This code will download the required data files from GitHub
import requests
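def download_data(source, dest):
    base_url = 'https://raw.githubusercontent.com'
    owner = 'SirTurtle'
    repo = 'ML-BME-UofA-data'
    branch = 'main'
    url = '{}/{}/{}/{}/{}'.format(base_url, owner, repo, branch, source)
    r = requests.get(url)
    # Stop here if the download failed, rather than saving an error page as data
    r.raise_for_status()
    # The with statement closes the file even if the write fails
    with open(dest, 'wb') as f:
        f.write(r.content)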

# Create the temp directory, if it doesn't already exist
import os
os.makedirs('temp', exist_ok=True)

In [3]:
# Download the data
download_data('Week-3-Classification-models/data/heart_failure_data.csv', 'temp/heart_failure_data.csv')

# Read data file into a dataframe object
df = pd.read_csv('temp/heart_failure_data.csv')

# Print the first few lines of the dataframe
df.head()

|   | EF | GLS | HF |
|---|---|---|---|
| 0 | 50.92228 | -19.57 | 0 |
| 1 | 54.601227 | -19.0 | 0 |
| 2 | 50.0 | -21.0 | 0 |
| 3 | 50.819672 | -18.74 | 0 |
| 4 | 53.191489 | -19.78 | 0 |


## Data dictionary

**EF**: Ejection Fraction. A measurement of how much blood the left ventricle pumps out with each contraction. Expressed as a percent in the range 0 to 100.

**GLS**: Global Longitudinal Strain. A measurement of myocardial deformation along the longitudinal cardiac axis. Expressed as a negative percent in the range 0 to -100.

**HF**: Heart Failure class
- 0 = Healthy
- 1 = Heart failure

### Exploratory Data Analysis

In [4]:
# Check balance of the output variables
df.groupby(['HF'])['HF'].count()

HF
0    60
1    60
Name: HF, dtype: int64

Our example has **exactly the same number of samples in each class**. If we had an unbalanced dataset, we could balance it by

1. **removing some random samples** from the larger class (undersampling), or by

2. **duplicating some random samples from the smaller class** (oversampling).
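
The two balancing options can be sketched as follows. This is only an illustration on a made-up, unbalanced toy table (the name `toy` and its values are assumptions, not the course data), using the pandas `DataFrame.sample` method.

```python
import numpy as np
import pandas as pd

# Hypothetical unbalanced dataset: 8 healthy (HF=0) and 3 heart failure (HF=1) samples
toy = pd.DataFrame({
    'EF': np.linspace(30, 60, 11),
    'HF': [0] * 8 + [1] * 3,
})

majority = toy[toy['HF'] == 0]
minority = toy[toy['HF'] == 1]

# Option 1 (undersampling): keep a random subset of the larger class
undersampled = pd.concat([majority.sample(n=len(minority), random_state=0), minority])

# Option 2 (oversampling): draw samples from the smaller class with replacement
oversampled = pd.concat([majority, minority.sample(n=len(majority), replace=True, random_state=0)])

print(undersampled['HF'].value_counts().to_dict())
print(oversampled['HF'].value_counts().to_dict())
```

Undersampling discards data, while oversampling repeats the same samples, so which one is preferable depends on how much data is available.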

In [5]:
# Convert to numpy
heart_failure_data = df.to_numpy()

# Create feature matrix and target vector
X = heart_failure_data[:,:2]   # all rows (:) and the first two columns (:2): EF and GLS
y = heart_failure_data[:,2]    # all rows (:) and only the third column (index 2): HF


print('Feature matrix X dimensions: ', X.shape)
print('Target vector y dimensions: ', y.shape)

Feature matrix X dimensions:  (120, 2)
Target vector y dimensions:  (120,)


- `X = heart_failure_data[:,:2]` creates the feature matrix `X` by selecting all rows (`:`) and the first two columns (`:2`) of `heart_failure_data`, so `X` contains the EF and GLS columns.
- `y = heart_failure_data[:,2]` creates the target vector `y` by selecting all rows (`:`) and only the third column (index `2`) of `heart_failure_data`, so `y` contains the HF column.

## Plot data

First, let's plot the data.

In [None]:
# This function will plot the heart failure data
# We will plot the first feature (EF) on the x-axis and the second feature (GLS) on the y-axis
def PlotData(X, y):

    # Plot class 0
    plt.plot(X[y==0,0], X[y==0,1], 'bo', alpha=0.75, markeredgecolor='k', label = 'Healthy')

    # Plot class 1
    plt.plot(X[y==1,0], X[y==1,1], 'rd', alpha=0.75, markeredgecolor='k', label = 'Heart Failure')

    # Annotate the plot
    plt.title('Diagnosis of Heart Failure')
    plt.xlabel('EF')
    plt.ylabel('GLS')
    plt.legend()

# Call the function to plot the dataset
PlotData(X, y)

## Standardize Data

We'll use [StandardScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html) to standardize the features.
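
As a quick check of what standardization does, the sketch below compares `StandardScaler` with the same calculation done by hand. The five rows in `X_toy` are made-up values for illustration only, not taken from the dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up EF and GLS values for five patients (illustrative only)
X_toy = np.array([[55.0, -20.0],
                  [50.0, -19.0],
                  [35.0, -12.0],
                  [60.0, -21.0],
                  [30.0, -10.0]])

# Standardize with scikit-learn
X_scaled = StandardScaler().fit_transform(X_toy)

# Standardize by hand: subtract the column mean, divide by the column standard deviation
# (StandardScaler uses the population standard deviation, which is also NumPy's default, ddof=0)
X_manual = (X_toy - X_toy.mean(axis=0)) / X_toy.std(axis=0)

print('Same result:', np.allclose(X_scaled, X_manual))
print('Column means:', X_scaled.mean(axis=0).round(6))
print('Column standard deviations:', X_scaled.std(axis=0).round(6))
```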

In [None]:
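# Create an object to scale the features to have zero mean and unit variance
# We don't need to do this for all models, but let's do it here to be consistent
scaler = StandardScaler()

# Standardize the feature matrix containing EF and GLS
# Note: for simplicity the scaler is fit on the full dataset here; fitting it on the
# training set only would keep test-set statistics out of the model
X = scaler.fit_transform(X)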

# Plot the scaled data
PlotData(X, y)

## Creating training and test sets

We'll create training and test sets that we'll use for each example. For these examples, we'll just split up the data samples randomly, with 60% in the training set and 40% in the test set. A more common division would be (80%, 20%), but since our dataset is small, we'll put more samples in the test set. We'll use the [train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html) function.
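
To make the split sizes concrete, here is a small sketch on placeholder data with the same shape as our dataset (120 samples, 2 features). The arrays `X_demo` and `y_demo` are random stand-ins, and the `stratify` argument shown at the end is an optional extra that the notebook itself does not use.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same shape as the heart failure dataset
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 2))
y_demo = np.repeat([0, 1], 60)

# Same call as in the notebook: 60% training, 40% test
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.4, random_state=0)
print('Training set:', X_tr.shape, 'Test set:', X_te.shape)
print('Fraction of class 1 in train/test:', y_tr.mean(), y_te.mean())

# Optional: stratify keeps the class proportions identical in both subsets
X_tr_s, X_te_s, y_tr_s, y_te_s = train_test_split(
    X_demo, y_demo, test_size=0.4, random_state=0, stratify=y_demo)
print('Stratified fraction of class 1 in train/test:', y_tr_s.mean(), y_te_s.mean())
```

With a purely random split the class proportions in the two subsets can drift apart slightly, which matters more for small datasets like this one.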

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)
# Using a fixed random_state for consistency

## Perceptron

### Fit the model
This code fits the [Perceptron](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Perceptron.html) model to the training data.

- max_iter: The maximum number of passes over the training data (aka epochs)
- eta0: Constant by which the updates are multiplied

Note that the Perceptron model is equivalent to the [SGDClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html) class configured as SGDClassifier(loss="perceptron", eta0=1, learning_rate="constant", penalty=None).
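
The equivalence can be checked empirically. The sketch below fits both estimators with the same `random_state` on synthetic two-feature data (generated with `make_classification` as a stand-in for the course data) and compares the learned weights and predictions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron, SGDClassifier

# Synthetic two-feature data standing in for the heart failure dataset
X_demo, y_demo = make_classification(n_samples=120, n_features=2, n_informative=2,
                                     n_redundant=0, random_state=0)

# Perceptron with its default settings
perceptron = Perceptron(random_state=0).fit(X_demo, y_demo)

# SGDClassifier configured as described in the text
sgd = SGDClassifier(loss="perceptron", eta0=1, learning_rate="constant",
                    penalty=None, random_state=0).fit(X_demo, y_demo)

same_weights = np.allclose(perceptron.coef_, sgd.coef_)
same_predictions = np.array_equal(perceptron.predict(X_demo), sgd.predict(X_demo))
print('Same weights:', same_weights)
print('Same predictions:', same_predictions)
```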

In [None]:
# Create and fit the model
p_model = Perceptron(max_iter=100, eta0=0.2, random_state=0)
p_model.fit(X_train, y_train)

### Evaluate the model

We'll perform the same basic analysis for each model. First, we'll calculate several useful scores for the training set and the test set. Comparing the two helps us evaluate the performance of each model and assess how much overfitting we have. Then we'll show the confusion matrix for the test set. Next, we'll also generate some interesting plots for each model.

We'll use the [confusion_matrix](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html) and the [accuracy_score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html) functions. We'll use the [recall_score](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.recall_score.html) function to calculate the sensitivity and specificity.

Note that in binary classification, the recall of the positive class is also known as “sensitivity”; the recall of the negative class is “specificity”.

In our example:
- Negative: HF = 0 = healthy
- Positive: HF = 1 = heart failure
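
The definitions above can be checked on a handful of made-up labels. This sketch computes sensitivity and specificity directly from the confusion matrix and compares them with `recall_score`. The arrays `y_true` and `y_pred` are illustrative, not model output.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Made-up labels: 0 = healthy (negative), 1 = heart failure (positive)
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 1, 0, 0])

# For binary labels, confusion_matrix returns [[TN, FP], [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)   # recall of the positive class
specificity = tn / (tn + fp)   # recall of the negative class

print('Sensitivity: {:0.2f}'.format(sensitivity))
print('Specificity: {:0.2f}'.format(specificity))
print('recall_score (pos_label=1): {:0.2f}'.format(recall_score(y_true, y_pred, pos_label=1)))
print('recall_score (pos_label=0): {:0.2f}'.format(recall_score(y_true, y_pred, pos_label=0)))
```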

An alternative is to use the [classification_report](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html) function. This will report the recall, precision and F1 score for each class.

In [None]:
def EvaluateModel(model, X_train, X_test, y_train, y_test):

    # Calculate and print metrics for the training set
    y_train_pred = model.predict(X_train)
    train_accuracy = accuracy_score(y_train, y_train_pred)
    train_sensitivity = recall_score(y_train, y_train_pred, pos_label=1)
    train_specificity = recall_score(y_train, y_train_pred, pos_label=0)
    print('Training set: Accuracy = {:0.2f} Sensitivity = {:0.2f} Specificity = {:0.2f}'.format(train_accuracy, train_sensitivity, train_specificity))

    # Calculate and print metrics for the test set
    y_test_pred = model.predict(X_test)
    test_accuracy = accuracy_score(y_test, y_test_pred)
    test_sensitivity = recall_score(y_test, y_test_pred, pos_label=1)
    test_specificity = recall_score(y_test, y_test_pred, pos_label=0)
    print('Test set: Accuracy = {:0.2f} Sensitivity = {:0.2f} Specificity = {:0.2f}'.format(test_accuracy, test_sensitivity, test_specificity))

    # Display the confusion matrix for the test set
    cm = confusion_matrix(y_test, y_test_pred)
    sns.heatmap(cm, annot=True)
    plt.gcf().set_size_inches(2, 2)
    plt.show()

EvaluateModel(p_model, X_train, X_test, y_train, y_test)

We have better sensitivity and worse specificity. Our model is better at correctly predicting that patients with heart failure have heart failure, but is worse at correctly predicting that healthy patients don't have heart failure.

In [None]:
# This is an alternate function for evaluating the model results using the classification_report function
def EvaluateModelClassificationReport(model, X_train, X_test, y_train, y_test):

    # Calculate and print classification report for the training set
    y_train_pred = model.predict(X_train)
    print('Classification report for training set:')
    print(classification_report(y_train, y_train_pred))

    # Calculate and print classification report for the test set
    y_test_pred = model.predict(X_test)
    print('Classification report for test set:')
    print(classification_report(y_test, y_test_pred))

    # Display the confusion matrix for the test set
    cm = confusion_matrix(y_test, y_test_pred)
    sns.heatmap(cm, annot=True)
    plt.gcf().set_size_inches(2, 2)
    plt.show()

EvaluateModelClassificationReport(p_model, X_train, X_test, y_train, y_test)

### Plot the model

Let's also visualize the decision boundary itself. This specific visualization only works because we have two features. (In general, the decision boundary will be a hyperplane, so to use a similar visualization we would need to reduce the dimensionality of the features.) The result of the classification is plotted below. We'll also plot data points for the full dataset.

In [None]:
# Plot the decision boundary for a linear model
# This function is only for linear models with 2 features
def PlotDecisionBoundary(model, X, y):

    # Plot the data
    PlotData(X, y)

    # Next, we will plot the decision boundary using the weights of the model

    # Define y-coordinates
    # We are just using the range (min and max) of the y-axis (GLS in this case)
    x2 = np.array([X[:,1].min(), X[:,1].max()])

    # Find the weights
    # w0 is the intercept (bias)
    # w1 is the weight for the first feature
    # w2 is the weight for the second feature
    # i.e. the decision boundary is where h(x) = w0 + w1*x_1 + w2*x_2 = 0, where x_1 and x_2 are our 2 features (EF, GLS)
    w0 = model.intercept_[0]
    w1 = model.coef_[0][0]
    w2 = model.coef_[0][1]

    # Define x-coordinates
    # Notice that this equation is just a rearrangement of the function for the decision boundary
    x1 = -(w0 + w2*x2)/w1
    # Now the points (x1[0],x2[0]) and (x1[1],x2[1]) are two points on our decision boundary

    # Plot the decision boundary
    plt.plot(x1, x2, "k-")

    plt.show()

PlotDecisionBoundary(p_model, X, y)

The line is our model's decision boundary between the two classes (healthy and heart failure).

Let's show some more information instead of just the decision boundary. We can show confidence scores for predictions. In the case of the Perceptron model, the confidence score is simply proportional to the signed distance from the decision boundary. In Scikit-Learn we use the model's `decision_function` method to find the confidence scores.
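
The relationship between the weights, the decision boundary and the confidence scores can be verified numerically. The following is a minimal sketch on synthetic two-feature data (a stand-in generated with `make_classification`, not the heart failure data). It recomputes the scores from the weights, checks that points on the boundary line score zero, and converts scores to signed distances.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import Perceptron

# Synthetic two-feature data standing in for the standardized (EF, GLS) features
X_demo, y_demo = make_classification(n_samples=120, n_features=2, n_informative=2,
                                     n_redundant=0, random_state=0)
model = Perceptron(max_iter=100, eta0=0.2, random_state=0).fit(X_demo, y_demo)

w0 = model.intercept_[0]
w = model.coef_[0]

# The confidence score is the linear function h(x) = w0 + w1*x_1 + w2*x_2
manual_scores = w0 + X_demo @ w
sklearn_scores = model.decision_function(X_demo)
print('Scores match:', np.allclose(manual_scores, sklearn_scores))

# Two points on the boundary, obtained by solving h(x) = 0 for x_1
x2 = np.array([X_demo[:, 1].min(), X_demo[:, 1].max()])
x1 = -(w0 + w[1] * x2) / w[0]
boundary_scores = model.decision_function(np.c_[x1, x2])
print('Scores on the boundary:', boundary_scores.round(10))

# Dividing the score by the length of the weight vector gives the signed distance
signed_distance = sklearn_scores / np.linalg.norm(w)
print('Largest distance from the boundary:', np.abs(signed_distance).max())
```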

In [None]:
# Plot the confidence scores for a linear model
# This function is only for linear models with 2 features
def PlotConfidenceScores(model, X, y, label=1):

    # Create 1D arrays of points for each feature (we have 2 features in this example)
    x1 = np.linspace(X[:,0].min(), X[:,0].max(), 1000)
    x2 = np.linspace(X[:,1].min(), X[:,1].max(), 1000)

    # Create 2D arrays that hold the coordinates in 2D feature space
    x1, x2 = np.meshgrid(x1, x2)

    # Flatten x1 and x2 to 1D vectors and concatenate into a feature matrix
    Xp = np.c_[x1.ravel(), x2.ravel()]

    # Predict confidence scores for the whole feature space
    df = model.decision_function(Xp)

    # Select the class
    # Note that the confidence scores are >0 for class 1 (positive class)
    # If we want to show the confidence scores for class 0 (negative class) we multiply by -1
    if label == 0:
        df *= -1

    # Reshape to 2D
    df = df.reshape(x1.shape)

    # Plot using contourf to generate colored regions
    plt.contourf(x1, x2, df, cmap = 'summer')

    # Add a colorbar
    plt.colorbar()

    # Also, plot the line where the confidence score == 0, i.e. the decision boundary
    plt.contour(x1, x2, df, levels=[0], colors='k')

    # Also plot the data
    PlotData(X, y)

    plt.show()

PlotConfidenceScores(p_model, X, y, label=1)

## Logistic Regression

The next model we'll try is the [LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html) classifier.

In [None]:
# Here's the code for the logistic regression classifier

# Create and fit the model
logreg_model = LogisticRegression(random_state=0)
logreg_model.fit(X_train, y_train)
logreg_pred = logreg_model.predict(X_test)

We'll use the same code to evaluate the model.

In [None]:
EvaluateModel(logreg_model, X_train, X_test, y_train, y_test)

In [None]:
# Plot the decision boundary
PlotDecisionBoundary(logreg_model, X, y)

In [None]:
# Plot the confidence scores
PlotConfidenceScores(logreg_model, X, y, label=1)

An advantage of the LogisticRegression model compared with the Perceptron is that we can estimate probabilities for each class. This is more useful than just using the distance from the decision boundary as a confidence score. We'll plot the probabilities for each class using the [predict_proba](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression.predict_proba) method of the model.
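
For a binary logistic regression model, the probability of the positive class is the sigmoid of the confidence score. The sketch below illustrates this on synthetic data (a stand-in for the course data). The names `X_demo`, `y_demo` and `proba_manual` are chosen here for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic two-feature data standing in for the heart failure dataset
X_demo, y_demo = make_classification(n_samples=120, n_features=2, n_informative=2,
                                     n_redundant=0, random_state=0)
model = LogisticRegression(random_state=0).fit(X_demo, y_demo)

# Confidence scores (signed, unbounded)
scores = model.decision_function(X_demo)

# Sigmoid maps each score to a probability between 0 and 1
proba_manual = 1.0 / (1.0 + np.exp(-scores))

# Column 1 of predict_proba holds the probability of the positive class
proba_sklearn = model.predict_proba(X_demo)[:, 1]

print('Probabilities match:', np.allclose(proba_manual, proba_sklearn))
print('Score 0 corresponds to probability:', 1.0 / (1.0 + np.exp(0.0)))
```

A score of zero maps to a probability of 0.5, which is why the 0.5 probability contour coincides with the decision boundary.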

In [None]:
# Plot the predicted probabilities over the feature space
# This function works for any model with 2 features that provides predict_proba
def PlotProbabilities(model, X, y, label=1):

    # Create 1D arrays of points for each feature (we have 2 features in this example)
    x1 = np.linspace(X[:,0].min(), X[:,0].max(), 1000)
    x2 = np.linspace(X[:,1].min(), X[:,1].max(), 1000)

    # Create 2D arrays that hold the coordinates in 2D feature space
    x1, x2 = np.meshgrid(x1, x2)

    # Flatten x1 and x2 to 1D vectors and concatenate into a feature matrix
    Xp = np.c_[x1.ravel(), x2.ravel()]

    # Predict probabilities for the whole feature space
    # Note the predict_proba function! This gives us the probabilities
    proba = model.predict_proba(Xp)

    # Select the class
    # Note this is different from how we selected the class with decision_function
    p = proba[:, label]

    # Reshape to 2D
    p = p.reshape(x1.shape)

    # Plot using contourf
    plt.contourf(x1, x2, p, cmap = 'summer')

    # Add colorbar
    plt.colorbar()

    # Also, plot the line where the probability == 0.5
    plt.contour(x1, x2, p, levels=[0.5], colors='k')

    # Also plot the data
    PlotData(X, y)

PlotProbabilities(logreg_model, X, y)

## Support Vector Classifier


Now we'll explore the Support Vector Classifier (SVC). Linear binary SVC is very similar to the perceptron and logistic regression in the sense that it finds the optimal hyperplane to separate two classes. These methods, however, have different objectives through which they decide what is the optimal decision boundary.

There are three different SVC classifiers in the `sklearn` library:
1. [LinearSVC](https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html) implements a linear classifier optimised for performance, but does not support the kernel trick.
2. [SVC](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html) implements SVC with the kernel trick. Setting `kernel='linear'` produces a linear classifier similar to LinearSVC (the two differ in their default loss and in how they treat the intercept, so the results are not necessarily identical), but is less efficient in terms of computational time. Setting `kernel='rbf'` produces a non-linear classifier with a Gaussian kernel.
3. [SGDClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html) implements various linear classifiers that are optimised using stochastic gradient descent. Its default loss function is `loss='hinge'`, which gives another implementation of a linear SVC.

In practice, the SVC model may take a long time to run for very large datasets, so the LinearSVC or SGDClassifier may be a better choice. On the other hand, the SVC model supports the kernel trick and makes it easy to obtain and visualize the support vectors.
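
As a rough comparison of the three implementations, the sketch below fits each of them on the same synthetic, well-separated data (two Gaussian blobs from `make_blobs`, an assumption made for illustration) and prints the training accuracy and the direction of the learned weight vector.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC, SVC
from sklearn.linear_model import SGDClassifier

# Two well-separated synthetic clusters, standardized like the notebook data
X_demo, y_demo = make_blobs(n_samples=200, centers=[[-2, -2], [2, 2]],
                            cluster_std=1.0, random_state=0)
X_demo = StandardScaler().fit_transform(X_demo)

models = {
    "LinearSVC": LinearSVC(dual=False, random_state=0),
    "SVC(kernel='linear')": SVC(kernel='linear'),
    "SGDClassifier(loss='hinge')": SGDClassifier(loss='hinge', random_state=0),
}

accuracies = {}
for name, model in models.items():
    model.fit(X_demo, y_demo)
    accuracies[name] = model.score(X_demo, y_demo)

    # Normalize the weight vector so that only its direction is compared
    direction = model.coef_[0] / np.linalg.norm(model.coef_[0])
    print('{}: accuracy = {:0.2f}, weight direction = {}'.format(
        name, accuracies[name], direction.round(2)))
```

The three models usually find similar, but not identical, boundaries because they optimize slightly different objectives with different solvers.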

In [None]:
# First, a LinearSVC model
# dual='auto' will select dual=False unless n_samples<n_features and other conditions are met
# With dual=False the optimization variable has n_features dimensions (with dual=True it has n_samples)
# Note that when dual=False, random_state has no effect since the algorithm is not random
linearsvc_model = LinearSVC(dual='auto', random_state=0)
linearsvc_model.fit(X_train, y_train)
linearsvc_pred = linearsvc_model.predict(X_test)

# Results
EvaluateModel(linearsvc_model, X_train, X_test, y_train, y_test)

In [None]:
# This function is for plotting the different SVC models
# It is similar to PlotProbabilities, but can also plot the support vectors
# If plotSV=True, plot the support vectors with circles
#    Note that LinearSVC does not provide the support vectors, so we can't easily plot them
# If plotDF=True, plot the decision boundary and margins as contour lines of the confidence scores
# If plotProba=True, plot the probabilities using a contour plot
#    Note that LinearSVC does not provide probabilities
def PlotSVC(model, X, y, label=1, plotSV=False, plotDF=False, plotProba=False):

    # Create 1D arrays of points for each feature (we have 2 features in this example)
    x1 = np.linspace(-2.5, 2, 1000)
    x2 = np.linspace(-3, 3.5, 1000)

    # Create 2D arrays that hold the coordinates in 2D feature space
    x1, x2 = np.meshgrid(x1, x2)

    # Flatten x1 and x2 to 1D vectors and concatenate into a feature matrix
    Xp = np.c_[x1.ravel(), x2.ravel()]

    # Plot decision function
    if plotDF:
        # Predict confidence scores for the whole feature space
        df = model.decision_function(Xp)

        # Reshape to 2D
        df = df.reshape(x1.shape)
        if label == 0:
            df *= -1

        # Zero contour is decision boundary, isolines +-1 are the margins
        contour = plt.contour(x1, x2, df, levels=[-1,0,1], colors='k', linestyles=('dashed', 'solid', 'dashed'))
        plt.clabel(contour, inline=1, fontsize=14)

    # Plot probabilities
    if plotProba:

        # Predict probabilities for the whole feature space
        proba = model.predict_proba(Xp)

        # Select the class
        p = proba[:, label]

        # Reshape to 2D
        p = p.reshape(x1.shape)

        # Plot using contourf
        plt.contourf(x1, x2, p, cmap = 'summer')

        # Add colorbar
        plt.colorbar()

        # Also, plot the line where the probability == 0.5
        plt.contour(x1, x2, p, levels=[0.5], colors='k')

    # Plot support vectors
    if plotSV:
        svs = model.support_vectors_
        plt.scatter(svs[:, 0], svs[:, 1], s=180, facecolors='pink', label = 'Support vectors', edgecolor='k')

    # plot data
    PlotData(X,y)

In [None]:
# Plot boundary
PlotSVC(linearsvc_model, X, y, plotSV=False, plotDF=True, plotProba=False)

## Support Vector Classifier with a Linear Kernel

Next we'll try the [SVC](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html) model.

In [None]:
# A linear SVC using the SVC class (instead of LinearSVC)
# probability=True will allow us to use predict_proba on the fitted model
svc_model = SVC(kernel='linear', probability=True, random_state=0)
svc_model.fit(X_train, y_train)
svc_pred = svc_model.predict(X_test)

# Results
EvaluateModel(svc_model, X_train, X_test, y_train, y_test)

In [None]:
# Plot decision boundary, margins and support vectors
PlotSVC(svc_model, X, y, plotSV=True, plotDF=True, plotProba=False)

## Support Vector Classifier with Kernel Trick

The kernel trick will allow us to have a nonlinear boundary between the classes, instead of a simple line (or hyperplane).
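
To see how the kernel enters the model, the sketch below recomputes the confidence scores of an RBF-kernel SVC by hand from its support vectors. It runs on synthetic data from `make_moons` and fixes `gamma=0.5` so that the kernel can be reproduced exactly. Both choices are assumptions for illustration (the notebook uses the default `gamma`).

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

# Synthetic data that is not linearly separable
X_demo, y_demo = make_moons(n_samples=120, noise=0.2, random_state=0)

gamma = 0.5   # fixed here so the kernel can be recomputed by hand
model = SVC(kernel='rbf', gamma=gamma).fit(X_demo, y_demo)

# Gaussian (RBF) kernel between every sample and every support vector:
# K(x, s) = exp(-gamma * ||x - s||^2), shape (n_samples, n_support_vectors)
K = rbf_kernel(X_demo, model.support_vectors_, gamma=gamma)

# The score is a weighted sum of kernel values plus the intercept
manual_scores = K @ model.dual_coef_[0] + model.intercept_[0]

print('Number of support vectors:', len(model.support_vectors_))
print('Scores match:', np.allclose(manual_scores, model.decision_function(X_demo)))
```

The score is linear in the kernel values but not in the original features, which is what produces the curved boundary.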

In [None]:
# Create SVC model using the kernel trick
kernelsvc_model = SVC(kernel='rbf', probability=True, random_state=0)

# Fit the model
kernelsvc_model.fit(X_train, y_train)
EvaluateModel(kernelsvc_model, X_train, X_test, y_train, y_test)

In [None]:
# Plot decision boundary, margins and support vectors
PlotSVC(kernelsvc_model, X, y, plotSV=True, plotDF=True, plotProba=False)

## Decision Tree

Now let's check out the [DecisionTreeClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html).

In [None]:
# Create and fit a DecisionTreeClassifier model
# max_depth is the maximum depth of the tree
tree_model = DecisionTreeClassifier(max_depth=2, random_state=0)
tree_model.fit(X_train, y_train)
EvaluateModel(tree_model, X_train, X_test, y_train, y_test)

In [None]:
# Plot the DecisionTreeClassifier model
PlotProbabilities(tree_model, X, y)

What happens if we increase max_depth?
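
One way to explore this question is to fit trees of increasing depth and compare training and test accuracy. The sketch below does this on synthetic, slightly noisy data from `make_classification` (a stand-in for the course data, so the numbers will differ from the notebook's).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-feature data with some label noise (flip_y)
X_demo, y_demo = make_classification(n_samples=120, n_features=2, n_informative=2,
                                     n_redundant=0, flip_y=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.4, random_state=0)

# max_depth=None lets the tree grow until every leaf is pure
results = {}
for depth in [1, 2, 4, 8, None]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    results[depth] = (tree.score(X_tr, y_tr), tree.score(X_te, y_te))
    print('max_depth = {}: train accuracy = {:0.2f}, test accuracy = {:0.2f}'.format(
        depth, results[depth][0], results[depth][1]))
```

A fully grown tree fits the training set perfectly here, while the test accuracy does not necessarily keep improving. A widening gap between the two is the usual sign of overfitting.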

It can be interesting to visualize the decision tree itself. We'll use the GraphViz library to visualize the tree.

Note that if you are using Anaconda, GraphViz won't be installed by default. You will need to install graphviz and python-graphviz.

In [None]:
dot_data = export_graphviz(tree_model,
                           feature_names=['EF', 'GLS'],
                           class_names=['Healthy', 'Heart Failure'],
                           filled=True, rounded=True,
                           special_characters=True,
                           out_file=None)
graph = graphviz.Source(dot_data)
graph

The Gini impurity (`gini`) shown in each node measures how "pure" the data is, with lower values indicating the data at a particular node in the tree is closer to being all one class. With two classes it ranges between 0 and 0.5: gini = 0 means all samples at the node belong to one class, and gini = 0.5 means the data is 50% in each class. Note that the tree includes one leaf node where gini is still 0.5, although it only contains 4 samples.
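
The Gini impurity of a node is one minus the sum of the squared class proportions. As a small sketch, the helper `gini_impurity` below (a name introduced here for illustration, not a scikit-learn function) computes it for a few made-up sets of labels.

```python
import numpy as np

def gini_impurity(labels):
    """Gini impurity of a set of class labels: 1 - sum_k p_k^2."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini_impurity([0, 0, 0, 0]))   # pure node → 0.0
print(gini_impurity([0, 0, 1, 1]))   # 50% in each class → 0.5
print(gini_impurity([0, 0, 0, 1]))   # 75% / 25% → 0.375
```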

## Random Forest

Finally, let's see if we can improve on the DecisionTree with an ensemble of decision trees using the [RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html).

In [None]:
forest_model = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
forest_model.fit(X_train, y_train)
EvaluateModel(forest_model, X_train, X_test, y_train, y_test)

In [None]:
PlotProbabilities(forest_model, X, y)

Notice that the decision boundary is a little closer to the lines obtained with the linear methods.
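
One reason the forest's boundary is smoother than a single tree's is that its predicted probability is the average over all of its trees. The sketch below checks this on synthetic data (generated with `make_classification` as a stand-in for the course data).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic two-feature data standing in for the heart failure dataset
X_demo, y_demo = make_classification(n_samples=120, n_features=2, n_informative=2,
                                     n_redundant=0, random_state=0)
forest = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
forest.fit(X_demo, y_demo)

# Probabilities predicted by each individual tree, shape (n_trees, n_samples, n_classes)
tree_probas = np.array([tree.predict_proba(X_demo) for tree in forest.estimators_])

# Averaging over the trees reproduces the forest's own probabilities
mean_proba = tree_probas.mean(axis=0)
print('Average of trees matches forest:', np.allclose(mean_proba, forest.predict_proba(X_demo)))

# A single shallow tree can only output a few distinct probabilities; the average has many more
print('Distinct probabilities, first tree:', len(np.unique(tree_probas[0][:, 1])))
print('Distinct probabilities, forest:', len(np.unique(mean_proba[:, 1].round(6))))
```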

## Comparing all the models

Finally, we'll compare all the models using the [roc_curve](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_curve.html) and the ROC-AUC for the test set. We'll use the probabilities where available, otherwise the confidence scores.
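
As a small illustration of why either probabilities or confidence scores can be passed to `roc_curve`, the sketch below computes the AUC for four made-up samples and then again after rescaling the scores. Only the ranking of the scores matters, so the AUC is unchanged. The arrays `y_true` and `scores` are illustrative values.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc, roc_auc_score

# Made-up labels and scores for four samples
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# ROC curve and the area under it
fpr, tpr, thresholds = roc_curve(y_true, scores)
roc_auc = auc(fpr, tpr)
print('AUC from roc_curve + auc: {:0.2f}'.format(roc_auc))
print('AUC from roc_auc_score:   {:0.2f}'.format(roc_auc_score(y_true, scores)))

# Rescaling the scores (e.g. probabilities vs. unbounded confidence scores)
# keeps their order, so the ROC curve and the AUC stay the same
rescaled = 10 * scores - 3
fpr_r, tpr_r, _ = roc_curve(y_true, rescaled)
roc_auc_rescaled = auc(fpr_r, tpr_r)
print('AUC after rescaling:      {:0.2f}'.format(roc_auc_rescaled))
```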

In [None]:
# Plot the ROC curve for one model, with its AUC in the legend
# scores: probabilities of the positive class or, where unavailable, confidence scores
def PlotROC(label, scores):
    fpr, tpr, threshold = roc_curve(y_test, scores)
    roc_auc = auc(fpr, tpr)
    plt.plot(fpr, tpr, label=label+' (area = %0.2f)' % roc_auc, linewidth=1)

# Perceptron and LinearSVC have no predict_proba, so we use their confidence scores
PlotROC('Perceptron', p_model.decision_function(X_test))
PlotROC('Logistic Regression', logreg_model.predict_proba(X_test)[:,1])
PlotROC('Linear SVC', linearsvc_model.decision_function(X_test))
PlotROC('SVC', svc_model.predict_proba(X_test)[:,1])
PlotROC('Kernel SVC', kernelsvc_model.predict_proba(X_test)[:,1])
PlotROC('Decision Tree', tree_model.predict_proba(X_test)[:,1])
PlotROC('Random Forest', forest_model.predict_proba(X_test)[:,1])

# Diagonal reference line: the expected ROC curve of a random classifier
plt.plot([0, 1], [0, 1], 'k--', linewidth=2)
plt.xlim([-0.05, 1.0])
plt.ylim([-0.05, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc="lower right")
plt.title("ROC curves", fontsize=17)

plt.tight_layout()
plt.show()