## ACE_Uganda Kaggle competition 1
### by Ibra Lujumba

#### Exploratory Data Analysis 

This is done to understand the properties of the data before any machine learning algorithm is used to make predictions about it.

##### Importing Python modules for data analysis and visualization

In [None]:
import numpy as np # manipulation of arrays
import pandas as pd # manipulating dataframes
import matplotlib.pyplot as plt # data visualisation
import seaborn as sb # data visualisation, built on top of matplotlib

In [None]:
#ignoring warnings that may arise
import warnings
warnings.filterwarnings('ignore')

##### Importing the datasets

In [None]:
!ls ../input/ace-class-assignment/

# reading in the data
data = pd.read_csv('../input/ace-class-assignment/AMP_TrainSet.csv')
new = pd.read_csv('../input/ace-class-assignment/Test.csv')

##### Checking the dimensions of the data as well as the datatype of each column

In [None]:
# checking dimensions of the datasets
data.shape, new.shape

In [None]:
# checking the datatypes of the variables
data.dtypes, new.dtypes

All the values in all the variables exist as either floats or integers.

Proceeding to work with the training dataset to build the classifier

In [None]:
# getting the descriptive statistics of the train dataset such as arithmetic mean, 
# standard deviation, quartiles and number of non-NA values in each column 
data.describe()

In [None]:
# checking the proportions of the classes
data.groupby('CLASS').size()

In [None]:
# obtaining pairwise correlation values for the variables in the train dataset
# use this resource to understand the output https://realpython.com/numpy-scipy-pandas-correlation-python/#pearson-correlation-coefficient
pearsoncorr = data.corr(method='pearson')

# visualizing the correlation matrix as a heatmap to make interpretation easier
plt.figure(figsize=(10,10))
top_corr = pearsoncorr.index
sb.heatmap(pearsoncorr, 
            xticklabels=pearsoncorr.columns,
            yticklabels=pearsoncorr.columns,
            cmap='RdYlGn',
            annot=True,
            linewidth=0.5)

Looking at the last row of the heatmap, FULL_Charge and AS_MeanAmphiMoment have the highest positive correlations with CLASS, whereas the second, third and fourth variables have the most negative correlations.

You can get the p-values associated with the correlation values using the code below.

`from scipy.stats import pearsonr`

`data.corr(method=lambda x, y: pearsonr(x, y)[1]) - np.eye(len(data.columns))`

In this example, all the p-values were too low to be informative.

In [None]:
# using a scatter plot matrix to visualise correlations
# plt.figure(figsize=(60,60))
# sb.pairplot(data)

Some predictor variables are strongly correlated with each other, which raises the problem of multicollinearity (predictors that are correlated with one another, not only with the response variable).
These variables are FULL_Charge, FULL_AcidicMolPer, FULL_AURR980107,...
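
One common way to put a number on multicollinearity is the variance inflation factor (VIF). The sketch below is not part of the original analysis: it uses synthetic data with hypothetical column names and relies on the identity that the VIFs are the diagonal entries of the inverse correlation matrix. Values far above 1 suggest that a predictor is largely explained by the others.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the predictors (hypothetical column names)
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
demo = pd.DataFrame({
    "a": a,
    "b": b,
    "a_plus_noise": a + 0.1 * rng.normal(size=500),  # nearly a copy of "a"
})

# VIF_j is the j-th diagonal entry of the inverse of the correlation matrix
corr = demo.corr(method="pearson").to_numpy()
vif = pd.Series(np.diag(np.linalg.inv(corr)), index=demo.columns)
print(vif.round(1))
```

On the real data the same two lines could be applied to the predictor columns of `data`, assuming none of them is an exact linear combination of the others.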

Variables that require further investigation - NT_EFC195, AS_MeanAmphiMoment

In [None]:
len(data['AS_MeanAmphiMoment'].unique()), data['NT_EFC195'].unique()

#this confirms that NT_EFC195 is a categorical variable

In [None]:
data[['CLASS','NT_EFC195']].head() #NT_EFC195 assumes both values irrespective of class


In [None]:
# getting the associated p-values. The value of 1 at the bottom should be ignored 
from scipy.stats import pearsonr
data.corr(method=lambda x, y: pearsonr(x, y)[1])['CLASS']

In [None]:
# checking the distribution and skewness of variables
plt.figure(figsize=(10,6))
data.skew().plot(kind='bar')

Most of the variables are minimally skewed except NT_EFC195. Further checks will be done to try to understand the properties of this variable.


In [None]:
data.groupby('NT_EFC195').size() # counts per value; most instances have NT_EFC195 equal to 0

The skewness of this variable is explained by most of its values being zero.


In [None]:
data.plot(kind='density', subplots=True, layout=(4,3), figsize=(10,10))

Values for AS_FUK010112, CT_RACS820104,FULL_GEOR030101 and FULL_AURR980107 lie close to zero compared to the rest of the variables.

Transformation possibilities:
* using the minimum and maximum scaler
* standardisation
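
The two options differ only in the formula applied to each column. As a minimal sketch on a toy feature (the array `x` is made up for illustration), both can be written by hand and compared with the scikit-learn transformers:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

x = np.array([[1.0], [2.0], [4.0], [9.0]])  # one toy feature, four samples

# Min-max scaling: (x - min) / (max - min), so the result lies in [0, 1]
minmax_manual = (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))

# Standardisation: (x - mean) / std, so the result has mean 0 and unit variance
standard_manual = (x - x.mean(axis=0)) / x.std(axis=0)

# Both should agree with the scikit-learn transformers
print(np.allclose(minmax_manual, MinMaxScaler().fit_transform(x)))
print(np.allclose(standard_manual, StandardScaler().fit_transform(x)))
```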

#### Data transformation

Algorithms may perform better if the data is transformed.
Some algorithms may treat features with large values as the most important features when making predictions.

Separating the predictor variables from the target variable

In [None]:
# converting Pandas dataframe to ndArray
dataArray = data.to_numpy()

# separating the predictor and response variables (CLASS is the last column)
target = dataArray[:,11]
predictors = dataArray[:,0:11]

In [None]:
# using minMaxScaler to set all values between 0 and 1
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0,1))
rescaledPredictors = scaler.fit_transform(predictors)
                        

# using StandardScaler
from sklearn.preprocessing import StandardScaler
scaler1 = StandardScaler().fit(predictors)
standardizedPredictors = scaler1.transform(predictors)
                        


#### Feature selection

In [None]:
# using the univariate F-test as an alternative to the chi-squared test (chi2 requires non-negative feature values and returns an error otherwise)

# untransformed data
from sklearn.feature_selection import SelectKBest, f_classif
bestFeatures = SelectKBest(score_func=f_classif, k=7)
fit = bestFeatures.fit(predictors, target)

scores = pd.DataFrame(fit.scores_) 
pvalues = pd.DataFrame(fit.pvalues_)
columns = pd.DataFrame(data.columns[0:11])

featureValues = pd.concat([columns,scores, pvalues,], axis=1) # concatenating dataframes
featureValues.columns = ['predictor', 'score', 'pvalue'] # naming the columns

print(featureValues.nlargest(7, 'score'))

In [None]:
# checking the transformed data

# rescaledPredictors
reBestFeatures = SelectKBest(score_func=f_classif, k=7)
reFit = reBestFeatures.fit(rescaledPredictors, target)

reScores = pd.DataFrame(reFit.scores_) 
rePvalues = pd.DataFrame(reFit.pvalues_)
reColumns = pd.DataFrame(data.columns[0:11])

reFeatureValues = pd.concat([reColumns,reScores, rePvalues,], axis=1) # concatenating dataframes
reFeatureValues.columns = ['re_predictor', 're_score', 're_pvalue'] # naming the columns



# standardizedPredictors
stBestFeatures = SelectKBest(score_func=f_classif, k=7)
stFit = stBestFeatures.fit(standardizedPredictors, target)

stScores = pd.DataFrame(stFit.scores_) 
stPvalues = pd.DataFrame(stFit.pvalues_)
stColumns = pd.DataFrame(data.columns[0:11])

stFeatureValues = pd.concat([stColumns,stScores, stPvalues,], axis=1) # concatenating dataframes
stFeatureValues.columns = ['st_predictor', 'st_score', 'st_pvalue'] # naming the columns

print(reFeatureValues.nlargest(7, 're_score')), print(stFeatureValues.nlargest(7, 'st_score'))
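
The ANOVA F-statistic of a feature does not change when that feature is shifted or multiplied by a positive constant, so the raw, rescaled and standardised predictors are expected to give the same ranking. A small sketch on synthetic data (generated with `make_classification`, not the competition data) illustrates this:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Synthetic two-class data used only for illustration
X_demo, y_demo = make_classification(n_samples=300, n_features=5,
                                     n_informative=3, random_state=0)

f_raw, _ = f_classif(X_demo, y_demo)
f_minmax, _ = f_classif(MinMaxScaler().fit_transform(X_demo), y_demo)
f_std, _ = f_classif(StandardScaler().fit_transform(X_demo), y_demo)

# The F-scores agree up to floating-point error
print(np.allclose(f_raw, f_minmax), np.allclose(f_raw, f_std))
```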

In [None]:
# using feature importance
from sklearn.ensemble import ExtraTreesClassifier
model = ExtraTreesClassifier()
model.fit(predictors, target)
print(model.feature_importances_)

# visualising feature importance
importances = pd.Series(model.feature_importances_, index=data.columns[0:11])
importances.nlargest(10).plot(kind='barh')
plt.show()

#### Building the classification model

In [None]:
# Splitting the data into training and test sets and using logistic regression to classify instances
from sklearn.model_selection import train_test_split # random split
from sklearn.linear_model import LogisticRegression # scikit-learn models are implemented as classes
p_train, p_test, t_train, t_test = train_test_split(predictors, target, 
                                                    test_size=0.30,random_state=42)

logit = LogisticRegression() # making instance of model

# fitting the model on untransformed data
logit.fit(p_train, t_train)

#### Measuring model performance

We can measure the performance of a classification model using precision, recall, the F1 score and the ROC curve.
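
These metrics all derive from the four counts in the confusion matrix. A minimal sketch with made-up labels (`y_true` and `y_pred` are illustrative, not model output) shows the formulas next to the scikit-learn helpers:

```python
import numpy as np
from sklearn import metrics

y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])  # made-up true labels
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])  # made-up predictions

# For binary labels scikit-learn lays the matrix out as [[tn, fp], [fn, tp]]
tn, fp, fn, tp = metrics.confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)          # how many predicted positives are correct
recall = tp / (tp + fn)             # how many actual positives are found
f1 = 2 * precision * recall / (precision + recall)
mcc = (tp * tn - fp * fn) / np.sqrt(
    float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))

print(precision, metrics.precision_score(y_true, y_pred))
print(recall, metrics.recall_score(y_true, y_pred))
print(f1, metrics.f1_score(y_true, y_pred))
print(mcc, metrics.matthews_corrcoef(y_true, y_pred))
```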


In [None]:
# predict on test data
predictions = logit.predict(p_test) 

In [None]:
# the confusion matrix
from sklearn import metrics
cm = metrics.confusion_matrix(t_test, predictions)
cm
sb.heatmap(cm, annot=True, fmt='.3f', linewidths=.5,
          square=True, cmap='Blues') 
plt.ylabel('Actual label'); plt.xlabel('Predicted label')


In [None]:
# performance metrics
print("Accuracy: ",metrics.accuracy_score(t_test, predictions)*100)
print("Precision: ",metrics.precision_score(t_test, predictions)*100)
print("Recall: ",metrics.recall_score(t_test, predictions)*100)

from sklearn.metrics import matthews_corrcoef
# MCC takes into account true and false positives and negatives; higher values are better,
# and it stays informative when the classes are unbalanced
print('MCC: ',matthews_corrcoef(t_test, predictions))




In [None]:
# ROC curve of true positive rate against false positive rate
# shows the tradeoff between sensitivity and specificity

pred_probs = logit.predict_proba(p_test)[:, 1] # all rows, second column: predicted probability of class 1
fpr, tpr,_ = metrics.roc_curve(t_test, pred_probs)
auc = metrics.roc_auc_score(t_test, pred_probs)
plt.plot(fpr, tpr, label = 'Untransformed+all Var, auc='+ str(auc))
plt.legend(loc=4)
plt.ylabel('tpr'), plt.xlabel('fpr')
plt.show()

In [None]:
# performance on new data
new_pred = logit.predict(new.values)

pred_df = pd.DataFrame(new_pred) 
pred_df.columns=["CLASS"]
pred_df.index.name="Index" 
pred_df["CLASS"] = pred_df["CLASS"].map({0:'False',1.0:'True'})

#csv file output
pred_df.to_csv("ilujumba.csv") 
print(pred_df['CLASS'].unique())

#printing the numbers of False and True
print(pred_df.groupby('CLASS').size()[0].sum())
print(pred_df.groupby('CLASS').size()[1].sum())

#### Logistic regression on rescaled data

In [None]:
p1_train, p1_test, t1_train, t1_test = train_test_split(rescaledPredictors, target, 
                                                        test_size=0.30,random_state=42)

logit1 = LogisticRegression() # making instance of model

# fitting the model on rescaled data
logit1.fit(p1_train, t1_train)

# predict on test data
predictions1 = logit1.predict(p1_test)

# performance metrics
print("Accuracy: ",metrics.accuracy_score(t1_test, predictions1)*100)
print("Precision: ",metrics.precision_score(t1_test, predictions1)*100)
print("Recall: ",metrics.recall_score(t1_test, predictions1)*100)

from sklearn.metrics import matthews_corrcoef
print('MCC: ',matthews_corrcoef(t1_test, predictions1))

# rescaling new data with the scaler already fitted on the training predictors
newArray = new.to_numpy()
rescaledNew = scaler.transform(newArray)
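
Scaling parameters are best learned from the training data only and then reused unchanged on any new data. One way to make this hard to get wrong is a scikit-learn `Pipeline`. The sketch below runs on synthetic data, and `X_new` is a hypothetical stand-in for the unlabelled test file:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic data with 11 predictors, used only for illustration
X_demo, y_demo = make_classification(n_samples=400, n_features=11, random_state=42)
X_fit, X_new, y_fit, y_new = train_test_split(X_demo, y_demo,
                                              test_size=0.30, random_state=42)

# fit() learns the min/max on X_fit only; predict() reuses them on X_new
pipe = make_pipeline(MinMaxScaler(), LogisticRegression(max_iter=1000))
pipe.fit(X_fit, y_fit)

new_predictions = pipe.predict(X_new)
print(pipe.score(X_new, y_new))
```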


In [None]:
# performance on new data (rescaled)
new_pred1 = logit1.predict(rescaledNew)

pred_df1 = pd.DataFrame(new_pred1) 
pred_df1.columns=["CLASS"]
pred_df1.index.name="Index" 
pred_df1["CLASS"] = pred_df1["CLASS"].map({0:'False',1.0:'True'})

#csv file output
pred_df1.to_csv("ilujumba1.csv") 
print(pred_df1['CLASS'].unique())

#printing the numbers of False and True
print(pred_df1.groupby('CLASS').size()[0].sum())
print(pred_df1.groupby('CLASS').size()[1].sum())

#### Logistic regression on standardized data

In [None]:
p2_train, p2_test, t2_train, t2_test = train_test_split(standardizedPredictors, target, 
                                                        test_size=0.30,random_state=42)

logit2 = LogisticRegression() # making instance of model

# fitting the model on standardized data
logit2.fit(p2_train, t2_train)

# predict on test data
predictions2 = logit2.predict(p2_test)

# performance metrics
print("Accuracy: ",metrics.accuracy_score(t2_test, predictions2)*100)
print("Precision: ",metrics.precision_score(t2_test, predictions2)*100)
print("Recall: ",metrics.recall_score(t2_test, predictions2)*100)
print('MCC: ',matthews_corrcoef(t2_test, predictions2))

# standardizing new data
standardizedNew = scaler1.transform(newArray)

# performance on new data (standardized)
new_pred2 = logit2.predict(standardizedNew)

pred_df2 = pd.DataFrame(new_pred2) 
pred_df2.columns=["CLASS"]
pred_df2.index.name="Index" 
pred_df2["CLASS"] = pred_df2["CLASS"].map({0:'False',1.0:'True'})

#csv file output
pred_df2.to_csv("ilujumba2.csv") 
print(pred_df2['CLASS'].unique())

#printing the numbers of False and True
print(pred_df2.groupby('CLASS').size()[0].sum())
print(pred_df2.groupby('CLASS').size()[1].sum())

#### Using selected features, rescaled data and Logistic regression


In [None]:
p3_train, p3_test, t3_train, t3_test = train_test_split(rescaledPredictors[:,(0,1,2,3,7)], target, 
                                                        test_size=0.30,random_state=42)

logit3 = LogisticRegression() # making instance of model

# fitting the model on rescaled data
logit3.fit(p3_train, t3_train)

# predict on test data
predictions3 = logit3.predict(p3_test)

# performance metrics
print("Accuracy: ",metrics.accuracy_score(t3_test, predictions3)*100)
print("Precision: ",metrics.precision_score(t3_test, predictions3)*100)
print("Recall: ",metrics.recall_score(t3_test, predictions3)*100)

from sklearn.metrics import matthews_corrcoef
print('MCC: ',matthews_corrcoef(t3_test, predictions3))

# rescaling new data
# newArray = new.to_numpy()
# rescaledNew = scaler.fit_transform(newArray)

# performance on new data (rescaled)
new_pred3 = logit3.predict(rescaledNew[:,(0,1,2,3,7)])

pred_df3 = pd.DataFrame(new_pred3) 
pred_df3.columns=["CLASS"]
pred_df3.index.name="Index" 
pred_df3["CLASS"] = pred_df3["CLASS"].map({0:'False',1.0:'True'})

#csv file output
pred_df3.to_csv("ilujumba3.csv") 
print(pred_df3['CLASS'].unique())


#printing the numbers of False and True
print(pred_df3.groupby('CLASS').size()[0].sum())
print(pred_df3.groupby('CLASS').size()[1].sum())

#### Using cross-validation and Logistic Regression

In [None]:
from sklearn.model_selection import KFold
from sklearn.model_selection import cross_val_score

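kfold = KFold(n_splits=10, shuffle=True, random_state=42) # random_state only takes effect when shuffle=True
model6 = LogisticRegression()
model6.fit(predictors, target)

results = cross_val_score(model6, predictors, target, cv=kfold) # accuracy on each of the ten folds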
print(results.mean())

model6_pred = model6.predict(newArray) # model6 was fitted on untransformed predictors, so it predicts on untransformed new data
df6 = pd.DataFrame(model6_pred)
df6.columns = ['CLASS']
df6.index.name = 'Index'
df6['CLASS'] = df6['CLASS'].map({0.0:False, 1.0:True})

df6.to_csv('ilujumba6.csv') # separate file name so the Naive Bayes output below does not overwrite it
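
For `cross_val_score` to use a particular splitter, the splitter has to be passed through the `cv=` argument. As a sketch on synthetic data (not the competition data), the same procedure can also be written as an explicit loop over the folds, which makes clear what each score measures:

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data used only for illustration
X_demo, y_demo = make_classification(n_samples=300, n_features=11, random_state=42)
kfold_demo = KFold(n_splits=10, shuffle=True, random_state=42)
estimator = LogisticRegression(max_iter=1000)

# Passing the splitter through cv= is what makes cross_val_score use it
scores = cross_val_score(estimator, X_demo, y_demo, cv=kfold_demo)

# The same procedure as an explicit loop: fit on nine folds, score on the tenth
manual_scores = []
for train_idx, test_idx in kfold_demo.split(X_demo):
    fold_model = clone(estimator).fit(X_demo[train_idx], y_demo[train_idx])
    manual_scores.append(fold_model.score(X_demo[test_idx], y_demo[test_idx]))

print(scores.mean(), np.mean(manual_scores))
```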

#### Naive Bayes classifier with kfold cross-validation

In [None]:
from sklearn.naive_bayes import GaussianNB
kfold = KFold(n_splits=10, random_state=42, shuffle=True)
model7 = GaussianNB()
model7.fit(predictors, target)

results = cross_val_score(model7, predictors, target, cv=kfold) # use the ten shuffled folds defined above
print(results.mean())

model7_pred = model7.predict(newArray)
df7 = pd.DataFrame(model7_pred)
df7.columns = ['CLASS']
df7.index.name = 'Index'
df7['CLASS'] = df7['CLASS'].map({0.0:'False', 1.0:'True'})

df7.to_csv('ilujumba7.csv')
print(df7['CLASS'].unique())

#printing the numbers of False and True
print(df7.groupby('CLASS').size()[0].sum())
print(df7.groupby('CLASS').size()[1].sum())

#### Naive Bayes classifier on rescaled features

Naive Bayes assumes that the features are independent of each other given the class, and it treats every feature as contributing equally to the predicted class.
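
The independence assumption means that the class score is simply the log prior plus a sum of per-feature Gaussian log densities. The sketch below writes this out by hand on synthetic blobs and compares it with `GaussianNB`. The helper `gaussian_nb_predict` is hypothetical and ignores the small variance smoothing that scikit-learn adds.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.naive_bayes import GaussianNB

# Synthetic two-class data used only for illustration
X_demo, y_demo = make_blobs(n_samples=200, centers=2, n_features=3, random_state=0)

def gaussian_nb_predict(X_train, y_train, X_query):
    """Pick the class maximising log prior + sum of per-feature log densities."""
    classes = np.unique(y_train)
    log_posteriors = []
    for c in classes:
        Xc = X_train[y_train == c]
        mean, var = Xc.mean(axis=0), Xc.var(axis=0)
        log_prior = np.log(len(Xc) / len(X_train))
        # Independence: the joint log likelihood is a sum over the features
        log_lik = -0.5 * np.sum(
            np.log(2 * np.pi * var) + (X_query - mean) ** 2 / var, axis=1)
        log_posteriors.append(log_prior + log_lik)
    return classes[np.argmax(np.vstack(log_posteriors), axis=0)]

manual = gaussian_nb_predict(X_demo, y_demo, X_demo)
reference = GaussianNB().fit(X_demo, y_demo).predict(X_demo)
print((manual == reference).mean())  # fraction of matching predictions
```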

In [None]:
kfold = KFold(n_splits=10, random_state=42, shuffle=True)
model8 = GaussianNB()
model8.fit(rescaledPredictors, target)

results1 = cross_val_score(model8, rescaledPredictors, target, cv=kfold) # use the ten shuffled folds defined above
print(results1.mean())

model8_pred = model8.predict(rescaledNew)
df8 = pd.DataFrame(model8_pred)
df8.columns = ['CLASS']
df8.index.name = 'Index'
df8['CLASS'] = df8['CLASS'].map({0.0:'False', 1.0:'True'})

df8.to_csv('ilujumba8.csv')
print(df8['CLASS'].unique())

#printing the numbers of False and True
print(df8.groupby('CLASS').size()[0].sum())
print(df8.groupby('CLASS').size()[1].sum())

#### Naive Bayes and kfold validation

In [None]:
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB

kfold = KFold(n_splits=10, random_state=42, shuffle=True)
model9 = GaussianNB()
model9.fit(predictors, target)

results = cross_val_score(model9, predictors, target, cv =10) # ten-fold cross validation
print('mean for results', results.mean())

predic = cross_val_predict(model9, predictors, target, cv =10)
accuracy = metrics.accuracy_score(target, predic) # fraction of correct cross-validated predictions
print('cross-predicted accuracy ', accuracy)

model9_pred = model9.predict(newArray)
df9 = pd.DataFrame(model9_pred)
df9.columns = ['CLASS']
df9.index.name = 'Index'
df9['CLASS'] = df9['CLASS'].map({0.0:'False', 1.0:'True'})

df9.to_csv('ilujumba9.csv')
print(df9['CLASS'].unique())

#printing the numbers of False and True
print(df9.groupby('CLASS').size()[0].sum())
print(df9.groupby('CLASS').size()[1].sum())

## Comparing several algorithms to look at the nature of the decision boundaries created

https://medium.com/cascade-bio-blog/creating-visualizations-to-better-understand-your-data-and-models-part-2-28d5c46e956

Classifiers define a set of decision surfaces (hyperplanes, for linear models) that divide the data points into their respective classes across the feature space they were trained on. Visualising these surfaces helps one understand the limitations of a given algorithm on a given dataset.
Decision boundaries therefore show how the selected training data affects the performance of the algorithm.

Eleven sklearn classifiers were compared.

In [None]:
#importing classifiers from the sklearn library

from matplotlib.colors import ListedColormap
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier #1
from sklearn.neighbors import KNeighborsClassifier #2
from sklearn.svm import SVC #3
from sklearn.gaussian_process import GaussianProcessClassifier #4
from sklearn.gaussian_process.kernels import RBF #5
from sklearn.tree import DecisionTreeClassifier #6
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier #7,8
from sklearn.naive_bayes import GaussianNB #9
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis #10
from sklearn.linear_model import LogisticRegression #11

names = ["Nearest Neighbors", "Linear SVM", "RBF SVM", "Gaussian Process",
         "Decision Tree", "Random Forest", "Neural Net", "AdaBoost",
         "Naive Bayes", "QDA", "Logistic Regression"]

classifiers = [
    KNeighborsClassifier(3), # holds no assumption on data distribution (non-parametric)
    SVC(kernel="linear", C=0.025), # using a linear kernel
    SVC(gamma=2, C=1), # using radial basis function  kernel,C is low to enable a large decision margin
    GaussianProcessClassifier(1.0 * RBF(1.0)), # based on Laplace approximation
    DecisionTreeClassifier(),
    RandomForestClassifier(n_estimators=100), # 100 trees in the forest
    MLPClassifier(max_iter=1000), #iterations until converge
    AdaBoostClassifier(), # fits multiple classifiers on the same dataset
    GaussianNB(),
    QuadraticDiscriminantAnalysis(),
    LogisticRegression()]



Dimensionality reduction
https://stackabuse.com/dimensionality-reduction-in-python-with-scikit-learn/

Since the data is multi-dimensional, Principal Component Analysis (PCA) was used to reduce it to two components.
Trial runs were done to check how much of the variation in the data is explained by the principal components.

Another thing to keep in mind is that PCA works best on standardised/normalised data, because otherwise the components are dominated by the features with the largest variance.
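
As a rough illustration of why scaling matters, the sketch below builds synthetic columns on very different scales (the data is made up), computes the explained variance ratio by hand from a singular value decomposition of the standardised data, and contrasts it with PCA on the raw values:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Four made-up columns whose scales differ by orders of magnitude
rng = np.random.default_rng(0)
raw = rng.normal(size=(200, 4)) * np.array([1.0, 10.0, 100.0, 1000.0])

# PCA by hand: centre the data, take the SVD, square the singular values
X_std = StandardScaler().fit_transform(raw)
_, singular_values, _ = np.linalg.svd(X_std - X_std.mean(axis=0),
                                      full_matrices=False)
ratio_manual = singular_values**2 / np.sum(singular_values**2)

ratio_std = PCA().fit(X_std).explained_variance_ratio_
ratio_raw = PCA().fit(raw).explained_variance_ratio_

print(ratio_manual.round(3))  # standardised data, computed by hand
print(ratio_std.round(3))     # standardised data, scikit-learn
print(ratio_raw.round(3))     # raw data: the largest-scale column dominates
```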

In [None]:
# preprocessing the dataset
dataArray = data.to_numpy()
X, y = dataArray[:,0:11], dataArray[0:,11]
X = StandardScaler().fit_transform(X)

# reducing dimensions of the dataset using PCA  https://towardsdatascience.com/pca-using-python-scikit-learn-e653f8989e60
from sklearn.decomposition import PCA
pca = PCA()
pca.fit_transform(X)
pca_variance = pca.explained_variance_ratio_ # fraction of the total variance explained by each component
plt.figure(figsize=(8, 6))
plt.bar(range(len(pca_variance)), pca_variance, alpha=0.5, align='center', label='individual variance')
plt.legend()
plt.ylabel('Variance ratio')
plt.xlabel('Principal components')
plt.show()

In [None]:
pca2 = PCA(0.95) # keeping principal components that explain 95% of the variance
ninety_five = pca2.fit_transform(X)
ninety_five.shape

In [None]:
print("Explained variance: ", sum(pca2.explained_variance_ratio_))

Eight principal components explain 95% of the variance in the dataset.
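
The number of components needed for a given share of the variance can also be read off the cumulative explained variance ratio. A minimal sketch on synthetic data (not the competition data, so the resulting count will differ) compares that count with what `PCA(0.95)` keeps:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data with 11 features, used only for illustration
X_demo, _ = make_classification(n_samples=300, n_features=11, n_informative=5,
                                n_redundant=0, random_state=0)
X_demo = StandardScaler().fit_transform(X_demo)

# Running total of the variance explained by the first k components
cumulative = np.cumsum(PCA().fit(X_demo).explained_variance_ratio_)

# First position where the running total reaches 95%, converted to a count
n_needed = int(np.argmax(cumulative >= 0.95)) + 1
n_kept = PCA(0.95).fit(X_demo).n_components_
print(n_needed, n_kept)
```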

In [None]:
pca2 = PCA(3) # keeping the first three principal components
principalComponents = pca2.fit_transform(X)

from mpl_toolkits.mplot3d import Axes3D
plt.figure(figsize=(10,6))
ax = plt.axes(projection='3d')
ax.scatter(principalComponents[:,0], principalComponents[:,1], principalComponents[:,2], 
           linewidths=1, alpha=.5,
           edgecolor='k', s= 200,
           c=data['CLASS'])
plt.show()

The three principal components were visualised using a 3D plot. The figure above shows the data points in the space of the three components, coloured by class. The two classes are not cleanly separated, so the clusters overlap to some extent.

In [None]:
#converting principal component ndarrays to DataFrame format
principalDf = pd.DataFrame(data = principalComponents, columns = ['PC1', 'PC2','PC3'])
finalDf = pd.concat([principalDf, data['CLASS']], axis = 1)

In [None]:
finalDf.head()

In [None]:
print('Variance explained by three PCs: ',sum(pca2.explained_variance_ratio_)*100,'%')

#### Visualising the top 2 principal components

In [None]:
fig = plt.figure(figsize = (6,6))
ax = fig.add_subplot(111) 
ax.set(xlim=(-10,10), ylim=(-10,10))
ax.set_xlabel('Principal Component 1', fontsize = 15)
ax.set_ylabel('Principal Component 2', fontsize = 15)
ax.set_title('top 2 components', fontsize = 20)

targets = [0, 1]
colors = ['r', 'g']

for target, color in zip(targets,colors):
    indices = finalDf['CLASS'] == target
    ax.scatter(finalDf.loc[indices, 'PC1']
               , finalDf.loc[indices, 'PC2']
               , c = color
               , s = 50)
ax.legend(targets)
ax.grid()

In [None]:
# splitting the data into training and test parts
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=42)

A mesh grid is required. This can be thought of as a matrix of coordinates upon which the model will make decisions.
These are then visualised to reveal decision boundaries.
The mesh grid was created from the range of the data with a step size of 0.02.
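
To make the idea concrete, here is a tiny sketch with a deliberately coarse, made-up grid. `np.meshgrid` returns one matrix of x coordinates and one of y coordinates, and flattening and pairing them gives the list of points a classifier would be asked to score:

```python
import numpy as np

xs = np.arange(0.0, 1.0, 0.5)   # x coordinates: 0.0, 0.5
ys = np.arange(0.0, 1.5, 0.5)   # y coordinates: 0.0, 0.5, 1.0
xx_demo, yy_demo = np.meshgrid(xs, ys)

# One row per grid point: column 0 is x, column 1 is y
grid_points = np.c_[xx_demo.ravel(), yy_demo.ravel()]
print(xx_demo.shape, grid_points.shape)

# Per-point scores are reshaped back to the grid before contour plotting
fake_scores = grid_points.sum(axis=1)          # stand-in for model output
score_grid = fake_scores.reshape(xx_demo.shape)
```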

In [None]:
# creating the mesh for the contour plot; its limits come from the first two columns of X, padded by 0.5

h = .02  # step size in the mesh
x_min, x_max = X[:, 0].min() - .5, X[:, 0].max() + .5
y_min, y_max = X[:, 1].min() - .5, X[:, 1].max() + .5
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))

Two principal components were used to enable visualisation on a scatter plot.

The parameters for the PCA were estimated on the training data and then applied to both the training and test sets.

In [None]:
pca4 = PCA(n_components=2)

# applying PCA on training set
pca4.fit(X_train)

#applying transform on training and testing sets
train_ = pca4.transform(X_train)
test_ = pca4.transform(X_test)

In [None]:
print("Explained variance: ", sum(pca4.explained_variance_ratio_))

In [None]:
train_.shape, test_.shape

After transforming the data and creating the mesh grid, decision boundaries for the algorithms were created by iterating over the classifiers.

In [None]:
figure = plt.figure(figsize=(27, 15))
i = 1

datasets=[data]
for ds_cnt, ds in enumerate(datasets):
    # just plot the dataset first
    cm = plt.cm.RdBu
    cm_bright = ListedColormap(['#FF0000', '#0000FF'])
    ax = plt.subplot(len(datasets), len(classifiers) + 1, i)

    if ds_cnt == 0:
        ax.set_title("Input data")
        # Plot the top 2 principal components for training data
        ax.scatter(train_[:, 0], train_[:, 1], c=y_train, cmap=cm_bright,
                    edgecolors='k')
        # Plot the top 2 principal components for the testing data
        ax.scatter(test_[:, 0], test_[:, 1], c=y_test, cmap=cm_bright, alpha=0.6,
                    edgecolors='k')
        ax.set_xlim(xx.min(), xx.max())
        ax.set_ylim(yy.min(), yy.max())
        ax.set_xticks(())
        ax.set_yticks(())
        i += 1

        # iterate over classifiers

    for name, clf in zip(names, classifiers):
        ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
        clf.fit(train_, y_train)
        score = clf.score(test_, y_test)

        # Plot the decision boundary. For that, we will assign a color to each
        # point in the mesh [x_min, x_max]x[y_min, y_max].

        if hasattr(clf, "decision_function"):
            Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()]) # confidence scores
        else:
            Z = clf.predict_proba(np.c_[xx.ravel(), yy.ravel()])[:, 1] # probability estimates

        # Put the result into a color plot
        Z = Z.reshape(xx.shape)
        ax.contourf(xx, yy, Z, cmap=cm, alpha=.8)

        # Plot the training points
        ax.scatter(train_[:, 0], train_[:, 1], c=y_train, cmap=cm_bright, edgecolors='k')
        # Plot the testing points
        ax.scatter(test_[:, 0], test_[:, 1], c=y_test, cmap=cm_bright, edgecolors='k', alpha=0.4)

        ax.set_xlim(xx.min(), xx.max())
        ax.set_ylim(yy.min(), yy.max())
        ax.set_xticks(())
        ax.set_yticks(())
        if ds_cnt == 0:
            ax.set_title(name)
            ax.text(xx.max() - .3, yy.min() + .3, ('%.2f' % score).lstrip('0'), size=15, horizontalalignment='right')
            i += 1

plt.tight_layout()
plt.show()

The accuracy of each algorithm is shown in the lower right corner of its panel.

The plots show training points in solid colours and testing points as semi-transparent. Filled contour plots were used to show the decision boundaries, which separate points based on shared characteristics.
