<!-- # Διπλωματική Εργασία
## Ταξινόμηση του καρκίνου του μαστού με μεθόδους μηχανικής μάθησης
### Εξαγωγή χαρακτηριστικών με PCA

> Λάζαρος Πανιτσίδης<br />
> Τμήμα Μηχανικών Παραγωγής και Διοίκησης <br />
> Διεθνές Πανεπιστήμιο της Ελλάδος <br />
> lazarospanitsidis@outlook.com -->

# Diploma thesis
## Breast cancer classification using machine learning methods
### Feature extraction with PCA

> Lazaros Panitsidis<br />
> Department of Industrial Engineering and Management <br />
> International Hellenic University <br />
> lazarospanitsidis@outlook.com

## Contents
1. [Useful Python Libraries](#1)
1. [Data Processing](#2)
1. [Gaussian Naive Bayes](#3)
1. [Linear Discriminant Analysis](#4)
1. [Quadratic Discriminant Analysis](#5)
1. [Ridge Classifier](#6)
1. [Decision Tree Classifier](#7)
1. [Random Forest Classifier](#8)
1. [ADA Boost Classifier (Adaptive Boosting)](#9)
1. [C-Support Vector Classification](#10)
1. [Stochastic Gradient Descent Classifier](#11)
1. [eXtreme Gradient Boosting](#12)
1. [Light Gradient Boosting Machine](#13)
1. [K-Nearest Neighbors Classifier](#14)
1. [Multi-layer Perceptron Classifier](#15)
1. [Summary](#16)

<a id='1'></a>
## 1) Useful Python Libraries

In [1]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns # data visualization library  
import scipy.stats as stats
import matplotlib.pyplot as plt
import time
pd.set_option('display.max_columns', None)
pd.options.mode.chained_assignment = None  # default='warn'
#import warnings library
import warnings
# ignore all warnings
warnings.filterwarnings('ignore')
# Any results you write to the current directory are saved as output.

# some of them are not used in this file
from sklearn.experimental import enable_halving_search_cv  # noqa
from sklearn.feature_selection import SelectKBest, f_classif, chi2, RFE, RFECV , mutual_info_classif
from sklearn.model_selection import train_test_split, cross_val_score , GridSearchCV , LeaveOneOut,KFold,RandomizedSearchCV,StratifiedKFold, HalvingGridSearchCV
from skopt import BayesSearchCV # https://scikit-optimize.github.io/stable/modules/generated/skopt.BayesSearchCV.html#skopt.BayesSearchCV , https://scikit-optimize.github.io/stable/auto_examples/bayesian-optimization.html
from sklearn.metrics import f1_score, confusion_matrix, accuracy_score , make_scorer , classification_report
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline , Pipeline # https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.Pipeline.html
from sklearn.preprocessing import StandardScaler , LabelEncoder , MinMaxScaler
from xgboost import XGBClassifier , plot_importance
from sklearn.utils import resample
from sklearn.svm import SVC
from sklearn.linear_model import SGDClassifier , RidgeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis , QuadraticDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier , AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier
import lightgbm as lgbm
from sklearn.neural_network import MLPClassifier
import pygad

<a id='2'></a>
## 2) Data Processing

In [2]:
dataWISC = pd.read_csv('dataWisc.csv')
dataWISC.drop(["id", "Unnamed: 32"], axis = 1, inplace = True)

# Undersampling function
def make_undersample(_df, column):
  dfs_r = {}
  dfs_c = {}
  smaller = 1e1000
  ignore = ""
  for c in _df[column].unique():
    dfs_c[c] = _df[_df[column] == c]
    if dfs_c[c].shape[0] < smaller:
      smaller = dfs_c[c].shape[0]
      ignore = c

  for c in dfs_c:
    if c == ignore:
      continue
    dfs_r[c] = resample(dfs_c[c], 
                        replace=False, # sample without replacement
                        n_samples=smaller,
                        random_state=0)
  return pd.concat([dfs_r[c] for c in dfs_r] + [dfs_c[ignore]])

dataWISC = make_undersample(dataWISC,'diagnosis')

#Description of the dataset

#how many cases are included in the dataset
length = len(dataWISC)
#how many features are in the dataset
features = dataWISC.shape[1]-1 # - diagnosis

# Number of malignant cases
malignant = len(dataWISC[dataWISC['diagnosis']=='M'])

#Number of benign cases
benign = len(dataWISC[dataWISC['diagnosis']=='B'])

#Rate of malignant tumors over all cases
rate = (float(malignant)/(length))*100

print ("There are "+ str(len(dataWISC))+" cases in this dataset")
print ("There are {}".format(features)+" features in this dataset")
print ("There are {}".format(malignant)+" cases diagnosed as malignant tumor")
print ("There are {}".format(benign)+" cases diagnosed as benign tumor")
print ("The percentage of malignant cases is: {:.2f}%".format(rate))

There are 424 cases in this dataset
There are 30 features in this dataset
There are 212 cases diagnosed as malignant tumor
There are 212 cases diagnosed as benign tumor
The percentage of malignant cases is: 50.00%


In [3]:
y = dataWISC.diagnosis                          # M or B 
x = dataWISC.drop('diagnosis',axis = 1 )
target_names=['Benign','Malignant']
le= LabelEncoder()
le.fit(y)
y_le = le.transform(y)

In [4]:
x_new = x
x_new.head(3)

Unnamed: 0,radius_mean,texture_mean,perimeter_mean,area_mean,smoothness_mean,compactness_mean,concavity_mean,concave points_mean,symmetry_mean,fractal_dimension_mean,radius_se,texture_se,perimeter_se,area_se,smoothness_se,compactness_se,concavity_se,concave points_se,symmetry_se,fractal_dimension_se,radius_worst,texture_worst,perimeter_worst,area_worst,smoothness_worst,compactness_worst,concavity_worst,concave points_worst,symmetry_worst,fractal_dimension_worst
49,13.49,22.3,86.91,561.0,0.08752,0.07698,0.04751,0.03384,0.1809,0.05718,0.2338,1.353,1.735,20.2,0.004455,0.01382,0.02095,0.01184,0.01641,0.001956,15.15,31.82,99.0,698.8,0.1162,0.1711,0.2282,0.1282,0.2871,0.06917
285,12.58,18.4,79.83,489.0,0.08393,0.04216,0.00186,0.002924,0.1697,0.05855,0.2719,1.35,1.721,22.45,0.006383,0.008008,0.00186,0.002924,0.02571,0.002015,13.5,23.08,85.56,564.1,0.1038,0.06624,0.005579,0.008772,0.2505,0.06431
495,14.87,20.21,96.12,680.9,0.09587,0.08345,0.06824,0.04951,0.1487,0.05748,0.2323,1.636,1.596,21.84,0.005415,0.01371,0.02153,0.01183,0.01959,0.001812,16.01,28.48,103.9,783.6,0.1216,0.1388,0.17,0.1017,0.2369,0.06599


In [5]:
# https://machinelearningmastery.com/loocv-for-evaluating-machine-learning-algorithms/#:~:text=Given%20the%20improved%20estimate%20of,biased%20estimates%20of%20model%20performance.
# cv = LeaveOneOut()
rng = np.random.RandomState(13) # random number generator , use it in every random state if shuffle=True for different results.Usefull to test a specific algorithm multiple times within a for loop.

# https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html
cv=StratifiedKFold(n_splits=10, shuffle=False, random_state=None)
search_cv = StratifiedKFold(n_splits=5, shuffle=False, random_state=None)

originalclass = []
predictedclass = []

def classification_report_with_accuracy_score(y_true, y_pred):
  originalclass.extend(y_true)
  predictedclass.extend(y_pred)
  #print(classification_report(y_true, y_pred, target_names=target_names)) 
  return accuracy_score(y_true, y_pred)

def print_best_params(search):
    print("")
    print("Best hyperparameters : ", search.best_params_)
    print("")
    print("Best estimator : ", search.best_estimator_)
    print("")

In [6]:
# this method didn't work so it will not be used (nested cross validation which returns the best parameters and their scores)

# Following kf is the outer loop
outer_kf = StratifiedKFold(n_splits=10,shuffle=True,random_state=13)
inner_kf = StratifiedKFold(n_splits=5,shuffle=True,random_state=13)
# model = SVC()
# params = {'kernel':['rbf','linear'],'C':[1,10]}
def nested_cv_with_gscv(model,params,x,y):
    outer_loop_accuracy_scores = []
    inner_loop_won_params = []
    inner_loop_accuracy_scores = []

    # Looping through the outer loop, feeding each training set into a GSCV as the inner loop
    for train_index,test_index in outer_kf.split(x,y):
        
        GSCV = GridSearchCV(estimator=model,param_grid=params,cv=inner_kf)
        
        # GSCV is looping through the training data to find the best parameters. This is the inner loop
        GSCV.fit(x[train_index],y[train_index])
        
        # The best hyper parameters from GSCV is now being tested on the unseen outer loop test data.
        pred = GSCV.predict(x[test_index])
        
        # Appending the "winning" hyper parameters and their associated accuracy score
        inner_loop_won_params.append(GSCV.best_params_)
        outer_loop_accuracy_scores.append(accuracy_score(y[test_index],pred))
        inner_loop_accuracy_scores.append(GSCV.best_score_)

    for i in zip(inner_loop_won_params,outer_loop_accuracy_scores,inner_loop_accuracy_scores):
        print (i)

    print('Mean of outer loop accuracy score:',np.mean(outer_loop_accuracy_scores))

# https://github.com/rosscleung/Projects/blob/b9abc20db545d9f483e90a9b046ea50c74f25718/Tutorial%20notebooks/Nested%20Cross%20Validation%20Example.ipynb

* The process by which the best model will be selected is as follows:
1. Evaluation of the default algorithm with 10-fold cross validation
2. Evaluation of the tuned hyperparameter algorithm with nested cross-validation (5-fold Grid Search/Randomized Search inside a 10-fold cross validation)
3. Choosing the best model (from steps 1 and 2) and finding the best hyperparameters

<a id='3'></a>
## 3) [Gaussian Naive Bayes](<https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html#sklearn.naive_bayes.GaussianNB>)

* Default hyperparameters

In [7]:
start = time.time()
originalclass = []
predictedclass = []
  
# Cross validate
clf_gnb = Pipeline([('scaler', StandardScaler()),('pca',PCA(n_components=10)), ('gnb', GaussianNB())])
score = cross_val_score(clf_gnb, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))
print("--- Time of execution : %s seconds ---" % (time.time() - start))

              precision    recall  f1-score   support

      Benign      0.918     0.953     0.935       212
   Malignant      0.951     0.915     0.933       212

    accuracy                          0.934       424
   macro avg      0.935     0.934     0.934       424
weighted avg      0.935     0.934     0.934       424

--- Time of execution : 0.11070418357849121 seconds ---


* Νested Cross Validation with Grid Search for hyper-parameter tuning

In [8]:
start = time.time()
param_grid = { 'gnb__var_smoothing': np.logspace(0,-10, num=100) }

search = GridSearchCV(clf_gnb, param_grid=param_grid, n_jobs=-1,cv=search_cv,verbose=0,scoring='f1_macro')

originalclass = []
predictedclass = []
  
# Nested Cross validation
score = cross_val_score(search, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))
print("--- Time of execution : %s seconds ---" % (time.time() - start))

              precision    recall  f1-score   support

      Benign      0.919     0.958     0.938       212
   Malignant      0.956     0.915     0.935       212

    accuracy                          0.936       424
   macro avg      0.937     0.936     0.936       424
weighted avg      0.937     0.936     0.936       424

--- Time of execution : 9.095436811447144 seconds ---


<a id='4'></a>
## 4) [Linear Discriminant Analysis](<https://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.LinearDiscriminantAnalysis.html>)

* Default hyperparameters

In [9]:
start = time.time()
originalclass = []
predictedclass = []
  
# Cross validate
clf_lda = Pipeline([('scaler', StandardScaler()),('pca',PCA(n_components=10)), ('lda', LinearDiscriminantAnalysis())])

score = cross_val_score(clf_lda, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))
print("--- Time of execution : %s seconds ---" % (time.time() - start))

              precision    recall  f1-score   support

      Benign      0.938     0.991     0.963       212
   Malignant      0.990     0.934     0.961       212

    accuracy                          0.962       424
   macro avg      0.964     0.962     0.962       424
weighted avg      0.964     0.962     0.962       424

--- Time of execution : 0.08859038352966309 seconds ---


* Νested Cross Validation with Grid Search for Model Evaluation

In [10]:
start = time.time()
param_grid = [
    {
        'lda__solver' : ['lsqr','eigen'],
        'lda__shrinkage':[None,'auto']
    },
    {
        'lda__solver' : ['svd'],
        'lda__tol': np.linspace(0, 0.01, num=100)
    }
]

search = RandomizedSearchCV(clf_lda, param_distributions=param_grid, n_jobs=-1,cv=search_cv,verbose=0,scoring='f1_macro',n_iter=500)

originalclass = []
predictedclass = []
  
# Nested Cross validation
score = cross_val_score(search, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))
print("--- Time of execution : %s seconds ---" % (time.time() - start))

              precision    recall  f1-score   support

      Benign      0.938     0.991     0.963       212
   Malignant      0.990     0.934     0.961       212

    accuracy                          0.962       424
   macro avg      0.964     0.962     0.962       424
weighted avg      0.964     0.962     0.962       424

--- Time of execution : 6.808892011642456 seconds ---


<a id='5'></a>
## 5) [Quadratic Discriminant Analysis](<https://scikit-learn.org/stable/modules/generated/sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis.html>)

* Default hyperparameters

In [11]:
start = time.time()
originalclass = []
predictedclass = []
  
# Cross validate
clf_qda = Pipeline([('scaler', StandardScaler()),('pca',PCA(n_components=10)), ('qda', QuadraticDiscriminantAnalysis())])

score = cross_val_score(clf_qda, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))
print("--- Time of execution : %s seconds ---" % (time.time() - start))

              precision    recall  f1-score   support

      Benign      0.967     0.953     0.960       212
   Malignant      0.953     0.967     0.960       212

    accuracy                          0.960       424
   macro avg      0.960     0.960     0.960       424
weighted avg      0.960     0.960     0.960       424

--- Time of execution : 0.08455038070678711 seconds ---


* Νested Cross Validation with Randomized Search for hyper-parameter tuning

In [12]:
start = time.time()
param_grid = {
    'qda__reg_param': np.linspace(0, 1, num=100),
    'qda__tol': np.linspace(0, 0.01, num=100)
}

search = RandomizedSearchCV(clf_qda, param_distributions=param_grid, n_jobs=-1,cv=search_cv,verbose=0,scoring='f1_macro',n_iter=500)

originalclass = []
predictedclass = []
  
# Nested Cross validation
score = cross_val_score(search, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))
print("--- Time of execution : %s seconds ---" % (time.time() - start))

              precision    recall  f1-score   support

      Benign      0.959     0.986     0.972       212
   Malignant      0.985     0.958     0.971       212

    accuracy                          0.972       424
   macro avg      0.972     0.972     0.972       424
weighted avg      0.972     0.972     0.972       424

--- Time of execution : 27.98405933380127 seconds ---


<a id='6'></a>
## 6) [Ridge Classifier](<https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.RidgeClassifier.html#sklearn.linear_model.RidgeClassifier>)

* Default hyperparameters

In [13]:
start = time.time()
originalclass = []
predictedclass = []
  
# Cross validate
clf_rc = Pipeline([('scaler', StandardScaler()),('pca',PCA(n_components=10)), ('rg', RidgeClassifier())])

score = cross_val_score(clf_rc, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))
print("--- Time of execution : %s seconds ---" % (time.time() - start))

              precision    recall  f1-score   support

      Benign      0.938     0.991     0.963       212
   Malignant      0.990     0.934     0.961       212

    accuracy                          0.962       424
   macro avg      0.964     0.962     0.962       424
weighted avg      0.964     0.962     0.962       424

--- Time of execution : 0.09992051124572754 seconds ---


* Νested Cross Validation with Randomized Search for hyper-parameter tuning

In [14]:
start = time.time()
param_grid = {
    'rg__alpha' : np.linspace(0, 1, num=10),
    'rg__fit_intercept' : [True,False],
    'rg__copy_X' : [True,False],
    'rg__max_iter' : [None],
    'rg__tol' : [0.001],
    'rg__class_weight' : [None,'balanced'],
    'rg__solver' : ['svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga', 'lbfgs'],
    'rg__positive' : [False]
}

search = RandomizedSearchCV(clf_rc, param_distributions=param_grid, n_jobs=-1,cv=search_cv,verbose=0,scoring='f1_macro',n_iter=500)

originalclass = []
predictedclass = []
  
# Nested Cross validation
score = cross_val_score(search, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))
print("--- Time of execution : %s seconds ---" % (time.time() - start))

              precision    recall  f1-score   support

      Benign      0.938     0.991     0.963       212
   Malignant      0.990     0.934     0.961       212

    accuracy                          0.962       424
   macro avg      0.964     0.962     0.962       424
weighted avg      0.964     0.962     0.962       424

--- Time of execution : 32.04672145843506 seconds ---


<a id='7'></a>
## 7) [Decision Tree Classifier](<https://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeClassifier.html>)

* Default hyperparameters

In [15]:
start = time.time()
originalclass = []
predictedclass = []
  
# Cross validate
clf_tree = Pipeline([('scaler', StandardScaler()),('pca',PCA(n_components=10)), ('tree', DecisionTreeClassifier(random_state=13))])

score = cross_val_score(clf_tree, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))
print("--- Time of execution : %s seconds ---" % (time.time() - start))

              precision    recall  f1-score   support

      Benign      0.934     0.934     0.934       212
   Malignant      0.934     0.934     0.934       212

    accuracy                          0.934       424
   macro avg      0.934     0.934     0.934       424
weighted avg      0.934     0.934     0.934       424

--- Time of execution : 0.09537887573242188 seconds ---


* Νested Cross Validation with Randomized Search for hyper-parameter tuning

In [16]:
start = time.time()
param_grid = {
    'tree__criterion' :['gini','entropy'],
    'tree__splitter' : ['best','random'],
    'tree__max_depth': [list(range(2, 20)),None],
    'tree__min_samples_split': list(range(2, 6)),
    'tree__min_samples_leaf': list(range(1, 8)),
    'tree__min_weight_fraction_leaf' : [0.0],
    'tree__max_features': [None, 'sqrt', 'log2'],
    'tree__max_leaf_nodes' : [None],
    'tree__min_impurity_decrease' : [0.0],
    'tree__class_weight' : [None,'balanced'],
    'tree__ccp_alpha' : [0.0],
    'tree__random_state' : [13]
}

search = RandomizedSearchCV(clf_tree, param_distributions=param_grid, n_jobs=-1,cv=search_cv,verbose=0,scoring='f1_macro',n_iter=1000)

originalclass = []
predictedclass = []
  
# Nested Cross validation
score = cross_val_score(search, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))
print("--- Time of execution : %s seconds ---" % (time.time() - start))

              precision    recall  f1-score   support

      Benign      0.914     0.948     0.931       212
   Malignant      0.946     0.910     0.928       212

    accuracy                          0.929       424
   macro avg      0.930     0.929     0.929       424
weighted avg      0.930     0.929     0.929       424

--- Time of execution : 51.517773389816284 seconds ---


* Finding the best hyper-parameters

In [17]:
search.fit(x_new, y) # we need this for adaboost

print_best_params(search)
search_results = pd.DataFrame(search.cv_results_)
search_results.sort_values(by='mean_test_score',ascending=False).head(3)


Best hyperparameters :  {'tree__splitter': 'best', 'tree__random_state': 13, 'tree__min_weight_fraction_leaf': 0.0, 'tree__min_samples_split': 5, 'tree__min_samples_leaf': 1, 'tree__min_impurity_decrease': 0.0, 'tree__max_leaf_nodes': None, 'tree__max_features': None, 'tree__max_depth': None, 'tree__criterion': 'entropy', 'tree__class_weight': None, 'tree__ccp_alpha': 0.0}

Best estimator :  Pipeline(steps=[('scaler', StandardScaler()), ('pca', PCA(n_components=10)),
                ('tree',
                 DecisionTreeClassifier(criterion='entropy',
                                        min_samples_split=5,
                                        random_state=13))])



Unnamed: 0,mean_fit_time,std_fit_time,mean_score_time,std_score_time,param_tree__splitter,param_tree__random_state,param_tree__min_weight_fraction_leaf,param_tree__min_samples_split,param_tree__min_samples_leaf,param_tree__min_impurity_decrease,param_tree__max_leaf_nodes,param_tree__max_features,param_tree__max_depth,param_tree__criterion,param_tree__class_weight,param_tree__ccp_alpha,params,split0_test_score,split1_test_score,split2_test_score,split3_test_score,split4_test_score,mean_test_score,std_test_score,rank_test_score
41,0.003124,0.006248,0.001902,0.003803444,best,13,0.0,5,1,0.0,,,,entropy,,0.0,"{'tree__splitter': 'best', 'tree__random_state...",0.964706,0.905765,1.0,0.917464,0.952354,0.948058,0.033818,1
206,0.005784,0.000399,0.001995,3.16298e-07,best,13,0.0,5,1,0.0,,,,entropy,balanced,0.0,"{'tree__splitter': 'best', 'tree__random_state...",0.964706,0.894118,1.0,0.917464,0.952354,0.945728,0.036911,2
414,0.007978,0.005984,0.001995,2.611745e-07,best,13,0.0,4,1,0.0,,,,entropy,,0.0,"{'tree__splitter': 'best', 'tree__random_state...",0.964706,0.905765,1.0,0.917464,0.940468,0.945681,0.03385,3


<a id='8'></a>
## 8) [Random Forest Classifier](<https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html>)

* Default hyperparameters

In [18]:
start = time.time()
originalclass = []
predictedclass = []
  
# Cross validate
clf_rf = Pipeline([('scaler', StandardScaler()),('pca',PCA(n_components=10)), ('rf', RandomForestClassifier(random_state=13))])
                       
score = cross_val_score(clf_rf, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))
print("--- Time of execution : %s seconds ---" % (time.time() - start))

              precision    recall  f1-score   support

      Benign      0.949     0.962     0.956       212
   Malignant      0.962     0.948     0.955       212

    accuracy                          0.955       424
   macro avg      0.955     0.955     0.955       424
weighted avg      0.955     0.955     0.955       424

--- Time of execution : 1.2531466484069824 seconds ---


* Νested Cross Validation with Randomized Search for hyper-parameter tuning

In [19]:
start = time.time()
param_grid = {
    'rf__bootstrap': [True,False],
    'rf__max_depth': [list(range(5,15)), None],
    'rf__n_estimators' :[100],
    'rf__max_features': [None, 'sqrt', 'log2'],
    'rf__max_leaf_nodes' : [None,list(range(5,15))],
    'rf__min_samples_leaf': list(range(1,10)),
    'rf__min_samples_split': list(range(2, 6)),
    'rf__criterion' :['entropy','gini'],
    'rf__random_state' : [13]
}

search = RandomizedSearchCV(clf_rf, param_distributions=param_grid, n_jobs=-1,cv=search_cv,verbose=1,scoring='f1_macro',n_iter=300)

originalclass = []
predictedclass = []
  
# Nested Cross validation
score = cross_val_score(search, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)

Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits


In [20]:
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))
print("--- Time of execution : %s seconds ---" % (time.time() - start))

              precision    recall  f1-score   support

      Benign      0.953     0.962     0.958       212
   Malignant      0.962     0.953     0.957       212

    accuracy                          0.958       424
   macro avg      0.958     0.958     0.958       424
weighted avg      0.958     0.958     0.958       424

--- Time of execution : 148.47292494773865 seconds ---


<a id='9'></a>
## 9) [ADA Boost Classifier (Adaptive Boosting)](<https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html#:~:text=An%20AdaBoost%20%5B1%5D%20classifier%20is,focus%20more%20on%20difficult%20cases.>)

* Default hyperparameters

In [21]:
start = time.time()
originalclass = []
predictedclass = []
  
# Cross validate
clf_adaboost = Pipeline([('scaler', StandardScaler()),('pca',PCA(n_components=10)), ('adab', AdaBoostClassifier(random_state=13))])

score = cross_val_score(clf_adaboost, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))
print("--- Time of execution : %s seconds ---" % (time.time() - start))

              precision    recall  f1-score   support

      Benign      0.958     0.962     0.960       212
   Malignant      0.962     0.958     0.960       212

    accuracy                          0.960       424
   macro avg      0.960     0.960     0.960       424
weighted avg      0.960     0.960     0.960       424

--- Time of execution : 0.802978515625 seconds ---


* Νested Cross Validation with Randomized Search for hyper-parameter tuning

In [22]:
start = time.time()
param_grid = {
    'adab__base_estimator' : [DecisionTreeClassifier(criterion='entropy',min_samples_split=5, random_state=13)],
    'adab__n_estimators' : np.arange(100,210,10),
    'adab__learning_rate' : np.power(10, np.arange(-3, 1, dtype=float)),
    'adab__algorithm' : ['SAMME', 'SAMME.R'],
    'adab__random_state' : [13],
}

search = RandomizedSearchCV(clf_adaboost, param_distributions=param_grid, n_jobs=-1,cv=search_cv,verbose=0,scoring='f1_macro',n_iter=300)

originalclass = []
predictedclass = []
  
# Nested Cross validation
score = cross_val_score(search, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))
print("--- Time of execution : %s seconds ---" % (time.time() - start))

              precision    recall  f1-score   support

      Benign      0.949     0.958     0.953       212
   Malignant      0.957     0.948     0.953       212

    accuracy                          0.953       424
   macro avg      0.953     0.953     0.953       424
weighted avg      0.953     0.953     0.953       424

--- Time of execution : 41.33515763282776 seconds ---


<a id='10'></a>
## 10) [C-Support Vector Classification](<https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html>)

* Default hyperparameters

In [23]:
start = time.time()
originalclass = []
predictedclass = []
  
# Cross validate
clf_svc = Pipeline([('scaler', StandardScaler()),('pca',PCA(n_components=10)), ('svc', SVC())])

score = cross_val_score(clf_svc, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))
print("--- Time of execution : %s seconds ---" % (time.time() - start))

              precision    recall  f1-score   support

      Benign      0.972     0.991     0.981       212
   Malignant      0.990     0.972     0.981       212

    accuracy                          0.981       424
   macro avg      0.981     0.981     0.981       424
weighted avg      0.981     0.981     0.981       424

--- Time of execution : 0.09321236610412598 seconds ---


* Νested Cross Validation with Grid Search for hyper-parameter tuning

In [24]:
start = time.time()
param_grid = [
    {
        'svc__kernel': ['rbf'], 
        'svc__gamma': [1,1e-1,1e-2, 1e-3, 1e-4,'auto','scale'], 
        'svc__C': [1, 10, 100, 1000],
        'svc__decision_function_shape': ['ovo', 'ovr'],
        'svc__random_state' : [13]
    },
    {
        'svc__kernel': ['linear'], 
        'svc__C': [1, 10, 100, 1000],
        'svc__decision_function_shape': ['ovo', 'ovr'],
        'svc__random_state' : [13]
    },
]

search = GridSearchCV(clf_svc, param_grid=param_grid, n_jobs=-1,cv=search_cv,verbose=0,scoring='f1_macro')

originalclass = []
predictedclass = []
  
# Nested Cross validation
score = cross_val_score(search, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))
print("--- Time of execution : %s seconds ---" % (time.time() - start))

              precision    recall  f1-score   support

      Benign      0.977     0.986     0.981       212
   Malignant      0.986     0.976     0.981       212

    accuracy                          0.981       424
   macro avg      0.981     0.981     0.981       424
weighted avg      0.981     0.981     0.981       424

--- Time of execution : 9.945495843887329 seconds ---


<a id='11'></a>
## 11) [Stochastic Gradient Descent Classifier](<https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html>)

* Default hyperparameters

In [25]:
start = time.time()
originalclass = []
predictedclass = []
  
# Cross validate
clf_sgd = Pipeline([('scaler', StandardScaler()),('pca',PCA(n_components=10)), ('sgd', SGDClassifier(random_state=13))])

score = cross_val_score(clf_sgd, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))
print("--- Time of execution : %s seconds ---" % (time.time() - start))

              precision    recall  f1-score   support

      Benign      0.967     0.953     0.960       212
   Malignant      0.953     0.967     0.960       212

    accuracy                          0.960       424
   macro avg      0.960     0.960     0.960       424
weighted avg      0.960     0.960     0.960       424

--- Time of execution : 0.08829951286315918 seconds ---


* Νested Cross Validation with Grid Search for hyper-parameter tuning

In [26]:
start = time.time()
param_grid = {
    'sgd__average': [True, False],
    'sgd__l1_ratio': np.linspace(0, 1, num=10),
    'sgd__alpha': np.power(10, np.arange(-2, 1, dtype=float)),
    'sgd__random_state' : [13]
}

grid_search = GridSearchCV(clf_sgd, param_grid=param_grid, n_jobs=-1,cv=search_cv,verbose=0,scoring='f1_macro')

originalclass = []
predictedclass = []
  
# Nested Cross validation
score = cross_val_score(grid_search, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))
print("--- Time of execution : %s seconds ---" % (time.time() - start))

              precision    recall  f1-score   support

      Benign      0.963     0.986     0.974       212
   Malignant      0.986     0.962     0.974       212

    accuracy                          0.974       424
   macro avg      0.974     0.974     0.974       424
weighted avg      0.974     0.974     0.974       424

--- Time of execution : 4.022985219955444 seconds ---


<a id='12'></a>
## 12) [eXtreme Gradient Boosting](<https://xgboost.readthedocs.io/en/latest/parameter.html#learning-task-parameters>)

* Default hyperparameters

In [27]:
start = time.time()
originalclass = []
predictedclass = []
  
# Cross validate
clf_xgb = Pipeline([('scaler', StandardScaler()),('pca',PCA(n_components=10)), ('xgb', XGBClassifier(random_state=13))])

score = cross_val_score(clf_xgb, x_new, y_le, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))
print("--- Time of execution : %s seconds ---" % (time.time() - start))

              precision    recall  f1-score   support

      Benign      0.962     0.962     0.962       212
   Malignant      0.962     0.962     0.962       212

    accuracy                          0.962       424
   macro avg      0.962     0.962     0.962       424
weighted avg      0.962     0.962     0.962       424

--- Time of execution : 0.5850555896759033 seconds ---


* Νested Cross Validation with Randomized Search for hyper-parameter tuning

In [28]:
# https://www.kaggle.com/code/prashant111/a-guide-on-xgboost-hyperparameters-tuning/notebook
# https://www.cs.cornell.edu/courses/cs4780/2018sp/lectures/lecturenote19.html
# https://medium.com/data-design/xgboost-hi-im-gamma-what-can-i-do-for-you-and-the-tuning-of-regularization-a42ea17e6ab6

start = time.time()
param_grid = {
        'xgb__booster' : ['gbtree'],
        'xgb__validate_parameters' : [True],
        'xgb__learning_rate' : np.arange(0.01,1.1,0.05),
        'xgb__gamma' : np.arange(0,1.05,0.05),
        'xgb__max_depth' : np.arange(2,11,1),
        'xgb__min_child_weight' : np.arange(1,6,1),
        'xgb__max_delta_step' : np.arange(0,5,1),
        'xgb__subsample' : [0.5],
        'xgb__colsample_bylevel' : [1],
        'xgb__colsample_bynode' : [1],
        'xgb__colsample_bytree' : [1],
        'xgb__reg_lambda' : [0,1],
        'xgb__reg_alpha' : [0],
        'xgb__tree_method' : ['exact'],
        'xgb__scale_pos_weight' : [1],
        'xgb__objective' : ['binary:logistic'], # 'multi:softmax' -> same scores as 'binary:logistic' with grid search
        #'num_class' : [2],
        'xgb__n_estimators' : np.arange(100,210,10),
        'xgb__random_state' : [13]
    }

grid_search = RandomizedSearchCV(clf_xgb, param_distributions=param_grid, n_jobs=-1,cv=search_cv,verbose=1,scoring='f1_macro',n_iter=300)

originalclass = []
predictedclass = []
  
# Nested Cross validation
score = cross_val_score(grid_search, x_new, y_le, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)

Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits


In [29]:
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))
print("--- Time of execution : %s seconds ---" % (time.time() - start))

              precision    recall  f1-score   support

      Benign      0.959     0.981     0.970       212
   Malignant      0.981     0.958     0.969       212

    accuracy                          0.969       424
   macro avg      0.970     0.969     0.969       424
weighted avg      0.970     0.969     0.969       424

--- Time of execution : 217.93663454055786 seconds ---


<a id='13'></a>
## 13) [Light Gradient Boosting Machine](<https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html>)

* Default hyperparameters

In [30]:
start = time.time()
originalclass = []
predictedclass = []
  
# Cross validate
clf_lgbm = Pipeline([('scaler', StandardScaler()),('pca',PCA(n_components=10)), ('lgbm', lgbm.LGBMClassifier(random_state=13))])

score = cross_val_score(clf_lgbm, x_new, y_le, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))
print("--- Time of execution : %s seconds ---" % (time.time() - start))

              precision    recall  f1-score   support

      Benign      0.967     0.981     0.974       212
   Malignant      0.981     0.967     0.974       212

    accuracy                          0.974       424
   macro avg      0.974     0.974     0.974       424
weighted avg      0.974     0.974     0.974       424

--- Time of execution : 0.2835416793823242 seconds ---


* Νested Cross Validation with Randomized Search for hyper-parameter tuning

In [31]:
# https://lightgbm.readthedocs.io/en/latest/Parameters-Tuning.html
# https://neptune.ai/blog/lightgbm-parameters-guide
# https://www.youtube.com/watch?v=5CWwwtEM2TA&ab_channel=PyData & https://github.com/MSusik/newgradientboosting/blob/master/pydata.pdf

start = time.time()
param_grid = {
        'lgbm__boosting_type' : ['gbdt','dart'],
        'lgbm__num_leaves' : np.arange(5,55,1),
        'lgbm__max_depth' : np.arange(2,11,1),
        'lgbm__learning_rate' : np.arange(0.01,1.1,0.05),
        'lgbm__n_estimators' : np.arange(100,210,10),
        'lgbm__objective' : ['binary'],
        'lgbm__min_child_samples' : np.arange(10,35,5),
        'lgbm__subsample' : [0.5],
        'lgbm__reg_lambda' : [0,1],
        'lgbm__reg_alpha' : [0],
        'lgbm__subsample' : [0.5],
        'lgbm__colsample_bytree' : [1],
        'lgbm__scale_pos_weight' : [1],
        'lgbm__random_state' : [13]
    }

grid_search = RandomizedSearchCV(clf_lgbm, param_distributions=param_grid, n_jobs=-1,cv=search_cv,verbose=1,scoring='f1_macro',n_iter=300)

originalclass = []
predictedclass = []
  
# Nested Cross validation
score = cross_val_score(grid_search, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)

Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits
Fitting 5 folds for each of 300 candidates, totalling 1500 fits


In [32]:
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))
print("--- Time of execution : %s seconds ---" % (time.time() - start))

              precision    recall  f1-score   support

      Benign      0.963     0.976     0.970       212
   Malignant      0.976     0.962     0.969       212

    accuracy                          0.969       424
   macro avg      0.969     0.969     0.969       424
weighted avg      0.969     0.969     0.969       424

--- Time of execution : 90.93715715408325 seconds ---


<a id='14'></a>
## 14) [K-Nearest Neighbors Classifier](<https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html>)

* Default hyperparameters

In [33]:
start = time.time()
originalclass = []
predictedclass = []
  
# Cross validate
clf_knn = Pipeline([('scaler', StandardScaler()),('pca',PCA(n_components=10)), ('knn', KNeighborsClassifier())])

score = cross_val_score(clf_knn, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))
print("--- Time of execution : %s seconds ---" % (time.time() - start))

              precision    recall  f1-score   support

      Benign      0.946     0.986     0.965       212
   Malignant      0.985     0.943     0.964       212

    accuracy                          0.965       424
   macro avg      0.965     0.965     0.965       424
weighted avg      0.965     0.965     0.965       424

--- Time of execution : 0.1001896858215332 seconds ---


* Νested Cross Validation with Grid Search for hyper-parameter tuning

In [34]:
start = time.time()
param_grid = {
    'knn__n_neighbors': list(range(3,10)),
    'knn__weights': ['uniform','distance'],
    'knn__algorithm' : ['ball_tree', 'kd_tree', 'brute'],
    'knn__leaf_size': [10,20,30,40,50],
    'knn__p': [1,2],
    'knn__metric': ['minkowski','manhattan','chebyshev']
}

grid_search = GridSearchCV(clf_knn, param_grid=param_grid, n_jobs=-1,cv=search_cv,verbose=0,scoring='f1_macro')

originalclass = []
predictedclass = []
  
# Nested Cross validation
score = cross_val_score(grid_search, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))
print("--- Time of execution : %s seconds ---" % (time.time() - start))

              precision    recall  f1-score   support

      Benign      0.972     0.986     0.979       212
   Malignant      0.986     0.972     0.979       212

    accuracy                          0.979       424
   macro avg      0.979     0.979     0.979       424
weighted avg      0.979     0.979     0.979       424

--- Time of execution : 87.84296345710754 seconds ---


<a id='15'></a>
## 15) [Multi-layer Perceptron Classifier](<https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPClassifier.html>)

* Default hyperparameters

In [35]:
start = time.time()
originalclass = []
predictedclass = []
  
# Cross validate
clf_mlp =  Pipeline([('scaler', StandardScaler()),('pca',PCA(n_components=10)), ('scaler2', StandardScaler()),('mlp', MLPClassifier(shuffle=True,random_state=13))])
# scaling after pca showed better results
score = cross_val_score(clf_mlp, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))
print("--- Time of execution : %s seconds ---" % (time.time() - start))

              precision    recall  f1-score   support

      Benign      0.972     0.991     0.981       212
   Malignant      0.990     0.972     0.981       212

    accuracy                          0.981       424
   macro avg      0.981     0.981     0.981       424
weighted avg      0.981     0.981     0.981       424

--- Time of execution : 3.700441360473633 seconds ---


* Νested Cross Validation with Randomized Search for hyper-parameter tuning

* Tried a wider range of hyperparameters in nested cross validation at first , but over testing, worst attempts were removed (those in comments). Finally, when few hyperparameters remained, they were tested separately with a 10-fold cross-validation.

1. Solver : ADAM

In [36]:
# http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
# https://www.cs.cornell.edu/courses/cs4780/2018fa/lectures/lecturenote20.pdf
# https://stats.stackexchange.com/questions/181/how-to-choose-the-number-of-hidden-layers-and-nodes-in-a-feedforward-neural-netw
# https://datascience.stackexchange.com/questions/36049/how-to-adjust-the-hyperparameters-of-mlp-classifier-to-get-more-perfect-performa
# https://scikit-learn.org/stable/auto_examples/neural_networks/plot_mlp_alpha.html
param_grid = {
    'mlp__hidden_layer_sizes' : [(20,)] , #[(20,),(100,),(20,4),(10,10,)] 
    'mlp__activation' : ['relu'], #['logistic','tanh','relu']
    'mlp__solver' : ['adam'],
    'mlp__alpha' : [0.1], #10.0 ** -np.arange(1, 7) , np.linspace(0.1,3,20) , 1 ,*0.1
    'mlp__batch_size' : [99], # ~297 samples for train at inner cv -> 3 equal size batches , #[16,32,64,128,'auto']
    'mlp__learning_rate' : ['constant'],
    'mlp__learning_rate_init' : [0.01,0.001],
    'mlp__max_iter' : [1000],
    'mlp__verbose' : [1]
}

search_adam = GridSearchCV(clf_mlp, param_grid=param_grid, n_jobs=-1,cv=search_cv,verbose=1,scoring='f1_macro')

originalclass = []
predictedclass = []
  
# Nested Cross validation
score = cross_val_score(search_adam, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)

Fitting 5 folds for each of 2 candidates, totalling 10 fits
Iteration 1, loss = 0.60904345
Iteration 2, loss = 0.51329005
Iteration 3, loss = 0.43427708
Iteration 4, loss = 0.37003938
Iteration 5, loss = 0.31461902
Iteration 6, loss = 0.26791543
Iteration 7, loss = 0.22981482
Iteration 8, loss = 0.19866344
Iteration 9, loss = 0.17438068
Iteration 10, loss = 0.15387658
Iteration 11, loss = 0.13839645
Iteration 12, loss = 0.12554637
Iteration 13, loss = 0.11533133
Iteration 14, loss = 0.10708479
Iteration 15, loss = 0.09933705
Iteration 16, loss = 0.09376646
Iteration 17, loss = 0.08873488
Iteration 18, loss = 0.08502769
Iteration 19, loss = 0.08117158
Iteration 20, loss = 0.07786950
Iteration 21, loss = 0.07522183
Iteration 22, loss = 0.07285588
Iteration 23, loss = 0.07105131
Iteration 24, loss = 0.06917150
Iteration 25, loss = 0.06729218
Iteration 26, loss = 0.06573170
Iteration 27, loss = 0.06471634
Iteration 28, loss = 0.06329278
Iteration 29, loss = 0.06245844
Iteration 30, loss = 

In [37]:
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))

              precision    recall  f1-score   support

      Benign      0.981     0.991     0.986       212
   Malignant      0.990     0.981     0.986       212

    accuracy                          0.986       424
   macro avg      0.986     0.986     0.986       424
weighted avg      0.986     0.986     0.986       424



In [38]:
originalclass = []
predictedclass = []
  
# Cross validate
clf_mlp_adam =  Pipeline(steps=[('scaler', StandardScaler()), ('pca', PCA(n_components=10)),
                ('scaler2', StandardScaler()),
                ('mlp',MLPClassifier(alpha=0.1, batch_size=99,
                               hidden_layer_sizes=(20,),
                               learning_rate_init=0.01, max_iter=1000,
                               random_state=13, verbose=0))])

score = cross_val_score(clf_mlp_adam, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))

              precision    recall  f1-score   support

      Benign      0.981     0.986     0.984       212
   Malignant      0.986     0.981     0.983       212

    accuracy                          0.983       424
   macro avg      0.984     0.983     0.983       424
weighted avg      0.984     0.983     0.983       424



2. Solver : SGD

In [39]:
# http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
# https://www.cs.cornell.edu/courses/cs4780/2018fa/lectures/lecturenote20.html
# https://stats.stackexchange.com/questions/181/how-to-choose-the-number-of-hidden-layers-and-nodes-in-a-feedforward-neural-netw
# https://datascience.stackexchange.com/questions/36049/how-to-adjust-the-hyperparameters-of-mlp-classifier-to-get-more-perfect-performa
param_grid = {
    'mlp__hidden_layer_sizes' : [(20,)], # [(20,),(100,),(20,4),(10,10,)]
    'mlp__activation' : ['relu'], #['logistic','tanh','relu'] 
    'mlp__solver' : ['sgd'],
    'mlp__alpha' : [0.1], #10.0 ** -np.arange(1, 7) , np.linspace(0.1,3,20) , 1 ,*0.1
    'mlp__batch_size' : [99], # ~297 samples for train at inner cv -> 3 equal size batches , #[16,32,64,128,'auto']
    'mlp__learning_rate' : ['constant','adaptive'], # [*'constant','invscaling','adaptive']
    'mlp__learning_rate_init' : [0.01],
    'mlp__power_t' : [0.5],
    'mlp__momentum' : [0.2], # np.arange(0.1,1,0.1), *0.2
    'mlp__max_iter' : [1000],
    'mlp__verbose' : [1]
}

search_sgd = RandomizedSearchCV(clf_mlp, param_distributions=param_grid, n_jobs=-1,cv=search_cv,verbose=1,scoring='f1_macro',n_iter=100)

originalclass = []
predictedclass = []
  
# Nested Cross validation
score = cross_val_score(search_sgd, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)

Fitting 5 folds for each of 2 candidates, totalling 10 fits
Iteration 1, loss = 0.64557034
Iteration 2, loss = 0.63982398
Iteration 3, loss = 0.63416835
Iteration 4, loss = 0.62863543
Iteration 5, loss = 0.62320076
Iteration 6, loss = 0.61778218
Iteration 7, loss = 0.61248881
Iteration 8, loss = 0.60722685
Iteration 9, loss = 0.60209668
Iteration 10, loss = 0.59696720
Iteration 11, loss = 0.59199538
Iteration 12, loss = 0.58708415
Iteration 13, loss = 0.58222086
Iteration 14, loss = 0.57735901
Iteration 15, loss = 0.57266623
Iteration 16, loss = 0.56787383
Iteration 17, loss = 0.56320912
Iteration 18, loss = 0.55862619
Iteration 19, loss = 0.55405714
Iteration 20, loss = 0.54954477
Iteration 21, loss = 0.54513753
Iteration 22, loss = 0.54088514
Iteration 23, loss = 0.53639636
Iteration 24, loss = 0.53213378
Iteration 25, loss = 0.52786982
Iteration 26, loss = 0.52361705
Iteration 27, loss = 0.51945330
Iteration 28, loss = 0.51539597
Iteration 29, loss = 0.51132100
Iteration 30, loss = 

In [40]:
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))

              precision    recall  f1-score   support

      Benign      0.972     0.995     0.984       212
   Malignant      0.995     0.972     0.983       212

    accuracy                          0.983       424
   macro avg      0.984     0.983     0.983       424
weighted avg      0.984     0.983     0.983       424



In [41]:
originalclass = []
predictedclass = []
  
# Cross validate
clf_mlp_sgd =  Pipeline(steps=[('scaler', StandardScaler()), ('pca', PCA(n_components=10)),('scaler2', StandardScaler()),
                ('mlp',MLPClassifier(solver='sgd',alpha=0.1, batch_size=99,
                               hidden_layer_sizes=(20,),learning_rate='constant',
                               learning_rate_init=0.01, max_iter=1000,momentum=0.2,
                               random_state=13, verbose=0))])

score = cross_val_score(clf_mlp_sgd, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))

              precision    recall  f1-score   support

      Benign      0.972     0.995     0.984       212
   Malignant      0.995     0.972     0.983       212

    accuracy                          0.983       424
   macro avg      0.984     0.983     0.983       424
weighted avg      0.984     0.983     0.983       424



3. Solver : LBFGS

In [42]:
# http://yann.lecun.com/exdb/publis/pdf/lecun-98b.pdf
# https://www.cs.cornell.edu/courses/cs4780/2018fa/lectures/lecturenote20.html
# https://stats.stackexchange.com/questions/181/how-to-choose-the-number-of-hidden-layers-and-nodes-in-a-feedforward-neural-netw
# https://datascience.stackexchange.com/questions/36049/how-to-adjust-the-hyperparameters-of-mlp-classifier-to-get-more-perfect-performa
param_grid = {
    'mlp__hidden_layer_sizes' : [(20,)],  #[(20,),(100,),(20,4),(10,10,)]
    'mlp__activation' : ['relu'],
    'mlp__solver' : ['lbfgs'],
    'mlp__alpha' : [1], #10.0 ** -np.arange(1, 7) , np.linspace(0.1,3,20) , 1
    'mlp__max_iter' : np.arange(300,500,10) , # np.arange(100,300,10) , [100,200,500,1000]
}

search_lbfgs = RandomizedSearchCV(clf_mlp, param_distributions=param_grid, n_jobs=-1,cv=search_cv,verbose=1,scoring='f1_macro',n_iter=1000)

originalclass = []
predictedclass = []
  
# Nested Cross validation
score = cross_val_score(search_lbfgs, x_new, y, scoring=make_scorer(classification_report_with_accuracy_score),cv=cv)

Fitting 5 folds for each of 20 candidates, totalling 100 fits
Fitting 5 folds for each of 20 candidates, totalling 100 fits
Fitting 5 folds for each of 20 candidates, totalling 100 fits
Fitting 5 folds for each of 20 candidates, totalling 100 fits
Fitting 5 folds for each of 20 candidates, totalling 100 fits
Fitting 5 folds for each of 20 candidates, totalling 100 fits
Fitting 5 folds for each of 20 candidates, totalling 100 fits
Fitting 5 folds for each of 20 candidates, totalling 100 fits
Fitting 5 folds for each of 20 candidates, totalling 100 fits
Fitting 5 folds for each of 20 candidates, totalling 100 fits


In [43]:
print(classification_report(originalclass, predictedclass, target_names=target_names, digits=3))

              precision    recall  f1-score   support

      Benign      0.968     0.986     0.977       212
   Malignant      0.986     0.967     0.976       212

    accuracy                          0.976       424
   macro avg      0.977     0.976     0.976       424
weighted avg      0.977     0.976     0.976       424



<a id='16'></a>
## 16) Summary

* Below are the tables of the specific feature selection method.
* The performance of the algorithms is in descending order.
* All the results are the average values of a 10-fold cross validation.
* The columns contain the accuracy and the average values of precision, recall and f1 score.
* It is observed that the number of samples of Βenign and Μalignant cancer are equal (212 respectively), so the weighted average and the macro average are equal.

<table style="text-align:center">
    <tr>
        <th colspan="6"> PCA : Default algorithms</th>
    </tr>
    <tr>
        <th></th>
        <th>precision </th>
        <th>recall</th>
        <th>f1 score</th>
        <th>accuracy</th>
        <th>Execution Time<br>
        (seconds)</th>  
    </tr>
    <tr>
        <th>MLP</th>
        <td>0.981</td>
        <td>0.981</td>
        <td>0.981</td>
        <td>0.981</td>
        <td>3.700</td>
    </tr>
    <tr>
        <th>SVC</th>
        <td>0.981</td>
        <td>0.981</td>
        <td>0.981</td>
        <td>0.981</td>
        <td>0.093</td>
    </tr>
    <tr>
        <th>LGBM</th>
        <td>0.974</td>
        <td>0.974</td>
        <td>0.974</td>
        <td>0.974</td>
        <td>0.284</td>
    </tr>
    <tr>
        <th>KNN</th>
        <td>0.965</td>
        <td>0.965</td>
        <td>0.965</td>
        <td>0.965</td>
        <td>0.100</td>
    </tr>
    <tr>
        <th>Ridge</th>
        <td>0.964</td>
        <td>0.962</td>
        <td>0.962</td>
        <td>0.962</td>
        <td>0.100</td>
    </tr>
    <tr>
        <th>LDA</th>
        <td>0.964</td>
        <td>0.962</td>
        <td>0.962</td>
        <td>0.962</td>
        <td>0.089</td>
    </tr>
     <tr>
        <th>XGBoost</th>
        <td>0.962</td>
        <td>0.962</td>
        <td>0.962</td>
        <td>0.962</td>
        <td>0.585</td>
    </tr>
    <tr>
        <th>QDA</th>
        <td>0.960</td>
        <td>0.960</td>
        <td>0.960</td>
        <td>0.960</td>
        <td>0.085</td>
    </tr>
    <tr>
        <th>SGD</th>
        <td>0.960</td>
        <td>0.960</td>
        <td>0.960</td>
        <td>0.960</td>
        <td>0.088</td>
    </tr>
    <tr>
        <th>AdaBoost</th>
        <td>0.960</td>
        <td>0.960</td>
        <td>0.960</td>
        <td>0.960</td>
        <td>0.803</td>
    </tr>
    <tr>
        <th>Random Forest</th>
        <td>0.955</td>
        <td>0.955</td>
        <td>0.955</td>
        <td>0.955</td>
        <td>1.253</td>
    </tr>
    <tr>
        <th>GNB</th>
        <td>0.935</td>
        <td>0.934</td>
        <td>0.934</td>
        <td>0.934</td>
        <td>0.111</td>
    </tr>
    <tr>
        <th>Decision Tree</th>
        <td>0.934</td>
        <td>0.934</td>
        <td>0.934</td>
        <td>0.934</td>
        <td>0.954</td>
    </tr>

</table>

<table style="text-align:center">
    <tr>
        <th colspan="6"> PCA : Tuned algorithms</th>
    </tr>
    <tr>
        <th></th>
        <th>precision </th>
        <th>recall</th>
        <th>f1 score</th>
        <th>accuracy</th> 
        <th>Execution Time<br>
        (seconds)</th> 
    </tr>
    <tr>
        <th>MLP</th>
        <td>0.984</td>
        <td>0.983</td>
        <td>0.983</td>
        <td>0.983</td>
        <td>trial and error</td>
    </tr>
    <tr>
        <th>SVC</th>
        <td>0.981</td>
        <td>0.981</td>
        <td>0.981</td>
        <td>0.981</td>
        <td>9.945</td>
    </tr>
    <tr>
        <th>KNN</th>
        <td>0.979</td>
        <td>0.979</td>
        <td>0.979</td>
        <td>0.979</td>
        <td>87.843</td>
    </tr>
    <tr>
        <th>SGD</th>
        <td>0.974</td>
        <td>0.974</td>
        <td>0.974</td>
        <td>0.974</td>
        <td>4.023</td>
    </tr>
    <tr>
        <th>QDA</th>
        <td>0.972</td>
        <td>0.972</td>
        <td>0.972</td>
        <td>0.972</td>
        <td>27.984</td>
    </tr>
    <tr>
        <th>LGBM</th>
        <td>0.967</td>
        <td>0.967</td>
        <td>0.967</td>
        <td>0.967</td>
        <td>90.937</td>
    </tr>
    <tr>
        <th>LDA</th>
        <td>0.964</td>
        <td>0.962</td>
        <td>0.962</td>
        <td>0.962</td>
        <td>6.809</td>
    </tr>
    <tr>
        <th>Ridge</th>
        <td>0.964</td>
        <td>0.962</td>
        <td>0.962</td>
        <td>0.962</td>
        <td>32.047</td>
    </tr>
    <tr>
        <th>XGBoost</th>
        <td>0.955</td>
        <td>0.955</td>
        <td>0.955</td>
        <td>0.955</td>
        <td>217.937</td>
    </tr>
    <tr>
        <th>AdaBoost</th>
        <td>0.953</td>
        <td>0.953</td>
        <td>0.953</td>
        <td>0.953</td>
        <td>41.335</td>
    </tr>
    <tr>
        <th>Random Forest</th>
        <td>0.948</td>
        <td>0.948</td>
        <td>0.948</td>
        <td>0.948</td>
        <td>148.473</td>
    </tr>
    <tr>
        <th>GNB</th>
        <td>0.937</td>
        <td>0.936</td>
        <td>0.936</td>
        <td>0.936</td>
        <td>9.095</td>
    </tr>
    <tr>
        <th>Decision Tree</th>
        <td>0.929</td>
        <td>0.929</td>
        <td>0.929</td>
        <td>0.929</td>
        <td>51.518</td>
    </tr>

</table>

- As it is seen, some models perform better with default parameters. This can happen for various reasons such as:
    - Unlucky selection of hyperparameters from random search
    - Hyperparameters selected cause overfitting
    - Smaller training sample in the inner loop due to nested cross-validation

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
- Sources for nested cross validation :
    1. [Cross-Validation and Hyperparameter Search in scikit-learn - A Complete Guide](<https://dev.to/balapriya/cross-validation-and-hyperparameter-search-in-scikit-learn-a-complete-guide-5ed8>)
    2. [Nested Cross Validation for Algorithm Selection](<https://vitalflux.com/python-nested-cross-validation-algorithm-selection/>)
    3. [Nested Cross-Validation for Machine Learning with Python](<https://machinelearningmastery.com/nested-cross-validation-for-machine-learning-with-python/>)
    4. [Nested cross validation for model selection](<https://stats.stackexchange.com/questions/65128/nested-cross-validation-for-model-selection/65158#65158>)
    5. [scikit-learn GridSearchCV with multiple repetitions](<https://stackoverflow.com/questions/42228735/scikit-learn-gridsearchcv-with-multiple-repetitions/42230764#42230764>)
    6. [Model selection done right: A gentle introduction to nested cross-validation](<https://ploomber.io/blog/nested-cv/>)
    7. [How to obtain optimal hyperparameters after nested cross validation?](<https://stats.stackexchange.com/questions/254612/how-to-obtain-optimal-hyperparameters-after-nested-cross-validation>)
    8. [Cross-validation for parameter tuning, model selection, and feature selection](<https://github.com/justmarkham/scikit-learn-videos/blob/master/07_cross_validation.ipynb>)
- Sources for Hyper Parameter-Optimization :
    1. [Random Search for Hyper-Parameter Optimization](<https://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf>)
    2. [Hyperparameter tuning for machine learning models](<https://www.jeremyjordan.me/hyperparameter-tuning/>)
- Sources for code :
    - All sources are in comments at each code part