# ML1 - Evaluation Measures

In this experiment, I use the Iris dataset and train a standard Logistic Regression classifier.

In the following code, there is NO exploratory data analysis, pre-processing, data cleansing or data transformation. The goal here is only to examine the different Evaluation Measures.

## Part 1: Import

```python
from sklearn.datasets import load_iris
iris = load_iris()
X, y = iris.data, iris.target

# scikit-learn 0.20-0.21 emits a FutureWarning for LogisticRegression about
# upcoming changes to its default solver and multi_class settings. I preferred to mute it.
import warnings
warnings.filterwarnings("ignore", category=FutureWarning)
```

## Part 2: Split Data Set

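One way to proceed with the machine learning task is to **split** the dataset into a **training** set and a **test** set. Here 30% of the rows are held out for testing, and `stratify=y` keeps the class proportions the same in both parts.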

```python
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, test_size=0.3)
logistic = LogisticRegression()
logistic.fit(X_train, y_train)
```

```
LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
          intercept_scaling=1, max_iter=100, multi_class='warn',
          n_jobs=None, penalty='l2', random_state=None, solver='warn',
          tol=0.0001, verbose=0, warm_start=False)
```
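The snippet below is a quick check of what `stratify=y` does. It reloads the data. As an added assumption, it fixes `random_state=0` so the split is reproducible, because the cell above does not set a seed. Iris has 50 flowers per class, so a stratified 70/30 split should leave 35 flowers per class for training and 15 per class for testing.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# random_state=0 is an assumption added for reproducibility; the original cell sets no seed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.3, random_state=0
)

# Count how many samples of each class ended up in each part.
print("train size:", len(y_train), "class counts:", np.bincount(y_train))
print("test size: ", len(y_test), "class counts:", np.bincount(y_test))
```

Without `stratify=y`, the per-class counts in the test set can drift away from 15, which matters more on small or imbalanced datasets.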

## Part 3: K-Fold Cross-Validation

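Another approach is k-fold cross-validation. The training data is divided into k folds. The model is trained on k-1 folds and evaluated on the remaining one, and this repeats until every fold has served once as the evaluation set. The reported score is the mean over the k folds, together with its standard deviation. We define the number of folds (`n_splits`, passed to `cross_val_score` as `cv=kfold`) and the seed that controls how rows are shuffled into folds (`random_state`).

Classification problems offer many metrics. Some apply only to binary tasks, while others also handle multiclass tasks. Iris has three classes, so this experiment uses metrics that support multiclass problems.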

```python
from sklearn import model_selection
model = LogisticRegression()

# Classification accuracy
seed = 7
# random_state only takes effect when shuffle=True; scikit-learn 0.24 and later
# raise an error if random_state is set while shuffle is False.
kfold = model_selection.KFold(n_splits=10, shuffle=True, random_state=seed)
scoring = 'accuracy'
results = model_selection.cross_val_score(model, X_train, y_train, cv=kfold, scoring=scoring)
print("Accuracy: %.3f with Standard Deviation: %.3f" % (results.mean(), results.std()))

# Logarithmic loss (scikit-learn negates it so that higher is always better)
scoring = 'neg_log_loss'
results = model_selection.cross_val_score(model, X_train, y_train, cv=kfold, scoring=scoring)
print("Log Loss: %.3f with Standard Deviation: %.3f" % (results.mean(), results.std()))

# ROC AUC: the plain 'roc_auc' scorer expects a binary target, so this call
# fails on the three-class Iris data and is left commented out.
scoring = 'roc_auc'
#results = model_selection.cross_val_score(model, X_train, y_train, cv=kfold, scoring=scoring)
```

```
Accuracy: 0.962 with Standard Deviation: 0.047
Log Loss: -0.361 with Standard Deviation: 0.067
```
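To make the k-fold procedure concrete, this minimal sketch runs the folds by hand and compares the scores with `cross_val_score`. It makes a few assumptions that are not in the original. It uses the full Iris data instead of `X_train`. It sets `max_iter=1000` to avoid convergence warnings. It passes `shuffle=True` so that `random_state` has an effect.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)  # max_iter=1000 is an assumption
kfold = KFold(n_splits=10, shuffle=True, random_state=7)

# Manual k-fold: each fold is held out once while the model trains on the other k-1 folds.
manual_scores = []
for train_idx, test_idx in kfold.split(X):
    fold_model = clone(model)  # fresh, unfitted copy for every fold
    fold_model.fit(X[train_idx], y[train_idx])
    preds = fold_model.predict(X[test_idx])
    manual_scores.append(accuracy_score(y[test_idx], preds))
manual_scores = np.array(manual_scores)

# cross_val_score runs the same loop internally.
auto_scores = cross_val_score(model, X, y, cv=kfold, scoring="accuracy")

print("manual:         ", manual_scores.round(3))
print("cross_val_score:", auto_scores.round(3))
print("mean %.3f, std %.3f" % (manual_scores.mean(), manual_scores.std()))
```

Because `random_state` is an integer, `kfold.split` produces the same folds on every call, so the two score arrays should match fold by fold.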


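- **Accuracy:** The fraction of predictions that are correct. It works for both binary and multiclass tasks, but it can be misleading when classes are imbalanced.
- **Precision:** How precise the guesses for a class are. Among all cases predicted as that class, it is the percentage that truly belong to it. For example, if you diagnose 10 patients with cancer and 9 of them are actually ill, precision is 9/10 = 90%.
- **Recall:** Among all cases that truly belong to a class, it is the percentage the model identified. For example, if 20 patients have cancer and you diagnose only 9 of them, recall is 9/20 = 45%.
- **F1 score:** The harmonic mean of precision and recall, `2 * precision * recall / (precision + recall)`. It is high only when both precision and recall are high.
- **Log loss:** Measures the quality of the predicted probabilities. Confident but wrong predictions are penalized heavily, and lower is better. scikit-learn's `neg_log_loss` scorer returns the negative value, which is why the cross-validated log loss above is negative.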
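The cancer numbers above can be reproduced with scikit-learn's metric functions. The population of 100 patients is a hypothetical assumption. Only the counts from the text matter: 20 ill patients, 10 diagnoses, and 9 of those diagnoses correct.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Hypothetical screening of 100 patients: the first 20 have cancer (1), the other 80 are healthy (0).
y_true = [1] * 20 + [0] * 80

# The model flags 10 patients: 9 of the 20 ill ones and 1 healthy one.
y_pred = [1] * 9 + [0] * 11 + [1] * 1 + [0] * 79

print("precision:", precision_score(y_true, y_pred))  # 9 / 10
print("recall:   ", recall_score(y_true, y_pred))     # 9 / 20
print("F1:       ", f1_score(y_true, y_pred))         # harmonic mean of the two
print("accuracy: ", accuracy_score(y_true, y_pred))   # (9 + 79) / 100
```

Accuracy comes out at 88% even though more than half of the ill patients are missed. This is why precision and recall are usually reported alongside it.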


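- **ROC AUC:** Useful when we care about how well the model orders cases by predicted probability, for example from the patients most likely to have cancer to those least likely. The area under the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A value of 1.0 means a perfect ordering, and 0.5 is no better than random. With a high ROC AUC, checking the patients ranked as most likely ill first is an effective strategy. The plain `'roc_auc'` scorer only supports binary classification. Multiclass problems need a one-vs-rest or one-vs-one extension, available as the `'roc_auc_ovr'` and `'roc_auc_ovo'` scorers in scikit-learn 0.22 and later.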
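The ROC AUC line in the cross-validation cell is commented out because the plain `'roc_auc'` scorer expects a binary target. Below is a minimal sketch of a multiclass alternative. It assumes scikit-learn 0.22 or later for the `'roc_auc_ovr'` scorer. It also swaps in `StratifiedKFold`, which the original does not use, so that every test fold contains all three classes, as the one-vs-rest AUC requires. The sketch uses the full Iris data.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)  # max_iter=1000 is an assumption

# StratifiedKFold puts 5 flowers of each class into every 15-sample test fold.
skfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=7)

# 'roc_auc_ovr' computes one AUC per class (that class vs. the rest) and
# averages them, so it accepts a multiclass target.
results = cross_val_score(model, X, y, cv=skfold, scoring="roc_auc_ovr")
print("ROC AUC (OvR): %.3f with Standard Deviation: %.3f" % (results.mean(), results.std()))
```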

## Part 4: Confusion Matrix

```python
from sklearn.metrics import confusion_matrix

model = LogisticRegression()
model.fit(X_train, y_train)
predicted = model.predict(X_test)
matrix = confusion_matrix(y_test, predicted)
print(matrix)
```


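```
[[15  0  0]
 [ 0 12  3]
 [ 0  0 15]]
```

Rows are the true classes and columns are the predicted classes. All test flowers of classes 0 and 2 are classified correctly, while 3 of the 15 class-1 flowers are predicted as class 2.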
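As a worked example, the per-class precision, recall and F1 in the classification report of Part 5 can be derived by hand from this confusion matrix. In scikit-learn's `confusion_matrix`, rows are true classes and columns are predicted classes. Precision therefore divides each diagonal entry by its column sum, and recall divides it by its row sum. The matrix values are copied from the output above, and the sketch uses only NumPy.

```python
import numpy as np

# Confusion matrix from the output above: rows = true class, columns = predicted class.
cm = np.array([[15, 0, 0],
               [0, 12, 3],
               [0, 0, 15]])

tp = np.diag(cm)                 # correct predictions per class
precision = tp / cm.sum(axis=0)  # column sums: everything predicted as that class
recall = tp / cm.sum(axis=1)     # row sums: everything that truly is that class
f1 = 2 * precision * recall / (precision + recall)
accuracy = tp.sum() / cm.sum()   # all correct predictions over all predictions

print("precision:", precision.round(2))
print("recall:   ", recall.round(2))
print("f1:       ", f1.round(2))
print("accuracy: %.2f" % accuracy)
```

This gives precision 1.00, 1.00, 0.83 and recall 1.00, 0.80, 1.00 for classes 0, 1 and 2. The 3 misclassified flowers lower the recall of class 1 and the precision of class 2.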


## Part 5: Classification Report

```python
from sklearn.metrics import classification_report

predicted = model.predict(X_test)
report = classification_report(y_test, predicted)
print(report)
```

```
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       1.00      0.80      0.89        15
           2       0.83      1.00      0.91        15

   micro avg       0.93      0.93      0.93        45
   macro avg       0.94      0.93      0.93        45
weighted avg       0.94      0.93      0.93        45
```
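The three average rows summarize the per-class scores in different ways. *Micro* pools the counts of all classes. *Macro* takes the unweighted mean of the per-class scores. *Weighted* weights each class by its support. The sketch below rebuilds label arrays with the same counts as the confusion matrix in Part 4. The ordering of the samples is hypothetical and does not affect the result. It then recomputes each average with `precision_recall_fscore_support`.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Hypothetical label arrays with the confusion-matrix counts: class 0 all correct,
# class 1 has 12 correct and 3 predicted as class 2, class 2 all correct.
y_true = np.array([0] * 15 + [1] * 15 + [2] * 15)
y_pred = np.array([0] * 15 + [1] * 12 + [2] * 3 + [2] * 15)

for average in ("micro", "macro", "weighted"):
    p, r, f, _ = precision_recall_fscore_support(y_true, y_pred, average=average)
    print("%-8s precision %.2f  recall %.2f  f1 %.2f" % (average, p, r, f))

# For single-label multiclass data, the micro averages equal plain accuracy.
print("accuracy %.2f" % accuracy_score(y_true, y_pred))
```

In single-label multiclass data, every misclassification is both a false positive for one class and a false negative for another. As a result, micro-averaged precision, recall and F1 all equal accuracy (42/45, about 0.93 here). Newer scikit-learn versions print a single `accuracy` row in place of `micro avg`. Because every class has the same support (15), the weighted and macro averages coincide.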

