## <span style="color:Purple">**Exercise**</span>
Use the iris flower dataset from the sklearn library and apply cross_val_score to each of the following models to measure its performance. Finally, identify the model with the best performance:

1. Logistic Regression
2. SVM
3. Decision Tree
4. Random Forest

In [1]:
import numpy as np
import matplotlib.pyplot as plt 
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

In [2]:
iris=load_iris()
dir(iris)

['DESCR',
 'data',
 'data_module',
 'feature_names',
 'filename',
 'frame',
 'target',
 'target_names']

In [3]:
iris.target, iris.target_names

(array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
        0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
        1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
        2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2]),
 array(['setosa', 'versicolor', 'virginica'], dtype='<U10'))
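The printed target array is sorted by class. Counting the labels with np.bincount (a quick check added here, not part of the original exercise) confirms that the dataset is balanced with 50 samples per species. The sorted order becomes important when the data is split into folds later on.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
y = iris.target

# Count how many samples carry each class label (0, 1, 2)
counts = np.bincount(y)
print(dict(zip(iris.target_names.tolist(), counts.tolist())))
# {'setosa': 50, 'versicolor': 50, 'virginica': 50}

# The labels are stored in sorted order: all 0s, then all 1s, then all 2s
is_sorted = bool(np.all(np.diff(y) >= 0))
print('Sorted by class:', is_sorted)
```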

In [4]:
X=iris.data
y=iris.target

In [5]:
X_train, X_test, y_train, y_test=train_test_split(X,y, test_size=0.3, random_state=42)

len(X_train), len(X_test)

(105, 45)
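The split above is random, so the three classes are not guaranteed to be equally represented among the 45 test samples. As a sketch, passing stratify=y (a standard train_test_split parameter that the original notebook does not use) preserves the class proportions. For iris, that means exactly 15 test samples per class.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Unstratified split, as in the notebook
_, _, _, y_test_plain = train_test_split(X, y, test_size=0.3, random_state=42)

# Stratified split: each class keeps its share of y in both train and test
_, _, _, y_test_strat = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

print('Unstratified test counts:', np.bincount(y_test_plain).tolist())
print('Stratified test counts:  ', np.bincount(y_test_strat).tolist())  # [15, 15, 15]
```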

In [6]:
# Fit the model on the training data and return its accuracy on the test data

def get_scores(model, X_train, X_test, y_train, y_test):
    model.fit(X_train, y_train)
    return model.score(X_test, y_test)
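For classifiers, model.score returns the mean accuracy on the given data. This makes get_scores equivalent to predicting on the test split and calling accuracy_score. The minimal check below uses a DecisionTreeClassifier as the example model. The random_state=0 argument is an assumption added for reproducibility.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# score() on a classifier is the accuracy of predict() on the given data
via_score = model.score(X_test, y_test)
via_metric = accuracy_score(y_test, model.predict(X_test))
print(via_score, via_metric)
```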

## **Logistic Regression**

In [7]:
get_scores(LogisticRegression(solver='liblinear', multi_class='auto'), X_train, X_test, y_train, y_test)

0.9777777777777777

## **Support Vector Machine**

In [8]:
get_scores(SVC(C=5, kernel='linear'), X_train, X_test, y_train, y_test)

0.9777777777777777

## **Decision Tree Classifier**

In [9]:
get_scores(DecisionTreeClassifier(criterion='gini'), X_train, X_test, y_train, y_test)

1.0

## **Random Forest Classifier**

In [10]:
get_scores(RandomForestClassifier(n_estimators=100, criterion='gini'), X_train, X_test, y_train, y_test)

1.0
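A single split with 45 test samples gives a noisy estimate. One misclassified sample changes the accuracy by about 0.022, and the result depends on which samples land in the test set. The sketch below repeats the Decision Tree evaluation over a few arbitrary random_state values (chosen only for illustration) to show this spread. Cross-validation is meant to average that spread out.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

scores = []
for seed in range(10):  # arbitrary seeds, for illustration only
    # Each seed produces a different 105/45 split of the same 150 samples
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=seed)
    model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
    scores.append(model.score(X_test, y_test))

print('Min accuracy:', min(scores))
print('Max accuracy:', max(scores))
```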

## **Using Stratified K-Fold Cross-Validation**

In [11]:
from sklearn.model_selection import StratifiedKFold         # each fold keeps the class proportions of y
folds=StratifiedKFold(n_splits=4)

scores_lr=[]
scores_svm=[]
scores_tree=[]
scores_rfcl=[]

for train_index, test_index in folds.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    scores_lr.append(get_scores(LogisticRegression(solver='liblinear', multi_class='auto'), X_train, X_test, y_train, y_test))
    scores_svm.append(get_scores(SVC(C=5, kernel='linear'), X_train, X_test, y_train, y_test))
    scores_tree.append(get_scores(DecisionTreeClassifier(criterion='gini'), X_train, X_test, y_train, y_test))
    scores_rfcl.append(get_scores(RandomForestClassifier(n_estimators=100, criterion='gini'), X_train, X_test, y_train, y_test))

print('Logistic Regression scores:', scores_lr)
print('SVM scores:', scores_svm)
print('Decision Tree scores:', scores_tree)
print('Random Forest scores:', scores_rfcl)

Logistic Regression scores: [1.0, 0.9473684210526315, 0.8648648648648649, 1.0]
SVM scores: [1.0, 0.9473684210526315, 0.972972972972973, 1.0]
Decision Tree scores: [0.9736842105263158, 0.9473684210526315, 0.9459459459459459, 1.0]
Random Forest scores: [0.9736842105263158, 0.9473684210526315, 0.9459459459459459, 1.0]
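Because the iris targets are sorted by class, stratification matters here. The comparison sketch below is not part of the original exercise. It first prints the per-class counts in each StratifiedKFold test fold. It then shows what a plain, unshuffled KFold with 3 splits does: each test fold holds a single class that never appears in its training folds, so every prediction is wrong. LogisticRegression(max_iter=1000) with its default solver is an assumption used for this demonstration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)

# StratifiedKFold: every test fold contains all three classes in similar numbers
fold_counts = []
for _, test_index in StratifiedKFold(n_splits=4).split(X, y):
    counts = np.bincount(y[test_index], minlength=3).tolist()
    fold_counts.append(counts)
    print('Test fold class counts:', counts)

# Plain KFold without shuffling: on class-sorted data each test fold is one class
# that is absent from the corresponding training folds
plain_scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=KFold(n_splits=3))
print('Unshuffled KFold scores:', plain_scores)
```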


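## **Using the cross_val_score Function**
For classifiers, passing an integer cv to cross_val_score uses StratifiedKFold (without shuffling) by default. With cv=4, the folds are therefore the same as in the manual loop above, which is why the Logistic Regression and SVM scores below match the earlier ones exactly. The Decision Tree and Random Forest scores can vary from run to run because those models are not given a random_state.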
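The default can be checked directly. In the sketch below, cv=4 and an explicit StratifiedKFold(n_splits=4) give identical scores for a deterministic model. LogisticRegression(max_iter=1000) with its default solver stands in for the notebook's liblinear configuration, so its scores may differ slightly from the liblinear results in this notebook.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Integer cv: for a classifier, scikit-learn builds a StratifiedKFold splitter
default_scores = cross_val_score(model, X, y, cv=4)

# The same splitter passed explicitly (shuffle=False is its default)
explicit_scores = cross_val_score(model, X, y, cv=StratifiedKFold(n_splits=4))

print(default_scores)
print('Identical:', np.array_equal(default_scores, explicit_scores))
```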

In [12]:
from sklearn.model_selection import cross_val_score

In [13]:
# Logistic Regression performance using cross_val_score
cross_val_score(LogisticRegression(solver='liblinear', multi_class='auto'), X, y, cv=4)


array([1.        , 0.94736842, 0.86486486, 1.        ])

In [14]:
# SVM performance using cross_val_score
cross_val_score(SVC(C=5, kernel='linear'), X, y, cv=4)

array([1.        , 0.94736842, 0.97297297, 1.        ])

In [15]:
# Decision Tree performance using cross_val_score
cross_val_score(DecisionTreeClassifier(criterion='gini'), X, y, cv=4)

array([0.97368421, 0.94736842, 0.94594595, 0.97297297])

In [16]:
# Random Forest performance using cross_val_score
cross_val_score(RandomForestClassifier(n_estimators=100, criterion='gini'), X, y, cv=4)

array([0.97368421, 0.94736842, 0.91891892, 1.        ])
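Comparing four arrays by eye is error-prone, so a small summary helps answer the exercise's final question. The sketch below computes the mean and standard deviation of the 4-fold scores for each model. Two assumptions differ from the cells above. First, it adds random_state=0 to the tree-based models so that reruns give the same numbers. Second, it uses LogisticRegression(max_iter=1000) in place of the liblinear configuration. As a result, some figures may differ slightly from the outputs above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

models = {
    'Logistic Regression': LogisticRegression(max_iter=1000),
    'SVM': SVC(C=5, kernel='linear'),
    'Decision Tree': DecisionTreeClassifier(criterion='gini', random_state=0),
    'Random Forest': RandomForestClassifier(n_estimators=100, criterion='gini', random_state=0),
}

means = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=4)  # stratified 4-fold split
    means[name] = scores.mean()
    print(f'{name:20s} mean={scores.mean():.3f} std={scores.std():.3f}')

# Model with the highest mean cross-validated accuracy
best = max(means, key=means.get)
print('Best mean accuracy:', best)
```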

## **Parameter Tuning using K-Fold Cross-Validation**

In [17]:
scores1=cross_val_score(SVC(C=5, gamma='auto'), X, y, cv=5)
np.average(scores1)

0.9800000000000001

In [18]:
scores2=cross_val_score(SVC(C=10, kernel='linear'), X, y, cv=5)
np.average(scores2)

0.9733333333333334

In [21]:
scores3=cross_val_score(SVC(C=5, kernel='poly'), X, y, cv=5)
np.average(scores3)

0.9666666666666666
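Trying configurations one cell at a time works, but scikit-learn's GridSearchCV can run the same search systematically. The sketch below uses it as a swapped-in tool; the notebook itself does not. It evaluates every combination in an illustrative grid with 5-fold cross-validation. The grid values are assumptions, chosen so that the grid includes the three configurations tried above. With cv=5, GridSearchCV uses the same stratified splits as cross_val_score, so best_score_ should be at least the 0.98 obtained above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Illustrative grid; the linear kernel ignores gamma, so some candidates repeat
param_grid = {
    'C': [1, 5, 10],
    'kernel': ['rbf', 'linear', 'poly'],
    'gamma': ['auto', 'scale'],
}

search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)  # 18 candidates x 5 folds, then refits the best one on all data

print('Best parameters:', search.best_params_)
print('Best mean CV accuracy:', round(search.best_score_, 4))
```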

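### **Conclusion:**
- Across the four models, the linear SVM (C=5) gives the highest mean 4-fold cross_val_score accuracy (about 0.980). The Decision Tree and Random Forest each average about 0.960, and Logistic Regression averages about 0.953.
- Using cross_val_score to tune the SVM, an RBF kernel (the SVC default) with C=5 and gamma='auto' gave the best mean 5-fold accuracy (0.980). A linear kernel with C=10 reached 0.973, and a polynomial kernel with C=5 reached 0.967. Here C is the inverse of the regularization strength, and these gaps correspond to only one or two of the 150 samples.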