# Machine Learning: Finding the Optimal Model and Hyperparameters

**Hyperparameters** - configuration settings of an algorithm that are not learned from the data but rather set by the user before training.   
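As a minimal sketch of the distinction, the snippet below sets hyperparameters on a `LogisticRegression` before training and then inspects the weights learned during `fit()`. The values `C=0.5` and `max_iter=1000` are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# C and max_iter are hyperparameters: chosen by the user before training.
# (The values here are illustrative assumptions.)
model = LogisticRegression(C=0.5, max_iter=1000)
print(model.get_params()["C"])

# coef_ and intercept_ are parameters: they only exist after fit(),
# because they are learned from the data.
model.fit(X, y)
print(model.coef_.shape)  # one row of weights per class, one column per feature
```
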

**GridSearchCV** - automates training and evaluating a model with every combination of hyperparameters in a grid, using cross-validation, and reports the best combination.
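To get a feel for how much work a grid search implies, the sketch below uses `ParameterGrid` to enumerate the combinations of a small grid. The grid itself is an assumed example (an SVM with three `C` values and two kernels); the count of fits excludes the final refit that `GridSearchCV` performs on the full data when `refit=True`, which is the default.

```python
from sklearn.model_selection import ParameterGrid

# Assumed example grid: 3 values of C x 2 kernels.
param_grid = {"C": [1, 10, 20], "kernel": ["rbf", "linear"]}

# ParameterGrid expands the dictionary into every combination.
combinations = list(ParameterGrid(param_grid))
for params in combinations:
    print(params)

# Each combination is trained once per cross-validation fold.
cv_folds = 5
n_fits = len(combinations) * cv_folds
print(f"{len(combinations)} combinations x {cv_folds} folds = {n_fits} fits")
```

The number of fits grows multiplicatively with every hyperparameter added to the grid, which is why an exhaustive search becomes expensive quickly.
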

### Exercise
Try different classifiers and find the one that gives the best performance. Also find the optimal hyperparameters for that classifier.

In [7]:
from sklearn.datasets import load_iris
iris = load_iris()

In [8]:
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.naive_bayes import MultinomialNB

In [9]:
model_params = {
    'svm': {
        'model': SVC(gamma='auto'),
        'params': {
            'C': [1, 10, 20],
            'kernel': ['rbf', 'linear']
        }
    },
    'knn': {
        'model': KNeighborsClassifier(),
        'params': {
            'n_neighbors': [3, 5, 10]
        }
    },
    'decision_tree': {
        'model': DecisionTreeClassifier(),
        'params': {
            'criterion': ['gini', 'entropy']
        }
    },
    'random_forest': {
        'model': RandomForestClassifier(),
        'params': {
            'n_estimators': [1, 5, 10]
        }
    },
    'logistic_regression': {
        'model': LogisticRegression(solver='liblinear'),  # multi_class='auto' is the default and is deprecated in recent scikit-learn releases
        'params': {
            'C': [1, 5, 10],
        }
    },
    'naive_bayes_gaussian': {
        'model': GaussianNB(),
        'params': {}
    },
    'naive_bayes_multinomial': {
        'model': MultinomialNB(),
        'params': {}
    }
}

In [10]:
from sklearn.model_selection import GridSearchCV
import pandas as pd
scores = []

for model_name, mp in model_params.items():
    clf = GridSearchCV(mp['model'], mp['params'], cv=5, return_train_score=False)
    clf.fit(iris.data, iris.target)
    scores.append({
        'model': model_name,
        'best_score': clf.best_score_,
        'best_params': clf.best_params_
    })

df = pd.DataFrame(scores, columns=['model', 'best_score', 'best_params'])
df

| | model | best_score | best_params |
|---|---|---|---|
| 0 | svm | 0.98 | {'C': 1, 'kernel': 'rbf'} |
| 1 | knn | 0.98 | {'n_neighbors': 10} |
| 2 | decision_tree | 0.966667 | {'criterion': 'gini'} |
| 3 | random_forest | 0.96 | {'n_estimators': 1} |
| 4 | logistic_regression | 0.966667 | {'C': 5} |
| 5 | naive_bayes_gaussian | 0.953333 | {} |
| 6 | naive_bayes_multinomial | 0.953333 | {} |


**For the iris flower classification, SVM (C=1, kernel=rbf) achieves the best cross-validated score of 0.98, tied with KNN (n_neighbors=10).**
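Note that `best_score_` is the mean accuracy across the cross-validation folds, not a score on a separate test set. The sketch below illustrates this by re-scoring the winning SVM configuration with `cross_val_score` on the same 5-fold split; the variable names are chosen for this example only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Same SVM grid as in the exercise above.
search = GridSearchCV(
    SVC(gamma="auto"),
    {"C": [1, 10, 20], "kernel": ["rbf", "linear"]},
    cv=5,
)
search.fit(X, y)

# Re-score the winning configuration with the same 5-fold scheme.
best_model = SVC(gamma="auto", **search.best_params_)
fold_scores = cross_val_score(best_model, X, y, cv=5)

print(search.best_params_, search.best_score_)
print(fold_scores, fold_scores.mean())

# The two numbers agree: best_score_ is the mean of the per-fold scores.
print(np.isclose(search.best_score_, fold_scores.mean()))
```

The fitted `search.best_estimator_` is the winning model retrained on all of the data, so it can be used directly for predictions.
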

## RandomizedSearchCV
RandomizedSearchCV reduces the number of iterations by trying only a random sample of parameter combinations instead of the full grid. This is useful when there are too many parameters to try and training takes a long time, because it lowers the computational cost.

In [11]:
from sklearn.model_selection import RandomizedSearchCV

In [12]:
rscv = RandomizedSearchCV(SVC(gamma='auto'), {
    'C': [1, 10, 20],
    'kernel': ['rbf', 'linear']
    },
    cv=5,
    return_train_score=False,
    n_iter=2
)

rscv.fit(iris.data, iris.target)
df2 = pd.DataFrame(rscv.cv_results_)[['param_C', 'param_kernel', 'mean_test_score']]
df2

| | param_C | param_kernel | mean_test_score |
|---|---|---|---|
| 0 | 1 | rbf | 0.98 |
| 1 | 20 | linear | 0.966667 |
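Because the combinations are sampled at random, the rows shown above can differ from run to run unless `random_state` is fixed. A randomized search can also draw values from a continuous distribution rather than a fixed list. The following is a minimal sketch of both ideas; the `loguniform(0.1, 100)` range for `C`, `n_iter=5`, and `random_state=0` are assumptions made for illustration.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_distributions = {
    # Assumed range: sample C on a log scale between 0.1 and 100.
    "C": loguniform(1e-1, 1e2),
    # Lists are sampled uniformly.
    "kernel": ["rbf", "linear"],
}

search = RandomizedSearchCV(
    SVC(gamma="auto"),
    param_distributions,
    n_iter=5,          # only 5 sampled combinations are evaluated
    cv=5,
    random_state=0,    # fixes the sampled combinations across runs
)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```

With `n_iter=5` and `cv=5` this trains 25 models plus the final refit, regardless of how wide the sampled ranges are.
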
