# Defining the problem

### What is the input?





The input data describes the outcome of each speed dating session, based on the profiles of the two participants.

*   It consists of 191 features that describe each speed dating session.
*   The dataset is otherwise clean, but it has many missing values that need to be handled during preprocessing.




### What is the output?


The output is the probability that two people match.


*   A predicted probability (a float between 0 and 1) that the dating session leads to a successful match.




### What data mining function is required?

Based on this part of the course slides:


```
Data Mining Functions
1. Generalization and Summarization
2. Association and Correlation
3. Classification & Prediction
4. Clustering
5. Outlier/Anomaly Analysis
6. Time and Ordering 
7. Structure and Network Analysis
```

This problem calls for Classification & Prediction, applied after the data has been preprocessed.


### What could be the challenges?

The challenges are:


*   Missing data
*   The dataset is highly unbalanced (mostly unmatched)
*   Searching for the best hyperparameters
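
To make the imbalance challenge concrete, here is a minimal sketch with made-up labels. The 16% match rate is an illustrative assumption, not a figure measured on this dataset. A "model" that always predicts "no match" still gets high accuracy while never finding a single match, which is one reason to evaluate with a ranking metric such as ROC AUC instead.

```python
# Hypothetical labels: 1 = match, 0 = no match (the ratio is an assumption)
y_true = [1] * 16 + [0] * 84

# A "model" that always predicts the majority class
y_pred = [0] * len(y_true)

# Share of predictions that are correct
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Share of real matches that the model actually finds
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(y_true)

print(f"accuracy = {accuracy:.2f}")
print(f"recall   = {recall:.2f}")
```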








### What is the impact?

Using the raw data as it is, without cleaning and preprocessing, would produce a model with low accuracy that does not learn what we want from the data during the training stage.


---

The real-life impact of a model that solves this problem is a recommendation system that better matches people at speed dating events, based on a survey that people fill in about themselves and their preferred partner and on previously collected data.
<br/>
This saves the time they would otherwise waste together if they are not a match.



### What is an ideal solution?

The ideal solution is to clean and preprocess the data before working with it.



> Some of the possible solutions are:



*   Fill the missing data with an appropriate value, choosing the imputation strategy through hyperparameter tuning
*   Drop useless columns (features) to reduce the dimensionality (feature selection)
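
The two ideas above can be sketched on a tiny made-up frame. The column names, the 0.5 missing-share threshold, and the median strategy are illustrative assumptions, not the settings used later in this notebook.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Tiny made-up frame; column names and values are illustrative only
df = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 28.0],
    "attr": [7.0, 6.0, np.nan, 9.0],
    "mostly_empty": [np.nan, np.nan, np.nan, 1.0],
})

# Drop columns whose share of missing values exceeds the (assumed) threshold
missing_share = df.isna().mean()
kept = df.loc[:, missing_share <= 0.5]

# Fill the remaining gaps with the column median
imputer = SimpleImputer(strategy="median")
filled = pd.DataFrame(imputer.fit_transform(kept), columns=kept.columns)

print(filled)
```

In the notebook itself the imputation strategy is not fixed by hand like this; it is one of the values searched during hyperparameter tuning.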








### What is the experimental protocol used and how was it carried out? 

After loading, cleaning, and preprocessing the data, the experimental protocol is k-fold cross-validation (set through the `cv` argument) inside GridSearchCV, RandomizedSearchCV, or BayesSearchCV,
<br/>
with performance measured by ROC AUC (`scoring='roc_auc'`).
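
As a reminder of what the `roc_auc` score measures, here is a small hand-rolled sketch: the share of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The helper name `roc_auc` and the toy scores are assumptions for illustration; the notebook itself relies on scikit-learn's built-in scorer.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))

# Toy labels and predicted probabilities (made up)
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, scores)
print(auc)  # 0.75: three of the four positive/negative pairs are ordered correctly
```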


### What preprocessing steps are used?


*   View the data to understand it
*   Use df.info() to get more insight into the data
*   Check for missing data using df.isna().sum()
*   Convert all object columns to categorical columns
*   Extract the names of the numeric and categorical features
*   Define a pipeline for numeric feature preprocessing that applies StandardScaler
*   Define a pipeline for categorical feature preprocessing that applies OneHotEncoder
*   Define the preprocessor and specify which pipeline handles the numeric and which the categorical features

*   Use hyperparameter tuning to find:
  -   the appropriate strategy for filling the missing data
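
The steps above can be sketched end to end on a toy frame. The two columns (`age`, `field`) and the `fill_value` are made up for illustration; the real notebook builds the column lists from the dataset's dtypes.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame with one numeric and one categorical column (not the real data)
df = pd.DataFrame({
    "age": [25.0, np.nan, 31.0, 28.0],
    "field": pd.Series(["law", "math", np.nan, "law"], dtype=object),
})

numeric_cols = ["age"]
categorical_cols = ["field"]

# Numeric columns: impute, then standardize
numeric_pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="mean")),
    ("scaler", StandardScaler()),
])

# Categorical columns: impute with a placeholder, then one-hot encode
categorical_pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="constant", fill_value="missing")),
    ("onehot", OneHotEncoder(handle_unknown="ignore")),
])

preprocessor = ColumnTransformer([
    ("num", numeric_pipe, numeric_cols),
    ("cat", categorical_pipe, categorical_cols),
])

Xt = preprocessor.fit_transform(df)
# 1 scaled numeric column + 3 one-hot columns (law, math, missing)
print(Xt.shape)
```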


# Get Started (Importing packages & Loading the data)


## Import packages 

In [None]:
# this line is for BayesSearchCV and using skopt package
!pip install scikit-optimize

Collecting scikit-optimize
  Downloading scikit_optimize-0.9.0-py2.py3-none-any.whl (100 kB)
Collecting pyaml>=16.9
  Downloading pyaml-21.10.1-py2.py3-none-any.whl (24 kB)
Installing collected packages: pyaml, scikit-optimize
Successfully installed pyaml-21.10.1 scikit-optimize-0.9.0


In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
from time import time

from sklearn.compose import ColumnTransformer
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (classification_report, confusion_matrix, roc_curve,
                             precision_score, recall_score, f1_score,
                             precision_recall_curve)
from sklearn.model_selection import (train_test_split, cross_val_score, GridSearchCV,
                                     KFold, RandomizedSearchCV, PredefinedSplit)
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import StandardScaler, LabelEncoder, OneHotEncoder
from sklearn.svm import SVC, LinearSVC
from xgboost.sklearn import XGBClassifier

from skopt import BayesSearchCV
from skopt.space import Real, Categorical, Integer

sns.set()

In [None]:
pd.set_option("display.max_rows", 20)

## Load Data

In [None]:
# Loading the data from csv files

train = pd.read_csv('data/train.csv')
test = pd.read_csv('data/test.csv')

In [None]:
# Look at first records of the data 
train.head()

Unnamed: 0,gender,idg,condtn,wave,round,position,positin1,order,partner,pid,...,sinc3_3,intel3_3,fun3_3,amb3_3,attr5_3,sinc5_3,intel5_3,fun5_3,amb5_3,id
0,0,3,2,14,18,2,2.0,14,12,372.0,...,,,,,,,,,,2583
1,1,14,1,3,10,2,,8,8,63.0,...,8.0,8.0,7.0,8.0,,,,,,6830
2,1,14,1,13,10,8,8.0,10,10,331.0,...,,,,,,,,,,4840
3,1,38,2,9,20,18,13.0,6,7,200.0,...,9.0,8.0,8.0,6.0,,,,,,5508
4,1,24,2,14,20,6,6.0,20,17,357.0,...,,,,,,,,,,4828


In [None]:
# show the information of the train dataset
train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5909 entries, 0 to 5908
Columns: 192 entries, gender to id
dtypes: float64(173), int64(11), object(8)
memory usage: 8.7+ MB


# Data Preprocessing

In [None]:
# count the total number of missing values in the train dataset
train.isna().sum().sum()

304971

In [None]:
# Convert all object columns to categorical column (categorical encoding)
train[train.select_dtypes(['object']).columns] = train.select_dtypes(['object']).apply(lambda x: x.astype('category'))

In [None]:
# show the information of the train dataset
train.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5909 entries, 0 to 5908
Columns: 192 entries, gender to id
dtypes: category(8), float64(173), int64(11)
memory usage: 8.5 MB


In [None]:
# split the train data into features and the label column

y = train['match'] 
X = train.drop(columns=['match', 'id'])
print('original shape', X.shape, y.shape)

original shape (5909, 190) (5909,)


In [None]:
# extracting numeric features and categorical features names

# numeric features 
features_numeric = list(X.select_dtypes(include=['float64', 'int64']))

# categorical features 
features_categorical = list(X.select_dtypes(include=['category']))

print('numeric features:', features_numeric)
print('categorical features:', features_categorical)

numeric features: ['gender', 'idg', 'condtn', 'wave', 'round', 'position', 'positin1', 'order', 'partner', 'pid', 'int_corr', 'samerace', 'age_o', 'race_o', 'pf_o_att', 'pf_o_sin', 'pf_o_int', 'pf_o_fun', 'pf_o_amb', 'pf_o_sha', 'attr_o', 'sinc_o', 'intel_o', 'fun_o', 'amb_o', 'shar_o', 'like_o', 'prob_o', 'met_o', 'age', 'field_cd', 'race', 'imprace', 'imprelig', 'goal', 'date', 'go_out', 'career_c', 'sports', 'tvsports', 'exercise', 'dining', 'museums', 'art', 'hiking', 'gaming', 'clubbing', 'reading', 'tv', 'theater', 'movies', 'concerts', 'music', 'shopping', 'yoga', 'exphappy', 'expnum', 'attr1_1', 'sinc1_1', 'intel1_1', 'fun1_1', 'amb1_1', 'shar1_1', 'attr4_1', 'sinc4_1', 'intel4_1', 'fun4_1', 'amb4_1', 'shar4_1', 'attr2_1', 'sinc2_1', 'intel2_1', 'fun2_1', 'amb2_1', 'shar2_1', 'attr3_1', 'sinc3_1', 'fun3_1', 'intel3_1', 'amb3_1', 'attr5_1', 'sinc5_1', 'intel5_1', 'fun5_1', 'amb5_1', 'attr', 'sinc', 'intel', 'fun', 'amb', 'shar', 'like', 'prob', 'met', 'match_es', 'attr1_s', 'sin

In [None]:

np.random.seed(0)

# define a pipeline for numeric feature preprocessing
# (SimpleImputer defaults to strategy='mean'; the strategy is tuned later)
transformer_numeric = Pipeline(
    steps=[
        ('imputer', SimpleImputer()),
        ('scaler', StandardScaler())]
)

# define a pipeline for categorical feature preprocessing
transformer_categorical = Pipeline(
    steps=[
        ('imputer', SimpleImputer(strategy='constant')),
        ('onehot', OneHotEncoder(handle_unknown='ignore'))
    ]
)


# define the preprocessor
# we also specify which columns are numeric and which are categorical
preprocessor = ColumnTransformer(
    transformers=[
        ('num', transformer_numeric, features_numeric),
        ('cat', transformer_categorical, features_categorical)
    ]
)




# Saving Prediction Result


In [None]:
# define a function that saves the predictions of each trial to a csv file
def saveResult(test, classifier, fileName):
    submission = pd.DataFrame()

    # keep the ids so each prediction can be matched to its test row
    submission['id'] = test['id']

    # predict_proba returns one column per class; column 1 is P(match = 1)
    submission['match'] = classifier.predict_proba(test.drop(columns=['id']))[:, 1]

    submission.to_csv(fileName, index=False)

# Tuning Pipeline

## RandomForestClassifier Pipeline with **GridSearchCV**

Thoughts and observations for trial 0, plan for trial 1:

<br/>

I used **RandomForestClassifier** in a pipeline that contains the classifier and the preprocessor object created in the preprocessing step,
with **GridSearchCV** to find the hyperparameters that give the best ROC AUC.
<br/>

The hyperparameters used in this trial:
* 'imputer__strategy': ['mean'] => strategy for filling the missing data
* 'n_estimators': [20, 30, 40, 50]  
* 'max_depth': [5, 10, 20, 30]
<br/>

I expected to get the hyperparameters that reach the global optimum (within the given grid) and produce the best score among all combinations.
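
The hyperparameter names passed to the search follow scikit-learn's `<step>__<parameter>` convention, and nested steps chain the same way (for example `preprocessor__num__imputer__strategy`). A quick way to list the valid names is `get_params()`. The sketch below uses a flatter, hypothetical two-step pipeline rather than the notebook's full preprocessor, only to show the naming.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline

# Simplified stand-in for the notebook's pipeline (illustrative only)
pipe = Pipeline([
    ("imputer", SimpleImputer()),
    ("my_classifier", RandomForestClassifier()),
])

# Every tunable name has the form <step>__<parameter>
names = sorted(pipe.get_params().keys())

print([n for n in names if n.startswith("imputer__")])
print("my_classifier__max_depth" in names)
```

A key that is missing from this list raises an error when the search is fitted, so printing the names first is a cheap sanity check.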



In [None]:
# combine the preprocessor with the model as a full tunable pipeline
full_pipline = Pipeline(
    steps=[
        ('preprocessor', preprocessor),
        ('my_classifier', 
           RandomForestClassifier(),
        )
    ]
)

In [None]:
# fitting the pipeline
# The pipeline object can be used like any sk-learn model  
full_pipline = full_pipline.fit(X, y)

In [None]:

# specifying the search space (hyperparameters)
param_grid = {
    'preprocessor__num__imputer__strategy': ['mean'],
    'my_classifier__n_estimators': [20, 30, 40, 50],  
    'my_classifier__max_depth':[5, 10, 20, 30]       
}

# four-fold cross-validation
grid_search = GridSearchCV(
    full_pipline, param_grid, cv=4, verbose=1, n_jobs=2, 
    scoring='roc_auc')

grid_search.fit(X, y)

print('best score {}'.format(grid_search.best_score_))
print('best params {}'.format(grid_search.best_params_))

Fitting 4 folds for each of 16 candidates, totalling 64 fits
best score 0.8436019397678725
best params {'my_classifier__max_depth': 5, 'my_classifier__n_estimators': 50, 'preprocessor__num__imputer__strategy': 'mean'}


In [None]:
# call saveResult function to save the predicted result to csv file
saveResult(test, grid_search, 'RandomForestClassifier_Pipeline_with_GridSearchCV.csv')

## GradientBoostingClassifier Pipeline with **GridSearchCV**

Thoughts and observations for trial 1, plan for trial 2:

<br/>

I used **GradientBoostingClassifier** in a pipeline that contains the classifier and the preprocessor object created in the preprocessing step,
with **GridSearchCV** to find the hyperparameters that give the best ROC AUC.
<br/>

The hyperparameters used in this trial:
* 'imputer__strategy': ['mean', 'median'] => strategy for filling the missing data
* 'n_estimators': [250, 500, 750]  
* 'max_depth': [3, 5, 7, 9]
* 'learning_rate': [0.01, 0.1, 1]  

<br/>

I expected to get the hyperparameters that reach the global optimum (within the given grid) and produce the best score among all combinations.
<br/>
I also expected a better score than in the previous trial.



In [None]:
# combine the preprocessor with the model as a full tunable pipeline
full_pipline = Pipeline(
    steps=[
        ('preprocessor', preprocessor),
        ('my_classifier', 
           GradientBoostingClassifier(),
        )
    ]
)

In [None]:
# fitting the pipeline
# The pipeline object can be used like any sk-learn model  
full_pipline = full_pipline.fit(X, y)

In [None]:

# specifying the search space (hyperparameters)
param_grid = {
    'preprocessor__num__imputer__strategy': ['mean', 'median'],
    'my_classifier__n_estimators': [250, 500, 750],  
    'my_classifier__max_depth': [3, 5, 7, 9],
    'my_classifier__learning_rate': [0.01, 0.1, 1]       
}

# three-fold cross-validation
grid_search = GridSearchCV(
    full_pipline, param_grid, cv=3, verbose=1, n_jobs=2, 
    scoring='roc_auc')

grid_search.fit(X, y)

print('best score {}'.format(grid_search.best_score_))
print('best params {}'.format(grid_search.best_params_))

Fitting 3 folds for each of 36 candidates, totalling 108 fits
best score 0.8736446887339357
best params {'my_classifier__learning_rate': 0.01, 'my_classifier__max_depth': 5, 'my_classifier__n_estimators': 750, 'preprocessor__num__imputer__strategy': 'mean'}


In [None]:
# call saveResult function to save the predicted result to csv file
saveResult(test, grid_search, 'GradientBoostingClassifier_Pipeline_with_GridSearchCV.csv')

## XGBClassifier Pipeline with **GridSearchCV**

Thoughts and observations for trial 2, plan for trial 3:

<br/>

I used **XGBClassifier** in a pipeline that contains the classifier and the preprocessor object created in the preprocessing step,
with **GridSearchCV** to find the hyperparameters that give the best ROC AUC.
<br/>

The hyperparameters used in this trial:
* 'imputer__strategy': ['mean', 'median', 'most_frequent']  => strategy for filling the missing data
* 'n_estimators': [50, 100, 200]  
* 'max_depth': [2, 7, 10]  

<br/>

I expected to get the hyperparameters that reach the global optimum (within the given grid) and produce the best score among all combinations.



In [None]:
# combine the preprocessor with the model as a full tunable pipeline
full_pipline = Pipeline(
    steps=[
        ('preprocessor', preprocessor),
        ('my_classifier', 
           XGBClassifier(),
        )
    ]
)

In [None]:
# fitting the pipeline
# The pipeline object can be used like any sk-learn model  
full_pipline = full_pipline.fit(X, y)

In [None]:

# specifying the search space (hyperparameters)
param_grid = {
    'preprocessor__num__imputer__strategy': ['mean', 'median', 'most_frequent'],
    'my_classifier__n_estimators': [50, 100, 200],  
    'my_classifier__max_depth':[2, 7, 10]       
}

# four-fold cross-validation
grid_search = GridSearchCV(
    full_pipline, param_grid, cv=4, verbose=1, n_jobs=2, 
    scoring='roc_auc')

grid_search.fit(X, y)

print('best score {}'.format(grid_search.best_score_))
print('best params {}'.format(grid_search.best_params_))

Fitting 4 folds for each of 27 candidates, totalling 108 fits
best score 0.881179071152443
best params {'my_classifier__max_depth': 7, 'my_classifier__n_estimators': 100, 'preprocessor__num__imputer__strategy': 'mean'}


In [None]:
# call saveResult function to save the predicted result to csv file
saveResult(test, grid_search, 'XGBClassifier_Pipeline_with_GridSearchCV.csv')

## XGBClassifier Pipeline with **RandomizedSearchCV**

Thoughts and observations for trial 3, plan for trial 4:

<br/>

I used **XGBClassifier** in a pipeline that contains the classifier and the preprocessor object created in the preprocessing step,
with **RandomizedSearchCV** to find the best hyperparameters among a fixed number (`n_iter`) of randomly sampled combinations.
<br/>

The hyperparameters used in this trial:
* 'imputer__strategy': ['mean', 'median', 'most_frequent']  => strategy for filling the missing data
* 'n_estimators': [50, 100, 200]  
* 'max_depth': [2, 7, 10]  

<br/>

I expected a lower score than GridSearchCV with the same classifier.
<br/>
I also expected hyperparameters that reach a *local optimum* (within the given range), i.e. the best score among the randomly selected combinations.
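
To see how much of the grid a randomized search can cover, the sketch below counts the full set of combinations and draws a random subset with scikit-learn's `ParameterGrid` and `ParameterSampler`. The `n_iter=20` value mirrors this trial; `random_state=0` is an assumption added here for reproducibility.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

param_grid = {
    "preprocessor__num__imputer__strategy": ["mean", "median", "most_frequent"],
    "my_classifier__n_estimators": [50, 100, 200],
    "my_classifier__max_depth": [2, 7, 10],
}

# Grid search would evaluate every combination: 3 * 3 * 3
all_candidates = list(ParameterGrid(param_grid))

# Random search evaluates only n_iter of them; when every entry is a list,
# the sampler draws without replacement
sampled = list(ParameterSampler(param_grid, n_iter=20, random_state=0))

print(len(all_candidates))  # 27
print(len(sampled))         # 20
```

With 20 of 27 combinations sampled, the random search covers most of this small grid, so its best score can easily end up close to the grid search result.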



In [None]:
# combine the preprocessor with the model as a full tunable pipeline
full_pipline = Pipeline(
    steps=[
        ('preprocessor', preprocessor),
        ('my_classifier', 
           XGBClassifier(),
        )
    ]
)

In [None]:
# fitting the pipeline
# The pipeline object can be used like any sk-learn model  
full_pipline = full_pipline.fit(X, y)

In [None]:

# specifying the search space (hyperparameters)
param_grid = {
    'preprocessor__num__imputer__strategy': ['mean', 'median', 'most_frequent'],
    'my_classifier__n_estimators': [50, 100, 200],  
    'my_classifier__max_depth':[2, 7, 10]       
}

# five-fold cross-validation
grid_search = RandomizedSearchCV(
    full_pipline, param_grid, cv=5, verbose=1, n_jobs=2, 
    # number of random trials
    n_iter=20,
    scoring='roc_auc')

grid_search.fit(X, y)

print('best score {}'.format(grid_search.best_score_))
print('best params {}'.format(grid_search.best_params_))

Fitting 5 folds for each of 20 candidates, totalling 100 fits
best score 0.8819554153240737
best params {'preprocessor__num__imputer__strategy': 'most_frequent', 'my_classifier__n_estimators': 200, 'my_classifier__max_depth': 7}


In [None]:
# call saveResult function to save the predicted result to csv file
saveResult(test, grid_search, 'XGBClassifier_Pipeline_with_RandomizedSearchCV.csv')

## SVM Pipeline with **RandomizedSearchCV**

Thoughts and observations for trial 4, plan for trial 5:

<br/>

I used an **SVM classifier** in a pipeline that contains the classifier and the preprocessor object created in the preprocessing step,
with **RandomizedSearchCV** to find the best hyperparameters among a fixed number (`n_iter`) of randomly sampled combinations.
<br/>

The hyperparameters used in this trial:
* 'imputer__strategy': ['mean', 'median', 'most_frequent']  => strategy for filling the missing data
* 'kernel': ['linear', 'rbf', 'poly']
* 'C': [0.001, 0.01, 0.1, 1, 10, 100, 200]
* 'gamma': [0.1, 0.5, 0.7, 1]
* 'degree': [1, 2, 3, 4, 5]  

<br/>


I expected hyperparameters that reach a *local optimum* (within the given range), i.e. the best score among the randomly selected combinations.



In [None]:
# combine the preprocessor with the model as a full tunable pipeline
full_pipline = Pipeline(
    steps=[
        ('preprocessor', preprocessor),
        ('my_classifier', 
           SVC(probability=True, class_weight='balanced'),
        )
    ]
)

In [None]:
# fitting the pipeline
# The pipeline object can be used like any sk-learn model  
full_pipline = full_pipline.fit(X, y)

In [None]:

# specifying the search space (hyperparameters)
param = {
    'preprocessor__num__imputer__strategy': ['mean', 'median', 'most_frequent'],
    'my_classifier__kernel': ['linear', 'rbf', 'poly'],
    'my_classifier__C': [0.001, 0.01, 0.1, 1, 10, 100, 200],
    'my_classifier__gamma': [0.1, 0.5, 0.7, 1],
    'my_classifier__degree': [1, 2, 3, 4, 5]       
}

# five-fold cross-validation
grid_search = RandomizedSearchCV(
    full_pipline, param, cv=5, verbose=1, n_jobs=2, 
    # number of random trials
    n_iter=20,
    scoring='roc_auc')

grid_search.fit(X, y)

print('best score {}'.format(grid_search.best_score_))
print('best params {}'.format(grid_search.best_params_))

Fitting 5 folds for each of 20 candidates, totalling 100 fits
best score 0.8636563477471924
best params {'preprocessor__num__imputer__strategy': 'mean', 'my_classifier__kernel': 'poly', 'my_classifier__gamma': 0.1, 'my_classifier__degree': 1, 'my_classifier__C': 0.01}


In [None]:
# call saveResult function to save the predicted result to csv file
saveResult(test, grid_search, 'SVC_Pipeline_with_RandomizedSearchCV.csv')

## SVM Pipeline with **BayesSearchCV**

Thoughts and observations for trial 5, plan for trial 6:

<br/>

I used an **SVM classifier** in a pipeline that contains the classifier and the preprocessor object created in the preprocessing step,
with **BayesSearchCV** to find the best hyperparameters within the specified number of iterations, using Bayesian learning to predict which hyperparameter values to try next given the trials so far.
<br/>

The hyperparameters used in this trial:
* 'imputer__strategy': ['mean', 'median', 'most_frequent']  => strategy for filling the missing data
* 'kernel': ['linear', 'rbf', 'poly']
* 'C': [0.001, 0.01, 0.1, 1, 10, 100, 200]
* 'gamma': [0.1, 0.5, 0.7, 1]
* 'degree': [1, 2, 3, 4, 5]  

<br/>

I expected a better score than RandomizedSearchCV with the same classifier.
<br/>
I also expected hyperparameters that reach a *local optimum* (within the given range), i.e. the best score among the generated combinations.



In [None]:
# combine the preprocessor with the model as a full tunable pipeline
full_pipline = Pipeline(
    steps=[
        ('preprocessor', preprocessor),
        ('my_classifier', 
           SVC(probability=True, class_weight='balanced'),
        )
    ]
)

In [None]:
# fitting the pipeline
# The pipeline object can be used like any sk-learn model  
full_pipline = full_pipline.fit(X, y)

In [None]:

# specifying the search space (hyperparameters)
param = {
    'preprocessor__num__imputer__strategy': ['mean', 'median', 'most_frequent'],
    'my_classifier__kernel': ['linear', 'rbf', 'poly'],
    'my_classifier__C': [0.001, 0.01, 0.1, 1, 10, 100, 200],
    'my_classifier__gamma': [0.1, 0.5, 0.7, 1],
    'my_classifier__degree': [1, 2, 3, 4, 5]       
}

# three-fold cross-validation
bayes_search = BayesSearchCV(
    full_pipline, param, cv=3, n_iter=30, random_state=0, verbose=1, n_jobs=2, 
    scoring='roc_auc')

bayes_search.fit(X, y)

print('best score {}'.format(bayes_search.best_score_))
print('best params {}'.format(bayes_search.best_params_))


Fitting 3 folds for each of 1 candidates, totalling 3 fits

In [None]:
# call saveResult function to save the predicted result to csv file
saveResult(test, bayes_search, 'SVC_Pipeline_with_BayesSearchCV.csv')

# Questions

## Why is a simple linear regression model (without any activation function) not good for a classification task, compared to Perceptron/Logistic regression?



Because when using a simple linear regression model:
*  The predicted value is continuous and unbounded, so it cannot be read as a probability.
*  The fit is sensitive to imbalanced data and outliers, which shift the fitted line and with it the decision threshold.

*  Linear regression produces a linear hypothesis function. However, in classification problems the targets are not spread along a line but fall into discrete groups (classes).
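
A small sketch of the first point, on made-up one-dimensional data: a least-squares line fitted to 0/1 labels produces values below 0 and above 1, while passing a linear score through a sigmoid keeps it inside (0, 1). Note that this only shows the role of the activation function; it is not a fitted logistic regression, which would learn its own weights.

```python
import math

# Made-up 1-D data: the label switches to 1 once x reaches 5 (illustrative only)
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

# Ordinary least squares for y = a * x + b (closed form)
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
a = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
b = mean_y - a * mean_x

def linear(x):
    """Raw linear regression output, unbounded."""
    return a * x + b

def sigmoid(z):
    """Squashes any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# The raw linear output leaves [0, 1]; the sigmoid of it does not
for x in (0, 20):
    print(x, round(linear(x), 3), round(sigmoid(linear(x)), 3))
```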
<br/>

<br/>

resources: 
* https://jinglescode.github.io/2019/05/07/why-linear-regression-is-not-suitable-for-classification/
* https://ai.plainenglish.io/why-dont-we-approach-to-classification-problems-using-linear-regression-in-machine-learning-8edcca89448


## What's a decision tree and how is it different from a logistic regression model?

*  **Decision Tree:** a widely used tool for classification and prediction. A decision tree is a flowchart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label.

*  The difference between a decision tree and logistic regression: a decision tree recursively splits the feature space into smaller and smaller regions, whereas logistic regression fits a single linear decision boundary that divides the space into two.
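
The contrast shows up clearly on XOR-like data, where no single straight line separates the classes. The sketch below uses synthetic points (an assumption for illustration, unrelated to the dating dataset) and compares training accuracy only, so it says nothing about generalization.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic XOR-like data: class 1 when both coordinates share the same sign
rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

# The tree can keep splitting the plane into rectangles until each is pure
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# Logistic regression is limited to one linear boundary
logit = LogisticRegression().fit(X, y)

tree_acc = tree.score(X, y)
logit_acc = logit.score(X, y)
print("tree training accuracy:    ", tree_acc)
print("logistic training accuracy:", logit_acc)
```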

Resource for the first part of the question:
 https://www.geeksforgeeks.org/decision-tree/

## What's the difference between grid search and random search?



> Grid search
* Tries out every combination of the parameters
* Computationally expensive
* Global optimum (within the given range)
* Sklearn: model_selection.GridSearchCV


> Random search
* Tries out a random subset of the combinations
* `good enough`
* Local optimum (within the given range)
* Efficient (fewer trials)
* Sklearn: model_selection.RandomizedSearchCV



---

<br/>

**Random Search** replaces the exhaustive enumeration of all combinations used in **Grid Search** with a random selection of combinations.

<br/>

**Grid Search** evaluates every hyperparameter combination, while **Random Search** limits the number of combinations that are tested.

<br/>

Grid search (global optimum within the grid) is expensive when you specify a large search space. Alternatively, random search gives a local optimum, which may be good enough and may even generalize better.

<br/>

Random search is faster than grid search because it runs fewer trials.
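
The difference can be sketched without any library: grid search scores every combination, random search scores only a sample of them. The `score` function below is a made-up stand-in for a cross-validated score, and the sample size of 4 is an arbitrary assumption.

```python
import itertools
import random

def score(max_depth, n_estimators):
    """Toy stand-in for a cross-validated score; peaks at (7, 100)."""
    return 1.0 - abs(max_depth - 7) / 10 - abs(n_estimators - 100) / 1000

space = {"max_depth": [2, 7, 10], "n_estimators": [50, 100, 200]}

# Grid search: evaluate every combination
grid = list(itertools.product(space["max_depth"], space["n_estimators"]))
best_grid = max(grid, key=lambda c: score(*c))

# Random search: evaluate only a random subset of the combinations
random.seed(0)
subset = random.sample(grid, 4)
best_random = max(subset, key=lambda c: score(*c))

print(len(grid), best_grid, score(*best_grid))
print(len(subset), best_random, score(*best_random))
```

The random search result can never beat the grid search result on the same grid, but it costs fewer evaluations, which is the trade-off described above.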


## What's the difference between bayesian search and random search?


> Random search
* Tries out a random subset of the combinations
* `good enough`
* Local optimum (within the given range)
* Efficient (fewer trials)
* Sklearn: model_selection.RandomizedSearchCV


> Bayesian Optimization
* Treats tuning as an optimization problem
* Trial -> estimated error -> Bayesian model estimates the next
parameter to try -> trial -> repeat...
* pip install bayesian-optimization



---

<br/>

**Random Search** trains the model on a random sample of the hyperparameter combinations and scores each one independently of the others.

<br/>

The main difference between **Bayesian search** and the other methods is that the tuning algorithm picks the parameters for each round based on the scores of the previous rounds. Thus, instead of choosing the next set of parameters at random, the algorithm optimizes the choice, and it is likely to reach the best parameter set faster than the previous two methods. In other words, this method concentrates on the promising part of the search space and skips the ranges that will most likely not deliver the best solution. It is therefore beneficial when you have a large amount of data, the learning is slow, and you want to minimize the tuning time.
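
The trial -> model -> next trial loop can be sketched in a few lines. This is a minimal illustration, assuming a Gaussian process surrogate with an upper-confidence-bound rule on a one-dimensional toy objective; it is one common form of Bayesian optimization and not necessarily what `BayesSearchCV` does internally. The objective, the candidate grid, the fixed kernel, and the factor 2.0 are all assumptions.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    """Toy stand-in for a validation score; its maximum is at x = 2."""
    return -(x - 2.0) ** 2

# Candidate hyperparameter values to choose from
grid = np.linspace(0.0, 5.0, 101).reshape(-1, 1)

# Start from two already evaluated candidates (the two ends of the range)
tried = [0, 100]
scores = [objective(grid[i, 0]) for i in tried]

for _ in range(8):
    # Surrogate model of the score, fitted to the trials so far
    gp = GaussianProcessRegressor(
        kernel=Matern(length_scale=1.0, nu=2.5),
        optimizer=None,      # keep the kernel fixed for a predictable sketch
        alpha=1e-6,
        normalize_y=True,
    )
    gp.fit(grid[tried], np.array(scores))
    mean, std = gp.predict(grid, return_std=True)

    # Upper confidence bound: favour a high predicted score or high uncertainty
    ucb = mean + 2.0 * std
    ucb[tried] = -np.inf     # never re-evaluate a candidate

    nxt = int(np.argmax(ucb))
    tried.append(nxt)
    scores.append(objective(grid[nxt, 0]))

best_x = grid[tried[int(np.argmax(scores))], 0]
print(best_x, max(scores))
```

Unlike random search, each new candidate here depends on all earlier scores, which is exactly the difference described above.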


<br/>

resource for bayesian search:
https://towardsdatascience.com/bayesian-optimization-for-hyperparameter-tuning-how-and-why-655b0ee0b399