# How to recognize hyperparameters in an sklearn estimator?

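1. Hyperparameters are not directly learned within the estimator; they are set before fitting.

2. They are generally passed as arguments to the estimator's constructor.

3. Examples: `degree` in `PolynomialFeatures`, the learning rate (`eta0`) in `SGDRegressor`.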
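A small sketch of how to tell the two apart in practice, using the two examples above. It relies on scikit-learn conventions: hyperparameters are constructor arguments listed by `get_params()`, while learned parameters only appear after `fit()` and end with a trailing underscore (such as `coef_`). The values `degree=3`, `eta0=0.01` and the toy data are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import PolynomialFeatures

# Hyperparameters: passed to the constructor before any data is seen
poly = PolynomialFeatures(degree=3)
sgd = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)

# get_params() returns the hyperparameters as a dict
print(poly.get_params()["degree"])   # 3
print(sgd.get_params()["eta0"])      # 0.01

# An unfitted estimator has no learned parameters yet
print(hasattr(sgd, "coef_"))         # False

# Learned parameters appear after fit() and end with a trailing underscore
X = np.linspace(0, 1, 20).reshape(-1, 1)
y = 2 * X.ravel() + 1
sgd.fit(poly.fit_transform(X), y)
print(sgd.coef_.shape)               # (4,): one weight per feature 1, x, x^2, x^3
```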

## Select the hyperparameters that give the best cross-validation score

A hyperparameter search consists of:

1. an estimator
2. a parameter space
3. a method for searching or sampling candidates
4. a cross-validation scheme
5. a score function
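One way to see the five components is to name each of them explicitly in a single search. This is a minimal sketch; the choice of `KNeighborsClassifier`, the iris dataset, the grid values and `StratifiedKFold` are illustrative assumptions, not part of the notes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

estimator = KNeighborsClassifier()                               # 1. an estimator
param_grid = {"n_neighbors": [1, 3, 5, 7, 9],                    # 2. a parameter space
              "weights": ["uniform", "distance"]}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)   # 4. a cross-validation scheme
scoring = "accuracy"                                             # 5. a score function

# 3. the search method: GridSearchCV tries every candidate in param_grid
search = GridSearchCV(estimator, param_grid, cv=cv, scoring=scoring)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```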



## Hyperparameter tuning (HPT) approaches in sklearn

1. `GridSearchCV` exhaustively considers all parameter combinations for the specified values


```python
param_grid = [{'C': [1, 10, 100, 1000],
               'kernel': ['linear']}]
```


2. `RandomizedSearchCV` samples a given number of candidates from a parameter space, drawing each parameter from a specified distribution

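```python
from scipy import stats
from scipy.stats import loguniform

param_dist = {
    "average": [True, False],
    "l1_ratio": stats.uniform(0, 1),   # uniform on [0, 1]
    "alpha": loguniform(1e-4, 1e0),    # log-uniform on [1e-4, 1]
}
```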
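Both parameter spaces above can be plugged into an actual search. This sketch assumes the grid is meant for `SVC` (it uses `C` and `kernel`) and the distributions for `SGDClassifier` with `penalty="elasticnet"` (needed for `l1_ratio` to have any effect); the digits dataset, `cv=3` and `n_iter=10` are arbitrary choices.

```python
from scipy import stats
from scipy.stats import loguniform
from sklearn.datasets import load_digits
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Grid search: all 4 values of C with a linear kernel are fitted and scored
param_grid = [{'C': [1, 10, 100, 1000], 'kernel': ['linear']}]
grid = GridSearchCV(SVC(), param_grid, cv=3)
grid.fit(X, y)

# Randomized search: n_iter=10 candidates are drawn from the distributions
param_dist = {
    "average": [True, False],
    "l1_ratio": stats.uniform(0, 1),
    "alpha": loguniform(1e-4, 1e0),
}
# l1_ratio is only used with the elastic-net penalty
sgd = SGDClassifier(loss="hinge", penalty="elasticnet", random_state=0)
rand = RandomizedSearchCV(sgd, param_dist, n_iter=10, cv=3, random_state=0)
rand.fit(X, y)

print(len(grid.cv_results_["params"]), grid.best_params_)   # 4 candidates
print(len(rand.cv_results_["params"]), rand.best_params_)   # 10 candidates
```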


## Grid Search Vs Randomized Search

| GridSearchCV | RandomizedSearchCV |
|--------------|--------------------|
| Specifies the exact values of each parameter in a grid | Specifies a distribution for each parameter and samples values from those distributions |
| Fits every combination, so the cost grows multiplicatively with the number of parameters and values | The computational budget can be chosen independently of the number of parameters and values |
| | The budget is chosen in terms of the number of sampled candidates or the number of training iterations; the number of candidates is set with the `n_iter` argument |
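The budget difference can be checked without fitting anything by counting candidates with scikit-learn's `ParameterGrid` and `ParameterSampler`, the generators behind the two searches. The parameter names and ranges below are illustrative assumptions.

```python
from scipy.stats import loguniform, uniform
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Grid search: the number of candidates is the product of the value counts
grid = {
    "alpha": [1e-4, 1e-3, 1e-2, 1e-1],
    "l1_ratio": [0.0, 0.25, 0.5, 0.75],
    "eta0": [0.001, 0.01, 0.1, 1.0],
}
print(len(ParameterGrid(grid)))  # 64 = 4 * 4 * 4

# Randomized search: the number of candidates is n_iter, whatever the space
dist = {
    "alpha": loguniform(1e-4, 1e-1),
    "l1_ratio": uniform(0, 1),
    "eta0": loguniform(1e-3, 1e0),
}
samples = list(ParameterSampler(dist, n_iter=10, random_state=0))
print(len(samples))  # 10
```

Adding a fourth parameter with four values would raise the grid to 256 candidates, while the sampled budget stays at `n_iter`.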



# Steps in HPT

1. Divide the data into training, validation and test sets.

2. For each combination of hyperparameter values, train a model on the training set.

   This step creates multiple models. Set `n_jobs=-1` to fit them in parallel using all available CPU cores.

   Some parameter combinations may cause fitting to fail on one or more folds of data. With `error_score='raise'` such a failure aborts the whole search; setting `error_score` to a number such as `0` or to `np.nan` (the default in recent scikit-learn versions) assigns that score to the problematic fold and lets the search complete.


3. Evaluate each model on the validation set and select the one with the best score.

4. Retrain a model with the best hyperparameter setting on the training and validation sets combined.

5. Evaluate the final model's performance on the test set.
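Steps 1 to 5 can be sketched as a manual hold-out loop. This is a minimal illustration under assumptions: the noisy cubic data is hypothetical, the 60/20/20 split is arbitrary, and `LinearRegression` stands in for the estimator so the example stays deterministic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data: a noisy cubic, used only for illustration
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(200, 1))
y = X[:, 0] ** 3 - X[:, 0] + rng.normal(scale=0.05, size=200)

# 1. Split into training, validation and test sets (60/20/20)
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=0)

# 2-3. Fit one model per candidate on the training set, score it on the validation set
val_mse = {}
for degree in range(1, 8):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    val_mse[degree] = mean_squared_error(y_val, model.predict(X_val))
best_degree = min(val_mse, key=val_mse.get)

# 4. Retrain with the best setting on training + validation data combined
final = make_pipeline(PolynomialFeatures(best_degree), LinearRegression())
final.fit(X_rest, y_rest)

# 5. Evaluate once on the untouched test set
test_mse = mean_squared_error(y_test, final.predict(X_test))
print(best_degree, round(test_mse, 4))
```

`GridSearchCV` automates steps 2 to 4 by replacing the single validation set with cross-validation and, with the default `refit=True`, refitting the best candidate on all the data passed to `fit`.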

# How to determine degree of polynomial regression with grid search?




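```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical 1-D training data, only so the snippet runs on its own
rng = np.random.default_rng(0)
x_train = rng.uniform(0, 1, size=100)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.1, size=100)

# Pipeline step name and parameter are joined by a double underscore
param_grid = [{'poly__degree': [2, 3, 4, 5, 6, 7, 8, 9]}]

pipeline = Pipeline(steps=[('poly', PolynomialFeatures()),
                           ('sgd', SGDRegressor())])

grid_search = GridSearchCV(pipeline, param_grid, cv=5, scoring='neg_mean_squared_error',
                           return_train_score=True)

# PolynomialFeatures expects a 2-D input of shape (n_samples, n_features)
grid_search.fit(x_train.reshape(-1, 1), y_train)
print(grid_search.best_params_)
```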

