Elements of models that cannot be learned by fitting the model.  

Common Examples:
- Alpha in Ridge/Lasso Regression
- n_neighbors in k-NN
- By contrast, a and b in Linear Regression are parameters: they are learned when the model is fit

#### Parameters

- Learned from data; examples include the split point in a node, the split features, etc.
- Another way to put it: components of the model **learned during the modeling process**
- **You do not set these**; the algorithm discovers them for the user.

Where to Find Parameters:
- Know about the algorithm first and foremost
- But also located in the Scikit Learn documentation - **under "Attributes" section**
    - NOT in the "parameters" section
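
A minimal sketch of the distinction, assuming a synthetic dataset from `make_regression` and a `Ridge` model (both chosen only for illustration; `X_demo`, `y_demo` are hypothetical names). The hyperparameter `alpha` is available before fitting, while the learned attributes `coef_` and `intercept_` only exist after `fit`:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic data, used only for illustration
X_demo, y_demo = make_regression(n_samples=100, n_features=3, noise=5.0, random_state=0)

# Hyperparameters are set by the user and readable before any training
ridge = Ridge(alpha=1.0)
hyperparameters = ridge.get_params()
print("alpha (hyperparameter):", hyperparameters["alpha"])

# Learned parameters do not exist until the model is fit
has_coef_before_fit = hasattr(ridge, "coef_")
print("coef_ available before fit:", has_coef_before_fit)

ridge.fit(X_demo, y_demo)

# After fitting, the learned parameters appear as attributes ending in "_"
print("coef_ (parameters):", ridge.coef_)
print("intercept_ (parameter):", ridge.intercept_)
```

The trailing underscore is the scikit-learn convention for attributes that are estimated from data.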

Linear Coefficients

In [None]:
#for example, in a linear model (here a fitted logistic regression, log_reg_clf)
#each feature has a learned coefficient; pair them with the feature names for viewing

original_variables = list(X_train.columns)

#zip together names and coefficients
zipped_together = list(zip(original_variables, log_reg_clf.coef_[0]))
coefs = [list(x) for x in zipped_together]

#put into DataFrame with column labels
coefs = pd.DataFrame(coefs, columns = ["Variable", "Coefficient"])

#sort DataFrame and print the top three coefficients
coefs.sort_values(by=["Coefficient"], axis=0, inplace=True, ascending=False)
print(coefs.head(3))

In [None]:
#another example
# Create a list of original variable names from the training DataFrame
original_variables = X_train.columns

# Extract the coefficients of the logistic regression estimator
model_coefficients = log_reg_clf.coef_[0]

# Create a dataframe of the variables and coefficients & print it out
coefficient_df = pd.DataFrame({"Variable" : original_variables, 
                               "Coefficient": model_coefficients})
print(coefficient_df)

# Print out the top 3 positive variables
top_three_df = coefficient_df.sort_values(by="Coefficient", axis=0, 
                                          ascending=False)[0:3]
print(top_three_df)

Parameters in Random Forest

- the parameters are in the **node decisions** to build the model (what feature and what value to split on)

In [None]:
#extracting the node decisions
#(chosen_tree is a single fitted tree from the forest, e.g. rf_clf.estimators_[6])

#get the columns it split on
split_column = chosen_tree.tree_.feature[1]
split_column_name = X_train.columns[split_column]

#get the level it split on
split_value = chosen_tree.tree_.threshold[1]

print("This node split on feature {}, at a value of {}".format(split_column_name, split_value))

In [None]:
#example 2
# Extract the 7th (index 6) tree from the random forest
chosen_tree = rf_clf.estimators_[6]

# Visualize the graph using the provided image
imgplot = plt.imshow(tree_viz_image)
plt.show()

# Extract the parameters and level of the top (index 0) node
split_column = chosen_tree.tree_.feature[0]
split_column_name = X_train.columns[split_column]
split_value = chosen_tree.tree_.threshold[0]

# Print out the feature and level
print("This node split on feature {}, at a value of {}".format(split_column_name, split_value))

#### Hyperparameters

- Not learned from data; set prior to training. Examples include max_depth, min_samples_leaf, splitting criterion, etc.
- **something that is set BEFORE the modeling process begins**

##### Hyperparameters that matter:

Random Forest:
- **n_estimators** (a high value is preferable and not uncommon)
- **max_features** (try different values to ensure tree diversity)
- **max_depth, min_samples_leaf** (important for controlling overfitting)
- **criterion** (maybe)

Sources for learning about hyperparameters:
- Academic papers
- Scikit Learn module documentation
- Practical experience

In [None]:
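#how to find all the knobs and dials to set for a model
#example
from sklearn.ensemble import RandomForestClassifier

rf_clf = RandomForestClassifier()
#recent scikit-learn versions print only non-default values, so list them all with get_params()
print(rf_clf.get_params())

#for what they mean: http://scikit-learn.org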

Hyperparameter Tuning
- trying and experimenting with hyperparameter values and choosing the ones that perform the best

Even with hyperparameter tuning, it's wise to use cross-validation to ensure the model doesn't overfit.
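
A minimal sketch of tuning with cross-validation instead of a single train/test split, assuming synthetic data from `make_classification` and a small, arbitrary list of `n_neighbors` values (the names `X_demo`, `y_demo`, and `cv_results` are hypothetical):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data, used only for illustration
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=0)

# Mean 5-fold CV accuracy for each candidate value of n_neighbors
cv_results = {}
for k in [1, 5, 15]:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X_demo, y_demo, cv=5, scoring="accuracy")
    cv_results[k] = scores.mean()

# Pick the value with the highest mean CV accuracy
best_k = max(cv_results, key=cv_results.get)
print(cv_results)
print("Best n_neighbors by CV accuracy:", best_k)
```

Because each candidate is scored on several held-out folds, the chosen value is less likely to be an artifact of one lucky split.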

In [None]:
#placing neighbors and corresponding accuracy in a DataFrame
results_df = pd.DataFrame({
    'neighbors':neighbors_list,
    'accuracy':accuracy_list
})

print(results_df)

In [None]:
#plotting Learning Curves
plt.plot(results_df['neighbors'],
        results_df['accuracy'])

#adding the labels and title
plt.gca().set(xlabel='n_neighbors', ylabel='Accuracy',
             title='Accuracy for different n_neighbors')

plt.show()

In [None]:
# Set the learning rates & results storage
learning_rates = [0.001, 0.01, 0.05, 0.1, 0.2, 0.5]
results_list = []

# Create the for loop to evaluate model predictions for each learning rate
for learning_rate in learning_rates:
    model = GradientBoostingClassifier(learning_rate=learning_rate)
    predictions = model.fit(X_train, y_train).predict(X_test)
    # Save the learning rate and accuracy score
    results_list.append([learning_rate, accuracy_score(y_test, predictions)])

# Gather everything into a DataFrame
results_df = pd.DataFrame(results_list, columns=['learning_rate', 
                                                 'accuracy'])
print(results_df)

##### Some hyperparameters are not important for model performance
- These are more for computational reasons/decisions

For the Random Forest Classifier, these do not improve model performance:
- n_jobs
- random_state
- verbose
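
A small sketch to illustrate the point for `n_jobs`, assuming synthetic data and two forests that differ only in `n_jobs` (the variable names are hypothetical). With the same `random_state`, the number of parallel jobs should change only how the work is scheduled, not what the model predicts:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data, used only for illustration
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=0)

# Identical forests except for the number of parallel jobs
rf_single = RandomForestClassifier(n_estimators=20, random_state=42, n_jobs=1)
rf_parallel = RandomForestClassifier(n_estimators=20, random_state=42, n_jobs=2)

pred_single = rf_single.fit(X_demo, y_demo).predict(X_demo)
pred_parallel = rf_parallel.fit(X_demo, y_demo).predict(X_demo)

# n_jobs affects computation time, not the fitted model
same_predictions = np.array_equal(pred_single, pred_parallel)
print("Same predictions with n_jobs=1 and n_jobs=2:", same_predictions)
```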

In [None]:
# Print out the old estimator, notice which hyperparameter is badly set
print(rf_clf_old)

# Get confusion matrix & accuracy for the old rf_model
print("Confusion Matrix: \n\n {} \n Accuracy Score: \n\n {}".format(
    confusion_matrix(y_test, rf_old_predictions),
    accuracy_score(y_test, rf_old_predictions)))

Additional things to consider:
- which hyperparameters *might conflict*
- reasoning out whether the hyperparameter values make sense (for example, a very low number of trees in a random forest, or relying on 1 neighbor for a robust KNN model)
- incrementing a hyperparameter by a very small amount is unlikely to improve the model
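
A short sketch of why a single neighbor deserves suspicion, assuming synthetic data with some label noise (`flip_y=0.2` is an arbitrary choice, and all names here are hypothetical). With `n_neighbors=1`, each training point is its own nearest neighbor, so the model memorizes the training set:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic, deliberately noisy data, used only for illustration
X_demo, y_demo = make_classification(n_samples=400, n_features=8,
                                     flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.3, random_state=0)

# With one neighbor, every training point is classified by itself
knn_1 = KNeighborsClassifier(n_neighbors=1).fit(X_tr, y_tr)
train_acc_1 = knn_1.score(X_tr, y_tr)
test_acc_1 = knn_1.score(X_te, y_te)

print("Train accuracy with 1 neighbor:", train_acc_1)
print("Test accuracy with 1 neighbor:", test_acc_1)
```

A perfect training score here says nothing about generalization, which is why the held-out score is the one to look at.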

In [None]:
#model creation function

def gbm_grid_search(learn_rate, max_depth):
    model = GradientBoostingClassifier(
        learning_rate=learn_rate,
        max_depth=max_depth)
    predictions = model.fit(X_train, y_train).predict(X_test)
    return [learn_rate, max_depth, accuracy_score(y_test, predictions)]

#loop through the lists of hyperparameters and call the function
results_list = []

for learn_rate in learn_rate_list:
    for max_depth in max_depth_list:
        results_list.append(gbm_grid_search(learn_rate, max_depth))
        
#save results in DataFrame to view
results_df = pd.DataFrame(results_list, columns=['learning_rate', 'max_depth', 'accuracy'])
print(results_df)

### Grid Search Cross-Validation

Advantages of GridSearch:
- You don't have to write thousands of lines of code

Steps in a Grid Search:
- Choose an algorithm whose hyperparameters will be tuned (i.e., an estimator)
- Define which hyperparameters to tune
- Define a range of values for each hyperparameter
- Set a cross-validation scheme
- Define a score function so we can decide which square on our grid was 'the best'
- Include extra useful information or functions

GridSearchCV Object Inputs
- **estimator**
- **param_grid** - requires a dictionary with parameter names and values
    - and the keys must be valid hyperparameters
- **cv**
- **scoring**
- **refit**
- **n_jobs**
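
The inputs listed above can be sketched together as follows, assuming synthetic data and a deliberately tiny grid so it runs quickly (the grid values and the names `X_demo`, `y_demo`, `grid` are illustrative assumptions, not recommendations):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic data, used only for illustration
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=0)

# Keys must be valid hyperparameter names of the estimator
param_grid = {"max_depth": [2, 4], "min_samples_leaf": [1, 5]}

grid = GridSearchCV(
    estimator=RandomForestClassifier(n_estimators=25, random_state=0),
    param_grid=param_grid,
    cv=3,                # 3-fold cross-validation
    scoring="accuracy",  # metric used to rank the grid squares
    refit=True,          # refit the best combination on all the data
    n_jobs=1,            # number of parallel jobs
)
grid.fit(X_demo, y_demo)

# 2 x 2 = 4 combinations were evaluated
n_candidates = len(grid.cv_results_["params"])
print("Candidates evaluated:", n_candidates)
print("Best combination:", grid.best_params_)
```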

In [None]:
#check all available metrics using:
#(metrics.SCORERS was removed in recent scikit-learn versions; use get_scorer_names())
from sklearn import metrics
sorted(metrics.get_scorer_names())

In [None]:
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

#specify the hyperparameter
param_grid = {'n_neighbors' : np.arange(1, 50)}

#instantiate classifier (this scenario KNN)
knn = KNeighborsClassifier()

#use gridsearchcv and param_grid defined above
knn_cv = GridSearchCV(knn, param_grid, cv=5)

#fit the grid search (runs 5-fold CV for every n_neighbors value)
knn_cv.fit(X, y)

##### GridSearchCV Output Properties:
- 1 - Result Log : **cv_results_**
- 2 - Best results:
    - **best_index_**
    - **best_params_**
    - **best_score_**
- 3 - Extra information
    - **scorer_**
    - **n_splits_**
    - **refit_time_**
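
A minimal sketch of reading these properties, assuming synthetic data and a small KNN grid (the names `search`, `cv_df`, `summary`, and `best_row` are hypothetical). Loading `cv_results_` into a DataFrame makes the result log easier to inspect, and `best_index_` points at the winning row:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data, used only for illustration
X_demo, y_demo = make_classification(n_samples=200, n_features=6, random_state=0)

search = GridSearchCV(KNeighborsClassifier(), {"n_neighbors": [1, 5, 9]}, cv=3)
search.fit(X_demo, y_demo)

# 1 - Result log: one row per hyperparameter combination
cv_df = pd.DataFrame(search.cv_results_)
summary = cv_df[["param_n_neighbors", "mean_test_score", "rank_test_score"]]
print(summary)

# 2 - Best results: best_index_ is the row of the winning combination
best_row = cv_df.loc[search.best_index_]
print(best_row["params"], best_row["mean_test_score"])

# 3 - Extra information
print(search.n_splits_, search.refit_time_)
```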

#### Retrieve the hyperparameters that performed the best

The optimal hyperparameters are those of the model achieving the best CV score.

In [None]:
knn_cv.best_params_

In [None]:
knn_cv.best_score_

#### RandomizedSearchCV Example

How Random Search is similar to grid search:
- Define an estimator, which hyperparameters to tune and the range of values for each hyperparameter
- Still set a cross-validation scheme and scoring function
- BUT instead of trying every grid square, randomly select a subset of them

This works because:
- Not every hyperparameter is as important
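
To make the difference in cost concrete, here is a small sketch comparing the full grid with a random sample of it, using scikit-learn's `ParameterGrid` and `ParameterSampler` helpers. The grid values and `n_iter=5` are arbitrary assumptions for illustration:

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

# Illustrative grid: 4 x 4 x 2 = 32 combinations
grid_values = {
    "max_depth": [3, 5, 7, 9],
    "min_samples_leaf": [1, 2, 4, 8],
    "criterion": ["gini", "entropy"],
}

# Grid search would evaluate every combination
full_grid = list(ParameterGrid(grid_values))

# Random search evaluates only n_iter randomly chosen combinations
sampled = list(ParameterSampler(grid_values, n_iter=5, random_state=0))

print("Grid search candidates:", len(full_grid))
print("Random search candidates:", len(sampled))
for candidate in sampled:
    print(candidate)
```

`RandomizedSearchCV` draws its candidates in this way, so its cost is controlled by `n_iter` rather than by the size of the grid.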

In [None]:
from scipy.stats import randint
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import RandomizedSearchCV

#parameters and distributions
param_dist = {"max_depth": [3, None],
              "max_features": randint(1, 9),
              "min_samples_leaf": randint(1, 9),
              "criterion": ["gini", "entropy"]}


tree = DecisionTreeClassifier()
tree_cv = RandomizedSearchCV(tree, param_dist, cv=5)
tree_cv.fit(X, y)


print("Tuned Decision Tree Parameters: {}".format(tree_cv.best_params_))
print("Best score is {}".format(tree_cv.best_score_))

#### Another example with GridSearchCV

In [None]:
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import roc_auc_score

params_dt = {
    'max_depth':[3,4,5,6],
    'min_samples_leaf':[0.04,0.06,0.08],
    'max_features':[0.2,0.4,0.6,0.8]
}

grid_dt = GridSearchCV(estimator=dt,
                      param_grid = params_dt,
                      scoring='accuracy',
                      cv=10,
                      n_jobs=1)

grid_dt.fit(X_train, y_train)

#extracting best parameters from grid_dt after training
best_hyperparams = grid_dt.best_params_
print('Best hyperparameters:\n', best_hyperparams)

#extract best CV score from 'grid_dt'
best_CV_score = grid_dt.best_score_
print('Best CV accuracy: {:.3f}'.format(best_CV_score))

#extract best model from grid_dt
best_model = grid_dt.best_estimator_

#evaluate test set accuracy
test_acc = best_model.score(X_test, y_test)
print("Test set accuracy of best model: {:.3f}".format(test_acc))

### Inspecting Random Forest Hyperparameters

In [None]:
from sklearn.ensemble import RandomForestRegressor

SEED=1
rf = RandomForestRegressor(random_state=SEED)

#inspecting rf's hyperparameters
rf.get_params()


from sklearn.metrics import mean_squared_error as MSE
from sklearn.model_selection import GridSearchCV

#dictionary of parameters
params_rf = {
    'n_estimators': [300,400,500],
    'max_depth':[4,5,6],
    'min_samples_leaf':[0.1,0.2],
    'max_features':['log2','sqrt']
}

#instantiate GridSearchCV object
grid_rf = GridSearchCV(estimator=rf,
                      param_grid=params_rf,
                      cv=3,
                      scoring='neg_mean_squared_error',
                      verbose=1,
                      n_jobs=-1)

grid_rf.fit(X_train, y_train)

best_hyperparams = grid_rf.best_params_
print('Best hyperparameters:\n', best_hyperparams)

best_model = grid_rf.best_estimator_

y_pred = best_model.predict(X_test)

rmse_test = MSE(y_test, y_pred)**(1/2)
print('Test set RMSE of rf: {:.2f}'.format(rmse_test))