## Exercise 2: Determine optimal hyperparameters based on 10-fold cross-validation
In this exercise, you will write code that uses your estimators from above to choose the best hyperparameters for the histogram and kernel density estimators automatically. In particular, find the best `n_bins` for the histogram and the best `bandwidth` for the KDE.

### Task 1: Implement a custom scorer function for use in GridSearchCV
To do this, implement a scorer function, `mean_log_likelihood_scorer`, that computes the mean log-likelihood of the given data under the fitted model (higher is better).
It takes the fitted model, the input data `X`, and `y_true`, which defaults to `None` because density estimation is unsupervised and there are no labels.

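```python
import numpy as np


def mean_log_likelihood_scorer(model, X, y_true=None):
    ########## Your code here ########
    # Compute and return the mean log probability of the data.
    # y_true is unused; defaulting it to None lets the scorer be called
    # as scorer(model, X) or scorer(model, X, y).
    # predict_proba of the estimators defined earlier returns the density of each sample.
    predictions = model.predict_proba(X)
    mean_ll = np.mean(np.log(predictions))
    return mean_ll
    ############
```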
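If the KDE were scikit-learn's `sklearn.neighbors.KernelDensity` rather than the custom class from earlier, note that it has no `predict_proba`; it exposes `score_samples`, which already returns the log density of each sample. A minimal sketch of an equivalent scorer under that assumption (`mean_log_likelihood_sklearn` is a hypothetical name, and the data are synthetic):

```python
import numpy as np
from sklearn.neighbors import KernelDensity


def mean_log_likelihood_sklearn(model, X, y_true=None):
    """Mean log-likelihood of X under a fitted scikit-learn density model.

    y_true is accepted (and ignored) so the function fits the
    scorer(estimator, X, y) call convention used by GridSearchCV.
    """
    # score_samples returns log p(x) for each row, so no np.log is needed
    return float(np.mean(model.score_samples(X)))


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))          # 100 synthetic one-dimensional samples
kde = KernelDensity(bandwidth=0.5).fit(X)
print(mean_log_likelihood_sklearn(kde, X))
```

Working with log densities directly avoids taking `np.log` of a density that may have underflowed to zero.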

### Task 2: Estimate the best hyperparameters
Then use scikit-learn's cross-validation utilities on the training data to find the best parameters: pass this function as the `scoring` argument of `GridSearchCV`. Pass the function object itself, `scoring=mean_log_likelihood_scorer`, without parentheses. Writing `mean_log_likelihood_scorer()` would call the function immediately instead of handing it to `GridSearchCV`, which calls it on each held-out validation fold.
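To see what `GridSearchCV` does with the callable, here is a minimal sketch of a hand-written equivalent for one parameter. It assumes scikit-learn's `KernelDensity` and synthetic data; `mean_log_likelihood` and `manual_grid_search` are hypothetical names. For each candidate it fits on k-1 folds, scores the held-out fold with the scorer, averages the fold scores, and keeps the candidate with the highest mean.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.neighbors import KernelDensity


def mean_log_likelihood(model, X, y_true=None):
    # Called as scoring(estimator, X_val) when no labels are given
    return np.mean(model.score_samples(X))


def manual_grid_search(estimator, bandwidths, X, n_splits=5):
    # GridSearchCV(cv=int) uses an unshuffled KFold for non-classifiers
    kfold = KFold(n_splits=n_splits)
    mean_scores = []
    for bw in bandwidths:
        fold_scores = []
        for train_idx, val_idx in kfold.split(X):
            # Fresh copy of the estimator for every (candidate, fold) pair
            model = clone(estimator).set_params(bandwidth=bw).fit(X[train_idx])
            fold_scores.append(mean_log_likelihood(model, X[val_idx]))
        mean_scores.append(np.mean(fold_scores))
    # argmax keeps the first candidate among ties, as GridSearchCV's ranking does
    return bandwidths[int(np.argmax(mean_scores))]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
bandwidths = np.linspace(0.1, 2.0, 20)

best_manual = manual_grid_search(KernelDensity(), bandwidths, X)
grid = GridSearchCV(KernelDensity(), {"bandwidth": bandwidths}, cv=5,
                    scoring=mean_log_likelihood).fit(X)
print(best_manual, grid.best_params_["bandwidth"])
```

Both searches should select the same bandwidth, since they use the same folds and the same scorer.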

Try 2 to 20 bins (inclusive) and 50 bandwidth values linearly spaced between 0.1 and 10.
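The grid sizes are easy to get off by one, so a quick sanity check may help. "2 to 20 bins" needs `range(2, 21)` (19 values), and `np.linspace` includes both endpoints, so 50 points from 0.1 to 10 are spaced (10 - 0.1) / 49 ≈ 0.202 apart. The variable names below are illustrative only.

```python
import numpy as np

# "2 to 20 bins" includes 20, so the stop argument of range() is 21
n_bins_grid = list(range(2, 21))
# 50 evenly spaced bandwidths; linspace includes both 0.1 and 10
bandwidth_grid = np.linspace(0.1, 10, 50)

# Spacing between neighbouring bandwidths: (10 - 0.1) / (50 - 1)
step = bandwidth_grid[1] - bandwidth_grid[0]
print(len(n_bins_grid), n_bins_grid[0], n_bins_grid[-1])
print(len(bandwidth_grid), bandwidth_grid[0], bandwidth_grid[-1], step)
```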

Finally, print the optimal hyperparameters and, using them, print the mean log-likelihood of the test data for both the histogram and the KDE model.
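A minimal sketch of the reporting step, assuming scikit-learn's `KernelDensity` and synthetic data (`mean_log_likelihood` is a hypothetical scorer name). It separates three quantities: `best_params_` (the chosen hyperparameters), `best_score_` (the mean held-out score across the validation folds), and `score(X_test)`, which applies the scorer to `best_estimator_`. With the default `refit=True`, that estimator has been refit on the whole training set.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KernelDensity


def mean_log_likelihood(model, X, y_true=None):
    return np.mean(model.score_samples(X))


rng = np.random.default_rng(1)
X = rng.normal(size=(300, 1))                     # synthetic data
X_train, X_test = train_test_split(X, test_size=0.3, random_state=0)

search = GridSearchCV(KernelDensity(), {"bandwidth": np.linspace(0.1, 2.0, 20)},
                      cv=10, scoring=mean_log_likelihood).fit(X_train)

print("best bandwidth:", search.best_params_["bandwidth"])
# Mean of the 10 held-out fold scores for the best candidate
print("mean CV log-likelihood:", search.best_score_)
# Scorer applied to best_estimator_, which was refit on all of X_train
test_ll = search.score(X_test)
print("test mean log-likelihood:", test_ll)
```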

```python
from sklearn.model_selection import GridSearchCV

# 2 to 20 bins inclusive, so the stop value of range() is 21
grid_histogram = {
    'n_bins': list(range(2, 21))
}

# 50 bandwidths evenly spaced from 0.1 to 10 (both endpoints included)
grid_kde = {
    'bandwidth': np.linspace(0.1, 10, 50)
}

# n_bins=15 is only a starting value; GridSearchCV overrides it with each grid value
histogram_cv = GridSearchCV(HistogramDensity(n_bins=15, min_val=min_val, max_val=max_val),
                            grid_histogram,
                            cv=10,
                            scoring=mean_log_likelihood_scorer)
histogram_cv.fit(X_train)
print(f'The best parameters are {histogram_cv.best_params_}')
# score() applies the scorer to the best model, refit on all of X_train
print(f'The test-set mean log-likelihood is {histogram_cv.score(X_test)}')

kde_cv = GridSearchCV(KernelDensity(bandwidth=1),
                      grid_kde,
                      cv=10,
                      scoring=mean_log_likelihood_scorer)
kde_cv.fit(X_train)
print(f'The best parameters are {kde_cv.best_params_}')
print(f'The test-set mean log-likelihood is {kde_cv.score(X_test)}')
```

```
The best parameters are {'n_bins': 14}
The test-set mean log-likelihood is -2.6758365586689674
The best parameters are {'bandwidth': 0.3020408163265306}
The test-set mean log-likelihood is -1.9960199163901615
```
