
docs(readme): try to naturalize copy #1

Merged · 3 commits · May 15, 2019
63 changes: 33 additions & 30 deletions README.md
@@ -1,51 +1,54 @@
# Paperspace Hyperparamter Tuning for solving CIFAR-100 with a convolutional neural network (CNN) built with Keras and TensorFlow, GPU backend
# Paperspace Hyperparameter Tuning for solving CIFAR-100 with a convolutional neural network (CNN) built with Keras and TensorFlow, GPU backend

This project acts as both a tutorial and a demo to using Paperspace Hyperparameter tuning based on hyperopt with Keras, TensorFlow and TensorBoard. Not only we try to find the best hyperparameters for the given hyperspace, but also we represent the neural network architecture as hyperparameters that can be tuned. This automates the process of searching for the best neural architecture configuration and hyperparameters.
This project acts as both a tutorial and demo for using Paperspace Hyperparameter Tuning based on _Hyperopt_ with Keras, TensorFlow, and TensorBoard. Not only do we try to find the best hyperparameters for the given hyperspace, but we also represent the neural network architecture as hyperparameters that can be tuned. This serves to automate the process of searching for the best neural architecture configuration and hyperparameters.

Here, we are meta-optimizing a neural net and its architecture on the CIFAR-100 dataset (100 fine labels), a computer vision task. This code could be easily transferred to another vision dataset or even to another machine learning task.
Here, we are meta-optimizing a neural net and its architecture using the CIFAR-100 dataset, a computer vision task with 100 fine-grained labels. This code could be easily transferred to another vision dataset or even to another machine learning task.

## How Hyperopt works

Hyperopt is a way to search through an hyperparameter space. For example, it can use the Tree-structured Parzen Estimator (TPE) algorithm, which explore intelligently the search space while narrowing down to the estimated best parameters.
Hyperopt is a method for searching through a hyperparameter space. For example, it can use the Tree-structured Parzen Estimator (TPE) algorithm, which intelligently explores the search space while narrowing down to the best estimated parameters.

It is hence a good method for meta-optimizing a neural network which is itself an optimisation problem: tuning a neural network uses gradient descent methods, and tuning the hyperparameters needs to be done differently since gradient descent can't apply. Therefore, Hyperopt can be useful not only for tuning hyperparameters such as the learning rate, but also to tune more fancy parameters in a flexible way, such as changing the number of layers of certain types, or the number of neurons in a layer, or even the type of layer to use at a certain place in the network given an array of choices, each with nested tunable hyperparameters.
It is thus a good method for meta-optimizing a neural network. Whereas a neural network is an optimization problem that is tuned using gradient descent methods, hyperparameters cannot be tuned using gradient descent methods. That's where Hyperopt comes in and shines: it's useful not only for tuning hyperparameters like learning rate, but also for tuning more sophisticated parameters, and in a flexible way: it can change the number of layers of different types; the number of neurons in one layer or another; or even the type of layer to use at a certain place in the network given an array of choices, each themselves with nested, tunable hyperparameters.

This is an oriented random search, in contrast with a Grid Search where hyperparameters are pre-established with fixed steps increase. Random Search for Hyper-Parameter Optimization (such as what Hyperopt do) has proven to be an effective search technique. The paper about this technique sits among the most cited deep learning papers. To sum up, it is more efficient to search randomly through values and to intelligently narrow the search space rather than looping on fixed sets of values for the hyperparameters.
This kind of Oriented Random Search is Hyperopt's strength, as opposed to a simpler Grid Search where hyperparameters are pre-established with fixed-step increases. Random Search for Hyperparameter Optimization has proven to be an effective search technique. The paper about this technique sits among the most cited deep learning papers. In summary, it is more efficient to randomly search through values and intelligently narrow the search space, rather than looping on fixed sets of hyperparameter values.
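As a quick illustration, here is a minimal sketch of a TPE-driven Hyperopt search on a toy objective (not this repository's code; the toy objective and bounds are made up for the example):

```python
# A minimal sketch of Hyperopt's TPE search: minimize (x - 1)^2
# over a uniform range. Toy objective, for illustration only.
from hyperopt import fmin, hp, tpe

best = fmin(
    fn=lambda x: (x - 1.0) ** 2,       # objective to minimize
    space=hp.uniform('x', -5.0, 5.0),  # search space: one float parameter
    algo=tpe.suggest,                  # Tree-structured Parzen Estimator
    max_evals=100,                     # number of sampled trials
)
print(best)  # e.g. {'x': 1.002...} -- close to the true minimum at x = 1
```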

How to define Hyperopt parameters?
A parameter is defined with a certain uniformrange or else a probability distribution, such as:
### How to define Hyperopt parameters

A parameter is defined with either a certain uniform range or a probability distribution, such as:

```python
hp.randint(label, upper)
hp.uniform(label, low, high)
hp.loguniform(label, low, high)
hp.normal(label, mu, sigma)
hp.lognormal(label, mu, sigma)
```
There is also a few quantized versions of those functions, which rounds the generated values at each step of "q":

There are also a few quantized versions of those functions, which round the generated values at each step of "q":

```python
hp.quniform(label, low, high, q)
hp.qloguniform(label, low, high, q)
hp.qnormal(label, mu, sigma, q)
hp.qlognormal(label, mu, sigma, q)
```
It is also possible to use a "choice" which can lead to hyperparameter nesting:

It is also possible to use a "choice" that can lead to hyperparameter nesting:

hp.choice(label, ["list", "of", "potential", "choices"])
hp.choice(label, [hp.uniform(sub_label_1, low, high), hp.normal(sub_label_2, mu, sigma), None, 0, 1, "anything"])
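To get a feel for what these definitions produce, you can draw random samples from a space with `hyperopt.pyll.stochastic`. A small sketch (the parameter names are borrowed from the space defined further below in this README):

```python
# Draw one random point from a space containing a quantized range
# and a nested choice, to see the kind of values Hyperopt will try.
from hyperopt import hp
from hyperopt.pyll import stochastic

demo_space = {
    # Quantized: sampled values are rounded to multiples of q=5
    'batch_size': hp.quniform('batch_size', 100, 700, 5),
    # Nested choice: either no special first conv, or a kernel size of 3 or 4
    'first_conv': hp.choice('first_conv', [None, hp.choice('first_conv_size', [3, 4])]),
}

print(stochastic.sample(demo_space))
# e.g. {'batch_size': 435.0, 'first_conv': 3} -- a fresh random draw each call
```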

## Meta-optimize the neural network with Hyperopt

To run the hyperparameter search yourself, do: `python3 hyperopt_optimize.py`. You might want to look at `requirements.py` and install some of them manually to acquire GPU acceleration (e.g.: installing TensorFlow and Keras especially by yourself).
To run the hyperparameter search yourself, run: `python3 hyperopt_optimize.py`. You might want to look at `requirements.py` and install some of those dependencies manually in order to take advantage of GPU acceleration (such as by installing TensorFlow and Keras yourself, in particular).

Optimization results will continuously be saved in the `results/` folder (sort files to take best result as human-readable text).
Also, the results are pickled to `results.pkl` to be able to resume the TPE meta-optimization process later simply by running the program again with `python3 hyperopt_optimize.py`.
Optimization results will continuously be saved in the `results/` folder (sort the files to get the best result as human-readable text).

If you want to learn more about Hyperopt, you'll probably want to watch that [video](https://www.youtube.com/watch?v=Mp1xnPfE4PY) made by the creator of Hyperopt. Also, if you want to run the model on the CIFAR-10 dataset, you must edit the file `neural_net.py`.
Also, the results are pickled to `results.pkl` to be able to resume the TPE meta-optimization process later, which you can do simply by running the program again with `python3 hyperopt_optimize.py`.
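Under the hood, resuming amounts to reloading the pickled trials so TPE continues from where it left off. Here is a hypothetical sketch of that pattern with a toy objective and space (the real ones live in `hyperopt_optimize.py`):

```python
# Hypothetical resume pattern: reload past trials from the pickle if it
# exists, run a few more evaluations, and persist the updated trials.
import pickle
from hyperopt import Trials, fmin, hp, tpe

space = hp.uniform('x', -5.0, 5.0)    # stand-in search space
objective = lambda x: (x - 1.0) ** 2  # stand-in objective

try:
    with open('results.pkl', 'rb') as f:  # resume previous trials if any
        trials = pickle.load(f)
except FileNotFoundError:
    trials = Trials()                     # first run: start fresh

best = fmin(
    fn=objective,
    space=space,
    algo=tpe.suggest,
    max_evals=len(trials.trials) + 10,    # do 10 more evaluations per run
    trials=trials,
)

with open('results.pkl', 'wb') as f:      # persist for the next run
    pickle.dump(trials, f)
```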

It is possible that you get better results than there are already here. Pull requests / contributions are welcome. Suggestion: trying many different initializers for the layers would be an interesting thing to try. Adding SELU activations would be interesting too. To restart the training with new or removed hyperparameters, it is recommended to delete existing results with `./delete_results.sh`.
If you want to learn more about Hyperopt, you'll probably want to [watch this video](https://www.youtube.com/watch?v=Mp1xnPfE4PY), made by the creator of Hyperopt. Also, if you want to run the model on a different dataset, such as CIFAR-10, you must edit the file `neural_net.py`.

You may achieve better results than those already achieved in this repository. Pull requests / contributions are welcome! A couple of suggestions for finding interesting results: 1) try many different initializers for the layers; 2) add SELU activations. To restart the training with new or removed hyperparameters, it is recommended to delete existing results by running `./delete_results.sh`.

## The Deep Convolutional Neural Network Model

Here is a basic overview of the model. I implemented it in such a way that Hyperopt will try to change the shape of the layers and remove or replace some of them according to some pre-parametrized ideas that I have got. Therefore, not only the learning rate is changed with hyperopt, but a lot more parameters.
Below is a basic overview of the model. We implemented it in such a way that Hyperopt will try to change the shape of the layers and remove or replace some of them based on some pre-parameterized ideas that we're trying here. In this approach, Hyperopt changes a lot of parameters in addition to the learning rate.

```python

@@ -56,20 +59,20 @@ space = {
'lr_rate_mult': hp.loguniform('lr_rate_mult', -0.5, 0.5),
# L2 weight decay:
'l2_weight_reg_mult': hp.loguniform('l2_weight_reg_mult', -1.3, 1.3),
# Batch size fed for each gradient update
# Batch size fed for each gradient update:
'batch_size': hp.quniform('batch_size', 100, 700, 5),
# Choice of optimizer:
'optimizer': hp.choice('optimizer', ['Adam', 'Nadam', 'RMSprop']),
# Coarse labels importance for weights updates:
'coarse_labels_weight': hp.uniform('coarse_labels_weight', 0.1, 0.7),
# Uniform distribution in finding appropriate dropout values, conv layers
# Uniform distribution in finding appropriate dropout values, conv layers:
'conv_dropout_drop_proba': hp.uniform('conv_dropout_proba', 0.0, 0.35),
# Uniform distribution in finding appropriate dropout values, FC layers
# Uniform distribution in finding appropriate dropout values, FC layers:
'fc_dropout_drop_proba': hp.uniform('fc_dropout_proba', 0.0, 0.6),
# Use batch normalisation at more places?
# Use batch normalization at more places?
# ^ review comment (Contributor Author): should this have a ??
'use_BN': hp.choice('use_BN', [False, True]),

# Use a first convolution which is special?
# Use a first convolution that is special?
# ^ review comment (Contributor Author): should this have a ??

'first_conv': hp.choice(
'first_conv', [None, hp.choice('first_conv_size', [3, 4])]
),
@@ -96,17 +99,17 @@ space = {
# The kernel_size for residual convolutions:
'res_conv_kernel_size': hp.quniform('res_conv_kernel_size', 2, 4, 1),

# Amount of fully-connected units after convolution feature map
# Number of fully-connected units after the convolution feature map:
'fc_units_1_mult': hp.loguniform('fc_units_1_mult', -0.6, 0.6),
# Use one more FC layer at output
# Use one more FC layer at output:
'one_more_fc': hp.choice(
'one_more_fc', [None, hp.loguniform('fc_units_2_mult', -0.6, 0.6)]
),
# Activations that are used everywhere
# Activations that are used everywhere:
'activation': hp.choice('activation', ['relu', 'elu'])
}

# Here is one possible outcome for this stochastic space, let's plot that:
# Here is one possible outcome for this stochastic space; let's plot that:
space_base_demo_to_plot = {
'lr_rate_mult': 1.0,
'l2_weight_reg_mult': 1.0,
```

@@ -139,22 +142,22 @@

## Analysis of the hyperparameters

Here is an excerpt:

<p align="center">
<img src="hyperparameters_scatter_matrix.png">
</p>

This can help you redefine the hyperparameters and successively narrow them down, relaunching the meta-optimization on the refined spaces.
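For example, if the scatter matrix showed the best `lr_rate_mult` values clustering in the upper half of their range, the prior could be narrowed before relaunching (hypothetical bounds, for illustration only):

```python
from hyperopt import hp

# Original, wide prior from the space above:
#     'lr_rate_mult': hp.loguniform('lr_rate_mult', -0.5, 0.5)
# Narrowed prior after inspecting the scatter matrix (hypothetical bounds):
refined_space_fragment = {
    'lr_rate_mult': hp.loguniform('lr_rate_mult', 0.0, 0.5),
}
```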


## Best result

The final accuracy is of 67.61% in average on the 100 fine labels, and is of 77.31% in average on the 20 coarse labels.
The results are comparable to the ones in the middle of [that list](http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html#43494641522d313030), under the CIFAR-100 section.
The final accuracy is 67.61% on average for the 100 fine labels, and is 77.31% on average for the 20 coarse labels.
These results are comparable to the ones in the middle of [this list](http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html#43494641522d313030), under the CIFAR-100 section.
> Review comment (Contributor Author): what is the significance of this list? might be worth contextualizing. maybe it's due to my lack of familiarity, but this felt like an arbitrary list on the internet to compare against :)

The only image preprocessing that we do is a random flip left-right.
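In NumPy terms, that augmentation amounts to reversing the width axis of a random half of the batch. A hypothetical sketch (the repository's actual preprocessing lives in its data pipeline):

```python
import numpy as np

def random_flip_left_right(images, rng=None):
    # Flip each image horizontally with probability 0.5.
    # `images` has shape (batch, height, width, channels), e.g. CIFAR's 32x32x3.
    rng = rng or np.random.default_rng()
    images = images.copy()
    for i in range(len(images)):
        if rng.random() < 0.5:
            images[i] = images[i][:, ::-1, :]  # reverse the width axis
    return images

# Example: augment a batch of 8 random CIFAR-sized images.
batch = np.random.rand(8, 32, 32, 3).astype(np.float32)
augmented = random_flip_left_right(batch)
```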

### Best hyperspace found:

```python

space_best_model = {
    # … (truncated in this diff view)
```