# Intro to TensorFlow

In [1]:
%load_ext autoreload
%autoreload 2

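import numpy as np
import os
from pathlib import Path
import sys

# TensorFlow reads its C++ log level when it is loaded, so set it before importing;
# '1' hides INFO messages
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'

import tensorflow as tf
from ray import tune

# make the project's src package importable from the notebooks folder
sys.path.insert(0, "..")
from src.data import make_dataset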


# Hyperparameter tuning
While it is really useful to play with different architectures to see what happens, it can easily become very time-consuming. Right now, we are just considering Dense layers, but we can add all sorts of layers in different combinations. The search space is also much too big for a naive, brute-force grid search, especially if we are going to add more types of layers, each with their own parameters.
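To make "much too big" concrete, here is a quick count of how many models a brute-force grid search would have to train. The `units` and `dense_layers` ranges mirror the `config` used further down in this notebook; the extra hyperparameters and their value counts are purely hypothetical.

```python
from __future__ import annotations

from math import prod

# Number of candidate values per hyperparameter.
# units: 16..128 in steps of 8; dense_layers: 2..5 (as in the config below)
grid = {
    "units": len(range(16, 129, 8)),   # 15 values
    "dense_layers": len(range(2, 6)),  # 4 values
}


def grid_size(grid: dict[str, int]) -> int:
    """Number of models a brute-force grid search has to train."""
    return prod(grid.values())


print(grid_size(grid))  # 60

# Hypothetical extra hyperparameters multiply the total.
grid["activation"] = 3     # e.g. relu, tanh, elu
grid["learning_rate"] = 5  # e.g. five values on a log scale
grid["dropout"] = 4        # e.g. 0.0, 0.1, 0.2, 0.3
print(grid_size(grid))  # 3600

# If every layer gets its own width (15 options each), depth 2..5 alone gives:
per_layer = sum(15 ** n for n in range(2, 6))
print(per_layer)  # 813600
```

Every hyperparameter you add multiplies the number of models, and giving each layer its own width makes the count grow exponentially with depth.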

To do this more intelligently, we will use ~~kerastuner~~ Ray Tune. Ray is
excellent for parallel computing, and Ray Tune works with many frameworks
(TensorFlow, PyTorch, etc.).

Ray Tune implements smart ways to sample the hyperparameter space. To use
it, we have to define a more generic model: the hypermodel.


We will define ranges of hyperparameters; later on, these hyperparameters
(`config`) will be the input of our hypermodel.
There are different types of hyperparameters: integers (`tune.randint`,
`tune.qrandint`), floats (`tune.uniform`, `tune.loguniform`), normal
distributions (`tune.randn`), categorical choices (`tune.choice`), etc. All the
different types can be found in the [ray documentation](https://docs.ray.io/en/master/tune/api_docs/search_space.html#tune-sample-docs).

First, we let the number of units in every dense layer range between 16 and 128, in steps of 8 (`tune.qrandint(16, 128, 8)`).
Second, we add a for loop to add multiple dense layers, somewhere between 2 and 5 additional layers (`tune.randint(2, 6)`; its upper bound is exclusive).
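These two hyperparameters change the size of the model a lot. The sketch below counts the trainable parameters of a plain stack of Dense layers, assuming every hidden layer gets the same `units` and a single Dense output layer follows; the input and output sizes (10 and 1) are hypothetical, since the real dataset's shape isn't shown here.

```python
def dense_param_count(n_inputs: int, n_outputs: int, units: int, dense_layers: int) -> int:
    """Trainable parameters of `dense_layers` hidden Dense layers of width
    `units`, followed by one Dense output layer of width `n_outputs`.

    A Dense layer from `a` inputs to `b` outputs has a * b weights + b biases.
    """
    widths = [n_inputs] + [units] * dense_layers + [n_outputs]
    return sum(a * b + b for a, b in zip(widths, widths[1:]))


# Hypothetical input/output sizes; the real ones depend on the dataset.
N_IN, N_OUT = 10, 1
print(dense_param_count(N_IN, N_OUT, units=16, dense_layers=2))   # 465
print(dense_param_count(N_IN, N_OUT, units=128, dense_layers=5))  # 67585
```

Under these assumptions, the smallest and largest models in the search space differ more than a hundredfold in parameter count.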

In [2]:
datafile = Path("..") / "data/processed/data.npy"
local_dir = Path("../models/ray")
logbase = Path("..") / "logs"
datafile.exists()

True

In [19]:
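config = {
    # absolute paths, because each Ray trial may run in its own working directory
    "datafile": datafile.absolute(),
    "units": tune.qrandint(16, 128, 8),  # multiples of 8 in [16, 128]
    "dense_layers": tune.randint(2, 6),  # integer in [2, 6), so 2 to 5
    "activation": "relu",
    "optimizer": "Adam",
    "epochs": 100,
    "local_dir": local_dir.absolute(),
    "log_dir": logbase.absolute() / "hypertuned",
    "samples": 10,  # number of configurations (trials) to sample
}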
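The `config` above mixes constants with samplers such as `tune.qrandint` and `tune.randint`; for each trial, Ray replaces every sampler with a concrete value before the hypermodel sees it. Below is a minimal stdlib sketch of that step. The functions `randint`, `qrandint` and `resolve` are hypothetical stand-ins written to mirror the documented bounds (upper bound exclusive for `tune.randint`, inclusive for `tune.qrandint`), not Ray's implementation; in particular, Ray's exact weighting of the grid points may differ.

```python
from __future__ import annotations

import random
from typing import Any, Callable

Sampler = Callable[[random.Random], int]


def randint(lower: int, upper: int) -> Sampler:
    # Mirrors tune.randint: integer in [lower, upper), upper bound excluded.
    return lambda rng: rng.randrange(lower, upper)


def qrandint(lower: int, upper: int, q: int) -> Sampler:
    # Mirrors tune.qrandint: integer in [lower, upper], upper bound included,
    # restricted to multiples of q (assumes lower is a multiple of q).
    return lambda rng: rng.choice(range(lower, upper + 1, q))


def resolve(space: dict[str, Any], rng: random.Random) -> dict[str, Any]:
    """Draw one concrete config: call every sampler, copy constants as-is."""
    return {key: value(rng) if callable(value) else value for key, value in space.items()}


space = {
    "units": qrandint(16, 128, 8),
    "dense_layers": randint(2, 6),
    "activation": "relu",
    "epochs": 100,
}

rng = random.Random(42)
for _ in range(3):
    print(resolve(space, rng))  # one concrete config per (hypothetical) trial
```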

The [hyperband algorithm](https://jmlr.org/papers/v18/16-558.html) (image (b), configuration evaluation) often outperforms Bayesian search (image (a), configuration selection), at least in speed.

<img src="https://miro.medium.com/max/1400/1*DASrFL5AZNm2YjvJEq8z8w.png" width="600"/>

However, according to the [No Free Lunch Theorem](https://ti.arc.nasa.gov/m/profile/dhw/papers/78.pdf), "for any algorithm, any elevated performance over one class of problems is offset by performance over another class". So, as a rule of thumb, use Hyperband, but there is no guarantee that you get the best results. We keep the maximum number of training iterations per trial low to speed things up. We might get better results by increasing that number somewhat, but for this tutorial it would take too long, and we can still get an improvement over what we had.
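Hyperband is built on successive halving: train many configurations briefly, keep only the best fraction, and give the survivors more budget. Below is a minimal, synchronous successive-halving sketch; it is neither Ray's implementation nor full Hyperband, which runs several such brackets with different trade-offs between the number of configurations and the budget per configuration. The rungs (5, 20, 50 iterations), `eta=4` and the toy learning curve are hypothetical; the rungs were picked because they reproduce the stopping pattern in the results table further down (eight trials at 5 iterations, one at 20, one at 50), but the scheduler inside `train_model.hypertune` isn't shown, so that match is suggestive, not proof.

```python
from __future__ import annotations

import random
from collections import Counter
from typing import Callable


def successive_halving(
    configs: list[dict],
    loss: Callable[[dict, int], float],
    rungs: list[int],
    eta: int,
) -> tuple[int, dict[int, int]]:
    """Synchronous successive halving (lower loss is better).

    Survivors are trained up to each rung's budget; only the best 1/eta of
    them (at least one) continues to the next rung. Returns the index of the
    winner and, per config index, the iteration at which it stopped.
    """
    alive = list(range(len(configs)))
    stopped_at: dict[int, int] = {}
    for rung, budget in enumerate(rungs):
        # "Train" every survivor up to `budget` iterations, then rank by loss.
        alive.sort(key=lambda c: loss(configs[c], budget))
        is_last = rung == len(rungs) - 1
        n_keep = len(alive) if is_last else max(1, len(alive) // eta)
        for c in alive[n_keep:]:
            stopped_at[c] = budget  # terminated early at this rung
        alive = alive[:n_keep]
    for c in alive:
        stopped_at[c] = rungs[-1]  # survived to the final budget
    return alive[0], stopped_at


def toy_loss(config: dict, iterations: int) -> float:
    # Hypothetical learning curve that decays towards config["floor"].
    return config["floor"] + config["speed"] / iterations


rng = random.Random(0)
configs = [{"floor": rng.uniform(0.25, 0.5), "speed": rng.uniform(0.5, 2.0)} for _ in range(10)]
best, stopped_at = successive_halving(configs, toy_loss, rungs=[5, 20, 50], eta=4)
print(sorted(Counter(stopped_at.values()).items()))  # [(5, 8), (20, 1), (50, 1)]
```

Most of the compute goes to the few promising configurations; the price is that a configuration that starts slowly but would end up best can be cut at an early rung.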

In [25]:
from src.models import hypermodel
model = hypermodel.hypermodel(analysis.best_config)

<class 'src.models.hypermodel.hypermodel'>


In [20]:
from src.models import train_model

analysis = train_model.hypertune(iterations=50, config=config)

| Trial name | status | dense_layers | units | acc | iter | total time (s) | val_loss |
|---|---|---|---|---|---|---|---|
| train_hypermodel_b1e68_00000 | TERMINATED | 3 | 120 | 21.2207 | 5 | 3.66938 | 0.324065 |
| train_hypermodel_b1e68_00001 | TERMINATED | 4 | 72 | 15.3755 | 50 | 25.689 | 0.286318 |
| train_hypermodel_b1e68_00002 | TERMINATED | 2 | 40 | 23.0022 | 5 | 3.08078 | 0.334201 |
| train_hypermodel_b1e68_00003 | TERMINATED | 3 | 112 | 21.0996 | 5 | 3.78572 | 0.389697 |
| train_hypermodel_b1e68_00004 | TERMINATED | 5 | 32 | 22.773 | 5 | 3.56109 | 0.329557 |
| train_hypermodel_b1e68_00005 | TERMINATED | 3 | 120 | 20.93 | 5 | 3.46426 | 0.38555 |
| train_hypermodel_b1e68_00006 | TERMINATED | 2 | 88 | 21.6973 | 5 | 2.98837 | 0.569156 |
| train_hypermodel_b1e68_00007 | TERMINATED | 3 | 80 | 18.296 | 20 | 10.2908 | 0.291879 |
| train_hypermodel_b1e68_00008 | TERMINATED | 4 | 32 | 22.2392 | 5 | 3.38107 | 0.320264 |
| train_hypermodel_b1e68_00009 | TERMINATED | 3 | 88 | 21.3543 | 5 | 2.67309 | 0.477528 |


2021-08-17 12:35:21,426	INFO tune.py:550 -- Total run time: 55.19 seconds (55.07 seconds for the tuning loop).


Best hyperparameters found were:  {'datafile': PosixPath('/Users/rgrouls/Documents/academy/HU/ml-21/deep1/notebooks/../data/processed/data.npy'), 'units': 72, 'dense_layers': 4, 'activation': 'relu', 'optimizer': 'Adam', 'epochs': 100, 'local_dir': PosixPath('/Users/rgrouls/Documents/academy/HU/ml-21/deep1/notebooks/../models/ray'), 'log_dir': PosixPath('/Users/rgrouls/Documents/academy/HU/ml-21/deep1/notebooks/../logs/hypertuned'), 'samples': 10}
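The best configuration reported above (4 dense layers of 72 units) is the trial with the lowest `val_loss` in the table, which suggests the search was set to minimise `val_loss`. The small check below runs on values copied from that table and also counts how many training iterations the early stopping saved; the cap of 50 iterations per trial is an assumption based on the `iterations=50` argument and the longest trial.

```python
from typing import NamedTuple


class Trial(NamedTuple):
    name: str          # suffix of the Ray trial name
    dense_layers: int
    units: int
    iterations: int    # the "iter" column
    val_loss: float


# Values copied from the trial table above.
trials = [
    Trial("00000", 3, 120, 5, 0.324065),
    Trial("00001", 4, 72, 50, 0.286318),
    Trial("00002", 2, 40, 5, 0.334201),
    Trial("00003", 3, 112, 5, 0.389697),
    Trial("00004", 5, 32, 5, 0.329557),
    Trial("00005", 3, 120, 5, 0.38555),
    Trial("00006", 2, 88, 5, 0.569156),
    Trial("00007", 3, 80, 20, 0.291879),
    Trial("00008", 4, 32, 5, 0.320264),
    Trial("00009", 3, 88, 5, 0.477528),
]

best = min(trials, key=lambda t: t.val_loss)
print(best.name, best.dense_layers, best.units)  # 00001 4 72

# Assumption: 50 iterations is the per-trial cap.
used = sum(t.iterations for t in trials)
budget = 50 * len(trials)
print(used, budget)  # 110 500
```

So the ten trials together used roughly a fifth of the iterations that training all of them to the cap would have needed.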


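Note that the total run time (about 55 seconds) is pretty short for checking 10 configurations! As you can
see, most of the models are stopped before the full training ends: eight trials stop after 5 iterations, one after 20, and only one reaches 50. We can obtain the best values from the search: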

In [21]:
analysis.best_config

{'datafile': PosixPath('/Users/rgrouls/Documents/academy/HU/ml-21/deep1/notebooks/../data/processed/data.npy'),
 'units': 72,
 'dense_layers': 4,
 'activation': 'relu',
 'optimizer': 'Adam',
 'epochs': 100,
 'local_dir': PosixPath('/Users/rgrouls/Documents/academy/HU/ml-21/deep1/notebooks/../models/ray'),
 'log_dir': PosixPath('/Users/rgrouls/Documents/academy/HU/ml-21/deep1/notebooks/../logs/hypertuned'),
 'samples': 10}

And use those to train a model:

In [8]:
train_model.train_hypermodel(analysis.best_config, verbose=1, tuning=False)

Epoch 1/100
Epoch 2/100
Epoch 3/100
Epoch 4/100
Epoch 5/100
Epoch 6/100
Epoch 7/100
Epoch 8/100
Epoch 9/100
Epoch 10/100
Epoch 11/100
Epoch 12/100
Epoch 13/100
Epoch 14/100
Epoch 15/100
Epoch 16/100
Epoch 17/100
Epoch 18/100
Epoch 19/100
Epoch 20/100


Why not start with hypertuning directly? Because we first need to have an idea of where to search. Sure, you could start with an immense parameter space and search that, but the chance of finding a good model drops as the space you need to search grows, even if you are using a smart way to search. Looking for a pebble will be much harder in the mountains than in your backyard.
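The pebble analogy can be made concrete for the simplest strategy, plain random search (used here only as an illustration, not what Hyperband does): if a good region covers a fraction p of the space, the chance that at least one of n independent random samples lands in it is 1 - (1 - p)^n. The fractions are hypothetical: the same good region is assumed to cover 5% of a small space and 0.5% of a space ten times larger, with the same 10 samples as in the search above.

```python
def p_hit(good_fraction: float, n_samples: int) -> float:
    """Chance that at least one of n independent uniform random samples
    lands in a region covering `good_fraction` of the search space."""
    return 1 - (1 - good_fraction) ** n_samples


print(round(p_hit(0.05, 10), 3))   # backyard: 0.401
print(round(p_hit(0.005, 10), 3))  # mountains: 0.049
```

Making the space ten times larger cuts the chance of hitting the good region from about 40% to about 5% for the same budget; smarter search methods improve on these numbers, but they still have to cover the larger space.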