"NO DEPLOY FRIDAYS" and major cleanup of tests
...is more than just a catchy phrase.

In any case, the test suite is now clean and extensible, as opposed to the coverage-obsessed mess that it was before.

Quite a few bugs were squashed between now and Friday 4pm. Most importantly, there was quite a bit of sloppiness in the reducer process that would have affected the occasional case.
mikkokotila committed Aug 3, 2019
1 parent db35311 commit e43b9c2
Showing 30 changed files with 423 additions and 569 deletions.
4 changes: 2 additions & 2 deletions docs/AutoPredict.md
@@ -25,9 +25,9 @@ Argument | Input | Description
`x_val` | array or list of arrays | validation data features
`y_val` | array or list of arrays | validation data labels
`x_pred` | array or list of arrays | prediction data features
`n` | int | number of promising models to be included in the evaluation process
`task` | string | 'binary', 'multi_class', 'multi_label', or 'continuous'
`metric` | None | the metric against which the validation is performed
`n_models` | int | number of promising models to be included in the evaluation process
`folds` | None | number of folds to be used for cross-validation
`shuffle` | None | if data is shuffled before splitting
`average` | str | 'binary', 'micro', 'macro', 'samples', or 'weighted'
`asc` | None | should be True if metric is a loss
8 changes: 8 additions & 0 deletions docs/Monitoring.md
@@ -22,3 +22,11 @@ Scan(print_params=True)
**Live Monitoring :** Live monitoring provides an epoch-by-epoch updating line graph that is enabled through the `live()` custom callback.

**Round Hyperparameters :** Displays the hyperparameters for each permutation. Does not work together with live monitoring.

### Local Monitoring

Epoch-by-epoch training data is available during the experiment using the `ExperimentLogCallback`:

```python

```
16 changes: 13 additions & 3 deletions docs/Scan.md
@@ -38,9 +38,19 @@ Argument | Input | Description
`minimize_loss` | bool | `reduction_metric` is a loss
`disable_progress_bar` | bool | Disable live updating progress bar
`print_params` | bool | Print each permutation hyperparameters
`debug` | bool | Turn on debug messages
`clear_tf_session` | bool | Clear backend session between permutations

NOTE: `boolean_limit` will only work if it's the last argument in `Scan()` and the closing bracket is on a new line:

```python

talos.Scan(...
boolean_limit=lambda p: p['first_neuron'] * p['hidden_layers'] < 220
)
```
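
As a rough illustration of what such a constraint does, the sketch below filters permutations with the same lambda. It is a plain-Python stand-in and not the Talos internals: the `keep_permutations` helper and the sample values are hypothetical, and it assumes that permutations for which the lambda returns `True` are the ones kept.

```python
from itertools import product

# Hypothetical parameter dictionary; the values are for illustration only.
params = {'first_neuron': [32, 64, 128], 'hidden_layers': [1, 2, 3]}


def keep_permutations(params, limit):
    """Return the permutations for which `limit` evaluates to True."""
    keys = list(params)
    permutations = (dict(zip(keys, values))
                    for values in product(*params.values()))
    return [p for p in permutations if limit(p)]


kept = keep_permutations(
    params,
    lambda p: p['first_neuron'] * p['hidden_layers'] < 220
)
print(len(kept))
```

Of the nine permutations, the two with `first_neuron=128` and more than one hidden layer exceed the limit and are left out.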



## Scan Object Properties

Once the `Scan()` procedures are completed, an object with several useful properties is returned. The object can be used as an input to `Analyze()`, `Evaluate()`, `Predict()` and `Deploy()`, and has many properties that can be accessed directly. The namespace is strictly kept clean, so all the properties consist of meaningful contents.
@@ -82,7 +92,7 @@ scan_object.details
```python
scan_object.evaluate_models(x_val=x_val,
y_val=y_val,
n=10,
n_models=10,
metric='f1score',
folds=5,
shuffle=True,
@@ -95,7 +105,7 @@ Argument | Description
`scan_object` | The class object returned by Scan() upon completion of the experiment.
`x_val` | Input data (features) in the same format as used in Scan(), but should not be the same data (or it will not be much of validation).
`y_val` | Input data (labels) in the same format as used in Scan(), but should not be the same data (or it will not be much of validation).
`n` | The number of models to be evaluated. If set to 10, then 10 models with the highest metric value are evaluated. See below.
`n_models` | The number of models to be evaluated. If set to 10, then 10 models with the highest metric value are evaluated. See below.
`metric` | The metric to be used for picking the models to be evaluated.
`folds` | The number of folds to be used in the evaluation.
`shuffle` | If the data is to be shuffled or not. Set always to False for timeseries but keep in mind that you might get periodical/seasonal bias.
3 changes: 3 additions & 0 deletions talos/__init__.py
@@ -1,3 +1,6 @@
import warnings
warnings.simplefilter('ignore')

# import commands
from .scan.Scan import Scan
from .commands.analyze import Analyze
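
Note on the hunk above: `warnings.simplefilter('ignore')` at import time installs a process-wide filter, so warnings raised by the user's own code are silenced too once `talos` is imported. A narrower option, sketched here with only the standard library, is to scope the filter with `warnings.catch_warnings()`. The `noisy()` function is a hypothetical stand-in for any call that warns.

```python
import warnings


def noisy():
    # Hypothetical function that emits a warning when called.
    warnings.warn("deprecated call", DeprecationWarning)
    return 42


# Scoped suppression: the 'ignore' filter applies only inside this block.
with warnings.catch_warnings(record=True) as inside:
    warnings.simplefilter('ignore')
    result = noisy()

# In a fresh block with 'always', the same call is reported again.
with warnings.catch_warnings(record=True) as outside:
    warnings.simplefilter('always')
    noisy()

print(len(inside), len(outside))
```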
37 changes: 25 additions & 12 deletions talos/autom8/autoparams.py
@@ -80,11 +80,14 @@ def _automated(self, shapes='fixed'):
self.params['network'] = 'dense'
self.last_activations()

def shapes(self):
def shapes(self, shapes='auto'):

'''Uses triangle, funnel, and brick shapes.'''

self._append_params('shapes', ['triangle', 'funnel', 'brick'])
if shapes == 'auto':
self._append_params('shapes', ['triangle', 'funnel', 'brick'])
else:
self._append_params('shapes', shapes)

def shapes_slope(self, min=0, max=.6, steps=.1):

@@ -93,10 +96,10 @@ def shapes_slope(self, min=0, max=.6, steps=.1):

self._append_params('shapes', np.arange(min, max, steps).tolist())

def layers(self, min_layers=0, max_layers=6):
def layers(self, min_layers=0, max_layers=6, steps=1):

self._append_params('hidden_layers',
list(range(min_layers, max_layers)))
list(range(min_layers, max_layers, steps)))

def dropout(self, min=0, max=.85, steps=0.1):

@@ -137,7 +140,7 @@ def losses(self, losses='auto'):
if losses == 'auto':
self._append_params('losses', loss[self._task])
else:
self._append_params('losses', [losses])
self._append_params('losses', losses)

def neurons(self, min=8, max=None, steps=None):

@@ -171,15 +174,22 @@ def epochs(self, min=50, max=None, steps=None):
if max is None and steps is None:
values = [int(np.exp2(i/2))+50 for i in range(3, 15)]
else:
values = range(min, max, steps)
values = list(range(min, max, steps))

self._append_params('epochs', values)

def kernel_initializers(self):
def kernel_initializers(self, kernel_inits='auto'):

self._append_params('kernel_initializer',
['glorot_uniform', 'glorot_normal',
'random_uniform', 'random_normal'])
'''
kernel_inits | list | one or more kernel initializers
'''

if kernel_inits == 'auto':
self._append_params('kernel_initializer',
['glorot_uniform', 'glorot_normal',
'random_uniform', 'random_normal'])
else:
self._append_params('kernel_initializer', kernel_inits)

def lr(self, learning_rates='auto'):

@@ -219,14 +229,17 @@ def networks(self, networks='auto'):
else:
self._append_params('network', networks)

def last_activations(self):
def last_activations(self, last_activations='auto'):

'''If `last_activations='auto'` then activations will be picked
automatically based on `AutoParams` property `task`.
Otherwise a list with one or more activations is used.
'''

self._append_params('last_activation', last_activation[self._task])
if last_activations == 'auto':
self._append_params('last_activation', last_activation[self._task])
else:
self._append_params('last_activation', last_activations)

def resample_params(self, n):

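
Note on the hunks above: several setters now follow the same pattern, where `'auto'` selects a built-in list and anything else is stored as passed. The sketch below mirrors that pattern in isolation. `ParamsSketch` is a hypothetical stand-in, and the body of `_append_params` is an assumption (the real one is not shown in this diff).

```python
class ParamsSketch:
    """Hypothetical stand-in for the 'auto'-or-explicit pattern."""

    def __init__(self):
        self.params = {}

    def _append_params(self, label, values):
        # Assumed behaviour: store the candidate values under the label.
        self.params[label] = values

    def shapes(self, shapes='auto'):
        if shapes == 'auto':
            self._append_params('shapes', ['triangle', 'funnel', 'brick'])
        else:
            self._append_params('shapes', shapes)

    def layers(self, min_layers=0, max_layers=6, steps=1):
        # range() stops before max_layers, so 6 itself is never included.
        self._append_params('hidden_layers',
                            list(range(min_layers, max_layers, steps)))


p = ParamsSketch()
p.shapes()
auto_shapes = p.params['shapes']
p.shapes(['brick'])
p.layers(steps=2)
print(auto_shapes, p.params)
```

Since `losses` no longer wraps its argument in a list, an explicit value is stored exactly as passed, so callers would presumably need to pass a list themselves.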
25 changes: 16 additions & 9 deletions talos/autom8/autopredict.py
@@ -2,11 +2,11 @@ def AutoPredict(scan_object,
x_val,
y_val,
x_pred,
n=10,
task,
metric='val_acc',
n_models=10,
folds=5,
shuffle=True,
average='binary',
asc=False):

'''Automatically handles the process of finding the best models from a
@@ -24,13 +24,22 @@ def AutoPredict(scan_object,
x_val : ndarray or list of ndarray
Data to be used for 'x' in evaluation. Note that should be in the same
format as the data which was used in the Scan() but not the same data.
n : str
Number of promising models to be included in the evaluation process.
Time increase linearly with number of models.
y_val : ndarray or list of ndarray
Data to be used for 'y' in evaluation. Note that should be in the same
format as the data which was used in the Scan() but not the same data.
x_pred : ndarray or list of ndarray
Input data to be used for the actual predictions in evaluation. Note
it should be in the same format as the data which was used in the
Scan() but not the same data.
task : string
'binary', 'multi_class', 'multi_label', or 'continuous'.
metric : str
The metric to be used for deciding which models are promising.
Basically the 'n' argument and 'metric' argument are combined to pick
'n' best performing models based on 'metric'.
n_models : int
Number of promising models to be included in the evaluation process.
Time increases linearly with the number of models.
folds : int
Number of folds to be used in cross-validation.
shuffle : bool
@@ -71,11 +80,11 @@ def AutoPredict(scan_object,
# evaluate and add the evaluation scores
scan_object.evaluate_models(x_val,
y_val,
n=n,
n_models=n_models,
task=task,
metric=metric,
folds=folds,
shuffle=shuffle,
average=average,
asc=False)

# get the best model based on evaluated score
@@ -91,8 +100,6 @@ def AutoPredict(scan_object,
scan_object.preds_parameters = scan_object.data.sort_values('eval_f1score_mean',
ascending=False).iloc[0]

#scan_object.predictions = scan_object.preds_model.predict(x_pred)

print(">> Added model, probabilities, classes, and parameters to scan_object")

return scan_object
18 changes: 12 additions & 6 deletions talos/autom8/autoscan.py
@@ -9,7 +9,9 @@ def __init__(self,
the actual experiment.
`task` | str | 'binary', 'multi_class', 'multi_label', or 'continuous'
`max_param_values` | int | Number of parameter values to be included
`max_param_values` | int | Number of parameter values to be included.
Note, this will only work when `params` is
not passed as kwargs in `AutoScan.start`.
'''

self.task = task
@@ -28,11 +30,15 @@ def start(self, x, y, **kwargs):

import talos

p = talos.autom8.AutoParams(task=self.task)
p.resample_params(self.max_param_values)
params = p.params

m = talos.autom8.AutoModel(self.task).model
scan_object = talos.Scan(x, y, params, m, **kwargs)

try:
kwargs['params']
scan_object = talos.Scan(x, y, model=m, **kwargs)
except KeyError:
p = talos.autom8.AutoParams(task=self.task)
p.resample_params(self.max_param_values)
params = p.params
scan_object = talos.Scan(x, y, params, m, **kwargs)

return scan_object
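
Note on the hunk above: the `try` / `except KeyError` block exists only to check whether the caller supplied `params`. The same control flow can be sketched with a membership test. This is a hypothetical stand-in for `AutoScan.start`, with placeholder values instead of real Talos calls.

```python
def start(x, y, **kwargs):
    """Hypothetical stand-in: report which parameters would be used."""
    if 'params' in kwargs:
        # The caller supplied a parameter dictionary: use it untouched.
        return {'params': kwargs['params'], 'source': 'caller'}
    # Otherwise fall back to generated parameters (placeholder values).
    generated = {'lr': [0.1, 0.01]}
    return {'params': generated, 'source': 'generated'}


with_params = start([], [], params={'lr': [0.5]})
without_params = start([], [])
print(with_params['source'], without_params['source'])
```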
18 changes: 10 additions & 8 deletions talos/commands/evaluate.py
@@ -14,11 +14,11 @@ def __init__(self, scan_object):
def evaluate(self,
x,
y,
task,
metric,
model_id=None,
folds=5,
shuffle=True,
metric='val_acc',
mode='multi_label',
asc=False,
print_out=False):

@@ -41,8 +41,8 @@ def evaluate(self,
the results to pick for evaluation.
shuffle : bool
Data is shuffled before evaluation.
mode : string
'binary', 'multi_class', 'multi_label', or 'regression'.
task : string
'binary', 'multi_class', 'multi_label', or 'continuous'.
asc : bool
False if the metric is to be optimized upwards
(e.g. accuracy or f1_score)
@@ -71,21 +71,23 @@ def evaluate(self,

y_pred = model.predict(kx[i], verbose=0)

if mode == 'binary':
if task == 'binary':
y_pred = y_pred >= .5
scores = sk.metrics.f1_score(y_pred, ky[i], average='binary')

elif mode == 'multi_class':
elif task == 'multi_class':
y_pred = y_pred.argmax(axis=-1)
print(y_pred)
print(ky[i])
scores = sk.metrics.f1_score(y_pred, ky[i], average='macro')

if mode == 'multi_label':
if task == 'multi_label':
y_pred = model.predict(kx[i]).argmax(axis=1)
scores = sk.metrics.f1_score(y_pred,
ky[i].argmax(axis=1),
average='macro')

elif mode == 'regression':
elif task == 'continuous':
y_pred = model.predict(kx[i])
scores = sk.metrics.mean_absolute_error(y_pred, ky[i])

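
Note on the hunk above: each `task` value maps to a different way of turning raw model output into labels before scoring. The sketch below shows that mapping on plain lists instead of arrays. `to_labels` is a hypothetical helper, not part of Talos, and it covers only the conversion step, not the scoring.

```python
def to_labels(y_pred, task):
    """Convert raw model outputs to labels (plain-list sketch)."""
    if task == 'binary':
        # One probability per sample, thresholded at 0.5.
        return [int(p >= .5) for p in y_pred]
    if task in ('multi_class', 'multi_label'):
        # One row of scores per sample: take the index of the largest.
        return [max(range(len(row)), key=row.__getitem__) for row in y_pred]
    if task == 'continuous':
        # Regression outputs are scored as they are.
        return list(y_pred)
    raise ValueError('unknown task: %r' % task)


binary = to_labels([0.2, 0.5, 0.9], 'binary')
multi = to_labels([[0.1, 0.7, 0.2], [0.9, 0.05, 0.05]], 'multi_class')
print(binary, multi)
```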
3 changes: 2 additions & 1 deletion talos/commands/predict.py
@@ -3,12 +3,13 @@ class Predict:
'''Class for making predictions on the models that are stored
in the Scan() object'''

def __init__(self, scan_object):
def __init__(self, scan_object, task):

'''Takes in as input a Scan() object'''

self.scan_object = scan_object
self.data = scan_object.data
self.task = task

def predict(self, x, model_id=None, metric='val_acc', asc=False):

6 changes: 3 additions & 3 deletions talos/model/output_layer.py
@@ -7,11 +7,11 @@ def output_layer(task, last_activation, y_train, y_val):
activation = last_activation
last_neuron = 1

elif task == 'multiclass':
elif task == 'multi_class':
activation = last_activation
last_neuron = np.unique(np.hstack((y_train, y_val)))
last_neuron = len(np.unique(np.hstack((y_train, y_val))))

elif task == 'multilabel':
elif task == 'multi_label':
activation = last_activation
last_neuron = y_train.shape[1]

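
Note on the hunk above: before this change `last_neuron` was the array of unique labels rather than how many there are, and a layer width has to be a single integer. A pure-Python sketch of the intended count, using sets in place of `np.unique` and `np.hstack`; `count_classes` is a hypothetical name.

```python
def count_classes(y_train, y_val):
    """Number of distinct labels across both splits."""
    return len(set(y_train) | set(y_val))


# Label 3 appears only in the validation split but still counts.
n_classes = count_classes([0, 1, 2, 1], [2, 3])
print(n_classes)
```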
18 changes: 10 additions & 8 deletions talos/parameters/ParamSpace.py
@@ -218,36 +218,38 @@ def remove_is_not(self, label, value):
'''Removes based on exact match but reversed'''

col = self.param_keys.index(label)
self.param_space = self.param_space[self.param_space[:, col] == value]
self.param_index = list(range(len(self.param_space)))
drop = np.where(self.param_space[:, col] != value)[0].tolist()
self.param_index = [x for x in self.param_index if x not in drop]

def remove_is(self, label, value):

'''Removes based on exact match'''

col = self.param_keys.index(label)
self.param_space = self.param_space[self.param_space[:, col] != value]
self.param_index = list(range(len(self.param_space)))
drop = np.where(self.param_space[:, col] == value)[0].tolist()
self.param_index = [x for x in self.param_index if x not in drop]

def remove_ge(self, label, value):

'''Removes based on greater-or-equal'''

col = self.param_keys.index(label)
self.param_space = self.param_space[self.param_space[:, col] >= value]
self.param_index = list(range(len(self.param_space)))
drop = np.where(self.param_space[:, col] >= value)[0].tolist()
self.param_index = [x for x in self.param_index if x not in drop]

def remove_le(self, label, value):

'''Removes based on lesser-or-equal'''

col = self.param_keys.index(label)
self.param_space = self.param_space[self.param_space[:, col] <= value]
self.param_index = list(range(len(self.param_space)))
drop = np.where(self.param_space[:, col] <= value)[0].tolist()
self.param_index = [x for x in self.param_index if x not in drop]

def remove_lambda(self, function):

'''Removes based on a lambda function'''

index = self._convert_lambda(function)(self.param_space)
print(index)
self.param_space = self.param_space[index]
self.param_index = list(range(len(self.param_space)))
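
Note on the hunks above: the reworked `remove_*` methods leave `param_space` untouched and prune only `param_index`, so the surviving indices keep pointing at the original rows even after several removals. The sketch below shows the idea on plain tuples. The data is hypothetical, and building `drop` as a set (the diff uses a list) is a choice made here so that the membership test stays cheap.

```python
# Rows of a hypothetical parameter space; columns are (lr, hidden_layers).
param_space = [(0.1, 1), (0.1, 2), (0.01, 1), (0.01, 2)]
param_index = list(range(len(param_space)))


def remove_is(param_index, param_space, col, value):
    """Drop the indices of rows whose column `col` equals `value`."""
    drop = {i for i, row in enumerate(param_space) if row[col] == value}
    return [i for i in param_index if i not in drop]


# Two successive removals; param_space itself is never modified.
param_index = remove_is(param_index, param_space, col=1, value=2)
param_index = remove_is(param_index, param_space, col=0, value=0.1)
print(param_index, param_space[param_index[0]])
```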
2 changes: 0 additions & 2 deletions talos/reducers/correlation.py
@@ -30,15 +30,13 @@ def correlation(self, method):

# if all nans, then stop
if len(corr_values) <= 1:
print("all nans")
return self

# sort based on the metric type
corr_values.sort_values(ascending=self.minimize_loss, inplace=True)

# if less than threshold, then stop
if abs(corr_values[-1]) < self.reduction_threshold:
print("below threshold")
return self

# get the strongest correlation