new docs, optimizer rehaul and AutoML
- there is now a new sub-module `autom8` inside which several AutoML features live
- `AutoParams` automatically generates a parameter dictionary and streamlines its manipulation before experiment
- `AutoModel` automatically creates an input model for `Scan()` which is fully wired for use with `AutoParams` or any other experiment with a comprehensive search
- `AutoScan` leverages `AutoParams` and `AutoModel` to reduce the whole experiment into a single line of code
- `AutoPredict` takes the results of `Scan` (or `AutoScan`), picks best model candidates, evaluates the candidates, picks the winner, and makes predictions with it on input data
- the new docs are now completed
- added `local_strategy` to reduction strategies, which allows making changes to the parameter space from local system while the experiment is running
- added `pearson` and `kendall` reduction strategies
- streamlined the way custom strategies can be added
- completely rebuilt `correlation` strategy, including the underlying statistical approach
- added a helper function `cols_to_multilabel` for custom reducers
- added a new generator `SequenceGenerator`
- removed redundant files from the repo
- tests are updated to reflect the changes but do not yet cover the new features
mikkokotila committed Aug 2, 2019
1 parent 64a31ab commit eed709e
Showing 57 changed files with 1,641 additions and 492 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -61,9 +61,9 @@ Based on what no doubt constitutes a "biased" review (being our own) of more tha
- model generalization evaluator
- experiment analytics
- Random search
- Pseudo, Quasi, and Quantum Random optimizers
- Grid search
- Correlation based optimization
- Pseudo, Quasi, and Quantum Random functions
- Probabilistic optimization
- Model candidate generality evaluation
- Live training monitor
- Experiment analytics
28 changes: 28 additions & 0 deletions docs/AutoModel.md
@@ -0,0 +1,28 @@
# AutoModel

`AutoModel` provides a meaningful way to test several network architectures in an automated manner. Currently there are five supported architectures:

- conv1d
- lstm
- bidirectional_lstm
- simplernn
- dense

`AutoModel` creates an input model for `Scan()`. It is optimized for use together with `AutoParams()` and expects one or more of the above architectures to be included in the params dictionary, for example:

```python

p = {...
'networks': ['dense', 'conv1d', 'lstm']
...}

```
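
Before starting a long experiment it can be useful to check that the `networks` entry only names architectures from the list above. The following is a minimal sketch of such a check; `unsupported_networks` is a hypothetical helper written for this example and is not part of Talos.

```python
# The five architectures listed in the AutoModel docs above.
SUPPORTED_NETWORKS = {'conv1d', 'lstm', 'bidirectional_lstm', 'simplernn', 'dense'}


def unsupported_networks(params):
    """Return the entries of params['networks'] that are not listed above.

    Hypothetical helper for illustration only; it is not a Talos function.
    """
    return [name for name in params.get('networks', [])
            if name not in SUPPORTED_NETWORKS]


p = {'networks': ['dense', 'conv1d', 'lstm']}

# An empty list means every requested architecture is supported.
print(unsupported_networks(p))
```

Any name returned by the helper would need to be removed or corrected before the dictionary is handed to `Scan()`.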

## AutoModel Arguments

Argument | Input | Description
--------- | ------- | -----------
`task` | str or None | `binary`, `multi_label`, `multi_class`, or `continuous`
`metric` | None or list | One or more Keras metrics (functions) to be used in the model

Setting `task` affects various aspects of the model, so it should be set according to the specific prediction task, or set to `None`, in which case the `metric` input is required.
82 changes: 82 additions & 0 deletions docs/AutoParams.md
@@ -0,0 +1,82 @@
# AutoParams

`AutoParams()` allows automated generation of a comprehensive parameter dictionary to be used as input for `Scan()` experiments, as well as a streamlined way to manipulate parameter dictionaries.

#### to automatically create a params dictionary

```python
p = talos.autom8.AutoParams().params

```
NOTE: The above example yields a very large permutation space so configure `Scan()` accordingly with `fraction_limit`.
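
To get a feel for why `fraction_limit` matters, the size of the permutation space can be estimated as the product of the number of values per parameter. The sketch below uses a small made-up dictionary, not the actual output of `AutoParams()`, and assumes that `fraction_limit` keeps roughly that fraction of all permutations; `permutation_count` and `rounds_for` are hypothetical helpers.

```python
import math


def permutation_count(params):
    """Number of distinct permutations in a parameter dictionary of lists."""
    return math.prod(len(values) for values in params.values())


def rounds_for(params, fraction_limit):
    """Approximate number of permutations kept for a given fraction_limit.

    Assumption for illustration: the limit is applied as a plain fraction
    of the full permutation count, with at least one permutation kept.
    """
    return max(1, int(permutation_count(params) * fraction_limit))


# Illustrative values only; these are not the AutoParams() defaults.
example = {
    'lr': [0.1, 0.5, 1.0, 2.0],
    'batch_size': [16, 32, 64],
    'epochs': [50, 100, 150],
    'dropout': [0.0, 0.25, 0.5],
}

print(permutation_count(example))   # 4 * 3 * 3 * 3 = 108
print(rounds_for(example, 0.25))    # 27
```

Each additional parameter multiplies the count, which is why a dictionary covering all available parameters grows so quickly.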

#### an alternative way where a class object is returned

```python
param_object = talos.autom8.AutoParams()

```

Now various properties can be accessed through `param_object`. These are detailed below. For example:

#### modifying a single parameter in the params dictionary

```python
param_object.batch_size(bottom_value=20, max_value=100, steps=10)
```

Now the modified params dictionary can be accessed through `param_object.params`.
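
The docs do not spell out how `bottom_value`, `max_value` and `steps` are turned into a list of values. One plausible reading, shown here purely as an assumption, is an evenly spaced integer range that includes both ends. `value_range` is a hypothetical helper, not the Talos implementation.

```python
def value_range(bottom_value, max_value, steps):
    """Return `steps` evenly spaced integers from bottom_value to max_value.

    Assumed behaviour for illustration; the real AutoParams method may differ.
    """
    if steps < 2:
        return [bottom_value]
    span = max_value - bottom_value
    # Integer arithmetic keeps the end points exact.
    return [bottom_value + (i * span) // (steps - 1) for i in range(steps)]


batch_sizes = value_range(bottom_value=20, max_value=100, steps=10)
print(batch_sizes)
```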

#### to append to an existing parameter dictionary

```python
params_dict = talos.autom8.AutoParams(p, task='multi_label').params

```
NOTE: When the dictionary is created for a prediction task other than 'binary', the `task` argument has to be declared accordingly (`binary`, `multi_label`, `multi_class`, or `continuous`).

## AutoParams Arguments

Argument | Input | Description
--------- | ------- | -----------
`params` | dict or None | If `None` then a new parameter dictionary is created
`task` | str | 'binary', 'multi_class', 'multi_label', or 'continuous'
`replace` | bool | Replace current dictionary entries with new ones.
`auto` | bool | automatically generate or append params dictionary with all available parameters.
`network` | bool | If `True` several model architectures will be added

## AutoParams Properties

The **`params`** property returns the parameter dictionary which can be used as an input to `Scan()`.

The **`resample_params`** accepts `n` as input and resamples the params dictionary so that n values remain for each parameter.
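
The effect of resampling can be sketched in plain Python. The example below assumes that values are picked at random without replacement, which the docs do not confirm; `resample` is a hypothetical helper and the dictionary is made up for the example.

```python
import random


def resample(params, n, seed=None):
    """Keep at most n randomly chosen values for each parameter.

    Assumption for illustration: sampling is random and without replacement.
    """
    rng = random.Random(seed)
    return {
        name: rng.sample(values, min(n, len(values)))
        for name, values in params.items()
    }


# Illustrative dictionary; not the AutoParams() defaults.
full = {
    'batch_size': [8, 16, 32, 64],
    'dropout': [0.0, 0.1, 0.2, 0.3, 0.4],
    'epochs': [50],
}

small = resample(full, n=2, seed=1)
print({name: len(values) for name, values in small.items()})
```

Parameters that already have `n` or fewer values are left with all of their values.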

All other properties relate to manipulating individual parameters in the parameter dictionary.

**`activations`** For controlling the corresponding parameter in the parameters dictionary.

**`batch_size`** For controlling the corresponding parameter in the parameters dictionary.

**`dropout`** For controlling the corresponding parameter in the parameters dictionary.

**`epochs`** For controlling the corresponding parameter in the parameters dictionary.

**`kernel_initializer`** For controlling the corresponding parameter in the parameters dictionary.

**`last_activation`** For controlling the corresponding parameter in the parameters dictionary.

**`layers`** For controlling the corresponding parameter (i.e. `hidden_layers`) in the parameters dictionary.

**`losses`** For controlling the corresponding parameter in the parameters dictionary.

**`lr`** For controlling the corresponding parameter in the parameters dictionary.

**`networks`** For controlling the Talos preset network architectures (`dense`, `lstm`, `bidirectional_lstm`, `conv1d`, and `simplernn`). NOTE: the use of preset networks requires the use of the input model from `AutoModel()` for `Scan()`.

**`neurons`** For controlling the corresponding parameter (i.e. `first_neuron`) in the parameters dictionary.

**`optimizers`** For controlling the corresponding parameter in the parameters dictionary.

**`shapes`** For controlling the Talos preset network shapes (`brick`, `funnel`, and `triangle`).

**`shapes_slope`** For controlling the shape parameter with a floating point value to set the slope of the network from input layer to output layer.
33 changes: 33 additions & 0 deletions docs/AutoPredict.md
@@ -0,0 +1,33 @@
# AutoPredict

`AutoPredict()` automatically handles the process of finding the best models from a completed `Scan()` experiment, evaluates those models, and uses the winning model to make predictions on input data.

```python
scan_object = talos.autom8.AutoPredict(scan_object, x_val=x, y_val=y, x_pred=x)
```

NOTE: the input data must be in the same format as the `x` that was used in `Scan()`.
Also, `x_val` and `y_val` should not have been exposed to the model during the
`Scan()` experiment.

`AutoPredict()` will add four new properties to the `Scan()` object:

- **`preds_model`** contains the winning Keras model (function)
- **`preds_parameters`** contains the hyperparameters for the selected model
- **`preds_probabilities`** contains the prediction probabilities for `x_pred`
- **`predict_classes`** contains the predicted classes for `x_pred`

## AutoPredict Arguments

Argument | Input | Description
--------- | ------- | -----------
`scan_object` | class object | the class object returned from `Scan()`
`x_val` | array or list of arrays | validation data features
`y_val` | array or list of arrays | validation data labels
`x_pred` | array or list of arrays | prediction data features
`n` | int | number of promising models to be included in the evaluation process
`metric` | None | the metric against which the validation is performed
`folds` | None | number of folds to be used for cross-validation
`shuffle` | None | if data is shuffled before splitting
`average` | str | 'binary', 'micro', 'macro', 'samples', or 'weighted'
`asc` | None | should be True if metric is a loss
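
The roles of `n`, `metric` and `asc` in candidate selection can be illustrated with a small sketch. It ranks rows of a made-up results table and keeps the `n` best; `pick_candidates` and the numbers are hypothetical and only show why `asc` should be `True` when the metric is a loss.

```python
def pick_candidates(results, metric, n, asc=False):
    """Return the n most promising rows of a results table.

    results: list of dicts, one per trained permutation.
    asc: True when a lower metric value is better (for example a loss).
    """
    ranked = sorted(results, key=lambda row: row[metric], reverse=not asc)
    return ranked[:n]


# Made-up experiment results for illustration.
results = [
    {'model_id': 0, 'val_acc': 0.81, 'val_loss': 0.52},
    {'model_id': 1, 'val_acc': 0.88, 'val_loss': 0.41},
    {'model_id': 2, 'val_acc': 0.79, 'val_loss': 0.60},
    {'model_id': 3, 'val_acc': 0.86, 'val_loss': 0.39},
]

by_acc = pick_candidates(results, 'val_acc', n=2)
by_loss = pick_candidates(results, 'val_loss', n=2, asc=True)

print([row['model_id'] for row in by_acc])    # [1, 3]
print([row['model_id'] for row in by_loss])   # [3, 1]
```

With `asc` left at `False` for a loss metric, the worst models would be ranked first.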
31 changes: 31 additions & 0 deletions docs/AutoScan.md
@@ -0,0 +1,31 @@
# AutoScan

`AutoScan()` provides a streamlined way of conducting a hyperparameter search experiment with any dataset. It is particularly useful for early exploration, because with default settings `AutoScan()` casts a very broad parameter space that includes all common hyperparameters, network shapes and sizes, as well as architectures.

Configure the `AutoScan()` experiment and then use the property `start` in the returned class object to start the actual experiment.

```python
auto = talos.autom8.AutoScan(task='binary', max_param_values=2)
auto.start(x, y, experiment_name='testing.new', fraction_limit=0.001)
```

NOTE: `auto.start()` accepts all `Scan()` arguments.

## AutoScan Arguments

Argument | Input | Description
--------- | ------- | -----------
`task` | str or None | `binary`, `multi_label`, `multi_class`, or `continuous`
`max_param_values` | int | Number of parameter values to be included

Setting `task` affects various aspects of the model, so it should be set according to the specific prediction task, or set to `None`, in which case the `metric` input is required.

## AutoScan Properties

The only property **`start`** starts the actual experiment. `AutoScan.start()` accepts the following arguments:

Argument | Input | Description
--------- | ------- | -----------
`x` | array or list of arrays | prediction features
`y` | array or list of arrays | prediction outcome variable
`kwargs` | arguments | any `Scan()` argument can be passed into `AutoScan.start()`
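
The pass-through of `Scan()` arguments can be pictured as plain keyword forwarding. The sketch below uses stand-in names (`AutoScanSketch` and `fake_scan` are hypothetical and do not exist in Talos) to show how anything beyond `x` and `y` reaches the underlying experiment untouched.

```python
def fake_scan(x, y, **scan_arguments):
    """Placeholder for Scan(); it only reports what it received."""
    return {'n_samples': len(x), 'scan_arguments': scan_arguments}


class AutoScanSketch:
    """Hypothetical stand-in for AutoScan(); not the Talos implementation."""

    def __init__(self, task, max_param_values):
        self.task = task
        self.max_param_values = max_param_values

    def start(self, x, y, **kwargs):
        # Everything beyond x and y is forwarded as-is.
        return fake_scan(x=x, y=y, **kwargs)


auto = AutoScanSketch(task='binary', max_param_values=2)
out = auto.start([[0.0], [1.0]], [0, 1],
                 experiment_name='testing.new',
                 fraction_limit=0.001)
print(out['scan_arguments'])
```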
17 changes: 17 additions & 0 deletions docs/Custom_Reducers.md
@@ -0,0 +1,17 @@
# Custom Reducer

A custom reduction strategy can be created and dropped into Talos. Read more about the reduction principle

There are only two criteria to meet:

- The input of the custom strategy is 2-dimensional
- The output of the custom strategy is in the form:

```python
return label, value
```
Here `value` is any hyperparameter value, and `label` is the name of any hyperparameter. Any arbitrary strategy can be implemented, as long as the input and output criteria are met.
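
As a rough illustration of the two criteria, the sketch below takes a 2-dimensional input and returns `label, value`. The table layout (a header row followed by one row per permutation, with the metric in the first column) and the strategy itself are assumptions made for this example; `worst_value_reducer` is a hypothetical name and the numbers are made up.

```python
from collections import Counter


def worst_value_reducer(table):
    """Pick the hyperparameter value most common among the worst results.

    table: 2-D list; row 0 is the header, column 0 holds the metric and the
    remaining columns hold hyperparameter values (assumed layout).
    """
    header, rows = table[0], table[1:]
    ranked = sorted(rows, key=lambda row: row[0])
    worst = ranked[:max(1, len(ranked) // 4)]   # lowest-scoring quarter
    counts = Counter(
        (header[col], row[col])
        for row in worst
        for col in range(1, len(header))
    )
    (label, value), _count = counts.most_common(1)[0]
    return label, value


table = [
    ['val_acc', 'batch_size', 'lr'],
    [0.60, 16, 0.1],
    [0.62, 16, 0.5],
    [0.80, 32, 0.1],
    [0.82, 32, 0.5],
    [0.85, 64, 0.1],
    [0.84, 64, 0.5],
    [0.83, 32, 1.0],
    [0.81, 64, 1.0],
]

print(worst_value_reducer(table))   # ('batch_size', 16)
```

Here both of the two lowest-scoring rows use a batch size of 16, so that label and value pair is returned.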

The file containing the strategy can then be placed in `/reducers` in Talos package, and corresponding changes made into `/reducers/reduce_run.py` to make the strategy available in `Scan()`. Having done this, the reduction strategy is now available as per the example [above](#probabilistic-reduction).

A [pull request](https://github.com/autonomio/talos/pulls) is highly encouraged once a beneficial reduction strategy has been successfully added.
16 changes: 8 additions & 8 deletions docs/Evaluate.md
@@ -22,13 +22,13 @@ NOTE: It's very important to save part of your data for evaluation, and keep it

Parameter | Default | Description
--------- | ------- | -----------
x | NA | the predictor data x
y | NA | the prediction data y (truth)
model_id | None | the model_id to be used
folds | None | number of folds to be used for cross-validation
shuffle | None | if data is shuffled before splitting
average | 'binary' | 'binary', 'micro', 'macro', 'samples', or 'weighted'
metric | None | the metric against which the validation is performed
asc | None | should be True if metric is a loss
`x` | NA | the predictor data x
`y` | NA | the prediction data y (truth)
`model_id` | None | the model_id to be used
`folds` | None | number of folds to be used for cross-validation
`shuffle` | None | if data is shuffled before splitting
`average` | 'binary' | 'binary', 'micro', 'macro', 'samples', or 'weighted'
`metric` | None | the metric against which the validation is performed
`asc` | None | should be True if metric is a loss

The above arguments are for the <code>evaluate</code> attribute of the <code>Evaluate</code> object.
51 changes: 51 additions & 0 deletions docs/Examples_AutoML.md
@@ -0,0 +1,51 @@
# AutoML

Performing an AutoML style hyperparameter search experiment with Talos could not be any easier.

The single-file code example can be found [here](Examples_AutoML_Code.md).

### Imports

```python
import talos
import wrangle
```

### Loading Data
```python
x, y = talos.templates.datasets.cervical_cancer()

# we spare 10% of data for testing later
x, y, x_test, y_test = wrangle.array_split(x, y, .1)

# then validation split
x_train, y_train, x_val, y_val = wrangle.array_split(x, y, .2)
```

`x` and `y` are expected to be either numpy arrays or lists of numpy arrays. The same applies when `x_train`, `y_train`, `x_val`, and `y_val` are used instead.
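
Because the validation split is taken from what remains after the test split, the resulting proportions are 72% training, 18% validation and 10% test. The arithmetic can be checked with a small stand-in; `split_off` is a hypothetical helper that only mirrors the return order used above and does not shuffle, so it is not a replacement for `wrangle.array_split`.

```python
def split_off(x, y, fraction):
    """Hold out the last `fraction` of the samples.

    Returns rest_x, rest_y, held_x, held_y. Hypothetical helper for
    illustration only.
    """
    cut = len(x) - int(len(x) * fraction)
    return x[:cut], y[:cut], x[cut:], y[cut:]


# 1000 dummy samples with alternating labels.
x = list(range(1000))
y = [i % 2 for i in x]

# 10% of everything for testing, then 20% of the remainder for validation.
x, y, x_test, y_test = split_off(x, y, .1)
x_train, y_train, x_val, y_val = split_off(x, y, .2)

print(len(x_train), len(x_val), len(x_test))   # 720 180 100
```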

### Defining the Model

In this case there is no need to define the model. `talos.autom8.AutoModel()` is used behind the scenes, where several model architectures fully wired for Talos are found. We simply initiate the `AutoScan()` object first:

```python
autom8 = talos.autom8.AutoScan('binary', 5)
```

### Parameter Dictionary

There is also no need to worry about the parameter dictionary. This is handled in the background with `AutoParams()`.


### Scan()

The `Scan()` itself is started through the **`start`** property of the `AutoScan()` class object.

```python
autom8.start(x=x_train,
             y=y_train,
             x_val=x_val,
             y_val=y_val,
             fraction_limit=0.000001)
```
We pass data here just like we would do it in `Scan()` normally. Also, you are free to use any of the `Scan()` arguments here to configure the experiment. Find the description for all `Scan()` arguments [here](Scan.md#scan-arguments).
22 changes: 22 additions & 0 deletions docs/Examples_AutoML_Code.md
@@ -0,0 +1,22 @@
[BACK](Examples_AutoML.md)

# AutoML

```python

import talos
import wrangle

x, y = talos.templates.datasets.cervical_cancer()

# we spare 10% of data for testing later
x, y, x_test, y_test = wrangle.array_split(x, y, .1)

# then validation split
x_train, y_train, x_val, y_val = wrangle.array_split(x, y, .2)

autom8 = talos.autom8.AutoScan('binary', 5)

autom8.start(x=x_train,
             y=y_train,
             x_val=x_val,
             y_val=y_val,
             fraction_limit=0.000001)
```
