new docs, optimizer rehaul and AutoML

- there is now a new sub-module `autom8` inside which several AutoML features live:
  - `AutoParams` automatically generates a parameter dictionary and streamlines its manipulation before an experiment
  - `AutoModel` automatically creates an input model for `Scan()` which is fully wired for use with `AutoParams` or other experiments with a comprehensive search
  - `AutoScan` leverages `AutoParams` and `AutoModel` to reduce the whole experiment into a single line of code
  - `AutoPredict` takes the results of `Scan` (or `AutoScan`), picks the best model candidates, evaluates the candidates, picks the winner, and makes predictions with it on input data
- the new docs are now completed
- added `local_strategy` to reduction strategies, which allows making changes to the parameter space from the local system while the experiment is running
- added `pearson` and `kendall` reduction strategies
- streamlined the way custom strategies can be added
- completely rebuilt the `correlation` strategy, including the underlying statistical approach
- added a helper function `cols_to_multilabel` for custom reducers
- added a new generator `SequenceGenerator`
- removed redundant files from the repo
- tests are updated with regard to the changes but do not yet cover the new features
autonomio · Aug 2, 2019 · eed709e
1 parent 64a31ab
commit eed709e

Showing 57 changed files with 1,641 additions and 492 deletions.
diff --git a/README.md b/README.md
@@ -61,9 +61,9 @@ Based on what no doubt constitutes a "biased" review (being our own) of more tha
 - model generalization evaluator
 - experiment analytics
 - Random search
+- Pseudo, Quasi, and Quantum Random optimizers
 - Grid search
-- Correlation based optimization
-- Pseudo, Quasi, and Quantum Random functions
+- Probabilistic optimization
 - Model candidate generality evaluation
 - Live training monitor
 - Experiment analytics

diff --git a/docs/AutoModel.md b/docs/AutoModel.md
@@ -0,0 +1,28 @@
+# AutoModel
+
+`AutoModel` provides a meaningful way to test several network architectures in an automated manner. Currently there are five supported architectures:
+
+- conv1d
+- lstm
+- bidirectional_lstm
+- simplernn
+- dense
+
+`AutoModel` creates an input model for `Scan()`. It is optimized for use together with `AutoParams()` and expects one or more of the above architectures to be included in the params dictionary, for example:
+
+```python
+
+p = {...
+    'networks': ['dense', 'conv1d', 'lstm']
+    ...}
+
+```
+
+## AutoModel Arguments
+
+Argument | Input | Description
+--------- | ------- | -----------
+`task` | str or None | `binary`, `multi_label`, `multi_class`, or `continuous`
+`metric` | None or list | One or more Keras metrics (functions) to be used in the model
+
+Setting `task` affects various aspects of the model. It should be set according to the specific prediction task, or set to `None`, in which case the `metric` input is required.
diff --git a/docs/AutoParams.md b/docs/AutoParams.md
@@ -0,0 +1,82 @@
+# AutoParams
+
+`AutoParams()` allows automated generation of a comprehensive parameter dictionary to be used as input for `Scan()` experiments, as well as a streamlined way to manipulate parameter dictionaries.
+
+#### to automatically create a params dictionary
+
+```python
+p = talos.autom8.AutoParams().params
+
+```
+NOTE: The above example yields a very large permutation space so configure `Scan()` accordingly with `fraction_limit`.
+
+#### an alternative way where a class object is returned
+
+```python
+param_object = talos.autom8.AutoParams()
+
+```
+
+Various properties can now be accessed through `param_object`; these are detailed below. For example:
+
+#### modifying a single parameter in the params dictionary
+
+```python
+param_object.batch_size(bottom_value=20, max_value=100, steps=10)
+```
+
+Now the modified params dictionary can be accessed through `param_object.params`.
+
+#### to append to an existing parameter dictionary
+
+```python
+params_dict = talos.autom8.AutoParams(p, task='multi_label').params
+
+```
+NOTE: When the dictionary is created for a prediction task other than 'binary', the `task` argument has to be declared accordingly (`binary`, `multi_label`, `multi_class`, or `continuous`).
+
+## AutoParams Arguments
+
+Argument | Input | Description
+--------- | ------- | -----------
+`params` | dict or None | If `None` then a new parameter dictionary is created
+`task` | str | 'binary', 'multi_class', 'multi_label', or 'continuous'
+`replace` | bool | Replace current dictionary entries with new ones.
+`auto` | bool | automatically generate or append params dictionary with all available parameters.
+`network` | bool | If `True` several model architectures will be added
+
+## AutoParams Properties
+
+The **`params`** property returns the parameter dictionary which can be used as an input to `Scan()`.
+
+The **`resample_params`** property accepts `n` as input and resamples the params dictionary so that `n` values remain for each parameter.
+
+All other properties relate to manipulating individual parameters in the parameter dictionary.
+
+**`activations`** For controlling the corresponding parameter in the parameters dictionary.
+
+**`batch_size`** For controlling the corresponding parameter in the parameters dictionary.
+
+**`dropout`** For controlling the corresponding parameter in the parameters dictionary.
+
+**`epochs`** For controlling the corresponding parameter in the parameters dictionary.
+
+**`kernel_initializer`** For controlling the corresponding parameter in the parameters dictionary.
+
+**`last_activation`** For controlling the corresponding parameter in the parameters dictionary.
+
+**`layers`** For controlling the corresponding parameter (i.e. `hidden_layers`) in the parameters dictionary.
+
+**`losses`** For controlling the corresponding parameter in the parameters dictionary.
+
+**`lr`** For controlling the corresponding parameter in the parameters dictionary.
+
+**`networks`** For controlling the Talos preset network architectures (`dense`, `lstm`, `bidirectional_lstm`, `conv1d`, and `simplernn`). NOTE: the use of preset networks requires the use of the input model from `AutoModel()` for `Scan()`.
+
+**`neurons`** For controlling the corresponding parameter (i.e. `first_neuron`) in the parameters dictionary.
+
+**`optimizers`** For controlling the corresponding parameter in the parameters dictionary.
+
+**`shapes`** For controlling the Talos preset network shapes (`brick`, `funnel`, and `triangle`).
+
+**`shapes_slope`** For controlling the shape parameter with a floating point value to set the slope of the network from input layer to output layer.
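
The resampling idea behind `resample_params` in `docs/AutoParams.md` can be illustrated without Talos. The sketch below is an assumption about the general technique, not the library's implementation: the helper `resample` and the example dictionary are hypothetical, and the helper simply keeps at most `n` randomly chosen values per parameter.

```python
import random


def resample(params, n, seed=None):
    """Return a copy of `params` with at most `n` values per parameter.

    Hypothetical helper for illustration only; not part of Talos.
    """
    rng = random.Random(seed)
    out = {}
    for name, values in params.items():
        values = list(values)
        # Keep everything if there are already n or fewer values.
        k = min(n, len(values))
        out[name] = rng.sample(values, k)
    return out


# Made-up parameter dictionary in the list-of-values style used by Scan().
p = {
    'batch_size': [20, 30, 40, 50, 60],
    'dropout': [0.0, 0.1, 0.2, 0.3],
    'networks': ['dense', 'conv1d', 'lstm'],
}

small = resample(p, n=2, seed=1)
```

After the call, `small` has the same keys as `p`, with two values left under each key, so the permutation space shrinks from 5 × 4 × 3 = 60 to 2 × 2 × 2 = 8 combinations.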
diff --git a/docs/AutoPredict.md b/docs/AutoPredict.md
@@ -0,0 +1,33 @@
+# AutoPredict
+
+`AutoPredict()` automatically handles the process of finding the best models from a completed `Scan()` experiment, evaluates those models, and uses the winning model to make predictions on input data.
+
+```python
+scan_object = talos.autom8.AutoPredict(scan_object, x_val=x, y_val=y, x_pred=x)
+```
+
+NOTE: the input data must be in the same format as the `x` that was used in `Scan()`.
+Also, `x_val` and `y_val` should not have been exposed to the model during the
+`Scan()` experiment.
+
+`AutoPredict()` will add four new properties to `Scan()`:
+
+- **`preds_model`** contains the winning Keras model (function)
+- **`preds_parameters`** contains the hyperparameters for the selected model
+- **`preds_probabilities`** contains the prediction probabilities for `x_pred`
+- **`predict_classes`** contains the predicted classes for `x_pred`
+
+## AutoPredict Arguments
+
+Argument | Input | Description
+--------- | ------- | -----------
+`scan_object` | class object | the class object returned from `Scan()`
+`x_val` | array or list of arrays | validation data features
+`y_val` | array or list of arrays | validation data labels
+`x_pred` | array or list of arrays | prediction data features
+`n` | int | number of promising models to be included in the evaluation process
+`metric` | None | the metric against which the validation is performed
+`folds` | None | number of folds to be used for cross-validation
+`shuffle` | None | if data is shuffled before splitting
+`average` | str | 'binary', 'micro', 'macro', 'samples', or 'weighted'
+`asc` | None | should be True if metric is a loss
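
The selection flow that `docs/AutoPredict.md` describes (shortlist the `n` most promising candidates, evaluate them on held-out data, keep the winner) can be sketched in plain Python. Everything here is hypothetical: the `results` records, the `evaluate` callback, and `pick_winner` are illustrative names, not Talos functions, and the scores are made up.

```python
def pick_winner(results, evaluate, n=5, metric='val_acc', asc=False):
    """Shortlist `n` candidates by `metric`, re-evaluate them, return the best.

    `asc=True` means lower is better (for example when the metric is a loss).
    Hypothetical helper; not the Talos implementation.
    """
    # Rank by the score recorded during the experiment.
    ranked = sorted(results, key=lambda r: r[metric], reverse=not asc)
    shortlist = ranked[:n]
    # Re-score the shortlist on data the models have not seen.
    scored = [(evaluate(r), r) for r in shortlist]
    best = min if asc else max
    return best(scored, key=lambda pair: pair[0])[1]


# Made-up experiment log: one dict per trained model.
results = [
    {'model_id': 0, 'val_acc': 0.91},
    {'model_id': 1, 'val_acc': 0.88},
    {'model_id': 2, 'val_acc': 0.93},
    {'model_id': 3, 'val_acc': 0.70},
]

# Made-up held-out scores standing in for a real cross-validation run.
held_out = {0: 0.90, 1: 0.86, 2: 0.84, 3: 0.95}

winner = pick_winner(results, lambda r: held_out[r['model_id']], n=2)
```

In this toy run model 0 wins even though model 2 scored highest during the experiment, which is why the data used for the evaluation step should not have been exposed to the models beforehand.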
diff --git a/docs/AutoScan.md b/docs/AutoScan.md
@@ -0,0 +1,31 @@
+# AutoScan
+
+`AutoScan()` provides a streamlined way of conducting a hyperparameter search experiment with any dataset. It is particularly useful for early exploration because, with default settings, `AutoScan()` covers a very broad parameter space that includes all common hyperparameters, network shapes and sizes, as well as architectures.
+
+Configure the `AutoScan()` experiment and then use the property `start` in the returned class object to start the actual experiment.
+
+```python
+auto = talos.autom8.AutoScan(task='binary', max_param_values=2)
+auto.start(x, y, experiment_name='testing.new', fraction_limit=0.001)
+```
+
+NOTE: `auto.start()` accepts all `Scan()` arguments.
+
+## AutoScan Arguments
+
+Argument | Input | Description
+--------- | ------- | -----------
+`task` | str or None | `binary`, `multi_label`, `multi_class`, or `continuous`
+`max_param_values` | int | Number of parameter values to be included
+
+Setting `task` affects various aspects of the model. It should be set according to the specific prediction task, or set to `None`, in which case the `metric` input is required.
+
+## AutoScan Properties
+
+The only property **`start`** starts the actual experiment. `AutoScan.start()` accepts the following arguments:
+
+Argument | Input | Description
+--------- | ------- | -----------
+`x` | array or list of arrays | prediction features
+`y` | array or list of arrays | prediction outcome variable
+`kwargs` | arguments | any `Scan()` argument can be passed into `AutoScan.start()`
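
To see why the `AutoScan()` examples pass a small `fraction_limit`, it helps to count permutations. The sketch below assumes that `fraction_limit` is the share of the full permutation space that gets sampled; the parameter dictionary is made up, and the rounding rule is an assumption rather than Talos behavior.

```python
# Made-up parameter dictionary: each key lists its candidate values.
p = {
    'lr': [0.1, 0.5, 1.0, 2.0],
    'batch_size': [20, 40, 60],
    'dropout': [0.0, 0.25, 0.5],
    'hidden_layers': [0, 1, 2, 3],
    'networks': ['dense', 'conv1d', 'lstm'],
}

# The full grid is the product of the number of values per parameter.
total = 1
for values in p.values():
    total *= len(values)

# Assumed interpretation: run roughly this share of the grid, at least one.
fraction_limit = 0.05
to_run = max(1, int(total * fraction_limit))
```

Even this small dictionary yields 432 permutations, so the number of rounds grows quickly as parameters and values are added.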
diff --git a/docs/Custom_Reducers.md b/docs/Custom_Reducers.md
@@ -0,0 +1,17 @@
+# Custom Reducer
+
+A custom reduction strategy can be created and dropped into Talos. Read more about the reduction principle
+
+There are only two criteria to meet:
+
+- The input of the custom strategy is 2-dimensional
+- The output of the custom strategy is in the form:
+
+```python
+return label, value
+```
+Here `value` is any hyperparameter value, and `label` is the name of any hyperparameter. Any arbitrary strategy can be implemented, as long as the input and output criteria are met.
+
+The file containing the strategy can then be placed in `/reducers` in the Talos package, and the corresponding changes made to `/reducers/reduce_run.py` to make the strategy available in `Scan()`. Having done this, the reduction strategy is now available as per the example [above](#probabilistic-reduction).
+
+A [pull request](https://github.com/autonomio/talos/pulls) is highly encouraged once a beneficial reduction strategy has been successfully added.
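
As an illustration of the two criteria in `docs/Custom_Reducers.md`, the sketch below takes a 2-dimensional input (rows of results, one column per field) and returns `label, value`. The function name `worst_value_reducer`, the column layout, and the rule it applies (pick the hyperparameter value with the lowest mean score) are assumptions made for the example, not a strategy shipped with Talos.

```python
from collections import defaultdict


def worst_value_reducer(rows, columns, metric='val_acc'):
    """Return (label, value) for the hyperparameter value with the lowest
    mean `metric`. Hypothetical example reducer; not part of Talos.

    rows    -- 2-dimensional data: one list per completed round
    columns -- column names, including `metric`
    """
    m = columns.index(metric)
    scores = defaultdict(list)
    for row in rows:
        for i, label in enumerate(columns):
            if i != m:
                scores[(label, row[i])].append(row[m])
    # Mean score per (label, value) pair; the lowest one is returned.
    means = {key: sum(v) / len(v) for key, v in scores.items()}
    label, value = min(means, key=means.get)
    return label, value


# Made-up experiment results.
columns = ['batch_size', 'dropout', 'val_acc']
rows = [
    [20, 0.0, 0.90],
    [20, 0.5, 0.60],
    [40, 0.0, 0.88],
    [40, 0.5, 0.64],
]

label, value = worst_value_reducer(rows, columns)
```

With this data the function returns `dropout` and `0.5`, the value whose rounds have the lowest mean `val_acc`.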
diff --git a/docs/Evaluate.md b/docs/Evaluate.md
@@ -22,13 +22,13 @@ NOTE: It's very important to save part of your data for evaluation, and keep it
 
 Parameter | Default | Description
 --------- | ------- | -----------
-x | NA | the predictor data x
-y | NA | the prediction data y (truth)
-model_id | None | the model_id to be used
-folds | None | number of folds to be used for cross-validation
-shuffle | None | if data is shuffled before splitting
-average | 'binary' | 'binary', 'micro', 'macro', 'samples', or 'weighted'
-metric | None | the metric against which the validation is performed
-asc | None | should be True if metric is a loss
+`x` | NA | the predictor data x
+`y` | NA | the prediction data y (truth)
+`model_id` | None | the model_id to be used
+`folds` | None | number of folds to be used for cross-validation
+`shuffle` | None | if data is shuffled before splitting
+`average` | 'binary' | 'binary', 'micro', 'macro', 'samples', or 'weighted'
+`metric` | None | the metric against which the validation is performed
+`asc` | None | should be True if metric is a loss
 
 The above arguments are for the <code>evaluate</code> attribute of the <code>Evaluate</code> object.
diff --git a/docs/Examples_AutoML.md b/docs/Examples_AutoML.md
@@ -0,0 +1,51 @@
+# AutoML
+
+Performing an AutoML style hyperparameter search experiment with Talos could not be any easier.
+
+The single-file code example can be found [here](Examples_AutoML_Code.md).
+
+### Imports
+
+```python
+import talos
+import wrangle
+```
+
+### Loading Data
+```python
+x, y = talos.templates.datasets.cervical_cancer()
+
+# we spare 10% of data for testing later
+x, y, x_test, y_test = wrangle.array_split(x, y, .1)
+
+# then validation split
+x_train, y_train, x_val, y_val = wrangle.array_split(x, y, .2)
+```
+
+`x` and `y` are expected to be either numpy arrays or lists of numpy arrays. The same applies when `x_train`, `y_train`, `x_val`, and `y_val` are used instead.
+
+### Defining the Model
+
+In this case there is no need to define the model. `talos.autom8.AutoModel()` is used behind the scenes, where several model architectures fully wired for Talos are found. We simply initiate the `AutoScan()` object first:
+
+```python
+autom8 = talos.autom8.AutoScan('binary', 5)
+```
+
+### Parameter Dictionary
+
+There is also no need to worry about the parameter dictionary. This is handled in the background with `AutoParams()`.
+
+
+### Scan()
+
+The `Scan()` itself is started through the **`start`** property of the `AutoScan()` class object.
+
+```python
+autom8.start(x=x_train,
+             y=y_train,
+             x_val=x_val,
+             y_val=y_val,
+             fraction_limit=0.000001)
+```
+We pass data here just like we would do it in `Scan()` normally. Also, you are free to use any of the `Scan()` arguments here to configure the experiment. Find the description for all `Scan()` arguments [here](Scan.md#scan-arguments).
diff --git a/docs/Examples_AutoML_Code.md b/docs/Examples_AutoML_Code.md
@@ -0,0 +1,22 @@
+[BACK](Examples_AutoML.md)
+
+# AutoML
+
+```python
+import talos
+import wrangle
+
+x, y = talos.templates.datasets.cervical_cancer()
+
+# we spare 10% of data for testing later
+x, y, x_test, y_test = wrangle.array_split(x, y, .1)
+
+# then validation split
+x_train, y_train, x_val, y_val = wrangle.array_split(x, y, .2)
+
+autom8 = talos.autom8.AutoScan('binary', 5)
+
+autom8.start(x=x_train,
+             y=y_train,
+             x_val=x_val,
+             y_val=y_val,
+             fraction_limit=0.000001)
+```