I had a meeting with Grey Nearing (NASA - Goddard) to discuss hyperparameterization in hydrology, for statistical models as in Elm or for physical hydrology models. We discussed a few improvements to make to the evolutionary algorithms for statistical models. For reference, here is an example NSGA-2 control specification for `elm.pipeline.evolve_train` as it works currently from Phase I:
```python
nsga_control = {
    'select_method': 'selNSGA2',
    'crossover_method': 'cxTwoPoint',  # TODO can we modify for float/int?
    'mutate_method': 'mutUniformInt',  # TODO same comment as above
    'init_pop': 'random',
    'indpb': 0.6,   # probability of each attribute changing
    'mutpb': 0.9,   # probability of mutation
    'cxpb': 0.5,    # probability of crossover
    'eta': 20,      # crowding degree of the crossover:
                    # a high eta produces children resembling
                    # their parents, while a small eta produces
                    # much more different solutions
    'ngen': 3,      # number of generations
    'mu': 64,       # population size
    'k': 24,        # number selected to move to the next generation
    'early_stop': {'threshold': [0, 1, 1], 'agg': all},
    # alternatively: {'abs_change': [10], 'agg': 'all'}
    # alternatively: {'percent_change': [10], 'agg': 'all'}
    # alternatively: {'threshold': [10], 'agg': 'any'}
}
```
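A minimal, stdlib-only sketch of how an `early_stop` specification like the one above might be evaluated; the function name `check_early_stop` and the exact per-objective semantics (stop when each/any objective is below its threshold, or when improvements between generations are small) are assumptions for illustration, not the actual `elm.pipeline.evolve_train` implementation:

```python
def check_early_stop(spec, old_scores, new_scores):
    """Return True if the early-stop condition in `spec` is met.

    spec: e.g. {'abs_change': [10], 'agg': 'all'} -- stop when the
    absolute change of each objective is within the given bound.
    'agg' may be the string 'all'/'any' or a callable like the
    builtin all/any (hypothetical semantics, sketched here).
    """
    agg = {'all': all, 'any': any}.get(spec['agg'], spec['agg'])
    if 'threshold' in spec:
        # stop when objectives have crossed their thresholds
        checks = [new <= b for new, b in zip(new_scores, spec['threshold'])]
    elif 'abs_change' in spec:
        checks = [abs(old - new) <= b
                  for old, new, b in zip(old_scores, new_scores,
                                         spec['abs_change'])]
    elif 'percent_change' in spec:
        checks = [100.0 * abs(old - new) / abs(old) <= b
                  for old, new, b in zip(old_scores, new_scores,
                                         spec['percent_change'])]
    else:
        raise ValueError('Unknown early_stop spec: {}'.format(spec))
    return agg(checks)
```

For example, `check_early_stop({'abs_change': [10], 'agg': 'all'}, [100.0], [95.0])` would signal a stop, since the objective only moved by 5.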
The improvements we discussed:

- In `elm.pipeline.evolve_train` (and related code), some parameters are continuous variables rather than enumerated choices. Supporting them would involve writing a mutation method that is custom for each parameter, using `mutUniformInt` (as in the control specification above) for discrete choice problems and a different mutation method from `deap` for continuous parameters (`np.random.lognormal`?).
- In `ensemble` or `fit_ensemble`, where custom model selection functions are used, we need to consider cases where the model structure changes throughout the optimization, e.g. choices for the number of components in a PCA preprocessing step that are `[4, 5, None]` to indicate 4 or 5 components or no PCA at all.
- We should support constrained parameter combinations, e.g. one of `[2, 3, 4]` for `a` and one of `[10, 20, 30]` for `b`, but never the combination `(2, 20)` for `(a, b)`. I think this is handled by the param grid specification in scikit-learn and we should follow that convention.
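A stdlib-only sketch of the per-parameter mutation idea: discrete parameters get a `mutUniformInt`-style resample from their choices, while continuous parameters get a multiplicative lognormal-style perturbation. The function `mutate_individual`, the `spec` format, and the dict representation of an individual are illustrative assumptions, not Elm's or `deap`'s API:

```python
import math
import random

def mutate_individual(ind, spec, indpb=0.6, sigma=0.1, rng=random):
    """Mutate `ind` in place; each parameter mutates with probability indpb.

    spec maps parameter name -> either a list of discrete choices
    (mutUniformInt-style uniform resampling) or the string
    'continuous' (multiplicative lognormal perturbation).
    """
    for name, kind in spec.items():
        if rng.random() >= indpb:
            continue
        if kind == 'continuous':
            # multiplicative lognormal step keeps positive params positive
            ind[name] *= math.exp(rng.gauss(0.0, sigma))
        else:
            ind[name] = rng.choice(kind)  # uniform over discrete choices
    return ind

spec = {'n_components': [4, 5, None],   # discrete: PCA sizes or no PCA
        'learning_rate': 'continuous'}  # continuous, must stay > 0
ind = {'n_components': 4, 'learning_rate': 0.01}
mutated = mutate_individual(dict(ind), spec, indpb=1.0)
```

The multiplicative step is one way to keep strictly positive parameters positive; a real implementation would also need bounds handling and, per the TODO above, analogous treatment in the crossover method.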
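Scikit-learn's param grid convention mentioned above handles constrained combinations by using a list of sub-grids: a combination is allowed only if some sub-grid contains it. A small sketch, reusing the `a`/`b` example values from the notes (the helper `expand` mimics scikit-learn's grid expansion and is not its API):

```python
from itertools import product

# Two sub-grids: together they allow a in [2, 3, 4] with b in [10, 30],
# plus a in [3, 4] with b == 20 -- so (2, 20) never occurs.
param_grid = [
    {'a': [2, 3, 4], 'b': [10, 30]},
    {'a': [3, 4], 'b': [20]},
]

def expand(grid_list):
    """Yield every parameter dict allowed by a scikit-learn-style grid list."""
    for grid in grid_list:
        keys = sorted(grid)
        for combo in product(*(grid[k] for k in keys)):
            yield dict(zip(keys, combo))

combos = {(d['a'], d['b']) for d in expand(param_grid)}
```

Following this convention would let the evolutionary code sample and mutate only within valid sub-grids rather than filtering invalid combinations after the fact.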