Improvements for evolutionary algorithms #185

@PeterDSteinberg

Description

I had a meeting with Grey Nearing (NASA Goddard) to discuss hyperparameterization in hydrology, both for statistical models as in Elm and for physical hydrology models. We discussed a few improvements to make to the evolutionary algorithms for statistical models:

  • Research ways to extend the integer combinatoric problems currently handled by elm.pipeline.evolve_train and related code to cases where some parameters are continuous variables rather than enumerated choices. This would involve writing a mutation method that is custom for each parameter, using mutUniformInt (see example below) for discrete-choice parameters and a different mutation method from deap (e.g. mutPolynomialBounded) for continuous parameters.
    • Continuous parameters could be specified with one or more of the following options:
      • Simple upper / lower bounds with a uniform sample between them
      • A random distribution with bounds? Named distributions, or passing in a random generator like np.random.lognormal?
  • Also in the evolutionary algorithms, as well as in the function currently called ensemble or fit_ensemble where custom model-selection functions are used, we need to consider cases where the model structure changes throughout the optimization, e.g. choices for the number of components in a PCA preprocessing step given as [4, 5, None] to indicate 4 components, 5 components, or no PCA at all.
  • How do we handle cases where the user wants to hyperparameterize integer, continuous, or mixed problems in which some combinations of parameters should not be considered? E.g. I want one of [2, 3, 4] for a and one of [10, 20, 30] for b, but never the combination (2, 20) for (a, b). I think this is handled by the param grid specification in scikit-learn, and we should follow that convention.
  • Stretch goal: how can we extend the evolutionary-algorithm ideas a level higher, to optimally select parameters of models, structures of models, or even alternate modeling approaches (e.g. select a statistical model from Elm with hyperparameterization vs. VIC vs. Noah-MP)?
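A per-parameter mutation along the lines of the first bullet could look like the sketch below. The parameter space, the names, and the bounded-Gaussian step for the continuous attribute are illustrative assumptions (in deap itself, something like tools.mutPolynomialBounded would play that role); only the convention of returning a 1-tuple follows deap's mutation API.

```python
import random

# Illustrative mixed parameter space (names are hypothetical, not elm's API):
DISCRETE_CHOICES = [4, 5, None]     # e.g. PCA n_components, or no PCA at all
CONT_LOW, CONT_UP = 0.01, 10.0      # bounds for a continuous parameter

def mutate_mixed(ind, indpb=0.6, sigma=0.5):
    """Mutate each attribute with its own method, returning a 1-tuple as
    deap mutation operators do.  ind[0] is an index into DISCRETE_CHOICES
    (re-drawn uniformly, as mutUniformInt would do); ind[1] is the
    continuous parameter (a bounded Gaussian step stands in here for
    something like deap's mutPolynomialBounded)."""
    if random.random() < indpb:
        ind[0] = random.randint(0, len(DISCRETE_CHOICES) - 1)
    if random.random() < indpb:
        ind[1] = min(CONT_UP, max(CONT_LOW, ind[1] + random.gauss(0, sigma)))
    return (ind,)

# Decode the discrete attribute when building the pipeline:
# n_components = DISCRETE_CHOICES[ind[0]]
```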
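For the forbidden-combination bullet, scikit-learn's convention is that a param grid given as a list of dicts means the union of several smaller grids, so an excluded pair is expressed by splitting the grid rather than filtering it afterwards. A minimal stdlib stand-in for sklearn.model_selection.ParameterGrid illustrates the idea with the a/b placeholders from above:

```python
from itertools import product

# Following scikit-learn's list-of-dicts param_grid convention: the
# forbidden pair (a=2, b=20) is excluded by splitting the grid.
param_grid = [
    {'a': [2], 'b': [10, 30]},          # a=2 never meets b=20
    {'a': [3, 4], 'b': [10, 20, 30]},   # other values of a allow any b
]

def expand(grid_list):
    """Minimal stdlib stand-in for sklearn.model_selection.ParameterGrid."""
    for grid in grid_list:
        keys = sorted(grid)
        for values in product(*(grid[k] for k in keys)):
            yield dict(zip(keys, values))

combos = list(expand(param_grid))   # 2 + 6 = 8 combinations in total
```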

Example NSGA-II control specification for elm.pipeline.evolve_train as it currently works from Phase I:

nsga_control = {
    'select_method': 'selNSGA2',
    'crossover_method': 'cxTwoPoint',  # TODO can we modify for float/int
    'mutate_method': 'mutUniformInt',  # TODO same comment as ^^
    'init_pop': 'random',
    'indpb': 0.6,     # probability of each attribute changing
    'mutpb': 0.9,     # probability of mutation
    'cxpb':  0.5,     # probability of crossover
    'eta':   20,      # eta: crowding degree of the crossover.
                      # A high eta produces children closely
                      # resembling their parents; a low eta
                      # produces much more different solutions.
    'ngen':  3,       # Number of generations
    'mu':    64,      # Population size
    'k':     24,      # Number selected to move to next generation
    'early_stop': {'threshold': [0, 1, 1], 'agg': all},
    # alternatively: 'early_stop': {'abs_change': [10], 'agg': 'all'}
    # alternatively: 'early_stop': {'percent_change': [10], 'agg': all}
    # alternatively: 'early_stop': {'threshold': [10], 'agg': any}
}
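The early_stop variants in the comments suggest a per-generation check roughly like the sketch below. The function name, the <= comparison against 'threshold' (i.e. minimization), and accepting 'agg' as either the builtin or a string are all assumptions based on the control dict, not elm's actual implementation:

```python
def early_stop_triggered(spec, objective_scores):
    """spec mirrors the dicts above, e.g. {'threshold': [0, 1, 1], 'agg': all};
    objective_scores is the current best score per objective.

    Assumed semantics: a score meets its 'threshold' entry when it is
    <= that entry (minimization), and 'agg' may be the builtin all/any
    or the equivalent string, as the comments in the spec suggest."""
    agg = all if spec.get('agg') in (all, 'all') else any
    if 'threshold' in spec:
        return agg(score <= t
                   for score, t in zip(objective_scores, spec['threshold']))
    return False  # 'abs_change' / 'percent_change' variants omitted here
```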
