Generalizes the original PBT pipeline into a fully black-box API by decomposing training into asynchronous trials and introducing a PBT controller in charge of coordination. This is inspired by the Google Vizier architecture. Extends the experimental PBT results to WaveNet.
Mainly an API/program architecture/distributed systems design paper. No really strong improvements on the original PBT idea.
-
Problem with applying PBT: it requires continuous and simultaneous training of all workers, and may run into issues when workers can be preempted by higher-priority jobs in real distributed environments
- Also glass-box: the trainer has to know about the parallel workers and performs weight copying and hyperparam changes itself - can require the computational graph to be changed (e.g. activations)
- Parallel workers read and write to the same database and decide to warmstart from another worker's checkpoint!
-
Glass-box limitations:
- Can't make changes to comp graph easily
- Does not easily handle the case of a worker job being preempted by another job
- Not extendable to advanced evolution/mutation strategies
-
Here: Propose to decompose whole training into multiple trials
- In each trial a worker only trains for limited amount of time
- Each trial is dependent on one or more other trials (exploit idea)
- trainer/controller introduced to oversee the whole population: Decides about the hyperparams and checkpoint for warmstarting the agent
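
A minimal sketch of the trial/controller split described above. All names here (`Trial`, `Controller`, `suggest`, `report`) are hypothetical and not taken from the paper, and picking the best finished trial as parent is a deliberate simplification of the exploit step, not the paper's selection rule. The point is only that workers never look at each other: the controller alone picks the hyperparams and the checkpoint to warmstart from.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trial:
    trial_id: int
    hparams: dict
    parent_id: Optional[int] = None   # trial whose checkpoint is used for warmstarting
    fitness: Optional[float] = None   # filled in once the worker reports back


class Controller:
    """Hypothetical controller: owns the whole population, workers stay dumb."""

    def __init__(self, seed=0):
        self.trials = []
        self.rng = random.Random(seed)

    def suggest(self):
        """Return the next short trial for whichever worker asks."""
        finished = [t for t in self.trials if t.fitness is not None]
        if not finished:
            # Nothing to exploit yet: start from scratch with a random learning rate.
            trial = Trial(len(self.trials), {"lr": 10 ** self.rng.uniform(-4, -2)})
        else:
            # Simplified exploit + explore: warmstart from the best finished trial
            # and perturb its learning rate.
            parent = max(finished, key=lambda t: t.fitness)
            hparams = {"lr": parent.hparams["lr"] * self.rng.choice([0.8, 1.2])}
            trial = Trial(len(self.trials), hparams, parent_id=parent.trial_id)
        self.trials.append(trial)
        return trial

    def report(self, trial_id, fitness):
        """Worker reports the objective reached at the end of its trial."""
        self.trials[trial_id].fitness = fitness


ctrl = Controller()
first = ctrl.suggest()              # no parent, trains from scratch
ctrl.report(first.trial_id, 0.5)
second = ctrl.suggest()             # depends on `first` via parent_id
```

A preempted worker just drops its trial; since a trial is short and the controller holds all state, another worker can ask for a new trial without any coordination between workers.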
-
Vizier hyperparam service: blackbox - makes setup of experiments easier - highest flexibility
-
Success of PBT - makes decisions based on incomplete observations = non-converged objective values
-
Generalized framework = stateless service - each request to service does not depend on any other. Allows for high scalability
- All decisions are made by a central controller
- Small workload for each trial executed by each worker
- Trial = continuous training session - represented as a protocol buffer
- Represent parameters in a DAG to incorporate the notion of parameter dependence
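
To make the parameter-dependence idea concrete, here is a small sketch of conditional parameters, assuming a child parameter is only active when its parent takes a particular value. The `Param` class and `active_params` helper are made up for illustration (and the example is a tree, the simplest kind of DAG); this is not the paper's or Vizier's actual schema.

```python
from dataclasses import dataclass, field


@dataclass
class Param:
    name: str
    values: list
    # Maps a value of this parameter to the child parameters it activates.
    children: dict = field(default_factory=dict)


def active_params(root, assignment):
    """Names of the parameters that are meaningful under `assignment`."""
    active = [root.name]
    for child in root.children.get(assignment[root.name], []):
        active.extend(active_params(child, assignment))
    return active


momentum = Param("momentum", [0.0, 0.9])
beta2 = Param("beta2", [0.99, 0.999])
optimizer = Param(
    "optimizer",
    ["sgd", "adam"],
    children={"sgd": [momentum], "adam": [beta2]},
)

print(active_params(optimizer, {"optimizer": "sgd", "momentum": 0.9}))
# prints ['optimizer', 'momentum']
```

With such a structure a controller only needs to mutate the parameters that are active for the chosen parent, e.g. `beta2` is never touched while `optimizer` is `"sgd"`.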
-
Advantages of this approach:
- Allows for tuning of all hyperparams regardless of comp graph
- Allows for training differentiable/non-differentiable objectives
- Allows for all hyperparams to be adapted dynamically over time
- Maximizes flexibility!
-
Initiator Based Evolution: Every trial is guaranteed to participate in at least one reproduction procedure - guarantee is important for small population sizes
- Fitness: Can be multiple criteria. Using weighting or Pareto improvement criterion for comparison
- Repro:
- Initiator: Asymmetric evo - send request to trainer to init competition
- Opponent selection: Compare to last 2 generations of potential opponents
- Winner = Parent for current reproduction: Every population member participates in a binary tournament once and only once
- Reproduction: Copy from parent + crossover/mutation
- For categorical params: mutate = resample
- For discrete params: mutate = sample 0/1 to move to the next lower or next higher value
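
The reproduction steps above can be sketched roughly as follows. This assumes higher fitness is better, uses a Pareto comparison for the binary tournament, and leaves out crossover; the function names, the `mutation_rate` argument and the clamping at the ends of a discrete range are my own choices, not details from the paper.

```python
import random


def pareto_better(a, b):
    """True if fitness tuple `a` Pareto-dominates `b` (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def mutate_categorical(choices, rng):
    # Categorical: mutation is just resampling from the allowed choices.
    return rng.choice(choices)


def mutate_discrete(values, current, rng):
    # Discrete: coin flip between the next lower and next higher value,
    # clamped at the ends of the (ordered) value list.
    i = values.index(current)
    step = 1 if rng.random() < 0.5 else -1
    return values[min(max(i + step, 0), len(values) - 1)]


def reproduce(initiator, opponent, space, rng, mutation_rate=0.5):
    """Binary tournament: the winner becomes the parent, the child is a mutated copy."""
    if pareto_better(opponent["fitness"], initiator["fitness"]):
        parent = opponent
    else:
        parent = initiator
    child = dict(parent["hparams"])
    for name, (kind, values) in space.items():
        if rng.random() >= mutation_rate:
            continue  # keep the parent's value for this parameter
        if kind == "categorical":
            child[name] = mutate_categorical(values, rng)
        else:
            child[name] = mutate_discrete(values, child[name], rng)
    return parent["id"], child


rng = random.Random(0)
space = {
    "optimizer": ("categorical", ["sgd", "adam"]),
    "batch_size": ("discrete", [32, 64, 128, 256]),
}
initiator = {"id": 7, "fitness": (0.80, 0.10),
             "hparams": {"optimizer": "sgd", "batch_size": 64}}
opponent = {"id": 3, "fitness": (0.85, 0.10),
            "hparams": {"optimizer": "adam", "batch_size": 128}}
parent_id, child = reproduce(initiator, opponent, space, rng)
```

Here the opponent dominates the initiator on the first criterion and ties on the second, so the child is warmstarted from trial 3 and its hyperparams are a (possibly mutated) copy of trial 3's.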
-
Proposed Budget mode: simulate evolution with large generation size using small number of workers
- Only start reproduction when the initiator's generation has reached the specified population size
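
One way to read the budget-mode rule is as a simple gate in the controller. The sketch below assumes "reached the specified population size" means that many trials of that generation have finished; that reading, and the `BudgetGate` class itself, are my assumptions.

```python
from collections import Counter


class BudgetGate:
    """Hypothetical gate: hold back reproduction until a generation is full."""

    def __init__(self, population_size):
        self.population_size = population_size
        self.finished = Counter()  # generation index -> number of finished trials

    def report_finished(self, generation):
        self.finished[generation] += 1

    def can_reproduce(self, generation):
        # Few workers can fill a large generation one trial at a time;
        # reproduction from this generation only starts once it is complete.
        return self.finished[generation] >= self.population_size


gate = BudgetGate(population_size=3)
gate.report_finished(0)
gate.report_finished(0)
ready_after_two = gate.can_reproduce(0)    # False: generation 0 not full yet
gate.report_finished(0)
ready_after_three = gate.can_reproduce(0)  # True
```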
-
Training Replay: Ultimately need to train on a slightly modified dataset (train + val + test) - simply apply previously obtained schedule
- Can also apply only subset of schedule - useful when doing ensembling
- Trial dependency graph: training replay requires extracting the dependency graph of a certain final trial in order to perform the same training procedure. Do so via topological sort
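
A sketch of how the replay order could be extracted, assuming the controller has stored, for every trial, the list of trials it depended on. The `parents` mapping and trial names are invented for the example; `graphlib.TopologicalSorter` is from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter


def replay_order(parents, final_trial):
    """Ancestors of `final_trial` (plus itself) in an order that respects dependencies."""
    # Walk backwards from the final trial and keep only what it depends on.
    needed = {}
    stack = [final_trial]
    while stack:
        trial = stack.pop()
        if trial in needed:
            continue
        needed[trial] = set(parents.get(trial, ()))
        stack.extend(needed[trial])
    # TopologicalSorter takes node -> predecessors and yields predecessors first.
    return list(TopologicalSorter(needed).static_order())


# Hypothetical lineage: t5 was warmstarted from t3, which was warmstarted from t1.
parents = {"t1": [], "t2": [], "t3": ["t1"], "t4": ["t2"], "t5": ["t3"]}
print(replay_order(parents, "t5"))
# prints ['t1', 't3', 't5']
```

Replaying only a prefix of this order gives the "subset of schedule" variant mentioned above; trials outside the lineage (`t2`, `t4`) are never re-run.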
-
Opponent selection strategy: Only looking at the past 1-2 generations appears to yield the best results!
- In Jaderberg et al PBT is introduced as being async. Here it is claimed that it is not - it is a sync glass-box system