Hyperparameter optimization #115
-
I started a search for the problems:

`np.cos(2.3 * X[:, 0]) * np.sin(2.3 * X[:, 0] * X[:, 1] * X[:, 2]) - 10.0`

`(np.exp(X[:, 3]*0.3) + 3)/(np.exp(X[:, 1]*0.2) + np.cos(X[:, 0]) + 1.1)`

I allowed the search 4 cores and a maximum time of 10 minutes (unlimited iterations). I ran a search 3 times for each of the two expressions (a total of 1 hour per evaluation). I set a maxsize of 30. After each search, I took the loss of the most accurate expression found; I then took the median of these losses over the 3 searches, and the average over the two expressions. After 9366 trials, the best hyperparameters found were:

This is quite surprising, and it's far away from the current defaults. The search should continue running for the next week, so I'll update this with any updated parameters.
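For concreteness, here is a minimal sketch of that setup using the modern `PySRRegressor` API; the dataset size, sampling range, and random seed are illustrative assumptions, not details from the original run:

```python
import numpy as np
from pysr import PySRRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(100, 4))

# The two benchmark targets quoted above:
y1 = np.cos(2.3 * X[:, 0]) * np.sin(2.3 * X[:, 0] * X[:, 1] * X[:, 2]) - 10.0
y2 = (np.exp(X[:, 3] * 0.3) + 3) / (np.exp(X[:, 1] * 0.2) + np.cos(X[:, 0]) + 1.1)

model = PySRRegressor(
    binary_operators=["*", "/", "+", "-"],
    unary_operators=["sin", "cos", "exp", "log"],
    maxsize=30,                  # as in the search described above
    niterations=10_000_000,      # effectively unlimited; rely on the timeout
    timeout_in_seconds=10 * 60,  # the 10-minute budget
    procs=4,                     # the 4-core budget
)
model.fit(X, y1)  # repeat for y2; 3 runs each, then aggregate the losses
```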
-
@JayWadekar @patrick-kidger @kazewong potentially of interest to each of you.
-
71,000 trials in. Keep in mind these were only tuned for the above problems, and likely should be re-tuned for specific problem domains. Also keep in mind the search was limited to 10 minutes on 4 cores; more cores and more time would probably justify evolving larger numbers of expressions at once. Here are the single best hyperparameters:

loss 0.278591
alpha 0.048618
annealing False
binary_operators ['*', '/', '+', '-']
crossoverProbability 0.051212
fractionReplaced 0.026796
fractionReplacedHof 0.036723
maxsize 30
model_selection accuracy
ncyclesperiteration 634.0
niterations 10000
npop 38.0
optimize_probability 0.154329
optimizer_iterations 8.0
optimizer_nrestarts 2.0
parsimony 0.003075
perturbationFactor 0.002067
populations 15.0
skip_mutation_failures True
topn 15.0
tournament_selection_p 0.80458
unary_operators ['sin', 'cos', 'exp', 'log']
useFrequency True
warmupMaxsizeBy 0.076342
weightAddNode 0.401272
weightDeleteNode 1.271694
weightDoNothing 0.078377
weightInsertNode 6.211138
weightMutateConstant 0.079556
weightMutateOperator 0.526905
weightRandomize 0.00027
weightSimplify 0.002

Since this might be noisy - perhaps this trial got lucky - here are the median hyperparameters of the top 10 trials:

loss 0.293183
alpha 0.036668
annealing False
crossoverProbability 0.065701
fractionReplaced 0.000364
fractionReplacedHof 0.034940
maxsize 30.000000
ncyclesperiteration 555.500000
niterations 10000.000000
npop 33.000000
optimize_probability 0.137467
optimizer_iterations 8.000000
optimizer_nrestarts 2.000000
parsimony 0.003176
perturbationFactor 0.076306
populations 15.500000
skip_mutation_failures True
topn 12.000000
tournament_selection_p 0.859390
useFrequency True
warmupMaxsizeBy 0.083059
weightAddNode 0.791716
weightDeleteNode 1.734292
weightDoNothing 0.214623
weightInsertNode 5.103537
weightMutateConstant 0.048217
weightMutateOperator 0.474846
weightRandomize 0.000234
weightSimplify 0.002000

We can also look at the spread of these values. Many of them are log-distributed, so let's look at the standard deviation in log10 space. This gives the following:

loss 0.008101
alpha 0.229157
crossoverProbability 0.174091
fractionReplaced 0.633015
fractionReplacedHof 0.163865
maxsize 0.000000
ncyclesperiteration 0.064149
niterations 0.000000
npop 0.098414
optimize_probability 0.167777
optimizer_iterations 0.053846
optimizer_nrestarts 0.155451
parsimony 0.126659
perturbationFactor 0.568420
populations 0.076493
topn 0.035872
tournament_selection_p 0.036660
warmupMaxsizeBy 0.144004
weightAddNode 0.252111
weightDeleteNode 0.152587
weightDoNothing 0.196531
weightInsertNode 0.118023
weightMutateConstant 0.346641
weightMutateOperator 0.185854
weightRandomize 1.023768
weightSimplify 0.000000
dtype: float64

One can roughly interpret a "1" here as meaning the error on the mean (in log space) spans from 10x the value down to 1/10x the value. The most uncertain values are `weightRandomize`, `fractionReplaced`, and `perturbationFactor`.

Edit: here's the copy-pastable version, with only the main hyperparams (niterations, maxsize, warmupMaxsizeBy, and the operators are excluded):

alpha=0.036668,
annealing=False,
crossoverProbability=0.065701,
fractionReplaced=0.000364,
fractionReplacedHof=0.034940,
ncyclesperiteration=555,
npop=33,
optimize_probability=0.137467,
optimizer_iterations=8,
optimizer_nrestarts=2,
parsimony=0.003176,
perturbationFactor=0.076306,
populations=15,
skip_mutation_failures=True,
topn=12,
tournament_selection_p=0.859390,
useFrequency=True,
weightAddNode=0.791716,
weightDeleteNode=1.734292,
weightDoNothing=0.214623,
weightInsertNode=5.103537,
weightMutateConstant=0.048217,
weightMutateOperator=0.474846,
weightRandomize=0.000234,
weightSimplify=0.002000,
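For reference, the aggregation described above can be expressed with pandas; a sketch, where `trials` is a hypothetical stand-in for the real trial log (the synthetic data is only there to make the snippet runnable):

```python
import numpy as np
import pandas as pd

# Hypothetical trial log: one row per hyperparameter-optimization trial,
# a `loss` column, and one (log-distributed) column per hyperparameter.
rng = np.random.default_rng(0)
trials = pd.DataFrame({
    "loss": rng.uniform(0.27, 0.6, size=1000),
    "weightInsertNode": 10 ** rng.uniform(-1, 1, size=1000),
    "weightRandomize": 10 ** rng.uniform(-4, 0, size=1000),
})

top10 = trials.nsmallest(10, "loss")

# Median hyperparameters of the top 10 trials (second table above):
medians = top10.median()

# Spread in log10 space (third table above). A value near 1 means the
# error on the (log-space) mean spans roughly 10x to 1/10x the value.
log_spread = np.log10(top10).std()
```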
-
For posterity, here is the mean of the top 10 trials, computed in log space:

loss 0.292467
alpha 0.036416
crossoverProbability 0.064505
fractionReplaced 0.000479
fractionReplacedHof 0.034607
maxsize 30.000000
ncyclesperiteration 541.969433
niterations 10000.000000
npop 31.599738
optimize_probability 0.127483
optimizer_iterations 8.540416
optimizer_nrestarts 1.515717
parsimony 0.003407
perturbationFactor 0.063978
populations 15.777888
topn 12.360778
tournament_selection_p 0.853237
warmupMaxsizeBy 0.078130
weightAddNode 0.841462
weightDeleteNode 1.753645
weightDoNothing 0.210091
weightInsertNode 5.129917
weightMutateConstant 0.033357
weightMutateOperator 0.494547
weightRandomize 0.000647
weightSimplify 0.002000
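As an aside, a "mean computed in log space" is the geometric mean,

$$\bar{x}_{\mathrm{geo}} = 10^{\frac{1}{n}\sum_{i=1}^{n}\log_{10} x_i},$$

which in the pandas sketch above would be `10 ** np.log10(top10).mean()`.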
-
Hmm. Lots of interesting things here.
-
Yes, very curious indeed. Thanks for this writeup. Quick comments before I run to a meeting:
-
This sounds right: too much migration would reduce diversity, so it's important to either reduce the migration rate, or increase

Regarding the weights, it might be interesting to only optimize those and look at the result. My guess is they vary quite a bit by problem - more complex expressions would probably favor the mutations which add nodes to the tree.
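To make "only optimize those" concrete, here is a minimal sketch of a weights-only search, with Optuna standing in for the repo's hyperparamopt.py setup; the data, parameter ranges, trial counts, and the (recent, snake_case) PySR parameter names are all illustrative assumptions:

```python
import numpy as np
import optuna
from pysr import PySRRegressor

# Toy data standing in for one of the benchmark problems above.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(100, 4))
y = (np.exp(X[:, 3] * 0.3) + 3) / (np.exp(X[:, 1] * 0.2) + np.cos(X[:, 0]) + 1.1)

MUTATION_WEIGHTS = [
    "weight_add_node", "weight_insert_node", "weight_delete_node",
    "weight_do_nothing", "weight_mutate_constant", "weight_mutate_operator",
    "weight_randomize", "weight_simplify",
]

def objective(trial: optuna.Trial) -> float:
    # Vary only the mutation weights; hold everything else at the defaults.
    weights = {
        name: trial.suggest_float(name, 1e-4, 10.0, log=True)
        for name in MUTATION_WEIGHTS
    }
    # Small budget per trial: each fit launches the full search backend.
    model = PySRRegressor(niterations=20, maxsize=30, **weights)
    model.fit(X, y)
    return float(model.get_best()["loss"])

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)
print(study.best_params)
```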
-
Yep, everything you've said, across both posts, makes sense.
-
One interesting thing I noticed: increasing the floating point precision from float32 to float64 doesn't really hurt evaluation speed: a single evaluation of a 48-token expression over 200 datapoints only changes from 9.791 µs to 10.542 µs.

However, having higher precision can help search speed by avoiding weird combinations of constants. Sometimes the genetic algorithm figures out it can achieve higher precision for a particular constant by stacking multiple constants together. I know this sounds weird, but from the genetic algorithm's perspective, whatever gives it higher accuracy is a good thing! For example, to achieve 2/3 to higher precision (since you can't represent this exactly in floating point numbers), it would simply multiply by 2.0 and divide by 3.0 in a subsequent operation (both of which can be stored exactly), rather than store a single constant approximately. From my very non-rigorous experiments, it seems like using higher precision in the search (you can set this in PySR with the `precision` parameter) does help.
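As a rough, NumPy-based illustration of the timing comparison (the original numbers presumably come from the backend itself, so absolute times will differ; the expression, data shape, and repeat count here are illustrative choices):

```python
import timeit
import numpy as np

def expr(X: np.ndarray) -> np.ndarray:
    # One of the benchmark expressions from earlier in this thread.
    return np.cos(2.3 * X[:, 0]) * np.sin(2.3 * X[:, 0] * X[:, 1] * X[:, 2]) - 10.0

for dtype in (np.float32, np.float64):
    X = np.random.default_rng(0).standard_normal((200, 4)).astype(dtype)
    t = timeit.timeit(lambda: expr(X), number=10_000)
    print(f"{np.dtype(dtype).name}: {t / 10_000 * 1e6:.3f} µs per evaluation")
```

The exact gap depends on hardware, but at this array size it should be small, in line with the numbers quoted above.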
-
Running some new tuning runs in https://github.com/MilesCranmer/pysr_wandb - W&B sweep file here: https://github.com/MilesCranmer/pysr_wandb/blob/master/sweep.yml. According to W&B, these are the most important parameters to tune:

My initial conclusion is that the following change to hyperparameters (from the current defaults) is quite good:

model.set_params(
population_size=75, # default 33
tournament_selection_n=23, # default 10
tournament_selection_p=0.8, # default 0.86
ncyclesperiteration=100, # default 550
parsimony=1e-3, # default 0.0032
fraction_replaced_hof=0.08, # default 0.035
optimizer_iterations=25, # default 8
crossover_probability=0.12, # default 0.066
weight_optimize=0.06, # default 0.0
populations=50, # default 15
adaptive_parsimony_scaling=100.0, # default 20
)
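For what it's worth, `set_params` is the standard scikit-learn estimator API, which `PySRRegressor` implements, so the same change can equivalently be applied at construction time; a sketch using exactly the values above:

```python
from pysr import PySRRegressor

# Equivalent to the model.set_params(...) call above, at construction time.
model = PySRRegressor(
    population_size=75,
    tournament_selection_n=23,
    tournament_selection_p=0.8,
    ncyclesperiteration=100,
    parsimony=1e-3,
    fraction_replaced_hof=0.08,
    optimizer_iterations=25,
    crossover_probability=0.12,
    weight_optimize=0.06,
    populations=50,
    adaptive_parsimony_scaling=100.0,
)
```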
-
The file `hyperparamopt.py` in benchmarks is for doing distributed hyperparameter optimization. It has been useful in the past for tuning the defaults of PySR for a generic set of problems. `print_best_model.py` will print the results.

This discussion thread will hold various hyperparam solutions to different problems, to try to see if there are some better defaults to use, etc. (anybody can feel free to post good parameter sets they found).