MLJ Integration #226

Conversation
@ablaom I'd be very interested to hear your thoughts on this implementation as well.
Thanks for working on an MLJ integration. Very exciting to have a new class of model, and implemented in Julia, right? I'm currently on leave but will definitely be supporting this. @OkonSamuel may also be able to look at this. On a quick scan, I couldn't find any trait definitions (aka metadata); it would be a good idea to add these.
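For context, MLJ model traits are usually declared through MLJModelInterface.jl's `metadata_model`. A minimal, hedged sketch of what such metadata might look like for this model (the scitypes and `load_path` below are illustrative assumptions, not the final values):

```julia
# Hypothetical trait definitions (metadata) for the new regressor, assuming the
# interface is built on MLJModelInterface.jl. Scitypes and load_path are guesses.
import MLJModelInterface as MMI

MMI.metadata_model(
    SRRegressor;
    input_scitype = MMI.Table(MMI.Continuous),
    target_scitype = AbstractVector{<:MMI.Continuous},
    supports_weights = true,
    load_path = "SymbolicRegression.MLJInterfaceModule.SRRegressor",
)
```

These traits are what MLJ's model registry and `models()` queries read, so they need to match the actual model type once it is finalized.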
Fantastic, thank you. I will add the metadata. Yes, this package is 100% Julia. The Python frontend, PySR, is just a lightweight scikit-learn wrapper via PyJulia, but the engine is all on the Julia side. So far the existing Julia interface in SymbolicRegression.jl is not as user-friendly, so I am looking forward to adding this MLJ integration: I think it will certainly improve that, as well as extensibility.
Benchmark Results

Benchmark Plots: A plot of the benchmark results has been uploaded as an artifact to the workflow run for this PR.
For this, I think a natural way in MLJ would be to allow users to pass these in.
@MilesCranmer I have taken a first pass through the code and given my review above.
I just want to say that having the integration here or in a separate package are both fine; the latter is not uncommon.
Thanks. I wonder if those were created before the lightweight interface package existed? It looks like the time-to-load is negligible ( #226 (comment) ), so we could just make it a hard dependency for simplicity.
In case this is relevant to the discussion about the hyperparameter struct (aka "options" or, in MLJ, the "model") (I didn't follow the details): the actual struct fields do not matter. The MLJ interface only interacts with the struct through properties, which can be overloaded to be different from the fields. But, yes, the (public) keyword constructor should match the properties, if these are different from the fields. For example:

SRR(; options = SymbolicRegression.Options(
    binary_operators = [+, *, /, -],
    unary_operators = [cos, exp],
    npopulations = 20,
))
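To illustrate the "properties, not fields" point above: a wrapper whose single field is a nested options object can still present flat hyperparameters by overloading the property functions. This is only a toy sketch, not the actual implementation; all names are illustrative.

```julia
# Toy sketch: MLJ touches hyperparameters only through properties, so a model
# struct can forward property access to a nested options object.
struct SRR
    options::NamedTuple
end

# Present the nested options' keys as if they were top-level fields:
Base.propertynames(m::SRR) = keys(getfield(m, :options))
Base.getproperty(m::SRR, name::Symbol) = getfield(m, :options)[name]

m = SRR((npopulations = 20, maxsize = 25))
m.npopulations  # → 20
m.maxsize       # → 25
```

A real model would also overload `Base.setproperty!` so that MLJ's hyperparameter mutation (e.g. in tuning) works.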
Okay, I think I found a nice way around this. The implementation is far from elegant, but from the user's point of view it is nice. It uses some macro tricks to store all the parameters and defaults from the `Options` constructor. For better or for worse, you now get all hyperparameters directly in the SRRegressor (see expanded snippet):

julia> model = SRRegressor(binary_operators=[+, *, -, /], unary_operators=[cos], maxsize=25, nested_constraints=[cos=>[cos=>0]], niterations=100)
SRRegressor(
binary_operators = Function[+, *, -, /],
unary_operators = [cos],
constraints = nothing,
elementwise_loss = nothing,
loss_function = nothing,
tournament_selection_n = 12,
tournament_selection_p = 0.86f0,
topn = 12,
complexity_of_operators = nothing,
complexity_of_constants = nothing,
complexity_of_variables = nothing,
parsimony = 0.0032f0,
alpha = 0.1f0,
maxsize = 25,
maxdepth = nothing,
fast_cycle = false,
turbo = false,
migration = true,
hof_migration = true,
should_simplify = nothing,
should_optimize_constants = true,
output_file = nothing,
npopulations = 15,
perturbation_factor = 0.076f0,
annealing = false,
batching = false,
batch_size = 50,
mutation_weights = MutationWeights(0.048, 0.47, 0.79, 5.1, 1.7, 0.002, 0.00023, 0.21, 0.0),
crossover_probability = 0.066f0,
warmup_maxsize_by = 0.0f0,
use_frequency = true,
use_frequency_in_tournament = true,
adaptive_parsimony_scaling = 20.0,
npop = 33,
ncycles_per_iteration = 550,
fraction_replaced = 0.00036f0,
fraction_replaced_hof = 0.035f0,
verbosity = 1000000000,
save_to_file = true,
probability_negate_constant = 0.01f0,
seed = nothing,
bin_constraints = nothing,
una_constraints = nothing,
progress = true,
terminal_width = nothing,
optimizer_algorithm = "BFGS",
optimizer_nrestarts = 2,
optimizer_probability = 0.14f0,
optimizer_iterations = nothing,
optimizer_options = nothing,
recorder = nothing,
recorder_file = "pysr_recorder.json",
early_stop_condition = nothing,
timeout_in_seconds = nothing,
max_evals = nothing,
skip_mutation_failures = true,
enable_autodiff = false,
nested_constraints = [cos => [cos => 0]],
deterministic = false,
define_helper_functions = true,
niterations = 100,
parallelism = :multithreading,
numprocs = nothing,
procs = nothing,
addprocs_function = nothing,
runtests = true,
loss_type = Nothing,
selection_method = SymbolicRegression.MLJInterfaceModule.choose_best)
There are a couple of nested hyperparameters (for example, `mutation_weights` above).
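For reference, here is a hedged sketch of driving the flattened SRRegressor through the standard MLJ machine workflow. The dataset and hyperparameter values are made up, and the exact report field names are assumptions based on the discussion of `.best_idx` elsewhere in this thread.

```julia
using MLJ, SymbolicRegression

# Toy regression data: two features, one target.
X = (a = rand(100), b = rand(100))
y = @. 2 * cos(23.5 * X.a) - X.b^2

model = SRRegressor(
    binary_operators = [+, -, *, /],
    unary_operators = [cos],
    niterations = 30,
)

mach = machine(model, X, y)
fit!(mach)

r = report(mach)
r.equations[r.best_idx]   # the equation chosen by model.selection_method
ypred = predict(mach, X)  # predictions from the selected equation
```

The appeal of the flattened interface is that `model` here is a single mutable hyperparameter container, so MLJ's tuning and wrapping machinery applies to it directly.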
Okay, the output format is updated and seems to work.
Okay, I have added the full docstrings. Anything else missing? I'll probably update the README shortly after merging. I'm really liking MLJ so far, so I think I'll just have that be the default user interface.
I can confirm that the documentation renders properly on my terminal.
This looks very good and detailed. Thanks @MilesCranmer. We can't wait to have this addition in the MLJ registry.
Added! Once it merges, what should I do to add it to the MLJ registry? Should I submit an issue requesting a registry update on the MLJ repo?
@OkonSamuel I also realized I need to fix the behavior of sample weights for the multitarget case. What is the expected format of the sample weights?
Actually, I think I just fixed it. MLJ assumes per-observation weights.
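For the record, a hedged sketch of how per-observation sample weights are passed in MLJ: they go in as an extra vector argument to `machine`, one weight per row, which in the multitarget case would be shared across targets. The model setup below is illustrative.

```julia
using MLJ, SymbolicRegression

# Toy data: two features, one target.
X = (a = rand(100), b = rand(100))
y = @. X.a * X.b
w = rand(100)  # one sample weight per observation (row)

model = SRRegressor(binary_operators = [+, *], niterations = 10)
mach = machine(model, X, y, w)  # weights as an extra argument to machine
fit!(mach)
```

This only works if the model declares `supports_weights = true` in its metadata; otherwise MLJ will refuse the extra argument.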
LGTM!!!
Thanks so much for your help @OkonSamuel; much appreciated!
[Diff since v0.19.1](v0.19.1...v0.20.0)

**Closed issues:**
- [Feature]: MLJ integration (#225)
- [Feature]: ternary operators (#227)

**Merged pull requests:**
- MLJ Integration (#226) (@MilesCranmer)
@ericphanson I tried an initial implementation. How is this?
Potentially fixes #225.

The current things missing are:

- Passing the `extra` parameter (via `SymbolicRegression.Dataset(...; extra::NamedTuple=...)`). Maybe this just isn't suitable for MLJ, though?
- Passing `variable_names`, or, once Dimensional constraints #220 merges, passing `units`. Again, perhaps this is just not MLJ-compatible, so we don't need to worry about it.

Here's a demo:

The default selection going into `.best_idx` is set by `model.selection_method`, which returns an index for the chosen equation.

@Moelf is this the right approach for MLJ, or would it be better to have a separate package? MLJModelInterface.jl looks pretty lightweight.
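Based on the description above, `model.selection_method` receives the search results and returns the index of the chosen equation. A hedged sketch of a custom selector (the keyword-argument signature accepted here is an assumption, not the confirmed callback API):

```julia
# Hypothetical custom selection_method: prefer the simplest equation whose loss
# is within 1.5x of the best loss. The kwargs accepted here are assumptions.
function select_parsimonious(; losses, complexities, kws...)
    threshold = 1.5 * minimum(losses)
    candidates = findall(<=(threshold), losses)
    return candidates[argmin(complexities[candidates])]
end
```

Usage would then presumably be `SRRegressor(selection_method = select_parsimonious)`, replacing the default `choose_best`.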