@MilesCranmer
Owner

@MilesCranmer MilesCranmer commented Jun 24, 2024

These new experimental Expression types store both the operators and the variable names within the object, unlike the plain Node, which only stores the enum information about an expression.

This PR also adds ParametricExpression, which learns a basis expression whose constants can vary by class:

using SymbolicRegression
using Random: MersenneTwister
using Zygote
using MLJBase: machine, fit!, predict

rng = MersenneTwister(0)
X = NamedTuple{(:x1, :x2, :x3, :x4, :x5)}(ntuple(_ -> randn(rng, Float32, 30), Val(5)))
X = (; X..., classes=rand(rng, 1:2, 30))
p1 = rand(rng, Float32, 2)
p2 = rand(rng, Float32, 2)

y = [
    2 * cos(X.x4[i] + p1[X.classes[i]]) + X.x1[i]^2 - p2[X.classes[i]] for
    i in eachindex(X.classes)
]

model = SRRegressor(;
    niterations=10,
    binary_operators=[+, *, /, -],
    unary_operators=[cos, exp],
    populations=10,
    expression_type=ParametricExpression,  # Subtype of `AbstractExpression`
    expression_options=(; max_parameters=2),
    autodiff_backend=:Zygote,
    parallelism=:multithreading,
)

mach = machine(model, X, y)
fit!(mach)
ypred = predict(mach, X)

So it basically learns $y = 2 \cos(x_4 + \alpha) + x_1^2 - \beta$, where $\alpha$ and $\beta$ are parameters that can take a different value for each class, as given by the classes column of X (here there are two classes/types of behavior).
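
To make the per-class parameters concrete, here is a minimal plain-Julia sketch of that functional form, independent of SymbolicRegression.jl. The function name `parametric_target` and the argument names `alpha` and `beta` are hypothetical and chosen only for illustration. Each row looks up its own entry of the two parameter vectors by class, mirroring how `p1` and `p2` are used to generate `y` in the example.

```julia
# Hypothetical helper (not part of SymbolicRegression.jl): evaluates
# y = 2cos(x4 + alpha[class]) + x1^2 - beta[class] for each row.
function parametric_target(x1, x4, classes, alpha, beta)
    return [2 * cos(x4[i] + alpha[c]) + x1[i]^2 - beta[c] for (i, c) in pairs(classes)]
end

# Two rows with identical features but different classes:
# each class indexes its own alpha and beta, so the outputs differ.
x1 = Float32[1.0, 1.0]
x4 = Float32[0.0, 0.0]
classes = [1, 2]
alpha = Float32[0.0, 0.5]
beta = Float32[0.0, 1.0]

y_demo = parametric_target(x1, x4, classes, alpha, beta)
```

With identical features, the only source of difference between the two entries of `y_demo` is the class-indexed parameters, which is the behavior a parametric expression is meant to capture.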

ParametricExpression is just one implementation of AbstractExpression, but it shows how you can now do fairly custom things.

Fixes #340. Fixes #337. Fixes #336.


TODO:

  • Allow passing a class feature to MLJ which will have special treatment.
  • Debug why some of the tests seem to get stuck and take 3x longer to finish than normal.
  • Consider documenting this, or just leaving it as an experimental undocumented feature until it stabilizes.
  • Add Enzyme backend.
  • Add example to docs.
  • Consider moving to Literate.jl for docs?
  • Fix ResourceMonitor weirdness

@atharvas
Contributor

I ran into some issues with constraint parsing in Options.jl. Check out the comment. I don't know why the test cases don't catch the issue.

@MilesCranmer
Owner Author

Going to punt on StructuredExpression until later. @eelregit let me know if you are at all interested in this! StructuredExpression would let you evolve within a fixed functional form. It seems a couple of methods are still missing before it can work, but hopefully adding them won't take much effort. I'll have to pause on this side of things for now.

@MilesCranmer
Owner Author

Seems like garbage collection is going crazy in the tests, which is why they are so slow. The reason Julia 1.6 and 1.8 are much faster is, I think, that DispatchDoctor.jl is turned off there. So something about DispatchDoctor.jl is causing the GC to overwork itself... Possibly related to MilesCranmer/DispatchDoctor.jl#57 and MilesCranmer/DispatchDoctor.jl#58?

@MilesCranmer MilesCranmer force-pushed the parametric-expressions branch from 10f396b to daee883 October 6, 2024 16:00
@MilesCranmer
Owner Author

MilesCranmer commented Oct 6, 2024

Fixed the performance regression in the unit tests with SymbolicML/DynamicExpressions.jl@74c8dc1.

Edit: the slowdown still seems to linger a bit. Judging from the PProf outputs, it is definitely something to do with DispatchDoctor, so it should only affect testing, not actual runtime performance. Probably fine to merge for now.

@MilesCranmer MilesCranmer enabled auto-merge October 6, 2024 20:33
@MilesCranmer MilesCranmer merged commit 749cc34 into master Oct 6, 2024