feat: add unified System type #3543
base: master
Conversation
Is the idea to write down all system types (ODE/NonLinear/SDE/...) through |
Yes. Everything is a |
Even now we have bits of this. DDEs are constructed through ODESystem, SDDEs through SDESystem. |
True. Very interesting, I think it sounds like a great idea! Simpler for the user, and I assume it simplifies the code base significantly and reduces the maintenance burden of duplicating changes across every system type. Would structural simplification continue to work independently of the system type? Personally I would be in favor of other similar "merges/simplifications" too, if they don't force ambiguities or "loss of information". E.g. to unify variables and parameters, specify parameter dependencies as normal (initialization?) equations, detect the independent variable from the only differentiated variable, etc. I guess that's a different discussion, though. |
Those are all on the list. One at a time |
Yeah, maintaining so many different systems is really annoying. See the reviews in #3497 (review).
Also something we're working on
iirc we did this at one point with the |
Cool! Yes, one at a time. Sorry for going off topic, just excited about the push in this direction 😅 |
Along those lines, while keeping |
Yes, that's one thing here I hope we can get to. The main reason is that we want to capture the jump stuff at a higher level, since if you've already created ConstantRateJump/VariableRateJump then you've already committed to it being irregular / directly handled jumps. That makes it impossible to then switch over to the RegularJump description of it as Poisson variables for tau-leaping. But that also means there are restrictions. ConstantRateJump/VariableRateJump is more general in what it allows, since that definition is able to potentially change the tolerance of the integrator as its

@parameters rate
@poissonian eta(rate*A*B)
D(A) ~ alpha*A - eta
D(B) ~ alpha2*B - eta
D(AB) ~ eta

The things that are weird about a Poissonian representation are:
So it's a bit trickier than a Brownian. |
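To make the tau-leaping idea above concrete, here is a small language-agnostic sketch (Python standing in for the Julia used in this thread) of one explicit tau-leap step for the D(A)/D(B)/D(AB) system: over a step dt, the Poissonian eta is approximated by a Poisson draw with mean rate*A*B*dt, and each firing moves one unit from A and B into AB on top of the deterministic drift. All function names here are hypothetical, not part of any package.

```python
import math
import random

def poisson_draw(mean, rng):
    # Knuth's algorithm; fine for the small means used in this sketch.
    L = math.exp(-mean)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

def tau_leap_step(state, alpha, alpha2, rate, dt, rng):
    """One explicit tau-leap step for D(A) ~ alpha*A - eta,
    D(B) ~ alpha2*B - eta, D(AB) ~ eta, where eta counts the firings
    of a Poisson process with state-dependent intensity rate*A*B."""
    A, B, AB = state["A"], state["B"], state["AB"]
    # Number of eta firings in [t, t+dt): Poisson with mean rate*A*B*dt.
    n_jumps = poisson_draw(max(rate * A * B * dt, 0.0), rng)
    return {
        "A": A + alpha * A * dt - n_jumps,
        "B": B + alpha2 * B * dt - n_jumps,
        "AB": AB + n_jumps,
    }
```

Note the asymmetry the comment above describes: unlike a Brownian increment, the draw is integer-valued and its distribution depends on the current state through the rate.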
Having the rate as an argument would be consistent with how it is written mathematically, and detecting the coefficients, and putting them in a list to internally build an affect function, seems not much harder than what is done for building the SDE noise matrix now. I agree it would be less general in the affect, but we don't need to require its use -- it would just be a nice convenience for people that have equations easily written in that form. |
Just to give more color here, there are already pieces that overlap in several ways.
So the proliferation of types is making it difficult to realize the combinations, especially the ones that aren't common and don't have names today. Why not have a JumpProblem with an ImplicitDiscreteSystem? You might think that's crazy, but that's actually a nice internal representation of implicit tau-leaping 😅, so why not use it as a codegen path? Why not mix SDDE and JumpSystem? DAEs and jumps are the way of handling conservation laws.

The reality is that there is one fundamental mathematical underpinning that we're settling down towards. One piece to pull in that mathematically makes it make sense is the Lévy decomposition theorem, or the Lévy–Khintchine representation https://en.wikipedia.org/wiki/L%C3%A9vy_process, which states that all continuous-time stochastic processes can be written in this decomposition of Brownian motions and jumps. This means that even any colored noise process, like pink noise, can in theory be written down in this system (e.g. you can represent pink noise as an SDE with white noise processes, and the solution will have the probability distribution of pink noise). Thus we are taking that foundation, making it all implicit functions so instead of

It's just optimal discrete-continuous differential algebraic stochastic jump delay equations, or ODCDASJDEs for short. And we can put neural networks in there, so it's just your standard differentiable Neural ODCDASJDE symbolic-numeric framework. That's it guys, we can touch grass now. But it's funny that every combination of just two of these features tends to be enough of a novelty to be a domain and an active field of research:
and I can keep going. The key is that we are trying to smoothly move between combinations of more than just two of these fields: the naming isn't even developed, and combinations generally don't have names beyond like 3 features, which makes sense because the number of combinations is egregiously large.

But you can imagine how this is going. I have an implicit jump diffusion with some continuous event handling in which I want to train a neural network. For the Brownian motions, I first simplify it with a Girsanov transformation https://docs.sciml.ai/ModelingToolkit/dev/systems/SDESystem/#ModelingToolkit.Girsanov_transform to turn the general noise into an additive noise system, then use a MomentClosure https://github.com/augustinas1/MomentClosure.jl to approximate the jumps as ODEs, and on the additive noise Neural SDAE I use index reduction to send it to index-1 form and train the neural network using the differentiable SDAE solver, where the codegen gives me optimized Jacobians (sparsity etc.) and the symbolic-numeric improvements that are coming very soon. That's the dream. And I think having to make MomentClosure work for JumpSystem, JumpSystem + Optimization, Discrete, etc. is a type mess.

This system type, the ODCDASJDE, is the standard form. Every transformation takes an ODCDASJDE to an ODCDASJDE. We will need to be diligent that passes check their assumptions though (i.e. this is only valid if there are no jumps, no discrete, etc.), but those can be a standard information ( |
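As a toy illustration of the Lévy-decomposition view above (drift + Brownian motion + jumps as the universal building blocks), here is a minimal Euler simulation of a jump-diffusion. The names and the specific equation are made up for illustration, and Python stands in for the Julia used throughout this thread.

```python
import math
import random

def levy_path(x0, mu, sigma, lam, jump_size, T, n, seed=0):
    """Euler simulation of dX = mu*X dt + sigma*X dW + jump_size dN:
    a drift term, a Brownian term, and a Poisson jump term, i.e. the
    three pieces the Levy-Khintchine decomposition is built from."""
    rng = random.Random(seed)
    dt = T / n
    x = x0
    for _ in range(n):
        dW = rng.gauss(0.0, math.sqrt(dt))          # Brownian increment
        dN = 1 if rng.random() < lam * dt else 0    # Bernoulli thinning of the Poisson clock
        x += mu * x * dt + sigma * x * dW + jump_size * dN
    return x
```

Setting sigma = 0 and lam = 0 recovers a plain ODE, sigma > 0 gives an SDE, lam > 0 adds the jump part; the point of the unified representation is that all of these are the same object with some pieces zeroed out.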
So the way to think about this as a compiler is that the system type is a stable IR representation with layered dialects. If you think about LLVM for example, there are dialects of MLIR like StableHLO. You can write passes in StableHLO, and then lower to LLVM IR that doesn't use the higher-level intrinsics, and at the end of the day the codegen requires going from that final form that is closer to the machine. So MLIR has dialects for things like linear algebra, which you do optimizations on, before lowering that to "everything is just arrays and indexing memory", and on that lower level the LLVM IR knows how to generate machine code because that is a closer model of the machine. In the same way, the ODCDASJDE System needs to model the mathematical problem types it's trying to build towards. So it has noise_eqs, callbacks, jumps, a discrete subsystem, and cost functions possible. But the construction just comes from an array of equations:

@poissonian eta(A*B)
@brownian zeta(t)
eqs = [
D(A) ~ D(B) + eta + z
D(B) ~ -eta + zeta
0 ~ sin(z(k+1) + z(k))
]

The lowering passes on eqs then systematically decompose the higher level representation into the lower level representation: the brownians make the noise_eqs, the discrete system is separated out, the jumps are constructed, etc. Each of those is just a pass in structural simplify, which we should make independent functions where we can. Expanding connections can also be thought of as a lowering here, just one specifically about hierarchical models. Then when building a problem, we validate. To build,

This form of layering transformations can make it valid to, say, construct the eqs without any brownians and instead directly use noise_eqs, because you're basically just intercepting the compiler at a lower level. That's not something to commit to as a stable public API of course, but it is something that we should document as a lower step to make the process easier to understand internally, and if someone feels really compelled to over-optimize they could directly write to noise_eqs, though we would highly highly recommend they don't, and we'd like to see benchmarks of how much time this is saving if we're going to help with maintaining that at all (it better be at least like a 25% time reduction).

The only piece that is hard to make a fully independent part of the pipeline is the final simplifications of structural simplification, namely the index reduction and tearing passes. This is because the tearing passes include SCC construction and scheduling, as a major form of CSE and specialized codegen information. So tearing and index reduction somewhat need to go together at the end of the pipeline, because they rely on the incidence graph not changing in order to define their scheduling representation.

But now the other thing to think about is: are there some transforms that may be easier to write on the higher level representation? I'm not sure. Right now Girsanov is written on the noise eqs, and that's probably the easier way to do that.
Regular <-> irregular jumping behavior is probably easier to do by gathering the jumps and then swapping between them. Linearization for controls wants to be done on the lowered form (there's a nice generalization to stochastic and all of that too). So it is likely fairly common for the passes we are writing to want to come after lowering from the high level representation to the internal (noise_eqs, jump list, etc.) representation. This means we may want an easy way to call the full set of lowering passes separately from structural simplification in general, though structural simplification will need to lower before the DAE parts. |
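As a toy model of what one such lowering pass could look like (purely illustrative, in Python rather than the thread's Julia; the real passes operate on symbolic MTK equations and none of these names are MTK's), here is a pass that strips Brownian terms out of a set of equations and into a noise_eqs matrix, leaving the drift behind:

```python
def lower_brownians(eqs, brownians):
    """Hypothetical lowering pass. Each equation is a dict mapping term
    names to coefficients. Brownian terms are pulled out into one row of
    a noise matrix per equation; everything else stays as the drift."""
    drift_eqs, noise_eqs = [], []
    for eq in eqs:
        # Drift keeps every non-Brownian term.
        drift_eqs.append({t: c for t, c in eq.items() if t not in brownians})
        # One noise column per declared Brownian, coefficient 0 if absent.
        noise_eqs.append([eq.get(b, 0) for b in brownians])
    return drift_eqs, noise_eqs
```

The output pair (drift_eqs, noise_eqs) is the lower-level dialect: codegen only ever sees it, while users write the higher-level mixed equations.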
That's a lot to unpack, but on the regular vs. non-regular jump issue I'd think it is easier to have users specify individual symbolic jumps (explicitly or as Brownians), and then aggregate them if one wants a RegularJump under the hood. This would be relatively straightforward to do right now, and even easier with the new

There are also cases where one might not want to aggregate the jumps for leaping, as one may need access to the individual jump rates/affects at different times. So those leaping algorithms would probably prefer the individual jump representation (for example, split-step methods as I previously mentioned). |
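The aggregation idea can be sketched in a few lines (hypothetical names, Python rather than Julia): keep per-jump rate functions as the user-level representation, and generate the single stacked rate function a RegularJump-style tau-leaper wants only when lowering.

```python
def aggregate_rates(rate_fns):
    """Combine individual per-jump rate functions into one function
    returning the full rate vector, as an aggregated-jump solver expects.
    The individual functions remain available for methods (e.g. split-step
    schemes) that need per-jump rates."""
    def rate(u):
        return [f(u) for f in rate_fns]
    return rate
```

A leaping solver would then only ever see the aggregated `rate`, while transformations between the regular and irregular jump forms can still recover each individual jump.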
One notable thing to think about is what is left out of the ODCDASJDE System. There's a lot covered 😅, but there are some things specifically not covered:
|
I am slightly worried that throwing out the different representations will make it really hard to build packages on top of MTK (e.g. Catalyst), and that we are essentially forsaking the entire MTK-as-an-IR idea and just turning it into a normal modelling package. Realistically, when is this planned for? Actually, adapting Catalyst to something like this could easily soak up all my available time for a year+. Right now I am barely holding on just trying to keep up with the normal rate of change. |
This won't solve the removing-system-types thing; however, would it make sense to make |
Well, I don't think it will actually change much, because we effectively already do this, just half-assed 😅. The SDESystem code is almost a copy-paste of the ODESystem code at this point. Brownians actually just generate an ODESystem with brownians in the equations, and structural simplification then creates an SDESystem from that so it has a place to put the noise_eqs. That actually has a "bug" in some ways though with the sparse Jacobian construction: it cannot use the dependency graph from the structural simplification, because SDESystem forgets to cache it while ODESystem does. So... The point is, we already do this, there are just some weird quirks of history where we don't. This will mostly just delete duplicated code and make things a lot simpler.

As for when this is planned: I don't know, maybe it's like two months off? Again, it's not a huge change because there are already lots of shared dispatches, code duplication, etc., so it ends up mostly being a refactoring rather than an actual change. Most of MTK keeping "different system types" is effectively already faked at this point just for backwards compatibility.
No, the exact opposite. As I just described, you already have to do this with the IR right now; it's just that the interactions between the system types are underdefined, so you need to read the code to know the right way to do things. Brownians in ODESystem -> SDESystem? You have to read the code in order to know that's the "right" way to do things. ODESystem already has a

With the change, everything is just a System, and there are standard queries for "only ODEs allowed", etc. It should simplify and document the IR much better, because instead of a plethora of 16 different types, each sharing 70% of its code with the others (not an underestimate or exaggeration), we just have one. It should make it a lot less buggy too, since we don't need to duplicate dispatches. This gives us two well-defined points. There's the high-level modeling interface:

@poissonian eta(A*B)
@brownian zeta(t)
eqs = [
D(A) ~ D(B) + eta + z
D(B) ~ -eta + zeta
0 ~ sin(z(k+1) + z(k))
]
sys = System(eqs)
sys = structural_simplify(sys)
prob = AbstractProblem(sys, [...])
sol = solve(prob)

but then there's also the IR form, which is |
I will need some (lots!) of time to think this through. Ideally, it would be nice not to rush into things, and to be sure we are in a stable place before we move on to this (i.e. I'd prefer waiting til at least the end of summer). I realise maybe part of the idea is to make MTK easier to maintain. However, I do think it would make sense to do some work on the test suite first, figuring out which packages should downstream test which ones, etc., so that when the big change happens, we are more than ready for any potential consequences. |
Yeah, that was the reasoning behind the SymbolicJump representation I mentioned in the other thread. It could give us a way to lower to a jump to

So what we've been missing is just the

Thinking of how this
@isaacsas and this effectively solves the problem that stalled the regular jump stuff for a while. The biggest issue there was that allowing different regular jumps would be a much nicer model that would match the other jumps better, but it has always required an aggregated jump itself. It's always been a bit of an odd interface break, but there is no form of per-jump (rate, affect) that can come anywhere near the performance of the RegularJump; it's like orders of magnitude different. But unlike ConstantRateJump, the space of allowed changes is a lot smaller, so it really doesn't need to allow an affect! function, so there was always this question of: what is the right specification for the user, and will this completely change the solver infrastructure? The real answer is that the jumps should just be symbolically modeled, RegularJump solvers should always assume they have that aggregated function, and the building of that aggregated function can be done optimally with codegen. Trying to stuff a code-level representation of different Poisson jumps just can't be made to have good performance, and really the issue is that the previous idea was trying to mix the user-level representation with the solver-level implementation, when in reality they should be kept separate. |
Well we're definitely not rushing into this... we're 10 years into the process and we already started collapsing everything into a System type back in 2023 #2207. So the more proper thing to say is that after 8 years of building these interfaces, we sat down and whiteboarded this for about a year, figured out the ins and outs and tested a whole lot of things, and now have been in the middle of the redesign for about 2 years. That's hardly a rush 😅.
Now is really the time to do it, because we're starting to do this combinatorial expansion of the feature set. The issue of the last 3 months has been this "SDEs promote to float but ODEs don't and we test it on ODEs but not SDEs", "ODE initialization has this hook in

But it is rather straightforward to do the split here, since this will be the main aspect of v10. The other things are relatively minor to most users but are good to do in terms of concretizing the definition of the IR and its proper interfaces (#3204). The big visible change will likely be this system unification. So Catalyst can stay on v9 for a bit if it wants, and we can backport bug fixes as needed. But again, the types of bugs you hit, like:

# Create model. Checks when input type is `Float64` the produced values are also `Float64`.
rn = @reaction_network begin
(k1,k2), X1 <--> X2
end
# Checks when values are `Float32` (a valid type and should be preserved).
u0 = [:X1 => 1.0f0, :X2 => 3.0f0]
ps = [:k1 => 2.0f0, :k2 => 3.0f0]
oprob = ODEProblem(rn, u0, 1.0f0, ps)
osol = solve(oprob, Tsit5())
@test eltype(osol[:X1]) == eltype(osol[:X2]) == typeof(oprob[:X1]) == typeof(oprob[:X2]) == Float32 # These are all `Float64`
@test eltype(osol.t) == typeof(oprob.tspan[1]) == typeof(oprob.tspan[2]) == Float32

That's a structural problem which needs a structural solution, so we can patch a few things in v9 as needed to get your tests running, but it's ultimately a waste of time. It needs this design change to actually be robust, so we should just change the design, and then you update to v10 if you want that stuff to go away. The possibility of those acting differently is just a consequence of the v9 design. |
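The structural fix being argued for here is that type-promotion logic like this lives in exactly one shared constructor path, so a Float32-preservation test written against ODEs automatically covers SDEs, jumps, etc. A hypothetical sketch of that shape (Python, with Fraction vs. float standing in for Julia's Float32 vs. Float64; none of these names are MTK's):

```python
from fractions import Fraction

def make_problem(u0, tspan, ps, kind):
    """One shared problem-construction path for every system kind: the
    element type of u0 determines the type of the tspan and ps entries,
    written once instead of duplicated per system type."""
    elt = type(u0[0])
    return {
        "kind": kind,
        "u0": u0,
        "tspan": tuple(elt(t) for t in tspan),
        "ps": [elt(p) for p in ps],
    }
```

Because the "ODE" and "SDE" paths are literally the same function, a test that the state's element type is preserved cannot pass for one kind and silently fail for another.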
I'd say this is just a CI issue (and maybe there are structural issues causing it). It is a bug (?); the reason we caught it in Catalyst is that we actually test these kinds of things. Had MTK/OrdinaryDiffEq actually had tests for these kinds of things (or checked what went on downstream), this would have been caught and not been an issue. Even right now, if |
That's really not understanding the issue. The Catalyst tests are already minuscule in comparison to the MTK tests. But there's a combinatorial set of features you can test w.r.t. system types. You just found something not covered in the SDESystem tests, cool. But is it covered in OptimizationSystem? DiscreteSystem? Every feature for every system. Every combination. What you are actually finding is that Catalyst tests and stresses the SDEs more extensively than anything else. That is because 95% of contributors spend 100% of their time focusing only on ODESystem and maybe NonlinearSystem. So those two are by far the most robust. But any change they make probably needs to be mirrored to SDESystem. And if someone forgets, now that's a lingering bug. And if everybody just has to remember to do the right thing every single time, that will inevitably not work sometimes. ODEs will get more extensive tests, more extensive debugging, etc. The best thing to do is to change the system so that every ODE test is also an SDE test. The way to do that is to make it the same piece of code.

I don't see how you could think it makes anything easier for anyone. The most common PR review comment on MTK right now is to tell someone "oh you made a change? Do the same thing for XSystem for X in [SDESystem, OptimizationSystem, ...]". In order to contribute to MTK correctly, you already need to learn what SDESystem is, because if you change ODESystem you'd better figure out what to do with SDESystem too. For example, look at #3514 adding CSE. This is exactly the kind of thing where you will come by in a year and say "why is XSystem slower?", oh, because the CSE PR missed a system. Oops. Wait... why does adding a feature have to be copied 8 times, where if you forget

Who is that easier for?
For a new contributor, doing it for one system type would be much easier than having to learn about some system you've never used (OptimizationSystem) and then realizing that it needed the same change too. For a maintainer, doing it on one system type is much easier than having to paste around the same fix and double/triple-checking you got them all. Like, I don't understand how this is an argument; it's clearly easier for nobody.
What is with the fear mongering? It would take almost 0 changes for Catalyst to use this. I don't understand what is so scary here. In fact, Catalyst can adopt this in a completely non-breaking way just by find/replacing XSystem -> System. This would literally take you less than 5 minutes for the whole repo. I will just do that myself and make the PR. Literally the code:

using Catalyst, ModelingToolkit, Latexify
rn = @reaction_network Repressilator begin
hillr(P₃,α,K,n), ∅ --> m₁
hillr(P₁,α,K,n), ∅ --> m₂
hillr(P₂,α,K,n), ∅ --> m₃
(δ,γ), m₁ <--> ∅
(δ,γ), m₂ <--> ∅
(δ,γ), m₃ <--> ∅
β, m₁ --> m₁ + P₁
β, m₂ --> m₂ + P₂
β, m₃ --> m₃ + P₃
μ, P₁ --> ∅
μ, P₂ --> ∅
μ, P₃ --> ∅
end
odesys = convert(ODESystem, rn)
psymmap = (rn.α => .5, rn.K => 40, rn.n => 2, rn.δ => log(2)/120,
rn.γ => 5e-3, rn.β => 20*log(2)/120, rn.μ => log(2)/60)
u₀symmap = [rn.m₁ => 0., rn.m₂ => 0., rn.m₃ => 0., rn.P₁ => 20.,
rn.P₂ => 0., rn.P₃ => 0.]
# time interval to solve on
tspan = (0., 10000.)
# create the ODEProblem we want to solve
oprob = ODEProblem(rn, u₀symmap, tspan, psymmap)
odesys = complete(odesys)
oprob2 = ODEProblem(odesys, u₀symmap, tspan, psymmap)

would work exactly the same way if you change this line of code to System: https://github.com/SciML/Catalyst.jl/blob/v14.4.1/src/reactionsystem_conversions.jl#L507 . It's 3 characters! You have spent more time writing in this thread than the total number of characters that need to change in the whole repo. I will just do the PR; it's literally not worth the effort to argue about how we will find somebody with the time to do a 5 minute change. Now again, I would recommend that

struct ODESystem end
export ODESystem

and the v15 syntax will still work without a change to the user interface! So you don't need to change anything regardless of whether the system type is unified. |
Sorry, I thought this might be coupled with a generalisation of

Sounds good, let's keep the loop open on this one. I agree it can make a lot of things way easier for us (and Aayush especially!) |
Checklist
- I am following the contributor guidelines, in particular the SciML Style Guide and COLPRAC.