
Large SSA support #16

Closed
ChrisRackauckas opened this issue Mar 14, 2018 · 12 comments

Comments

@ChrisRackauckas
Member

We need to work out how we want to do large SSAs. We could build hybrid algorithms over the RegularJumps, but @isaacsas mentioned that large SSAs want to be able to do re-ordering, so that's not the right data structure. Our current setup with tuples is optimized for fewer than 16 jumps, so if we want to use ConstantRateJump for this then we need to somehow use FunctionWrappers and allow putting them in an array. However, more information might be needed, in which case it might be useful to write SSAs on yet another jump type.
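A minimal sketch of the FunctionWrappers idea (assuming the FunctionWrappers.jl API; the rate functions and wrapper type here are hypothetical):

```julia
using FunctionWrappers: FunctionWrapper

# Concretely-typed wrapper: every wrapped rate maps (u, p, t) -> Float64,
# so calls through the vector avoid dynamic dispatch.
const RateWrapper =
    FunctionWrapper{Float64,Tuple{Vector{Float64},Vector{Float64},Float64}}

# Hypothetical rate functions
rate1(u, p, t) = p[1] * u[1]
rate2(u, p, t) = p[2] * u[1] * u[2]

rates = [RateWrapper(rate1), RateWrapper(rate2)]

u = [10.0, 5.0]; p = [0.5, 0.1]
total = sum(r(u, p, 0.0) for r in rates)   # 5.0 + 5.0
```

Unlike a tuple, such a vector scales to thousands of jumps without blowing up compile times, at the cost of one indirect call per rate evaluation.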

@isaacsas
Member

For SSAs that require more information, maybe another option would be to just have the AggregatorAlgorithm for that SSA take the extra info through its constructor. Then the dispatched JumpProblem for that algorithm can set up a JumpAggregation with the necessary info. This wouldn't require any new jump types. Another option would be making the extra info required parameters for the dispatched JumpProblem associated with a specific SSA. This would potentially allow the use of these SSAs for other applications that have jumps, as long as the required extra info can be reasoned out...

Also, I don't think large SSAs necessarily need to reorder the vector of jump rates or affects; one could always internally maintain a permutation vector (ignoring if this has performance implications). However, they do need to be able to only update a small subset of jump rates and species populations each step. You want to avoid loops over all reactions or all species during each step.

@isaacsas
Member

With regard to handling larger networks in the current codebase: I’ve now played around with using FunctionWrappers for both Direct and FRM with a larger network (1D diffusion through hopping with equal rates between 50 lattice sites, with 10 molecules per site initially -- this gives 98 total reactions). All I did was replace the tuple of rates in direct.jl with a vector of FunctionWrapper-wrapped rates, and replace the affect! tuple with an affect! vector (but not function wrapped). You can see my code at https://github.com/isaacsas/DiffEqJumpExtensions. This example is in test/diffusiontest.jl.

It definitely makes a big difference over the tuple approach, but does seem to be a bit slower than BioSimulator. Benchmarks:

(key: Direct2 = regular Direct with tuples, DirectVEC = Direct with function wrappers, FRM = FRM with tuples, FRMVEC = FRM with function wrappers)

Full solution paths saving:

Solving with method: DiffEqJumpExtensions.Direct2, using SSAStepper
6.989 s (2888363 allocations: 78.24 MiB)
Solving with method: DiffEqJumpExtensions.DirectVEC, using SSAStepper
12.926 ms (19000 allocations: 5.32 MiB)
Solving with method: DiffEqJumpExtensions.FRM, using SSAStepper
7.066 s (5822174 allocations: 137.61 MiB)
Solving with method: DiffEqJumpExtensions.FRMVEC, using SSAStepper
17.142 ms (18992 allocations: 5.32 MiB)

Saving at fixed times:

Solving with method: DiffEqJumpExtensions.Direct2, using SSAStepper and 25000 points
7.220 s (2911520 allocations: 86.58 MiB)
Solving with method: DiffEqJumpExtensions.DirectVEC, using SSAStepper and 25000 points
14.798 ms (34440 allocations: 13.62 MiB)
Solving with method: DiffEqJumpExtensions.FRM, using SSAStepper and 25000 points
7.188 s (5898882 allocations: 147.15 MiB)
Solving with method: DiffEqJumpExtensions.FRMVEC, using SSAStepper and 25000 points
17.508 ms (34523 allocations: 13.62 MiB)
BioSimulator, Direct, saving at 25000 points:
10.827 ms (111906 allocations: 7.27 MiB)
BioSimulator, FRM, saving at 25000 points:
10.781 ms (111906 allocations: 7.27 MiB)

I'd also like to try an approach like BioSimulator and BioNetGen, where there is just a global rate function that takes the network info and dynamically calculates a requested rate. I don't know whether this would be faster for a smaller network, but perhaps for a system with thousands (or tens of thousands) of reactions it could be quicker to evaluate a global function with different parameters (the rate id) than to look up and evaluate a dedicated function for each specific reaction...

@ChrisRackauckas
Member Author

ChrisRackauckas commented Mar 16, 2018

Why did you leave the affect!s unwrapped? That would put two dynamic dispatches in there, which might be hurting performance.

I'd also like to try an approach like BioSimulator and BioNetGen, where there is just a global rate function that takes the network info and dynamically calculates a requested rate.

Oh, you mean like stoich_rate(u,p,t,idxs,pidx) = p[pidx] * prod(u[idx] for idx in idxs)?
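As a runnable sketch of that idea (names hypothetical; `idxs` holds the reactant indices of the requested reaction and `pidx` the index of its rate constant):

```julia
# Hypothetical global mass-action rate: one generic function evaluates the
# propensity of any reaction from the network data, instead of a dedicated
# compiled function per reaction.
stoich_rate(u, p, t, idxs, pidx) = p[pidx] * prod(u[idx] for idx in idxs)

u = [4, 3, 2]    # current species populations
p = [0.5]        # rate constants

# Reaction S1 + S2 -> S3 with rate constant p[1]:
rate = stoich_rate(u, p, 0.0, (1, 2), 1)   # 0.5 * 4 * 3 = 6.0
```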

Did you profile to see where all of the time is spent? I noticed for instance that this is missing an @inbounds: https://github.com/isaacsas/DiffEqJumpExtensions/blob/master/src/aggregators/directvec.jl#L41-L43

@isaacsas
Member

I wasn't quite sure what type signature to use in FunctionWrapper for the affect! functions. For rates I used

RateWrapper = FunctionWrappers.FunctionWrapper{Float64,Tuple{typeof(u), typeof(p), typeof(t)}}

setting them up in the JumpProblem constructor. (Though perhaps I should replace the Float64 with the type of t.) If you can tell me what the corresponding wrapper type for affect! should be, I'm happy to try that out too. I'm still learning Julia, so I wasn't sure how to extract the general type signature for an affect! function (which takes an integrator, which isn't constructed yet or available within the JumpProblem constructor).

I haven't tried using the profiler yet, but that is on the TODO list...

@ChrisRackauckas
Member Author

It should just ignore the output, so it would be helpful to do

wrapped_f = x->(affect!(x);nothing)

and then you can type the return as Void.
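For example (a plain-Julia sketch with a stand-in integrator type; on Julia 0.7+ `Void` is renamed `Nothing`):

```julia
# Hypothetical stand-in for the integrator that affect! will mutate
mutable struct FakeIntegrator
    u::Vector{Int}
end

# An affect! whose natural return value is the mutated state
affect!(integrator) = (integrator.u[1] += 1; integrator.u)

# Discard the return value so the wrapper can declare a fixed return type
wrapped_f = x -> (affect!(x); nothing)

integ = FakeIntegrator([10])
result = wrapped_f(integ)    # returns nothing; integ.u is now [11]
```

With FunctionWrappers this closure would then be stored as, e.g., `FunctionWrapper{Void,Tuple{Any}}(wrapped_f)`.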

@isaacsas
Member

isaacsas commented Mar 16, 2018

What type should the input have? Or should I just use Any like:

FunctionWrapper{Void,Tuple{Any}}(wrapped_f)

And yes, something like what you wrote for stoich_rate. Though it might make sense to calculate the product manually, and the function would need to handle things like homodimerization reactions, for which we use

p[pidx]/2 * u[idx] * (u[idx]-1)

@ChrisRackauckas
Member Author

You can use Any or DiffEqBase.DEIntegrator if you want a stricter subtype.

Though it might make sense to manually calculate the product, and the function would need to handle things like homodimerization reactions for which we use

Yup, that makes sense.

@isaacsas
Member

Just to record my final benchmarking results:

I'm not seeing a big difference in benchmarking the Direct method when using any of a vector of affect!s, FunctionWrappers.FunctionWrapper{Void,Tuple{Any}}, or FunctionWrappers.FunctionWrapper{Void,Tuple{DiffEqBase.DEIntegrator}}. I'm guessing this is because only one affect![i] function is called within the affect callback each step, so it isn't the dominant cost in the Direct method, which makes many rate calls per step. The evaluation of affect! may only matter in more optimized SSAs that don't recalculate all the rates each step.

I put together a jump aggregator that uses a single global rate function and a single global affect function for mass action reactions (similar to the BioSimulator design, but using vectors of vectors of pairs to store stoichiometry instead of dense/sparse matrices). For the direct method, on a 1D diffusion example with 510 reactions I find (DirectMA = mass action version, DirectVEC = function wrappers version):

tf = 20 seconds, N = 256 lattice sites
Solving with method: DiffEqJumpExtensions.DirectMA{Array{Float64,1},Array{Array{Pair{Int64,Int64},1},1}}, using SSAStepper and 25000 points
531.611 ms (25029 allocations: 52.39 MiB)
Solving with method: DiffEqJumpExtensions.DirectVEC, using SSAStepper and 25000 points
2.448 s (25029 allocations: 52.39 MiB)
BioSimulator, Direct, saving at 25000 points:
971.853 ms (2997953 allocations: 194.88 MiB)

tf = 200 seconds, N = 256 lattice sites
Solving with method: DiffEqJumpExtensions.DirectMA{Array{Float64,1},Array{Array{Pair{Int64,Int64},1},1}}, using SSAStepper and 25000 points
4.819 s (25029 allocations: 52.39 MiB)
Solving with method: DiffEqJumpExtensions.DirectVEC, using SSAStepper and 25000 points
21.878 s (25029 allocations: 52.39 MiB)
BioSimulator, Direct, saving at 25000 points:
7.625 s (3002321 allocations: 195.12 MiB)

So the general conclusion seems to be that for generic jumps/affects, it would be good to have Direct.jl transition to FunctionWrappers once the system has enough jumps. For mass action systems it makes sense to have a specialized approach that uses a single function for rates and a single function for affects.

@ChrisRackauckas
Member Author

How's the implementation? Is it easy to abstract and use the two jump types in the same SSA? That would mean the interface gains some complexity, but at least if the backend is independent of how the rates are formed then it's still not too bad.

@isaacsas
Member

isaacsas commented Mar 20, 2018

I don't think that would be tough. Right now I'm not using the affect!'s and rates that are passed into aggregate, but it wouldn't be much work to also include them in the JumpAggregation functors for the new mass action Direct method. It would just mean that after I loop through the mass action jumps and call the global rate function I would then loop through the constant rate jumps as in the original DirectJumpAggregation. I could set it up to use recursion if the latter are tuples, or a straight loop if they are a vector of FunctionWrappers.
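A sketch of how the two storage schemes could share one code path (hypothetical names; the tuple method uses recursion so each call sees a concrete type, while the vector method is a straight loop):

```julia
# Recursive total-rate evaluation for a tuple of rate functions:
# each recursion step dispatches on a concrete tuple type, so the
# individual calls are statically resolved.
sum_rates(::Tuple{}, u, p, t) = 0.0
sum_rates(rs::Tuple, u, p, t) =
    first(rs)(u, p, t) + sum_rates(Base.tail(rs), u, p, t)

# Straight loop for a vector (e.g. of FunctionWrappers)
function sum_rates(rs::AbstractVector, u, p, t)
    s = 0.0
    for r in rs
        s += r(u, p, t)
    end
    s
end

r1(u, p, t) = p[1] * u[1]
r2(u, p, t) = p[2] * u[2]
u = [2.0, 3.0]; p = [1.0, 2.0]
sum_rates((r1, r2), u, p, 0.0)   # 2.0 + 6.0 = 8.0
sum_rates([r1, r2], u, p, 0.0)   # same result via the loop
```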

You can see how I set it up for just mass action at:

https://github.com/isaacsas/DiffEqJumpExtensions

The relevant files are

  • aggregators.jl, where I define a DirectMA aggregator algorithm that takes the scaled rate constants, the reactant stoichiometry, and the net reaction stoichiometry.
  • directma.jl, which defines the JumpAggregation. Instead of calling the callback jump rates/affect!s, the JumpAggregation stores the aggregator algorithm and calls global executerx! and evalrxrate functions that take the current population vector and the net/reactant stoichiometry.
  • massaction_rates.jl, which defines the executerx! and evalrxrate functions.

For the stoichiometry representation I tried a couple of interfaces. A vector of vectors of pairs seemed to work well. (Here a given pair maps the id of a species to the stoichiometric coefficient of that species for a given reaction.)
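A small runnable sketch of that representation, with hypothetical evalrxrate/executerx! functions (assuming, as above, that rate constants are pre-scaled, so a homodimerization uses p/2):

```julia
# Stoichiometry as vectors of (species id => coefficient) pairs.
# Hypothetical network: dimerization 2A -> B.
reactant_stoich = [[1 => 2]]           # reaction 1 consumes 2 of species 1
net_stoich      = [[1 => -2, 2 => 1]]  # net change: A -= 2, B += 1
scaled_rates    = [0.25]               # p/2 for p = 0.5

# Propensity of reaction rx: the scaled rate times a falling factorial per
# reactant, so 2A contributes u*(u-1), matching p/2 * u * (u-1) above.
function evalrxrate(u, rx, scaled_rates, reactant_stoich)
    rate = scaled_rates[rx]
    for (spec, coeff) in reactant_stoich[rx]
        for k in 0:(coeff - 1)
            rate *= (u[spec] - k)
        end
    end
    rate
end

# Apply reaction rx's net stoichiometry to the population vector
function executerx!(u, rx, net_stoich)
    for (spec, change) in net_stoich[rx]
        u[spec] += change
    end
    u
end

u = [10, 0]
evalrxrate(u, 1, scaled_rates, reactant_stoich)   # 0.25 * 10 * 9 = 22.5
executerx!(u, 1, net_stoich)                      # u becomes [8, 1]
```

Note that both functions only touch the species appearing in the fired reaction's pair lists, which is what makes per-step updates O(reactants) rather than O(species).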

The main issue I'm having is figuring out where the extra reaction info that the mass action routines need gets passed in. An extra jump type would work, but what happens then for SSAs that want even more info (like a dependency graph)? Do we extend the jump types again, or for everything beyond stoichiometry just pass this extra info through the aggregator algorithm or the associated jump_problem constructor? Or should we have a MassActionSystem type that the new jump type stores an instance of, and which we can add fields to later if needed?

@ChrisRackauckas
Member Author

The main issue I'm having is figuring out where the extra reaction info that the mass action routines need gets passed in. An extra jump type would work, but what happens then for SSAs that want even more info (like a dependency graph)? Do we extend the jump types again, or for everything beyond stoichiometry just pass this extra info through the aggregator algorithm or the associated jump_problem constructor? Or should we have a MassActionSystem type that the new jump type stores an instance of, and which we can add fields to later if needed?

I'm not sure about the best way to do this. I think it may be impossible to plan that far ahead; instead the code should be written for the current methods and expanded as needed. When we need it we'll probably think of the best solution, but trying to plan for it now is really hard since we don't even know what we'll need.

@isaacsas
Member

isaacsas commented Mar 21, 2018

Fair enough; right now I'm planning to just implement a mass action jump type (like the regular jump type), which will be another field in the JumpSet that is passed to the JumpProblem constructor. That type can be expanded to hold a dependency graph when I get to the faster SSAs (which will only work with mass action jumps).
