
Juliacon2023 Revisions #373

Merged
merged 116 commits into from
Jan 4, 2024

Conversation

gzagatti
Contributor

This PR addresses the paper reviews by @gdalle and @mschauer, which were very beneficial to our paper. Many thanks to both of you!

Apart from the changes to the paper, I would also like to address each issue raised individually.

(As a side note, I have merged this PR with PR #372, so it might be advisable to first merge that PR.)

Review from @gdalle.

  • P1, Abstract: it is unclear how much of Coevolve and Alg 5 are "your contribution" vs adaptations from earlier works

I have modified the abstract and the last paragraph on page 5 to make our contribution more explicit. Our main contribution regarding the new algorithm, Coevolve, is the development of a queued thinning algorithm that steps in sync with model time, which brings a few benefits.

Previous versions of the thinning algorithm require the proposed next time to be confirmed before time can evolve, which causes at least two issues. First, it increases the time complexity of the algorithm: if a traditional thinning algorithm takes on average P proposals before accepting the final one, then for each of the D dependent sub-processes we need a total of O(PD) steps per round of thinning. Second, each time proposal needs to look ahead in time to evaluate the conditional intensity, which makes it very difficult to couple TPPs with ODEs or any other stochastic process or unknown callback.
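For contrast, a traditional thinning loop for a single sub-process can be sketched as follows. This is a minimal Python illustration, not the JumpProcesses.jl implementation; it assumes a constant dominating bound, and on average burns P proposals for every accepted point:

```python
import random

def classical_thinning(lmbda, bound, t0, t_end, rng=random.Random(0)):
    """Ogata-style thinning for one process with conditional intensity
    lmbda(t) dominated by the constant `bound`.  Each accepted point may
    cost several rejected proposals, and every proposal must evaluate
    lmbda ahead of the current model time."""
    t, points = t0, []
    while True:
        # propose the next candidate from a homogeneous process at rate `bound`
        t += rng.expovariate(bound)
        if t >= t_end:
            return points
        # accept the candidate with probability lmbda(t) / bound
        if rng.random() * bound <= lmbda(t):
            points.append(t)
```

With D dependent sub-processes, running this loop to completion for each of them is what yields the O(PD) cost per round mentioned above.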

Coevolve reduces the number of time-proposal rejections in each round of thinning. When a new round starts, it makes one proposal per sub-process, O(D) in total, after which model time evolves. Time then stops at each proposal in turn to decide whether to accept or reject it; on rejection, only a single new proposal is needed. While each sub-process still requires on average P proposals before one is accepted, only a single acceptance is needed to trigger a new round of updates, so on average we make far fewer than O(PD) proposals. The benefit of this approach is that we never look ahead in time, so we can couple the process with any other process that affects the conditional intensity. Additionally, the conditional intensity is only evaluated if time actually reaches the proposal, so exotic intensities may be evaluated less often.
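The queued scheme can be sketched in Python. This is not the actual JumpProcesses.jl code; `intensities`, `bounds`, and the constant-bound assumption are illustrative, and history-dependent bound refreshing is elided:

```python
import heapq
import random

def queued_thinning(intensities, bounds, t_end, rng=random.Random(0)):
    """Queued thinning over D sub-processes: one pending proposal per
    sub-process sits in a priority queue, model time advances to the
    earliest proposal, and the accept/reject decision is made only once
    time reaches it.  intensities[i](t, history) is the conditional
    intensity of sub-process i, dominated by the constant bounds[i]."""
    queue = []    # (proposed_time, sub_process_index)
    history = []  # accepted (time, index) events
    for i, b in enumerate(bounds):
        heapq.heappush(queue, (rng.expovariate(b), i))
    while queue:
        t, i = heapq.heappop(queue)  # advance model time to the proposal
        if t >= t_end:
            continue  # this sub-process produces no further events
        if rng.random() * bounds[i] <= intensities[i](t, history):
            history.append((t, i))  # accept: a new thinning round starts
            # (with history-dependent rates, the other pending proposals
            # would be refreshed here; with constant bounds they stay valid)
        # accepted or not, draw a single fresh proposal for sub-process i
        heapq.heappush(queue, (t + rng.expovariate(bounds[i]), i))
    return history
```

Because the queue is always popped in time order and no proposal is confirmed before time reaches it, the simulation never looks ahead, which is what makes coupling with ODEs and callbacks possible.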

Since these changes allow for more efficient sampling and open new simulation possibilities, we believe they go beyond adapting earlier work to JumpProcesses and stand as a new contribution.

  • P2, Eq 2.1: bounds are inverted in the integral

Fixed.

  • P4, Alg 3: comment on the meaning of u and v

Fixed.

  • P5, Alg 4: it isn't obvious at first why non-accepted events should still be added to the queue

There was a small bug in Alg 4, when $u > L^\ast(t)$, then $u \leftarrow L^\ast(t)$.

In any case, it is Alg 5 which is responsible for adding candidates to the queue.

The proposed candidates are always added to the priority queue because we need to stop at each proposed time in order. When the candidate is pre-rejected, we update the bounds and make a new proposal. Alternatively, if the candidate time has not been pre-rejected, we draw the acceptance threshold and compute the intensity rate to make a decision. If the candidate is accepted, we trigger a new round of thinning. Otherwise, we update the bounds and make a new proposal. Overall, we avoid unnecessary updates.

I have added a discussion of this point to the last paragraph of Section 4.

  • P5, Alg 5: line 15 should be highlighted as the place where synchrony happens

What would be the purpose of highlighting that line? I'm not sure what you mean by synchrony. Although line 15 is an important part of the algorithm, it has to work together with the rest of the routine, so I would like to avoid highlighting that line in particular lest it confuse readers.

  • P5: the concept of callback is not explained

Added an explanation in the first paragraph of Section 5.

  • P6: the twelve different aggregators would benefit from a table recap

I added a table to the Annex and re-wrote the paragraphs in Section 5 accordingly.

  • P6: why can the thinning aggregators not handle variable rates? it doesn't seem to be a theoretical limitation since Ogata's Alg 3 is designed for that

You are right. I have qualified the statement to refer to constant-rate thinning aggregators.

  • P6: in sentences like "The latter implementation is sourced from [3] and follows Algorithm 5 very closely." or "The implementation of this aggregator takes inspiration from [2], and improves the method in several ways", your contributions become blurred with the original works (see remark about abstract)

You are right. I have refactored these paragraphs to make things clearer. Also, with the expanded discussion at the end of Section 4, many of these sentences were removed.

  • P6: the way I see it, being able to integrate a jump process with diffeqs is the main appeal of main loop synchrony, and I would stress it more

Fixed.

  • P6: the notion of stepper is never defined

Fixed. Explanation added to the paragraph.

  • P8: "increasing expected node degree"... with the graph size

Fixed.

  • P9: does the length of the interval L where bounds hold have an impact on the rate of rejection? for Hawkes process IIRC it's a compromise

Yes, it does, but not to the point of stabilizing the rejection rate. I tried changing $L$ to stabilize the rejection rate, but after some trial and error I did not obtain a satisfactory result. Finding optimal ways of choosing $L$, $\bar{B}$ and $\underline{B}$ would be a good area for future research, which is mentioned in the conclusion. As the algorithm stands, without any optimization of these variables, it is already quite competitive.

  • P9: why does Coevolve allow arbitrary saving where CHV doesn't?

Coevolve is synced with model time and does not look ahead. CHV operates on compensated time, as you can see from Equation (4.8). Therefore, to stop at pre-specified times during the simulation you would have to perform root-finding, which is not cheap. Alternatively, you could run the simulation capturing all the event times, then replay it with discrete callbacks that stop at the realized events and wherever else you want, which is not synchronous. In any case, neither of these options is currently implemented by CHV.

Fine-grained saving is one of the main advantages of Coevolve over CHV at the moment. However, we are currently thinking about how to allow arbitrary saving in CHV.
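To illustrate why stopping at a pre-specified model time is not cheap under a time change, here is a minimal Python sketch of inverting a monotone compensator by bisection. CHV does not implement this; `Lambda` and the bracketing interval are illustrative assumptions:

```python
def invert_compensator(Lambda, s, t_lo, t_hi, tol=1e-10):
    """Solve Lambda(t) = s for t by bisection, assuming Lambda is
    increasing on [t_lo, t_hi] and s lies between Lambda(t_lo) and
    Lambda(t_hi).  A method evolving compensated time would need a
    root-find like this for every pre-specified model-time stop."""
    while t_hi - t_lo > tol:
        t_mid = 0.5 * (t_lo + t_hi)
        if Lambda(t_mid) < s:
            t_lo = t_mid
        else:
            t_hi = t_mid
    return 0.5 * (t_lo + t_hi)
```

Each saved time point costs a full bisection (or Newton) solve, whereas a method synced with model time, like Coevolve, simply steps through the requested times.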

  • P10: would non-evolutionary point processes (typically spatial ones) also be supported in a "general point process simulation" library like the one you envision?

JumpProcesses already supports spatial point processes. For instance, check this tutorial. In the conclusion we mention that a more thorough comparison between spatial jump processes and spatial point processes could surface new algorithms that could be added to the library in the same vein that the exercise in this paper led to a new algorithm for TPPs.

Review from @mschauer.

  • You are conditioning on everything in Ht−, so the explanation in line 95 is off.

I didn't quite understand your comment here, but I have tried to rewrite the paragraph to improve readability.

In line 95, I am trying to explain in words the meaning of $p^\ast (t)$, so it's indeed the case that $p^\ast (t)$ is conditioned on the history $H_{t−}$.

  • Line 124 needs "independently"

Fixed.

  • Thinning is exact and root-finding methods are some less exact than that, so maybe change the section title in 4.1?

I have added a footnote to the first mention of "exact methods". Although not all "exact methods" are completely exact, since some of them rely on root-finding approximations, we follow convention and denote as "exact methods" all methods that describe the realization of each point in the process chronologically.

  • In (4.9) I use a Taylor polynomial on λ(t)≈a+bt<a′+b′t plus some padding to find a linear bound in ZigZagBoomerang.jl. The nice thing is that we can get Taylor approximations by automatic differentiation of the rate function and the amount of padding needed can be learned. Maybe you can consider this.

I had the chance to study your work, in particular the paper

Joris Bierkens, Paul Fearnhead, Gareth Roberts: The Zig-Zag Process and Super-Efficient Sampling for Bayesian Analysis of Big Data. The Annals of Statistics, 2019, Vol. 47, No. 3, pp. 1288-1320.

I think that the Taylor polynomial could be a promising avenue for automatically setting the upper bound without user input. I have added a discussion of this line of work in Section 4.2, where I mention the paper above in the context of non-fixed upper bounds for thinning methods. I have also adapted Algorithm 5 to allow for non-fixed bounds so that such approaches can be used.
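As a rough illustration of the Taylor-plus-padding idea, here is a hedged Python sketch. This is not ZigZagBoomerang.jl's actual code; `rate`, `drate`, and the `pad` factor are hypothetical names, and in practice the derivative would come from automatic differentiation and the padding would be adapted when violations are detected:

```python
def linear_bound(rate, drate, t0, pad=0.1):
    """Build a linear dominating function rate(t) <= a + b*(t - t0),
    valid near t0, from a first-order Taylor expansion plus a padding
    factor that absorbs the truncation error.  `drate` is the derivative
    of the rate function (e.g. obtained via autodiff)."""
    a = rate(t0) * (1.0 + pad)
    b = max(drate(t0), 0.0) * (1.0 + pad)
    return lambda t: a + b * (t - t0)
```

The bound only holds over a window around `t0` whose length depends on the rate's curvature, which is exactly where learning the padding becomes attractive.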

Going beyond the paper, I think there is great opportunity for cross-fertilization. The Zig-Zag process could be simulated with JumpProcesses.jl, and it would be interesting to benchmark the different methods. I am also wondering whether you have attempted other algorithms, such as thinning with intervals (like Coevolve) or the CHV algorithm (for another take on an inverse method).

Fix #369 JuliaCon/proceedings-review#133.

gaurav-arya and others added 30 commits July 23, 2023 18:14
Add test for Coevolve calling user-provided funcs with t beyond tspan
The previous ExtendedJumpArray code implements part of the broadcastable
interface, but it is not actually used because the entire interface is
not included. This commit replaces that broadcast code, reusing much of
the arg packing/repacking.

Because of this, the current code uses the fallback indexing method when
broadcasting, which requires branching inside the broadcast loop. This
branch is likely well-branch-predicted, but the presence of the branch
prevents kernel functions from being fused into efficient SIMD
instructions.

This defines an ExtendedJumpArrayStyle, a broadcast style that includes
two sub-styles that represent the styles for broadcasting the `u` and
`jump_u` entries. `copyto!` overloads are specified that do the efficient
thing and call `copyto!`, using the broadcast computation tree, on `u`
and `jump_u` separately. Binary style rules are defined such that
broadcasts fall back to the old method if you do something like adding
an ExtendedJumpArray to a normal vector.

A Julia >=1.9 extension is added for FastBroadcast.jl, where the
equivalent of the `copyto!` call, `fast_materialize!`, is similarly
overloaded.
Add non-default broadcasting of ExtendedJumpArray's.
Pass `integrator` around to make adding support for spatial constant rate jumps easier.
Refactor internal functions in spatial SSAs.
I will need to change some of the functions in bracketing for spatial SSA RSSACRDirect. The tests will help me make sure that I do not break anything.
…code.

Co-authored-by: Sam Isaacson <isaacsas@users.noreply.github.com>
This is in preparation for the spatial SSA RSSACRDirect.
@isaacsas
Member

@gzagatti this shouldn't be merging into master, but merging into the paper branch right?

@isaacsas
Member

@gzagatti do you have a pdf of the paper with your changes highlighted so we can see what changed easily instead of having to reread the whole thing?

@isaacsas
Member

@gzagatti have you confirmed that both reviewers have completed their reviews and you are clear on their comments / questions? Looking at the threads there seem to be some clarification questions you asked that weren't answered, and I'm not sure both reviewers actually finished their reviews. It would be good to make sure we have the full reviews and the full response before resubmitting anything.

@gzagatti gzagatti changed the base branch from master to juliacon2023 December 14, 2023 03:19
@gdalle
Contributor

gdalle commented Dec 14, 2023

It would be easier to complete the reviews if we can regenerate the paper and docs with your changes, so I would suggest merging whatever they are first. What do you think?

@gzagatti
Contributor Author

@isaacsas find the latest version of the paper and
the same version with the diffs. I would recommend checking the diff'ed version to locate most of the fixes and then checking the clean version. Otherwise, I find it quite messy.

We can complete the review on our side before we request another check from the reviewers.

@gdalle I am happy to accommodate both ways.

@gdalle
Contributor

gdalle commented Dec 15, 2023

@gdalle I am happy to accommodate both ways.

Then I'd say merge the whole thing and ping me so I do another review pass :)

@isaacsas isaacsas merged commit 43f0a15 into SciML:juliacon2023 Jan 4, 2024
4 of 5 checks passed
@isaacsas isaacsas deleted the juliacon2023 branch January 4, 2024 20:07
@isaacsas isaacsas restored the juliacon2023 branch January 4, 2024 20:07