#8 [3/4]: Implement model fitting with mgcv #20

zsusswein · 2024-06-30T20:50:23Z

Important

This PR a trial for using a stacked PR workflow. Do not merge.

Extends #18 to fit a GAM using the supplied formula and data. There are two implemented backends, mgcv::gam() and mgcv::bam(). Dispatch to the appropriate backend is handled via an internal S3 class -- the methods are exported but the generic is private so the methods are still internal to the package. Decent default arguments are supplied, with optional user-supplied arguments passed via a list.

There's also a new warning for legal but unwise parameterization.

codecov · 2024-06-30T20:53:01Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (5ee9a2c) to head (0283df9).

Additional details and impacted files

@@            Coverage Diff             @@
##              main       #20    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files            3         6     +3     
  Lines           87       268   +181     
==========================================
+ Hits            87       268   +181

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

R/fit_model.R

R/formula.R

seabbs

This all looks pretty good so far.

Having everything based on main makes it quite a pain. Are you just doing this to get the CI? Why not just update the CI to trigger on PRs against branches that aren't main.

My only concern here is about how opinionated this is getting - i.e. very. We might want that and in that case its fine but I do think we should be taking choices here that still allow people to use it for other things even if the default behaviour is very opinionated.

My other concern is there are a few notes about implementing other backends etc but what is coded here doesn't really seem set up for success there (unless success is lots of internal character string based switches which I assume it isn't). Is there an issue scoping out how extensions would be implemented?

zsusswein · 2024-07-02T20:28:35Z

Having everything based on main makes it quite a pain. Are you just doing this to get the CI? Why not just update the CI to trigger on PRs against branches that aren't main.

Yep this is just for CI. I'll switch the base before opening the PR. I think before updating CI rules we want to check in on whether the whole stacked PR thing is working.

My only concern here is about how opinionated this is getting - i.e. very. We might want that and in that case its fine but I do think we should be taking choices here that still allow people to use it for other things even if the default behaviour is very opinionated.

Agreed -- I think the push and pull here is really helpful. I don't want to be overly rigid with my view of the target user and target use-case, but also I want to avoid stepping into the territory of EpiNow2/Epinowcast/EpiAware. Those are closer to custom model foundries than I think this package should be. That doesn't we should prevent any/all customization (and the current implementation is likely too constraining), but I do want to hear a convincing use-case for e.g. custom model formula and arguments. What would the user flow be? Who is this user? Does this package provide enough benefit over handrolling in mgcv or jumping right to EpiNow2 for us to optimize for their use-case?

My other concern is there are a few notes about implementing other backends etc but what is coded here doesn't really seem set up for success there (unless success is lots of internal character string based switches which I assume it isn't). Is there an issue scoping out how extensions would be implemented?

The current flow is here is that RtGam() fits the model and is the class constructor. I'd strongly prefer we avoid the public data constructor -> public model constructor flow, but perhaps an internal S3 class that switches would be possible? Issue to discuss is #21

seabbs · 2024-07-02T20:35:22Z

I think before updating CI rules we want to check in on whether the whole stacked PR thing is working.

Agree but from a review perspective the current setup of having to wade through all the old commits is very bad which makes it hard to know if stacked PRs are working

Those are closer to custom model foundries than I think this package should be. That doesn't we should prevent any/all customization (and the current implementation is likely too constraining), but I do want to hear a convincing use-case for e.g. custom model formula and arguments. What would the user flow be? Who is this user? Does this package provide enough benefit over handrolling in mgcv or jumping right to EpiNow2 for us to optimize for their use-case?

Yes I agree. I think the main user is NNH right? The question is do we want to ever on the fly do things like use a Poisson observation model because something isn't working or try some settings tweaks in bam because again something is a problem.

If you have to leave the package to make any of these tiny tweaks you are using all the benefits of the good defaults and the nice pre and post-processing

constructor flow, but perhaps an internal S3 class that switches would be possible? Issue to discuss is #21

zsusswein · 2024-07-02T20:37:43Z

Sure, but this PR isn't open for review yet as I note at the top?

seabbs · 2024-07-02T20:39:16Z

Sure, but this PR isn't open for review yet as I note at the top?

Fair enough. @seabbs out for now then!

Allow optional switching between gam() and bam(), both for functionality now and to illustrate how one might extend to different backends in the future.

R/RtGam.R

R/fit_model.R

seabbs

This looks good. It looks like the conclusion of #21 is to allow for them in the future which I think makes sense.

A few points for discussion.

... vs a list + does this allow actual default args to fit_model vs an internal list?
Having the class check be in fit_model.default vs when preparing the class. Doing so allows for user extension at the cost of control and potentially some opaque errors (though not at the moment).

zsusswein · 2024-07-08T13:45:54Z

... vs a list + does this allow actual default args to fit_model vs an internal list?

Addressing here

Having the class check be in fit_model.default vs when preparing the class. Doing so allows for user extension at the cost of control and potentially some opaque errors (though not at the moment).

I think the counterpoint to consider is what to do if/when we need to move to S3 for other steps (I'm thinking predict and maybe formula would be needed). Centralizing the check for backends into one place adds clarity I'd think, at the cost of extensibility as you noted.

seabbs · 2024-07-08T16:30:02Z

(I'm thinking predict and maybe formula would be needed)

If you did do this and backends didn't have methods these would by default throw an informative error (but not a incredibly pretty one) if you just had no default method

It makes documentation slightly more straightforward

Allowing future extensions.

seabbs

Overall looking good a few points to clear up about method dispatch etc. Can you summarise some of your responses/decisions to prior comments? I have a few times just flagged the same thing and looking back at your answers I am not clear that the decisions have all been justified.

R/RtGam.R

R/fit_model.R

R/formula.R

kgostic

No major comments. Documentation is pretty verbose, but I think that's probably a good thing. My only comment is that if you're writing that much, might be a good idea to refer back to a reference in case people want even more, or in case the methods/best practices evolve.

R/formula.R

@seabbs

As suggested by @seabbs. This allows ... to be evaluated and handles dispatch.

seabbs

Nice!

Has basic info but still needs model fit diagnostics

Co-authored-by: Sam Abbott <contact@samabbott.co.uk>

By pointing the public function to the slot with the stored diagnostic list. Re-order `RtGam()` to use the public function for diagnostic reporting.

zsusswein added the v0.1.0 label Jun 30, 2024