
Rewrite of Model.plot() #281

Merged
merged 5 commits into from
May 13, 2024

Conversation

drvdputt
Contributor

@drvdputt drvdputt commented May 7, 2024

First pull request for #280

  • Solves an issue where the plot would crash if no attenuation component was present in the model.
  • Allows some customization by passing keyword arguments.
  • Uses the tabulate() function of model, instead of dealing with astropy model classes directly (a few exceptions remain, but can be addressed once the Fitter API is pulled in)
  • Removes plot() from base.py

@jdtsmith
Contributor

jdtsmith commented May 8, 2024

This looks good overall, thanks. I admit I'm still a tad bit confused about what role Model plays in the overall lifecycle. Here's my understanding of where we got to, please let me know if this agrees with your understanding:

  1. Science Pack: Read-only YAML file (though it can be edited directly), ingest to a Features table (light astropy.table subclass). Table can also be edited however you like using astropy.table capabilities.
  2. Model created from and wraps a Features table (or it can create one for you, or read a YAML file for you):
    • Has the capability to "intelligently" update said Features table: guess & fit (and maybe others).
    • A Model can save itself and (should have) restore itself from disk. Always "stores itself" as meta-data within the Features table (which astropy reads/writes for us).
    • A Model can tabulate sub-components of the physical model. E.g. "the full physical model", or "all the lines in the h2_lines group" or "just this one named dust feature". Might be nice to pass "sub-tables" to do this.
    • A model can plot any spectrum together with the (componentized) model.
  3. A Model needs to know which Fitter to use, in order to run fit using the correct one. This suggests a Fitter should be an (optional) __init__ parameter for Model (with a default). When inquiring for any information about the physical model, Model relies on Model.tabulate, which consults its self.fitter, e.g. for self.fitter.gaussian(params) (standardized by the Fitter API), etc.
  4. Instrument details are needed for the model to correctly tabulate, fit, etc. These arrive always "at the last minute", i.e. in terms of the Spectrum1D object(s) being fit, guessed, tabulate'd, etc. A Model has no longer-term awareness of the instrument.
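That lifecycle could be sketched roughly as follows; every class and method name here is an illustrative stand-in, not the actual pahfit API:

```python
# Sketch of the Model lifecycle described above; all names and
# signatures are illustrative assumptions, not the pahfit API.

class Features(dict):
    """Stand-in for the Features table (really a light astropy.table
    subclass, ingested from the science-pack YAML)."""

class Model:
    def __init__(self, features):
        self.features = features  # Model wraps a Features table

    def guess(self, spec):
        """'Intelligently' set starting values from a Spectrum1D."""

    def fit(self, spec):
        """Fit spec; write the results back into self.features."""

    def tabulate(self, wavelengths, subset=None):
        """Evaluate the full model or any sub-component on a grid."""

    def plot(self, spec, **kwargs):
        """Plot any spectrum together with the componentized model."""

model = Model(Features(name="H2 S(1)"))
```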

@drvdputt
Contributor Author

drvdputt commented May 8, 2024

> This looks good overall, thanks. I admit I'm still a tad bit confused about what role Model plays in the overall lifecycle. Here's my understanding of where we got to, please let me know if this agrees with your understanding:

> 1. Science Pack: Read-only YAML file (though it can be edited directly), ingest to a `Features` table (light `astropy.table` subclass).  Table can also be edited however you like using `astropy.table` capabilities.

I see the YAML file as concise instructions for how to generate the Features table. Specific edits that are not easily expressed in the YAML file (e.g. based on logic in your Python code), can be applied afterwards.

> 2. `Model` created from and wraps a `Features` table (or it can create one for you, or read a YAML file for you):
>
>    * Has the capability to "intelligently" update said `Features` table: `guess` & `fit` (and maybe others).
>    * A `Model` can `save` itself and (should have) `restore` itself from disk.  Always "stores itself" as meta-data within the `Features` table (which astropy reads/writes for us).
>    * A `Model` can `tabulate` sub-components of the physical model.  E.g. "the full physical model", or "all the lines in the `h2_lines` group" or "just this one named dust feature".  Might be nice to pass "sub-tables" to do this.
>    * A model can `plot` _any_ spectrum together with the (componentized) model.

That all sounds correct. Model knows how to deal with the Features table, the Spectrum1D data, and manages the Fitter and its inputs and outputs.

> 3. A `Model` needs to know which `Fitter` to use, in order to run `fit` using the correct one.  This suggests a `Fitter` should be an (optional) `__init__` parameter for `Model` (with a default).  When inquiring for _any information_ about the physical model, `Model` relies on `Model.tabulate`, which consults its `self.fitter`, e.g. for `self.fitter.gaussian(params)` (standardized by the `Fitter` API), etc.

Yes, Model initializes and sets up the fitter, according to the information provided by the user and the Features table. At the initialization step, a subclass of Fitter needs to be chosen. I plan to do it as a hackable constant in the model.py module, i.e. FITTER = APFitter. Alternative options are a constructor argument or settable constant at the class level.

In my current implementation, a new Fitter object is constructed whenever it is needed. Every time a fit is performed, Fitter is initialized, and set up. It is then used to perform the fit, after which the results are extracted from the Fitter and written out to the Features table. The Fitter object could then be discarded in principle, but for diagnostic reasons I keep it around.

For tabulate, the above logic is reused to set up a temporary Fitter with a subset of the Features table (which now contains the fit results). In fact, the fit() logic already uses a subset as only features within the observed wavelength range are used. Instead of performing a fit, Fitter.evaluate_model() is called, and the numbers are packaged in the desired format by Model.

So technically no direct calls to self.fitter.gaussian(params) are needed in tabulate(). But I will still implement such evaluation functions, as they can be used by the different Fitter subclasses for additional consistency.
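The fit/tabulate flow described above might look roughly like this; `StubFitter` and all method names below are hypothetical stand-ins for illustration, not pahfit code:

```python
# Sketch of the fit/tabulate flow described above; every name is
# an illustrative assumption, not the actual pahfit implementation.

class StubFitter:
    """Placeholder for a Fitter subclass (e.g. an astropy-based one)."""
    def __init__(self, features):
        self.features = features
    def fit(self, spec):
        self.result = {"fitted": True}  # placeholder for real fit results
    def evaluate_model(self, wavelengths):
        return [0.0 for _ in wavelengths]  # placeholder model flux

class Model:
    def __init__(self, features):
        self.features = features

    def _setup_fitter(self, features):
        # a fresh Fitter is constructed whenever it is needed
        return StubFitter(features)

    def fit(self, spec):
        fitter = self._setup_fitter(self.features)
        fitter.fit(spec)
        # ... extract results and write them to the Features table ...
        self.fitter = fitter  # kept around for diagnostics

    def tabulate(self, wavelengths, subset=None):
        # reuse the setup logic with a (sub)set of the Features table,
        # but call evaluate_model instead of performing a fit
        fitter = self._setup_fitter(subset or self.features)
        return fitter.evaluate_model(wavelengths)

m = Model(features={"lines": ["H2 S(1)"]})
m.fit(spec=None)
flux = m.tabulate([5.0, 6.0, 7.0])
```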

> 4. Instrument details are needed for the model to correctly `tabulate`, fit, etc.  These arrive always "at the last minute", i.e. in terms of the `Spectrum1D` object(s) being `fit`, `guessed`, `tabulate`'d, etc.  A `Model` has no longer-term awareness of the instrument.

This is how it currently works. In a few cases, I find this a minor annoyance. E.g. it is not always clear to the user why the instrument information is needed in some cases. Keeping a long-term instrument awareness for Model is a valid choice. But I feel it does not have major implications for the functionality, aside from simplifying some of the function calls.

@jdtsmith
Contributor

jdtsmith commented May 8, 2024

> 3. A Model needs to know which Fitter to use, in order to run fit using the correct one. This suggests a Fitter should be an (optional) `__init__` parameter for Model (with a default). When inquiring for any information about the physical model, Model relies on Model.tabulate, which consults its self.fitter, e.g. for self.fitter.gaussian(params) (standardized by the Fitter API), etc.

> Yes, Model initializes and sets up the fitter, according to the information provided by the user and the Features table. At the initialization step, a subclass of Fitter needs to be chosen. I plan to do it as a hackable constant in the model.py module, i.e. FITTER = APFitter. Alternative options are a constructor argument or settable constant at the class level.

I think keep it simple with a string-based keyword with a default (which may change at some point), e.g. fitter='SPFitter'. The model should instantiate an object from this class on init and reuse it IMO (see below). Then it's trivial to "try different fitters". BTW, you could easily make your APFitter have a couple subclasses, APFitterTrustRegion, etc. for the various flavors. Or pass an option to it (see below).
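Wired up, the string-based keyword could look something like this; the registry and the class names below are assumptions for illustration, not existing pahfit code:

```python
# Sketch of a string-based fitter keyword with a default; the
# registry and class names are assumptions, not pahfit code.

class APFitter:
    """Stand-in for the astropy-based Fitter."""
    def __init__(self, **options):
        self.options = options  # e.g. trust-region settings

class APFitterTrustRegion(APFitter):
    """One 'flavor' subclass, as floated above."""

_FITTERS = {
    "APFitter": APFitter,
    "APFitterTrustRegion": APFitterTrustRegion,
}

class Model:
    def __init__(self, features, fitter="APFitter", **fitter_kwargs):
        self.features = features
        # instantiate the fitter once on init and reuse it for
        # every subsequent fit/tabulate call
        self.fitter = _FITTERS[fitter](**fitter_kwargs)

m = Model(features={}, fitter="APFitterTrustRegion", maxiter=500)
```

This makes "trying different fitters" a one-keyword change, and the extra `**fitter_kwargs` double as the pass-through option channel mentioned below.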

> In my current implementation, a new Fitter object is constructed whenever it is needed. Every time a fit is performed, Fitter is initialized, and set up. It is then used to perform the fit, after which the results are extracted from the Fitter and written out to the Features table. The Fitter object could then be discarded in principle, but for diagnostic reasons I keep it around.

So every fit creates a Fitter? That seems like overkill, and might be an unnecessary slowness. The fitter can surely be instantiated on init and just get asked new questions (fit this, evaluate that, etc.). Some fitters might store intermediate values for speed on subsequent usage (with maybe a way to pass a live_dangerously flag (ok not really that) to the Fitter indicating: "nothing has changed here except the fluxes, feel free to re-use whatever stuff you cached from last time"). This could substantially improve performance for fitting many near identical spectra in a row.
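The cache-reuse idea might look like this; the flag and method names are hypothetical, not an agreed-upon API:

```python
# Sketch of a Fitter that caches its expensive setup between fits;
# all names here are hypothetical.

class CachingFitter:
    def __init__(self):
        self._setup_cache = None
        self.setup_calls = 0  # counter just to demonstrate the caching

    def _expensive_setup(self, spec):
        self.setup_calls += 1
        return {"compiled_model": object()}  # placeholder for costly work

    def fit(self, spec, reuse_setup=False):
        # "nothing has changed except the fluxes": skip the costly setup
        if self._setup_cache is None or not reuse_setup:
            self._setup_cache = self._expensive_setup(spec)
        # ... run the actual fit using the cached setup ...
        return self._setup_cache

f = CachingFitter()
f.fit("spectrum 1")
f.fit("spectrum 2", reuse_setup=True)  # setup is reused, not rebuilt
```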

> For tabulate, the above logic is reused to set up a temporary Fitter with a subset of the Features table (which now contains the fit results). In fact, the fit() logic already uses a subset as only features within the observed wavelength range are used. Instead of performing a fit, Fitter.evaluate_model() is called, and the numbers are packaged in the desired format by Model.

That's pretty clever. Then we do need to normalize how Model throws out features that are "too far" (lines beyond 5 sigma, I recall?). Now that you mention it, I do think it's smart to have this culling happen at the Model level, since fits will not be equivalent if they select different subsets of features to fit. I had/have some logic for this somewhere that was pretty clever, probably in my fitter branch; I can take a look.

> So technically no direct calls to self.fitter.gaussian(params) are needed in tabulate(). But I will still implement such evaluation functions, as they can be used by the different Fitter subclasses for additional consistency.

Now that I understand it, I suppose we don't have to; that keeps the Fitter API simpler ("you must take a Features table with an arbitrary number of features of standardized kinds, and do the right thing to evaluate it"). If we ever want live plotting during the fit, we'll need some kind of raw fast_plot, but perhaps each Fitter could implement that on its own using its own messy guts, if it wants to.

We can take additional **kwargs in Model.__init__ to pass to the Fitter class (and/or just let users do my_model.fitter.live_plot = True and similar).

> 4. Instrument details are needed for the model to correctly tabulate, fit, etc. These arrive always "at the last minute", i.e. in terms of the Spectrum1D object(s) being fit, guessed, tabulate'd, etc. A Model has no longer-term awareness of the instrument.

> This is how it currently works. In a few cases, I find this a minor annoyance. E.g. it is not always clear to the user why the instrument information is needed in some cases. Keeping a long-term instrument awareness for Model is a valid choice. But I feel it does not have major implications for the functionality, aside from simplifying some of the function calls.

You can fit a Model with one instrument and apply it with another, since it is "just a physical model". Spectrum1D's need to have their instruments encoded in meta-data, then the user will "feel no pain", or even really know what happens behind the curtain. We can certainly provide some tooling to help users add instrument metadata to their Spectrum1D's, and encourage other tools to do so using our instrument taxonomy.
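Encoding the instrument in the spectrum's metadata could be as simple as the sketch below; the meta key and the taxonomy segment string are illustrative assumptions, not a finalized convention:

```python
# Sketch of tagging a spectrum's metadata with the instrument; the
# "instrument" key and the segment string are assumptions, not a
# finalized pahfit convention.

def tag_instrument(meta, segment):
    """Record an instrument-taxonomy segment in a Spectrum1D-style
    meta dict, so Model never needs long-term instrument awareness."""
    meta["instrument"] = segment
    return meta

meta = tag_instrument({}, "jwst.miri.mrs.ch1.A")
```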

@jdtsmith jdtsmith changed the base branch from master to dev May 9, 2024 13:10
@jdtsmith
Contributor

jdtsmith commented May 9, 2024

Note, I have retargeted this PR to dev; let's stage everything there and then you can test it before deploying to master.

@jdtsmith
Contributor

Please take a look at the review comments above and let me know what needs addressing.

@drvdputt
Contributor Author

drvdputt commented May 10, 2024

I have copied these comments to #280 , and will address some of them there when the Fitter API pull request is up.

> If we ever want live plotting during the fit, we'll need some kind of raw fast_plot, but perhaps each Fitter could implement that on its own using its own messy guts, if it wants to.

If we add more plotting options, like this example, we might eventually want to move the actual plotting code to a separate module. Model can still have utility functions like .plot(), of course, for the most common user-facing tasks.

I will now solve the merge conflict, after which we are probably good to merge this.

@jdtsmith
Contributor

Did you see my reviews above @drvdputt? Pending those small changes is this ready to merge to dev? Hack day starting today; if we get dev assembled I can have my student do some testing on it.

@drvdputt
Contributor Author

I do not see any comments except the main thread. Here is a screenshot.
[screenshot: the PR conversation page for "Rewrite of Model.plot()" #281, with no review comments visible]

@jdtsmith
Contributor

jdtsmith commented May 13, 2024

[screenshot of the review comments]

Can't see that?

My fault I think (though others find this confusing): https://github.com/orgs/community/discussions/10369

@drvdputt
Contributor Author

No I still can't see anything. I'm not sure if I'm familiar with the feature you're trying to use.

Contributor

@jdtsmith jdtsmith left a comment


I neglected to "finish review". Some of these are more "observations" than "do something right now" comments.

[several review comments on pahfit/model.py, resolved]
pahfit/model.py (outdated), comment on lines 449 to 456:

```python
# total model
model = self._construct_astropy_model(
    inst, z, use_instrument_fwhm=use_instrument_fwhm
```
Contributor


Confused here: why is this called model? We are in Model, so shouldn't this be self? Or perhaps really we should be using self[.features].tabulate() to get "the full fitted current model to apply to some wavelengths".

Also, doesn't a Model know its own redshift? That seems like a "physics detail", not one that is updated by fit, but certainly a physical piece of information. It's not in the Features table, but it should be "in the model". One can obviously update the redshift in the model if one wants.

Contributor


Or is this one of the vestiges that need to be removed?

Contributor Author


For now, it is ok to use the underlying astropy model. Will be replaced by Fitter eventually, or even better, just another tabulate call. It seems I'm only using this variable in one place, so I will do a quick check to see if there's an obvious simplification, and use tabulate instead.

Contributor


Right, we don't want the Model to "know anything" about the Fitter it is using.

[review comment on pahfit/scripts/plot_pahfit.py, resolved]
@jdtsmith
Contributor

Sorry I had neglected to "submit" the review, which can't be done from mobile GH.

@jdtsmith
Contributor

jdtsmith commented May 13, 2024

> No I still can't see anything. I'm not sure if I'm familiar with the feature you're trying to use.

Usually I just do à la carte comments; this time (apparently) I "started a review". Note to self: pending means pending you to submit, not pending the PR author to respond. Sigh.

@jdtsmith
Contributor

Let me know when you think this one is ready.

@drvdputt
Contributor Author

Just did some testing. The tests are failing in the expected place (unrelated to these changes, will be fixed once all the rest is done). Everything in the demo notebook is also running. I say we are good to go with this one.

@jdtsmith jdtsmith merged commit 68ce437 into PAHFIT:dev May 13, 2024
3 of 15 checks passed
@drvdputt drvdputt deleted the plot_rewrite branch June 25, 2024 17:37