
Rewrite of Model.plot() #281

Merged
merged 5 commits into from
May 13, 2024

Conversation

drvdputt
Contributor

@drvdputt drvdputt commented May 7, 2024

First pull request for #280

  • Solves an issue where the plot would crash if no attenuation component was present in the model.
  • Allows some customization by passing keyword arguments.
  • Uses the tabulate() function of model, instead of dealing with astropy model classes directly (a few exceptions remain, but can be addressed once the Fitter API is pulled in)
  • Removes plot() from base.py

@jdtsmith
Contributor

jdtsmith commented May 8, 2024

This looks good overall, thanks. I admit I'm still a tad bit confused about what role Model plays in the overall lifecycle. Here's my understanding of where we got to, please let me know if this agrees with your understanding:

  1. Science Pack: Read-only YAML file (though it can be edited directly), ingest to a Features table (light astropy.table subclass). Table can also be edited however you like using astropy.table capabilities.
  2. Model created from and wraps a Features table (or it can create one for you, or read a YAML file for you):
    • Has the capability to "intelligently" update said Features table: guess & fit (and maybe others).
    • A Model can save itself and (should have) restore itself from disk. Always "stores itself" as meta-data within the Features table (which astropy reads/writes for us).
    • A Model can tabulate sub-components of the physical model. E.g. "the full physical model", or "all the lines in the h2_lines group" or "just this one named dust feature". Might be nice to pass "sub-tables" to do this.
    • A model can plot any spectrum together with the (componentized) model.
  3. A Model needs to know which Fitter to use, in order to run fit using the correct one. This suggests a Fitter should be an (optional) __init__ parameter for Model (with a default). When inquiring for any information about the physical model, Model relies on Model.tabulate, which consults its self.fitter, e.g. for self.fitter.gaussian(params) (standardized by the Fitter API), etc.
  4. Instrument details are needed for the model to correctly tabulate, fit, etc. These arrive always "at the last minute", i.e. in terms of the Spectrum1D object(s) being fit, guessed, tabulate'd, etc. A Model has no longer-term awareness of the instrument.
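That lifecycle could be sketched roughly as follows; every class and method name here is an illustrative stand-in, not the actual pahfit API:

```python
# Sketch of the Model lifecycle described above; all names and
# signatures are illustrative assumptions, not the pahfit API.

class Features(dict):
    """Stand-in for the Features table (really a light astropy.table
    subclass, ingested from the science-pack YAML)."""

class Model:
    def __init__(self, features):
        self.features = features  # Model wraps a Features table

    def guess(self, spec):
        """'Intelligently' set starting values from a Spectrum1D."""

    def fit(self, spec):
        """Fit spec; write the results back into self.features."""

    def tabulate(self, wavelengths, subset=None):
        """Evaluate the full model or any sub-component on a grid."""

    def plot(self, spec, **kwargs):
        """Plot any spectrum together with the componentized model."""

model = Model(Features(name="H2 S(1)"))
```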

@drvdputt
Contributor Author

drvdputt commented May 8, 2024

> This looks good overall, thanks. I admit I'm still a tad bit confused about what role Model plays in the overall lifecycle. Here's my understanding of where we got to, please let me know if this agrees with your understanding:

> 1. Science Pack: Read-only YAML file (though it can be edited directly), ingest to a `Features` table (light `astropy.table` subclass).  Table can also be edited however you like using `astropy.table` capabilities.

I see the YAML file as concise instructions for how to generate the Features table. Specific edits that are not easily expressed in the YAML file (e.g. based on logic in your Python code), can be applied afterwards.

> 2. `Model` created from and wraps a `Features` table (or it can create one for you, or read a YAML file for you):
>
>    * Has the capability to "intelligently" update said `Features` table: `guess` & `fit` (and maybe others).
>    * A `Model` can `save` itself and (should have) `restore` itself from disk.  Always "stores itself" as meta-data within the `Features` table (which astropy reads/writes for us).
>    * A `Model` can `tabulate` sub-components of the physical model.  E.g. "the full physical model", or "all the lines in the `h2_lines` group" or "just this one named dust feature".  Might be nice to pass "sub-tables" to do this.
>    * A model can `plot` _any_ spectrum together with the (componentized) model.

That all sounds correct. Model knows how to deal with the Features table, the Spectrum1D data, and manages the Fitter and its inputs and outputs.

> 3. A `Model` needs to know which `Fitter` to use, in order to run `fit` using the correct one.  This suggests a `Fitter` should be an (optional) `__init__` parameter for `Model` (with a default).  When inquiring for _any information_ about the physical model, `Model` relies on `Model.tabulate`, which consults its `self.fitter`, e.g. for `self.fitter.gaussian(params)` (standardized by the `Fitter` API), etc.

Yes, Model initializes and sets up the fitter, according to the information provided by the user and the Features table. At the initialization step, a subclass of Fitter needs to be chosen. I plan to do it as a hackable constant in the model.py module, i.e. FITTER = APFitter. Alternative options are a constructor argument or settable constant at the class level.

In my current implementation, a new Fitter object is constructed whenever it is needed. Every time a fit is performed, Fitter is initialized, and set up. It is then used to perform the fit, after which the results are extracted from the Fitter and written out to the Features table. The Fitter object could then be discarded in principle, but for diagnostic reasons I keep it around.

For tabulate, the above logic is reused to set up a temporary Fitter with a subset of the Features table (which now contains the fit results). In fact, the fit() logic already uses a subset as only features within the observed wavelength range are used. Instead of performing a fit, Fitter.evaluate_model() is called, and the numbers are packaged in the desired format by Model.

So technically no direct calls to self.fitter.gaussian(params) are needed in tabulate(). But I will still implement such evaluation functions, as they can be used by the different Fitter subclasses for additional consistency.
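The fit/tabulate flow described above might look roughly like this; `StubFitter` and all method names below are hypothetical stand-ins for illustration, not pahfit code:

```python
# Sketch of the fit/tabulate flow described above; every name is
# an illustrative assumption, not the actual pahfit implementation.

class StubFitter:
    """Placeholder for a Fitter subclass (e.g. an astropy-based one)."""
    def __init__(self, features):
        self.features = features
    def fit(self, spec):
        self.result = {"fitted": True}  # placeholder for real fit results
    def evaluate_model(self, wavelengths):
        return [0.0 for _ in wavelengths]  # placeholder model flux

class Model:
    def __init__(self, features):
        self.features = features

    def _setup_fitter(self, features):
        # a fresh Fitter is constructed whenever it is needed
        return StubFitter(features)

    def fit(self, spec):
        fitter = self._setup_fitter(self.features)
        fitter.fit(spec)
        # ... extract results and write them to the Features table ...
        self.fitter = fitter  # kept around for diagnostics

    def tabulate(self, wavelengths, subset=None):
        # reuse the setup logic with a (sub)set of the Features table,
        # but call evaluate_model instead of performing a fit
        fitter = self._setup_fitter(subset or self.features)
        return fitter.evaluate_model(wavelengths)

m = Model(features={"lines": ["H2 S(1)"]})
m.fit(spec=None)
flux = m.tabulate([5.0, 6.0, 7.0])
```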

> 4. Instrument details are needed for the model to correctly `tabulate`, fit, etc.  These arrive always "at the last minute", i.e. in terms of the `Spectrum1D` object(s) being `fit`, `guessed`, `tabulate`'d, etc.  A `Model` has no longer-term awareness of the instrument.

This is how it currently works. In a few cases, I find this a minor annoyance. E.g. it is not always clear to the user why the instrument information is needed in some cases. Keeping a long-term instrument awareness for Model is a valid choice. But I feel it does not have major implications for the functionality, aside from simplifying some of the function calls.

@jdtsmith
Contributor

jdtsmith commented May 8, 2024

> 3. A Model needs to know which Fitter to use, in order to run fit using the correct one. This suggests a Fitter should be an (optional) `__init__` parameter for Model (with a default). When inquiring for any information about the physical model, Model relies on Model.tabulate, which consults its self.fitter, e.g. for self.fitter.gaussian(params) (standardized by the Fitter API), etc.

> Yes, Model initializes and sets up the fitter, according to the information provided by the user and the Features table. At the initialization step, a subclass of Fitter needs to be chosen. I plan to do it as a hackable constant in the model.py module, i.e. FITTER = APFitter. Alternative options are a constructor argument or settable constant at the class level.

I think keep it simple with a string-based keyword with a default (which may change at some point), e.g. fitter='SPFitter'. The model should instantiate an object from this class on init and reuse it IMO (see below). Then it's trivial to "try different fitters". BTW, you could easily make your APFitter have a couple subclasses, APFitterTrustRegion, etc. for the various flavors. Or pass an option to it (see below).
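Wired up, the string-based keyword could look something like this; the registry and the class names below are assumptions for illustration, not existing pahfit code:

```python
# Sketch of a string-based fitter keyword with a default; the
# registry and class names are assumptions, not pahfit code.

class APFitter:
    """Stand-in for the astropy-based Fitter."""
    def __init__(self, **options):
        self.options = options  # e.g. trust-region settings

class APFitterTrustRegion(APFitter):
    """One 'flavor' subclass, as floated above."""

_FITTERS = {
    "APFitter": APFitter,
    "APFitterTrustRegion": APFitterTrustRegion,
}

class Model:
    def __init__(self, features, fitter="APFitter", **fitter_kwargs):
        self.features = features
        # instantiate the fitter once on init and reuse it for
        # every subsequent fit/tabulate call
        self.fitter = _FITTERS[fitter](**fitter_kwargs)

m = Model(features={}, fitter="APFitterTrustRegion", maxiter=500)
```

This makes "trying different fitters" a one-keyword change, and the extra `**fitter_kwargs` double as the pass-through option channel mentioned below.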

> In my current implementation, a new Fitter object is constructed whenever it is needed. Every time a fit is performed, Fitter is initialized, and set up. It is then used to perform the fit, after which the results are extracted from the Fitter and written out to the Features table. The Fitter object could then be discarded in principle, but for diagnostic reasons I keep it around.

So every fit creates a Fitter? That seems like overkill, and might be an unnecessary slowness. The fitter can surely be instantiated on init and just get asked new questions (fit this, evaluate that, etc.). Some fitters might store intermediate values for speed on subsequent usage (with maybe a way to pass a live_dangerously flag (ok not really that) to the Fitter indicating: "nothing has changed here except the fluxes, feel free to re-use whatever stuff you cached from last time"). This could substantially improve performance for fitting many near identical spectra in a row.
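The cache-reuse idea might look like this; the flag and method names are hypothetical, not an agreed-upon API:

```python
# Sketch of a Fitter that caches its expensive setup between fits;
# all names here are hypothetical.

class CachingFitter:
    def __init__(self):
        self._setup_cache = None
        self.setup_calls = 0  # counter just to demonstrate the caching

    def _expensive_setup(self, spec):
        self.setup_calls += 1
        return {"compiled_model": object()}  # placeholder for costly work

    def fit(self, spec, reuse_setup=False):
        # "nothing has changed except the fluxes": skip the costly setup
        if self._setup_cache is None or not reuse_setup:
            self._setup_cache = self._expensive_setup(spec)
        # ... run the actual fit using the cached setup ...
        return self._setup_cache

f = CachingFitter()
f.fit("spectrum 1")
f.fit("spectrum 2", reuse_setup=True)  # setup is reused, not rebuilt
```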

> For tabulate, the above logic is reused to set up a temporary Fitter with a subset of the Features table (which now contains the fit results). In fact, the fit() logic already uses a subset as only features within the observed wavelength range are used. Instead of performing a fit, Fitter.evaluate_model() is called, and the numbers are packaged in the desired format by Model.

That's pretty clever. Then we do need to normalize how Model throws out features that are "too far" (lines beyond 5 sigma, I recall?). Now that you mention it, I do think it's smart to have this culling happen at the Model level, since fits will not be equivalent if they select different subsets of features to fit. I had/have some logic for this somewhere that was pretty clever, probably in my fitter branch; I can take a look.

> So technically no direct calls to self.fitter.gaussian(params) are needed in tabulate(). But I will still implement such evaluation functions, as they can be used by the different Fitter subclasses for additional consistency.

Now that I understand it, I suppose we don't have to; that keeps the Fitter API simpler ("you must take a Features table with an arbitrary number of features of standardized kinds, and do the right thing to evaluate it"). If we ever want live plotting during the fit, we'll need some kind of raw fast_plot, but perhaps each Fitter could implement that on its own using its own messy guts, if it wants to.

We can take additional **kwargs in Model.__init__ to pass to the Fitter class (and/or just let users do my_model.fitter.live_plot = True and similar).

> 4. Instrument details are needed for the model to correctly tabulate, fit, etc. These arrive always "at the last minute", i.e. in terms of the Spectrum1D object(s) being fit, guessed, tabulate'd, etc. A Model has no longer-term awareness of the instrument.

> This is how it currently works. In a few cases, I find this a minor annoyance. E.g. it is not always clear to the user why the instrument information is needed in some cases. Keeping a long-term instrument awareness for Model is a valid choice. But I feel it does not have major implications for the functionality, aside from simplifying some of the function calls.

You can fit a Model with one instrument and apply it with another, since it is "just a physical model". Spectrum1D's need to have their instruments encoded in meta-data, then the user will "feel no pain", or even really know what happens behind the curtain. We can certainly provide some tooling to help users add instrument metadata to their Spectrum1D's, and encourage other tools to do so using our instrument taxonomy.
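Encoding the instrument in the spectrum's metadata could be as simple as the sketch below; the meta key and the taxonomy segment string are illustrative assumptions, not a finalized convention:

```python
# Sketch of tagging a spectrum's metadata with the instrument; the
# "instrument" key and the segment string are assumptions, not a
# finalized pahfit convention.

def tag_instrument(meta, segment):
    """Record an instrument-taxonomy segment in a Spectrum1D-style
    meta dict, so Model never needs long-term instrument awareness."""
    meta["instrument"] = segment
    return meta

meta = tag_instrument({}, "jwst.miri.mrs.ch1.A")
```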

@jdtsmith jdtsmith changed the base branch from master to dev May 9, 2024 13:10
@jdtsmith
Contributor

jdtsmith commented May 9, 2024

Note, I have retargeted this PR to dev; let's stage everything there and then you can test it before deploying to master.

@jdtsmith
Contributor

Please take a look at the review comments above and let me know what needs addressing.

@drvdputt
Contributor Author

drvdputt commented May 10, 2024

I have copied these comments to #280 , and will address some of them there when the Fitter API pull request is up.

> If we ever want live plotting during the fit, we'll need some kind of raw fast_plot, but perhaps each Fitter could implement that on its own using its own messy guts, if it wants to.

If we add more plotting options, like this example, we might eventually want to move the actual plotting code to a separate module. Model can still have utility functions like .plot(), of course, for the most common user-facing tasks.

I will now solve the merge conflict, after which we are probably good to merge this.

@jdtsmith
Contributor

Did you see my reviews above @drvdputt? Pending those small changes is this ready to merge to dev? Hack day starting today; if we get dev assembled I can have my student do some testing on it.

@drvdputt
Contributor Author

I do not see any comments except the main thread. Here is a screenshot.
[screenshot: the PR conversation page for "Rewrite of Model.plot()" #281, with no review comments visible]

@jdtsmith
Contributor

jdtsmith commented May 13, 2024

[screenshot of the review comments]

Can't see that?

My fault I think (though others find this confusing): https://github.com/orgs/community/discussions/10369

@drvdputt
Contributor Author

No I still can't see anything. I'm not sure if I'm familiar with the feature you're trying to use.

Contributor

@jdtsmith jdtsmith left a comment


I neglected to "finish review". Some of these are more "observations" than "do something right now" comments.

[several review comments on pahfit/model.py, resolved]
pahfit/model.py (outdated), comment on lines 449 to 456:

```python
# total model
model = self._construct_astropy_model(
    inst, z, use_instrument_fwhm=use_instrument_fwhm
```
Contributor


Confused here: why is this called model? We are in Model, so shouldn't this be self? Or perhaps really we should be using self[.features].tabulate() to get "the full fitted current model to apply to some wavelengths".

Also, doesn't a Model know its own redshift? That seems like a "physics detail", not one that is updated by fit, but certainly a physical piece of information. It's not in the Features table, but it should be "in the model". One can obviously update the redshift in the model if one wants.

Contributor


Or is this one of the vestiges that need to be removed?

Contributor Author


For now, it is ok to use the underlying astropy model. Will be replaced by Fitter eventually, or even better, just another tabulate call. It seems I'm only using this variable in one place, so I will do a quick check to see if there's an obvious simplification, and use tabulate instead.

Contributor


Right, we don't want the Model to "know anything" about the Fitter it is using.

[review comment on pahfit/scripts/plot_pahfit.py, resolved]
@jdtsmith
Contributor

Sorry I had neglected to "submit" the review, which can't be done from mobile GH.

@jdtsmith
Contributor

jdtsmith commented May 13, 2024

> No I still can't see anything. I'm not sure if I'm familiar with the feature you're trying to use.

Usually I just do à la carte comments; this time (apparently) I "started a review". Note to self: pending means pending you to submit, not pending the PR author to respond. Sigh.

@jdtsmith
Contributor

Let me know when you think this one is ready.

@drvdputt
Contributor Author

Just did some testing. The tests are failing in the expected place (unrelated to these changes, will be fixed once all the rest is done). Everything in the demo notebook is also running. I say we are good to go with this one.

@jdtsmith jdtsmith merged commit 68ce437 into PAHFIT:dev May 13, 2024
3 of 15 checks passed
@drvdputt drvdputt deleted the plot_rewrite branch June 25, 2024 17:37