
Calibration and interpretation of log_scale parameter is poor #138

Closed · mileslucas opened this issue Apr 15, 2021 · 25 comments · Fixed by #139

@mileslucas (Member)

Hi! A quick update on the emcee convergence issue. I have the following situation:

  1. I am working with flux-calibrated data
  2. I have a stellar model with flux computed at the surface of the star
  3. I am trying to fit for the log_scale parameter and use that to calculate the radius

The problem, however, is that within the spectrum model code, log_scale scales the normalized, emulator-interpolated spectrum (the mean flux) to match the flux-calibrated data, so the best-fit log_scale doesn't fall within physically realistic radius values.

        # Only rescale flux_mean and flux_std
        if "log_scale" in self.params:
            scale = np.exp(self.params["log_scale"])
            fluxes[-2:] = rescale(fluxes[-2:], scale)

Additionally, if for example my raw model flux (computed at the surface of the star) is about 10-20 dex higher than the flux-calibrated data, and I use an optimal log_scale that visually matches the two well, then the emcee posteriors for all best-fit parameters appear flat. The same happens when I do not fit for log_scale and simply let it autoscale (the following code). I understand this huge scaling offset between data and model is contributing massively to the covariance matrix.

        # Renorm to data flux if no "log_scale" provided
        if "log_scale" not in self.params:
            factor = _get_renorm_factor(self.data.wave, flux, self.data.flux)
            flux = rescale(flux, factor)
            X = rescale(X, factor)

The only situation where I get well-constrained posteriors for all parameters is when I scale up the flux-calibrated data to roughly match the level of the stellar model flux at the surface of the star before performing the fit.
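Concretely, that workaround looks roughly like this (the attribute names are just illustrative, not the actual Starfish API; the ~20 dex factor matches the numbers I give later in this thread):

# Hypothetical pre-scaling of the flux-calibrated data up to the rough level of
# the surface-flux models; the uncertainties must be scaled by the same factor
# to keep the likelihood consistent.
data_scale = 1e20
data.flux = data.flux * data_scale
data.sigma = data.sigma * data_scale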

This is an example of the retrieved Teff, both when I fit for a visually good log_scale with appropriate priors and also when I do not fit for log_scale at all and let it autoscale.
[Screenshot: retrieved Teff posteriors, with and without fitting log_scale]

This is an example of the same retrieved Teff, where I again fit for a visually good log_scale and also when I do not fit for log_scale at all, only here I've modified the input flux calibrated data to roughly match the level of the model flux at the surface of the star. Of course this is not the desired way to do this if I'm looking to properly solve for the radius of the object.
[Screenshot: retrieved Teff posteriors with the data rescaled to the model flux level]

I'm wondering if this version of the code stores the flux scaling value somewhere before everything gets normalized inside the emulator? Maybe I could call that value and use it within the spectrum model code to specifically fit for a log((R/d)^2)? Thanks!

Originally posted by @iyera22 in #136 (comment)

@mileslucas (Member Author)

So the automatic rescaling finds the ratio of integrated flux between the data and the model; it's roughly equivalent to

import numpy as np

factor = np.trapz(data, wl) / np.trapz(model, wl)
model_rescaled = model * factor

I would recommend running this (or calling the _get_renorm_factor function manually) and using that as a starting point.

As for the interpretability of the log_scale parameter, I'm not really sure what you are doing. I have never used calibrated spectra myself and I'm not actively doing spectroscopy anymore, so it's beyond my knowledge, unfortunately. Perhaps @iancze can help explain?

@iancze (Collaborator)

iancze commented Apr 15, 2021

I'm wondering if there might be a units issue going on here.

(Obviously) for the comparison, both data and model should be in the same units. When dealing with non-flux-calibrated spectra, it's easy to wrap any pre-factors for the distance scale back into the normalization constant. This can also obscure more subtle points, like different sampling densities if one spectrum is in counts/pixel and the other is in ergs/s/cm^2/A, for example, and would also paper over any possible bugs in Starfish.

We used flux-calibrated spectra for the 2015 paper, so in principle this is definitely possible. But that solution w.r.t. the distance scaling was fairly hacky, so hopefully we (Miles) managed to fix it with the emulator improvements. To dig into what's going on, I would hold off on full sampling for the moment and try to bound the problem with some debugging data:

  1. What are the units of your flux-calibrated spectra?
  2. If you integrate up all the flux (+ a possible bolometric correction), do you recover a synthetic photometric magnitude that makes sense compared to existing data?
  3. Given an estimate of stellar parameters, does Starfish return a model spectrum that makes sense (maybe Miles can point to an example here just querying a spectrum w/out needing to sample)? Are there any functional differences between the code that produces a single spectrum vs. how the sampling loop is implemented? Is it possible we're missing a conversion factor in one of the routines?

@iyera22 you are absolutely right that getting log_scale in the ballpark is the most important thing for doing the sampling correctly. Hopefully we can converge to the answer/bug solution quickly by bounding it w/ these examples. It's possible Starfish is doing the renormalization incorrectly relative to the emulator. The emulator source code is here for reference.

@aishaiyer

Sure, thank you!

  1. I'm working with ergs/cm2/s/cm units for my flux-calibrated data
  2. Yes, upon integrating the flux I can go back and get a photometric magnitude that compares to the data I'm working with.
  3. A single call of the SpectrumModel object gives a model spectrum that looks reasonable in shape, and it appears to be scaled to match the flux calibrated data. So of course even a grid point model spectrum doesn't match a SpectrumModel call as the interpolated model is scaled to match the data.

Whether or not I include a log_scale parameter, it uses one of the two code snippets I mentioned in my post above, so I believe it is still scaling the mean flux calculated from the emulator to match the data first and then completing the reconstruction with the weights. I also think this is the same when implementing the sampling loop; I don't see a major difference overall.

So if I use a log_scale, that again calls the rescale function from transforms.py, which essentially scales up the mean flux from the emulator, or it simply uses a factor to calculate the offset between the mean emulator flux and the data, following the code snippet from Miles' post above. I wonder if the 'true' scaling value is recorded somewhere before everything gets normalized within the emulator? I think after that point the SpectrumModel object only refers to the eigenspectra and the mean flux coming out of the emulator, so perhaps that initial scaling from the raw model spectrum flux gets forgotten somewhere?

Upon checking the emulator source code, it is consistent in that the only outputs there are the eigenspectra, weights, and the mean flux (+ mean std). I was hoping that if this were modified to spit out the offset of the mean/normalized emulator-interpolated spectrum from the original raw model flux in the grid (at the surface of the star), then I could add that to the log_scale parameter called within the SpectrumModel code when calculating the 'true' offset between the model and the data.
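In other words, something like this rough sketch to recover that offset directly from the raw grid (following the grid-loading pattern from the docs; the file name is just a placeholder):

import numpy as np
from Starfish.grid_tools import HDF5Interface

# Recover the per-spectrum amplitude that the emulator's normalization divides out.
grid = HDF5Interface("grid.hdf5")        # placeholder path to the processed grid
fluxes = np.array(list(grid.fluxes))     # raw fluxes at the stellar surface
norm_factors = fluxes.mean(axis=1)       # exactly what `fluxes /= fluxes.mean(1, keepdims=True)` removes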

@mileslucas (Member Author)

mileslucas commented Apr 16, 2021

> I believe it is still scaling the mean flux calculated from the emulator to match the data first and then completing the reconstruction with the weights

To be clear, the log_scale is a purely multiplicative factor. In the current modeling code you can see it is only applied to flux_mean and flux_std, but then later you'll find

flux = (weights @ eigenspectra) * flux_std + flux_mean

So we can do some mental algebra and see that rescaling just the mean and std has exactly the same effect as scaling the flux after reconstructing from the eigenspectra:

flux = factor * ((weights @ eigenspectra) * flux_std + flux_mean)
# is equivalent to
flux = (weights @ eigenspectra) * (flux_std * factor) + (flux_mean * factor)

This happens every time inside the model evaluation (calling model()), which is what is called by the log-likelihood code (which is what is called by the built-in optimize code), so there is no opportunity for inconsistency there.

> I was hoping maybe if this were modified to spit out the offset between the mean / normalized emulator interpolated spectrum from the original raw model flux

The current model is basically set up as either "you fit log_scale fully" or "you never fit log_scale". There's not really a straightforward way to autoscale log_scale the first time you run the model but not on subsequent evaluations. My advice is still to manually alter log_scale to taste (using diagnostics with model.plot()) or to manually calculate the ratio (or integrated flux ratio) between emulator.flux_mean and your data and use that as a starting point.
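For example, a rough sketch of computing that starting point (the emulator wavelength attribute and the resampling step here are assumptions, since your data and the emulator are generally on different wavelength grids):

import numpy as np

# Rough starting guess for log_scale: ratio of integrated data flux to the
# integrated emulator mean flux, resampled onto the data wavelengths.
emu_mean = np.interp(data.wave, emu.wl, emu.flux_mean)
factor = np.trapz(data.flux, data.wave) / np.trapz(emu_mean, data.wave)
log_scale_guess = np.log(factor)   # pass this as the starting value for log_scale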

@aishaiyer

Hi, thanks for the clarification,

  1. OK, so the factor term (whether calculated manually or via log_scale) is the ratio between the integrated fluxes of
    (a) the mean flux from the emulator (flux_mean)
    vs
    (b) the flux-calibrated data spectrum (self.data.flux inside the SpectrumModel class), correct?
    (This is because the emulator normalizes my original model grid spectra to a mean of 1 to remove any ‘uninteresting correlations’, as shown here, inside emulator.py:)
        fluxes = np.array(list(grid.fluxes))
        # Normalize to an average of 1 to remove uninteresting correlation
        fluxes /= fluxes.mean(1, keepdims=True)

So is log_scale here really log(R^2/D^2)? What I am looking for is the ratio between the non-normalized grid model flux (my original model grid, plus model spectra for interpolated parameter values, before the normalization in the emulator occurs) vs. the flux-calibrated data (observed star), as that scaling will help me calculate (R/D)^2 and get constraints on the radius (see the sketch after this list). Am I missing something?

  2. The other issue I'm having is when I do not fit log_scale at all (say I fit only for Teff, logZ, logg). For example, the integrated flux of my stellar grid model spectra at the surface of the star is roughly of order 1e13, while the integrated flux of my flux-calibrated data is of order 1e-7, both in the same physical units (ergs/s/cm2/cm). Running the two through Starfish (with a properly converged emulator object, then calling a SpectrumModel object with rough Teff, logZ, logg values and running that through MAP and emcee, where log_scale gets automatically calculated using the _get_renorm_factor function and the model spectra get scaled accordingly to match the data), I get badly constrained posteriors, as shown above.
    The fits work fine ONLY when I go the opposite way (modify my data spectrum rather than the model): I leave the model spectrum untouched and instead scale up my flux-calibrated data spectrum by a constant value to roughly match the level of the raw grid model spectra at the stellar surface (i.e. from 1e-7 up to 1e13), then run both through Starfish without fitting for log_scale (so it autoscales the normalized emulator model to match the artificially scaled data, i.e. to the 1e13 order of magnitude). In that case I get well-constrained posteriors. Does this mean that in the former case the huge scaling difference between the original data spectrum (of order 1e-7) and the original grid model spectrum (of order 1e13) is causing the covariance matrix to blow up, despite log_scale autoscaling everything before computing the likelihood? Would you have an idea of what might be the issue here? I'm happy to send my spectrum model + emulator and my grid model files separately for debugging if needed.
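To put numbers on what I mean in item 1 (the radius and distance here are purely illustrative): for a model in surface-flux units I would expect log_scale = 2 ln(R/d), e.g.

import numpy as np

# Illustrative only: the dilution between surface flux and observed flux is (R/d)^2.
R_sun_cm = 6.957e10          # solar radius in cm
pc_cm = 3.0857e18            # parsec in cm
R = 1.0 * R_sun_cm           # assumed stellar radius
d = 10.0 * pc_cm             # assumed distance
log_scale_expected = 2 * np.log(R / d)   # about -40, i.e. roughly 17 dex of dilution

But since the emulator normalizes each grid spectrum to a mean of 1, the fitted log_scale presumably absorbs that normalization and won't match this number directly.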

I am somewhat set for now, as I'm currently following the 'opposite approach' where I un-dilute my flux-calibrated data spectrum by a constant value so it matches the order of magnitude of my original model grid spectrum at the surface of the star, and then perform the Starfish sampling without fitting for log_scale (letting it autoscale). This hacky way seems to work fine for a couple of targets I've tried and for the purposes of my work for now, but I hope there will be a more straightforward way of solving for the radius from the Starfish fits in a future upgrade? Thanks!

@mileslucas (Member Author)

Okay, I think I've fully got my head wrapped around the problem-

When you create an emulator from a grid it becomes unitless because of https://github.com/iancze/Starfish/blob/fceb6cc41ddec569c95e4157c8406c0b586f6140/Starfish/emulator/emulator.py#L286
so log_scale cannot be interpreted physically.

I think the easiest way to resolve this would be to add an option to Emulator.from_grid to skip this normalization. @iancze do you remember if the emulator can perform alright without this pre-processing step?

@zjzhang42

Based on my discussion with @iyera22 in a separate slack channel several days ago, I think the issue of log_scale might have something to do with the emulator.absolute_flux parameter or self.flux_scalar in the update_Theta() function of star_class.py:
https://github.com/gully/Starfish/blob/fe9879b67a12ddd1a101cf07c60de25bdb46a705/Starfish/emulator.py#L142

This was initially suggested by @gully in his mix_model_omega2 branch - see detailed discussions here: #38

If we would like to keep this feature for the new Starfish version: when updating the Theta parameters during fitting, the physical parameter Omega should be scaled by this flux_scalar factor (interpolated at the current grid parameters) to produce the "correct" fluxes for the reconstructed spectra. For example,
https://github.com/zjzhang42/Starfish/blob/eeed1f05370282ad6117f591871cd596d2711a8e/scripts/star_class.py#L346
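In rough pseudocode, the idea is (a hedged sketch, not the actual star_class.py code; interpolate_amplitude is a hypothetical helper):

# Interpolate the grid amplitude (flux_scalar) at the current grid parameters,
# then apply it together with Omega when reconstructing the model flux so the
# result lands in physical flux units.
flux_scalar = interpolate_amplitude(grid_params)   # hypothetical helper
model_flux = Omega * flux_scalar * reconstructed_flux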

Hope this helps.

@aishaiyer

Thank you both! Yeah, that would be a helpful feature, although I'm wondering how significant the correlation would be if the spectra were not normalized within the emulator reconstruction? How much would that affect the covariance matrix?

@iancze (Collaborator)

iancze commented Apr 19, 2021

Re: Miles' point about normalization: The PHOENIX spectral grid is provided in units of intensity at the stellar surface (if my memory serves me correctly), so there is intrinsically a very large change in amplitude from, say, 3200 K to 7000 K. While in theory you might be able to construct an emulator directly on the unnormalized spectra, it would make the PCA decomposition very inefficient. The largest variation in an unnormalized grid is due to the amplitude differences, so several PCA terms would need to be devoted to absorbing this amplitude change. That's the main reason why each spectrum is normalized to a mean of 1 before PCA.

One solution to this is to keep track of the amplitude term as a function of Teff, logg, Z, etc. as the emulator is being constructed, and create a smooth interpolator object for these normalizations. Then, when the emulator reconstructs spectra (which have a mean of 1), the amplitude rescaling can be applied after the fact to bring the product to the correct units. I remember @gully and I had discussed this at some point, but I forget if we ever actually implemented it (@gully must have needed it for his LkCa 4 project), and how that factored into Miles' rewrite. Hence the confusing issue we now have! My apologies for losing track of this!
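A rough sketch of that idea (not the existing implementation; grid_points and fluxes stand for the arrays of grid parameters and raw grid spectra):

import numpy as np
from scipy.interpolate import LinearNDInterpolator

# Keep the per-spectrum amplitude that the normalization removes, and build a
# smooth interpolator over the grid parameters (Teff, logg, Z).
norm_factors = fluxes.mean(axis=1)               # amplitude of each raw grid spectrum
fluxes_normed = fluxes / norm_factors[:, None]   # mean-1 spectra used to train the emulator
amplitude = LinearNDInterpolator(grid_points, norm_factors)

# After the emulator reconstructs a mean-1 spectrum at `params`:
# flux_physical = amplitude(params) * flux_reconstructed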

As it stands, it looks like the physical interpretation of log_scale is incorrect... it's merely a nuisance parameter. I think the suggestion by @zjzhang42 sounds like a reasonable approach to produce correct fluxes.

In the end, though, this is probably only worth it if you are additionally fitting photometry simultaneously with the spectra via synthetic photometry. Spectrophotometric flux calibration uncertainties are in general worse than photometric calibration uncertainties. So, unless you are fitting to additional photometric points, I would think that the estimate of stellar radius that comes from fitting spectra alone will have a significant uncertainty attached to it.

@mileslucas (Member Author)

mileslucas commented Apr 20, 2021

Okay, I'm playing around with a couple of ideas.

My first was to use least squares to relate the emulator weights to the (N_grid) normalization factors, basically:

norm_factors = fluxes.mean(1, keepdims=True)
norm_mean = norm_factors.mean()
Q = np.linalg.lstsq(emu.weights, norm_factors - norm_mean, rcond=None)[0]

If we look at the errors between the least-squares fit and the originals, things look okay, although there are some clear trends, which I believe are due to how the iterator for the grid fluxes jumps between parameters.

[Screenshot: errors between the least-squares fit and the original normalization factors]

Now, when we use the GPs to generate the weights, we can do a matrix multiply and generate a factor to return to the calibrated flux units

mu, cov = emu(params)
weights = np.random.multivariate_normal(mu, cov)
norm = weights @ Q + norm_mean   # Q: the least-squares coefficients from the fit above
# ...
flux = norm * raw_flux

I tested this with my SpeX PHOENIX emulator from the doc examples, and this is the fractional error between the calibrated PHOENIX spectrum and the reconstructed one:
[Screenshots: fractional error between the calibrated and reconstructed PHOENIX spectra]

So, first off, there is some weird baseline difference between them at the 2% level, and the overall error is at the 10% level. This seems pretty poor, so I'll work on implementing an interpolator over the indices and see how that error compares.

@gully (Collaborator)

gully commented Apr 20, 2021

I agree with everything @iancze said! I already implemented the interpolation of flux scalars as a flux_scalars term in my (undocumented) mix_model_omega2 branch, and maybe slightly more legibly in the application to brown dwarf spectra that I wrote with Mark Marley and that ZJ has applied in recent publications:

https://github.com/gully/jammer-Gl570D

And to reiterate what Ian said, the principal applications for this feature are if you have (accurate) flux-calibrated spectra or photometry, if you're simultaneously fitting photometry, or if you're fitting a mixture model of two spectra where the relative fluxes matter, as in Gully-Santiago et al. 2017.

@gully (Collaborator)

gully commented Apr 20, 2021

In fact, Appendix A of Gully-Santiago et al. 2017 lays out the math for the implementation; see the text starting with:

The reason for standardizing the flux is a matter of practicality: the number of PCA eigenspectra components in the spectral emulator scales steeply with the pixel-to-pixel variance of...

We have tweaked the implementation since then to improve the accuracy over a high dynamic range of temperatures.

@gully (Collaborator)

gully commented Apr 20, 2021

The interpretation of log_scale was previously noted as depending on the use case in #37

@aishaiyer

Thank you all, this is definitely helpful. And yes, I agree with @iancze and @gully: I am in fact working with flux-calibrated spectra (hopefully accurate enough), so I am hoping this flux_scalar value can help me get the radius.

@aishaiyer

Thanks all, I'm trying to revisit this issue and have a question for @gully. I'm trying to test out the implementation from your branch, especially 'star_marley.py' (I hope I'm looking in the right place; I'm a bit lost), and I needed clarification on which Starfish version is needed to support this. I gather the emulator.py file that includes the interpolation for the flux_scalar value is in v0.2? Could you confirm? I also don't see a function to train the emulator in that version; could I get more guidance on how to get started? And what would be a more efficient way to include log_omega in the sampling with v0.3?

@aishaiyer

Specifically, what I'm looking for is something similar to the example implementation for low-resolution brown dwarf data that you mentioned, and that @zjzhang42 has demonstrated in his paper, fitting for log_omega as an additional physical parameter. Is there a branch compatible with v0.3 that includes this?

@mileslucas (Member Author)

@iyera22 the functionality Gully mentioned is not compatible with v0.3 of Starfish (yet).

@mileslucas mentioned this issue May 26, 2021
@gully (Collaborator)

gully commented May 27, 2021

Hi @iyera22, I spoke with @zjzhang42 and he plans to make his version of my version of Ian's version of Starfish public. Lol, that's a lot of versions... @mileslucas's version (v0.3) is incompatible with my version, and by extension with ZJ's version. It looks like Miles is working to rectify these two versions. It sounds like you should connect with ZJ to get the brown dwarf version working, or wait for Miles' extended version. Does that address your question?

@iancze (Collaborator)

iancze commented May 27, 2021

@gully @zjzhang42 that's great to hear. Perhaps it makes sense for me to migrate iancze/Starfish to a GitHub organization like Starfish-dev/Starfish, and give you all maintainer access? I think it would be really great for the long-term health of the package if we could centralize development around forks and pull requests back to a single repository.

@gully (Collaborator)

gully commented May 27, 2021

Maybe! I'd say let's wait for ZJ's version to appear so we can see what differences there are? Or maybe we're talking about a bigger/broader scope and experimentation?
I've thought of making a new code that's a microservice for programmatic access to synthetic spectral models, some of which is implemented in Starfish already. The core Starfish functionality could be the spectral emulation and likelihood functions, with an abstraction layer for the raw model access. I'm not completely sold on this idea, just a thought I had recently... the resulting surgery on Starfish might cause other unintended consequences. But my point is that Starfish's support for model access, the emulator, and the likelihood could be cut into pieces to make the code more sustainable, and a microservice for raw model access may be easier to maintain and extend for new entrants.

@aishaiyer

Thank you! That would be tremendously helpful! I look forward to the release of @zjzhang42 's version and this potential new flexible version.

@iancze (Collaborator)

iancze commented May 27, 2021

@gully I'm on board with the idea of splitting Starfish into smaller, more maintainable modules focused on specific use cases, not unlike the exoplanet-dev organization. I think the grid access, emulator, and likelihood are great examples to draw the line, and something like a starfish-dev organization would be a great way to centralize all of this.

@mileslucas already did a lot of legwork to make everything more modular in v0.3, and I would envision this as building on that to its logical conclusion. This would also make it easier to rewrite, say, the likelihood component to be autodiff capable.

But I think even before that, centralizing around a single Starfish-dev organization would help coordinate development efforts.

@gully (Collaborator)

gully commented May 28, 2021

Ah, I understand the proposition now: making a GitHub organization. Yeah, sure, I think that's a great idea. Loosely affiliated collections of packages and experimental new versions can live there too.

Also, yes, autodiffable (or, if someone has the guts to write out the analytical derivatives, analytic) is the way to go. One issue may be making the interpolation and resampling step autodiffable. I tried several times to figure this out, and a few of those attempts worked, with different tradeoffs. But anyway, that's a completely different issue, unrelated to this one.

@zjzhang42

I just noticed that most of the Starfish scripts I used and modified are already in my branch (https://github.com/zjzhang42/Starfish/tree/ZJ_BD_v0), and I strongly agree that we should merge all these branches into a single solid package.

To illustrate how to combine all these separate scripts into a complete spectral fit, I could provide the YAML and bash scripts as examples. All of this would probably come after July, as I am currently overwhelmed by an upcoming PhD defense.

@spencerhurt

spencerhurt commented Jun 10, 2021

I'm working with @zjzhang42 on forward analyses of brown dwarf atmospheres this summer and have run into problems related to this issue. It was suggested that I share some details here.

I started by fitting PHOENIX models to M-dwarf spectra as a test case. Similar to the original issue raised by @iyera22, I kept getting flatter-than-expected posteriors for any MCMC runs using Starfish v0.3. Specifically, the surface gravity and metallicity posteriors fill up the allowed parameter space and the global covariance hyper-parameters appear to be unconstrained. This behavior can be seen below in an example of the chains from a fit, where the global covariance parameters just wander. Besides producing bad fits, this often leads to problematic global covariance parameters that return errors (e.g. 'the covariance matrix is not positive-definite' or 'nans or infs cannot be on the diagonal'). Looking at the overall covariance matrix after optimizing the model shows that the flux uncertainties and global covariance are dominated by the emulator covariance matrix.

[Figure: MCMC chains for the Gl 213 (M4) fit, showing wandering global covariance parameters]

Because the example spectrum included in the documentation provides good fits, I used this to test what could be going wrong. The initial fits provided nice, Gaussian distributions for each parameter and the total covariance matrix looks like the one below, where the flux uncertainties and global covariance are apparent on the diagonal.

[Figure: total covariance matrix for the documentation example spectrum, no rescaling needed]

I noticed that the example spectrum is approximately on the same scale as the normalized emulator spectra, giving a log_scale parameter of ~0 (i.e. a rescale factor of ~1). However, if I multiply the example spectrum by some scaling factor (akin to a unit change or an object at a greater distance), the Starfish model then needs a non-zero log_scale and the MCMC fits begin to behave poorly again (consistent with what has been noted in this issue thread). Notably, the total covariance looks very different with a non-zero log_scale and is dominated by the emulator covariance, as seen below, agreeing with what I observed for my M-dwarf tests. Because the global covariance is unable to make a significant contribution to the total covariance, it makes sense that these hyper-parameters wander so freely.

[Figure: total covariance matrix with a non-zero log_scale, dominated by the emulator covariance]

At first, I thought that the emulator covariance was being scaled incorrectly, but tests show that it is scaled by np.exp(log_scale)**2 as expected. I then wondered if it was something like a units problem, given that the model spectra coming out of the emulator reconstruction are normalized. Using @zjzhang42's branch, I adapted the v0.3 code to rescale an emulator spectrum to the original spectrum's scale using flux_scalar (see this commit to my fork), which allows us to physically interpret the log_scale parameter. However, the emulator covariance continues to dominate and my fits are still poor. This isn't surprising, given that flux_scalar * np.exp(log_scale) is still scaling the emulator by the same amount as np.exp(log_scale) alone when flux_scalar is excluded from the code. A corner plot for a flux-calibrated M-dwarf spectrum in the same units as the PHOENIX spectra, using flux_scalar, is shown below. The global covariance hyper-parameters are still wandering to arbitrary values, causing fits to fail early on.

[Attachment: m4gl213_with_fluxscalar_corner.pdf (corner plot for the flux-calibrated M-dwarf fit using flux_scalar)]
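(For reference, the covariance-scaling check mentioned above amounts to something like the following sketch; the variable names are illustrative.)

import numpy as np

# A multiplicative flux scaling should propagate to the covariance quadratically:
# if flux -> s * flux, then cov -> s**2 * cov, which is what the tests confirmed.
s = np.exp(log_scale)
scaled_flux = s * flux
scaled_emulator_cov = s**2 * emulator_cov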

Perhaps I am completely wrong and the emulator covariance is supposed to be the primary source of uncertainty in our models. However, I am surprised that it would dominate so much given that the emulators for the PHOENIX spectra tend to be well-trained. Further, I do not understand why the emulator covariance matrix would grow so much larger than the flux uncertainties when the emulator model is being down-scaled to smaller values. I also think it's worth noting that after incorporating the flux_scalar parameter, the covariance matrices seem to be calculated the same way in v0.3 as in v0.2, so I'm not certain where a discrepancy could arise. Any input would be greatly appreciated!
