Equivalent width uncertainties are significantly lower in v2 #127
Copying from Slack: if you're computing the nominal error on the profile-weighted flux from propagation of errors, that estimate is equivalent to assuming that the profile used is perfectly correct. If things are noisy, the fitted line width (for example) will be imperfect, and that won't be included in the error propagation.
@ashodkh would it be possible for you to remake your nice figure (comparing the …)? The EW is derived simply as the ratio of the integrated line-flux and the continuum flux (with Gaussian propagation of errors on each of those quantities), so it would be helpful to figure out where the original issue arises.
I am continuing to do tests using repeat observations, but I've tentatively reverted the change between … Note that this means that …
@ashodkh @janewman-pitt-edu I reverted the flux inverse variance estimate to what was used in …
Hi @moustakas, the link to the diff when you closed the issue allowed me to find the relevant code easily. I think there is indeed a bug: I think you wanted code equivalent to sigma^2 = sum(w_i^2 * sigma_i^2) / (sum(w_i))^2. An easy test of the formula is whether it gives the right result for an unweighted average with equal errors (e.g., set w_i = 1, so sum(w_i) = N, and sigma_i = sigma_x). The correct formula gives sigma^2 = N*sigma_x^2 / N^2 = sigma_x^2 / N, which is the usual formula for the standard error. The formula used in the code gives sigma^2 = N*sigma_x^2 / N = sigma_x^2.
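A minimal numerical check of the two formulas. Note that the exact form of the buggy denominator is a guess (dividing by sum(w_i) rather than its square); the function names here are purely illustrative:

```python
import numpy as np

def weighted_mean_variance(w, sigma):
    """Variance of the weighted mean sum(w_i*x_i)/sum(w_i),
    from Gaussian propagation of errors."""
    w, sigma = np.asarray(w, float), np.asarray(sigma, float)
    return np.sum(w**2 * sigma**2) / np.sum(w)**2

def buggy_variance(w, sigma):
    """Plausible form of the bug: denominator is sum(w_i), not (sum(w_i))^2."""
    w, sigma = np.asarray(w, float), np.asarray(sigma, float)
    return np.sum(w**2 * sigma**2) / np.sum(w)

# Sanity check: unweighted average of N points with equal errors sigma_x.
N, sigma_x = 100, 0.5
w = np.ones(N)
sigma = np.full(N, sigma_x)
assert np.isclose(weighted_mean_variance(w, sigma), sigma_x**2 / N)  # standard error
assert np.isclose(buggy_variance(w, sigma), sigma_x**2)              # too large by a factor of N
```

For equal weights the buggy version returns the variance of a single pixel rather than of the mean, i.e., it is too large by a factor of N.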
First, @janewman-pitt-edu, thanks for working out that I did indeed have a bug in my code, which made the … I've made progress on this ticket by simulating some data, and I think I'm pretty satisfied with the results, although I welcome your input. In case you're interested, the code I wrote for these sims can be found here: … I created two types of simulations: …
In both cases I: …
For example, here's one realization from the first simulation class, and here's one realization from the second simulation class: [figures]. In each panel, the vertical dashed lines indicate the … In addition, … Note: …
I then ran a suite of 5000 Monte Carlo realizations for each case and generated the following summary figure: [figure]. From this figure I conclude: …
Finally, I compared the signal-to-noise ratio, … [figure]
@moustakas are you fixing the Gaussian sigma etc. in the fitting? It sounds like not (but maybe you meant you just used a fixed value for the simulations)? I think we really need to see how things scale at low S/N and when the parameters are left free. @ashodkh has shown pretty convincingly that the right error propagation formula (but not marginalizing over profile uncertainties) is way over-optimistic in that limit.
With input from @dstndstn, we realized that determining the Gaussian-integrated emission-line flux and uncertainty is analogous to the matched-filtering problem in photometry. Here's a quick write-up that has gone into the FastSpecFit paper, and the code changes have been implemented in the … [figure]
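For reference, the standard matched-filter (inverse-variance, profile-weighted) amplitude estimate looks roughly like the following. This is a generic sketch of the technique being discussed, not FastSpecFit's actual implementation, and it assumes the Gaussian profile (center and width) is known exactly:

```python
import numpy as np

def matched_filter_flux(wave, flux, ivar, center, sigma):
    """Matched-filter line flux and uncertainty for a known Gaussian profile.

    The unit-flux profile p is fit to the data by weighted least squares:
    amplitude A = sum(ivar*p*flux) / sum(ivar*p^2), var(A) = 1/sum(ivar*p^2).
    """
    p = np.exp(-0.5 * ((wave - center) / sigma)**2) / (np.sqrt(2*np.pi) * sigma)
    denom = np.sum(ivar * p**2)
    amp = np.sum(ivar * p * flux) / denom  # best-fit integrated line flux
    return amp, np.sqrt(1.0 / denom)       # flux and its 1-sigma uncertainty

# Toy example: a noiseless Gaussian line with total flux 3.0 is recovered exactly.
wave = np.linspace(-10, 10, 401)
true_flux, center, sigma = 3.0, 0.0, 1.5
model = true_flux * np.exp(-0.5*((wave - center)/sigma)**2) / (np.sqrt(2*np.pi)*sigma)
ivar = np.full(wave.size, 1e4)
F, Ferr = matched_filter_flux(wave, model, ivar, center, sigma)
assert np.isclose(F, true_flux)
```

As noted below, this estimate is optimal only under the assumption that the profile is perfectly correct.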
As it happens I pointed Ashod to the Horne algorithm as another analogous (matched-filter-esque) case to look at last week :) Note that this is still making the assumption that the correct profile is perfectly known. At low S/N, that fails badly. The right thing to do is to marginalize over the profile uncertainty; do you get errors on the velocity dispersion from the procedures you are using (e.g., from chi-squared as a function of sigma)? If so, and if it remains too expensive to marginalize, it's likely not too bad to calculate the estimated flux at the +/- 1 sigma limits on the velocity dispersion, and then add the resulting error estimate in quadrature to the fixed-profile uncertainties you're calculating above.
(If the PDFs are not Gaussian, it's better to evaluate at +/- 1 sigma on the dispersion than to calculate derivatives and use propagation of errors.)
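One possible sketch of that quadrature-sum procedure. The `measure_flux(sigma)` callable, returning the fixed-profile flux and error for an assumed line-width, is a hypothetical interface, not FastSpecFit's API:

```python
import numpy as np

def flux_with_profile_error(measure_flux, sigma_best, sigma_err):
    """Add a profile-uncertainty term to the fixed-profile flux error.

    measure_flux(sigma) -> (flux, flux_err) re-measures the flux holding
    the line-width fixed at the given value (hypothetical interface).
    """
    flux, flux_err = measure_flux(sigma_best)
    flux_lo, _ = measure_flux(sigma_best - sigma_err)
    flux_hi, _ = measure_flux(sigma_best + sigma_err)
    # Half the +/-1 sigma spread is the profile-uncertainty contribution,
    # added in quadrature to the fixed-profile error.
    profile_err = 0.5 * abs(flux_hi - flux_lo)
    return flux, np.hypot(flux_err, profile_err)

# Toy check: flux depends linearly on the assumed width.
toy = lambda s: (2.0 * s, 0.1)
flux, err = flux_with_profile_error(toy, 1.0, 0.1)
assert np.isclose(flux, 2.0)
assert np.isclose(err, np.hypot(0.1, 0.2))
```

Evaluating the flux at the +/- 1 sigma dispersion limits, rather than differentiating, follows the suggestion above for non-Gaussian PDFs.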
Unfortunately my code does not return an estimate of the uncertainty on the line-width. The … Also recall that I fit all the lines simultaneously using physically motivated constraints (e.g., the line-widths of the Balmer lines are all tied together), so it's not trivial to decouple the lines in order to get a per-line estimate of the variance in the best-fitting parameters. So in the spirit of "gotta make a final decision and move on," I'm planning to run a few more tests and then simply document the various caveats. For the record, the data model includes both the Gaussian- and boxcar-integrated flux and uncertainty. End-users will simply need to make decisions about how they use the uncertainties for their specific science case (e.g., they can use my VAC to select specific samples of objects and then remeasure the fluxes and uncertainties using bootstrapping, another optimization algorithm, etc.). BTW, here are the updated sims, where (1) I corrected a bug in my input inverse-variance spectrum; and (2) I now allow both the velocity center and line-width to be optimized/fitted, not just the amplitude. I ran a range of sims with per-pixel signal-to-noise ratios of 3, 5, 10, and 15. All the sim outputs can be retrieved here (although I reserve the right to move or overwrite these files): … Below, I highlight the … From these results I conclude that: …
S/N = 10: [figures]
S/N = 5: [figures]
I think it is certainly out of scope for the current version, but you could perhaps apply some regularization before inversion to get a somewhat conservative estimate of the error on the line-width. Another option, which would be fast and stable though less accurate than marginalizing, would be to just estimate the second derivative of chi-squared with respect to velocity dispersion with your favorite estimator (cf. https://en.wikipedia.org/wiki/Finite_difference_coefficient#Central_finite_difference), fixing all other parameters, to determine what delta-linewidth corresponds to delta-chi-squared = 1.
(This follows from properties of maximum likelihood estimators, of which least squares is a special case.)
(Thinking about this a little more, it's likely better to use the 68% cutoff for the chi-squared distribution with DOF equal to the number of parameters you were fitting; delta-chi-squared = 1 corresponds to the case where linewidth is the only thing you fit.)
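The finite-difference suggestion above can be sketched as follows, assuming a hypothetical callable `chisq(sigma)` that evaluates chi-squared as a function of the velocity dispersion with all other parameters held fixed:

```python
import numpy as np

def linewidth_error_from_curvature(chisq, sigma_best, h=None):
    """1-sigma error on the line-width from the curvature of chi-squared
    at its minimum, using the delta-chi-squared = 1 criterion (valid when
    the line-width is the only free parameter).

    chisq(sigma) is a hypothetical callable returning chi-squared with
    all other parameters fixed.
    """
    if h is None:
        h = 0.01 * sigma_best  # finite-difference step
    # Second-order central finite difference for d2(chisq)/d(sigma)^2.
    d2 = (chisq(sigma_best + h) - 2.0 * chisq(sigma_best)
          + chisq(sigma_best - h)) / h**2
    # Near the minimum, chisq ~ chisq_min + 0.5 * d2 * (dsigma)^2,
    # so delta-chisq = 1 is reached at dsigma = sqrt(2/d2).
    return np.sqrt(2.0 / d2)

# Toy check: an exactly parabolic chi-squared with a known 1-sigma width.
true_err = 0.3
chisq = lambda s: 10.0 + ((s - 2.0) / true_err)**2
assert np.isclose(linewidth_error_from_curvature(chisq, 2.0), true_err)
```

As the follow-up comment notes, with several free parameters the delta-chi-squared threshold for a joint 68% region is larger than 1, so this per-parameter estimate is a lower bound in that case.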
@moustakas Can you please clarify how you're making the histogram plots for the isolated line? I have been running similar simulations and finding that the profile-weighted uncertainty is indeed optimal, in that it agrees with bootstrapping. The boxcar uncertainty is larger by a factor of ~1.3, which is proved to be the case for a Gaussian profile in https://ui.adsabs.harvard.edu/abs/1986PASP...98..609H/abstract.
@ashodkh thanks for the nice confirmation. The histogram is just (noisy) flux minus true flux (input to the sim), divided by the inferred uncertainty: …
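That pull (normalized-residual) calculation can be sketched as follows, with idealized Gaussian draws standing in for the refit fluxes (all names and numbers here are illustrative, not taken from the actual sims):

```python
import numpy as np

# Pull distribution: (measured - true) / reported error.  If the reported
# uncertainties are correct, the pulls should be ~N(0, 1): mean ~ 0 (no
# bias) and standard deviation ~ 1.  A stddev of ~1.23, as in the isolated
# line plots discussed above, would indicate underestimated errors.
rng = np.random.default_rng(42)
n_sim = 5000
true_flux = 3.0
reported_err = 0.25  # per-realization flux uncertainty (illustrative)
measured = rng.normal(true_flux, reported_err, n_sim)  # stand-in for refit fluxes
pull = (measured - true_flux) / reported_err
print(np.mean(pull), np.std(pull))  # expect values near 0 and 1
```

In practice `measured` would come from refitting each noisy realization, which is where the difference between amplitude-only and full (width and center) fits shows up.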
@moustakas I remade the histograms with my sims, and I am not finding that the profile-weighted uncertainty is underestimated (the 1.23 stddev in your isolated line plots). Both profile-weighted and boxcar fluxes have correct uncertainties and no bias. I looked into your code and I couldn't find what caused the difference. [figure]
@ashodkh I'm not sure why your sims are different. Are you refitting the emission line, including allowing the line-width and velocity center to vary? My code is there for you to inspect! I did confirm that the numbers reported in my plots are correct.
@moustakas My apologies, I did not notice that your updated sims also fitted for line-width and velocity center. Mine is only fitting for the amplitude. That could explain the difference. |
@ashodkh @janewman-pitt-edu I'm going to close this ticket as having resolved the primary issue, which was ultimately a bug in the v2 flux inverse variance(s). Let's revisit after the new VACs are in place and continue to discuss ways of improving the uncertainties in the catalog.
Yes, I agree -- next steps are for future versions :) |
I noticed that the EW error estimates between v1 and v2 have changed drastically. v2 has significantly lower EW uncertainties.
[Screen Shot 2023-05-16 at 2 10 46 PM]
This leads to 4-5 sigma detections of very low equivalent widths, even though the flux seems consistent with noise. Here are some examples:
https://fastspecfit.desi.lbl.gov/target/sv1-bright-27348-39627883852857837?index=1
https://fastspecfit.desi.lbl.gov/target/sv1-bright-27345-39627883852857767?index=1