Question: why is RMS weighted by arrival weight? #29

Closed
luca-s opened this issue Aug 17, 2022 · 8 comments

@luca-s
Contributor

luca-s commented Aug 17, 2022

I stumbled across this part of the code and realized that the event RMS is weighted by arrival weight. The weighting scheme has its value: when comparing different NonLinLoc solutions, the weighted RMS works as a score. However, I find the weighting to be an issue when comparing solutions across several locators or across different velocity models. In those cases I would rather have the RMS computed in the "standard" way, without arrival weights, which would make the value meaningful for comparison.

For this reason I would like to understand better why the RMS is computed the way it is.

Thanks.

@alomax
Collaborator

alomax commented Aug 18, 2022

Good question. The basic answer is that an RMS that is not weighted by arrival weight (quality, pick uncertainty, travel-time error, ...) can be highly biased by bad or outlier data and so not very informative. A weighted RMS is used in Hypoinverse:

https://pubs.usgs.gov/of/1978/0694/report.pdf

However, there are complications in NLL, depending on which formulation is used. With the L2 GAU_ANALYTIC formulation of Tarantola and Valette 1982 http://www.ipgp.fr/~tarantola/Files/Professional/Papers_PDF/IP_QI_latex.pdf, the weights are prior covariances on the picks and travel times (Eq 10.9 in Tarantola and Valette 1982), and so are not sensitive to bad or outlier data.

With the NLL EDT formulation, the weights are the posterior contribution that each arrival makes to the EDT pdf stack. This is important because EDT intrinsically and efficiently down-weights outlier data, so their residuals can be very large; not including this posterior weight can give an extremely large RMS in the presence of outlier data. But the EDT weights are a somewhat ad hoc hybrid of sums of probabilities, not simple covariances, so I doubt there is a clear or robust statistical basis for EDT weights and the resulting RMS. I tend to use the ellipsoid extent (len3 or se3, a simple proxy for PDF extent) instead of RMS for filtering location results, sometimes along with number of readings, gap, or other prior measures.

In any case, the residuals are listed in the output, so an unweighted RMS can always be calculated.

Anthony
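
As a minimal sketch of the point above (not taken from the NLL source), the two statistics can be computed from the per-arrival residuals and weights listed in the location output. The function names and the sample values are hypothetical, and the weighted form used here, sqrt(sum(w * r^2) / sum(w)), is one common definition that may differ in detail from what NLL, Hypoinverse, or hypo71 actually compute.

```python
import math


def unweighted_rms(residuals):
    """Plain RMS of travel-time residuals (seconds), ignoring arrival weights."""
    if not residuals:
        raise ValueError("no residuals given")
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))


def weighted_rms(residuals, weights):
    """Weighted RMS, assuming the form sqrt(sum(w * r^2) / sum(w))."""
    if len(residuals) != len(weights):
        raise ValueError("residuals and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("sum of weights must be positive")
    weighted_sq = sum(w * r * r for r, w in zip(residuals, weights))
    return math.sqrt(weighted_sq / total_weight)


# Hypothetical example: four consistent picks and one outlier
# that the locator has fully down-weighted (weight 0).
residuals = [0.1, -0.1, 0.1, -0.1, 3.0]
weights = [1.0, 1.0, 1.0, 1.0, 0.0]

rms_plain = unweighted_rms(residuals)
rms_weighted = weighted_rms(residuals, weights)
print(f"unweighted RMS: {rms_plain:.3f} s")
print(f"weighted RMS:   {rms_weighted:.3f} s")
```

With these made-up values, the single outlier dominates the unweighted RMS, while the weighted RMS reflects only the four consistent picks. This illustrates both why the weighted value is more informative within one location setup and why it is not directly comparable to an unweighted RMS from another locator.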

@luca-s
Contributor Author

luca-s commented Aug 18, 2022

Thank you very much for taking the time to answer, this is all good information.

@luca-s luca-s closed this as completed Aug 18, 2022
@FMassin

FMassin commented Aug 18, 2022

Interesting!

So, it is currently unfair to compare the NLL RMS with the RMS from other location methods, such as those included in SeisComP?

@luca-s
Contributor Author

luca-s commented Aug 18, 2022

@FMassin As @alomax wrote, "the residuals are listed in the output, so an unweighted RMS can always be calculated", so we should consider doing so in the SED NLL plugin for SeisComP.

@alomax
Collaborator

alomax commented Aug 18, 2022

The rms from the hypo71 output seems to be parsed here. But hypo71 does use an rms "corrected for average P & S residual"
https://pubs.usgs.gov/of/1972/0224/report.pdf

But, in general (i.e. always), I would suppose that statistics from two different procedures (or even the same procedure with very different input configurations) cannot be directly compared. Perhaps, for the case of hypocenter location, only the statistics between events with similar station distribution, proportion of P and S picks, etc, within a single location configuration (velocity model, ...) and location algorithm, can be directly compared.

@FMassin

FMassin commented Aug 19, 2022

> The rms from the hypo71 output seems to be parsed here.

I think this is for the SeisComP interface. The actual hypo71 code is in https://github.com/SeisComP/contrib-ipgp/tree/master/apps/3rd-party/Hypo71PC.

@alomax
Collaborator

alomax commented Aug 19, 2022

Yeah - I was a bit confused, as the link in the e-mail version of your comment pointed to the SCP interface...

In any case, what hypo71 is doing with weights and AVRPS is not immediately clear (FORTRAN!), but there is some possible weight XWT.

@FMassin

FMassin commented Aug 19, 2022

Me too! It took me a while to get it, and I edited my comment, sorry! I'm still confused about this FNO variable that seems to be incremented by 1 for each data point anyway...
