
pc_align: improved sampling strategy, before/after stats reporting, and doc guidance on evaluating improvement #423

dshean opened this issue Jan 26, 2024 · 4 comments

dshean commented Jan 26, 2024

Is your feature request related to a problem? Please describe.
pc_align returns a lot of output to stdout, including some key metrics for evaluating the quality of the transformation. New users don't know how to interpret all of this, or how to tell whether the final transform actually improved the alignment between their input datasets. Many just run the tool and proceed with analysis, even when the transformation actually made the alignment worse.

There is some limited information on evaluation in the current documentation, but we should offer clearer guidelines and recommendations for evaluating the results.

https://stereopipeline.readthedocs.io/en/latest/tools/pc_align.html#interpreting-the-transform
https://stereopipeline.readthedocs.io/en/latest/tools/pc_align.html#error-metrics-and-outliers

We can also use more sophisticated sampling approaches to validate the improvement of the transformation.

Describe the solution you'd like
pc_align should report statistics for the "improvement" beyond just reporting the initial and final residuals.

Input: error percentile of smallest errors (meters): 16%: 0.604849, 50%: 2.01722, 84%: 3.62022
Input: mean of smallest errors (meters): 25%: 0.442947, 50%: 0.92913, 75%: 1.4753, 100%: 2.09795

and

Output: error percentile of smallest errors (meters): 16%: 0.690319, 50%: 1.67165, 84%: 2.68878
Output: mean of smallest errors (meters): 25%: 0.519557, 50%: 0.974861, 75%: 1.30205, 100%: 1.75889

There should be final lines of output summarizing stats on the difference between input and output residuals, computed on a point-by-point basis, and perhaps differences between the summary statistics...

We typically look at the difference between the median (50%) "before" and "after" numbers, plus the difference in the spread (i.e., "84% minus 16% before" vs. "84% minus 16% after") of the distributions to evaluate improvement. These two numbers could serve as the primary statistics for success/improvement. pc_align should compute and display the spread before and after.
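To make this concrete, here is a minimal sketch (in Python, with hypothetical array and function names; this is not pc_align code) of the kind of before/after summary we have in mind, assuming the per-point residuals are available as arrays:

```python
import numpy as np

def improvement_summary(resid_before, resid_after):
    """Summarize alignment improvement from per-point residuals (meters)."""
    p16_b, med_b, p84_b = np.percentile(resid_before, [16, 50, 84])
    p16_a, med_a, p84_a = np.percentile(resid_after, [16, 50, 84])
    spread_b, spread_a = p84_b - p16_b, p84_a - p16_a
    # Positive percentages indicate a reduction, i.e., improvement.
    print(f"Median residual (m): before {med_b:.3f}, after {med_a:.3f} "
          f"({100 * (med_b - med_a) / med_b:+.1f}% reduction)")
    print(f"16-84% spread (m):   before {spread_b:.3f}, after {spread_a:.3f} "
          f"({100 * (spread_b - spread_a) / spread_b:+.1f}% reduction)")
```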

I recommend that we change the terms "error percentile of smallest errors (meters)" and "mean of smallest errors (meters)". I realize pc_align throws out the largest 25% of values, which is why "smallest errors" appears in these terms, but I think we can be more descriptive. Really, we're talking about "point distance residuals", not necessarily "errors", as some of the residuals could be due to real changes in parts of the surface (e.g., glacier melt, vegetation change).

I think we should report stats for the "inliers" used during the "calibration" as well as for the full sample of difference values. I realize this is why two lines of output are provided, but I think we can improve how this is reported so it is easier for users to understand.

Personally, I would like to see a more sophisticated sampling approach that isolates random samples for calibration and validation. One way to do this would be to remove the initial 25% outliers, and then, from the inliers, use a random subset (say, 80%) for calibration and a random subset (say, 20%) for validation to independently check the result, as sketched below. Right now, by default, we use the same set of points for both calibration and validation, unless the user withholds samples before calling pc_align and then performs their own validation independently of the tool.
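A rough sketch of that sampling scheme (hypothetical function and variable names, not current pc_align behavior):

```python
import numpy as np

rng = np.random.default_rng(42)

def split_samples(points, resid, outlier_frac=0.25, cal_frac=0.80):
    """Drop initial outliers, then randomly split the inliers into
    calibration and validation subsets."""
    # Remove the largest `outlier_frac` of initial residuals as outliers.
    cutoff = np.percentile(resid, 100 * (1 - outlier_frac))
    inliers = points[resid <= cutoff]
    # Random 80/20 partition of the inliers.
    idx = rng.permutation(len(inliers))
    n_cal = int(cal_frac * len(inliers))
    return inliers[idx[:n_cal]], inliers[idx[n_cal:]]

# The calibration subset drives the alignment; the validation subset is
# only used afterwards to report independent residuals.
```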

We should at least include some newlines in the pc_align output for improved readability, but I think it would be best to report these "improvement metrics" separately from (after) the main stdout stream (which includes runtimes, the transformation, and other material that can be overwhelming for new users). Right now the numbers that matter are buried. Basically, make it easier for people to see the relevant information and determine whether things worked.

The documentation for pc_align should have a section dedicated to interpreting the stdout (beyond just describing the metrics and recommending people visualize the output). Right now there is only this...

As such, a way of judging the effectiveness of the tool is to look at the mean of the smallest 75% of the errors before and after alignment.

As mentioned above, this is not what we typically use. I am open to other suggestions on the best stats to use here. Really, it might be best to compute signed (rather than absolute) residuals along the local "down" direction (or the normal to the ellipsoid), as the absolute errors will potentially miss skewed distributions. This could be done as a final step for reporting, after minimizing absolute distances.
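For illustration, a sketch of what reporting signed residuals could look like (assuming per-point 3D offset vectors in ECEF are available; the radial direction is used as a simple stand-in for the true ellipsoid normal):

```python
import numpy as np

def signed_vertical_residuals(xyz, dxyz):
    """Signed residuals along local 'up', given point coordinates `xyz`
    and per-point offset vectors `dxyz` (both ECEF, meters)."""
    up = xyz / np.linalg.norm(xyz, axis=1, keepdims=True)  # approx. local up
    dz = np.sum(dxyz * up, axis=1)                         # signed, + is up
    p16, med, p84 = np.percentile(dz, [16, 50, 84])
    print(f"Signed vertical residuals (m): 16%: {p16:.3f}, "
          f"50%: {med:.3f}, 84%: {p84:.3f}")
    return dz
```

A strongly non-zero median, or asymmetric 16/84 percentiles, would flag a skewed distribution that the absolute-distance metrics can hide.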

I think the doc should also mention how to review the observed translation magnitude and evaluate whether it is appropriate given the expected geolocation accuracy of the two inputs. For example, if aligning a WV DEM with expected horizontal/vertical geolocation accuracy of ~3-5 m CE90/LE90 and ICESat-2 points with expected horizontal/vertical geolocation accuracy of ~3/0.1 m, the combined translation magnitude should be <10 m. If the resulting magnitude is 200 m, then something went wrong, and the output should not be used for analysis.
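A back-of-the-envelope version of that sanity check (numbers from the example above; combining the budgets in quadrature and the factor-of-2 tolerance are illustrative assumptions, not pc_align behavior):

```python
import numpy as np

wv_ce90, wv_le90 = 5.0, 5.0    # WV DEM horizontal/vertical accuracy (m)
is2_ce90, is2_le90 = 3.0, 0.1  # ICESat-2 horizontal/vertical accuracy (m)

# Rough combined error budget, assuming independent errors.
budget = np.sqrt(wv_ce90**2 + wv_le90**2 + is2_ce90**2 + is2_le90**2)

observed = 200.0  # translation magnitude reported by pc_align (m)
if observed > 2 * budget:
    print(f"Warning: {observed:.0f} m translation exceeds the ~{budget:.0f} m "
          "expected error budget; do not use this result without inspection.")
```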

Describe alternatives you've considered
We currently do this type of evaluation with custom scripts that ingest the csv files and/or the pc_align output log to compute/extract relevant numbers and plot them with Python scripts. It seems much better to have pc_align report this directly.



dshean commented Jan 26, 2024

Tagging @ShashankBice, @rhugonnet and @adehecq for review, input and other ideas on potential improvements for pc_align.


rhugonnet commented Jan 27, 2024

Hi @dshean, @oleg-alexandrov,

Hmmm, it's hard to generalize best practices for every case (hence the research on DEM coregistration I am doing right now). But I have thought a lot about this over the past few years, and we have been adding similar things to xDEM.

Here are the 3 main "generic" ideas that would work for any scenario:

  1. (Easiest to implement but computationally intensive) Use independent random subsamples (20+) and perform several independent coregistrations to evaluate the uncertainty in the coregistration transform. This echoes your calibration/test sample proposal, @dshean, but with more repetitions, to derive an uncertainty on the transform itself rather than only residuals for calibration + test (which will likely be very similar, so might not help in interpreting the goodness of fit). This is the easiest solution to implement without new tools, and it helps to get an idea of how well the coreg can work in any case, but it will largely overestimate the uncertainty compared to using all samples at once (especially when few samples are available in the first place). See the sketch after this list.

  2. (Harder to implement but more helpful) Compare mid-range correlation amplitudes of the residuals before/after coregistration (requires a spatial correlation analysis). The primary symptom of misaligned data is not an increased spread in the residuals (whether estimated with STD/NMAD/percentiles, the change is generally only a few %), but rather a big change in mid-/long-range correlated errors in the residuals, which people usually interpret visually (we can see landforms when there is an incorrect shift, a ramp when there is a tilt, etc.; correlation amplitudes are much larger with misalignment).

A practical example: I recently reviewed a paper trying to correct biases due to jitter or other spatial noise with terrain segmentation + statistical fits on the residuals. The method worked quite well over a lot of static surfaces (visually, on the maps of residuals), but when it came to quantifying the improvement, the decrease in STD/NMAD was barely visible at ~10%. However, when the authors used the mid- and long-range correlation in the residuals (which I proposed to them, based on discussions in my 2022 paper on using correlation metrics to evaluate corrections), the improvement was more than 60%, sometimes close to 100%! 🥳
This is something that would also be very useful for you @ShashankBice @oleg-alexandrov to evaluate the improvement after jitter solve! 😉

It's basically just comparing the sill (correlated variance, Y axis) of the mid and long ranges found for the variogram model of the residuals, that can be derived like this: https://xdem.readthedocs.io/en/stable/basic_examples/plot_infer_spatial_correlation.html#sphx-glr-basic-examples-plot-infer-spatial-correlation-py.

  3. (Ideal but not there yet) Propagate errors theoretically through the coregistration based on a model of the error structure (and validate it with simulation). This is what I am doing in our study... coming soon 🙂; we can come back to it here once it's done.
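Here is a toy numpy illustration of idea 1 (the estimator is a placeholder; in practice each subsample would be aligned with a full pc_align run, and the recovered transforms compared):

```python
import numpy as np

rng = np.random.default_rng(0)

def estimate_shift(ref_pts, src_pts):
    # Placeholder alignment: median offset between matched point pairs.
    # Real use would run pc_align on each subsample instead.
    return np.median(ref_pts - src_pts, axis=0)

def shift_uncertainty(ref_pts, src_pts, n_runs=20, frac=0.5):
    """Repeat the alignment on independent random subsamples and report
    the mean and 1-sigma spread of the recovered translation."""
    shifts = []
    for _ in range(n_runs):
        idx = rng.choice(len(ref_pts), int(frac * len(ref_pts)), replace=False)
        shifts.append(estimate_shift(ref_pts[idx], src_pts[idx]))
    shifts = np.array(shifts)
    return shifts.mean(axis=0), shifts.std(axis=0)
```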

Hope this helps!


adehecq commented Jan 30, 2024

Hi @dshean,
You're raising a good point, and it's good that we're discussing this. With time, I've become used to looking at these stats, but they are certainly not very intuitive for new users...
I agree that it would be nice to have a simpler summary at the end so it would be more visible. It would also be good to better explain the translation magnitude.
Personally, for the kind of input errors I typically handle, I see a clear improvement of all these stats before and after, so it was always enough for me to tell whether or not the alignment worked. But I can see how it can be more difficult to tell for finer alignment.
I am happy with both suggestions, but indeed, @rhugonnet's suggestion to use spatial statistics seems promising. If someone is able to spend a bit of time on this, that would probably be the way to go!


dshean commented Feb 3, 2024

Thanks for all of your thoughts. Will follow up in more detail later and discuss options with Oleg.

In the meantime, I created a little notebook to ingest and visualize the current pc_align output: https://github.com/dshean/demcoreg/blob/master/demcoreg/pc_align_output.ipynb

It's not the best example for a number of reasons, but it's a start.
