
Benchmarking: Discussion of quantifiable metrics for image registration accuracy for multimodal MSI datasets #2

NHPatterson opened this issue Oct 20, 2021 · 2 comments


NHPatterson commented Oct 20, 2021

Background

Alignment of coordinate sets and their pixel intensity data between MSI and other modalities is most often evaluated visually, which is subjective but not uninformative. Other metrics, such as Dice and similar segmentation-based scores, are informative, but in the published literature they have typically been applied only to macro-scale tissue features.
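
For reference, a minimal sketch of the Dice overlap score mentioned above, assuming two same-shape binary masks as NumPy arrays (the function name is illustrative):

```python
import numpy as np

def dice_coefficient(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """Dice overlap between two equally shaped binary segmentation masks."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty; treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom
```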

There are also computational challenges: some metrics cannot assess accuracy for data with very large differences in scale (e.g., sub-micron microscopy vs. MSI), or evaluate only at the macro, tissue-morphology level rather than at the level of individual multicellular structures. Pixel-wise evaluation is difficult [1] because the spatial scales of the modalities may be very different.

For MALDI and microscopy, groups have used the laser ablation mark as a spatial reference for the true origin (a ground truth) of each MSI pixel in microscopy space [2,3], but this requires an additional experiment to capture an image of the ablation marks after MSI. For other MSI approaches this is not feasible: there may be nothing left of the tissue, or the method may not leave a mark measurable by microscopy.

Proposal

Discuss! I can volunteer some data acquired with the laser ablation mark (MALDI-microscopy) registration procedure; I consider this a ground truth against which automatic or ablation-mark-naïve registrations can be compared. Nicely, it supports pixel-wise comparison, so registration error can be assessed both globally and locally, though it is of course data from the same section.
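
As an illustration, a minimal sketch of how local (per-point) and global registration error could be computed from such ablation-mark ground truth, assuming matched (N, 2) coordinate arrays (the names are hypothetical):

```python
import numpy as np

def registration_error(ground_truth_xy: np.ndarray,
                       registered_xy: np.ndarray) -> dict:
    """Target registration error from matched point pairs.

    ground_truth_xy: (N, 2) ablation-mark centroids in microscopy space.
    registered_xy:   (N, 2) corresponding MSI pixel centres mapped by the
                     registration under evaluation. Units follow the inputs.
    """
    per_point = np.linalg.norm(ground_truth_xy - registered_xy, axis=1)
    return {
        "per_point": per_point,               # local error, mappable per pixel
        "mean": float(per_point.mean()),      # global summaries
        "median": float(np.median(per_point)),
        "max": float(per_point.max()),
    }
```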

@claesenm proposed combining the "visual" evaluation approach with multiple expert observers. This would involve generating multiple candidate transformations and having observers select the best by viewing the resulting images; the final transformation could then be computed as a weighted composite of the top-ranked transformations.
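
As a rough sketch of that composite step, one could blend the top-ranked transforms element-wise with observer-derived weights. This naive averaging of homogeneous affine matrices is only reasonable when the candidates are already close to one another:

```python
import numpy as np

def composite_affine(transforms: list[np.ndarray],
                     weights: list[float]) -> np.ndarray:
    """Weighted element-wise blend of 3x3 homogeneous affine matrices.

    Adequate when candidate transforms differ only slightly, as expected
    for several plausible registrations of the same images; candidates
    with large rotational differences would need, e.g., a log-Euclidean
    average instead.
    """
    w = np.asarray(weights, dtype=float)
    w /= w.sum()  # normalise observer-derived weights
    return np.einsum("k,kij->ij", w, np.stack(transforms))
```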

Footnotes

  1. Anal. Chem. 2021, 93, 1, 445–477

  2. Anal. Chem. 2018, 90, 21, 12395–12403

  3. Nat Methods 18, 799–805 (2021)

@claesenm commented

I think using an ablation mark-based registration as ground truth is a very good idea. For benchmarking purposes, the fact that these data stem from the same section is also advantageous, since it removes many sources of ambiguity for potential quantitative metrics.

Another way to define quantitative metrics would be to base them on biological structures that are easily identifiable across modalities (e.g., blood vessels). As part of the benchmark suite, we could consider building automated pipelines to identify such reference structures in various modalities; I think this would be useful to a lot of people. This approach also works for consecutive sections, with the usual caveats.
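
As a sketch of what one such pipeline step might look like for vessel-like structures, assuming a grayscale microscopy image and using scikit-image's Frangi vesselness filter (the sigma range would need tuning per modality and pixel size):

```python
import numpy as np
from skimage.filters import frangi, threshold_otsu

def vessel_mask(image: np.ndarray, sigmas=range(1, 8, 2)) -> np.ndarray:
    """Rough binary mask of vessel-like ridges in a grayscale image."""
    img = image.astype(float)
    img = (img - img.min()) / (img.max() - img.min() + 1e-12)  # scale to [0, 1]
    vesselness = frangi(img, sigmas=sigmas, black_ridges=False)
    return vesselness > threshold_otsu(vesselness)
```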

@AlanRace commented

I also like the idea of ablation mark data as a ground truth to enable comparison (and new development) of techniques that do not produce such ablation marks (raster-mode MALDI, DESI, SIMS, REIMS...). I wonder whether we can design an experiment that would let us generate a "ground truth" for serial section data as well. Perhaps we could acquire high-resolution microscopy images of both sections prior to analysis, register these images, and then transform the laser ablation crater coordinates with this transformation to create the "ground truth" for the serial section. Maybe this data already exists in some form, so it wouldn't require acquiring new data - perhaps in your autofluorescence paper, @NHPatterson?
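
A minimal sketch of that idea with scikit-image, assuming matched landmarks between the two microscopy images (all coordinates below are made-up placeholders):

```python
import numpy as np
from skimage.transform import estimate_transform

# Hypothetical matched (x, y) landmarks between microscopy images of the
# two serial sections, e.g. from manual annotation or feature matching.
src = np.array([[10.0, 12.0], [250.0, 30.0], [40.0, 300.0], [260.0, 280.0]])
dst = np.array([[14.0, 10.0], [255.0, 26.0], [42.0, 297.0], [263.0, 275.0]])

# Register section A's microscopy onto section B's microscopy.
tform = estimate_transform("affine", src, dst)

# Map ablation-crater centroids measured in section A into section B's
# space, giving a surrogate "ground truth" for the serial-section data.
craters_a = np.array([[100.0, 100.0], [120.0, 100.0], [100.0, 120.0]])
craters_in_b = tform(craters_a)
```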

@claesenm The idea of defined biological structures is also interesting - we might, however, have to consider the length scales of the data, as at some pixel sizes we may not be able to resolve blood vessels. We might be able to define a set of features suitable for a given sample type and pixel size range, though. If we could compile that into a document describing how we recommend evaluating alignment for various samples and experimental setups, I think that would be very useful. And then, as you say, this could form the basis of an automated pipeline for identifying such structures (e.g., detect sample type and pixel size from metadata, then try to identify features appropriate for this setup).
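
A toy sketch of that dispatch idea; the sample types, pixel-size ranges, and features below are invented placeholders, not recommendations:

```python
# (sample_type, min_pixel_um, max_pixel_um, candidate reference features)
FEATURE_TABLE = [
    ("kidney", 0.1, 5.0, ["glomeruli", "blood vessels"]),
    ("kidney", 5.0, 50.0, ["cortex/medulla boundary"]),
    ("brain", 0.1, 10.0, ["blood vessels"]),
    ("brain", 10.0, 100.0, ["white/grey matter boundary"]),
]

def recommended_features(sample_type: str, pixel_size_um: float) -> list[str]:
    """Reference structures plausibly resolvable for a given setup."""
    for stype, lo, hi, feats in FEATURE_TABLE:
        if stype == sample_type and lo <= pixel_size_um < hi:
            return feats
    return []  # fall back to macro-scale tissue outline
```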
