Fix CoordinateSource for 11028 #1253

@garrettwrong

A user reported trouble loading 11028, a centers-based coordinate source backed by MRC files. I have found two issues so far when loading the data.

First, the MRC header i) incorrectly reports 3 voxel size values and ii) reports 2 distinct values via mrcfile, one with a small floating point error. This can be seen in the mrcfile(...).header. ASPIRE-Python would have ignored (i), but (ii) causes ASPIRE's sanity check to fail and raise an error. If we expect to load files with bad header data, the code will need to be more permissive. Let's try emitting a log warning, and if that proves annoying when loading this data, an actual warning.

In the diff below I changed the value to vx.y because vx.x appeared to have a small error. I am unsure what to do there in general. I suspect it does not matter beyond populating pixel_size, which could be overridden later...

diff --git a/src/aspire/image/image.py b/src/aspire/image/image.py
index 92f43caa..58e3534e 100644
--- a/src/aspire/image/image.py
+++ b/src/aspire/image/image.py
@@ -866,9 +866,7 @@ class Image:
         # Convert from recarray to single values,
         #   checks uniformity.
         if isinstance(vx, np.recarray):
-            if vx.x != vx.y:
-                raise ValueError(f"Voxel sizes are not uniform: {vx}")
-            vx = vx.x
+            vx = vx.y
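
A more permissive check along the lines described above might look like the following. This is only a sketch under stated assumptions: `VoxelSize`, `resolve_pixel_size`, and the `rel_tol` default are hypothetical names and values for illustration, not existing ASPIRE code, and returning `y` simply mirrors the diff rather than settling which component to trust.

```python
import logging
import math
from collections import namedtuple

logger = logging.getLogger(__name__)

# Hypothetical stand-in for the (x, y, z) voxel-size record read from an MRC header.
VoxelSize = namedtuple("VoxelSize", ["x", "y", "z"])


def resolve_pixel_size(vx, rel_tol=1e-4):
    """Return a single pixel size from an (x, y, z) voxel record.

    Instead of raising when x and y differ, tolerate small floating
    point differences silently and log a warning for larger ones.
    The z component is ignored, as it is for 2D micrographs.
    """
    if not math.isclose(vx.x, vx.y, rel_tol=rel_tol):
        # Assumption: warn and continue, since pixel_size can be overridden later.
        logger.warning("Voxel sizes are not uniform: %s. Using y=%s.", vx, vx.y)
    return vx.y


# A tiny floating point discrepancy passes quietly; a large one logs a warning.
close_case = resolve_pixel_size(VoxelSize(1.3400001, 1.34, 0.0))
far_case = resolve_pixel_size(VoxelSize(2.0, 1.34, 0.0))
```

Using a relative tolerance keeps the check meaningful across very different pixel sizes; whether `1e-4` is the right threshold is an open question.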

The second issue is my fault! Shame :D (fe5a009). I previously extended this code to load TIFF data, which changed coordinate sources from using mrcfile to aspire.Image. All the logic for opening multiple files and dealing with pixel sizes was consolidated into Image. It seems that since then we have not attempted to load any non-square micrographs, and Image raises on non-square data. We do want to keep the ability to load multiple formats, since that was also an old request, but I don't want file-opening code duplicated all around. The lines below are the ones I changed to get the current user's data to load.

diff --git a/src/aspire/source/coordinates.py b/src/aspire/source/coordinates.py
index 21fec57a..f4f2bf1a 100644
--- a/src/aspire/source/coordinates.py
+++ b/src/aspire/source/coordinates.py
@@ -308,7 +308,7 @@ class CoordinateSource(ImageSource, ABC):
 
         mrc_shapes = np.zeros((self.num_micrographs, 2), dtype=int)
         for i, mrc in enumerate(self.mrc_paths):
-            mrc_shapes[i, :] = Image.load(mrc).resolution
+            mrc_shapes[i, :] = mrcfile.open(mrc)._data.shape
 
         return mrc_shapes
 
@@ -469,7 +469,7 @@ class CoordinateSource(ImageSource, ABC):
         # their origin micrograph
         for mrc_index, coord_list in grouped.items():
             # Load file as 2D numpy array.
-            arr = Image.load(self.mrc_paths[mrc_index]).asnumpy()[0]
+            arr = mrcfile.open(self.mrc_paths[mrc_index])._data
 
             # create iterable of the coordinates in this mrc
             # we don't need to worry about exhausting this iter

We're not using any Image properties... so one easy way to do this might be to add an internal Image._load that returns the raw image data and pixel size for any of the supported extensions. The load method then just calls that and returns Image(im, pixel_size). The two source lines change to call the _load method. This is far less likely to cause trouble than attempting to support non-square images at this time. Only a few lines change and nothing new is added; it just breaks up the existing code into two functions.
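
The proposed split could be sketched roughly as follows. Everything here is a hypothetical stand-in: `SketchImage`, its `_readers` registry, and `_fake_mrc_reader` are invented for illustration, nested lists replace NumPy arrays to keep the example dependency-free, and the real Image class is organized differently.

```python
import os


class SketchImage:
    """Hypothetical stand-in for Image; only the load/_load split is shown."""

    # Maps a lowercase file extension to a reader returning (data, pixel_size).
    _readers = {}

    def __init__(self, data, pixel_size=None):
        rows, cols = len(data), len(data[0])
        if rows != cols:
            # Mirrors the behavior described above: non-square data raises.
            raise ValueError(f"Only square images are supported, got {rows}x{cols}")
        self.data = data
        self.pixel_size = pixel_size

    @classmethod
    def _load(cls, path):
        """Return raw (data, pixel_size) without constructing an image."""
        ext = os.path.splitext(path)[1].lower()
        if ext not in cls._readers:
            raise ValueError(f"Unsupported file extension: {ext!r}")
        return cls._readers[ext](path)

    @classmethod
    def load(cls, path):
        """Public entry point: read the raw data, then wrap it."""
        data, pixel_size = cls._load(path)
        return cls(data, pixel_size)


def _fake_mrc_reader(path):
    # Placeholder: a real reader would parse the file at `path`.
    return [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]], 1.34


SketchImage._readers[".mrc"] = _fake_mrc_reader

# A coordinate source can read a rectangular micrograph via _load,
# bypassing the square-only constructor.
data, pixel_size = SketchImage._load("micrograph.mrc")
shape = (len(data), len(data[0]))
```

With this shape, format handling stays in one place while callers that only need raw arrays and shapes never hit the square-image restriction.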

Of course we should add tests for the rectangular case so that this doesn't regress again.
