Description
Hello,
When running the TITAN slide encoder in STAMP, I encountered an IndexError originating from `vision_transformer.get_alibi()`. After debugging, this appears to be caused by an incorrect inference of `patch_size_lvl0`, which collapses the inferred 2D patch grid to 1×1 and leads to invalid indexing via `bg_mask`.
I noticed a hard-coded assumption that one patch corresponds to 256 microns. This does not match the actual patch spacing of my extracted features: for my 40× slides (0.2525 mpp), the grid spacing is 512 px.
Example debug output:

```
patch_features torch.Size([420, 768])
patch_coords torch.Size([420, 2]) tensor([[0, 0], [512, 0], [0, 512], [512, 512], ...])
patch_size_lvl0 1013
```
Code in `_generate_slide_embedding`:

```python
coords_tensor = torch.tensor(coords.coords_um, dtype=self.precision)
# Convert coordinates from microns to pixels
patch_size_lvl0 = math.floor(256 / coords.mpp)  # Inferred from TITAN docs
coords_px = coords_tensor / coords.mpp
coords_px = coords_px.to(torch.int64)
```

For 40× slides (MPP ≈ 0.25 µm/px), this yields `patch_size_lvl0` ≈ 1013 px, while the actual coordinate stride is 512 px. This mismatch causes TITAN to infer a 1×1 grid, triggering:

```
IndexError: index 1 is out of bounds for axis 0 with size 1
```
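To illustrate the collapse, here is a small reproduction using the coordinates from the debug output above. Note the integer-division grid computation is my assumption about how TITAN derives grid positions, not code taken from the library:

```python
import math
import torch

# Patch top-left corners in pixels, as in the debug output
coords_px = torch.tensor([[0, 0], [512, 0], [0, 512], [512, 512]])
mpp = 0.2525

patch_size_lvl0 = math.floor(256 / mpp)  # = 1013, but the real stride is 512 px

# Assumed grid derivation: integer division of pixel coords by patch size
grid = coords_px // patch_size_lvl0
grid_shape = (grid.max(dim=0).values + 1).tolist()
print(grid_shape)  # [1, 1] -> every patch maps to the same grid cell
```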
Instead of assuming 256 microns, I inferred `patch_size_lvl0` directly from the coordinate spacing. This resolves the error and preserves the intended grid geometry.
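A minimal sketch of that stride-based inference (the function name is illustrative, not actual STAMP code; it assumes a regular grid with at least two distinct coordinates on some axis):

```python
import torch

def infer_patch_size_lvl0(coords_px: torch.Tensor) -> int:
    """Infer the level-0 patch size from the coordinate stride.

    coords_px: (N, 2) integer tensor of patch top-left corners in pixels.
    """
    strides = []
    for axis in range(coords_px.shape[1]):
        vals = torch.unique(coords_px[:, axis])  # sorted unique coords
        if vals.numel() > 1:
            # smallest gap between consecutive unique coords = grid stride
            strides.append(torch.diff(vals).min().item())
    if not strides:
        raise ValueError("cannot infer stride from a single patch position")
    return int(min(strides))

coords = torch.tensor([[0, 0], [512, 0], [0, 512], [512, 512]])
print(infer_patch_size_lvl0(coords))  # 512
```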
My questions are:
- Is the 256 microns assumption intended for a specific TITAN preprocessing setup?
- Shouldn't `patch_size_lvl0` be derived from the coordinate stride instead?
- Is it recommended to downsample to 20× before feature extraction, and does STAMP support downsampling?
Thanks!