fix: fix ImageDataset axis order and add tests#105

Merged
KenyaOtsuka merged 6 commits into dev from fix-image-dataset
May 14, 2026

@ganow ganow commented Apr 14, 2025

Problem

ImageDataset had three bugs, with no tests to catch them:

  1. Wrong axis order in returned image array
    __getitem__ returned images in HWC format (H, W, C), but PyTorch's DataLoader and most deep learning models expect CHW format (C, H, W). This caused shape mismatches when feeding images into networks.

  2. Incorrect auto-detection of stimulus names
    When stimulus_names=None, file stems were extracted using a custom _removesuffix helper instead of Path.stem. This was unnecessarily verbose and fragile.

  3. Non-deterministic order of auto-detected stimulus names
    When stimulus_names=None, Path.glob() was used without sorting, so the order of stimulus names was filesystem-dependent and not reproducible.

Fix

  • Added .transpose(2, 0, 1) to convert image arrays from HWC to CHW before returning.
  • Replaced the _removesuffix-based list comprehension with path.stem.
  • Added sorted() to ensure auto-detected stimulus names are always in alphabetical order.
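
The three changes above can be sketched roughly as follows. This is a minimal illustration, not the actual bdpy code: the helper names `to_chw` and `detect_stimulus_names`, the `extension` parameter, and the dummy file names are assumptions made for this example.

```python
import tempfile
from pathlib import Path

import numpy as np


def to_chw(image_hwc: np.ndarray) -> np.ndarray:
    """Reorder an (H, W, C) array to (C, H, W)."""
    return image_hwc.transpose(2, 0, 1)


def detect_stimulus_names(root: Path, extension: str = "png") -> list[str]:
    """Return the file stems under root in sorted (deterministic) order.

    Hypothetical helper: Path.stem drops the suffix, and sorted() removes
    the dependence on the filesystem's glob order.
    """
    return sorted(path.stem for path in root.glob(f"*.{extension}"))


# A non-square dummy image: height 4, width 6, 3 channels.
image = np.zeros((4, 6, 3), dtype=np.float32)
chw = to_chw(image)
print(chw.shape)  # (3, 4, 6)

with tempfile.TemporaryDirectory() as tmpdir:
    root = Path(tmpdir)
    # Create the files in a deliberately unsorted order.
    for name in ("c.png", "a.png", "b.png"):
        (root / name).touch()
    names = detect_stimulus_names(root)
    print(names)  # ['a', 'b', 'c']
```

Note that `transpose` returns a view rather than a copy, so the conversion itself adds essentially no cost per item.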

Tests

Added tests/dl/torch/test_dataset.py with 9 test cases:

  • test_getitem_returns_chw_shape — verifies CHW axis order using a non-square image (H≠W) to fully discriminate every axis
  • test_getitem_preserves_channels — verifies per-channel values are correctly mapped after transpose
  • test_dataloader_integration_batch_shape — end-to-end check via DataLoader (the original failure path)
  • test_value_range_normalized_to_unit_interval — verifies pixel values are in [0, 1]
  • test_len_matches_stimulus_names — verifies __len__
  • test_explicit_stimulus_names_respected — verifies explicit stimulus_names are used as-is
  • test_explicit_stimulus_names_preserve_input_order — verifies input order is preserved when stimulus_names is given
  • test_auto_detected_stimulus_names_use_stem — verifies file stems are used when stimulus_names=None
  • test_auto_detected_stimulus_names_are_sorted — verifies alphabetical ordering of auto-detected names

Also removed the empty TestImageDataset stub from test_torch.py.
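
To illustrate why a non-square image is used, here is a small sketch of the first two checks. It exercises a stand-in function `hwc_to_chw` rather than the real `ImageDataset`, and `make_image` is a hypothetical helper; only the idea (H, W, and C all differ, so any wrong permutation is visible in the shape) is taken from the test list above.

```python
import numpy as np


def hwc_to_chw(image_hwc):
    # Stand-in for the conversion under test (hypothetical, not bdpy code).
    return image_hwc.transpose(2, 0, 1)


def make_image(height, width, r, g, b):
    """Build a solid-colour uint8 image in HWC layout."""
    image = np.empty((height, width, 3), dtype=np.uint8)
    image[...] = (r, g, b)  # broadcast the colour over every pixel
    return image


def test_returns_chw_shape():
    # H=4, W=6, C=3 are all different, so each axis is identifiable.
    image = make_image(height=4, width=6, r=200, g=100, b=50)
    assert hwc_to_chw(image).shape == (3, 4, 6)


def test_preserves_channels():
    # After the transpose, channel i must hold the i-th colour component.
    image = make_image(height=4, width=6, r=200, g=100, b=50)
    chw = hwc_to_chw(image)
    for channel, expected in enumerate((200, 100, 50)):
        assert np.all(chw[channel] == expected)


test_returns_chw_shape()
test_preserves_channels()
```

With a square image such as 4×4×3, swapping H and W would leave the shape unchanged, which is why a shape-only check needs H≠W.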

@ganow ganow marked this pull request as ready for review April 14, 2025 13:02
@ganow ganow added the bug label Apr 14, 2025
Comment thread bdpy/dl/torch/dataset.py Outdated
@ganow ganow changed the base branch from main to dev October 24, 2025 11:18
@ganow ganow force-pushed the fix-image-dataset branch from cb5aeb3 to d1c0b5c Compare May 13, 2026 05:27
github-actions Bot commented May 13, 2026

Coverage

Coverage Report
File | Stmts | Miss | Cover | Missing
bdpy/bdata
   bdata.py | 399 | 195 | 51% | 79, 104, 109, 113, 118, 122, 132–134, 190, 233–239, 252–262, 276–277, 310, 314, 318–356, 405–411, 419–420, 425–426, 443–450, 468–469, 475, 508, 539, 548, 560, 589–598, 610, 625, 661, 683–691, 696–729, 738, 750–757, 761–767, 771–799, 803–824, 828–862, 866–868, 872–874, 878–887
   featureselector.py | 64 | 12 | 81% | 62–67, 69–74
   metadata.py | 67 | 1 | 99% | 84
   utils.py | 113 | 37 | 67% | 71, 82, 85–86, 95, 127–173, 201, 246, 258, 263
bdpy/dataform
   datastore.py | 107 | 85 | 21% | 59–75, 90–93, 97–98, 102–113, 116–119, 122–127, 131–132, 137–158, 190–197, 222–259, 262–265
   features.py | 298 | 165 | 45% | 31–32, 43–46, 90–92, 101–103, 107, 111, 115, 119, 152–153, 157–161, 168–197, 214–215, 224–225, 232–236, 274, 288, 305–319, 323, 327, 331, 335, 339, 343, 347, 351, 355, 359, 364–394, 398–418, 422–462, 465, 470–477, 491–493, 496–499, 502–505, 508–512, 515–516, 536–549
   kvs.py | 177 | 13 | 93% | 21, 24, 114, 118, 127–131, 171, 173, 185, 254, 282
   pd.py | 9 | 5 | 44% | 25–27, 43–44
   sparse.py | 67 | 7 | 90% | 29, 52–58, 74, 109, 123
   utils.py | 12 | 12 | 0% | 3–18
bdpy/dataset
   utils.py | 45 | 45 | 0% | 3–98
bdpy/distcomp
   distcomp.py | 92 | 18 | 80% | 33, 35, 49, 53, 55, 66–70, 74, 76, 81–82, 89–93, 97
bdpy/dl
   caffe.py | 60 | 60 | 0% | 4–129
bdpy/dl/torch
   base.py | 43 | 24 | 44% | 31–41, 48, 54, 60, 63, 73–83, 90, 96, 102, 105
   dataset.py | 74 | 40 | 46% | 37–39, 67–72, 75, 78–88, 122–130, 133, 136–149, 196, 199
   models.py | 333 | 226 | 32% | 148–169, 297–316, 327–331, 345–350, 442–494, 515–517, 528–587, 611–614, 625–684, 708–711, 722–771, 790–793, 804–853, 872–875
   torch.py | 121 | 55 | 55% | 188–225, 228, 231–243, 246–281
bdpy/dl/torch/domain
   core.py | 46 | 2 | 96% | 47, 63
   feature_domain.py | 24 | 1 | 96% | 30
   image_domain.py | 81 | 3 | 96% | 91, 94, 257
bdpy/evals
   metrics.py | 95 | 45 | 53% | 49–53, 82–112, 130–142, 151–152, 157, 172–179
bdpy/feature
   feature.py | 30 | 2 | 93% | 69–70
bdpy/fig
   __init__.py | 5 | 5 | 0% | 6–10
   draw_group_image_set.py | 90 | 90 | 0% | 3–182
   fig.py | 88 | 88 | 0% | 16–164
   makeplots2.py | 263 | 263 | 0% | 1–608
   makeplots.py | 336 | 336 | 0% | 1–729
   tile_images.py | 59 | 59 | 0% | 1–193
bdpy/ml
   crossvalidation.py | 59 | 27 | 54% | 47–48, 113–114, 117–118, 138, 164–196
   learning.py | 313 | 97 | 69% | 9, 47–48, 52, 56, 63, 95–108, 113–129, 132, 162–174, 188–213, 297, 313, 317–319, 322–323, 333, 343–344, 349–350, 360–368, 371–372, 380, 415–422, 443, 456, 464, 473, 505–507, 546, 559, 562, 571, 580, 585, 606
   model.py | 140 | 120 | 14% | 29–39, 54–70, 86–144, 156–169, 184–222, 225, 230–250, 254–258, 271–285
   searchlight.py | 16 | 13 | 19% | 32–51
bdpy/mri
   fmriprep.py | 497 | 451 | 9% | 25–34, 38, 44–62, 65–75, 78–89, 92–160, 163–194, 230–360, 367–380, 384, 388–390, 394, 398–400, 410–434, 437–454, 457–464, 471–472, 475–491, 494, 498, 502–815, 819–831, 842–862
   glm.py | 40 | 36 | 10% | 46–95
   image.py | 24 | 19 | 21% | 29–54
   load_epi.py | 28 | 18 | 36% | 36–50, 56–63, 82–88
   load_mri.py | 19 | 16 | 16% | 16–36
   roi.py | 248 | 217 | 12% | 37–100, 165–235, 241–314, 320–387, 399–466, 473–499
   spm.py | 158 | 139 | 12% | 26–155, 162–166, 170, 174–179, 183–300
bdpy/opendata
   __init__.py | 1 | 1 | 0% | 1
   openneuro.py | 210 | 210 | 0% | 1–329
bdpy/pipeline
   config.py | 36 | 2 | 94% | 37–38
bdpy/preproc
   interface.py | 52 | 16 | 69% | 111–123, 148–157
   preprocessor.py | 129 | 69 | 47% | 35, 44, 112–114, 121–128, 138–189, 196–227
   select_top.py | 23 | 1 | 96% | 55
bdpy/recon
   utils.py | 55 | 55 | 0% | 4–146
bdpy/recon/torch
   icnn.py | 161 | 161 | 0% | 15–478
bdpy/recon/torch/modules
   critic.py | 44 | 2 | 95% | 58, 132
   encoder.py | 29 | 1 | 97% | 29
   generator.py | 72 | 5 | 93% | 47, 52, 68, 128, 309
   latent.py | 34 | 3 | 91% | 16, 21, 32
bdpy/recon/torch/task
   inversion.py | 83 | 11 | 87% | 22, 40, 45, 50, 57, 62, 67, 72, 96, 210, 225
bdpy/stats
   corr.py | 43 | 3 | 93% | 57, 68, 102
bdpy/task
   callback.py | 71 | 4 | 94% | 114, 161, 166, 234
   core.py | 16 | 1 | 94% | 50
bdpy/util
   info.py | 47 | 36 | 23% | 19–79
   utils.py | 36 | 8 | 78% | 60, 116–121, 140–142
TOTAL | 5981 | 3636 | 39% |

Tests Skipped Failures Errors Time
218 0 💤 0 ❌ 0 🔥 18.256s ⏱️

@github-actions

Coverage

Coverage Report
File | Stmts | Miss | Cover | Missing
bdpy/bdata
   bdata.py | 399 | 195 | 51% | 79, 104, 109, 113, 118, 122, 132–134, 190, 233–239, 252–262, 276–277, 310, 314, 318–356, 405–411, 419–420, 425–426, 443–450, 468–469, 475, 508, 539, 548, 560, 589–598, 610, 625, 661, 683–691, 696–729, 738, 750–757, 761–767, 771–799, 803–824, 828–862, 866–868, 872–874, 878–887
   featureselector.py | 64 | 12 | 81% | 62–67, 69–74
   metadata.py | 67 | 1 | 99% | 84
   utils.py | 113 | 37 | 67% | 71, 82, 85–86, 95, 127–173, 201, 246, 258, 263
bdpy/dataform
   datastore.py | 107 | 85 | 21% | 59–75, 90–93, 97–98, 102–113, 116–119, 122–127, 131–132, 137–158, 190–197, 222–259, 262–265
   features.py | 298 | 165 | 45% | 31–32, 43–46, 90–92, 101–103, 107, 111, 115, 119, 152–153, 157–161, 168–197, 214–215, 224–225, 232–236, 274, 288, 305–319, 323, 327, 331, 335, 339, 343, 347, 351, 355, 359, 364–394, 398–418, 422–462, 465, 470–477, 491–493, 496–499, 502–505, 508–512, 515–516, 536–549
   kvs.py | 177 | 13 | 93% | 21, 24, 114, 118, 127–131, 171, 173, 185, 254, 282
   pd.py | 9 | 5 | 44% | 25–27, 43–44
   sparse.py | 67 | 7 | 90% | 29, 52–58, 74, 109, 123
   utils.py | 12 | 12 | 0% | 3–18
bdpy/dataset
   utils.py | 45 | 45 | 0% | 3–98
bdpy/distcomp
   distcomp.py | 92 | 18 | 80% | 33, 35, 49, 53, 55, 66–70, 74, 76, 81–82, 89–93, 97
bdpy/dl
   caffe.py | 60 | 60 | 0% | 4–129
bdpy/dl/torch
   base.py | 43 | 24 | 44% | 31–41, 48, 54, 60, 63, 73–83, 90, 96, 102, 105
   dataset.py | 74 | 74 | 0% | 1–192
   models.py | 333 | 226 | 32% | 148–169, 297–316, 327–331, 345–350, 442–494, 515–517, 528–587, 611–614, 625–684, 708–711, 722–771, 790–793, 804–853, 872–875
   torch.py | 121 | 55 | 55% | 188–225, 228, 231–243, 246–281
bdpy/dl/torch/domain
   core.py | 46 | 2 | 96% | 47, 63
   feature_domain.py | 24 | 1 | 96% | 30
   image_domain.py | 81 | 3 | 96% | 91, 94, 257
bdpy/evals
   metrics.py | 95 | 45 | 53% | 49–53, 82–112, 130–142, 151–152, 157, 172–179
bdpy/feature
   feature.py | 30 | 2 | 93% | 69–70
bdpy/fig
   __init__.py | 5 | 5 | 0% | 6–10
   draw_group_image_set.py | 90 | 90 | 0% | 3–182
   fig.py | 88 | 88 | 0% | 16–164
   makeplots2.py | 263 | 263 | 0% | 1–608
   makeplots.py | 336 | 336 | 0% | 1–729
   tile_images.py | 59 | 59 | 0% | 1–193
bdpy/ml
   crossvalidation.py | 59 | 27 | 54% | 47–48, 113–114, 117–118, 138, 164–196
   learning.py | 315 | 97 | 69% | 9, 47–48, 52, 56, 63, 95–108, 113–129, 132, 162–174, 188–213, 297, 313, 317–319, 322–323, 333, 343–344, 349–350, 360–368, 371–372, 380, 415–422, 443, 456, 464, 473, 505–507, 546, 559, 562, 571, 580, 585, 606
   model.py | 140 | 120 | 14% | 29–39, 54–70, 86–144, 156–169, 184–222, 225, 230–250, 254–258, 271–285
   searchlight.py | 16 | 13 | 19% | 32–51
bdpy/mri
   fmriprep.py | 497 | 451 | 9% | 25–34, 38, 44–62, 65–75, 78–89, 92–160, 163–194, 230–360, 367–380, 384, 388–390, 394, 398–400, 410–434, 437–454, 457–464, 471–472, 475–491, 494, 498, 502–815, 819–831, 842–862
   glm.py | 40 | 36 | 10% | 46–95
   image.py | 24 | 19 | 21% | 29–54
   load_epi.py | 28 | 18 | 36% | 36–50, 56–63, 82–88
   load_mri.py | 19 | 16 | 16% | 16–36
   roi.py | 248 | 217 | 12% | 37–100, 165–235, 241–314, 320–387, 399–466, 473–499
   spm.py | 158 | 139 | 12% | 26–155, 162–166, 170, 174–179, 183–300
bdpy/opendata
   __init__.py | 1 | 1 | 0% | 1
   openneuro.py | 210 | 210 | 0% | 1–329
bdpy/pipeline
   config.py | 36 | 2 | 94% | 37–38
bdpy/preproc
   interface.py | 52 | 16 | 69% | 111–123, 148–157
   preprocessor.py | 129 | 69 | 47% | 35, 44, 112–114, 121–128, 138–189, 196–227
   select_top.py | 23 | 1 | 96% | 55
bdpy/recon
   utils.py | 55 | 55 | 0% | 4–146
bdpy/recon/torch
   icnn.py | 161 | 161 | 0% | 15–478
bdpy/recon/torch/modules
   critic.py | 44 | 2 | 95% | 58, 132
   encoder.py | 29 | 1 | 97% | 29
   generator.py | 72 | 5 | 93% | 47, 52, 68, 128, 309
   latent.py | 34 | 3 | 91% | 16, 21, 32
   optimizer.py | 22 | 9 | 59% | 8–26
bdpy/recon/torch/task
   inversion.py | 88 | 15 | 83% | 11–16, 22, 40, 45, 50, 57, 62, 67, 72, 96, 210, 225
bdpy/stats
   corr.py | 43 | 3 | 93% | 57, 68, 102
bdpy/task
   callback.py | 71 | 4 | 94% | 114, 161, 166, 234
   core.py | 16 | 1 | 94% | 50
bdpy/util
   info.py | 47 | 36 | 23% | 19–79
   utils.py | 36 | 8 | 78% | 60, 116–121, 140–142
TOTAL | 5998 | 3683 | 39% |

Tests Skipped Failures Errors Time
209 0 💤 0 ❌ 0 🔥 15.920s ⏱️

@ganow ganow requested a review from KenyaOtsuka May 13, 2026 05:29
@ganow ganow changed the title bugfix: fix the behavior of ImageDataset fix: fix the behavior of ImageDataset May 13, 2026
Add tests/dl/torch/test_dataset.py with 9 test cases covering CHW axis
order, per-channel values, DataLoader integration, value normalization,
length, explicit stimulus ordering, and auto-detection via Path.stem.

Also sort auto-detected stimulus names for deterministic ordering, and
remove the empty TestImageDataset stub from test_torch.py.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@ganow ganow changed the title fix: fix the behavior of ImageDataset fix: fix ImageDataset axis order and add tests May 13, 2026

@KenyaOtsuka KenyaOtsuka left a comment

Thanks for the fix. I left a few minor comments.

Optionally, since this changes the behavior of ImageDataset from HWC to CHW, it might be helpful to document that ImageDataset now returns images in CHW format.

Comment thread tests/dl/torch/test_dataset.py Outdated
from pathlib import Path

import numpy as np
import torch

torch seems to be unused in this file. Could you remove it?

Contributor Author

that's true. thank you for mentioning it

Contributor Author

fixed in c82dfac

Comment thread tests/dl/torch/test_dataset.py Outdated
root = Path(self.tmpdir.name)
_save_image(root / "a.jpg", r=200, g=100, b=50)
_save_image(root / "b.jpg", r=10, g=20, b=30)
_save_image(root / "c.jpg", r=0, g=128, b=255)

For tests that check exact pixel values, it may be better to use a lossless format such as PNG.

Contributor Author

fixed in 57451e7
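
For context on the PNG suggestion, a minimal sketch using Pillow is shown below. It assumes Pillow is installed; `save_solid_image` is a hypothetical stand-in for the `_save_image` helper quoted above, whose real implementation is not shown in this thread. PNG compression is lossless, so an exact-value assertion is safe, whereas JPEG is lossy and may alter pixel values slightly.

```python
import tempfile
from pathlib import Path

import numpy as np
from PIL import Image


def save_solid_image(path, r, g, b, size=(6, 4)):
    """Write a solid-colour RGB image; Pillow picks the format from the suffix."""
    # Pillow sizes are (width, height), so this image is 4 rows by 6 columns.
    Image.new("RGB", size, color=(r, g, b)).save(path)


with tempfile.TemporaryDirectory() as tmpdir:
    path = Path(tmpdir) / "a.png"
    save_solid_image(path, r=200, g=100, b=50)
    with Image.open(path) as img:
        # convert() loads the pixel data before the file is closed.
        pixels = np.asarray(img.convert("RGB"))

# PNG is lossless, so every pixel matches the values that were written.
png_is_exact = bool(np.all(pixels == (200, 100, 50)))
print(pixels.shape, png_is_exact)
```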

@KenyaOtsuka KenyaOtsuka left a comment

LGTM. Thank you!

@KenyaOtsuka KenyaOtsuka merged commit 0bdb419 into dev May 14, 2026
4 of 6 checks passed
@KenyaOtsuka KenyaOtsuka deleted the fix-image-dataset branch May 14, 2026 02:49