Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the settings. Use the following commands to manage reviews:
📝 Walkthrough

This PR introduces a complete data processing infrastructure for stain normalization, including PyTorch Lightning data modules, configurable tile-based datasets for training/testing/prediction, image transformation classes for color space modifications (HED and HSV), project configuration, and utility functions. It establishes the foundational data pipeline without runtime logic changes.

Changes
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches

🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request establishes the foundational data loading and augmentation pipeline for the stain normalization project. It integrates with PyTorch Lightning for efficient data management and introduces a suite of image modification techniques to generate diverse training inputs, simulating real-world variations in histopathological images. This setup is crucial for developing robust models capable of handling different staining protocols and scanner settings.

Highlights
Changelog
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either
Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Code Review
This pull request introduces data loading and augmentation capabilities for the stain normalization project. The overall structure is well-organized, following standard practices for PyTorch and Lightning. I've identified a few areas for improvement, mainly concerning potential bugs in the image modification classes where input images are not consistently normalized to the expected [0.0, 1.0] float range. I've also suggested refactoring to reduce code duplication in the dataset classes, and pointed out some minor issues with naming and configuration. Addressing these points will improve the robustness and maintainability of the new data pipeline.
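The recurring [0.0, 1.0] normalization issue flagged here can be handled by one small shared helper; the sketch below is ours, not the PR's (the name `to_float01` is hypothetical):

```python
import numpy as np

def to_float01(img: np.ndarray) -> np.ndarray:
    """Return the image as float32 scaled to [0, 1].

    uint8 input is assumed to span [0, 255]; float input is assumed
    to already be in [0, 1] and is only cast and clipped.
    """
    if img.dtype == np.uint8:
        return img.astype(np.float32) / 255.0
    return np.clip(img.astype(np.float32), 0.0, 1.0)
```

Calling this once at the start of every dataset `__getitem__` and every modification's `apply` would make the dtype contract explicit instead of implicit.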
Actionable comments posted: 12
🧹 Nitpick comments (3)
stain_normalization/data/data_module.py (2)
51-65: Inconsistent `persistent_workers` setting between loaders.

`train_dataloader` and `val_dataloader` set `persistent_workers=self.num_workers > 0`, but `test_dataloader` and `predict_dataloader` omit it. For consistency and potential performance benefits when `num_workers > 0`, consider adding it.

♻️ Proposed fix

```diff
 def test_dataloader(self) -> Iterable[PredictBatch]:
     return DataLoader(
         self.test,
         batch_size=self.batch_size,
         num_workers=self.num_workers,
         collate_fn=collate_fn,
+        persistent_workers=self.num_workers > 0,
     )

 def predict_dataloader(self) -> Iterable[PredictBatch]:
     return DataLoader(
         self.predict,
         batch_size=self.batch_size,
         num_workers=self.num_workers,
         collate_fn=collate_fn,
+        persistent_workers=self.num_workers > 0,
     )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stain_normalization/data/data_module.py` around lines 51 - 65, test_dataloader and predict_dataloader omit the persistent_workers flag while train_dataloader and val_dataloader use persistent_workers=self.num_workers > 0; update the two methods (test_dataloader and predict_dataloader) to pass persistent_workers=self.num_workers > 0 to DataLoader so their behavior matches train_dataloader/val_dataloader and benefits from worker persistence when num_workers > 0.
21-31: Consider handling unexpected stage values.

The `match` statement has no default case. If an unexpected stage is passed, the method silently does nothing and subsequent dataloader calls will fail with `AttributeError`. A warning or explicit error would aid debugging.

♻️ Proposed fix

```diff
         case "predict":
             self.predict = instantiate(self.datasets["predict"])
+        case _:
+            raise ValueError(f"Unknown stage: {stage}")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stain_normalization/data/data_module.py` around lines 21 - 31, The setup(self, stage: str) method currently ignores unexpected stage values because the match has no default branch; update the match in setup to include a default case (case _: ) that raises a clear exception (e.g., ValueError(f"Unknown stage: {stage}")) or logs a warning and raises, so callers fail fast and you avoid later AttributeError when accessing self.train/self.val/etc.; modify the setup function to add this default branch referenced by the existing setup method name so any invalid stage produces an explicit error.

stain_normalization/data/modification/combiend_modification.py (1)
50-55: Random state is not reproducible.

Using `np.random.uniform` directly makes augmentations non-reproducible across runs. Consider using `self.random_generator` or accepting random params via the `**params` dict to support albumentations' replay functionality for reproducible augmentations.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stain_normalization/data/modification/combiend_modification.py` around lines 50 - 55, The modify_channel method uses np.random.uniform which makes augmentations non-reproducible; change it to draw random values from the operator's controlled RNG (e.g., use self.random_generator.uniform(*self.intensity_range) and self.random_generator.uniform(*self.brightness_range)) or accept randomized values via a params dict passed into modify_channel (e.g., read intensity_scale and brightness_shift from params if present) so that the sampling can be seeded and replayed; ensure the class sets up self.random_generator in the constructor when not provided and keep usage around exposure.adjust_gamma and np.clip as before.
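The split the prompt describes (all sampling in `get_params`, a deterministic `apply`) can be sketched without importing albumentations; the `_TransformSketch` base below is an illustrative stand-in for `ImageOnlyTransform`, and the arithmetic inside `apply` is a placeholder, not the PR's actual modification:

```python
import numpy as np

class _TransformSketch:
    """Illustrative stand-in for albumentations.ImageOnlyTransform."""

    def __call__(self, image: np.ndarray) -> dict:
        return {"image": self.apply(image, **self.get_params())}

class CombinedModifications(_TransformSketch):
    def __init__(self, intensity_range=(0.8, 1.2),
                 brightness_range=(-0.1, 0.1), seed=None):
        self.intensity_range = intensity_range
        self.brightness_range = brightness_range
        # A seeded generator makes the augmentation replayable.
        self.random_generator = np.random.default_rng(seed)

    def get_params(self) -> dict:
        # All randomness lives here, so replay / additional_targets
        # can reuse the exact same draw.
        return {
            "intensity_scale": self.random_generator.uniform(*self.intensity_range),
            "brightness_shift": self.random_generator.uniform(*self.brightness_range),
        }

    def apply(self, img, intensity_scale=1.0, brightness_shift=0.0, **params):
        # Deterministic given the sampled params.
        return np.clip(img * intensity_scale + brightness_shift, 0.0, 1.0)
```

Two instances constructed with the same seed then produce identical outputs on the same input, which is exactly what replay-style pipelines need.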
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pyproject.toml`:
- Line 11: The pyproject.toml dependency string "torchmetrics>=1.4.14" is
invalid; update the requirement to a released version available on PyPI (for
example replace "torchmetrics>=1.4.14" with "torchmetrics>=1.4.3" or
"torchmetrics>=1.5.0") so dependency resolution succeeds—locate the dependency
line containing "torchmetrics>=1.4.14" and change it to a valid version
specifier.
- Around line 15-20: The Git dependencies "rationai-mlkit", "rationai-masks",
"rationai-tiling", and "rationai-staining" in pyproject.toml must be pinned to
immutable refs (a commit SHA or a tagged release) instead of leaving them as
bare git+https URLs; update each dependency string (e.g., rationai-mlkit @
git+https://.../mlkit.git) to include a ref suffix, i.e.
git+https://.../repo.git@<tag-or-commit>, so installs are reproducible and no
longer track the default branch.
In `@stain_normalization/data/datasets/test_dataset.py`:
- Around line 64-67: The modify transform is being fed uint8 data
(original_image_255) but expects float images in [0,1]; change the call so
modify receives a normalized float image (e.g., pass original_image_255 / 255.0
or cast to float32 and divide) instead of original_image_255, then assign
modified_image_raw = self.modify(image=normalized_image)["image"] and keep
modified_image = modified_image_raw and original_image = normalized_image to
maintain consistent dtypes (referencing modify, modified_image_raw,
modified_image, original_image).
In `@stain_normalization/data/datasets/train_dataset.py`:
- Around line 58-63: The call to the transform in __getitem__ uses
original_image_255 (uint8 0-255), but modify expects float images in [0,1];
normalize before passing it to modify and keep naming consistent: convert
original_image_255 to float by dividing by 255.0 (e.g., original_float =
original_image_255 / 255.0) and call self.modify(image=original_float) so
CombinedModifications / HVSModification receive proper float input, then use the
returned ["image"] as the modified image and retain original_image =
original_float for downstream use.
In `@stain_normalization/data/modification/__init__.py`:
- Around line 1-3: The import in stain_normalization.data.modification.__init__
references a misspelled module name "combiend_modification"; rename the module
file to combined_modification.py and update the import to use
combined_modification so that CombinedModifications is imported from the
correctly named module; ensure any other imports referencing
combiend_modification are updated as well (search for combiend_modification and
replace with combined_modification).
In `@stain_normalization/data/modification/combiend_modification.py`:
- Around line 41-48: The HED pipeline is receiving uint8 [0-255] images causing
separate_stains to misinterpret intensities; update the entry point
(apply/modify method that calls separate_stains in combiend_modification.py) to
first convert the input image to float in [0,1] (e.g., img = img.astype(float) /
255.0) before calling separate_stains, then continue with modify_channel,
combine_stains and return; apply the same normalization fix in hed_factor.py
wherever separate_stains is called to ensure consistent input dtype/range.
In `@stain_normalization/data/modification/exposure_adjustment.py`:
- Around line 28-39: The brightness_factor is being sampled inside apply(),
which breaks Albumentations' deterministic/replay behavior; move the sampling
into get_params() so it returns {"brightness_factor": <sampled_value>} (use
self.brightness_range to draw the value) and then update apply(self, img:
NDArray[Any], **params: Any) to read brightness_factor =
params["brightness_factor"] instead of sampling; ensure the class
(ExposureAdjustment) implements get_params() to return that dict so
additional_targets and replay modes receive the same parameter.
In `@stain_normalization/data/modification/hed_factor.py`:
- Around line 44-50: Currently H and E channels are being clipped in HED space
which discards valid scaled concentrations and the function also doesn't ensure
a float32 [0,1] RGB return; instead, stop clipping h and e before reconstruction
(leave h = hed_image[:, :, 0] * h_factor and e = hed_image[:, :, 1] * e_factor,
keep d as is), pass the stacked HED to combine_stains(…) to reconstruct RGB,
then enforce clipping and dtype on the final image (e.g., np.clip(modified_rgb,
0, 1) and cast to np.float32) before returning; reference symbols: hed_image,
separate_stains, combine_stains, h_factor, e_factor, d, modified_rgb.
- Around line 30-42: The random sampling of h_factor and e_factor must be moved
out of apply() into get_params() so Albumentations can generate seeded,
replayable params; implement get_params(self) to draw h_factor and e_factor
using self.random_generator.uniform(*self.h_range) and
self.random_generator.uniform(*self.e_range) and return them in a dict (e.g.,
{"h_factor": h_factor, "e_factor": e_factor}); then update apply(self, img,
**params) to read h_factor = params["h_factor"] and e_factor =
params["e_factor"] (ensuring additional_targets will receive the same params) so
Compose(..., seed=...) and ReplayCompose behave deterministically.
In `@stain_normalization/data/modification/hvs_modification.py`:
- Around line 50-53: The rgb2hsv call assumes float images in [0,1], so ensure
the input 'img' is normalized before conversion: detect if img.dtype is uint8
and, if so, convert img to float and divide by 255.0 (e.g., replace/prepare the
local 'img' used by rgb2hsv), then perform the hue/saturation/value edits on
'hsv_image'; if the function must preserve input dtype for callers, convert the
modified image back to the original dtype (e.g., multiply by 255 and cast to
uint8) before returning. Reference symbols: img, rgb2hsv, hsv_image, hue_shift,
saturation_scale, value_scale in hvs_modification.py.
- Around line 42-44: Fix the typo in the docstring in
stain_normalization/data/modification/hvs_modification.py by replacing the
incorrect phrase "RGB image with HVS modifiedications as a float32" with "RGB
image with HVS modifications as a float32" (locate the Returns: block containing
the string "RGB image with HVS modifiedications" and update it accordingly).
In `@stain_normalization/data/utils/collate_fn.py`:
- Around line 7-8: collate_fn currently stacks only the input tensors and leaves
targets as a Python list, which mismatches _TrainSlideTiles.__getitem__ (returns
(Tensor, Tensor)) and breaks training; either make this function
predict/test-only (rename and narrow its type to accept
PredictSample->PredictBatch) or stack the targets as well by returning
(torch.stack([x[0] for x in batch]), torch.stack([x[1] for x in batch])) and
updating the type signature to tuple[Tensor, Tensor] so downstream training code
receives batched target tensors.
---
Nitpick comments:
In `@stain_normalization/data/data_module.py`:
- Around line 51-65: test_dataloader and predict_dataloader omit the
persistent_workers flag while train_dataloader and val_dataloader use
persistent_workers=self.num_workers > 0; update the two methods (test_dataloader
and predict_dataloader) to pass persistent_workers=self.num_workers > 0 to
DataLoader so their behavior matches train_dataloader/val_dataloader and
benefits from worker persistence when num_workers > 0.
- Around line 21-31: The setup(self, stage: str) method currently ignores
unexpected stage values because the match has no default branch; update the
match in setup to include a default case (case _: ) that raises a clear
exception (e.g., ValueError(f"Unknown stage: {stage}")) or logs a warning and
raises, so callers fail fast and you avoid later AttributeError when accessing
self.train/self.val/etc.; modify the setup function to add this default branch
referenced by the existing setup method name so any invalid stage produces an
explicit error.
In `@stain_normalization/data/modification/combiend_modification.py`:
- Around line 50-55: The modify_channel method uses np.random.uniform which
makes augmentations non-reproducible; change it to draw random values from the
operator's controlled RNG (e.g., use
self.random_generator.uniform(*self.intensity_range) and
self.random_generator.uniform(*self.brightness_range)) or accept randomized
values via a params dict passed into modify_channel (e.g., read intensity_scale
and brightness_shift from params if present) so that the sampling can be seeded
and replayed; ensure the class sets up self.random_generator in the constructor
when not provided and keep usage around exposure.adjust_gamma and np.clip as
before.
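The `collate_fn` comment above reduces to stacking both elements of each `(input, target)` pair. A minimal sketch, using NumPy arrays as stand-ins for tensors (the real code would use `torch.stack` on `torch.Tensor` samples):

```python
import numpy as np

def collate_fn(batch):
    """Stack (input, target) pairs into batched arrays.

    batch is a list of (input, target) tuples, as returned by a
    __getitem__ that yields two same-shaped arrays per sample.
    """
    inputs = np.stack([sample[0] for sample in batch])
    targets = np.stack([sample[1] for sample in batch])
    return inputs, targets
```

For predict-only datasets that return a single array per sample, a separate, narrower collate function (or the framework default) avoids the mismatch the review describes.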
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: ea116703-d2f2-4324-bcc9-7f83226e08e3
⛔ Files ignored due to path filters (2)
`pdm.lock` is excluded by `!**/*.lock`
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (16)
- .gitignore
- pyproject.toml
- stain_normalization/data/__init__.py
- stain_normalization/data/data_module.py
- stain_normalization/data/datasets/__init__.py
- stain_normalization/data/datasets/predict_dataset.py
- stain_normalization/data/datasets/test_dataset.py
- stain_normalization/data/datasets/train_dataset.py
- stain_normalization/data/modification/__init__.py
- stain_normalization/data/modification/combiend_modification.py
- stain_normalization/data/modification/exposure_adjustment.py
- stain_normalization/data/modification/hed_factor.py
- stain_normalization/data/modification/hvs_modification.py
- stain_normalization/data/utils/__init__.py
- stain_normalization/data/utils/collate_fn.py
- stain_normalization/type_aliases.py
e4b8fda to f3acae8
Actionable comments posted: 1
♻️ Duplicate comments (1)
pyproject.toml (1)
15-20: ⚠️ Potential issue | 🟠 Major

Pin the Git dependencies to immutable refs.
These direct Git requirements currently float with whatever the upstream default branch points to at install time, so builds are not reproducible. The packaging spec and pip both support pinning a tag or commit, and pip explicitly prefers full commit hashes for VCS requirements. (pip.pypa.io)
Example pinning style
```diff
-    "rationai-mlkit @ git+https://gitlab.ics.muni.cz/rationai/digital-pathology/libraries/mlkit.git",
+    "rationai-mlkit @ git+https://gitlab.ics.muni.cz/rationai/digital-pathology/libraries/mlkit.git@<full-commit-sha>",
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pyproject.toml` around lines 15 - 20, The Git-based dependencies ("rationai-mlkit", "rationai-masks", "rationai-tiling", and "rationai-staining") are currently unpinned and should be locked to immutable refs; update their pyproject.toml requirement strings to include a tag or, preferably, a full commit hash (VCS-style ref) instead of leaving them floating so builds are reproducible (e.g., replace the current git+https://... entries for rationai-mlkit, rationai-masks, rationai-tiling, and rationai-staining with the same URLs suffixed by @<tag-or-commit-hash> or the full commit SHA).
🧹 Nitpick comments (1)
pyproject.toml (1)
5-5: Avoid pinning `requires-python` to one patch release.

`==3.12.5` is strict equality, so installers will reject 3.12.6+ even if the project only relies on Python 3.12 semantics. If the goal is reproducible dev environments, keep the exact patch in CI or `.python-version` and publish a range here instead, e.g. `>=3.12,<3.13`. (docs.astral.sh)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pyproject.toml` at line 5, The requires-python entry is pinned to a single patch (requires-python = "==3.12.5"), which will reject other 3.12 patch releases; update the requires-python value to a compatible range such as ">=3.12,<3.13" in pyproject.toml and keep any exact patch pinning for reproducible dev environments in CI or a .python-version file instead.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pyproject.toml`:
- Around line 26-35: The [tool.uv.scripts] table is invalid for uv; remove this
table and its entries (mask_generate, tiler, train, validate, test, predict,
lint, format, mypy_check) from pyproject.toml, and either (A) invoke these shell
commands directly via CI/task runner/uv run, or (B) implement thin Python
wrapper functions (e.g., in a small module exposing functions that call
preprocessing.mask_generator, preprocessing.tiler, and invoke
stain_normalization via run() wrappers) and register them as console scripts
under [project.scripts] (PEP 621) as module:function entries; choose one
approach and update pyproject.toml accordingly.
---
Duplicate comments:
In `@pyproject.toml`:
- Around line 15-20: The Git-based dependencies ("rationai-mlkit",
"rationai-masks", "rationai-tiling", and "rationai-staining") are currently
unpinned and should be locked to immutable refs; update their pyproject.toml
requirement strings to include a tag or, preferably, a full commit hash
(VCS-style ref) instead of leaving them floating so builds are reproducible
(e.g., replace the current git+https://... entries for rationai-mlkit,
rationai-masks, rationai-tiling, and rationai-staining with the same URLs
suffixed by @<tag-or-commit-hash> or the full commit SHA).
---
Nitpick comments:
In `@pyproject.toml`:
- Line 5: The requires-python entry is pinned to a single patch (requires-python
= "==3.12.5"), which will reject other 3.12 patch releases; update the
requires-python value to a compatible range such as ">=3.12,<3.13" in
pyproject.toml and keep any exact patch pinning for reproducible dev
environments in CI or a .python-version file instead.
f3acae8 to 8d58794
Actionable comments posted: 3
🧹 Nitpick comments (2)
stain_normalization/data/modification/hsv_modification.py (1)
22-33: Consider implementing `get_transform_init_args_names` for serialization support.

Custom albumentations transforms that store init parameters should implement `get_transform_init_args_names()` to enable proper pipeline serialization/deserialization. Without it, `hue_shift_range`, `saturation_range`, and `value_range` will be dropped when saving/loading pipelines.

This applies to sibling classes as well (`ExposureAdjustment`, `CombinedModifications`), so it may warrant a project-wide fix.

♻️ Proposed fix

```diff
         self.hue_shift_range = hue_shift_range
         self.saturation_range = saturation_range
         self.value_range = value_range
+
+    def get_transform_init_args_names(self) -> tuple[str, ...]:
+        return ("hue_shift_range", "saturation_range", "value_range")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@stain_normalization/data/modification/hsv_modification.py` around lines 22 - 33, Implement get_transform_init_args_names on the HSVModification class to return the init parameter names so albumentations can serialize/deserialize the transform (include "hue_shift_range", "saturation_range", "value_range", and any base args like "always_apply" and "p"); update the same method on sibling classes ExposureAdjustment and CombinedModifications to list their constructor parameters similarly so pipeline saving/loading preserves those fields.

pyproject.toml (1)
5-5: Verify whether the `3.12.5` floor is required for a runtime-specific reason.

Pinning to a patch release rejects otherwise compatible Python 3.12 versions (3.12.0-3.12.4). No other references to 3.12.5 were found in the codebase. If no specific 3.12.5 requirement exists, `requires-python = ">=3.12,<3.14"` would be a safer, more inclusive constraint.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pyproject.toml` at line 5, Check whether the specific Python patch floor "3.12.5" in the requires-python entry in pyproject.toml is required for a runtime or dependency reason; if you find no code/dependency that relies on 3.12.5-specific behavior (search for usage or pinned dependency notes), relax the constraint to a broader compatible range such as requires-python = ">=3.12,<3.14" to allow all 3.12.x interpreters, otherwise document the concrete reason for 3.12.5 in a comment or changelog and keep the exact pin.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pyproject.toml`:
- Around line 1-29: The pyproject.toml defines PEP 621 metadata under [project]
but is missing a [build-system] table; add a [build-system] block that declares
a reproducible build backend (e.g., set requires to a minimal list such as
"setuptools>=61" and "wheel" or "poetry-core" depending on your chosen backend,
and set build-backend to the matching backend string) so pip and other tools
know how to build the project; update pyproject.toml to include the new
[build-system] section alongside the existing [project] metadata.
In `@stain_normalization/data/modification/combined_modification.py`:
- Around line 13-29: The constructor of CombinedModification uses an outdated
super call (super().__init__(always_apply, p)) which will raise a TypeError with
Albumentations 2.x+; in CombinedModification.__init__ replace the super call to
only pass p (e.g., super().__init__(p=1.0) or super().__init__(p=p)) and set the
default p to 1.0 to preserve "always apply" behavior, leaving assignments to
self.intensity_range and self.brightness_range unchanged.
In `@stain_normalization/data/modification/hsv_modification.py`:
- Around line 55-57: The hsv_modification.py function returns modified_rgb from
hsv2rgb which yields float64, conflicting with the docstring and sibling classes
like CombinedModifications; update the return to explicitly cast the array to
float32 before returning (e.g., convert modified_rgb to dtype float32) and
ensure the function signature/docstring still states float32 so types are
consistent with CombinedModifications and other modifiers.
---
Nitpick comments:
In `@pyproject.toml`:
- Line 5: Check whether the specific Python patch floor "3.12.5" in the
requires-python entry in pyproject.toml is required for a runtime or dependency
reason; if you find no code/dependency that relies on 3.12.5-specific behavior
(search for usage or pinned dependency notes), relax the constraint to a broader
compatible range such as requires-python = ">=3.12,<3.14" to allow all 3.12.x
interpreters, otherwise document the concrete reason for 3.12.5 in a comment or
changelog and keep the exact pin.
In `@stain_normalization/data/modification/hsv_modification.py`:
- Around line 22-33: Implement get_transform_init_args_names on the
HSVModification class to return the init parameter names so albumentations can
serialize/deserialize the transform (include "hue_shift_range",
"saturation_range", "value_range", and any base args like "always_apply" and
"p"); update the same method on sibling classes ExposureAdjustment and
CombinedModifications to list their constructor parameters similarly so pipeline
saving/loading preserves those fields.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 20008277-a9db-4881-be99-098daf1c92b3
📒 Files selected for processing (5)
- pyproject.toml
- stain_normalization/data/modification/__init__.py
- stain_normalization/data/modification/combined_modification.py
- stain_normalization/data/modification/hed_factor.py
- stain_normalization/data/modification/hsv_modification.py
🚧 Files skipped from review as they are similar to previous changes (1)
- stain_normalization/data/modification/__init__.py
@172454 please check the correctness of the modifications.
172454
left a comment
Make the documentation and naming more explicit regarding OD (optical density).
vejtek
left a comment
A comment on the train/test/predict datasets: as currently implemented, the code is very repetitive, and it's hard to spot where they differ. At least, docstrings that navigate the reader to the relevant differences would be welcome.
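One way to reduce that repetition, sketched under the assumption that the datasets differ mainly in whether a modification is applied and in what a sample contains; every class and method name below is illustrative, not the PR's:

```python
import numpy as np

class _BaseTileDataset:
    """Shared tile loading; subclasses only define what a sample is."""

    def __init__(self, tiles, modify=None):
        self.tiles = tiles
        self.modify = modify

    def __len__(self):
        return len(self.tiles)

    def _load(self, index):
        # Placeholder for the shared tile-reading and normalization logic.
        return self.tiles[index].astype(np.float32) / 255.0

class TrainTileDataset(_BaseTileDataset):
    def __getitem__(self, index):
        original = self._load(index)
        modified = self.modify(image=original)["image"] if self.modify else original
        return modified, original  # (input, target)

class PredictTileDataset(_BaseTileDataset):
    def __getitem__(self, index):
        return self._load(index)  # input only, no target
```

With the shared loading factored out, each subclass's docstring can then state the one thing it does differently, which addresses the navigation concern directly.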
/gemini review
Code Review
This pull request introduces a new stain-normalization project structure, including a comprehensive .gitignore, pyproject.toml with new dependencies (e.g., lightning, albumentations, torch), and a DataModule for handling various dataset stages (train, val, test, predict). It defines specific dataset configurations (predict.yaml, test.yaml, train.yaml, val.yaml) and image normalization parameters (normalize/default.yaml). Additionally, it adds several image modification transformations (CombinedModifications, ExposureAdjustment, HEDFactor, HSVModification) and type aliases for data handling. Review feedback suggests pinning GitLab dependencies in pyproject.toml to specific commit hashes for reproducible builds, making MLflow artifact URIs in dataset configurations more flexible using Hydra, optimizing the DataModule's setup method to prevent redundant dataset instantiation, and improving the robustness of image float conversion in ExposureAdjustment by using skimage.util.img_as_float.
Data loading follows the structure from the ML template repository. The modification is applied to create a modified image that is used as an input.
One of these data modifications is selected randomly, with a random intensity drawn from a specific range:
The stain separation / color space transformations from skimage take either uint8 [0, 255] images or float [0, 1] images and return float [0, 1].
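That contract (float [0, 1] in, float [0, 1] out) is also why the review suggests scaling in optical-density space without clipping and clipping only the reconstructed image. A toy single-channel round trip, not skimage's actual stain matrices, shows the pattern:

```python
import numpy as np

def scale_in_od(img: np.ndarray, factor: float) -> np.ndarray:
    """Scale a float [0, 1] image in optical-density (OD) space.

    Toy stand-in for separate_stains/combine_stains: convert to OD
    with -log10, scale without clipping, convert back to intensity,
    and only then clip the reconstructed image to [0, 1].
    """
    eps = 1e-6  # avoid log(0) for fully dark pixels
    od = -np.log10(np.clip(img, eps, 1.0))
    od *= factor                # scaling in OD space may overshoot
    rgb = np.power(10.0, -od)   # back to intensity space
    return np.clip(rgb, 0.0, 1.0).astype(np.float32)
```

For example, an intensity of 0.5 scaled by a factor of 2 in OD space comes back as 0.25, still inside [0, 1].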
Summary by CodeRabbit
Release Notes
New Features
Chores