Merged
2 changes: 1 addition & 1 deletion Dockerfile
@@ -67,7 +67,7 @@ RUN curl -fsSL https://github.com/libjpeg-turbo/libjpeg-turbo/releases/download/
WORKDIR /opt/app/

ARG PYTORCH_CUDA_INDEX_URL=https://download.pytorch.org/whl/cu128
ARG GIT_MODEL_DEPENDENCIES="git+https://github.com/lilab-stanford/MUSK.git git+https://github.com/Mahmoodlab/CONCH.git git+https://github.com/prov-gigapath/prov-gigapath.git"
ARG GIT_MODEL_DEPENDENCIES="git+https://github.com/lilab-stanford/MUSK.git git+https://github.com/Mahmoodlab/CONCH.git git+https://github.com/prov-gigapath/prov-gigapath.git git+https://github.com/facebookresearch/sam2.git"

RUN python -m ensurepip --upgrade \
&& python -m pip install --upgrade pip setuptools pip-tools \
2 changes: 2 additions & 0 deletions README.md
@@ -24,6 +24,8 @@ pip install git+https://github.com/Mahmoodlab/CONCH.git
pip install git+https://github.com/prov-gigapath/prov-gigapath.git
```

AtlasPatch-backed tissue segmentation is available through hs2p's `sam2` path in the bundled install.

## Python API

```python
19 changes: 16 additions & 3 deletions docs/cli.md
@@ -60,7 +60,7 @@ In practice, the config controls:
- preprocessing/tiling parameters
- output directory
- batch size, workers, precision, and GPU count
- whether to save tiling previews through `tiling.preview.save`
- whether to save mask and tiling previews through `tiling.preview.save_mask_preview` / `tiling.preview.save_tiling_preview`
- whether to save tile artifacts alongside slide-level outputs

## Common Overrides
@@ -79,7 +79,9 @@ Common overrides:
- `output_dir=/path/to/output`
- `speed.num_gpus=4`
- `speed.num_dataloader_workers=8` (`null` keeps auto mode)
- `tiling.preview.save=true`
- `tiling.preview.save_mask_preview=true`
- `tiling.preview.save_tiling_preview=true`
- `tiling.preview.tissue_contour_color=[157, 219, 129]`
- `tiling.params.region_tile_multiple=6` (hierarchical extraction)
- `model.name=...`
- `model.output_variant=...`
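
Each override above addresses a nested config key by its dotted path. A minimal stdlib sketch of that mapping, assuming JSON-style literals for values; `apply_overrides` is a hypothetical helper for illustration and is not slide2vec's actual override parser, which may behave differently:

```python
import json
from typing import Any


def apply_overrides(config: dict[str, Any], overrides: list[str]) -> dict[str, Any]:
    """Apply dotted `key=value` overrides to a nested dict (hypothetical helper)."""
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        try:
            # Handles true/false/null, numbers, and lists such as [157, 219, 129].
            value = json.loads(raw_value)
        except json.JSONDecodeError:
            # Anything else (for example a filesystem path) stays a plain string.
            value = raw_value
        node = config
        *parents, leaf = dotted_key.split(".")
        for part in parents:
            # Create intermediate sections on demand.
            node = node.setdefault(part, {})
        node[leaf] = value
    return config


config = apply_overrides(
    {},
    [
        "output_dir=/path/to/output",
        "speed.num_gpus=4",
        "tiling.preview.save_mask_preview=true",
        "tiling.preview.tissue_contour_color=[157, 219, 129]",
    ],
)
print(config["tiling"]["preview"])
```

The sketch only illustrates how `tiling.preview.save_mask_preview=true` ends up as a boolean under `tiling` > `preview`; it does no schema validation.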
@@ -138,6 +140,17 @@ slide2vec /path/to/config.yaml speed.num_gpus=4

If you pass `--run-on-cpu`, the CLI uses CPU execution instead.

## Segmentation Notes

`tiling.seg_params.method` controls how hs2p segments tissue before it extracts coordinates:

- `hsv` uses the HSV heuristic
- `otsu` thresholds the saturation channel with Otsu
- `threshold` applies a fixed saturation threshold
- `sam2` runs the AtlasPatch SAM2 tissue segmentation path on an internal `8.0 um/px` thumbnail

When `method: sam2` is selected, `sam2_checkpoint_path` and `sam2_config_path` are optional. If they are left blank, hs2p downloads the default AtlasPatch checkpoint and SAM2 config from Hugging Face.
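
The rules above can be sketched as a small validation step. This is an illustration only: `build_seg_params` and `SUPPORTED_METHODS` are hypothetical names, and the assumption that a blank path is represented as `None` is mine, not something the hs2p schema is confirmed to require.

```python
from typing import Any, Optional

# Method names as listed in the notes above.
SUPPORTED_METHODS = ("hsv", "otsu", "threshold", "sam2")


def build_seg_params(
    method: str = "hsv",
    sam2_checkpoint_path: Optional[str] = None,
    sam2_config_path: Optional[str] = None,
) -> dict[str, Any]:
    """Assemble a `tiling.seg_params`-style mapping (hypothetical helper)."""
    if method not in SUPPORTED_METHODS:
        raise ValueError(
            f"unsupported method {method!r}; expected one of {SUPPORTED_METHODS}"
        )
    params: dict[str, Any] = {"method": method}
    if method == "sam2":
        # Assumption: blank paths are kept as None so that hs2p can fall back
        # to its default AtlasPatch downloads, as described above.
        params["sam2_checkpoint_path"] = sam2_checkpoint_path or None
        params["sam2_config_path"] = sam2_config_path or None
    return params


print(build_seg_params("sam2"))
```

Rejecting unknown method names early keeps a typo from surfacing only once tiling has started.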

## Outputs

The CLI writes explicit artifact directories under the run output directory:
@@ -149,7 +162,7 @@ The CLI writes explicit artifact directories under the run output directory:
- `slide_embeddings/<sample_id>.pt` or `.npz`
- `slide_embeddings/<sample_id>.meta.json`
- optional `slide_latents/<sample_id>.pt` or `.npz`
- `process_list.csv` with backend provenance columns (`requested_backend`, `backend`) carried through from hs2p, plus embedding provenance columns (`encoder_name`, `output_variant`, `feature_kind`) once feature artifacts are written
- `process_list.csv` with hs2p provenance columns (`annotation`, `requested_backend`, `backend`) carried through from hs2p, plus embedding provenance columns (`encoder_name`, `output_variant`, `feature_kind`) once feature artifacts are written
- the resolved saved config file for the run
- `logs/` with the main log plus distributed worker stdout/stderr captures when multi-GPU workers are used
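
As a quick check on a finished run, the provenance columns named above can be verified with the standard library. This is a sketch under assumptions: `missing_provenance_columns` is a hypothetical helper, and the sample row (including the `sample_id` column and all values) is made up rather than taken from a real `process_list.csv`.

```python
import csv
import io

# Column names taken from the description above.
HS2P_COLUMNS = ("annotation", "requested_backend", "backend")
EMBEDDING_COLUMNS = ("encoder_name", "output_variant", "feature_kind")


def missing_provenance_columns(csv_file) -> list[str]:
    """Return expected provenance columns that are absent from the header."""
    header = next(csv.reader(csv_file), [])
    return [name for name in HS2P_COLUMNS + EMBEDDING_COLUMNS if name not in header]


# In-memory stand-in for process_list.csv; the real file has more columns and
# the values below are invented for illustration.
sample = io.StringIO(
    "sample_id,annotation,requested_backend,backend\n"
    "slide-1,tissue,auto,openslide\n"
)
print(missing_provenance_columns(sample))
```

For a real run, pass `open(path, newline="")` instead of the `io.StringIO` stand-in. The embedding columns are expected to be reported as missing until feature artifacts have been written.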

22 changes: 22 additions & 0 deletions docs/documentation.md
@@ -1,5 +1,27 @@
# Documentation Log

## 2026-04-18

- Aligned slide2vec with hs2p 4.0.0's unified tiling/sampling contract by preserving the new `annotation` column in process lists and translating preview configs to hs2p's `save_mask_preview` / `save_tiling_preview` / `tissue_contour_color` fields.

- Split the live tiling UI into a coordinates-extraction bar plus a separate preview-generation bar, and moved the final tiling summary into a dedicated `tiling.summary` event so it prints once at the very end.

## 2026-04-17

- Kept per-slide backend-selection notices, but switched Rich rendering to the console print path used by hs2p so they appear above the live bar without corrupting it.

## 2026-04-17

- Added a selective hs2p progress bridge so slide2vec keeps its own run/config summaries while still surfacing bridged tissue and backend-selection events from upstream tiling.

## 2026-04-17

- Removed slide2vec's extra preflight backend-resolution pass for `backend="auto"` so tiling now relies on hs2p's own resolver once per slide.

## 2026-04-17

- Aligned slide2vec's bundled preprocessing schema with hs2p 3.3.0 by switching the default tissue-segmentation config to the new `method`-based SAM2-capable schema and documenting AtlasPatch-backed `sam2` usage.

## 2026-04-17

- Reworked the docs landing page into a product-style hero with action buttons, feature cards, and a summary panel to make the site feel less like a flat index.
11 changes: 9 additions & 2 deletions docs/python-api.md
@@ -65,12 +65,16 @@ preprocessing = PreprocessingConfig(
requested_spacing_um=0.5,
requested_tile_size_px=224,
tissue_threshold=0.1,
segmentation={"downsample": 64},
segmentation={
"method": "hsv",
"downsample": 64,
},
filtering={"ref_tile_size": 224},
preview={
"save_mask_preview": False,
"save_tiling_preview": False,
"downsample": 32,
"tissue_contour_color": (157, 219, 129),
},
)
embedded = model.embed_slide("/path/to/slide.svs", preprocessing=preprocessing)
@@ -82,12 +86,13 @@ Common fields:
- `requested_tile_size_px`
- `tissue_threshold`
- `backend` - `"auto"`, `"cucim"`, `"openslide"`, `"vips"`, or `"asap"`
- `segmentation` - forwarded to hs2p's segmentation config; `method` supports `"hsv"`, `"otsu"`, `"threshold"`, or `"sam2"`
- `on_the_fly` - read tiles directly from WSI during embedding (default `True`)
- `use_supertiles` - group tiles into spatial blocks to reduce WSI read calls (default `True`)
- `read_coordinates_from` - reuse pre-extracted coordinates
- `read_tiles_from` - reuse pre-extracted tile tar archives
- `resume` - resume from a previous tiling run (default `False`)
- `preview`
- `preview` - forwarded to hs2p's preview config; `save_mask_preview` and `save_tiling_preview` control whether hs2p writes the two preview images, and `tissue_contour_color` controls the tissue contour RGB color

For hierarchical extraction, see the [dedicated section](#hierarchical-feature-extraction) below.

@@ -236,6 +241,8 @@ result = pipeline.run(manifest_path="/path/to/slides.csv")

The manifest schema matches HS2P and accepts optional `mask_path` and `spacing_at_level_0` columns. Patient-level models additionally require a `patient_id` column; see [Patient manifest format](models.md#patient-manifest-format).

When you select `segmentation.method="sam2"`, hs2p uses the AtlasPatch tissue segmentation path and can download the default checkpoint/config automatically if you do not provide local paths.

### Reusing pre-extracted coordinates

If you already have tiling coordinates from a previous run, use `run_with_coordinates(...)` to skip the tiling stage:
4 changes: 2 additions & 2 deletions pyproject.toml
@@ -21,7 +21,7 @@ classifiers = [
"Programming Language :: Python :: 3.13",
]
dependencies = [
"hs2p[asap,cucim,openslide,vips]>=3.2.1",
"hs2p[asap,cucim,openslide,sam2,vips]>=4.0.0",
"omegaconf",
"matplotlib",
"numpy<2",
@@ -88,7 +88,7 @@ fm = [
"pandas",
"pillow",
"rich",
"hs2p[asap,cucim,openslide,vips]>=3.2.1",
"hs2p[asap,cucim,openslide,sam2,vips]>=4.0.0",
"wandb",
"torch>=2.3,<2.8",
"torchvision>=0.18.0",
19 changes: 12 additions & 7 deletions slide2vec/api.py
@@ -72,8 +72,17 @@ def from_config(cls, cfg: Any) -> "PreprocessingConfig":
gpu_decode = bool(tiling.gpu_decode)
adaptive_batching = bool(tiling.adaptive_batching)
preview_cfg = tiling.preview
preview_save = bool(preview_cfg.save)
preview_downsample = int(preview_cfg.downsample)
preview_save = bool(preview_cfg.save_mask_preview)
preview_tiling_save = bool(preview_cfg.save_tiling_preview)
preview_kwargs: dict[str, Any] = {
"save_mask_preview": preview_save,
"save_tiling_preview": preview_tiling_save,
"downsample": int(preview_cfg.downsample),
}
preview_kwargs["tissue_contour_color"] = tuple(
int(channel) for channel in preview_cfg.tissue_contour_color
)
preview_kwargs["mask_overlay_alpha"] = float(preview_cfg.mask_overlay_alpha)
return cls(
backend=tiling.backend,
requested_spacing_um=float(tiling.params.requested_spacing_um),
@@ -104,11 +113,7 @@ def from_config(cls, cfg: Any) -> "PreprocessingConfig":
resume=bool(cfg.resume),
segmentation=dict(tiling.seg_params),
filtering=dict(tiling.filter_params),
preview={
"save_mask_preview": preview_save,
"save_tiling_preview": preview_save,
"downsample": preview_downsample,
},
preview=preview_kwargs,
)

def with_backend(self, backend: str) -> "PreprocessingConfig":
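
Configs written against the old schema carry a single `preview.save` flag, which the removed code applied to both previews. A minimal sketch of translating such a mapping to the split flags; `migrate_preview_config` is a hypothetical helper and is not part of this change:

```python
from typing import Any


def migrate_preview_config(preview: dict[str, Any]) -> dict[str, Any]:
    """Translate a legacy `preview` mapping to the split flags (hypothetical helper)."""
    migrated = dict(preview)
    if "save" in migrated:
        legacy = bool(migrated.pop("save"))
        # The pre-change code passed the single flag to both previews, so do
        # the same here unless the new keys are already set explicitly.
        migrated.setdefault("save_mask_preview", legacy)
        migrated.setdefault("save_tiling_preview", legacy)
    return migrated


print(migrate_preview_config({"save": True, "downsample": 32}))
```

Using `setdefault` lets an explicitly set `save_mask_preview` or `save_tiling_preview` win over the legacy flag, which is one possible precedence choice rather than documented behavior.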
13 changes: 8 additions & 5 deletions slide2vec/configs/default.yaml
@@ -38,12 +38,14 @@ tiling:
# downsample controls which pyramid level is read for tissue segmentation.
# Larger values are faster and use less memory; smaller values can improve mask precision.
downsample: 64 # find the closest downsample in the slide for tissue segmentation
sthresh: 8 # segmentation threshold (positive integer, using a higher threshold leads to less foreground and more background detection) (not used when use_otsu=True)
sthresh: 8 # segmentation threshold (positive integer, using a higher threshold leads to less foreground and more background detection) (not used when method="otsu")
sthresh_up: 255 # upper threshold value for scaling the binary mask
mthresh: 7 # median filter size (positive, odd integer)
close: 4 # additional morphological closing to apply following initial thresholding (positive integer)
use_otsu: false # use otsu's method instead of simple binary thresholding
use_hsv: true # use HSV thresholding instead of simple binary thresholding
method: "hsv" # tissue segmentation method: "hsv", "otsu", "threshold", or "sam2"
sam2_checkpoint_path: # optional when method="sam2"; if empty, hs2p downloads the default AtlasPatch checkpoint from Hugging Face
sam2_config_path: # optional local override for the SAM2 model config; if empty, hs2p downloads the default AtlasPatch config from Hugging Face
sam2_device: "cpu" # device for SAM2 inference, e.g. "cpu", "cuda", or "cuda:0"
filter_params:
ref_tile_size: ${tiling.params.requested_tile_size_px} # reference tile size at the target spacing
a_t: 4 # area filter threshold for tissue (positive integer, the minimum size of detected foreground contours to consider, relative to the reference tile size ref_tile_size, e.g. a value 10 means only detected foreground contours of size greater than 10 [ref_tile_size, ref_tile_size] tiles at spacing tiling.params.requested_spacing_um will be kept)
@@ -60,9 +62,10 @@ tiling:
blur_threshold: 50.0 # minimum blur score (higher is sharper)
qc_spacing_um: 2.0 # spacing at which pixel-based QC is evaluated
preview:
save: true # save preview images of slide tiling and mask overlays
save_mask_preview: true # save preview images of mask overlays
save_tiling_preview: true # save preview images of tile layouts
downsample: 32 # downsample to use for preview rendering
mask_overlay_color: [157, 219, 129] # RGB color used for tissue overlays in batch mask previews
tissue_contour_color: [157, 219, 129] # RGB color used for tissue contours in batch mask previews
mask_overlay_alpha: 0.5 # alpha used for tissue overlays in batch mask previews

speed:
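
The schema change above replaces the `use_otsu` / `use_hsv` booleans with a single `method` string. A sketch of how an older config could be mapped onto the new key; `legacy_flags_to_method` is a hypothetical helper, and the fallback to `"threshold"` is an assumption based on the removed comments, which describe both flags as alternatives to "simple binary thresholding":

```python
def legacy_flags_to_method(use_hsv: bool = False, use_otsu: bool = False) -> str:
    """Map the removed boolean flags to a `method` string (hypothetical helper)."""
    if use_hsv and use_otsu:
        # How the old schema resolved this combination is not shown in the
        # diff, so refuse to guess.
        raise ValueError("use_hsv and use_otsu are both set; choose a method explicitly")
    if use_hsv:
        return "hsv"
    if use_otsu:
        return "otsu"
    # Assumption: with both flags off, the old behavior was plain thresholding.
    return "threshold"


# The old defaults were use_otsu=false, use_hsv=true.
print(legacy_flags_to_method(use_hsv=True, use_otsu=False))
```

With the old defaults this yields `"hsv"`, which matches the new default `method: "hsv"`.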
88 changes: 66 additions & 22 deletions slide2vec/inference.py
@@ -18,8 +18,8 @@
import pandas as pd
import torch
from hs2p import SlideSpec, FilterConfig, PreviewConfig, SegmentationConfig, TilingConfig, load_tiling_result, tile_slides
from hs2p.wsi.backend import resolve_backend
from hs2p.utils.stderr import run_with_filtered_stderr, run_with_filtered_stdio
from hs2p import progress as hs2p_progress
from hs2p.utils.stderr import run_with_filtered_stderr
import numpy as np
from transformers.image_processing_utils import BaseImageProcessor

@@ -52,8 +52,11 @@
from slide2vec.model_settings import canonicalize_model_name
from slide2vec.runtime_types import LoadedModel
from slide2vec.progress import (
NullProgressReporter,
ProgressEvent as Slide2VecProgressEvent,
emit_progress,
emit_progress_event,
get_progress_reporter,
read_progress_events,
read_tiling_progress_snapshot,
)
@@ -81,6 +84,49 @@ class BatchTransformSpec:
resize_interpolation: str = "bilinear"


_BRIDGED_HS2P_PROGRESS_KINDS = {
"backend.selected",
"tissue.started",
"tissue.progress",
"tissue.finished",
"tiling.progress",
"tiling.finished",
"preview.started",
"preview.progress",
"preview.finished",
}


class _Hs2pProgressBridge:
def __init__(self, downstream) -> None:
self._downstream = downstream

def emit(self, event) -> None:
if event.kind not in _BRIDGED_HS2P_PROGRESS_KINDS:
return
self._downstream.emit(
Slide2VecProgressEvent(kind=event.kind, payload=dict(event.payload))
)

def close(self) -> None:
return None

def write_log(self, message: str, *, stream=None) -> None:
if hasattr(self._downstream, "write_log"):
self._downstream.write_log(message, stream=stream)


@contextmanager
def _bridge_hs2p_progress_to_slide2vec():
downstream = get_progress_reporter()
if isinstance(downstream, NullProgressReporter):
yield
return
bridge = _Hs2pProgressBridge(downstream)
with hs2p_progress.activate_progress_reporter(bridge):
yield


@dataclass(kw_only=True)
class PreparedBatch:
indices: Any
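
The bridge added in this hunk follows an allow-list forwarding pattern: only selected event kinds reach the downstream reporter, and each payload is copied before it is re-emitted. A self-contained sketch of that pattern, using stand-in `Event`, `ListReporter`, and `FilteringBridge` types that are hypothetical and do not mirror the real hs2p or slide2vec classes:

```python
from dataclasses import dataclass, field
from typing import Any


# Stand-in event type; the real events live in hs2p and slide2vec.
@dataclass
class Event:
    kind: str
    payload: dict[str, Any] = field(default_factory=dict)


class ListReporter:
    """Downstream reporter that records whatever it receives."""

    def __init__(self) -> None:
        self.events: list[Event] = []

    def emit(self, event: Event) -> None:
        self.events.append(event)


class FilteringBridge:
    """Forward only allow-listed event kinds, copying each payload."""

    def __init__(self, downstream: ListReporter, allowed: set[str]) -> None:
        self._downstream = downstream
        self._allowed = allowed

    def emit(self, event: Event) -> None:
        if event.kind not in self._allowed:
            return  # Dropped: the caller renders its own version of these.
        self._downstream.emit(Event(kind=event.kind, payload=dict(event.payload)))


reporter = ListReporter()
bridge = FilteringBridge(reporter, allowed={"tissue.started", "preview.finished"})
bridge.emit(Event("tissue.started", {"sample_id": "slide-1"}))
bridge.emit(Event("run.started"))  # Hypothetical kind that is not allow-listed.
print([event.kind for event in reporter.events])
```

Copying the payload with `dict(...)` means later mutation by the upstream emitter cannot change what the downstream reporter already received.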
@@ -370,7 +416,7 @@ def embed_slides(
output_dir=work_dir,
num_workers=execution.num_preprocessing_workers,
)
_emit_tiling_finished(
_emit_tiling_summary(
process_list_path,
expected_total=len(slide_records),
successful_slides=prepared_slides,
@@ -561,7 +607,7 @@ def embed_patients(
output_dir=work_dir,
num_workers=execution.num_preprocessing_workers,
)
_emit_tiling_finished(
_emit_tiling_summary(
process_list_path,
expected_total=len(slide_records),
successful_slides=prepared_slides,
@@ -850,7 +896,7 @@ def run_pipeline(
output_dir=output_dir,
num_workers=execution.num_preprocessing_workers,
)
_emit_tiling_finished(
_emit_tiling_summary(
process_list_path,
expected_total=len(slide_records),
successful_slides=successful_slides,
@@ -2624,7 +2670,7 @@ def _num_rows(data) -> int:
return len(data)


def _emit_tiling_finished(
def _emit_tiling_summary(
process_list_path: Path,
*,
expected_total: int,
@@ -2642,7 +2688,7 @@ def _emit_tiling_finished(
discovered_tiles=discovered_tiles,
)
emit_progress(
"tiling.finished",
"tiling.summary",
total=int(snapshot.total),
completed=int(snapshot.completed),
failed=int(snapshot.failed),
@@ -2770,19 +2816,6 @@ def _tile_slides(
) -> list[Any]:
_preload_asap_wholeslidedata(preprocessing)
tiling_cfg, segmentation_cfg, filtering_cfg, preview_cfg, read_coordinates_from, resume = _build_hs2p_configs(preprocessing)
for slide in slides:
backend_selection = resolve_backend(
tiling_cfg.requested_backend,
wsi_path=slide.image_path,
mask_path=slide.mask_path,
)
if backend_selection.reason is not None:
emit_progress(
"backend.selected",
sample_id=slide.sample_id,
backend=backend_selection.backend,
reason=backend_selection.reason,
)

def _run_tile_slides():
return tile_slides(
@@ -2799,7 +2832,8 @@ def _run_tile_slides():
jpeg_backend=preprocessing.jpeg_backend,
)

return run_with_filtered_stdio(_run_tile_slides)
with _bridge_hs2p_progress_to_slide2vec():
return run_with_filtered_stderr(_run_tile_slides)


def _preload_asap_wholeslidedata(preprocessing: PreprocessingConfig) -> None:
@@ -2886,6 +2920,16 @@ def _resolve_path_str(value: Any) -> str | None:
process_df.to_csv(process_list_path, index=False)


def _build_preview_config(preview: dict[str, Any]) -> PreviewConfig:
return PreviewConfig(
save_mask_preview=bool(preview["save_mask_preview"]),
save_tiling_preview=bool(preview["save_tiling_preview"]),
downsample=int(preview["downsample"]),
tissue_contour_color=tuple(int(channel) for channel in preview["tissue_contour_color"]),
mask_overlay_alpha=float(preview["mask_overlay_alpha"]),
)


def _build_hs2p_configs(preprocessing: PreprocessingConfig):
requested_tile_size_px = (
preprocessing.requested_region_size_px
@@ -2902,7 +2946,7 @@ def _build_hs2p_configs(preprocessing: PreprocessingConfig):
)
segmentation_cfg = SegmentationConfig(**dict(preprocessing.segmentation))
filtering_cfg = FilterConfig(**dict(preprocessing.filtering))
preview_cfg = PreviewConfig(**dict(preprocessing.preview))
preview_cfg = _build_preview_config(dict(preprocessing.preview))
return (
tiling_cfg,
segmentation_cfg,
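
The last hunk swaps `PreviewConfig(**dict(...))` for an explicit builder. A sketch of what that difference means in practice, using a stand-in dataclass: `PreviewConfigSketch` and `build_preview` are hypothetical, and the real hs2p `PreviewConfig` may define other fields or defaults.

```python
from dataclasses import dataclass
from typing import Any


# Stand-in for hs2p's PreviewConfig with only the fields the diff passes.
@dataclass(frozen=True)
class PreviewConfigSketch:
    save_mask_preview: bool
    save_tiling_preview: bool
    downsample: int
    tissue_contour_color: tuple[int, ...]
    mask_overlay_alpha: float


def build_preview(preview: dict[str, Any]) -> PreviewConfigSketch:
    """Pick and coerce known keys; extra keys are ignored, missing ones raise KeyError."""
    return PreviewConfigSketch(
        save_mask_preview=bool(preview["save_mask_preview"]),
        save_tiling_preview=bool(preview["save_tiling_preview"]),
        downsample=int(preview["downsample"]),
        tissue_contour_color=tuple(int(c) for c in preview["tissue_contour_color"]),
        mask_overlay_alpha=float(preview["mask_overlay_alpha"]),
    )


raw = {
    "save_mask_preview": 1,
    "save_tiling_preview": 0,
    "downsample": "32",
    "tissue_contour_color": [157, 219, 129],
    "mask_overlay_alpha": "0.5",
    "mask_overlay_color": [157, 219, 129],  # Stale key from an older config.
}
print(build_preview(raw))
try:
    PreviewConfigSketch(**raw)
except TypeError as error:
    print(f"direct unpacking fails: {error}")
```

In this sketch the explicit builder tolerates the stale `mask_overlay_color` key and coerces loosely typed values, while a mapping that omits any of the five keys raises `KeyError`, so callers need either complete mappings or defaults applied upstream.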