Commit

Merge pull request #235 from Dana-Farber-AIOS/dev-cicd
add pypi publish to github workflows
ryanccarelli committed Nov 24, 2021
2 parents ef28a6d + e33d19d commit 6e42087
Showing 13 changed files with 178 additions and 38 deletions.
39 changes: 39 additions & 0 deletions .github/workflows/publish-to-pypi.yml
@@ -0,0 +1,39 @@
name: Publish PathML distribution to PyPI and TestPyPI

on:
  push:
    tags: ['v[0-9].[0-9]+.[0-9]+']

jobs:
  build-n-publish:
    name: Build and publish PathML distribution to PyPI and TestPyPI
    runs-on: ubuntu-18.04
    if: startsWith(github.ref, 'refs/tags/')  # tag pushes set github.ref to refs/tags/<tag>
    steps:
    - uses: actions/checkout@master
    - name: Set up Python 3.9
      uses: actions/setup-python@v1
      with:
        python-version: 3.9
    - name: Install pypa/build
      run: >-
        python -m
        pip install
        build
        --user
    - name: Build a binary wheel and a source tarball
      run: >-
        python -m
        build
        --sdist
        --wheel
        --outdir dist/
    - name: Publish distribution to Test PyPI
      uses: pypa/gh-action-pypi-publish@master
      with:
        password: ${{ secrets.TEST_PYPI_API_TOKEN }}
        repository_url: https://test.pypi.org/legacy/
    - name: Publish distribution to PyPI
      uses: pypa/gh-action-pypi-publish@master
      with:
        password: ${{ secrets.PYPI_API_TOKEN }}
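The `tags` filter is a GitHub Actions filter pattern, not a regular expression. The sketch below approximates it with Python's `re` module, assuming that `[0-9]` matches one digit, `+` repeats the preceding character class one or more times, `.` is a literal dot, and the pattern must match the whole tag name. `RELEASE_TAG` and `matches_release_tag` are illustrative names, not part of the workflow. Under those assumptions, `v2.0.10` matches but `v10.0.0` does not, because the major-version slot accepts a single digit.

```python
import re

# Assumed regex equivalent of the workflow's tag filter 'v[0-9].[0-9]+.[0-9]+':
# '[0-9]' is one digit, '+' repeats the previous class, '.' is a literal dot.
RELEASE_TAG = re.compile(r"v[0-9]\.[0-9]+\.[0-9]+")


def matches_release_tag(tag: str) -> bool:
    """Return True if `tag` would (under this sketch's assumptions) trigger the workflow."""
    # fullmatch, because the filter is assumed to match the whole tag name
    return RELEASE_TAG.fullmatch(tag) is not None


for tag in ["v2.0.0", "v2.0.10", "v10.0.0", "2.0.0", "v2.0.0rc1"]:
    print(tag, matches_release_tag(tag))
```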
3 changes: 2 additions & 1 deletion README.md
@@ -2,10 +2,11 @@

<img src=https://raw.githubusercontent.com/Dana-Farber-AIOS/pathml/master/docs/source/_static/images/overview.png width="750">

![tests](https://github.com/Dana-Farber-AIOS/pathml/actions/workflows/tests-conda.yml/badge.svg?branch=master)
![tests](https://github.com/Dana-Farber-AIOS/pathml/actions/workflows/tests-conda.yml/badge.svg?branch=dev)
[![Documentation Status](https://readthedocs.org/projects/pathml/badge/?version=latest)](https://pathml.readthedocs.io/en/latest/?badge=latest)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
[![PyPI version](https://img.shields.io/pypi/v/pathml)](https://pypi.org/project/pathml/)
[![Downloads](https://pepy.tech/badge/pathml)](https://pepy.tech/project/pathml)
[![codecov](https://codecov.io/gh/Dana-Farber-AIOS/pathml/branch/master/graph/badge.svg?token=UHSQPTM28Y)](https://codecov.io/gh/Dana-Farber-AIOS/pathml)

A toolkit for computational pathology and machine learning.
@@ -11,7 +11,9 @@
"\n",
"## Consistent API for loading images from diverse modalities and file formats\n",
"\n",
"`PathML` provides support for loading a wide array of imaging modalities and file formats under a standardized syntax. In this vignette, we highlight code snippets for loading a range of image types ranging from brightfield H&E and IHC to highly multiplexed immunofluorescence and spatial expression and proteomics, from small images to gigapixel scale:\n",
"`PathML` provides support for loading a wide array of imaging modalities and file formats under a standardized syntax. \n",
"\n",
"In this vignette, we highlight code snippets for loading image types ranging from brightfield H&E and IHC to highly multiplexed immunofluorescence and spatial expression and proteomics, and from small images to gigapixel scale:\n",
"\n",
"| Imaging modality | File format | Source | Image dimensions (X, Y, Z, C, T)\n",
"| :- | :- | :- | :- \n",
@@ -379,7 +381,7 @@
"\n",
"Full documentation of the `PathML` API is available at https://pathml.org. \n",
"\n",
"Full code for this vignette is available at https://github.com/Dana-Farber-AIOS/pathml/tree/master/examples/vignettes/"
"Full code for this vignette is available at https://github.com/Dana-Farber-AIOS/pathml/tree/master/examples/manuscript_vignettes_stable"
]
}
],
12 changes: 9 additions & 3 deletions examples/manuscript_vignettes_stable/workflow_HE_vignette.ipynb
@@ -9,7 +9,13 @@
"\n",
"## Example workflow for H&E images\n",
"\n",
"Here we demonstrate a typical workflow for preprocessing of H&E images. The image used in this example is publicly avilalable for download: http://openslide.cs.cmu.edu/download/openslide-testdata/Aperio/\n",
"Here we demonstrate a typical workflow for preprocessing of H&E images, consisting of the following steps:\n",
"\n",
"1. Loading the raw image\n",
"2. Defining a simple preprocessing pipeline for tissue detection\n",
"3. Creating a PyTorch DataLoader for interfacing with any downstream machine learning model\n",
"\n",
"The image used in this example is publicly available for download: http://openslide.cs.cmu.edu/download/openslide-testdata/Aperio/\n",
"\n",
"**a. Load the image**"
]
@@ -159,11 +165,11 @@
"\n",
"1. Loading the raw image\n",
"2. Define a simple preprocessing pipeline for tissue detection\n",
"3. Create a PyTorch DataLoader for with any downstream machine learning model\n",
"3. Create a PyTorch DataLoader for interfacing with any downstream machine learning model\n",
"\n",
"Full documentation of the `PathML` API is available at https://pathml.org. \n",
"\n",
"Full code for this vignette is available at https://github.com/Dana-Farber-AIOS/pathml/tree/master/examples/vignettes/"
"Full code for this vignette is available at https://github.com/Dana-Farber-AIOS/pathml/tree/master/examples/manuscript_vignettes_stable"
]
}
],
Binary file not shown.
21 changes: 15 additions & 6 deletions examples/manuscript_vignettes_stable/workflow_IF_vignette.ipynb
@@ -10,7 +10,17 @@
"\n",
"## Example workflow for immunofluorescence images\n",
"\n",
"Here we demonstrate a typical workflow for preprocessing of immunofluorescence images. The image used in this example is a tissue microarray (TMA) generated on the CODEX spatial proteomics imaging platform, from Schurch et al., *Coordinated Cellular Neighborhoods Orchestrate Antitumoral Immunity at the Colorectal Cancer Invasive Front* (Cell, 2020). The image used in this example is publicly avilalable for download from the Cancer Imaging Archive: https://doi.org/10.7937/tcia.2020.fqn0-0326"
"Here we demonstrate a typical workflow for preprocessing of immunofluorescence images, consisting of the following steps:\n",
"\n",
"1. Loading a raw image in TIFF format\n",
"2. Defining a preprocessing pipeline for cell segmentation and marker quantification for each cell\n",
"3. Integrating with other commonly used tools such as `Scanpy` for working with the quantified cell-level data:\n",
" - dimensionality reduction\n",
" - clustering\n",
" - co-occurrence analysis\n",
" - visualization \n",
"\n",
"The image used in this example is a tissue microarray (TMA) generated on the CODEX spatial proteomics imaging platform, from Schurch et al., *Coordinated Cellular Neighborhoods Orchestrate Antitumoral Immunity at the Colorectal Cancer Invasive Front* (Cell, 2020). It is publicly available for download from the Cancer Imaging Archive: https://doi.org/10.7937/tcia.2020.fqn0-0326"
]
},
{
@@ -415,25 +425,24 @@
{
"cell_type": "markdown",
"metadata": {
"jp-MarkdownHeadingCollapsed": true,
"tags": []
},
"source": [
"### Summary\n",
"\n",
"Here we demonstrate a complete `PathML` workflow for analyzing immunofluorescence images:\n",
"\n",
"1. Loading raw image in TIFF format\n",
"2. Define a preprocessing pipeline for cell segmentation and marker quantification for each cell\n",
"3. Integrate with other commonly used tools such as `Scanpy` for working with the quantified cell-level data:\n",
"1. Loading a raw image in TIFF format\n",
"2. Defining a preprocessing pipeline for cell segmentation and marker quantification for each cell\n",
"3. Integrating with other commonly used tools such as `Scanpy` for working with the quantified cell-level data:\n",
" - dimensionality reduction\n",
" - clustering\n",
" - co-occurrence analysis\n",
" - visualization \n",
"\n",
"Full documentation of the `PathML` API is available at https://pathml.org. \n",
"\n",
"Full code for this vignette is available at https://github.com/Dana-Farber-AIOS/pathml/tree/master/examples/vignettes/"
"Full code for this vignette is available at https://github.com/Dana-Farber-AIOS/pathml/tree/master/examples/manuscript_vignettes_stable"
]
}
],
7 changes: 7 additions & 0 deletions pathml/core/slide_data.py
@@ -251,6 +251,7 @@ def run(
        level=0,
        tile_pad=False,
        overwrite_existing_tiles=False,
        write_dir=None,
    ):
        """
        Run a preprocessing pipeline on SlideData.
@@ -269,6 +270,9 @@ def run(
                Defaults to ``False``.
            overwrite_existing_tiles (bool): Whether to overwrite existing tiles. If ``False``, running a pipeline will
                fail if ``tiles is not None``. Defaults to ``False``.
            write_dir (str): Path to directory to write the processed slide to. The processed SlideData object
                will be written to the directory immediately after the pipeline has completed running.
                The output path is ``<write_dir>/<slide.name>.h5path``. Defaults to ``None``, in which case nothing is written.
        """
        assert isinstance(
            pipeline, pathml.preprocessing.pipeline.Pipeline
@@ -320,6 +324,9 @@ def run(
                pipeline.apply(tile)
                self.tiles.add(tile)

        if write_dir:
            self.write(Path(write_dir) / f"{self.name}.h5path")

    @property
    def shape(self):
        """
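With `write_dir` set, `SlideData.run` writes the processed slide to `<write_dir>/<slide.name>.h5path` after the pipeline finishes; with the default `None`, nothing is written. The path logic can be isolated as below. `h5path_for` is a hypothetical helper written for illustration only; the diff inlines the equivalent expression in `run`.

```python
from pathlib import Path
from typing import Optional, Union


def h5path_for(write_dir: Optional[Union[str, Path]], slide_name: str) -> Optional[Path]:
    """Hypothetical helper: where SlideData.run would write the slide, or None."""
    # A falsy write_dir (None or "") means "do not write", mirroring `if write_dir:`
    if not write_dir:
        return None
    # Same construction as the diff: Path(write_dir) / f"{self.name}.h5path"
    return Path(write_dir) / f"{slide_name}.h5path"


print(h5path_for("/tmp/out", "testwrite"))
print(h5path_for(None, "testwrite"))
```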
16 changes: 12 additions & 4 deletions pathml/core/slide_dataset.py
@@ -3,9 +3,11 @@
License: GNU GPL 2.0
"""

from torch.utils.data import ConcatDataset
from pathlib import Path
import reprlib
from pathlib import Path

import dask.distributed
from torch.utils.data import ConcatDataset


class SlideDataset:
@@ -36,17 +38,23 @@ def __repr__(self):
        out += ")"
        return out

    def run(self, pipeline, **kwargs):
    def run(self, pipeline, client=None, distributed=True, **kwargs):
        """
        Runs a preprocessing pipeline on all slides in the dataset
        Args:
            pipeline (pathml.preprocessing.pipeline.Pipeline): Preprocessing pipeline.
            client (dask.distributed.Client): Dask client. If ``None`` and ``distributed`` is ``True``, a new client is created and shared by all slides.
            distributed (bool): Whether to distribute computation using ``client``. Defaults to ``True``.
            kwargs (dict): keyword arguments passed to :meth:`~pathml.core.slide_data.SlideData.run` for each slide
        """
        # run preprocessing
        if client is None and distributed:
            client = dask.distributed.Client()
        for slide in self.slides:
            slide.run(pipeline, **kwargs)
            slide.run(
                pipeline=pipeline, client=client, distributed=distributed, **kwargs
            )

    def reshape(self, shape, centercrop=False):
        for slide in self.slides:
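`SlideDataset.run` now creates one `dask.distributed.Client` when none is passed and `distributed=True`, and hands that same client to every `SlideData.run` call. The dependency-free sketch below mirrors that decision; `resolve_client`, its `make_client` parameter (a stand-in for `dask.distributed.Client`), and `FakeClient` are hypothetical. It also returns whether the client was created here, an assumption added so a caller could close only clients it owns; the diff itself does not close the client it creates.

```python
from typing import Any, Callable, Optional, Tuple


def resolve_client(
    client: Optional[Any], distributed: bool, make_client: Callable[[], Any]
) -> Tuple[Optional[Any], bool]:
    """Return (client, created) following the branch in SlideDataset.run (sketch).

    `make_client` is a hypothetical hook standing in for dask.distributed.Client.
    """
    if client is None and distributed:
        # No client supplied: create one, to be shared across all slides
        return make_client(), True
    # Either a client was supplied, or distributed=False (client may stay None)
    return client, False


class FakeClient:
    """Minimal stand-in for a dask client, for illustration only."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


client, created = resolve_client(None, True, FakeClient)
# ... run each slide with `client` here ...
if created:
    client.close()
```

Passing an existing client, or `distributed=False`, leaves `created` false, so the caller never closes a client it did not start.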
15 changes: 12 additions & 3 deletions pathml/ml/dataset.py
@@ -14,7 +14,7 @@ class TileDataset(torch.utils.data.Dataset):
    Each item is a tuple of (``tile_image``, ``tile_masks``, ``tile_labels``, ``slide_labels``) where:
    - ``tile_image`` is a torch.Tensor of shape (n_channels, tile_height, tile_width)
    - ``tile_image`` is a torch.Tensor of shape (C, H, W) or (T, C, Z, H, W)
    - ``tile_masks`` is a torch.Tensor of shape (n_masks, tile_height, tile_width)
    - ``tile_labels`` is a dict
    - ``slide_labels`` is a dict
@@ -83,8 +83,17 @@ def __getitem__(self, ix):

        labels = {key: val for key, val in self.h5["tiles"][k]["labels"].attrs.items()}

        # swap axes from HWC to CHW for pytorch
        im = tile_image.transpose(2, 0, 1)
        if tile_image.ndim == 3:
            # swap axes from HWC to CHW for pytorch
            im = tile_image.transpose(2, 0, 1)
        elif tile_image.ndim == 5:
            # in this case, we assume that we have XYZCT channel order (OME-TIFF)
            # so we swap axes to TCZYX for batching
            im = tile_image.transpose(4, 3, 2, 1, 0)
        else:
            raise NotImplementedError(
                f"tile image has shape {tile_image.shape}. Expecting an image with 3 dims (HWC) or 5 dims (XYZCT)"
            )

        masks = np.stack(list(masks.values()), axis=0) if masks else None
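The new branch in `TileDataset.__getitem__` reorders a 3-D tile from HWC to CHW and a 5-D tile from XYZCT to TCZYX by reversing all five axes. The NumPy sketch below repeats both transposes on zero-filled arrays so the resulting shapes can be checked; `to_torch_order` is a hypothetical stand-in for that branch and, like the code comment, it assumes 5-D tiles arrive in XYZCT (OME-TIFF) order.

```python
import numpy as np


def to_torch_order(tile_image: np.ndarray) -> np.ndarray:
    """Reorder tile axes the way the new TileDataset.__getitem__ branch does (sketch)."""
    if tile_image.ndim == 3:
        # HWC -> CHW
        return tile_image.transpose(2, 0, 1)
    if tile_image.ndim == 5:
        # XYZCT -> TCZYX: reverse all five axes
        return tile_image.transpose(4, 3, 2, 1, 0)
    raise NotImplementedError(
        f"tile image has shape {tile_image.shape}. "
        "Expecting an image with 3 dims (HWC) or 5 dims (XYZCT)"
    )


rgb = np.zeros((32, 48, 3))          # H=32, W=48, C=3
print(to_torch_order(rgb).shape)     # (3, 32, 48)

ome = np.zeros((48, 32, 2, 4, 1))    # X=48, Y=32, Z=2, C=4, T=1
print(to_torch_order(ome).shape)     # (1, 4, 2, 32, 48)
```

In the 5-D case the item is therefore ordered (T, C, Z, Y, X), i.e. (T, C, Z, H, W).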
31 changes: 18 additions & 13 deletions pathml/preprocessing/transforms.py
@@ -13,16 +13,13 @@
import pathml.core
import pathml.core.slide_data
import spams
from pathml.utils import (
    RGB_to_GREY,
    RGB_to_HSI,
    RGB_to_HSV,
    RGB_to_OD,
    normalize_matrix_cols,
)
from pathml.utils import (RGB_to_GREY, RGB_to_HSI, RGB_to_HSV, RGB_to_OD,
                          normalize_matrix_cols)
from skimage import restoration
from skimage.exposure import (equalize_adapthist, equalize_hist,
                              rescale_intensity)
from skimage.measure import regionprops_table
from skimage.exposure import equalize_hist, equalize_adapthist, rescale_intensity


# Base class
class Transform:
@@ -151,15 +148,17 @@ class RescaleIntensity(Transform):
            '2-tuple' : Use range_values as explicit min/max intensities.
    """

    def __init__(self, in_range='image', out_range = 'dtype'):
    def __init__(self, in_range="image", out_range="dtype"):
        self.in_range = in_range
        self.out_range = out_range

    def __repr__(self):
        return f"RescaleIntensity(in_range={self.in_range}, out_range={self.out_range})"

    def F(self, image):
        image = rescale_intensity(image, in_range=self.in_range, out_range=self.out_range)
        image = rescale_intensity(
            image, in_range=self.in_range, out_range=self.out_range
        )
        return image

    def apply(self, tile):
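`RescaleIntensity.F` forwards its arguments to `skimage.exposure.rescale_intensity`. With `in_range="image"` and an explicit output range, that call amounts to a linear min-max mapping, sketched below in plain NumPy. `minmax_rescale` is hypothetical: it does not reproduce scikit-image's handling of `out_range="dtype"` or other named ranges, and returning `out_min` for a constant image is this sketch's own choice.

```python
import numpy as np


def minmax_rescale(image: np.ndarray, out_min: float = 0.0, out_max: float = 1.0) -> np.ndarray:
    """Linearly map [image.min(), image.max()] onto [out_min, out_max] (sketch)."""
    image = image.astype(np.float64)
    imin, imax = image.min(), image.max()
    if imax == imin:
        # Constant image: avoid division by zero (a choice made for this sketch)
        return np.full_like(image, out_min)
    scaled = (image - imin) / (imax - imin)  # now in [0, 1]
    return scaled * (out_max - out_min) + out_min


print(minmax_rescale(np.array([10, 15, 20])).tolist())  # [0.0, 0.5, 1.0]
```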
@@ -187,7 +186,7 @@ def __repr__(self):
        return f"HistogramEqualization(nbins={self.nbins}, mask = {self.mask})"

    def F(self, image):
        image = equalize_hist(image, nbins=self.nbins, mask = self.mask)
        image = equalize_hist(image, nbins=self.nbins, mask=self.mask)
        return image

    def apply(self, tile):
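`HistogramEqualization.F` calls `skimage.exposure.equalize_hist`, which maps each pixel through the image's normalized cumulative histogram. A NumPy-only sketch of that technique follows; `equalize_hist_sketch` is hypothetical, ignores the `mask` argument, and is meant to illustrate the idea rather than to match scikit-image's output exactly.

```python
import numpy as np


def equalize_hist_sketch(image: np.ndarray, nbins: int = 256) -> np.ndarray:
    """Global histogram equalization via the normalized CDF (sketch, no mask)."""
    flat = image.ravel().astype(np.float64)
    hist, bin_edges = np.histogram(flat, bins=nbins)
    bin_centers = (bin_edges[:-1] + bin_edges[1:]) / 2
    cdf = np.cumsum(hist).astype(np.float64)
    cdf /= cdf[-1]  # normalize so the brightest bin maps to 1.0
    # Look up each pixel's value on the CDF curve (linear interpolation)
    out = np.interp(flat, bin_centers, cdf)
    return out.reshape(image.shape)


rng = np.random.default_rng(0)
img = rng.normal(size=(16, 16))
eq = equalize_hist_sketch(img)
print(eq.min(), eq.max())
```

The mapping is monotonic, so pixel ordering is preserved while intensities are spread more evenly over [0, 1].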
@@ -209,7 +208,7 @@ class AdaptiveHistogramEqualization(Transform):
        nbins (int): Number of gray bins for histogram (“data range”).
    """

    def __init__(self, kernel_size=None, clip_limit = 0.3, nbins=256):
    def __init__(self, kernel_size=None, clip_limit=0.3, nbins=256):
        self.kernel_size = kernel_size
        self.clip_limit = clip_limit
        self.nbins = nbins
@@ -218,7 +217,12 @@ def __repr__(self):
        return f"AdaptiveHistogramEqualization(kernel_size={self.kernel_size}, clip_limit={self.clip_limit}, nbins={self.nbins})"

    def F(self, image):
        image = equalize_adapthist(image, kernel_size=self.kernel_size, clip_limit=self.clip_limit, nbins=self.nbins)
        image = equalize_adapthist(
            image,
            kernel_size=self.kernel_size,
            clip_limit=self.clip_limit,
            nbins=self.nbins,
        )
        return image

    def apply(self, tile):
@@ -1359,6 +1363,7 @@ def F(self, image):
            nuclear_segmentation_predictions = np.squeeze(
                nuclear_segmentation_predictions, axis=0
            )
            del model
            return cell_segmentation_predictions, nuclear_segmentation_predictions
        else:
            raise NotImplementedError(f"model={self.model} currently not supported.")
20 changes: 20 additions & 0 deletions tests/core_tests/test_slide_data.py
@@ -170,3 +170,23 @@ def compare_dict_ignore_order(d1, d2):
if d1[k1] != d2[k2]:
return False
return True


@pytest.mark.parametrize("write", [True, False])
def test_run_and_write(tmpdir, write):
    wsi = HESlide("tests/testdata/small_HE.svs", backend="openslide", name="testwrite")
    pipe = Pipeline()

    if write:
        write_dir_arg = tmpdir
    else:
        write_dir_arg = None

    wsi.run(pipe, tile_size=500, distributed=False, write_dir=write_dir_arg)

    written_path = tmpdir / "testwrite.h5path"

    if write:
        assert written_path.isfile()
    else:
        assert not written_path.isfile()
21 changes: 20 additions & 1 deletion tests/core_tests/test_slide_dataset.py
@@ -3,7 +3,7 @@
License: GNU GPL 2.0
"""

from dask.distributed import Client
import pytest
from pathlib import Path

from pathml.core import SlideData, Tile
@@ -38,3 +38,22 @@ def test_run_pipeline_and_tile_dataset_and_reshape(slide_dataset):
    tile_after_reshape = slide_dataset[0].tiles[0]
    assert isinstance(tile_after_reshape, Tile)
    assert tile_after_reshape.image.shape == (25, 25, 3)


@pytest.mark.parametrize("write", [True, False])
def test_run_and_write_dataset(tmpdir, write, slide_dataset):
    pipe = Pipeline()

    if write:
        write_dir_arg = tmpdir
    else:
        write_dir_arg = None

    slide_dataset.run(pipe, tile_size=500, distributed=False, write_dir=write_dir_arg)

    for s in slide_dataset:
        written_path = tmpdir / f"{s.name}.h5path"
        if write:
            assert written_path.isfile()
        else:
            assert not written_path.isfile()
