Merge pull request #235 from Dana-Farber-AIOS/dev-cicd

add pypi publish to github workflows
Dana-Farber-AIOS · Nov 24, 2021 · 6e42087
2 parents ef28a6d + e33d19d
commit 6e42087
Showing 13 changed files with 178 additions and 38 deletions.
diff --git a/.github/workflows/publish-to-pypi.yml b/.github/workflows/publish-to-pypi.yml
@@ -0,0 +1,39 @@
+name: Publish PathML distribution to PyPI and TestPyPI
+
+on:
+  push:
+    tags: ['v[0-9].[0-9]+.[0-9]+']
+
+jobs:
+  build-n-publish:
+    name: Build and publish PathML distribution to PyPI and TestPyPI
+    runs-on: ubuntu-18.04
+    if: github.ref == 'refs/heads/master'
+    steps:
+    - uses: actions/checkout@master
+    - name: Set up Python 3.9
+      uses: actions/setup-python@v1
+      with:
+        python-version: 3.9
+    - name: Install pypa/build
+      run: >-
+        python -m
+        pip install
+        build
+        --user
+    - name: Build a binary wheel and a source tarball
+      run: >-
+        python -m
+        build
+        --sdist
+        --wheel
+        --outdir dist/
+    - name: Publish distribution to Test PyPI
+      uses: pypa/gh-action-pypi-publish@master
+      with:
+        password: ${{ secrets.TEST_PYPI_API_TOKEN }}
+        repository_url: https://test.pypi.org/legacy/
+    - name: Publish distribution to PyPI
+      uses: pypa/gh-action-pypi-publish@master
+      with:
+        password: ${{ secrets.PYPI_API_TOKEN }}
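
As a rough check of which tag names the `tags` filter in the workflow above would trigger on, the filter `v[0-9].[0-9]+.[0-9]+` can be approximated with a Python regular expression. This is only a sketch: it assumes that in GitHub's filter syntax `.` is a literal dot and `+` means "one or more of the preceding character". The helper name `tag_triggers_publish` is hypothetical and not part of the workflow.

```python
import re

# Approximate regex translation of the workflow tag filter 'v[0-9].[0-9]+.[0-9]+'.
# Assumption: '.' is a literal dot in the filter syntax, so it is escaped here.
TAG_PATTERN = re.compile(r"v[0-9]\.[0-9]+\.[0-9]+")


def tag_triggers_publish(tag: str) -> bool:
    """Return True if the whole tag name matches the release pattern."""
    return TAG_PATTERN.fullmatch(tag) is not None


if __name__ == "__main__":
    # Print the match result for a few illustrative tag names.
    for tag in ["v1.2.3", "v2.10.0", "v10.0.0", "1.2.3", "v1.2"]:
        print(tag, tag_triggers_publish(tag))
```

Under this reading, a two-digit major version such as `v10.0.0` would not match, because the first character class has no `+` after it.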
diff --git a/README.md b/README.md
@@ -2,10 +2,11 @@
 
 <img src=https://raw.githubusercontent.com/Dana-Farber-AIOS/pathml/master/docs/source/_static/images/overview.png width="750">
 
-![tests](https://github.com/Dana-Farber-AIOS/pathml/actions/workflows/tests-conda.yml/badge.svg?branch=master)
+![tests](https://github.com/Dana-Farber-AIOS/pathml/actions/workflows/tests-conda.yml/badge.svg?branch=dev)
 [![Documentation Status](https://readthedocs.org/projects/pathml/badge/?version=latest)](https://pathml.readthedocs.io/en/latest/?badge=latest)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 [![PyPI version](https://img.shields.io/pypi/v/pathml)](https://pypi.org/project/pathml/)
+[![Downloads](https://pepy.tech/badge/pathml)](https://pepy.tech/project/pathml)
 [![codecov](https://codecov.io/gh/Dana-Farber-AIOS/pathml/branch/master/graph/badge.svg?token=UHSQPTM28Y)](https://codecov.io/gh/Dana-Farber-AIOS/pathml)
 
 A toolkit for computational pathology and machine learning.

diff --git a/examples/manuscript_vignettes_stable/loading_images_vignette.ipynb b/examples/manuscript_vignettes_stable/loading_images_vignette.ipynb
@@ -11,7 +11,9 @@
     "\n",
     "## Consistent API for loading images from diverse modalities and file formats\n",
     "\n",
-    "`PathML` provides support for loading a wide array of imaging modalities and file formats under a standardized syntax. In this vignette, we highlight code snippets for loading a range of image types ranging from brightfield H&E and IHC to highly multiplexed immunofluorescence and spatial expression and proteomics, from small images to gigapixel scale:\n",
+    "`PathML` provides support for loading a wide array of imaging modalities and file formats under a standardized syntax. \n",
+    "\n",
+    "In this vignette, we highlight code snippets for loading a range of image types ranging from brightfield H&E and IHC to highly multiplexed immunofluorescence and spatial expression and proteomics, from small images to gigapixel scale:\n",
     "\n",
     "| Imaging modality | File format | Source | Image dimensions (X, Y, Z, C, T)\n",
     "| :- | :- | :- | :- \n",
@@ -379,7 +381,7 @@
     "\n",
     "Full documentation of the `PathML` API is available at https://pathml.org.  \n",
     "\n",
-    "Full code for this vignette is available at https://github.com/Dana-Farber-AIOS/pathml/tree/master/examples/vignettes/"
+    "Full code for this vignette is available at https://github.com/Dana-Farber-AIOS/pathml/tree/master/examples/manuscript_vignettes_stable"
    ]
   }
  ],

diff --git a/examples/manuscript_vignettes_stable/workflow_HE_vignette.ipynb b/examples/manuscript_vignettes_stable/workflow_HE_vignette.ipynb
@@ -9,7 +9,13 @@
     "\n",
     "## Example workflow for H&E images\n",
     "\n",
-    "Here we demonstrate a typical workflow for preprocessing of H&E images. The image used in this example is publicly avilalable for download: http://openslide.cs.cmu.edu/download/openslide-testdata/Aperio/\n",
+    "Here we demonstrate a typical workflow for preprocessing of H&E images, consisting of the following steps:\n",
+    "\n",
+    "1. Loading the raw image\n",
+    "2. Defining a simple preprocessing pipeline for tissue detection\n",
+    "3. Creating a PyTorch DataLoader for interfacing with any downstream machine learning model\n",
+    "\n",
+    "The image used in this example is publicly available for download: http://openslide.cs.cmu.edu/download/openslide-testdata/Aperio/\n",
     "\n",
     "**a. Load the image**"
    ]
@@ -159,11 +165,11 @@
     "\n",
     "1. Loading the raw image\n",
     "2. Define a simple preprocessing pipeline for tissue detection\n",
-    "3. Create a PyTorch DataLoader for with any downstream machine learning model\n",
+    "3. Create a PyTorch DataLoader for interfacing with any downstream machine learning model\n",
     "\n",
     "Full documentation of the `PathML` API is available at https://pathml.org.  \n",
     "\n",
-    "Full code for this vignette is available at https://github.com/Dana-Farber-AIOS/pathml/tree/master/examples/vignettes/"
+    "Full code for this vignette is available at https://github.com/Dana-Farber-AIOS/pathml/tree/master/examples/manuscript_vignettes_stable"
    ]
   }
  ],

diff --git a/examples/manuscript_vignettes_stable/workflow_HE_vignette.pdf b/examples/manuscript_vignettes_stable/workflow_HE_vignette.pdf
diff --git a/examples/manuscript_vignettes_stable/workflow_IF_vignette.ipynb b/examples/manuscript_vignettes_stable/workflow_IF_vignette.ipynb
@@ -10,7 +10,17 @@
     "\n",
     "## Example workflow for immunofluorescence images\n",
     "\n",
-    "Here we demonstrate a typical workflow for preprocessing of immunofluorescence images. The image used in this example is a tissue microarray (TMA) generated on the CODEX spatial proteomics imaging platform, from Schurch et al., *Coordinated Cellular Neighborhoods Orchestrate Antitumoral Immunity at the Colorectal Cancer Invasive Front* (Cell, 2020). The image used in this example is publicly avilalable for download from the Cancer Imaging Archive: https://doi.org/10.7937/tcia.2020.fqn0-0326"
+    "Here we demonstrate a typical workflow for preprocessing of immunofluorescence images, consisting of the following steps:\n",
+    "\n",
+    "1. Loading a raw image in TIFF format\n",
+    "2. Defining a preprocessing pipeline for cell segmentation and marker quantification for each cell\n",
+    "3. Integrating with other commonly used tools such as `Scanpy` for working with the quantified cell-level data:\n",
+    "    - dimensionality reduction\n",
+    "    - clustering\n",
+    "    - co-occurrence analysis\n",
+    "    - visualization \n",
+    "\n",
+    "The image used in this example is a tissue microarray (TMA) generated on the CODEX spatial proteomics imaging platform, from Schurch et al., *Coordinated Cellular Neighborhoods Orchestrate Antitumoral Immunity at the Colorectal Cancer Invasive Front* (Cell, 2020). It is publicly available for download from the Cancer Imaging Archive: https://doi.org/10.7937/tcia.2020.fqn0-0326"
    ]
   },
   {
@@ -415,25 +425,24 @@
   {
    "cell_type": "markdown",
    "metadata": {
-    "jp-MarkdownHeadingCollapsed": true,
     "tags": []
    },
    "source": [
     "### Summary\n",
     "\n",
     "Here we demonstrate a complete `PathML` workflow for analyzing immunofluorescence images:\n",
     "\n",
-    "1. Loading raw image in TIFF format\n",
-    "2. Define a preprocessing pipeline for cell segmentation and marker quantification for each cell\n",
-    "3. Integrate with other commonly used tools such as `Scanpy` for working with the quantified cell-level data:\n",
+    "1. Loading a raw image in TIFF format\n",
+    "2. Defining a preprocessing pipeline for cell segmentation and marker quantification for each cell\n",
+    "3. Integrating with other commonly used tools such as `Scanpy` for working with the quantified cell-level data:\n",
     "    - dimensionality reduction\n",
     "    - clustering\n",
     "    - co-occurrence analysis\n",
     "    - visualization \n",
     "\n",
     "Full documentation of the `PathML` API is available at https://pathml.org.  \n",
     "\n",
-    "Full code for this vignette is available at https://github.com/Dana-Farber-AIOS/pathml/tree/master/examples/vignettes/"
+    "Full code for this vignette is available at https://github.com/Dana-Farber-AIOS/pathml/tree/master/examples/manuscript_vignettes_stable"
    ]
   }
  ],

diff --git a/pathml/core/slide_data.py b/pathml/core/slide_data.py
@@ -251,6 +251,7 @@ def run(
         level=0,
         tile_pad=False,
         overwrite_existing_tiles=False,
+        write_dir=None,
     ):
         """
         Run a preprocessing pipeline on SlideData.
@@ -269,6 +270,9 @@ def run(
                 Defaults to ``False``.
             overwrite_existing_tiles (bool): Whether to overwrite existing tiles. If ``False``, running a pipeline will
                 fail if ``tiles is not None``. Defaults to ``False``.
+            write_dir (str): Path to directory to write the processed slide to. The processed SlideData object
+                will be written to the directory immediately after the pipeline has completed running.
+                The output path is "<write_dir>/<slide.name>.h5path". If ``None`` (default), nothing is written.
         """
         assert isinstance(
             pipeline, pathml.preprocessing.pipeline.Pipeline
@@ -320,6 +324,9 @@ def run(
                 pipeline.apply(tile)
                 self.tiles.add(tile)
 
+        if write_dir:
+            self.write(Path(write_dir) / f"{self.name}.h5path")
+
     @property
     def shape(self):
         """
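
The output-path rule added to `run()` above can be sketched in isolation. This is a minimal illustration that mirrors only the `Path(write_dir) / f"{self.name}.h5path"` expression and the `if write_dir:` guard from the diff. The helper name `h5path_destination` is hypothetical and does not exist in PathML.

```python
import os
from pathlib import Path
from typing import Optional, Union


def h5path_destination(
    write_dir: Optional[Union[str, os.PathLike]], name: str
) -> Optional[Path]:
    """Return <write_dir>/<name>.h5path, or None when no directory is given."""
    # Mirrors the `if write_dir:` guard: None (or an empty string) disables writing.
    if not write_dir:
        return None
    return Path(write_dir) / f"{name}.h5path"


if __name__ == "__main__":
    print(h5path_destination("processed", "testwrite"))
    print(h5path_destination(None, "testwrite"))
```

Because `Path()` accepts any path-like object, the same expression also works when a test passes pytest's `tmpdir` fixture as `write_dir`.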

diff --git a/pathml/core/slide_dataset.py b/pathml/core/slide_dataset.py
@@ -3,9 +3,11 @@
 License: GNU GPL 2.0
 """
 
-from torch.utils.data import ConcatDataset
-from pathlib import Path
 import reprlib
+from pathlib import Path
+
+import dask.distributed
+from torch.utils.data import ConcatDataset
 
 
 class SlideDataset:
@@ -36,17 +38,23 @@ def __repr__(self):
         out += ")"
         return out
 
-    def run(self, pipeline, **kwargs):
+    def run(self, pipeline, client=None, distributed=True, **kwargs):
         """
         Runs a preprocessing pipeline on all slides in the dataset
 
         Args:
             pipeline (pathml.preprocessing.pipeline.Pipeline): Preprocessing pipeline.
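+            client (dask.distributed.Client): Dask client used to run the pipeline. If ``None`` and
+                ``distributed`` is ``True``, a new client is created. Defaults to ``None``.
+            distributed (bool): Whether to distribute the pipeline run using ``client``. Defaults to ``True``.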
             kwargs (dict): keyword arguments passed to :meth:`~pathml.core.slide_data.SlideData.run` for each slide
         """
         # run preprocessing
+        if client is None and distributed:
+            client = dask.distributed.Client()
         for slide in self.slides:
-            slide.run(pipeline, **kwargs)
+            slide.run(
+                pipeline=pipeline, client=client, distributed=distributed, **kwargs
+            )
 
     def reshape(self, shape, centercrop=False):
         for slide in self.slides:
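
The client handling in `SlideDataset.run()` above creates at most one client and shares it across all slides. A dependency-free sketch of that pattern follows. `FakeClient` and `FakeSlide` are hypothetical stand-ins for `dask.distributed.Client` and `SlideData`, used only so the example runs without Dask or PathML installed.

```python
class FakeClient:
    """Hypothetical stand-in for dask.distributed.Client; counts instances."""

    created = 0

    def __init__(self):
        FakeClient.created += 1


class FakeSlide:
    """Hypothetical stand-in for SlideData; records the arguments passed to run()."""

    def __init__(self):
        self.calls = []

    def run(self, pipeline, client=None, distributed=True, **kwargs):
        self.calls.append((client, distributed, kwargs))


def run_all(slides, pipeline, client=None, distributed=True, **kwargs):
    """Run a pipeline on every slide, sharing a single client between them."""
    # Create a client only when distribution is requested and none was supplied.
    if client is None and distributed:
        client = FakeClient()
    for slide in slides:
        slide.run(pipeline=pipeline, client=client, distributed=distributed, **kwargs)
    return client
```

Creating the client once outside the loop means every slide reuses the same client, rather than each `slide.run()` call starting its own.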

diff --git a/pathml/ml/dataset.py b/pathml/ml/dataset.py
@@ -14,7 +14,7 @@ class TileDataset(torch.utils.data.Dataset):
 
     Each item is a tuple of (``tile_image``, ``tile_masks``, ``tile_labels``, ``slide_labels``) where:
 
-        - ``tile_image`` is a torch.Tensor of shape (n_channels, tile_height, tile_width)
+        - ``tile_image`` is a torch.Tensor of shape (C, H, W) or (T, C, Z, H, W)
         - ``tile_masks`` is a torch.Tensor of shape (n_masks, tile_height, tile_width)
         - ``tile_labels`` is a dict
         - ``slide_labels`` is a dict
@@ -83,8 +83,17 @@ def __getitem__(self, ix):
 
         labels = {key: val for key, val in self.h5["tiles"][k]["labels"].attrs.items()}
 
-        # swap axes from HWC to CHW for pytorch
-        im = tile_image.transpose(2, 0, 1)
+        if tile_image.ndim == 3:
+            # swap axes from HWC to CHW for pytorch
+            im = tile_image.transpose(2, 0, 1)
+        elif tile_image.ndim == 5:
+            # in this case, we assume that we have XYZCT channel order (OME-TIFF)
+            # so we swap axes to TCZYX for batching
+            im = tile_image.transpose(4, 3, 2, 1, 0)
+        else:
+            raise NotImplementedError(
+                f"tile image has shape {tile_image.shape}. Expecting an image with 3 dims (HWC) or 5 dims (XYZCT)"
+            )
 
         masks = np.stack(list(masks.values()), axis=0) if masks else None
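
The axis reordering added to `__getitem__` above can be checked on dummy arrays. The sketch below reuses the two `transpose` calls from the diff. The function name `to_torch_axis_order` and the array sizes are illustrative assumptions, not part of PathML.

```python
import numpy as np


def to_torch_axis_order(tile_image: np.ndarray) -> np.ndarray:
    """Reorder axes as in the diff: HWC -> CHW, or XYZCT -> TCZYX."""
    if tile_image.ndim == 3:
        # HWC -> CHW
        return tile_image.transpose(2, 0, 1)
    if tile_image.ndim == 5:
        # XYZCT -> TCZYX (full axis reversal)
        return tile_image.transpose(4, 3, 2, 1, 0)
    raise NotImplementedError(
        f"tile image has shape {tile_image.shape}. "
        "Expecting an image with 3 dims (HWC) or 5 dims (XYZCT)"
    )


if __name__ == "__main__":
    hwc = np.zeros((256, 128, 3))  # H=256, W=128, C=3
    xyzct = np.zeros((128, 256, 1, 4, 2))  # X=128, Y=256, Z=1, C=4, T=2
    print(to_torch_axis_order(hwc).shape)  # (3, 256, 128)
    print(to_torch_axis_order(xyzct).shape)  # (2, 4, 1, 256, 128)
```

In the 5-dimensional case the result has axis order (T, C, Z, Y, X), so the channel axis comes before the Z axis.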
 

diff --git a/pathml/preprocessing/transforms.py b/pathml/preprocessing/transforms.py
@@ -13,16 +13,13 @@
 import pathml.core
 import pathml.core.slide_data
 import spams
-from pathml.utils import (
-    RGB_to_GREY,
-    RGB_to_HSI,
-    RGB_to_HSV,
-    RGB_to_OD,
-    normalize_matrix_cols,
-)
+from pathml.utils import (RGB_to_GREY, RGB_to_HSI, RGB_to_HSV, RGB_to_OD,
+                          normalize_matrix_cols)
 from skimage import restoration
+from skimage.exposure import (equalize_adapthist, equalize_hist,
+                              rescale_intensity)
 from skimage.measure import regionprops_table
-from skimage.exposure import equalize_hist, equalize_adapthist, rescale_intensity
+
 
 # Base class
 class Transform:
@@ -151,15 +148,17 @@ class RescaleIntensity(Transform):
             '2-tuple' : Use range_values as explicit min/max intensities.
     """
 
-    def __init__(self, in_range='image', out_range = 'dtype'):
+    def __init__(self, in_range="image", out_range="dtype"):
         self.in_range = in_range
         self.out_range = out_range
 
     def __repr__(self):
         return f"RescaleIntensity(in_range={self.in_range}, out_range={self.out_range})"
 
     def F(self, image):
-        image = rescale_intensity(image, in_range=self.in_range, out_range=self.out_range)
+        image = rescale_intensity(
+            image, in_range=self.in_range, out_range=self.out_range
+        )
         return image
 
     def apply(self, tile):
@@ -187,7 +186,7 @@ def __repr__(self):
         return f"HistogramEqualization(nbins={self.nbins}, mask = {self.mask})"
 
     def F(self, image):
-        image = equalize_hist(image, nbins=self.nbins, mask = self.mask)
+        image = equalize_hist(image, nbins=self.nbins, mask=self.mask)
         return image
 
     def apply(self, tile):
@@ -209,7 +208,7 @@ class AdaptiveHistogramEqualization(Transform):
         nbins (int): Number of gray bins for histogram (“data range”).
     """
 
-    def __init__(self, kernel_size=None, clip_limit = 0.3, nbins=256):
+    def __init__(self, kernel_size=None, clip_limit=0.3, nbins=256):
         self.kernel_size = kernel_size
         self.clip_limit = clip_limit
         self.nbins = nbins
@@ -218,7 +217,12 @@ def __repr__(self):
         return f"AdaptiveHistogramEqualization(kernel_size={self.kernel_size}, clip_limit={self.clip_limit}, nbins={self.nbins})"
 
     def F(self, image):
-        image = equalize_adapthist(image, kernel_size=self.kernel_size, clip_limit=self.clip_limit, nbins=self.nbins)
+        image = equalize_adapthist(
+            image,
+            kernel_size=self.kernel_size,
+            clip_limit=self.clip_limit,
+            nbins=self.nbins,
+        )
         return image
 
     def apply(self, tile):
@@ -1359,6 +1363,7 @@ def F(self, image):
             nuclear_segmentation_predictions = np.squeeze(
                 nuclear_segmentation_predictions, axis=0
             )
+            del model
             return cell_segmentation_predictions, nuclear_segmentation_predictions
         else:
             raise NotImplementedError(f"model={self.model} currently not supported.")

diff --git a/tests/core_tests/test_slide_data.py b/tests/core_tests/test_slide_data.py
@@ -170,3 +170,23 @@ def compare_dict_ignore_order(d1, d2):
         if d1[k1] != d2[k2]:
             return False
     return True
+
+
+@pytest.mark.parametrize("write", [True, False])
+def test_run_and_write(tmpdir, write):
+    wsi = HESlide("tests/testdata/small_HE.svs", backend="openslide", name="testwrite")
+    pipe = Pipeline()
+
+    if write:
+        write_dir_arg = tmpdir
+    else:
+        write_dir_arg = None
+
+    wsi.run(pipe, tile_size=500, distributed=False, write_dir=write_dir_arg)
+
+    written_path = tmpdir / "testwrite.h5path"
+
+    if write:
+        assert written_path.isfile()
+    else:
+        assert not written_path.isfile()
diff --git a/tests/core_tests/test_slide_dataset.py b/tests/core_tests/test_slide_dataset.py
@@ -3,7 +3,7 @@
 License: GNU GPL 2.0
 """
 
-from dask.distributed import Client
+import pytest
 from pathlib import Path
 
 from pathml.core import SlideData, Tile
@@ -38,3 +38,22 @@ def test_run_pipeline_and_tile_dataset_and_reshape(slide_dataset):
     tile_after_reshape = slide_dataset[0].tiles[0]
     assert isinstance(tile_after_reshape, Tile)
     assert tile_after_reshape.image.shape == (25, 25, 3)
+
+
+@pytest.mark.parametrize("write", [True, False])
+def test_run_and_write_dataset(tmpdir, write, slide_dataset):
+    pipe = Pipeline()
+
+    if write:
+        write_dir_arg = tmpdir
+    else:
+        write_dir_arg = None
+
+    slide_dataset.run(pipe, tile_size=500, distributed=False, write_dir=write_dir_arg)
+
+    for s in slide_dataset:
+        written_path = tmpdir / f"{s.name}.h5path"
+        if write:
+            assert written_path.isfile()
+        else:
+            assert not written_path.isfile()