analysis/02-LoadPancreasImages.Rmd

---
title: "Load example images"
author: "Nicolas Damond"
date: "`r Sys.Date()`"
output: workflowr::wflow_html
editor_options:
  chunk_output_type: console
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

This script downloads 100 example images and masks form the pancreas IMC dataset available [here](http://dx.doi.org/10.17632/cydmwsfztj.2).
The dataset is associated to the following publication:

[Damond et al. A Map of Human Type 1 Diabetes Progression by Imaging Mass Cytometry. Cell Metabolism. 2019 Mar 5;29(3):755-768](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6821395)

The images and masks have been created using the [imctools](https://github.com/BodenmillerGroup/imctools) package and the [IMC segmentation pipeline](https://github.com/BodenmillerGroup/ImcSegmentationPipeline). 
We will use the [cytomapper](https://www.bioconductor.org/packages/release/bioc/html/cytomapper.html) package to read in the images and create `CytoImageList` objects.

# Download and read-in images

Here, a subset of 100 images from the pancreas IMC dataset is downloaded.
We use the `loadImages` function of the `cytomapper` package to read them into a `CytoImageList` object.

```{r load-images, message=FALSE}
library(cytomapper)

# Download the zipped folder image and unzip it
url.images <- ("https://data.mendeley.com/public-files/datasets/cydmwsfztj/files/b37054d2-d5d0-4c48-a001-81ff77136f41/file_downloaded")
download.file(url.images, destfile = "data/PancreasData/ImageSubset.zip")
unzip("data/PancreasData/ImageSubset.zip", exdir = "data/PancreasData/")
file.remove("data/PancreasData/ImageSubset.zip")

# Load the images as a CytoImageList object
images <- loadImages("data/PancreasData/", pattern="_full_clean.tiff")
images
```

We also download the associated segemntation masks and read them into a `CytoImageList` object.

```{r load-masks}
# Download the zipped folder masks and unzip it
url.masks <- ("https://data.mendeley.com/public-files/datasets/cydmwsfztj/files/13679a61-e9b4-4820-9f09-a5bbc697647c/file_downloaded")
download.file(url.masks, destfile = "data/PancreasData/Masks.zip")
unzip("data/PancreasData/Masks.zip", exdir = "data/PancreasData/")
file.remove("data/PancreasData/Masks.zip")

# Load the images as a CytoImageList object
masks <- loadImages("data/PancreasData/", pattern="_full_mask.tiff")
masks
```

Here, we remove the downloaded images again.

```{r clean-up-2, message = FALSE}
# Remove image stacks
images.del <- list.files("data/PancreasData/", pattern="_full_clean.tiff", full.names = TRUE)
file.remove(images.del)

# Remove masks
masks.del <- list.files("data/PancreasData/", pattern="_full_mask.tiff", full.names = TRUE)
file.remove(masks.del)
```

# Load panel data

Here, we will download the panel information, which contains antibody-related metadata.
However, for some datasets, the channel-order and the panel order do not match.
For this, the channel-mass file is used to match panel information and image stack slices.
This will be important later to set the `channelNames` of the `CytoImageList` objects.

```{r load-panel}
# Import panel
url.panel <- ("https://data.mendeley.com/public-files/datasets/cydmwsfztj/files/2f9fecfc-b98f-4937-bc38-ae1b959bd74d/file_downloaded")
download.file(url.panel, destfile = "data/PancreasData/panel.csv")
panel <- read.csv("data/PancreasData/panel.csv")

# Import channel-mass file
url.channelmass <- ("https://data.mendeley.com/public-files/datasets/cydmwsfztj/files/704312eb-377c-42e2-8227-44bb9aca0fb3/file_downloaded")
download.file(url.channelmass, destfile = "data/PancreasData/ChannelMass.csv")
channel.mass <- read.csv("data/PancreasData/ChannelMass.csv", header = FALSE)
```

# Process images and masks

We will now have to process the images to make them compatible with `cytomapper`.
The masks are 16-bit images and need to be re-scaled in order to obtain integer cell IDs.

```{r scale-masks}
# Before scaling
masks[[1]]

masks <- scaleImages(masks, value = (2 ^ 16) - 1)

# After scaling
masks[[1]]
```

Next, we will add the `ImageName` to the images and masks objects.
This information is stored in the metadata columns of the `CytoImageList` objects
and is used by `cytomapper` to match single cell data, images and mask

```{r add-image-names}
mcols(images)$ImageName <- gsub("_a0_full_clean", "", names(images))
mcols(masks)$ImageName <- gsub("_a0_full_mask", "", names(masks))
```

We downloaded the full set of segmentation masks.
To match the segmentation masks to the corresponding images, we will subset them.
As a safty check, we will make sure that the `ImageName`s of the masks are identical to those of the images.

```{r subset-masks}
masks <- masks[mcols(masks)$ImageName %in% mcols(images)$ImageName]
identical(mcols(masks)$ImageName, mcols(images)$ImageName)
```

Finally, we will use the protein short name as `channelNames`.
Again, we need to make sure that the names match the correct order of the channels.

```{r add-channel-names}
# Match panel and stack slice information
panel <- panel[panel$full == 1,]
panel <- panel[match(channel.mass[,1], panel$MetalTag),]

# Add channel names to the  image stacks CytoImageList object
channelNames(images) <- panel$shortname
```

# Save the CytoImageList objects

Here, we will save the generated `CytoImageList` objects for convenient access later on.

```{r save}
saveRDS(images, "data/PancreasData/pancreas_images.rds")
saveRDS(masks, "data/PancreasData/pancreas_masks.rds")
```

# Clean up

We will delete all unecessary files.

```{r clean-up, message = FALSE}
file.remove("data/PancreasData/panel.csv", "data/PancreasData/ChannelMass.csv")
```