Merge pull request #85 from BodenmillerGroup/devel

Devel
BodenmillerGroup · Nov 28, 2023 · a869169
2 parents 36ab967 + bb9b77a
Showing 6 changed files with 256 additions and 22 deletions.
diff --git a/.gitignore b/.gitignore
@@ -18,3 +18,4 @@ outputs/*
 !publication/README.md
 !publication/protocol.md
 !CHANGELOG.md
+!DEVELOPMENT.md
diff --git a/11-spatial_analysis.Rmd b/11-spatial_analysis.Rmd
@@ -334,20 +334,83 @@ plotSpatial(spe,
     scale_color_brewer(palette = "Set3")
 ```
 
-The next code chunk visualizes the cell type compositions of the
-detected cellular neighborhoods (CN).
+Several visualizations can be used to examine the cell type composition
+of the detected cellular neighborhoods (CN). First, we can look at the total
+number of cells per cell type and CN.
 
 ```{r}
-for_plot <- prop.table(table(spe$cn_celltypes, spe$celltype), 
-                       margin = 1)
+for_plot <- table(as.character(spe$cn_celltypes), spe$celltype)
 
+pheatmap(for_plot, 
+         color = viridis(100), display_numbers = TRUE, 
+         number_color = "white", number_format = "%.0f")
+```
+
+Next, we can look at how the cells of each cell type are distributed across
+the CNs (each column sums to 1).
+
+```{r}
+for_plot <- prop.table(table(as.character(spe$cn_celltypes), spe$celltype), margin = 2)
+
+pheatmap(for_plot, 
+         color = viridis(100), display_numbers = TRUE, 
+         number_color = "white", number_format = "%.2f")
+```
+
+Similarly, we can visualize the fraction of each CN made up by each cell type (each row sums to 1).
+
+```{r}
+for_plot <- prop.table(table(as.character(spe$cn_celltypes), spe$celltype), margin = 1)
+
+pheatmap(for_plot, 
+         color = viridis(100), display_numbers = TRUE, 
+         number_color = "white", number_format = "%.2f")
+```
+
+This visualization can also be scaled by column (i.e., the values of each cell
+type are centered and scaled across CNs) to account for differences in cell type abundance.
+
+```{r}
 pheatmap(for_plot, 
          color = colorRampPalette(c("dark blue", "white", "dark red"))(100), 
          scale = "column")
 ```
 
-CN 1 and CN 6 are mainly composed of tumor cells with CN 6 forming the
-tumor/stroma border. CN 3 is mainly composed of B and BnT cells
+Lastly, we can visualize the enrichment of cell types within cellular neighborhoods
+using the `regionMap` function of the `lisaClust` package.
+
+```{r}
+library(lisaClust)
+regionMap(spe, 
+          cellType = "celltype",
+          region = "cn_celltypes")
+```
+
+It is also recommended to visualize some images to confirm the interpretation of
+cellular neighborhoods. For this, we can use either the `lisaClust::hatchingPlot` or
+the `imcRtools::plotSpatial` function:
+
+```{r}
+# hatchingPlot
+cur_spe <- spe[,spe$sample_id == "Patient1_003"]
+cur_sce <- as(cur_spe, "SingleCellExperiment")
+cur_sce$x <- spatialCoords(cur_spe)[,1]
+cur_sce$y <- spatialCoords(cur_spe)[,2]
+cur_sce$region <- as.character(cur_sce$cn_celltypes)
+
+hatchingPlot(cur_sce, region = "region", cellType = "celltype") +
+    scale_color_manual(values = metadata(spe)$color_vectors$celltype)
+```
+
+```{r, fig.height=8, fig.width=10}
+# plotSpatial
+plotSpatial(spe[,spe$sample_id == "Patient1_003"],
+            img_id = "cn_celltypes", node_color_by = "celltype", node_size_fix = 0.7) +
+    scale_color_manual(values = metadata(spe)$color_vectors$celltype)
+```
+
+CN 1 and CN 6 are mainly enriched for tumor cells, with CN 6 forming the
+tumor/stroma border. CN 3 is mainly enriched for B and BnT cells
 indicating TLS. CN 5 is composed of aggregated plasma cells and most T
 cells.
 
@@ -408,15 +471,12 @@ derive numeric vectors for each cell which can then again be clustered
 using kmeans. All steps are supported by the `lisaClust` function which
 can be applied to a `SingleCellExperiment` and `SpatialExperiment` object.
 
-
 In the following example, we calculate the LISA curves within a 10µm, 20µm and
 50µm neighborhood around each cell. Increasing these radii will lead to broader
 and smoother spatial clusters. However, a number of parameter settings should be
 tested to estimate the robustness of the results.
 
 ```{r lisaClust, fig.height=12, fig.width=12, message=FALSE}
-library(lisaClust)
-
 set.seed(220705)
 spe <- lisaClust(spe, 
                  k = 6,
@@ -448,15 +508,6 @@ In this case, CN 1 and 4 contain tumor cells but no CN is forming the
 tumor/stroma interface. CN 3 represents TLS. CN 2 indicates T cell
 subtypes and plasma cells are aggregated to CN 5.
 
-As an alternative way of visualizing the enrichment of cell types within the 
-detected CNs, the `lisaClust` package provides the `regionMap` function.
-
-```{r}
-regionMap(spe, 
-          cellType = "celltype",
-          region = "region")
-```
-
 ## Spatial context analysis
 
 Downstream of CN assignments, we will analyze the spatial context (SC)

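The three new composition heatmaps in `11-spatial_analysis.Rmd` differ only in how the count table is normalized. As a minimal base-R sketch (the toy vectors below stand in for `spe$cn_celltypes` and `spe$celltype`; their values are made up for illustration), `prop.table(..., margin = 1)` makes each CN row sum to 1, `margin = 2` makes each cell type column sum to 1, and `scale()` z-scores each column, which to our understanding corresponds to what `pheatmap(..., scale = "column")` displays:

```r
# Toy stand-ins for spe$cn_celltypes and spe$celltype (made-up values)
cn <- c("1", "1", "2", "2", "2", "3")
celltype <- c("Tumor", "Tumor", "Tumor", "Tcell", "Tcell", "Bcell")

# Rows = CNs, columns = cell types (absolute cell counts)
counts <- table(cn, celltype)

# margin = 1: each row sums to 1 (cell type composition of each CN)
by_cn <- prop.table(counts, margin = 1)

# margin = 2: each column sums to 1 (distribution of each cell type across CNs)
by_celltype <- prop.table(counts, margin = 2)

# Column z-scores: subtract each column's mean and divide by its standard
# deviation (the transformation behind pheatmap's scale = "column")
scaled <- scale(unclass(by_cn))

print(round(by_cn, 2))
print(rowSums(by_cn))
print(colSums(by_celltype))
```

In this toy example, `by_celltype` shows that 2/3 of all tumor cells fall into CN 1, while `by_cn` shows that tumor cells make up only 1/3 of CN 2, so the two normalizations answer different questions about the same counts.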
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,4 +4,9 @@
 
 **Version 1.0.1** [2023-10-19]
 
-- Added seed before `predict` call after training a classifier
+- Added seed before `predict` call after training a classifier
+
+**Version 1.0.2** [2023-11-27]
+
+- Added developer documentation
+- Added more ways to visualize cell type composition per CN
diff --git a/DEVELOPMENT.md b/DEVELOPMENT.md
@@ -0,0 +1,134 @@
+# Useful information when developing this book
+
+This document guides future developers in maintaining and extending the IMC
+data analysis book.
+
+## General setup
+
+* The IMC data analysis book is written in [bookdown](https://bookdown.org/). 
+* Each section is stored in its own `.Rmd` file with `index.Rmd` building the landing page
+* References are stored in `book.bib`
+* At the end of each `.Rmd` file, a number of unit tests are executed. These
+unit tests always run, but their results are not shown in the book.
+
+### Continuous integration/continuous deployment
+
+* CI/CD is executed based on the workflow [here](https://github.com/BodenmillerGroup/IMCDataAnalysis/blob/main/.github/workflows/build.yml).
+* On the first of each month, a new Docker image is built based on the [Dockerfile](https://github.com/BodenmillerGroup/IMCDataAnalysis/blob/main/Dockerfile). This ensures that the workflow is always tested against the newest software versions.
+* The Docker image is pushed to the GitHub Container Registry [here](https://github.com/BodenmillerGroup/IMCDataAnalysis/pkgs/container/imcdataanalysis).
+* The Docker image is date-tagged and `latest` always refers to the newest build.
+* Once the Docker image is built, the IMC data analysis book is executed within the
+newest Docker image. This also runs all unit tests.
+
+**Of note:** Sometimes the UMAP calculation produces slightly different
+results, which can cause the corresponding unit test to fail. If that happens, the workflow run can be re-executed by clicking the `Re-run jobs` button of the workflow run.
+This test could also be excluded in the long run.
+
+* When pushing to `main` (either directly or via a PR), the CI/CD workflow is
+executed. 
+* If the Dockerfile changed (e.g., because a new package was added), a new Docker image is built and the workflow is executed within the new Docker image.
+* If the Dockerfile did not change, the workflow is executed within the most recent Docker image.
+
+## Updating the book
+
+This section describes how to update the book, whether to add new content,
+fix bugs or adjust unit tests.
+
+### Work on the devel branch
+
+It is recommended to work on the `devel` branch of the GitHub repository to add
+new changes.
+
+### Work within the newest Docker container
+
+It is also recommended to always work within a Docker container based on the newest
+Docker image available:
+
+1. After installing [Docker](https://docs.docker.com/get-docker/), you can first pull the image via:
+
+```
+docker pull ghcr.io/bodenmillergroup/imcdataanalysis:yyyy-mm-dd
+```
+
+and then run the container:
+
+```
+docker run -v /path/to/IMCDataAnalysis:/home/rstudio/IMCDataAnalysis \
+	-e PASSWORD=bioc -p 8787:8787  \
+	ghcr.io/bodenmillergroup/imcdataanalysis:yyyy-mm-dd
+```
+
+2. An RStudio server session can be accessed via a browser at `localhost:8787` using `Username: rstudio` and `Password: bioc`.  
+3. Navigate to `IMCDataAnalysis` and open the `IMCDataAnalysis.Rproj` file.  
+4. Code in the individual files can now be executed, or the whole workflow can be built by entering `bookdown::render_book()`.
+
+### Adding new packages
+
+If you need to add new packages to the workflow, make sure to add them to the
+[software requirements](https://bodenmillergroup.github.io/IMCDataAnalysis/prerequisites.html#software-requirements)
+section and to the Dockerfile.
+
+### Opening a pull request
+
+Now you can change the content of the book. 
+Once you have added all changes, push them to `devel` and open a pull request
+to `main`. Wait until all checks have passed, then merge the PR.
+
+### Add changes to CHANGELOG.md
+
+Please track the changes that you are making in the [CHANGELOG.md](CHANGELOG.md) file.
+
+### Trigger a new release
+
+Once you have added the changes to the CHANGELOG, merged the pull request and 
+the workflow has been executed on CI/CD, you can trigger a new release.
+
+* Go to the [releases page](https://github.com/BodenmillerGroup/IMCDataAnalysis/releases) and click on `Draft a new release` at the top of the page.
+* Under `Choose a tag`, create a new tag and give details on the release.
+* With each release the corresponding [Zenodo repository](https://zenodo.org/records/10209942) is updated.
+
+## Updating the data
+
+For new `steinbock` releases, and especially if the Mesmer version changes, the
+example data should be updated. The example data are stored on the Central NAS
+and hosted on Zenodo.
+
+### Re-analyse the example data
+
+* You can find the raw data on [Zenodo](https://zenodo.org/records/7575859).
+* On the Central NAS under `projects/IMCWorkflow/zenodo`, create a new folder called `steinbock_0.x.y`, where `0.x.y` is the new `steinbock` version.
+* Copy the `steinbock.sh` script from the folder of the previous version to the folder of the newest version.
+* Change the steinbock version number in the `steinbock.sh` script and execute it.
+* The script should generate all relevant files and zip all folders.
+
+### Upload data to Zenodo
+
+* On [Zenodo](https://zenodo.org/records/7624451), click on `New version` and replace all files with the newer version. There is no need to upload the raw data to Zenodo, as they are hosted in a different repository.
+
+### Adjust the book
+
+* Work in the most recent Docker container and on the devel branch.
+* Manually go through each section and update the links in the [Prerequisites](https://bodenmillergroup.github.io/IMCDataAnalysis/prerequisites.html#download-data) section
+* Make sure to check and adjust the unit tests at the end of each file
+* Make sure that the text (e.g., the interpretation of clustering results) still matches the results
+
+*Important:* As we train a random forest classifier on manually gated cells, these gated cells won't match the newest version of the data if the Mesmer version changed. For this case, the `code/transfer_labels.R` script automatically re-gates cells in the new SPE object.
+
+* Go through all sections until `Cell phenotyping`
+* Based on the old `gated_cells` and the new SPE object, execute the `code/transfer_labels.R` script
+* Zip the new `gated_cells` and upload them to a new version on [Zenodo](https://zenodo.org/records/8095133)
+* Adjust the link to the new gated cells in the [Prerequisites](https://bodenmillergroup.github.io/IMCDataAnalysis/prerequisites.html#download-data) section
+* Make sure that the new classification results closely match the previous results
+
+* Continue going through the book
+
+### Execute the book
+
+* When you are done working through the book, open the `IMCDataAnalysis.Rproj` file within the Docker container and execute `bookdown::render_book()` to make sure that the book can be executed from beginning to end.
+* Under `data/CellTypeValidation`, inspect the PNGs to check whether cell types were correctly detected.
+
+### Add changes to CHANGELOG.md
+
+Finally, add all the recent changes to the CHANGELOG, create and merge a PR and create a new release (see above).
+
+
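`DEVELOPMENT.md` notes that unit tests run at the end of each `.Rmd` file without being shown in the book. As a hedged sketch of one such check, the base-R helper below (`check_color_coverage` is hypothetical and not part of the book) verifies that every cell type has an entry in the named color vector passed to `scale_color_manual()`, as done with `metadata(spe)$color_vectors$celltype` in the new plotting code. The book's actual tests may use a different framework and check other objects; in R Markdown, such a chunk could be evaluated but hidden with the knitr chunk option `include = FALSE`.

```r
# Hypothetical unit-test helper (not part of the book): every cell type in
# the data needs an entry in the named color vector given to
# scale_color_manual(); otherwise those cells lose their intended color.
check_color_coverage <- function(celltypes, colors) {
  missing <- setdiff(unique(as.character(celltypes)), names(colors))
  if (length(missing) > 0) {
    stop("No color defined for: ", paste(missing, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy stand-ins for spe$celltype and metadata(spe)$color_vectors$celltype
toy_celltype <- c("Tumor", "Tcell", "Bcell", "Tumor")
toy_colors <- c(Tumor = "sienna4", Tcell = "yellow", Bcell = "green")

# Passes silently; a cell type without a color would raise an error instead
check_color_coverage(toy_celltype, toy_colors)
```

Failing loudly with `stop()` is a deliberate choice here: an error aborts the render, so a missing color surfaces in CI/CD rather than as an unnoticed grey category in a figure.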
diff --git a/README.md b/README.md
@@ -10,7 +10,6 @@ R workflow highlighting analyses approaches for multiplexed imaging data.
 
 ## Scope
 
-
 This workflow explains the use of common R/Bioconductor packages to pre-process and analyse single-cell data obtained from segmented multichannel images.
 While we use imaging mass cytometry (IMC) data as an example, the concepts presented here can be applied to images obtained by other technologies (e.g. CODEX, MIBI, mIF, CyCIF, etc.).
 The workflow can be largely divided into the following parts:
@@ -23,6 +22,13 @@ The workflow can be largely divided into the following parts:
 6. Image visualization
 7. Spatial analyses
 
+## Update freeze
+
+This workflow was actively developed until December 2023. At that time,
+we used the most recent version of `steinbock` (`v.0.16.0`) to process the
+example data. If you run into issues with newer versions of `steinbock`,
+please open an issue [here](https://github.com/BodenmillerGroup/IMCDataAnalysis/issues).
+
 ## Usage
 
 To reproduce the analysis displayed at [https://bodenmillergroup.github.io/IMCDataAnalysis/](https://bodenmillergroup.github.io/IMCDataAnalysis/) clone the repository via:
@@ -58,6 +64,20 @@ docker pull ghcr.io/bodenmillergroup/imcdataanalysis:<year-month-date>
 3. Navigate to `IMCDataAnalysis` and open the `IMCDataAnalysis.Rproj` file.  
 4. Code in the individual files can now be executed, or the whole workflow can be built by entering `bookdown::render_book()`.
 
+## Feedback
+
+We provide the workflow as an open-source resource. This does not mean that
+the workflow has been tested on all possible datasets or biological questions,
+and there are multiple ways of analysing data. We therefore recommend checking
+the results and critically questioning their biological interpretation.
+
+If you notice an issue or missing information, please report an issue
+[here](https://github.com/BodenmillerGroup/IMCDataAnalysis/issues). We also
+welcome contributions in the form of pull requests, or feature requests in the form of
+issues. Have a look at the source code at:
+
+[https://github.com/BodenmillerGroup/IMCDataAnalysis](https://github.com/BodenmillerGroup/IMCDataAnalysis)
+
 ## Contributing guidelines
 
 For feature requests and bug reports, please raise an issue [here](https://github.com/BodenmillerGroup/IMCDataAnalysis/issues).
@@ -68,10 +88,11 @@ To add new libraries to the container please add them to the [Dockerfile](Docker
 
 ## Maintainer
 
-[Nils Eling](https://github.com/nilseling)
+[Daniel Schulz](https://github.com/SchulzDan)  
 
 ## Contributors
 
+[Nils Eling](https://github.com/nilseling)  
 [Vito Zanotelli](https://github.com/votti)  
 [Daniel Schulz](https://github.com/SchulzDan)  
 [Jonas Windhager](https://github.com/jwindhager)   

diff --git a/index.Rmd b/index.Rmd
@@ -45,10 +45,19 @@ spatial analysis and the user will need to become familiar with the general
 framework to efficiently analyse data obtained from multiplexed imaging
 technologies.
 
+## Update freeze
+
+This workflow was actively developed until December 2023. At that time,
+we used the most recent version of `steinbock` (`v.0.16.0`) to process the
+example data. If you run into issues with newer versions of `steinbock`,
+please open an issue [here](https://github.com/BodenmillerGroup/IMCDataAnalysis/issues).
+
 ## Feedback and contributing
 
 We provide the workflow as an open-source resource. It does not mean that
-this workflow is tested on all possible datasets or biological questions.
+this workflow has been tested on all possible datasets or biological questions,
+and there are multiple ways of analysing data. We therefore recommend checking
+the results and critically questioning their biological interpretation.
 
 If you notice an issue or missing information, please report an issue
 [here](https://github.com/BodenmillerGroup/IMCDataAnalysis/issues). We also
@@ -57,6 +66,19 @@ issues. Have a look at the source code at:
 
 [https://github.com/BodenmillerGroup/IMCDataAnalysis](https://github.com/BodenmillerGroup/IMCDataAnalysis)
 
+## Maintainer
+
+[Daniel Schulz](https://github.com/SchulzDan)  
+
+## Contributors
+
+[Nils Eling](https://github.com/nilseling)  
+[Vito Zanotelli](https://github.com/votti)  
+[Daniel Schulz](https://github.com/SchulzDan)  
+[Jonas Windhager](https://github.com/jwindhager)   
+[Michelle Daniel](https://github.com/michdaniel)  
+[Lasse Meyer](https://github.com/lassedochreden)
+
 ## Citation
 
 The workflow has been published in