SeuratToGO

Description

In single-cell RNA sequencing (scRNA-seq), clusters are groups of cells that exhibit similar gene expression patterns. The primary goal of clustering in scRNA-seq analysis is to identify and group together cells that share similar transcriptional profiles. Each cluster represents a population of cells that may share a cell type, biological state, or function. Seurat, an R package, is a popular tool for the pre-processing, clustering, and visualization steps of scRNA-seq analysis.

SeuratToGO processes the differentially expressed markers produced by Seurat’s FindAllMarkers() function. It reformats these gene markers for gene ontology (GO) analysis with DAVID (Database for Annotation, Visualization and Integrated Discovery), and it provides functions for analyzing and visualizing DAVID output files.

Currently, users have to separate the clusters in Seurat’s markers data frame manually in Excel and export the result as a tab-delimited text file to upload to DAVID. They then have to combine all the DAVID output files (one per cluster) by hand before any further analysis.
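To make the manual step concrete, here is a minimal base R sketch of splitting a markers data frame by cluster and writing one gene list per cluster. It is not SeuratToGO's separate_clusters(), whose signature is not shown here. It assumes the column names that FindAllMarkers() returns (p_val, avg_log2FC, pct.1, pct.2, p_val_adj, cluster, gene), an illustrative adjusted p-value cutoff of 0.05, and a plain one-gene-per-line output layout; the toy values are invented for demonstration.

```r
# Hypothetical sketch, NOT the package's separate_clusters(): split a
# FindAllMarkers()-style data frame by cluster and write one tab-delimited
# gene list per cluster.

# Toy input using FindAllMarkers() column names; the values are invented.
markers <- data.frame(
  p_val      = c(1e-10, 1e-8, 1e-6, 1e-12, 1e-9),
  avg_log2FC = c(2.1, 1.5, 0.9, 3.0, 1.2),
  pct.1      = c(0.9, 0.8, 0.7, 0.95, 0.6),
  pct.2      = c(0.1, 0.2, 0.3, 0.05, 0.2),
  p_val_adj  = c(1e-6, 1e-4, 0.2, 1e-8, 1e-5),
  cluster    = factor(c(0, 0, 0, 1, 1)),
  gene       = c("IL7R", "CCR7", "LDHB", "CD14", "LYZ"),
  stringsAsFactors = FALSE
)

# Keep markers below an assumed adjusted p-value cutoff of 0.05.
sig <- markers[markers$p_val_adj < 0.05, ]

# split() returns a named list with one data frame per cluster level.
by_cluster <- split(sig, sig$cluster)

out_dir <- tempdir()
out_files <- vapply(names(by_cluster), function(cl) {
  path <- file.path(out_dir, paste0("cluster_", cl, ".txt"))
  # One gene symbol per line, with no header, quotes, or row names.
  write.table(by_cluster[[cl]]["gene"], path, sep = "\t",
              quote = FALSE, row.names = FALSE, col.names = FALSE)
  path
}, character(1))

out_files  # named by cluster: "0", "1"
```

split() keeps the original row order within each cluster, so each file lists genes in the order FindAllMarkers() reported them.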

The package includes the standard R package components: DESCRIPTION, NAMESPACE, and the man and R subdirectories. It also includes LICENSE and README files and the vignettes, tests, data, and inst subdirectories. SeuratToGO was developed using R version 4.3.2 (2023-10-31 ucrt) on platform x86_64-w64-mingw32/x64 (64-bit), running under Windows 11 x64 (build 22621).

Installation

You can install the development version of SeuratToGO from GitHub with:

install.packages("devtools")
library("devtools")
devtools::install_github("dien-n-nguyen/SeuratToGO", build_vignettes = TRUE)
library("SeuratToGO")

To run the Shiny app:

SeuratToGO::run_SeuratToGO()

Overview

The following commands list the package's exported objects, its datasets, and its vignettes:

ls("package:SeuratToGO")
data(package = "SeuratToGO") 
browseVignettes("SeuratToGO")

SeuratToGO contains five functions:

separate_clusters for separating the differentially expressed markers data frame generated by Seurat by cluster and exporting it as a tab-delimited text file.
combine_david_files for combining all the DAVID output files into a list of data frames.
get_top_processes for getting the top processes for a single specified cluster, for a closer look at that cluster. The output is a data frame in which each row is a biological process and each column is a property of that process, for example genes, p-value, and population.
get_all_top_processes for getting the p-values of the top processes for every cluster and consolidating them into one data frame.
top_processes_heatmap for generating a heatmap of the top processes in each cluster.
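The combine, top-processes, and consolidate steps can be sketched in base R as follows. This is a hypothetical illustration of the idea, not the package's implementation or API. It assumes DAVID chart files are tab-delimited with Term and PValue columns. It writes two invented toy files to a temporary directory so that it runs on its own, and the top_terms() helper and the default of five terms are illustrative choices.

```r
# Hypothetical sketch, NOT SeuratToGO's implementation: combine one DAVID
# chart per cluster, keep each cluster's top terms, and build a
# term-by-cluster matrix of p-values.

# Write two toy DAVID-style chart files (column names assumed, values invented).
david_dir <- file.path(tempdir(), "david_demo")
dir.create(david_dir, showWarnings = FALSE)
toy <- list(
  cluster_0 = data.frame(Term = c("GO:0006955~immune response",
                                  "GO:0007165~signal transduction"),
                         PValue = c(1e-6, 1e-3)),
  cluster_1 = data.frame(Term = c("GO:0006955~immune response",
                                  "GO:0006954~inflammatory response"),
                         PValue = c(1e-4, 1e-8))
)
for (nm in names(toy)) {
  write.table(toy[[nm]], file.path(david_dir, paste0(nm, ".txt")),
              sep = "\t", quote = FALSE, row.names = FALSE)
}

# Combine: read every file into a list of data frames named after the file.
files <- list.files(david_dir, pattern = "\\.txt$", full.names = TRUE)
david <- lapply(files, read.delim, check.names = FALSE)
names(david) <- tools::file_path_sans_ext(basename(files))

# Hypothetical helper: the n terms with the smallest p-values in one chart.
top_terms <- function(chart, n = 5) {
  head(chart[order(chart$PValue), , drop = FALSE], n)
}

# Consolidate: rows are terms, columns are clusters, NA where a term is absent.
tops  <- lapply(david, top_terms, n = 5)
terms <- unique(unlist(lapply(tops, `[[`, "Term")))
pmat  <- sapply(tops, function(tp) tp$PValue[match(terms, tp$Term)])
rownames(pmat) <- terms

# -log10 scale so that more significant terms get larger values.
log_p <- -log10(pmat)
```

A matrix like log_p could then be drawn with pheatmap, for example `pheatmap::pheatmap(log_p, cluster_rows = FALSE, cluster_cols = FALSE, na_col = "grey")`. Clustering is turned off in that call because the NA cells would break the distance computation. Whether top_processes_heatmap makes the same choices is not stated here.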

The package also contains a dataset, pbmc_markers, of differentially expressed markers generated by following Seurat’s clustering tutorial. It also includes a zip archive, david.zip, in inst/extdata/ with sample DAVID output files for users who want to view them.
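One way to reach the bundled sample data uses only base R and standard package conventions. Files under inst/extdata/ are installed to extdata/, and system.file() returns an empty string when the package or file is missing, so the sketch below does nothing if SeuratToGO is not installed. The extraction directory is an arbitrary choice.

```r
# Locate the sample DAVID outputs shipped in inst/extdata/ (installed as
# extdata/). system.file() returns "" if the package or file is absent.
zip_path <- system.file("extdata", "david.zip", package = "SeuratToGO")

if (nzchar(zip_path)) {
  # Extract into a temporary directory (an arbitrary choice of location).
  extract_dir <- file.path(tempdir(), "david")
  unzip(zip_path, exdir = extract_dir)
  print(list.files(extract_dir, recursive = TRUE))

  # Load the bundled marker dataset and preview its first rows.
  data("pbmc_markers", package = "SeuratToGO")
  print(head(pbmc_markers))
}
```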

An overview of the package is illustrated below. The steps highlighted in yellow are not supported by this package, because DAVID’s API does not support the type of gene IDs used here. See the vignette for more details.

Contributions

The package author is Dien Nguyen, who wrote all five functions listed above. separate_clusters uses the magrittr package for piping and the dplyr package for filtering and selecting. get_top_processes uses dplyr to sort data frames. top_processes_heatmap uses the pheatmap package to generate the heatmap. The pbmc_markers dataset was generated by following Seurat’s clustering tutorial. The DAVID output files were generated using the DAVID web server.

References

Bache S, Wickham H. 2022. magrittr: A Forward-Pipe Operator for R. https://magrittr.tidyverse.org, https://github.com/tidyverse/magrittr.
Benjamini Y, Hochberg Y. 1995. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society: Series B (Methodological). 57(1):289–300. doi:10.1111/j.2517-6161.1995.tb02031.x.
Butler A, Hoffman P, Smibert P, Papalexi E, Satija R. 2018. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol. 36(5):411–420. doi:10.1038/nbt.4096.
Kolde R. 2019. pheatmap: Pretty Heatmaps. https://github.com/raivokolde/pheatmap.
R Core Team. 2023. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
Sherman BT, Hao M, Qiu J, Jiao X, Baseler MW, Lane HC, Imamichi T, Chang W. 2022. DAVID: a web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res. 50(W1):W216–W221. doi:10.1093/nar/gkac194.
Stuart T, Butler A, Hoffman P, Hafemeister C, Papalexi E, Mauck WM, Hao Y, Stoeckius M, Smibert P, Satija R. 2019. Comprehensive Integration of Single-Cell Data. Cell. 177(7):1888–1902.e21. doi:10.1016/j.cell.2019.05.031.
Wickham H, Bryan J. 2019. R Packages (2nd edition). Newton, Massachusetts: O’Reilly Media. https://r-pkgs.org/.
Wickham H, François R, Henry L, Müller K, Vaughan D. 2023. dplyr: A Grammar of Data Manipulation. https://dplyr.tidyverse.org, https://github.com/tidyverse/dplyr.

Acknowledgements

This package was developed as part of an assessment for the 2022-2023 BCB410H: Applied Bioinformatics course at the University of Toronto, Toronto, CANADA. SeuratToGO welcomes issues, enhancement requests, and other contributions. To submit an issue, use the GitHub issues page. Many thanks to those who provided feedback to improve this package.
