
Final upload for SLE disease activity single cell multiomics code repository


KevinTThomas/sle_activity_scrnaseq


This repository contains the code necessary to reproduce the analysis and figures for

"Ancestry-based differences in the immune system are associated with lupus severity" (2022), Slight-Webb, S., Thomas, K., et al.

Installation

Download the code using

git clone https://github.com/KevinTThomas/sle_activity_scrnaseq.git

Requirements

The analysis requires R (>= 4.1), RStudio (>= 2021.09.0), and Python (>= 3.8) on Ubuntu 20.04. The required R packages are listed in the DESCRIPTION file. Because this project is itself organized as an R package, the necessary packages can be installed (provided that you have BiocManager and remotes installed) with:

BiocManager::install('KevinTThomas/sle_activity_scrnaseq')

Additionally, to improve reproducibility and ease installation, a Docker container is provided. It can be retrieved using:

docker pull milescsmith/sle_activity_scrnaseq:4.1.2

The docker container is based on the rocker/rstudio:4.1.2 container and thus runs a version of RStudio appropriate for compiling the analysis notebooks. The dockerfile for building the above image can be found within this repository.
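Since the image is based on rocker/rstudio, it can presumably be started the way rocker images usually are: RStudio Server listens on port 8787, the login user is `rstudio`, and the password is set through the `PASSWORD` environment variable. The following is a minimal sketch under those assumptions. The function name `run_rstudio`, the placeholder password, and the mount point inside the container are hypothetical and not taken from this repository.

```shell
# Start RStudio Server from the project image (sketch; assumes the usual
# rocker/rstudio defaults: port 8787 and the PASSWORD environment variable).
# DOCKER can be overridden if a different container CLI is used.
run_rstudio() {
  repo_dir="${1:-$PWD}"       # local clone to mount into the container
  password="${2:-changeme}"   # hypothetical placeholder; choose your own
  "${DOCKER:-docker}" run --rm \
    -p 8787:8787 \
    -e PASSWORD="$password" \
    -v "$repo_dir":/home/rstudio/sle_activity_scrnaseq \
    milescsmith/sle_activity_scrnaseq:4.1.2
}
```

Calling `run_rstudio "$PWD" mypassword` from the cloned repository and then opening http://localhost:8787 in a browser should give an RStudio session with the repository mounted in the home directory, if the image keeps the rocker defaults.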

Because of the size of the data, it is recommended that the analysis be run on a Linux workstation with multiple cores and at least 64 GB of RAM or on an institutional cluster.

Analysis

The code is divided into two sections:

  • An analysis pipeline that processes the counts generated by Cell Ranger, performs QC and background noise removal, generates a Seurat object, normalizes data, performs batch correction, and carries out clustering and dimensional reductions.

  • Code for generating the figures used in the manuscript.

The code is implemented as a series of R Markdown files. Because the output of earlier notebooks is used as input for later ones, they should be run in this order:

a. 01_pp_soupX.Rmd
b. 02_pp_object_construction.Rmd
c. 03_pp_qc.Rmd
d. 04_analysis.Rmd
e. 05_plotting.Rmd

These can be run either from within RStudio (using the knit function to compile the R Markdown documents into GitHub-compatible markdown or HTML files) or from the command line using:

R -e "knitr::knit('analysis/01_pp_soupX.Rmd')"
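To run all five notebooks in the required order from the command line, the single command above can be wrapped in a loop. This is only a sketch: it assumes every notebook lives in the `analysis/` folder, as the first one does, and the names `notebooks`, `knit_all`, and `R_BIN` are introduced here for illustration. The loop stops when the R process exits with a non-zero status. Depending on knitr's chunk options, an error inside a chunk may be written to the output instead of aborting the run, so the rendered files are still worth checking.

```shell
# Notebooks in the order given above (paths assume the analysis/ folder).
notebooks="analysis/01_pp_soupX.Rmd
analysis/02_pp_object_construction.Rmd
analysis/03_pp_qc.Rmd
analysis/04_analysis.Rmd
analysis/05_plotting.Rmd"

# Knit each notebook in turn. R_BIN can be overridden to point at a
# specific R executable; it defaults to the R found on PATH.
knit_all() {
  for nb in $notebooks; do
    echo "Knitting $nb"
    "${R_BIN:-R}" -e "knitr::knit('$nb')" || {
      # Later notebooks read the output of earlier ones, so stop here.
      echo "Failed on $nb; not running the remaining notebooks" >&2
      return 1
    }
  done
}
```

Run `knit_all` from the repository root so that the relative paths resolve.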

A small subset of the data is provided as a toy example in the demo_data folder, and the notebooks are currently written to run on this toy dataset. To run them on the data from the study, change all instances of demo_data in the analysis/01_pp_soupX.Rmd file to the name of the directory in which the data are located. Within that data directory, create a subdirectory named "droplets" and place the final raw and filtered matrix folders for each run in their own appropriately named subdirectory. Also place a sample sheet describing the samples in the data directory; the da_samplesheet_final.csv file can serve as a template, but the sheet must have the columns:

Subject_id ancestry classification age run Hashtag
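The setup steps described above can be sketched as a few shell helpers. The function names are hypothetical, and the header check assumes a comma-separated sample sheet whose first row holds the column names. The directory name passed to `point_notebook_at` is assumed to contain no `|`, `&`, or backslash characters, since it is substituted with `sed`.

```shell
# Column names required in the sample sheet, as listed above.
required_columns="Subject_id ancestry classification age run Hashtag"

# Create <data_dir>/droplets, which holds one subdirectory per run
# containing that run's raw and filtered matrix folders.
make_data_dir() {
  mkdir -p "$1/droplets"
}

# Replace every occurrence of demo_data in a notebook with a new
# directory name, writing through a temporary file.
point_notebook_at() {
  notebook="$1"; data_dir="$2"
  sed "s|demo_data|$data_dir|g" "$notebook" > "$notebook.tmp" &&
    mv "$notebook.tmp" "$notebook"
}

# Print each required column missing from the header row of a sample
# sheet; returns 0 if all are present and 1 otherwise.
check_samplesheet() {
  header="$(head -n 1 "$1" | tr -d '\r"')"
  status=0
  for col in $required_columns; do
    case ",$header," in
      *",$col,"*) ;;
      *) echo "missing column: $col"; status=1 ;;
    esac
  done
  return "$status"
}
```

For example, with a data directory called `my_data` and a sample sheet at `my_data/samplesheet.csv` (both names are placeholders), one might run `make_data_dir my_data`, then `point_notebook_at analysis/01_pp_soupX.Rmd my_data`, then `check_samplesheet my_data/samplesheet.csv` before knitting.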

Data

The scRNA-seq data have been deposited in the Gene Expression Omnibus and are available as series GSE189050.
