This repository contains the code necessary to reproduce the analysis and figures for
"Ancestry-based differences in the immune system are associated with lupus severity" (2022), Slight-Webb, S., Thomas, K., et al.
Download the code using:
git clone https://github.com/KevinTThomas/sle_activity_scrnaseq.git
The analysis requires R (>= 4.1), RStudio (>= 2021.09.0), and Python (>= 3.8),
all run on Ubuntu 20.04. The R packages required for this analysis are listed
in the DESCRIPTION file. The project is itself organized as an R package, so,
provided that BiocManager and remotes are installed, the necessary packages
can be installed with:
BiocManager::install('KevinTThomas/sle_activity_scrnaseq')
Additionally, to improve reproducibility and ease installation, a Docker container is provided. It can be retrieved using:
docker pull milescsmith/sle_activity_scrnaseq:4.1.2
The Docker container is based on the rocker/rstudio:4.1.2 container and thus runs a version of RStudio appropriate for compiling the analysis notebooks. The Dockerfile for building the above image can be found within this repository.
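As a minimal sketch of starting the container after pulling it: rocker/rstudio images serve RStudio Server on port 8787 and read the login password from the PASSWORD environment variable. The function name start_rstudio, the RSTUDIO_PASSWORD and DRY_RUN variables, the placeholder password, and the /home/rstudio/project mount point are assumptions made for illustration and are not defined by this repository.

```shell
# Sketch only: password, host port, and mount point are placeholders,
# not settings taken from this repository.
IMAGE="milescsmith/sle_activity_scrnaseq:4.1.2"

start_rstudio() {
  # $1: host directory holding the cloned repository (defaults to the current directory)
  project_dir="${1:-$PWD}"
  set -- docker run --rm -d \
    -p 8787:8787 \
    -e "PASSWORD=${RSTUDIO_PASSWORD:-changeme}" \
    -v "${project_dir}:/home/rstudio/project" \
    "${IMAGE}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"   # print the command instead of running it
  else
    "$@"        # start the container in the background
  fi
}
```

Calling start_rstudio with the path to the cloned repository and then opening http://localhost:8787 (user rstudio) should give an RStudio session with the project mounted. Setting DRY_RUN=1 prints the docker command without executing it.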
Because of the size of the data, it is recommended that the analysis be run on a Linux workstation with multiple cores and at least 64 GB of RAM or on an institutional cluster.
The code is divided into two sections:
- An analysis pipeline that processes the counts generated by Cell Ranger, performs QC and background noise removal, generates a Seurat object, normalizes data, performs batch correction, and carries out clustering and dimensional reductions.
- Code for generating the figures used in the manuscript.
The code is implemented as a series of Rmarkdown files. Since the output of earlier notebooks is used as input for later notebooks, they should be run in the order:
a. 01_pp_soupX.Rmd
b. 02_pp_object_construction.Rmd
c. 03_pp_qc.Rmd
d. 04_analysis.Rmd
e. 05_plotting.Rmd
These can be run either from within RStudio (using the knit function to compile
the Rmarkdown documents into GitHub-compatible Markdown files or HTML files)
or on the command line using:
R -e "knitr::knit('analysis/01_pp_soupX.Rmd')"
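Because each notebook depends on the output of the previous one, the command-line route can be wrapped in a loop. The following is a sketch rather than a script shipped with the repository: the function name knit_all and the DRY_RUN variable are hypothetical, and it assumes the notebooks live in analysis/ with the two-digit numeric prefixes listed above, so that the shell's sorted glob expansion yields the required order.

```shell
# Sketch: knit every numbered notebook in order, stopping at the first failure.
knit_all() {
  # $1: directory containing the notebooks (defaults to analysis)
  notebook_dir="${1:-analysis}"
  for nb in "${notebook_dir}"/0[1-5]_*.Rmd; do
    # An unmatched glob is left as a literal pattern, so check that the file exists.
    [ -e "${nb}" ] || { echo "no notebooks found in ${notebook_dir}" >&2; return 1; }
    echo "Knitting ${nb}"
    if [ "${DRY_RUN:-0}" != "1" ]; then
      R -e "knitr::knit('${nb}')" || return 1
    fi
  done
}
```

Running knit_all from the repository root knits the five notebooks one after another. With DRY_RUN=1 it only lists the notebooks in the order they would be processed.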
A small subset of the data is provided as a toy example in the demo_data
folder, and the notebooks are currently written to run on this toy dataset. To
run them on the data from the study, change all instances of demo_data in the
analysis/01_pp_soupX.Rmd file to the name of the directory in which the data
is located. Within that data directory, create a subdirectory named "droplets"
and place the final raw and filtered matrix folders for each run within their
own appropriately named subdirectory. Also in the data directory, place a
sample sheet describing the samples; the da_samplesheet_final.csv file can
serve as a template, but it must have the columns:
| Subject_id | ancestry | classification | age | run | Hashtag |
|---|---|---|---|---|---|
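
The switch from the toy data to the study data can be sketched with two small helpers. Both function names are hypothetical, and the header check assumes the sample sheet is comma-separated with the column names in its first row. The replacement helper assumes the new directory name contains no | or & characters, which sed would treat specially.

```shell
# Hypothetical helpers; names and argument order are illustrative only.
REQUIRED_COLUMNS="Subject_id ancestry classification age run Hashtag"

# Report any required column missing from the header row of a CSV sample sheet.
check_samplesheet() {
  # $1: path to the sample sheet
  header="$(head -n 1 "$1" | tr -d '\r"')"
  missing=0
  for col in ${REQUIRED_COLUMNS}; do
    case ",${header}," in
      *",${col},"*) ;;
      *) echo "missing column: ${col}" >&2; missing=1 ;;
    esac
  done
  return "${missing}"
}

# Replace every occurrence of demo_data in a notebook with another directory name.
use_data_dir() {
  # $1: notebook path, $2: name of the data directory
  sed "s|demo_data|$2|g" "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
```

For example, check_samplesheet my_data/da_samplesheet_final.csv followed by use_data_dir analysis/01_pp_soupX.Rmd my_data would validate the sheet and then point the first notebook at a directory named my_data, where my_data is a placeholder.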
The scRNA-seq data has been deposited with the Gene Expression Omnibus and is available as series GSE189050.