The Great Repertoire Project v2

This repository contains code and data used in our study of the temporal dynamics of the circulating human antibody repertoire. Briefly, we investigated the dynamics of the human antibody repertoire over time using high-throughput sequencing of antibody transcripts in the peripheral blood. Our study uncovers a profound and previously underappreciated level of repertoire drift of the naïve B cell repertoire within individuals. Despite stable overall repertoire size and diversity, fine-level composition undergoes nearly complete turnover after four years. We observe a delicate interplay between the continuous replacement of naïve B cells and the imprint of immunological exposures, revealing a nuanced model of overall repertoire development. Additionally, a notable feature is the identification of persistent public clonotypes suggesting potentially convergent antibody responses. These findings deepen our understanding of immune system dynamics and offer important insights toward the optimization of vaccine and immunotherapy strategies.

Code

The code used in this project is assembled into a series of Jupyter notebooks. There are two sets of notebooks: those containing code used for DATA PROCESSING and those containing code used to MAKE FIGURES. GitHub will render each of the notebooks, but the code cannot be executed from within GitHub. If you'd like to actually run the code contained in the notebooks, you must clone the repository.

NOTE: Whenever possible, the intermediate datasets required to run the code are included in this repository. However, many intermediate datasets are too large to be included; in such cases, links to the required datasets are provided in the appropriate notebook.

Datasets

We have generated several large datasets, in two primary groups: antibody sequences from two healthy adult subjects in 2016, and antibody sequences from the same two healthy adult subjects after four years.

Antibody sequencing data

Raw and processed datasets from each subject can be downloaded using the following links. Some of these datasets are quite large.

327059-2016
- Sequences: raw FASTQs, consensus FASTAs
- FASTQC: pre-trimming
- Annotated data: consensus AIRR TSVs
D103-2016
- Sequences: raw FASTQs, consensus FASTAs
- FASTQC: pre-trimming
- Annotated data: consensus AIRR TSVs
327059-2020
- Sequences: raw FASTQs, consensus FASTAs
- FASTQC: pre-trimming
- Annotated data: consensus AIRR TSVs
D103-2021
- Sequences: raw FASTQs, consensus FASTAs
- FASTQC: pre-trimming
- Annotated data: consensus AIRR TSVs
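
As a rough illustration of how one of the annotated AIRR TSVs might be inspected outside the notebooks, the sketch below tallies V gene usage using only the Python standard library. It assumes the files follow the AIRR Rearrangement schema and therefore carry a `v_call` column. The function name `count_v_genes` and the toy records are hypothetical and are not taken from the datasets or the notebooks.

```python
import csv
import io
from collections import Counter


def count_v_genes(handle):
    """Count V gene calls in an AIRR-formatted TSV read from an open text handle.

    Assumes a tab-delimited file with a header row containing `v_call`
    (a field defined by the AIRR Rearrangement schema).
    """
    reader = csv.DictReader(handle, delimiter="\t")
    counts = Counter()
    for row in reader:
        v_call = row.get("v_call") or ""
        if not v_call:
            # Skip records without a V gene assignment.
            continue
        # Keep the first call if several are listed, and drop the allele suffix.
        gene = v_call.split(",")[0].split("*")[0]
        counts[gene] += 1
    return counts


# Toy records for illustration only (not real data from this study).
example = io.StringIO(
    "sequence_id\tv_call\tjunction_aa\n"
    "seq1\tIGHV1-69*01\tCARDYW\n"
    "seq2\tIGHV3-23*04\tCAKGGW\n"
    "seq3\tIGHV1-69*06,IGHV1-69*01\tCARDFW\n"
    "seq4\t\tCARW\n"
)
counts = count_v_genes(example)
print(counts.most_common())
```

For a downloaded file, the same function could be called on an ordinary file handle, for example `with open(path, newline="") as handle: count_v_genes(handle)`, where `path` points to wherever the TSV was saved.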

For each timepoint, there are a total of 18 samples: 3 technical replicates of each of 6 biological replicates. Biological replicates refer to different aliquots of peripheral blood mononuclear cells (PBMCs), from which total RNA was separately isolated and processed. Thus, sequences or clonotypes found in multiple biological replicates are assumed to have independently occurred in different cells. Technical replicates refer to independent library preparations using the same aliquot of PBMC-derived RNA. In each of the above datasets, samples 1-6 are biological replicates. Samples 7-12 and 13-18 are technical replicates of samples 1-6.
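
The sample numbering described above can be sketched as a small helper that maps a sample number to its biological and technical replicate. This is a minimal sketch under one assumption that the text does not state explicitly: that technical replicates follow the same order as samples 1-6, so that samples 7 and 13 both derive from the same RNA aliquot as sample 1. The function name `replicate_info` is hypothetical.

```python
def replicate_info(sample_number):
    """Return (biological_replicate, technical_replicate) for samples 1-18.

    Assumption: samples 7-12 and 13-18 are ordered like samples 1-6,
    i.e. sample 7 and sample 13 are technical replicates of sample 1.
    """
    if not 1 <= sample_number <= 18:
        raise ValueError("sample_number must be between 1 and 18")
    biological = (sample_number - 1) % 6 + 1
    technical = (sample_number - 1) // 6 + 1
    return biological, technical


# Group sample numbers by the biological replicate they derive from.
samples_by_biological_replicate = {}
for sample in range(1, 19):
    biological, _ = replicate_info(sample)
    samples_by_biological_replicate.setdefault(biological, []).append(sample)

print(samples_by_biological_replicate)
```

Grouping samples this way makes it straightforward to distinguish clonotypes shared only across technical replicates of one aliquot from those observed in several biological replicates.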

Requirements

Python 3.3+ (although Python 2.7 may work for many or most notebooks, this has not been tested)
Jupyter Notebook

Additionally, each notebook may require additional third-party Python packages. Any notebook-specific requirements, as well as instructions for package installation with pip, are provided in each notebook.

If you're new to Python, a great way to get started is to install the Anaconda Python distribution, which includes pip as well as a ton of useful scientific Python packages.

Repository: CollinJ0/grp2_paper