Mayrlab/hcl-utrome: generates human UTRome from Human Cell Landscape data

Overview

This repository contains a pipeline to reprocess the Human Cell Landscape data for cleavage site identification as reported in Fansler et al., bioRxiv, 2023.

🤔 Important Note: This pipeline is provided for the scientific record, not necessarily with reuse in mind. However, we made some engineering improvements when writing the analogous pipeline for mouse data (https://github.com/Mayrlab/mca-utrome), making it more geared toward reuse. In particular, we moved what are really pipeline parameters out of the Snakefile and into the config.yaml, where they belong. If you are considering rerunning this pipeline or applying it to other Microwell-seq data, you may want to start from that version instead, or at least incorporate those improvements here. Also, be mindful that both of these are resource-heavy pipelines. We may be able to provide useful intermediate files to expedite generating output variants that do not require rerunning alignments (open an Issue).

Organization

The folders in the repository have the following purposes:

data - (created at runtime) output data files
envs - Conda environment YAML files for recreating the execution environment
metadata - metadata files that annotate input data files
scripts - scripts used by the Snakefile
qc - (created at runtime) output quality checks

All code is expected to be executed with this repository as the present working directory.

Source Code

The primary source code is found in the Snakefile and the scripts folder.

Input Data

Files in the metadata folder describe most of the information necessary to download the raw input sequencing files, as well as annotate the cells.

This pipeline also requires a HISAT2 index, which is not automatically retrieved. The location of this should be specified with the hisatIndex key in the config.yaml.
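Because the index is not retrieved automatically, it can help to confirm that the configured location points at a complete index before starting a long run. The following is a minimal sketch and not part of this repository: the function name `missing_hisat2_index_files` and the example path are hypothetical, and it assumes that hisatIndex holds the index prefix, that is, the basename shared by the `.1.ht2` through `.8.ht2` files written by `hisat2-build` (or `.1.ht2l` through `.8.ht2l` for large indexes).

```python
from pathlib import Path


def missing_hisat2_index_files(prefix):
    """Return the HISAT2 index files that are absent for the given prefix.

    Assumption: `prefix` is the index basename, so the index consists of
    <prefix>.1.ht2 ... <prefix>.8.ht2 (small index) or the same names with
    the .ht2l extension (large index).
    """
    prefix = str(prefix)
    small = [Path(f"{prefix}.{i}.ht2") for i in range(1, 9)]
    large = [Path(f"{prefix}.{i}.ht2l") for i in range(1, 9)]

    # A complete large index is acceptable on its own.
    if all(p.is_file() for p in large):
        return []

    # Otherwise report whichever small-index files are not present.
    return [p for p in small if not p.is_file()]


if __name__ == "__main__":
    # Hypothetical prefix; substitute the value of hisatIndex from config.yaml.
    missing = missing_hisat2_index_files("/path/to/hisat2/index/prefix")
    if missing:
        print("Incomplete HISAT2 index; missing files:")
        for path in missing:
            print(f"  {path}")
    else:
        print("HISAT2 index looks complete.")
```

An empty list means all eight files of one index flavor were found; it does not verify that the index was built from the intended human genome assembly.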

Reproducing the Pipeline

Cloning

This repository can be cloned with:

git clone https://github.com/Mayrlab/hcl-utrome.git

Prerequisite Software

This requires Conda/Mamba and Snakemake. If you do not already have a Conda installation, we strongly recommend Miniforge.

Configuration

Two configuration options in config.yaml should be adjusted by the user prior to running:

tmpdir: temporary directory for scratch
hisatIndex: human HISAT2 index

Optional parameters in the config.yaml that could be adjusted are:

minReadLength: the minimum read length required to include the resulting merged read
radiusGENCODE: radius for merging GENCODE transcripts
radiusPAS: radius for merging PolyASite entries
extUTR3: downstream distance from annotated gene locus to classify as "extended 3'UTR"
extUTR5: upstream distance from annotated gene locus to classify as "extended 5'UTR"
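To make the roles of extUTR3 and extUTR5 concrete, here is an illustrative sketch of a strand-aware classification of a site relative to a single gene locus. It is an assumption about how such distances could be applied, not the code used in the scripts folder; the function name `classify_site`, the "annotated" and "intergenic" labels, and the numeric values are all hypothetical.

```python
def classify_site(pos, gene_start, gene_end, strand, ext_utr3, ext_utr5):
    """Classify a site coordinate relative to one annotated gene locus.

    Illustrative only. Coordinates are 1-based and inclusive, with
    gene_start <= gene_end on the reference; strand is "+" or "-".
    """
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")

    # Sites inside the annotated locus need no extension.
    if gene_start <= pos <= gene_end:
        return "annotated"

    # Distances beyond each end of the locus, in transcript orientation.
    if strand == "+":
        downstream = pos - gene_end
        upstream = gene_start - pos
    else:
        downstream = gene_start - pos
        upstream = pos - gene_end

    if 0 < downstream <= ext_utr3:
        return "extended 3'UTR"
    if 0 < upstream <= ext_utr5:
        return "extended 5'UTR"
    return "intergenic"


if __name__ == "__main__":
    # Hypothetical locus at 1000-2000 with hypothetical extension distances.
    for pos, strand in [(2500, "+"), (500, "-"), (2500, "-"), (9000, "+")]:
        label = classify_site(pos, 1000, 2000, strand, ext_utr3=5000, ext_utr5=1000)
        print(pos, strand, label)
```

Note that "downstream" follows the direction of transcription, so for a minus-strand gene the extended 3'UTR lies at lower reference coordinates than the locus.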

Additional parameters of interest in the Snakefile are:

epsilon: the initial radius within which read ends are merged to the mode
threshold: minimum TPM per cell type cutoff for filtering low-frequency cleavage sites
version: the human GENCODE version to be built upon
tpm: the minimum TPM threshold for PolyASite entries to be used as "supporting" evidence
likelihood: the minimum CleanUpdTSeq score, indicating that a cleavage site does not arise from internal priming, required for the site to be considered a "likely" cleavage site
width: the width for truncating the UTRome
merge: the distance within which to merge 3' ends during scUTRquant quantification
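As an illustration of what merging read ends to the mode within a radius such as epsilon can look like, the sketch below uses a simple greedy scheme: take the most frequent remaining position, absorb every position within the radius into it, and repeat. This is one plausible approach shown under stated assumptions, not necessarily the procedure implemented in this pipeline; the function name `merge_to_modes` and the example coordinates are hypothetical.

```python
from collections import Counter


def merge_to_modes(read_ends, epsilon):
    """Greedily merge read-end positions into local modes.

    Illustrative only. Returns a dict mapping each chosen mode position to
    the total number of read ends merged into it.
    """
    remaining = dict(Counter(read_ends))
    merged = {}
    while remaining:
        # Most frequent remaining position; ties go to the smaller coordinate.
        mode = min(remaining, key=lambda p: (-remaining[p], p))
        # Absorb every remaining position within epsilon of the mode.
        window = [p for p in remaining if abs(p - mode) <= epsilon]
        merged[mode] = sum(remaining.pop(p) for p in window)
    return merged


if __name__ == "__main__":
    # Hypothetical read-end coordinates on a single strand of one chromosome.
    ends = [100, 100, 100, 101, 99, 130, 130, 131]
    print(merge_to_modes(ends, epsilon=5))
```

In this scheme a larger radius collapses nearby clusters into fewer sites, while a smaller radius preserves closely spaced sites at the cost of splitting noisy ones.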

Running

The full pipeline can be executed with:

snakemake --use-conda

We encourage HPC users to configure a Snakemake profile and use this via a --profile argument.
