
Moore-Lab-UMass/CAPRA


CAPRA Pipeline


The identification of functional regulatory elements from STARR-seq data typically relies on peak-calling algorithms that assess RNA enrichment over input DNA across the genome. However, this approach often produces broad peaks encompassing multiple candidate cis-regulatory elements (cCREs), limiting resolution and complicating the interpretation of element-specific activity. This challenge is particularly acute in dense regulatory regions where multiple functional elements may be closely spaced.

To overcome this limitation, we developed CAPRA (CRE-centric Analysis and Prediction of Reporter Assays), a method that uses the Registry of cCREs as predefined anchors to directly quantify STARR-seq activity without requiring peak calling. CAPRA enables high-resolution, element-centric quantification of both enhancer and silencer activity by computing RNA-to-DNA fragment ratios for individual cCREs. It also supports quantification across cCRE pairs to assess potential combinatorial effects.

Step 0 - Pull and prepare data from the ENCODE portal

The initial step converts STARR-seq BAM files into BED files that record the position of each fragment. The script downloads BAM files from the ENCODE portal, sorts them using BEDtools, converts them to BEDPE files, and finally outputs one BED file per input BAM.
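The last conversion, from BEDPE to one fragment per line, can be sketched in Python as below. This is a minimal illustration, not the repository's script. It assumes the column order written by `bedtools bamtobed -bedpe` (chrom1, start1, end1, chrom2, start2, end2, name, score, strand1, strand2) and assumes that a fragment is the outer span of its two mates; `bedpe_to_fragments` is a hypothetical name.

```python
def bedpe_to_fragments(lines):
    """Convert BEDPE records into fragment tuples (chrom, start, end).

    Hypothetical helper: a fragment is taken to be the outer span of the
    two mates. Pairs whose mates map to different chromosomes, or that
    have an unmapped mate (negative coordinates), are skipped.
    """
    fragments = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # ignore blank and comment lines
        fields = line.split("\t")
        chrom1, start1, end1, chrom2, start2, end2 = fields[:6]
        start1, end1, start2, end2 = (int(x) for x in (start1, end1, start2, end2))
        if chrom1 != chrom2 or min(start1, start2) < 0:
            continue  # discordant or unmapped pair
        fragments.append((chrom1, min(start1, start2), max(end1, end2)))
    return fragments
```

Each returned tuple can be written out as a tab-separated `chrom start end` line to obtain the per-BAM fragment BED file.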

Input data:

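| Exp ID      | Biosample | Lab             | RNA BAMs                              | DNA BAMs    |
|-------------|-----------|-----------------|---------------------------------------|-------------|
| ENCSR661FOW | K562      | Tim Reddy, Duke | ENCFF692WJN; ENCFF058NAC; ENCFF294XNE | ENCFF778LRW |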

Additional scripts:

Required software:

cCRE-Centric Mode

Step 1 - Extract overlapping fragments and create count matrix

This step builds quantification matrices by intersecting the STARR-seq fragments with cCREs. Fragments that overlap one cCRE in its entirety count toward the "solo" quantifications, and fragments that overlap two cCREs in their entirety count toward the "double" quantifications. The script outputs a matrix with the rDHS ID in the first column, followed by the DNA fragment counts and then the RNA fragment counts.
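The solo/double counting can be illustrated with the Python sketch below. It is not CAPRA's implementation. It assumes fragments are `(chrom, start, end)` tuples, cCREs are `(chrom, start, end, id)` tuples (the `id` can be the rDHS ID), and, as a labeled assumption, that "solo" means a fragment fully contains exactly one cCRE and "double" means it fully contains exactly two. Fragments containing three or more cCREs are ignored. All function names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter, defaultdict


def index_ccres(ccres):
    """Group cCREs (chrom, start, end, id) by chromosome, sorted by start."""
    by_chrom = defaultdict(list)
    for chrom, start, end, ccre_id in ccres:
        by_chrom[chrom].append((start, end, ccre_id))
    for rows in by_chrom.values():
        rows.sort()
    starts = {chrom: [row[0] for row in rows] for chrom, rows in by_chrom.items()}
    return by_chrom, starts


def contained_ccres(fragment, by_chrom, starts):
    """Return IDs of cCREs lying entirely inside the fragment, in genomic order."""
    chrom, f_start, f_end = fragment
    rows = by_chrom.get(chrom, [])
    # First cCRE starting at or after the fragment start; earlier ones cannot fit.
    i = bisect_left(starts.get(chrom, []), f_start)
    hits = []
    while i < len(rows) and rows[i][0] < f_end:
        start, end, ccre_id = rows[i]
        if end <= f_end:
            hits.append(ccre_id)
        i += 1
    return hits


def count_library(fragments, index):
    """Count solo (exactly one cCRE) and double (exactly two cCREs) fragments."""
    solo, double = Counter(), Counter()
    for fragment in fragments:
        hits = contained_ccres(fragment, *index)
        if len(hits) == 1:
            solo[hits[0]] += 1
        elif len(hits) == 2:
            double[tuple(hits)] += 1
    return solo, double


def build_matrix(dna_libraries, rna_libraries, index, mode="solo"):
    """Rows of [key, DNA counts..., RNA counts...] for the chosen mode."""
    which = 0 if mode == "solo" else 1
    dna = [count_library(lib, index)[which] for lib in dna_libraries]
    rna = [count_library(lib, index)[which] for lib in rna_libraries]
    keys = sorted(set().union(*dna, *rna))
    return [[key, *(c[key] for c in dna), *(c[key] for c in rna)] for key in keys]
```

Sorting cCREs by start and using `bisect_left` avoids testing every cCRE against every fragment, since any cCRE that starts before the fragment cannot be fully contained in it.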

Input data:

Required software:

Step 2 - Run DESeq on matrices

This step uses DESeq2 to calculate the normalized ratio of RNA to DNA fragments, and its statistical significance, for each cCRE.
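DESeq2 fits a negative binomial model and tests each cCRE, which cannot be reproduced in a few lines. The Python sketch below shows only the normalization idea: median-of-ratios size factors (the default approach of DESeq2's `estimateSizeFactors`, computed in log space) followed by a plain log2 ratio of mean normalized RNA to mean normalized DNA. It produces no p-values and applies no dispersion estimation or fold-change shrinkage. It assumes the Step 1 column layout (DNA columns first, then RNA), and the `pseudocount` parameter and function names are hypothetical.

```python
import math
from statistics import median


def size_factors(matrix):
    """Median-of-ratios size factors, one per library (column).

    Rows containing a zero are skipped because their geometric mean is zero.
    """
    usable = [row for row in matrix if all(count > 0 for count in row)]
    if not usable:
        raise ValueError("every row contains a zero count")
    n_cols = len(usable[0])
    log_geo_means = [sum(math.log(c) for c in row) / n_cols for row in usable]
    return [
        math.exp(median([math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]))
        for j in range(n_cols)
    ]


def log2_ratios(matrix, n_dna, pseudocount=1.0):
    """log2(mean normalized RNA / mean normalized DNA) for each row.

    Columns are assumed to be DNA libraries first, then RNA libraries.
    """
    factors = size_factors(matrix)
    ratios = []
    for row in matrix:
        norm = [count / f for count, f in zip(row, factors)]
        dna = sum(norm[:n_dna]) / n_dna
        rna = sum(norm[n_dna:]) / (len(norm) - n_dna)
        ratios.append(math.log2((rna + pseudocount) / (dna + pseudocount)))
    return ratios
```

Because size factors are estimated from all cCREs jointly, a single highly active element barely shifts them, which is why the median rather than the total count is used.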

Input data:

  • Quantification matrices from Step 1

Additional scripts:

Required software:

Note - Different versions of R and DESeq2 may produce slightly different quantification values.

Paired Sweep Mode

Step 3 - Paired sweep mode

This mode evaluates partially overlapping STARR-seq fragments across the genomic interval between a pair of high-coverage cCREs to resolve fine-scale relationships between local sequence features and reporter activity.

For each cCRE pair, there are two directional sweeps:

  • Forward sweep: Starting with fragments that fully overlap the first cCRE, CAPRA progressively tiles 10 bp windows toward the second cCRE, quantifying RNA and DNA fragment coverage at each step.
  • Reverse sweep: The same procedure is repeated in the opposite direction, starting from fragments that fully overlap the second cCRE and sweeping back toward the first.
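One possible reading of the two sweeps is sketched below in Python. It is an interpretation, not CAPRA's code. It assumes that each step extends a window by 10 bp from the anchor cCRE toward its partner, that a fragment is counted at a step when it fully covers the current window, that fragments are `(chrom, start, end)` tuples, and that the first cCRE lies upstream of the second. `sweep_windows`, `covering`, and `sweep` are hypothetical names.

```python
def sweep_windows(first, second, step=10, reverse=False):
    """Window series for one directional sweep between two cCREs.

    first/second are (chrom, start, end) with first upstream of second.
    Forward: grow the right edge from the end of `first` to the end of `second`.
    Reverse: grow the left edge from the start of `second` to the start of `first`.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    chrom, a_start, a_end = first
    _, b_start, b_end = second
    windows = []
    if not reverse:
        edge = a_end
        while True:
            right = min(edge, b_end)
            windows.append((chrom, a_start, right))
            if right == b_end:
                break
            edge += step
    else:
        edge = b_start
        while True:
            left = max(edge, a_start)
            windows.append((chrom, left, b_end))
            if left == a_start:
                break
            edge -= step
    return windows


def covering(fragments, window):
    """Number of fragments that fully cover the window."""
    chrom, lo, hi = window
    return sum(1 for c, s, e in fragments if c == chrom and s <= lo and e >= hi)


def sweep(dna, rna, first, second, step=10, reverse=False):
    """(window, DNA count, RNA count) for every step of one sweep."""
    return [
        (w, covering(dna, w), covering(rna, w))
        for w in sweep_windows(first, second, step, reverse)
    ]
```

Each window is scored with a linear scan, which is adequate for the fragments around a single cCRE pair; prefiltering fragments to the pair's interval keeps it fast.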

Input data:

  • Fragment BED files from Step 0
  • BED file of paired cCREs (2 lines total, example below)
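| chr  | start    | end      | anchor ID    | cCRE ID      | class |
|------|----------|----------|--------------|--------------|-------|
| chr1 | 46195852 | 46196010 | EH38D4377717 | EH38E2809119 | dELS  |
| chr1 | 46196098 | 46196283 | EH38D4377718 | EH38E2809120 | pELS  |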
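The paired-cCRE input can be read and sanity-checked with a small parser like the one below. It is an illustrative sketch that assumes whitespace- or tab-delimited columns in the order of the example (chr, start, end, anchor ID, cCRE ID, class) and tolerates an optional header line. `read_ccre_pair` and its dictionary keys are hypothetical names.

```python
def read_ccre_pair(lines):
    """Parse the two-record paired-cCRE BED and return (first, second) by position."""
    records = []
    for line in lines:
        fields = line.split()
        if len(fields) < 6 or not fields[1].isdigit():
            continue  # skip blank lines and a header such as "chr start end ..."
        chrom, start, end, anchor_id, ccre_id, ccre_class = fields[:6]
        records.append({
            "chrom": chrom,
            "start": int(start),
            "end": int(end),
            "anchor_id": anchor_id,
            "ccre_id": ccre_id,
            "class": ccre_class,
        })
    if len(records) != 2:
        raise ValueError(f"expected 2 cCRE records, found {len(records)}")
    if records[0]["chrom"] != records[1]["chrom"]:
        raise ValueError("paired cCREs must be on the same chromosome")
    first, second = sorted(records, key=lambda r: r["start"])
    return first, second
```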

Required software:
