The identification of functional regulatory elements from STARR-seq data typically relies on peak-calling algorithms that assess RNA enrichment over input DNA across the genome. However, this approach often produces broad peaks encompassing multiple candidate cis-regulatory elements (cCREs), limiting resolution and complicating the interpretation of element-specific activity. This challenge is particularly acute in dense regulatory regions where multiple functional elements may be closely spaced.
To overcome this limitation, we developed CAPRA (CRE-centric Analysis and Prediction of Reporter Assays), a method that uses the Registry of cCREs as predefined anchors to directly quantify STARR-seq activity without requiring peak calling. CAPRA enables high-resolution, element-centric quantification of both enhancer and silencer activity by computing RNA-to-DNA fragment ratios for individual cCREs. It also supports quantification across cCRE pairs to assess potential combinatorial effects.
The initial step converts STARR-seq BAM files to BED files that record the position of each fragment. The script downloads BAM files from the ENCODE portal, sorts them using BEDTools, converts them to BEDPE files, and finally outputs one BED file for each input BAM.
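The BEDPE-to-fragment conversion can be illustrated with a minimal Python sketch. It assumes the standard 10-column BEDPE layout written by `bedtools bamtobed -bedpe` (chrom1, start1, end1, chrom2, start2, end2, name, score, strand1, strand2) and defines a fragment as the span covering both mates. The function name `bedpe_to_fragment` is hypothetical and is not part of the CAPRA scripts, which may filter or merge reads differently.

```python
def bedpe_to_fragment(line):
    """Collapse one BEDPE record into a (chrom, start, end) fragment.

    Returns None when the mates map to different chromosomes or when
    either mate is unmapped (BEDTools reports '.' and -1 in that case).
    """
    fields = line.rstrip("\n").split("\t")
    chrom1, start1, end1 = fields[0], int(fields[1]), int(fields[2])
    chrom2, start2, end2 = fields[3], int(fields[4]), int(fields[5])

    # Skip pairs that cannot be interpreted as a single fragment.
    if chrom1 != chrom2 or chrom1 == "." or min(start1, start2) < 0:
        return None

    # The fragment spans from the leftmost start to the rightmost end.
    return chrom1, min(start1, start2), max(end1, end2)


if __name__ == "__main__":
    record = "chr1\t100\t150\tchr1\t300\t350\tread1\t60\t+\t-"
    print(bedpe_to_fragment(record))
```

Taking the outermost coordinates of the two mates recovers the full inserted fragment, which is the unit the later steps intersect with cCREs.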
Input data:
- List of STARR-seq experiments (e.g., STARR-BAM-List.txt) in the following format:
| Exp ID | Biosample | Lab | RNA BAMs | DNA BAMs |
|---|---|---|---|---|
| ENCSR661FOW | K562 | Tim Reddy, Duke | ENCFF692WJN;ENCFF058NAC;ENCFF294XNE | ENCFF778LRW |
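As a rough illustration of how one row of this list might be read, the sketch below assumes the file is tab-delimited with the five columns shown above and that multiple BAM accessions within a column are separated by semicolons, as in the example row. The `Experiment` type and `parse_experiment_line` function are hypothetical names introduced here, and the actual delimiter used by the CAPRA scripts may differ.

```python
from typing import List, NamedTuple


class Experiment(NamedTuple):
    exp_id: str
    biosample: str
    lab: str
    rna_bams: List[str]
    dna_bams: List[str]


def parse_experiment_line(line):
    """Parse one tab-delimited row (assumed format) of the experiment list."""
    exp_id, biosample, lab, rna, dna = line.rstrip("\n").split("\t")
    # Replicate BAM accessions are assumed to be joined with ';'.
    return Experiment(exp_id, biosample, lab, rna.split(";"), dna.split(";"))


if __name__ == "__main__":
    row = ("ENCSR661FOW\tK562\tTim Reddy, Duke\t"
           "ENCFF692WJN;ENCFF058NAC;ENCFF294XNE\tENCFF778LRW")
    print(parse_experiment_line(row))
```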
Additional scripts:
Required software:
- BEDTools (version 2.30.0 was used in Moore...Weng (2024) bioRxiv)
This step creates quantification matrices by intersecting the STARR-seq fragments with cCREs. Fragments that overlap one cCRE in its entirety count toward the "solo" quantifications, and fragments that overlap two cCREs in their entirety count toward the "double" quantifications. The script outputs a matrix with the rDHS ID in the first column, followed by the DNA fragment counts and then the RNA fragment counts.
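The solo and double counting rule can be sketched in plain Python as follows. This is an illustration only: the pipeline itself performs the intersection with BEDTools, and the sketch assumes that "one" and "two" mean exactly one and exactly two fully contained cCREs, which the description above does not state explicitly. The function name `count_solo_double` and the tuple layouts are hypothetical.

```python
from collections import Counter


def count_solo_double(fragments, ccres):
    """Count fragments that fully contain exactly one or exactly two cCREs.

    fragments: iterable of (chrom, start, end)
    ccres:     iterable of (chrom, start, end, ccre_id)
    Returns (solo, double), where solo is keyed by cCRE ID and double is
    keyed by a (left_id, right_id) tuple in genomic order.
    """
    ordered = sorted(ccres)  # genomic order, so pair keys are stable
    solo, double = Counter(), Counter()
    for f_chrom, f_start, f_end in fragments:
        # A cCRE is "overlapped in its entirety" when the fragment contains it.
        contained = [
            c_id
            for c_chrom, c_start, c_end, c_id in ordered
            if c_chrom == f_chrom and f_start <= c_start and c_end <= f_end
        ]
        if len(contained) == 1:
            solo[contained[0]] += 1
        elif len(contained) == 2:
            double[tuple(contained)] += 1
    return solo, double


if __name__ == "__main__":
    ccres = [
        ("chr1", 46195852, 46196010, "EH38E2809119"),
        ("chr1", 46196098, 46196283, "EH38E2809120"),
    ]
    fragments = [("chr1", 46195800, 46196050), ("chr1", 46195800, 46196300)]
    print(count_solo_double(fragments, ccres))
```

The linear scan over all cCREs is only suitable for small examples; genome-scale data needs an indexed or sorted-sweep intersection such as the one BEDTools provides.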
Input data:
- Fragment BED files from Step 0
- GRCh38 cCREs
- GRCh38 cCRE pairs
Required software:
- BEDTools (version 2.30.0 was used in Moore...Weng (2024) bioRxiv)
This step calculates the normalized ratio of RNA to DNA fragments and the statistical significance for each cCRE using DESeq2.
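For intuition about what a normalized RNA-to-DNA ratio measures, the sketch below computes a simple depth-normalized log2 ratio with a pseudocount. This is explicitly not DESeq2: it does not use DESeq2's size factors, dispersion estimation, shrinkage, or significance testing, and its values will not match the pipeline's output. The function name `log2_rna_dna_ratio` and the `pseudocount` parameter are hypothetical.

```python
import math


def log2_rna_dna_ratio(rna_counts, dna_counts, pseudocount=1.0):
    """Naive per-cCRE log2(RNA/DNA) after scaling by total fragment counts.

    rna_counts, dna_counts: dicts mapping cCRE ID to fragment count.
    This is a simplified stand-in for illustration, not DESeq2.
    """
    rna_total = sum(rna_counts.values())
    dna_total = sum(dna_counts.values())
    if rna_total == 0 or dna_total == 0:
        raise ValueError("RNA and DNA libraries must each contain fragments")

    ratios = {}
    for ccre_id, dna in dna_counts.items():
        rna = rna_counts.get(ccre_id, 0)
        # Scale each library by its depth so the ratio is comparable.
        rna_norm = (rna + pseudocount) / rna_total
        dna_norm = (dna + pseudocount) / dna_total
        ratios[ccre_id] = math.log2(rna_norm / dna_norm)
    return ratios


if __name__ == "__main__":
    rna = {"cCRE_A": 30, "cCRE_B": 10}
    dna = {"cCRE_A": 10, "cCRE_B": 30}
    print(log2_rna_dna_ratio(rna, dna))
```

Under this simplified view, positive values indicate RNA enrichment over input (enhancer-like activity) and negative values indicate depletion (silencer-like activity).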
Input data:
- Quantification matrices from Step 1
Additional scripts:
Required software:
- R (version 4.2.3 was used in Moore...Weng (2024) bioRxiv)
- DESeq2 (version 1.38.0 was used in Moore...Weng (2024) bioRxiv)
Note: different versions of R and DESeq2 may produce slightly different quantification values.
This mode evaluates partially overlapping STARR-seq fragments across the genomic interval between a pair of high-coverage cCREs to resolve fine-scale relationships between local sequence features and reporter activity.
For each cCRE pair, there are two directional sweeps:
- Forward sweep: Starting with fragments that fully overlap the first cCRE, CAPRA progressively tiles 10 bp windows toward the second cCRE, quantifying RNA and DNA fragment coverage at each step.
- Reverse sweep: The same procedure is repeated in the opposite direction, starting from fragments that fully overlap the second cCRE and sweeping back toward the first.
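One possible reading of the two sweeps is sketched below. It assumes that each window has the width of the starting cCRE, that the window shifts by 10 bp per step until it reaches the far edge of the other cCRE, and that a fragment is counted at a step only when it fully contains the window. These stopping and counting rules are assumptions made for illustration, and the function name `sweep` is hypothetical; the CAPRA script may define the windows differently.

```python
def sweep(fragments, anchor, target, step=10):
    """Slide a window from `anchor` toward `target` in `step`-bp shifts.

    fragments: list of (start, end) on the same chromosome as the cCREs
    anchor, target: (start, end) of the starting and destination cCREs
    Returns a list of (window_start, window_end, fragment_count).
    """
    a_start, a_end = anchor
    t_start, t_end = target
    # +1 sweeps left to right (forward), -1 sweeps right to left (reverse).
    direction = 1 if t_start >= a_start else -1

    results = []
    w_start, w_end = a_start, a_end
    while (w_end <= t_end) if direction == 1 else (w_start >= t_start):
        # Count fragments that span the whole window at this step.
        n = sum(1 for f_start, f_end in fragments
                if f_start <= w_start and w_end <= f_end)
        results.append((w_start, w_end, n))
        w_start += direction * step
        w_end += direction * step
    return results


if __name__ == "__main__":
    first, second = (100, 150), (200, 260)
    dna_fragments = [(90, 200), (120, 300)]
    print(sweep(dna_fragments, first, second))   # forward sweep
    print(sweep(dna_fragments, second, first))   # reverse sweep
```

Running the function once on RNA fragments and once on DNA fragments gives per-step counts for both libraries, from which a position-resolved RNA-to-DNA profile could be derived.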
Input data:
- Fragment BED files from Step 0
- BED file of paired cCREs (2 lines total, example below)
| chr | start | end | anchor id | ccre id | class |
|---|---|---|---|---|---|
| chr1 | 46195852 | 46196010 | EH38D4377717 | EH38E2809119 | dELS |
| chr1 | 46196098 | 46196283 | EH38D4377718 | EH38E2809120 | pELS |
Required software:
- BEDTools (version 2.30.0 was used in Moore...Weng (2024) bioRxiv)