tools for 10x scATAC data
- python3
  libraries: pysam
- R >= 3.5.1
  libraries: docopt, dplyr, purrr, tidyr, readr, ggplot2, ggExtra
  Bioconductor libraries: Rsamtools, GenomicRanges, GenomicAlignments, rtracklayer
All the R command-line utilities use docopt for argument parsing. Make sure you have it installed.
For a successful ATACseq experiment, the insert size distribution often shows a periodicity: a peak under 100 bp (open chromatin), then peaks at ~200 bp (single nucleosome), ~400 bp (two nucleosomes), and so on.
time ./python/get_insert_size.py atac_v1_pbmc_5k_possorted_bam.bam pbmc_atac_5k_insert.txt --barcodes filtered_peak_bc_matrix/barcodes.tsv
real 125m31.042s
user 120m25.411s
sys 5m4.193s
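The periodicity described above can be summarized by binning insert sizes into size classes. The following is a minimal sketch and not the logic of get_insert_size.py; the bin boundaries and the names `SIZE_CLASSES`, `classify_insert`, and `summarize_inserts` are illustrative assumptions.

```python
from collections import Counter

# Assumed, illustrative size boundaries in bp (half-open: low <= size < high).
# They are not taken from get_insert_size.py.
SIZE_CLASSES = [
    ("nucleosome_free", 0, 100),
    ("mono_nucleosome", 100, 300),
    ("di_nucleosome", 300, 500),
]


def classify_insert(size):
    """Return the size class for one insert size, or 'longer' past the last bin."""
    size = abs(size)  # template lengths are negative for the reverse mate
    for name, low, high in SIZE_CLASSES:
        if low <= size < high:
            return name
    return "longer"


def summarize_inserts(sizes):
    """Return the fraction of inserts that falls into each size class."""
    counts = Counter(classify_insert(s) for s in sizes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in counts.items()}


example_sizes = [45, 80, -95, 190, 210, 405, 650]
fractions = summarize_inserts(example_sizes)
print(fractions)
```

With pysam, the sizes would typically come from `read.template_length` of properly paired reads. A library with a clear nucleosome-free peak should put a large share of its inserts in the first class.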
According to the ENCODE standard for bulk ATACseq, the FRiP score should be > 0.3, although values above 0.2 are acceptable.
cellranger-atac produces a file called fragment.tsv.gz, which contains the fragments for each cell in 5 columns: chrom, chromStart, chromEnd, barcode, and duplicateCount.
Using this file to calculate the FRiP score for each cell is much faster than using the bam file with the other script, calculate_atac_Frip_per_cell_from_bam.R.
./R/calculate_atac_Frip_per_cell_from_fragment.R
Note that fragment.tsv.gz contains all fragments, including those from cell barcodes that do not pass the cellranger-atac filters. You may want to filter it with the cellranger-atac output filtered_peak_bc_matrix/barcodes.tsv by providing that list to the --barcodeList argument.
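To make the per-cell FRiP calculation concrete, here is a minimal Python sketch of the idea: count, for each barcode, the fraction of its fragments that overlap a peak, optionally restricted to a barcode whitelist. It is not a port of the R script. The function names are hypothetical, peaks are assumed to be non-overlapping, coordinates are treated as half-open intervals, and each fragment line is counted once (the duplicateCount column is ignored).

```python
import bisect
from collections import defaultdict


def build_peak_index(peaks):
    """Group (chrom, start, end) peaks by chromosome, sorted by start.

    Assumes the peaks on a chromosome do not overlap each other.
    """
    index = defaultdict(list)
    for chrom, start, end in peaks:
        index[chrom].append((start, end))
    for intervals in index.values():
        intervals.sort()
    return index


def overlaps_peak(index, chrom, start, end):
    """Return True if the half-open interval [start, end) overlaps a peak."""
    intervals = index.get(chrom)
    if not intervals:
        return False
    # Index of the last peak whose start lies strictly before the fragment end.
    i = bisect.bisect_left(intervals, (end,)) - 1
    return i >= 0 and intervals[i][1] > start


def frip_per_cell(fragment_lines, peaks, barcodes=None):
    """Return {barcode: fraction of fragments in peaks}.

    fragment_lines: tab-separated lines (chrom, start, end, barcode, ...).
    barcodes: optional set of barcodes to keep; others are skipped.
    """
    index = build_peak_index(peaks)
    total = defaultdict(int)
    in_peaks = defaultdict(int)
    for line in fragment_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, start, end, barcode = line.rstrip("\n").split("\t")[:4]
        if barcodes is not None and barcode not in barcodes:
            continue
        total[barcode] += 1
        if overlaps_peak(index, chrom, int(start), int(end)):
            in_peaks[barcode] += 1
    return {bc: in_peaks[bc] / total[bc] for bc in total}


peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
lines = [
    "chr1\t90\t150\tAAA-1\t1",   # overlaps the first peak
    "chr1\t250\t300\tAAA-1\t2",  # between peaks
    "chr1\t550\t700\tBBB-1\t1",  # overlaps the second peak
    "chr1\t200\t260\tCCC-1\t1",  # barcode not in the whitelist
]
frip = frip_per_cell(lines, peaks, barcodes={"AAA-1", "BBB-1"})
print(frip)
```

For a real run, the lines could be streamed with `gzip.open(path, "rt")` and the whitelist read from barcodes.tsv, one barcode per line.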
According to the ENCODE standard, TSS enrichment remains in place as a key signal-to-noise measure. This still needs to be tested for scATAC. The TSSEscore function from ATACseqQC can calculate it.
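As a rough illustration of what a TSS enrichment score measures, the sketch below normalizes an aggregate read-depth profile centered on TSSs by the average depth in its two end flanks and reports the normalized value at the center. This follows the general ENCODE description as an assumption; it is not the ATACseqQC implementation, and the function name `tss_enrichment` and the 100 bp flank default are illustrative choices.

```python
def tss_enrichment(profile, flank=100):
    """Return (score, normalized_profile) for an aggregate TSS profile.

    profile: read depth at each position of a window centered on the TSS,
             already summed over all TSSs.
    flank:   number of positions at each end used as background (assumed).
    """
    if len(profile) < 2 * flank + 1:
        raise ValueError("profile is shorter than the two flanks")
    background = (sum(profile[:flank]) + sum(profile[-flank:])) / (2 * flank)
    if background == 0:
        raise ValueError("no reads in the flanks; enrichment is undefined")
    normalized = [depth / background for depth in profile]
    # The score is the fold change over background at the window center.
    return normalized[len(profile) // 2], normalized


# Synthetic profile: flat depth of 2 with a triangular peak at the TSS.
half = 2000
profile = [2.0] * (2 * half + 1)
for offset in range(-200, 201):
    profile[half + offset] += 18.0 * (1 - abs(offset) / 200)

score, normalized = tss_enrichment(profile)
print(score)
```

Because the flanks are used as the baseline, the normalized profile starts near 1 at both ends, and a strong library rises to a clear peak in the middle.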
split_scATAC_bam_by_cluster.py
Splits a 10x scATAC bam file by cluster id. See the detailed blog post at https://divingintogeneticsandgenomics.rbind.io/post/split-a-10xscatac-bam-file-by-cluster/
For a 20G bam file containing 5000 cells and 12 clusters, it takes ~3 hours.
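The core of splitting by cluster is a barcode-to-cluster lookup followed by routing each read to its cluster's output. The sketch below shows only that routing logic on plain Python objects. It is not the script itself: the CSV column names `barcode` and `cluster` and both function names are hypothetical.

```python
import csv
from collections import defaultdict


def load_cluster_map(rows):
    """Build {barcode: cluster_id} from CSV lines with a header row.

    The column names 'barcode' and 'cluster' are assumed for illustration.
    """
    reader = csv.DictReader(rows)
    return {row["barcode"]: row["cluster"] for row in reader}


def route_by_cluster(reads, cluster_of):
    """Group (barcode, read) pairs by cluster id.

    Reads whose barcode has no cluster assignment are dropped.
    """
    buckets = defaultdict(list)
    for barcode, read in reads:
        cluster = cluster_of.get(barcode)
        if cluster is not None:
            buckets[cluster].append(read)
    return dict(buckets)


csv_lines = ["barcode,cluster", "AAA-1,1", "BBB-1,2", "CCC-1,1"]
cluster_of = load_cluster_map(csv_lines)

# In a real bam file the barcode would come from each read's CB tag.
reads = [("AAA-1", "r1"), ("BBB-1", "r2"), ("ZZZ-1", "r3"), ("CCC-1", "r4")]
buckets = route_by_cluster(reads, cluster_of)
print(buckets)
```

In practice each bucket would be an open output bam file rather than an in-memory list, so that memory use stays flat for large inputs.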
split_scATAC_bam_by_cell.py
Splits a 10x scATAC bam file by cell barcode.
For a 1G bam file (one cluster) containing 285 cells, it takes ~7 mins.
http://andrewjohnhill.com/blog/2019/04/12/streamlining-scatac-seq-visualization-and-analysis/