crazyhottommy/scATACtools

scATACtools

Tools for 10x scATAC data.

Dependencies

  • python3
    libraries: pysam

  • R >= 3.5.1
    libraries: docopt, dplyr, purrr, tidyr, readr, ggplot2, ggExtra
    Bioconductor libraries: Rsamtools, GenomicRanges, GenomicAlignments, rtracklayer

All the R command-line utilities use docopt for argument parsing. Make sure you have it installed.

Quality control

Get the fragment insert size from a 10x cellranger-produced bam file

For a successful ATACseq experiment, one often observes a periodicity in the insert size distribution: a peak under 100bp (open chromatin), one at ~200bp (single nucleosome), one at ~400bp (two nucleosomes), etc.

time ./python/get_insert_size.py atac_v1_pbmc_5k_possorted_bam.bam pbmc_atac_5k_insert.txt --barcodes filtered_peak_bc_matrix/barcodes.tsv

real    125m31.042s
user    120m25.411s
sys     5m4.193s
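As a rough illustration of the periodicity check described above, the following is a minimal sketch that tallies insert sizes from a list of signed template lengths (the SAM TLEN field). It is not the implementation in get_insert_size.py; the function names and the max_size cutoff are hypothetical. With pysam, the lengths could plausibly be taken from `read.template_length` on one mate of each proper pair, but that part is left out so the sketch runs without a bam file.

```python
from collections import Counter

def insert_size_histogram(template_lengths, max_size=1000):
    """Count fragments per insert size, skipping unpaired (0) and oversized values.

    max_size is an assumed cutoff for illustration, not a value from the repo.
    """
    counts = Counter()
    for tlen in template_lengths:
        size = abs(tlen)  # the sign of TLEN only reflects mate orientation
        if 0 < size <= max_size:
            counts[size] += 1
    return counts

def fraction_in_range(counts, low, high):
    """Fraction of counted fragments with low <= size < high."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for size, n in counts.items() if low <= size < high) / total

# Toy TLEN values standing in for one mate of each proper pair.
example = [75, -75, 80, 190, 210, 0, 405, 5000]
hist = insert_size_histogram(example)
print(sorted(hist.items()))
print(fraction_in_range(hist, 0, 100))  # share of sub-nucleosomal fragments
```

Plotting the resulting counts against insert size should show the sub-100bp, ~200bp and ~400bp peaks when the library is good.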

FRiP score

According to the ENCODE standard for bulk ATACseq, the FRiP score should be > 0.3, although 0.2 is acceptable.

cellranger-atac produces a file called fragments.tsv.gz, which contains the fragments for each cell in 5 columns: chrom, chromStart, chromEnd, barcode and duplicateCount.

Using this file to calculate the FRiP score for each cell is much faster than using the bam file with the other script, calculate_atac_Frip_per_cell_from_bam.R.

./R/calculate_atac_Frip_per_cell_from_fragment.R

Note that fragments.tsv.gz contains all fragments, including those from cell barcodes that do not pass the cellranger-atac filters. You may want to restrict it to the barcodes in the cellranger-atac output filtered_peak_bc_matrix/barcodes.tsv by passing that list to the --barcodeList argument.
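To make the per-cell calculation concrete, here is a minimal Python sketch of FRiP computed from fragment records, with optional barcode filtering. It is not a port of the R script: the function names are hypothetical, each fragment is counted once regardless of duplicateCount, and peaks on the same chromosome are assumed not to overlap each other.

```python
import bisect
from collections import defaultdict

def build_peak_index(peaks):
    """Group (chrom, start, end) peaks by chromosome, sorted by start.

    Assumes peaks on one chromosome do not overlap each other.
    """
    index = defaultdict(list)
    for chrom, start, end in peaks:
        index[chrom].append((start, end))
    return {chrom: sorted(ivals) for chrom, ivals in index.items()}

def overlaps_peak(index, chrom, start, end):
    """True if the half-open fragment [start, end) overlaps any peak."""
    ivals = index.get(chrom, [])
    # Last peak whose start lies before the fragment end.
    i = bisect.bisect_left(ivals, (end,)) - 1
    return i >= 0 and ivals[i][1] > start

def frip_per_cell(fragment_lines, peaks, barcodes=None):
    """Return {barcode: fraction of its fragments overlapping a peak}."""
    index = build_peak_index(peaks)
    total = defaultdict(int)
    in_peak = defaultdict(int)
    for line in fragment_lines:
        if not line.strip() or line.startswith("#"):
            continue
        chrom, start, end, barcode = line.rstrip("\n").split("\t")[:4]
        if barcodes is not None and barcode not in barcodes:
            continue  # drop barcodes that did not pass filtering
        total[barcode] += 1
        if overlaps_peak(index, chrom, int(start), int(end)):
            in_peak[barcode] += 1
    return {bc: in_peak[bc] / total[bc] for bc in total}

# Toy data: two peaks, four fragments, one barcode excluded by the filter.
peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
lines = [
    "chr1\t150\t250\tAAAC-1\t1",
    "chr1\t300\t400\tAAAC-1\t2",
    "chr1\t550\t580\tTTTG-1\t1",
    "chr2\t100\t200\tGGGG-1\t1",
]
frip = frip_per_cell(lines, peaks, barcodes={"AAAC-1", "TTTG-1"})
print(frip)
```

For a real file, the lines could be read with `gzip.open(path, "rt")` and the barcode set loaded from barcodes.tsv, one barcode per line.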

TSS enrichment

According to the ENCODE standard, TSS enrichment remains in place as a key signal-to-noise measure. Whether it works for scATAC still needs to be tested.

The TSSEscore function from ATACseqQC can calculate it.

Split 10x bam by cluster or cell

For a 20G bam file containing 5000 cells and 12 clusters, it takes ~3 hours.

  • split_scATAC_bam_by_cell.py splits a 10x scATAC bam file by cell barcode.

For a 1G bam file (one cluster) containing 285 cells, it takes ~7 mins.
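The idea behind splitting by cell can be sketched as grouping alignment records by their cell-barcode tag. The sketch below works on SAM text lines and the CB tag that 10x bam files carry. It is an illustration only, not the logic of split_scATAC_bam_by_cell.py; the function names and the `keep` parameter are hypothetical.

```python
from collections import defaultdict

def barcode_of(sam_line, tag="CB"):
    """Return the barcode tag value from one SAM alignment line, or None."""
    prefix = tag + ":Z:"
    # Optional fields start at column 12 of a SAM record.
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        if field.startswith(prefix):
            return field[len(prefix):]
    return None

def split_by_barcode(sam_lines, keep=None):
    """Return (header_lines, {barcode: [alignment lines]}).

    Records without a barcode, or with a barcode outside `keep`, are dropped.
    """
    header, groups = [], defaultdict(list)
    for line in sam_lines:
        if line.startswith("@"):
            header.append(line)
            continue
        bc = barcode_of(line)
        if bc is None or (keep is not None and bc not in keep):
            continue
        groups[bc].append(line)
    return header, dict(groups)

# Toy SAM records: two cells plus one read with no barcode tag.
sam = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t99\tchr1\t100\t60\t50M\t=\t200\t150\t*\t*\tCB:Z:AAAC-1",
    "r2\t99\tchr1\t300\t60\t50M\t=\t400\t150\t*\t*\tCB:Z:TTTG-1",
    "r3\t99\tchr1\t500\t60\t50M\t=\t600\t150\t*\t*\tCB:Z:AAAC-1",
    "r4\t99\tchr1\t700\t60\t50M\t=\t800\t150\t*\t*",
]
header, per_cell = split_by_barcode(sam)
print({bc: len(reads) for bc, reads in per_cell.items()})
```

Holding every record in memory is only workable for toy input. For a multi-gigabyte bam file, each record would be written to a per-barcode (or per-cluster) output as it is read, which is likely where most of the runtime quoted above goes.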

Visualization

http://andrewjohnhill.com/blog/2019/04/12/streamlining-scatac-seq-visualization-and-analysis/
