# PBMCs from healthy donors

[Scalable, multimodal profiling of chromatin accessibility and protein levels in single cells](https://www.biorxiv.org/content/10.1101/2020.09.08.286914v1.full.pdf)

[GEO dataset GSE156478](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE156478)

[SRA dataset PRJNA658080](https://www.ncbi.nlm.nih.gov/bioproject/PRJNA658080)

The scATAC-seq libraries were prepared with Chromium v1 chemistry from 10x Genomics. Read layout used for STARsolo (10x v1):
* Whitelist: 737K-april-2014_rc.txt
* CB length: 14
* UMI start: 15
* UMI length: 10 (courtesy of ATpoint)
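
To make the layout above concrete, here is a minimal sketch of how a barcode read would be split under those numbers, assuming the UMI start is 1-based (so the UMI begins right after the 14-base cell barcode). The function names and the example read are hypothetical and only for illustration. The reverse-complement helper is included because the `_rc` suffix of the whitelist suggests reverse-complemented barcodes, which is an assumption.

```python
def split_barcode_read(seq, cb_len=14, umi_start=15, umi_len=10):
    """Split a barcode read into (cell barcode, UMI).

    umi_start is treated as 1-based, matching the layout listed above.
    """
    umi_begin = umi_start - 1  # convert to a 0-based slice offset
    if len(seq) < umi_begin + umi_len:
        raise ValueError("read is shorter than the barcode + UMI layout")
    cell_barcode = seq[:cb_len]
    umi = seq[umi_begin:umi_begin + umi_len]
    return cell_barcode, umi


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T, N)."""
    return seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


# Hypothetical read: 14 bases of barcode, 10 bases of UMI, 4 trailing bases.
read = "ACGTACGTACGTAC" + "TTGGCCAATT" + "AAAA"
cell_barcode, umi = split_barcode_read(read)
print(cell_barcode, umi, reverse_complement(cell_barcode))
```

A barcode would then be looked up in the whitelist either as read or after `reverse_complement`, depending on the orientation of the whitelist file.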

The dataset also includes epitope-based ASAP-seq libraries, which measure surface epitope levels for a panel of proteins.


In [1]:

import pandas as pd

In [3]:
PRJNA658080_meta_in = pd.read_csv("PRJNA658080_SRA_datatable.txt", header=0, delimiter=r',')

In [11]:
PRJNA658080_meta_in.columns

Index(['ab_tags', 'Run', 'Assay Type', 'AvgSpotLen', 'Bases', 'BioProject',
       'BioSample', 'Bytes', 'Center Name', 'Consent', 'DATASTORE filetype',
       'DATASTORE provider', 'DATASTORE region', 'Experiment',
       'GEO_Accession (exp)', 'Instrument', 'LibraryLayout',
       'LibrarySelection', 'LibrarySource', 'lysis_buffer', 'Organism',
       'Platform', 'ReleaseDate', 'Sample Name', 'source_name', 'SRA Study',
       'target_molecule'],
      dtype='object')

In [12]:
PRJNA658080_meta_in[["Experiment","Run","source_name", "Assay Type", "target_molecule"]]

| | Experiment | Run | source_name | Assay Type | target_molecule |
|---|---|---|---|---|---|
| 0 | SRX8970539 | SRR12476601 | Peripheral blood mononuclear cells | ATAC-seq | Accessible chromatin |
| 1 | SRX8970540 | SRR12476602 | Peripheral blood mononuclear cells | ATAC-seq | Accessible chromatin |
| 2 | SRX8970541 | SRR12476603 | Peripheral blood mononuclear cells | OTHER | Cell Surface Markers |
| 3 | SRX8970542 | SRR12476604 | Peripheral blood mononuclear cells | OTHER | Cell Surface Markers |
| 4 | SRX8970543 | SRR12476605 | Peripheral blood mononuclear cells | OTHER | Cell Surface Markers |
| 5 | SRX8970544 | SRR12476606 | Peripheral blood mononuclear cells | OTHER | Cell Surface Markers |
| 6 | SRX8970545 | SRR12476607 | Peripheral blood mononuclear cells | ATAC-seq | Accessible chromatin |
| 7 | SRX8970546 | SRR12476608 | Peripheral blood mononuclear cells | OTHER | Cell Surface Markers |
| 8 | SRX8970547 | SRR12476609 | Peripheral blood mononuclear cells | ATAC-seq | Accessible chromatin |
| 9 | SRX8970548 | SRR12476610 | Peripheral blood mononuclear cells | ATAC-seq | Accessible chromatin |
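
The table above mixes ATAC-seq runs with the antibody-tag ("OTHER") runs, so it can help to group the run accessions by assay type before downloading anything. The sketch below uses only the standard library and a three-row excerpt of the table so that it runs on its own; the function name is hypothetical. In the notebook itself, a boolean mask on the `Assay Type` column of the pandas DataFrame would do the same job.

```python
import csv
import io

# Three rows copied from the metadata table, reduced to the columns needed here.
TABLE_EXCERPT = """Experiment,Run,Assay Type,target_molecule
SRX8970539,SRR12476601,ATAC-seq,Accessible chromatin
SRX8970541,SRR12476603,OTHER,Cell Surface Markers
SRX8970548,SRR12476610,ATAC-seq,Accessible chromatin
"""


def runs_by_assay(csv_text):
    """Group run accessions by the value of their 'Assay Type' column."""
    groups = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        groups.setdefault(row["Assay Type"], []).append(row["Run"])
    return groups


groups = runs_by_assay(TABLE_EXCERPT)
print(groups["ATAC-seq"])
```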


In [None]:
#Start w/ PBMC ATAC-seq SRR12476610 "Broad_LibC_ATAC"

In [None]:
#Prefetch SRA file
prefetch --progress -o /fast_dir/seq_data/raw_sra/SRR12476610.sra SRR12476610


In [None]:
#Dump SRA file into fastq.gz
parallel-fastq-dump -t 8 --tmpdir /fast_dir/seq_data/raw_sra/temp \
    -s /fast_dir/seq_data/raw_sra/SRR12476610.sra \
    --dumpbase --clip --readids --gzip \
    --read-filter pass --split-files --origfmt \
    --outdir /fast_dir/seq_data/input_reads/pbmc/
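
If more than one run needs to be fetched, the two commands above can be generated per accession rather than edited by hand. This is a minimal sketch that only builds and prints the command lines, reusing the flags and directories from the cells above; the function name and its parameters are hypothetical. Passing each list to `subprocess.run(cmd, check=True)` would execute it, assuming both tools are on the `PATH`.

```python
import shlex


def download_commands(run, sra_dir="/fast_dir/seq_data/raw_sra",
                      out_dir="/fast_dir/seq_data/input_reads/pbmc/", threads=8):
    """Build the prefetch and parallel-fastq-dump argument lists for one run."""
    sra_file = f"{sra_dir}/{run}.sra"
    prefetch = ["prefetch", "--progress", "-o", sra_file, run]
    dump = [
        "parallel-fastq-dump", "-t", str(threads),
        "--tmpdir", f"{sra_dir}/temp",
        "-s", sra_file,
        "--dumpbase", "--clip", "--readids", "--gzip",
        "--read-filter", "pass", "--split-files", "--origfmt",
        "--outdir", out_dir,
    ]
    return prefetch, dump


# Print the commands for two of the ATAC-seq runs listed in the metadata table.
for run in ["SRR12476601", "SRR12476610"]:
    for cmd in download_commands(run):
        print(shlex.join(cmd))
```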


In [None]:
cd /fast_dir/seq_data/input_reads/pbmc/
#Filter reads and check quality. -w sets the worker thread count (-p only toggles overrepresentation analysis and takes no value).
fastp -w 12 -i SRR12476610_pass_1.fastq.gz \
      -I SRR12476610_pass_2.fastq.gz \
      -o SRR12476610_filt_1.fq.gz \
      -O SRR12476610_filt_2.fq.gz \
      -h SRR12476610_fastp.html \
      -j SRR12476610_fastp.json
#Not used: UMI processing, with the minimum read length (-l) set to 0 so the first read is not tossed.
#It finds the barcode and appends it to the read name, which is not what I want here.
#      --umi --umi_loc read1 --umi_len 26 -l 0
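
A quick way to check how much the filtering step removed is to read the JSON report written by `-j`. The sketch below assumes the report has a top-level `summary` object with `before_filtering` and `after_filtering` entries that each carry `total_reads`; the helper names and the example numbers are hypothetical, not results from this run.

```python
import json


def load_report(path):
    """Load a fastp JSON report from disk."""
    with open(path) as handle:
        return json.load(handle)


def retained_fraction(report):
    """Fraction of reads that survived filtering, from the report summary."""
    before = report["summary"]["before_filtering"]["total_reads"]
    after = report["summary"]["after_filtering"]["total_reads"]
    return after / before if before else 0.0


# Made-up numbers for illustration; a real report would come from
# load_report("SRR12476610_fastp.json").
example_report = {
    "summary": {
        "before_filtering": {"total_reads": 1000},
        "after_filtering": {"total_reads": 950},
    }
}
print(f"{retained_fraction(example_report):.1%} of reads retained")
```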


In [None]:
snaptools index-genome \
        --input-fasta=/input_dir/corona_analysis/annotations/human/GRCh38_filt_dna_sm_covid.fa \
        --output-prefix=/input_dir/corona_analysis/annotations/human/hg38_covid_mm \
        --aligner=minimap2 \
        --path-to-aligner=/opt/conda/envs/env/bin/ \
        --num-threads=12

In [None]:
#Align scATAC-seq reads to the reference
cd /fast_dir/seq_data/input_reads/pbmc
snaptools align-paired-end \
        --input-reference=/input_dir/corona_analysis/annotations/human/GRCh38_filt_dna_sm_covid.fa \
        --input-fastq1=SRR12476610_filt_1.fq.gz \
        --input-fastq2=SRR12476610_filt_2.fq.gz \
        --output-bam=/fast_dir/seq_data/alignment_out/pbmc/SRR12476610_hg38_covid.bam \
        --path-to-aligner=/opt/conda/envs/env/bin/ \
        --aligner=minimap2 \
        --read-fastq-command=zcat \
        --min-cov=0 \
        --num-threads=5 \
        --if-sort=True \
        --tmp-folder=/fast_dir/seq_data/raw_sra/temp/ \
        --overwrite=TRUE        

In [None]:
MAESTRO scatac-init --platform 10x-genomics --format fastq --species GRCh38 \
--fastq-dir /fast_dir/seq_data/input_reads/pbmc --fastq-prefix SRR12476610_pass_ \
--cores 8 --directory /fast_dir/seq_data/scATAC/10X_PBMC_MAESTRO_healthy --outprefix 10X_PBMC_healthy \
--peak-cutoff 100 --count-cutoff 1000 --frip-cutoff 0.2 --cell-cutoff 50 \
--fasta /input_dir/corona_analysis/annotations/human/GRCh38_filt_dna_sm_covid.fa \
--whitelist /input_dir/corona_analysis/annotations/human/scRNA_10x_v1_whitelist.txt \
--rpmodel Enhanced 
#--annotation --method RP-based --signature human.immune.CIBERSORT
#--giggleannotation annotations/MAESTRO/giggle.all \
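
To illustrate what the per-cell cutoffs in the command above are meant to do, here is a toy filter. It is a sketch under loud assumptions and not MAESTRO's implementation: the field names are hypothetical, each cutoff is treated as a simple "greater than or equal" threshold, and `frip` stands for whatever per-cell fraction-of-reads score MAESTRO computes (check its documentation for the exact definition). `--cell-cutoff` is left out because it applies to peaks rather than cells.

```python
def passes_cell_filters(cell, peak_cutoff=100, count_cutoff=1000, frip_cutoff=0.2):
    """Return True if a cell meets all three thresholds (assumed to be >=)."""
    return (
        cell["n_peaks"] >= peak_cutoff
        and cell["n_counts"] >= count_cutoff
        and cell["frip"] >= frip_cutoff
    )


# Made-up per-cell statistics, only to show the filter in action.
cells = {
    "cell_a": {"n_peaks": 850, "n_counts": 4200, "frip": 0.35},
    "cell_b": {"n_peaks": 60, "n_counts": 500, "frip": 0.30},
    "cell_c": {"n_peaks": 400, "n_counts": 2500, "frip": 0.05},
}
kept = [name for name, stats in cells.items() if passes_cell_filters(stats)]
print(kept)
```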