# PBMCs from healthy controls and COVID-19 patients

[Immunophenotyping of COVID-19 and influenza highlights the role of type I interferons in development of severe COVID-19](https://immunology.sciencemag.org/content/5/49/eabd1554)

[GEO dataset GSE149689](https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE149689)

[SRA dataset PRJNA629752](https://www.ncbi.nlm.nih.gov/Traces/study/?query_key=2&WebEnv=MCID_5fef99ec1b3f1452c11c8a61&o=subject_group_sam_s%3Aa)

The study used Chromium v3 chemistry from 10x Genomics for scRNA-seq, with this read layout:
r1 (CB + UMI) = 28 bp (16 bp cell barcode followed by a 12 bp UMI)
r2 (cDNA) = 91 bp
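
As a small illustration of that layout, the sketch below splits a 28 bp read 1 into its cell barcode and UMI, assuming the standard Chromium v3 arrangement of a 16 bp barcode followed by a 12 bp UMI. The function name `split_read1` and the example sequence are made up for illustration.

```python
from typing import Tuple

CB_LEN = 16   # cell barcode length, Chromium v3
UMI_LEN = 12  # UMI length, Chromium v3


def split_read1(seq: str) -> Tuple[str, str]:
    """Split a Chromium v3 read 1 into (cell_barcode, umi)."""
    if len(seq) < CB_LEN + UMI_LEN:
        raise ValueError(f"read 1 is shorter than {CB_LEN + UMI_LEN} bp")
    return seq[:CB_LEN], seq[CB_LEN:CB_LEN + UMI_LEN]


# Made-up 28 bp sequence, not taken from the dataset
read1 = "ACGTACGTACGTACGT" + "TTTTGGGGCCCC"
cell_barcode, umi = split_read1(read1)
print(cell_barcode, umi)
```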

In [2]:
import pandas as pd

In [3]:
# Load the SRA run table (comma-separated, first row is the header)
PRJNA629752_meta_in = pd.read_csv("PRJNA629752_SRA_datatable.txt", sep=",")

In [4]:
PRJNA629752_meta_in.columns

Index(['Run', 'AGE', 'Assay Type', 'AvgSpotLen', 'Bases', 'BioProject',
       'BioSample', 'Bytes', 'Cell_type', 'Center Name', 'Consent',
       'DATASTORE filetype', 'DATASTORE provider', 'DATASTORE region',
       'Experiment', 'gender', 'GEO_Accession (exp)', 'Instrument',
       'LibraryLayout', 'LibrarySelection', 'LibrarySource', 'Organism',
       'Platform', 'ReleaseDate', 'Sample Name', 'source_name', 'SRA Study',
       'subject_group', 'subject_status'],
      dtype='object')

In [8]:
PRJNA629752_meta_in[["Experiment","Run","subject_group", "subject_status"]]

| | Experiment | Run | subject_group | subject_status |
|---|---|---|---|---|
| 0 | SRX8241106 | SRR11680207 | COVID-19 patient | severe COVID-19 patient |
| 1 | SRX8241107 | SRR11680208 | COVID-19 patient | mild COVID-19 patient |
| 2 | SRX8241108 | SRR11680209 | Influenza patient | |
| 3 | SRX8241109 | SRR11680210 | Influenza patient | |
| 4 | SRX8241110 | SRR11680211 | healthy control | |
| 5 | SRX8241111 | SRR11680212 | Influenza patient | |
| 6 | SRX8241112 | SRR11680213 | Influenza patient | |
| 7 | SRX8241113 | SRR11680214 | Influenza patient | |
| 8 | SRX8241114 | SRR11680215 | COVID-19 patient | severe COVID-19 patient |
| 9 | SRX8241115 | SRR11680216 | COVID-19 patient | severe COVID-19 patient |
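
To pick runs by condition rather than by eye, a filter along these lines could be used. The sketch rebuilds a few of the rows shown above inline so that it runs on its own; with the real table, `PRJNA629752_meta_in` would take the place of `meta`. The helper name `runs_for_status` is hypothetical.

```python
import io

import pandas as pd

# A handful of rows copied from the table above, so the example is self-contained
csv_text = """Experiment,Run,subject_group,subject_status
SRX8241106,SRR11680207,COVID-19 patient,severe COVID-19 patient
SRX8241107,SRR11680208,COVID-19 patient,mild COVID-19 patient
SRX8241108,SRR11680209,Influenza patient,
SRX8241110,SRR11680211,healthy control,
SRX8241114,SRR11680215,COVID-19 patient,severe COVID-19 patient
"""
meta = pd.read_csv(io.StringIO(csv_text))


def runs_for_status(df, status):
    """Return the run accessions whose subject_status equals `status`."""
    mask = df["subject_status"] == status  # missing values compare as False
    return df.loc[mask, "Run"].tolist()


severe_runs = runs_for_status(meta, "severe COVID-19 patient")
print(severe_runs)
```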


In [None]:
# Start with the severe COVID-19 patient run SRR11680221

In [None]:
#Prefetch SRA file
prefetch --progress -o /fast_dir/seq_data/raw_sra/SRR11680221.sra SRR11680221


In [None]:
# Dump the SRA file into gzipped FASTQ; --split-files with --read-filter pass writes *_pass_1 and *_pass_2 files
parallel-fastq-dump -t 8 --tmpdir /fast_dir/seq_data/raw_sra/temp \
    -s /fast_dir/seq_data/raw_sra/SRR11680221.sra \
    --dumpbase --clip --readids --gzip \
    --read-filter pass --split-files --origfmt \
    --outdir /fast_dir/seq_data/input_reads/pbmc/


In [None]:
cd /fast_dir/seq_data/input_reads/pbmc/
# Filter reads and check quality.
# Note: fastp sets worker threads with -w; -p only toggles overrepresentation analysis and takes no value.
fastp -w 12 -i SRR11680221_pass_1.fastq.gz \
      -I SRR11680221_pass_2.fastq.gz \
      -o SRR11680221_filt_1.fq.gz \
      -O SRR11680221_filt_2.fq.gz \
      -h SRR11680221_fastp.html \
      -j SRR11680221_fastp.json
# Disabled: UMI processing moves the barcode into the read name, which is not what I want here.
# -l sets the minimum required read length; -l 0 was meant to keep the short first read from being tossed.
#      --umi --umi_loc read1 --umi_len 26 -l 0
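
fastp writes a JSON report alongside the HTML one (the `-j` file). A quick check of how many reads survived filtering could look like the sketch below. It assumes the report exposes `summary.before_filtering.total_reads` and `summary.after_filtering.total_reads`, which should be checked against the actual file. The function name `fraction_reads_kept` is hypothetical and the example numbers are placeholders, not results from SRR11680221.

```python
import json


def fraction_reads_kept(report: dict) -> float:
    """Return after-filtering reads divided by before-filtering reads."""
    before = report["summary"]["before_filtering"]["total_reads"]
    after = report["summary"]["after_filtering"]["total_reads"]
    if before == 0:
        raise ValueError("report lists zero input reads")
    return after / before


# Placeholder report with made-up counts, for illustration only
example_report = {
    "summary": {
        "before_filtering": {"total_reads": 1000},
        "after_filtering": {"total_reads": 900},
    }
}
kept = fraction_reads_kept(example_report)
print(f"{kept:.1%} of reads kept")

# With a real report (path taken from the fastp call above):
# with open("SRR11680221_fastp.json") as fh:
#     kept = fraction_reads_kept(json.load(fh))
```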


In [None]:
# Quantify scRNA-seq reads with salmon alevin, using the unfiltered *_pass_* FASTQ files as input
cd /fast_dir/seq_data/input_reads/pbmc
salmon alevin -l ISR \
              -1 SRR11680221_pass_1.fastq.gz \
              -2 SRR11680221_pass_2.fastq.gz \
              --chromiumV3 \
              -i /data_dir/corona_analysis/annotations/human/salmon_ann/salmon_hg38_index \
              -p 10 --dumpMtx \
              --mrna /data_dir/corona_analysis/annotations/human/gencode_mt.txt \
              --rrna /data_dir/corona_analysis/annotations/human/rRNA_ensembl.txt \
              -o /fast_dir/seq_data/alignment_out/pbmc/ \
              --tgMap /data_dir/corona_analysis/annotations/human/salmon_grch38_gencode_tran2gene.txt
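
The `--tgMap` file is a two-column, tab-separated table mapping each transcript to its gene. How `salmon_grch38_gencode_tran2gene.txt` was built is not recorded here; one possible way to derive such a table from a GENCODE-style GTF is sketched below. The function name `tx2gene_lines` and the sample GTF lines (with placeholder IDs) are assumptions for illustration.

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def tx2gene_lines(gtf_lines):
    """Yield 'transcript<TAB>gene' once per transcript feature in a GTF."""
    seen = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip header/comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "transcript":
            continue  # only transcript features carry the pairing we need
        tx = TRANSCRIPT_ID.search(fields[8])
        gene = GENE_ID.search(fields[8])
        if tx and gene and tx.group(1) not in seen:
            seen.add(tx.group(1))
            yield f"{tx.group(1)}\t{gene.group(1)}"


# Placeholder GTF lines; the IDs are not real annotation entries
sample_gtf = [
    "##description: placeholder header\n",
    'chr1\tHAVANA\tgene\t1\t100\t.\t+\t.\tgene_id "GENE_A.1"; gene_name "A";\n',
    'chr1\tHAVANA\ttranscript\t1\t100\t.\t+\t.\tgene_id "GENE_A.1"; transcript_id "TX_A1.1";\n',
    'chr1\tHAVANA\texon\t1\t50\t.\t+\t.\tgene_id "GENE_A.1"; transcript_id "TX_A1.1";\n',
    'chr1\tHAVANA\ttranscript\t1\t100\t.\t+\t.\tgene_id "GENE_A.1"; transcript_id "TX_A2.1";\n',
]
tgmap = list(tx2gene_lines(sample_gtf))
print("\n".join(tgmap))
```

The transcript IDs in this table have to match the transcript names used when the salmon index was built, otherwise alevin cannot assign transcripts to genes.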
