# Processing Calderon 2019

Calderon et al. (2019) published [Landscape of stimulation-responsive chromatin across diverse human immune cells](https://www.nature.com/articles/s41588-019-0505-9). The paper is interesting because the authors profiled resting and stimulated immune cells with ATAC-seq and RNA-seq, then intersected these datasets with GWAS results for a variety of immune diseases. This notebook is incomplete; my goal was to understand which GWAS variants were prioritized in the study.

In [1]:
import os
import pandas as pd
import myvariant

In [2]:
os.chdir('/mnt/BioHome/jreyna/jreyna/projects/dchallenge/')

In [35]:
fn = 'results/main/calderon_2019/41588_2019_505_MOESM8_ESM.tsv'
supp = pd.read_table(fn)

In [36]:
def to_hgvs_id(sr):
    """Format one row as an HGVS genomic SNV id, e.g. chr1:g.206939904G>A."""
    return '{}:g.{}{}>{}'.format(sr.chr, sr.pos, sr.refAllele, sr.altAllele)

supp['hgvs_id'] = supp.apply(to_hgvs_id, axis=1)
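
The row-wise `apply` above can be slow on a table with hundreds of thousands of rows. A minimal sketch of a vectorized alternative follows, assuming the same four column names (`chr`, `pos`, `refAllele`, `altAllele`). The `toy` frame and the helper `to_hgvs_ids` are hypothetical names for illustration; the two rows reuse T1D variants that appear in this notebook's output. The `REF>ALT` form only describes single-nucleotide substitutions, so indels would need different handling.

```python
import pandas as pd

# Toy rows with the same column names as the supplementary table (assumed).
toy = pd.DataFrame({
    'chr': ['chr1', 'chr10'],
    'pos': [206939904, 6094697],
    'refAllele': ['G', 'C'],
    'altAllele': ['A', 'T'],
})

def to_hgvs_ids(df):
    """Build HGVS-style SNV ids (chrom:g.posREF>ALT) for all rows at once."""
    return (df['chr'] + ':g.' + df['pos'].astype(str)
            + df['refAllele'] + '>' + df['altAllele'])

toy['hgvs_id'] = to_hgvs_ids(toy)
print(toy['hgvs_id'].tolist())
```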

In [47]:
t1d = supp.loc[supp.Phenotype == 'Type 1 diabetes']
#t1d = t1d.loc[t1d.stim == False]

In [48]:
# build a dictionary mapping each HGVS id to its dbSNP rsID
mv = myvariant.MyVariantInfo()
query = mv.getvariants(t1d.hgvs_id.unique(), fields=['dbsnp.rsid'])

hgvs_to_rsid = {}
for rec in query:
    # records that come back without a dbSNP annotation are marked 'Not Found'
    if 'dbsnp' in rec:
        hgvs_to_rsid[rec['query']] = rec['dbsnp']['rsid']
    else:
        hgvs_to_rsid[rec['query']] = 'Not Found'

querying 1-4...done.
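
The mapping step can be checked without a network call by pulling the loop into a small function and feeding it hand-written records. This is only a sketch: `build_rsid_map` and `mock_records` are hypothetical names, the second record uses a made-up id, and its `notfound` flag is my assumption about what a miss looks like. The function itself relies only on the `query` and `dbsnp` keys used in the cell above.

```python
def build_rsid_map(records, missing='Not Found'):
    """Map each queried HGVS id to its rsID, or to `missing` if no dbSNP entry came back."""
    mapping = {}
    for rec in records:
        dbsnp = rec.get('dbsnp') or {}
        mapping[rec['query']] = dbsnp.get('rsid', missing)
    return mapping

# Hand-written stand-ins for the query results (assumed shape).
mock_records = [
    {'query': 'chr1:g.206939904G>A', 'dbsnp': {'rsid': 'rs3024505'}},
    {'query': 'chr1:g.1A>C', 'notfound': True},  # hypothetical id with no match
]

mock_map = build_rsid_map(mock_records)
print(mock_map)
```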


In [49]:
# assign() returns a new DataFrame, which avoids pandas' SettingWithCopyWarning
t1d = t1d.assign(rsid=t1d['hgvs_id'].map(hgvs_to_rsid))


In [50]:
t1d

Unnamed: 0,Phenotype,chr,pos,dbSNP134_id,gwas_pvalue,PMID,TotalDiscoverySamples,donor,cell,stim,...,logFC_atac,adj.P.Val_atac,nearby_de_gene_id,contrast_rna,logFC_rna,adj.P.Val_rna,tested_TF,ref_minus_alt_match,hgvs_id,rsid
54,Type 1 diabetes,chr1,206939904,3024505,2.100000e-11,19430480,16559,1002,Myeloid_DCs,False,...,-1.800962,4.183291e-04,,,,,ENSG00000008196_LINE4_TFAP2B_D_N1,3.825278,chr1:g.206939904G>A,rs3024505
56,Type 1 diabetes,chr1,206939904,3024505,2.100000e-11,19430480,16559,1002,Monocytes,True,...,-1.800962,4.183291e-04,,,,,ENSG00000008196_LINE4_TFAP2B_D_N1,3.825278,chr1:g.206939904G>A,rs3024505
58,Type 1 diabetes,chr1,206939904,3024505,4.700000e-10,21829393,12501,1002,Myeloid_DCs,False,...,-1.800962,4.183291e-04,,,,,ENSG00000008196_LINE4_TFAP2B_D_N1,3.825278,chr1:g.206939904G>A,rs3024505
69,Type 1 diabetes,chr1,206939904,3024505,2.100000e-11,19430480,16559,1002,Myeloid_DCs,False,...,-1.800962,4.183291e-04,,,,,ENSG00000008196_LINE4_TFAP2B_D_N1,3.825278,chr1:g.206939904G>A,rs3024505
75,Type 1 diabetes,chr1,206939904,3024505,4.700000e-10,21829393,12501,1002,Myeloid_DCs,False,...,-1.918953,5.971444e-05,,,,,ENSG00000008196_LINE4_TFAP2B_D_N1,3.825278,chr1:g.206939904G>A,rs3024505
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
726882,Type 1 diabetes,chr10,6114660,41295061,3.900000e-24,18978792,8207,1004,Effector_memory_CD8pos_T,True,...,2.556422,1.016120e-07,,,,,ENSG00000230336_LINE17148_CR8477943_I_N5,-4.025442,chr10:g.6114660C>A,rs41295061
726883,Type 1 diabetes,chr10,6114660,41295061,3.900000e-24,18978792,8207,1004,Effector_memory_CD8pos_T,True,...,2.556422,1.016120e-07,,,,,ENSG00000230336_LINE17148_CR8477943_I_N5,-4.025442,chr10:g.6114660C>A,rs41295061
726884,Type 1 diabetes,chr10,6114660,41295061,3.900000e-24,18978792,8207,1004,Th17_precursors,True,...,3.359023,1.384793e-09,,,,,ENSG00000230336_LINE17148_CR8477943_I_N5,-4.025442,chr10:g.6114660C>A,rs41295061
726885,Type 1 diabetes,chr10,6114660,41295061,3.900000e-24,18978792,8207,1004,Th17_precursors,True,...,3.359023,1.384793e-09,,,,,ENSG00000230336_LINE17148_CR8477943_I_N5,-4.025442,chr10:g.6114660C>A,rs41295061


In [51]:
t1d.rsid.unique()

array(['rs3024505', 'rs61839660', 'rs998592', 'rs41295061'], dtype=object)

In [55]:
t1d.drop_duplicates(subset=['chr', 'pos'])

Unnamed: 0,Phenotype,chr,pos,dbSNP134_id,gwas_pvalue,PMID,TotalDiscoverySamples,donor,cell,stim,...,logFC_atac,adj.P.Val_atac,nearby_de_gene_id,contrast_rna,logFC_rna,adj.P.Val_rna,tested_TF,ref_minus_alt_match,hgvs_id,rsid
54,Type 1 diabetes,chr1,206939904,3024505,2.1e-11,19430480,16559,1002,Myeloid_DCs,False,...,-1.800962,0.0004183291,,,,,ENSG00000008196_LINE4_TFAP2B_D_N1,3.825278,chr1:g.206939904G>A,rs3024505
7748,Type 1 diabetes,chr10,6094697,61839660,5.1e-09,22293688,16179,1004,Effector_memory_CD8pos_T,True,...,-1.774487,2.767275e-05,,,,,ENSG00000116017_LINE45_ARID3A_D_N1,-4.021372,chr10:g.6094697C>T,rs61839660
12690,Type 1 diabetes,chr16,11199678,998592,4.96e-09,17632545,3105,1003,Effector_memory_CD8pos_T,True,...,1.615495,5.408296e-05,,,,,ENSG00000072310_LINE94_SREBF1_D_N3,-4.183004,chr16:g.11199678C>T,rs998592
27254,Type 1 diabetes,chr10,6114660,41295061,3.9e-24,18978792,8207,1004,Effector_memory_CD8pos_T,True,...,2.556422,1.01612e-07,,,,,ENSG00000164330_LINE255_EBF1_D_N1,2.932422,chr10:g.6114660C>A,rs41295061
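
`drop_duplicates` keeps only the first row per position, so it hides which cell types and stimulation states each variant was found in. A rough sketch of a per-variant summary follows, assuming the `rsid`, `cell`, and `stim` columns shown above. The `toy` frame is a hypothetical four-row stand-in built from rows in the output above, not the full table.

```python
import pandas as pd

# Four rows copied from the T1D output above (rsid, cell type, stimulation flag).
toy = pd.DataFrame({
    'rsid': ['rs3024505', 'rs3024505', 'rs41295061', 'rs41295061'],
    'cell': ['Myeloid_DCs', 'Monocytes',
             'Effector_memory_CD8pos_T', 'Th17_precursors'],
    'stim': [False, True, True, True],
})

# One row per variant: number of cell types, their names, and whether
# any supporting row came from a stimulated sample.
summary = (
    toy.groupby('rsid')
       .agg(n_cell_types=('cell', 'nunique'),
            cells=('cell', lambda s: ', '.join(sorted(s.unique()))),
            any_stim=('stim', 'any'))
       .reset_index()
)
print(summary)
```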


In [46]:
for x in supp.Phenotype.unique():
    print(x)

Transmission distortion
Maternal transmission distortion
Serum creatinine
Height
Breast cancer (male breast cancer)
Irritible bowel syndrome
Primary biliary cirrhosis
Crohn's disease
Type 1 diabetes
Selective immunoglobulin A deficiency (IgAD)
Ulcerative colitis
Inflammatory bowel disease
Total cholesterol
Plasma alpha-linolenic acid levels
Plasma docosapentaenoic acid levels
Plasma eicosapentaenoic acid levels
Classical Hodgkin's lymphoma
Helicobacter pylori seroprevalence
Efavirenz pharmacokinetics (log-transformed trough efavirenz concentration)
Coronary artery disease (CAD) with age of onset <=50
Coronary artery disease (CAD)
Aortic root size
Serum urate
Uric acid
Gout
LDL cholesterol
Glycan Peak 1
Biantennary nongalactosylated glycans
Biantennary nongalactosylated glycans in women
Desialylated Glycan Peak 1 in women
Glycan Peak 1 in women
Desialylated Glycan Peak 1
Height (adults)
Height (adult females)
Paternal transmission distortion
Fasting blood glucose
Triglycerides
Body mass
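
Since the table covers many phenotypes, a natural next step is to count the distinct variants each one contributes. A minimal sketch follows, assuming the `Phenotype` and `hgvs_id` columns built earlier. The `toy` frame uses placeholder variant ids (`variant_A` and so on) rather than real rows, so the counts are illustrative only.

```python
import pandas as pd

# Placeholder rows: real phenotype labels from the list above, made-up variant ids.
toy = pd.DataFrame({
    'Phenotype': ['Type 1 diabetes', 'Type 1 diabetes',
                  'Type 1 diabetes', "Crohn's disease"],
    'hgvs_id': ['variant_A', 'variant_A', 'variant_B', 'variant_C'],
})

# Number of distinct variants per phenotype, largest first.
counts = (
    toy.groupby('Phenotype')['hgvs_id']
       .nunique()
       .sort_values(ascending=False)
)
print(counts)
```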