# DENTIST R and C++ comparison


## Overview

DENTIST is implemented in C++ (`https://github.com/Yves-CHEN/DENTIST`). To incorporate it into the pecotmr package, an R version was developed.

Here we test whether the R version gives the same results as the C++ version.

## Important note

DENTIST is not easy to compile from source. The paths in its makefile are messy, so we would need to install all dependencies and rewrite the makefile, which is quite time-consuming.

However, DENTIST provides an executable that does the same thing. We cannot modify it, so we can only control its inputs and inspect its outputs. DENTIST involves randomness: it splits the region into two parts, S1 and S2, and repeats this over multiple iterations, so the numerical values may differ between runs. Our strategy is therefore to run DENTIST 10 times with each version, average the numerical outputs, and check whether the averages are similar enough.
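
The averaging strategy described above can be sketched in R as follows. This is only an illustration on simulated numbers, not real DENTIST output. The helper names `average_runs()` and `compare_runs()` are hypothetical, and the tolerance of 0.1 is an arbitrary placeholder rather than a recommended threshold.

```r
# Hypothetical helpers: average a per-variant statistic over repeated runs
# and compare the averages obtained from two implementations.
average_runs <- function(run_list) {
  # run_list: list of numeric vectors, one per run, all in the same variant order
  stopifnot(length(unique(lengths(run_list))) == 1)
  rowMeans(do.call(cbind, run_list))
}

compare_runs <- function(avg_a, avg_b, tol = 0.1) {
  stopifnot(length(avg_a) == length(avg_b))
  abs_diff <- abs(avg_a - avg_b)
  list(
    max_abs_diff = max(abs_diff),
    correlation  = cor(avg_a, avg_b),
    similar      = all(abs_diff < tol)
  )
}

# Simulated example: 10 runs per implementation, 50 variants each.
set.seed(1)
truth    <- rnorm(50)
runs_cpp <- replicate(10, truth + rnorm(50, sd = 0.01), simplify = FALSE)
runs_r   <- replicate(10, truth + rnorm(50, sd = 0.01), simplify = FALSE)

comparison <- compare_runs(average_runs(runs_cpp), average_runs(runs_r))
print(comparison)
```

In practice, each element of the run lists would be one column (for example the imputed z-scores) taken from one DENTIST run, with variants in the same order across runs.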

## Input

The original DENTIST program needs PLINK files as input, plus summary statistics in COJO format (columns `SNP A1 A2 freq b se p N`). Here we use the first-round RSS_QC result as reference and pick a region that still has outliers after allele QC. We use this region to check that the results agree.

+ PLINK file: `/home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/PLINK_input_MWE`

+ Sumstat: `/home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/Sumstat_MAF_filtered.txt`


Notes about inputs: DENTIST only allows regions with 2000+ variants, and the PLINK file should not include variants with MAF = 0. The PLINK file used here is therefore already filtered at a MAF threshold of 0.01.
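
The input requirements above can be checked before running DENTIST. The sketch below is a minimal example on a simulated table. The function name `check_dentist_input()` is hypothetical, and the thresholds simply restate the notes above rather than anything read from the DENTIST source.

```r
# Hypothetical helper: check a COJO-format summary-statistics table
# against the input notes above before handing it to DENTIST.
check_dentist_input <- function(sumstat, min_variants = 2000, maf_threshold = 0.01) {
  required <- c("SNP", "A1", "A2", "freq", "b", "se", "p", "N")
  missing_cols <- setdiff(required, colnames(sumstat))
  if (length(missing_cols) > 0) {
    stop("Missing COJO columns: ", paste(missing_cols, collapse = ", "))
  }
  # Minor allele frequency derived from the reported allele frequency
  maf <- pmin(sumstat$freq, 1 - sumstat$freq)
  list(
    n_variants      = nrow(sumstat),
    enough_variants = nrow(sumstat) >= min_variants,
    n_low_maf       = sum(maf < maf_threshold)
  )
}

# Toy table with simulated values (not real data)
set.seed(2)
n <- 2500
toy <- data.frame(
  SNP  = paste0("snp", seq_len(n)),
  A1   = "A",
  A2   = "G",
  freq = runif(n, 0.05, 0.95),
  b    = rnorm(n, sd = 0.01),
  se   = 0.01,
  p    = runif(n),
  N    = 1153
)
input_check <- check_dentist_input(toy)
print(input_check)
```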

## Details

### 1. MWE data preparation

DENTIST version 1.3.0.0 from GitHub (https://github.com/Yves-CHEN/DENTIST) requires PLINK data, so it is prepared here.

In [217]:
library(tidyverse)
library(susieR)
library(plink2R)
library(pecotmr)
library(vroom)

# These variants exist in both the PLINK file and the original Bellenguez sumstat data

sumstat = read_tsv("/home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/Sumstat.tsv")

# Get the variant list to extract from the PLINK data.
# col.names = FALSE keeps the default header line "x" out of the SNP list.
sumstat %>% pull(SNP) %>% 
    write.table("/home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/MWE_snplist.txt", row.names = FALSE, col.names = FALSE, quote = FALSE)

Rows: 2485 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (3): SNP, A1, A2
dbl (5): freq, b, se, p, N

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


In [None]:
rds$LD 

In [None]:
sumstat

In [None]:
./DENTIST_1.3.0.0 --bfile  /home/hs3393//RSS_QC/pecotmr/data/RSS_QC_MWE/PLINK --gwas-summary /home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/Sumstat.tsv --out D_3000

In [204]:
# extract the PLINK files from chromosome 1

plink2 --bfile /mnt/vast/hpc/csg/FunGen_xQTL/ROSMAP/Genotype/geno_by_chrom/ROSMAP_NIA_WGS.leftnorm.bcftools_qc.plink_qc.1 --extract /home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/MWE_snplist.txt --make-bed --maf 0.01 --geno 0.01 --out ~/RSS_QC/pecotmr/data/RSS_QC_MWE/PLINK

PLINK v2.00a5LM 64-bit Intel (23 Sep 2023)     www.cog-genomics.org/plink/2.0/
(C) 2005-2023 Shaun Purcell, Christopher Chang   GNU General Public License v3
Logging to /home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/PLINK.log.
Options in effect:
  --bfile /mnt/vast/hpc/csg/FunGen_xQTL/ROSMAP/Genotype/geno_by_chrom/ROSMAP_NIA_WGS.leftnorm.bcftools_qc.plink_qc.1
  --extract /home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/MWE_snplist.txt
  --geno 0.01
  --maf 0.01
  --make-bed
  --out /home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/PLINK

Start time: Fri Mar 15 15:58:48 2024
515666 MiB RAM detected, ~467195 available; reserving 257833 MiB for main
workspace.
Allocated 81579 MiB successfully, after larger attempt(s) failed.
Using up to 48 threads (change this with --threads).
1153 samples (0 females, 0 males, 1153 ambiguous; 1153 founders) loaded from
/mnt/vast/hpc/csg/FunGen_xQTL/ROSMAP/Genotype/geno_by_chrom/ROSMAP_NIA_WGS.leftnorm.bcftools_qc.plink_qc.1.fam.



In [205]:
genotype = read_plink("~/RSS_QC/pecotmr/data/RSS_QC_MWE/PLINK")

In [214]:
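# Calculate the LD matrix so it can be passed to the Rcpp interface

geno = read_plink("/home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/PLINK")$bed
LD = cor(geno)  # cor() already returns a matrix
# With row.names = TRUE and col.names = TRUE the header has one field fewer than
# the data rows, which triggers the warning when the file is read back.
write.table(LD, "/home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/LD.tsv", sep = "\t", quote = FALSE, row.names = TRUE, col.names = TRUE)

#########FINISHED############
#sumstat = read_delim("/home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/Sumstat_MAF_filtered.txt")
# Drop the first column, which holds the row names written above
LD = vroom("/home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/LD.tsv")[,-1]
LD = as.matrix(LD)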

“Detected 2485 column names but the data has 2486 columns (i.e. invalid file). Added 1 extra default column name for the first column which is guessed to be row names or an index. Use setnames() afterwards if this guess is not correct, or fix the file write command that created the file to create a valid file.”
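
The warning above comes from a header line that has one field fewer than the data rows. One way to avoid it, sketched here with base R on a small simulated matrix (not the real LD file), is to write the file with `col.names = NA` and read it back with `row.names = 1`. The object names `geno_toy`, `LD_toy` and `LD_back` are made up for this example.

```r
# Small simulated genotype matrix (allele counts 0/1/2), not real data
set.seed(3)
geno_toy <- matrix(rbinom(200 * 5, size = 2, prob = 0.3), nrow = 200,
                   dimnames = list(NULL, paste0("chr1:", 1:5, "_A_G")))
LD_toy <- cor(geno_toy)

ld_path <- tempfile(fileext = ".tsv")
# col.names = NA writes an empty header cell above the row names,
# so the header and the data rows have the same number of fields.
write.table(LD_toy, ld_path, sep = "\t", quote = FALSE,
            row.names = TRUE, col.names = NA)

# row.names = 1 turns the first column back into row names;
# check.names = FALSE keeps variant IDs such as "chr1:1_A_G" unchanged.
LD_back <- as.matrix(read.table(ld_path, sep = "\t", header = TRUE,
                                row.names = 1, check.names = FALSE))
print(dim(LD_back))
print(all.equal(LD_toy, LD_back))
```

With this layout there is no need to drop the first column by position after reading.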


In [218]:
LD

chr1:206510390_A_G,chr1:206511644_A_C,chr1:206512294_T_C,chr1:206512396_C_T,chr1:206513188_C_T,chr1:206514603_T_C,chr1:206516658_C_T,chr1:206516810_C_T,chr1:206516811_G_A,chr1:206516953_C_T,⋯,chr1:207936120_G_A,chr1:207936435_A_G,chr1:207937004_C_T,chr1:207937160_C_T,chr1:207937391_AT_A,chr1:207937575_C_T,chr1:207937580_C_T,chr1:207938098_A_G,chr1:207938277_C_T,chr1:207938395_C_T
1.00000000,0.99409214,0.22390305,0.90411284,0.17337791,0.43821645,0.15516385,0.08338565,0.13202961,0.9949434,⋯,0.1217689431,0.1217689431,-0.0206324142,0.1217689431,0.1217689431,0.1217689431,0.008728159,0.1217689431,0.0020829765,0.1217689431
0.99409214,1.00000000,0.22262588,0.90555517,0.17534669,0.43900313,0.14809457,0.07757592,0.13340517,0.9924058,⋯,0.1232687562,0.1232687562,-0.0220252092,0.1232687562,0.1232687562,0.1232687562,0.007626478,0.1232687562,0.0009814535,0.1232687562
0.22390305,0.22262588,1.00000000,0.20167758,0.06443504,0.40632115,0.04647080,0.01478903,0.03450568,0.2227890,⋯,0.0262888500,0.0262888500,-0.0242009038,0.0262888500,0.0262888500,0.0262888500,-0.037377488,0.0262888500,-0.0364369753,0.0262888500
0.90411284,0.90555517,0.20167758,1.00000000,0.21126874,0.47553462,-0.20288867,-0.14380576,0.15119697,0.9039246,⋯,0.0889485083,0.0889485083,-0.0126168014,0.0889485083,0.0889485083,0.0889485083,0.013408413,0.0889485083,0.0075354162,0.0889485083
0.17337791,0.17534669,0.06443504,0.21126874,1.00000000,0.10570158,-0.07275443,-0.04590871,-0.02806835,0.1736936,⋯,-0.0355426488,-0.0355426488,0.0007391420,-0.0355426488,-0.0355426488,-0.0355426488,0.013399418,-0.0355426488,0.0123427326,-0.0355426488
0.43821645,0.43900313,0.40632115,0.47553462,0.10570158,1.00000000,-0.14238551,0.04160169,0.07166351,0.4377692,⋯,-0.0008651596,-0.0008651596,0.0001232571,-0.0008651596,-0.0008651596,-0.0008651596,0.017904166,-0.0008651596,0.0133554713,-0.0008651596
0.15516385,0.14809457,0.04647080,-0.20288867,-0.07275443,-0.14238551,1.00000000,0.63454480,-0.03168311,0.1555628,⋯,0.0613416691,0.0613416691,-0.0444175610,0.0613416691,0.0613416691,0.0613416691,-0.017658879,0.0613416691,-0.0191758509,0.0613416691
0.08338565,0.07757592,0.01478903,-0.14380576,-0.04590871,0.04160169,0.63454480,1.00000000,-0.03135318,0.0836635,⋯,-0.0356573856,-0.0356573856,-0.0640682032,-0.0356573856,-0.0356573856,-0.0356573856,0.005770301,-0.0356573856,0.0050104403,-0.0356573856
0.13202961,0.13340517,0.03450568,0.15119697,-0.02806835,0.07166351,-0.03168311,-0.03135318,1.00000000,0.1322207,⋯,-0.0109814715,-0.0109814715,-0.0501823031,-0.0109814715,-0.0109814715,-0.0109814715,0.032755550,-0.0109814715,0.0323093340,-0.0109814715
0.99494335,0.99240582,0.22278903,0.90392463,0.17369357,0.43776922,0.15556282,0.08366350,0.13222071,1.0000000,⋯,0.1220372769,0.1220372769,-0.0232485640,0.1220372769,0.1220372769,0.1220372769,0.014705099,0.1220372769,0.0081530838,0.1220372769


4123 variants and 1153 samples

In [215]:
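X = geno
# Frequency of the allele counted in the genotype matrix:
# column sum of allele counts divided by (2 * number of samples)
EAF = unname(colSums(X) / (2 * nrow(X)))

# Assumes the rows of sumstat are in the same order as the columns of X
stopifnot(length(EAF) == nrow(sumstat))
sumstat$freq = EAF
write_tsv(sumstat, "/home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/Sumstat.tsv")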


In [216]:
sumstat

SNP,A1,A2,freq,b,se,p,N
<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
chr1:206510390_A_G,G,A,0.51691240,-0.0030,0.0082,0.3572371,1153
chr1:206511644_A_C,C,A,0.51474415,-0.0047,0.0082,0.2832645,1153
chr1:206512294_T_C,C,T,0.96010408,0.0069,0.0212,0.6275880,1153
chr1:206512396_C_T,T,C,0.46357329,-0.0025,0.0082,0.3802295,1153
chr1:206513188_C_T,T,C,0.03469211,0.0165,0.0238,0.7559322,1153
chr1:206514603_T_C,C,T,0.79011275,0.0010,0.0105,0.5379371,1153
chr1:206516658_C_T,T,C,0.03729402,-0.0070,0.0217,0.3735064,1153
chr1:206516810_C_T,T,C,0.01517780,-0.0239,0.0352,0.2485761,1153
chr1:206516811_G_A,A,G,0.01604510,-0.0152,0.0335,0.3250111,1153
chr1:206516953_C_T,T,C,0.51604510,-0.0031,0.0082,0.3526972,1153


In [200]:
LD[,1]


### 2. DENTIST -- GitHub compiled version

In [None]:
# DENTIST -- github compiled version

~/RSS_QC/DENTIST/DENTIST_1.3.0.0  --bfile  /home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/PLINK_input_sep --gwas-summary /home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/sumstat_sep_20.tsv --out DENTIST_40var

In [None]:
# DENTIST -- github compiled version

~/RSS_QC/DENTIST/DENTIST_1.3.0.0  --bfile  /home/hs3393//RSS_QC/pecotmr/data/RSS_QC_MWE/PLINK_input_MWE --gwas-summary /home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/Sumstat_MAF_updated.txt --out DENTIST_new

sumstat: /home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/sumstat_sep_20.tsv
LD: /home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/LD_sep_20.txt
PLINK: /home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/PLINK_input_sep

In [20]:
wc -l ~/RSS_QC/DENTIST/DENTIST_result.DENTIST.short.txt

63 /home/hs3393/RSS_QC/DENTIST/DENTIST_result.DENTIST.short.txt


Here, the number of rows in `*.short.txt` is the number of variants identified as outliers in this region, which should be removed.

So for the DENTIST compiled version, the **number of outliers is 63**. 
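
The same count can be obtained in R, and the flagged variants can then be dropped from the summary statistics. The sketch below uses a temporary stand-in file with made-up variant IDs. It assumes the `*.short.txt` file holds one variant ID per line with no header, which is consistent with using `wc -l` as the outlier count but has not been checked against the DENTIST documentation.

```r
# Write a stand-in for *.DENTIST.short.txt with made-up variant IDs
short_path <- tempfile(fileext = ".DENTIST.short.txt")
writeLines(c("chr1:100_A_G", "chr1:200_C_T", "chr1:300_G_A"), short_path)

# Count non-empty lines, comparable to `wc -l` on this file
outlier_ids <- readLines(short_path)
outlier_ids <- outlier_ids[nzchar(trimws(outlier_ids))]
n_outliers  <- length(outlier_ids)
print(n_outliers)

# Drop the flagged variants from a toy summary-statistics table
toy_sumstat <- data.frame(
  SNP = c("chr1:100_A_G", "chr1:150_T_C", "chr1:200_C_T",
          "chr1:250_A_C", "chr1:300_G_A"),
  z   = c(5.1, 0.3, -4.8, 1.2, 6.0)
)
kept <- toy_sumstat[!toy_sumstat$SNP %in% outlier_ids, ]
print(kept$SNP)
```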

### 3. DENTIST -- Rcpp version

In [35]:
# dentist() needs z-scores; compute them as z = b / se
sumstat = read_delim("/home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/Sumstat_MAF_filtered.txt") %>% mutate(z = b / se)
LD = vroom("/home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/LD_MWE.tsv")[,-1]
LD = as.matrix(LD)
dentist_result = dentist(zScore = sumstat$z, LD = LD, nSample = 1153)

Rows: 4123 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (3): SNP, A1, A2
dbl (5): freq, b, se, p, N

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
“Detected 4123 column names but the data has 4124 columns (i.e. invalid file). Added 1 extra default column name for the first column which is guessed to be row names or an index. Use setnames() afterwards if this guess is not correct, or fix the file write command that created the file to create a valid file.”


In [119]:
sumstat = read_delim("/home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/Sumstat_MAF_filtered.txt") #%>% mutate(z = b / se)

Rows: 4123 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: "\t"
chr (3): SNP, A1, A2
dbl (5): freq, b, se, p, N

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.


In [139]:
dentist_result

imputed_z,rsq,corrected_z,iter_to_correct,is_problematic,original_z
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
-1.83233588,0.9843676,2.9014788,10,0,-1.46956522
-1.14057440,0.9652836,-2.7764632,10,0,-1.65789474
-1.15736728,0.9672809,-2.5731372,10,0,-1.62280702
-1.24256029,0.9885428,-3.3885340,10,0,-1.60526316
0.28931101,0.6342879,-0.2464973,10,0,0.14024390
-1.70599901,0.9849100,1.7831310,10,0,-1.48695652
-1.47904789,0.9530842,-0.1171599,10,0,-1.50442478
-0.36542404,0.9731943,-8.3081095,1,0,-1.72566372
-1.95041772,0.8768238,2.8081253,10,0,-0.96486486
-1.28476532,0.5818886,1.9098788,10,0,-0.04980843


In [36]:
dentist_result %>% filter(is_problematic > 0) %>% nrow()

So for the DENTIST Rcpp version, the **number of outliers is 1053**. The results look quite different, so take a look at the LD.
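
Beyond the raw counts, it helps to see how much the two sets of flagged variants overlap. The sketch below uses made-up 0/1 flags in the style of the `is_problematic` column. The helper name `compare_flags()` is hypothetical, and the Jaccard index is just one possible summary of agreement.

```r
# Hypothetical helper: summarise agreement between two sets of flagged variants
compare_flags <- function(flag_a, flag_b) {
  stopifnot(length(flag_a) == length(flag_b))
  flag_a <- as.logical(flag_a)
  flag_b <- as.logical(flag_b)
  n_both   <- sum(flag_a & flag_b)
  n_either <- sum(flag_a | flag_b)
  list(
    n_a     = sum(flag_a),
    n_b     = sum(flag_b),
    n_both  = n_both,
    jaccard = if (n_either > 0) n_both / n_either else NA_real_
  )
}

# Toy flags for 10 variants in the same order (made up for illustration)
flags_binary <- c(1, 0, 0, 1, 0, 0, 0, 1, 0, 0)
flags_rcpp   <- c(1, 1, 0, 1, 0, 1, 0, 0, 0, 1)
agreement <- compare_flags(flags_binary, flags_rcpp)
print(agreement)
```

A low overlap would suggest that the two versions disagree on which variants are outliers, not only on how many there are.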

In [69]:
library(readr)
library(dplyr, warn.conflicts = FALSE)
library(vroom)
library(RcppArmadillo)
library(Rcpp)
source("/home/rd2972/software/pecotmr/R/run_dentist.R")
source("/home/rd2972/software/pecotmr/R/RcppExports.R")
sourceCpp("/home/hs3393/RSS_QC/pecotmr/src/dentist.cpp")
sourceCpp("/home/rd2972/software/pecotmr/src/RcppExports.cpp")
#dentist_LD_result = vroom("/home/rd2972/private_data/20240300_rss_qc_imputation/DENTIST/output_LD.txt")

“No Rcpp::export attributes or RCPP_MODULE declarations found in source”


In [55]:
LD

chr1:206011748_A_G,chr1:206012565_A_G,chr1:206012721_C_G,chr1:206013858_A_G,chr1:206014205_G_A,chr1:206015713_G_A,chr1:206016996_A_T,chr1:206018106_T_C,chr1:206018145_C_T,chr1:206018463_T_C,⋯,chr1:208455523_C_T,chr1:208455658_G_A,chr1:208455862_A_G,chr1:208456632_T_C,chr1:208457317_A_C,chr1:208457639_C_T,chr1:208458441_G_A,chr1:208459950_T_C,chr1:208460159_A_G,chr1:208461262_G_A
1.00000000,0.98500629,0.98215134,0.97722319,-0.16241358,0.98013632,0.94805046,0.94283218,0.312660171,0.41408706,⋯,-0.021963764,-0.0193532592,-0.0138176149,-0.039436222,0.0256967804,-0.005392992,-0.0135394991,-0.0461785952,-0.044373583,-0.0411658907
0.98500629,1.00000000,0.99703144,0.98655338,-0.16597212,0.98062367,0.94822880,0.95174415,0.305991357,0.40531837,⋯,-0.024245263,-0.0216382623,-0.0055390197,-0.041944429,0.0225030342,-0.007702361,-0.0153967323,-0.0429995460,-0.041161901,-0.0382699506
0.98215134,0.99703144,1.00000000,0.98367739,-0.16673209,0.97479826,0.94244220,0.94886559,0.304141005,0.40288333,⋯,-0.024789252,-0.0221849130,-0.0042933214,-0.042531913,0.0217025520,-0.008264625,-0.0158425589,-0.0437020510,-0.041865722,-0.0390144232
0.97722319,0.98655338,0.98367739,1.00000000,-0.16553057,0.99096828,0.95579392,0.96236864,0.309334231,0.40969015,⋯,-0.022075062,-0.0194857825,-0.0114711522,-0.039436839,0.0250915751,-0.005639725,-0.0136646556,-0.0400754211,-0.038252242,-0.0352470225
-0.16241358,-0.16597212,-0.16673209,-0.16553057,1.00000000,-0.16225296,-0.07833410,-0.08209722,-0.033804260,-0.05615093,⋯,-0.021080030,0.0368082378,0.0606923459,-0.019753167,-0.0102328423,-0.021702814,-0.0279977193,-0.0072278676,-0.005818709,0.0187061610
0.98013632,0.98062367,0.97479826,0.99096828,-0.16225296,1.00000000,0.96493215,0.95671660,0.313204436,0.41476782,⋯,-0.020266689,-0.0262699565,-0.0152635169,-0.037391336,0.0274109927,-0.003872517,-0.0122088692,-0.0376714618,-0.035855042,-0.0327423041
0.94805046,0.94822880,0.94244220,0.95579392,-0.07833410,0.96493215,1.00000000,0.99127458,0.299981014,0.40379655,⋯,-0.027551593,0.0003804079,-0.0003697598,-0.045727553,0.0330202418,-0.010885808,-0.0076633060,-0.0474268837,-0.045571677,-0.0369573921
0.94283218,0.95174415,0.94886559,0.96236864,-0.08209722,0.95671660,0.99127458,1.00000000,0.296396524,0.39906019,⋯,-0.029298390,-0.0014986479,0.0032685364,-0.047709651,0.0307219444,-0.012585963,-0.0091252968,-0.0497536193,-0.047891132,-0.0394115749
0.31266017,0.30599136,0.30414100,0.30933423,-0.03380426,0.31320444,0.29998101,0.29639652,1.000000000,-0.03499675,⋯,-0.016766301,-0.0160300508,0.0370233024,-0.008195615,-0.0242489625,0.038052609,-0.0026999291,-0.0209378224,-0.020386566,-0.0065450325
0.41408706,0.40531837,0.40288333,0.40969015,-0.05615093,0.41476782,0.40379655,0.39906019,-0.034996747,1.00000000,⋯,-0.034190273,-0.0151785978,0.0181593439,-0.046386610,-0.0266248820,0.009562230,-0.0189365839,-0.0321306984,-0.031440272,-0.0603098837


In [166]:
original_plink = vroom("/mnt/vast/hpc/csg/FunGen_xQTL/ROSMAP/Genotype/geno_by_chrom/ROSMAP_NIA_WGS.leftnorm.bcftools_qc.plink_qc.1.bim")

In [54]:
result1 = vroom("/home/rd2972/private_data/20240300_rss_qc_imputation/DENTIST/per_iteration/dentist_Rcpp/20240308_version/DENTIST_Rcpp_output_01.txt")

V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,⋯,V2052,V2053,V2054,V2055,V2056,V2057,V2058,V2059,V2060,V2061
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
0.985010,0.997030,-0.165970,0.980620,0.9517400,0.3059900,0.1963500,0.24287000,1.9603e-01,0.200050,⋯,0.0159500,0.01011600,-0.00949390,-1.4634e-03,-0.0104610,-0.0104610,-0.0242450,-0.00553900,-0.0419440,-0.0430000
0.977220,0.983680,-0.165530,0.990970,0.9623700,0.3093300,0.1821200,0.24568000,1.9891e-01,0.184900,⋯,0.0252130,0.01956100,-0.01308400,4.3563e-03,-0.0073216,-0.0073216,-0.0220750,-0.01147100,-0.0394370,-0.0400750
0.948050,0.942440,-0.078334,0.964930,0.9912700,0.2999800,0.1746400,0.23784000,1.9103e-01,0.177670,⋯,0.0142890,0.00809080,-0.01694700,9.1581e-05,-0.0035575,-0.0035575,-0.0275520,-0.00036976,-0.0457280,-0.0474270
0.414090,0.402880,-0.056151,0.414770,0.3990600,-0.0349970,-0.0490780,-0.02528900,-3.2538e-02,-0.046423,⋯,0.0057649,-0.00058517,-0.04052900,-1.2314e-03,-0.0481700,-0.0481700,-0.0341900,0.01815900,-0.0463870,-0.0321310
0.582510,0.574340,-0.182270,0.589240,0.6041700,0.1733100,0.0681870,0.16826000,8.4623e-02,0.077967,⋯,-0.0131740,-0.02404500,0.02470500,-6.2777e-03,0.0038312,0.0038312,-0.0474740,0.04000800,-0.0347840,-0.0316340
0.289360,0.280880,-0.048627,0.290170,0.2728200,0.9004400,0.7084100,-0.03901300,-7.5352e-05,0.724340,⋯,-0.0075276,-0.01285700,-0.02182200,-3.2347e-03,-0.0328830,-0.0328830,-0.0233400,0.04111700,-0.0169860,-0.0147250
0.197800,0.210030,-0.066790,0.198570,0.1854400,0.6333200,0.9384300,-0.00181520,1.8330e-02,0.987060,⋯,-0.0296610,-0.03375500,-0.01208400,1.7388e-03,-0.0503800,-0.0503800,-0.0097299,0.06845400,-0.0203450,-0.0299390
0.183230,0.213230,-0.064954,0.184010,0.1714600,0.5908600,0.9381800,-0.00176530,1.7826e-02,0.986700,⋯,-0.0288450,-0.03282700,-0.01175200,8.5082e-03,-0.0489950,-0.0489950,-0.0094623,0.06657200,-0.0197860,-0.0291160
0.186980,0.216420,-0.066252,0.187760,0.1750000,0.6072600,0.9508200,-0.00257810,1.6673e-02,1.000000,⋯,-0.0333890,-0.03739900,-0.01365600,8.8226e-03,-0.0497280,-0.0497280,-0.0102980,0.07230000,-0.0206620,-0.0300530
0.203670,0.197390,-0.065482,0.204390,0.1912600,0.6430300,0.8990000,-0.00095894,1.9565e-02,0.945680,⋯,-0.0248920,-0.02896000,-0.01008500,-5.7873e-03,-0.0496530,-0.0496530,-0.0088565,0.06247200,-0.0194390,-0.0289770


ERROR: Error in parse(text = x, srcfile = src): <text>:1:150: unexpected input
1: result2 = fread("/home/rd2972/private_data/20240300_rss_qc_imputation/DENTIST/per_iteration/dentist_Rcpp/20240308_version/DENTIST_Rcpp_output_11.txt"）
                                                                                                                                                         ^


In [56]:
result1 = vroom("/home/rd2972/private_data/20240300_rss_qc_imputation/DENTIST/per_iteration/dentist_Rcpp/20240308_version/DENTIST_Rcpp_output_02.txt")

In [73]:
dentist_result = dentist(zScore = sumstat$z[2001:4001], LD = LD[2001:4001, 2001:4001], nSample = 1153)

ERROR: Error in value[[3L]](cond): Adjusted rsq_eigen value exceeding 1: 1.000443





In [52]:
dim(dentist_LD_result)
head(dentist_LD_result)

LD_it,for,iteration,:,V5,V6,V7,V8,V9,V10,⋯,V2052,V2053,V2054,V2055,V2056,V2057,V2058,V2059,V2060,V2061
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
0.98501,0.99703,-0.16597,0.98062,0.95174,0.30599,0.19635,0.24287,0.19603,0.20005,⋯,0.01595,0.010116,-0.0094939,-0.0014634,-0.010461,-0.010461,-0.024245,-0.005539,-0.041944,-0.043
0.97722,0.98368,-0.16553,0.99097,0.96237,0.30933,0.18212,0.24568,0.19891,0.1849,⋯,0.025213,0.019561,-0.013084,0.0043563,-0.0073216,-0.0073216,-0.022075,-0.011471,-0.039437,-0.040075
0.94805,0.94244,-0.078334,0.96493,0.99127,0.29998,0.17464,0.23784,0.19103,0.17767,⋯,0.014289,0.0080908,-0.016947,9.1581e-05,-0.0035575,-0.0035575,-0.027552,-0.00036976,-0.045728,-0.047427
0.41409,0.40288,-0.056151,0.41477,0.39906,-0.034997,-0.049078,-0.025289,-0.032538,-0.046423,⋯,0.0057649,-0.00058517,-0.040529,-0.0012314,-0.04817,-0.04817,-0.03419,0.018159,-0.046387,-0.032131
0.58251,0.57434,-0.18227,0.58924,0.60417,0.17331,0.068187,0.16826,0.084623,0.077967,⋯,-0.013174,-0.024045,0.024705,-0.0062777,0.0038312,0.0038312,-0.047474,0.040008,-0.034784,-0.031634
0.28936,0.28088,-0.048627,0.29017,0.27282,0.90044,0.70841,-0.039013,-7.5352e-05,0.72434,⋯,-0.0075276,-0.012857,-0.021822,-0.0032347,-0.032883,-0.032883,-0.02334,0.041117,-0.016986,-0.014725


Ignore the column names. The dimension of the LD matrix is not correct yet, and the diagonal elements are not equal to 1.
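
Both problems noted above (wrong dimension, diagonal not equal to 1) can be caught by a few basic checks. The sketch below runs them on a small simulated matrix. The helper name `check_ld_matrix()` and the tolerance are assumptions made for this example.

```r
# Hypothetical helper: basic sanity checks for an LD (correlation) matrix
check_ld_matrix <- function(LD, tol = 1e-6) {
  LD <- unname(as.matrix(LD))
  is_square <- nrow(LD) == ncol(LD)
  list(
    dim           = dim(LD),
    is_square     = is_square,
    unit_diagonal = is_square && all(abs(diag(LD) - 1) < tol),
    symmetric     = is_square && isTRUE(all.equal(LD, t(LD), tolerance = tol)),
    in_range      = all(abs(LD) <= 1 + tol)
  )
}

# A valid toy LD matrix from simulated genotypes (not real data)
set.seed(4)
geno_toy <- matrix(rbinom(300 * 4, size = 2, prob = 0.4), nrow = 300)
good <- check_ld_matrix(cor(geno_toy))

# Dropping one column gives a non-square matrix that fails the checks
bad <- check_ld_matrix(cor(geno_toy)[, -1])
print(good)
print(bad)
```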

In [58]:
DENTIST_compile = vroom("/home/rd2972/private_data/20240300_rss_qc_imputation/DENTIST/DENTIST_Rcpp_output.txt")

“Detected 4 column names but the data has 2061 columns (i.e. invalid file). Added 2057 extra default column names at the end.”
“Stopped early on line 2064. Expected 2061 fields but found 0. Consider fill=TRUE and comment.char=. First discarded non-empty line: <<LD_it for iteration :>>”


In [174]:
EAF

In [115]:
sumstat

SNP,A1,A2,freq,b,se,p,N
<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
chr1:206011748_A_G,G,A,0.16565481,-0.0169,0.0115,0.07083977,1153
chr1:206012565_A_G,G,A,0.16912402,-0.0189,0.0114,0.04866936,1153
chr1:206012721_C_G,G,C,0.16999133,-0.0185,0.0114,0.05231533,1153
chr1:206013858_A_G,G,A,0.16608846,-0.0183,0.0114,0.05421795,1153
chr1:206014205_G_A,A,G,0.08976583,0.0023,0.0164,0.55576636,1153
chr1:206015713_G_A,A,G,0.16348656,-0.0171,0.0115,0.06851315,1153
chr1:206016996_A_T,T,A,0.17389419,-0.0170,0.0113,0.06623601,1153
chr1:206018106_T_C,C,T,0.17649610,-0.0195,0.0113,0.04220396,1153
chr1:206018145_C_T,T,C,0.01951431,-0.0357,0.0370,0.16730625,1153
chr1:206018463_T_C,C,T,0.03295750,-0.0013,0.0261,0.48013752,1153


In [175]:
sumstat$freq = EAF


In [94]:
sumstat %>% filter(freq < 0.5)

SNP,A1,A2,freq,b,se,p,N,z
<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
chr1:206011748_A_G,G,A,0.16565481,-0.0169,0.0115,0.07083977,1153,-1.46956522
chr1:206012565_A_G,G,A,0.16912402,-0.0189,0.0114,0.04866936,1153,-1.65789474
chr1:206012721_C_G,G,C,0.16999133,-0.0185,0.0114,0.05231533,1153,-1.62280702
chr1:206013858_A_G,G,A,0.16608846,-0.0183,0.0114,0.05421795,1153,-1.60526316
chr1:206014205_G_A,A,G,0.08976583,0.0023,0.0164,0.55576636,1153,0.14024390
chr1:206015713_G_A,A,G,0.16348656,-0.0171,0.0115,0.06851315,1153,-1.48695652
chr1:206016996_A_T,T,A,0.17389419,-0.0170,0.0113,0.06623601,1153,-1.50442478
chr1:206018106_T_C,C,T,0.17649610,-0.0195,0.0113,0.04220396,1153,-1.72566372
chr1:206018145_C_T,T,C,0.01951431,-0.0357,0.0370,0.16730625,1153,-0.96486486
chr1:206018463_T_C,C,T,0.03295750,-0.0013,0.0261,0.48013752,1153,-0.04980843


In [176]:
write_tsv(sumstat, "/home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/Sumstat_MAF_updated.txt")

In [None]:
./DENTIST 


In [219]:
sumstat

SNP,A1,A2,freq,b,se,p,N
<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
chr1:206510390_A_G,G,A,0.51691240,-0.0030,0.0082,0.3572371,1153
chr1:206511644_A_C,C,A,0.51474415,-0.0047,0.0082,0.2832645,1153
chr1:206512294_T_C,C,T,0.96010408,0.0069,0.0212,0.6275880,1153
chr1:206512396_C_T,T,C,0.46357329,-0.0025,0.0082,0.3802295,1153
chr1:206513188_C_T,T,C,0.03469211,0.0165,0.0238,0.7559322,1153
chr1:206514603_T_C,C,T,0.79011275,0.0010,0.0105,0.5379371,1153
chr1:206516658_C_T,T,C,0.03729402,-0.0070,0.0217,0.3735064,1153
chr1:206516810_C_T,T,C,0.01517780,-0.0239,0.0352,0.2485761,1153
chr1:206516811_G_A,A,G,0.01604510,-0.0152,0.0335,0.3250111,1153
chr1:206516953_C_T,T,C,0.51604510,-0.0031,0.0082,0.3526972,1153


In [59]:
DENTIST_compile

LD_it,for,iteration,:,V5,V6,V7,V8,V9,V10,⋯,V2052,V2053,V2054,V2055,V2056,V2057,V2058,V2059,V2060,V2061
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
0.985010,0.997030,-0.165970,0.980620,0.9517400,0.3059900,0.1963500,0.24287000,1.9603e-01,0.200050,⋯,0.0159500,0.01011600,-0.00949390,-1.4634e-03,-0.0104610,-0.0104610,-0.0242450,-0.00553900,-0.0419440,-0.0430000
0.977220,0.983680,-0.165530,0.990970,0.9623700,0.3093300,0.1821200,0.24568000,1.9891e-01,0.184900,⋯,0.0252130,0.01956100,-0.01308400,4.3563e-03,-0.0073216,-0.0073216,-0.0220750,-0.01147100,-0.0394370,-0.0400750
0.948050,0.942440,-0.078334,0.964930,0.9912700,0.2999800,0.1746400,0.23784000,1.9103e-01,0.177670,⋯,0.0142890,0.00809080,-0.01694700,9.1581e-05,-0.0035575,-0.0035575,-0.0275520,-0.00036976,-0.0457280,-0.0474270
0.414090,0.402880,-0.056151,0.414770,0.3990600,-0.0349970,-0.0490780,-0.02528900,-3.2538e-02,-0.046423,⋯,0.0057649,-0.00058517,-0.04052900,-1.2314e-03,-0.0481700,-0.0481700,-0.0341900,0.01815900,-0.0463870,-0.0321310
0.582510,0.574340,-0.182270,0.589240,0.6041700,0.1733100,0.0681870,0.16826000,8.4623e-02,0.077967,⋯,-0.0131740,-0.02404500,0.02470500,-6.2777e-03,0.0038312,0.0038312,-0.0474740,0.04000800,-0.0347840,-0.0316340
0.289360,0.280880,-0.048627,0.290170,0.2728200,0.9004400,0.7084100,-0.03901300,-7.5352e-05,0.724340,⋯,-0.0075276,-0.01285700,-0.02182200,-3.2347e-03,-0.0328830,-0.0328830,-0.0233400,0.04111700,-0.0169860,-0.0147250
0.197800,0.210030,-0.066790,0.198570,0.1854400,0.6333200,0.9384300,-0.00181520,1.8330e-02,0.987060,⋯,-0.0296610,-0.03375500,-0.01208400,1.7388e-03,-0.0503800,-0.0503800,-0.0097299,0.06845400,-0.0203450,-0.0299390
0.183230,0.213230,-0.064954,0.184010,0.1714600,0.5908600,0.9381800,-0.00176530,1.7826e-02,0.986700,⋯,-0.0288450,-0.03282700,-0.01175200,8.5082e-03,-0.0489950,-0.0489950,-0.0094623,0.06657200,-0.0197860,-0.0291160
0.186980,0.216420,-0.066252,0.187760,0.1750000,0.6072600,0.9508200,-0.00257810,1.6673e-02,1.000000,⋯,-0.0333890,-0.03739900,-0.01365600,8.8226e-03,-0.0497280,-0.0497280,-0.0102980,0.07230000,-0.0206620,-0.0300530
0.203670,0.197390,-0.065482,0.204390,0.1912600,0.6430300,0.8990000,-0.00095894,1.9565e-02,0.945680,⋯,-0.0248920,-0.02896000,-0.01008500,-5.7873e-03,-0.0496530,-0.0496530,-0.0088565,0.06247200,-0.0194390,-0.0289770


In [28]:
kk = vroom("/home/rd2972/private_data/20240300_rss_qc_imputation/DENTIST/per_iteration/dentist_original/4213_variants/LD_separate/LD_it_output_20.txt")

In [29]:
head(kk)

V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,⋯,V973,V974,V975,V976,V977,V978,V979,V980,V981,V982
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
0.675916,-0.0513104,-0.00192395,-0.0447026,0.0505993,-0.0797282,0.0436569,0.0190822,0.321648,0.0507464,⋯,0.137823,0.112539,0.0390107,-0.013961,-0.0256802,-0.085199,0.115159,-0.21198,0.0457133,-0.14308
-0.0286956,0.0256355,0.187835,-0.0329341,0.0358906,0.0264595,-0.0358579,-0.0417524,-0.0314405,-0.0341439,⋯,0.0164742,0.00802881,0.00357216,0.0238751,0.070886,-0.0322581,-0.0545845,0.0235798,-0.00985897,-0.0141408
-0.0742423,0.186047,0.0714334,0.0313478,-0.0447789,0.184357,-0.172398,-0.0233221,-0.0427587,0.00418691,⋯,-0.0185382,-0.00656875,0.0111695,-0.154754,0.0525748,0.010088,-0.0784397,0.0335985,-0.0761834,0.0346303
-0.190902,0.00375349,-0.00168635,-0.0666032,0.0394499,-0.0277484,0.0172595,-0.00700894,-0.34048,-0.051328,⋯,-0.060039,-0.0647869,-0.0505422,-0.00102061,-0.0511662,0.0973992,-0.098007,-0.0827689,-0.0417143,0.179185
0.807836,-0.0476307,-7.06678e-05,-0.00322825,0.0854091,-0.0509672,0.0533767,0.0131902,0.220293,0.0715207,⋯,0.0398345,0.00410843,0.0449112,0.011596,-0.0482848,-0.092978,0.0922414,-0.208791,0.0602095,-0.123529
-0.0531262,0.00703471,0.0220129,0.222993,-0.0183365,0.0137763,0.0501553,0.0282091,0.00702352,0.0716415,⋯,-0.0555523,0.0441245,0.0994747,-0.0447567,0.102343,0.00352642,-0.071931,0.0556175,-0.0264469,-0.0668466


In [26]:
k2 = vroom("/home/rd2972/private_data/20240300_rss_qc_imputation/DENTIST/per_iteration/dentist_Rcpp/4213_variants/DENTIST_Rcpp_output_01.txt")

In [27]:
# sumstat: /home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/sumstat_sep_20.tsv
# LD: /home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/LD_sep_20.txt
# PLINK: /home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/PLINK_input_sep

V1,V2,V3,V4,V5,V6,V7,V8,V9,V10,⋯,V2052,V2053,V2054,V2055,V2056,V2057,V2058,V2059,V2060,V2061
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
0.985010,0.997030,-0.165970,0.980620,0.9517400,0.3059900,0.1963500,0.24287000,1.9603e-01,0.200050,⋯,0.0159500,0.01011600,-0.00949390,-1.4634e-03,-0.0104610,-0.0104610,-0.0242450,-0.00553900,-0.0419440,-0.0430000
0.977220,0.983680,-0.165530,0.990970,0.9623700,0.3093300,0.1821200,0.24568000,1.9891e-01,0.184900,⋯,0.0252130,0.01956100,-0.01308400,4.3563e-03,-0.0073216,-0.0073216,-0.0220750,-0.01147100,-0.0394370,-0.0400750
0.948050,0.942440,-0.078334,0.964930,0.9912700,0.2999800,0.1746400,0.23784000,1.9103e-01,0.177670,⋯,0.0142890,0.00809080,-0.01694700,9.1581e-05,-0.0035575,-0.0035575,-0.0275520,-0.00036976,-0.0457280,-0.0474270
0.414090,0.402880,-0.056151,0.414770,0.3990600,-0.0349970,-0.0490780,-0.02528900,-3.2538e-02,-0.046423,⋯,0.0057649,-0.00058517,-0.04052900,-1.2314e-03,-0.0481700,-0.0481700,-0.0341900,0.01815900,-0.0463870,-0.0321310
0.582510,0.574340,-0.182270,0.589240,0.6041700,0.1733100,0.0681870,0.16826000,8.4623e-02,0.077967,⋯,-0.0131740,-0.02404500,0.02470500,-6.2777e-03,0.0038312,0.0038312,-0.0474740,0.04000800,-0.0347840,-0.0316340
0.289360,0.280880,-0.048627,0.290170,0.2728200,0.9004400,0.7084100,-0.03901300,-7.5352e-05,0.724340,⋯,-0.0075276,-0.01285700,-0.02182200,-3.2347e-03,-0.0328830,-0.0328830,-0.0233400,0.04111700,-0.0169860,-0.0147250
0.197800,0.210030,-0.066790,0.198570,0.1854400,0.6333200,0.9384300,-0.00181520,1.8330e-02,0.987060,⋯,-0.0296610,-0.03375500,-0.01208400,1.7388e-03,-0.0503800,-0.0503800,-0.0097299,0.06845400,-0.0203450,-0.0299390
0.183230,0.213230,-0.064954,0.184010,0.1714600,0.5908600,0.9381800,-0.00176530,1.7826e-02,0.986700,⋯,-0.0288450,-0.03282700,-0.01175200,8.5082e-03,-0.0489950,-0.0489950,-0.0094623,0.06657200,-0.0197860,-0.0291160
0.186980,0.216420,-0.066252,0.187760,0.1750000,0.6072600,0.9508200,-0.00257810,1.6673e-02,1.000000,⋯,-0.0333890,-0.03739900,-0.01365600,8.8226e-03,-0.0497280,-0.0497280,-0.0102980,0.07230000,-0.0206620,-0.0300530
0.203670,0.197390,-0.065482,0.204390,0.1912600,0.6430300,0.8990000,-0.00095894,1.9565e-02,0.945680,⋯,-0.0248920,-0.02896000,-0.01008500,-5.7873e-03,-0.0496530,-0.0496530,-0.0088565,0.06247200,-0.0194390,-0.0289770


In [220]:
rds = readRDS("~/RSS_QC/toy_example_zRdiscrep.rds")

In [224]:
var_names = vroom("/home/rd2972/private_data/20240300_rss_qc_imputation/DENTIST/per_iteration/dentist_original/4213_variants/out.DENTIST.full.txt")

In [234]:
var_name = var_names %>% pull(V1) %>% str_replace("chr", "") %>% str_replace("_", ":") %>% str_replace("_", ":")

In [239]:
var_name[2000] %in% colnames(rds$LD)

In [238]:
colnames(rds$LD)

In [236]:
rds$LD[var_name, var_name]

ERROR: Error in rds$LD[var_name, var_name]: subscript out of bounds
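
The `subscript out of bounds` error above is what R raises when at least one requested name is absent from the matrix dimnames. Below is a minimal sketch of checking the overlap first and subsetting only on the shared names. It uses a made-up toy matrix and hypothetical variant IDs, not the real `rds$LD`.

```r
# Toy LD matrix with hypothetical variant IDs (not the real rds$LD)
ids <- c("1:100:A:G", "1:200:C:T", "1:300:G:A")
LD_toy <- diag(3)
dimnames(LD_toy) <- list(ids, ids)

# Requested names, one of which is absent from the matrix
wanted <- c("1:100:A:G", "1:300:G:A", "1:999:T:C")

# Report which names are missing before indexing
missing_ids <- setdiff(wanted, colnames(LD_toy))

# Subset on the shared names only, keeping the requested order
shared <- wanted[wanted %in% colnames(LD_toy)]
LD_sub <- LD_toy[shared, shared, drop = FALSE]
```

Inspecting `missing_ids` shows whether the mismatch comes from the ID format (for example allele order) or from variants that were filtered out of the LD reference.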


In [240]:
sumstat_4k = vroom("~/RSS_QC/pecotmr/data/RSS_QC_MWE/Sumstat_4k.tsv")

In [242]:
# Write one variant ID per line, without a header, for plink2 --extract
sumstat_4k %>% pull(plink_variant_id) %>% 
    write.table("/home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/4k_snplist.txt", row.names = FALSE, col.names = FALSE, quote = FALSE)

In [None]:
# Extract the listed variants from the chromosome 1 PLINK files (run in a shell, not in R)

plink2 --bfile /mnt/vast/hpc/csg/FunGen_xQTL/ROSMAP/Genotype/geno_by_chrom/ROSMAP_NIA_WGS.leftnorm.bcftools_qc.plink_qc.1 --extract /home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/4k_snplist.txt --make-bed --maf 0.01 --geno 0.01 --out ~/RSS_QC/pecotmr/data/RSS_QC_MWE/PLINK_4k

In [243]:
# Calculate the LD matrix needed to run the Rcpp interface

geno = read_plink("/home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/PLINK_4k")$bed
LD = as.matrix(cor(geno))
write.table(LD, "/home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/LD_4k.tsv", sep = "\t", quote = FALSE, row.names = TRUE, col.names = TRUE)
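
A quick sanity check on a correlation-based LD matrix can catch problems before the file is passed on. This is a small sketch on a made-up dosage matrix (not the ROSMAP genotypes): it drops monomorphic columns, for which `cor()` would return `NA`, and then confirms the result is symmetric with a unit diagonal.

```r
# Hypothetical 0/1/2 dosage matrix: 6 samples x 4 variants
geno_toy <- cbind(
  v1 = c(0, 1, 2, 1, 0, 2),
  v2 = c(0, 1, 2, 2, 0, 1),
  v3 = c(2, 1, 0, 1, 2, 0),  # exactly 2 - v1, so perfectly anti-correlated with v1
  v4 = rep(1, 6)             # monomorphic: zero variance
)

# Keep only columns with non-zero variance before computing correlations
polymorphic <- apply(geno_toy, 2, sd) > 0
LD_toy <- cor(geno_toy[, polymorphic])

# Basic structural checks on the LD matrix
ok <- isSymmetric(LD_toy) && all(abs(diag(LD_toy) - 1) < 1e-12)
```

This is one reason the PLINK input is filtered on MAF before `cor()` is called.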

In [245]:
X = geno
# Frequency of the counted allele per variant: mean dosage / 2 (missing genotypes are ignored)
EAF = unname(colMeans(X, na.rm = TRUE) / 2)

In [246]:
sumstat_4k$maf = EAF
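
Note that the value stored in `maf` above is the frequency of the allele counted in the genotype matrix, so it can exceed 0.5. Below is a small sketch, on a made-up dosage matrix, of computing that frequency and folding it into a minor allele frequency. Which of the two a downstream tool expects in its `freq` column is not verified here.

```r
# Hypothetical 0/1/2 dosage matrix: 4 samples (rows) x 3 variants (columns)
X_toy <- matrix(c(0, 1, 2, 1,
                  2, 2, 1, 2,
                  0, 0, 0, 1), nrow = 4)

# Frequency of the counted allele: mean dosage divided by 2
freq <- colMeans(X_toy, na.rm = TRUE) / 2

# Fold to the minor allele frequency
maf <- pmin(freq, 1 - freq)
```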

In [248]:
# Build the COJO-format table (SNP A1 A2 freq b se p N) that DENTIST expects
sumstat_4k %>% mutate(variant_id = paste0("chr", chrom, ":", pos, "_", A2, "_", A1))  %>% 
    mutate(pvalue = 2 * pnorm(-abs(beta / se))) %>%  # two-sided p-value from z = beta / se
    select(SNP = variant_id, A1, A2, freq = maf, b = beta, se, p = pvalue) %>%
    mutate(N = 1153) %>%  # sample size is set to a fixed value for every variant
    write_tsv("~/RSS_QC/pecotmr/data/RSS_QC_MWE/Sumstat_4k.tsv")
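
For reference, this is a short sketch of how a two-sided p-value is obtained from an effect size and its standard error under a normal approximation. It is contrasted with the lower-tail probability `pnorm(z)`, which approaches 1 for large positive effects. The numbers are made up for illustration.

```r
# Made-up effect sizes and standard errors
b  <- c(0.10, -0.25, 0)
se <- c(0.05,  0.05, 0.10)
z  <- b / se

# Two-sided p-value: probability of |Z| >= |z| under the standard normal
p_two_sided <- 2 * pnorm(-abs(z))

# Lower-tail probability only, shown for contrast; this is not a two-sided p-value
p_lower_tail <- pnorm(z)
```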

In [283]:
sumstat_4k[730,]$pos - sumstat_4k[1,]$pos

In [None]:
./DENTIST_1.3.0.0 --bfile /home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/PLINK_4k --gwas-summary /home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/Sumstat_4k.tsv --out D_4k_8iter --no-missing-genotype  --wind-dist 2000000  --dup-threshold 1.0 --iteration-num 8

In [293]:
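# Variant IDs for the second window (rows 743 to 4123), one per line without a header
vroom("~/RSS_QC/pecotmr/data/RSS_QC_MWE/Sumstat_4k.tsv")[c(743:4123),] %>% pull(SNP) %>% 
    write.table("/home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/MWE_snplist_4k_window2.txt", row.names = FALSE, col.names = FALSE, quote = FALSE)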

In [None]:
# Extract the second-window variants from the chromosome 1 PLINK files (run in a shell, not in R)

plink2 --bfile /mnt/vast/hpc/csg/FunGen_xQTL/ROSMAP/Genotype/geno_by_chrom/ROSMAP_NIA_WGS.leftnorm.bcftools_qc.plink_qc.1 --extract /home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/MWE_snplist_4k_window2.txt --make-bed --maf 0.01 --geno 0.01 --out ~/RSS_QC/pecotmr/data/RSS_QC_MWE/PLINK_4k_window2

In [286]:
write.table(LD[2:4123, 2:4123], "/home/hs3393//RSS_QC/pecotmr/data/RSS_QC_MWE/LD_4k_even.tsv", sep = "\t", quote = FALSE, row.names = TRUE, col.names = TRUE)

In [292]:
vroom("~/RSS_QC/pecotmr/data/RSS_QC_MWE/Sumstat_4k.tsv")[c(743:4123),] %>% write_tsv("~/RSS_QC/pecotmr/data/RSS_QC_MWE/Sumstat_4k_window2.tsv")

In [289]:
(4123 - 2746) / 4123

In [296]:
2449520 / 4

In [336]:
# SNP IDs look like chr<chr>:<pos>_<A2>_<A1>. Splitting the part after ":" on "_" gives three
# pieces, so separate() keeps the first two and warns about the discarded one (see output).
vroom("~/RSS_QC/pecotmr/data/RSS_QC_MWE/Sumstat_4k.tsv") %>% separate(SNP, into = c("chr", "pos"), sep = ":") %>%
    separate(pos, into = c("pos", "var"), sep = "_") %>% mutate(pos = as.numeric(pos)) %>%
    mutate(chr = str_replace(chr, "^chr", "")) %>% mutate(z = b / se, chr = as.numeric(chr))  %>%
    select(chr, pos, A1, A2, z) %>% write_tsv("~/RSS_QC/pecotmr/data/RSS_QC_MWE/sumstat_pecotmr_format.tsv")

“Expected 2 pieces. Additional pieces discarded in 4123 rows [1, 2, 3, 4, 5, 6,
7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, ...].”
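
One alternative is to parse the ID with a single regular expression, which does not discard any pieces. The sketch below uses base R and assumes the `chr<chr>:<pos>_<allele>_<allele>` layout. The example IDs are hypothetical.

```r
# Hypothetical variant IDs in the chr<chr>:<pos>_<allele>_<allele> layout
snp <- c("chr1:207317782_A_G", "chr1:207320000_C_T")

# Capture groups: chromosome, position, first allele, second allele
pattern <- "^chr([^:]+):([0-9]+)_([^_]+)_([^_]+)$"

parsed <- data.frame(
  chr = as.numeric(sub(pattern, "\\1", snp)),
  pos = as.numeric(sub(pattern, "\\2", snp)),
  stringsAsFactors = FALSE
)
```

IDs that do not match the pattern are returned unchanged by `sub()` and become `NA` after `as.numeric()`, which makes malformed rows easy to spot.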


In [308]:
(sumstat$pos[4123] - sumstat$pos[1])/4

In [332]:
sumstat$pos[1407] - sumstat$pos[1]

sumstat$pos[515] - sumstat$pos[1]

sumstat$pos[1407] - sumstat$pos[515]

sumstat$pos[2126] - sumstat$pos[1407]

sumstat$pos[3181] - sumstat$pos[2126]
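
The position differences above appear to probe where equal-width windows fall along the region. Assuming that intent, the sketch below cuts a sorted position vector into four equal-width windows and reports the first row index and the variant count of each. The positions are made up.

```r
# Made-up sorted base-pair positions
pos <- c(100, 150, 220, 400, 410, 590, 700, 820, 900)
n_win <- 4

# Equal-width breakpoints spanning the whole region
width  <- (max(pos) - min(pos)) / n_win
breaks <- min(pos) + width * 0:n_win

# Window index of each variant; the last window includes its right edge
win <- findInterval(pos, breaks, rightmost.closed = TRUE)

# First row index and number of variants in each window
first_idx <- match(seq_len(n_win), win)
counts    <- tabulate(win, nbins = n_win)
```

Windows with very different `counts` indicate uneven variant density, which matters when a tool requires a minimum number of variants per window.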
