# Introduction

This notebook studies the DENTIST algorithm by comparing the original C++ implementation ([GitHub repo](https://github.com/Yves-CHEN/DENTIST)) with the version we ported to R ([slalom.R](https://github.com/cumc/pecotmr/blob/main/R/slalom.R)).

# DENTIST in C++

## Installation

Follow the build instructions in [Gao's fork](https://github.com/gaow/DENTIST/tree/master); we were unable to compile the original repository.

The compiled executable is written to the `./builts` folder.

## Run DENTIST on test data

```shell
/home/rd2972/software/DENTIST/builts/DENTIST.tmp2 \
    --bfile /home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/PLINK_input_MWE \
    --gwas-summary /home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/Sumstat_MAF_filtered.txt \
    --out /home/rd2972/private_data/20240300_rss_qc_imputation/DENTIST/per_iteration/dentist_original/4213_variants/out
```

- `Input data`
    - The PLINK bfiles contain 4123 variants for 1153 individuals.
    - The summary statistics cover the same 4123 variants.
- `Output results`
    - Output from the original `DENTIST`
        - `/home/rd2972/private_data/20240300_rss_qc_imputation/DENTIST/per_iteration/dentist_original/4213_variants/out.DENTIST.full.txt` (4123 variants)
        - `/home/rd2972/private_data/20240300_rss_qc_imputation/DENTIST/per_iteration/dentist_original/4213_variants/out.DENTIST.ignored.txt` (empty)
        - `/home/rd2972/private_data/20240300_rss_qc_imputation/DENTIST/per_iteration/dentist_original/4213_variants/out.DENTIST.short.txt` (63 variants)
    - Output of `LD_it`:
        - We edited the C++ code so that every call to the `oneIteration` function appends `LD_it` to `/home/rd2972/private_data/20240300_rss_qc_imputation/DENTIST/per_iteration/dentist_original/4213_variants/LD_it_output.txt`
        - We then split that file into one file per `LD_it`, stored in `/home/rd2972/private_data/20240300_rss_qc_imputation/DENTIST/per_iteration/dentist_original/4213_variants/LD_separate/`
        - **Unexpected**: 8 iterations produce 32 `LD_it`s rather than 16, and we do not yet know why.
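
A minimal sketch of such a split in R, on toy data. It assumes every `LD_it` was appended as a whitespace-delimited block with the same, known number of rows; the actual layout of `LD_it_output.txt` is not documented here, and `split_ld_blocks` and `rows_per_block` are hypothetical names used only for illustration.

```r
# Split a vector of text lines into equally sized matrix blocks.
# ASSUMPTION: each appended LD_it occupies exactly `rows_per_block` lines.
split_ld_blocks <- function(lines, rows_per_block) {
  stopifnot(length(lines) %% rows_per_block == 0)
  block_id <- ceiling(seq_along(lines) / rows_per_block)
  lapply(split(lines, block_id), function(block) {
    as.matrix(read.table(text = block, header = FALSE))
  })
}

# Toy input: three 2 x 2 blocks appended one after another (made-up values).
toy_lines <- c("1 0.5", "0.5 1",
               "1 0.2", "0.2 1",
               "1 0.9", "0.9 1")
blocks <- split_ld_blocks(toy_lines, rows_per_block = 2)

# The number of blocks is the number of LD_its that were written.
print(length(blocks))
```

Counting the blocks this way is one way to double-check how many `LD_it`s were written per run.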

# Run DENTIST in R

In [1]:
rm(list=ls())
library(readr)
library(dplyr, warn.conflicts = FALSE)
library(data.table)
library(RcppArmadillo)
library(Rcpp)
source("/home/rd2972/software/pecotmr/R/run_dentist.R")
source("/home/rd2972/software/pecotmr/R/RcppExports.R")
sourceCpp("/home/rd2972/software/pecotmr/src/dentist.cpp")
sourceCpp("/home/rd2972/software/pecotmr/src/RcppExports.cpp")


Attaching package: ‘data.table’


The following objects are masked from ‘package:dplyr’:

    between, first, last


Registered S3 methods overwritten by 'RcppGSL':
  method               from         
  predict.fastLm       RcppArmadillo
  print.fastLm         RcppArmadillo
  summary.fastLm       RcppArmadillo
  print.summary.fastLm RcppArmadillo

“No Rcpp::export attributes or RCPP_MODULE declarations found in source”


In [2]:
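# z-score = effect size / standard error
snp_data = read_delim("/home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/Sumstat_MAF_filtered.txt", show_col_types = FALSE) %>% mutate(z = b / se)
# Drop the first column, which fread guesses to be row names (see the warning below)
LD_data = fread("/home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/LD_MWE.tsv")[,-1]
LD_data = as.matrix(LD_data)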

“Detected 4123 column names but the data has 4124 columns (i.e. invalid file). Added 1 extra default column name for the first column which is guessed to be row names or an index. Use setnames() afterwards if this guess is not correct, or fix the file write command that created the file to create a valid file.”
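
The warning above suggests the LD file was written with row names but without a header field for them, which is why the first column is dropped. The following is a minimal sketch on a toy 3-variant matrix of how base R's `read.table` handles that file shape and which sanity checks can be run on the result. The values and the variant names `v1` to `v3` are made up for illustration and are not the MWE data.

```r
# Toy LD matrix with made-up values; variant names are placeholders.
ids <- c("v1", "v2", "v3")
toy_ld <- matrix(c(1.0, 0.3, 0.1,
                   0.3, 1.0, 0.5,
                   0.1, 0.5, 1.0),
                 nrow = 3, dimnames = list(ids, ids))

# write.table() with row names writes a header that is one field shorter
# than the data rows, the same shape the fread warning describes.
tmp <- tempfile(fileext = ".tsv")
write.table(toy_ld, tmp, sep = "\t", quote = FALSE)

# read.table() uses the first column as row names when the header has
# one fewer field than the data rows.
ld <- as.matrix(read.table(tmp, header = TRUE, sep = "\t", check.names = FALSE))

# Sanity checks before passing the matrix on.
stopifnot(nrow(ld) == ncol(ld))                   # square
stopifnot(identical(rownames(ld), colnames(ld)))  # same variant order
stopifnot(isSymmetric(ld))                        # LD is symmetric
stopifnot(all(diag(ld) == 1))                     # unit diagonal
```

With `fread`, the same checks apply after dropping the guessed row-name column, except that the row names themselves are discarded and variant order has to be checked against `snp_data$SNP` instead.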


In [3]:
head(snp_data,3)
dim(snp_data)

SNP,A1,A2,freq,b,se,p,N,z
<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
chr1:206011748_A_G,G,A,0.1865,-0.0169,0.0115,0.07083977,1153,-1.469565
chr1:206012565_A_G,G,A,0.1908,-0.0189,0.0114,0.04866936,1153,-1.657895
chr1:206012721_C_G,G,C,0.193,-0.0185,0.0114,0.05231533,1153,-1.622807
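
As a quick arithmetic check of `z = b / se`, the sketch below recomputes the z-scores for the three rows printed above, with `b` and `se` copied from that output.

```r
# b and se for the first three variants, copied from the printed table.
b  <- c(-0.0169, -0.0189, -0.0185)
se <- c( 0.0115,  0.0114,  0.0114)

# z-score as defined in the mutate() call.
z <- b / se
print(round(z, 6))
```

The recomputed values agree with the `z` column shown above to the printed precision.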


In [4]:
res1 = dentist(snp_data$z, LD_data,
               nSample = 1153, nIter = 8,
               pValueThreshold = 5.0369e-8, gPvalueThreshold = 0.05,
               propSVD = 0.4, gcControl = FALSE)
# Most argument values match the defaults in /home/rd2972/software/DENTIST/options.h
# nSample = 1153 is taken from `/home/hs3393/RSS_QC/pecotmr/data/RSS_QC_MWE/PLINK_input_MWE.fam`
# All outputs are stored in `/home/rd2972/private_data/20240300_rss_qc_imputation/DENTIST/per_iteration/dentist_Rcpp/`

In [5]:
head(res1,3)

,imputed_z,rsq,corrected_z,iter_to_correct,is_problematic,original_z
,<dbl>,<dbl>,<dbl>,<int>,<dbl>,<dbl>
1,-0.3698164,0.7568408,-2.230222,8,0,-1.469565
2,-0.2682808,0.7585386,-2.827941,8,0,-1.657895
3,-0.2928419,0.7613557,-2.72248,8,0,-1.622807
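
One way to read these columns: in the three printed rows, `corrected_z` agrees with `(original_z - imputed_z) / sqrt(1 - rsq)` to the printed precision. This relation is inferred from these three rows only, not taken from the `dentist()` source, so treat it as an assumption to verify against the code. A minimal sketch of the check, with the values copied from the output above:

```r
# First three rows of res1, copied from the printed output.
res_head <- data.frame(
  imputed_z   = c(-0.3698164, -0.2682808, -0.2928419),
  rsq         = c( 0.7568408,  0.7585386,  0.7613557),
  corrected_z = c(-2.230222,  -2.827941,  -2.72248),
  original_z  = c(-1.469565,  -1.657895,  -1.622807)
)

# ASSUMPTION (inferred from the rows above, not from the source code):
# the corrected z is the imputation residual scaled by sqrt(1 - rsq).
candidate <- with(res_head, (original_z - imputed_z) / sqrt(1 - rsq))

# Largest absolute difference between the candidate and the reported values.
max_abs_diff <- max(abs(candidate - res_head$corrected_z))
print(max_abs_diff)
```

Running the same comparison on all rows of `res1` would show whether the relation holds beyond the three rows displayed here.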


In [6]:
saveRDS(res1, file = "/home/rd2972/private_data/20240300_rss_qc_imputation/DENTIST/per_iteration/dentist_Rcpp/result.rds")

Note: Each iteration produces two `LD_it`s, and iteration indices start at 0. For example, `DENTIST_Rcpp_output_42.txt` stores the `LD_it` from the second call to the `oneIteration` function in the fifth iteration (index 4).
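
The naming scheme in the note can be sketched as a small helper. `ld_it_file` is a hypothetical name, and the pattern is inferred from the file names `DENTIST_Rcpp_output_01.txt`, `DENTIST_Rcpp_output_02.txt` and `DENTIST_Rcpp_output_42.txt`; it assumes single-digit iteration indices, which holds for `nIter = 8`.

```r
# Build the output file name for a 0-based iteration index and the
# 1-based call to oneIteration within that iteration (1 or 2).
# ASSUMPTION: the name is "<iteration index><call number>" with no padding.
ld_it_file <- function(iter, call) {
  stopifnot(iter >= 0, iter <= 9, call %in% c(1, 2))
  sprintf("DENTIST_Rcpp_output_%d%d.txt", iter, call)
}

first_call  <- ld_it_file(0, 1)  # first call in the first iteration
fifth_second <- ld_it_file(4, 2) # second call in the fifth iteration
print(c(first_call, fifth_second))
```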

# Compare results

In [8]:
# load the two LD_its from the first iteration
Rcpp_res_iter_0_1 <- fread("/home/rd2972/private_data/20240300_rss_qc_imputation/DENTIST/per_iteration/dentist_Rcpp/4213_variants/DENTIST_Rcpp_output_01.txt")
Rcpp_res_iter_0_2 <- fread("/home/rd2972/private_data/20240300_rss_qc_imputation/DENTIST/per_iteration/dentist_Rcpp/4213_variants/DENTIST_Rcpp_output_02.txt")

In [9]:
# load the first two LD_its from the original C++ DENTIST
dentist_res_1 <- fread("/home/rd2972/private_data/20240300_rss_qc_imputation/DENTIST/per_iteration/dentist_original/4213_variants/LD_separate/LD_it_output_1.txt")
dentist_res_2 <- fread("/home/rd2972/private_data/20240300_rss_qc_imputation/DENTIST/per_iteration/dentist_original/4213_variants/LD_separate/LD_it_output_2.txt")

In [12]:
print("==============From Rcpp version=====================")
dim(Rcpp_res_iter_0_1)
Rcpp_res_iter_0_1[1:10,1:10]
dim(Rcpp_res_iter_0_2)
Rcpp_res_iter_0_2[1:10,1:10]
print("==============From DENTIST C++ version=====================")
dim(dentist_res_1)
dentist_res_1[1:10,1:10]
dim(dentist_res_2)
dentist_res_2[1:10,1:10]



V1,V2,V3,V4,V5,V6,V7,V8,V9,V10
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
0.98501,0.99703,-0.16597,0.98062,0.95174,0.30599,0.19635,0.24287,0.19603,0.20005
0.97722,0.98368,-0.16553,0.99097,0.96237,0.30933,0.18212,0.24568,0.19891,0.1849
0.94805,0.94244,-0.078334,0.96493,0.99127,0.29998,0.17464,0.23784,0.19103,0.17767
0.41409,0.40288,-0.056151,0.41477,0.39906,-0.034997,-0.049078,-0.025289,-0.032538,-0.046423
0.58251,0.57434,-0.18227,0.58924,0.60417,0.17331,0.068187,0.16826,0.084623,0.077967
0.28936,0.28088,-0.048627,0.29017,0.27282,0.90044,0.70841,-0.039013,-7.5352e-05,0.72434
0.1978,0.21003,-0.06679,0.19857,0.18544,0.63332,0.93843,-0.0018152,0.01833,0.98706
0.18323,0.21323,-0.064954,0.18401,0.17146,0.59086,0.93818,-0.0017653,0.017826,0.9867
0.18698,0.21642,-0.066252,0.18776,0.175,0.60726,0.95082,-0.0025781,0.016673,1.0
0.20367,0.19739,-0.065482,0.20439,0.19126,0.64303,0.899,-0.00095894,0.019565,0.94568


V1,V2,V3,V4,V5,V6,V7,V8,V9,V10
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
0.98501,0.99703,-0.16597,0.98062,0.95174,0.30599,0.19635,0.24287,0.19603,0.20005
0.97722,0.98368,-0.16553,0.99097,0.96237,0.30933,0.18212,0.24568,0.19891,0.1849
0.94805,0.94244,-0.078334,0.96493,0.99127,0.29998,0.17464,0.23784,0.19103,0.17767
0.41409,0.40288,-0.056151,0.41477,0.39906,-0.034997,-0.049078,-0.025289,-0.032538,-0.046423
0.58251,0.57434,-0.18227,0.58924,0.60417,0.17331,0.068187,0.16826,0.084623,0.077967
0.28936,0.28088,-0.048627,0.29017,0.27282,0.90044,0.70841,-0.039013,-7.5352e-05,0.72434
0.1978,0.21003,-0.06679,0.19857,0.18544,0.63332,0.93843,-0.0018152,0.01833,0.98706
0.18323,0.21323,-0.064954,0.18401,0.17146,0.59086,0.93818,-0.0017653,0.017826,0.9867
0.18698,0.21642,-0.066252,0.18776,0.175,0.60726,0.95082,-0.0025781,0.016673,1.0
0.20367,0.19739,-0.065482,0.20439,0.19126,0.64303,0.899,-0.00095894,0.019565,0.94568




V1,V2,V3,V4,V5,V6,V7,V8,V9,V10
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
0.948033,0.948211,0.955779,-0.0785996,0.964921,0.991271,0.403735,0.188134,0.177608,0.237791
0.312609,0.305939,0.309283,-0.0338847,0.313154,0.296342,-0.0350442,0.633306,0.60725,-0.0351602
0.582866,0.575675,0.585092,-0.182146,0.589592,0.604548,0.242507,0.0866302,0.0780352,0.168334
0.289292,0.282638,0.286153,-0.0487172,0.2901,0.272754,-0.0425133,0.755327,0.72433,-0.0390478
0.183171,0.196556,0.181131,-0.0650268,0.18395,0.171392,-0.0457802,0.972503,0.9867,-0.00179176
0.166918,0.196288,0.182052,-0.0713314,0.185002,0.171811,-0.0491233,0.93843,0.950818,-0.00567403
0.19237,0.195959,0.198841,0.0278439,0.201963,0.188044,-0.0325835,0.0183002,0.0166432,-0.033922
0.203609,0.198676,0.201369,-0.0655563,0.204336,0.191198,-0.0463954,0.985572,0.945678,-0.000985741
0.183171,0.196556,0.181131,-0.0650268,0.18395,0.171392,-0.0457802,0.972503,0.9867,-0.00179176
0.167823,0.198033,0.183351,-0.0688656,0.18626,0.17328,-0.0478103,0.934984,0.948702,-0.00417425


V1,V2,V3,V4,V5,V6,V7,V8,V9,V10
<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
0.948033,0.312609,0.582866,0.289292,0.183171,0.166918,0.19237,0.203609,0.183171,0.167823
0.948211,0.305939,0.575675,0.282638,0.196556,0.196288,0.195959,0.198676,0.196556,0.198033
0.955779,0.309283,0.585092,0.286153,0.181131,0.182052,0.198841,0.201369,0.181131,0.183351
-0.0785996,-0.0338847,-0.182146,-0.0487172,-0.0650268,-0.0713314,0.0278439,-0.0655563,-0.0650268,-0.0688656
0.964921,0.313154,0.589592,0.2901,0.18395,0.185002,0.201963,0.204336,0.18395,0.18626
0.991271,0.296342,0.604548,0.272754,0.171392,0.171811,0.188044,0.191198,0.171392,0.17328
0.403735,-0.0350442,0.242507,-0.0425133,-0.0457802,-0.0491233,-0.0325835,-0.0463954,-0.0457802,-0.0478103
0.188134,0.633306,0.0866302,0.755327,0.972503,0.93843,0.0183002,0.985572,0.972503,0.934984
0.177608,0.60725,0.0780352,0.72433,0.9867,0.950818,0.0166432,0.945678,0.9867,0.948702
0.237791,-0.0351602,0.168334,-0.0390478,-0.00179176,-0.00567403,-0.033922,-0.000985741,-0.00179176,-0.00417425
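
Assuming the four tables above appear in the order of the four slicing calls in the cell, the two Rcpp corners are identical, and the second C++ corner is the transpose of the first. Rather than comparing by eye, such relations can be checked numerically. The sketch below uses a made-up 3 x 3 matrix; `compare_ld` is a hypothetical helper and the tolerance is an arbitrary choice to absorb the different printed precision of the two implementations.

```r
# Compare two square matrices: are they equal, or equal after transposing?
compare_ld <- function(a, b, tol = 1e-3) {
  a <- unname(as.matrix(a))
  b <- unname(as.matrix(b))
  stopifnot(identical(dim(a), dim(b)), nrow(a) == ncol(a))
  c(same       = max(abs(a - b))    < tol,
    transposed = max(abs(a - t(b))) < tol)
}

# Toy, deliberately non-symmetric matrix (made-up values).
m1 <- matrix(c(1.0, 0.2, 0.3,
               0.5, 1.0, 0.4,
               0.6, 0.7, 1.0), nrow = 3, byrow = TRUE)

same_check      <- compare_ld(m1, m1)     # a matrix against itself
transpose_check <- compare_ld(m1, t(m1))  # a matrix against its transpose
print(rbind(same_check, transpose_check))
```

Applied to the real data, a call such as `compare_ld(dentist_res_1[1:10, 1:10], dentist_res_2[1:10, 1:10])` would confirm or refute the transpose relation seen in the printed corners.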
