In [34]:
Output = '/Users/alexis/Library/CloudStorage/OneDrive-UniversityofNorthCarolinaatChapelHill/CEMALB_DataAnalysisPM/Projects/P1015. Fire Sufficient Similarity/3. Analyses/2. Data Processing/Output'
cur_date = '022025'

library(missForest)
library(readxl)
library(tidyverse)
library(imputeLCMD)
library(factoextra)

# reading in files
ws_df = data.frame(Data = 'WS', read_excel("Input/Woodsmoke_Data_012825.xlsx", sheet = 2)) %>%
    select(-Sample_Number)
wf_df = data.frame(Data = 'WF', read_excel("Input/Wildfire_Data_012825.xlsx", sheet = 2)) %>%
    select(-Sample_Number)

NOTES TO SELF:
- change 'replicate' to 'instance' and 'chemical class' to 'class'

In [35]:
head(ws_df)
head(wf_df)

,Data,HAWC_ID,Study,Replicate,Chemical_Class,Metric,DTXSID,Name,Value
,<chr>,<dbl>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>
1,WS,821855,Erlandsson et al. 2020,1.0,PAH,Weight,DTXSID3074787,1-Methylanthracene,11.3
2,WS,821855,Erlandsson et al. 2020,1.0,PAH,Weight,DTXSID3074787,1-Methylanthracene,2.3
3,WS,1257056,McCarrick et al. 2024,1.0,PAH,Weight,DTXSID3074787,1-Methylanthracene,1.66
4,WS,267140,Alfheim and Ramdahl 1984,1.0,PAH,Volume,DTXSID3074787,1-Methylanthracene,
5,WS,1263480,Burnet et al. 1990,1.0,PAH,Volume,DTXSID3074787,1-Methylanthracene,
6,WS,1263480,Burnet et al. 1990,,PAH,Volume,DTXSID3074787,1-Methylanthracene,


,Data,HAWC_ID,Study,Replicate,Chemical_Class,Metric,DTXSID,Name,Value
,<chr>,<dbl>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>
1,WF,1289821,Liang et al. 2021,1.0,PAH,Volume,,1-(10-methylanthracen-9-yl)ethanone,5.0
2,WF,1289821,Liang et al. 2021,1.0,PAH,Volume,DTXSID50176885,1-Acenaphthenone,1.0
3,WF,1289737,Campbell et al. 2024,1.0,PAH,Weight,DTXSID1074759,1-Methylchrysene,
4,WF,1289739,Campos et al. 2019,1.0,PAH,Weight,DTXSID1074759,1-Methylchrysene,
5,WF,1289739,Campos et al. 2019,,PAH,Weight,DTXSID1074759,1-Methylchrysene,
6,WF,1289739,Campos et al. 2019,,PAH,Weight,DTXSID1074759,1-Methylchrysene,


In [36]:
dim(ws_df)
dim(wf_df)

# Filter 1

Removing duplicate records, identified by a missing `Replicate`, for instances that weren't measured at all (MAR).

In [37]:
`%notin%` <- Negate(`%in%`)

ws_df = ws_df %>%
    filter(Replicate %notin% NA)

wf_df = wf_df %>%
    filter(Replicate %notin% NA)
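
Because `NA %in% NA` is `TRUE` in R, `Replicate %notin% NA` keeps exactly the rows where `Replicate` is not missing. A minimal sketch on made-up data (the `toy_df` values are hypothetical, not from the input files) shows that this matches the more common `!is.na()` idiom:

```r
# hypothetical toy data, not taken from the woodsmoke or wildfire files
toy_df = data.frame(Name = c('A', 'A', 'B'), Replicate = c(1, NA, 2))

`%notin%` <- Negate(`%in%`)

# the filter used above
kept_notin = toy_df[toy_df$Replicate %notin% NA, ]
# the equivalent base R idiom
kept_isna = toy_df[!is.na(toy_df$Replicate), ]

# both approaches keep the same two rows
identical(kept_notin, kept_isna)
```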

In [38]:
dim(ws_df)
dim(wf_df)

In [39]:
# combining dfs
combined_df = rbind(ws_df, wf_df)
head(combined_df)

,Data,HAWC_ID,Study,Replicate,Chemical_Class,Metric,DTXSID,Name,Value
,<chr>,<dbl>,<chr>,<dbl>,<chr>,<chr>,<chr>,<chr>,<chr>
1,WS,821855,Erlandsson et al. 2020,1,PAH,Weight,DTXSID3074787,1-Methylanthracene,11.3
2,WS,821855,Erlandsson et al. 2020,1,PAH,Weight,DTXSID3074787,1-Methylanthracene,2.3
3,WS,1257056,McCarrick et al. 2024,1,PAH,Weight,DTXSID3074787,1-Methylanthracene,1.66
4,WS,267140,Alfheim and Ramdahl 1984,1,PAH,Volume,DTXSID3074787,1-Methylanthracene,
5,WS,1263480,Burnet et al. 1990,1,PAH,Volume,DTXSID3074787,1-Methylanthracene,
6,WS,267091,Forchhammer et al. 2012,1,PAH,Volume,DTXSID3074787,1-Methylanthracene,


In [40]:
length(unique(combined_df$Name))

192 unique variables.

# Variable Background Filters

Determining which variables are missing too much data, evaluated within the groups (dfs) the data will later be split into and analyzed.

1. A variable (i.e., `Name`, which represents a chemical or metal) will be removed if < 50% of its experimental data points were either measurable concentrations or specified as non-detects. In other words, chemicals were excluded if they were NAs (not evaluated) across > 50% of the samples.
2. The data need to have at least one experimental (i.e., measured or ND) value in both WS and WF data.
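
As a rough illustration of rule 1, the sketch below computes the presence percentage on made-up records (the `toy_df` values and the `presence_pct` name are hypothetical). A variable with exactly half of its records missing sits on the boundary and is kept under a `>= 50` cutoff.

```r
library(dplyr)

# hypothetical records: 'NA' = not evaluated, 'ND' = non-detect (still counts as experimental)
toy_df = data.frame(
    Name = c('X', 'X', 'X', 'X', 'Y', 'Y', 'Y'),
    Value = c('1.2', 'ND', 'NA', 'NA', '3.4', 'NA', 'NA'))

# percentage of records per variable that are experimental
presence_pct = toy_df %>%
    group_by(Name) %>%
    summarise(Variable_Presence_Percentage = mean(Value != 'NA') * 100)

# X: 2 of 4 records are experimental (50%), Y: 1 of 3 (about 33%)
kept = presence_pct %>%
    filter(Variable_Presence_Percentage >= 50)
kept$Name
```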

In [41]:
sample_type_presence_df = combined_df %>%
    # if the value isn't MAR count it as being present
    mutate(count = ifelse(Value != 'NA', 1, 0)) %>%
    # determining which have at least one experimental value within each sample type
    group_by(Data, Metric, Name) %>%
    # summing the number of experimental records for each variable
    reframe(data_group_count = sum(count))

head(sample_type_presence_df)

Data,Metric,Name,data_group_count
<chr>,<chr>,<chr>,<dbl>
WF,Volume,"1,2,4-Trimethylbenzene",2
WF,Volume,"1,3,5-Trimethylbenzene",2
WF,Volume,"1,3-Dihydroxynaphthalene",1
WF,Volume,"1,4-Dichloro-2-butene, cis",1
WF,Volume,"1,4-Dichlorobenzene",1
WF,Volume,"1,8-Dihydroxynaphthalene",1


In [42]:
dim(sample_type_presence_df)

sample_type_keep_df = sample_type_presence_df %>%
    filter(data_group_count > 0) 

dim(sample_type_keep_df)

In [43]:
# filtering the original df
filter1_df = inner_join(sample_type_keep_df[,1:3], combined_df)
head(filter1_df)

[1m[22mJoining with `by = join_by(Data, Metric, Name)`


Data,Metric,Name,HAWC_ID,Study,Replicate,Chemical_Class,DTXSID,Value
<chr>,<chr>,<chr>,<dbl>,<chr>,<dbl>,<chr>,<chr>,<chr>
WF,Volume,"1,2,4-Trimethylbenzene",1289926,Wang et al. 2024,1,VOC,DTXSID6021402,730
WF,Volume,"1,2,4-Trimethylbenzene",1306371,Ketcherside et al. 2024,1,VOC,DTXSID6021402,130
WF,Volume,"1,3,5-Trimethylbenzene",1289926,Wang et al. 2024,1,VOC,DTXSID6026797,330
WF,Volume,"1,3,5-Trimethylbenzene",1306371,Ketcherside et al. 2024,1,VOC,DTXSID6026797,110
WF,Volume,"1,3-Dihydroxynaphthalene",1289821,Liang et al. 2021,1,PAH,DTXSID40456587,6
WF,Volume,"1,4-Dichloro-2-butene, cis",1289926,Wang et al. 2024,1,VOC,DTXSID3027405,230


In [44]:
dim(combined_df)
dim(filter1_df)

Started with 6581 records, 1363 were removed, leaving 5218. 

Now that each sample type (WS or WF) has at least one experimental value, we'll check whether each variable has experimental data for at least 50% of its records across the sample types.

In [45]:
variable_presence_df = filter1_df %>%
    # if the value isn't MAR count it as being present
    mutate(count = ifelse(Value != 'NA', 1, 0)) %>%
    group_by(Metric, Name) %>%
    # calculating the percentage of variables with data overall
    reframe(Variable_Presence_Percentage = (sum(count)/n()) * 100) %>%
    arrange(-Variable_Presence_Percentage)

# viewing data that passed the filter
keep_variables_df = variable_presence_df %>%
     filter(Variable_Presence_Percentage >= 50) %>%
     unique()

head(keep_variables_df)

Metric,Name,Variable_Presence_Percentage
<chr>,<chr>,<dbl>
Volume,"1,2,4-Trimethylbenzene",100
Volume,"1,3,5-Trimethylbenzene",100
Volume,"1,3-Dihydroxynaphthalene",100
Volume,"1,4-Dichloro-2-butene, cis",100
Volume,"1,4-Dichlorobenzene",100
Volume,"1,8-Dihydroxynaphthalene",100


In [46]:
# only keeping records that passed the background filter
filter2_df = inner_join(keep_variables_df[,1:2], filter1_df) %>%
    # metals don't have a DTXSID, so making that col their name
    mutate(DTXSID = ifelse(DTXSID != 'NA', DTXSID, Name)) 
    

head(filter2_df)

[1m[22mJoining with `by = join_by(Metric, Name)`


Metric,Name,Data,HAWC_ID,Study,Replicate,Chemical_Class,DTXSID,Value
<chr>,<chr>,<chr>,<dbl>,<chr>,<dbl>,<chr>,<chr>,<chr>
Volume,"1,2,4-Trimethylbenzene",WF,1289926,Wang et al. 2024,1,VOC,DTXSID6021402,730
Volume,"1,2,4-Trimethylbenzene",WF,1306371,Ketcherside et al. 2024,1,VOC,DTXSID6021402,130
Volume,"1,3,5-Trimethylbenzene",WF,1289926,Wang et al. 2024,1,VOC,DTXSID6026797,330
Volume,"1,3,5-Trimethylbenzene",WF,1306371,Ketcherside et al. 2024,1,VOC,DTXSID6026797,110
Volume,"1,3-Dihydroxynaphthalene",WF,1289821,Liang et al. 2021,1,PAH,DTXSID40456587,6
Volume,"1,4-Dichloro-2-butene, cis",WF,1289926,Wang et al. 2024,1,VOC,DTXSID3027405,230


In [47]:
dim(filter2_df)

An additional 336 records were removed, leaving 4882.

# Second Variable Background Filter

In the initial filter, samples were combined. However, this time variables (i.e., metal or chemical) will be split based on their `Metric` (i.e., volume or weight) and then retained if that variable is found in both woodsmoke and wildfire samples within volume or weight samples.

In [48]:
split_filtered_df = filter2_df %>%
    group_by(Data, Metric) %>%
    group_split

split_ws_vol_df = split_filtered_df[[3]]
split_ws_weight_df = split_filtered_df[[4]]
split_wf_vol_df = split_filtered_df[[1]]
split_wf_weight_df = split_filtered_df[[2]]
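
The positional indices above rely on `group_split` returning the groups in sorted key order (WF before WS, Volume before Weight). As a hedged alternative, the sketch below uses base R `split()` on made-up data so each piece can be looked up by name instead of position; `toy_df` and the `WS_Volume`-style labels are assumptions for illustration.

```r
# hypothetical toy data
toy_df = data.frame(
    Data = c('WS', 'WS', 'WF', 'WF'),
    Metric = c('Volume', 'Weight', 'Volume', 'Weight'),
    Name = c('Acenaphthylene', 'Ni', 'Acenaphthylene', 'Ni'))

# split() names each piece after its grouping label
split_list = split(toy_df, paste(toy_df$Data, toy_df$Metric, sep = '_'))

names(split_list)
# pieces can then be retrieved by label rather than by index
split_list[['WS_Volume']]
```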

In [49]:
# seeing how many unique variables are in each df and if they're consistent in each file
length(unique(split_ws_vol_df$Name))
length(unique(split_wf_vol_df$Name))
length(unique(split_ws_weight_df$Name))
length(unique(split_wf_weight_df$Name))

In [50]:
# they're not, so keeping only the variables found in both WS and WF within volume or weight samples
consistent_wf_vol_df = split_wf_vol_df %>%
    filter(Name %in% unique(split_ws_vol_df$Name))
consistent_ws_vol_df = split_ws_vol_df %>%
    filter(Name %in% consistent_wf_vol_df$Name)
consistent_wf_weight_df = split_wf_weight_df %>%
    filter(Name %in% unique(split_ws_weight_df$Name))
consistent_ws_weight_df = split_ws_weight_df %>%
    filter(Name %in% consistent_wf_weight_df$Name)

length(unique(consistent_wf_vol_df$Name))
length(unique(consistent_ws_vol_df$Name))
length(unique(consistent_wf_weight_df$Name))
length(unique(consistent_ws_weight_df$Name))

There were 95, 67, 85 and 68 unique variables in the woodsmoke weight, woodsmoke volume, wildfire weight, and wildfire volume data, respectively. 32 variables were common between volume samples and 35 were common between weight samples; these will be retained.
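
The four `filter()` calls above amount to taking the intersection of variable names between woodsmoke and wildfire within each metric. A minimal sketch with made-up names (`ws_names` and `wf_names` are hypothetical):

```r
# hypothetical variable names per sample type
ws_names = c('Acenaphthylene', 'Fluoranthene', 'Phenanthrene')
wf_names = c('Fluoranthene', 'Phenanthrene', 'Retene')

# variables measured in both sample types
common_names = intersect(ws_names, wf_names)
common_names
```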

In [51]:
# recombining data
vol_df = rbind(consistent_ws_vol_df, consistent_wf_vol_df)
weight_df = rbind(consistent_ws_weight_df, consistent_wf_weight_df)

head(vol_df)

Metric,Name,Data,HAWC_ID,Study,Replicate,Chemical_Class,DTXSID,Value
<chr>,<chr>,<chr>,<dbl>,<chr>,<dbl>,<chr>,<chr>,<chr>
Volume,Acenaphthylene,WS,1263480,Burnet et al. 1990,1,PAH,DTXSID3023845,6187000
Volume,Acenaphthylene,WS,1263480,Burnet et al. 1990,2,PAH,DTXSID3023845,18890500
Volume,Acenaphthylene,WS,1263480,Burnet et al. 1990,3,PAH,DTXSID3023845,7806000
Volume,Acenaphthylene,WS,1263484,Leese et al. 1989,1,PAH,DTXSID3023845,1100000
Volume,Acenaphthylene,WS,1263484,Leese et al. 1989,2,PAH,DTXSID3023845,2800000
Volume,Acenaphthylene,WS,429445,Rajput 2010,1,PAH,DTXSID3023845,53500


# QRILC Imputation

Imputing non-detect data.

In [52]:
head(weight_df %>%
    filter(Value == 'ND'))

Metric,Name,Data,HAWC_ID,Study,Replicate,Chemical_Class,DTXSID,Value
<chr>,<chr>,<chr>,<dbl>,<chr>,<dbl>,<chr>,<chr>,<chr>
Weight,Fluoranthene,WS,1098462,Niu et al. 2023,3,PAH,DTXSID3024104,ND
Weight,Phenanthrene,WS,1098462,Niu et al. 2023,3,PAH,DTXSID6024254,ND
Weight,Phenanthrene,WS,914540,Verma et al. 2021,7,PAH,DTXSID6024254,ND
Weight,Phenanthrene,WS,914540,Verma et al. 2021,8,PAH,DTXSID6024254,ND
Weight,Ni,WS,822010,Farina et al. 2019,1,Metal,Ni,ND
Weight,Ni,WS,299223,Kasurinen et al. 2015,2,Metal,Ni,ND


Only the weight dataframe has non-detect values, which will be imputed using QRILC. However, its MAR data, which would otherwise be imputed using random forest (RF), will be removed entirely from the dataset.

In [53]:
mar_weight_df = weight_df %>%
    filter(Value == 'NA')

preimputed_df = anti_join(weight_df, mar_weight_df) #%>%
    # creating a sample id col
    #unite(Sample_ID, HAWC_ID, Name, Replicate, sep = '_', remove = FALSE)
preimputed_df$Value = as.numeric(preimputed_df$Value)

# reordering
preimputed_df = preimputed_df[,c(3,1,4:8,2,9)]

head(preimputed_df)

[1m[22mJoining with `by = join_by(Metric, Name, Data, HAWC_ID, Study, Replicate,
Chemical_Class, DTXSID, Value)`
“NAs introduced by coercion”


Data,Metric,HAWC_ID,Study,Replicate,Chemical_Class,DTXSID,Name,Value
<chr>,<chr>,<dbl>,<chr>,<dbl>,<chr>,<chr>,<chr>,<dbl>
WS,Weight,821855,Erlandsson et al. 2020,1,PAH,DTXSID2060383,"2,3-Dimethylnaphthalene",0.05
WS,Weight,821855,Erlandsson et al. 2020,2,PAH,DTXSID2060383,"2,3-Dimethylnaphthalene",0.05
WS,Weight,1257056,McCarrick et al. 2024,1,PAH,DTXSID2060383,"2,3-Dimethylnaphthalene",0.04
WS,Weight,821855,Erlandsson et al. 2020,1,PAH,DTXSID8074819,2-Methylchrysene,7.4
WS,Weight,821855,Erlandsson et al. 2020,2,PAH,DTXSID8074819,2-Methylchrysene,12.1
WS,Weight,1257056,McCarrick et al. 2024,1,PAH,DTXSID8074819,2-Methylchrysene,15.45
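
The coercion warning above most likely comes from the remaining 'ND' strings, which `as.numeric()` turns into `NA`. A small sketch on a made-up vector (`toy_value` is hypothetical) shows an explicit conversion that gives the same result without the warning:

```r
# hypothetical character values with one non-detect
toy_value = c('0.21', '0.18', 'ND')

# mark non-detects as missing first, then convert
explicit_value = as.numeric(replace(toy_value, toy_value == 'ND', NA))

# same result as direct coercion, which warns about the 'ND' entry
coerced_value = suppressWarnings(as.numeric(toy_value))
identical(explicit_value, coerced_value)
```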


In [82]:
test = preimputed_df %>%
    filter(Name == 'Fluoranthene', HAWC_ID == '1098462')
test

test1 = preimputed_df %>%
    filter(Chemical_Class == 'Metal')

test2 = preimputed_df %>%
    filter(HAWC_ID == study_id[8])
head(test2)

Data,Metric,HAWC_ID,Study,Replicate,Chemical_Class,DTXSID,Name,Value
<chr>,<chr>,<dbl>,<chr>,<dbl>,<chr>,<chr>,<chr>,<dbl>
WS,Weight,1098462,Niu et al. 2023,1,PAH,DTXSID3024104,Fluoranthene,0.21
WS,Weight,1098462,Niu et al. 2023,2,PAH,DTXSID3024104,Fluoranthene,0.18
WS,Weight,1098462,Niu et al. 2023,3,PAH,DTXSID3024104,Fluoranthene,


Data,Metric,HAWC_ID,Study,Replicate,Chemical_Class,DTXSID,Name,Value
<chr>,<chr>,<dbl>,<chr>,<dbl>,<chr>,<chr>,<chr>,<dbl>
WS,Weight,271638,Jalava et al. 2012,1,PAH,DTXSID4075459,Benzo(c)phenanthrene,493.9
WS,Weight,271638,Jalava et al. 2012,2,PAH,DTXSID4075459,Benzo(c)phenanthrene,48.5
WS,Weight,271638,Jalava et al. 2012,3,PAH,DTXSID4075459,Benzo(c)phenanthrene,942.0
WS,Weight,271638,Jalava et al. 2012,4,PAH,DTXSID4075459,Benzo(c)phenanthrene,93.4
WS,Weight,271638,Jalava et al. 2012,5,PAH,DTXSID4075459,Benzo(c)phenanthrene,34.4
WS,Weight,271638,Jalava et al. 2012,6,PAH,DTXSID4075459,Benzo(c)phenanthrene,6.4


In [55]:
QRILC_imputation = function(dataset){
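    # """
    # Imputing non-detect (left-censored) values using QRILC.
    # :param (input): long-format df with 1 row per sample/variable record and a numeric Value col (NDs as NA)
    # :output: currently the wide-format df (1 col per DTXSID) before imputation; returning the imputed df is still a work in progress
    # """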
    wider_dataset = dataset%>%#[,c(1:2,7,9)] %>%
        # removing these cols temporarily 
        select(-c("HAWC_ID", "Chemical_Class", "Name")) %>%
        pivot_wider(names_from = DTXSID, values_from = Value)
    
    index_of_last_variable = length(colnames(wider_dataset))

    # normalizing data since that's what the QRILC function wants
    # had to pseudo log transform to prevent Inf values
    # variable cols start at col 5 (after Data, Metric, Study, Replicate)
    QRILC_prep = wider_dataset[,5:index_of_last_variable] %>%
         mutate_all(., function(x) log10(x + 1)) %>%
         as.matrix()
                    
    imputed_QRILC_object = impute.QRILC(QRILC_prep, tune.sigma = 0.1)
    # QRILC_log10_df = data.frame(imputed_QRILC_object[1]) 
    
    # # converting back the original scale
    # QRILC_df = QRILC_log10_df %>%
    #     mutate_all(., function(x) 10^x - 1)
     
    # imputed_dataset = data.frame(cbind(unique(dataset[,1:8]), QRILC_df)) %>%
    #      pivot_longer(cols = 9:dim(wider_dataset)[2], names_to = "Variable", values_to = "Value")
    
    # return(imputed_dataset)
    return(wider_dataset)
}
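
The pseudo-log transform `log10(x + 1)` maps a concentration of 0 to 0 rather than `-Inf`, and it is undone by `10^y - 1`, as in the commented-out back-transformation. A quick round-trip check on made-up values (`toy_conc` is hypothetical):

```r
# hypothetical concentrations, including a zero
toy_conc = c(0, 0.05, 7.4, 35800)

# forward transform and its inverse
log_conc = log10(toy_conc + 1)
back_conc = 10^log_conc - 1

# plain log10 would give -Inf for the zero
log10(toy_conc[1])
# the round trip recovers the original values up to floating point error
all.equal(back_conc, toy_conc)
```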

In [56]:
QRILC_imputation(test1)

Data,Metric,Study,Replicate,B,Cu,K,Mg,Zn,Ni,⋯,Co,Si,Sr,Ti,Ba,Rb,Li,Bi,U,Th
<chr>,<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
WS,Weight,Nordin et al. 2015,1,308,336.00,35800.00,1806.00,7280.00,,⋯,,,,,,200.00,,4,,
WS,Weight,Nordin et al. 2015,2,268,196.00,9200.00,1776.00,2040.00,,⋯,,,,,,56.00,,,,
WS,Weight,Arif et al. 2017,1,,132.00,60136.00,12016.00,2996.00,39.00,⋯,4.00,29315.00,213,227.00,1110.00,,,,,
WS,Weight,Arif et al. 2017,2,,242.00,180744.00,35933.00,1140.00,96.00,⋯,2.90,57321.00,292,108.00,718.00,,,,,
WS,Weight,Corsini et al. 2013,1,,280.00,,,5300.00,20.00,⋯,,2230.00,60,,,1250.00,,,,
WS,Weight,Corsini et al. 2013,2,,470.00,,,15460.00,40.00,⋯,,9120.00,120,,,1290.00,,,,
WS,Weight,Danielsen et al. 2011,1,,65.60,,,1030.00,14.70,⋯,,,,,,,,,,
WS,Weight,Danielsen et al. 2011,2,,34.40,,,750.00,10.30,⋯,,,,,,,,,,
WS,Weight,Dilger et al. 2016,1,,4.10,69363.50,,4178.80,0.50,⋯,,,,,,,,,,
WS,Weight,Erlandsson et al. 2020,1,,12.80,,,963.90,1.30,⋯,0.30,,,,9.30,,,,,


In [57]:
# # imputing within each study
study_id = unique(preimputed_df$HAWC_ID)

# imputed_df = data.frame()
# for (i in 1:length(study_id)){
#     filtered_preimputed_df = preimputed_df %>%
#         filter(HAWC_ID == study_id[i])
#     if(i ==1){
#     idk = QRILC_imputation(filtered_preimputed_df)
#     print(idk)

#     imputed_df = rbind(imputed_df, filtered_preimputed_df)
#         }
# }
# # calling fn
# #imputed_df = QRILC_imputation(test)

# head(imputed_df)

In [83]:
test2

Data,Metric,HAWC_ID,Study,Replicate,Chemical_Class,DTXSID,Name,Value
<chr>,<chr>,<dbl>,<chr>,<dbl>,<chr>,<chr>,<chr>,<dbl>
WS,Weight,271638,Jalava et al. 2012,1,PAH,DTXSID4075459,Benzo(c)phenanthrene,493.9
WS,Weight,271638,Jalava et al. 2012,2,PAH,DTXSID4075459,Benzo(c)phenanthrene,48.5
WS,Weight,271638,Jalava et al. 2012,3,PAH,DTXSID4075459,Benzo(c)phenanthrene,942.0
WS,Weight,271638,Jalava et al. 2012,4,PAH,DTXSID4075459,Benzo(c)phenanthrene,93.4
WS,Weight,271638,Jalava et al. 2012,5,PAH,DTXSID4075459,Benzo(c)phenanthrene,34.4
WS,Weight,271638,Jalava et al. 2012,6,PAH,DTXSID4075459,Benzo(c)phenanthrene,6.4
WS,Weight,271638,Jalava et al. 2012,7,PAH,DTXSID4075459,Benzo(c)phenanthrene,29.4
WS,Weight,271638,Jalava et al. 2012,1,Metal,K,K,19700.0
WS,Weight,271638,Jalava et al. 2012,2,Metal,K,K,242000.0
WS,Weight,271638,Jalava et al. 2012,3,Metal,K,K,25500.0


In [72]:
preimputed_df %>%
    select(-c("Study", "Chemical_Class", "Name")) %>%
    pivot_wider(names_from = DTXSID, values_from = Value)

Data,Metric,HAWC_ID,Replicate,DTXSID2060383,DTXSID8074819,DTXSID4020878,B,DTXSID4075455,DTXSID4075459,⋯,Rb,DTXSID9059757,DTXSID5047740,DTXSID6062591,Li,DTXSID8052691,Bi,U,Th,DTXSID1025649
<chr>,<chr>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,⋯,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>,<dbl>
WS,Weight,821855,1,0.05,7.40,5.800,,,,⋯,,,,,,,,,,
WS,Weight,821855,2,0.05,12.10,1.200,,,,⋯,,,,,,,,,,
WS,Weight,1257056,1,0.04,15.45,,,,,⋯,,,154.8400,,,,,,,
WS,Weight,267105,1,,,10.000,,,,⋯,,,,,,,,,,
WS,Weight,267105,2,,,37.000,,,,⋯,,,,,,,,,,
WS,Weight,279652,1,,,5.661,,,,⋯,,,,,,,,,,
WS,Weight,1040882,1,,,,308,,,⋯,200,,,,,,4,,,
WS,Weight,1040882,2,,,,268,,,⋯,56,,,,,,,,,
WS,Weight,267127,1,,,,,168.00000,10.00000,⋯,,,,123.0,,,,,,
WS,Weight,267127,2,,,,,336.00000,32.00000,⋯,,,,216.0,,,,,,


In [87]:
idk = test2 %>%
    select(-c("Study", "HAWC_ID", "Chemical_Class", "Name")) %>%
    pivot_wider(names_from = DTXSID, values_from = Value) 

idk2 = idk[,4:17] %>%
    mutate_all(., function(x) log10(x + 1)) %>% as.matrix()

idk2

impute.QRILC(idk2, tune.sigma = 0.1)

DTXSID4075459,K,Mg,Zn,DTXSID3024104,DTXSID6024254,Mn,Ca,DTXSID3021774,Si,DTXSID9059757,DTXSID5047740,DTXSID6062591,DTXSID8052691
2.6945175,4.294488,1.959041,2.399674,3.66096,4.06487,1.491362,3.057286,2.1261314,,2.3066394,1.883661,3.028571,2.7911992
1.6946052,5.383817,2.606381,3.182129,2.746556,2.844601,2.130334,3.250664,0.5910646,,1.2648178,1.045323,2.323458,1.8909796
2.9745117,4.406557,1.832509,2.683947,3.910037,4.080302,1.491362,2.89487,1.7185017,,2.4872798,2.320769,3.422426,3.2518815
1.974972,4.487153,2.212188,2.770115,2.977815,2.626956,1.491362,3.215109,,4.167347,1.6720979,1.409933,2.079543,2.1731863
1.5490033,5.243041,2.555094,3.509337,2.611829,2.507316,1.869232,3.350442,0.2552725,,1.1931246,1.181844,1.741152,1.6424645
0.8692317,5.322221,2.841985,3.648458,1.731589,1.056905,2.919078,3.477266,,,0.6812412,,1.20412,0.8325089
1.4828736,5.459394,2.790285,4.413317,2.455606,1.651278,3.262688,3.616055,,,1.0,1.305351,1.720986,1.544068


“NaNs produced”


ERROR: Error in checkTmvArgs(mean, sigma, lower, upper): ‘upper’ not specified or contains NA
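
The cause of this error is not confirmed here. One thing that may be worth checking, as an assumption, is whether some columns have too few observed values for QRILC to estimate a distribution; in the matrix above, `Si` has a single observed value. The sketch below counts observed values per column on a made-up matrix (`toy_mat` and the cutoff of 2 are hypothetical choices, not a documented `imputeLCMD` requirement).

```r
# hypothetical log-scale matrix with one nearly empty column
toy_mat = cbind(
    K = c(4.29, 5.38, 4.41, 4.49),
    Si = c(NA, NA, NA, 4.17),
    Zn = c(2.40, NA, 2.68, 2.77))

# number of observed (non-missing) values in each column
observed_counts = colSums(!is.na(toy_mat))
observed_counts

# columns that might be set aside before imputation (assumed cutoff)
sparse_cols = names(observed_counts)[observed_counts < 2]
sparse_cols
```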
