DreamAI2

DreamAI2::DreamAI2
- Imputation of Missing Protein Abundances with Iterative Prediction Model
DreamAI2::DreamAI2_Bagging
- Bag Imputation of Missing Protein Abundances with Iterative Prediction Model
DreamAI2::bag.summary
- Wrapper function for summarizing the outputs from DreamAI2_Bagging

DreamAI2::DreamAI2

Description
Usage
Arguments
Value
Notes
Example

Description

The function DreamAI2 imputes a dataset containing missing values (NAs) using the output of any of 7 individual methods or an ensemble of them.

Individual methods:

"KNN": k nearest neighbor
"MissForest": nonparametric Missing Value Imputation using Random Forest
"ADMIN": abundance dependent missing imputation
"Birnn": imputation using IRNN-SCAD algorithm
"SpectroFM": imputation using matrix factorization
"RegImpute": imputation using Glmnet ridge regression
"MICE": Multiple Imputation by Chained Equations

Ensemble methods:

"Ensemble": average of the 7 individual methods, or of the user-specified methods among the 7.
"Ensemble.Fast": average of the 7 individual methods, or of the user-specified methods among the 7, excluding "MissForest".

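The averaging behind the two ensemble outputs can be illustrated with a minimal base R sketch. The matrices below are hypothetical stand-ins for the imputed outputs of individual methods, and `ensemble_average` is an illustrative helper, not a function exported by the package.

```r
# Hypothetical imputed matrices, one per individual method (toy values).
imputed <- list(
  KNN        = matrix(c(1, 2, 3, 4), nrow = 2),
  MissForest = matrix(c(2, 2, 4, 8), nrow = 2),
  ADMIN      = matrix(c(3, 2, 5, 4), nrow = 2),
  RegImpute  = matrix(c(2, 2, 4, 4), nrow = 2)
)

# Element-wise mean across a list of equally sized matrices.
ensemble_average <- function(mats) {
  Reduce(`+`, mats) / length(mats)
}

# "Ensemble"-style average over all supplied methods.
ensemble <- ensemble_average(imputed)

# "Ensemble.Fast"-style average: the same computation with "MissForest" dropped.
fast <- ensemble_average(imputed[setdiff(names(imputed), "MissForest")])
```

Dropping "MissForest" changes only which matrices enter the mean, which is why the fast variant costs less when that method dominates the run time.
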
Usage

DreamAI2(data, k = 10, maxiter_MF = 10, ntree = 100,
  maxnodes = NULL, maxiter_ADMIN = 30, tol = 10^(-2),
  gamma_ADMIN = NA, gamma = 50, CV = FALSE,
  fillmethod = "row_mean", maxiter_RegImpute = 10,
  conv_nrmse = 1e-06, iter_SpectroFM = 40,
  m_mice = 1, method_mice = 'pmm', maxit_mice = 20,
  method = c("KNN", "MissForest", "ADMIN", "Birnn", "SpectroFM", "RegImpute", "MICE"),
  out = c("Ensemble.Fast"))

Arguments

Parameter	Default	Description
data		dataset in the form of a matrix or dataframe with missing values or NA's. The function throws an error message and stops if any row or column in the dataset is missing all values
k	10	number of neighbors to be used in the imputation by KNN and ADMIN
maxiter_MF	10	maximum number of iterations to be performed in the imputation by "MissForest" if the stopping criterion is not met beforehand
ntree	100	number of trees to grow in each forest in "MissForest"
maxnodes	NULL	maximum number of terminal nodes for trees in the forest in "MissForest", has to equal at least the number of columns in the given data
maxiter_ADMIN	30	maximum number of iterations to be performed in the imputation by "ADMIN" if the stopping criterion is not met beforehand
tol	10^(-2)	convergence threshold for "ADMIN"
gamma_ADMIN	NA	parameter for ADMIN to control abundance dependent missing. Set gamma_ADMIN=0 for log ratio intensity data. For abundance data put gamma_ADMIN=NA, and it will be estimated accordingly
gamma	50	parameter of the supergradients of popular nonconvex surrogate functions, e.g. SCAD and MCP of L0-norm for Birnn
CV	FALSE	a logical value indicating whether to fit the best gamma with cross validation for "Birnn". If CV=FALSE, default gamma=50 is used, while if CV=TRUE gamma is calculated using cross-validation.
fillmethod	"row_mean"	a string identifying the simple imputation method used to initially fill the missing values for "RegImpute". It can be "row_mean" or "zeros", with "row_mean" being the default. A warning is thrown if "row_median" is used.
maxiter_RegImpute	10	maximum number of iterations to reach convergence in the imputation by "RegImpute"
conv_nrmse	1e-06	convergence threshold for "RegImpute"
iter_SpectroFM	40	number of iterations for "SpectroFM"
m_mice	1	Number of multiple imputations in "MICE"
method_mice	"pmm"	imputation method to be used for each column in "MICE"
maxit_mice	20	A scalar giving the number of iterations in "MICE"
method	c("KNN", "MissForest", "ADMIN", "Birnn", "SpectroFM", "RegImpute", "MICE")	a vector of imputation methods selected from "KNN", "MissForest", "ADMIN", "Birnn", "SpectroFM", "RegImpute" and "MICE".
out	c("Ensemble.Fast")	a vector of imputation methods for which the function will output the imputed matrices. Default is "Ensemble.Fast"
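
Because the function stops when any row or column of `data` is missing all values, it can help to check for such rows and columns beforehand. The following is a minimal base R sketch; `find_all_missing` and `drop_all_missing` are hypothetical helpers introduced here for illustration and are not part of the package.

```r
# Locate rows and columns in which every entry is NA.
find_all_missing <- function(x) {
  x <- as.matrix(x)
  list(
    rows = which(rowSums(!is.na(x)) == 0),
    cols = which(colSums(!is.na(x)) == 0)
  )
}

# Drop those rows and columns before imputation (hypothetical helper).
drop_all_missing <- function(x) {
  bad <- find_all_missing(x)
  keep_rows <- setdiff(seq_len(nrow(x)), bad$rows)
  keep_cols <- setdiff(seq_len(ncol(x)), bad$cols)
  x[keep_rows, keep_cols, drop = FALSE]
}

# Toy matrix: row 2 and column 2 are entirely missing.
toy <- matrix(c(1, NA, 3,
                NA, NA, NA,
                2, NA, 5), nrow = 3, byrow = TRUE)
cleaned <- drop_all_missing(toy)
```

Whether dropping such rows and columns is appropriate depends on the analysis; the sketch only shows how to detect them.
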

Value

A list of imputed datasets, one for each method specified in out.

Notes

If all methods are specified for obtaining "Ensemble" imputed matrix, the approximate time required to output the imputed matrix for a dataset of dimension 26000 x 200 is ~50 hours.

Example

data(datapnnl)
data<-datapnnl.rm.ref[1:100,1:21]
impute<- DreamAI2(data,k=10,maxiter_MF = 10, ntree = 100,maxnodes = NULL,maxiter_ADMIN=30,tol=10^(-2),gamma_ADMIN=NA,gamma=50,CV=FALSE,fillmethod="row_mean",maxiter_RegImpute=10,conv_nrmse = 1e-6,iter_SpectroFM=40, m_mice = 1, method_mice = 'pmm', maxit_mice = 20,
method = c("KNN", "MissForest", "ADMIN", "Birnn", "SpectroFM", "RegImpute","MICE"),out="Ensemble.Fast")
impute$Ensemble.Fast

DreamAI2::DreamAI2_Bagging

Description
Usage
Arguments
Value
Notes
Example

Description

The function DreamAI2_Bagging imputes a dataset with missing values or NA's by bag imputation with the help of parallel processing. Pseudo datasets are generated that contain both the true missing values (as in the original dataset) and additional pseudo-missing values, and every such pseudo dataset is imputed by the individual or ensemble output of the 7 different methods: KNN, MissForest, ADMIN, Birnn, SpectroFM, RegImpute and MICE (descriptions are included in the documentation of the function DreamAI2).
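
The idea of a pseudo dataset can be sketched in base R as follows. This is an illustration only: it assumes pseudo-missing entries are chosen uniformly at random among the observed values, which may differ from how the package selects them, and `make_pseudo_missing`, `frac` and `seed` are hypothetical names.

```r
# Mask a fraction of the observed entries at random (assumed mechanism).
make_pseudo_missing <- function(x, frac = 0.1, seed = 1) {
  set.seed(seed)
  observed <- which(!is.na(x))
  n_mask <- max(1, floor(frac * length(observed)))
  # Index via sample.int to avoid sample()'s special case for length-1 input.
  masked <- observed[sample.int(length(observed), n_mask)]
  pseudo <- x
  pseudo[masked] <- NA
  # Keep the masked positions and their true values for later comparison.
  list(data = pseudo, masked = masked, truth = x[masked])
}

# Toy matrix with two true missing values.
x <- matrix(as.numeric(1:20), nrow = 4)
x[c(2, 7)] <- NA
res <- make_pseudo_missing(x, frac = 0.25)
```

The true values stored in `truth` are what allow imputed values at the pseudo-missing positions to be compared against known values afterwards.
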

Usage

DreamAI2_Bagging(data, k = 10, maxiter_MF = 10, ntree = 100,
  maxnodes = NULL, maxiter_ADMIN = 30, tol = 10^(-2),
  gamma_ADMIN = NA, gamma = 50, CV = FALSE,
  fillmethod = "row_mean", maxiter_RegImpute = 10,
  conv_nrmse = 1e-06, iter_SpectroFM = 40,
  m_mice = 1, method_mice = 'pmm', maxit_mice = 20,
  method = c("KNN", "MissForest", "ADMIN", "Birnn", "SpectroFM", "RegImpute", "MICE"), out = c("Ensemble.Fast"),
  SamplesPerBatch, n.bag, save.out = TRUE, path = NULL, ProcessNum)

Arguments

Parameter	Default	Description
data		dataset in the form of a matrix or dataframe with missing values or NA's. The function throws an error message and stops if any row or column in the dataset is missing all values
k	10	number of neighbors to be used in the imputation by KNN and ADMIN
maxiter_MF	10	maximum number of iterations to be performed in the imputation by "MissForest" if the stopping criterion is not met beforehand
ntree	100	number of trees to grow in each forest in "MissForest"
maxnodes	NULL	maximum number of terminal nodes for trees in the forest in "MissForest", has to equal at least the number of columns in the given data
maxiter_ADMIN	30	maximum number of iterations to be performed in the imputation by "ADMIN" if the stopping criterion is not met beforehand
tol	10^(-2)	convergence threshold for "ADMIN"
gamma_ADMIN	NA	parameter for ADMIN to control abundance dependent missing. Set gamma_ADMIN=0 for log ratio intensity data. For abundance data put gamma_ADMIN=NA, and it will be estimated accordingly
gamma	50	parameter of the supergradients of popular nonconvex surrogate functions, e.g. SCAD and MCP of L0-norm for Birnn
CV	FALSE	a logical value indicating whether to fit the best gamma with cross validation for "Birnn". If CV=FALSE, default gamma=50 is used, while if CV=TRUE gamma is calculated using cross-validation.
fillmethod	"row_mean"	a string identifying the simple imputation method used to initially fill the missing values for "RegImpute". It can be "row_mean" or "zeros", with "row_mean" being the default. A warning is thrown if "row_median" is used.
maxiter_RegImpute	10	maximum number of iterations to reach convergence in the imputation by "RegImpute"
conv_nrmse	1e-06	convergence threshold for "RegImpute"
iter_SpectroFM	40	number of iterations for "SpectroFM"
m_mice	1	Number of multiple imputations in "MICE"
method_mice	"pmm"	imputation method to be used for each column in "MICE"
maxit_mice	20	A scalar giving the number of iterations in "MICE"
method	must specify	a vector of imputation methods selected from "KNN", "MissForest", "ADMIN", "Birnn", "SpectroFM", "RegImpute" and "MICE"
SamplesPerBatch		number of samples per batch (batch size in the original data)
n.bag		number of pseudo datasets to generate and impute in the current process
save.out		logical indicator whether or not to save the output. When TRUE output is saved, when FALSE output is returned
path	NULL	location to save the output file from the current process. Path only needs to be specified when save.out=TRUE
ProcessNum		process number starting from 1 when run in cluster, e.g. 1 - 10, 1 - 100 etc. Needs to be specified only if the output is saved
out	"Ensemble.Fast"	a vector of imputation methods for which the function will output the imputed matrices.

Value

A list containing the imputed dataset for each method specified by the user (averaged over all pseudo-imputed data matrices), n.bag, and a summary matrix with the gene name, sample name, true value and imputed value of every pseudo-missing entry, combined across the n.bag datasets.

Notes

This function can be run as a parallel job on a cluster. It generates and saves an .RData file containing the output from the current process in the location provided by the user, with the process number in the file name. If the user runs it on a local computer multiple times, changing ProcessNum each time generates and saves a separate .RData file for each given ProcessNum.
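
When running on a cluster, the process number is often taken from the scheduler. The sketch below reads it from an environment variable in base R. The variable name `SLURM_ARRAY_TASK_ID` is an assumption about the scheduler in use, and `get_process_num` is a hypothetical helper, not part of the package.

```r
# Read a 1-based process number from an environment variable.
# SLURM_ARRAY_TASK_ID is an assumed scheduler variable; adjust for other systems.
get_process_num <- function(var = "SLURM_ARRAY_TASK_ID", default = 1L) {
  value <- suppressWarnings(as.integer(Sys.getenv(var, unset = NA)))
  # Fall back to the default when the variable is unset or not a positive integer.
  if (is.na(value) || value < 1L) default else value
}

process_num <- get_process_num()
```

The resulting `process_num` could then be passed as `ProcessNum` so that each job writes its own .RData file.
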

Example

data(datapnnl)
data<-datapnnl.rm.ref[1:100,1:21]
impute<- DreamAI2_Bagging(data=data,k=10,maxiter_MF = 10, ntree = 100,maxnodes = NULL,maxiter_ADMIN=30,tol=10^(-2),gamma_ADMIN=NA,gamma=50,CV=FALSE,fillmethod="row_mean",maxiter_RegImpute=10,conv_nrmse = 1e-6,iter_SpectroFM=40,m_mice = 1, method_mice = 'pmm', maxit_mice = 20,
method=c("KNN","MissForest","ADMIN","Birnn","SpectroFM","RegImpute","MICE"),SamplesPerBatch=3,n.bag=2,save.out=TRUE,path="C:\\Users\\chowds14\\Desktop\\test_package\\",ProcessNum=1)
impute$Ensemble.Fast

DreamAI2::bag.summary

Description
Usage
Arguments
Value
Example

Description

Wrapper function for summarizing the outputs from DreamAI2_Bagging

Usage

bag.summary(method = c("KNN", "MissForest", "ADMIN", "Birnn",
  "SpectroFM", "RegImpute", "MICE"), nNodes = 3, path = NULL)

Arguments

Parameter	Default	Description
method	Ensemble	a vector of imputation methods. This vector should be the same as, or a subset of, the vector out in DreamAI2_Bagging. Default is "Ensemble"
nNodes		number of parallel processes
path	NULL	location where the bagging output is saved

Value

A list containing the final imputed data and a confidence score for every gene, computed from the pseudo-missing values.
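
To illustrate how pseudo-missing values allow a per-gene assessment, the base R sketch below computes a per-gene root mean squared error between true and imputed values. This is a stand-in metric chosen for illustration, not necessarily the confidence score returned by bag.summary, and the table and its column names are hypothetical.

```r
# Hypothetical summary table of pseudo-missing entries (toy values).
summary_tbl <- data.frame(
  gene    = c("g1", "g1", "g1", "g2", "g2", "g2"),
  true    = c(1.0, 2.0, 3.0, 1.0, 2.0, 3.0),
  imputed = c(1.0, 2.0, 3.0, 2.0, 3.0, 4.0)
)

# Per-gene root mean squared error between true and imputed values.
per_gene_rmse <- function(tbl) {
  sq_err <- (tbl$true - tbl$imputed)^2
  sqrt(tapply(sq_err, tbl$gene, mean))
}

rmse <- per_gene_rmse(summary_tbl)
```

A lower error for a gene suggests that its missing values are recovered more reliably under the assumed masking.
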

Example

data(datapnnl)
data<-datapnnl.rm.ref[1:100,1:21]
impute<- DreamAI2_Bagging(data=data,k=10,maxiter_MF = 10, ntree = 100,maxnodes = NULL,maxiter_ADMIN=30,tol=10^(-2),gamma_ADMIN=NA,gamma=50,CV=FALSE,fillmethod="row_mean",maxiter_RegImpute=10,conv_nrmse = 1e-6,iter_SpectroFM=40,m_mice = 1, method_mice = 'pmm', maxit_mice = 20,
method=c("KNN","MissForest","ADMIN","Birnn","SpectroFM","RegImpute","MICE"),SamplesPerBatch=3,n.bag=2,save.out=TRUE,path="C:\\Users\\chowds14\\Desktop\\test_package\\",ProcessNum=1)
final.out<-bag.summary(method=c("Ensemble.Fast"),nNodes=2,path="C:\\Users\\chowds14\\Desktop\\test_package\\")
final.out$score
final.out$imputed_data
