Install the released version from CRAN:

install.packages("CGMissingDataR")

Or install the development version from GitHub:

install.packages("devtools")
devtools::install_github("ZhangLabUKY/CGMmissingDataR")

CGMmissingDataR imputes missing glucose values in continuous glucose monitoring (CGM) data. The main public workflow is:

run_missing_glucose_imputation()

The function handles both explicit missing glucose values coded as NA
and implicit missing readings caused by timestamp gaps. It accepts a
data frame with a subject identifier, timestamp column, glucose column,
and optional subject-level or visit-level covariates. It returns the
user’s original columns plus imputed_glucose_value, leaving the
original glucose column unchanged.
run_missing_glucose_imputation() performs the following steps:
- reads a data frame or CSV file;
- parses and sorts timestamps within each subject;
- regularizes each subject to an equal interval_minutes timestamp grid;
- converts missing timestamp gaps into explicit rows with target_col = NA;
- encodes SEX when present;
- creates internal time, lag, and rolling-mean glucose features;
- imputes the target and feature matrix;
- chooses the final model from models:
  - models = NULL or models = "auto" keeps the missing-rate rule,
  - MICE+ARIMA when missing rate is <= 0.05,
  - MICE+XGBoost when missing rate is > 0.05,
  - or users can force MICE+ARIMA, MICE+XGBoost, MICE+RF, MICE+kNN, or MICE+LightGBM;
- returns a single completed data frame containing the original input columns plus imputed_glucose_value.
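The automatic missing-rate rule described in the steps above can be sketched in base R. This is an illustration only, not the package's internal code: choose_final_model() is a hypothetical helper, and the missing rate is assumed here to be the share of NA values in the target column.

```r
# Hypothetical helper, not part of CGMissingDataR. It mirrors the documented
# threshold: <= 0.05 selects MICE+ARIMA, > 0.05 selects MICE+XGBoost.
choose_final_model <- function(glucose, threshold = 0.05) {
  # Assumption: missing rate = proportion of NA values in the target column.
  missing_rate <- mean(is.na(glucose))
  if (missing_rate <= threshold) "MICE+ARIMA" else "MICE+XGBoost"
}

low_missing  <- c(rep(110, 39), NA)          # 1 of 40 missing (rate 0.025)
high_missing <- c(rep(110, 36), rep(NA, 4))  # 4 of 40 missing (rate 0.10)

choose_final_model(low_missing)
choose_final_model(high_missing)
```

Forcing a specific model through models bypasses this rule entirely, which may be preferable when results need to be comparable across data sets with different missing rates.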
Internal columns such as TimeSeries, TimeDifferenceMinutes, lag
features, rolling means, imputation method labels, and
missingness-tracking flags are used for modeling but are not returned.
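For readers unfamiliar with lag and rolling-mean features, the following base R sketch shows the general idea. The window length of 3 and the column names lag1 and roll3 are assumptions chosen for illustration; the package's actual internal feature definitions may differ.

```r
# Hypothetical glucose readings on an equally spaced grid.
glucose <- c(100, 104, 108, 112, 116, 120)

# Lag-1 feature: the previous reading. The first row has no predecessor.
lag1 <- c(NA, head(glucose, -1))

# Trailing rolling mean over the current and two previous readings.
# sides = 1 makes the filter use past values only (window of 3 is assumed).
roll3 <- as.numeric(stats::filter(glucose, rep(1 / 3, 3), sides = 1))

features <- data.frame(glucose, lag1, roll3)
features
```

The first rows of each feature are NA because no earlier readings are available to fill the window.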
Because timestamp gaps are converted into explicit rows before imputation, the returned data frame may contain more rows than the input data when readings are absent from the expected CGM sampling grid.
The default R-native backend uses the R package mice. For closest
agreement with the Python reference workflow, install reticulate and
use the optional Python backend.
Real-imputation model engines run with n_threads = 1 by default so
examples, tests, and shared systems use conservative CPU resources.
Increase n_threads for faster local XGBoost, Random Forest, or
LightGBM runs.
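One way to pick a larger thread count on a local machine is sketched below, using the parallel package that ships with R. The "leave one core free" policy is an assumption for illustration, not a package recommendation.

```r
# detectCores() can return NA on some platforms, so fall back to 1.
cores <- parallel::detectCores(logical = FALSE)
if (is.na(cores)) cores <- 1L

# Leave one core free, but never go below the documented default of 1.
n_threads <- max(1L, as.integer(cores) - 1L)
n_threads
```

The resulting value can then be supplied as n_threads when running one of the threaded models.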
install.packages("reticulate")

The Python backend uses these Python packages through reticulate:

reticulate::py_require(c(
  "numpy",
  "pandas",
  "scikit-learn",
  "statsmodels",
  "xgboost"
))
# Optional, only needed for models = "lightgbm"
reticulate::py_install("lightgbm", pip = TRUE)

library(CGMissingDataR)
data("CGMExmplDat10Pct")
out <- run_missing_glucose_imputation(
  CGMExmplDat10Pct,
  target_col = "LBORRES",
  feature_cols = c("AGE", "hba1c"),
  id_col = "USUBJID",
  time_col = "Time",
  imputer_backend = "mice"
)
head(out[c(
  "USUBJID",
  "Time",
  "LBORRES",
  "AGE",
  "hba1c",
  "imputed_glucose_value"
)])

The original target column is not overwritten. Rows that were missing in
LBORRES, including rows inserted from timestamp gaps, remain missing
there; the completed value is stored in imputed_glucose_value.

missing_rows <- is.na(out$LBORRES)
head(out[missing_rows, c(
  "USUBJID",
  "Time",
  "LBORRES",
  "imputed_glucose_value"
)])

imputed_glucose_value is returned as a continuous numeric model
estimate. Users who need whole-number glucose values for reporting can
round after imputation:

out$imputed_glucose_value_rounded <- round(out$imputed_glucose_value)

Raw CGM exports may represent missingness in two ways:
- a row exists but the glucose value is NA;
- a timestamp is absent entirely, causing a gap in the expected sampling grid.
For example, if a subject’s readings jump from 00:05 to 00:30, the
function internally creates the missing 5-minute rows at 00:10,
00:15, 00:20, and 00:25, sets the target glucose value to NA,
and then imputes those values using the same workflow as explicit
missing glucose values.
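The gap-filling idea in this example can be reproduced in base R. The sketch below is not the package's implementation; the data values are hypothetical, and the column names simply follow the example data set used elsewhere in this README.

```r
# Hypothetical single-subject data with readings at 00:00, 00:05, and 00:30.
readings <- data.frame(
  USUBJID = "S01",
  Time = as.POSIXct(
    c("2024-01-01 00:00:00", "2024-01-01 00:05:00", "2024-01-01 00:30:00"),
    tz = "UTC"
  ),
  LBORRES = c(102, 105, 118)
)

interval_minutes <- 5

# Build the full, equally spaced grid between the first and last reading.
grid <- data.frame(
  USUBJID = "S01",
  Time = seq(min(readings$Time), max(readings$Time),
             by = paste(interval_minutes, "min"))
)

# A left join onto the grid turns timestamp gaps into explicit NA rows.
regular <- merge(grid, readings, by = c("USUBJID", "Time"), all.x = TRUE)
regular <- regular[order(regular$Time), ]

# Timestamps that were absent from the input and are now explicit NA rows.
format(regular$Time[is.na(regular$LBORRES)], "%H:%M", tz = "UTC")
```

The regularized data has seven rows where the input had three, which is why the returned data frame can be longer than the input.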
CGMmissingDataR also includes a small Shiny app for users who prefer an
interactive workflow. The app lets users upload a CSV file or load one
of the built-in example data sets, choose the target glucose, subject
ID, timestamp, and feature columns, run
run_missing_glucose_imputation(), preview rows with missing glucose
values that were imputed, and download the completed data as a CSV file.
The app also exposes the same final-method selector as the R function.
Users can keep the automatic missing-rate rule or force MICE+ARIMA,
MICE+XGBoost, MICE+RF, MICE+kNN, or MICE+LightGBM;
method-specific controls appear only when they apply.
Launch the app from R with:

run_app()

The app supports the same two imputation backends as the main function:
- mice, the default CRAN-safe R backend;
- sklearn, the optional Python-compatible backend using reticulate.
The Shiny app is optional. If it is not already installed, install Shiny with:

install.packages("shiny")

For package developers, the app is stored under
inst/shiny/cgm_imputation_app/ and is launched through the exported
run_app() helper.
Use imputer_backend = "sklearn" to run the strict Python-compatible
path. In that path, reticulate sends the data to Python, where pandas,
scikit-learn, statsmodels, Python xgboost, and optional Python lightgbm
perform the preprocessing and calculations. The completed pandas data
frame is then converted back to R.

out_py <- run_missing_glucose_imputation(
  CGMExmplDat10Pct,
  target_col = "LBORRES",
  feature_cols = c("AGE", "hba1c"),
  id_col = "USUBJID",
  time_col = "Time",
  imputer_backend = "sklearn"
)

The Python backend is optional. It is not required for package installation, loading, or CRAN examples.
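Because the Python backend is optional, a script may want to fall back to mice when the Python stack is unavailable. The sketch below is one possible approach, not a package feature: has_python_stack() is a hypothetical helper, and it checks Python import names, which can differ from install names (scikit-learn is imported as sklearn).

```r
# Import names of the Python modules the sklearn backend relies on.
python_modules <- c("numpy", "pandas", "sklearn", "statsmodels", "xgboost")

# Hypothetical helper: TRUE only if reticulate is installed and every
# module can be found. Any error during the check counts as "unavailable".
has_python_stack <- function(modules) {
  if (!requireNamespace("reticulate", quietly = TRUE)) {
    return(FALSE)
  }
  ok <- tryCatch(
    vapply(modules, reticulate::py_module_available, logical(1)),
    error = function(e) FALSE
  )
  all(ok)
}

# Choose the value to pass as imputer_backend.
backend <- if (has_python_stack(python_modules)) "sklearn" else "mice"
backend
```

The selected value can then be passed as imputer_backend to run_missing_glucose_imputation().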
The main vignette contains a detailed walkthrough of data requirements, timestamp regularization, return columns, backend selection, optional Python setup, and troubleshooting:
https://zhanglabuky.github.io/CGMmissingDataR/articles/How-To-Use-CGMissingDataR.html
A separate Shiny app vignette walks through the interactive interface:
https://zhanglabuky.github.io/CGMmissingDataR/articles/Using-the-CGMissingDataR-Shiny-App.html
The changelog is available at:
https://zhanglabuky.github.io/CGMmissingDataR/news/index.html