Author: Chan Mun Fai
cmf provides the Ratio of Counts (ROC) and Differential Correct Attribution Probability (DCAP) functions.
These functions, along with other metrics, form a systematic and comprehensive framework for evaluating the quality of synthetic datasets and comparing synthesis methods.
For more details and usage examples of the other metrics, please refer to Comparative Metrics Framework in R to evaluate the performance of Synthetic Data.
All documentation can be found in the code itself.
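For orientation, the correct attribution probability (CAP) that underlies DCAP can be sketched in a few lines of base R. The sketch follows the definition in Taub et al. (2018): for each original record, take the reference records that share its key values and compute the share of them that also carry its target values. The name `cap_sketch` and its arguments are hypothetical and not part of cmf, and returning `NA` when a key does not occur in the reference data is an assumption; cmf may treat such records differently.

```r
# Hypothetical helper for illustration only; this is not cmf's implementation.
cap_sketch <- function(original, reference, key_var, target_var) {
  # Collapse the chosen columns into one string per row so rows can be compared.
  paste_cols <- function(d, cols) {
    do.call(paste, c(unname(as.list(d[cols])), sep = "\r"))
  }
  key_o  <- paste_cols(original, key_var)
  full_o <- paste_cols(original, c(key_var, target_var))
  key_r  <- paste_cols(reference, key_var)
  full_r <- paste_cols(reference, c(key_var, target_var))

  # For each original record, count reference rows that match on the key
  # and reference rows that match on both key and target.
  n_key  <- vapply(key_o,  function(k) sum(key_r == k),  integer(1), USE.NAMES = FALSE)
  n_full <- vapply(full_o, function(k) sum(full_r == k), integer(1), USE.NAMES = FALSE)

  # CAP is the ratio of the two counts; NA (an assumption) when the key is absent.
  ifelse(n_key > 0, n_full / n_key, NA_real_)
}

# Toy data: one key column `k` and one target column `t`.
orig  <- data.frame(k = c("x", "x", "y"), t = c(1, 2, 1))
synth <- data.frame(k = c("x", "x", "z"), t = c(1, 1, 1))

cap_oo <- cap_sketch(orig, orig,  "k", "t")  # values: 0.5, 0.5, 1
cap_os <- cap_sketch(orig, synth, "k", "t")  # values: 1, 0, NA
print(cap_oo)
print(cap_os)
```

Passing the original data as its own reference gives the CAP of the original data, while passing a synthetic dataset gives the CAP an intruder would obtain from the synthetic data. Comparing the two is the idea behind the "differential" in DCAP.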
The current development version can be installed from source using devtools.
devtools::install_github("ChanMunFai/cmf")
The following script demonstrates how to use the functions in cmf. We will use the synthpop package to generate synthetic data from the mtcars dataset.
library(cmf)
library(synthpop)
df <- mtcars
key_var <- c("cyl", "gear")    # key variables: assumed known to an intruder
target_var <- c("wt", "carb")  # target variables: what the intruder tries to infer
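syn1 <- syn(df, seed = 1234)  # fixed seed so the synthesis is reproducible
synthpop_df <- syn1$syn       # the synthetic data frame
View(synthpop_df)             # open the synthetic data in the data viewer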
CAP_original(df, key_var, target_var)
CAP_baseline(df, target_var)
CAP_synthetic(df, synthpop_df, key_var, target_var)
ROC_list(df, synthpop_df)
ROC_indiv(df, synthpop_df, "disp")
ROC_score(df, synthpop_df)
ROC_numeric(df, synthpop_df, "disp", y=2)
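As a rough illustration of the ratio of counts idea behind the ROC functions above, the following base R sketch follows the description in Taub et al. (2019): for each category of a variable, the smaller of the original and synthetic estimates is divided by the larger. The name `roc_sketch` is hypothetical and not part of cmf, and using proportions rather than raw counts (so that datasets of different sizes stay comparable) is an assumption; the package's own functions may differ in detail.

```r
# Hypothetical helper for illustration only; this is not cmf's implementation.
roc_sketch <- function(original, synthetic) {
  # Use the union of observed categories so both frequency tables line up.
  lv <- union(unique(as.character(original)), unique(as.character(synthetic)))
  p_orig <- as.numeric(table(factor(original,  levels = lv))) / length(original)
  p_syn  <- as.numeric(table(factor(synthetic, levels = lv))) / length(synthetic)

  # Per category: smaller proportion over larger proportion.
  # 1 means identical proportions; 0 means the category is missing on one side.
  ratios <- pmin(p_orig, p_syn) / pmax(p_orig, p_syn)
  names(ratios) <- lv
  ratios
}

# Toy example with a single categorical variable.
orig_var  <- c("a", "a", "b", "b")
synth_var <- c("a", "b", "b", "b")

r <- roc_sketch(orig_var, synth_var)  # a: 0.5, b: 2/3
print(r)
print(mean(r))  # one possible summary score for the variable
```

Values close to 1 indicate that the synthetic data reproduce the original category frequencies well; averaging the per-category ratios is one simple way to summarise a variable.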
Taub, J., Elliot, M., Pampaka, M., & Smith, D. (2018). Differential Correct Attribution Probability for Synthetic Data: An Exploration. Privacy in Statistical Databases, Lecture Notes in Computer Science, 122-137. doi:10.1007/978-3-319-99771-1_9
Taub, J., Elliot, M., & Raab, G. (2019). Creating the Best Risk-Utility Profile: The Synthetic Data Challenge.