With Xy() you can conveniently simulate supervised learning data. The simulation can be
highly customized, since the user controls many degrees of freedom. For instance,
even the functional shape of the nonlinearity is user-defined. Interactions can be formed and (co)variances altered. For a more detailed motivation, you can visit our blog.
Usage is pretty straightforward. I strongly encourage you to read the help documentation to explore all of its functionality.
Install the package from GitHub with:

```r
# install.packages("devtools")
# get it from GitHub
devtools::install_github("andrebleier/Xy")
```
You can simulate regression and classification data with interactions and a user-specified nonlinearity. With the
stn argument you can alter the signal-to-noise ratio of your simulation. I strongly encourage you to read this blog post, where I analyzed OLS coefficients under different signal-to-noise ratios.
```r
# load the library
library(Xy)

# simulate regression data
my_sim <- Xy(n = 1000,
             numvars = c(10, 10),
             catvars = c(3, 2),
             noisevars = 50,
             task = Xy_task(),
             nlfun = function(x) x^2,   # user-defined nonlinearity
             interactions = 1,
             sig = c(1, 4),
             cor = c(0),
             weights = c(-10, 10),
             intercept = TRUE,
             stn = 4)                   # signal-to-noise ratio
```
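To make the idea behind `stn` concrete, here is a minimal base-R sketch of one common definition, where the signal-to-noise ratio is Var(signal) / Var(noise) and the noise is rescaled until that ratio hits the target. This is not Xy's internal implementation, and whether Xy defines `stn` in exactly this way is an assumption, so check the help documentation. The helper `simulate_with_stn()` is hypothetical and exists only for this illustration.

```r
# Hypothetical helper, NOT part of Xy: a sketch assuming
# stn = Var(signal) / Var(noise).
simulate_with_stn <- function(n, p, stn) {
  X <- matrix(rnorm(n * p), nrow = n, ncol = p)  # independent standard normal features
  beta <- runif(p, min = -10, max = 10)           # weights drawn uniformly from [-10, 10]
  signal <- drop(X %*% beta)                      # linear predictor (the "signal")
  noise <- rnorm(n)
  # Standardize the noise, then scale it so Var(noise) = Var(signal) / stn
  noise <- noise / sd(noise) * sqrt(var(signal) / stn)
  list(X = X, y = signal + noise, signal = signal, noise = noise)
}

set.seed(42)
sim <- simulate_with_stn(n = 1000, p = 10, stn = 4)

# Empirical ratio; equals stn up to floating-point error
var(sim$signal) / var(sim$noise)
```

Raising `stn` shrinks the noise relative to the signal, so regression coefficients become easier to recover; lowering it has the opposite effect.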
You can extract the feature importance of your simulation, for instance to benchmark feature selection algorithms. You can read about a small benchmark I ran with this feature on our blog.
```r
# Feature importance
fs_varimp <- varimp(my_sim, plot = TRUE)
```
Feel free to contact me with input, ideas or some dank memes.