Simulating Supervised Learning Data



With Xy() you can conveniently simulate supervised learning data. The simulation can be tailored closely, since the user controls many degrees of freedom: the functional shape of the nonlinearity is user-defined, interactions can be formed, and (co)variances can be altered. For a more detailed motivation, you can visit our blog.


The usage is pretty straightforward. I strongly encourage you to read the help document to explore all functionalities.


Install the package with devtools:

# install.packages("devtools") 
# get it from github

Simulate data

You can simulate regression and classification data with interactions and a user-specified nonlinearity. With the stn argument you can alter the signal-to-noise ratio of your simulation. I strongly encourage you to read this blog post, where I analyzed OLS coefficients under different signal-to-noise ratios.

# load the library
library(Xy)
# simulate regression data
my_sim <- Xy(n = 1000, 
             numvars = c(10,10), 
             catvars = c(3, 2), 
             noisevars = 50, 
             task = Xy_task(), 
             nlfun = function(x) x^2, 
             interactions = 1, 
             sig = c(1,4),  
             cor = c(0), 
             weights = c(-10,10), 
             intercept = TRUE, 
             stn = 4)
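
To make the idea of a signal-to-noise ratio such as `stn = 4` concrete, here is a minimal base-R sketch that does not use Xy at all. It assumes the ratio is defined as var(signal) / var(noise), which may differ from how the package computes it internally; the weights in `beta` are hypothetical.

```r
set.seed(1)
n   <- 1000
stn <- 4  # target ratio, ASSUMED to mean var(signal) / var(noise)

# three standard-normal features and hypothetical true weights
x      <- matrix(rnorm(n * 3), ncol = 3)
beta   <- c(2, -1, 0.5)
signal <- drop(x %*% beta)

# draw noise, then rescale it so the empirical ratio matches stn
noise <- rnorm(n)
noise <- noise * sqrt(var(signal) / (stn * var(noise)))

y <- signal + noise
empirical_stn <- var(signal) / var(noise)

# OLS fit on the simulated data
fit <- lm(y ~ x)
```

Lowering `stn` in this sketch inflates the noise term, so the OLS estimates in `coef(fit)` scatter more widely around `beta`.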

Feature Selection

You can extract the feature importance of your simulation, for instance to benchmark feature selection algorithms. You can read up on a small benchmark I did with this feature on our blog.

# Feature Importance 
fs_varimp <- varimp(my_sim, plot = TRUE)
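
As a rough illustration of what benchmarking a feature selection method against a known ground truth can look like, here is a base-R sketch. It does not use `varimp()` or any other Xy function; the data, the weights in `true_beta`, and the choice of absolute t-statistics as the ranking criterion are all assumptions made for this example.

```r
set.seed(42)
n <- 500
x <- matrix(rnorm(n * 5), ncol = 5,
            dimnames = list(NULL, paste0("x", 1:5)))

# hypothetical true weights: x4 and x5 are pure noise features
true_beta <- c(x1 = 3, x2 = -2, x3 = 1, x4 = 0, x5 = 0)
y <- drop(x %*% true_beta) + rnorm(n)

# ground truth: features with a non-zero weight
relevant <- names(true_beta)[true_beta != 0]

# candidate ranking: absolute t-statistics from an OLS fit (intercept dropped)
fit   <- lm(y ~ x)
t_abs <- abs(summary(fit)$coefficients[-1, "t value"])
names(t_abs) <- colnames(x)

# keep as many top-ranked features as there are truly relevant ones
selected <- names(sort(t_abs, decreasing = TRUE))[seq_along(relevant)]

# share of truly relevant features recovered by the ranking
recall <- mean(relevant %in% selected)
```

In a real benchmark the known importance would come from the simulation itself rather than from hand-written weights, and the ranking could come from any selection algorithm you want to evaluate.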


Feel free to contact me with input, ideas or some dank memes.
