## mlbench

A collection of artificial and real-world machine learning benchmark problems, including several data sets from the UCI repository.

We'll visualise the data sets with the lattice package (by Deepayan Sarkar), which improves on base R graphics with better defaults and makes multivariate relationships easy to display.

In [None]:
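library("mlbench")
library("lattice")

# Create the output folder used by the save() calls below (no-op if it already exists).
dir.create("data", showWarnings = FALSE)

# Fix the RNG seed so the generated data sets are reproducible.
set.seed(321)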

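Every cell below writes its data frame with `save()`, which stores the object together with its name. The sketch below reads one back. The toy data frame and the `tempfile()` path are assumptions for illustration, standing in for the generated data and the `data/` files.

```r
# Round trip of save()/load(); a temporary file stands in for paths such as "data/xor".
df <- data.frame(x.1 = c(0.1, -0.4), x.2 = c(1.2, 0.3),
                 classes = factor(c(1, 2)))
path <- tempfile(fileext = ".RData")
save(df, file = path)
rm(df)
restored <- load(path)   # load() returns the names of the objects it recreated
print(restored)          # [1] "df"
str(df)                  # df is back in the workspace under its saved name
```

Because every cell saves under the same object name `df`, loading two files in a row overwrites the first. Passing a fresh environment, as in `load(path, envir = new.env())`, keeps them apart.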
In [None]:
# 2-dimensional Gaussian Problem
# Each of the cl classes consists of a 2-dimensional Gaussian. 
# The centers are equally spaced on a circle around the origin with radius r.
p <- mlbench.2dnormals(n = 1000, cl = 3, r = sqrt(3), sd = 0.6)
plot(p)
df <- as.data.frame(p)
save(df, file = "data/gaussian2d")

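The geometry behind `mlbench.2dnormals` can be checked in base R. The sketch below places `cl` centres evenly on a circle of radius `r` and adds isotropic Gaussian noise. The helpers `circle_centers` and `sim_2dnormals` are hypothetical, not part of mlbench. The starting angle of 0 and the uniform class sampling are assumptions. With `cl = 3` and `r = sqrt(3)` the centres form an equilateral triangle with side `r * sqrt(3) = 3`.

```r
# Hypothetical helper (not part of mlbench): cl centres equally spaced on a circle of radius r.
# Assumption: the first centre sits at angle 0; mlbench may use a different starting angle.
circle_centers <- function(cl, r) {
  angles <- 2 * pi * (seq_len(cl) - 1) / cl
  cbind(r * cos(angles), r * sin(angles))
}

# Hypothetical generator: each point gets a random class, then isotropic Gaussian noise.
sim_2dnormals <- function(n, cl, r, sd) {
  centers <- circle_centers(cl, r)
  classes <- sample(seq_len(cl), n, replace = TRUE)
  x <- centers[classes, , drop = FALSE] + matrix(rnorm(2 * n, sd = sd), ncol = 2)
  list(x = x, classes = factor(classes))
}

centers <- circle_centers(cl = 3, r = sqrt(3))
print(round(dist(centers), 6))   # every pair of centres is 3 apart
sim <- sim_2dnormals(n = 1000, cl = 3, r = sqrt(3), sd = 0.6)
table(sim$classes)
```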
In [None]:
# Cassini: A 2-Dimensional Problem
# The inputs of the cassini problem are uniformly distributed within 3 structures in the plane.
# The 2 outer structures (classes) are banana-shaped, and the middle structure (class)
# between them is a circle. relsize gives the relative sizes of the 3 classes.
p <- mlbench.cassini(n = 5000, relsize = c(2,2,1))
plot(p)
df <- as.data.frame(p)
save(df, file = "data/2d")

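The `relsize = c(2,2,1)` argument can be turned into expected class counts with a line of arithmetic. The sketch assumes `relsize` is read as relative class proportions, so with `n = 5000` the two banana classes should each hold about 40% of the points and the circle about 20%.

```r
# Expected class sizes, assuming relsize is read as relative class proportions.
n <- 5000
relsize <- c(2, 2, 1)
expected <- n * relsize / sum(relsize)
print(expected)   # [1] 2000 2000 1000
```

`table(p$classes)` in the cell above gives the realised counts for comparison.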
In [None]:
# Circle in a Square Problem
# The inputs of the circle problem are uniformly distributed on the d-dimensional cube with corners {+/-1}. 
# This is a 2-class problem: The first class is a d-dimensional ball in the middle of the cube, the 
# remainder forms the second class. The size of the ball is chosen such that both classes have 
# equal prior probability 0.5.
p <- mlbench.circle(n = 1000, d = 2)
plot(p)
df <- as.data.frame(p)
save(df, file = "data/square")

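The equal-prior condition fixes the radius of the ball. A d-ball of radius rho has volume pi^(d/2) * rho^d / gamma(d/2 + 1), and the cube [-1, 1]^d has volume 2^d. Setting the ball volume to half the cube, 2^(d-1), and solving for rho gives the function below. In d = 2 this is sqrt(2/pi). This is a worked check of the stated condition, not mlbench's code. The Monte Carlo part labels uniform points by whether they fall inside that ball.

```r
# Radius of a d-ball occupying half the volume of the cube [-1, 1]^d:
# pi^(d/2) * rho^d / gamma(d/2 + 1) = 2^(d - 1)  =>  solve for rho.
half_volume_radius <- function(d) {
  (2^(d - 1) * gamma(d / 2 + 1) / pi^(d / 2))^(1 / d)
}

r2 <- half_volume_radius(2)
print(r2)   # sqrt(2 / pi), about 0.798

# Monte Carlo check in d = 2: label uniform points in the square by whether they fall in the ball.
set.seed(1)
n <- 1e5
x <- matrix(runif(2 * n, min = -1, max = 1), ncol = 2)
inside <- sqrt(rowSums(x^2)) <= r2
mean(inside)   # close to 0.5
```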
In [None]:
# Corners of Hypercube
# The created data are d-dimensional spherical Gaussians with standard deviation sd and means at
# the corners of a d-dimensional hypercube. The number of classes is 2^d.
# With the default arguments (d = 3) this gives 2^3 = 8 classes.
p <- mlbench.hypercube()
plot(p)
cloud(x.3~x.1+x.2, groups=classes, data = as.data.frame(p))
df <- as.data.frame(p)
save(df, file = "data/hypercube")

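The 2^d class count follows from enumerating the corners. The sketch below lists them with `expand.grid()` and places a spherical Gaussian on each one. The helpers `hypercube_corners` and `sim_hypercube` are hypothetical. Placing corners at 0 and `side` on each axis is an assumption, and mlbench's own layout may differ.

```r
# Hypothetical sketch: corners of the hypercube [0, side]^d, one class per corner.
# Assumption: corners at 0 and `side` on each axis; mlbench's own layout may differ.
hypercube_corners <- function(d, side = 1) {
  as.matrix(expand.grid(rep(list(c(0, side)), d)))
}

# Hypothetical generator: random corner per point plus spherical Gaussian noise.
sim_hypercube <- function(n, d, side = 1, sd = 0.1) {
  corners <- hypercube_corners(d, side)
  classes <- sample(nrow(corners), n, replace = TRUE)
  x <- corners[classes, , drop = FALSE] + matrix(rnorm(n * d, sd = sd), ncol = d)
  list(x = unname(x), classes = factor(classes, levels = seq_len(nrow(corners))))
}

corners <- hypercube_corners(3)
nrow(corners)   # 8 corners, i.e. 2^3 classes
sim <- sim_hypercube(n = 800, d = 3)
```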
In [None]:
# Corners of d-dimensional Simplex
# The created data are d-dimensional spherical Gaussians with standard deviation sd and means at 
# the corners of a d-dimensional simplex. The number of classes is d+1.
p <- mlbench.simplex(n = 800, d = 3, sides = 1, sd = 0.1, center=TRUE)
plot(p)
cloud(x.3~x.1+x.2, groups = classes, data = as.data.frame(p))
df <- as.data.frame(p)
save(df, file = "data/simplex")

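A regular d-simplex has d + 1 vertices, hence d + 1 classes. One standard construction uses the d unit vectors plus the point t * (1, ..., 1) with t = (1 - sqrt(d + 1)) / d, where every edge has length sqrt(2). The sketch rescales that to edge length `side` and subtracts the centroid, mirroring `center = TRUE`. It assumes `sides` is the edge length. `regular_simplex` is a hypothetical helper, and mlbench may build its simplex differently.

```r
# Hypothetical helper: vertices of a regular d-simplex with edge length `side`.
regular_simplex <- function(d, side = 1, center = TRUE) {
  t <- (1 - sqrt(d + 1)) / d
  # Unit vectors plus t * (1, ..., 1): all pairwise distances equal sqrt(2).
  v <- rbind(diag(d), rep(t, d)) * side / sqrt(2)
  # Move the centroid to the origin.
  if (center) v <- sweep(v, 2, colMeans(v))
  v
}

v <- regular_simplex(3)
nrow(v)               # 4 = d + 1 vertices, one class each
print(round(dist(v), 10))   # all six edges have length 1
```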
In [None]:
# Twonorm Benchmark Problem
# The inputs of the twonorm problem are points from two Gaussian distributions with unit covariance matrix. 
# Class 1 is multivariate normal with mean (a, a, ..., a) and class 2 with mean (-a, -a, ..., -a), where a = 2/sqrt(d).
p <- mlbench.twonorm(n = 1000, d = 2)
plot(p)
df <- as.data.frame(p)
save(df, file = "data/twonorm")

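The choice a = 2/sqrt(d) keeps the problem equally hard in every dimension. Projecting onto the unit vector (1, ..., 1)/sqrt(d) turns the class means into +2 and -2 with unit variance. With equal priors, the Bayes rule predicts class 1 when the coordinates sum to a positive value, and its error is pnorm(-2), about 0.023, whatever d is. The generator `sim_twonorm` below is a hypothetical base-R version of the description above and assumes equal class priors.

```r
# Hypothetical base-R generator following the description above (equal class priors assumed).
sim_twonorm <- function(n, d) {
  a <- 2 / sqrt(d)
  classes <- sample(1:2, n, replace = TRUE)
  mu <- ifelse(classes == 1, a, -a)
  # mu has length n and recycles down each column, so row i is shifted by mu[i].
  x <- matrix(rnorm(n * d), ncol = d) + mu
  list(x = x, classes = factor(classes))
}

set.seed(1)
sim <- sim_twonorm(n = 1e5, d = 2)
# Bayes rule for equal priors and mirrored means: threshold the coordinate sum at 0.
pred <- ifelse(rowSums(sim$x) > 0, 1, 2)
err <- mean(pred != as.integer(sim$classes))
c(empirical = err, bayes = pnorm(-2))   # both about 0.023
```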
In [None]:
# Continuous XOR Benchmark Problem
# The inputs of the XOR problem are uniformly distributed on the d-dimensional cube with corners {+/-1}. 
# Each pair of opposite corners forms one class, hence the total number of classes is 2^(d-1).
p <- mlbench.xor(n = 300, d = 2)
plot(p)
df <- as.data.frame(p)
save(df, file = "data/xor")

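The 2^(d-1) count comes from pairing each corner with its opposite. In the sketch below, a point's corner is its sign pattern. Flipping each pattern so that its first sign is +1 maps a corner and its opposite to the same label. `xor_class` is a hypothetical helper, and mlbench's numbering of the classes may differ. For d = 2 this is the familiar XOR: quadrants I and III form one class, and II and IV form the other.

```r
# Hypothetical labelling rule (not mlbench's code): a point's corner is its sign pattern,
# and opposite corners s and -s share a class.
xor_class <- function(x) {
  s <- sign(x)                                 # n x d matrix of +1/-1 entries
  s <- s * s[, 1]                              # flip rows so the first sign is +1
  factor(apply(s, 1, paste, collapse = " "))   # one level per pair of opposite corners
}

set.seed(1)
x <- matrix(runif(300 * 3, min = -1, max = 1), ncol = 3)
nlevels(xor_class(x))   # 4 = 2^(3 - 1) classes for d = 3

pts <- rbind(c(0.5, 0.5), c(-0.5, -0.5), c(0.5, -0.5))
xor_class(pts)          # first two points share a class, the third differs
```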
In [None]:
# Two Spirals Benchmark Problem
# The inputs of the spirals problem are points on two entangled spirals. If sd>0, then Gaussian noise is added 
# to each data point. mlbench.1spiral creates a single spiral.
p <- mlbench.spirals(n = 500, cycles = 1.5, sd = 0.03)
plot(p)
df <- as.data.frame(p)
save(df, file = "data/spirals")
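The two-spirals structure can be illustrated with an Archimedean arm, whose radius grows linearly with the angle, plus a copy rotated by 180 degrees. This is a hypothetical parameterisation, not necessarily the one mlbench uses. Here `cycles` sets the number of turns and `sd` adds optional Gaussian noise, as in the call above. `sim_spirals` is a hypothetical name, and `n` is assumed even.

```r
# Hypothetical sketch, not mlbench's parameterisation: an Archimedean spiral arm whose
# radius grows linearly with the angle, plus a copy rotated by 180 degrees.
sim_spirals <- function(n, cycles = 1, sd = 0) {
  m <- n %/% 2                          # points per spiral (n assumed even)
  t <- seq_len(m) / m                   # position along the arm, in (0, 1]
  angle <- 2 * pi * cycles * t
  arm <- cbind(t * cos(angle), t * sin(angle))
  x <- rbind(arm, -arm)                 # negation = rotation by pi in the plane
  x <- x + matrix(rnorm(length(x), sd = sd), ncol = 2)   # optional Gaussian noise
  list(x = x, classes = factor(rep(1:2, each = m)))
}

noiseless <- sim_spirals(n = 500, cycles = 1.5)
noisy <- sim_spirals(n = 500, cycles = 1.5, sd = 0.03)
```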