
Poor performance of mxnet LinearRegressionOutput #4287

Closed

khalida opened this issue Dec 19, 2016 · 3 comments

Comments

@khalida

khalida commented Dec 19, 2016

I have been unable to get reasonable performance using mxnet LinearRegressionOutput layer.

Full details of the problem, including a self-contained example, are given in the following SO question.

The question may seem rather broad (I'm getting poor performance), so the answer may simply be the obvious one (do some hyper-parameter tuning). However, given the simplicity of the regression problem considered, and the much better out-of-the-box performance of other neural-net libraries, I thought this might be of general interest.

@uzhao

uzhao commented Dec 20, 2016

For a network without a hidden layer, the best achievable performance will match the result from lm. If you change the optimizer to adam and drop the fixed learning rate, you will get a reasonable outcome. For a network with a hidden layer, I don't think nnet uses an activation function, and the optimizer is also a potential issue here.
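
A minimal sketch of that change, using the variable names from the full script later in this thread (lro1, train.x, train.y) and illustrative values for the other settings; learning.rate is simply omitted so adam falls back to its default:

# Hedged sketch: adam instead of SGD with a fixed learning rate.
# num.round and array.batch.size are illustrative values only.
mxModel <- mx.model.FeedForward.create(lro1, X=train.x, y=train.y,
                                       ctx=mx.cpu(), num.round=50,
                                       array.batch.size=20,
                                       eval.metric=mx.metric.rmse,
                                       optimizer="adam")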

@khalida
Author

khalida commented Dec 21, 2016

Many thanks for the response. Removing the fixed learning rate and changing to the adam optimizer was a big help. Results are attached.

I have also included the performance of lm for reference (in green). The training root-mean-squared-errors for the five models are given below.

As you stated, mxModel1 (mxnet regression without a hidden layer) converges to the performance of the linear model, as expected: a single FullyConnected layer with one output feeding LinearRegressionOutput is exactly a linear model. mxModel2 (mxnet regression with a single hidden layer) is significantly outperformed by the other tools on this particular regression task.

I don't understand your point about nnet not using an activation function.

Is there any default regularisation in mxnet which might be affecting performance? (I am looking at training performance only, so I should turn off all regularisation for a fair comparison; a sketch of how to check this follows at the end of this comment.)

$mxModel1
[1] 0.1308567098

$mxModel2
[1] 0.1187877492

$nnet
[1] 0.02919336078

$neuralnet
[1] 0.02978636594

$linearModel
[1] 0.1308205827

[Attached plot: mxnet_regression_performance — training fit for the five models, linear model in green]
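
A sketch of how one might rule regularisation out, under the assumption that the wd weight-decay argument is forwarded from mx.model.FeedForward.create through to the optimizer:

# Hedged sketch: explicitly zero the weight-decay (L2) penalty so that
# no regularisation affects the training-set comparison.
# Assumption: wd is passed through to the optimizer; other values illustrative.
mxModel <- mx.model.FeedForward.create(lro2, X=train.x, y=train.y,
                                       ctx=mx.cpu(), num.round=50,
                                       array.batch.size=20,
                                       eval.metric=mx.metric.rmse,
                                       optimizer="adam",
                                       wd=0)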

@khalida
Author

khalida commented Dec 21, 2016

It seems that using "rmsprop" for optimization, along with increasing the batch size, offers a further improvement.
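
The only settings changed relative to the adam run above are (values as used in the full script below):

# Settings changed relative to the previous (adam) run:
optimizer <- "rmsprop"   # rmsprop instead of adam
batchSize <- 100         # larger mini-batches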

For reference, pasted below are the results and a version of the code in which mxnet performs well compared to neuralnet and nnet on this simple regression task.

Many thanks for the help.

RMSE for the five models:

$mxModel1
[1] 0.1404579862

$mxModel2
[1] 0.03263213499

$nnet
[1] 0.03222651138

$neuralnet
[1] 0.03054112057

$linearModel
[1] 0.1404421006

Plots of training fit (linear model results shown in green):

[Attached plot: mxnet_regression_performance_rmsprop — training fit for the five models, linear model in green]

And the code that produces these results:

## SIMPLE REGRESSION PROBLEM
# Check mxnet out-of-the-box performance VS neuralnet, and caret/nnet

library(mxnet)
library(neuralnet)
library(nnet)
library(caret)
library(tictoc)
library(reshape)

# Data definitions
nObservations <- 1000
noiseLvl <- 0.1

# Network config
nHidden <- 3
batchSize <- 100
nRound <- 400
verbose <- FALSE
array.layout <- "rowmajor"
optimizer <- "rmsprop"

# GENERATE DATA:
set.seed(0)
df <- data.frame(x1=runif(nObservations),
                 x2=runif(nObservations),
                 x3=runif(nObservations))

df$y <- df$x1 + df$x2^2 + df$x3^3 + noiseLvl*runif(nObservations)
# normalize data columns
# df <- scale(df)

# Separate data into train/test
test.ind = seq(1, nObservations, 10)    # 1 in 10 samples for testing
train.x = data.matrix(df[-test.ind, -which(colnames(df) %in% c("y"))])
train.y = df[-test.ind, "y"]
test.x = data.matrix(df[test.ind, -which(colnames(df) %in% c("y"))])
test.y = df[test.ind, "y"]

# Define mxnet network, following 5-minute regression example from here:
# http://mxnet-tqchen.readthedocs.io/en/latest//packages/r/fiveMinutesNeuralNetwork.html#regression
data <- mx.symbol.Variable("data")
label <- mx.symbol.Variable("label")
# Model 1: no hidden layer; a single linear output unit, equivalent to lm
fc1 <- mx.symbol.FullyConnected(data, num_hidden=1, name="fc1")
lro1 <- mx.symbol.LinearRegressionOutput(data=fc1, label=label, name="lro")

# Train MXNET model
mx.set.seed(0)
tic("mxnet training 1")
mxModel1 <- mx.model.FeedForward.create(lro1, X=train.x, y=train.y,
                                        eval.data=list(data=test.x, label=test.y),
                                        ctx=mx.cpu(), num.round=nRound,
                                        array.batch.size=batchSize,
                                        eval.metric=mx.metric.rmse,
                                        verbose=verbose,
                                        array.layout=array.layout,
                                        optimizer=optimizer
                                        )
toc()

# Train network with a hidden layer
fc1 <- mx.symbol.FullyConnected(data, num_hidden=nHidden, name="fc1")
tanh1 <- mx.symbol.Activation(fc1, act_type="tanh", name="tanh1")
fc2 <- mx.symbol.FullyConnected(tanh1, num_hidden=1, name="fc2")
lro2 <- mx.symbol.LinearRegressionOutput(data=fc2, label=label, name="lro2")
tic("mxnet training 2")
mx.set.seed(0)
mxModel2 <- mx.model.FeedForward.create(lro2, X=train.x, y=train.y,
                                        eval.data=list(data=test.x, label=test.y),
                                        ctx=mx.cpu(), num.round=nRound,
                                        array.batch.size=batchSize,
                                        eval.metric=mx.metric.rmse,
                                        verbose=verbose,
                                        array.layout=array.layout,
                                        optimizer=optimizer
                                        )
toc()

# Train neuralnet model
set.seed(0)
tic("neuralnet training")
nnModel <- neuralnet(y~x1+x2+x3, data=df[-test.ind, ], hidden=c(nHidden),
                     linear.output=TRUE, stepmax=1e6)
toc()
# Train caret model
set.seed(0)
tic("nnet training")
nnetModel <- nnet(y~x1+x2+x3, data=df[-test.ind, ], size=nHidden, trace=F,
                   linout=TRUE)
toc()

# Check response VS targets on training data:
par(mfrow=c(2,2))
plot(train.y, compute(nnModel, train.x)$net.result, 
     main="neuralnet Train Fitting Fake Data", xlab="Target", ylab="Response")
abline(0,1, col="red")

# Plot linear model performance for reference
linearModel <- lm(y~., df[-test.ind, ])
points(train.y, predict(linearModel, data.frame(train.x)), col="green")

plot(train.y, predict(nnetModel, train.x), 
     main="nnet Train Fitting Fake Data", xlab="Target", ylab="Response")
abline(0,1, col="red")

plot(train.y, predict(mxModel1, train.x, array.layout=array.layout), 
     main="MXNET (no hidden) Train Fitting Fake Data", xlab="Target",
     ylab="Response")
abline(0,1, col="red")

plot(train.y, predict(mxModel2, train.x, array.layout=array.layout),
     main="MXNET (with hidden) Train Fitting Fake Data", xlab="Target",
     ylab="Response")
abline(0,1, col="red")

# Create and print table of results:
results <- list()
rmse <- function(target, response) {
  return(sqrt(mean((target - response)^2)))
}
results$mxModel1 <- rmse(train.y, predict(mxModel1, train.x,
                                          array.layout=array.layout))
results$mxModel2 <- rmse(train.y, predict(mxModel2, train.x,
                                          array.layout=array.layout))
results$nnet <- rmse(train.y, predict(nnetModel, train.x))
results$neuralnet <- rmse(train.y, compute(nnModel, train.x)$net.result)
results$linearModel <- rmse(train.y, predict(linearModel, data.frame(train.x)))

print(results)
