
Poor performance of mxnet LinearRegressionOutput #4287

Closed

khalida opened this issue Dec 19, 2016 · 3 comments

Comments

@khalida

khalida commented Dec 19, 2016

I have been unable to get reasonable performance using mxnet LinearRegressionOutput layer.

Full details of the problem, including a self-contained example, are given in the following SO question.

The question may seem rather broad (I'm getting poor performance), so the answer may simply be the obvious one (do some hyper-parameter tuning). However, given the simplicity of the regression problem considered, and the much better out-of-the-box performance of other neural-net libraries, I thought this might be of general interest.

@uzhao

uzhao commented Dec 20, 2016

For a network without a hidden layer, the best achievable performance will match the result from lm. If you change the optimizer to adam and drop the fixed learning rate, you will get a reasonable outcome. For a network with a hidden layer, I don't think nnet uses an activation function, and the optimizer is also a potential issue here.
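
A minimal sketch of that change, using the variable names from the full script later in this thread (lro1, train.x, train.y) and illustrative values for the other settings; learning.rate is simply omitted so adam falls back to its default:

# Hedged sketch: adam instead of SGD with a fixed learning rate.
# num.round and array.batch.size are illustrative values only.
mxModel <- mx.model.FeedForward.create(lro1, X=train.x, y=train.y,
                                       ctx=mx.cpu(), num.round=50,
                                       array.batch.size=20,
                                       eval.metric=mx.metric.rmse,
                                       optimizer="adam")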

@khalida
Author

khalida commented Dec 21, 2016

Many thanks for the response. Removing the fixed learning rate and changing to the adam optimizer was a big help. Results are attached.

I have also included the performance of lm for reference (in green). The training root-mean-squared-errors for the five models are given below.

As you stated, mxModel1 (mxnet regression without a hidden layer) converges to the performance of the linear model, as expected: a single FullyConnected layer with one output feeding LinearRegressionOutput is exactly a linear model. mxModel2 (mxnet regression with a single hidden layer) is significantly outperformed by the other tools on this particular regression task.

I don't understand your point about nnet not using an activation function.

Is there any default regularisation in mxnet which might be affecting performance? (I am looking at training performance only, so I should turn off all regularisation for a fair comparison; a sketch of how to check this follows at the end of this comment.)

$mxModel1
[1] 0.1308567098

$mxModel2
[1] 0.1187877492

$nnet
[1] 0.02919336078

$neuralnet
[1] 0.02978636594

$linearModel
[1] 0.1308205827

[Attached plot: mxnet_regression_performance — training fit for the five models, linear model in green]
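
A sketch of how one might rule regularisation out, under the assumption that the wd weight-decay argument is forwarded from mx.model.FeedForward.create through to the optimizer:

# Hedged sketch: explicitly zero the weight-decay (L2) penalty so that
# no regularisation affects the training-set comparison.
# Assumption: wd is passed through to the optimizer; other values illustrative.
mxModel <- mx.model.FeedForward.create(lro2, X=train.x, y=train.y,
                                       ctx=mx.cpu(), num.round=50,
                                       array.batch.size=20,
                                       eval.metric=mx.metric.rmse,
                                       optimizer="adam",
                                       wd=0)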

@khalida
Author

khalida commented Dec 21, 2016

It seems that using "rmsprop" for optimization, along with increasing the batch size, offers a further improvement.
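
The only settings changed relative to the adam run above are (values as used in the full script below):

# Settings changed relative to the previous (adam) run:
optimizer <- "rmsprop"   # rmsprop instead of adam
batchSize <- 100         # larger mini-batches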

For reference, pasted below are the results and a version of the code in which mxnet performs well compared to neuralnet and nnet on this simple regression task.

Many thanks for the help.

RMSE for the five models:

$mxModel1
[1] 0.1404579862

$mxModel2
[1] 0.03263213499

$nnet
[1] 0.03222651138

$neuralnet
[1] 0.03054112057

$linearModel
[1] 0.1404421006

Plots of training fit (linear model results shown in green):

[Attached plot: mxnet_regression_performance_rmsprop — training fit for the five models, linear model in green]

And the code that produces these results:

## SIMPLE REGRESSION PROBLEM
# Check mxnet out-of-the-box performance VS neuralnet, and caret/nnet

library(mxnet)
library(neuralnet)
library(nnet)
library(caret)
library(tictoc)
library(reshape)

# Data definitions
nObservations <- 1000
noiseLvl <- 0.1

# Network config
nHidden <- 3
batchSize <- 100
nRound <- 400
verbose <- FALSE
array.layout <- "rowmajor"
optimizer <- "rmsprop"

# GENERATE DATA:
set.seed(0)
df <- data.frame(x1=runif(nObservations),
                 x2=runif(nObservations),
                 x3=runif(nObservations))

df$y <- df$x1 + df$x2^2 + df$x3^3 + noiseLvl*runif(nObservations)
# normalize data columns
# df <- scale(df)

# Separate data into train/test
test.ind = seq(1, nObservations, 10)    # 1 in 10 samples for testing
train.x = data.matrix(df[-test.ind, -which(colnames(df) %in% c("y"))])
train.y = df[-test.ind, "y"]
test.x = data.matrix(df[test.ind, -which(colnames(df) %in% c("y"))])
test.y = df[test.ind, "y"]

# Define mxnet network, following 5-minute regression example from here:
# http://mxnet-tqchen.readthedocs.io/en/latest//packages/r/fiveMinutesNeuralNetwork.html#regression
data <- mx.symbol.Variable("data")
label <- mx.symbol.Variable("label")
# Model 1: no hidden layer; a single linear output unit, equivalent to lm
fc1 <- mx.symbol.FullyConnected(data, num_hidden=1, name="fc1")
lro1 <- mx.symbol.LinearRegressionOutput(data=fc1, label=label, name="lro")

# Train MXNET model
mx.set.seed(0)
tic("mxnet training 1")
mxModel1 <- mx.model.FeedForward.create(lro1, X=train.x, y=train.y,
                                        eval.data=list(data=test.x, label=test.y),
                                        ctx=mx.cpu(), num.round=nRound,
                                        array.batch.size=batchSize,
                                        eval.metric=mx.metric.rmse,
                                        verbose=verbose,
                                        array.layout=array.layout,
                                        optimizer=optimizer
                                        )
toc()

# Train network with a hidden layer
fc1 <- mx.symbol.FullyConnected(data, num_hidden=nHidden, name="fc1")
tanh1 <- mx.symbol.Activation(fc1, act_type="tanh", name="tanh1")
fc2 <- mx.symbol.FullyConnected(tanh1, num_hidden=1, name="fc2")
lro2 <- mx.symbol.LinearRegressionOutput(data=fc2, label=label, name="lro2")
tic("mxnet training 2")
mx.set.seed(0)
mxModel2 <- mx.model.FeedForward.create(lro2, X=train.x, y=train.y,
                                        eval.data=list(data=test.x, label=test.y),
                                        ctx=mx.cpu(), num.round=nRound,
                                        array.batch.size=batchSize,
                                        eval.metric=mx.metric.rmse,
                                        verbose=verbose,
                                        array.layout=array.layout,
                                        optimizer=optimizer
                                        )
toc()

# Train neuralnet model
set.seed(0)
tic("neuralnet training")
nnModel <- neuralnet(y~x1+x2+x3, data=df[-test.ind, ], hidden=c(nHidden),
                     linear.output=TRUE, stepmax=1e6)
toc()
# Train caret model
set.seed(0)
tic("nnet training")
nnetModel <- nnet(y~x1+x2+x3, data=df[-test.ind, ], size=nHidden, trace=F,
                   linout=TRUE)
toc()

# Check response VS targets on training data:
par(mfrow=c(2,2))
plot(train.y, compute(nnModel, train.x)$net.result, 
     main="neuralnet Train Fitting Fake Data", xlab="Target", ylab="Response")
abline(0,1, col="red")

# Plot linear model performance for reference
linearModel <- lm(y~., df[-test.ind, ])
points(train.y, predict(linearModel, data.frame(train.x)), col="green")

plot(train.y, predict(nnetModel, train.x), 
     main="nnet Train Fitting Fake Data", xlab="Target", ylab="Response")
abline(0,1, col="red")

plot(train.y, predict(mxModel1, train.x, array.layout=array.layout), 
     main="MXNET (no hidden) Train Fitting Fake Data", xlab="Target",
     ylab="Response")
abline(0,1, col="red")

plot(train.y, predict(mxModel2, train.x, array.layout=array.layout),
     main="MXNET (with hidden) Train Fitting Fake Data", xlab="Target",
     ylab="Response")
abline(0,1, col="red")

# Create and print table of results:
results <- list()
rmse <- function(target, response) {
  return(sqrt(mean((target - response)^2)))
}
results$mxModel1 <- rmse(train.y, predict(mxModel1, train.x,
                                          array.layout=array.layout))
results$mxModel2 <- rmse(train.y, predict(mxModel2, train.x,
                                          array.layout=array.layout))
results$nnet <- rmse(train.y, predict(nnetModel, train.x))
results$neuralnet <- rmse(train.y, compute(nnModel, train.x)$net.result)
results$linearModel <- rmse(train.y, predict(linearModel, data.frame(train.x)))

print(results)
