This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

[R] Consider make model structure compatible with Rdata #362

Closed
tqchen opened this issue Oct 22, 2015 · 13 comments
@tqchen
Member

tqchen commented Oct 22, 2015

Currently most things are Rcpp based, which means they are not compatible with Rdata. This is due to a limitation of R's current serialization system, which does not support a customized loading function.

The only way to make things Rdata compatible is to eagerly dump state into a raw vector every time a new object is returned and, in every function, check the externalptr; if it is null, reload things from the raw state. This is how things are handled in xgboost.

This approach is dumb, costs a lot of overhead, and does not make sense for the low-level API. However, it might be possible to support it for the model structure at https://github.com/dmlc/mxnet/blob/master/R-package/R/model.R#L71. Then at least users can save the model in Rdata; while things still won't be perfect, this might be helpful.

This is not urgent, but may be worth considering.
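
The eager-dump pattern described above can be sketched in plain R. This is only an illustration under stated assumptions, not the mxnet or xgboost implementation: `make_model`, `ensure_handle`, `predict_toy` and `new_handle` are hypothetical names, and an environment stands in for the Rcpp external pointer, which is set to NULL by hand to imitate a pointer that did not survive a save/load cycle.

```r
# Stand-in for a native handle (assumption: in the real package this would be
# an Rcpp external pointer that comes back as a null pointer after load()).
new_handle <- function(weights) {
  h <- new.env()
  h$weights <- weights
  h
}

# Eagerly capture the state as a raw vector when the object is created.
make_model <- function(weights) {
  structure(
    list(handle = new_handle(weights),
         raw = serialize(weights, connection = NULL)),
    class = "toy_model"
  )
}

# Called at the top of every function that needs the handle:
# if the handle is gone, rebuild it from the raw state.
ensure_handle <- function(model) {
  if (is.null(model$handle)) {
    model$handle <- new_handle(unserialize(model$raw))
  }
  model
}

predict_toy <- function(model, x) {
  model <- ensure_handle(model)
  as.vector(x %*% model$handle$weights)
}

m <- make_model(c(0.5, -1))
m$handle <- NULL  # simulate the handle being lost across sessions
preds <- predict_toy(m, matrix(c(1, 2, 3, 4), nrow = 2))
```

The cost mentioned above shows up in `make_model`: the raw copy is produced on every construction, whether or not the object is ever saved.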

@tqchen
Member Author

tqchen commented Oct 22, 2015

cc @hetong007 @thirdwing

@tqchen
Member Author

tqchen commented Oct 22, 2015

This might involve somewhat more complicated wrapping of the model to make it an S4 structure, so that symbol and arg.params are retrieved through accessors, like property functions in Rcpp, which leaves room for nullptr checking and recovery.

@tqchen
Member Author

tqchen commented Oct 29, 2015

@topepo I guess this thread will be related to what you are trying to do

@thirdwing
Contributor

I just thought about this for a little while.

What about saving the model as a JSON string in Rdata? When loading, we parse the JSON string.

This can be done by adding two helper functions.

@topepo

topepo commented Nov 18, 2015

My viewpoint is mostly around prediction using the model object so I'll focus on how we could export the prediction function. I can think of two options:

  • some stand-alone C programs like C5.0 drop the model parameters (splits and variables in that case) into a text file that is then parsed at prediction time by separate code. You could have a defined format to export the weights and biases and some R or C++ code could parse it for making predictions. This is basically what @thirdwing is proposing (I think)
  • it wouldn't be difficult, even for large networks, to have your code write an actual R function for the prediction equation (including the weight and bias values to full precision). It is kludgy but is the most portable way of accomplishing the goal.

Thanks,

Max
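
The second option above (writing out an actual R function) might look roughly like the sketch below. It assumes a single linear layer for brevity; `write_predict_fn` is a hypothetical helper, not part of mxnet, and a real network would need one such expression per layer plus the activation functions.

```r
# Hypothetical helper: emit R source for a one-layer linear predictor,
# with the weights and bias written out at 17 significant digits.
write_predict_fn <- function(weights, bias) {
  paste0(
    "function(x) {\n",
    "  w <- ", paste(deparse(weights, control = "digits17"), collapse = ""), "\n",
    "  b <- ", deparse(bias, control = "digits17"), "\n",
    "  as.vector(x %*% w + b)\n",
    "}"
  )
}

src <- write_predict_fn(c(0.1, -0.25), 0.5)

# The generated text is ordinary R code, so it can be written to a file
# or evaluated directly without the original package being installed.
predict_fn <- eval(parse(text = src))
preds <- predict_fn(matrix(c(1, 2, 3, 4), nrow = 2))
```

Because the result is plain source text, it survives `save()`/`load()` trivially and does not depend on any external pointer.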

@tqchen
Member Author

tqchen commented Nov 18, 2015

It is never hard to explicitly save something that is serializable in Rdata form. We have ways to save the graph to a JSON string and the ndarrays (parameters) to a raw vector in R.

The problem is eagerly saving state into the object every time an object is generated, which could be costly for objects such as arrays (it means dumping things from GPU to CPU every time an operation is computed).

  • This eager saving approach may not be too bad for the model object, since a model is rather costly to build.
  • It will also work if the upper-level tool calling mx provides an explicit wrapper that calls save.to.robject and load.from.robject.

@thirdwing
Contributor

Let me write mx.model.save.Rdata/mx.model.load.Rdata and do some benchmarking.

@sandeep-krishnamurthy
Contributor

@thirdwing - Is this implemented? Can we close this issue?

@topepo

topepo commented May 30, 2017

I don't think so (at least as of Apr 14).

What I was requesting was to be able to save the fitted model object (in its native class) so that it can be re-used in future sessions. A workflow like:

model <- mx.mlp(data = x, label = y)
save(model, file = "model.RData")
q("no")

## new R session
load("model.RData")
predict(model, newdata)

Having an export function doesn't really solve that problem. Maybe an intermediate step of

model <- serialize(model)

prior to saving would be a solution.

@thirdwing
Contributor

thirdwing commented May 30, 2017

@topepo I have added two helper functions. The network symbol and parameters will be saved using saveRDS.

I restart the R session after saving, so the external pointer won't work. Can you give some advice on this?

require(mlbench)
require(mxnet)

data(Sonar, package = "mlbench")

Sonar[,61] <- as.numeric(Sonar[,61])-1
train.ind <- c(1:50, 100:150)
train.x <- data.matrix(Sonar[train.ind, 1:60])
train.y <- Sonar[train.ind, 61]
test.x <- data.matrix(Sonar[-train.ind, 1:60])
test.y <- Sonar[-train.ind, 61]

mx.set.seed(0)
model <- mx.mlp(train.x, train.y, hidden_node=10, out_node=2, out_activation="softmax",
                num.round=20, array.batch.size=15, learning.rate=0.07, momentum=0.9, 
                eval.metric=mx.metric.accuracy)


mx.model.save.RData(model = model, filename = "test.RData")


#### restart the R session

require(mlbench)
require(mxnet)

data(Sonar, package = "mlbench")

Sonar[,61] <- as.numeric(Sonar[,61])-1
train.ind <- c(1:50, 100:150)
train.x <- data.matrix(Sonar[train.ind, 1:60])
train.y <- Sonar[train.ind, 61]
test.x <- data.matrix(Sonar[-train.ind, 1:60])
test.y <- Sonar[-train.ind, 61]

model <- mx.model.load.RData("test.RData")

preds <- predict(model, test.x)
pred.label <- max.col(t(preds)) - 1
table(pred.label, test.y)

thirdwing added a commit to thirdwing/mxnet that referenced this issue May 30, 2017
@jaredlander

This will help a lot. Has this been merged upstream and are binaries ready?

Also, people have really started to use RDS files or RData. Extending this functionality to RDS would be huge if you can.

@thirdwing
Contributor

@jaredlander This has been merged in #6494 .

After mx.serialize, it is basically a plain R object, so you can save it to RDS as you like.
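
As a rough illustration of that last step, the sketch below round-trips a plain list through an RDS file with base R only. The `serialized` list is a made-up stand-in for whatever mx.serialize returns (its real structure is not shown in this thread), and the field names are assumptions.

```r
# Stand-in for the plain R object assumed to come out of mx.serialize.
serialized <- list(
  symbol_json = '{"nodes": []}',
  arg_params  = list(w = c(0.1, 0.2))
)

# Write to an RDS file and read it back, as one would in a new session.
path <- tempfile(fileext = ".rds")
saveRDS(serialized, path)
restored <- readRDS(path)

same <- identical(serialized, restored)
```

Since nothing in the list is an external pointer, the restored object is usable immediately after `readRDS`.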

@jaredlander

OK, will look into this. Not sure that function is in version 0.10.1.

Guneet-Dhillon pushed a commit to Guneet-Dhillon/mxnet that referenced this issue Sep 13, 2017
anirudh2290 pushed a commit to anirudh2290/mxnet that referenced this issue Mar 9, 2018
iblislin added a commit to iblislin/incubator-mxnet that referenced this issue Mar 18, 2018