This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Fine-tune with R API #4817

Closed
statist-bhfz opened this issue Jan 27, 2017 · 9 comments
@statist-bhfz (Contributor) commented Jan 27, 2017

Hi,

Many thanks for this library, which provides almost the only way to do deep learning in R. The current R documentation is not very comprehensive, but the examples and the discussions in issues help a lot.

Now I am stuck looking for the R equivalent of the get_internals() method in Python (http://mxnet.io/how_to/finetune.html).
Unfortunately, model$symbol$get.internals() doesn't return a usable R object, so modifying the symbol of a pre-trained model seems to be impossible in the R API without directly editing the symbol.json file.

Is it possible to add to the R package the capability to write code like this?

all_layers = sym.get_internals()
net = all_layers[layer_name + '_output']
net = mx.symbol.FullyConnected(data=net, num_hidden=num_classes, name='fc1')
net = mx.symbol.SoftmaxOutput(data=net, name='softmax')
new_args = dict({k: arg_params[k] for k in arg_params if 'fc1' not in k})

@thirdwing self-assigned this Jan 27, 2017

@thirdwing (Contributor) commented:

I think the last line may not be possible. Others should be OK.

@jeremiedb (Contributor) commented:

Hello, I think you should actually be able to do it by using get.output:

resnet101 <- mx.model.load("Models/Resnet101/resnet-101", iteration = 0)

symbol    <- resnet101$symbol
internals <- symbol$get.internals()
outputs   <- internals$outputs

flatten <- internals$get.output(which(outputs == "flatten0_output"))

The tricky part is that get.output seems to accept only a numeric index, not a name.
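If it helps, the numeric lookup can be wrapped in a small helper so the output is picked by name. This is just a sketch built on the same get.internals()/get.output() calls used above; get.output.by.name is not part of the mxnet R API:

get.output.by.name <- function(symbol, name) {
  # List the names of all internal outputs and find the one we want
  internals <- symbol$get.internals()
  idx <- which(internals$outputs == name)
  if (length(idx) != 1) stop("output '", name, "' not found")
  internals$get.output(idx)
}

flatten <- get.output.by.name(resnet101$symbol, "flatten0_output")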

Afterwards, it should be relatively straightforward using the initializers:

new_fc   <- mx.symbol.FullyConnected(data = flatten, num_hidden = 24, name = "new_fc_24")
new_soft <- mx.symbol.SoftmaxOutput(data = new_fc, name = "new_softmax")

arg_params_ori  <- resnet101$arg.params
fc1_weights_ori <- arg_params_ori[["fc1_weight"]]

devices <- mx.ctx.default()

# Initialize a complete parameter list for the new symbol, then inspect the new FC weights
arg_params_new  <- mxnet:::mx.model.init.params(symbol = new_soft, input.shape = c(224, 224, 3, 32), initializer = mxnet:::mx.init.uniform(0.1), ctx = devices)$arg.params
fc1_weights_new <- arg_params_new[["new_fc_24_weight"]]

### Alternative: build the parameter list directly from the inferred shapes
arg_params_new2  <- mx.init.create(initializer = mx.init.uniform(0.1), shape.array = mx.symbol.infer.shape(new_soft, data = c(224, 224, 3, 32))$arg.shapes, ctx = devices)
fc1_weights_new2 <- arg_params_new2[["new_fc_24_weight"]]

And finally, reassign the original weights into the newly initialized parameter list for all but the new FC layer:

arg_params_new_patch <- arg_params_new
arg_params_new_patch[setdiff(names(arg_params_new_patch), "new_fc_24_weight")] <- arg_params_ori[setdiff(names(arg_params_new_patch), "new_fc_24_weight")]
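A quick sanity check (just a sketch, using the objects defined above): after patching, the only parameters that do not come from the pre-trained model should be the weight and bias of the new FC layer.

# The new FC weight and bias are the only entries absent from the original params
stopifnot(setequal(
  setdiff(names(arg_params_new_patch), names(arg_params_ori)),
  c("new_fc_24_weight", "new_fc_24_bias")
))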

@statist-bhfz (Contributor, Author) commented Jan 28, 2017

Thank you for the response.

Is it possible to replace the line new_args = dict({k:arg_params[k] for k in arg_params if 'fc1' not in k}) in Python with something like model$arg.params[c("fc1_weight", "fc1_bias")] <- NULL in R? Or should we always use initializers?
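For the list-filtering part alone, a direct R equivalent of the Python dict comprehension would be something like this sketch (using the resnet101 object from above; whether the training wrapper accepts such an incomplete list is discussed below):

# Keep every pre-trained parameter whose name does not contain "fc1"
arg_params <- resnet101$arg.params
new_args   <- arg_params[!grepl("fc1", names(arg_params))]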

@OwlECoyote commented:

I'm not an expert, but I am pretty sure that if you set all the weights to zero, the network is not able to learn anything (if that is what you meant to do by setting the params to NULL).

@statist-bhfz (Contributor, Author) commented:

I hope that setting a subset of parameters to NULL will force some kind of default initialization for these parameters during training (similar to mxnet:::mx.init.uniform(0.1) in jeremiedb's answer). Possibly I'm wrong.

@OwlECoyote commented:

Well, I used jeremiedb's answer for fine-tuning a network and had to add some lines, because after calling the FeedForward function I got an error that the bias for new_fc_24 was NULL; so in that case at least there was no such default initialization.
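A sketch of that extra step, reusing the variable names from jeremiedb's snippet: the patch above copies original values for every name except "new_fc_24_weight", so "new_fc_24_bias", which has no counterpart in arg_params_ori, gets overwritten with NULL. Restricting the copy to names present in both lists keeps the freshly initialized bias as well:

# Copy pre-trained values only for parameters that exist in the original model;
# the new layer's weight and bias keep their fresh initialization
shared <- intersect(names(arg_params_new), names(arg_params_ori))
arg_params_new_patch <- arg_params_new
arg_params_new_patch[shared] <- arg_params_ori[shared]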

@jeremiedb (Contributor) commented:

Thanks OwlECoyote for pointing out the bias argument.
In effect, the R FeedForward module is designed to either initialize all the arguments itself or take the full, complete list of arguments in arg.params.

Lines 416-417 in /model.R should make the behavior of the model clearer:

params <- mx.model.init.params(symbol, input.shape, initializer, mx.cpu())
if (!is.null(arg.params)) params$arg.params <- arg.params

To make the picture clearer, the FeedForward wrapper calls the model.train function, which initializes the arg.params to 0 before assigning the pre-trained arg.params. Therefore, if there are missing arguments, as statist-bhfz mentioned, their weights will be "initialized" to 0. In that case the model will run, but won't learn, so it's not a good idea!

Bottom line: arg.params, if provided, should simply be a list containing the NDArrays for each of the model arguments (symbol$arguments, excluding data and label), and the names of that list must match the model argument names.
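As an illustration of that requirement (a sketch only; train_iter is an assumed mx.io data iterator of 224x224x3 images and is not defined in this thread):

# Every argument of the new symbol, except the data and label inputs,
# must have a matching entry in arg.params
required <- new_soft$arguments
required <- required[!grepl("data$|label$", required)]
stopifnot(setequal(names(arg_params_new_patch), required))

model <- mx.model.FeedForward.create(
  symbol      = new_soft,
  X           = train_iter,
  ctx         = devices,
  num.round   = 5,
  arg.params  = arg_params_new_patch,
  aux.params  = resnet101$aux.params,
  eval.metric = mx.metric.accuracy
)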

@lichen11 commented Jan 5, 2018

Hi, I recently attempted fine-tuning ResNet101 based on your comments.

However, when I start training (on a GPU), R outputs the following message:

Start training with 1 devices
[19:35:16] src/operator/nn/./cudnn/./cudnn_algoreg-inl.h:107: Running performance tests to find the             best convolution algorithm, this can take a while... (setting env variable MXNET_CUDNN_AUTOTUNE_DEFAULT to 0 to disable)

Then it crashes. I updated mxnet to 1.0.0. Fine-tuning with Inception-BN at 126 does not crash. I am wondering whether there is an internal bug in the mxnet R package causing this crash.

@jeremiedb (Contributor) commented:

@lichen11, please see answer in #7968
