MAML makes weights stuck in some bad equilibrium #18
Comments
That's strange. I haven't seen anything like that before. Have you tuned the update_lr (alpha in the paper)? It's possible that it might be much too large or much too small. Can you visualize the motor current for multiple tasks? Are you using a different sample for the inner and outer objectives? (You should be.)
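(For reference, update_lr here is the inner step size α from the MAML paper, and meta_lr should correspond to the outer, meta step size β:

$$\theta_i' = \theta - \alpha\,\nabla_\theta \mathcal{L}_{\mathcal{T}_i}(f_\theta), \qquad \theta \leftarrow \theta - \beta\,\nabla_\theta \sum_{\mathcal{T}_i \sim p(\mathcal{T})} \mathcal{L}_{\mathcal{T}_i}(f_{\theta_i'})$$

The outer update differentiates through the inner update, so the two step sizes play different roles and can need quite different scales.)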
Yes, I played around a lot with the learning rates. I found that training is very sensitive to them and doesn't learn at all if they are wrong. Currently I have meta_lr 0.0001 and update_lr 0.00001 with num_updates 3. I noticed that if I set num_updates > 1, I have to make update_lr rather small for it to learn anything - is that maybe because the updates for the individual tasks become too different when taking many steps on a single task?

For some reason the equilibrium is no longer a problem now, I don't know why. It takes me about 100 iterations of fine-tuning with a higher learning rate to reach a good accuracy. What number of inner gradient steps do you normally use for good results?

The motor currents for the multiple tasks are not so different, they just have their spikes etc. at different locations.

What exactly does the console output of preloss and postloss mean - the loss before the inner gradient updates and afterwards?

> Are you using a different sample for the inner and outer objectives? (You should be.)
> The motor currents for the multiple tasks are not so different, just having their spikes etc. at different locations.

In that case, I would at least expect the base current to be learnable.

You might consider using gradient clipping or meta-gradient clipping, which we have found can stabilize training in different settings. For example, I typically use a meta_lr of 0.001 (the default for Adam), but if you see spikes in meta-training performance with this learning rate, it would make sense to clip the meta-gradients.

If you have to set the inner learning rate to be very small, it's possible that a larger learning rate with clipping could produce better performance.
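A minimal sketch of what clipping the meta-gradients could look like in a TF1-style graph like this repo's (names such as meta_loss are placeholders, and the clip range is only an example):

```python
import tensorflow as tf

# Stand-in for the post-update (outer/meta) loss; in practice this would be the
# loss of the adapted parameters on the outer-objective samples (inputb/labelb).
w = tf.get_variable("w", shape=[10])
meta_loss = tf.reduce_sum(tf.square(w))

optimizer = tf.train.AdamOptimizer(learning_rate=0.001)    # meta_lr = 0.001 (Adam's default)
grads_and_vars = optimizer.compute_gradients(meta_loss)
clipped = [(tf.clip_by_value(g, -10.0, 10.0), v)           # clamp each meta-gradient element
           for g, v in grads_and_vars if g is not None]
metatrain_op = optimizer.apply_gradients(clipped)
```

Clipping only the meta-gradients leaves the inner (task-specific) updates untouched, so the inner learning rate can stay larger without the outer loop blowing up.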
> I guess this is creating different samples for the inner and outer objective?! a for the single tasks, b for meta-training?!
>
>     inputa = tf.slice(image_tensor, [0,0,0], [-1,num_classes*FLAGS.update_batch_size, -1])
>     inputb = tf.slice(image_tensor, [0,num_classes*FLAGS.update_batch_size, 0], [-1,-1,-1])
>     labela = tf.slice(label_tensor, [0,0,0], [-1,num_classes*FLAGS.update_batch_size, -1])
>     labelb = tf.slice(label_tensor, [0,num_classes*FLAGS.update_batch_size, 0], [-1,-1,-1])

Yes, that code creates different samples for the inner and outer objective: a corresponds to the inner objective and b corresponds to the outer objective. I just wanted to make sure that you didn't modify that code, and that the inner and outer data correspond to the same "task".
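To spell out what that slice does, here is a toy illustration (made-up shapes, not the repo's defaults) of the split into inner-loop (a) and outer-loop (b) samples:

```python
import numpy as np
import tensorflow as tf

# Toy shapes: 2 tasks per meta-batch, 8 samples per task, 5 features per sample.
# num_inner stands in for num_classes * FLAGS.update_batch_size.
num_inner = 4
image_tensor = tf.constant(np.arange(2 * 8 * 5, dtype=np.float32).reshape(2, 8, 5))

inputa = tf.slice(image_tensor, [0, 0, 0], [-1, num_inner, -1])   # first 4 samples of each task -> inner update
inputb = tf.slice(image_tensor, [0, num_inner, 0], [-1, -1, -1])  # remaining 4 samples -> meta-objective
```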
Hi Chelsea,
I'm trying to use MAML to train a convolutional autoencoder that should learn to encode robot motor currents. (Several traces from different robots, so every robot is a task and the traces are the samples of that task.)
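A minimal sketch of that robot-as-task framing (hypothetical names and shapes, just to make the data layout concrete, not the code I am actually running):

```python
import numpy as np

# Hypothetical helper: traces[robot_id] holds that robot's motor-current windows,
# shape [n_traces, trace_len]. Each robot is one task; the inner and outer samples
# of a task are drawn from disjoint traces of the same robot.
def sample_task_batch(traces, robot_ids, k_inner, k_outer, rng=np.random):
    inner, outer = [], []
    for rid in robot_ids:
        idx = rng.permutation(len(traces[rid]))
        inner.append(traces[rid][idx[:k_inner]])                    # adaptation (inner-loop) traces
        outer.append(traces[rid][idx[k_inner:k_inner + k_outer]])   # meta-objective (outer-loop) traces
    return np.stack(inner), np.stack(outer)
```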
I got this working in general, but it seems like MAML drives the weights in a direction where the output is a completely straight line at the baseline of the amplitude. (Which probably makes some sense, because from there it is rather easy to train towards new motor currents?!) (See figure below.)
The problem seems to be that when I try to fine-tune for one robot afterwards, the optimization doesn't guide the weights out of this straight-line equilibrium, so it actually doesn't work. If I add some noise to the weights it somehow gets out, but then the MAML pretraining is not preserved very well...
This graph shows the output of the autoencoder after MAML training
This graph (the green curve) shows the real motor current. MAML seems to find a straight line at the base of the amplitude, but when trying to fine-tune to this task, it doesn't get out anymore.