Something strange about the update of theta and psi in the inner loop #13
Comments
Thanks. Yes, theta is updated for each task, while psi_i is only updated for the corresponding task. Gradients are not calculated separately as you suspect: theta and psi_i are updated together in the inner loop using a single backward pass, and in the outer loop only theta is updated, using a weighted average over all tasks.
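For illustration, here is a minimal sketch of that scheme (hypothetical names such as feature_extractor, task_heads, and task_weights; not the repository's exact code):

```python
import copy
import torch
import torch.nn.functional as F

def meta_update(feature_extractor, task_heads, task_loaders, task_weights, lr=0.01):
    # Shared parameters theta before the inner loop starts.
    theta_0 = copy.deepcopy(feature_extractor.state_dict())
    adapted = []

    for i, loader in enumerate(task_loaders):
        # Inner loop: every task starts from the same theta_0.
        feature_extractor.load_state_dict(theta_0)
        params = list(feature_extractor.parameters()) + list(task_heads[i].parameters())
        opt = torch.optim.SGD(params, lr=lr)
        for x, y in loader:
            opt.zero_grad()
            loss = F.cross_entropy(task_heads[i](feature_extractor(x)), y)
            loss.backward()   # one backward pass; theta and psi_i move together
            opt.step()
        # psi_i keeps its adapted values; remember the task-adapted theta.
        adapted.append(copy.deepcopy(feature_extractor.state_dict()))

    # Outer loop: theta becomes a weighted average of the task-adapted copies.
    averaged = {k: sum(w * s[k].float() for w, s in zip(task_weights, adapted))
                for k in theta_0}
    feature_extractor.load_state_dict(averaged)
```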
Thanks, I get it. And I have another question here. When I train and test on MNIST, I find something strange about the accuracy after meta-test: the accuracy of the first task reaches almost 100 percent, while the accuracies of the second task, the third task, and so on become lower and lower. This confuses me, and I can't find the answer in the code or the paper. Can you explain it to me? Thanks.
Are you referring to this issue?
Thanks, I get it. I have found the reason for my issue. I wanted to transfer the method to a new dataset, but the number of images belonging to the different tasks is unbalanced. Because of this alone, the accuracies for the different tasks show a big gap.
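In case it helps, here is a hedged sketch of one way I might compensate for such imbalance, assuming a PyTorch dataset with a per-sample task_ids array (my own illustration, not part of this repository):

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler

def balanced_loader(dataset, task_ids, batch_size=64):
    # Inverse-frequency weight per sample: under-represented tasks
    # get sampled more often, so each task contributes roughly
    # equally per epoch.
    counts = np.bincount(task_ids)
    weights = 1.0 / counts[task_ids]
    sampler = WeightedRandomSampler(
        torch.as_tensor(weights, dtype=torch.double),
        num_samples=len(dataset),
        replacement=True,
    )
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
```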
Sorry, I have to disturb you again with another question. When I read this line of the code:

main_learner = Learner(model=model, args=args, trainloader=train_loader, testloader=test_loader, use_cuda=use_cuda)

I find that the Learner() function doesn't use the memory, yet you use the memory at meta-test. And when I read the paper, I find that you train the model on both the new task and the memory. Can you explain this to me? Also, about my first question on the update of theta and psi: when I read the code, I am still confused. Here I show the code of the outer update.
Here, p refers to the model's whole parameter set, which doesn't match what you said: "in the outer loop only theta is updated using a weighted average of all tasks."
No worries. Learner takes the dataloaders as inputs, which are generated from here; the train_loader contains data from both the new task and the memory. For the next part, the gradients are calculated only for the task's part of the fully connected layer, so only the classification parameters for that task will be updated. Line 148 in e56e72b
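For illustration, a minimal sketch of what restricting the gradient to one task's slice of a shared classifier head can look like (fc, task_id, and classes_per_task are illustrative names, not the repository's exact code):

```python
import torch
import torch.nn as nn

fc = nn.Linear(512, 100)                 # shared head over all 100 classes
task_id, classes_per_task = 3, 10
lo, hi = task_id * classes_per_task, (task_id + 1) * classes_per_task

features = torch.randn(32, 512)
targets = torch.randint(lo, hi, (32,))
loss = nn.functional.cross_entropy(fc(features), targets)
loss.backward()

# Zero the gradient everywhere except the current task's rows, so an
# optimizer step only moves that task's classification parameters.
with torch.no_grad():
    mask = torch.zeros_like(fc.weight.grad)
    mask[lo:hi] = 1.0
    fc.weight.grad *= mask
    fc.bias.grad[:lo] = 0.0
    fc.bias.grad[hi:] = 0.0
```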
And, about adding parameters: this was much faster. However, we can use the part below as well.
Hi Jathushan,
Thanks for your awesome work.
Though I have a question about your paper. On page 3, in the paragraph named "Inner loop", you state that theta is updated in the inner loop for all tasks, but psi_i is only updated for the i-th task. You mentioned earlier that you split the model into two parts, where theta corresponds to the part that produces the feature vector v, while psi corresponds to the part that produces the predictions p. But in my view, the two parts of a model form a single unit, and when we train the model, the backward pass through the two parts is simultaneous. So I am confused about how you manage the separate updates; from Algorithm 1 on page 3 and the train function in the code, I can't catch the point.
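To make my mental model concrete, here is a toy sketch (my own illustration in PyTorch, with made-up module names, not your code) of the split as I understand it:

```python
import torch
import torch.nn as nn

# theta: shared feature extractor producing v; psi: one head per task producing p.
theta = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten())   # x -> v
psi = nn.ModuleList([nn.Linear(16, 10) for _ in range(5)])     # v -> p_i

x = torch.randn(8, 1, 28, 28)
y = torch.randint(0, 10, (8,))
i = 2
loss = nn.functional.cross_entropy(psi[i](theta(x)), y)
loss.backward()   # a single backward pass populates gradients in theta and psi[i]
```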
I wonder if I am missing something again; could you explain it to me, please?
Best.
C