Something strange about the update of theta and psi in the inner loop #13

Closed
genghuanlee opened this issue Jul 20, 2020 · 6 comments

@genghuanlee

genghuanlee commented Jul 20, 2020

Hi Jathushan,

Thanks for your awesome work.
I have a question about your paper, though. On page 3, in the paragraph titled "Inner loop", you state that theta is updated in the inner loop for all tasks, but psi_i is only updated for the i-th task. You mention earlier that you split the model into two parts, where theta corresponds to the part that produces the feature vector v, and psi corresponds to the part that produces the predictions p. But in my opinion, the two parts of a model form a single unit, and when we train the model, the backward pass through the two parts happens simultaneously. So I am confused about how you manage the separate updates; from Algorithm 1 on page 3 and the train function in the code, I can't catch the point.
Am I missing something again? Could you explain it to me, please?

Best.
C

@brjathu

brjathu commented Jul 20, 2020

Thanks. Yes, theta is updated for each task, while psi_i is updated only for the corresponding task. Gradients are not calculated separately as you suspect: once theta and psi_i have been updated in the inner loop using the backward pass, only theta is updated in the outer loop, using a weighted average over all tasks.
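
For intuition, here is a minimal sketch of that update scheme (a toy illustration, not the repository code; SplitModel, meta_step, the constant alpha and the random data are made up here). theta is a shared feature extractor, each psi_i is a per-task head; the inner loop adapts theta together with the psi_i of the current task only, and the outer loop writes back only a weighted average of the adapted theta copies.

import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy model: shared feature extractor (theta) + one classifier head per task (psi_i).
class SplitModel(nn.Module):
    def __init__(self, in_dim=784, feat_dim=256, classes_per_task=2, num_tasks=5):
        super().__init__()
        self.theta = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.psi = nn.ModuleList(nn.Linear(feat_dim, classes_per_task) for _ in range(num_tasks))

    def forward(self, x, task_id):
        return self.psi[task_id](self.theta(x))

def meta_step(model, task_batches, inner_lr=0.01, alpha=0.5):
    base_theta = copy.deepcopy(model.theta.state_dict())
    adapted_thetas = []
    for task_id, (x, y) in enumerate(task_batches):
        # Inner loop: start from the shared theta, adapt theta and psi_i of this task only.
        model.theta.load_state_dict(base_theta)
        params = list(model.theta.parameters()) + list(model.psi[task_id].parameters())
        opt = torch.optim.SGD(params, lr=inner_lr)
        loss = F.cross_entropy(model(x, task_id), y)
        opt.zero_grad()
        loss.backward()
        opt.step()                                   # psi of the other tasks is untouched
        adapted_thetas.append(copy.deepcopy(model.theta.state_dict()))
    # Outer loop: only theta is updated, as a weighted average of the adapted copies.
    new_theta = {name: alpha * torch.stack([c[name] for c in adapted_thetas]).mean(0)
                       + (1 - alpha) * q
                 for name, q in base_theta.items()}
    model.theta.load_state_dict(new_theta)

# hypothetical usage: one random batch per task
batches = [(torch.randn(16, 784), torch.randint(0, 2, (16,))) for _ in range(5)]
model = SplitModel()
meta_step(model, batches)

In the repository the averaging weight is not a constant: it decays with the session index (alpha = exp(-beta * sess / num_task)), as in the snippet quoted later in this thread.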

@genghuanlee

Thanks, I get it. I have another question. When I train and test on MNIST, I find something strange about the accuracy after meta-test. The accuracy of the first task reaches almost 100 percent, while the accuracies of the second task, the third task, and so on become lower and lower. This confuses me, and I can't find the answer in the code or the paper. Can you explain it to me? Thanks.

@brjathu

brjathu commented Jul 23, 2020

Are you referring to this issue?
#10

@genghuanlee

Thanks, I get it. I have found the reason for my issue. I want to transfer the method to a new dataset, but the number of images belonging to the different tasks is unbalanced. Because of this, the accuracies of the different tasks have a big gap.

@genghuanlee

Sorry, I have to disturb you again. I have another question. When I read this code:

main_learner=Learner(model=model,args=args,trainloader=train_loader, testloader=test_loader, use_cuda=use_cuda)
main_learner.learn()
memory = inc_dataset.get_memory(memory, for_memory)
acc_task = main_learner.meta_test(main_learner.best_model, memory, inc_dataset)

Here I find that the Learner() call doesn't use the memory, while you do use the memory in meta_test. But when I read the paper, I find that when you train the model, you use both the new task and the memory. Can you explain this to me?

And about my first question on the update of theta and psi: when I read the code, I am also confused. Here is the code of the outer update.

        for i,(p,q) in enumerate(zip(model.parameters(), model_base.parameters())):
            alpha = np.exp(-self.args.beta*((1.0*self.args.sess)/self.args.num_task))
            ll = torch.stack(reptile_grads[i])
            p.data = torch.mean(ll,0)*(alpha) + (1-alpha)* q.data  

Here p runs over the model's whole set of parameters, which doesn't match what you said, 'in the outer loop only theta is updated using a weighted average over all tasks'.

@brjathu

brjathu commented Jul 28, 2020

No worries,

Learner takes the dataloaders as inputs, which are generated here:
task_info, train_loader, val_loader, test_loader, for_memory = inc_dataset.new_task(memory)

The train_loader contains data from both the new task and the memory.
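
As a rough illustration of that idea only (a sketch with made-up stand-in tensors, not the repository's dataset classes or sampler), combining the exemplar memory with the new task's data before building the loader can look like this:

import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# hypothetical stand-ins: images/labels of the current task and the stored exemplar memory
new_task_data = TensorDataset(torch.randn(200, 1, 28, 28), torch.randint(8, 10, (200,)))
memory_data = TensorDataset(torch.randn(50, 1, 28, 28), torch.randint(0, 8, (50,)))

# one loader over both, so every batch can mix new-task samples with memory samples
train_loader = DataLoader(ConcatDataset([new_task_data, memory_data]),
                          batch_size=64, shuffle=True)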

For the next part, the gradients are calculated only for part of the fully connected layer, so only the classification parameters of that task are updated:

loss = F.binary_cross_entropy_with_logits(class_pre_ce[:, ai:bi], class_tar_ce[:, ai:bi])
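
A quick way to check this (an illustrative sketch with made-up sizes and an assumed class range ai:bi, not the repository code): when the loss sees only the logit slice [:, ai:bi], the backward pass leaves zero gradient on the classifier rows outside that slice, so the other tasks' classification parameters are untouched by the optimizer step.

import torch
import torch.nn.functional as F

fc = torch.nn.Linear(256, 10)                 # assumed: 5 tasks x 2 classes, 256-dim features
x = torch.randn(4, 256)
class_tar_ce = torch.zeros(4, 10)
class_tar_ce[:, 2] = 1.0
ai, bi = 2, 4                                 # hypothetical class range of the current task
loss = F.binary_cross_entropy_with_logits(fc(x)[:, ai:bi], class_tar_ce[:, ai:bi])
loss.backward()
print(fc.weight.grad[ai:bi].abs().sum())      # non-zero: the current task's rows get gradient
print(fc.weight.grad[:ai].abs().sum(),        # zero
      fc.weight.grad[bi:].abs().sum())        # zero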

And about the weighted update over all the parameters: that was simply much faster. However, we can use the code below as well.

for i, (p, q) in enumerate(zip(model.parameters(), model_base.parameters())):
    # weight on the task-adapted parameters; decays with the session index
    alpha = np.exp(-self.args.beta * ((1.0 * self.args.sess) / self.args.num_task))
    ll = torch.stack(reptile_grads[i])
    if p.data.dim() == 2 and p.data.size(0) == 10 and p.data.size(1) == 256:
        # final classifier weight: blend only the 2-class slice of each task,
        # taking it from the copy adapted on that task
        for ik in sessions:
            p.data[2*ik[0]:2*(ik[0]+1), :] = (ll[ik[1]][2*ik[0]:2*(ik[0]+1), :] * alpha
                                              + (1 - alpha) * q.data[2*ik[0]:2*(ik[0]+1), :])
    else:
        # shared parameters: weighted average over all task-adapted copies
        p.data = torch.mean(ll, 0) * alpha + (1 - alpha) * q.data
