Something strange about the update of theta and psi in the inner loop #13

Closed
genghuanlee opened this issue Jul 20, 2020 · 6 comments

@genghuanlee

genghuanlee commented Jul 20, 2020

Hi Jathushan,

Thanks for your awesome work.
I have a question about your paper, though. On page 3, in the paragraph titled "Inner loop", you state that theta is updated in the inner loop for all tasks, but psi_i is only updated for the i-th task. You mention earlier that you split the model into two parts, where theta corresponds to the part that produces the feature vector v, and psi corresponds to the part that produces the predictions p. But in my opinion, the two parts of a model form a single unit, and when we train the model, the backward pass through the two parts happens simultaneously. So I am confused about how you manage the separate updates; from Algorithm 1 on page 3 and the train function in the code, I can't catch the point.
Am I missing something again? Could you explain it to me, please?

Best.
C

@brjathu

brjathu commented Jul 20, 2020

Thanks. Yes, theta is updated for each task, while psi_i is updated only for the corresponding task. Gradients are not calculated separately as you suspect: once theta and psi_i have been updated in the inner loop using the backward pass, only theta is updated in the outer loop, using a weighted average over all tasks.
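
For intuition, here is a minimal sketch of that update scheme (a toy illustration, not the repository code; SplitModel, meta_step, the constant alpha and the random data are made up here). theta is a shared feature extractor, each psi_i is a per-task head; the inner loop adapts theta together with the psi_i of the current task only, and the outer loop writes back only a weighted average of the adapted theta copies.

import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy model: shared feature extractor (theta) + one classifier head per task (psi_i).
class SplitModel(nn.Module):
    def __init__(self, in_dim=784, feat_dim=256, classes_per_task=2, num_tasks=5):
        super().__init__()
        self.theta = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        self.psi = nn.ModuleList(nn.Linear(feat_dim, classes_per_task) for _ in range(num_tasks))

    def forward(self, x, task_id):
        return self.psi[task_id](self.theta(x))

def meta_step(model, task_batches, inner_lr=0.01, alpha=0.5):
    base_theta = copy.deepcopy(model.theta.state_dict())
    adapted_thetas = []
    for task_id, (x, y) in enumerate(task_batches):
        # Inner loop: start from the shared theta, adapt theta and psi_i of this task only.
        model.theta.load_state_dict(base_theta)
        params = list(model.theta.parameters()) + list(model.psi[task_id].parameters())
        opt = torch.optim.SGD(params, lr=inner_lr)
        loss = F.cross_entropy(model(x, task_id), y)
        opt.zero_grad()
        loss.backward()
        opt.step()                                   # psi of the other tasks is untouched
        adapted_thetas.append(copy.deepcopy(model.theta.state_dict()))
    # Outer loop: only theta is updated, as a weighted average of the adapted copies.
    new_theta = {name: alpha * torch.stack([c[name] for c in adapted_thetas]).mean(0)
                       + (1 - alpha) * q
                 for name, q in base_theta.items()}
    model.theta.load_state_dict(new_theta)

# hypothetical usage: one random batch per task
batches = [(torch.randn(16, 784), torch.randint(0, 2, (16,))) for _ in range(5)]
model = SplitModel()
meta_step(model, batches)

In the repository the averaging weight is not a constant: it decays with the session index (alpha = exp(-beta * sess / num_task)), as in the snippet quoted later in this thread.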

@genghuanlee

Thanks, I get it. I have another question. When I train and test on MNIST, I find something strange about the accuracy after meta-test. The accuracy of the first task reaches almost 100 percent, while the accuracies of the second task, the third task, and so on become lower and lower. This confuses me, and I can't find the answer in the code or the paper. Can you explain it to me? Thanks.

@brjathu

brjathu commented Jul 23, 2020

Are you referring to this issue?
#10

@genghuanlee

Thanks, I get it. I have found the reason for my issue. I want to transfer the method to a new dataset, but the number of images belonging to the different tasks is unbalanced. Because of this, the accuracies of the different tasks have a big gap.

@genghuanlee

Sorry, I have to disturb you again. I have another question. When I read this code:

main_learner=Learner(model=model,args=args,trainloader=train_loader, testloader=test_loader, use_cuda=use_cuda)
main_learner.learn()
memory = inc_dataset.get_memory(memory, for_memory)
acc_task = main_learner.meta_test(main_learner.best_model, memory, inc_dataset)

Here I find that the Learner() call doesn't use the memory, while you do use the memory in meta_test. But when I read the paper, I find that when you train the model, you use both the new task and the memory. Can you explain this to me?

And about my first question on the update of theta and psi: when I read the code, I am also confused. Here is the code of the outer update.

        for i,(p,q) in enumerate(zip(model.parameters(), model_base.parameters())):
            alpha = np.exp(-self.args.beta*((1.0*self.args.sess)/self.args.num_task))
            ll = torch.stack(reptile_grads[i])
            p.data = torch.mean(ll,0)*(alpha) + (1-alpha)* q.data  

Here p runs over the model's whole set of parameters, which doesn't match what you said, 'in the outer loop only theta is updated using a weighted average over all tasks'.

@brjathu

brjathu commented Jul 28, 2020

No worries,

Learner takes the dataloaders as inputs, which are generated here:
task_info, train_loader, val_loader, test_loader, for_memory = inc_dataset.new_task(memory)

The train_loader contains data from both the new task and the memory.
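
As a rough illustration of that idea only (a sketch with made-up stand-in tensors, not the repository's dataset classes or sampler), combining the exemplar memory with the new task's data before building the loader can look like this:

import torch
from torch.utils.data import ConcatDataset, DataLoader, TensorDataset

# hypothetical stand-ins: images/labels of the current task and the stored exemplar memory
new_task_data = TensorDataset(torch.randn(200, 1, 28, 28), torch.randint(8, 10, (200,)))
memory_data = TensorDataset(torch.randn(50, 1, 28, 28), torch.randint(0, 8, (50,)))

# one loader over both, so every batch can mix new-task samples with memory samples
train_loader = DataLoader(ConcatDataset([new_task_data, memory_data]),
                          batch_size=64, shuffle=True)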

For the next part, the gradients are calculated only for part of the fully connected layer, so only the classification parameters of that task are updated:

loss = F.binary_cross_entropy_with_logits(class_pre_ce[:, ai:bi], class_tar_ce[:, ai:bi])
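
A quick way to check this (an illustrative sketch with made-up sizes and an assumed class range ai:bi, not the repository code): when the loss sees only the logit slice [:, ai:bi], the backward pass leaves zero gradient on the classifier rows outside that slice, so the other tasks' classification parameters are untouched by the optimizer step.

import torch
import torch.nn.functional as F

fc = torch.nn.Linear(256, 10)                 # assumed: 5 tasks x 2 classes, 256-dim features
x = torch.randn(4, 256)
class_tar_ce = torch.zeros(4, 10)
class_tar_ce[:, 2] = 1.0
ai, bi = 2, 4                                 # hypothetical class range of the current task
loss = F.binary_cross_entropy_with_logits(fc(x)[:, ai:bi], class_tar_ce[:, ai:bi])
loss.backward()
print(fc.weight.grad[ai:bi].abs().sum())      # non-zero: the current task's rows get gradient
print(fc.weight.grad[:ai].abs().sum(),        # zero
      fc.weight.grad[bi:].abs().sum())        # zero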

And about the weighted update over all the parameters: that was simply much faster. However, we can use the code below as well.

for i, (p, q) in enumerate(zip(model.parameters(), model_base.parameters())):
    # weight on the task-adapted parameters; decays with the session index
    alpha = np.exp(-self.args.beta * ((1.0 * self.args.sess) / self.args.num_task))
    ll = torch.stack(reptile_grads[i])
    if p.data.dim() == 2 and p.data.size(0) == 10 and p.data.size(1) == 256:
        # final classifier weight: blend only the 2-class slice of each task,
        # taking it from the copy adapted on that task
        for ik in sessions:
            p.data[2*ik[0]:2*(ik[0]+1), :] = (ll[ik[1]][2*ik[0]:2*(ik[0]+1), :] * alpha
                                              + (1 - alpha) * q.data[2*ik[0]:2*(ik[0]+1), :])
    else:
        # shared parameters: weighted average over all task-adapted copies
        p.data = torch.mean(ll, 0) * alpha + (1 - alpha) * q.data
