
What does 'importance-weighted' mean in the paper? #4

Closed

miss-rain opened this issue Oct 28, 2021 · 8 comments

Comments

@miss-rain

Nice work, but I have a small question.

What does 'importance-weighted' mean in the paper?

In the code:

### def data_weighting() in default.py
I find that self.dw_k is [1, 1, 1, ..., 1].

### class AlwaysBeDreaming(DeepInversionGenBN) in datafree.py
The loss is loss_kd = self.kd_criterion(logits_KD, logits_KD_past).sum(dim=1) * dw_KD, and dw_KD is also [1, 1, ..., 1].

So the weighting has no effect, and I don't understand what 'importance-weighted' or 'data weighting' refers to.

@jamessealesmith
Collaborator

jamessealesmith commented Oct 28, 2021

Thank you for the message :)

What you are referring to is the "data-weighting" vector, which weights the contribution of each data point in the loss function. This is important for class balancing in some methods, where examples in the coreset are weighted with higher importance.
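
As a minimal sketch of how such a per-class data-weighting vector enters the loss (the names and values below are illustrative, not this repo's exact code):

```python
import torch
import torch.nn.functional as F

# Toy setup (illustrative sizes, not the repo's configuration).
num_classes, batch_size = 10, 8
logits = torch.randn(batch_size, num_classes)
targets = torch.randint(0, num_classes, (batch_size,))

# One weight per class. The default is all ones, i.e. every data point
# contributes equally (no class balancing).
dw_k = torch.ones(num_classes)
# A class-balancing variant would up-weight coreset classes, e.g.:
# dw_k[:num_old_classes] = n_new_examples / n_coreset_examples

per_example = F.cross_entropy(logits, targets, reduction='none')  # shape [batch_size]
loss = (per_example * dw_k[targets]).mean()
```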

The "importance-weighted" distillation for our method is done with a frozen copy of the linear layer from the most recent task model. This weights the contribution of each feature in the embedding space. This is done in lines 340 and 341 in datafree.py.

Does this answer your question?

@miss-rain
Author

Thanks for the reply!

The "data-weighting" vector for the CIFAR-100 dataset is [1, 1, 1, ..., 1]; does that mean no weighting is applied to each data point? (in default.py)

About "importance-weighted" in lines 340 and 341 of datafree.py: the old task's features are fed into the current task's classifier (output is logits_KD_past), and the most recent task's features are also fed into the current task's classifier (output is logits_KD). Then logits_KD and logits_KD_past are compared with an L2 loss.

Do the weights mean logits_pen[kd_index] and input[kd_index] in lines 340 and 341 of datafree.py?

Thanks again!

@jamessealesmith
Collaborator

Yes, the default [1,1,1...1] simply means no class-balancing weighting. It is there for implementation consistency with the methods that have non-ones in this vector.

The comment about 340 and 341 is nearly correct. logits_KD_past is the old task's features passed through the old task's linear classification head. logits_KD is the new task's features passed through the old task's linear classification head. The intuition here is that we focus on a feature distillation, but we want to penalize changes to features which were most important to the previous task. We can weight the importance of these features by using the linear head!

"self.previous_linear" is the weighting vector. Logits_pen is the penultimate feature representation from the current task's model. self.previous_teacher.generate_scores_pen(x) generates the penultimate feature representation from the previous task's model.

@miss-rain
Author

miss-rain commented Oct 29, 2021

Great, nice idea!

In lines 338 and 340 of datafree.py:

Line 338:
kd_index = np.arange(2 * self.batch_size)
(batch_size = 128)
so kd_index is [0, 1, 2, ..., 255]

Line 340:
logits_pen = self.model.forward(x=inputs, pen=True)
so logits_pen is a feature tensor of shape [batch_size=128, dimension=64]

I don't understand what logits_pen[kd_index] means. If logits_pen is the penultimate feature representation, why is logits_pen[kd_index] needed?

Thanks again.

@jamessealesmith
Collaborator

Thanks! logits_pen is the penultimate representation because the pen=True flag returns the penultimate features.

kd_index selects the data which we are using for distillation. For our method, we use all data; for other methods (such as DGR), we would only perform distillation over the synthetic/generated data. This is simply for syntax consistency with other implemented methods in our framework :)

If you only wanted to use real data for distillation, you would set kd_index = np.arange(self.batch_size). If you only wanted to use the synthetic data, you would set kd_index = np.arange(self.batch_size, 2 * self.batch_size).
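
For example, assuming the combined batch is laid out as real samples followed by synthetic samples (an assumption consistent with the indexing above):

```python
import numpy as np

batch_size = 128
# Assumed layout of the combined batch:
#   rows [0, batch_size)             -> real samples
#   rows [batch_size, 2*batch_size)  -> synthetic/generated samples

kd_index_all = np.arange(2 * batch_size)                # distill over everything
kd_index_real = np.arange(batch_size)                   # real data only
kd_index_syn = np.arange(batch_size, 2 * batch_size)    # synthetic data only

# The chosen indices then select the rows used for distillation, e.g.:
# logits_pen_kd = logits_pen[kd_index_all]
```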

@miss-rain
Author

miss-rain commented Oct 29, 2021

Thanks for your reply!

logits_pen = self.model.forward(x=inputs, pen=True)

I noticed the pen=True parameter before, but I can't find where it is implemented in the code.

In my opinion, the model is composed of a feature extractor and a classifier. I think the penultimate features are the feature extractor's output, and the last layer is the classifier.

By the way: if I only wanted to use real data for distillation and set kd_index = np.arange(self.batch_size), would that be exactly the same as plain feature distillation (not importance-weighted feature distillation)?

@jamessealesmith
Collaborator

Thanks! :)

You can find the pen=True part of the code in the model file! :) Look in models/resnet.py
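
As a minimal sketch of what such a pen flag usually looks like in a model's forward (illustrative only; the actual models/resnet.py may differ):

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self, feat_dim=64, num_classes=100):
        super().__init__()
        # "Feature extractor" stand-in for the ResNet trunk.
        self.features = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 32 * 32, feat_dim), nn.ReLU())
        self.last = nn.Linear(feat_dim, num_classes)  # classification head

    def forward(self, x, pen=False):
        out = self.features(x)   # penultimate features
        if pen:
            return out           # pen=True: stop before the last layer
        return self.last(out)    # pen=False: full class logits

feats = TinyNet()(torch.randn(4, 3, 32, 32), pen=True)  # shape [4, 64]
```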

I am not sure what your last question is asking. The importance weighting happens in a different part of the code, where the penultimate features are passed through self.previous_linear.

@miss-rain
Author

Thanks! :)
Great work!
