
What does 'importance-weighted' mean in the paper? #4

Closed

miss-rain opened this issue Oct 28, 2021 · 8 comments

Comments

@miss-rain

Nice work, but I have a small question.

What does 'importance-weighted' mean in the paper?

In the code:

### def data_weighting() in default.py
I find that self.dw_k is [1, 1, 1, ..., 1].

### class AlwaysBeDreaming(DeepInversionGenBN) in datafree.py
The loss is loss_kd = self.kd_criterion(logits_KD, logits_KD_past).sum(dim=1) * dw_KD, and dw_KD is also [1, 1, ..., 1].

So the weighting has no effect, and I don't understand what 'importance-weighted' or 'data weighting' refers to.

@jamessealesmith
Collaborator

jamessealesmith commented Oct 28, 2021

Thank you for the message :)

What you are referring to is the "data-weighting" vector, which weights the contribution of each data point in the loss function. This is important for class balancing in some methods, where examples in the coreset are weighted with higher importance.
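
As a minimal sketch of how such a per-class data-weighting vector enters the loss (the names and values below are illustrative, not this repo's exact code):

```python
import torch
import torch.nn.functional as F

# Toy setup (illustrative sizes, not the repo's configuration).
num_classes, batch_size = 10, 8
logits = torch.randn(batch_size, num_classes)
targets = torch.randint(0, num_classes, (batch_size,))

# One weight per class. The default is all ones, i.e. every data point
# contributes equally (no class balancing).
dw_k = torch.ones(num_classes)
# A class-balancing variant would up-weight coreset classes, e.g.:
# dw_k[:num_old_classes] = n_new_examples / n_coreset_examples

per_example = F.cross_entropy(logits, targets, reduction='none')  # shape [batch_size]
loss = (per_example * dw_k[targets]).mean()
```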

The "importance-weighted" distillation for our method is done with a frozen copy of the linear layer from the most recent task model. This weights the contribution of each feature in the embedding space. This is done in lines 340 and 341 in datafree.py.

Does this answer your question?

@miss-rain
Author

Thanks for the reply!

The "data-weighting" vector for the CIFAR-100 dataset is [1, 1, 1, ..., 1]; does that mean no weighting is applied to each data point? (in default.py)

About "importance-weighted" in lines 340 and 341 of datafree.py: the old task's features are fed into the current task's classifier (output is logits_KD_past), and the most recent task's features are also fed into the current task's classifier (output is logits_KD). Then logits_KD and logits_KD_past are compared with an L2 loss.

Do the weights mean logits_pen[kd_index] and input[kd_index] in lines 340 and 341 of datafree.py?

Thanks again!

@jamessealesmith
Collaborator

Yes, the default [1,1,1...1] simply means no class-balancing weighting. It is there for implementation consistency with the methods that have non-ones in this vector.

The comment about 340 and 341 is nearly correct. logits_KD_past is the old task's features passed through the old task's linear classification head. logits_KD is the new task's features passed through the old task's linear classification head. The intuition here is that we focus on a feature distillation, but we want to penalize changes to features which were most important to the previous task. We can weight the importance of these features by using the linear head!

"self.previous_linear" is the weighting vector. Logits_pen is the penultimate feature representation from the current task's model. self.previous_teacher.generate_scores_pen(x) generates the penultimate feature representation from the previous task's model.

@miss-rain
Author

miss-rain commented Oct 29, 2021

Great, nice idea!

In lines 338 and 340 of datafree.py:

Line 338:
kd_index = np.arange(2 * self.batch_size)
(batch_size = 128)
so kd_index is [0, 1, 2, ..., 255]

Line 340:
logits_pen = self.model.forward(x=inputs, pen=True)
so logits_pen is a feature tensor of shape [batch_size=128, dimension=64]

I don't understand what logits_pen[kd_index] means. If logits_pen is the penultimate feature representation, why is logits_pen[kd_index] needed?

Thanks again.

@jamessealesmith
Collaborator

Thanks! logits_pen is the penultimate representation because the pen=True flag returns the penultimate features.

kd_index selects the data which we are using for distillation. For our method, we use all data; for other methods (such as DGR), we would only perform distillation over the synthetic/generated data. This is simply for syntax consistency with other implemented methods in our framework :)

If you only wanted to use real data for distillation, you would set kd_index = np.arange(self.batch_size). If you only wanted to use the synthetic data, you would set kd_index = np.arange(self.batch_size, 2 * self.batch_size).
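
For example, assuming the combined batch is laid out as real samples followed by synthetic samples (an assumption consistent with the indexing above):

```python
import numpy as np

batch_size = 128
# Assumed layout of the combined batch:
#   rows [0, batch_size)             -> real samples
#   rows [batch_size, 2*batch_size)  -> synthetic/generated samples

kd_index_all = np.arange(2 * batch_size)                # distill over everything
kd_index_real = np.arange(batch_size)                   # real data only
kd_index_syn = np.arange(batch_size, 2 * batch_size)    # synthetic data only

# The chosen indices then select the rows used for distillation, e.g.:
# logits_pen_kd = logits_pen[kd_index_all]
```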

@miss-rain
Author

miss-rain commented Oct 29, 2021

Thanks for your reply!

logits_pen = self.model.forward(x=inputs, pen=True)

I noticed the pen=True parameter before, but I can't find where it is implemented in the code.

In my opinion, the model is composed of a feature extractor and a classifier. I think the penultimate features are the feature extractor's output, and the last layer is the classifier.

By the way: if I only wanted to use real data for distillation and set kd_index = np.arange(self.batch_size), would that be exactly the same as plain feature distillation (not importance-weighted feature distillation)?

@jamessealesmith
Collaborator

Thanks! :)

You can find the pen=True part of the code in the model file! :) Look in models/resnet.py
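
As a minimal sketch of what such a pen flag usually looks like in a model's forward (illustrative only; the actual models/resnet.py may differ):

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):
    def __init__(self, feat_dim=64, num_classes=100):
        super().__init__()
        # "Feature extractor" stand-in for the ResNet trunk.
        self.features = nn.Sequential(
            nn.Flatten(), nn.Linear(3 * 32 * 32, feat_dim), nn.ReLU())
        self.last = nn.Linear(feat_dim, num_classes)  # classification head

    def forward(self, x, pen=False):
        out = self.features(x)   # penultimate features
        if pen:
            return out           # pen=True: stop before the last layer
        return self.last(out)    # pen=False: full class logits

feats = TinyNet()(torch.randn(4, 3, 32, 32), pen=True)  # shape [4, 64]
```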

I am not sure what your last question is asking. The importance weighting happens in a different part of the code, where the penultimate features are passed through self.previous_linear.

@miss-rain
Author

Thanks! :)
Great work!
