
Some questions about implementation details #13

Open

SUDA-HLT-ywfang opened this issue Oct 30, 2023 · 3 comments

Comments
@SUDA-HLT-ywfang

Hi, dejavu is really fascinating! Thanks a lot for releasing the corresponding code.

I have some questions about implementation details.

  1. In Section 3.1, how do you measure the sparsity in every layer? If there is a threshold, what is it set to? (See the sketch after this list for what I currently mean by "sparsity".)
  2. When you train the sparse predictors, it seems that you only care about the recall of the classifiers, rather than precision or F1 score. Why is that?
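To make question 1 concrete, here is what I currently mean by per-layer sparsity. This is my own assumption, not necessarily how the paper computes it, and the threshold `tau` is exactly what I am asking about:

```python
# My assumption of "per-layer sparsity": the fraction of FFN hidden activations
# whose magnitude falls below some threshold tau for a given input.
import torch

def layer_sparsity(hidden_acts: torch.Tensor, tau: float = 0.0) -> float:
    """hidden_acts: (batch, seq, d_ffn) activations after the FFN nonlinearity."""
    return (hidden_acts.abs() <= tau).float().mean().item()

# With ReLU, tau = 0 counts exactly-zero neurons; with GeLU one would need a
# small tau > 0, which is why I am asking what threshold was used.
acts = torch.relu(torch.randn(2, 8, 1024))
print(layer_sparsity(acts, tau=0.0))
```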

Thank you! Hope to hear from you!

@AmazeQiu

I have the same question.

@XieWeikai

I also wonder how question 1 is done.

I think it is reasonable that the authors only care about recall. The activated neurons contribute most of the FFN output, while the non-activated neurons are far less important, so we want to find all activated neurons to preserve model accuracy. Neurons that are not activated but are predicted as activated do not hurt the results (they only waste some computation), whereas activated neurons that are predicted as not activated have a significant impact on the output. That is why recall is used; the sketch below illustrates the asymmetry.
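A minimal sketch of that asymmetry (my own toy example, not the dejavu code; all shapes and names are illustrative): with a ReLU FFN, keeping a neuron whose activation is zero changes nothing, while dropping a neuron whose activation is nonzero changes the output.

```python
import torch

d_model, d_ffn = 256, 1024
W1 = torch.randn(d_model, d_ffn)
W2 = torch.randn(d_ffn, d_model)
x = torch.randn(1, d_model)

h = torch.relu(x @ W1)   # (1, d_ffn); many entries are exactly zero
active = h > 0           # ground-truth activated neurons

# False positive: additionally keep one neuron whose activation is zero.
fp_mask = active.clone()
fp_mask[0, (~active[0]).nonzero()[0]] = True
out_fp = (h * fp_mask) @ W2

# False negative: drop one neuron whose activation is nonzero.
fn_mask = active.clone()
fn_mask[0, active[0].nonzero()[0]] = False
out_fn = (h * fn_mask) @ W2

out_full = h @ W2
print((out_full - out_fp).abs().max())  # ~0: a false positive does not change the output
print((out_full - out_fn).abs().max())  # > 0: a false negative does change the output
```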

@MaTwickenham

Hi guys, I would like to ask whether the term 'activating neurons' in the FFN in the paper refers to a row or a column of parameters in a linear layer. For example, if a neural network has only one linear layer with weight shape (256, 512), input x of shape (1, 256), and output of shape (1, 512), then for predicting neuron activation, should the MLP predictor take x as input and output a (1, 256) or a (1, 512) tensor as the activation_mask indicating which rows/columns of the weights are activated? I don't know if I understand it correctly; my current reading is sketched below.
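For reference, here is my current understanding in code (my own assumption, not the authors' implementation): a "neuron" is one hidden unit of the FFN, i.e. one column of the first weight matrix and the matching row of the second, so the predictor's mask has one entry per hidden unit, (1, 512) in the example above.

```python
import torch

d_model, d_ffn = 256, 512
x = torch.randn(1, d_model)

W1 = torch.randn(d_model, d_ffn)   # first FFN projection (the (256, 512) layer)
W2 = torch.randn(d_ffn, d_model)   # second FFN projection

# Hypothetical predictor output: one keep/drop decision per hidden neuron.
activation_mask = torch.rand(1, d_ffn) > 0.8         # shape (1, 512)

# Dense FFN for reference.
out_dense = torch.relu(x @ W1) @ W2                   # (1, 256)

# Sparse FFN: only compute the selected neurons, i.e. gather the selected
# columns of W1 and the matching rows of W2.
idx = activation_mask[0].nonzero(as_tuple=True)[0]    # indices of kept neurons
out_sparse = torch.relu(x @ W1[:, idx]) @ W2[idx, :]  # (1, 256)
```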
