The code in APNet/modeling/baseline.py does not match the description in Figure 2 #4
Thanks for the good question. Both variants form a hierarchical pyramid and can achieve strong performance. In the experiments we chose, as one implementation, to first split and then apply whole attention. Please follow the code.
Because when you use attention, you will lose a lot of information.
I don't think attention is the culprit that causes the loss of information. Instead, well-trained attention should guide the model to find clues. As I recall, the early experiments showed similar performance for the two variants of the level-2 APNet (one with whole attention and one with split attention).
thank you!
Besides, I think it would be very interesting to explore the order of fine-grained attention versus whole attention and the corresponding effects, especially with more layers.
You are welcome to share your findings with us.
First of all, thank you very much for your outstanding contribution!
In the forward of APNet/modeling/baseline.py, I found that the code first splits the features into two parts for training and only afterwards applies an SE attention. This does not match the description in Figure 2 of the paper (first train on the whole features horizontally, then split into 2 parts for training, then split into 4 parts for training). Which one prevails?
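To make the two orderings under discussion concrete, here is a minimal, purely illustrative sketch. It is not the actual APNet/baseline.py code: the `channel_attention` gate below is a hand-rolled stand-in for a learned SE module, and features are plain 1-D lists rather than tensors. It only shows how "split then attend" (the released code) and "attend then split" (Figure 2) differ as pipelines while yielding the same pyramid shape.

```python
import math

def channel_attention(feat):
    # Toy SE-style gate: scale each channel by a sigmoid of its deviation
    # from the mean. A stand-in for a learned squeeze-and-excitation block.
    mean = sum(feat) / len(feat)
    return [x * (1.0 / (1.0 + math.exp(-(x - mean)))) for x in feat]

def split(feat, parts):
    # Split a 1-D feature horizontally into equal stripes.
    n = len(feat) // parts
    return [feat[i * n:(i + 1) * n] for i in range(parts)]

def split_then_attend(feat, parts=2):
    # Ordering in the released code (per this issue): split first,
    # then apply attention to each stripe independently.
    return [channel_attention(p) for p in split(feat, parts)]

def attend_then_split(feat, parts=2):
    # Ordering described in Figure 2: attend over the whole feature
    # first, then split the result into stripes.
    return split(channel_attention(feat), parts)

feat = [0.5, 1.5, -0.5, 2.0]
a = split_then_attend(feat)
b = attend_then_split(feat)
# Both produce the same pyramid shape (2 stripes of 2 channels),
# but the values differ because the attention statistics are computed
# over different supports (per-stripe vs. whole feature).
```

The point of the toy example is that the two orderings are structurally interchangeable (same output shapes), so the choice between them is an empirical one, which matches the maintainer's report that both performed similarly at level 2.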