
The code in APNet/modeling/baseline.py does not match the description in Figure 2 #4

Closed
Mr-Da-Yang opened this issue Dec 4, 2021 · 5 comments

Comments

@Mr-Da-Yang

First of all, thank you very much for your outstanding contribution!
In the forward method of APNet/modeling/baseline.py, I found that the code first splits the feature into two parts for training and only applies an SE attention at the end. This does not match the description in Figure 2 of the paper (first train on the whole feature horizontally, then cut it into 2 parts for training, then cut it into 4 parts for training). Which one should prevail?

@CHENGY12
Owner

CHENGY12 commented Dec 4, 2021

Thanks for the good question.

Both of them take the form of a hierarchical pyramid and can achieve great performance. We chose "first split, then whole attention" as one implementation in the experiments. Please follow the code.
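
To make the two orderings concrete, here is a minimal PyTorch sketch of "split then whole attention" versus "whole then split attention". The SE block and the function names are hypothetical illustrations, not the actual code in APNet/modeling/baseline.py; it is only meant to show the ordering discussed in this thread.

```python
import torch
import torch.nn as nn

class SEAttention(nn.Module):
    """Generic squeeze-and-excitation channel attention (not the repo's exact block)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))                 # squeeze: global average pooling -> (B, C)
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                           # excite: reweight channels


def split_then_whole(x, part_att, whole_att):
    """Ordering used in the released code, per this thread:
    attend to the two horizontal halves first, then attend to the whole map."""
    top, bottom = x.chunk(2, dim=2)            # split along the height axis
    x = torch.cat([part_att(top), part_att(bottom)], dim=2)
    return whole_att(x)


def whole_then_split(x, part_att, whole_att):
    """Ordering drawn in Figure 2 of the paper:
    whole-body attention first, then attention on each horizontal half."""
    x = whole_att(x)
    top, bottom = x.chunk(2, dim=2)
    return torch.cat([part_att(top), part_att(bottom)], dim=2)


if __name__ == "__main__":
    feat = torch.randn(8, 2048, 24, 8)         # e.g. a ResNet-50 feature map
    whole_att, part_att = SEAttention(2048), SEAttention(2048)
    print(split_then_whole(feat, part_att, whole_att).shape)   # torch.Size([8, 2048, 24, 8])
    print(whole_then_split(feat, part_att, whole_att).shape)   # torch.Size([8, 2048, 24, 8])
```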

@CHENGY12 CHENGY12 closed this as completed Dec 4, 2021
@Mr-Da-Yang
Author

Because when attention is applied, a lot of information may be lost.
If you use whole attention first and then split attention, the effect may not be as good as splitting first and then using whole attention, but I haven't verified this with experiments. This is my idea; what do you think?


@CHENGY12
Owner

CHENGY12 commented Dec 4, 2021

I don't think attention is a bad guy that causes the loss of information. Instead, well-trained attention should guide the model to find clues. As far as I remember, the early experiments may have shown similar performance for the two orderings in the level-2 APNet (one whole attention and one split attention).
Besides, I think it is very interesting to explore the order of fine-grained attention and whole attention and the corresponding effects, especially with more layers.
You are welcome to share your findings with us.

@Mr-Da-Yang
Author

Mr-Da-Yang commented Dec 4, 2021 via email
