
The code in APNet/modeling/baseline.py does not match the description in Figure 2 #4

Closed
Mr-Da-Yang opened this issue Dec 4, 2021 · 5 comments

Comments

@Mr-Da-Yang

First of all, thank you very much for your outstanding contribution!
In the forward method of APNet/modeling/baseline.py, I found that the code first splits the feature into two parts for training and only applies an SE attention at the end. This does not match the description in Figure 2 of the paper (first train on the whole feature horizontally, then cut it into 2 parts for training, then cut it into 4 parts for training). Which one should prevail?

@CHENGY12
Owner

CHENGY12 commented Dec 4, 2021

Thanks for the good question.

Both of them take the form of a hierarchical pyramid and can achieve great performance. We chose "first split, then whole attention" as one implementation in the experiments. Please follow the code.
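
To make the two orderings concrete, here is a minimal PyTorch sketch of "split then whole attention" versus "whole then split attention". The SE block and the function names are hypothetical illustrations, not the actual code in APNet/modeling/baseline.py; it is only meant to show the ordering discussed in this thread.

```python
import torch
import torch.nn as nn

class SEAttention(nn.Module):
    """Generic squeeze-and-excitation channel attention (not the repo's exact block)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):                      # x: (B, C, H, W)
        w = x.mean(dim=(2, 3))                 # squeeze: global average pooling -> (B, C)
        w = self.fc(w).unsqueeze(-1).unsqueeze(-1)
        return x * w                           # excite: reweight channels


def split_then_whole(x, part_att, whole_att):
    """Ordering used in the released code, per this thread:
    attend to the two horizontal halves first, then attend to the whole map."""
    top, bottom = x.chunk(2, dim=2)            # split along the height axis
    x = torch.cat([part_att(top), part_att(bottom)], dim=2)
    return whole_att(x)


def whole_then_split(x, part_att, whole_att):
    """Ordering drawn in Figure 2 of the paper:
    whole-body attention first, then attention on each horizontal half."""
    x = whole_att(x)
    top, bottom = x.chunk(2, dim=2)
    return torch.cat([part_att(top), part_att(bottom)], dim=2)


if __name__ == "__main__":
    feat = torch.randn(8, 2048, 24, 8)         # e.g. a ResNet-50 feature map
    whole_att, part_att = SEAttention(2048), SEAttention(2048)
    print(split_then_whole(feat, part_att, whole_att).shape)   # torch.Size([8, 2048, 24, 8])
    print(whole_then_split(feat, part_att, whole_att).shape)   # torch.Size([8, 2048, 24, 8])
```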

@CHENGY12 CHENGY12 closed this as completed Dec 4, 2021
@Mr-Da-Yang
Author

Because when attention is applied, a lot of information may be lost.
If you use whole attention first and then split attention, the effect may not be as good as splitting first and then using whole attention, but I haven't verified this with experiments. This is my idea; what do you think?


@CHENGY12
Owner

CHENGY12 commented Dec 4, 2021

I don't think attention is a bad guy that causes the loss of information. Instead, well-trained attention should guide the model to find clues. As far as I remember, the early experiments may have shown similar performance for the two orderings in the level-2 APNet (one whole attention and one split attention).
Besides, I think it is very interesting to explore the order of fine-grained attention and whole attention and the corresponding effects, especially with more layers.
You are welcome to share your findings with us.

@Mr-Da-Yang
Author

Mr-Da-Yang commented Dec 4, 2021 via email
