
About axial attention module gamma #19

Closed
SimonServant opened this issue Jul 25, 2022 · 5 comments

Comments

@SimonServant

Hello good sir,

I am currently using your architecture as the basis for my thesis work, and while training the model I had a question about the axial attention module. You define a gamma value this way: self.gamma = nn.Parameter(torch.zeros(1)).

Afterwards you multiply the result of the axial attention by this gamma value, self.gamma * out, and then apply the residual connection with x: out = self.gamma * out + x.

Doesn't this mean the axial attention is not used at all, since its output is scaled to zero and only the residual path remains?

Is this a different version of the code, or am I missing something?

I would be very thankful if you could elaborate on this.

@SimonServant
Author

That would mean the performance increase comes solely from the CFP module. Additionally, when splitting the training dataset into 80% training and 20% validation, as mentioned in the paper, I was sadly unable to reproduce the reported performance (still very good, but not the same as in your paper). I trained for around 200 epochs before stopping, with a batch size of 6 and without augmentation. On the Kvasir dataset I reached a mean Dice of 0.891, while achieving a higher IoU than the values reported in your paper. Do you have a hint on why I am having a hard time reproducing the performance, or is that simply down to the batch size?

@AngeLouCN
Owner

Hi, thank you for your interest.

For the gamma value, self.gamma = nn.Parameter(torch.zeros(1)) makes gamma a trainable parameter whose initial value is 0, so it is learned during training rather than staying at zero. You can also print gamma during training to check.

For the performance, I think setting the batch size to 8 will help. You can also download the prepared dataset from our link or the PraNet link to make sure we use the same training and testing data.
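A minimal sketch of the mechanism described above, assuming a generic attention body (the class name and the 1×1 convolution stand-in are illustrative, not the repository's actual axial attention module):

```python
import torch
import torch.nn as nn

class GatedResidualAttention(nn.Module):
    """Sketch of the learnable-gamma residual pattern discussed in this issue."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Conv2d(channels, channels, kernel_size=1)  # stand-in for axial attention
        # gamma starts at 0, so the block initially acts as an identity mapping,
        # but gamma receives gradients and is updated during training.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        out = self.attn(x)
        return self.gamma * out + x  # scaled attention output plus residual

block = GatedResidualAttention(16)
x = torch.randn(2, 16, 32, 32)
block(x).sum().backward()
print(block.gamma, block.gamma.grad)  # grad is non-zero even though gamma starts at 0
```

So the attention path contributes nothing at initialization, but as soon as training starts gamma moves away from zero and the attention output is blended in.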

@SimonServant
Author

SimonServant commented Jul 25, 2022

Thank you so much, I was already panicking. I will try out batch size 8.

Regarding the dataset: in your paper it is written that you randomly split 80% of the images from Kvasir and CVC-ClinicDB as the training set and use the remainder as the testing set. I got slightly confused because a validation set is never mentioned, and the way the test function is used in the code, it effectively acts as a validation function. I therefore tried combining the Kvasir test set and the CVC-ClinicDB test set to see whether together they represent 20% of (training + test_Kvasir + test_CVC-ClinicDB). Since this was not the case, I assumed the cause (of the dataset numbers not adding up) was that you split the training dataset again with an 80/20 train/val split. Did I read that correctly, or was the test dataset used both for validation and for testing later on?
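For concreteness, here is how I picture the random 80/20 split described in the paper (a rough sketch; the helper and file names are hypothetical, not the repository's preprocessing code):

```python
import random

def split_dataset(image_names, train_ratio=0.8, seed=0):
    """Shuffle the image list and split it into (train, test) portions."""
    names = list(image_names)
    random.Random(seed).shuffle(names)
    cut = int(len(names) * train_ratio)
    return names[:cut], names[cut:]

# Example with placeholder file names: an 80/20 split of 100 images.
train_split, test_split = split_dataset([f"img_{i:04d}.png" for i in range(100)])
print(len(train_split), len(test_split))  # 80 20
```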

@AngeLouCN
Owner

Hi, we directly use the dataset prepared by PraNet, which has already been randomly split. We follow the code and dataset provided by PraNet, which uses the test set as the validation set.

@SimonServant
Author

Thank you very much!!!
