
About axial attention module gamma #19

Closed
SimonServant opened this issue Jul 25, 2022 · 5 comments

Comments

@SimonServant

Hello good sir,

I am currently using your architecture as the basis for my thesis work, and while training the model I had a question about the axial attention module. You define a gamma value this way: self.gamma = nn.Parameter(torch.zeros(1)).

Afterwards you multiply the result of the axial attention by this gamma value, self.gamma * out, and then apply the residual connection with x: out = self.gamma * out + x.

Doesn't this mean the axial attention is not used at all, since its output is scaled to zero and only the residual path remains?

Is this a different version of the code, or am I missing something?

I would be very thankful if you could elaborate on this.

@SimonServant
Author

That would mean the performance increase comes solely from the CFP module. Additionally, when splitting the training dataset into 80% training and 20% validation, as mentioned in the paper, I was sadly unable to reproduce the reported performance (still very good, but not the same as in your paper). I trained for around 200 epochs before stopping, with a batch size of 6 and without augmentation. On the Kvasir dataset I reached a mean Dice of 0.891, while achieving a higher IoU than the values reported in your paper. Do you have a hint on why I am having a hard time reproducing the performance, or is that simply down to the batch size?

@AngeLouCN
Owner

Hi, thank you for your interest.

For the gamma value, self.gamma = nn.Parameter(torch.zeros(1)) makes gamma a trainable parameter whose initial value is 0, so it is learned during training rather than staying at zero. You can also print gamma during training to check.

For the performance, I think setting the batch size to 8 will help. You can also download the prepared dataset from our link or the PraNet link to make sure we use the same training and testing data.
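A minimal sketch of the mechanism described above, assuming a generic attention body (the class name and the 1×1 convolution stand-in are illustrative, not the repository's actual axial attention module):

```python
import torch
import torch.nn as nn

class GatedResidualAttention(nn.Module):
    """Sketch of the learnable-gamma residual pattern discussed in this issue."""
    def __init__(self, channels):
        super().__init__()
        self.attn = nn.Conv2d(channels, channels, kernel_size=1)  # stand-in for axial attention
        # gamma starts at 0, so the block initially acts as an identity mapping,
        # but gamma receives gradients and is updated during training.
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        out = self.attn(x)
        return self.gamma * out + x  # scaled attention output plus residual

block = GatedResidualAttention(16)
x = torch.randn(2, 16, 32, 32)
block(x).sum().backward()
print(block.gamma, block.gamma.grad)  # grad is non-zero even though gamma starts at 0
```

So the attention path contributes nothing at initialization, but as soon as training starts gamma moves away from zero and the attention output is blended in.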

@SimonServant
Author

SimonServant commented Jul 25, 2022

Thank you so much, I was already panicking. I will try out batch size 8.

Regarding the dataset: in your paper it is written that you randomly split 80% of the images from Kvasir and CVC-ClinicDB as the training set and use the remainder as the testing set. I got slightly confused because a validation set is never mentioned, and the way the test function is used in the code, it effectively acts as a validation function. I therefore tried combining the Kvasir test set and the CVC-ClinicDB test set to see whether together they represent 20% of (training + test_Kvasir + test_CVC-ClinicDB). Since this was not the case, I assumed the cause (of the dataset numbers not adding up) was that you split the training dataset again with an 80/20 train/val split. Did I read that correctly, or was the test dataset used both for validation and for testing later on?
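For concreteness, here is how I picture the random 80/20 split described in the paper (a rough sketch; the helper and file names are hypothetical, not the repository's preprocessing code):

```python
import random

def split_dataset(image_names, train_ratio=0.8, seed=0):
    """Shuffle the image list and split it into (train, test) portions."""
    names = list(image_names)
    random.Random(seed).shuffle(names)
    cut = int(len(names) * train_ratio)
    return names[:cut], names[cut:]

# Example with placeholder file names: an 80/20 split of 100 images.
train_split, test_split = split_dataset([f"img_{i:04d}.png" for i in range(100)])
print(len(train_split), len(test_split))  # 80 20
```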

@AngeLouCN
Owner

Hi, we directly use the dataset prepared by PraNet, which has already been randomly split. We follow the code and dataset provided by PraNet, which uses the test set as the validation set.

@SimonServant
Author

Thank you very much!!!
