
[QUESTION] Batch-normalization in pooling layer #103

Closed · emjotde opened this issue Dec 11, 2022 · 2 comments
Labels: question (Further information is requested)

emjotde commented Dec 11, 2022

❓ Question

Hi,
Is the batch-normalization component in the pooling layer, i.e. this function:

* _layer_norm(tensor, broadcast_mask, num_elements_not_masked)

mentioned in any of the COMET papers? I don't see it in the main COMET citation.

Can you elaborate a bit on why this was useful? It seems unusual to see batch normalization in NLP applications.

Also, it seems this could be changed to normalize over a single sentence during inference (use the reduction dimensions to average only over the time dimension), so that the behavior is deterministic and consistent with batch size 1. Would there be any downsides to that? A minimal sketch of what I mean is below.
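
For concreteness, here is a rough sketch of both behaviors. The first function is my paraphrase of what I believe the current code does, not the exact implementation; the second is the hypothetical per-sentence variant, with assumed shapes noted in comments:

```python
import torch

def _layer_norm_batch(tensor, broadcast_mask, num_elements_not_masked):
    # My reading of the current behavior: mean/variance are pooled over the
    # entire batch, so one sentence's normalized values depend on its batchmates.
    tensor_masked = tensor * broadcast_mask
    mean = torch.sum(tensor_masked) / num_elements_not_masked
    variance = (
        torch.sum(((tensor_masked - mean) * broadcast_mask) ** 2)
        / num_elements_not_masked
    )
    return (tensor - mean) / torch.sqrt(variance + 1e-12)

def _layer_norm_per_sentence(tensor, mask):
    # Hypothetical variant: reduce only over the time and hidden dims of each
    # sentence, so the output is independent of batch composition.
    # tensor: (batch, time, dim); mask: (batch, time), 1.0 for real tokens.
    broadcast_mask = mask.unsqueeze(-1)  # (batch, time, 1)
    num_elements = broadcast_mask.sum(dim=(1, 2), keepdim=True) * tensor.size(-1)
    tensor_masked = tensor * broadcast_mask
    mean = tensor_masked.sum(dim=(1, 2), keepdim=True) / num_elements
    variance = (
        (((tensor_masked - mean) * broadcast_mask) ** 2).sum(dim=(1, 2), keepdim=True)
        / num_elements
    )
    return (tensor - mean) / torch.sqrt(variance + 1e-12)
```

With the per-sentence version, scoring a sentence alone or inside a large batch would give identical results.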

Thanks,
Marcin

emjotde added the question label on Dec 11, 2022
ricardorei pinned this issue on Dec 16, 2022
ricardorei (Collaborator) commented:

Hi @emjotde!

This is a good observation. In the paper we only say that we use the Layerwise scalar mix to combine multiple layers. The implementation we used actually comes from this model.
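
For anyone landing here later, the layerwise scalar mix is the ELMo-style weighted combination of encoder layers. A minimal sketch (illustrative names, not our exact implementation):

```python
import torch
import torch.nn as nn

class ScalarMix(nn.Module):
    """ELMo-style scalar mix: softmax-normalized learned weights over the
    encoder layers, scaled by a single global gamma."""

    def __init__(self, num_layers: int):
        super().__init__()
        self.scalar_weights = nn.Parameter(torch.zeros(num_layers))
        self.gamma = nn.Parameter(torch.ones(1))

    def forward(self, layers):
        # layers: list of num_layers tensors, each (batch, time, dim)
        weights = torch.softmax(self.scalar_weights, dim=0)
        return self.gamma * sum(w * layer for w, layer in zip(weights, layers))
```

The batch normalization discussed above is an optional extra step applied to each layer before this mix.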

The idea is that the batch normalisation helps generalisation, but in the experiments for this year's shared task we had more time to play with different hyperparameters and we ended up deactivating it. The new models from this year do not use it anymore.

Also, your suggestion is a good one. I'll test it out.


emjotde commented Dec 17, 2022

Thanks for the answer! Feel free to close.
