SimCSE dropout parameter #2634

Open
riyajatar37003 opened this issue May 8, 2024 · 2 comments
@riyajatar37003

I am trying to understand where exactly the dropout is applied to get two representations of the same input text in this example:
https://github.com/UKPLab/sentence-transformers/blob/master/examples/unsupervised_learning/SimCSE/README.md

thanks

@tomaarsen (Collaborator) commented May 8, 2024

Hello!

The dropout already exists in the underlying transformers model and is activated when the model is in train mode. For example:

from sentence_transformers import SentenceTransformer, models

# Define your sentence transformer model using mean pooling
model_name = "distilroberta-base"
transformer = models.Transformer(model_name)
pooling_model = models.Pooling(transformer.get_word_embedding_dimension(), pooling_mode="mean")
model = SentenceTransformer(modules=[transformer, pooling_model])

print(transformer.auto_model)
"""
RobertaModel(
  (embeddings): RobertaEmbeddings(
    (word_embeddings): Embedding(50265, 768, padding_idx=1)
    (position_embeddings): Embedding(514, 768, padding_idx=1)
    (token_type_embeddings): Embedding(1, 768)
    (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
    (dropout): Dropout(p=0.1, inplace=False)
  )
  (encoder): RobertaEncoder(
    (layer): ModuleList(
      (0-5): 6 x RobertaLayer(
        (attention): RobertaAttention(
          (self): RobertaSelfAttention(
            (query): Linear(in_features=768, out_features=768, bias=True)
            (key): Linear(in_features=768, out_features=768, bias=True)
            (value): Linear(in_features=768, out_features=768, bias=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (output): RobertaSelfOutput(
            (dense): Linear(in_features=768, out_features=768, bias=True)
            (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
            (dropout): Dropout(p=0.1, inplace=False)
          )
        )
        (intermediate): RobertaIntermediate(
          (dense): Linear(in_features=768, out_features=3072, bias=True)
          (intermediate_act_fn): GELUActivation()
        )
        (output): RobertaOutput(
          (dense): Linear(in_features=3072, out_features=768, bias=True)
          (LayerNorm): LayerNorm((768,), eps=1e-05, elementwise_affine=True)
          (dropout): Dropout(p=0.1, inplace=False)
        )
      )
    )
  )
  (pooler): RobertaPooler(
    (dense): Linear(in_features=768, out_features=768, bias=True)
    (activation): Tanh()
  )
)
"""

The model.encode call ensures that the model is in eval mode, while the fit method ensures that it's in train mode. If I remove the self.eval() call in model.encode, encode the same sentence twice, and compare the two embeddings, I get:

tensor([[0.9942]])
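
An equivalent check without editing the library is to keep the model in train mode and call its forward pass directly, so dropout stays active across both passes. A minimal sketch (the example sentence is an arbitrary placeholder):

import torch
from sentence_transformers import SentenceTransformer, models, util

# Rebuild the model from above and keep it in train mode so that dropout stays active.
transformer = models.Transformer("distilroberta-base")
pooling = models.Pooling(transformer.get_word_embedding_dimension(), pooling_mode="mean")
model = SentenceTransformer(modules=[transformer, pooling], device="cpu")
model.train()

# Run the same sentence through the model twice; each forward pass samples a
# different dropout mask, so the two embeddings differ slightly.
features = model.tokenize(["A sentence to embed twice."])
with torch.no_grad():
    emb_a = model(features)["sentence_embedding"]
    emb_b = model(features)["sentence_embedding"]

print(util.cos_sim(emb_a, emb_b))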

You'll get more drastic changes if you increase the dropout probability, e.g. by updating p on all of the Dropout modules. With p=0.3, the same comparison gives:

tensor([[0.9729]])
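
For reference, one way to set p on every Dropout module in the underlying Hugging Face model (a sketch, reusing the transformer module from the snippet above):

import torch.nn as nn

# Raise the dropout probability on every Dropout layer, e.g. from the default 0.1 to 0.3.
for module in transformer.auto_model.modules():
    if isinstance(module, nn.Dropout):
        module.p = 0.3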

Note that the SimCSE paper uses a dropout of 0.1, which is already the default for many transformers models.

  • Tom Aarsen

@riyajatar37003 (Author)

Thank you so much.
