
fix: Ensure eval mode for farm and transformer models for predictions #3791

Merged
merged 21 commits into main from inf-model-eval on Jun 6, 2023

Conversation

Contributor

@sjrl sjrl commented Dec 30, 2022

Related Issues

Proposed Changes:

I added model.eval() calls to ensure that the models in the following nodes are set to eval mode when running an inference prediction:

  • FARMReader
  • TransformerReader
  • EmbeddingRetriever (both with the _RetribertEmbeddingEncoder and _DefaultEmbeddingEncoder)
  • PromptNode
  • TransformersDocumentClassifier
  • TransformersTranslator
  • Text2SparqlRetriever
  • Text2Speech
  • EntityExtractor

Otherwise, if the model is left in train mode, the predictions become random because layers like dropout and batch normalization behave differently during training. This is something we already do for some of our nodes, such as the DensePassageRetriever.
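As a minimal illustration (not code from this PR) of why this matters, a model that contains a dropout layer returns different outputs for identical inputs while in train mode, and only becomes deterministic after eval() is called:

    import torch
    from torch import nn

    # Tiny illustrative model with a dropout layer.
    model = nn.Sequential(nn.Linear(4, 4), nn.Dropout(p=0.5), nn.Linear(4, 1))
    x = torch.ones(1, 4)

    model.train()                                  # train mode: dropout is active
    print(model(x).item(), model(x).item())        # two (usually) different values

    model.eval()                                   # eval mode: dropout becomes a no-op
    with torch.no_grad():
        print(model(x).item(), model(x).item())    # identical values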

How did you test it?

I added unit tests covering:

  • FARMReader and TransformerReader
  • EmbeddingRetriever
  • Text2Speech

These tests make sure the nodes provide correct results even if they are set to train mode before running a prediction. I confirmed that these unit tests fail without the new changes.
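The tests follow roughly this pattern (a sketch rather than the exact test added here; the attribute path reader.inferencer.model and the model name are assumptions):

    from haystack.nodes import FARMReader
    from haystack.schema import Document

    def test_farmreader_predictions_in_train_mode():
        reader = FARMReader(model_name_or_path="deepset/tinyroberta-squad2")  # example model
        docs = [Document(content="Berlin is the capital of Germany.")]
        query = "What is the capital of Germany?"

        expected = reader.predict(query=query, documents=docs)

        # Put the wrapped PyTorch model into train mode before predicting again.
        reader.inferencer.model.train()  # assumed attribute path to the underlying model
        result = reader.predict(query=query, documents=docs)

        # predict() should re-enable eval mode, so the answer must not change.
        assert result["answers"][0].answer == expected["answers"][0].answer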

Notes for the reviewer

  • This probably was not noticed before because the model is set to eval mode by default when loading it. However, this error could have occurred when using the nodes right after training. For example, using the predict function of the FARMReader right after training.
  • This was not an issue when using a SentenceTransformer model since the SentenceTransformer.encode function sets the underlying PyTorch model to eval mode within the function call.

Checklist

@sjrl sjrl requested a review from a team as a code owner December 30, 2022 12:56
@sjrl sjrl requested review from vblagoje and removed request for a team December 30, 2022 12:56
@sjrl sjrl added the topic:reader, topic:retriever, and topic:predictions labels Dec 30, 2022
@sjrl sjrl changed the title fix: Set Reader and Retriever models to eval mode for predictions fix: Set farm and transformer models to eval mode for predictions Dec 30, 2022
@sjrl sjrl changed the title fix: Set farm and transformer models to eval mode for predictions fix: Ensure eval mode for farm and transformer models for predictions Dec 30, 2022
Member

@vblagoje vblagoje left a comment


@sjrl looks good at first glance. Are there any other pipelines we use in the codebase? Let's not switch those explicitly to eval, as it's already done automatically.

Review comment on haystack/nodes/prompt/prompt_node.py (outdated, resolved)
@vblagoje
Member

vblagoje commented Jan 9, 2023

@sjrl wait wait, I don't agree that we should call the eval() switch on every model invocation. That's excessive; why do that? Take the example of an encoder: we need to encode millions of documents, and yet we'd call eval() on every call. I would respectfully disagree.

@sjrl
Contributor Author

sjrl commented Jan 9, 2023

wait wait, I don't agree that we should call the eval() switch on every model invocation. That's excessive; why do that? Take the example of an encoder: we need to encode millions of documents, and yet we'd call eval() on every call. I would respectfully disagree.

Hmm, well, Sentence Transformers already does this for its sentence encoder models whenever you call the encode function:

    def encode(self, sentences: Union[str, List[str]],
               batch_size: int = 32,
               show_progress_bar: bool = None,
               output_value: str = 'sentence_embedding',
               convert_to_numpy: bool = True,
               convert_to_tensor: bool = False,
               device: str = None,
               normalize_embeddings: bool = False) -> Union[List[Tensor], ndarray, Tensor]:
        """
        Computes sentence embeddings
        :param sentences: the sentences to embed
        :param batch_size: the batch size used for the computation
        :param show_progress_bar: Output a progress bar when encode sentences
        :param output_value:  Default sentence_embedding, to get sentence embeddings. Can be set to token_embeddings to get wordpiece token embeddings. Set to None, to get all output values
        :param convert_to_numpy: If true, the output is a list of numpy vectors. Else, it is a list of pytorch tensors.
        :param convert_to_tensor: If true, you get one large tensor as return. Overwrites any setting from convert_to_numpy
        :param device: Which torch.device to use for the computation
        :param normalize_embeddings: If set to true, returned vectors will have length 1. In that case, the faster dot-product (util.dot_score) instead of cosine similarity can be used.
        :return:
           By default, a list of tensors is returned. If convert_to_tensor, a stacked tensor is returned. If convert_to_numpy, a numpy matrix is returned.
        """
        self.eval()
        ...

which we call here in Haystack:

emb = self.embedding_model.encode(

So we are in fact already calling it on every model invocation for our EmbeddingRetriever node whenever we use Sentence Transformers (which we use most often in dC), and it has not seemed to impact our timings.

I don't think looping through all the layers in a model takes that long, but I can collect some timings on some of the Flan models to double-check.
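If timings would help, something along these lines would do it (a sketch; bert-base-uncased is just an example model):

    import timeit
    from transformers import AutoModel

    model = AutoModel.from_pretrained("bert-base-uncased")

    # eval() only flips the `training` flag on the module and all of its children,
    # so the cost should be negligible compared to a single forward pass.
    print(timeit.timeit(model.eval, number=1000))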

@vblagoje
Member

vblagoje commented Jan 9, 2023

Ok ok @sjrl, I wouldn't want you to waste your time on such tests now. Let's just get a nod from other team members as well. I am convinced by your proofs, but HF also doesn't switch the pipeline to eval on every call; maybe it's an omission. Let's confirm this change with the other team members. cc @mayankjobanputra @julian-risch @bogdankostic

@bogdankostic
Contributor

I don't have much experience with this, but I suppose that setting a model into eval mode shouldn't be a heavy operation, as it only affects a small fraction of layers, such as Dropout.
On the other hand, I'm not really sure this is needed for every node. I see that it's useful for FARMReader, because there we provide our own train method that sets the model to training mode. For the other nodes, where we don't provide our own train method, we allow users to load a model using only a model identifier or a local path and pass the task of actually loading the model to transformers, where it is set to eval mode by default. The users must therefore have set the models to training mode explicitly themselves, and I think we can expect them to set them back to eval mode as well. But that's just my point of view; I'm interested to hear what others think about this.

@vblagoje
Member

I echo Bogdan's comments. If we were a train-heavy framework, I'd say let's go for it. But we only train a few components! Are you concerned about some nefarious user actions, @sjrl? What motivates this change? Consistency? What else?

@julian-risch
Member

It looks to me as if self.train() and self.eval() just set a boolean flag for the module and all its children (i.e., the neural network layers).
https://github.com/pytorch/pytorch/blob/d24324bf1d2b921c9c631022c9009f4840ee6acb/torch/nn/modules/module.py#L2265
The forward pass of the modules takes into account the value of the boolean, for example, here: https://github.com/pytorch/pytorch/blob/d24324bf1d2b921c9c631022c9009f4840ee6acb/torch/nn/modules/batchnorm.py#L161
So the self.eval() operation itself shouldn't take long. However, I also think that calling self.eval() when a model is loaded should be enough.
In Haystack, we call self.model.train() at the beginning of the training here and thereby set the model into training mode, right? Maybe we could call self.model.eval() at the end of the training before returning the trained model here? Basically ensuring training mode temporarily during training and ensuring eval mode everywhere else because models are loaded in eval mode as Bogdan pointed out.
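Concretely, that would amount to a pattern like this (a minimal sketch; MyTrainableNode and the loop body are hypothetical stand-ins for the existing Haystack training code):

    from torch import nn

    class MyTrainableNode:
        def __init__(self, model: nn.Module):
            self.model = model

        def train(self, data_loader):
            self.model.train()            # training mode only for the duration of training
            for batch in data_loader:     # stand-in for the existing training loop
                ...
            self.model.eval()             # hand the model back in eval mode for inference
            return self.model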

@bogdankostic
Contributor

Maybe we could call self.model.eval() at the end of the training before returning the trained model here? Basically ensuring training mode temporarily during training and ensuring eval mode everywhere else because models are loaded in eval mode as Bogdan pointed out.

I like this idea!

@vblagoje
Member

@sjrl what do you think about the ideas ☝️ ?

@sjrl
Contributor Author

sjrl commented Jan 16, 2023

Maybe we could call self.model.eval() at the end of the training before returning the trained model here? Basically ensuring training mode temporarily during training and ensuring eval mode everywhere else because models are loaded in eval mode as Bogdan pointed out.

This sounds good to me! I'll go ahead and change the PR to do this instead. I'm fairly busy this week so I might not be able to get to it until next week.

@sjrl
Contributor Author

sjrl commented Mar 27, 2023

Hey @vblagoje sorry that this took me so long, but it is finished now! I've made the changes as discussed in this PR and it is ready for another review.

@sjrl sjrl requested a review from vblagoje March 27, 2023 07:54
Member

@vblagoje vblagoje left a comment


LGTM @sjrl, let's resolve this conflict and integrate this one.

@coveralls
Collaborator

coveralls commented May 8, 2023

Coverage Status

Coverage: 38.92% (-0.002%) from 38.921% when pulling 2002b65 on inf-model-eval into 8228081 on main.

@vblagoje vblagoje merged commit 1777b22 into main Jun 6, 2023
45 checks passed
@vblagoje vblagoje deleted the inf-model-eval branch June 6, 2023 11:06
Labels
topic:modeling, topic:predictions, topic:reader, topic:tests