Fix duplicated replies when HuggingFaceLocalGenerator uses multiple stop words #11414
Closed
18062706139fcz wants to merge 1 commit into
Member
@18062706139fcz Thank you for opening this pull request. We are closing this PR as a duplicate of #11413.
Summary
Fixes #11409.
`HuggingFaceLocalGenerator` currently duplicates replies when multiple `stop_words` are configured. This PR updates the stop word post-processing so that each stop word is removed sequentially from the existing replies instead of producing a cross-product of replies and stop words.
Problem
When `stop_words` contains more than one entry, the generator returns more replies than it generated.
Root Cause
The current implementation uses a nested list comprehension that iterates over both `replies` and `self.stop_words` at the same time. That creates one output entry per `(reply, stop_word)` pair, which duplicates replies.
Fix
Apply stop words sequentially to the current reply list: iterate over `self.stop_words` and update `replies` in place for each stop word.
Tests
Ran:
Result:
Added a regression test that verifies multiple stop words do not duplicate replies.
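The root cause and fix described above can be illustrated with a minimal, standalone sketch. This is not the code from the PR or from Haystack: the function names `strip_stop_words_cross_product` and `strip_stop_words_sequential` are hypothetical, and removing a stop word via `str.replace` followed by `rstrip` is an assumption made for illustration.

```python
from typing import List


def strip_stop_words_cross_product(replies: List[str], stop_words: List[str]) -> List[str]:
    """Hypothetical version of the buggy pattern.

    The nested comprehension yields one entry per (reply, stop_word) pair,
    so the output has len(replies) * len(stop_words) items.
    """
    return [
        reply.replace(stop_word, "").rstrip()  # assumed removal strategy
        for reply in replies
        for stop_word in stop_words
    ]


def strip_stop_words_sequential(replies: List[str], stop_words: List[str]) -> List[str]:
    """Hypothetical version of the fix.

    Each stop word is applied in turn to the current reply list,
    so the number of replies never changes.
    """
    for stop_word in stop_words:
        replies = [reply.replace(stop_word, "").rstrip() for reply in replies]
    return replies


# Illustrative inputs, not taken from the PR.
replies = ["Paris is the capital. END", "Berlin is the capital. STOP"]
stop_words = ["END", "STOP"]

buggy = strip_stop_words_cross_product(replies, stop_words)
fixed = strip_stop_words_sequential(replies, stop_words)

print(len(buggy))  # 4: two replies times two stop words
print(fixed)       # ['Paris is the capital.', 'Berlin is the capital.']
```

A regression test in the spirit of the one mentioned above could assert that the number of replies is unchanged after post-processing and that no reply still contains any of the configured stop words.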