Replace 'none' titles with empty string in DPR #663

Merged
merged 3 commits into from
Dec 29, 2020
@tholor (Member) commented Dec 24, 2020

DPR preprocessing crashed in cases where embed_title=True but the incoming passages did not have a title.
Now, we replace those titles with an empty string and log a warning...
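
As a rough illustration of the fix described above, the sketch below fills in missing titles with an empty string before passages are processed further. The passage dictionary layout (`"title"` and `"text"` keys) and the helper name `fill_missing_titles` are assumptions made for this example; they are not taken from FARM's code.

```python
import logging

logger = logging.getLogger(__name__)


def fill_missing_titles(passages):
    """Return copies of the passages with None or absent titles set to "".

    Hypothetical helper: the dict layout is assumed for illustration only.
    """
    cleaned = []
    for passage in passages:
        title = passage.get("title")  # None if the key is missing
        if title is None:
            logger.warning("Passage has no title; using an empty string instead.")
            title = ""
        # Copy the passage so the caller's input is not mutated.
        cleaned.append({**passage, "title": title})
    return cleaned
```

An empty string keeps the title field a valid `str`, so downstream code that expects text for every passage does not have to special-case `None`.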

@Timoeller (Contributor) left a comment


I think this code makes FARM DPR more failsafe, so let's merge it.

We will need to refactor TextSimilarityProcessor and move the processing into dataset_from_dicts, as done with the other processors. Let's do this in a new PR.

for title, ctx in zip(titles, texts):
    if title is None:
        title = ""
        logger.warning(

This warning would be triggered for every datapoint if the title is missing.
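
One possible way to address this concern is to count the missing titles and emit a single summary warning per batch instead of one per datapoint. This is only a sketch under assumptions: the function name `replace_missing_titles` and the message wording are hypothetical, and this is not necessarily what was merged.

```python
import logging

logger = logging.getLogger(__name__)


def replace_missing_titles(titles):
    """Replace None titles with "" and log at most one summary warning.

    Hypothetical helper illustrating the review comment, not FARM's code.
    """
    n_missing = sum(title is None for title in titles)
    cleaned = ["" if title is None else title for title in titles]
    if n_missing:
        # A single aggregated message keeps the log readable on large datasets.
        logger.warning(
            "%d of %d passages had no title; using empty strings instead.",
            n_missing,
            len(titles),
        )
    return cleaned
```

The trade-off is that the log no longer identifies which individual passages lacked a title, only how many did.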

@Timoeller (Contributor)

Hmm, the DPR tests are still failing; I will need to investigate.

@Timoeller Timoeller merged commit 0f10976 into master Dec 29, 2020
2 participants