feat: Add options for what to do with missing metadata fields in MetaFieldRanker #7700

Open: wants to merge 15 commits into base: main


robpasternak (Member)

Related Issues

        :param missing_meta:
            What to do with documents that are missing the sorting metadata field.
            Possible values are:
            - 'drop' will drop the documents entirely.
            - 'top' will place the documents at the top of the metadata-sorted list
                (regardless of 'ascending' or 'descending').
            - 'bottom' will place the documents at the bottom of the metadata-sorted list
                (regardless of 'ascending' or 'descending').
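To make the three behaviors concrete, here is a minimal sketch of the metadata sort alone. It assumes a hypothetical stand-in `Doc` dataclass with a `meta` dict (not Haystack's `Document`) and a hypothetical helper `sort_by_meta`; it is not the PR's implementation, and it leaves out how MetaFieldRanker then combines this order with existing scores via `ranking_mode`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Literal


@dataclass
class Doc:
    # Hypothetical stand-in for a document: an id plus a metadata dict.
    id: str
    meta: Dict[str, Any] = field(default_factory=dict)


def sort_by_meta(
    docs: List[Doc],
    meta_field: str,
    sort_order: Literal["ascending", "descending"] = "descending",
    missing_meta: Literal["drop", "top", "bottom"] = "bottom",
) -> List[Doc]:
    # Split the documents by whether they carry the sorting field.
    with_field = [d for d in docs if meta_field in d.meta]
    missing = [d for d in docs if meta_field not in d.meta]

    # sort_order only affects the documents that have the field.
    ranked = sorted(
        with_field,
        key=lambda d: d.meta[meta_field],
        reverse=(sort_order == "descending"),
    )

    # Documents without the field are placed independently of sort_order.
    if missing_meta == "top":
        return missing + ranked
    if missing_meta == "bottom":
        return ranked + missing
    if missing_meta == "drop":
        return ranked
    raise ValueError(f"Invalid missing_meta value: {missing_meta!r}")


docs = [Doc("a", {"year": 2021}), Doc("b"), Doc("c", {"year": 2023})]
print([d.id for d in sort_by_meta(docs, "year", missing_meta="top")])  # ['b', 'c', 'a']
```

Note that "b" lands first with "top" whether the sort is ascending or descending, which is what the "regardless of" clauses above describe.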

Proposed Changes:

  • The new missing_meta parameter accepts three values: "bottom", "top", and "drop".
    • "bottom" (the default) preserves the behavior from before this PR: documents without the sorting metadata field are placed at the bottom of the sorted list.
    • Using "top" puts them at the top instead.
    • Using "drop" drops such documents entirely.
  • Validation was added to ensure that the value of missing_meta is valid.
  • Tests were added for the new functionality.
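The validation step can be sketched as a fail-fast membership check. The function name `validate_missing_meta`, the `MissingMeta` alias, and the error message are assumptions for illustration, not the PR's actual code; deriving the allowed values from the `Literal` keeps the type hint and the check from drifting apart.

```python
from typing import Literal, get_args

# Hypothetical alias mirroring the type hint of the new parameter.
MissingMeta = Literal["drop", "top", "bottom"]


def validate_missing_meta(missing_meta: str) -> None:
    # Read the allowed values straight from the Literal definition.
    allowed = get_args(MissingMeta)
    if missing_meta not in allowed:
        raise ValueError(f"missing_meta must be one of {allowed}, got {missing_meta!r}")


validate_missing_meta("top")  # valid: returns None silently
try:
    validate_missing_meta("middle")
except ValueError as err:
    print(err)
```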

How did you test it?

Wrote and ran new test functions in the test directory:

  • test_raises_value_error_if_wrong_missing_meta: Tests validation of missing_meta
  • test_missing_meta_bottom: Tests that missing_meta = "bottom" behaves as desired.
  • test_missing_meta_top: Tests that missing_meta = "top" behaves as desired.
  • test_missing_meta_drop: Tests that missing_meta = "drop" behaves as desired.

Notes for the reviewer

None

Checklist

github-actions bot added the labels topic:tests and type:documentation (Improvements on the docs) on May 15, 2024
coveralls (Collaborator) commented May 15, 2024

Pull Request Test Coverage Report for Build 9449261101

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • 61 unchanged lines in 4 files lost coverage.
  • Overall coverage increased (+0.09%) to 89.818%

Files with Coverage Reduction              New Missed Lines   %
components/evaluators/llm_evaluator.py     2                  95.5%
evaluation/eval_run_result.py              4                  92.19%
core/pipeline/base.py                      14                 93.73%
core/pipeline/pipeline.py                  41                 62.58%

Totals
Change from base Build 9387490162: +0.09%
Covered Lines: 6854
Relevant Lines: 7631

💛 - Coveralls

robpasternak marked this pull request as ready for review June 3, 2024 14:57
robpasternak requested review from a team as code owners June 3, 2024 14:57
robpasternak requested review from dfokina and shadeMe and removed request for a team June 3, 2024 14:57
shadeMe (Collaborator) left a comment


Thanks for the PR! Added a few comments.

@@ -43,6 +43,7 @@ def __init__(
         top_k: Optional[int] = None,
         ranking_mode: Literal["reciprocal_rank_fusion", "linear_score"] = "reciprocal_rank_fusion",
         sort_order: Literal["ascending", "descending"] = "descending",
+        missing_meta: Literal["drop", "top", "bottom"] = "bottom",
Collaborator


Hmm, I'd like to convert the Literal init parameters to follow the enum pattern used in other parts of the library (cf. HFGenerationAPIType and HuggingFaceAPIGenerator).

Would you be up to fixing that in a follow-up PR? This would also mean that the validation code gets changed/moved around.
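A rough sketch of what such a follow-up could look like: a string-valued enum with a `from_str` parser, so that validation moves out of `__init__` and into the parser. The class name `MissingMetaPolicy` and the exact shape of `from_str` are hypothetical, loosely following the enum-plus-parser style the comment points to; it also shows the value descriptions living on the enum's docstring, as the later docstring comment suggests.

```python
from enum import Enum


class MissingMetaPolicy(Enum):
    """
    What to do with documents that are missing the sorting metadata field.

    - DROP: drop the documents entirely.
    - TOP: place them at the top of the metadata-sorted list, regardless of sort order.
    - BOTTOM: place them at the bottom of the metadata-sorted list, regardless of sort order.
    """

    DROP = "drop"
    TOP = "top"
    BOTTOM = "bottom"

    def __str__(self) -> str:
        return self.value

    @staticmethod
    def from_str(string: str) -> "MissingMetaPolicy":
        # Map raw strings to members; unknown values raise here,
        # replacing the ad hoc validation in __init__.
        enum_map = {e.value: e for e in MissingMetaPolicy}
        policy = enum_map.get(string)
        if policy is None:
            raise ValueError(
                f"Unknown missing_meta value '{string}'. Supported values are: {list(enum_map)}"
            )
        return policy


# An __init__ could accept either a string or a member and normalize it:
missing_meta = "top"
if isinstance(missing_meta, str):
    missing_meta = MissingMetaPolicy.from_str(missing_meta)
print(missing_meta)  # top
```

Accepting plain strings as well as enum members would keep existing call sites working.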

Comment on lines +70 to +76
What to do with documents that are missing the sorting metadata field.
Possible values are:
- 'drop' will drop the documents entirely.
- 'top' will place the documents at the top of the metadata-sorted list
(regardless of 'ascending' or 'descending').
- 'bottom' will place the documents at the bottom of the metadata-sorted list
(regardless of 'ascending' or 'descending').
Collaborator


Once we introduce the enum, the bulk of this docstring can be moved into the enum's own docstrings.

haystack/components/rankers/meta_field.py (outdated, resolved)
haystack/components/rankers/meta_field.py (resolved)
Labels: topic:tests, type:documentation (Improvements on the docs)
Development

Successfully merging this pull request may close these issues.

MetaFieldRanker: allow different options for what to do with missing metadata field
3 participants