feat(search): De-duplicate scale factors across entities #8718

Merged
merged 4 commits into from
Aug 28, 2023

Conversation

iprentic
Contributor

We have scale factors, such as the one that downweights deprecated entities, that are defined on multiple indices. Previously, when we queried only one index at a time, each factor was applied only once. Now that we query multiple indices together, there is no de-duplication logic, so the same scale factor is applied multiple times. This change de-duplicates the scale factors.

Tested locally on quickstart (since our test entity registry does not have these set up).
Functions section before the change:

          "functions": [
            {
              "filter": {
                "match_all": {
                  "boost": 1
                }
              },
              "weight": 1
            },
            {
              "filter": {
                "term": {
                  "deprecated": {
                    "value": "true",
                    "boost": 1
                  }
                }
              },
              "weight": 0.5
            },
            {
              "filter": {
                "term": {
                  "deprecated": {
                    "value": "true",
                    "boost": 1
                  }
                }
              },
              "weight": 0.5
            },
            {
              "filter": {
                "term": {
                  "deprecated": {
                    "value": "true",
                    "boost": 1
                  }
                }
              },
              "weight": 0.5
            },
            {
              "filter": {
                "term": {
                  "deprecated": {
                    "value": "true",
                    "boost": 1
                  }
                }
              },
              "weight": 0.5
            },
            {
              "filter": {
                "term": {
                  "deprecated": {
                    "value": "true",
                    "boost": 1
                  }
                }
              },
              "weight": 0.5
            },
            {
              "filter": {
                "term": {
                  "deprecated": {
                    "value": "true",
                    "boost": 1
                  }
                }
              },
              "weight": 0.5
            },
            {
              "filter": {
                "term": {
                  "active": {
                    "value": "true",
                    "boost": 1
                  }
                }
              },
              "weight": 2
            },
            {
              "filter": {
                "term": {
                  "deprecated": {
                    "value": "true",
                    "boost": 1
                  }
                }
              },
              "weight": 0.5
            },
            {
              "filter": {
                "term": {
                  "deprecated": {
                    "value": "true",
                    "boost": 1
                  }
                }
              },
              "weight": 0.5
            },
            {
              "filter": {
                "term": {
                  "deprecated": {
                    "value": "true",
                    "boost": 1
                  }
                }
              },
              "weight": 0.5
            },
            {
              "filter": {
                "term": {
                  "deprecated": {
                    "value": "true",
                    "boost": 1
                  }
                }
              },
              "weight": 0.5
            },
            {
              "filter": {
                "term": {
                  "materialized": {
                    "value": "true",
                    "boost": 1
                  }
                }
              },
              "weight": 0.5
            },
            {
              "filter": {
                "term": {
                  "deprecated": {
                    "value": "true",
                    "boost": 1
                  }
                }
              },
              "weight": 0.5
            },
            {
              "filter": {
                "term": {
                  "deprecated": {
                    "value": "true",
                    "boost": 1
                  }
                }
              },
              "weight": 0.5
            }
          ]

Functions section after the change:

          "functions": [
            {
              "filter": {
                "match_all": {
                  "boost": 1
                }
              },
              "weight": 1
            },
            {
              "filter": {
                "term": {
                  "deprecated": {
                    "value": "true",
                    "boost": 1
                  }
                }
              },
              "weight": 0.5
            },
            {
              "filter": {
                "term": {
                  "active": {
                    "value": "true",
                    "boost": 1
                  }
                }
              },
              "weight": 2
            },
            {
              "filter": {
                "term": {
                  "materialized": {
                    "value": "true",
                    "boost": 1
                  }
                }
              },
              "weight": 0.5
            }
          ]
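To make the effect of the duplicates concrete, here is a rough sketch of the arithmetic for a deprecated entity. It assumes the matching function weights are multiplied together, which is Elasticsearch's default `score_mode`; the PR does not show which `score_mode` the query actually uses, so treat the numbers as illustrative. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class DuplicateWeightEffect {
    // Assumption: weights of all matching functions are multiplied
    // (Elasticsearch's default score_mode). Not confirmed by the PR.
    static double combinedWeight(List<Double> matchingWeights) {
        return matchingWeights.stream().reduce(1.0, (a, b) -> a * b);
    }

    public static void main(String[] args) {
        // Before: a deprecated entity matches match_all (weight 1)
        // plus the twelve repeated "deprecated" filters (weight 0.5 each).
        List<Double> before = new ArrayList<>();
        before.add(1.0);
        before.addAll(Collections.nCopies(12, 0.5));

        // After: match_all plus a single "deprecated" filter.
        List<Double> after = List.of(1.0, 0.5);

        System.out.println(combinedWeight(before)); // 0.5^12 = 1/4096
        System.out.println(combinedWeight(after));  // 0.5
    }
}
```

Under that assumption, the duplicated query would downweight a deprecated entity by a factor of 4096 rather than the intended 2.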

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the product PR or Issue related to the DataHub UI/UX label Aug 24, 2023
@iprentic
Contributor Author

  • Since this PR does affect ranking, I ran the golden tests locally, and they passed. I think the failure here is due to the common CI failures we have been seeing across PRs.

Collectors.mapping(annotation -> annotation,
Collectors.toList())));

for (Map.Entry<String, List<SearchableAnnotation>> annotationEntry : annotations.entrySet()) {
Collaborator

Hmm, can't this just be changed in the lambda that's currently in the flatMap call? That is, instead of streaming across all of them, just pull the first one and calculate the weight factor. Doing an intermediate collect hurts readability and is relatively expensive.

Collaborator

To deduplicate equivalent values in the stream you can use distinct()
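A minimal sketch of the `distinct()` approach, for illustration only. It models each scale factor as a hypothetical (field name, weight) pair rather than DataHub's actual types, and it relies on the elements having value-based `equals`/`hashCode`; whether the real objects in the stream satisfy that is an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class DistinctScaleFactors {
    // Hypothetical representation: each scale factor is a (field name, weight) pair.
    // Map.entry provides value-based equals/hashCode, which distinct() depends on.
    static List<Map.Entry<String, Double>> dedupe(
            List<List<Map.Entry<String, Double>>> perEntityFactors) {
        return perEntityFactors.stream()
            .flatMap(List::stream)  // flatten the factors declared by every entity
            .distinct()             // drop repeats, keeping first-seen order
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Three made-up entities that all declare the same "deprecated" factor.
        List<List<Map.Entry<String, Double>>> perEntity = List.of(
            List.of(Map.entry("deprecated", 0.5)),
            List.of(Map.entry("deprecated", 0.5), Map.entry("active", 2.0)),
            List.of(Map.entry("deprecated", 0.5), Map.entry("materialized", 0.5)));

        // Only one "deprecated" factor survives, alongside "active" and "materialized".
        System.out.println(dedupe(perEntity));
    }
}
```

This avoids the intermediate grouping collect, at the cost of requiring a meaningful `equals` on whatever is being de-duplicated.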

Contributor Author

Yeah, I can do that! This is how it's done above, but I'll rewrite it to do that (and maybe the one above too).

@iprentic iprentic merged commit 2f11f24 into master Aug 28, 2023
43 checks passed
@iprentic iprentic deleted the nd-dedup-function-scores branch August 28, 2023 17:02