-
Notifications
You must be signed in to change notification settings - Fork 2.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(elasticsearch): advanced query, identity autocomplete, exact match weight #7354
feat(elasticsearch): advanced query, identity autocomplete, exact match weight #7354
Conversation
Need to look at that search test closer. Probably needs addressing. |
Thanks for updating the search guide! We specifically covered the ways to search as part of our on boarding for new users, and it will be very helpful to have examples |
020a3f5
to
b632c4e
Compare
return finalQuery.should().size() > 0 ? Optional.of(finalQuery) : Optional.empty(); | ||
} | ||
|
||
private static QueryBuilder getPhraseQuery(@Nonnull EntitySpec entitySpec, String query) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dead code?
BrowsePath is stored as a complete string, for instance ```/datasets/prod/hive/SampleKafkaDataset```, hence the need for wildcards on both ends of the term to return a result. | ||
If you want to: | ||
|
||
- Exact match on term or phrase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!!!
@@ -32,6 +32,9 @@ | |||
|
|||
|
|||
public class SearchQueryBuilder { | |||
|
|||
public static final String STRUCTURED_QUERY_PREFIX = "\\\\/q "; | |||
public static final float EXACT_MATCH_BOOST_FACTOR = 10.0f; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exact match on ANY field?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a multiplier, so the annotation * this factor.
BoolQueryBuilder finalQuery = QueryBuilders.boolQuery(); | ||
if (query.contains("\"")) { | ||
finalQuery.should(QueryBuilders.termQuery("urn", query.replaceAll("\"", "")) | ||
.boost(Float.parseFloat((String) PRIMARY_URN_SEARCH_PROPERTIES.get("boostScore")) * EXACT_MATCH_BOOST_FACTOR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait wait so this on occurs if the query contains " " quotations (at least one) and only applies to the urn field?
Is that really what we want? My experience is that exact match on NAME fields is not very good, which may or may not be in the URN at all.
Also, do users really want to have to put quotes in order to do exact match? If I search for a Table with the exact name I expect to see it - quotes or no quotes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The urn field is not part of any of the models' search specs. It is always treated as an addition added everywhere. This would only be used in a quoted urn, exact match. Below is where we handle the other fields.
More important for the non-urn field, the quotes become important when considering synonyms, stemming, and other tokenization steps should not be considered. Fulltext search is not the same as a query where exact matches are implicit, both the field and the search terms are manipulated for getting relevant results over exact match.
finalQuery.should(QueryBuilders.matchPhrasePrefixQuery(fieldSpec.getFieldName() + ".delimited", query) | ||
.boost((float) fieldSpec.getBoostScore()) | ||
.queryName(fieldSpec.getFieldName())); // less than exact | ||
if (query.contains("\"")) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like kinda opaque functionality.
If the user provides quotes for a field, we automatically hit the keyword index?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above.
.queryName(fieldSpec.getFieldName())); // less than exact | ||
if (query.contains("\"")) { | ||
finalQuery.should(QueryBuilders.termQuery(fieldSpec.getFieldName() + ".keyword", query.replaceAll("\"", "")) | ||
.boost(Float.parseFloat((String) PRIMARY_URN_SEARCH_PROPERTIES.get("boostScore")) * EXACT_MATCH_BOOST_FACTOR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A bit confused, why are we using PRIMARY_URN_SEARCH_PROPERTIES
here?
Also, can we factor out the constants?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Err, this should. not be based on urn properties. Fixing.
.queryName(fieldSpec.getFieldName())); // less than exact | ||
if (query.contains("\"")) { | ||
finalQuery.should(QueryBuilders.termQuery(fieldSpec.getFieldName() + ".keyword", query.replaceAll("\"", "")) | ||
.boost(Float.parseFloat((String) PRIMARY_URN_SEARCH_PROPERTIES.get("boostScore")) * EXACT_MATCH_BOOST_FACTOR) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So again expressing what I mentioned above, boosting cases where there are quotations in the query string I don't think will be sufficient for what users expect. They expect that searching for a name of a Table by exact string will find that table, with or without quotations
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove the quote conditional
highlightBuilder.field("urn.delimited"); | ||
_defaultQueryFieldNames.stream() | ||
.flatMap(fieldName -> Stream.of(fieldName, fieldName + ".*")).distinct() | ||
.forEach(highlightBuilder::field); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we verify the column name highlighting still works after this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
_defaultQueryFieldNames.forEach(fieldName -> highlightBuilder | ||
.field(fieldName) | ||
.field(fieldName + ".*")); | ||
highlightBuilder.field("urn.delimited"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is removing this safe? Is there history showing why we added this initially?
@@ -288,6 +293,12 @@ private List<MatchedField> extractMatchedFields(@Nonnull Map<String, HighlightFi | |||
highlightedFieldNamesAndValues.get(fieldName.get()).add(fieldValue.string()); | |||
} | |||
} | |||
// fallback matched query, non-analyzed field | |||
for (String queryName : hit.getMatchedQueries()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of this change? I'm not familiar enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For exact match, keyword fields + term queries are not analyzed, therefore they do not return highlighted fields. This is to populate which field is a matching when the field is unanalyzed. Let me see if I can populate the value. The queries I've seen that hit this were only returning the urn, however if the hit does have the matched field, it could be used to fill the value in.
} | ||
|
||
@Test | ||
public void testPrefixVsExact() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
awesome tests!
@@ -20,6 +20,7 @@ record CorpGroupInfo { | |||
@Searchable = { | |||
"fieldType": "TEXT_PARTIAL" | |||
"queryByDefault": true, | |||
"enableAutocomplete": true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What this previously enabled and now it is removed? Curious about the reasoning for including this? Or did a customer ask for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're saying this was enabled previously and removed? I think it is natural to search for the display name of the group. I added this specifically because a customer was running a search and I'd suggested autocomplete for users/groups. I looked to write a test for this (in this pr in fact) and it didn't work because of missing annotations.
@@ -90,6 +91,7 @@ record CorpUserInfo includes CustomProperties { | |||
@Searchable = { | |||
"fieldType": "TEXT_PARTIAL", | |||
"queryByDefault": true, | |||
"enableAutocomplete": true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this removed? I recall we used to have this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't remove it, so makes sense to add it, especially given that I am suggesting enabling autocomplete for users/groups. Also I added tests for autocomplete on users/groups to ensure it works as expected.
@@ -11,7 +11,10 @@ record CorpGroupKey { | |||
* The URL-encoded name of the AD/LDAP group. Serves as a globally unique identifier within DataHub. | |||
*/ | |||
@Searchable = { | |||
"fieldType": "TEXT_PARTIAL" | |||
"fieldType": "TEXT_PARTIAL", | |||
"queryByDefault": true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, thanks!
/q
for backwards compatibilityChecklist