misc fixes #7633

Merged
merged 11 commits into from
Mar 21, 2023

@david-leifker (Collaborator) commented Mar 18, 2023

Included: #7628

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added devops PR or Issue related to DataHub backend & deployment ingestion PR or Issue related to the ingestion of metadata product PR or Issue related to the DataHub UI/UX smoke_test Contains changes related to smoke tests labels Mar 18, 2023
@@ -41,6 +41,8 @@ services:
- METADATA_SERVICE_AUTH_ENABLED=false
- JAVA_TOOL_OPTIONS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:5001
- BOOTSTRAP_SYSTEM_UPDATE_WAIT_FOR_SYSTEM_UPDATE=false
- SEARCH_SERVICE_ENABLE_CACHE=false
Contributor

Was search caching causing the failure in the test?

Contributor

This file is not used when running smoke tests. It is used when developing locally (for example, the line two above enables the debugger); it is not used by quickstart by default, nor by the smoke tests. I think the cache should be disabled by default for local development: one of the first things I did was run a search, realize I was missing data, load the data, and then get the cached (stale) results back. This is a quality-of-life improvement for local development.
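The staleness described above can be reproduced with a small toy model. This is only an illustrative sketch, not DataHub code: `ToySearchService`, its in-memory cache, and the `cache_enabled` helper (including its default value) are hypothetical; only the `SEARCH_SERVICE_ENABLE_CACHE` variable name comes from the diff.

```python
import os


class ToySearchService:
    """Hypothetical stand-in for a search service with an optional result cache."""

    def __init__(self, index, enable_cache):
        self.index = index              # shared list of document names
        self.enable_cache = enable_cache
        self._cache = {}                # query string -> cached hits

    def search(self, query):
        # Serve from the cache when enabled, even if the index has changed since.
        if self.enable_cache and query in self._cache:
            return self._cache[query]
        hits = [doc for doc in self.index if query in doc]
        if self.enable_cache:
            self._cache[query] = hits
        return hits


def cache_enabled(env=os.environ):
    # Reads the variable set in the compose file; the "true" default is an
    # assumption made for this sketch only.
    return env.get("SEARCH_SERVICE_ENABLE_CACHE", "true").lower() == "true"


index = []
cached = ToySearchService(index, enable_cache=True)
uncached = ToySearchService(
    index, enable_cache=cache_enabled({"SEARCH_SERVICE_ENABLE_CACHE": "false"})
)

cached.search("dataset")        # searched before any data is loaded
uncached.search("dataset")
index.append("dataset.sales")   # data is loaded afterwards

stale = cached.search("dataset")    # cached empty result is returned
fresh = uncached.search("dataset")  # index is queried again
```

With the cache enabled, the second search still returns the empty result from before the data was loaded, which matches the local-development experience described in the comment.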

smoke-test/test_e2e.py (outdated review thread, resolved)
Collaborator

@hsheth2 hsheth2 left a comment

The Docker/Python changes in this PR LGTM.

@@ -39,7 +40,12 @@ public class SearchQueryBuilderTest {
exactMatchConfiguration.setCaseSensitivityFactor(0.7f);
exactMatchConfiguration.setEnableStructured(true);

PartialConfiguration partialConfiguration = new PartialConfiguration();
partialConfiguration.setFactor(0.4f);
Collaborator

Nice refactor

# Field weight annotations are typically calibrated for exact match, if partial match is possible on the field use these adjustments
partial:
  urnFactor: ${ELASTICSEARCH_QUERY_PARTIAL_URN_FACTOR:0.7} # multiplier on Urn token match, a partial match on Urn > non-Urn is assumed
  factor: ${ELASTICSEARCH_QUERY_PARTIAL_FACTOR:0.4} # multiplier on possible non-Urn token match
Collaborator

Wonder if we could set this on a per-field basis inside the Searchable annotation...

Collaborator Author

Probably useful, but ultimately likely to be replaced by a proper search runtime configuration YAML. Model annotations are fine for index-time decisions: we can detect changes and reindex as needed. However, query-time/runtime configuration expressed as model annotations is difficult to override at runtime unless you also introduce custom models that replace the existing ones, and I am not even sure that would work. In my opinion, query-time search configuration via annotations is likely to be removed in favor of a dedicated configuration file that is used at runtime. cc: @RyanHolstien @iprentic

Collaborator

Yeah this makes sense to me. The model should host static (non-changing) configurations only.
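
To make the thread above concrete, here is a minimal sketch of how `${NAME:default}` placeholders like the ones in the YAML hunk can be overridden from the environment at runtime without touching any model. It assumes the factor is applied as a plain multiplier on a field's annotated weight; `resolve_placeholder` and `partial_weight` are hypothetical helpers written for illustration and are not part of DataHub.

```python
import re

# Matches placeholders of the form ${NAME:default}, as used in the YAML hunk.
_PLACEHOLDER = re.compile(r"^\$\{([A-Za-z0-9_]+):([^}]*)\}$")


def resolve_placeholder(placeholder, env):
    """Return the value from env if NAME is set there, otherwise the default."""
    match = _PLACEHOLDER.match(placeholder)
    if match is None:
        raise ValueError(f"not a ${{NAME:default}} placeholder: {placeholder!r}")
    name, default = match.groups()
    return float(env.get(name, default))


def partial_weight(field_weight, is_urn, env):
    """Scale a field weight calibrated for exact match down for a partial match.

    Assumption for this sketch: the configured factor is a simple multiplier.
    """
    urn_factor = resolve_placeholder("${ELASTICSEARCH_QUERY_PARTIAL_URN_FACTOR:0.7}", env)
    factor = resolve_placeholder("${ELASTICSEARCH_QUERY_PARTIAL_FACTOR:0.4}", env)
    return field_weight * (urn_factor if is_urn else factor)


# Defaults from the YAML hunk apply when the environment sets nothing.
default_non_urn = partial_weight(10.0, is_urn=False, env={})
default_urn = partial_weight(10.0, is_urn=True, env={})

# A runtime override changes the result without any model or annotation change.
overridden = partial_weight(
    10.0, is_urn=False, env={"ELASTICSEARCH_QUERY_PARTIAL_FACTOR": "0.5"}
)
```

The field weight itself stays a static, index-time property, while the multiplier is read from configuration at query time, which is the separation argued for in the comments above.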

@david-leifker david-leifker marked this pull request as ready for review March 20, 2023 16:30
@david-leifker david-leifker requested review from leifker, RyanHolstien and iprentic and removed request for leifker March 20, 2023 21:30
vercel bot commented Mar 20, 2023

Name          Status    Updated
docs-website  ✅ Ready  Mar 21, 2023 at 6:53AM (UTC)

@vercel vercel bot temporarily deployed to Preview March 20, 2023 21:53 Inactive
@david-leifker david-leifker enabled auto-merge (squash) March 20, 2023 23:08
@vercel vercel bot temporarily deployed to Preview March 21, 2023 04:33 Inactive
.github/workflows/docker-ingestion.yml (outdated review thread, resolved)
@vercel vercel bot temporarily deployed to Preview March 21, 2023 06:37 Inactive
@vercel vercel bot temporarily deployed to Preview March 21, 2023 06:44 Inactive
@vercel vercel bot temporarily deployed to Preview March 21, 2023 06:53 Inactive
Contributor

@shirshanka shirshanka left a comment

LGTM

@anshbansal anshbansal dismissed their stale review March 21, 2023 14:12

fixed by Shirshanka

@david-leifker david-leifker merged commit 697e8e2 into datahub-project:master Mar 21, 2023
shirshanka pushed a commit to shirshanka/datahub that referenced this pull request Mar 22, 2023
shirshanka pushed a commit to shirshanka/datahub that referenced this pull request Mar 22, 2023

6 participants