Add sample high/low watermark data to quick start (#62) #145

Merged

Mikhail-Ivanov
Contributor

Summary of Changes

The sample quickstart data is now enriched with table watermark information (oldest and latest partitions).
The HiveWatermark class has also been redesigned to handle watermarks for any database type.
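
For readers unfamiliar with the term, the low and high watermarks here are the oldest and latest partitions of a table. A minimal sketch of how they might be derived from a list of partition values follows. The function name `partition_watermarks` and the `ds=YYYY-MM-DD` partition format are assumptions made for illustration, not code from this PR.

```python
from typing import Iterable, Optional, Tuple


def partition_watermarks(partitions: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (low_watermark, high_watermark) for a table's partitions.

    Assumes partition values such as 'ds=2019-09-30' sort lexicographically
    in chronological order, which holds for zero-padded ISO dates.
    """
    values = sorted(partitions)
    if not values:
        return None  # a table without partitions has no watermarks
    return values[0], values[-1]


# Hypothetical partition values for a sample table.
low, high = partition_watermarks(["ds=2019-09-29", "ds=2019-09-01", "ds=2019-09-30"])
print(low, high)
```

Partition schemes that do not sort lexicographically (for example, non-padded dates) would need a custom sort key.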

Tests

No new tests. Existing tests for watermarks have been updated.

Documentation

No new documentation.

Displayed watermark data example:
[image]


codecov-io commented Sep 30, 2019

Codecov Report

Merging #145 into master will increase coverage by 0.04%.
The diff coverage is 87.75%.

@@            Coverage Diff            @@
##           master    #145      +/-   ##
=========================================
+ Coverage   83.26%   83.3%   +0.04%     
=========================================
  Files          56      57       +1     
  Lines        2791    2798       +7     
  Branches      295     295              
=========================================
+ Hits         2324    2331       +7     
  Misses        378     378              
  Partials       89      89

Impacted Files                          Coverage Δ
databuilder/models/hive_watermark.py    100% <100%> (+13.95%) ⬆️
databuilder/models/watermark.py         86.36% <86.36%> (ø)

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ab9b138...dc74355.

@@ -5,20 +5,21 @@
RELATION_END_LABEL, RELATION_TYPE, RELATION_REVERSE_TYPE


-class HiveWatermark(Neo4jCsvSerializable):
+class Watermark(Neo4jCsvSerializable):
Member

I would prefer to keep the model file as it is, if possible, given that the ticket is to add sample data. If we decide to support watermarks for databases other than Hive, we could change the model file as well.

Contributor

jornh commented Sep 30, 2019

I’d very much like to see this supporting databases more broadly than just Hive.

If I understand @feng-tao’s concern correctly, though, renaming will be a breaking change for Lyft because you already have an Airflow DAG referencing the Hive API?

Would introducing the new model/API with a Hive API wrapper simply calling the new API and - possibly - a deprecation note be a feasible way forward?

Side-note: Luckily the Neo4j naming already is without Hive in the names like e.g.: https://github.com/lyft/amundsenmetadatalibrary/blob/master/metadata_service/proxy/neo4j_proxy.py#L160 so there’s no need for a migration path for that end 🎉

Contributor Author

@feng-tao, I've used the approach proposed by @jornh, so now we have both HiveWatermark (with a deprecation warning) and Watermark classes. I hope this will be enough for backward compatibility.

To my mind, support for watermarks across multiple databases is an important feature (with only HiveWatermark I can't even specify partitions for the DynamoDB table from the sample data :) ), but if you consider these changes irrelevant to this PR, I can extract them into a separate follow-up PR.
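
The wrapper approach discussed in this thread could look roughly like the sketch below. It is a simplified illustration, not the code merged in this PR: the constructor parameters are assumptions, and the real model also extends Neo4jCsvSerializable, which is omitted here to keep the example self-contained.

```python
import warnings


class Watermark:
    # Simplified stand-in for the database-agnostic model.
    # Parameter names are assumptions made for this sketch.
    def __init__(self, database: str, schema: str, table_name: str,
                 part_name: str, part_type: str = "high_watermark") -> None:
        self.database = database
        self.schema = schema
        self.table_name = table_name
        self.part_name = part_name
        self.part_type = part_type


class HiveWatermark(Watermark):
    # Backward-compatible wrapper: keeps the old class name working,
    # fixes the database to 'hive', and warns callers to migrate.
    def __init__(self, schema: str, table_name: str, part_name: str,
                 part_type: str = "high_watermark") -> None:
        warnings.warn(
            "HiveWatermark is deprecated; use Watermark instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        super().__init__("hive", schema, table_name, part_name, part_type)
```

Existing callers that construct `HiveWatermark` keep working unchanged, while new code can pass any database name, such as a DynamoDB one, to `Watermark` directly.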

Member

I think the current one is good, thanks for making it backward compatible :)

Contributor

jornh commented Sep 30, 2019

@Mikhail-Ivanov looks like the CLA signing isn’t happy. Probably the same old email mismatch problem.

Member

feng-tao commented Oct 2, 2019

looks good! thanks @Mikhail-Ivanov @jornh

@feng-tao feng-tao merged commit e64a83b into amundsen-io:master Oct 2, 2019
@Mikhail-Ivanov Mikhail-Ivanov deleted the feature/quickstart_watermarks branch October 7, 2019 11:58