-
Notifications
You must be signed in to change notification settings - Fork 209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add sample high/low watermark data to quick start (#62) #145
Add sample high/low watermark data to quick start (#62) #145
Conversation
Codecov Report
@@ Coverage Diff @@
## master #145 +/- ##
=========================================
+ Coverage 83.26% 83.3% +0.04%
=========================================
Files 56 57 +1
Lines 2791 2798 +7
Branches 295 295
=========================================
+ Hits 2324 2331 +7
Misses 378 378
Partials 89 89
Continue to review full report at Codecov.
|
82d82b4
to
d484ba4
Compare
@@ -5,20 +5,21 @@ | |||
RELATION_END_LABEL, RELATION_TYPE, RELATION_REVERSE_TYPE | |||
|
|||
|
|||
class HiveWatermark(Neo4jCsvSerializable): | |||
class Watermark(Neo4jCsvSerializable): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to keep the mode file as it is if possible given the ticket is to add sample data. If we decide to support watermark for other databases other than hive, we could change the model file as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’d very much like to see this supporting databases more broadly than just Hive.
If I understand @feng-tao’s concern correctly though renaming will be a breaking change for Lyft because you already have an Airflow DAG referencing the Hive API?
Would introducing the new model/API with a Hive API wrapper simply calling the new API and - possibly - a deprecation note be a feasible way forward?
Side-note: Luckily the Neo4j naming already is without Hive in the names like e.g.: https://github.com/lyft/amundsenmetadatalibrary/blob/master/metadata_service/proxy/neo4j_proxy.py#L160 so there’s no need for a migration path for that end 🎉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@feng-tao, I've used approach proposed by @jornh so now we have both HiveWatermark (with deprecation warning) and Watermark classes. Hope this will be enough for backward compatibility.
To my mind multiple databases support for watermarks is an important feature (with only HiveWatermark I can't even specify partitions for DynamoDB table from the sample data :) ) but if you consider these changes as irrelevant for this PR - I can extract them into a separate follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the current one is good, thanks for making it backward compatible:)
@Mikhail-Ivanov looks like the CLA signing isn’t happy. Probably the same old email mismatch problem. |
d484ba4
to
0a5e470
Compare
0a5e470
to
3d05856
Compare
looks good! thanks @Mikhail-Ivanov @jornh |
Summary of Changes
Sample quickstart data enriched with table watermarks info (oldest & latest partitions).
Also HiveWatermark class redesigned to handle any DB type watermarks.
Tests
No new tests. Existing tests for watermarks have been updated.
Documentation
No new documentation.
Displayed watermark data example: