refactor: Add a level of record abstraction #380

AndrewCiambrone · 2020-10-13T17:51:21Z

Summary of Changes

RFC: amundsen-io/rfcs#5

Tests

I modified all the model tests to incorporate the new serializable functionality.

Documentation

What documentation did you add or modify and why? Add any relevant links then remove this line

CheckList

Make sure you have checked all steps below to ensure a timely review.

- PR title addresses the issue accurately and concisely. Example: "Updates the version of Flask to v1.0.2"
  - In case you are adding a dependency, check if the license complies with the ASF 3rd Party License Policy.
- PR includes a summary of changes.
- PR adds unit tests, updates existing unit tests, OR documents why no test additions or modifications are needed.
- In case of new functionality, my PR adds documentation that describes how to use it.
  - All the public functions and the classes in the PR contain docstrings that explain what they do.
- PR passes make test

Signed-off-by: Andrew <andrjc4@vt.edu>

feng-tao · 2020-10-16T04:18:13Z

Hey @AndrewCiambrone, thanks for the PR. It has been a busy week. Do you think you could fix the conflict and CI? I would like to use the branch to do a quick Lyft staging test. Thanks a lot!

feng-tao · 2020-10-22T21:24:18Z

Ping @AndrewCiambrone, I think this generic abstraction is super useful if we would like to support Neptune in databuilder. Let us know if you have the chance to update the PR; I would like to do a quick test in our staging as well. cc @allisonsuarez in case I don't have a chance to test it.

AndrewCiambrone · 2020-10-22T22:11:19Z

Hey @feng-tao, sorry, it's been a busy week. I will pull in the latest changes tomorrow.

Signed-off-by: Andrew <andrjc4@vt.edu>

feng-tao · 2020-10-23T04:56:39Z

thanks for the rebase!

feng-tao · 2020-10-23T05:35:56Z

Just did a quick staging test of the PR with a few tasks, and it runs fine so far :) I will look at the PR in more detail tomorrow and may ask questions as well. Thanks for the great work!

feng-tao

I tried to go through every line. Overall, I think the change LGTM and it is well written! Once the conflict is fixed, we can merge it. Thanks.

feng-tao · 2020-11-04T06:41:25Z

databuilder/models/table_metadata.py



 class ColumnMetadata:
    COLUMN_NODE_LABEL = 'Column'
    COLUMN_KEY_FORMAT = '{db}://{cluster}.{schema}/{tbl}/{col}'
    COLUMN_NAME = 'name'
    COLUMN_TYPE = 'type'
-    COLUMN_ORDER = 'sort_order{}'.format(UNQUOTED_SUFFIX)  # int value needs to be unquoted when publish to neo4j


Hey @AndrewCiambrone, I think we still need to keep this? Or did I miss the fix somewhere else?

nvm, I see it handled in the later part of the change.
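For readers following this thread, the idea being discussed can be sketched roughly as follows: the model holds plain, typed attribute values, and a backend-specific serializer decides how to mark values that must not be quoted. This is only an illustration, not the code from this PR. `NodeSketch`, `to_neo4j_row`, the `KEY`/`LABEL` column names, and the `':UNQUOTED'` suffix value are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

# Assumed value for the suffix constant referenced in the diff above.
UNQUOTED_SUFFIX = ':UNQUOTED'


@dataclass
class NodeSketch:
    """Backend-neutral node record (hypothetical, for illustration only)."""
    key: str
    label: str
    attributes: Dict[str, Any] = field(default_factory=dict)


def to_neo4j_row(node: NodeSketch) -> Dict[str, Any]:
    """Flatten a node into a CSV-style row, marking non-string values as unquoted."""
    row: Dict[str, Any] = {'KEY': node.key, 'LABEL': node.label}
    for name, value in node.attributes.items():
        # Strings keep their plain column name; other values (int, float, bool)
        # get the suffix so a publisher can emit them without quotes.
        suffix = '' if isinstance(value, str) else UNQUOTED_SUFFIX
        row[name + suffix] = value
    return row


# The model only states that sort_order is an int; the suffix is added at
# serialization time rather than baked into the model's constants.
column = NodeSketch(
    key='hive://gold.test_schema/test_table/col1',
    label='Column',
    attributes={'name': 'col1', 'type': 'bigint', 'sort_order': 0},
)
row = to_neo4j_row(column)
print(row)
```

With this split, a different backend (for example a Neptune serializer) could consume the same `NodeSketch` without knowing anything about the Neo4j suffix convention.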

feng-tao · 2020-11-04T06:49:28Z

I ended up fixing the merge conflict.

feng-tao · 2020-11-04T06:53:16Z

@AndrewCiambrone given the change is complete, could you remove the WIP from the title? Or let me know if there are any models that haven't been refactored yet.

For amundsen-io/rfcs#5, do you plan to implement the Neptune serializer as part of the same RFC?

AndrewCiambrone · 2020-11-04T15:19:25Z

If no new models were added in the last rebase, I believe every model is accounted for.

As for the RFC, I felt like they were two separate topics, so I was planning on writing a second RFC for the Neptune databuilder.

feng-tao · 2020-11-04T16:59:02Z

thanks @AndrewCiambrone !

dorianj · 2020-11-10T17:13:33Z

When I run sample_data_loader.py with this changeset applied, I get this error:

Traceback (most recent call last):
  File "example/scripts/sample_data_loader.py", line 285, in <module>
    'databuilder.models.table_stats.TableColumnStats')
  File "example/scripts/sample_data_loader.py", line 115, in run_csv_job
    publisher=Neo4jCsvPublisher()).launch()
  File "/home/nada/paymob/data_discovery/data-discovery-env/lib/python3.7/site-packages/amundsen_databuilder-4.0.3-py3.7.egg/databuilder/job/job.py", line 77, in launch
  File "/home/nada/paymob/data_discovery/data-discovery-env/lib/python3.7/site-packages/amundsen_databuilder-4.0.3-py3.7.egg/databuilder/job/job.py", line 67, in launch
  File "/home/nada/paymob/data_discovery/data-discovery-env/lib/python3.7/site-packages/amundsen_databuilder-4.0.3-py3.7.egg/databuilder/task/task.py", line 65, in run
  File "/home/nada/paymob/data_discovery/data-discovery-env/lib/python3.7/site-packages/amundsen_databuilder-4.0.3-py3.7.egg/databuilder/loader/file_system_neo4j_csv_loader.py", line 119, in load
  File "/home/nada/anaconda3/lib/python3.7/csv.py", line 155, in writerow
    return self.writer.writerow(self._dict_to_list(rowdict))
  File "/home/nada/anaconda3/lib/python3.7/csv.py", line 151, in _dict_to_list
    + ", ".join([repr(x) for x in wrong_fields]))
ValueError: dict contains fields not in fieldnames: 'stat_val'

I haven't been able to debug it further, any ideas?
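As a debugging aid, the final `ValueError` in the traceback comes from the standard library's `csv.DictWriter`, which rejects any row whose keys are not all listed in `fieldnames`. The thread does not establish why that happens in the loader; the sketch below only reproduces the same kind of error under the assumption that the header was derived from an earlier row that lacked `stat_val`. The row contents are made up for illustration.

```python
import csv
import io

# Hypothetical rows: the first one has no 'stat_val' key, the later one does.
first_row = {'KEY': 'col1/stat', 'LABEL': 'Stat', 'stat_name': 'max'}
later_row = {'KEY': 'col2/stat', 'LABEL': 'Stat', 'stat_name': 'max', 'stat_val': '10'}

buffer = io.StringIO()
# The header is fixed from the first row's keys, so 'stat_val' is not a known field.
writer = csv.DictWriter(buffer, fieldnames=list(first_row.keys()))
writer.writeheader()
writer.writerow(first_row)

try:
    # DictWriter's default extrasaction='raise' rejects the unknown key.
    writer.writerow(later_row)
    error_message = ''
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

If this is indeed the failure mode, comparing the keys of the first serialized record with those of the failing record should show where `stat_val` appears.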

AndrewCiambrone added 3 commits July 9, 2020 13:04

move away from dicts and more structured types

d944fc9

Add a abstraction layer between the databuilder records and neo4j

9f82a6d

Signed-off-by: Andrew <andrjc4@vt.edu>

pull in latest

99eb1a9

Signed-off-by: Andrew <andrjc4@vt.edu>

AndrewCiambrone requested review from allisonsuarez, dikshathakur3119, feng-tao and jinhyukchang as code owners October 13, 2020 17:51

AndrewCiambrone changed the title ~~[refactor] Databuilder record abstraction~~ [refactor] Add a level of record abstraction Oct 13, 2020

AndrewCiambrone mentioned this pull request Oct 13, 2020

Data Builder Record Abstraction amundsen-io/rfcs#5

Merged

AndrewCiambrone changed the title ~~[refactor] Add a level of record abstraction~~ refactor: Add a level of record abstraction Oct 13, 2020

AndrewCiambrone changed the title ~~refactor: Add a level of record abstraction~~ refactor: Add a level of record abstraction [WIP] Oct 13, 2020

feng-tao added the keep fresh label Oct 16, 2020

pull in latest changes

e8976fa

Signed-off-by: Andrew <andrjc4@vt.edu>

feng-tao approved these changes Nov 4, 2020


Merge branch 'master' into ajc-databuilder-record-abstraction

4aca971

AndrewCiambrone changed the title ~~refactor: Add a level of record abstraction [WIP]~~ refactor: Add a level of record abstraction Nov 4, 2020

feng-tao merged commit 414e825 into amundsen-io:master Nov 5, 2020

feng-tao mentioned this pull request Nov 7, 2020

Support orm in databuilder amundsen-io/rfcs#10

Merged

dorianj mentioned this pull request Nov 10, 2020

Running sample_data_loader.py raises a ValueError amundsen-io/amundsen#803

Closed
