-
Notifications
You must be signed in to change notification settings - Fork 123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLI added to enable CI of Hamilton dataflows #707
Conversation
The CLI is implemented using the Typer library which is based on Click and have similar structure to FastAPI. For design: - hamilton.cli.__main__ is the CLI app. It should only rely on hamilton.cli.commands and typer - hamilton.cli.commands isn't concerned with I/O and primarily relies on hamilton.cli.logic - hamilton.cli.logic implements the core of the CLI in testable chunks. The entrypoint is `hamilton` and was added to setup.py There are currently 4 commands: - build: build a dataflow from Python modules - view: dr.display_all_functions() from Python modules - version: get nodes and dataflow hashes - diff: compare version of current dataflow vs git reference Use `hamilton --help` for help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code is looking pretty clean, a few comments. Let's add a README
on this as well, and maybe add to the docs.
): | ||
"""Build dataflow with MODULES""" | ||
try: | ||
config = dict() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a TODO for the config loading -- good to pass in a json file but not needed now
graph = graph_types.HamiltonGraph.from_graph(dr.graph) | ||
|
||
nodes_version = dict() | ||
for n in graph.nodes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this order going to be deterministic? I think dicts are, but might as well sort it as hashing is sensitive to order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function is responsible for creating one hash per node, which should be independent.
Next, hash_dataflow
sorts the hashes and creates a single string from the list and encodes it to bytes. Not the most elegant, but it should work.
Did I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh got it. I'm wondering if there's a good reason to split it out? I imagine you're using both in different places. Makes sense.
I was able to easily generate docs via their typer utils. Will add manually to API reference in the docs, but it could be part of CI |
@zilto broken unit tests for 3.11 and 3.12 ? |
continue | ||
|
||
return dict(v1_only=v1_only, v2_only=v2_only, edit=edit) | ||
return dict(current_only=current_only, reference_only=reference_only, edit=edit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need the only
suffix on the keys?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since that's what is output to the command line right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so for clarity. Thinking in terms of set, "current only" is the difference of "current" and "reference".
pre-commit run --all-files is what I did here.
The docs here should also have a presence in the main docs. You could include them doing something like I did here - https://raw.githubusercontent.com/DAGWorks-Inc/hamilton/main/docs/how-tos/use-hamilton-for-lineage.md. Note if you're going to make edits you'll need to pull as I fixed up the pre-commit issues for you. |
The CLI is implemented using the Typer library which is based on Click and have similar structure to FastAPI.
For design:
The entrypoint is
hamilton
and was added to setup.py There are currently 4 commands:hamilton --help
for help.How I tested this
Notes
examples/
Checklist