Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CLI added to enable CI of Hamilton dataflows #707

Merged
merged 20 commits into from
Feb 26, 2024
Merged

CLI added to enable CI of Hamilton dataflows #707

merged 20 commits into from
Feb 26, 2024

Conversation

zilto
Copy link
Collaborator

@zilto zilto commented Feb 21, 2024

The CLI is implemented using the Typer library which is based on Click and have similar structure to FastAPI.

For design:

  • hamilton.cli.main is the CLI app. It should only rely on hamilton.cli.commands and typer
  • hamilton.cli.commands isn't concerned with I/O and primarily relies on hamilton.cli.logic
  • hamilton.cli.logic implements the core of the CLI in testable chunks.

The entrypoint is hamilton and was added to setup.py There are currently 4 commands:

  • build: build a dataflow from Python modules
  • view: dr.display_all_functions() from Python modules
  • version: get nodes and dataflow hashes
  • diff: compare version of current dataflow vs git reference Use hamilton --help for help.

How I tested this

  • added some tests to core logic and cli
  • multiple relevant tests are missing

Notes

  • currently bad at logging exceptions to JSON
  • will add an example to examples/
  • could add more docstrings to internal CLI logic
  • write docs page

Checklist

  • PR has an informative and human-readable title (this will be pulled into the release notes)
  • Changes are limited to a single goal (no scope creep)
  • Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
  • Any change in functionality is tested
  • New functions are documented (with a description, list of inputs, and expected output)
  • Placeholder code is flagged / future TODOs are captured in comments
  • [] Project documentation has been updated if adding/changing functionality.

The CLI is implemented using the Typer library which
is based on Click and have similar structure to FastAPI.

For design:
- hamilton.cli.__main__ is the CLI app. It should
  only rely on hamilton.cli.commands and typer
- hamilton.cli.commands isn't concerned with I/O and
  primarily relies on hamilton.cli.logic
- hamilton.cli.logic implements the core of the CLI
  in testable chunks.

The entrypoint is `hamilton` and was added to setup.py
There are currently 4 commands:
- build: build a dataflow from Python modules
- view: dr.display_all_functions() from Python modules
- version: get nodes and dataflow hashes
- diff: compare version of current dataflow vs git reference
Use `hamilton --help` for help.
Copy link
Collaborator

@elijahbenizzy elijahbenizzy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code is looking pretty clean, a few comments. Let's add a README on this as well, and maybe add to the docs.

hamilton/cli/__main__.py Show resolved Hide resolved
):
"""Build dataflow with MODULES"""
try:
config = dict()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a TODO for the config loading -- good to pass in a json file but not needed now

hamilton/cli/__main__.py Show resolved Hide resolved
hamilton/cli/__main__.py Show resolved Hide resolved
graph = graph_types.HamiltonGraph.from_graph(dr.graph)

nodes_version = dict()
for n in graph.nodes:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this order going to be deterministic? I think dicts are, but might as well sort it as hashing is sensitive to order.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function is responsible for creating one hash per node, which should be independent.

Next, hash_dataflow sorts the hashes and creates a single string from the list and encodes it to bytes. Not the most elegant, but it should work.

Did I missing something?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh got it. I'm wondering if there's a good reason to split it out? I imagine you're using both in different places. Makes sense.

tests/cli/test_cli.py Show resolved Hide resolved
tests/cli/test_cli.py Show resolved Hide resolved
@zilto
Copy link
Collaborator Author

zilto commented Feb 21, 2024

Let's add a README on this as well, and maybe add to the docs.

I was able to easily generate docs via their typer utils. Will add manually to API reference in the docs, but it could be part of CI

@skrawcz
Copy link
Collaborator

skrawcz commented Feb 23, 2024

@zilto broken unit tests for 3.11 and 3.12 ?

continue

return dict(v1_only=v1_only, v2_only=v2_only, edit=edit)
return dict(current_only=current_only, reference_only=reference_only, edit=edit)
Copy link
Collaborator

@skrawcz skrawcz Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need the only suffix on the keys?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since that's what is output to the command line right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so for clarity. Thinking in terms of set, "current only" is the difference of "current" and "reference".

zilto and others added 2 commits February 24, 2024 15:49
pre-commit run --all-files is what I did here.
@skrawcz
Copy link
Collaborator

skrawcz commented Feb 26, 2024

The docs here should also have a presence in the main docs.

You could include them doing something like I did here - https://raw.githubusercontent.com/DAGWorks-Inc/hamilton/main/docs/how-tos/use-hamilton-for-lineage.md.

Note if you're going to make edits you'll need to pull as I fixed up the pre-commit issues for you.

@zilto zilto merged commit ff65fd0 into main Feb 26, 2024
22 checks passed
@zilto zilto deleted the feat/cli branch February 26, 2024 19:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants