diff --git a/docs/how-tos/_pre-commit/validate_changes.png b/docs/how-tos/_pre-commit/validate_changes.png new file mode 100644 index 000000000..f08f9c71e Binary files /dev/null and b/docs/how-tos/_pre-commit/validate_changes.png differ diff --git a/docs/how-tos/cli-reference.md b/docs/how-tos/cli-reference.md index 73698e290..b140a09a2 100644 --- a/docs/how-tos/cli-reference.md +++ b/docs/how-tos/cli-reference.md @@ -15,8 +15,6 @@ You can verify the installation with hamilton --help ``` -# Reference - ## `hamilton` (global) **Options**: diff --git a/docs/how-tos/index.rst b/docs/how-tos/index.rst index e48b4c2e3..a528f3a5d 100644 --- a/docs/how-tos/index.rst +++ b/docs/how-tos/index.rst @@ -20,5 +20,6 @@ directory. If there's an example you want but don't see, reach out or open an is use-hamilton-for-lineage use-hamilton-for-llm-workflows cli-reference + pre-commit-hooks cache-nodes custom-driver diff --git a/docs/how-tos/pre-commit-hooks.md b/docs/how-tos/pre-commit-hooks.md new file mode 100644 index 000000000..780105285 --- /dev/null +++ b/docs/how-tos/pre-commit-hooks.md @@ -0,0 +1,136 @@ +# Hamilton pre-commit +## Use pre-commit hooks for safer Hamilton code changes + +This page gives an introduction to pre-commit hooks and how to use custom hooks to validate your Hamilton code. + +## What are pre-commit hooks? +A pre-commit hook is a script or command that's executed automatically before making a commit. The goal of these hooks is to standardize code formatting and catch erroneous code before being committed. For example, popular hooks include ensuring files have no syntax errors, sorting imports, and normalizing line breaks. + +Note that it's different from testing, which focuses on the behavior of the code. You can think of pre-commit hooks as checks and formatting you would do everytime you save a file. + +## Add pre-commit hooks to your project +Hooks are a mechanism of the `git` version control system. You can find your project's hooks under the `.git/hooks` directory (it might be hidden by default). There should be many files with the `.sample` extension that serve as example scripts. + +The preferred way of working with pre-commit hooks is through the [pre-commit](https://pre-commit.com/) Python library. This library allows you to import and configure hooks for your repository with a `.pre-commit-config.yaml` file. + +### Steps to get started +1. install the pre-commit library + ```python + pip install pre-commit + ``` + +2. add a `.pre-commit-config.yaml` to your repository + ```yaml + # .pre-commit-config.yaml + repos: + # repository with hook definitions + - repo: https://github.com/pre-commit/pre-commit-hooks + rev: v2.3.0 # release version of the repo + hooks: # list of hooks from the repo to include in this project + - id: end-of-file-fixer + - id: trailing-whitespace + - id: check-yaml + args: ['--unsafe'] # some accept arguments + + # download another repository with hooks + - repo: https://github.com/psf/black + rev: 22.10.0 + hooks: + - id: black + ``` + +3. install the hooks defined in `.pre-commit-config.yaml` + ```console + pre-commit install + ``` + Now, hooks will automatically run on `git commit` + +4. to manually run hooks + ```console + pre-commit run --all-files + ``` + +## Custom Hamilton pre-commit hooks +pre-commit hooks are great developer tools, but off-the-shelf solutions aren't aware of the Hamilton framework. Hence, we developed a pre-commit hook to help you author Hamilton dataflows! Under the hood, they leverage the `hamilton` CLI, so if you are unfamiliar with it, feel free to install it and view the `--help` messages. + +```console +pip install sf-hamilton[cli] +hamilton --help +``` + +### Checking dataflow definition +Hamilton doesn't have many syntactic constraints, but there's a few things we want to catch: +- functions parameters and return are type annotated +- a node consistently has the same type (e.g., a parameter in multiple functions) +- functions with a name starting with underscore (`_`) are ignored from the dataflow +- functions with a `@config` decorator received a trailing double underscore with a suffix (e.g., `hello__weekday()`, `hello__weekend()`) + +Instead of reimplementing this logic, we can try to build the Hamilton Driver with the command `hamilton build MODULES` and catch errors. This also ensures the verification is always in sync with the actual build mechanism. This hook will help prevent us from committing invalid dataflow definitions. + +### Checking dataflow paths +A dataflow definition might be valid, but it might break paths in unexpected ways. The command `hamilton validate` (which internally uses `Driver.validate_execution()`) can check if a node is reachable. + +For example, take a look at `my_module.py`, which contains the nodes `A, B, C`, and the changes between `v1` and `v2`. + +```python +# my_module.py - v1 +def A() -> int: ... + +def B(A: int) -> float: ... + +def C(A: int, B: float) -> None: ... + +# driver code +dr = driver.Builder().with_mdoules(my_module).build() +dr.validate_execution(final_vars=["C"]) # <- success +``` + +```python +# my_module.py - v2 +def B(X: int) -> float: ... + +def C(A: int, B: float) -> None: ... + +# driver code +dr = driver.Builder().with_mdoules(my_module).build() +dr.validate_execution(final_vars=["C"]) # <- failure. missing `A` +``` + +![alt text](_pre-commit/validate_changes.png) + +In `v1`, the dataflow could be validated for `C` without any inputs. Now, a developer made `B` depend on `X` instead of `A` and removed `A`. This change accidentally impacted `C` which now depends on the external input `A`. Note that both `v1` and `v2` have a valid dataflow definition. To catch breaking changes to the path to `C`, we could use `hamilton validate --context context.json my_module.py` with the context: + +```json +// context.json +{ "HAMILTON_FINAL_VARS": ["C"] } +// will call .validate_execution(final_vars["C"]) +``` + +```{note} +pre-commit hooks can prevent commits from breaking a core path, but you should use unit and integration tests for more robust checks. +``` + +## Add Hamilton pre-commit to your project +We alluded to the relationship between pre-commit hooks and the `hamilton` command line tool. In fact, the basic hook is designed to take a list of `hamilton` commands and will execute them in order when hooks are triggered. + +To use them, add this snippet to your `.pre-commit-config.yaml` and adapt it to your project: + +```yaml +- repo: https://github.com/dagworks/hamilton-pre-commit + rev: v0.1.2 # use a ref >= 0.1.2 + hooks: + - id: cli-command + name: Hamilton CLI command + args: [ # list of CLI commands to execute + hamilton build my_module.py, + hamilton build my_module2.py, + hamilton validate --context context.json my_module.py my_module2.py, + ] +``` + +The above snippet would: +- check the dataflow definition of `my_module.py` +- check the dataflow definition of `my_module2.py` +- validate the execution path specified in `context.json` for dataflow composed of `my_module.py` and `my_module2.py` + +You can pass any `hamilton` CLI command to the pre-commit hook, but it will only care about it succeeding or failing.