From 326d8005053e3751fae7e85a9587f981b4960734 Mon Sep 17 00:00:00 2001 From: zilto Date: Tue, 16 Apr 2024 15:57:17 -0400 Subject: [PATCH 1/2] docs: "user guide" section cleanup Renamed the section from "How To Guides" to "User Guide". Changed the title of "User Guide" pages to be easier to visually scan. Essentially, this consists of shorter titles with the most important word at the beginning --- docs/how-tos/{use-online.rst => async.rst} | 2 +- docs/how-tos/cache-nodes.rst | 2 +- docs/how-tos/cli-reference.md | 2 +- docs/how-tos/custom-driver.rst | 32 ------------------- docs/how-tos/index.rst | 18 ++++------- ...-for-llm-workflows.md => llm-workflows.md} | 2 +- docs/how-tos/load-data.rst | 12 ------- ...or-training-models.rst => ml-training.rst} | 2 +- docs/how-tos/pre-commit-hooks.md | 2 +- docs/how-tos/run-data-quality-checks.rst | 2 +- docs/how-tos/scale-up.rst | 2 +- docs/how-tos/use-for-feature-engineering.rst | 2 +- docs/how-tos/use-in-jupyter-notebook.md | 2 +- docs/how-tos/use-without-pandas.rst | 14 -------- .../use-with-dbt.rst => integrations/dbt.rst} | 2 +- docs/integrations/index.rst | 10 +++++- 16 files changed, 27 insertions(+), 81 deletions(-) rename docs/how-tos/{use-online.rst => async.rst} (95%) delete mode 100644 docs/how-tos/custom-driver.rst rename docs/how-tos/{use-hamilton-for-llm-workflows.md => llm-workflows.md} (97%) delete mode 100644 docs/how-tos/load-data.rst rename docs/how-tos/{use-for-training-models.rst => ml-training.rst} (96%) delete mode 100644 docs/how-tos/use-without-pandas.rst rename docs/{how-tos/use-with-dbt.rst => integrations/dbt.rst} (96%) diff --git a/docs/how-tos/use-online.rst b/docs/how-tos/async.rst similarity index 95% rename from docs/how-tos/use-online.rst rename to docs/how-tos/async.rst index 177ae616e..2f7dba2d2 100644 --- a/docs/how-tos/use-online.rst +++ b/docs/how-tos/async.rst @@ -1,5 +1,5 @@ ============================== -Run Hamilton in a Microservice +Online transformations 
============================== While we've mainly been discussing running Hamilton in a batch environment, it can easily be used diff --git a/docs/how-tos/cache-nodes.rst b/docs/how-tos/cache-nodes.rst index 33738e41d..043754ddb 100644 --- a/docs/how-tos/cache-nodes.rst +++ b/docs/how-tos/cache-nodes.rst @@ -1,5 +1,5 @@ ====================== -Cache Node Computation +Caching results ====================== Sometimes it is convenient to cache intermediate nodes. This is especially useful during development. diff --git a/docs/how-tos/cli-reference.md b/docs/how-tos/cli-reference.md index b140a09a2..342e1bbcb 100644 --- a/docs/how-tos/cli-reference.md +++ b/docs/how-tos/cli-reference.md @@ -1,4 +1,4 @@ -# Hamilton CLI +# Command line interface This page covers the Hamilton CLI. It is built directly from the CLI, but note that the command `hamilton --help` always provide the most accurate documentation. diff --git a/docs/how-tos/custom-driver.rst b/docs/how-tos/custom-driver.rst deleted file mode 100644 index 4fc628897..000000000 --- a/docs/how-tos/custom-driver.rst +++ /dev/null @@ -1,32 +0,0 @@ - -Should I define my own Driver? ------------------------------- - -The APIs that the Hamilton Driver is built on, are considered internal. So it is possible for you to define your own -driver in place of the stock Hamilton driver, we suggest the following path if you don't like how the current Hamilton -Driver interface is designed: - -`Write a "Wrapper" class that delegates to the Hamilton Driver.` - -i.e. - -.. code-block:: python - - from hamilton import driver - - class MyCustomDriver(object): - def __init__(self, constructor_arg, ...): - self.constructor_arg = constructor_arg - ... - # some internal functions specific to your context - # ... - - def my_execute_function(self, arg1, arg2, ...): - """What actually calls the Hamilton""" - dr = driver.Driver(self.constructor_arg, ...) 
- df = dr.execute(self.outputs) - return self.augmetn(df) - -That way, you can create the right API constructs to invoke Hamilton in your context, and then delegate to the stock -Hamilton Driver. By doing so, it will ensure that your code continues to work, since we intend to honor the Hamilton -Driver APIs with backwards compatibility as much as possible. diff --git a/docs/how-tos/index.rst b/docs/how-tos/index.rst index a528f3a5d..8bd4e6751 100644 --- a/docs/how-tos/index.rst +++ b/docs/how-tos/index.rst @@ -1,5 +1,5 @@ ============== -How To Guides +User Guide ============== This portion of the documentation goes over the set of common examples for Hamilton usage, so you can apply @@ -8,18 +8,14 @@ directory. If there's an example you want but don't see, reach out or open an is .. toctree:: - load-data - use-without-pandas use-in-jupyter-notebook - run-data-quality-checks - scale-up - use-for-training-models - use-with-dbt - use-online use-for-feature-engineering + ml-training + llm-workflows + run-data-quality-checks use-hamilton-for-lineage - use-hamilton-for-llm-workflows + cache-nodes + scale-up + async cli-reference pre-commit-hooks - cache-nodes - custom-driver diff --git a/docs/how-tos/use-hamilton-for-llm-workflows.md b/docs/how-tos/llm-workflows.md similarity index 97% rename from docs/how-tos/use-hamilton-for-llm-workflows.md rename to docs/how-tos/llm-workflows.md index d7a1f29ef..1992903c8 100644 --- a/docs/how-tos/use-hamilton-for-llm-workflows.md +++ b/docs/how-tos/llm-workflows.md @@ -1,4 +1,4 @@ -# How to use Hamilton for LLM Workflows +# LLM workflows Hamilton is great for describing dataflows, and a lot of "actions" you want an "agent" to perform can be described as one, e.g. 
create an embedding diff --git a/docs/how-tos/load-data.rst b/docs/how-tos/load-data.rst deleted file mode 100644 index 2e22cfc32..000000000 --- a/docs/how-tos/load-data.rst +++ /dev/null @@ -1,12 +0,0 @@ -================== -Load External Data -================== - -While we've been injecting data in from the driver in previous examples, Hamilton functions are fully capable of loading their own data. -In the following example, we'll show how to use Hamilton to: - -1. Load data from an external source (CSV file and duckdb database) -2. Alter the source of data depending on how the DAG is parameterized/created -3. Mock data for a test-setting (so you can quickly execute your DAG without having to wait for data to load) - -See the full tutorial `here `_. diff --git a/docs/how-tos/use-for-training-models.rst b/docs/how-tos/ml-training.rst similarity index 96% rename from docs/how-tos/use-for-training-models.rst rename to docs/how-tos/ml-training.rst index fe4674577..be0289f07 100644 --- a/docs/how-tos/use-for-training-models.rst +++ b/docs/how-tos/ml-training.rst @@ -1,5 +1,5 @@ =============================== -Use Hamilton for Model Training +Model training =============================== As Hamilton is a generic library for representing dataflows in pandas, it can be used for a wide array of tasks. diff --git a/docs/how-tos/pre-commit-hooks.md b/docs/how-tos/pre-commit-hooks.md index 780105285..52b868fa8 100644 --- a/docs/how-tos/pre-commit-hooks.md +++ b/docs/how-tos/pre-commit-hooks.md @@ -1,4 +1,4 @@ -# Hamilton pre-commit +# pre-commit hooks ## Use pre-commit hooks for safer Hamilton code changes This page gives an introduction to pre-commit hooks and how to use custom hooks to validate your Hamilton code. 
diff --git a/docs/how-tos/run-data-quality-checks.rst b/docs/how-tos/run-data-quality-checks.rst index 536380f7f..52d2a5e21 100644 --- a/docs/how-tos/run-data-quality-checks.rst +++ b/docs/how-tos/run-data-quality-checks.rst @@ -1,5 +1,5 @@ ======================= -Run Data Quality Checks +Data quality ======================= Hamilton comes with data quality included out of the box. diff --git a/docs/how-tos/scale-up.rst b/docs/how-tos/scale-up.rst index ebf26fab8..a7d28df95 100644 --- a/docs/how-tos/scale-up.rst +++ b/docs/how-tos/scale-up.rst @@ -1,5 +1,5 @@ ===================== -Run Hamilton at Scale +Scaling computation ===================== Hamilton enables a variety of tools for allowing you to scale your data processing by integrating with third-party libraries. diff --git a/docs/how-tos/use-for-feature-engineering.rst b/docs/how-tos/use-for-feature-engineering.rst index 9fceadf2c..08ada96b0 100644 --- a/docs/how-tos/use-for-feature-engineering.rst +++ b/docs/how-tos/use-for-feature-engineering.rst @@ -1,5 +1,5 @@ ========================================== -Use Hamilton for Feature Engineering +Feature engineering ========================================== Hamilton's roots are in time-series offline feature engineering. But it can be used for any type of feature engineering: diff --git a/docs/how-tos/use-in-jupyter-notebook.md b/docs/how-tos/use-in-jupyter-notebook.md index dd3b1e062..7c4d23e2d 100644 --- a/docs/how-tos/use-in-jupyter-notebook.md +++ b/docs/how-tos/use-in-jupyter-notebook.md @@ -1,4 +1,4 @@ -# Using Hamilton in a notebook +# Jupyter notebooks There are two main ways to use Hamilton in a notebook. 
diff --git a/docs/how-tos/use-without-pandas.rst b/docs/how-tos/use-without-pandas.rst deleted file mode 100644 index b00c839c3..000000000 --- a/docs/how-tos/use-without-pandas.rst +++ /dev/null @@ -1,14 +0,0 @@ -=========================== -Use Hamilton without Pandas -=========================== - -As we made clear earlier, Making use of Hamilton does not require that you utilize Pandas. -Not only can hamilton functions output any valid python object, but Hamilton also naturally integrates -with a few dataframe libraries. - -In this example, we rebuild the hello_world example using the `polars `_ library. - -https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/polars. - -Note that we are currently working on other examples, including one for pyspark -(Hamilton already has native `pandas-on-spark `_ support). diff --git a/docs/how-tos/use-with-dbt.rst b/docs/integrations/dbt.rst similarity index 96% rename from docs/how-tos/use-with-dbt.rst rename to docs/integrations/dbt.rst index 7d96b9b39..61abaf917 100644 --- a/docs/how-tos/use-with-dbt.rst +++ b/docs/integrations/dbt.rst @@ -1,5 +1,5 @@ ===================== -Use Hamilton with DBT +dbt ===================== If you're familiar with DBT, you likely noticed that it can fill a similar role to Hamilton. What DBT does for SQL diff --git a/docs/integrations/index.rst b/docs/integrations/index.rst index 831ddaf17..7fb9aaafb 100644 --- a/docs/integrations/index.rst +++ b/docs/integrations/index.rst @@ -9,11 +9,19 @@ This section showcases how Hamilton integrates with popular frameworks. 
fastapi ibis/index streamlit + dbt Airflow + Amazon Web Services + Burr + Dagster Dask - dbt Feast + Metaflow Pandera + Plotly + Polars Prefect Ray + Slack Spark + Vaex From 7b4b96831f4238cc51e40b8b314c4816d1f58cbb Mon Sep 17 00:00:00 2001 From: zilto Date: Wed, 17 Apr 2024 10:27:02 -0400 Subject: [PATCH 2/2] updated docs --- docs/how-tos/index.rst | 4 ++- docs/how-tos/load-data.rst | 12 ++++++++ docs/how-tos/{async.rst => microservice.rst} | 2 +- docs/how-tos/wrapping-driver.rst | 31 ++++++++++++++++++++ 4 files changed, 47 insertions(+), 2 deletions(-) create mode 100644 docs/how-tos/load-data.rst rename docs/how-tos/{async.rst => microservice.rst} (96%) create mode 100644 docs/how-tos/wrapping-driver.rst diff --git a/docs/how-tos/index.rst b/docs/how-tos/index.rst index 8bd4e6751..56e22f457 100644 --- a/docs/how-tos/index.rst +++ b/docs/how-tos/index.rst @@ -9,6 +9,7 @@ directory. If there's an example you want but don't see, reach out or open an is .. toctree:: use-in-jupyter-notebook + load-data use-for-feature-engineering ml-training llm-workflows @@ -16,6 +17,7 @@ directory. If there's an example you want but don't see, reach out or open an is use-hamilton-for-lineage cache-nodes scale-up - async + microservice + wrapping-driver cli-reference pre-commit-hooks diff --git a/docs/how-tos/load-data.rst b/docs/how-tos/load-data.rst new file mode 100644 index 000000000..1e2fe1351 --- /dev/null +++ b/docs/how-tos/load-data.rst @@ -0,0 +1,12 @@ +================== +Loading data +================== + +While we've been injecting data in from the driver in previous examples, Hamilton functions are fully capable of loading their own data. +In the following example, we'll show how to use Hamilton to: + +1. Load data from an external source (CSV file and duckdb database) +2. Alter the source of data depending on how the DAG is parameterized/created +3. 
Mock data for a test-setting (so you can quickly execute your DAG without having to wait for data to load) + +See the full tutorial `here `_. diff --git a/docs/how-tos/async.rst b/docs/how-tos/microservice.rst similarity index 96% rename from docs/how-tos/async.rst rename to docs/how-tos/microservice.rst index 2f7dba2d2..bbf90eada 100644 --- a/docs/how-tos/async.rst +++ b/docs/how-tos/microservice.rst @@ -1,5 +1,5 @@ ============================== -Online transformations +Microservice ============================== While we've mainly been discussing running Hamilton in a batch environment, it can easily be used diff --git a/docs/how-tos/wrapping-driver.rst b/docs/how-tos/wrapping-driver.rst new file mode 100644 index 000000000..f63d2459e --- /dev/null +++ b/docs/how-tos/wrapping-driver.rst @@ -0,0 +1,31 @@ +Wrapping the Driver +------------------------------ + +The APIs that the Hamilton Driver is built on are considered internal. So while it is possible for you to define your own +driver in place of the stock Hamilton driver, we suggest the following path if you don't like how the current Hamilton +Driver interface is designed: + +`Write a "Wrapper" class that delegates to the Hamilton Driver.` + +i.e. + +.. code-block:: python + + from hamilton import driver + + class MyCustomDriver(object): + def __init__(self, constructor_arg, ...): + self.constructor_arg = constructor_arg + ... + # some internal functions specific to your context + # ... + + def my_execute_function(self, arg1, arg2, ...): + """What actually calls Hamilton""" + dr = driver.Driver(self.constructor_arg, ...) + df = dr.execute(self.outputs) + return self.augment(df) + +That way, you can create the right API constructs to invoke Hamilton in your context, and then delegate to the stock +Hamilton Driver. Doing so ensures that your code continues to work, since we intend to honor the Hamilton +Driver APIs with backwards compatibility as much as possible.
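The wrapper pattern that `wrapping-driver.rst` describes can be sketched as runnable Python. This is only a minimal sketch under stated assumptions: `StubDriver`, `driver_factory`, and `_augment` are hypothetical names that do not appear in the patch, and the stub imitates nothing more than a driver's `execute(final_vars, inputs=...)` call so that the example runs without Hamilton installed.

```python
from typing import Any, Callable, Dict, List, Optional


class StubDriver:
    """Hypothetical stand-in for the stock Hamilton Driver (assumption, not Hamilton code).

    It only mimics an `execute(final_vars, inputs=...)` call so this sketch is self-contained.
    """

    def __init__(self, config: Dict[str, Any]):
        self.config = config

    def execute(
        self, final_vars: List[str], inputs: Optional[Dict[str, Any]] = None
    ) -> Dict[str, Any]:
        inputs = inputs or {}
        # Pretend each requested output is the matching input, doubled.
        return {name: inputs[name] * 2 for name in final_vars}


class MyCustomDriver:
    """Wrapper that exposes a context-specific API and delegates execution to a driver."""

    def __init__(
        self,
        config: Dict[str, Any],
        outputs: List[str],
        driver_factory: Callable[[Dict[str, Any]], Any] = StubDriver,
    ):
        self.config = config
        self.outputs = outputs
        # Hypothetical hook: lets callers plug in the real driver instead of the stub.
        self._driver_factory = driver_factory

    def _augment(self, results: Dict[str, Any]) -> Dict[str, Any]:
        # Hypothetical post-processing specific to the caller's context.
        return {**results, "n_outputs": len(results)}

    def my_execute_function(self, **inputs: Any) -> Dict[str, Any]:
        # Build the underlying driver, delegate to it, then post-process.
        dr = self._driver_factory(self.config)
        results = dr.execute(self.outputs, inputs=inputs)
        return self._augment(results)


wrapper = MyCustomDriver(config={}, outputs=["a", "b"])
result = wrapper.my_execute_function(a=1, b=10)
print(result)
```

In real use, the factory would presumably build the stock driver instead, for example `lambda config: driver.Driver(config, my_module)` with `my_module` being your own module of Hamilton functions. The wrapper's public methods stay the same while all execution is delegated to the stock Driver.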