Plugin: cache hook with diskcache (#684)
* diskcache hook + tests added
* install via sf-hamilton[diskcache] (requires Python >=3.9)
* examples/cache_hook added

```python
from hamilton import driver
from hamilton.plugins import h_diskcache
import functions

# get the logger to view cache retrieval info
import logging
logger = logging.getLogger("hamilton.plugins.h_diskcache")
logger.setLevel(logging.DEBUG)  # or logging.INFO
logger.addHandler(logging.StreamHandler())

# build driver with cache hook
dr = (
    driver.Builder()
    .with_modules(functions)
    .with_adapters(h_diskcache.CacheHook())
    .build()
)

# use execute or materialize as usual
dr.execute(["C"])
```

---------

Co-authored-by: zilto <tjean@DESKTOP-V6JDCS2>
Showing 14 changed files with 711 additions and 1 deletion.
@@ -0,0 +1,62 @@
# Cache hook
This hook uses the [diskcache](https://grantjenks.com/docs/diskcache/tutorial.html) library to cache node execution results on disk. The cache key is a tuple of the function's `(source code, input a, ..., input n)`, so a node is recomputed whenever its code or any of its inputs changes.

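To make the cache key concrete, here is a minimal sketch of how a key built from source code and inputs behaves. It is not the plugin's implementation: `cache_key` is a hypothetical helper, the hashing scheme is an assumption, and the node source is passed in as a plain string for illustration.

```python
import hashlib


def cache_key(source: str, **inputs) -> str:
    """Hypothetical helper: fingerprint a node's source code plus its inputs."""
    # Sort the inputs so that keyword order does not change the key.
    payload = repr((source, sorted(inputs.items())))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Two versions of the same node, given as source strings for illustration.
SOURCE_V1 = "def B(A: int) -> float:\n    return A / 4\n"
SOURCE_V2 = "def B(A: int) -> float:\n    return A / 2\n"

same_call = cache_key(SOURCE_V1, A=4) == cache_key(SOURCE_V1, A=4)
new_input = cache_key(SOURCE_V1, A=4) != cache_key(SOURCE_V1, A=5)
new_code = cache_key(SOURCE_V1, A=4) != cache_key(SOURCE_V2, A=4)
print(same_call, new_input, new_code)
```

All three comparisons hold: an identical call maps to the same key (a cache hit), while editing either the input or the function body produces a new key (a cache miss).
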
> 💡 This can be a great tool for developing inside a Jupyter notebook or other interactive environments.

diskcache lets you:
- set a maximum cache size
- set an automated eviction policy that applies once the maximum size is reached
- plug in custom `Disk` implementations to change the serialization protocol (e.g., pickle, JSON)

> ⚠ The default `Disk` serializes objects using the `pickle` module. Changing Python or library versions could break your cache (both keys and values). Learn more about [caveats](https://grantjenks.com/docs/diskcache/tutorial.html#caveats).

> ❓ To store artifacts robustly, please use Hamilton materializers or the [CachingGraphAdapter](https://github.com/DAGWorks-Inc/hamilton/tree/main/examples/caching_nodes) instead. The `CachingGraphAdapter` stores tagged nodes directly on the file system using common formats (JSON, CSV, Parquet, etc.). However, it isn't aware of your function version and requires you to manage your disk space manually.

# How to use it
## Use the hook
The hook lives in `hamilton.plugins.h_diskcache`. Add it to your Driver definition with `.with_adapters()`.

```python
from hamilton import driver
from hamilton.plugins import h_diskcache
import functions

dr = (
    driver.Builder()
    .with_modules(functions)
    .with_adapters(h_diskcache.CacheHook())
    .build()
)
```

## Inspect the hook
To inspect the caching behavior in real time, configure the plugin's logger:

```python
import logging

logger = logging.getLogger("hamilton.plugins.h_diskcache")
logger.setLevel(logging.DEBUG)  # or logging.INFO
logger.addHandler(logging.StreamHandler())
```
- INFO only logs the total cache size after the Driver finishes executing
- DEBUG also logs the inputs of each node and whether its value came `from cache` or was `executed`

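The difference between the two levels can be sketched with the standard `logging` module alone. This example does not call the plugin: the logger name `example.cache_hook` and the `ListHandler` class are hypothetical, and the two messages imitate the format shown in the example notebook's output.

```python
import logging


class ListHandler(logging.Handler):
    """Collect formatted messages in a list instead of printing them."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


def messages_at(level):
    """Return the messages that pass through a logger set to `level`."""
    logger = logging.getLogger("example.cache_hook")  # hypothetical logger name
    logger.propagate = False  # keep the demo output out of the root logger
    handler = ListHandler()
    logger.addHandler(handler)
    logger.setLevel(level)
    try:
        logger.debug("A {'external': 10}: from cache")  # per-node detail
        logger.info("Cache size: 0.03 MB")  # end-of-run summary
    finally:
        logger.removeHandler(handler)
    return handler.messages


info_messages = messages_at(logging.INFO)
debug_messages = messages_at(logging.DEBUG)
```

At `INFO`, only the summary line gets through; at `DEBUG`, the per-node line appears as well.
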
## Clear cache
The utility function `h_diskcache.evict_except_driver` clears cached values for all nodes except those in the Driver you pass in. This is an efficient way to clear old artifacts as your project evolves.

```python
from hamilton import driver
from hamilton.plugins import h_diskcache
import functions

dr = (
    driver.Builder()
    .with_modules(functions)
    .with_adapters(h_diskcache.CacheHook())
    .build()
)
# evict every cached value that does not belong to a node of `dr`
h_diskcache.evict_except_driver(dr)
```

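The idea of "evict everything except the current Driver's nodes" can be illustrated with a plain dictionary. This is only a conceptual sketch, not how `evict_except_driver` is implemented: the `evict_except` helper and the `(node name, input)` key layout are assumptions made for the example.

```python
def evict_except(cache: dict, keep: set) -> int:
    """Drop every entry whose node name is not in `keep`; return how many were removed."""
    # Collect stale keys first so the dict is not mutated while iterating.
    stale = [key for key in cache if key[0] not in keep]
    for key in stale:
        del cache[key]
    return len(stale)


# Hypothetical cache contents: keys are (node name, input) pairs.
cache = {("A", 10): 4, ("B", 4): 1.0, ("old_node", 3): 9}

# Node names of the current dataflow; "old_node" was deleted from the project.
removed = evict_except(cache, keep={"A", "B", "C"})
remaining = sorted(cache)
```

After the call, only the entries for `A` and `B` remain, and the value cached for the deleted node is gone.
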
## Cache settings
Find all the cache settings in the [diskcache docs](https://grantjenks.com/docs/diskcache/api.html#constants).
@@ -0,0 +1,10 @@
def A(external: int) -> int:
    return external % 7 + 1


def B(A: int) -> float:
    return A / 4


def C(A: int, B: float) -> float:
    return A**2 + B
@@ -0,0 +1,86 @@
{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from hamilton import driver\n",
    "from hamilton.plugins import h_diskcache\n",
    "\n",
    "import functions"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "import logging\n",
    "\n",
    "# get the plugin logger\n",
    "logger = logging.getLogger(\"hamilton.plugins.h_diskcache\")\n",
    "logger.setLevel(logging.DEBUG)  # set logging.INFO for less info\n",
    "logger.addHandler(logging.StreamHandler())"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "A {'external': 10}: from cache\n",
      "B {'A': 4}: from cache\n",
      "C {'A': 4, 'B': 1.0}: from cache\n",
      "Cache size: 0.03 MB\n"
     ]
    }
   ],
   "source": [
    "dr = (\n",
    "    driver.Builder()\n",
    "    .with_modules(functions)\n",
    "    .with_adapters(h_diskcache.CacheHook())\n",
    "    .build()\n",
    ")\n",
    "# if you ran `run.py`, you should see the nodes being\n",
    "# read from cache\n",
    "results = dr.execute([\"C\"], inputs=dict(external=10))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
@@ -0,0 +1 @@
sf-hamilton[diskcache]
@@ -0,0 +1,19 @@
import logging

import functions

from hamilton import driver
from hamilton.plugins import h_diskcache


def main():
    dr = driver.Builder().with_modules(functions).with_adapters(h_diskcache.CacheHook()).build()
    results = dr.execute(["C"], inputs=dict(external=10))
    print(results)


if __name__ == "__main__":
    logger = logging.getLogger("hamilton.plugins.h_diskcache")
    logger.setLevel(logging.DEBUG)
    logger.addHandler(logging.StreamHandler())
    main()