DOC: flesh out cache documentation.
* Demonstrate output caching at the top. This is the primary usage of the cache.
* Adds Disk Cache section.
* Adds S3 Cache section.
* Adds Advanced Usage section with caching other objects, cached_property, and cache expiration.
jmilloy committed May 18, 2021
1 parent ef91654 commit d049b75
Showing 1 changed file with 146 additions and 46 deletions: doc/source/cache.md

This document describes the caching methodology used in PODPAC, and how to control it. PODPAC uses a central cache shared by all nodes. Retrieval from the cache is based on the node's definition (`node.json`), the coordinates, and a key.
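
Conceptually, a cache entry is addressed by these three components together. The following is a minimal sketch of such keying, assuming JSON-serializable inputs; it is illustrative only, not PODPAC's internal implementation:

```python
import hashlib
import json

def make_cache_key(node_definition, coordinates, key):
    """Illustrative only: combine a node definition, coordinates, and a
    key into a single hash. Assumes JSON-serializable inputs; PODPAC's
    actual keying may differ."""
    payload = json.dumps(
        {"definition": node_definition, "coordinates": coordinates, "key": key},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

# the same three components always map to the same cache key
k1 = make_cache_key({"node": "SMAP"}, {"lat": [40, 39]}, "output")
k2 = make_cache_key({"node": "SMAP"}, {"lat": [40, 39]}, "output")
assert k1 == k2
```

Because the node definition is part of the key, two nodes with the same definition address the same cache entries, which is what allows different instances to share a cache.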

Each node has a **Cache Control** (`cache_ctrl`) defined by default, and the **Cache Control** may contain multiple **Cache Stores** (e.g. 'ram', 'disk').



## Caching Outputs

By default, PODPAC caches evaluated node outputs to memory (RAM). When a node is evaluated with the same coordinates, the output is retrieved from the cache.

The following example demonstrates that the output is retrieved from the cache on the second evaluation:

```python
>>> import podpac
>>> import podpac.datalib
>>> coords = podpac.Coordinates(
...     [podpac.clinspace(40, 39, 16),
...      podpac.clinspace(-100, -90, 16),
...      '2015-01-01T00'],
...     dims=['lat', 'lon', 'time'])
>>> smap = podpac.datalib.smap.SMAP()
>>> o = smap.eval(coords)
>>> smap._from_cache
False
>>> o = smap.eval(coords)
>>> smap._from_cache
True
```

Importantly, different instances of the same node share a cache. The following example demonstrates that a different instance of a node will retrieve output from the cache as well:

```python
>>> smap2 = podpac.datalib.smap.SMAP()
>>> o = smap2.eval(coords)
>>> smap2._from_cache
True
```

### Configure Output Caching

Automatic output caching can be controlled globally and for individual nodes. For example, to disable output caching globally:

```python
import podpac
podpac.settings["CACHE_OUTPUT_DEFAULT"] = False
podpac.settings.save()
```

To disable output caching for a particular node:

```python
smap = podpac.datalib.smap.SMAP(cache_output=False)
```

## Disk Cache

In addition to caching to memory (RAM), PODPAC provides a disk cache that persists across processes. For example, when the disk cache is used, a script that evaluates a node can be run multiple times and will retrieve node outputs from the disk cache on subsequent runs.
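
The persistence can be illustrated with a minimal sketch, assuming nothing about PODPAC's internals: a disk-backed store writes one file per entry, so a fresh instance (or a later run of the same script) can read entries written earlier. `MiniDiskCache` below is hypothetical:

```python
import pickle
import tempfile
from pathlib import Path

class MiniDiskCache:
    """Illustrative disk store: one pickle file per cache key."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)

    def put(self, key, value):
        # serialize the value to its own file on disk
        with open(self.directory / (key + ".pkl"), "wb") as f:
            pickle.dump(value, f)

    def get(self, key):
        with open(self.directory / (key + ".pkl"), "rb") as f:
            return pickle.load(f)

cache_dir = tempfile.mkdtemp()
MiniDiskCache(cache_dir).put("my_key", [1, 2, 3])

# a fresh instance (e.g. another process, or a later run) sees the entry
assert MiniDiskCache(cache_dir).get("my_key") == [1, 2, 3]
```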

Each node has a `cache_ctrl` that specifies which cache stores to use, in priority order. For example, to use the RAM cache and the disk cache:

```python
smap = podpac.datalib.smap.SMAP(cache_ctrl=['ram', 'disk'])
```

The default cache control can be set globally in the settings:

```python
podpac.settings["DEFAULT_CACHE"] = ['ram', 'disk']
podpac.settings.save()
```

### Configure Disk Caching

The disk cache directory can be set using the `DISK_CACHE_DIR` setting.
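
For example, to move the disk cache to a custom location (the path below is hypothetical):

```python
import podpac

podpac.settings["DISK_CACHE_DIR"] = "/data/podpac_cache"  # hypothetical path
podpac.settings.save()
```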

## S3 Cache

PODPAC also provides caching to the cloud using AWS S3. Configure the S3 bucket and cache subdirectory using the `S3_BUCKET_NAME` and `S3_CACHE_DIR` settings.
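
For example (the bucket name below is hypothetical):

```python
import podpac

podpac.settings["S3_BUCKET_NAME"] = "my-podpac-bucket"  # hypothetical bucket
podpac.settings["S3_CACHE_DIR"] = "cache"
podpac.settings.save()
```

Nodes can then include 's3' in their `cache_ctrl`, e.g. `podpac.datalib.smap.SMAP(cache_ctrl=['ram', 's3'])`.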

## Clearing the Cache

To clear the entire cache use:

```python
podpac.utils.clear_cache()
```

To clear the cache for a particular node:

```python
smap.clear_cache()
```

You can also clear a particular cache store. For example, to clear the disk cache while leaving the RAM cache in place:

```python
# clear the disk cache for a single node
smap.clear_cache('disk')

# clear the entire disk cache store
podpac.utils.clear_cache('disk')
```

## Cache Limits

PODPAC provides a size limit for each cache store, configured in the podpac settings:

```
RAM_CACHE_MAX_BYTES
DISK_CACHE_MAX_BYTES
S3_CACHE_MAX_BYTES
```

When a cache store is full, new entries are not cached.
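
The limit semantics can be sketched as follows; this is an illustration only, and `LimitedStore` with its simple byte accounting is hypothetical, not PODPAC's implementation:

```python
import sys

class LimitedStore:
    """Illustrative store that refuses new entries once a byte limit is hit."""

    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0
        self.entries = {}

    def put(self, key, value):
        size = sys.getsizeof(value)
        if self.used + size > self.max_bytes:
            return False  # store is full: the new entry is silently ignored
        self.entries[key] = value
        self.used += size
        return True

store = LimitedStore(max_bytes=100)
assert store.put("a", b"x" * 10)       # fits within the limit
assert not store.put("b", b"x" * 500)  # too large: ignored, not an error
```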


## Advanced Usage

### Caching Other Objects

Nodes can cache other data and objects using a cache key and, optionally, coordinates. The following example caches and retrieves data using the key `my_data`.

```python
>>> smap.put_cache(10, 'my_data')
>>> smap.get_cache('my_data')
10
```

In general, the node cache can be managed using the `Node.put_cache`, `Node.get_cache`, `Node.has_cache`, and `Node.rem_cache` methods.
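
The semantics of those four methods can be sketched with a hypothetical in-memory store (illustrative only; entries are keyed by the cache key and optional coordinates):

```python
class MiniNodeCache:
    """Illustrative put/get/has/rem semantics, keyed by (key, coordinates)."""

    def __init__(self):
        self._store = {}

    def put_cache(self, data, key, coordinates=None):
        self._store[(key, coordinates)] = data

    def get_cache(self, key, coordinates=None):
        return self._store[(key, coordinates)]

    def has_cache(self, key, coordinates=None):
        return (key, coordinates) in self._store

    def rem_cache(self, key, coordinates=None):
        self._store.pop((key, coordinates), None)

node = MiniNodeCache()
node.put_cache(10, 'my_data')
assert node.has_cache('my_data')
assert node.get_cache('my_data') == 10
node.rem_cache('my_data')
assert not node.has_cache('my_data')
```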


### Cache Expiration

Cached entries can optionally have an expiration date, after which the entry is considered invalid and automatically removed.

To specify an expiration date:

```python
# specific datetime
node.put_cache(10, 'my_data', expires='2021-01-01T12:00:00')

# relative timedelta: expires 12 hours from now
node.put_cache(10, 'my_data', expires='12,h')
```
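
The two `expires` forms above can be sketched as follows. This is an illustration only: `parse_expires` is hypothetical, and the unit codes ('h', 'm', 's', 'D') are assumptions rather than PODPAC's documented parser:

```python
from datetime import datetime, timedelta

def parse_expires(expires, now=None):
    """Illustrative: return an absolute expiration datetime from either form."""
    now = now or datetime.now()
    if ',' in expires:
        # relative form, e.g. '12,h' (unit codes assumed for this sketch)
        value, unit = expires.split(',')
        units = {'h': 'hours', 'm': 'minutes', 's': 'seconds', 'D': 'days'}
        return now + timedelta(**{units[unit]: float(value)})
    # absolute form, e.g. '2021-01-01T12:00:00'
    return datetime.fromisoformat(expires)

def is_expired(expiration, now):
    return now >= expiration

start = datetime(2021, 1, 1, 0, 0, 0)
exp = parse_expires('12,h', now=start)
assert not is_expired(exp, start + timedelta(hours=11))
assert is_expired(exp, start + timedelta(hours=13))
```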

### Cached Node Properties

PODPAC provides a `cached_property` decorator that extends the built-in `property` decorator.

By default, `cached_property` stores the value as a private attribute on the object. To use the PODPAC cache instead, so that the property persists across objects and processes according to the node's `cache_ctrl`:

```python
class MyNode(podpac.Node):
    @podpac.cached_property(use_cache_ctrl=True)
    def my_cached_property(self):
        return 10
```
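
The default behavior (computing the value once and storing it as a private attribute on the instance) can be sketched roughly like this; `cached_property_sketch` is a hypothetical stand-in, and the real `podpac.cached_property` additionally supports the `use_cache_ctrl` option shown above:

```python
def cached_property_sketch(fn):
    """Illustrative: compute once per instance, then reuse the stored value."""
    attr = '_cached_' + fn.__name__

    @property
    def wrapper(self):
        if not hasattr(self, attr):
            setattr(self, attr, fn(self))  # store as a private attribute
        return getattr(self, attr)

    return wrapper

class Example:
    calls = 0

    @cached_property_sketch
    def value(self):
        Example.calls += 1
        return 10

e = Example()
assert e.value == 10
assert e.value == 10
assert Example.calls == 1  # computed only once per instance
```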

### Updating Existing Entries

By default, existing cache entries are overwritten with new data.

```python
>>> smap.put_cache(10, 'my_data')
>>> smap.put_cache(20, 'my_data')
>>> smap.get_cache('my_data')
20
```

To prevent overwriting existing cache entries, use `overwrite=False`:

```python
>>> smap.put_cache(100, 'my_data', overwrite=False)
podpac.core.node.NodeException: Cached data already exists for key 'my_data' and coordinates None
```
