
DOC: Updating/correcting documentation.
mpu-creare committed Apr 23, 2020
1 parent 3895592 commit 63ef0bd
Showing 12 changed files with 120 additions and 294 deletions.
240 changes: 0 additions & 240 deletions doc/notebooks/pipeline-from-JSON.ipynb

This file was deleted.

Binary file modified doc/source/_static/img/node.png
21 changes: 19 additions & 2 deletions doc/source/aws.md
@@ -1,5 +1,22 @@
# AWS Integration

PODPAC integrates with AWS to enable processing in the cloud.
PODPAC integrates with AWS to enable processing in the cloud. To process in the cloud, you need to:

> This document is under construction. See the [AWS Lambda Tutorial Notebook](https://github.com/creare-com/podpac-examples/blob/master/notebooks/developer/aws-lambda-tutorial.ipynb) for more details.
1. Obtain an AWS account
2. Generate and save the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` (see [AWS documentation](https://aws.amazon.com/blogs/security/wheres-my-secret-access-key/))
3. Build the necessary AWS resources using PODPAC (see the [Setting up AWS Lambda Tutorial Notebook](https://github.com/creare-com/podpac-examples/blob/master/notebooks/4-advanced/aws-lambda.ipynb))

After these steps, nearly any PODPAC processing pipeline can be evaluated using AWS Lambda functions.

```python
import podpac
...  # build a node and coordinates as usual
output = node.eval(coords)  # local evaluation of the node
cloud_node = podpac.managers.Lambda(source=node)  # wrap the node for cloud evaluation
cloud_output = cloud_node.eval(coords)  # same evaluation, executed on AWS Lambda
```

This functionality is documented in the following notebooks:
* [Running on AWS Lambda Tutorial Notebook](https://github.com/creare-com/podpac-examples/blob/master/notebooks/3-processing/running-on-aws-lambda.ipynb)
* [Setting up AWS Lambda Tutorial Notebook](https://github.com/creare-com/podpac-examples/blob/master/notebooks/4-advanced/aws-lambda.ipynb)
* [Budgeting with AWS Lambda Tutorial Notebook](https://github.com/creare-com/podpac-examples/blob/master/notebooks/4-advanced/aws-budget.ipynb)
20 changes: 10 additions & 10 deletions doc/source/datasets.md
@@ -6,18 +6,18 @@ continue to expand support each release. The following datasets are currently supported:
## SMAP

- **Source**: [NASA Soil Moisture Active Passive (SMAP) Satellites](https://smap.jpl.nasa.gov/data/)
- **Module**: `podpac.datalib.smap`
- **Module**: `podpac.datalib.smap`, `podpac.datalib.smap_egi`

Global soil moisture measurements from NASA.

### Examples

- [Analyzing SMAP Data](https://github.com/creare-com/podpac-examples/blob/master/notebooks/basic_examples/analyzing-SMAP-data.ipynb)
- [Running SMAP Analysis on AWS Lambda](https://github.com/creare-com/podpac-examples/blob/master/notebooks/basic_examples/running-on-aws-lambda.ipynb)
- [SMAP Sentinel data access](https://github.com/creare-com/podpac-examples/blob/master/notebooks/demos/SMAP-Sentinel-data-access.ipynb)
- [SMAP downscaling example application](https://github.com/creare-com/podpac-examples/blob/master/notebooks/demos/SMAP-downscaling-example-application.ipynb)
- [SMAP level 4 data access](https://github.com/creare-com/podpac-examples/blob/master/notebooks/demos/SMAP-level4-data-access.ipynb)
- [SMAP data access widget](https://github.com/creare-com/podpac-examples/blob/master/notebooks/demos/SMAP-widget-data-access.ipynb)
- [Retrieving SMAP Data](https://github.com/creare-com/podpac-examples/blob/master/notebooks/5-datalib/smap/010-retrieving-SMAP-data.ipynb)
- [Analyzing SMAP Data](https://github.com/creare-com/podpac-examples/blob/master/notebooks/5-datalib/smap/100-analyzing-SMAP-data.ipynb)
- [Working with SMAP-Sentinel Data](https://github.com/creare-com/podpac-examples/blob/master/notebooks/5-datalib/smap/101-working-with-SMAP-Sentinel-data.ipynb)
- [SMAP-EGI](https://github.com/creare-com/podpac-examples/blob/master/notebooks/5-datalib/smap/SMAP-EGI.ipynb)
- [SMAP Data Access Without PODPAC](https://github.com/creare-com/podpac-examples/blob/master/notebooks/5-datalib/smap/SMAP-data-access-without-podpac.ipynb)
- [SMAP Downscaling Example Application](https://github.com/creare-com/podpac-examples/blob/master/notebooks/5-datalib/smap/SMAP-downscaling-example-application.ipynb)

## TerrainTiles

@@ -28,8 +28,8 @@ Global dataset providing bare-earth terrain heights, tiled for easy usage and provided on S3.

### Examples

- [Terrain Tiles Usage](https://github.com/creare-com/podpac-examples/blob/master/notebooks/demos/Terrain-Tiles.ipynb)
- [Terrain Tiles Pattern Match](https://github.com/creare-com/podpac-examples/blob/master/notebooks/demos/Terrain-Tiles-Pattern-Match.ipynb)
- [Terrain Tiles Usage](https://github.com/creare-com/podpac-examples/blob/master/notebooks/5-datalib/terrtain-tiles.ipynb)
- [Terrain Tiles Pattern Match](https://github.com/creare-com/podpac-examples/blob/master/notebooks/scratch/demos/Terrain-Tiles-Pattern-Match.ipynb)

## GFS

@@ -40,4 +40,4 @@ Weather forecast model produced by the National Centers for Environmental Prediction (NCEP).

### Examples

- [GFS Usage](https://github.com/creare-com/podpac-examples/blob/master/notebooks/demos/gfs.ipynb)
- [GFS Usage](https://github.com/creare-com/podpac-examples/blob/master/notebooks/5-datalib/gfs.ipynb)
6 changes: 4 additions & 2 deletions doc/source/dependencies.md
@@ -12,11 +12,11 @@ If using `pip` to install, the following OS specific dependencies are required t

### Windows

> No external dependencies necessary
> No external dependencies necessary, though using Anaconda is recommended.
### Mac

> No external dependencies necessary
> No external dependencies necessary, though using Anaconda is recommended.
### Linux

@@ -64,9 +64,11 @@ $ sudo apt-get install build-essential python-dev
- [rasterio](https://github.com/mapbox/rasterio): read GeoTiff and other raster datasets
- [lxml](https://github.com/lxml/lxml): read xml and html files
- [beautifulsoup4](https://www.crummy.com/software/BeautifulSoup/): text parser and screen scraper
- [zarr](https://zarr.readthedocs.io/en/stable/): cloud optimized storage format
- `aws`: AWS integration
- [awscli](https://github.com/aws/aws-cli): unified command line interface to Amazon Web Services
- [boto3](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html): Amazon Web Services (AWS) SDK for Python
- [s3fs](https://pypi.org/project/s3fs/): Convenient Filesystem interface over S3.
- `algorithm`: Algorithm development
- [numexpr](https://github.com/pydata/numexpr): fast numerical expression evaluator for NumPy
- `notebook`: Jupyter Notebooks
6 changes: 2 additions & 4 deletions doc/source/design.rst
@@ -61,7 +61,8 @@ Pipelines can also be complex, like two data sources being combined into an algorithm.
.. image:: /_static/img/complex-pipeline.png
:width: 85%


Pipelines are not explicitly implemented, but this functionality is available through `Nodes`. To see the representation of
a pipeline, use `Node.definition`. To create a pipeline from a definition, use `Node.from_definition(definition)`.

Repository Organization
-----------------------
@@ -70,9 +71,6 @@ The directory structure is as follows:

- ``dist``: Contains installation instructions and environments for various deployments, including cloud deployment on AWS
- ``doc``: Sphinx based documentation
- ``source``: sphinx docs source
- ``notebooks``: example jupyter notebooks
- ``html``: HTML pages used for demonstrations
- ``podpac``: The PODPAC Python library
- ``core``: The core PODPAC functionality -- contains general implementations of classes
- ``datalib``: Library of Nodes used to access specific data sources -- this is where the SMAP node is implemented (for example)
4 changes: 2 additions & 2 deletions doc/source/examples.rst
@@ -81,7 +81,7 @@ An ``Array`` Node is a sub-class of ``DataSource`` Node.
.. code:: python
# create node for data source
In [9]: node = podpac.data.Array(source=data, native_coordinates=coords)
In [9]: node = podpac.data.Array(source=data, coordinates=coords)
In [10]: node
Out[10]:
Array DataSource
@@ -91,7 +91,7 @@ An ``Array`` Node is a sub-class of ``DataSource`` Node.
0.11195743 0.58360194 0.15225759 0.99246553 0.31122967 0.80974094
0.00474486 0.00650152 0.08999056]
...]]
native_coordinates:
coordinates:
lat: ArrayCoordinates1d(lat): Bounds[40.0, 50.0], N[11]
lon: ArrayCoordinates1d(lon): Bounds[-10.0, 10.0], N[21]
interpolation: nearest
3 changes: 2 additions & 1 deletion doc/source/install.md
@@ -82,7 +82,7 @@ $ docker run -i -t podpac
```


## Standalone Distribution
## Standalone Windows Distribution

### Windows 10

@@ -104,6 +104,7 @@ Once the folder is unzipped:
- This will open up a Windows command prompt, and launch a JupyterLab notebook in your default web browser
- To close the notebook, close the browser tab, and close the Windows console

To make this standalone distribution, see the [deploy notes](deploy-notes.md).

## Install from Source

92 changes: 70 additions & 22 deletions doc/source/nodes.md
@@ -2,18 +2,29 @@

This document describes the detailed interfaces for core node types so that a user may know what to expect. It also documents some of the available nodes implemented as part of the core library.

... tbd ... (for now see the [DeveloperSpec](https://github.com/creare-com/podpac/blob/develop/doc/source/developer/specs/nodes.md))
In PODPAC, `Nodes` represent the basic unit of computation. They take inputs, produce outputs, and can represent source data, intermediate results, or final outputs. The base `Node` class defines a common interface for all PODPAC `Nodes`.

In particular, the base `Node` class implements:

- Caching behaviour of `Node` outputs and interfacing with the cache system
- Serialization and deserialization of `Nodes` using our JSON format
- Saving and loading `Node` outputs
- Creating `Node` output data structures using the `create_output_array` method.
- Common interfaces required and used by all subsequent nodes (sketched below):
* `Node.eval(coordinates, output)`
* `Node.find_coordinates()`
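
A minimal usage sketch of this common interface (assuming `node` is any existing PODPAC `Node`):

```python
import podpac

# coordinates at which to evaluate the node
coords = podpac.Coordinates(
    [podpac.clinspace(40, 50, 11), podpac.clinspace(-10, 10, 21)],
    dims=['lat', 'lon'])

output = node.eval(coords)           # evaluate the node at the requested coordinates
available = node.find_coordinates()  # coordinates the node can natively provide
```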

## DataSource

DataSource nodes interface with remote geospatial data sources (i.e. raster images, DAP servers, numpy arrays) and define how to retrieve data from these remote sources using PODPAC coordinates. PODPAC defines common generic DataSource nodes (i.e. Array, PyDAP), but advanced users can define their own DataSource nodes by defining the methods to retrieve data (`get_data(coordinates, index)`) and the method to define the `native_coordinates` property (`get_native_coordinates()`).
DataSource nodes interface with geospatial data sources (e.g. raster images, DAP servers, numpy arrays) and define how to retrieve data from these sources using PODPAC coordinates. PODPAC defines common generic DataSource nodes (e.g. Array, PyDAP), but advanced users can define their own DataSource nodes by defining the methods to retrieve data (`get_data(coordinates, index)`) and the method to define the `coordinates` property (`get_native_coordinates()`).

Key properties of DataSource nodes include (see the sketch after this list):

- `source`: The location of the source data. Depending on the child node, this can be a filepath, numpy array, or server URL.
- `native_coordinates`: The PODPAC coordinates of the data in `source`
- `coordinates`: The PODPAC coordinates of the data in `source`
- `interpolation`: Definition of the interpolation method to use with the data source.
- `nan_vals`: List of values from source data that should be interpreted as 'no data' or 'nans'.
- `boundary`: A structure defining the boundary of each data point in the data source (for example to define a point, area, or arbitrary polygon)
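
For example, the built-in `Array` node exposes these properties directly (a sketch; the `nan_vals` value is illustrative):

```python
import numpy as np
import podpac

data = np.random.rand(11, 21)
coords = podpac.Coordinates(
    [podpac.clinspace(40, 50, 11), podpac.clinspace(-10, 10, 21)],
    dims=['lat', 'lon'])

node = podpac.data.Array(
    source=data,         # in-memory source data
    coordinates=coords,  # the PODPAC coordinates of `source`
    nan_vals=[-9999],    # treat -9999 as 'no data'
)
```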

To evaluate data at arbitrary PODPAC coordinates, users can input `coordinates` to the eval method of the DataSource node. The DataSource `eval` process consists of the following steps:

@@ -31,46 +42,83 @@ The DataSource `interpolation` property defines how to handle interpolation of coordinates
Definition of the interpolation method on a DataSource node may either be a string:

```python
node.interpolation = 'nearest' # nearest neighbor interpolation
interpolation = 'nearest' # nearest neighbor interpolation
```

or a dictionary that supplies extra parameters:

```python
node.interpolation = {
interpolation = {
'method': 'nearest',
'params': {
'spatial_tolerance': 1.1
}
}
```
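
Either form is typically supplied when the data source node is created (a usage sketch, reusing `data` and `coords` from the example above):

```python
node = podpac.data.Array(
    source=data,
    coordinates=coords,
    interpolation={'method': 'nearest', 'params': {'spatial_tolerance': 1.1}},
)
```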

For the most advanced users, the interpolation definition supports defining different interpolation methods for different dimensions:
For the most advanced users, the interpolation definition supports defining different interpolation methods for different dimensions (as of 2.0.0 this functionality is not fully implemented):

```python
node.interpolation = {
('lat', 'lon'): 'bilinear',
'time': 'nearest'
}
interpolation = [
{
'method': 'bilinear',
'dims': ['lat', 'lon']
},
{
'method': 'nearest',
'dims': ['time']
}
]
```

When a DataSource node is created, the interpolation manager selects a list of applicable `Interpolator` classes to apply to each set of defined dimensions. When a DataSource node is being evaluated, the interpolation manager chooses the first interpolator that is capable of handling the dimensions defined in the requested coordinates and the native coordinates using the `can_interpolate` method. After selecting an interpolator for all sets of dimensions, the manager sequentially interpolates data for each set of dimensions using the `interpolate` method.

## Compositor

... tbd ...
`Compositor` `Nodes` are used to combine multiple data files or datasets into a single interface.

The `BaseCompositor` implements:

- The `find_coordinates` method
- The `eval` method
- The `iteroutputs` method used to iterate over all possible input data sources
- The `select_sources(coordinates)` method to sub-select input data sources BEFORE evaluating them, as an optimization
- The interface for the `composite(coordinates, data_arrays, result)` method. Child classes implement this method, which determines the logic for combining data sources.

Beyond the base class, PODPAC provides the following compositors (see the sketch after this list):

- `OrderedCompositor`
- This is meant to composite disparate data sources together that might have different resolutions and coverage
- For example, prefer a high resolution elevation model which has missing data, but fill missing values with a coarser elevation datasource
- In practice, we use this `Compositor` to provide a single interface for a dataset that is divided into multiple files
- Data sources are composited AFTER harmonization.
- `TileCompositor`
- This is meant to composite a data source stored in multiple files into a single interface
- For example, consider an elevation data source that covers the globe and is stored in 10K different files that only cover land areas
- Data sources are composited BEFORE harmonization.
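
A hedged sketch of the ordered case (assuming `fine` and `coarse` are existing elevation `DataSource` nodes and `coords` are request coordinates):

```python
import podpac

# prefer the high-resolution source; fall back to the coarse source where it is missing
elevation = podpac.compositor.OrderedCompositor(sources=[fine, coarse])
output = elevation.eval(coords)
```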

## Algorithm

... tbd ...
`Algorithm` `Nodes` are the backbone of the pipeline architecture and are used to perform computations on one or many data sources or the user-requested coordinates.

The `BaseAlgorithm`, `Algorithm` (for multiple input nodes) and `UnaryAlgorithm` (for single input nodes) `Nodes` implement the basic functionality:

- The `find_coordinates` method
- The `Algorithm.eval` method for multiple input `Nodes`
- The `inputs` property that finds any PODPAC `Node` as part of the class definition
- The interface for the `algorithm(inputs)` method, which is used to implement the actual algorithm

Based on this basic interface, PODPAC implements algorithms that manipulate coordinates, perform signal processing (e.g. convolutions), compute statistics (e.g. Mean), and run completely generic, user-defined algorithms.

In particular, the `Arithmetic` node allows users to specify an `eqn`, which enables nearly arbitrary point-wise computations. The `Generic` algorithm node allows users to specify arbitrary Python code, as long as the `output` variable is set.
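
A hedged sketch of both (assuming `node_a` and `node_b` are existing nodes; keyword inputs become variables available to the expression or code):

```python
import podpac

# point-wise expression over two input nodes
mean_node = podpac.algorithm.Arithmetic(A=node_a, B=node_b, eqn='(A + B) / 2')

# arbitrary Python code; the result must be assigned to `output`
# (executing user code may need to be explicitly enabled in podpac settings)
generic_node = podpac.algorithm.Generic(code='output = A + 1', A=node_a)
```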

## Extending PODPAC with Custom Nodes

In addition to the core data sources and algorithms, you may need to write your own node to handle unique data sources or additional data processing. You can do this by subclassing a core podpac node and extending it for your needs. The DataSource node in particular is designed to be extended for new sources of data.

### Example

An example of creating a simple array-based datasource can be found in the [array-data-source](https://github.com/creare-com/podpac/blob/master/doc/notebooks/array-data-source.ipynb) notebook.
An example of creating a simple array-based datasource can be found in the [array-data-source](https://github.com/creare-com/podpac-examples/blob/master/notebooks/4-advanced/create-data-source.ipynb) notebook.
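
A minimal inline sketch of the same idea (a hypothetical subclass, using the documented `get_data`/`get_native_coordinates` hooks):

```python
import numpy as np
import podpac

class ConstantGrid(podpac.data.DataSource):
    # hypothetical data source that returns a constant grid

    def get_native_coordinates(self):
        # the coordinates of the underlying data
        return podpac.Coordinates(
            [podpac.clinspace(40, 50, 11), podpac.clinspace(-10, 10, 21)],
            dims=['lat', 'lon'])

    def get_data(self, coordinates, coordinates_index):
        # return data for the subset selected by `coordinates_index`
        return self.create_output_array(coordinates, data=np.ones(coordinates.shape))
```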

### Tagging attributes

@@ -122,9 +170,9 @@ Individual node definitions specify the node class along with its inputs and attributes.

Additional properties and examples for each of the core node types are provided below.

## DataSource
### DataSource

### Sample
#### Sample

```
{
@@ -137,13 +185,13 @@ Additional properties and examples for each of the core node types are provided below.
}
```

## Compositor
### Compositor

### Additional Properties
#### Additional Properties

* `sources`: nodes to composite *(list, required)*

### Sample
#### Sample

```
{
@@ -158,12 +206,12 @@ Additional properties and examples for each of the core node types are provided below.
}
```

## Algorithm
### Algorithm

### Additional Properties
#### Additional Properties
* `inputs`: node inputs to the algorithm. *(object, required)*

### Sample
#### Sample

```
{
Expand All @@ -189,7 +237,7 @@ Additional properties and examples for each of the core node types are provided
}
```

## Notes
### Notes

* The `node` path should include the submodule path and the node class. The submodule path is omitted for top-level classes. For example:
- `"node": "datalib.smap.SMAP"` is equivalent to `from podpac.datalib.smap import SMAP`.
4 changes: 2 additions & 2 deletions doc/source/overview.md
@@ -87,8 +87,8 @@ import podpac
nodeA = podpac.data.Rasterio(source="elevation.tif", interpolation="cubic")
nodeB = podpac.datalib.TerrainTiles(tile_format='geotiff', zoom=8)

# take the mean of the two data sources
alg_node = podpac.algorithm.Arithmetic(A=nodeA, B=nodeB, eqn='(A * B) / 2')
# average the two data sources together point-wise
alg_node = podpac.algorithm.Arithmetic(A=nodeA, B=nodeB, eqn='(A + B) / 2')
```

Evaluate pipelines at arbitrary PODPAC coordinates.
16 changes: 8 additions & 8 deletions doc/source/settings.md
@@ -30,7 +30,7 @@ The settings are stored in a dictionary format, accessible in the interpreter:
In [2]: settings
Out[2]:
{'DEBUG': False,
'ROOT_PATH': 'C:\\Users\\user\\.podpac',
'ROOT_PATH': 'C:\\Users\\user\\.config\\podpac',
'AUTOSAVE_SETTINGS': False,
...
}
Expand All @@ -42,7 +42,7 @@ To view the default settings, view `settings.defaults`:
In [3]: settings.defaults
Out[3]:
{'DEBUG': False,
'ROOT_PATH': 'C:\\Users\\user\\.podpac',
'ROOT_PATH': 'C:\\Users\\user\\.config\\podpac',
'AUTOSAVE_SETTINGS': False,
...
}
@@ -145,7 +145,7 @@ To see the PODPAC root directory, view `settings["ROOT_PATH"]`:
In [1]: from podpac import settings
In [2]: settings["ROOT_PATH"]
Out[5]: 'C:\\Users\\user\\.podpac'
Out[5]: 'C:\\Users\\user\\.config\\podpac'
```

Edit the `settings.json` file in the `"ROOT_PATH"` location, then open a new interpreter and load the `podpac.settings` module to see the overwritten values:
@@ -164,19 +164,19 @@ Out[2]: 1000000000.0
```

If `settings.json` files exist in multiple places, PODPAC will load settings in the following order,
overwriting previously loaded settings in the process:
overwriting previously loaded settings (lower numbered items) in the process:

* podpac default settings
* home directory settings (`~/.podpac/settings.json`)
* current working directory settings (`./settings.json`)
1. podpac default settings
2. home directory settings (`~/.config/podpac/settings.json`)
3. current working directory settings (`./settings.json`)

The attribute `settings.settings_path` shows the path of the last loaded settings file (i.e. the active settings file).

```python
In [1]: from podpac import settings

In [2]: settings.settings_path
Out[2]: 'C:\\Users\\user\\.podpac'
Out[2]: 'C:\\Users\\user\\.config\\podpac'
```

A `settings.json` file can be loaded from outside the search path using the `settings.load()` method: