Commit

Merge branch 'develop' into feature/hdf4_subdatasets
mpu-creare committed Aug 21, 2020
2 parents 7590664 + cdedd85 commit 30e283f
Showing 32 changed files with 942 additions and 367 deletions.
40 changes: 40 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,45 @@
# Changelog

## 2.2.2
### Bug Fixes
* Fixed floating point errors on selection of data subset (short circuit optimization to avoid unnecessary interpolation)
* Fixed bug in cosmos_stations.latlon_from_label giving the wrong latlon for a label
* Fixed compositor to update interpolation of sources automatically (and delete cached definitions).
* Made cached node definitions easier to remove -- no longer caching node.json, node.json_pretty and node.hash

## 2.2.0
### Introduction

Wrapping Landsat8, Sentinel2, and MODIS data and improving interpolation.

### Features
* Added `datalib.satutils` which wraps Landsat8 and Sentinel2 data
* Added `datalib.modis_pds` which wraps MODIS products ["MCD43A4.006", "MOD09GA.006", "MYD09GA.006", "MOD09GQ.006", "MYD09GQ.006"]
* Added `settings['AWS_REQUESTER_PAYS']` and the `authentication.S3Mixin.aws_requester_pays` attribute to support Sentinel2 data
* Added `issubset` method to Coordinates which allows users to test if a coordinate is a subset of another one
* Added environment variables to the Lambda function deployment, allowing users to specify the location of additional
dependencies (`FUNCTION_DEPENDENCIES_KEY`) and settings (`SETTINGS`). This was in support of the WMS service.
* Intake nodes can now filter inputs by additional data columns for .csv files / pandas dataframes by using the pandas
`query` method.
* Added documentation on `Interpolation` and `Wrapping Datasets`

### Bug Fixes
* Added a `dims` attribute to `Compositor` nodes which indicates the dimensions that sources are expected to have. This
fixes a bug where `Nodes` threw an error if `Coordinates` contained extra dimensions that the `Compositor` sources were
missing.
* `COSMOSStations` will no longer fail for sites with no data or one data point. These sites are now automatically filtered.
* Fixed `core.data.file_source` closing files prematurely due to using context managers
* Fixed heterogeneous interpolation (where lat/lon uses a different interpolator than time, for example)
* `datalib.TerrainTiles` now accesses S3 anonymously by default. Interpolation specified at the compositor level is
also now passed down to the sources.

### Breaking changes
* Fixed `core.algorithm.signal.py` and in the process removed `SpatialConvolution` and `TimeConvolution`. Users now
have to label the dimensions of the kernel -- which prevents results from changing if the eval coordinates are
transposed. This was a major bug in the `Convolution` node, and the new change obviates the need for the removed Nodes,
but it may break some pipelines.


## 2.1.0
### Introduction

24 changes: 24 additions & 0 deletions dist/local_Windows_install/update_podpac.bat
@@ -0,0 +1,24 @@
@echo off
call bin\set_local_conda_path.bat
call bin\fix_hardcoded_absolute_paths.bat
call bin\activate_podpac_conda_env.bat

cd podpac
echo "Updating PODPAC"
git fetch
for /f %%a in ('git describe --tags --abbrev^=0 origin/master') do git checkout %%a
cd ..
echo "Updating PODPAC EXAMPLES"
cd podpac-examples
git fetch
for /f %%a in ('git describe --tags --abbrev^=0 origin/master') do git checkout %%a
cd ..
cd podpac
cd dist
echo "Updating CONDA ENVIRONMENT"
conda env update -f windows_conda_environment.yml
cd ..
cd ..



244 changes: 139 additions & 105 deletions dist/windows_conda_environment.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion dist/windows_conda_environment.yml
@@ -25,5 +25,5 @@ dependencies:
- jupyterlab
- ipyleaflet
- ipympl
prefix: D:\podpac-1.3.0\miniconda\envs\podpac
- sat-search

11 changes: 11 additions & 0 deletions doc/source/deploy-notes.md
@@ -2,6 +2,17 @@

> Note this is not included in the built documentation
## Checklist
* [ ] Update version number
* [ ] Update changelog
* [ ] Update windows installation (see below)
* [ ] Check all of the notebooks using the updated windows installation
* [ ] Update the conda environment .yml file (do this by hand with any new packages in setup.py)
* [ ] Update the explicit conda environment file: `conda list --explicit > filename.json`
* [ ] Update the `podpac_deps.zip` and `podpac_dist.zip` for the lambda function installs
* [ ] Upload windows install folder to AWS
* [ ] Make windows install folder public on AWS

## Uploading to pypi
Run this command to create the wheel and source code tarball
```bash
3 changes: 2 additions & 1 deletion doc/source/design.rst
@@ -17,7 +17,8 @@ Node
**Nodes** describe the components of your analysis.
These include data sources, combined data sources (**Compositors**), algorithms, and the assembly of data sources.
Nodes are assembled into :ref:`design_pipelines`, which can be output to a text file or pushed to the cloud
with minimal configuration.
with minimal configuration. **Nodes** are designed to **FAIL ON EVAL** rather than fail when instantiated, in order to
defer expensive operations until the user actually needs them.

.. image:: /_static/img/node.png
:width: 100%
6 changes: 4 additions & 2 deletions doc/source/index.rst
@@ -27,11 +27,11 @@ to enable simple, reproducible geospatial analyses that run locally or in the cl
soil_moisture = podpac.data.H5PY(source="smap.h5", interpolation="bilinear")
# evaluate soil moisture at the coordinates of the elevation data
output = soil_moisture.eval(elevation.native_coordinates)
output = soil_moisture.eval(elevation.coordinates)
# run evaluation in the cloud
aws_node = podpac.managers.aws.Lambda(source=soil_moisture)
output = aws_node.eval(elevation.native_coordinates)
output = aws_node.eval(elevation.coordinates)
@@ -83,6 +83,7 @@ The purpose of PODPAC is to facilitate:
coordinates
cache
datasets
interpolation
earthdata
aws-development

@@ -92,6 +93,7 @@ The purpose of PODPAC is to facilitate:
:caption: References

api
wrapping-datasets

.. Anything else clerical
.. toctree::
4 changes: 2 additions & 2 deletions doc/source/install.md
@@ -7,7 +7,7 @@ Select the installation method that best suits your development environment:
- [pip](#install-with-pip): Recommended for most users
- [Docker](#docker): For use in containers
- [Install from source](#install-from-source): For development
- [Standalone distribution](#standalone-distibution): Includes Python and all dependencies
- [Windows Standalone distribution](#standalone-windows-distribution): Includes Python and all dependencies

## Install with pip

@@ -82,7 +82,7 @@ $ docker run -i -t podpac
```


## Standalone Windows Distibution
## Standalone Windows Distribution

### Windows 10

62 changes: 62 additions & 0 deletions doc/source/interpolation.md
@@ -0,0 +1,62 @@
# Interpolation

## Description

PODPAC allows users to specify interpolation schemes per node, and even per dimension within a node, and lets
advanced users write their own interpolators.

Relevant example notebooks include:
* [Advanced Interpolation](https://github.com/creare-com/podpac-examples/blob/master/notebooks/4-advanced/interpolation.ipynb)
* [Basic Interpolation](https://github.com/creare-com/podpac-examples/blob/master/notebooks/2-combining-data/automatic-interpolation-and-regridding.ipynb)
* [Drought Monitor Data Access Harmonization Processing](https://github.com/creare-com/podpac-examples/blob/master/notebooks/examples/drought-monitor/03-data-access-harmonization-processing.ipynb)

## Examples
Consider a `DataSource` with `lat`, `lon`, `time` coordinates that we will instantiate as:
`node = DataSource(..., interpolation=interpolation)`

`interpolation` can be specified ...

### ...as a string

`interpolation='nearest'`
* **Description**: All dimensions are interpolated using nearest neighbor interpolation. This is the default; the available methods are listed in `podpac.core.interpolation.interpolation.INTERPOLATION_METHODS`.
* **Details**: PODPAC will automatically select appropriate interpolators based on the source coordinates and eval coordinates (see the usage sketch below). Default interpolator orders can be found in `podpac.core.interpolation.interpolation.INTERPOLATION_METHODS_DICT`.
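
As a rough usage sketch (assuming the in-memory `podpac.data.Array` node with made-up values; any `DataSource` accepts the same `interpolation` argument):

```python
import numpy as np
import podpac

# Hypothetical source data on a small lat/lon grid
data = np.random.rand(3, 4)
coords = podpac.Coordinates(
    [podpac.clinspace(45, 43, 3), podpac.clinspace(-100, -97, 4)], dims=["lat", "lon"]
)
node = podpac.data.Array(source=data, coordinates=coords, interpolation="nearest")

# Evaluate at a single off-grid point; the nearest grid value is returned
output = node.eval(podpac.Coordinates([44.1, -98.2], dims=["lat", "lon"]))
```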

### ...as a dictionary

```python
interpolation = {
'method': 'nearest',
'params': { # Optional. Available parameters depend on the particular interpolator
'spatial_tolerance': 1.1,
'time_tolerance': np.timedelta64(1, 'D')
},
'interpolators': [ScipyGrid, NearestNeighbor] # Optional. Available options are in podpac.core.interpolation.interpolation.INTERPOLATORS
}
```
* **Description**: All dimensions are interpolated using nearest neighbor interpolation, and the interpolators are tried in the order specified. For applicable interpolators, the specified parameters will be used.
* **Details**: PODPAC loops through the `interpolators` list, checking whether each interpolator is able to interpolate between the evaluated and source coordinates. The first capable interpolator is used.
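
Reusing the hypothetical `data` and `coords` from the earlier sketch, the dictionary is passed to a node the same way as the string form:

```python
# Continuation of the earlier Array sketch; `interpolation` is the dictionary defined above
node = podpac.data.Array(source=data, coordinates=coords, interpolation=interpolation)
output = node.eval(podpac.Coordinates([44.1, -98.2], dims=["lat", "lon"]))
```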

### ...as a list

```python
interpolation = [
{
'method': 'bilinear',
'dims': ['lat', 'lon']
},
{
'method': 'nearest',
'dims': ['time']
}
]
```

* **Description**: The dimensions listed in the `'dims'` list will use the specified method. These dictionaries can also specify the same fields shown in the previous section.
* **Details**: PODPAC loops through the `interpolation` list, using the settings specified for each dimension independently.

## Notes and Caveats
While the API is well developed, not all conceivable functionality is implemented. For example, gridded data can be interpolated to point data, but interpolation from point data to gridded data is not as well supported and may produce errors or unexpected results. Advanced users can develop their own interpolators, but this is not currently well documented.

**Gotcha**: Parameters for a specific interpolator may silently be ignored if a different interpolator is automatically selected.

2 changes: 1 addition & 1 deletion doc/source/references.md
@@ -5,7 +5,7 @@
> In development
## Presentations

- Scipy 2020: [Geospatial Analysis in the Cloud Using PODPAC and JupyterLab](https://www.youtube.com/watch?v=BXI6w9BECgs&t=959s)
- AGU 2019: [Building Web Browser Apps for On-Demand Retrieval and Processing of Cloud-Optimized Earth Science Data using the Open-Source WebESD Toolkit](https://agu.confex.com/agu/fm19/meetingapp.cgi/Paper/505588)
- AMS 2018: [A RESTful API for Python-Based Server-Side Analysis of High-Resolution Soil Moisture Downscaling Data](https://ams.confex.com/ams/98Annual/webprogram/Paper332957.html)

29 changes: 29 additions & 0 deletions doc/source/wrapping-datasets.md
@@ -0,0 +1,29 @@
# Wrapping Datasets

Wrapping a new dataset is challenging because you have to understand all of the quirks of the new dataset and deal with the quirks of PODPAC as well. This reference is meant to record a few rules of thumb when wrapping new datasets to help you deal with the latter.

## Rules
1. When evaluating a node with a set of coordinates:
1. The evaluation coordinates must include ALL of the dimensions present in the source dataset
1. The evaluation coordinates MAY contain additional dimensions NOT present in the source dataset, and the source may ignore these
2. When returning data from a data source node:
1. The ORDER of the evaluation coordinates MUST be preserved (see `UnitsDataArray.part_transpose`)
1. Any multi-channel data must be returned using the `output` dimension which is ALWAYS the LAST dimension
3. Nodes should be **lightweight** to instantiate, and users should expect them to *fail on eval*. Easy checks should be performed on initialization, but anything expensive should be delayed.

## Guide
In theory, to wrap a new `DataSource`:
1. Create a new class that inherits from `podpac.core.data.DataSource` or a derived class (see the `podpac.core.data` module for generic data readers).
2. Implement a method for opening/accessing the data, or use an existing generic data node and hard-code certain attributes
3. Implement the `get_coordinates(self)` method
4. Implement the `get_data(self, coordinates, coordinates_index)` method
1. `coordinates` is a `podpac.Coordinates` object in the same coordinate system as the data source (i.e. a subset of what comes out of `get_coordinates()`)
2. `coordinates_index` is a list (or tuple) of slices, boolean arrays, or index arrays that index into the output of `get_coordinates()` to produce the `coordinates` passed into this function. A minimal sketch of these steps follows below.
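
A rough sketch of such a node (the class name, grid, and data values are hypothetical, and exact import paths may differ between PODPAC versions):

```python
import numpy as np
import podpac
from podpac.data import DataSource


class MyGriddedSource(DataSource):
    """Hypothetical data source backed by an in-memory array."""

    def get_coordinates(self):
        # Full native coordinates of the dataset
        return podpac.Coordinates(
            [podpac.clinspace(45, 40, 6), podpac.clinspace(-100, -90, 11)],
            dims=["lat", "lon"],
        )

    def get_data(self, coordinates, coordinates_index):
        # `coordinates_index` selects the requested subset of the native grid
        data = np.arange(6 * 11).reshape(6, 11)[tuple(coordinates_index)]
        return self.create_output_array(coordinates, data=data)
```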

In practice, the real trick is implementing a compositor to put multiple tiles together to look like a single `DataSource`. We tend to use the `podpac.compositor.OrderedCompositor` node for this task, but it does not handle interpolation between tiles. Instead, see the `podpac.core.compositor.tile_compositor` module.

When using compositors, it is preferred that the `sources` attribute is populated at instantiation (see the sketch below), but on-the-fly (i.e. at eval) population of sources is also acceptable and sometimes necessary for certain data sources.
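
For instance (a sketch only; `tile_a` and `tile_b` stand in for real `DataSource` nodes, such as instances of the hypothetical `MyGriddedSource` above):

```python
import podpac

# Hypothetical tile nodes covering adjacent regions
tile_a = MyGriddedSource()
tile_b = MyGriddedSource()

# Sources populated at instantiation; the first source with data at the
# requested coordinates takes precedence
composited = podpac.compositor.OrderedCompositor(sources=[tile_a, tile_b])
```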

For examples, check the `podpac.datalib` module.

Happy wrapping!
2 changes: 1 addition & 1 deletion podpac/algorithm.py
@@ -28,4 +28,4 @@
YearSubstituteCoordinates,
TransformTimeUnits,
)
from podpac.core.algorithm.signal import Convolution, SpatialConvolution, TimeConvolution
from podpac.core.algorithm.signal import Convolution
