Commit

Merge branch 'develop' into feature/hdf4_subdatasets
mpu-creare committed Aug 21, 2020
2 parents 7590664 + cdedd85 commit 30e283f
Showing 32 changed files with 942 additions and 367 deletions.
40 changes: 40 additions & 0 deletions CHANGELOG.md
@@ -1,5 +1,45 @@
# Changelog

## 2.2.2
### Bug Fixes
* Fixed floating point errors on selection of data subset (short circuit optimization to avoid unnecessary interpolation)
* Fixed bug in cosmos_stations.latlon_from_label giving the wrong latlon for a label
* Fixed compositor to update interpolation of sources automatically (and delete cached definitions).
* Made cached node definitions easier to remove -- no longer caching node.json, node.json_pretty and node.hash

## 2.2.0
### Introduction

Wrapping Landsat8, Sentinel2, and MODIS data and improving interpolation.

### Features
* Added `datalib.satutils` which wraps Landsat8 and Sentinel2 data
* Added `datalib.modis_pds` which wraps MODIS products ["MCD43A4.006", "MOD09GA.006", "MYD09GA.006", "MOD09GQ.006", "MYD09GQ.006"]
* Added `settings['AWS_REQUESTER_PAYS']` and the `authentication.S3Mixin.aws_requester_pays` attribute to support Sentinel2 data
* Added `issubset` method to Coordinates which allows users to test if a coordinate is a subset of another one
* Added environment variables to the Lambda function deployment, allowing users to specify the location of additional
dependencies (`FUNCTION_DEPENDENCIES_KEY`) and settings (`SETTINGS`). This was in support of the WMS service.
* Intake nodes can now filter inputs by additional data columns for .csv files / pandas dataframes by using the pandas
`query` method.
* Added documentation on `Interpolation` and `Wrapping Datasets`

### Bug Fixes
* Added a `dims` attribute to `Compositor` nodes which indicates the dimensions that sources are expected to have. This
fixes a bug where `Nodes` threw an error if `Coordinates` contained extra dimensions that the `Compositor` sources were
missing.
* `COSMOSStations` will no longer fail for sites with no data or one data point. These sites are now automatically filtered.
* Fixed `core.data.file_source` closing files prematurely due to using context managers
* Fixed heterogeneous interpolation (where lat/lon uses a different interpolator than time, for example)
* `datalib.TerrainTiles` now accesses S3 anonymously by default. Interpolation specified at the compositor level is
also now passed down to the sources.

### Breaking changes
* Fixed `core.algorithm.signal.py` and in the process removed `SpatialConvolution` and `TimeConvolution`. Users now
have to label the dimensions of the kernel -- which prevents results from changing if the eval coordinates are
transposed. This was a major bug in the `Convolution` node, and the new change obviates the need for the removed Nodes,
but it may break some pipelines.


## 2.1.0
### Introduction

24 changes: 24 additions & 0 deletions dist/local_Windows_install/update_podpac.bat
@@ -0,0 +1,24 @@
@echo off
call bin\set_local_conda_path.bat
call bin\fix_hardcoded_absolute_paths.bat
call bin\activate_podpac_conda_env.bat

cd podpac
echo "Updating PODPAC"
git fetch
for /f %%a in ('git describe --tags --abbrev^=0 origin/master') do git checkout %%a
cd ..
echo "Updating PODPAC EXAMPLES"
cd podpac-examples
git fetch
for /f %%a in ('git describe --tags --abbrev^=0 origin/master') do git checkout %%a
cd ..
cd podpac
cd dist
echo "Updating CONDA ENVIRONMENT"
conda env update -f windows_conda_environment.yml
cd ..
cd ..



244 changes: 139 additions & 105 deletions dist/windows_conda_environment.json

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion dist/windows_conda_environment.yml
@@ -25,5 +25,5 @@ dependencies:
- jupyterlab
- ipyleaflet
- ipympl
prefix: D:\podpac-1.3.0\miniconda\envs\podpac
- sat-search

11 changes: 11 additions & 0 deletions doc/source/deploy-notes.md
@@ -2,6 +2,17 @@

> Note this is not included in the built documentation
## Checklist
* [ ] Update version number
* [ ] Update changelog
* [ ] Update windows installation (see below)
* [ ] Check all of the notebooks using the updated windows installation
* [ ] Update the conda environment .yml file (do this by hand with any new packages in setup.py)
* [ ] Update the explicit conda environment file: `conda list --explicit > filename.json`
* [ ] Update the `podpac_deps.zip` and `podpac_dist.zip` for the lambda function installs
* [ ] Upload windows install folder to AWS
* [ ] Make windows install folder public on AWS

## Uploading to pypi
Run this command to create the wheel and source code tarball
```bash
3 changes: 2 additions & 1 deletion doc/source/design.rst
@@ -17,7 +17,8 @@ Node
**Nodes** describe the components of your analysis.
These include data sources, combined data sources (**Compositors**), algorithms, and the assembly of data sources.
Nodes are assembled into :ref:`design_pipelines`, which can be output to a text file or pushed to the cloud
with minimal configuration.
with minimal configuration. **Nodes** are designed to **FAIL ON EVAL** rather than fail when instantiated, in order to
defer expensive operations until the user actually needs them.

.. image:: /_static/img/node.png
:width: 100%
6 changes: 4 additions & 2 deletions doc/source/index.rst
@@ -27,11 +27,11 @@ to enable simple, reproducible geospatial analyses that run locally or in the cl
soil_moisture = podpac.data.H5PY(source="smap.h5", interpolation="bilinear")
# evaluate soil moisture at the coordinates of the elevation data
output = soil_moisture.eval(elevation.native_coordinates)
output = soil_moisture.eval(elevation.coordinates)
# run evaluation in the cloud
aws_node = podpac.managers.aws.Lambda(source=soil_moisture)
output = aws_node.eval(elevation.native_coordinates)
output = aws_node.eval(elevation.coordinates)
@@ -83,6 +83,7 @@ The purpose of PODPAC is to facilitate:
coordinates
cache
datasets
interpolation
earthdata
aws-development

@@ -92,6 +93,7 @@ The purpose of PODPAC is to facilitate:
:caption: References

api
wrapping-datasets

.. Anything else clerical
.. toctree::
4 changes: 2 additions & 2 deletions doc/source/install.md
@@ -7,7 +7,7 @@ Select the installation method that best suits your development environment:
- [pip](#install-with-pip): Recommended for most users
- [Docker](#docker): For use in containers
- [Install from source](#install-from-source): For development
- [Standalone distribution](#standalone-distibution): Includes Python and all dependencies
- [Windows Standalone distribution](#standalone-windows-distribution): Includes Python and all dependencies

## Install with pip

@@ -82,7 +82,7 @@ $ docker run -i -t podpac
```


## Standalone Windows Distibution
## Standalone Windows Distribution

### Windows 10

62 changes: 62 additions & 0 deletions doc/source/interpolation.md
@@ -0,0 +1,62 @@
# Interpolation

## Description

PODPAC allows users to specify interpolation schemes per node, and even per dimension within a node, and lets
advanced users write their own interpolators.

Relevant example notebooks include:
* [Advanced Interpolation](https://github.com/creare-com/podpac-examples/blob/master/notebooks/4-advanced/interpolation.ipynb)
* [Basic Interpolation](https://github.com/creare-com/podpac-examples/blob/master/notebooks/2-combining-data/automatic-interpolation-and-regridding.ipynb)
* [Drought Monitor Data Access Harmonization Processing](https://github.com/creare-com/podpac-examples/blob/master/notebooks/examples/drought-monitor/03-data-access-harmonization-processing.ipynb)

## Examples
Consider a `DataSource` with `lat`, `lon`, `time` coordinates that we will instantiate as:
`node = DataSource(..., interpolation=interpolation)`

`interpolation` can be specified ...

### ...as a string

`interpolation='nearest'`
* **Description**: All dimensions are interpolated using nearest neighbor interpolation. This is the default; the available methods are listed in `podpac.core.interpolation.interpolation.INTERPOLATION_METHODS`.
* **Details**: PODPAC will automatically select appropriate interpolators based on the source coordinates and eval coordinates (see the usage sketch below). Default interpolator orders can be found in `podpac.core.interpolation.interpolation.INTERPOLATION_METHODS_DICT`.
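
As a rough usage sketch (assuming the in-memory `podpac.data.Array` node with made-up values; any `DataSource` accepts the same `interpolation` argument):

```python
import numpy as np
import podpac

# Hypothetical source data on a small lat/lon grid
data = np.random.rand(3, 4)
coords = podpac.Coordinates(
    [podpac.clinspace(45, 43, 3), podpac.clinspace(-100, -97, 4)], dims=["lat", "lon"]
)
node = podpac.data.Array(source=data, coordinates=coords, interpolation="nearest")

# Evaluate at a single off-grid point; the nearest grid value is returned
output = node.eval(podpac.Coordinates([44.1, -98.2], dims=["lat", "lon"]))
```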

### ...as a dictionary

```python
interpolation = {
'method': 'nearest',
'params': { # Optional. Available parameters depend on the particular interpolator
'spatial_tolerance': 1.1,
'time_tolerance': np.timedelta64(1, 'D')
},
'interpolators': [ScipyGrid, NearestNeighbor] # Optional. Available options are in podpac.core.interpolation.interpolation.INTERPOLATORS
}
```
* **Description**: All dimensions are interpolated using nearest neighbor interpolation, and the interpolators are tried in the order specified. For applicable interpolators, the specified parameters will be used.
* **Details**: PODPAC loops through the `interpolators` list, checking whether each interpolator is able to interpolate between the evaluated and source coordinates. The first capable interpolator is used.
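
Reusing the hypothetical `data` and `coords` from the earlier sketch, the dictionary is passed to a node the same way as the string form:

```python
# Continuation of the earlier Array sketch; `interpolation` is the dictionary defined above
node = podpac.data.Array(source=data, coordinates=coords, interpolation=interpolation)
output = node.eval(podpac.Coordinates([44.1, -98.2], dims=["lat", "lon"]))
```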

### ...as a list

```python
interpolation = [
{
'method': 'bilinear',
'dims': ['lat', 'lon']
},
{
'method': 'nearest',
'dims': ['time']
}
]
```

* **Description**: The dimensions listed in the `'dims'` list will use the specified method. These dictionaries can also specify the same fields shown in the previous section.
* **Details**: PODPAC loops through the `interpolation` list, using the settings specified for each dimension independently.

## Notes and Caveats
While the API is well developed, not all conceivable functionality is implemented. For example, gridded data can be interpolated to point data, but interpolation from point data to gridded data is not as well supported and may produce errors or unexpected results. Advanced users can develop their own interpolators, but this is not currently well documented.

**Gotcha**: Parameters for a specific interpolator may silently be ignored if a different interpolator is automatically selected.

2 changes: 1 addition & 1 deletion doc/source/references.md
@@ -5,7 +5,7 @@
> In development
## Presentations

- Scipy 2020: [Geospatial Analysis in the Cloud Using PODPAC and JupyterLab](https://www.youtube.com/watch?v=BXI6w9BECgs&t=959s)
- AGU 2019: [Building Web Browser Apps for On-Demand Retrieval and Processing of Cloud-Optimized Earth Science Data using the Open-Source WebESD Toolkit](https://agu.confex.com/agu/fm19/meetingapp.cgi/Paper/505588)
- AMS 2018: [A RESTful API for Python-Based Server-Side Analysis of High-Resolution Soil Moisture Downscaling Data](https://ams.confex.com/ams/98Annual/webprogram/Paper332957.html)

29 changes: 29 additions & 0 deletions doc/source/wrapping-datasets.md
@@ -0,0 +1,29 @@
# Wrapping Datasets

Wrapping a new dataset is challenging because you have to understand all of the quirks of the new dataset and deal with the quirks of PODPAC as well. This reference is meant to record a few rules of thumb when wrapping new datasets to help you deal with the latter.

## Rules
1. When evaluating a node with a set of coordinates:
1. The evaluation coordinates must include ALL of the dimensions present in the source dataset
1. The evaluation coordinates MAY contain additional dimensions NOT present in the source dataset, and the source may ignore these
2. When returning data from a data source node:
1. The ORDER of the evaluation coordinates MUST be preserved (see `UnitsDataArray.part_transpose`)
1. Any multi-channel data must be returned using the `output` dimension which is ALWAYS the LAST dimension
3. Nodes should be **lightweight** to instantiate, and users should expect them to *fail on eval*. Easy checks should be performed on initialization, but anything expensive should be delayed.

## Guide
In theory, to wrap a new `DataSource`:
1. Create a new class that inherits from `podpac.core.data.DataSource` or a derived class (see the `podpac.core.data` module for generic data readers).
2. Implement a method for opening/accessing the data, or use an existing generic data node and hard-code certain attributes
3. Implement the `get_coordinates(self)` method
4. Implement the `get_data(self, coordinates, coordinates_index)` method
1. `coordinates` is a `podpac.Coordinates` object in the same coordinate system as the data source (i.e. a subset of what comes out of `get_coordinates()`)
2. `coordinates_index` is a list (or tuple) of slices, boolean arrays, or index arrays that index into the output of `get_coordinates()` to produce the `coordinates` passed into this function. A minimal sketch of these steps follows below.
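
A rough sketch of such a node (the class name, grid, and data values are hypothetical, and exact import paths may differ between PODPAC versions):

```python
import numpy as np
import podpac
from podpac.data import DataSource


class MyGriddedSource(DataSource):
    """Hypothetical data source backed by an in-memory array."""

    def get_coordinates(self):
        # Full native coordinates of the dataset
        return podpac.Coordinates(
            [podpac.clinspace(45, 40, 6), podpac.clinspace(-100, -90, 11)],
            dims=["lat", "lon"],
        )

    def get_data(self, coordinates, coordinates_index):
        # `coordinates_index` selects the requested subset of the native grid
        data = np.arange(6 * 11).reshape(6, 11)[tuple(coordinates_index)]
        return self.create_output_array(coordinates, data=data)
```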

In practice, the real trick is implementing a compositor to put multiple tiles together to look like a single `DataSource`. We tend to use the `podpac.compositor.OrderedCompositor` node for this task, but it does not handle interpolation between tiles. Instead, see the `podpac.core.compositor.tile_compositor` module.

When using compositors, it is preferred that the `sources` attribute is populated at instantiation (see the sketch below), but on-the-fly (i.e. at eval) population of sources is also acceptable and sometimes necessary for certain data sources.
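
For instance (a sketch only; `tile_a` and `tile_b` stand in for real `DataSource` nodes, such as instances of the hypothetical `MyGriddedSource` above):

```python
import podpac

# Hypothetical tile nodes covering adjacent regions
tile_a = MyGriddedSource()
tile_b = MyGriddedSource()

# Sources populated at instantiation; the first source with data at the
# requested coordinates takes precedence
composited = podpac.compositor.OrderedCompositor(sources=[tile_a, tile_b])
```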

For examples, check the `podpac.datalib` module.

Happy wrapping!
2 changes: 1 addition & 1 deletion podpac/algorithm.py
@@ -28,4 +28,4 @@
YearSubstituteCoordinates,
TransformTimeUnits,
)
from podpac.core.algorithm.signal import Convolution, SpatialConvolution, TimeConvolution
from podpac.core.algorithm.signal import Convolution
