<center>

## PODPAC: Pipeline for Observational Data Processing, Analysis and Collaboration

<a href="https://podpac.org"><img src='../Images/podpac-logo.png' style='margin-bottom:-60px;margin-top:0px;margin-left:auto;margin-right:auto'/></a>

#### NASA / USFS (Salt Lake City, UT) <br><br> Marc Shapiro (Creare LLC) <br><br> <small>April 30, 2018</small>
</center>

# What is PODPAC and why do you care?

As a geospatial scientist **I want to**: 

<p class='fragment' style='margin-bottom:-20px;'>1. <b>Find</b>: <i>Identify and obtain geospatial data relevant to a scientific problem</i></p>
<p class='fragment' style='margin-bottom:-20px;'>2. <b>Explore</b>: <i>Discover new characteristics or features in a geospatial dataset</i></p>
<p class='fragment' style='margin-bottom:-20px;'>3. <b>Produce</b>: <i>Derive new datasets by processing and integrating geospatial data</i></p>
<p class='fragment' style='margin-bottom:-20px;'>4. <b>Validate</b>: <i>Assess the goodness of a simulation dataset</i></p>

<p class='fragment' style='margin-bottom:-20px;'>5. <b>Answer</b>: <i>Perform analysis/analytics on geospatial data to answer scientific questions</i></p>
<p class='fragment' style='margin-bottom:-20px;'>6. <b>Deploy</b>: <i>Build real-time production applications from analysis for end users </i></p>
<br>
<p class='fragment'>...but <b>I have to</b> deal with large scale data challenges.
</p>
<p></p>
<p class='fragment' style='font-size:12pt;'>Mehrotra, Piyush, et al. "Supporting 'Big Data' Analysis and Analytics at the NASA advanced Supercomputing (NAS) Facility." NASA Ames Research Center, NASA Advanced Supercomputing TR-NAS-2014-02, Moffett Field, CA (2014).</p>

<div><div style='float:left; width:50%;'><img src='../Images/DataVariety.png' style='width:95%;'/></div>
    <div class='fragment' style='float:right; width:50%;'><img src='../Images/DataVolume.png' style='width:95%;'/></div></div>
<div style='clear:both'></div>

# What is PODPAC and why do you care? 

<img src='../Images/PODPAC.png' style='width:85%;margin-left:auto;margin-right:auto' />

## Philosophy

- Leverage open-source tools with momentum in the geospatial community ([NumFOCUS](https://numfocus.org/sponsored-projects), [Pangeo](https://pangeo.io/))
- Wrap datasets **once**
- Make analysis repeatable and transparent
- Bring the analysis to the data (less downloading)
- Provide a bridge from analysis (Python) to application (web)

## Try it Out

- Visit: https://github.com/creare-com/podpac-examples
    - Click "Launch Binder"
    - Navigate to: *notebooks* -> *basic_examples*
- Earth Data Login:
    - username: `podpac_demo`
    - password: `PODPAC-demo-2019`

# AMS 2020 PODPAC Short Course

Welcome to the PODPAC AMS 2020 short course. During this course you will learn 
...
Index

# PODPAC Basics

1. Working with coordinates
2. Working with Nodes
    a. Algorithm
    b. Datasource
    c. Compositor
    d. Working with Xarray objects
3. Caching
4. Interpolation


In [1]:
# For interactive plots, comment the next line
%pylab inline
# For interactive plots, uncomment the next line
# %pylab ipympl
# We filter warnings just for a cleaner presentation in these notebooks
import warnings
warnings.filterwarnings('ignore')

Populating the interactive namespace from numpy and matplotlib


# Introduction

> For instructions on using Jupyter notebooks, see the [README.md](../../README.md) file. 

This notebook provides resources and describes the structure of the PODPAC library. Specifically, we will go over:

* Where to find documentation
* How to install PODPAC
* How to import PODPAC
* The structure of the PODPAC library
* Getting help about a function
* Labeled arrays using [xarray](http://xarray.pydata.org/en/stable/)

# Finding Documentation 
* Documentation for the current version of PODPAC can be found at [podpac.org](https://podpac.org)
* To track the development of PODPAC, see the source code site [PODPAC GitHub](https://github.com/creare-com/podpac)
    * If you run into a problem or find a bug, please go to [PODPAC Issues](https://github.com/creare-com/podpac/issues)
    * To peek at what's planned for the next release, have a look at [PODPAC Projects](https://github.com/creare-com/podpac/projects)
    * To download a specific release or see what changed between releases look at [PODPAC Releases](https://github.com/creare-com/podpac/releases)
* For examples see [PODPAC-Examples on GitHub](https://github.com/creare-com/podpac-examples)
* The source for the example application that we will be building can be found at [PODPAC Drought-Monitor GitHub](https://github.com/creare-com/podpac-drought-monitor)    


# How to get started with PODPAC

* Git repository: [https://github.com/creare-com/podpac](https://github.com/creare-com/podpac)
* Documentation: [https://podpac.org](https://podpac.org)
* Installation instructions: [https://podpac.org/install.html](https://podpac.org/install.html)
* Examples Git repository: [https://github.com/creare-com/podpac-examples](https://github.com/creare-com/podpac-examples)
* Problems? Create an issue: [https://github.com/creare-com/podpac/issues](https://github.com/creare-com/podpac/issues)

# Installing PODPAC
* Detailed instructions can be found on the [Main Documentation](https://podpac.org/install.html)
* PODPAC can be installed using: 
    * The standalone Windows zip file
        * Just download the file, unzip, and run the `run_podpac_jupyterlab.bat` file
        * This contains a full standalone Python installation and may not be desirable for seasoned Python users
    * `pip install podpac`
        * The base PODPAC installation has a minimal number of dependencies, so not all functionality will be available
        * To install all of the remaining dependencies use: `pip install podpac[all]`
    * From source by
        * Downloading a [release](https://github.com/creare-com/podpac/releases)
        * Cloning the PODPAC repository: `git clone https://github.com/creare-com/podpac.git`
        * Running `pip install -e .` from the directory with the PODPAC `setup.py` file
* Setting up a Python environment can be complex. Fortunately, PODPAC has no compiled components, but its dependencies do. 
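
After installing, it can help to confirm that the package is visible to the Python environment you are actually running. The following is a minimal sketch that uses only the Python standard library; `installed_version` is a hypothetical helper name and is not part of PODPAC.

```python
import importlib.util
from importlib import metadata


def installed_version(package_name):
    """Return the installed version of a package, or None if it cannot be found."""
    # find_spec returns None when the top-level package is not importable
    if importlib.util.find_spec(package_name) is None:
        return None
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        # Importable, but no distribution metadata is available
        return "unknown"


# Prints a version string if PODPAC is installed, otherwise None
print(installed_version("podpac"))
```

If this prints `None`, the interpreter running the notebook is probably not the one that `pip` installed into.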

# Importing PODPAC

* Unlike in MATLAB, Python libraries need to be imported before they can be used
* Imported libraries usually have a namespace
* Portions of libraries can also be imported

## Examples

In [3]:
import podpac                     # Import PODPAC with the namespace 'podpac'
import podpac as pc               # Import PODPAC with the namespace 'pc'
from podpac import Coordinates    # Import Coordinates from PODPAC into the main namespace

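The three import forms above only differ in the name that ends up in your namespace; they all point at the same underlying objects. As a small sketch of that idea, here is the same pattern with the standard library's `json` module, which is used only so that the snippet runs without PODPAC installed.

```python
import json                  # bind the whole library to the name 'json'
import json as js            # bind the same library to the shorter name 'js'
from json import dumps       # bind one function directly into the main namespace

# All three forms refer to the same underlying objects
same_module = js is json
same_function = dumps is json.dumps
print(same_module, same_function)
```

Both comparisons are `True`, so choosing between the forms is a matter of readability rather than behavior.
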
# PODPAC library structure
PODPAC is composed of multiple sub-modules/sub-libraries. The major ones, from a user's perspective, are shown below. 
<img src='../Images/podpac-user-api.png' style='width:80%; margin-left:auto;margin-right:auto;' />


We can examine what's in the PODPAC library by using the `dir` function

In [5]:
dir(podpac)

['Coordinates',
 'Node',
 'NodeException',
 'NodeTrait',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 'algorithm',
 'authentication',
 'clinspace',
 'compositor',
 'coordinates',
 'core',
 'crange',
 'data',
 'interpolators',
 'managers',
 'pipeline',
 'settings',
 'units',
 'utils',
 'version',
 'version_info']

Anything that starts and ends with a double underscore ("dunder"), such as `__<attr>__`, is an internal Python attribute and can be ignored. 
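
To hide those internal names, the output of `dir` can be filtered. This is a minimal sketch; `public_names` is a hypothetical helper, and it is demonstrated on the standard library's `json` module so that it runs anywhere. Passing `podpac` instead should work the same way.

```python
import json


def public_names(obj):
    """List the attributes of an object, skipping names that start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


print(public_names(json))
```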

In PODPAC, the top-level classes and functions are frequently used and include:
* `Coordinates`: class for defining coordinates
* `clinspace`: A helper function used to create uniformly spaced coordinates based on the number of points
* `crange`: Another helper function used to create uniformly spaced coordinates based on step size
* `Node`: Base class for defining a PODPAC compute pipeline
* `NodeException`: The error type raised by Nodes
* `settings`: A module with various settings that define caching behavior, login credentials, etc.
* `version_info`: Python dictionary giving the version of the PODPAC library
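
The difference between the two helper functions is whether the spacing is fixed by the number of points or by the step size. The sketch below illustrates that distinction in plain Python. The functions `by_count` and `by_step` are hypothetical stand-ins written for this illustration; they are not PODPAC's implementation, so consult the docstrings of `podpac.clinspace` and `podpac.crange` for the real signatures and behavior.

```python
def by_count(start, stop, size):
    """Uniformly spaced values from start to stop (inclusive), given the number of points."""
    if size < 2:
        return [float(start)]
    step = (stop - start) / (size - 1)
    return [start + i * step for i in range(size)]


def by_step(start, stop, step):
    """Uniformly spaced values from start toward stop, given the step size."""
    # The small tolerance guards against floating-point round-off in the division
    count = int((stop - start) / step + 1e-9) + 1
    return [start + i * step for i in range(count)]


# Five points between 0 and 1 correspond to a step of 0.25
print(by_count(0, 1, 5))
print(by_step(0, 1, 0.25))
```

Both calls describe the same grid; which form is more convenient depends on whether you know the resolution or the number of samples you want.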

The top-level modules or sub-packages (or sub libraries) include: 
* `algorithm`: here you can find generic `Algorithm` nodes to do different types of computations
* `authentication`: this contains utilities to help authenticate users to download data
* `compositor`: here you can find nodes that help to combine multiple data sources into a single node
* `coordinates`: this module contains additional utilities related to creating coordinates
* `core`: this is where the core library is implemented, and follows the directory structure of the code
* `data`: here you can find generic `DataSource` nodes for reading and interpreting  data sources
* `datalib`: here you can find domain-specific `DataSource` nodes for reading data from specific instruments, studies, and programs
* `interpolators`: this contains classes for dealing with automatic interpolation
* `managers`: contains classes and Nodes related to managing how and where code is run. This is where the AWS functionality lives

Let's dive into what is available in some of these submodules:

In [11]:
# Generic Algorithm nodes
dir(podpac.algorithm)

['Algorithm',
 'Arange',
 'Arithmetic',
 'Convolution',
 'CoordData',
 'Count',
 'DayOfYear',
 'ExpandCoordinates',
 'GroupReduce',
 'Kurtosis',
 'Max',
 'Mean',
 'Median',
 'Min',
 'SelectCoordinates',
 'SinCoords',
 'Skew',
 'SpatialConvolution',
 'StandardDeviation',
 'Sum',
 'TimeConvolution',
 'Variance',
 'YearSubstituteCoordinates',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__']

In [12]:
# Generic DataSource nodes
dir(podpac.data)

['Array',
 'CSV',
 'DataSource',
 'Dataset',
 'H5PY',
 'INTERPOLATION_DEFAULT',
 'INTERPOLATION_METHODS',
 'INTERPOLATION_METHODS_DICT',
 'INTERPOLATORS',
 'INTERPOLATORS_DICT',
 'Interpolation',
 'InterpolationException',
 'PyDAP',
 'Rasterio',
 'ReprojectedSource',
 'WCS',
 'Zarr',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__spec__',
 'interpolation_trait']

In [13]:
# Specific data libraries built into podpac
import podpac.datalib   # not loaded by default
dir(podpac.datalib)

['EGI',
 'GFS',
 'GFSLatest',
 'IntakeCatalog',
 'SMAP',
 'SMAPBestAvailable',
 'SMAPPorosity',
 'SMAPProperties',
 'SMAPSource',
 'SMAPWilt',
 'SMAP_PRODUCT_MAP',
 'TerrainTiles',
 '__builtins__',
 '__cached__',
 '__doc__',
 '__file__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 'drought_monitor',
 'egi',
 'gfs',
 'intake',
 'nasaCMR',
 'smap',
 'smap_egi',
 'sys',
 'terraintiles']
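
Note that `datalib` did not appear in the earlier `dir(podpac)` listing because it is not loaded by default, and `dir` only shows what has already been imported. One way to discover sub-packages that have not been imported yet is the standard library's `pkgutil.iter_modules`. The sketch below uses a hypothetical helper, `list_submodules`, and demonstrates it on `json` so that it runs without PODPAC; the assumption is that calling it on `podpac` would list its sub-packages in the same way.

```python
import json
import pkgutil


def list_submodules(package):
    """Names of the modules and sub-packages found on disk inside a package."""
    return sorted(info.name for info in pkgutil.iter_modules(package.__path__))


print(list_submodules(json))
```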

In [None]:
# Nothing here yet
# dir(podpac.alglib)

# Getting help on specific PODPAC components
* Each Object/Function/Module has a documentation string
    * If you find documentation that is incomplete or confusing, please let us know by creating an [Issue](https://github.com/creare-com/podpac/issues).
    * Or better yet, fix it and submit a pull request!
* The [API documentation](https://podpac.org/user/api.html) also contains this information.
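
Documentation strings can also be read programmatically, which is handy when you only want a one-line summary. This is a minimal sketch using the standard library's `inspect.getdoc`; `first_doc_line` is a hypothetical helper, shown here on `json.dumps` so that it runs without PODPAC. The same call should work on any PODPAC object that has a docstring.

```python
import inspect
import json


def first_doc_line(obj):
    """Return the first line of an object's documentation string, or '' if it has none."""
    doc = inspect.getdoc(obj)
    return doc.splitlines()[0] if doc else ""


print(first_doc_line(json.dumps))
```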

In [16]:
# Inside Jupyter Notebooks use:
podpac.algorithm.Arithmetic?

Init signature: podpac.algorithm.Arithmetic(*args, **kwargs)
Docstring:     
Create a simple point-by-point computation of up to 7 different input nodes.

Attributes
----------
A : podpac.Node
    An input node that can be used in a computation. 
B : podpac.Node
    An input node that can be used in a computation. 
C : podpac.Node
    An input node that can be used in a computation. 
D : podpac.Node
    An input node that can be used in a computation. 
E : podpac.Node
    An input node that can be used in a computation. 
F : podpac.Node
    An input node that can be used in a computation. 
G : podpac.Node
    An input node that can be used in a computation. 
eqn : str
    An equation stating how the datasources can be combined. 
    Parameters may be specified in {}'s
    
Examples
----------
a = SinCoords()
b = Arange()
ar

In [17]:
# In any Python interpreter use:
help(podpac.algorithm.Arithmetic)

Help on class Arithmetic in module podpac.core.algorithm.algorithm:

class Arithmetic(Algorithm)
 |  Arithmetic(*args, **kwargs)
 |  
 |  Create a simple point-by-point computation of up to 7 different input nodes.
 |  
 |  Attributes
 |  ----------
 |  A : podpac.Node
 |      An input node that can be used in a computation. 
 |  B : podpac.Node
 |      An input node that can be used in a computation. 
 |  C : podpac.Node
 |      An input node that can be used in a computation. 
 |  D : podpac.Node
 |      An input node that can be used in a computation. 
 |  E : podpac.Node
 |      An input node that can be used in a computation. 
 |  F : podpac.Node
 |      An input node that can be used in a computation. 
 |  G : podpac.Node
 |      An input node that can be used in a computation. 
 |  eqn : str
 |      An equation stating how the datasources can be combined. 
 |      Parameters may be specified in {}'s
 |      
 |  Examples
 |  ----------
 |  a = SinCoords()
 |  b = Arange()
 |  ar