# Plug-and-Play Access, Processing, and Analysis of SMAP Data Using the Open Source Python Library, PODPAC

## SMAP Science Utilization Meeting #2

## Matt Ueckermann, Jerry Bieszczad, Dara Entekhabi

### November 28, 2018

# What is PODPAC and why do you care? (1/2)
As a scientist **I want to**: 
1. Find: *Identify and obtain NASA data relevant to a scientific problem*
2. Explore: *Discover new characteristics or features in a NASA dataset*
3. Produce: *Derive new datasets by processing and integrating NASA data*
4. Validate: *Assess the goodness of a simulation dataset*
5. Answer: *Perform analysis/analytics on NASA data to answer scientific questions*
6. Share: *Distribute the results of analysis/analytics to others*

...but **I have to** deal with large-scale data challenges.

# What is PODPAC and why do you care? (2/2)
* PODPAC is a Python (2/3) library developed to address large-scale data **variety** and **volume** challenges
<img src='../Images/PODPAC.png' width='100%'/>

* So scientists **don't have to** deal with ***data-wrangling***
<img src='../Images/VarietyVolume.png' width='100%'/>

# PODPAC's Core Features

| Feature                                                                                                                                                                                          | Benefit                                                                                                                                                                                                                                                                                                                                                      |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Integrated with NASA observational products and other data sources                                                                                                                               |  <ul><li> Access SMAP data through a single, unified interface </li>     <ul><li> No need to download individual files</li>      <li> No need for custom scripts to retrieve data </li>     </ul> <li> Framework for making additional data sources available </li>  <li> Community contributions welcomed to provide access to more data sources</li> </ul> |
|  Automated Data Wrangling <ul><li> Coordinate reference systems</li> <li> Geospatial projections</li>  <li> Data structures</li>  <li> Interpolation (re-gridding) </li>  <li> Units </li> </ul> |  <ul><li> Combine data in a plug-and-play manner</li> <li> Save time by not rewriting the same code</li>  <li> Reduce conversion errors</li> </ul>                                                                                                                                                                                                           |
| Pipeline Architecture with Recorded Provenance                                                                                                                                                   |  <ul><li> Share algorithms using light-weight JSON-formatted text</li> <li> Swap data sources into existing processing pipelines</li> <li> Reuse algorithms and data-sources</li>  <li> Save time by using existing pipeline Nodes (building blocks)</li> </ul>                                                                                              |
| Caching                                                                                                                                                                                          |  <ul><li> Avoid downloading the same data multiple times</li> <li> Store only the desired data locally</li>  <li> Compute expensive quantities only once</li> </ul>                                                                                                                                                                                          |

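The pipeline and caching rows above can be illustrated with a small, library-independent sketch. It does not use PODPAC's API: the JSON keys (`nodes`, `op`, `input`, `factor`, `output`) and the helper `make_evaluator` are hypothetical names chosen for illustration. The idea is only that an algorithm described as light-weight JSON text can be shared and re-evaluated, and that a cache lets each node be computed once.

```python
import json
from functools import lru_cache

# Hypothetical pipeline definition. The schema is illustrative only and is
# NOT PODPAC's actual JSON format.
PIPELINE_JSON = """
{
  "nodes": {
    "source": {"op": "constant", "value": 0.25},
    "scaled": {"op": "scale", "input": "source", "factor": 100.0}
  },
  "output": "scaled"
}
"""

# Counts how many times each operation actually runs.
call_count = {"constant": 0, "scale": 0}


def make_evaluator(definition):
    """Return a cached function that evaluates a node by name."""
    nodes = definition["nodes"]

    @lru_cache(maxsize=None)
    def evaluate(name):
        spec = nodes[name]
        call_count[spec["op"]] += 1
        if spec["op"] == "constant":
            return spec["value"]
        if spec["op"] == "scale":
            # Evaluate the upstream node first, then apply the factor.
            return evaluate(spec["input"]) * spec["factor"]
        raise ValueError("unknown op: " + spec["op"])

    return evaluate


definition = json.loads(PIPELINE_JSON)
evaluate = make_evaluator(definition)

first = evaluate(definition["output"])   # computes both nodes
second = evaluate(definition["output"])  # served from the cache
```

Swapping the `source` entry for a different data source leaves the `scaled` step unchanged, which is the plug-and-play property the table describes.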
# Examples
* [Introduction](../Introduction.ipynb)
* [Reading a raster file](../basic_examples/open-raster-file.ipynb)
* [Reading a .csv file](../basic_examples/open_point_file.ipynb)
* [Creating coordinates](../basic_examples/using-coordinates.ipynb) 
* [Retrieving SMAP data](../basic_examples/retrieving-SMAP-data)
* [Combining data sources in a custom algorithm](combining-data-in-algorithm)
* [Compositing data sources together](../basic_examples/composite-array-datasources.ipynb)
* [Looking at SMAP-Sentinel data](../demos/SMAP-Sentinel-data-access.ipynb)
* [Running pipelines in the cloud (using AWS Lambda functions)](../basic_examples/running-on-aws-lambda.ipynb)

# How to get PODPAC?
* Git Repository: [https://github.com/creare-com/podpac](https://github.com/creare-com/podpac)
* Documentation: [https://creare-com.github.io/podpac-docs/](https://creare-com.github.io/podpac-docs/)
* Installation Instructions: [https://creare-com.github.io/podpac-docs/install.html](https://creare-com.github.io/podpac-docs/install.html)
* Problems? Create an issue: [https://github.com/creare-com/podpac/issues](https://github.com/creare-com/podpac/issues)
