<div class="alert alert-block alert-info">
    <h1>Using the AρρEEARS API in your Analysis Workflow - Getting Started</h1>
</div>

---
## Objective
The intent of this tutorial is to familiarize [Earth Observation Data](https://earthdata.nasa.gov/earth-observation-data) users with the [AρρEEARS](https://lpdaac.usgs.gov/tools/appeears/) application programming interface (API) with demonstrations on how the API, and the services it provides, can be leveraged in an analysis workflow.

## Topics Covered
1. [**Getting Started**](#gettingstarted)  
    1.1 [Enable Access to the API](#1.1)  
    1.2 [Login](#1.2)  
2. [**Submit an Area Request**](#submittask)  
    2.1 [Import a Shapefile](#2.1)  
    2.2 [Compile the JSON payload to submit to AρρEEARS](#2.2)  
    2.3 [Submit a task request](#2.3)  
    2.4 [Get task status](#2.4)  
3. [**Download a Request [Bundle API]**](#downloadrequest)  
    3.1 [List files associated with the request](#3.1)  
    3.2 [Download files in a request](#3.2)  
4. [**Explore AρρEEARS Outputs**](#explore)  
    4.1 [Open and explore data using xarray](#4.1)  
    4.2 [Create summary statistics](#4.2)  
    4.3 [Create plots](#4.3)  
5. [**Quality Filtering**](#qualityfiltering)  
    5.1 [Decode quality values](#5.1)  
    5.2 [Create and apply quality mask](#5.2)  
    5.3 [Plot quality filtered data](#5.3)  


## AρρEEARS Information
To access AρρEEARS, visit: https://lpdaacsvc.cr.usgs.gov/appeears/

For comprehensive documentation of the full functionality of the AρρEEARS API, please see the AρρEEARS API Documentation: https://lpdaacsvc.cr.usgs.gov/appeears/api/

Throughout the exercise, specific sections of the API documentation can be accessed by clicking the hyperlinked text.

## Setup and Dependencies 
- This Python Jupyter Notebook tutorial has been tested on Python versions 3.6 and 3.7.

- Miniconda was used to create the Python environments
    - Windows OS  
    `conda create -n py3.7 python=3.7`

- Required Python packages were installed from the conda-forge channel. Installing packages from the conda-forge channel is done by adding conda-forge to your channels with: `conda config --add channels conda-forge`

- Required Packages needed for this exercise are listed below. 
    - requests  
    `conda install requests`  
    - pandas  
    `conda install pandas`  
    - geopandas  
    `conda install geopandas`  
    - xarray  
    `conda install xarray`  
    - numpy  
    `conda install numpy`  
    - netCDF4  
    `conda install netCDF4`  
    - pyviz &emsp;&emsp;**NOTE** - [PyViz](http://pyviz.org/) is installed using the pyviz channel not conda-forge.  
    `conda install -c pyviz hvplot`  
    

---
## Procedures

### 1. Getting Started <a id="gettingstarted"></a>
[AρρEEARS API](https://lpdaacsvc.cr.usgs.gov/appeears/api/) access requires the same [NASA Earthdata Login](https://urs.earthdata.nasa.gov/) as the AρρEEARS user interface. In addition to having a valid NASA Earthdata Login account, the API feature must be enabled for the user within AρρEEARS.

#### 1.1 Enable Access to the API <a id="1.1"></a>
> To enable access to the [AρρEEARS API](https://lpdaacsvc.cr.usgs.gov/appeears/api/), navigate to the [AρρEEARS website](https://lpdaacsvc.cr.usgs.gov/appeears/). Click the *Sign In* button in the top right portion of the AρρEEARS landing page screen.  

<table><tr><td>
    <img src="https://lpdaacsvc.cr.usgs.gov/assets/images/help/image001.7f0d8820.png" />
</td></tr></table>  

> Once you are signed in, click the *Manage User* icon in the top right portion of the AρρEEARS landing page screen and select *Settings*.   

<table><tr><td>
    <img src="https://lpdaacsvc.cr.usgs.gov/assets/images/help/api/image001.3bb7c98a.png" />
</td></tr></table>  

> Select the *Enable API* box to gain access to the AρρEEARS API.  

<table><tr><td>
    <img src="https://lpdaacsvc.cr.usgs.gov/assets/images/help/api/image002.ebbb9431.png" />
</td></tr></table>

#### 1.2 Login to AρρEEARS/Earthdata <a id="1.2"></a>
> To submit a request, you must first [login](https://lpdaacsvc.cr.usgs.gov/appeears/api/?language=Python%203#login) to the AρρEEARS API using your Earthdata login credentials.  We’ll use the `getpass` package to conceal our Earthdata login username and password. When executed, the code below will prompt you to enter your username followed by your password and store them as variables.

In [None]:
# Import required Python packages
import requests
import getpass
import time
import os
import cgi
import json
import pandas as pd
import geopandas as gpd
import xarray
import numpy as np
import hvplot.xarray

In [None]:
# Enter Earthdata login credentials
username = getpass.getpass('Earthdata Username:')
password = getpass.getpass('Earthdata Password:')

In [None]:
# AρρEEARS API URL
API = 'https://lpdaacsvc.cr.usgs.gov/appeears/api' 

> We'll use the `requests` package to POST our username and password to the AρρEEARS system. A successful login will provide you with a token to be used later in this tutorial to submit a request. For more information or if you are experiencing difficulties, please see the [API Documentation](https://lpdaacsvc.cr.usgs.gov/appeears/api/?language=Python%203#login).

In [None]:
login_response = requests.post(f"{API}/login", auth=(username, password)).json()
login_response 

> The response returns a Bearer Token which is needed to leverage the AρρEEARS API via HTTP request methods (e.g. POST and GET). Note that this token will expire approximately 48 hours after being acquired.

In [None]:
# Assign the token to a variable
token = login_response['token']
head = {'Authorization': f"Bearer {token}"} 
head
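
> The step above can be wrapped in a small helper that fails early when no token comes back. This is a minimal sketch, not part of the AρρEEARS API: the function name `build_auth_header` is hypothetical, and it assumes that a failed login returns a JSON body without a `token` key.

```python
def build_auth_header(login_response):
    """Return the Authorization header for a parsed /login response."""
    token = login_response.get('token')
    if not token:
        # Assumption: an unsuccessful login yields a JSON body with no 'token' key
        raise ValueError(f"Login did not return a token: {login_response}")
    return {'Authorization': f"Bearer {token}"}

# Illustrative response only; a real token comes from the login call above
example_head = build_auth_header({'token': 'abc123'})
print(example_head)
```

> In the notebook, `head = build_auth_header(login_response)` would replace the two assignment lines above.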

---

### 2. Submit an Area Request <a id="submittask"></a>
The [Tasks](https://lpdaacsvc.cr.usgs.gov/appeears/api/?language=Python%203#tasks) service, among other things (see below), is used to submit requests (e.g. POST and GET) to the AρρEEARS system. Each call to the [Tasks](https://lpdaacsvc.cr.usgs.gov/appeears/api/?language=Python%203#tasks) service is associated with your user account. Therefore, each call to this service requires an authentication token. The [*submit task*](https://lpdaacsvc.cr.usgs.gov/appeears/api/?language=Python%203#submit-task) API call provides a way to submit a new request. It accepts data via JSON, query string, or a combination of both. In the example below, we will compile a JSON payload and submit a request.

#### 2.1 Import a shapefile  <a id="2.1"></a>

> In this example, we are interested in Yellowstone National Park. We will use the `Geopandas` package to import a shapefile that contains the administrative boundary for the park. The shapefile was extracted from the [National Park Service unit boundaries shapefile](https://irma.nps.gov/DataStore/DownloadFile/621132) distributed by [National Park Service - Land Resources Division](https://irma.nps.gov/DataStore/Reference/Profile/2224545?lnv=True).

In [None]:
yellowstone = gpd.read_file('Data/yellowstone_subset_geo.shp')
yellowstone.head()

> Geopandas imports the shapefile as a Geopandas GeoDataFrame. 

In [None]:
type(yellowstone)

> We need to convert the `Geopandas GeoDataFrame` into an object that has a GeoJSON structure. We'll serialize it with the GeoDataFrame's `to_json` method and parse the resulting string with `json.loads`.

In [None]:
yellowstone = json.loads(yellowstone.to_json())
#yellowstone
type(yellowstone)

> The **yellowstone** variable is now a python dictionary that matches the geojson structure.  
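
> To make the target structure concrete, here is a minimal sketch of the same round trip using only the standard library. The feature name and coordinates are made up for illustration and do not come from the Yellowstone shapefile.

```python
import json

# Hypothetical one-feature collection; the name and coordinates are illustrative only
geojson_text = json.dumps({
    'type': 'FeatureCollection',
    'features': [{
        'type': 'Feature',
        'properties': {'name': 'Example Park'},
        'geometry': {
            'type': 'Polygon',
            # A single closed ring: the first and last positions are identical
            'coordinates': [[[-111.0, 44.0], [-110.0, 44.0], [-110.0, 45.0],
                             [-111.0, 45.0], [-111.0, 44.0]]]
        }
    }]
})

# json.loads turns the GeoJSON string into a plain Python dictionary
example_geo = json.loads(geojson_text)
print(type(example_geo))
print(example_geo['type'], len(example_geo['features']))
```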

#### 2.2 Compile the JSON payload to submit to AρρEEARS <a id="2.2"></a>
> Many of the items required in the AρρEEARS API request payload have multiple options. For example, AρρEEARS offers several projections for the output. We can use the AρρEEARS API to find out which projections are available. In this example, we are explicitly assigning our projection to the **proj** variable. To find out how to use the AρρEEARS API to list the available options for each parameter, check out the [AρρEEARS API Tutorials](https://git.earthdata.nasa.gov/projects/LPDUR/repos/appeears-api-getting-started/browse) produced by the [LP DAAC](https://lpdaac.usgs.gov/).

In [None]:
task_name = 'Yellowstone_NP_Vegetation'    # User-defined name of the task
task_type = 'area'                         # Type of task, area or point
proj = 'geographic'                        # Set output projection 
outFormat = 'netcdf4'                      # Set output file format type
startDate = '01-01-2016'                   # Start of the date range for which to extract data: MM-DD-YYYY
endDate = '12-31-2018'                     # End of the date range for which to extract data: MM-DD-YYYY
recurring = False                          # Specify True for a recurring date range
#yearRange = [2000,2016]

prodLayer = [{'layer': '_250m_16_days_NDVI', 'product': 'MOD13Q1.006'}]    # See layer names for MOD13Q1.006 here: https://lpdaacsvc.cr.usgs.gov/appeears/api/product/MOD13Q1.006
#prodLayer = [{'layer': '_250m_16_days_NDVI', 'product': 'MOD13Q1.006'}, {'layer': 'LC_Type1', 'product': 'MCD12Q1.006'}]

In [None]:
task = {
    'task_type': task_type,
    'task_name': task_name,
    'params': {
         'dates': [
         {
             'startDate': startDate,
             'endDate': endDate
         }],
         'layers': prodLayer,
         'output': {
             'format': {
                 'type': outFormat},
             'projection': proj},
         'geo': yellowstone,
    }
}
#task

> The **task** object is what we will submit to the AρρEEARS system.
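
> If you submit several requests, the payload assembly above can be collected into a function. The sketch below mirrors the structure of the **task** dictionary; the helper names `appeears_date` and `build_area_task` are hypothetical, and the empty feature collection is only a placeholder for a real geometry such as **yellowstone**.

```python
from datetime import date

def appeears_date(d):
    """Format a datetime.date as the MM-DD-YYYY string used in the payload."""
    return d.strftime('%m-%d-%Y')

def build_area_task(name, layers, geo, start, end,
                    projection='geographic', file_format='netcdf4'):
    """Assemble an area request with the same keys as the task dictionary above."""
    return {
        'task_type': 'area',
        'task_name': name,
        'params': {
            'dates': [{'startDate': appeears_date(start),
                       'endDate': appeears_date(end)}],
            'layers': layers,
            'output': {'format': {'type': file_format},
                       'projection': projection},
            'geo': geo,
        }
    }

example_task = build_area_task(
    'Yellowstone_NP_Vegetation',
    [{'layer': '_250m_16_days_NDVI', 'product': 'MOD13Q1.006'}],
    {'type': 'FeatureCollection', 'features': []},    # Placeholder geometry
    date(2016, 1, 1),
    date(2018, 12, 31))
print(example_task['params']['dates'])
```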

#### 2.3 Submit a task request <a id="2.3"></a> 
> We will now submit our **task** object to AρρEEARS using the [*submit task*](https://lpdaacsvc.cr.usgs.gov/appeears/api/?language=Python%203#submit-task) API call

In [None]:
task_response = requests.post(f"{API}/task", json=task, headers=head)    # Post json to the API task service, return response as json
task_response.json()                                                     # Print task response

> A task ID is generated for each request and is returned in the response. Task IDs are unique for each request and are used to check request status, explore request details, and list files generated for the request.

In [None]:
task_id = task_response.json()['task_id']
task_id

#### 2.4 Get task status <a id="2.4"></a>
> We can use the [Status](https://lpdaacsvc.cr.usgs.gov/appeears/api/?language=Python%203#status) service to retrieve information on the status of all task requests that are currently being processed for your account. We will use the [*task status*](https://lpdaacsvc.cr.usgs.gov/appeears/api/?language=Python%203#task-status) API call with our **task_id** to get information on the request we just submitted. 

In [None]:
status_response = requests.get(f"{API}/status/{task_id}", headers=head)
status_response.json()

> For longer running requests we can gently ping the API to get the status of our submitted request using the snippet below. Once the request is complete, we can move on to downloading our request contents.

In [None]:
#starttime = time.time()
#while requests.get(f"{API}/task/{task_id}", headers=head).json()['status'] != 'done':
#    print(requests.get(f"{API}/task/{task_id}", headers=head).json()['status'])
#    time.sleep(20.0 - ((time.time() - starttime) % 20.0))
#print(requests.get(f"{API}/task/{task_id}", headers=head).json()['status'])
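
> The polling idea can also be written as a function that makes one status call per iteration and gives up after a timeout. This is a sketch under stated assumptions: `wait_for_task` and its parameters are hypothetical names, only the **done** status from this tutorial is treated as terminal, and the status strings in the demo are illustrative stand-ins for real API responses.

```python
import time

def wait_for_task(get_status, interval=20.0, timeout=3600.0, sleep=time.sleep):
    """Call get_status() every `interval` seconds until it returns 'done'."""
    waited = 0.0
    while True:
        status = get_status()    # One request per loop iteration
        print(status)
        if status == 'done':
            return status
        if waited >= timeout:
            raise TimeoutError(f"Task not done after {timeout} seconds")
        sleep(interval)
        waited += interval

# Demo with a stand-in status source, so the sketch runs without network access
fake_statuses = iter(['queued', 'processing', 'done'])    # Illustrative values
final_status = wait_for_task(lambda: next(fake_statuses), sleep=lambda s: None)

# Against the API, the status source could be the same call used in the loop above:
# wait_for_task(lambda: requests.get(f"{API}/task/{task_id}", headers=head).json()['status'])
```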

---

### 3. Download a Request <a id="downloadrequest"></a>
The [Bundle](https://lpdaacsvc.cr.usgs.gov/appeears/api/?language=Python%203#bundle) service provides information about completed tasks (i.e. tasks that have a status of **done**). A bundle will be generated containing all of the files that were created as part of the task request.

#### 3.1 List files associated with the request  <a id="3.1"></a>
> The [list files](https://lpdaacsvc.cr.usgs.gov/appeears/api/?language=Python%203#list-files) API call lists all of the files contained in the bundle which are available for download.

In [None]:
bundle = requests.get(f"{API}/bundle/{task_id}").json()    # Call API and return bundle contents for the task_id as json
bundle                                                     # Print bundle contents

#### 3.2 Download files in a request <a id="3.2"></a>
> The [download file](https://lpdaacsvc.cr.usgs.gov/appeears/api/?language=Python%203#download-file) API call gives us the information needed to download all, or a subset, of the files available for a request. Just as the task has a **task_id** to identify it, each file in the bundle will also have a unique **file_id** which should be used for any operation on that specific file. The `Content-Type` and `Content-Disposition` headers will be returned when accessing each file to give more details about the format of the file and the filename to be used when saving the file.

> The `bundle` variable we created has more information than we need to download the files. We will first create a python dictionary to hold the **file_id** and associated **file_name** for each file.

In [None]:
files = {}
for f in bundle['files']: 
    files[f['file_id']] = f['file_name']    # Fill dictionary with file_id as keys and file_name as values
files

> Now we will download the files using the **file_id**s from the dictionary into an output directory.

In [None]:
#outDir = os.path.join(os.getcwd(), 'Outputs')    # When executing from local machine
outDir = 'Outputs'                                # When using binder
if not os.path.exists(outDir):
    os.makedirs(outDir)

In [None]:
for file_id in files:
    download_response = requests.get(f"{API}/bundle/{task_id}/{file_id}", stream=True)                                # Get a stream to the bundle file
    filename = os.path.basename(cgi.parse_header(download_response.headers['Content-Disposition'])[1]['filename'])    # Parse the name from Content-Disposition header 
    filepath = os.path.join(outDir, filename)                                                                         # Create output file path
    with open(filepath, 'wb') as out_file:                                                                            # Write file to dest dir
        for data in download_response.iter_content(chunk_size=8192): 
            out_file.write(data)
print(f"Downloaded files can be found at: {outDir}")
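
> The `cgi` module used above was removed from the standard library in Python 3.13. On newer interpreters, one possible substitute is the standard `email.message` module, sketched below. The helper name `filename_from_content_disposition` is hypothetical, and the header value in the demo is an illustrative example rather than a captured API response.

```python
import os
from email.message import Message

def filename_from_content_disposition(header_value):
    """Extract the filename parameter from a Content-Disposition header value."""
    msg = Message()
    msg['Content-Disposition'] = header_value
    filename = msg.get_filename()    # Returns None when no filename is present
    if filename is None:
        raise ValueError(f"No filename in header: {header_value!r}")
    return os.path.basename(filename)    # Drop any directory part, as above

example_name = filename_from_content_disposition(
    'attachment; filename="MOD13Q1.006_250m_aid0001.nc"')
print(example_name)
```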

> Here are the files we just downloaded.
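
> One way to produce that listing is sketched below. The helper `list_downloads` is a hypothetical name, and the demo runs against a temporary directory with made-up file names so that it works without a completed request; in the notebook you would call `list_downloads(outDir)`.

```python
import os
import tempfile

def list_downloads(out_dir):
    """Return the sorted names of the regular files in out_dir."""
    return sorted(name for name in os.listdir(out_dir)
                  if os.path.isfile(os.path.join(out_dir, name)))

# Demo on a throwaway directory with illustrative file names
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ('b_example.nc', 'a_example.csv'):
        with open(os.path.join(tmp_dir, name), 'wb') as handle:
            handle.write(b'')
    listing = list_downloads(tmp_dir)
print(listing)
```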

---

### 4. Explore AρρEEARS Outputs <a id="explore"></a>
Now that we have downloaded all the files from our request, let's start to check out our data! In our AρρEEARS request, we set the output format to 'netcdf4'. As a result, we have only one data file to deal with. We will open the dataset as an `xarray Dataset` and start to explore.

#### 4.1 Open and explore data using [`xarray`](http://xarray.pydata.org/en/stable/) <a id="4.1"></a>

> [`Xarray`](http://xarray.pydata.org/en/stable/) extends and combines much of the core functionality from both the Pandas library and Numpy, which makes it well suited to handling multi-dimensional (N-dimensional) datasets that contain labels (e.g. variable names or dimension names). Let's open the NetCDF file with our data as an xarray object.

In [None]:
ds = xarray.open_dataset('Outputs/MOD13Q1.006_250m_aid0001.nc')
ds

> Xarray has two fundamental data structures. A `Dataset` holds multiple variables that potentially share the same coordinates and global metadata for the file (see above). A `DataArray` contains a single multi-dimensional variable and its coordinates, attributes, and metadata. Data values can be pulled out of the DataArray as a `numpy.ndarray` using the `values` attribute.

In [None]:
type(ds)

In [None]:
#ds['_250m_16_days_NDVI']
type(ds['_250m_16_days_NDVI'])

In [None]:
#ds['_250m_16_days_NDVI'].values
type(ds['_250m_16_days_NDVI'].values)

> We can also pull out information for each coordinate item (e.g. lat, lon, time). Here we pull out the *time* coordinate.

In [None]:
ds['time']

> The `cftime.DatetimeJulian` format of the time coordinate is a little problematic for some plotting libraries and analysis routines. We are going to [convert the time coordinate](https://stackoverflow.com/questions/55786995/converting-cftime-datetimejulian-to-datetime) to the more useable datetime format `datetime64`.

In [None]:
datetimeindex = ds.indexes['time'].to_datetimeindex()

In [None]:
ds['time'] = datetimeindex
ds['time']

> Since the data is in an xarray Dataset, we can intuitively slice or reduce it. Let's select a single time slice from the normalized difference vegetation index (NDVI) variable.

In [None]:
ds['_250m_16_days_NDVI'].sel(time='2015-12-19')

> Let's pull out the NDVI DataArray from the Dataset and name the variable ndvi. This will make plotting a little easier later on. 

In [None]:
ndvi = ds['_250m_16_days_NDVI']
ndvi

> Notice that our DataArray still has all of its associated attributes and metadata.

#### 4.2 Create summary statistics <a id="4.2"></a>
> The download bundle for each AρρEEARS request includes a CSV with summary statistics. Since we already have the data in our Python environment, let's calculate our own summary statistics and plot them. 

> Let's calculate the mean, standard deviation, maximum value, and minimum value for each time interval in our DataArray, creating a separate variable for each statistic. 

In [None]:
ndvi_mean = ds['_250m_16_days_NDVI'].mean(('lat', 'lon'))
ndvi_sd = ds['_250m_16_days_NDVI'].std(('lat', 'lon'))
ndvi_max = ds['_250m_16_days_NDVI'].max(('lat', 'lon'))
ndvi_min = ds['_250m_16_days_NDVI'].min(('lat', 'lon'))

In [None]:
ndvi_mean
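
> To see what reducing over `('lat', 'lon')` means, here is a plain-Python sketch on a tiny made-up cube: every time step collapses to one number per statistic. The values are hypothetical, and the sketch uses the population standard deviation, which is assumed to match xarray's default (`ddof=0`). Unlike xarray, it does not skip missing values.

```python
from statistics import mean, pstdev

# Hypothetical NDVI values: 2 time steps, each a 2 x 2 (lat, lon) grid
cube = [
    [[0.25, 0.5], [0.75, 0.5]],
    [[0.5, 0.5], [0.5, 0.5]],
]

def spatial_stats(grid):
    """Collapse one (lat, lon) grid to its summary statistics."""
    pixels = [value for row in grid for value in row]
    return {'mean': mean(pixels), 'sd': pstdev(pixels),
            'max': max(pixels), 'min': min(pixels)}

# One dictionary of statistics per time step
per_time = [spatial_stats(grid) for grid in cube]
for step, stats_for_step in enumerate(per_time):
    print(step, stats_for_step)
```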

#### 4.3 Create plots <a id="4.3"></a>
> We now have the `mean` and `standard deviation` for each time slice as well as the `maximum` and `minimum` values. Let's do some plotting! We will use the [`hvPlot`](https://hvplot.pyviz.org/index.html) package to create simple but interactive chart/plots.

In [None]:
ndvi_mean.hvplot.line()

In [None]:
stats = (
    ndvi_mean.hvplot.line(height=350, width=450, line_width=1.50, color='red', grid=True, padding=0.05) + 
    ndvi_sd.hvplot.line(height=350, width=450, line_width=1.50, color='red', grid=True, padding=0.05) + 
    ndvi_max.hvplot.line(height=350, width=450, line_width=1.50, color='red', grid=True, padding=0.05) + 
    ndvi_min.hvplot.line(height=350, width=450, line_width=1.50, color='red', grid=True, padding=0.05)
).cols(2)
stats

In [None]:
del(ndvi_mean, ndvi_sd, ndvi_max, ndvi_min) # Clean-up

> Let's take a look at our ndvi variable.

In [None]:
ndvi.hvplot()

In [None]:
#ndvi.hvplot.line()
ndvi.hvplot.line('time')

> Let's create some box and whisker plots! Notice how we can use *time* to slice the data.

In [None]:
# Single date
ndvi.sel(time='2016-05-08').hvplot.box('_250m_16_days_NDVI', by=['time'], rot=45, box_fill_color='lightblue', padding=0.1, width=450, height=350)

In [None]:
# Observations between months
ndvi.sel(time=slice('2016-05', '2016-10')).hvplot.box('_250m_16_days_NDVI', by=['time'], rot=45, box_fill_color='lightblue', padding=0.1, width=800, height=450)

In [None]:
# Observations for the specified year
ndvi.sel(time='2016').hvplot.box('_250m_16_days_NDVI', by=['time'], rot=45, box_fill_color='lightblue', padding=0.1, width=800, height=450)

> See if the trend fits with what the AρρEEARS interface provides. Paste the string below into your browser, **without** the `'` to make the comparison. They should match...hopefully.

In [None]:
f"https://lpdaacsvc.cr.usgs.gov/appeears/view/{task_id}"

> Now let's create a multidimensional (t,x,y) plot of our gridded data.

In [None]:
ndvi.hvplot(groupby='time', cmap='BrBG', width=640, height=469, colorbar=True)

### 5. Quality Filtering <a id="qualityfiltering"></a>
When available, AρρEEARS extracts and returns quality assurance (QA) data for each data file returned regardless of whether the user requests it. This is done to ensure that the user possesses the information needed to determine the usability and usefulness of the data they get from AρρEEARS. The [Quality](https://lpdaacsvc.cr.usgs.gov/appeears/api/#quality) service from the AρρEEARS API can be leveraged to create masks that filter out undesirable data values. 

In [None]:
ds

> Notice that the xarray Dataset contains a data array/variable called `_250m_16_days_VI_Quality`, which has the same dimensions as the `_250m_16_days_NDVI` data array/variable. We can use the quality array to create a mask of poor-quality data. We'll use the [Quality](https://lpdaacsvc.cr.usgs.gov/appeears/api/?language=Python%203#quality) service to decode the quality assurance information. 

> We'll use the following criteria to mask out poor quality data:
- high aerosol content
- cloud contamination
- snow and ice cover.

#### 5.1 Decode quality values <a id="5.1"></a>
> We do not want to decode the same value multiple times. Let's extract all of the unique data values from the `_250m_16_days_VI_Quality` xarray DataArray.

In [None]:
quality_values = pd.DataFrame(np.unique(ds._250m_16_days_VI_Quality.values), columns=['value']).dropna()
quality_values

> The following function decodes the data values from the `_250m_16_days_VI_Quality` xarray DataArray using the [Quality](https://lpdaacsvc.cr.usgs.gov/appeears/api/#quality) service.

In [None]:
def qualityDecode(qualityservice_url, product, qualitylayer, value):
    req = requests.get(f"{qualityservice_url}/{product}/{qualitylayer}/{value}")
    return(req.json())

> Now we will create an empty dataframe to store the decoded quality information for the masking criteria we identified above.

In [None]:
quality_desc = pd.DataFrame(columns=['value', 'AQ_bits', 'AQ_description', 'MC_bits', 'MC_description', 'SI_bits', 'SI_description'])

> The for loop below goes through all of the unique quality data values, decodes them using the quality service, and appends the quality descriptions to our empty dataframe.

In [None]:
for index, row in quality_values.iterrows():
    decode_int = qualityDecode('https://lpdaacsvc.cr.usgs.gov/appeears/api/quality',
                               'MOD13Q1.006',
                               '_250m_16_days_VI_Quality',
                               str(int(row['value'])))
    quality_info = decode_int
    df = pd.DataFrame({'value': int(row['value']),
                       'AQ_bits': quality_info['Aerosol Quantity']['bits'], 
                       'AQ_description': quality_info['Aerosol Quantity']['description'], 
                       'MC_bits': quality_info['Mixed Clouds']['bits'],
                       'MC_description': quality_info['Mixed Clouds']['description'],
                       'SI_bits': quality_info['Possible snow/ice']['bits'],
                       'SI_description': quality_info['Possible snow/ice']['description']}, index=[index])

    quality_desc = pd.concat([quality_desc, df])    # DataFrame.append was removed in pandas 2.0

In [None]:
quality_desc
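
> The quality values are bit-packed integers, so the three fields used here can also be read locally with bit shifts. The sketch below is not the Quality service: the function name is hypothetical, and the bit positions (aerosol quantity in bits 6-7, mixed clouds in bit 10, possible snow/ice in bit 14) are an assumption based on the MOD13 VI Quality layout that should be checked against the service output above.

```python
def decode_vi_quality_bits(value):
    """Extract three bit fields from a 16-bit VI Quality value."""
    return {
        # Assumed bit positions; verify against the Quality service before relying on them
        'aerosol_quantity': (value >> 6) & 0b11,      # Bits 6-7
        'mixed_clouds': (value >> 10) & 0b1,          # Bit 10
        'possible_snow_ice': (value >> 14) & 0b1,     # Bit 14
    }

example_bits = decode_vi_quality_bits(2116)    # 2116 = 0b0000100001000100
print(example_bits)
```

> The service also returns the text descriptions for each field, which is why the tutorial relies on it for the mask criteria.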

#### 5.2 Create and apply quality mask <a id="5.2"></a>
> Now we have a dataframe with all of the quality information we need to create a quality mask. Next we'll identify the quality categories that we would like to keep.

In [None]:
mask_values = quality_desc[((quality_desc['AQ_description'] == 'Low')|
                           (quality_desc['AQ_description'] == 'Average'))&
                           (quality_desc['MC_description'] == 'No')&
                           (quality_desc['SI_description'] == 'No')]
mask_values

In [None]:
ds

> Let's apply the mask to our xarray dataset, keeping only the values that we have deemed acceptable

In [None]:
ds_masked = ds.where(ds['_250m_16_days_VI_Quality'].isin(mask_values['value']))
ds_masked
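
> The masking step keeps a value where its quality code is in the accepted list and replaces it with NaN everywhere else. A plain-Python sketch of that rule on one made-up row of pixels follows; `mask_by_quality` is a hypothetical helper, and the NDVI and quality values are illustrative.

```python
import math

def mask_by_quality(values, quality, keep):
    """Keep each value whose quality code is in `keep`; otherwise return NaN."""
    return [v if q in keep else math.nan for v, q in zip(values, quality)]

ndvi_row = [0.61, 0.58, 0.12, 0.70]       # Hypothetical NDVI values
quality_row = [2116, 2116, 35004, 2120]   # Hypothetical quality codes
accepted = {2116, 2120}                   # Stand-in for mask_values['value']

masked_row = mask_by_quality(ndvi_row, quality_row, accepted)
print(masked_row)
```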

#### 5.3 Plot quality filtered data <a id="5.3"></a>
> Using the same plotting functionality from above, let's see how our data looks when we mask out the undesirable pixels.

In [None]:
ds_masked['_250m_16_days_NDVI'].hvplot(groupby='time', cmap='BrBG', width=640, height=469, colorbar=True)

> Whoa! Looks like a lot of the pixels over the winter months didn't make the cut.

> Let's use xarray's powerful indexing method to pull out the 'summer months' (i.e. June, July, and August).

In [None]:
ds_masked_jja = ds_masked['_250m_16_days_NDVI'].sel(time=ds_masked['time.season']=='JJA')
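
> The `time.season` accessor labels each timestamp with a three-letter meteorological season (DJF, MAM, JJA, or SON). The sketch below reproduces that month-to-season grouping in plain Python; `season_of` is a hypothetical helper, not an xarray function.

```python
def season_of(month):
    """Map a month number (1-12) to its meteorological season label."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    # December wraps to index 0 so that Dec, Jan, and Feb share 'DJF'
    return ('DJF', 'MAM', 'JJA', 'SON')[(month % 12) // 3]

summer_months = [m for m in range(1, 13) if season_of(m) == 'JJA']
print(summer_months)
```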

In [None]:
ds_masked_jja['time']

In [None]:
ds_masked_jja.hvplot(groupby='time', cmap='BrBG', width=640, height=469, colorbar=True)

### This tutorial provides a template to use for your own research workflows. Leveraging the AρρEEARS API for extracting and formatting analysis ready data, and importing it directly into Python means that you can keep your entire research workflow in a single software program, from start to finish.

---

<div class="alert alert-block alert-info">
    <h1> Contact Information </h1>
    <h3> Material written by Aaron Friesz$^{1}$ & Cole Krehbiel$^{1}$ </h3>
    <ul>
        <b>Contact:</b> LPDAAC@usgs.gov <br> 
        <b>Voice:</b> +1-605-594-6116 <br>
        <b>Organization:</b> Land Processes Distributed Active Archive Center (LP DAAC) <br>
        <b>Website:</b> https://lpdaac.usgs.gov/ <br>
        <b>Date last modified:</b> 05-15-2019 <br>
    </ul>

$^{1}$Innovate! Inc., contractor to the U.S. Geological Survey, Earth Resources Observation and Science (EROS) Center, Sioux Falls, South Dakota, 57198-001, USA. Work performed under USGS contract G15PD00467 for LP DAAC$^{2}$.

$^{2}$LP DAAC Work performed under NASA contract NNG14HH33I.
</div>