# Step 1: Notebook Setup

The cell below contains a number of helper functions used throughout this walkthrough. They are mainly wrappers around existing `matplotlib` functionality and are provided for the sake of simplicity in the steps to come.

Take a moment to read the description of each function so you understand what it can be used for. You will use these helper functions as you work through the rest of this notebook.

If you are familiar with `matplotlib`, feel free to alter the functions as you please.

## TODOs

1. Click in the cell below and run the cell.

In [None]:
# TODO: Make sure you run this cell before continuing!

%matplotlib inline
import matplotlib.pyplot as plt

def show_plot(x_datas, y_datas, x_label, y_label, legend=None, title=None):
    """
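    Display a simple line plot with one line per series.
    
    :param x_datas: List of Numpy arrays, one per series, containing data for the X axis
    :param y_datas: List of Numpy arrays, one per series, containing data for the Y axis
    :param x_label: Label applied to X axis
    :param y_label: Label applied to Y axis
    :param legend: Optional list of series names to show in the legend
    :param title: Optional title for the plot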
    """
    fig = plt.figure(figsize=(16,8), dpi=100)
    for (x_data, y_data) in zip(x_datas, y_datas):
        plt.plot(x_data, y_data, '-', marker='|', markersize=2.0, mfc='b')
    # Pass the flag positionally: the b= keyword was renamed to visible= in Matplotlib 3.5
    plt.grid(True, which='major', color='k', linestyle='-')
    plt.xlabel(x_label)
    fig.autofmt_xdate()
    plt.ylabel(y_label)
    if legend:
        plt.legend(legend, loc='upper left')
    if title:
        plt.title(title)
    plt.show()
    return plt
    
def plot_box(bbox):
    """
    Display a green bounding box on an image of the Blue Marble.
    
    :param bbox: Shapely Polygon that defines the bounding box to display
    """
    min_lon, min_lat, max_lon, max_lat = bbox.bounds
    # Imported here so the other helpers still work if basemap is not installed
    from matplotlib.patches import Polygon
    from mpl_toolkits.basemap import Basemap

    # Draw the Blue Marble background on a default world map
    world_map = Basemap()
    world_map.bluemarble(scale=0.5)
    # Transparent fill with a green outline
    poly = Polygon([(min_lon, min_lat), (min_lon, max_lat), (max_lon, max_lat),
                    (max_lon, min_lat)], facecolor=(0, 0, 0, 0.0), edgecolor='green', linewidth=2)
    plt.gca().add_patch(poly)
    plt.gcf().set_size_inches(15, 25)
    
    plt.show()
    
def show_plot_two_series(x_data_a, x_data_b, y_data_a, y_data_b, x_label, y_label_a, 
                         y_label_b, series_a_label, series_b_label, align_axis=True):
    """
    Display a line plot of two series
    
    :param x_data_a: Numpy array containing data for the Series A X axis
    :param x_data_b: Numpy array containing data for the Series B X axis
    :param y_data_a: Numpy array containing data for the Series A Y axis
    :param y_data_b: Numpy array containing data for the Series B Y axis
    :param x_label: Label applied to X axis
    :param y_label_a: Label applied to Y axis for Series A
    :param y_label_b: Label applied to Y axis for Series B
    :param series_a_label: Name of Series A
    :param series_b_label: Name of Series B
    :param align_axis: Use the same range for both Y axes
    """
    
    fig, ax1 = plt.subplots(figsize=(10,5), dpi=100)
    series_a, = ax1.plot(x_data_a, y_data_a, 'b-', marker='|', markersize=2.0, mfc='b', label=series_a_label)
    ax1.set_ylabel(y_label_a, color='b')
    ax1.tick_params('y', colors='b')
    ax1.set_ylim(min(0, *y_data_a), max(y_data_a)+.1*max(y_data_a))
    ax1.set_xlabel(x_label)
    
    ax2 = ax1.twinx()
    series_b, = ax2.plot(x_data_b, y_data_b, 'r-', marker='|', markersize=2.0, mfc='r', label=series_b_label)
    ax2.set_ylabel(y_label_b, color='r')
    ax2.set_ylim(min(0, *y_data_b), max(y_data_b)+.1*max(y_data_b))
    ax2.tick_params('y', colors='r')
    
    if align_axis:
        axis_min = min(0, *y_data_a, *y_data_b)
        axis_max = max(*y_data_a, *y_data_b)
        axis_max += .1*axis_max
        
        ax1.set_ylim(axis_min, axis_max)
        ax2.set_ylim(axis_min, axis_max)
    
    # Pass the flag positionally: the b= keyword was renamed to visible= in Matplotlib 3.5
    plt.grid(True, which='major', color='k', linestyle='-')
    plt.legend(handles=(series_a, series_b), bbox_to_anchor=(1.1, 1), loc=2, borderaxespad=0.)
    plt.show()


# Step 2: List available Datasets

Now we can interact with NEXUS using the `nexuscli` Python module. The `nexuscli` module has a number of useful methods that allow you to easily interact with the NEXUS webservice API. One of those methods is `nexuscli.dataset_list`, which returns a list of Datasets in the system along with their start and end times.

However, in order to use the client, it must be told where the NEXUS webservice is running. The `nexuscli.set_target(url)` method is used to target NEXUS. An instance of NEXUS is already running for you and is available at `http://<public dns>:8083` where `<public dns>` is the public DNS of the EC2 instance you signed up for.

## TODOs

1. Import the `nexuscli` python module.
2. Target your EC2 instance
3. Call `nexuscli.dataset_list()` and print the results

In [None]:
# TODO: Import the nexuscli python module.

# TODO: Target your AWS NEXUS server using your public DNS name and port 8083
nexuscli.set_target("http://<public dns>:8083", use_session=False)

# TODO: Call nexuscli.dataset_list() and print the results

# Step 3: Subset using Bounding Box

As you may have noticed from reading the documentation, the `nexuscli` module has a method called `subset` that accepts a [bounding box](http://toblerity.org/shapely/shapely.geometry.html?highlight=box#shapely.geometry.box) argument. We can use this method to subset data using a geographical bounding box.

>def subset(dataset, bounding_box, start_datetime, end_datetime, parameter, metadata_filter)  
>
>Fetches point values for a given dataset and geographical area or metadata criteria and time range.  
>
>__dataset__ Name of the dataset as a String  
>__bounding_box__ Bounding box for area of interest as a shapely.geometry.polygon.Polygon  
>__start_datetime__ Start time as a datetime.datetime  
>__end_datetime__ End time as a datetime.datetime  
>__parameter__ The parameter of interest. One of 'sst', 'sss', 'wind' or None  
>__metadata_filter__ List of key:value String metadata criteria  
>
>__return__ List of Point namedtuples

Try using the subset function to get data in the Gulf of Mexico from the `AVHRR_OI_L4_GHRSST_NCEI` dataset on one day. You can use `-98, 17.8, -81.5, 30.8` (west, south, east, north) as the bounding box extents.

## TODOs

1. Target your EC2 instance
2. Create the bounding box using shapely's `box` method
3. Plot the bounding box using the `plot_box` helper method
4. Subset data by calling the `subset` method in the `nexuscli` module
  - __Hint__: `datetime` is already imported for you. You can create a `datetime` using the method `datetime(int: year, int: month, int: day)`
  - __Hint__: make sure you pick a day that was between the `start` and `end` returned for the dataset from the `dataset_list` function
  - __Hint__: in python, to get the size of a list called `my_list` you would use `len(my_list)`
5. Print the result
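
The `datetime` and `len` hints above can be sketched as follows. The coverage window in this example is a hypothetical placeholder, not a value returned by NEXUS; substitute the `start` and `end` that `dataset_list` printed for your dataset.

```python
from datetime import datetime, timedelta

# Hypothetical coverage window; replace with the start and end
# reported for your dataset by dataset_list.
dataset_start = datetime(2016, 1, 1)
dataset_end = datetime(2016, 12, 31)

# A one-day time period: the end is exactly one day after the start.
start_time = datetime(2016, 7, 4)
end_time = start_time + timedelta(days=1)

# Confirm the chosen day falls inside the coverage window.
day_is_covered = dataset_start <= start_time and end_time <= dataset_end
print(day_is_covered)

# len() reports how many items a list holds.
my_list = [10, 20, 30]
print(len(my_list))
```

The names `start_time` and `end_time` are chosen so they do not overwrite the `start` timer variable used in the timing cell.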

In [None]:
import time
import nexuscli
from datetime import datetime

from shapely.geometry import box

# TODO: Target your AWS NEXUS server using your public DNS name and port 8083
nexuscli.set_target("http://<public dns>:8083", use_session=False)

# TODO: Create a bounding box using the box method imported above

# TODO: Plot the bounding box using the helper method plot_box

In [None]:
# Do not modify this line ##
start = time.perf_counter()#
############################


# TODO: Call the subset method for the AVHRR_OI_L4_GHRSST_NCEI dataset using 
# your bounding box and time period of 1 day. Then print the size/length of the resulting list.


# Enter your code above this line
print("Subsetting data took {} seconds".format(time.perf_counter() - start))

# Step 4: Subset Using a Metadata Filter

We can also subset data using metadata filters. This is a relatively new feature and currently it does require some knowledge about the dataset in order to know what metadata is available to be filtered on.

For this example we will use the sample river flow dataset `RAPID_WSWM`, which has data modeled to look like river flow gauges in North America. Each river has been given a unique identifier which is available to be filtered on. For our case, we want to focus on LA County. Nine rivers located in LA County have already been selected and their River IDs are listed for you.

Use the subset method to get data for the selected rivers and then plot the resulting data using the `show_plot` helper method. The filter format is `rivid_i:<River ID>` and has been provided for you.
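
As a minimal sketch of the filter format, the snippet below builds one `rivid_i:<River ID>` string per river without contacting NEXUS. The name `metadata_filters` is only illustrative; the documentation in Step 3 describes `metadata_filter` as a list of `key:value` strings, so each string would presumably be wrapped in a list when passed to `subset`.

```python
# River IDs copied from the exercise cell in this step.
la_county_river_ids = [17575859, 17574289, 17575711, 17574677, 17574823,
                       948070361, 22560728, 22560730, 22560738]

# Build one filter string per river, in the rivid_i:<River ID> format.
metadata_filters = ["rivid_i:{}".format(river_id)
                    for river_id in la_county_river_ids]

print(metadata_filters[0])
```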

## TODOs

1. Target your EC2 instance
2. Iterate over the list of River IDs
3. For each River ID, call the subset function passing in the metadata filter for that river
  - __Hint__: you will want to store the result of the subset function on each iteration in a data structure (like a list) so you can plot it later
  - __Hint__: in python you can use `my_list.append(data)` to append data to a list called `my_list`
  - __Hint__: `subset` returns a list of [Point](https://htmlpreview.github.io/?https://raw.githubusercontent.com/apache/incubator-sdap-nexus/107438af45b479348ffb75a667b276ee3c81f9da/client/docs/nexuscli/nexuscli.m.html#nexuscli.nexuscli.Point) objects
4. Graph the results using the `show_plot` helper method
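
The accumulation pattern from the hints can be sketched without a NEXUS server. Everything prefixed with `Fake` or `fake_` below is a hypothetical stand-in: the real `Point` field names are listed in the documentation linked above, and the real data comes from `subset`.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for the Point objects returned by subset;
# the real field names are in the documentation linked above.
FakePoint = namedtuple("FakePoint", ["time", "value"])

def fake_subset(river_id):
    """Hypothetical replacement for subset: returns two points per river."""
    return [FakePoint(datetime(2016, 1, day), float(river_id % 7 + day))
            for day in (1, 2)]

river_ids = [17575859, 17574289]

time_steps = []       # one list of times per river
discharge_rates = []  # one list of values per river
for river_id in river_ids:
    points = fake_subset(river_id)
    time_steps.append([p.time for p in points])
    discharge_rates.append([p.value for p in points])

print(len(time_steps), len(discharge_rates))
```

Keeping one inner list per river matches what `show_plot` expects, since it pairs each entry of `x_datas` with the corresponding entry of `y_datas`.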

In [None]:
import requests
import time
import nexuscli
from datetime import datetime


# TODO: Target your AWS NEXUS server using your public DNS name and port 8083
nexuscli.set_target("http://<public dns>:8083", use_session=False)

# River IDs for 9 Rivers in LA County
la_county_river_ids = [17575859, 17574289, 17575711, 17574677, 17574823,
                       948070361, 22560728, 22560730, 22560738]

# Do not modify this line ##
start = time.perf_counter()#
############################

# TODO: Iterate over the list of River IDs

    # TODO: For each River ID, call the subset function passing in the metadata filter for that river
    metadataFilter = "rivid_i:{}".format(river_id)

print("Subsetting took {} seconds".format(time.perf_counter() - start))

In [None]:
# TODO: Graph the results using the show_plot helper method
show_plot(, # x values (time)
          , # y values (data)
          'Time', # x axis label
          'Discharge (m³s⁻¹)', # y axis label
          legend=[str(r) for r in la_county_river_ids],
          title='LA County Rivers'
         )

# Step 5: Averaging Results

Now that we have data on 9 rivers in LA County, we might want to process that data even further and determine the mean flow rate of all 9 rivers over time. This can be done using the `numpy` library.

Try using your results from the previous cell and apply `numpy.mean` to the data and plot the result.
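
To see what `numpy.mean` does with `axis=0`, here is a small sketch on made-up numbers (not real discharge data), assuming one row per river and one column per time step.

```python
import numpy

# Made-up discharge values: one row per river, one column per time step.
discharge_rates = [[1.0, 2.0, 3.0],
                   [3.0, 4.0, 5.0]]

# axis=0 averages down the rows, giving one mean per time step.
avg_discharge_rates = numpy.mean(discharge_rates, axis=0)
print(avg_discharge_rates)
```

This only works when every river has the same number of time steps, because the rows must line up to form a rectangular array.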

## TODOs

1. Average the results from the previous cell using `numpy.mean`
2. Plot the results

In [None]:
import numpy

# TODO Average the results from the previous cell using `numpy.mean`

avg_discharge_rates = numpy.mean(discharge_rates, axis=0)

# TODO Plot the results

show_plot([single_river_time_steps], # x values
          [avg_discharge_rates], # y values
          'Time', # x axis label
          'Discharge (m³s⁻¹)', # y axis label
          title='Average Discharge of LA County Rivers'
         )

