# Glider Data via ERDDAP
*Compiled by Sage Lichtenwalner, Rutgers University, June 6, 2018*

*This example is largely based on a [notebook by Filipe Fernandes](https://github.com/ioos/BioData-Training-Workshop/blob/master/notebooks/intro_errdapy-IOOS.ipynb), presented at the IOOS Biological Data Training Workshop. Thanks Filipe!*

ERDDAP data servers provide an easy-to-use RESTful API that makes it simple to search for and request data. Many ERDDAP servers are now available, including NOAA CoastWatch, the [IOOS Glider DAC](https://data.ioos.us/gliders/erddap/index.html), and the [OOI](http://oceanobservatories.org/erddap-server/).

For this example, we're going to grab OOI glider data that is stored in the IOOS Glider DAC.

A typical ERDDAP URL to request data looks like this:

[https://data.ioos.us/gliders/erddap/tabledap/whoi_406-20160902T1700.mat?depth,latitude,longitude,salinity,temperature,time&time>=2016-07-10T00:00:00Z&time<=2017-02-10T00:00:00Z&latitude>=38.0&latitude<=41.0&longitude>=-72.0&longitude<=-69.0](https://data.ioos.us/gliders/erddap/tabledap/whoi_406-20160902T1700.mat?depth,latitude,longitude,salinity,temperature,time&time>=2016-07-10T00:00:00Z&time<=2017-02-10T00:00:00Z&latitude>=38.0&latitude<=41.0&longitude>=-72.0&longitude<=-69.0)

It's a mouthful, but it can easily be broken down into smaller parts.

* **server**: https://data.ioos.us/gliders/erddap/
* **protocol**: tabledap
* **dataset_id**: whoi_406-20160902T1700
* **response**: .mat
* **variables**: depth,latitude,longitude,salinity,temperature,time
* **constraints**:
    - time>=2016-07-10T00:00:00Z
    - time<=2017-02-10T00:00:00Z
    - latitude>=38.0
    - latitude<=41.0
    - longitude>=-72.0
    - longitude<=-69.0
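
To make the breakdown concrete, here is a minimal sketch that glues those parts back into the example URL using plain string formatting. The helper name `build_tabledap_url` is hypothetical (it is not part of ERDDAP or erddapy), and the sketch does not percent-encode anything, which a stricter HTTP client might require.

```python
# Hypothetical helper: assemble a tabledap request URL from its parts.
def build_tabledap_url(server, dataset_id, response, variables, constraints):
    # server + protocol + dataset_id + response type
    base = f"{server.rstrip('/')}/tabledap/{dataset_id}.{response}"
    # comma-separated variables come first in the query ...
    query = ",".join(variables)
    # ... followed by one "&name<op>value" term per constraint
    for key, value in constraints.items():
        query += f"&{key}{value}"
    return f"{base}?{query}"


url = build_tabledap_url(
    server="https://data.ioos.us/gliders/erddap/",
    dataset_id="whoi_406-20160902T1700",
    response="mat",
    variables=["depth", "latitude", "longitude", "salinity", "temperature", "time"],
    constraints={
        "time>=": "2016-07-10T00:00:00Z",
        "time<=": "2017-02-10T00:00:00Z",
        "latitude>=": 38.0,
        "latitude<=": 41.0,
        "longitude>=": -72.0,
        "longitude<=": -69.0,
    },
)
print(url)
```

Writing the constraints as a dictionary keyed by `name` plus operator keeps each condition on its own line, which is easier to edit than one long query string.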

## We can use *erddapy* to help us make these URLs

![tar.png](https://imgs.xkcd.com/comics/tar.png)

In [None]:
!pip install xarray

!pip install erddapy
from erddapy import ERDDAP

# Part 1 - Let's Grab Some Data

* We're going to use the [IOOS Glider DAC](https://data.ioos.us/gliders/erddap/index.html)
  * It includes gliders from many providers, including OOI
  * The data is provided in profile format, which makes it easier to use
  
* If we know the dataset ID we're interested in, we can request data

In [None]:
server = 'https://data.ioos.us/gliders/erddap'

dataset_id = 'whoi_406-20160902T1700'

constraints = {
    'time>=': '2016-07-10T00:00:00Z',
    'time<=': '2017-02-10T00:00:00Z',
    'latitude>=': 38.0,
    'latitude<=': 41.0,
    'longitude>=': -72.0,
    'longitude<=': -69.0,
}

variables = [
 'depth',
 'latitude',
 'longitude',
 'salinity',
 'temperature',
 'time',
]

In [None]:
e = ERDDAP(
    server=server,
    protocol='tabledap',
    response='nc'
)

e.dataset_id=dataset_id
e.constraints=constraints
e.variables=variables

print(e.get_download_url())

## Talk is cheap, show me the data!

There are a few methods you can use to get the data in a usable format:
* *to_pandas()* 
* *to_xarray()*

In [None]:
df = e.to_pandas(
    index_col='time',
    parse_dates=True,
    skiprows=(1,)  # units information can be dropped.
).dropna()

df.head()

In [None]:
ds = e.to_xarray(decode_times=False)

ds['temperature']
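
With `decode_times=False`, the `time` values stay as plain numbers. ERDDAP normally reports time as seconds since 1970-01-01T00:00:00Z, but it is worth checking the variable's `units` attribute before relying on that. Assuming those units, a small sketch of the conversion using only the standard library (the function name is hypothetical):

```python
from datetime import datetime, timezone


def erddap_seconds_to_datetime(seconds):
    """Convert seconds since 1970-01-01T00:00:00Z into a timezone-aware datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# 1468108800 seconds after the epoch is the start of the time window requested above.
start = erddap_seconds_to_datetime(1468108800)
print(start.isoformat())
```

Keeping the raw numbers can be handy for arithmetic, while the converted values are easier to read and plot.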

## Let's plot the data

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

In [None]:
fig, ax = plt.subplots(figsize=(17, 5))
kw = dict(s=15, c=df['temperature'], marker='o', edgecolor='none')
cs = ax.scatter(df.index, df['depth'], **kw, cmap='RdYlBu_r')

ax.invert_yaxis()
ax.set_xlim(df.index[0], df.index[-1])
xfmt = mdates.DateFormatter('%H:%Mh\n%d-%b')
ax.xaxis.set_major_formatter(xfmt)

cbar = fig.colorbar(cs, orientation='vertical', extend='both')
cbar.ax.set_ylabel(r'Temperature ($^\circ$C)')  # raw string keeps the LaTeX backslash intact
ax.set_ylabel('Depth (m)');

# Part 2 - Searching for Datasets

* [ERDDAP](https://data.ioos.us/gliders/erddap/search/advanced.html?page=1&itemsPerPage=1000) also provides a way to search for datasets
* And we can use *erddapy* to search for datasets interactively

In [None]:
e = ERDDAP(
    server='https://data.ioos.us/gliders/erddap'
)

In [None]:
import pandas as pd

# Grab every dataset available
datasets = pd.read_csv(e.get_search_url(response='csv', search_for='all'))

In [None]:
'We have {} tabledap, {} griddap, and {} wms endpoints.'.format(
    len(set(datasets['tabledap'].dropna())),
    len(set(datasets['griddap'].dropna())),
    len(set(datasets['wms'].dropna()))
)

In [None]:
datasets.head()

## Let's refine our search

Let's narrow the search by area and time span, and look for *sea_water_temperature* only.

In [None]:
kw = {
    'standard_name': 'sea_water_temperature',
    'min_lon': -72.0,
    'max_lon': -69.0,
    'min_lat': 38.0,
    'max_lat': 41.0,
    'min_time': '2018-01-10T00:00:00Z',
    'max_time': '2019-01-10T00:00:00Z',
    'cdm_data_type': 'trajectoryprofile'
}

In [None]:
search_url = e.get_search_url(response='csv', **kw)
print(search_url)

# Grab the results
search = pd.read_csv(search_url)

# Extract the IDs
gliders = search['Dataset ID'].values

msg = 'Found {} Glider Datasets:\n\n{}'.format
print(msg(len(gliders), '\n'.join(gliders)))
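
The printed search URL is just an ERDDAP advanced search request. As a rough sketch of what such a URL looks like, the snippet below maps the keyword arguments onto query parameters. The camelCase parameter names (`minLon`, `maxTime`, and so on) are our assumption based on ERDDAP's advanced search form, the helper `advanced_search_url` is hypothetical, and the URL that erddapy actually generates contains more fields and may format the times differently.

```python
from urllib.parse import urlencode

# Assumed mapping from the snake_case keywords to ERDDAP's query parameter names.
RENAME = {
    "min_lon": "minLon",
    "max_lon": "maxLon",
    "min_lat": "minLat",
    "max_lat": "maxLat",
    "min_time": "minTime",
    "max_time": "maxTime",
}


def advanced_search_url(server, response="csv", **filters):
    """Hypothetical helper: build an advanced search URL from keyword filters."""
    params = {RENAME.get(key, key): value for key, value in filters.items()}
    # urlencode percent-encodes characters such as ':' in the timestamps
    return f"{server}/search/advanced.{response}?{urlencode(params)}"


sketch_url = advanced_search_url(
    "https://data.ioos.us/gliders/erddap",
    standard_name="sea_water_temperature",
    min_lon=-72.0,
    max_lon=-69.0,
    min_lat=38.0,
    max_lat=41.0,
    min_time="2018-01-10T00:00:00Z",
    max_time="2019-01-10T00:00:00Z",
    cdm_data_type="trajectoryprofile",
)
print(sketch_url)
```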

# Part 3 - Dataset Metadata

Once we know a *Dataset ID*, we can explore its metadata with `get_info_url()`

In [None]:
info_url = e.get_info_url(dataset_id=gliders[2], response='csv')
info = pd.read_csv(info_url)

print(gliders[2])
info.head(10) # First 10 attributes

In [None]:
# Let's pull out the value of one of the global attributes
cdm_profile_variables = info.loc[
    info['Attribute Name'] == 'cdm_profile_variables', 'Value'
]

print(''.join(cdm_profile_variables))

## Finding variable names using attributes

In [None]:
# Find a variable name based on its CF-compliant standard name
e.get_var_by_attr(
    dataset_id='cp_335-20170116T1459',
    standard_name='sea_water_temperature'
)

In [None]:
# Now let's find a few of them at once
standard_names=['sea_water_temperature', 'sea_water_practical_salinity']
variables = e.get_var_by_attr(
    dataset_id=dataset_id, 
    standard_name=lambda v: v in standard_names
)
variables

## Finding coordinate variables

In [None]:
axis = e.get_var_by_attr(
    dataset_id='cp_339-20180126T0000',
    axis=lambda v: v in ['X', 'Y', 'Z', 'T']
)
axis
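
Conceptually, both lookups above amount to filtering the dataset's metadata table: keep the rows for one attribute and return the variables whose value passes a test. The sketch below illustrates that idea with the standard library only. It is not erddapy's actual implementation, the function `vars_by_attr` is hypothetical, and `SAMPLE_INFO` is a made-up, heavily trimmed table that only mimics the column layout of an ERDDAP info response.

```python
import csv
import io

# Hypothetical sample in the column layout of an ERDDAP "info" CSV response.
SAMPLE_INFO = """Row Type,Variable Name,Attribute Name,Data Type,Value
attribute,NC_GLOBAL,cdm_data_type,String,TrajectoryProfile
variable,temperature,,float,
attribute,temperature,standard_name,String,sea_water_temperature
variable,salinity,,float,
attribute,salinity,standard_name,String,sea_water_practical_salinity
variable,depth,,float,
attribute,depth,axis,String,Z
"""


def vars_by_attr(info_csv, attr_name, predicate):
    """Return variable names whose attribute `attr_name` satisfies `predicate`."""
    rows = csv.DictReader(io.StringIO(info_csv))
    return [
        row["Variable Name"]
        for row in rows
        if row["Row Type"] == "attribute"
        and row["Attribute Name"] == attr_name
        and predicate(row["Value"])
    ]


wanted = {"sea_water_temperature", "sea_water_practical_salinity"}
by_name = vars_by_attr(SAMPLE_INFO, "standard_name", lambda v: v in wanted)
by_axis = vars_by_attr(SAMPLE_INFO, "axis", lambda v: v in {"X", "Y", "Z", "T"})
print(by_name, by_axis)
```

Passing a callable as the test, as in the cells above, lets one call match several acceptable values at once.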

# Putting everything together

Let's find all of the gliders that flew within the Pioneer Array during 2018.

In [None]:
constraints = {
    'time>=': '2018-01-01T00:00:00Z',
    'time<=': '2019-01-01T00:00:00Z',
    'latitude>=': 38.0,
    'latitude<=': 41.0,
    'longitude>=': -72.0,
    'longitude<=': -69.0,
}

variables = [
 'depth',
 'latitude',
 'longitude',
 'salinity',
 'temperature',
 'time',
]

In [None]:
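from urllib.error import HTTPError as UrllibHTTPError
from requests.exceptions import HTTPError

def download_csv(url):
    # The second row of an ERDDAP CSV holds the units, so skip it.
    return pd.read_csv(
        url, index_col='time', parse_dates=True, skiprows=[1]
    )

dfs = {}
for glider in gliders:
    print(glider)
    try:
        download_url = e.get_download_url(
            dataset_id=glider,
            protocol='tabledap',
            variables=variables,
            response='csv',
            constraints=constraints
        )
        # Download inside the try block: depending on the erddapy version,
        # a failed request may surface while building the URL (requests)
        # or while pandas reads it (urllib).
        dfs[glider] = download_csv(download_url)
    except (HTTPError, UrllibHTTPError):
        # Skip datasets that cannot satisfy the request
        continue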

We can use [Folium](http://python-visualization.github.io/folium/) to create an interactive map of all of the glider tracks using leaflet.js.

In [None]:
!pip install folium 
import folium

In [None]:
def plot_track(df, name, color='orange'):
    df = df.reset_index().drop_duplicates(['latitude','longitude'], keep='first').sort_values('time')

    locations = list(zip(df['latitude'].values, df['longitude'].values))
    folium.PolyLine(
        locations=locations,
        color=color,
        weight=8,
        opacity=0.7,
        tooltip=name
    ).add_to(m)
    print(color)


In [None]:
import matplotlib.colors as mc
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
print(colors)

tiles = ('http://services.arcgisonline.com/arcgis/rest/services/'
         'World_Topo_Map/MapServer/tile/{z}/{y}/{x}')

m = folium.Map(location=(40.3052, -70.8833), zoom_start=7,
               tiles=tiles, attr='ESRI')

# Cycle through the color list so we never run out of colors
for k, (name, df) in enumerate(dfs.items()):
    plot_track(df, name, color=mc.to_hex(colors[k % len(colors)]))

m #Display the map

In [None]:
def glider_scatter(df, ax, glider):
    ax.scatter(df['temperature'], df['salinity'], s=10, alpha=0.5, label=glider)
    
fig, ax = plt.subplots(figsize=(7, 7))
ax.set_ylabel('Salinity')
ax.set_xlabel('Temperature')
ax.grid(True)

for glider, df in dfs.items():
    glider_scatter(df, ax, glider)
    
leg = ax.legend()
ax.set_ylim(20, 41)
ax.set_xlim(2.5, 26);

# References

* Check out the [erddapy documentation](https://pyoceans.github.io/erddapy/) for more information on using the library, especially the [Quick Intro](https://pyoceans.github.io/erddapy/quick_intro.html).