<div style="width:1000px">

<div style="float:right; width:98px; height:98px;">
<img src="https://raw.githubusercontent.com/Unidata/MetPy/master/src/metpy/plots/_static/unidata_150x150.png" alt="Unidata Logo" style="height: 98px;">
</div>

<h1>Using Siphon and MetPy to access and manipulate data</h1>
    <h3>AMS 2022 Short Course: MetPy for Quantitative Analysis of Meteorological Data</h3>

<div style="clear:both"></div>
</div>

<hr style="height:2px;">

### Tasks
1. <a href="#tdscatalog">Working with the TDS Catalog</a>
1. <a href="#datastruct">Working with xarray Data Structures</a>

<a name="background"></a>
## Background
Atmospheric data are collected by numerous institutions in a variety of data formats and stored in disparate places. Accessing and distributing these datasets are complicated activities, but are made simpler with the use of the THREDDS Data Server (TDS). In this lesson, you will learn more about data access with the TDS and how to use data in Python.

### THREDDS Data Server (TDS)
THREDDS is middleware to bridge the gap between data providers and data users. Data on the TDS are organized into catalogs that data users can browse and use to request data. While anyone can host their own TDS, Unidata hosts a publicly accessible TDS at [thredds.ucar.edu](https://thredds.ucar.edu/).

### Siphon
A web browser is one way to interact with a TDS, but we can also pull data from a TDS into Python projects using the Siphon Python package. Siphon doesn't require downloading data locally, saving time and storage space. Once pulled into Python, we can use packages like MetPy and Cartopy to visualize and analyze the data.

<center><img src="https://elearning.unidata.ucar.edu/metpy/AMS2022/TDSecosystem.png" width="300"/><br>
<i>The TDS - Siphon - Python ecosystem</i></center>
<br><br>
Siphon accomplishes this through a <b>TDS catalog</b> object created from an XML catalog document served by the TDS. This is a virtual catalog of the items available on the TDS, which we can then access remotely (or download locally if needed).

`cat = TDSCatalog('https://thredds.ucar.edu/.../catalog.xml')`
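To get a feel for what a catalog document contains, here is a minimal sketch that parses a tiny, hypothetical catalog with Python's standard-library `xml.etree.ElementTree`. The XML is invented for illustration and heavily simplified, and it assumes the THREDDS catalog namespace `http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0`. Siphon's `TDSCatalog` handles far more than this (services, nested catalog references, metadata), so treat this only as an illustration of the idea.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified catalog document (not fetched from a server)
CATALOG_XML = """<catalog xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
         name="Example catalog">
  <dataset name="Example collection">
    <dataset name="example_file_1.grib2" urlPath="example/example_file_1.grib2"/>
    <dataset name="example_file_2.grib2" urlPath="example/example_file_2.grib2"/>
  </dataset>
</catalog>"""

# Map a prefix to the THREDDS catalog namespace for ElementTree searches
NS = {"thredds": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"}


def list_datasets(xml_text):
    """Return names of datasets that point at data (those with a urlPath)."""
    root = ET.fromstring(xml_text)
    return [
        ds.get("name")
        for ds in root.findall(".//thredds:dataset", NS)
        if ds.get("urlPath") is not None
    ]


print(list_datasets(CATALOG_XML))
```

The container dataset ("Example collection") has no `urlPath`, so only the two file entries are listed.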

<a name="tdscatalog"></a>
## Working with Siphon

### The TDSCatalog

We can view a THREDDS Data Server (TDS) Catalog in a browser as well as in Python. For this activity, we'll start by examining Unidata's TDS catalog in our browser. <a href="https://thredds.ucar.edu" target="blank">https://thredds.ucar.edu</a>

<div class="alert alert-success">
    <b>EXERCISE</b>: TDS in the browser
    

Open this TDS link in a new tab in your browser: <a href="https://thredds.ucar.edu" target="blank">https://thredds.ucar.edu</a>
    
Locate the following catalog:
    
 <ul>
     <li>Source: High Resolution Rapid Refresh (HRRR), Analysis</li> 
     <li>Resolution: 2.5 km </li>
     <li>Collection: latest</li>
</ul>
    
Then create a variable called <code>url</code> whose value is the URL of that catalog page (ending in <code>.html</code>), as a string.
</div>

In [None]:
# YOUR CODE HERE

`TDSCatalog` expects the URL of the catalog's XML document, so we now change the extension from `.html` to `.xml`.

In [None]:
# Point the URL above at the XML version of the catalog using Python's built-in str.replace() method
url_xml = url.replace(".html", ".xml")
print(url_xml)
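`str.replace()` swaps *every* occurrence of `.html` in the string, which is fine for this URL. If you only want to change the extension at the very end, a small helper like the hypothetical `catalog_xml_url` below is a slightly safer sketch; it assumes the catalog URL ends in `.html` and leaves any other URL unchanged.

```python
def catalog_xml_url(html_url):
    """Swap a trailing '.html' for '.xml'; leave other URLs unchanged."""
    # Hypothetical helper: only the final extension is replaced
    if html_url.endswith(".html"):
        return html_url[: -len(".html")] + ".xml"
    return html_url


example = "https://thredds.ucar.edu/thredds/catalog/grib/NCEP/HRRR/CONUS_2p5km_ANA/latest.html"
print(catalog_xml_url(example))
# → https://thredds.ucar.edu/thredds/catalog/grib/NCEP/HRRR/CONUS_2p5km_ANA/latest.xml
```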

Now that we have located the catalog, it's time to create and examine the TDSCatalog object. First we import the class from Siphon, then we pass it the URL of the catalog we need.

In [None]:
# import the TDSCatalog class from Siphon for obtaining our data 
from siphon.catalog import TDSCatalog

# Create the TDS Catalog object, cat
cat = TDSCatalog(url_xml)

This gives us a catalog of the GRIB2 files we found in the browser. The file names are stored in the `datasets` attribute.

In [None]:
# Print all filenames associated with the catalog
print(cat.datasets)

# Total number of files
print('Total files: ' + str(len(cat.datasets)))

In this example there is only one file referenced within this catalog. We can inspect what data access pathways the TDS and Siphon provide for us.

In [None]:
ds = cat.datasets[0]
ds.access_urls

### NetCDF Subset Service (NCSS)

Our focus for this workshop will be accessing the TDS NetCDF Subset Service (NCSS) via Siphon, which lets us request a NetCDF file containing only the data variables, spatial subset, times, and more that we need, regardless of the specific data format behind the scenes.

In [None]:
url = 'https://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/Global_0p5deg/catalog.xml'
cat = TDSCatalog(url)
cat.datasets

In [None]:
ncss = cat.datasets['Best GFS Half Degree Forecast Time Series'].subset()
# ncss.variables

In [None]:
from datetime import datetime

query = ncss.query()
query.add_lonlat()
query.lonlat_box(west=-130, east=-50, south=10, north=60)
query.time(datetime.utcnow())
query.variables('Temperature_isobaric',
                'Geopotential_height_isobaric',
                'u-component_of_wind_isobaric',
                'v-component_of_wind_isobaric')
query.accept('netcdf4')

nc = ncss.get_data(query)
nc
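Behind the scenes, an NCSS query travels to the server as HTTP query parameters. The sketch below assembles a comparable parameter string with the standard library's `urllib.parse.urlencode`, assuming the NCSS parameter names `var`, `west`, `east`, `south`, `north`, `time`, `addLatLon`, and `accept`. It only illustrates the idea: Siphon builds and sends the real request for you, and the fixed timestamp here is arbitrary.

```python
from urllib.parse import urlencode

# Assumed NCSS parameter names; values mirror the query above
params = [
    ("var", "Temperature_isobaric"),
    ("var", "Geopotential_height_isobaric"),
    ("var", "u-component_of_wind_isobaric"),
    ("var", "v-component_of_wind_isobaric"),
    ("west", -130), ("east", -50), ("south", 10), ("north", 60),
    ("time", "2022-01-23T12:00:00"),  # arbitrary example time
    ("addLatLon", "true"),
    ("accept", "netcdf4"),
]

# Repeated keys (var) are kept; colons in the time string are percent-encoded
query_string = urlencode(params)
print(query_string)
```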

<div class="alert alert-success">
    <b>EXERCISE</b>: Explore NCSS in the browser
    

Pick up where you left off from the previous exercise! The URL for the catalog we identified before is https://thredds.ucar.edu/thredds/catalog/grib/NCEP/HRRR/CONUS_2p5km_ANA/latest.html if you need it again.
    
Open the dataset itself, in this case the `.grib2` file listed there. On this page, you will see a visual representation of the access URLs we had Siphon display for us above.

* Using these URLs, access the data via **NetcdfSubset** in your browser
* Explore the available variables and select one or more you're interested in
    * Share the name of one of these variables in the chat
* Change the Output Format to `netcdf4`
* **Optional:** **submit** a request to download your custom NetCDF file from the server
    
While here, be sure to notice the variety of vertical coordinates present in the data.
    
</div>

<a name="datastruct"></a>
## Working with xarray

### xarray Primer

The NCSS request returned a netCDF4 `Dataset` object (`nc`). Next, we open it as an xarray **Dataset**, which we will work with from here on. xarray is a library for organizing multidimensional datasets, such as NetCDF and GRIB.

<div class="admonition alert alert-warning">
    <p class="admonition-title" style="font-weight:bold">More Info</p>
    You may see the CF (Climate and Forecast) metadata conventions in many popular atmospheric datasets. These conventions provide standardized variable names and units, along with recommendations on metadata such as projection and coordinate information. You can read more about the CF conventions here: <a href="https://cfconventions.org/" target="_blank">https://cfconventions.org/</a>
</div>

In [None]:
import xarray as xr
from xarray.backends import NetCDF4DataStore

ds = xr.open_dataset(NetCDF4DataStore(nc))

![xarray diagram](https://github.com/pydata/xarray/raw/main/doc/_static/dataset-diagram.png "xarray model diagram")

xarray has an HTML-formatted interactive summary for examining datasets. Evaluating a variable name as the last line of a cell displays this summary. We will use it often to examine our data throughout this course.

In [None]:
# Preview the xarray Dataset as an interactive HTML summary
ds

In the preview, we see an interactive summary of the dimensions, coordinates, data variables, and attributes of the Dataset. Each data variable is stored as an xarray [DataArray](https://docs.xarray.dev/en/stable/user-guide/data-structures.html#dataarray). DataArrays carry metadata such as units and projection information, as well as a NumPy-like array of values that MetPy can leverage for calculations and plotting.

In [None]:
ds['Temperature_isobaric']

In [None]:
temp = ds.Temperature_isobaric
temp

The variable `temp` is now an xarray DataArray that we can interact with. Notice that this DataArray has four dimensions:
- `time` (length 1)
- `isobaric1` (length 41)
- `lat` (length 101)
- `lon` (length 161)

However, for plotting (and many analyses), we need a 2D array. 

First, we can remove the time dimension using the `squeeze()` method to eliminate any dimensions of length 1.

In [None]:
temp = temp.squeeze()
temp
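To see what happens to the array shape, here is a minimal NumPy sketch. A zero-filled array stands in for the temperature data, with an assumed shape of `(1, 41, 101, 161)`: one time step, 41 vertical levels, and two horizontal dimensions. The level index picked at the end is hypothetical.

```python
import numpy as np

# Stand-in for the 4D temperature array: (time, level, y, x)
temp_values = np.zeros((1, 41, 101, 161))

# squeeze() drops every axis of length 1 (here, the single time step)
temp_3d = np.squeeze(temp_values)
print(temp_3d.shape)  # (41, 101, 161)

# Picking one vertical level leaves the 2D horizontal field needed for plotting
level_index = 0  # hypothetical index of the level of interest
temp_2d = temp_3d[level_index]
print(temp_2d.shape)  # (101, 161)
```

On the real DataArray, the label-based selection in the next section does this last step for us, so we never need to know the level's integer index.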

### xarray with MetPy

xarray provides many pandas-style <a href="https://xarray.pydata.org/en/stable/user-guide/indexing.html" target="_blank">indexing methods</a> for selecting data using descriptive labels or coordinate locations. MetPy makes these selections unit-aware, so we can select, for example, the 925 hPa level without worrying about which pressure units the vertical coordinate is stored in.

In [None]:
# ALL MetPy xarray helpers become
# available with ANY MetPy import
from metpy.units import units

# select vertical level equal to 925 hPa
temp_925 = temp.metpy.sel(vertical=925 * units.hPa)
temp_925
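Conceptually, a unit-aware selection converts the requested level into the units of the vertical coordinate before looking it up. GRIB-derived isobaric coordinates on the TDS are typically stored in Pa, but that is an assumption worth checking in your own `ds` preview. The standard-library sketch below mimics the idea with a hypothetical `select_level` helper and a tiny conversion table; MetPy itself relies on the Pint units library for the conversion.

```python
# Hypothetical conversion factors to pascals (a tiny stand-in for a units library)
TO_PA = {"Pa": 1.0, "hPa": 100.0, "kPa": 1000.0}


def select_level(levels_pa, value, unit):
    """Return the index of the level matching `value` given in `unit`.

    `levels_pa` is assumed to be a sequence of pressure levels in Pa.
    """
    target_pa = value * TO_PA[unit]  # convert the request into the coordinate's units
    for i, level in enumerate(levels_pa):
        if abs(level - target_pa) < 1e-6:
            return i
    raise KeyError(f"{value} {unit} not found among levels")


# Example vertical coordinate in Pa (illustrative values only)
levels = [100000.0, 97500.0, 95000.0, 92500.0, 90000.0, 85000.0]
print(select_level(levels, 925, "hPa"))   # 3
print(select_level(levels, 92.5, "kPa"))  # 3
```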

Under the hood, MetPy can identify your relevant coordinates _regardless of their specific names_. This is useful for meteorological data, where data variables might rely on differently named coordinates present within the same dataset!

In [None]:
temp_925.metpy.vertical

In [None]:
temp_925.isobaric1
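MetPy's real identification logic weighs several metadata clues and naming patterns. As a much-simplified sketch of the idea, the hypothetical `find_vertical` function below picks out a vertical coordinate from attribute dictionaries alone, looking for an `axis` of `Z`, a `positive` attribute, or pressure units, without caring what the coordinate is called. The attribute dictionaries are invented for illustration.

```python
# Hypothetical coordinate metadata, keyed by coordinate name
coords = {
    "time": {"standard_name": "time"},
    "isobaric1": {"units": "Pa", "positive": "down"},
    "lat": {"units": "degrees_north"},
    "lon": {"units": "degrees_east"},
}

PRESSURE_UNITS = {"Pa", "hPa", "millibar", "mbar"}


def find_vertical(coord_attrs):
    """Return the name of the first coordinate whose metadata looks vertical."""
    for name, attrs in coord_attrs.items():
        # Any one of these clues is treated as enough in this simplified sketch
        if (attrs.get("axis") == "Z" or "positive" in attrs
                or attrs.get("units") in PRESSURE_UNITS):
            return name
    return None


print(find_vertical(coords))  # isobaric1
```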

<div class="alert alert-success">
    <b>EXERCISE</b>: Get 1000 hPa geopotential height
<br><br>    
Create a 2D array of geopotential height at the 1000 hPa (100000 Pa) level.
    
<ol>
     <li>From the <code>ds</code> Dataset, pull the <code>Geopotential_height_isobaric</code> DataArray</li> 
     <li><code>squeeze()</code> out any dimensions of length 1</li>
     <li><code>.sel()</code> the 1000 hPa vertical level</li>
     <li>Write the DataArray to a variable named <code>hgt1000</code></li>
     <li>Record the first data value of this 2-dimensional array to share.</li>
     <li><b>Optional:</b> Create a simple plot of your result if you're already familiar with Matplotlib.</li>
</ol>

</div>

In [None]:
# YOUR CODE HERE

Let's get back to our 925 hPa temperature DataArray. MetPy provides several shortcuts for exploring and converting the units of your data, such as `convert_units()`:

In [None]:
temp_925.metpy.convert_units('degC')
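For temperature, converting from kelvin to degrees Celsius is just an offset: T(degC) = T(K) - 273.15. The tiny sketch below applies that arithmetic by hand to a few illustrative values; `convert_units` performs the conversion for you (through Pint) and records the new units on the result.

```python
def kelvin_to_celsius(temps_k):
    """Convert a list of temperatures from kelvin to degrees Celsius."""
    return [t - 273.15 for t in temps_k]


# Illustrative values only
result = kelvin_to_celsius([273.15, 283.15, 300.0])
print(result)
```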

Recall Pint `Quantity` objects from the previous notebook. MetPy's `unit_array` property builds one automatically from the DataArray's values and its `units` attribute.

In [None]:
temp_925.metpy.unit_array

Note that, by default, the DataArray itself does not hold a `Quantity`: its values are plain NumPy numbers, and the units are stored only in the `units` attribute.

In [None]:
temp_925

However, for some MetPy calculations, we need these to be present _within_ xarray objects. We can do this by _quantifying_ the data!

In [None]:
temp_925_quant = temp_925.metpy.quantify()
temp_925_quant

In [None]:
temp_925_quant.metpy.dequantify()
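The difference between the two forms is where the units live. In a dequantified DataArray, the values are plain numbers and the units sit in the `units` attribute; after `quantify()`, the units travel with the values as a Pint `Quantity`. The sketch below imitates that round trip with a hypothetical `Quantity` dataclass and plain dictionaries standing in for DataArrays. It only illustrates the bookkeeping and is not MetPy's implementation.

```python
from dataclasses import dataclass


@dataclass
class Quantity:
    """Hypothetical stand-in for a Pint Quantity: values plus units."""
    magnitude: list
    units: str


def quantify(array):
    """Move units from the attributes into the data (illustration only)."""
    attrs = dict(array["attrs"])
    units = attrs.pop("units")
    return {"data": Quantity(array["data"], units), "attrs": attrs}


def dequantify(array):
    """Move units from the data back into the attributes (illustration only)."""
    quantity = array["data"]
    attrs = dict(array["attrs"], units=quantity.units)
    return {"data": quantity.magnitude, "attrs": attrs}


plain = {"data": [280.0, 285.0], "attrs": {"units": "K", "long_name": "Temperature"}}
quantified = quantify(plain)
print(quantified)
print(dequantify(quantified) == plain)  # True
```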

Finally, the last important piece of functionality we will explore today is making our data more _geographically aware_ using MetPy. This relies on the powerful [Pyproj](https://pyproj4.github.io/pyproj/stable/) and [Cartopy](https://scitools.org.uk/cartopy/docs/latest/) libraries, as well as standardized metadata in compliance with [CF Conventions](https://cfconventions.org).

Using these standardized metadata, we can characterize the data-relevant projection automatically with MetPy's `parse_cf()` xarray method.

In [None]:
ds = ds.metpy.parse_cf().squeeze()
ds

Note the new `metpy_crs` coordinate! MetPy looks for this coordinate in a variety of its calculations, including spatial calculations and cross-sections. Let's look at a calculation that benefits from it: advection.

In [None]:
import metpy.calc as mpcalc
# Display the documentation for MetPy's advection calculation
help(mpcalc.advection)

In [None]:
temp_850 = ds.Temperature_isobaric.metpy.sel(vertical=850 * units.hPa)
u_850 = ds['u-component_of_wind_isobaric'].metpy.sel(vertical=850 * units.hPa)
v_850 = ds['v-component_of_wind_isobaric'].metpy.sel(vertical=850 * units.hPa)

In [None]:
temp_adv_850 = mpcalc.advection(temp_850, u=u_850, v=v_850)
temp_adv_850
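For reference, MetPy's `advection` computes -(u ds/dx + v ds/dy) for a scalar field s. The sketch below applies the same formula with NumPy centered differences (`np.gradient`) on a small, uniform Cartesian grid with an assumed 50 km spacing and synthetic fields. Unlike this sketch, MetPy uses `metpy_crs` to work out the actual grid spacing on the map, which is why `parse_cf()` matters here.

```python
import numpy as np

# Synthetic 2D grid: 5 points in y (rows) by 6 points in x (columns)
dx = dy = 50_000.0  # assumed uniform spacing in meters
x = np.arange(6) * dx
y = np.arange(5) * dy
xx, yy = np.meshgrid(x, y)  # both have shape (5, 6)

# Temperature decreasing eastward by 1 K per 100 km; uniform westerly wind
temp = 280.0 - 1e-5 * xx        # K
u = np.full_like(temp, 10.0)    # m/s, blowing toward the east
v = np.zeros_like(temp)         # m/s

# np.gradient returns derivatives along axis 0 (y) and axis 1 (x)
dtemp_dy, dtemp_dx = np.gradient(temp, dy, dx)

# Advection of temperature: -(u dT/dx + v dT/dy), in K/s
temp_adv = -(u * dtemp_dx + v * dtemp_dy)
print(temp_adv[0, 0])  # approximately 1e-4 K/s (warm-air advection)
```

Positive values mean warm-air advection: the westerly wind carries warmer air from the west toward colder air in the east.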

If `metpy_crs` is available after using `parse_cf()`, we can also use a few shortcuts to get familiar plotting information and more. You might be familiar with creating Cartopy `crs` objects for plotting or transforming data onto maps.

That can be tedious to specify correctly for every new dataset or project we tackle. MetPy makes this easier: the `.metpy.cartopy_crs` property (for example, `temp_850.metpy.cartopy_crs`) returns a Cartopy CRS built from the `metpy_crs` coordinate, ready to use with Cartopy and Matplotlib.

See our full [xarray tutorial](https://unidata.github.io/MetPy/latest/tutorials/xarray_tutorial.html) in the documentation for more examples and what to do if your data aren't CF-compliant.

<div class="alert alert-success">
    <b>EXERCISE</b>: Calculating advection of a new variable
    
Recreate the steps we've followed so far to calculate **700 hPa advection of variables of your choosing**.
    
* Create the appropriate `TDSCatalog` to reach our GFS data.
* Query the `Best GFS Half Degree Forecast Time Series` dataset using NCSS as before.
* Request our `u` and `v` winds on their isobaric surfaces again.
* Find one or more new variables _on the same `isobaric` surface_ (**hint:** look at the variable names with `ncss.variables`) and add those to our `query`. Something like _specific humidity_ could be interesting!
* Use our new `query` to get our NetCDF data from the server.
* Open our NetCDF dataset in xarray using the `NetCDF4DataStore`, `squeeze` out any extra dimensions, and `parse_cf` the geographic metadata.
* Finally, calculate advection of one or more new variables on the **700 hPa isobaric level**.
* **Optionally**, plot the resulting calculation.

</div>

In [None]:
# YOUR CODE HERE
url = ''  # paste the GFS catalog URL (ending in catalog.xml) here
cat = TDSCatalog(url)

ncss = cat.datasets['Best GFS Half Degree Forecast Time Series'].subset()

query = ncss.query()
query.lonlat_box(west=-130, east=-50, south=10, north=60)
query.time(datetime.utcnow())
query.accept('netcdf4')
# The rest is up to you!