# Using Siphon to Download METAR Data

In this series, we work on some simpler tasks:
1. Making a line plot using matplotlib
2. **Downloading a time-series of data from a THREDDS server**
3. Plotting the data using matplotlib

## Finding the METAR Data

METARs are a standard form of surface observation in which data are coded into a text format. Luckily for us, decoded METAR observations are already available on [Unidata's THREDDS server](http://thredds.ucar.edu/thredds/). If we browse there, we see "Observation Data", which sounds like what we're looking for; from there, we can click on "Metar Station Data". This page offers "files", which are individual netCDF files containing the data, or the "Feature Collection", which aggregates these files together. We want the latter link, which lets us access the entire collection of files as one logical dataset.

We'll be using Unidata's Python library for talking to THREDDS, [Siphon](http://siphon.readthedocs.org), to access this information in a way that makes it easy to program. So we start by importing the `TDSCatalog` class from `siphon` and giving it the URL to the page (catalog) to which we just navigated.

**Note:** Instead of giving it the link to the HTML catalog, we change the extension to XML, which asks the TDS for the XML version of the catalog. This is much better to work with in code.

In [1]:
from siphon.catalog import TDSCatalog

catalog = TDSCatalog('http://thredds.ucar.edu/thredds/catalog/nws/metar/ncdecoded/catalog.xml?'
                     'dataset=nws/metar/ncdecoded/Metar_Station_Data_fc.cdmr')
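
If you copy a catalog address straight from the browser, the switch from HTML to XML described in the note can also be done in code. The helper below is a minimal sketch using only the standard library; the name `to_xml_catalog_url` is hypothetical and not part of Siphon.

```python
from urllib.parse import urlsplit, urlunsplit


def to_xml_catalog_url(url):
    """Return `url` with a trailing '.html' in its path swapped for '.xml'.

    Hypothetical helper for illustration; not part of Siphon.
    """
    parts = urlsplit(url)
    path = parts.path
    if path.endswith('.html'):
        path = path[:-len('.html')] + '.xml'
    # Rebuild the URL, leaving the query string (e.g. ?dataset=...) untouched
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


html_url = ('http://thredds.ucar.edu/thredds/catalog/nws/metar/ncdecoded/catalog.html?'
            'dataset=nws/metar/ncdecoded/Metar_Station_Data_fc.cdmr')
xml_url = to_xml_catalog_url(html_url)
print(xml_url)
```

The resulting string is what we would hand to `TDSCatalog`. A URL that already ends in `.xml` passes through unchanged.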

From this `catalog` we want to get to the METAR dataset. We can see the datasets in the catalog by looking at the `datasets` attribute on `catalog`. This is a Python dictionary, mapping the name of the dataset to a Python `Dataset` object (which came from more XML supplied by the TDS; notice a theme?). Since this is a dictionary, we can look at a list of the keys:

In [2]:
list(catalog.datasets)

['Feature Collection']

As expected, there's only a single dataset, the Feature Collection we're interested in. We can grab the `Dataset` instance for the feature collection and store it in `fc` with:

In [3]:
fc = list(catalog.datasets.values())[0]
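
Indexing with `[0]` quietly assumes the catalog holds exactly one dataset. A slightly more defensive version is sketched below. It uses a plain dictionary as a stand-in for `catalog.datasets` so that it runs without a network connection, and `only_value` is a hypothetical helper name.

```python
def only_value(mapping):
    """Return the single value in `mapping`, or raise if there is not exactly one."""
    if len(mapping) != 1:
        raise ValueError('expected exactly one entry, found: {}'.format(list(mapping)))
    return next(iter(mapping.values()))


# Stand-in for catalog.datasets; the real values are Siphon Dataset objects
datasets = {'Feature Collection': 'placeholder Dataset object'}
fc_standin = only_value(datasets)
print(fc_standin)
```

If the catalog ever grows a second entry, this fails with a message listing the names instead of silently picking whichever dataset happens to come first.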

## Downloading the Data

To download the data, we first look at what access methods are available for this dataset; to do this, we look at the `access_urls` attribute of `fc`, which is a Python dictionary mapping the type of access to the corresponding URL:

In [4]:
list(fc.access_urls)

['NetcdfSubset']

This means the feature collection is only accessible using the [netCDF Subset Service (NCSS)](https://www.unidata.ucar.edu/software/thredds/current/tds/reference/NetcdfSubsetServiceReference.html). NCSS is a TDS web service that allows downloading subsets of datasets using lat/lon or projection points (or bounding boxes) and date ranges. It's similar to OPeNDAP, except that you don't need to figure out which indices of the data you need. You can access this service using an [HTML form](http://thredds.ucar.edu/thredds/ncss/nws/metar/ncdecoded/Metar_Station_Data_fc.cdmr/dataset.html) for the dataset, but that requires human input; what we want is programmatic access.

Fortunately, Siphon provides code to make accessing NCSS painless. The first step is to import the `NCSS` class from siphon and point it to the NCSS access URL on the feature collection.

In [5]:
from siphon.ncss import NCSS
ncss = NCSS(fc.access_urls['NetcdfSubset'])
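
Looking up `'NetcdfSubset'` directly raises a bare `KeyError` when a dataset does not offer NCSS. The sketch below shows one way to get a more helpful message. It works on a plain dictionary standing in for `fc.access_urls`; `require_service` is a hypothetical helper, and the URL is illustrative (derived from the HTML form link above).

```python
def require_service(access_urls, service='NetcdfSubset'):
    """Return the URL for `service`, or raise an error listing what is available."""
    try:
        return access_urls[service]
    except KeyError:
        raise LookupError('{!r} not offered; available services: {}'.format(
            service, sorted(access_urls))) from None


# Stand-in for fc.access_urls; this URL is illustrative
access_urls = {'NetcdfSubset': 'http://thredds.ucar.edu/thredds/ncss/nws/metar/'
                               'ncdecoded/Metar_Station_Data_fc.cdmr'}
ncss_url = require_service(access_urls)
print(ncss_url)
```

With a live catalog, the returned URL is what would be passed to `NCSS(...)`.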

This handles setting up access to that NCSS URL, as well as downloading and parsing various pieces of XML metadata. For instance, if we want to see what variables are available in this dataset, we can look at the `variables` attribute on `ncss`:

In [6]:
ncss.variables

{'air_pressure_at_sea_level',
 'air_temperature',
 'cloud_area_fraction',
 'dew_point_temperature',
 'hectoPascal_ALTIM',
 'high_cloud_area_fraction',
 'high_cloud_base_altitude',
 'inches_ALTIM',
 'low_cloud_area_fraction',
 'low_cloud_base_altitude',
 'middle_cloud_area_fraction',
 'middle_cloud_base_altitude',
 'numChildren',
 'precipitation_amount_24',
 'precipitation_amount_hourly',
 'report',
 'report_id',
 'report_length',
 'snowfall_amount',
 'snowfall_amount_last_hour',
 'visibility_in_air',
 'visibility_in_air_direction',
 'visibility_in_air_surface',
 'visibility_in_air_vertical',
 'weather',
 'wind_from_direction',
 'wind_from_direction_max',
 'wind_from_direction_min',
 'wind_gust',
 'wind_peak_from_direction',
 'wind_peak_speed',
 'wind_peak_time',
 'wind_speed',
 'xfields'}

So now let's make a request for data. For this case we'll get the latest observation for my location, asking for the air_temperature, wind direction, and wind speed. The first step is to ask the NCSS client to make a new query for us. This `query` object is used to assemble all of the parameters we'll be sending to the server in order to make our request.

In [7]:
query = ncss.query()

Next, we set the query to ask for data from the station closest to our lon/lat point (mine is roughly 105°W, 40°N):

In [15]:
query.lonlat_point(-104.6, 39.9)

var=air_temperature&var=wind_from_direction&var=wind_speed&time=2016-05-17T22%3A22%3A02.800345&latitude=39.9&longitude=-104.6&accept=netcdf4

Notice that we can also see the string representation of the query, shown as the standard URL query string sent to the web server. (The cells in this notebook were run more than once, so the output already includes parameters that we set in the following steps.) The next step is to state what time we want. We'll use Python's `datetime` standard library module to get the current time in UTC:

In [16]:
from datetime import datetime
query.time(datetime.utcnow())

var=air_temperature&var=wind_from_direction&var=wind_speed&time=2016-05-17T22%3A22%3A26.788581&latitude=39.9&longitude=-104.6&accept=netcdf4
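
One caveat: `datetime.utcnow()` is deprecated as of Python 3.12. A minimal sketch of an equivalent is below. It builds a timezone-aware UTC time and then drops the `tzinfo`, so the value has the same naive form that `utcnow()` returned. Whether Siphon and the TDS also handle a timezone-aware value is not covered in this post, so the sketch sticks with the naive form.

```python
from datetime import datetime, timezone

# Current time in UTC, converted to a naive datetime like utcnow() produced
now_utc = datetime.now(timezone.utc).replace(tzinfo=None)
print(now_utc.isoformat())
```

This value can be passed to `query.time(...)` in place of `datetime.utcnow()`.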

Next, we add the variables for which we'd like data, using the names listed above:

In [17]:
query.variables('air_temperature', 'wind_from_direction', 'wind_speed')

var=air_temperature&var=wind_speed&var=wind_from_direction&time=2016-05-17T22%3A22%3A26.788581&latitude=39.9&longitude=-104.6&accept=netcdf4

Lastly, we ask the TDS to return data in netCDF4 format:

In [18]:
query.accept('netcdf4')

var=air_temperature&var=wind_speed&var=wind_from_direction&time=2016-05-17T22%3A22%3A26.788581&latitude=39.9&longitude=-104.6&accept=netcdf4
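
To see what that printed string encodes, we can rebuild it by hand with the standard library. This is only an illustration of the URL encoding (for example, `:` becoming `%3A`), not a description of how Siphon assembles the query internally.

```python
from datetime import datetime
from urllib.parse import urlencode

# Same parameters, in the same order, as the query printed above
params = [
    ('var', 'air_temperature'),
    ('var', 'wind_speed'),
    ('var', 'wind_from_direction'),
    ('time', datetime(2016, 5, 17, 22, 22, 26, 788581).isoformat()),
    ('latitude', 39.9),
    ('longitude', -104.6),
    ('accept', 'netcdf4'),
]
query_string = urlencode(params)
print(query_string)
```

Repeating the `var` key is how several variables are requested in a single query string.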

We can also check that we haven't mistyped any variable names by using the `validate_query` method on the NCSS client, which checks (as much as it can) whether what we're asking for makes sense:

In [19]:
ncss.validate_query(query)

True
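
The kind of mistake this guards against can be illustrated with plain Python sets. The sketch below is not Siphon's implementation; it only shows how a mistyped name stands out when compared against the names the server advertises (here, a small subset of the variables listed earlier).

```python
# A few of the names from ncss.variables, copied by hand for this example
available = {'air_temperature', 'wind_from_direction', 'wind_speed'}

# A request containing a typo: 'wind_sped' instead of 'wind_speed'
requested = {'air_temperature', 'wind_from_direction', 'wind_sped'}

unknown = requested - available
print(sorted(unknown))
```

An empty `unknown` set means every requested name is one the dataset actually provides.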

All that's left now is to get the data:

In [20]:
data = ncss.get_data(query)

In [21]:
data

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4_CLASSIC data model, file format HDF5):
    Conventions: CF-1.6
    history: Written by CFPointWriter
    title: Extracted data from TDS Feature Collection Metar Station Data
    featureType: timeSeries
    time_coverage_start: 2016-05-17T21:53:00Z
    time_coverage_end: 2016-05-17T21:53:00Z
    geospatial_lat_min: 39.8694989319
    geospatial_lat_max: 39.8704989319
    geospatial_lon_min: -104.670498169
    geospatial_lon_max: -104.669498169
    dimensions(sizes): obs(1), station(1), station_description_strlen(24), wmo_id_strlen(5), station_id_strlen(3)
    variables(dimensions): float64 latitude(station), float64 longitude(station), float64 stationAltitude(station), |S1 station_id(station,station_id_strlen), |S1 station_description(station,station_description_strlen), |S1 wmo_id(station,wmo_id_strlen), float64 time(obs), int32 stationIndex(obs), int32 wind_from_di

## Working With the Data

Siphon takes care of opening the binary blob of data that comes from the server, using the excellent [netcdf4-python](https://unidata.github.io/netcdf4-python/) library. This gives us an easy way to see what variables are present in the data returned:

In [24]:
list(data.variables)

['latitude',
 'longitude',
 'stationAltitude',
 'station_id',
 'station_description',
 'wmo_id',
 'time',
 'stationIndex',
 'wind_from_direction',
 'wind_speed',
 'air_temperature']

So the data we got back contain not only the variables we asked for, but also some useful metadata about the station the data came from. For instance, to get the station id we can do:

In [26]:
# tobytes() replaces the deprecated ndarray.tostring()
station_id = data['station_id'][:].tobytes()
station_id

b'DEN'

or the description:

In [29]:
data['station_description'][:].tobytes()

b'DENVER INTNL ARPT, CO US'
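
These values are `bytes`, not `str`. If you need ordinary text (for a plot title, say), a small decoding step helps. The sketch below assumes the data are ASCII and that shorter values in a fixed-width field might be padded with NUL bytes or spaces; neither assumption is confirmed by the output above, and `decode_text` is a hypothetical helper.

```python
def decode_text(raw):
    """Decode fixed-width character data, dropping trailing NUL and space padding."""
    return raw.decode('ascii').rstrip('\x00 ')


print(decode_text(b'DEN'))
print(decode_text(b'DENVER INTNL ARPT, CO US'))
```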

More importantly, let's see what the current conditions are:

In [31]:
data['air_temperature'][0], data['wind_from_direction'][0], data['wind_speed'][0]

(8.0, 80, 2.057776)

To truly make sense of that, we should probably also look at the units:

In [32]:
data['air_temperature'].units, data['wind_from_direction'].units, data['wind_speed'].units

('Celsius', 'degrees', 'm/s')
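
If you think in Fahrenheit and knots, the conversions are short. The sketch below applies them to the values printed above; the function names are our own, and a units-aware library would be a more robust choice for real work.

```python
MS_PER_KNOT = 1852 / 3600  # one knot is one nautical mile (1852 m) per hour


def celsius_to_fahrenheit(temp_c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9 / 5 + 32


def ms_to_knots(speed_ms):
    """Convert a speed from meters per second to knots."""
    return speed_ms / MS_PER_KNOT


temp_f = celsius_to_fahrenheit(8.0)
wind_kt = ms_to_knots(2.057776)
print(round(temp_f, 1), round(wind_kt, 1))
```

The wind speed works out to almost exactly 4 knots, which suggests the original report was in whole knots before being converted to m/s.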

## Other Queries
What if we wanted to get a bunch of stations in an area instead of a single station? We just need to tweak the `query` and give it a box instead:

In [34]:
query.lonlat_box(west=-106, east=-104, south=39, north=41)

var=air_temperature&var=wind_speed&var=wind_from_direction&time=2016-05-17T22%3A22%3A26.788581&east=-104&west=-106&south=39&north=41&accept=netcdf4

The `query` object is smart enough to replace the previous spatial query (for a single point) but keeps the rest of what we've asked for. We can now pass this to the NCSS client and get back much more data:

In [35]:
data = ncss.get_data(query)
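
The box used above extends one degree in each direction from the point (-105, 40). If you want to build such a box around an arbitrary point, a helper like the one below is one option. `box_around` is a hypothetical name, and this sketch ignores edge cases such as boxes that cross the antimeridian or the poles.

```python
def box_around(lon, lat, half_width):
    """Return bounding-box edges `half_width` degrees either side of a point."""
    return {'west': lon - half_width, 'east': lon + half_width,
            'south': lat - half_width, 'north': lat + half_width}


box = box_around(-105, 40, 1)
print(box)
# With a live query object this could be passed as: query.lonlat_box(**box)
```

The dictionary keys match the keyword arguments used in the `lonlat_box` call above, so the result can be unpacked straight into it.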

Now we can look at all of the air temperature observations in that area:

In [52]:
data['air_temperature'][:]

array([-4.        ,  0.        ,  8.19999981,  8.        ,  8.        ,
        9.        ,  5.        ], dtype=float32)
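
With several stations in hand, a quick summary is often more useful than the raw array. The sketch below copies the values printed above into a plain list so that it runs on its own; with the live `data` object you would start from `data['air_temperature'][:]` instead.

```python
from statistics import mean

# Air temperatures (Celsius) copied from the output above
temps_c = [-4.0, 0.0, 8.19999981, 8.0, 8.0, 9.0, 5.0]

coldest, warmest, average = min(temps_c), max(temps_c), mean(temps_c)
print('min {:.1f}, max {:.1f}, mean {:.1f} Celsius'.format(coldest, warmest, average))
```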

## Conclusion

Next time, we'll wrap up the series by creating a meteogram using MetPy, Siphon, and matplotlib all together. For more information on what was covered today, we suggest looking at:

- Siphon's [documentation](http://siphon.readthedocs.io)
- Unidata's THREDDS [server](http://thredds.ucar.edu/thredds/) has more datasets you can explore

For more of Unidata's work in Python, see:
- [Unidata Notebook Gallery](http://github.com/Unidata/blog-notebooks) ([View Here](http://nbviewer.jupyter.org/github/unidata/notebook-gallery/tree/master/blog-notebooks))
- [Notebooks](http://github.com/Unidata/unidata-python-workshop) from Unidata's Annual Python Training Workshop

Was this too much detail? Too slow? Just right? Do you have suggestions on other topics or examples we should cover? Do you have a notebook you would like us to show off? We'd love to have your feedback. You can send a message to the (python-users AT unidata.ucar.edu) mailing list or send a message to support-python AT unidata.ucar.edu. You can also leave a comment below, directly on the blog post.