<h2>Finding variable names in THREDDS model datasets by exploring the NCSS URL</h2>

Motivation:

 - When working with new or unfamiliar model datasets housed on the THREDDS server, I sometimes need to open the NCSS HTML (XML) URL for a specific model and product in a browser to find the variable name string required by <code>siphon</code> and its <code>TDSCatalog</code> and <code>NCSS</code> classes.

---
 
Example: 

I want to query and grab the MSLP data from the GFS 20km CONUS dataset for the current date with the 00Z init hour. The first step is to use NCSS to query the variable(s) (i.e., MSLP in this case) from the NetcdfSubset URL, then grab the data with <code>NCSS.get_data(query)</code>.
    
* Problem: What if I don't know the exact variable name for MSLP in the GFS 20km CONUS dataset (MSLP_Eta_model_reduction_msl) that <code>NCSS.query()</code> requires?
        


In [2]:
# Accessing data from the XML catalog via the siphon library
from siphon.catalog import TDSCatalog
from siphon.ncss import NCSS

In [3]:
from datetime import datetime, timedelta
now = datetime.utcnow()

## My experimental contributions to the siphon package

In [4]:
# function that returns the catalog URL and can optionally open a browser tab listing the product's variable names
from siphon.ncss import open_var_browser

# dictionary with all NCSS-accessible models as keys, mapping each product to its URL extension
from siphon.ncss import thredds_model_dict

In [5]:
open_var_browser

<function siphon.ncss.open_var_browser(model, prod, datetime_obj, init_hour, open_browser=False)>

In [6]:
thredds_model_dict

{'RAP': {'CONUS_13km': 'RAP/CONUS_13km/RR_CONUS_13km',
  'CONUS_20km': 'RAP/CONUS_20km/RR_CONUS_20km',
  'CONUS_40km': 'RAP/CONUS_40km/RR_CONUS_40km'},
 'GFS': {'0p25_ana': 'GFS/Global_0p25deg_ana/GFS_Global_0p25deg_ana',
  '0p25': 'GFS/Global_0p25deg/GFS_Global_0p25deg',
  '0p5_ana': 'GFS/Global_0p5deg_ana/GFS_Global_0p5deg_ana',
  '0p5': 'GFS/Global_0p5deg/GFS_Global_0p5deg',
  'onedeg_ana': 'GFS/Global_onedeg_ana/GFS_Global_onedeg_ana',
  'onedeg': 'GFS/Global_onedeg/GFS_Global_onedeg',
  'Pac_20km': 'GFS/Pacific_20km/GFS_Pacific_20km',
  'PR_0p25': 'GFS/Puerto_Rico_0p25deg/GFS_Puerto_Rico_0p25deg',
  'CONUS_95km': 'GFS/CONUS_95km/GFS_CONUS_95km',
  'CONUS_80km': 'GFS/CONUS_80km/GFS_CONUS_80km',
  'CONUS_20km': 'GFS/CONUS_20km/GFS_CONUS_20km',
  'AK_20km': 'GFS/Alaska_20km/GFS_Alaska_20km'},
 'HRRR': {'CONUS_3km': 'HRRR/CONUS_3km/surface/HRRR_CONUS_3km',
  'CONUS_2p5km_ana': 'HRRR/CONUS_2p5km_ANA/HRRR_CONUS_2p5km_ana',
  'CONUS_2p5km': 'HRRR/CONUS_2p5km/HRRR_CONUS_2p5km'},
 'GEFS': 

### The top-level keys of the dictionary are the available models

In [7]:
list(thredds_model_dict.keys())

['RAP', 'GFS', 'HRRR', 'GEFS', 'NAM', 'NDFD', 'RTMA', 'NCEP Blend']

## Quickly check out available products and their url extensions for the GFS model

* the next-level keys will be the model products/domains

* values associated with these keys are part of the url needed to access these products
    * see the open_var_browser method to understand more

In [8]:
list(thredds_model_dict["GFS"].keys())

['0p25_ana',
 '0p25',
 '0p5_ana',
 '0p5',
 'onedeg_ana',
 'onedeg',
 'Pac_20km',
 'PR_0p25',
 'CONUS_95km',
 'CONUS_80km',
 'CONUS_20km',
 'AK_20km']
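
Because a mistyped model or product key would otherwise surface as a bare `KeyError`, a small lookup wrapper can list the valid choices instead. The sketch below is only an illustration: `lookup_url_extension` and `sample_model_dict` are hypothetical names, not part of siphon, and the sample dictionary is a short excerpt of the output shown above.

```python
# Short excerpt of thredds_model_dict, with values copied from the output above.
sample_model_dict = {
    "RAP": {"CONUS_13km": "RAP/CONUS_13km/RR_CONUS_13km"},
    "GFS": {
        "CONUS_20km": "GFS/CONUS_20km/GFS_CONUS_20km",
        "AK_20km": "GFS/Alaska_20km/GFS_Alaska_20km",
    },
}


def lookup_url_extension(model_dict, model, prod):
    """Return the URL extension for model/prod, or raise a KeyError naming valid keys.

    Hypothetical helper for illustration; it is not part of siphon.
    """
    if model not in model_dict:
        raise KeyError(f"Unknown model {model!r}; choose from {sorted(model_dict)}")
    products = model_dict[model]
    if prod not in products:
        raise KeyError(f"Unknown product {prod!r} for {model}; choose from {sorted(products)}")
    return products[prod]


gfs_extension = lookup_url_extension(sample_model_dict, "GFS", "CONUS_20km")
print(gfs_extension)
```

With the full dictionary, the same call would be `lookup_url_extension(thredds_model_dict, "GFS", "CONUS_20km")`.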

### Let's put these additions to use. First, we need to supply a datetime

In [9]:
now = datetime.utcnow()  # current UTC time
now = datetime(2020,10,8,0,0)  # overridden with the fixed date used for the outputs in this notebook
now, type(now)

(datetime.datetime(2020, 10, 8, 0, 0), datetime.datetime)
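
To target the current date's 00Z run rather than a hard-coded date, the current UTC time can be truncated to midnight. This is a minimal sketch; `latest_00z` is a hypothetical helper name, and it assumes the server has already posted today's 00Z run, which may not be the case early in the day.

```python
from datetime import datetime, timezone


def latest_00z(when):
    """Truncate a datetime to 00:00 of the same day (hypothetical helper)."""
    return when.replace(hour=0, minute=0, second=0, microsecond=0)


# Naive UTC datetime, equivalent in spirit to datetime.utcnow()
now_utc = datetime.now(timezone.utc).replace(tzinfo=None)
init_time = latest_00z(now_utc)

# String form of the init hour, in the "0000" style used in this notebook
init_hour = init_time.strftime("%H%M")
print(init_time, init_hour)
```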

### The function can open a browser or not, but it always returns the XML URL to use in querying the data!

* just pass the returned URL to <code>TDSCatalog()</code>

In [10]:
catalog_url = open_var_browser(model = "GFS",
                                   prod = "CONUS_20km",
                                   datetime_obj = now,
                                   init_hour = "0000",
                                   open_browser=False)
catalog_url

'https://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/CONUS_20km/GFS_CONUS_20km_20201008_0000.grib2/catalog.xml'
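
Judging from the URL returned above, it appears to be assembled from a fixed base, the dictionary value for the product, and a date/init-hour stamp. The sketch below reproduces that pattern as an assumption; `build_catalog_url` and `CATALOG_BASE` are hypothetical names, and the actual internals of `open_var_browser` may differ.

```python
from datetime import datetime

# Assumed base, read off the URL returned above.
CATALOG_BASE = "https://thredds.ucar.edu/thredds/catalog/grib/NCEP/"


def build_catalog_url(url_extension, datetime_obj, init_hour):
    """Compose a per-run catalog.xml URL (hypothetical helper, pattern inferred from the output)."""
    run_stamp = f"{datetime_obj:%Y%m%d}_{init_hour}"
    return f"{CATALOG_BASE}{url_extension}_{run_stamp}.grib2/catalog.xml"


sketch_url = build_catalog_url("GFS/CONUS_20km/GFS_CONUS_20km", datetime(2020, 10, 8), "0000")
print(sketch_url)
```

The first argument is the value stored under `thredds_model_dict["GFS"]["CONUS_20km"]`.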

<h3> Now supply the <code style="font-size:14px">catalog_url</code> to the <code style="font-size:14px">TDSCatalog</code> constructor</h3>

In [11]:
model = TDSCatalog(catalog_url)
list(model.datasets.values())

[GFS_CONUS_20km_20201008_0000.grib2]

In [12]:
type(model),type(model.datasets.values())

(siphon.catalog.TDSCatalog, odict_values)

In [13]:
dataset = list(model.datasets.values())[0]
list(dataset.access_urls.keys())

['OPENDAP',
 'HTTPServer',
 'WCS',
 'WMS',
 'NetcdfSubset',
 'CdmRemote',
 'NCML',
 'UDDC',
 'ISO']

In [14]:
# Create NCSS object to access the NetcdfSubset
ncss = NCSS(dataset.access_urls['NetcdfSubset'])

In [15]:
# define time range you want the data for
start = now
print(start)
delta_t = 12
end = now + timedelta(hours=delta_t)
print(end)


2020-10-08 00:00:00
2020-10-08 12:00:00


In [16]:
mslp_name = "MSLP????" # What is the name in the catalog?
# .
# .
# .

# check out the thredds_model_dict dictionary and then open_var_browser method
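
As a complement to browsing, siphon's `NCSS` object also exposes a `variables` attribute holding the variable names of the dataset, so the name can be searched for by keyword. The sketch below shows the idea with a stand-in set so that it runs offline; `find_variables` is a hypothetical helper, and apart from the MSLP name the sample entries are made-up placeholders.

```python
def find_variables(variable_names, *keywords):
    """Return the sorted names that contain every keyword, ignoring case (hypothetical helper)."""
    lowered = [keyword.lower() for keyword in keywords]
    return sorted(
        name for name in variable_names
        if all(keyword in name.lower() for keyword in lowered)
    )


# Stand-in for ncss.variables; only the MSLP name comes from this notebook,
# the other two entries are made-up placeholders.
sample_variables = {
    "MSLP_Eta_model_reduction_msl",
    "placeholder_temperature_field",
    "placeholder_wind_field",
}

matches = find_variables(sample_variables, "mslp")
print(matches)

# With the NCSS object created earlier in this notebook:
# matches = find_variables(ncss.variables, "mslp")
```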


### Say we're interested in the GFS model; we can search the THREDDS model dictionary <code style="font-size:14px">thredds_model_dict["GFS"]</code>. 

Remember, the keys of the search will be all the available products for that model

In [17]:
# say we're interested in the GFS model; list its available products
list(thredds_model_dict["GFS"].keys())

['0p25_ana',
 '0p25',
 '0p5_ana',
 '0p5',
 'onedeg_ana',
 'onedeg',
 'Pac_20km',
 'PR_0p25',
 'CONUS_95km',
 'CONUS_80km',
 'CONUS_20km',
 'AK_20km']

### Sweet, let's choose a product, pass it along with the other arguments to <code style="font-size:14px">open_var_browser</code>, and open a new browser tab to see the variable names available for the NCSS query

* same as before, but this time set open_browser to True

In [None]:
open_var_browser?

In [25]:
current_catalog = open_var_browser(model = "GFS",
                                   prod = "CONUS_20km",
                                   datetime_obj = now,
                                   init_hour = "0000",
                                   open_browser=True)

### Sample image of the page that opens

https://thredds.ucar.edu/thredds/ncss/grib/NCEP/GFS/CONUS_20km/GFS_CONUS_20km_20201008_0000.grib2/dataset.html

<img src="thredds_variable_browser_img.png"></img>
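
Comparing the catalog URL returned earlier with the page URL above suggests that the browser page can be derived by swapping the `catalog` path segment for `ncss` and `catalog.xml` for `dataset.html`. The sketch below follows that inferred pattern; `catalog_to_ncss_page` is a hypothetical helper, and `open_var_browser` may build the address differently.

```python
import webbrowser


def catalog_to_ncss_page(catalog_url):
    """Turn a per-run catalog.xml URL into its NCSS dataset.html URL (inferred pattern)."""
    page = catalog_url.replace("/thredds/catalog/", "/thredds/ncss/", 1)
    if page.endswith("/catalog.xml"):
        page = page[: -len("catalog.xml")] + "dataset.html"
    return page


example_catalog_url = (
    "https://thredds.ucar.edu/thredds/catalog/grib/NCEP/GFS/CONUS_20km/"
    "GFS_CONUS_20km_20201008_0000.grib2/catalog.xml"
)
ncss_page = catalog_to_ncss_page(example_catalog_url)
print(ncss_page)

# Uncomment to open the page in a new browser tab:
# webbrowser.open_new_tab(ncss_page)
```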

### Query variable names and grab data

#### Now that we can look up the actual variable names required for the data query, let's plug in the name for MSLP

In [18]:
query = ncss.query()
query.time_range(start, end)
query.lonlat_box(north=80, south=0, east=310, west=200)
query.accept('netcdf4')

mslp_name = "MSLP_Eta_model_reduction_msl"
query.variables(mslp_name).add_lonlat(True)

# Request data for the variables you want to use
data = ncss.get_data(query)
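
The bounding box in the query above is given in the 0 to 360 degrees-east convention (west=200, east=310), while the `geospatial_lon_min`/`geospatial_lon_max` attributes in the returned dataset are reported in the -180 to 180 convention. A minimal sketch for converting between the two, with hypothetical helper names:

```python
def to_180(lon):
    """Convert a longitude in degrees east to the -180..180 convention."""
    return ((lon + 180) % 360) - 180


def to_360(lon):
    """Convert a longitude in degrees east to the 0..360 convention."""
    return lon % 360


# The box edges used in the query above
west_180 = to_180(200)
east_180 = to_180(310)
print(west_180, east_180)
```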

In [19]:
data

<class 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    Originating_or_generating_Center: US National Weather Service, National Centres for Environmental Prediction (NCEP)
    Originating_or_generating_Subcenter: 0
    GRIB_table_version: 2,1
    Type_of_generating_process: Forecast
    Analysis_or_forecast_generating_process_identifier_defined_by_originating_centre: Analysis from GFS (Global Forecast System)
    Conventions: CF-1.6
    history: Read using CDM IOSP GribCollection v3
    featureType: GRID
    History: Translated to CF-1.0 Conventions by Netcdf-Java CDM (CFGridWriter2)
Original Dataset = /data/ldm/pub/native/grid/NCEP/GFS/CONUS_20km/GFS_CONUS_20km_20201008_0000.grib2.ncx3#LambertConformal_257X369-40p69N-100p4W; Translation Date = 2020-10-09T16:47:19.834Z
    geospatial_lat_min: 13.79481320192154
    geospatial_lat_max: 57.335950219792934
    geospatial_lon_min: -153.0345501621164
    geospatial_lon_max: -49.20529767978462
    dimensions(