**Brian Blaylock**  
**June 24, 2020**  
COVID-19 Era

# 🏗 Demo: How to download a bunch of HRRR grib2 files.
HRRR grib2 files can be downloaded from the [University of Utah's HRRR archive](http://hrrr.chpc.utah.edu/) on the CHPC Pando archive system. You may also get HRRR grib2 files from the [NOAA Operational Model Archive and Distribution System (NOMADS)](https://nomads.ncep.noaa.gov/), but only for today's and yesterday's runs.

---

🌐 HRRR Archive Website: http://hrrr.chpc.utah.edu/  
📧 Brian Blaylock blaylockbk@gmail.com  
✒ Citation details:
> Blaylock B., J. Horel and S. Liston, 2017: Cloud Archiving and Data Mining of High Resolution Rapid Refresh Model Output. Computers and Geosciences. 109, 43-50. https://doi.org/10.1016/j.cageo.2017.08.005

---

Let's start by importing some modules we will use...

In [1]:
import os
from datetime import datetime, timedelta

import numpy as np
import urllib.request  # Used to download the file
import requests        # Used to check if a URL exists
import warnings

import pandas as pd    # Just used for the date_range function

This next cell is rather lengthy, but it includes the `download_HRRR` function, which we will use to download HRRR files from the University of Utah archive or from NOMADS. Read the docstring to understand all the options.

In [2]:
def reporthook(a, b, c):
    """
    Report download progress in megabytes (prints progress to screen).
    
    Parameters
    ----------
    a : Chunk number
    b : Maximum chunk size
    c : Total size of the download
    """
    chunk_progress = a * b / c * 100
    total_size_MB =  c / 1000000.
    print(f"\r Download Progress: {chunk_progress:.2f}% of {total_size_MB:.1f} MB\r", end='')

def download_HRRR(DATES, fxx=range(0, 1), model='hrrr', field='sfc', 
                  SOURCE='pando', SAVEDIR='./', dryrun=False):
    """
    Downloads full HRRR grib2 files for a list of dates and forecasts.
    
    Files are downloaded from the University of Utah HRRR archive (Pando) 
    or NOAA Operational Model Archive and Distribution System (NOMADS).
    
    Parameters
    ----------
    DATES : datetime or list of datetimes
        A datetime or list of datetimes that represent the model 
        initialization time for which you want to download.
    fxx : int or list of ints
        Forecast lead time or list of forecast lead times to download.
        The default only grabs the analysis hour (f00). If you want all
        the forecast hours, you could set ``fxx=range(0,19)``.
    model : {'hrrr', 'hrrrak', 'hrrrX'}
        The model type you want to download.
        - 'hrrr' HRRR Contiguous United States (operational)
        - 'hrrrak' HRRR Alaska. You can also use 'alaska' as an alias.
        - 'hrrrX' HRRR *experimental*
    field : {'prs', 'sfc', 'nat', 'subh'}
        Variable fields you wish to download. 
        - 'sfc' surface fields
        - 'prs' pressure fields
        - 'nat' native fields      ('nat' files are not available on Pando)
        - 'subh' subhourly fields  ('subh' files are not available on Pando)
    SOURCE : {'pando', 'nomads'}
        Specify the source from which to download the HRRR files.
        - 'pando' downloads HRRR files from University of Utah archive:
        http://hrrr.chpc.utah.edu/        
        - 'nomads' downloads HRRR files from NCEP NOMADS server:
        https://nomads.ncep.noaa.gov/pub/data/nccf/com/hrrr/prod/
    SAVEDIR : str
        Directory path to save the downloaded HRRR files.
    dryrun : bool
        If True, instead of downloading the files, it will print out the
        files that could be downloaded. This is set to False by default.

    Returns
    -------
    Downloads the HRRR files, with filename prepended with the run date
    (i.e. `20170101_hrrr.t00z.wrfsfcf00.grib2`)
    """
    
    #**************************************************************************
    ## Check function input
    #**************************************************************************
    
    # Force the `SOURCE` and `field` input strings to be lower case.
    # (Do this first so the `SOURCE` checks below are case-insensitive.)
    SOURCE = SOURCE.lower()
    field = field.lower()
    
    # Ping Pando first. This *might* prevent a "bad handshake" error.
    if SOURCE == 'pando':
        try:
            requests.head('https://pando-rgw01.chpc.utah.edu/')
        except requests.exceptions.RequestException:
            print('bad handshake...am I able to go on?')

    # `DATES` and `fxx` should be list-like objects, but if one doesn't have
    # a length (like if the user requests a single date or forecast hour),
    # then turn the item into a list-like object.
    if not hasattr(DATES, '__len__'): DATES = np.array([DATES])
    if not hasattr(fxx, '__len__'): fxx = [fxx]
    
    # HRRR data on NOMADS is only available for today's and yesterday's runs.
    # If any of the DATES are older than yesterday, raise a warning and
    # change SOURCE to pando.
    if SOURCE == 'nomads':
        yesterday = datetime.utcnow() - timedelta(days=1)
        yesterday = datetime(yesterday.year, yesterday.month, yesterday.day)
        # Compare one date at a time so a plain Python list of datetimes also works
        if any(DATE < yesterday for DATE in DATES):
            warnings.warn("Changed the SOURCE to 'pando' because one or more of the requested DATES are older than yesterday.")
            SOURCE = 'pando'
    
    # The user may set `model='alaska'` as an alias for 'hrrrak'.
    if model.lower() == 'alaska': model = 'hrrrak'
      
    _SOURCE = {'pando', 'nomads'}
    assert SOURCE in _SOURCE, f'`SOURCE` must be one of {_SOURCE}'
    
    # The available model types and fields depend on the SOURCE from which
    # the files are downloaded.
    if SOURCE == 'pando':
        _models = {'hrrr', 'hrrrak', 'hrrrX'}
        _fields = {'sfc', 'prs'}
    elif SOURCE == 'nomads':
        _models = {'hrrr', 'hrrrak'}
        _fields = {'sfc', 'prs', 'nat', 'subh'}
        
    assert model in _models, f'`model` should be set to one of {_models} for `SOURCE={SOURCE}`'
    assert field in _fields, f'`field` should be set to one of {_fields} for `SOURCE={SOURCE}`'
    
    # Make SAVEDIR if path doesn't exist
    if not os.path.exists(SAVEDIR):
        os.makedirs(SAVEDIR)
        print(f'Created directory: {SAVEDIR}')

    #**************************************************************************
    # Build the URL path for every file we want
    #**************************************************************************
    # An example URL for a file from Pando is 
    # https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20200624/hrrr.t01z.wrfsfcf17.grib2
    # 
    # An example URL for a file from NOMADS is
    # https://nomads.ncep.noaa.gov/pub/data/nccf/com/hrrr/prod/hrrr.20200624/conus/hrrr.t00z.wrfsfcf09.grib2
    
    # `base`     : The first part of the URL path
    # `URL_list` : A list of full URL paths to each file (one for each
    #              run date and forecast hour requested)
    # The local file name (`rename`, built in the download loop below)
    # prepends the original file name with the run date YYYYMMDD.
        
    if SOURCE == 'pando':
        base = f'https://pando-rgw01.chpc.utah.edu/{model}/{field}'
        URL_list = [f'{base}/{DATE:%Y%m%d}/{model}.t{DATE:%H}z.wrf{field}f{f:02d}.grib2' for DATE in DATES for f in fxx]
    
    elif SOURCE == 'nomads':
        base = 'https://nomads.ncep.noaa.gov/pub/data/nccf/com/hrrr/prod'
        if model == 'hrrr':
            URL_list = [f'{base}/hrrr.{DATE:%Y%m%d}/conus/hrrr.t{DATE:%H}z.wrf{field}f{f:02d}.grib2' for DATE in DATES for f in fxx]
        elif model == 'hrrrak':
            URL_list = [f'{base}/hrrr.{DATE:%Y%m%d}/alaska/hrrr.t{DATE:%H}z.wrf{field}f{f:02d}.ak.grib2' for DATE in DATES for f in fxx]
    
    #**************************************************************************
    # Ok, so we have a URL for each requested run date and forecast hour.
    # Now we need to check if each of those files exists, and if it does,
    # we will download that file to the SAVEDIR location.
    
    for file_URL in URL_list:
        # We want to prepend the filename with the run date, YYYYMMDD
        if SOURCE == 'pando':
            rename = '_'.join(file_URL.split('/')[-2:])
        elif SOURCE == 'nomads':
            rename = file_URL.split('/')[-3][5:] + '_' + file_URL.split('/')[-1]
        
        # Check that the request succeeded (`head.ok` is True for status codes
        # below 400). Also check that the Content-Length is >1000000 bytes
        # (if it's smaller, the file on the server might be incomplete)
        head = requests.head(file_URL)
        
        check_exists = head.ok
        # Use `.get` so a response without a Content-Length header counts as 0 bytes
        content_length = int(head.headers.get('Content-Length', 0))
        check_content = content_length > 1000000
        
        # Join the path so SAVEDIR works with or without a trailing slash
        save_path = os.path.join(SAVEDIR, rename)
        
        if check_exists and check_content:
            # Download the file
            if dryrun:
                print(f'🌵 Dry Run Success! Would have downloaded {file_URL} as {save_path}')
            else:
                urllib.request.urlretrieve(file_URL, save_path, reporthook)
                print(f'✅ Success! Downloaded {file_URL} as {save_path}')
        else:
            # The URL request is bad. If status code == 404, the URL does not exist.
            print()
            print(f'❌ WARNING: Status code {head.status_code}: {head.reason}. Content-Length: {content_length:,} bytes')
            print(f'❌ Could not download {head.url}')
    
    print("\nFinished 🍦")
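One detail of `reporthook` worth knowing: `urlretrieve` hands it the chunk count, the chunk size, and the total size, so `a * b` can overshoot `c` on the final chunk, and the total can be `-1` when the server reports no size. Here is a minimal sketch of a more defensive percentage calculation. The helper name `progress_percent` is hypothetical and is not part of the function above.

```python
def progress_percent(chunk_number, chunk_size, total_size):
    """Return download progress as a percentage, or None if the size is unknown."""
    if total_size <= 0:
        # No usable total size was reported, so a percentage is meaningless
        return None
    # Cap at 100 because the last chunk is usually only partly filled
    return min(100.0, chunk_number * chunk_size / total_size * 100)


print(progress_percent(1, 8192, 16384))   # halfway through a two-chunk file
print(progress_percent(3, 8192, 20000))   # final, partly filled chunk
print(progress_percent(1, 8192, -1))      # size unknown
```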

## Examples...
Ok, now that you have the `download_HRRR` function, we need to tell it what we want to download.

Let's start with a range of dates. We imported the Pandas module just because I really like the `date_range` function.

> Note: These dates refer to the model's *initialization* time.

We use Python's standard `datetime` module to define a start date and end date. Then we use the Pandas `date_range` function to create a list of dates. We set the frequency to `freq='1H'` because we know the HRRR model runs every hour. If you don't want every hour of HRRR data, you could change the frequency (e.g., '3H' would create a list of dates for every 3 hours between the start and end time).

In [3]:
# Set the start and end date for the HRRR files we want to download
sDATE = datetime(2020, 4, 24)
eDATE = datetime(2020, 4, 24, 3)

# Create a list of datetimes we want to download with Pandas `date_range` function.
# The HRRR model is run every hour, so make a list of every hour
DATES = pd.date_range(sDATE, eDATE, freq='1H')
DATES

DatetimeIndex(['2020-04-24 00:00:00', '2020-04-24 01:00:00',
               '2020-04-24 02:00:00', '2020-04-24 03:00:00'],
              dtype='datetime64[ns]', freq='H')
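If you would rather not import Pandas just for `date_range`, the same initialization times can be built with the standard library. This is only a rough sketch; `init_times` is a hypothetical helper, not something from this notebook.

```python
from datetime import datetime, timedelta


def init_times(start, end, step_hours=1):
    """List model initialization times from start to end (inclusive)."""
    times = []
    current = start
    while current <= end:
        times.append(current)
        current += timedelta(hours=step_hours)
    return times


hourly = init_times(datetime(2020, 4, 24), datetime(2020, 4, 24, 3))
three_hourly = init_times(datetime(2020, 4, 24), datetime(2020, 4, 24, 3), step_hours=3)

print([f"{t:%H}z" for t in hourly])
print([f"{t:%H}z" for t in three_hourly])
```

The hourly list holds the same four timestamps as the `DatetimeIndex` above; the 3-hourly list keeps only 00z and 03z.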

The other thing you might need to specify is which forecast hours to download. By default, the function will only download the analysis (F00, zero-hour forecast).
I like to use `fxx` as my variable for a list of forecast hours, because to me it looks like **F00**, **F02**, **F12**, etc.

Here are some examples:

|Code|Output
|---|---
|`fxx = range(0, 1)`| [0] *this is the function's default*
|`fxx = range(0, 19)`| [0, 1, 2, 3, ..., 18]
|`fxx = range(0, 19, 3)`| [0, 3, 6, 9, 12, 15, 18]
|`fxx = [2, 6, 7, 12]`| [2, 6, 7, 12] *of course, you can make your own list*

For this example, let's just download **F00** and **F03**.

In [4]:
fxx = range(0, 4, 3)
list(fxx)

[0, 3]
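To see what `DATES` and `fxx` turn into, here is a small sketch of the Pando URL pattern that `download_HRRR` builds: one URL for every combination of run time and forecast hour. The helper name `pando_urls` is made up for illustration; the URL template is copied from the function above.

```python
from datetime import datetime


def pando_urls(dates, fxx, model='hrrr', field='sfc'):
    """Build one Pando URL per (run time, forecast hour) pair."""
    base = f'https://pando-rgw01.chpc.utah.edu/{model}/{field}'
    return [
        f'{base}/{date:%Y%m%d}/{model}.t{date:%H}z.wrf{field}f{f:02d}.grib2'
        for date in dates
        for f in fxx
    ]


dates = [datetime(2020, 4, 24, hour) for hour in range(4)]
urls = pando_urls(dates, range(0, 4, 3))

print(len(urls))   # 4 run times x 2 forecast hours
print(urls[0])
```

Four run times and two forecast hours give eight files, so a long date range with `fxx=range(0, 19)` adds up quickly.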

Now let's call the `download_HRRR` function with our specified DATES and forecast hours.

In [5]:
download_HRRR(DATES, fxx)

✅ Success! Downloaded https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20200424/hrrr.t00z.wrfsfcf00.grib2 as ./20200424_hrrr.t00z.wrfsfcf00.grib2
✅ Success! Downloaded https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20200424/hrrr.t00z.wrfsfcf03.grib2 as ./20200424_hrrr.t00z.wrfsfcf03.grib2
✅ Success! Downloaded https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20200424/hrrr.t01z.wrfsfcf00.grib2 as ./20200424_hrrr.t01z.wrfsfcf00.grib2
✅ Success! Downloaded https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20200424/hrrr.t01z.wrfsfcf03.grib2 as ./20200424_hrrr.t01z.wrfsfcf03.grib2
✅ Success! Downloaded https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20200424/hrrr.t02z.wrfsfcf00.grib2 as ./20200424_hrrr.t02z.wrfsfcf00.grib2
✅ Success! Downloaded https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20200424/hrrr.t02z.wrfsfcf03.grib2 as ./20200424_hrrr.t02z.wrfsfcf03.grib2
✅ Success! Downloaded https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20200424/hrrr.t03z.wrfsfcf00.grib2 as ./20200424_hrrr.t03z.wrfsfcf00.grib2
✅ Success! Downloade

That just downloaded the HRRR files into the current working directory.
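Notice that each saved file is the remote filename with the run date in front. The sketch below isolates that renaming step for both sources, using the same string slicing as `download_HRRR`. The name `local_name` is hypothetical.

```python
def local_name(file_url, source='pando'):
    """Prefix the remote filename with the run date (YYYYMMDD)."""
    parts = file_url.split('/')
    if source == 'pando':
        # .../sfc/20200424/hrrr.t00z.wrfsfcf00.grib2 -> date folder + filename
        return '_'.join(parts[-2:])
    elif source == 'nomads':
        # .../hrrr.20200624/conus/hrrr.t02z.wrfsfcf00.grib2 -> strip the 'hrrr.' prefix
        return parts[-3][5:] + '_' + parts[-1]
    raise ValueError(f"unknown source: {source!r}")


pando = 'https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20200424/hrrr.t00z.wrfsfcf00.grib2'
nomads = ('https://nomads.ncep.noaa.gov/pub/data/nccf/com/hrrr/prod/'
          'hrrr.20200624/conus/hrrr.t02z.wrfsfcf00.grib2')

print(local_name(pando, 'pando'))
print(local_name(nomads, 'nomads'))
```

Without the date prefix, files from different days would overwrite each other, because the remote filename only carries the run hour.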

We can specify the directory we want to save the files to with the `SAVEDIR` argument. If the directory path doesn't exist, then it will be created. And one more thing...we can do a "dry run" of the download, meaning we go through the motions of the function, but skip the actual download. This will show how the download will work. Let's turn on the "dry run" for this next test. 

In [6]:
download_HRRR(DATES, fxx, SAVEDIR='./putInThisDir/', dryrun=True)

🌵 Dry Run Success! Would have downloaded https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20200424/hrrr.t00z.wrfsfcf00.grib2 as ./putInThisDir/20200424_hrrr.t00z.wrfsfcf00.grib2
🌵 Dry Run Success! Would have downloaded https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20200424/hrrr.t00z.wrfsfcf03.grib2 as ./putInThisDir/20200424_hrrr.t00z.wrfsfcf03.grib2
🌵 Dry Run Success! Would have downloaded https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20200424/hrrr.t01z.wrfsfcf00.grib2 as ./putInThisDir/20200424_hrrr.t01z.wrfsfcf00.grib2
🌵 Dry Run Success! Would have downloaded https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20200424/hrrr.t01z.wrfsfcf03.grib2 as ./putInThisDir/20200424_hrrr.t01z.wrfsfcf03.grib2
🌵 Dry Run Success! Would have downloaded https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20200424/hrrr.t02z.wrfsfcf00.grib2 as ./putInThisDir/20200424_hrrr.t02z.wrfsfcf00.grib2
🌵 Dry Run Success! Would have downloaded https://pando-rgw01.chpc.utah.edu/hrrr/sfc/20200424/hrrr.t02z.wrfsfcf03.grib2 as ./putInThisDir/202

## More options

The usage above is what most people want from the HRRR archive, but there are three other options we can change.

**`model=`**
- `'hrrr'` Download the operational HRRR for the contiguous 48 states. *This is the default.*
- `'hrrrak'` or `'alaska'` Download the operational HRRR-Alaska domain.
- `'hrrrX'` Download the experimental HRRR. **Not available from NOMADS but some analyses are on Pando.**

**`field=`**
- `'sfc'` Download the surface fields. *This is the default.*
- `'prs'` Download the pressure fields
- `'nat'` Download the native fields. **Not available on Pando**
- `'subh'` Download the subhourly fields. **Not available on Pando**

**`SOURCE=`**
- `'pando'` Download files from the University of Utah's Pando archive system. *This is the default.*
- `'nomads'` Download files from NOMADS server. Files are only available for today and yesterday.
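Not every combination of these options is valid. As a rough sketch, the checks inside `download_HRRR` boil down to the lookup below. The function name `check_request` is hypothetical, and it raises `ValueError` where the original uses `assert`.

```python
def check_request(model='hrrr', field='sfc', source='pando'):
    """Normalize the options and confirm the source offers that model and field."""
    source = source.lower()
    field = field.lower()
    if model.lower() == 'alaska':   # alias accepted by download_HRRR
        model = 'hrrrak'

    # Models and fields each source offers, as listed in this notebook
    allowed = {
        'pando': ({'hrrr', 'hrrrak', 'hrrrX'}, {'sfc', 'prs'}),
        'nomads': ({'hrrr', 'hrrrak'}, {'sfc', 'prs', 'nat', 'subh'}),
    }
    if source not in allowed:
        raise ValueError(f'`source` must be one of {sorted(allowed)}')

    models, fields = allowed[source]
    if model not in models:
        raise ValueError(f'`model` must be one of {sorted(models)} for source={source}')
    if field not in fields:
        raise ValueError(f'`field` must be one of {sorted(fields)} for source={source}')
    return model, field, source


print(check_request('alaska', 'sfc', 'PANDO'))
```

Asking for `'nat'` fields from Pando, or `'hrrrX'` from NOMADS, raises an error before any download is attempted.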

What does it look like when we download Alaska grids from NOMADS?

In [7]:
download_HRRR(DATES, fxx, model='alaska', SOURCE='nomads', dryrun=True)



🌵 Dry Run Success! Would have downloaded https://pando-rgw01.chpc.utah.edu/hrrrak/sfc/20200424/hrrrak.t00z.wrfsfcf00.grib2 as ./20200424_hrrrak.t00z.wrfsfcf00.grib2
🌵 Dry Run Success! Would have downloaded https://pando-rgw01.chpc.utah.edu/hrrrak/sfc/20200424/hrrrak.t00z.wrfsfcf03.grib2 as ./20200424_hrrrak.t00z.wrfsfcf03.grib2

❌ Could not download https://pando-rgw01.chpc.utah.edu/hrrrak/sfc/20200424/hrrrak.t01z.wrfsfcf00.grib2

❌ Could not download https://pando-rgw01.chpc.utah.edu/hrrrak/sfc/20200424/hrrrak.t01z.wrfsfcf03.grib2

❌ Could not download https://pando-rgw01.chpc.utah.edu/hrrrak/sfc/20200424/hrrrak.t02z.wrfsfcf00.grib2

❌ Could not download https://pando-rgw01.chpc.utah.edu/hrrrak/sfc/20200424/hrrrak.t02z.wrfsfcf03.grib2
🌵 Dry Run Success! Would have downloaded https://pando-rgw01.chpc.utah.edu/hrrrak/sfc/20200424/hrrrak.t03z.wrfsfcf00.grib2 as ./20200424_hrrrak.t03z.wrfsfcf00.grib2
🌵 Dry Run Success! Would have downloaded https://pando-rgw01.chpc.utah.edu/hrrrak/sfc/202

A few things about that above example...
1. There is a **UserWarning** that says the `SOURCE` was changed to download from 'pando'. That is because the run DATES we requested are older than yesterday and are not available on NOMADS.
2. A printed **WARNING** tells us the requested URL could not be found for a few of our requested files. That is because the HRRR-Alaska model only runs every three hours (00z, 03z, 06z, 09z, 12z, 15z, 18z, and 21z). It does not run hourly like the HRRR model. When retrieving Alaska files, you should set your date_range with a 3-hour interval (e.g., `DATES = pd.date_range(sDATE, eDATE, freq='3H')`).
3. Remember that we ran this with `dryrun=True`, meaning we didn't actually download the files, but it told us what it would have downloaded and where it would have saved the file.
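The first two points can be checked before calling `download_HRRR`. The sketch below uses two hypothetical helpers: `on_nomads` mirrors the "today or yesterday" test from the function (with the current time passed in so the example is repeatable), and `alaska_runs` assumes HRRR-Alaska runs start every third hour.

```python
from datetime import datetime, timedelta


def on_nomads(dates, now):
    """True if every run time is from today or yesterday (UTC), relative to `now`."""
    yesterday = now - timedelta(days=1)
    cutoff = datetime(yesterday.year, yesterday.month, yesterday.day)
    return all(d >= cutoff for d in dates)


def alaska_runs(dates):
    """Keep only run times on the assumed 3-hourly HRRR-Alaska cycle."""
    return [d for d in dates if d.hour % 3 == 0]


dates = [datetime(2020, 4, 24, h) for h in range(4)]
now = datetime(2020, 6, 25, 12)   # pretend "now" for a repeatable example

print(on_nomads(dates, now))
print([f"{d:%H}z" for d in alaska_runs(dates)])
```

Here the April dates are too old for NOMADS, and only the 00z and 03z runs survive the Alaska filter, which matches the two successful run times in the dry run above.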

> If you get `WARNING: Status code 404`, you might want to check that the file exists. One way to do that on Pando is to use the [interactive web download interface](http://home.chpc.utah.edu/~u0553130/Brian_Blaylock/cgi-bin/hrrr_download.cgi) and check if the file you are trying to download is available.
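The same check can be scripted. Below is a minimal sketch of the decision `download_HRRR` makes from a HEAD response: the request must succeed and the reported size must be over about 1 MB. The function name `looks_downloadable` is hypothetical, and it takes the status code and headers as plain values so it can be tried without a network connection.

```python
def looks_downloadable(status_code, headers, min_bytes=1_000_000):
    """Decide whether a HEAD response describes a complete-looking file."""
    # A missing Content-Length header is treated as 0 bytes
    size = int(headers.get('Content-Length', 0))
    return status_code < 400 and size > min_bytes


print(looks_downloadable(200, {'Content-Length': '135000000'}))  # large file, request ok
print(looks_downloadable(404, {'Content-Length': '250'}))        # file not found
print(looks_downloadable(200, {}))                               # no size reported
```

With the `requests` module, you would pass in `head.status_code` and `head.headers` from `head = requests.head(url)`.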

Let's request the F00-F18 forecasts from a single run from yesterday and do a "dry run" to download files from NOMADS.

In [8]:
yesterday = datetime.utcnow() - timedelta(days=1)

download_HRRR(yesterday, fxx=range(0,19), model='hrrr', SOURCE='nomads', dryrun=True)

🌵 Dry Run Success! Would have downloaded https://nomads.ncep.noaa.gov/pub/data/nccf/com/hrrr/prod/hrrr.20200624/conus/hrrr.t02z.wrfsfcf00.grib2 as ./20200624_hrrr.t02z.wrfsfcf00.grib2
🌵 Dry Run Success! Would have downloaded https://nomads.ncep.noaa.gov/pub/data/nccf/com/hrrr/prod/hrrr.20200624/conus/hrrr.t02z.wrfsfcf01.grib2 as ./20200624_hrrr.t02z.wrfsfcf01.grib2
🌵 Dry Run Success! Would have downloaded https://nomads.ncep.noaa.gov/pub/data/nccf/com/hrrr/prod/hrrr.20200624/conus/hrrr.t02z.wrfsfcf02.grib2 as ./20200624_hrrr.t02z.wrfsfcf02.grib2
🌵 Dry Run Success! Would have downloaded https://nomads.ncep.noaa.gov/pub/data/nccf/com/hrrr/prod/hrrr.20200624/conus/hrrr.t02z.wrfsfcf03.grib2 as ./20200624_hrrr.t02z.wrfsfcf03.grib2
🌵 Dry Run Success! Would have downloaded https://nomads.ncep.noaa.gov/pub/data/nccf/com/hrrr/prod/hrrr.20200624/conus/hrrr.t02z.wrfsfcf04.grib2 as ./20200624_hrrr.t02z.wrfsfcf04.grib2
🌵 Dry Run Success! Would have downloaded https://nomads.ncep.noaa.gov/pub/data/n

*And there you have it, a fancy function to help you download a bunch of HRRR grib2 files from the University of Utah's HRRR archive on Pando or from NOMADS.*