**Brian Blaylock**  
*February 8, 2021*

# 🛎 Download HRRR from different sources
When you download HRRR files, you may specify the source you wish to download the file from. 

Download sources include:
- [University of Utah's Pando archive](http://home.chpc.utah.edu/~u0553130/Brian_Blaylock/hrrr_FAQ.html)
- [Amazon Web Services](https://registry.opendata.aws/noaa-hrrr-pds/)
- [Google Cloud Platform](https://console.cloud.google.com/marketplace/product/noaa-public/hrrr)
- [NOMADS (only the most recent runs, roughly the last day)](http://www.nomads.ncep.noaa.gov/pub/data/nccf/com/hrrr/prod/)

In [1]:
from hrrrb.archive import download_hrrr, base_url

The download source is defined in `hrrrb/archive.py` by the **base_url** variable. The dictionary order defines the source priority. By default, it attempts to download the file from NOMADS first, then Google Cloud Platform, then the University of Utah Pando archive, then Amazon Web Services, then an alternative gateway on Pando (pando2). If the requested file isn't available at the first source, it tries the second, and so on.

> The order of the default priority might change. The .idx files weren't available on AWS at first, but it looks like they have been backfilled.

In [2]:
base_url

{'nomads': 'https://nomads.ncep.noaa.gov/pub/data/nccf/com/hrrr/prod',
 'google': 'https://storage.googleapis.com/high-resolution-rapid-refresh',
 'pando': 'https://pando-rgw01.chpc.utah.edu',
 'aws': 'https://noaa-hrrr-bdp-pds.s3.amazonaws.com',
 'pando2': 'https://pando-rgw02.chpc.utah.edu'}
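
Because a Python dictionary preserves insertion order, the default priority can be read straight from the keys. The sketch below works on a hand-copied version of the dictionary above rather than the one imported from `hrrrb.archive`, and `default_priority` and `custom_priority` are hypothetical names used only for illustration. It shows one way to build a reordered list that tries AWS first.

```python
# Hand-copied from the output above; not imported from hrrrb.archive.
base_url = {
    "nomads": "https://nomads.ncep.noaa.gov/pub/data/nccf/com/hrrr/prod",
    "google": "https://storage.googleapis.com/high-resolution-rapid-refresh",
    "pando": "https://pando-rgw01.chpc.utah.edu",
    "aws": "https://noaa-hrrr-bdp-pds.s3.amazonaws.com",
    "pando2": "https://pando-rgw02.chpc.utah.edu",
}

# Dict keys come back in insertion order (Python 3.7+),
# so this list is the default source priority.
default_priority = list(base_url)

# Move 'aws' to the front and keep the remaining sources in their original order.
custom_priority = ["aws"] + [s for s in default_priority if s != "aws"]

print(default_priority)
print(custom_priority)
```

A list built this way has the same form as the one passed to the `download_source_priority` argument later in this notebook.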

---
For example, when you download a file from the last few hours, it will download the file from NOMADS.

In [3]:
from datetime import datetime, timedelta

hours_ago = datetime.utcnow() - timedelta(hours=6)
download_hrrr(hours_ago)

💡 Info: Downloading [1] GRIB2 files

✅ Success! Downloaded from [nomads] https://nomads.ncep.noaa.gov/pub/data/nccf/com/hrrr/prod/hrrr.20210208/conus/hrrr.t16z.wrfsfcf00.grib2 --> /p/home/blaylock/data/hrrr/20210208/20210208_hrrr.t16z.wrfsfcf00.grib2
🚛💨 Download Progress: [1/1 completed] >> Est. Time Remaining 0:00:00         


🍦 Finished 🍦  Time spent on download: 0:00:06.061477


(PosixPath('/p/home/blaylock/data/hrrr/20210208/20210208_hrrr.t16z.wrfsfcf00.grib2'),
 'https://nomads.ncep.noaa.gov/pub/data/nccf/com/hrrr/prod/hrrr.20210208/conus/hrrr.t16z.wrfsfcf00.grib2')

---
But if we request a file from more than a day ago, it is no longer on NOMADS, so it downloads from Google, the next source in the priority list.

In [4]:
download_hrrr('2020-02-01')

💡 Info: Downloading [1] GRIB2 files

✅ Success! Downloaded from [google] https://storage.googleapis.com/high-resolution-rapid-refresh/hrrr.20200201/conus/hrrr.t00z.wrfsfcf00.grib2 --> /p/home/blaylock/data/hrrr/20200201/20200201_hrrr.t00z.wrfsfcf00.grib2
🚛💨 Download Progress: [1/1 completed] >> Est. Time Remaining 0:00:00         


🍦 Finished 🍦  Time spent on download: 0:00:02.521976


(PosixPath('/p/home/blaylock/data/hrrr/20200201/20200201_hrrr.t00z.wrfsfcf00.grib2'),
 'https://storage.googleapis.com/high-resolution-rapid-refresh/hrrr.20200201/conus/hrrr.t00z.wrfsfcf00.grib2')
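
The NOMADS and Google URLs printed so far share a visible pattern: the base URL, a `hrrr.YYYYMMDD/conus/` directory, and a file name built from the run hour, the field, and the forecast hour. As a rough sketch of that pattern only (`build_url` is a hypothetical helper, not a function from `hrrrb`, and the `field` and `fxx` parameter names are assumptions), the URL for the file above could be assembled like this:

```python
from datetime import datetime


def build_url(base: str, run: datetime, field: str = "sfc", fxx: int = 0) -> str:
    """Assemble a HRRR GRIB2 URL following the pattern seen in the output above.

    Hypothetical helper for illustration; the layout is inferred from the
    printed URLs, not taken from hrrrb/archive.py.
    """
    return (
        f"{base}/hrrr.{run:%Y%m%d}/conus/"
        f"hrrr.t{run:%H}z.wrf{field}f{fxx:02d}.grib2"
    )


google = "https://storage.googleapis.com/high-resolution-rapid-refresh"
url = build_url(google, datetime(2020, 2, 1))
print(url)
```

Not every source necessarily uses this layout, so a real implementation would need a per-source path template.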

---
We can change the download source priority with the `download_source_priority` argument, which takes a list of `base_url` keys in the order we want to try them. Here we download from Amazon Web Services instead.

In [5]:
download_hrrr('2020-02-01', download_source_priority=['aws'])

💡 Info: Downloading [1] GRIB2 files

✅ Success! Downloaded from [aws] https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20200201/conus/hrrr.t00z.wrfsfcf00.grib2 --> /p/home/blaylock/data/hrrr/20200201/20200201_hrrr.t00z.wrfsfcf00.grib2
🚛💨 Download Progress: [1/1 completed] >> Est. Time Remaining 0:00:00         


🍦 Finished 🍦  Time spent on download: 0:00:02.052446


(PosixPath('/p/home/blaylock/data/hrrr/20200201/20200201_hrrr.t00z.wrfsfcf00.grib2'),
 'https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20200201/conus/hrrr.t00z.wrfsfcf00.grib2')

However, if the file doesn't exist at a source and that is the only source we specified, it won't know where else to get the file. For example, native files do not exist on the Pando HRRR archive.

In [6]:
download_hrrr('2020-02-01', field='nat', download_source_priority=['pando'])

💡 Info: Downloading [1] GRIB2 files


❌ Could not download file for [hrrr] [nat]. Tried to get the following:
    1: https://pando-rgw01.chpc.utah.edu/hrrr/nat/20200201/hrrr.t00z.wrfnatf00.grib2

🚛💨 Download Progress: [1/1 completed] >> Est. Time Remaining 0:00:00         


🍦 Finished 🍦  Time spent on download: 0:00:00.541425


(array([], dtype=float64), array([], dtype=float64))

But if we add another source, it will use the next source as a fallback. In this example, the file doesn't exist on Pando, so it was downloaded from AWS.

In [7]:
download_hrrr('2020-02-01', field='nat', download_source_priority=['pando', 'aws'])

💡 Info: Downloading [1] GRIB2 files

✅ Success! Downloaded from [aws] https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20200201/conus/hrrr.t00z.wrfnatf00.grib2 --> /p/home/blaylock/data/hrrr/20200201/20200201_hrrr.t00z.wrfnatf00.grib2
🚛💨 Download Progress: [1/1 completed] >> Est. Time Remaining 0:00:00         


🍦 Finished 🍦  Time spent on download: 0:00:22.985365


(PosixPath('/p/home/blaylock/data/hrrr/20200201/20200201_hrrr.t00z.wrfnatf00.grib2'),
 'https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20200201/conus/hrrr.t00z.wrfnatf00.grib2')
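
The fallback behavior in the last two cells can be pictured as a loop over the priority list that stops at the first source where the file exists. This is a minimal sketch of that idea, not the code in `hrrrb/archive.py`: `first_available` and `fake_exists` are hypothetical names, and the availability check is faked so the example runs without network access. The two URLs are copied from the output above.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


def first_available(
    candidates: Dict[str, str],
    priority: Iterable[str],
    exists: Callable[[str], bool],
) -> Optional[Tuple[str, str]]:
    """Return (source, url) for the first source whose URL exists, else None."""
    for source in priority:
        url = candidates[source]
        if exists(url):
            return source, url
    return None


# Candidate URLs copied from the notebook output above.
candidates = {
    "pando": "https://pando-rgw01.chpc.utah.edu/hrrr/nat/20200201/hrrr.t00z.wrfnatf00.grib2",
    "aws": "https://noaa-hrrr-bdp-pds.s3.amazonaws.com/hrrr.20200201/conus/hrrr.t00z.wrfnatf00.grib2",
}


def fake_exists(url: str) -> bool:
    """Stand-in for a real check (for example an HTTP HEAD request).

    Pretends only the AWS copy of the file exists.
    """
    return "amazonaws" in url


only_pando = first_available(candidates, ["pando"], fake_exists)
pando_then_aws = first_available(candidates, ["pando", "aws"], fake_exists)

print(only_pando)
print(pando_then_aws)
```

With a single source the loop has nowhere to go after a miss, which mirrors the failed download above; adding `'aws'` gives it a second place to look.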