### GEFSv12 Data from the Cloud

This notebook demonstrates how to:
* download a single GEFSv12 re-forecast data file from Amazon Web Services (AWS) without an AWS account (https://registry.opendata.aws/noaa-gefs-reforecast/)
* read it in using `xarray` and plot the data

Note that many other datasets are available via AWS, e.g. https://aws.amazon.com/blogs/big-data/visualize-over-200-years-of-global-climate-data-using-amazon-athena-and-amazon-quicksight/

`boto3` and `botocore` are the Python packages used to access data on AWS.

In [None]:
import boto3

from botocore import UNSIGNED
from botocore.client import Config

import numpy as np
import matplotlib.pyplot as plt
import os.path  # implements some useful functions on pathnames, see: https://docs.python.org/3/library/os.path.html
import xarray as xr
import cartopy.crs as ccrs

#import hvplot.xarray

Data on AWS S3 is stored in containers called `buckets`, so we must specify the bucket we want to look in.

The prefix is the path of subdirectories within that `bucket` that leads to the file.

We got this information from looking through the data on the web interface to find the file we wanted. 

In [None]:
BUCKET = 'noaa-gefs-retrospective'
prefix = 'GEFSv12/reforecast/2000/2000010500/c00/Days:1-10/' 
# year month day hour 
# c00 ensemble member 
# first 10 days on high res grid, rather than out to 35 days on lower res grid

### Define filenames for the file that we want to download

In [None]:
varname = 'pres_msl' # Barometric pressure extrapolated to mean sea level

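# Get my userid from the last component of the home directory path
# (this does not depend on how deeply the home directory is nested)
my_userid = os.path.basename(os.path.expanduser('~'))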

remote_file = f'{prefix}{varname}_2000010500_c00.grib2'
local_fname = os.path.basename(remote_file)
local_file = f'/scratch/{my_userid}/{local_fname}'

print(local_file)
print(remote_file)
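The key layout used above can be wrapped in a small helper so that other initial times or ensemble members can be requested without retyping the path. This is only a sketch: `gefs_key` is a hypothetical name, and the pattern is inferred solely from the one file used in this notebook, so other variables, members, or the longer-range directories may be named differently. Check the bucket listing before relying on it.

```python
from datetime import datetime


def gefs_key(init, member, varname, days="Days:1-10"):
    """Build an S3 key that follows the pattern of the file used in this notebook.

    Hypothetical helper: the layout is an assumption based on a single example key.
    """
    stamp = init.strftime("%Y%m%d%H")  # year month day hour, e.g. 2000010500
    prefix = f"GEFSv12/reforecast/{init:%Y}/{stamp}/{member}/{days}/"
    return f"{prefix}{varname}_{stamp}_{member}.grib2"


key = gefs_key(datetime(2000, 1, 5, 0), "c00", "pres_msl")
print(key)
```

For the inputs shown, the result matches the `remote_file` string built above.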

### Access AWS without having an account 
This will only allow access to open, public datasets.

In [None]:
s3 = boto3.resource(
    's3',
    aws_access_key_id='',
    aws_secret_access_key='',
    config=Config(signature_version=UNSIGNED)
)
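Instead of browsing the web interface, the contents of a prefix can also be listed from Python. The sketch below assumes it is handed a boto3 S3 client and uses that client's `list_objects_v2` paginator; `list_keys` and `max_keys` are hypothetical names introduced here for illustration.

```python
def list_keys(client, bucket, prefix, max_keys=20):
    """Return up to max_keys (key, size_in_bytes) pairs found under prefix.

    `client` is assumed to be a boto3 S3 client (or any object that offers
    the same get_paginator interface).
    """
    found = []
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # Pages with no matching objects have no "Contents" entry
        for obj in page.get("Contents", []):
            found.append((obj["Key"], obj["Size"]))
            if len(found) >= max_keys:
                return found
    return found
```

With the `s3` resource created above, the underlying client is available as `s3.meta.client`, so a call such as `list_keys(s3.meta.client, BUCKET, prefix)` should return the files in the chosen directory.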

### Download the file from AWS

Without an AWS account we can only download the data; we cannot
do any calculations with the data on AWS itself.

In [None]:
s3.Bucket(BUCKET).download_file(remote_file, local_file)
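Re-running the notebook repeats the download every time. A small guard can skip it when the file is already on disk. This is a minimal sketch: `download_if_missing` is a hypothetical helper, and it only checks that the file exists, so an interrupted, partial download would also be treated as complete.

```python
import os


def download_if_missing(bucket, key, local_path):
    """Download key to local_path unless that file already exists.

    `bucket` is assumed to offer download_file(key, path), as the boto3
    Bucket resource does. Returns True if a download was performed.
    """
    if os.path.exists(local_path):
        return False
    # Create the target directory if needed ("." when the path has no directory part)
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    bucket.download_file(key, local_path)
    return True
```

In this notebook the call would be `download_if_missing(s3.Bucket(BUCKET), remote_file, local_file)`.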

### Read in downloaded file

In [None]:
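# Read the GRIB2 file with the cfgrib engine.
# An empty 'indexpath' tells cfgrib not to write an index file next to the data.
ds = xr.open_dataset(local_file, engine='cfgrib',
                     backend_kwargs={'indexpath': ''})
ds['msl'] = ds['msl'] / 100.0  # Convert from Pa to hPa
ds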

### About the time dimensions...

This is a model forecast field. 
There are data saved across a 10-day forecast period at 3-hour intervals (8 times per day), i.e. out to 240 hours from the initial time. 
The initial state is not included, so there are 80 global 2D arrays, not 81.
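The count of 80 can be checked with a few lines of standard-library Python. The sketch assumes uniform 3-hour spacing from hour 3 to hour 240, as described above; the actual values in the file should be confirmed by inspecting `ds['step']`.

```python
from datetime import timedelta

# Forecast lead times every 3 hours, starting at +3 h (the initial state is excluded)
steps = [timedelta(hours=h) for h in range(3, 241, 3)]

print(len(steps))            # 80
print(steps[0], steps[-1])   # first and last lead time
```

Including hour 0 would add one more entry, which is where the figure of 81 comes from.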

It has not one but two separate time coordinates, `time` and `step`. In fact there are three, because `valid_time` is a combination of the other two.
* `time` is the time of the start of the forecast - when the forecast model was initialized. It is a `datetime64` object.
* `step` is the forecast step - the time after the initial time. It is a `timedelta64` object, which behaves differently* than `datetime64`.
* `valid_time` is the time the forecast is valid, i.e. the sum of `time` and `step`. It is a `datetime64` object.
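The relationship between the three coordinates can be reproduced with plain NumPy values. This is an illustrative sketch with made-up arrays that mimic this forecast (initial time 5 January 2000 at 00 UTC, 3-hourly steps); it does not read the GRIB file.

```python
import numpy as np

init_time = np.datetime64("2000-01-05T00")               # plays the role of `time`
steps = np.arange(3, 241, 3) * np.timedelta64(1, "h")    # plays the role of `step`
valid_times = init_time + steps                          # plays the role of `valid_time`

print(steps.dtype, valid_times.dtype)
print(valid_times[0], valid_times[-1])

# Subtracting two datetime64 values gives back a timedelta64
print(valid_times[-1] - init_time)
```

Adding a `timedelta64` to a `datetime64` yields a `datetime64`, while the difference of two `datetime64` values is a `timedelta64`, which is why `step` and `valid_time` have different types.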

So, we have a few options for referring to various times in this forecast. We could:
1. Refer to sequential steps by their corresponding array indices, from 0 to 79.
2. Refer to the valid date and time, using the `valid_time` coordinate.
3. Refer to the forecast time _delta_ from the initial time, using the `step` coordinate.
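The three options can be sketched on a small synthetic array that mimics the coordinates of this dataset. The array `da` and its values are invented for illustration, and the real field also has `latitude` and `longitude` dimensions. Swapping dimensions to select by `valid_time` is one possible approach, not necessarily the only one.

```python
from datetime import timedelta

import numpy as np
import xarray as xr

init = np.datetime64("2000-01-05T00:00", "ns")
steps = (np.arange(3, 241, 3) * np.timedelta64(1, "h")).astype("timedelta64[ns]")

# Synthetic stand-in: the value at each step is simply its array index
da = xr.DataArray(
    np.arange(80.0),
    dims="step",
    coords={"step": steps, "time": init, "valid_time": ("step", init + steps)},
    name="demo",
)

# 1. By array index: the 24-hour forecast is the 8th entry (index 7)
by_index = da.isel(step=7)

# 2. By valid time: make valid_time the indexed dimension, then select
by_valid = da.swap_dims({"step": "valid_time"}).sel(
    valid_time=np.datetime64("2000-01-06T00:00", "ns")
)

# 3. By forecast time delta
by_step = da.sel(step=timedelta(hours=24))

print(by_index.item(), by_valid.item(), by_step.item())
```

All three selections pick the same element of the array.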

All times are in units of $ns$: nanoseconds (billionths of a second). This is the default for timekeeping in `xarray`.

*`timedelta64` has fewer attributes than `datetime64`: we can only refer to `days`, `seconds`, `microseconds` and `nanoseconds` when parsing such a dimension.
The following would all list the seconds part of the coordinate `step`:

`ds.step.dt.seconds`

`ds['step'].dt.seconds`

`ds['step.seconds']`

In [None]:
# Make a list of tuples, each showing the:
#    (index, days, seconds) 
# corresponding to the forecast time step
list(zip(np.arange(len(ds['step'])),ds['step.days'].data,ds['step.seconds'].data))
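Because `days` and `seconds` are separate components, neither one alone gives the forecast lead time. The sketch below uses the standard-library `timedelta`, whose `days` and `seconds` attributes split a duration in the same way, to recombine them into hours; `lead_hours` is a hypothetical helper name.

```python
from datetime import timedelta


def lead_hours(step):
    """Combine the days and seconds components into a lead time in hours."""
    return step.days * 24 + step.seconds / 3600


step = timedelta(hours=27)
print(step.days, step.seconds, lead_hours(step))   # 1 10800 27.0
```

For NumPy `timedelta64` arrays, dividing by `np.timedelta64(1, 'h')` gives the lead time in hours directly.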

### Python `datetime` module

The `datetime` module provides convenient functionality for dealing with these peculiar time constructs.
We can import the `timedelta` and `datetime` classes to give us a more intuitive way to translate times and select forecasts.

In [None]:
from datetime import timedelta, datetime
fcst_24 = ds['msl'].sel(step=timedelta(hours=24)) # Choose the 24-hour forecast
fcst_24

In [None]:
fig = plt.figure(figsize=(12,6))
ax=plt.axes(projection=ccrs.PlateCarree())
cs = ax.contourf(fcst_24.longitude,fcst_24.latitude,fcst_24,transform = ccrs.PlateCarree(),cmap='YlGnBu',extend='both') 
ax.coastlines() 
cbar = plt.colorbar(cs,shrink=0.7,aspect=30,orientation='horizontal',label='Sea Level Pressure (hPa)') 
ic_time = fcst_24.time.dt.strftime('%-d %B %Y at %H%M UTC').item()
valid_time = fcst_24.valid_time.dt.strftime('%-d %B %Y at %H%M UTC').item()
plt.title(f"GEFSv12 Reforecast, IC={ic_time}\nValid: {valid_time}") ;

## NOAA uses AWS

NOAA has a contract with Amazon to make many of its operational and archive products openly available to the public via cloud services. These include _in situ_ observations, satellite data, and model products including forecasts. There is a [registry of data](https://registry.opendata.aws/collab/noaa/) that describes all the products available.