# Accessing NOAA Remote Data Files

#### The description of the available data is at:

- [NOAA Geostationary Operational Environmental Satellites (GOES) 16, 17 & 18](https://registry.opendata.aws/noaa-goes/)
- [NOAA GOES on AWS](https://github.com/awslabs/open-data-docs/tree/main/docs/noaa/noaa-goes16)

#### To explore the data:

- [AWS S3 Explorer](https://noaa-goes18.s3.amazonaws.com/index.html)

In [None]:
import warnings
warnings.filterwarnings('ignore')

In [None]:
import requests
import os

In [None]:
from datetime import datetime, timedelta

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt
import cartopy
from cartopy import crs as ccrs
import cartopy.feature as cfeature
import cartopy.io.shapereader as shapereader
from cartopy.mpl.ticker import LongitudeFormatter, LatitudeFormatter

In [None]:
import numpy as np
import netCDF4 as nc4
import xarray as xr

## <font color="blue">Take a Sample File</font>

### Get the File url

In [None]:
file_name_url = "https://noaa-goes17.s3.amazonaws.com/ABI-L2-TPWF/2023/001/14/OR_ABI-L2-TPWF-M6_G17_s20230011400321_e20230011409388_c20230011411032.nc"

You may want to consult: [NOAA GOES on AWS](https://github.com/awslabs/open-data-docs/tree/main/docs/noaa/noaa-goes16)


The URL of the file:

```
https://noaa-goes17.s3.amazonaws.com/ABI-L2-TPWF/2023/001/14/OR_ABI-L2-TPWF-M6_G17_s20230011400321_e20230011409388_c20230011411032.nc
```
can be broken into:


```
<base_url>/<Product>/<Year>/<Day_of_Year>/<Hour>/<Filename>

```
where:


- `base_url`: https://noaa-goes17.s3.amazonaws.com (each satellite has its own bucket, e.g. `noaa-goes18` for GOES-18)
- `Product`: ABI-L2-TPWF (Advanced Baseline Imager Level 2 Total Precipitable Water Full Disk)
- `Year`: year in the format YYYY
- `Day_of_Year`: day of the year in the format ddd (001-366)
- `Hour`: two-digit hour (00-23, UTC) in which the observation was made
- `Filename`: name of the netCDF-4 file containing the data.
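
The pattern above can be sketched as a small helper that builds the folder URL for a given hour. The function name `goes_hour_prefix` and its arguments are hypothetical (they are not part of any NOAA or AWS library); the sketch only assumes the URL layout described above.

```python
from datetime import datetime

def goes_hour_prefix(bucket, product, when):
    """Return the URL prefix of the folder holding one hour of files."""
    base_url = f"https://{bucket}.s3.amazonaws.com"
    # %j is the zero-padded day of the year (001-366), %H the hour (00-23)
    return f"{base_url}/{product}/{when:%Y}/{when:%j}/{when:%H}/"

sample_url = ("https://noaa-goes17.s3.amazonaws.com/ABI-L2-TPWF/2023/001/14/"
              "OR_ABI-L2-TPWF-M6_G17_s20230011400321_e20230011409388_c20230011411032.nc")

prefix = goes_hour_prefix("noaa-goes17", "ABI-L2-TPWF", datetime(2023, 1, 1, 14))
print(prefix)
print(sample_url.startswith(prefix))
```

The `datetime` passed in is taken to be in UTC, since the folder names follow the observation time in UTC.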

The `<Filename>` is delimited by underscores '_'. For this file it is:
```
OR_ABI-L2-TPWF-M6_G17_s20230011400321_e20230011409388_c20230011411032.nc
```
where:

- `OR`: Operational system real-time data
- `ABI-L2-TPWF-M6`: delimited by hyphens '-':
   - `ABI`: the ABI sensor
   - `L2`: the processing level (L2 here; `L1b` for Level 1b data)
   - `TPW`: Total Precipitable Water
   - `F`: full disk (every 10 minutes in mode 6; every 15 minutes in the older mode 3); `C` is continental U.S. (normally every 5 minutes); `M1` and `M2` are mesoscale regions 1 and 2 (usually every minute each)
   - `M6`: mode 6 (scan operation); `M4` is mode 4 (only full disk scans every five minutes, with no mesoscale or CONUS scans)
- `G17`: is satellite id for GOES-17
- `s20230011400321`: is start of scan time
   - 4 digit year
   - 3 digit day of year
   - 2 digit hour
   - 2 digit minute
   - 2 digit second
   - 1 digit tenth of second
- `e20230011409388`: is end of scan time
- `c20230011411032`: is netCDF4 file creation time
- `.nc`: is netCDF file extension
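
The naming convention above can be checked with a short parser. This is a minimal sketch that assumes a file name with exactly the six underscore-delimited fields shown here; the helper names `parse_goes_timestamp` and `parse_goes_filename` are hypothetical.

```python
import os
from datetime import datetime, timedelta

def parse_goes_timestamp(field):
    """Convert a field such as 's20230011400321' to a datetime."""
    digits = field[1:]  # drop the leading s, e or c
    # 4-digit year, 3-digit day of year, then hour, minute and second
    stamp = datetime.strptime(digits[:13], "%Y%j%H%M%S")
    # the final digit is tenths of a second
    return stamp + timedelta(milliseconds=100 * int(digits[13]))

def parse_goes_filename(filename):
    """Split a GOES ABI file name into its underscore-delimited fields."""
    stem = os.path.splitext(filename)[0]  # remove the .nc extension
    system, product, satellite, start, end, created = stem.split("_")
    return {
        "system": system,
        "product": product,
        "satellite": satellite,
        "start": parse_goes_timestamp(start),
        "end": parse_goes_timestamp(end),
        "created": parse_goes_timestamp(created),
    }

info = parse_goes_filename(
    "OR_ABI-L2-TPWF-M6_G17_s20230011400321_e20230011409388_c20230011411032.nc")
print(info["satellite"], info["start"], info["end"], info["created"])
```

The returned datetimes carry no time zone; they are in UTC, like the timestamps stored inside the file.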


### Read the Remote File

1. Use the `requests` module to download the file into memory.
2. Use the `netCDF4` module to open the bytes from Step 1 as an in-memory dataset.
3. Wrap the Step 2 dataset in an `Xarray` `NetCDF4DataStore`.
4. Use `Xarray` to open the Step 3 store as an `Xarray` DataSet.

Let us write a function that does the above steps.

In [None]:
def read_remote_noaa_file(file_name_url):
    """
    Read a remote NOAA file (in a public S3 Bucket)
    using the above steps.
    
    Parameters
    ----------
    file_name_url : str
         URL (http or https) of the remote file to read
         
    Returns
    -------
    xr_ds : Xarray DataSet
    
    """
    file_name = os.path.basename(file_name_url)
    
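    # Step 1: download the whole file into memory
    resp = requests.get(file_name_url)
    resp.raise_for_status()  # stop early on HTTP errors (e.g. 404)
    
    # Step 2: open the downloaded bytes as an in-memory netCDF4 dataset
    nc4_ds = nc4.Dataset(file_name, memory=resp.content)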
    
    # Step 3:
    store = xr.backends.NetCDF4DataStore(nc4_ds)
    
    # Step 4:
    xr_ds = xr.open_dataset(store)
    
    return xr_ds

In [None]:
xr_ds = read_remote_noaa_file(file_name_url)

In [None]:
xr_ds

Size of the Xarray DataSet:

In [None]:
print(f"{xr_ds.nbytes / 1024**3:.3f} GiB")

### Date and Time Information

- Each file represents the data collected during one scan sequence for the domain. 
- There are several different time stamps in this file, which are also found in the file's name.

Scan's start time, converted to datetime object:

In [None]:
scan_start = datetime.strptime(xr_ds.time_coverage_start, 
                               "%Y-%m-%dT%H:%M:%S.%fZ")

Scan's end time, converted to datetime object:

In [None]:
scan_end = datetime.strptime(xr_ds.time_coverage_end, 
                             "%Y-%m-%dT%H:%M:%S.%fZ")

File creation time, converted to datetime object:

In [None]:
file_created = datetime.strptime(xr_ds.date_created, 
                                 "%Y-%m-%dT%H:%M:%S.%fZ")

The 't' variable is the scan's midpoint time:

In [None]:
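# The datetime64[ns] value prints with 9 fractional digits; dropping the
# last 8 keeps tenths of a second, which %f can parse (it accepts 1-6 digits)
midpoint = str(xr_ds["t"].data)[:-8]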
scan_mid = datetime.strptime(midpoint, "%Y-%m-%dT%H:%M:%S.%f")

In [None]:
print(f"Scan Start:    {scan_start}")
print(f"Scan midpoint: {scan_mid}")
print(f"Scan End:      {scan_end}")
print(f"File Created:  {file_created}")
print(f"Scan Duration: {(scan_end - scan_start).total_seconds() / 60:.2f} minutes")

### Do Basic Plots

In [None]:
xr_ds.TPW.plot();

## <font color="blue">Compute the Latitude/Longitude Grid Points</font>

The following document:

[GOES-R Satellite Latitude and Longitude Grid Projection Algorithm](https://makersportal.com/blog/2018/11/25/goes-r-satellite-latitude-and-longitude-grid-projection-algorithm)

explains how to compute the latitude/longitude grid points from the parameters stored in the variable `goes_imager_projection`. The function below applies the projection formulas to those parameters and returns an Xarray DataSet with `lat` and `lon` as Xarray coordinates. Grid points that fall off the Earth's disk have no solution and end up as NaN.

In [None]:
def compute_latlon_grid_points(ds):
    """
    Calculate the latitude and longitude grid points
    and add them as Xarray coordinates.
    
    Parameters
    ----------
    ds : Xarray DataSet
    
    Returns
    -------
    ds : Xarray DataSet
         Contains lat and lon as coordinates.
    """
    x = ds.x
    y = ds.y
    goes_imager_projection = ds.goes_imager_projection
    
    x,y = np.meshgrid(x,y)
    
    r_eq = goes_imager_projection.attrs["semi_major_axis"]
    r_pol = goes_imager_projection.attrs["semi_minor_axis"]
    l_0 = goes_imager_projection.attrs["longitude_of_projection_origin"] * (np.pi/180)
    h_sat = goes_imager_projection.attrs["perspective_point_height"]
    H = r_eq + h_sat
    
    a = np.sin(x)**2 + (np.cos(x)**2 * (np.cos(y)**2 + (r_eq**2 / r_pol**2) * np.sin(y)**2))
    b = -2 * H * np.cos(x) * np.cos(y)
    c = H**2 - r_eq**2
    
    r_s = (-b - np.sqrt(b**2 - 4*a*c))/(2*a)
    
    s_x = r_s * np.cos(x) * np.cos(y)
    s_y = -r_s * np.sin(x)
    s_z = r_s * np.cos(x) * np.sin(y)
    
    lat = np.arctan((r_eq**2 / r_pol**2) * (s_z / np.sqrt((H-s_x)**2 +s_y**2))) * (180/np.pi)
    lon = (l_0 - np.arctan(s_y / (H-s_x))) * (180/np.pi)
    
    ds = ds.assign_coords({
        "lat":(["y","x"],lat),
        "lon":(["y","x"],lon)
    })
    ds.lat.attrs["units"] = "degrees_north"
    ds.lon.attrs["units"] = "degrees_east"
    
    return ds

In [None]:
xr_ds = compute_latlon_grid_points(xr_ds)
xr_ds
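
As a sanity check on the formulas used in `compute_latlon_grid_points`, the same computation can be written for a single pair of scan angles with the standard `math` module. This is only a sketch: the function name `scan_angle_to_latlon` is hypothetical, and the four constants are assumed example values for a GOES-West satellite; in practice they should be read from the attributes of `goes_imager_projection`.

```python
import math

# Assumed example values; read the real ones from the attributes of
# the file's goes_imager_projection variable.
R_EQ = 6378137.0       # semi_major_axis (m)
R_POL = 6356752.31414  # semi_minor_axis (m)
H_SAT = 35786023.0     # perspective_point_height (m)
LON_0 = -137.0         # longitude_of_projection_origin (degrees east)

def scan_angle_to_latlon(x, y):
    """Return (lat, lon) in degrees for scan angles x, y in radians,
    or None when the line of sight misses the Earth."""
    H = R_EQ + H_SAT
    a = math.sin(x)**2 + math.cos(x)**2 * (
        math.cos(y)**2 + (R_EQ**2 / R_POL**2) * math.sin(y)**2)
    b = -2 * H * math.cos(x) * math.cos(y)
    c = H**2 - R_EQ**2
    disc = b**2 - 4 * a * c
    if disc < 0:
        return None  # no intersection: the point is off the Earth's disk
    r_s = (-b - math.sqrt(disc)) / (2 * a)
    s_x = r_s * math.cos(x) * math.cos(y)
    s_y = -r_s * math.sin(x)
    s_z = r_s * math.cos(x) * math.sin(y)
    lat = math.atan((R_EQ**2 / R_POL**2) * s_z / math.hypot(H - s_x, s_y))
    lon = math.radians(LON_0) - math.atan(s_y / (H - s_x))
    return math.degrees(lat), math.degrees(lon)

print(scan_angle_to_latlon(0.0, 0.0))    # sub-satellite point
print(scan_angle_to_latlon(0.05, 0.05))  # north-east of the sub-satellite point
print(scan_angle_to_latlon(0.2, 0.0))    # beyond the limb of the Earth
```

At scan angles (0, 0) the result should be the sub-satellite point, that is latitude 0 and longitude `LON_0` up to rounding. The vectorized function does not test the discriminant explicitly; there, `np.sqrt` of a negative value produces NaN for off-disk points.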

In [None]:
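def get_xy_from_latlon(ds, lats, lons):
    """
    Find the x and y ranges (in the dataset's native coordinates)
    that enclose a latitude/longitude box.

    Parameters
    ----------
    ds : Xarray DataSet
         Must contain the 2D lat and lon coordinates.
    lats : tuple
         (min latitude, max latitude) in degrees north
    lons : tuple
         (min longitude, max longitude) in degrees east

    Returns
    -------
    ((x_min, x_max), (y_min, y_max)) : tuple of tuples
    """
    lat1, lat2 = lats
    lon1, lon2 = lons

    lat = ds.lat.data
    lon = ds.lon.data

    x, y = np.meshgrid(ds.x.data, ds.y.data)

    # Grid points inside the box (NaN off-disk points compare as False)
    inside = (lat >= lat1) & (lat <= lat2) & (lon >= lon1) & (lon <= lon2)

    return ((x[inside].min(), x[inside].max()),
            (y[inside].min(), y[inside].max()))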

## <font color="blue">Perform a Contour Plot</font>

#### Define the native geostationary map projection

In [None]:
projection_variables = xr_ds['goes_imager_projection']
satellite_height = projection_variables.perspective_point_height
semi_major_axis = projection_variables.semi_major_axis
semi_minor_axis = projection_variables.semi_minor_axis
central_longitude = projection_variables.longitude_of_projection_origin

In [None]:
globe = ccrs.Globe(semimajor_axis=semi_major_axis, 
                   semiminor_axis=semi_minor_axis)
geo_projection = ccrs.Geostationary(central_longitude=central_longitude, 
                                    satellite_height=satellite_height,
                                    globe=globe, sweep_axis='x')

#### Do the plot

In [None]:
map_projection = geo_projection
data_transform = ccrs.PlateCarree()

In [None]:
contour_interval = 5
contours_array = np.arange(int(np.nanmin(xr_ds.TPW.values)),
                             int(np.nanmax(xr_ds.TPW.values))+1.0, 
                             contour_interval)

In [None]:
fig = plt.figure(figsize=(8, 10))
ax = plt.axes(projection=map_projection)

cmap = plt.get_cmap('viridis').with_extremes(over='darkred')

map_extend_geos = ax.get_extent(crs=map_projection)

im = ax.contourf(xr_ds.lon.values, xr_ds.lat.values,
                 xr_ds.TPW.values, 
                 contours_array, 
                 extent=map_extend_geos,
                 cmap=cmap,
                 zorder=2,
                 transform_first=True, 
                 transform=data_transform) 
                
# Map features
ax.add_feature(cfeature.COASTLINE, linewidth=0.95, zorder=3)
ax.add_feature(cfeature.BORDERS, linewidth=0.5, zorder=3)
ax.add_feature(cfeature.LAKES, facecolor='lightgrey')
ax.add_feature(cfeature.STATES, linewidth=0.25, zorder=3)
ax.add_feature(cfeature.LAND, facecolor='grey')
ax.add_feature(cfeature.OCEAN, facecolor='lightgrey')

# Colorbar
units = xr_ds.TPW.attrs['units']
long_name = xr_ds.TPW.attrs['long_name']
cbar = fig.colorbar(im, ax=ax,  orientation="vertical", shrink=0.65)
cbar.ax.tick_params(labelsize=15)
cbar.set_label(f"{long_name} \n {units}", labelpad=+1)

## <font color="blue"> Exercise</font>

Go to the webpage:

[https://noaa-goes18.s3.amazonaws.com/index.html](https://noaa-goes18.s3.amazonaws.com/index.html)

- Select a file from any of the collections.
- Open the file and read its content into an Xarray DataSet.
- Select a field and do a contour plot using Cartopy.
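
As a hint for the first step, the files in a folder can also be listed programmatically. The sketch below assumes the bucket answers the standard S3 `list-type=2` listing request with an XML document in the usual S3 namespace; the helper names `listing_url` and `keys_from_listing` are hypothetical, and the sample response is hand-written for illustration rather than real data.

```python
import xml.etree.ElementTree as ET

# XML namespace used by S3 listing responses
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

def listing_url(bucket, prefix):
    """URL that asks S3 to list the objects under a prefix."""
    return f"https://{bucket}.s3.amazonaws.com/?list-type=2&prefix={prefix}"

def keys_from_listing(xml_text):
    """Extract the object keys from an S3 listing (XML) response."""
    root = ET.fromstring(xml_text)
    return [key.text for key in root.findall("s3:Contents/s3:Key", S3_NS)]

# Tiny hand-written response, used here instead of a network call
sample = """<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Contents><Key>ABI-L2-TPWF/2023/001/14/example_a.nc</Key></Contents>
  <Contents><Key>ABI-L2-TPWF/2023/001/14/example_b.nc</Key></Contents>
</ListBucketResult>"""

print(listing_url("noaa-goes18", "ABI-L2-TPWF/2023/001/14/"))
print(keys_from_listing(sample))
```

With the `requests` module imported earlier, passing `requests.get(listing_url(...)).text` to `keys_from_listing` would return the keys of a real folder. Each key appended to the bucket's base URL gives a file URL that `read_remote_noaa_file` can read. A single listing response holds at most 1000 keys.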