# Subsetting STOFS-2D-Global Files
Here we subset and visualize forecast data from STOFS-2D-Global.

To begin, activate the conda environment: `source /nhc/Atieh.Alipour/environment/miniconda3/bin/activate env_subsetting`

## STOFS-2D-Global Operational Output
The operational output is read from the S3 bucket `noaa-gestofs-pds`: https://noaa-gestofs-pds.s3.amazonaws.com/index.html

## Input information

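- **Name:** The folder name prefix of the data you want to subset (here `stofs_2d_glo`); the same prefix starts each file name.
- **Dates:** The dates of the model output you are interested in, as `YYYYMMDD` strings.
- **Cycles:** The cycles of the model output you are interested in, as two-digit hours (`00`, `06`, `12`, `18`).
- **Regions:** The regions you want to subset the data for, each given as a bounding box `(longitude_min, longitude_max, latitude_min, latitude_max)`.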

The current regions include the following:

1) Marianas (around 200,000 people) - 13.1 to 15.5 N, 144.5 to 145.9 E, includes Guam, Rota, Tinian and Saipan. Approximate size is 164 miles NS (264 km) and 93 miles EW (150 km). Total area would be 39,600 square km, and would require 633,600 grid points to reach 250 meter resolution. This could replace the current Guam domain, if that would help. We would place it over the northpacific domain to make a unified whole picture and fill our GFE domain.

2) Chuuk Lagoon (around 43,700 people) - 6.8 to 7.8 N, 151.3 to 152.2 E. Approximate size is 68 miles NS (109 km) and 61 miles EW (98 km). Total area would be 10,682 square km, and would require 170,912 grid points to reach 250 meter resolution.

3) Majuro (incl. Arno, around 32,900 people) - 6.7 to 7.5 N, 170.8 to 172.1 E. Approximate size is 55 miles NS (89 km) and 89 miles EW (143 km). Total area would be 12,727 square km, and would require 203,632 grid points to reach 250 meter resolution.

4) Pohnpei (incl. Pakin and Ant, around 31,500 people) - 6.5 to 7.3 N, 157.6 to 158.6 E. Approximate size is 55 miles NS (89 km) and 68 miles EW (109 km). Total area would be 9,701 square km, and would require 155,216 grid points to reach 250 meter resolution.

5) Palau (incl. Angaur to Kayangel, around 18,000 people) - 6.7 to 8.2 N, 134.0 to 134.9 E. Approximate size is 102 miles NS (164 km) and 62 miles EW (100 km). Total area would be 16,400 square km, and would require 262,400 grid points to reach 250 meter resolution.

6) Kwajalein Atoll (around 13,700 people) - 8.5 to 9.6 N, 166.6 to 167.9 E. Approximate size is 75 miles NS (121 km) and 89 miles EW (143 km). Total area would be 17,303 square km, and would require 276,848 grid points to reach 250 meter resolution.

7) Kosrae (around 7,300 people) - 5.0 to 5.7 N, 162.7 to 163.3 E. Approximate size is 48 miles NS (77 km) and 41 miles EW (66 km). Total area would be 5,082 square km, and would require 81,312 grid points to reach 250 meter resolution.

 

8) Yap (around 6,900 people) - 9.3 to 9.8 N, 137.9 to 138.4 E. Approximate size is 34 miles NS (55 km) and 34 miles EW (55 km). Total area would be 3,025 square km, and would require 48,400 grid points to reach 250 meter resolution.

The domains below are somewhat negotiable.

9) Satawan (incl. Lukunoch and Etal, about 3,900 people) - 5.2 to 5.7 N, 153.3 to 153.9 E. Approximate size is 34 miles NS (55 km) and 41 miles EW (66 km). Total area would be 3,630 square km, and would require 58,080 grid points to reach 250 meter resolution.

10) Ailinglaplap (about 1,700 people) - 7.1 to 7.8 N, 168.4 to 169.1 E. Approximate size is 48 miles NS (77 km) and 48 miles EW (77 km). Total area would be 5,929 square km, and would require 94,864 grid points to reach 250 meter resolution. If we extended the domain to 8.2 N and 167.9 E, it would include Namu Atoll, adding about 800 people. This would increase the approximate size to 75 miles NS (121 km) and 82 miles EW (132 km). Total area would then be 15,972 square km, and would require 255,552 grid points to reach 250 meter resolution.

11) Jaluit (about 1,700 people) - 5.6 to 6.5 N, 169.2 to 169.9 E. Approximate size is 61 miles NS (98 km) and 48 miles EW (77 km). Total area would be 7,546 square km, and would require 120,736 grid points to reach 250 meter resolution. If we extend this domain to 5.5 N and 168.0 E, we can add Namorik Atoll (around 800 people), which would make the size 68 miles NS (109 km) and 131 miles EW (211 km). This would increase the total area to 22,999 square km, which would require 367,984 grid points to reach 250 meter resolution.

12) Ulithi (about 1,000 people) - 9.7 to 10.2 N, 139.4 to 140.0 E. Approximate size is 34 miles NS (55 km) and 41 miles EW (66 km). Total area would be 3,630 square km, and would require 58,080 grid points to reach 250 meter resolution. If we extend this domain to 140.6 E, it would be 82 miles EW (132 km). This would increase the total area to 7,260 square km, which would require 116,160 grid points to reach 250 meter resolution.

13) Woleai (about 300 people) - 7.2 to 7.5 N, 143.7 to 144.0 E. Approximate size is 20 miles NS (33 km) and 20 miles EW (33 km). Total area would be 1,089 square km, and would require 17,424 grid points to reach 250 meter resolution.
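
The area and grid-point figures in the list above follow from simple arithmetic: the area is the NS extent times the EW extent, and a 250 meter grid has 4 points per km in each direction, so 16 points per square km. A minimal sketch that reproduces the Marianas numbers (the function name `grid_points` is hypothetical and not part of this notebook):

```python
def grid_points(ns_km, ew_km, resolution_m=250):
    """Return (area in square km, grid points needed at the given resolution)."""
    area_km2 = ns_km * ew_km
    points_per_km = 1000 / resolution_m  # 4 points per km at 250 m
    return area_km2, round(area_km2 * points_per_km ** 2)

area, points = grid_points(264, 150)  # Marianas: 264 km NS by 150 km EW
print(area, points)  # 39600 633600
```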


In [26]:
# STOFS data to subset
name          = 'stofs_2d_glo' 
bucket_name   = 'noaa-gestofs-pds'
dates         = ['20240516', '20240517', '20240518']  # Dates you want to subset data for.
cycles        = ['00', '06', '12', '18']  # Cycles you want to subset data for.
Regions       = [(144.5, 145.9, 13.1, 15.5), (151.3, 152.2, 6.8, 7.8), (170.8, 172.1, 6.7, 7.5),
                 (157.6, 158.6, 6.5, 7.3), (134.0, 134.9, 6.7, 8.2), (166.6, 167.9, 8.5, 9.6),
                 (162.7, 163.3, 5.0, 5.7), (137.9, 138.4, 9.3, 9.8), (153.3, 153.9, 5.2, 5.7),
                 (168.4, 169.1, 7.1, 7.8), (169.2, 169.9, 5.6, 6.5), (139.4, 140.0, 9.7, 10.2),
                 (143.7, 144.0, 7.2, 7.5)] #(longitude_min, longitude_max, latitude_min, latitude_max)
Regions_names = ['Marianas', 'Chuuk Lagoon', 'Majuro', 'Pohnpei', 'Palau', 'Kwajalein Atoll',
                 'Kosrae', 'Yap', 'Satawan', 'Ailinglaplap', 'Jaluit', 'Ulithi', 'Woleai']
STOFS_files   = ['fields.cwl', 'fields.htp', 'fields.swl']
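
Each combination of date, cycle, and file type in the settings above corresponds to one object in the bucket. The sketch below lists those object keys using the same naming pattern as the download loop in section 1. The helper name `build_keys` is hypothetical, and the sketch does not check whether a given key actually exists in the bucket.

```python
from itertools import product

def build_keys(name, dates, cycles, stofs_files):
    """List S3 object keys of the form {name}.{date}/{name}.t{cycle}z.{file}.nc."""
    return [
        f'{name}.{date}/{name}.t{cycle}z.{stofs_file}.nc'
        for date, cycle, stofs_file in product(dates, cycles, stofs_files)
    ]

keys = build_keys('stofs_2d_glo', ['20240516'], ['00', '06'], ['fields.cwl', 'fields.htp'])
print(len(keys))  # 4
print(keys[0])    # stofs_2d_glo.20240516/stofs_2d_glo.t00z.fields.cwl.nc
```

With the full settings (3 dates, 4 cycles, 3 file types) this yields 36 keys, each of which is then subset for 13 regions.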

In [32]:
import dask
import geoviews as gv
import holoviews as hv
import numcodecs
import numpy as np
import pandas as pd
import shapely  # Importing shapely for geometric operations
import xarray as xr
import matplotlib.pyplot as plt
import s3fs  # Importing the s3fs library for accessing S3 buckets
import time  # Importing the time library for recording execution time
import thalassa  # Importing thalassa library for STOFS data analysis
from thalassa import api  # Importing thalassa API for data handling
from thalassa import normalization
from thalassa import utils
from holoviews import opts as hvopts
from holoviews import streams
from holoviews.streams import PointerXY
from holoviews.streams import Tap
import bokeh.plotting as bp
import panel as pn
from os.path import exists
import os


In [18]:
def read_netcdf_from_s3(bucket_name, key):
    """
    Function to read a NetCDF file from an S3 bucket using s3fs and xarray.
    
    Parameters:
    - bucket_name: Name of the S3 bucket
    - key: Key/path to the NetCDF file in the bucket
    
    Returns:
    - ds: xarray Dataset containing the NetCDF data
    """
    s3 = s3fs.S3FileSystem(anon=True)
    url = f"s3://{bucket_name}/{key}"
    ds = xr.open_dataset(s3.open(url, 'rb'), drop_variables=['nvel'])
    return ds


In [21]:
def normalize_data(ds, name, cycle, bucket_name, base_key, field_cwl, filename, date):
    """
    Function to modify/normalize a dataset using the Thalassa package.

    Parameters:
    - ds: xarray Dataset containing the data
    - name: folder name
    - cycle: Model cycle as a two-digit string, e.g. '00'
    - bucket_name: Name of the S3 bucket
    - base_key: Base key for the dataset in the S3 bucket
    - field_cwl: Name of the file that carries the mesh connectivity, e.g. 'fields.cwl'
    - filename: Name of the file that ds was read from (kept for compatibility; not used)
    - date: Date string (kept for compatibility; not used)

    Returns:
    - normalized_ds: Thalassa dataset ready for cropping or visualizing
    """

    if 'element' not in ds:
        # ds has no mesh connectivity, so borrow it from the file that does
        ds_with_element_key = f'{base_key}/{name}.t{cycle}z.{field_cwl}.nc'
        ds_with_element = read_netcdf_from_s3(bucket_name, ds_with_element_key)  # Read NetCDF data from S3 bucket

        # Copy the element information into ds
        ds['nele'] = ds_with_element['nele']
        ds['nvertex'] = ds_with_element['nvertex']
        ds['element'] = ds_with_element['element']

    # Normalize data
    normalized_ds = thalassa.normalize(ds)

    return normalized_ds

In [22]:
def subset_thalassa(ds, box):
    """
    Function to subset a thalassa Dataset based on a bounding box using shapely.
    
    Parameters:
    - ds: thalassa Dataset containing the data
    - box: Tuple representing the bounding box (x_min, x_max, y_min, y_max)
    
    Returns:
    - new_ds: Subset of the input dataset within the specified bounding box
    """
    bbox = shapely.box(box[0], box[2], box[1], box[3])  # Create a shapely box from the bounding box coordinates
    new_ds = thalassa.crop(ds, bbox)  # Crop the dataset using the bounding box
    return new_ds
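
`shapely.box` takes its arguments as (xmin, ymin, xmax, ymax), while the `Regions` tuples are ordered (longitude_min, longitude_max, latitude_min, latitude_max). That is why `subset_thalassa` passes indices 0, 2, 1, 3. A small pure-Python sketch of that reordering, with a sanity check on the bounds (the helper `to_box_args` is hypothetical and not part of this notebook):

```python
def to_box_args(region):
    """Reorder (lon_min, lon_max, lat_min, lat_max) to (xmin, ymin, xmax, ymax)."""
    lon_min, lon_max, lat_min, lat_max = region
    if lon_min >= lon_max or lat_min >= lat_max:
        raise ValueError(f'Empty or inverted bounding box: {region}')
    return lon_min, lat_min, lon_max, lat_max

print(to_box_args((144.5, 145.9, 13.1, 15.5)))  # (144.5, 13.1, 145.9, 15.5)
```

The result can be unpacked directly, as in `shapely.box(*to_box_args(box))`.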

In [30]:
def save_subset_to_netcdf(xarray_ds, output_file):
    """
    Function to save a subset of an xarray Dataset to a NetCDF file.
    
    Parameters:
    - xarray_ds: Subset of the xarray Dataset
    - output_file: Path to save the output NetCDF file
    """
    xarray_ds.to_netcdf(output_file)  # Save the subset to a NetCDF file


## 1. Read and Subset Data on the Fly
The following code reads the data, normalizes it, and subsets it for each region. If the subset files already exist, they can instead be read from the local machine.
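
The loop in this section does not itself check for existing output. One way to reuse subsets that are already on disk is to test for the output file first. The sketch below assumes the same `./{region}/{date}/{filename}` layout that the loop writes; the helper `missing_outputs` is hypothetical and not part of this notebook.

```python
import os

def missing_outputs(region_names, date, filename, root='.'):
    """Return the regions whose subset file for this date is not yet on disk."""
    return [
        region for region in region_names
        if not os.path.exists(os.path.join(root, region, date, filename))
    ]

todo = missing_outputs(['Marianas', 'Yap'], '20240516', 't00z.fields.cwl.nc')
print(todo)  # regions that still need to be subset
```

If the returned list is empty, the S3 read for that file can be skipped and the saved subsets opened with `xr.open_dataset` instead.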


In [None]:

for date in dates:
    for cycle in cycles:
        base_key = f'{name}.{date}'

        for STOFS_file in STOFS_files:
            filename = f't{cycle}z.{STOFS_file}.nc'
            key = f'{base_key}/{name}.{filename}'
            dataset = read_netcdf_from_s3(bucket_name, key)  # Read NetCDF data from S3 bucket
            normalized_dataset = normalize_data(dataset, name, cycle, bucket_name, base_key, 'fields.cwl', filename, date)

            # Pair each bounding box with its region name
            for region_name, box in zip(Regions_names, Regions):
                ds_subset = subset_thalassa(normalized_dataset, box)  # Subset the thalassa dataset

                # Define the output directory
                output_dir = f'./{region_name}/{date}'
                os.makedirs(output_dir, exist_ok=True)  # Create the output directory if it doesn't exist

                # Save the subset to a NetCDF file
                output_file = f'{output_dir}/{filename}'
                save_subset_to_netcdf(ds_subset, output_file)
                print(output_file)


./Marianas/20240516/t00z.fields.cwl.nc
./Chuuk Lagoon/20240516/t00z.fields.cwl.nc
./Majuro/20240516/t00z.fields.cwl.nc
./Pohnpei/20240516/t00z.fields.cwl.nc
./Palau/20240516/t00z.fields.cwl.nc
./Kwajalein Atoll/20240516/t00z.fields.cwl.nc
./Kosrae/20240516/t00z.fields.cwl.nc
./Yap/20240516/t00z.fields.cwl.nc
./Satawan/20240516/t00z.fields.cwl.nc
./Ailinglaplap/20240516/t00z.fields.cwl.nc
./Jaluit/20240516/t00z.fields.cwl.nc
./Ulithi/20240516/t00z.fields.cwl.nc
./Woleai/20240516/t00z.fields.cwl.nc
./Marianas/20240516/t00z.fields.htp.nc
./Chuuk Lagoon/20240516/t00z.fields.htp.nc
./Majuro/20240516/t00z.fields.htp.nc
./Pohnpei/20240516/t00z.fields.htp.nc
./Palau/20240516/t00z.fields.htp.nc
./Kwajalein Atoll/20240516/t00z.fields.htp.nc
./Kosrae/20240516/t00z.fields.htp.nc
./Yap/20240516/t00z.fields.htp.nc
./Satawan/20240516/t00z.fields.htp.nc
./Ailinglaplap/20240516/t00z.fields.htp.nc
./Jaluit/20240516/t00z.fields.htp.nc
./Ulithi/20240516/t00z.fields.htp.nc
./Woleai/20240516/t00z.fields.htp