# UFS Weather Model: Repository Tag Versions to Datasets

__Introduction:__

Each code release version of NOAA's Unified Forecast System (UFS) weather model repository points to the datasets that the release uses. These datasets include the input data required by the UFS weather model and the baseline data required for running the regression tests of various UFS applications. The dataset names are timestamped and hard-coded into each code release version (i.e., each GitHub tag). This script gives users and developers of the UFS weather model repository and its associated repositories a more user-friendly way to determine which input and baseline datasets to use with a given code release version.


__Purpose:__

The purpose of this script is to extract the names of the timestamped input and baseline datasets (e.g. __input_data_YYYYMMDD__) used by a given code release version of the UFS weather model repository. Without it, users and developers must search each release version's code for the hard-coded dataset names. Extracting those names and mapping them to code release versions makes it easier to determine which dataset a given release uses.
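As a rough illustration of the extraction step, the sketch below pulls timestamped dataset names out of a single line of text with a regular expression. The pattern and the helper name `find_timestamped_names` are assumptions made for this example (based on the dataset names that appear in this notebook's outputs); the class defined later uses string matching and splitting instead.

```python
import re

# Assumed naming convention: a name made of letters, digits, '_' or '-',
# ending in a separator followed by an 8-digit date (YYYYMMDD).
DATASET_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_-]*[-_](\d{8})(?!\d)")


def find_timestamped_names(line):
    """Return (dataset_name, date) pairs for each timestamped name in a line."""
    return [(m.group(0), m.group(1)) for m in DATASET_NAME.finditer(line)]


sample = "INPUTDATA_ROOT=${INPUTDATA_ROOT:-$DISKNM/NEMSfv3gfs/input-data-20211210}"
print(find_timestamped_names(sample))
# [('input-data-20211210', '20211210')]
```

Variable names such as `INPUTDATA_ROOT` are skipped because they do not end in an 8-digit date.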

__Capabilities:__

This script will be able to perform the following actions:

- As release versions of the UFS weather model repository are published on GitHub, this script can extract each release version's name and read, directly from GitHub, the scripts that point to the corresponding timestamped input and baseline datasets.

- Provide information about a given GitHub repository, such as:
    - Repository's name
    - Repository's description
    - Repository's created date
    - Repository's number of tags
    - Repository's list of release versions/tags
    - Repository's number of forks
    - Repository's programming language
    - Repository's number of stars

- Provide a map/dictionary/table of the most recent UFS code release versions/tags to their corresponding timestamped input and baseline datasets.
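The first capability (reading a file as it existed at a given tag) can also be sketched without a client library. The example below only builds the GitHub REST API URLs for listing tags and for reading a file at a tag. The helper names are hypothetical, and `fetch_raw_file` is defined but not called here because it needs network access; the class in this notebook uses PyGithub instead.

```python
import urllib.parse
import urllib.request

API_ROOT = "https://api.github.com"


def tags_url(owner, repo, per_page=100):
    """URL of the REST endpoint that lists a repository's tags."""
    return f"{API_ROOT}/repos/{owner}/{repo}/tags?per_page={per_page}"


def contents_url(owner, repo, path, ref):
    """URL of the REST endpoint for a file's contents at a tag, branch, or commit."""
    query = urllib.parse.urlencode({"ref": ref})  # encodes '/' in tags such as release/P8a
    return f"{API_ROOT}/repos/{owner}/{repo}/contents/{urllib.parse.quote(path)}?{query}"


def fetch_raw_file(owner, repo, path, ref):
    """Download the file as text (requires network access; not called in this sketch)."""
    request = urllib.request.Request(
        contents_url(owner, repo, path, ref),
        headers={"Accept": "application/vnd.github.raw+json"},  # ask for raw file contents
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")


print(contents_url("ufs-community", "ufs-weather-model", "tests/rt.sh", "release/P8a"))
```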
        

__Future Capabilities:__

- With additional features, this script could also map the release versions of the Short-Range Weather (SRW) and Medium-Range Weather (MRW) model repositories to their respective timestamped datasets. 

__Prerequisites:__

- This script utilizes the GitHub API. The GitHub API caps the number of requests a user can make to its servers within a fixed time window (on the order of an hour), and the cap is lower for unauthenticated requests. It is therefore advised that users of this script log in with their GitHub credentials when using the GitHub API to maximize the number of requests they can make.
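One way to see how many requests remain is to query the API's `rate_limit` endpoint. The sketch below is a minimal, standard-library example under stated assumptions: the token would come from wherever the user stores it (an environment variable named `GITHUB_TOKEN` is assumed in the usage note), the helper names are hypothetical, and `fetch_core_rate_limit` is defined but not called because it needs network access.

```python
import json
import time
import urllib.request


def auth_headers(token=None):
    """Build request headers; a token, if given, authenticates the request."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def seconds_until_reset(core, now=None):
    """Seconds until the request quota resets, from the 'core' rate-limit entry."""
    now = time.time() if now is None else now
    return max(0, int(core["reset"] - now))


def fetch_core_rate_limit(token=None):
    """Return the 'core' rate-limit entry (limit, remaining, reset). Needs network."""
    request = urllib.request.Request(
        "https://api.github.com/rate_limit", headers=auth_headers(token)
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)["resources"]["core"]
```

A caller could, for example, run `fetch_core_rate_limit(os.environ.get("GITHUB_TOKEN"))` before looping over many tags and pause for `seconds_until_reset(...)` seconds when `remaining` reaches zero.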


__Version:__
- Draft as of 02/23/22

__Reference(s):__

- N/A

In [1]:
from github import Github
import requests
import time
import os
import pandas as pd 
import pickle
import numpy as np
import re

class get_tag2data():
    """
        Scrapes & maps UFS weather model release versions on Github to their corresponding required datasets.
    
    """
    def __init__(self, github_user = "ufs-community"):
        
        # Name of the Github user/organization that owns the repositories of interest.
        self.github_user = github_user
        
        # Github user object returned by the Github API.
        self.gh_user = self.get_user_login()
        
        # List of Github user's repositories of interest.
        self.user_repos = self.get_user_repos()
        self.ufs_model_repo = self.user_repos['ufs-community/ufs-weather-model']
        self.srw_repo = self.user_repos['ufs-community/ufs-srweather-app']
        self.mrw_repo = self.user_repos['ufs-community/ufs-mrweather-app']

    def get_user_login(self):
        """
        Instantiate github user login.
        
        Args: 
            None
        
        Return (object): Instantiated Github user's login object.
        
        """
        
        # If using Github login credentials (a personal access token, since the
        # Github API no longer accepts account passwords).
        #token=str(<Your Github Personal Access Token>)
        #gh = Github(token)
        
        # If not using Github login credentials.
        gh = Github()
        gh_user = gh.get_user(self.github_user)
        
        return gh_user 
    
    def get_user_repos(self):
        """
        Extract repositories of Github user sourcing the repositories of interest.
        
        Args: 
            None
            
        Return (dict): Dictionary of Github user's repositories.
        
        """
        
        # Map repository name to repository location.
        user_repos = {}
        for repo in self.gh_user.get_repos():
            user_repos[repo.full_name] = repo
            
        return user_repos
        
    def get_repo_details(self, repo):
        """
        Extracts a repository's details.
        
        Args:
            repo (object): Github repository object.
            
        Return (dict): Details of github repository.
        
        """
        repo_details = {}
        
        # Repository name & details.
        repo_details['Name'] = repo.full_name
        repo_details['Description'] = repo.description 
        
        # Repository release version/tag name
        tags_list = []
        tag_names = []
        for tag in repo.get_tags():
            tags_list.append(tag)
            tag_names.append(tag.name)
        repo_details['Num_Tags'] = len(tags_list)
        repo_details['Tag_Names'] = tag_names
        #repo_details['Tags'] = tags_list
        
        # Repository created date
        repo_details['Date_Created'] = repo.created_at
        
        # Repository last git push date.
        repo_details['Date_Last_Push'] = repo.pushed_at
        
        # Repository webpage (if present).
        repo_details['Webpage'] = repo.homepage
        
        # Repository programming language (stored under its own key so it does not overwrite the description).
        repo_details['Language'] = repo.language
        
        # Repository total forks.
        repo_details['Num_forks'] = repo.forks
        
        # Repository total stars.
        repo_details['Num_stars'] = repo.stargazers_count     
        
        return repo_details
    
    def get_file_content(self, repo, file_dir, tag_name):
        """
        Extracts details of a given repository release version's script or file of interest.
        
        Args:
            repo (object): Github repository object.
            file_dir (str): File directory (relative to repo) of the file of interest.
            tag_name (str): Tag name of interest.
            
        Return (list): Lines of text of a given repository release version's script or file of interest.
        
        This method is called by the "get_tag2data(repo, repo_details)" method.
        
        """
        contents = repo.get_contents(file_dir, ref = tag_name)
        byte_list = contents.decoded_content.split(b'\n')
        str_txt = [byte_txt.decode("utf-8") for byte_txt in byte_list]
        
        return str_txt
    
    def get_tag2data(self, repo, repo_details):
        """
        Extracts repository release version's dataset for the UFS weather model repositories.
        
        **TODO: Create for SRW and MRW. Currently, dataset version extraction exists only for RT.**
        
        Args:
            repo (object): Github repository object.
            repo_details (dict): Details of github repository.
            
        Return (dict, dict): Dictionary mapping each tag to the text of the repository's script
        that names the datasets used. For example, for the UFS weather model's RT framework,
        rt.sh sets the baseline and input datasets utilized for its framework. Dictionary of 
        text mentioning dataset version per tag.
        
        This method calls the "get_file_content(repo, file_dir, tag_name)" method to extract the contents 
        of a requested file from a repository's release version.
        
        """
        
        # Extract repository's name.
        repo_type = os.path.split(repo_details['Name'])[1]
                                  
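        # Locate text containing dataset version per release version of UFS weather model repository.
        # Both containers are initialized up front so the return statement is valid for any repository.
        tag_file_text = {}
        tag2dataset = {}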
        if repo_type == 'ufs-weather-model':     
            
            # File of interest which points to baseline & input datasets
            file_dir = "tests/rt.sh"
            tag2dataset = {"BL_DATE":{},
                           "RTPWD":{},
                           "INPUTDATA_ROOT": {},
                           "INPUTDATA_ROOT_WW3":{},
                           "INPUTDATA_ROOT_BMIC": {}}
            
            # Extract content of the file of interest for a given release version.
            for tag_name in repo_details['Tag_Names']:
                #time.sleep(15)
                script_content = self.get_file_content(repo, file_dir, tag_name)
                tag_file_text[tag_name] = script_content
                                  
            for tag_name, script_body in tag_file_text.items():
                
                # Extract baseline datasets' name pointed by release version.
                tag2dataset["BL_DATE"][tag_name] = [match for match in script_body if "BL_DATE" in match]
                tag2dataset["RTPWD"][tag_name] = [match for match in script_body if "RTPWD=${RTPWD:" in match]
                
                # Extract input datasets' name pointed by release version.
                tag2dataset["INPUTDATA_ROOT"][tag_name] = [match for match in script_body if "INPUTDATA_ROOT=${INPUTDATA_ROOT:" in match]
                tag2dataset["INPUTDATA_ROOT_WW3"][tag_name] = [match for match in script_body if "INPUTDATA_ROOT_WW3=${INPUTDATA_ROOT}" in match]
                tag2dataset["INPUTDATA_ROOT_BMIC"][tag_name] = [match for match in script_body if "INPUTDATA_ROOT_BMIC=${INPUTDATA_ROOT_BMIC:" in match]
                
        return tag_file_text, tag2dataset

    def preprocess_tag2data(self, raw_tag2dataset, framework_name):
        """
        Preprocess raw text of the tags to dataset names.

        Args:
            raw_tag2dataset (dict): Dictionary of raw text mentioning dataset version per tag.
            framework_name (str): Framework name (e.g. rt, srw, mrw) representing raw text 
                                  mentioning dataset version per tag.

        Return (dict): Dictionaries mapping each UFS weather model's release version to 
        their corresponding baseline and input dataset names.
        
        """
        
        # === Preprocesses for baseline dataset version. ===
        # Dictionary of directory path of baseline timestamp utilized for given tag version.
        baseline_tuples_dict = {}

        # Dictionary of release versions' baseline datasets.
        baseline_versions = {}

        # Dictionary of directory paths to baseline datasets w/ baseline timestamp name.
        baseline_data_dict = {}

        # Dictionary of baseline dataset timestamps. 
        baseline_data_dates = {}

        # Detect all RTPWD mentions with data directory.
        # Note: BL_DATE may not be mentioned in a release ver. rather in some cases date is factored into RTPWD.
        # Depending on the tag version, rt framework may have two different datasets that it will accept -- platform specified.
        for k,v_list in raw_tag2dataset['RTPWD'].items():
            baseline_dirs = []
            baseline_dates = []
            for line in v_list:
                partition_line = line.lstrip().split('/')
                if not any(value in line for value in ('^^')) and k not in baseline_data_dict:

                    # Extract list of baseline directory paths.
                    baseline_dirs.append(partition_line[-2] + '/' + partition_line[-1].replace('}',''))

                    # Extract list of baseline dataset timestamps.
                    baseline_dates.append(partition_line[-1].replace('}','').replace('{','').replace('$',''))

                else:

                    # Extract list of baseline directory paths.
                    baseline_dirs.append(partition_line[-3] + '/' + partition_line[-2] + '/' + partition_line[-1])

                    # Extract list of baseline dataset timestamps.
                    baseline_dates.append(partition_line[-2].replace('}','').replace('{','').replace('$',''))

            # List of RTPWD dataset versions called for given tag version.(paths)
            # Note: *** TODO: Checked if Hera /scratch1/NCEPDEV/nems/emc.nemspara/RT/ contains data version.
            # Data set version for tag does not exist on Hera -- keep unique path to note this.****
            baseline_tuples_dict[framework_name, k, 'BL_INPUT'] = list(set(baseline_dirs))

            # List of RTPWD dataset versions called for given tag version.(dates)
            # Note: *** TODO: Checked if Hera /scratch1/NCEPDEV/nems/emc.nemspara/RT/ contains data version.
            # Data set version for tag does not exist on Hera -- keep unique path to note this.****
            baseline_versions[framework_name, k, 'BL_INPUT'] = list(set(baseline_dates))

        # Detect all BL_DATE mentions with data directory. Overwrites the value of any tag whose script defines BL_DATE.
        for k,v_list in raw_tag2dataset['BL_DATE'].items():
            for line in v_list:
                if any(value in line for value in ('^^')):
                    partition_line = line.lstrip().split('/')
                    model_name = line.split('/')[-3]

                if not any(value in line for value in ('$')):
                    partition_line = line.split('=')
                    model_name = 'NEMSfv3gfs' # TODO: Return back to extract modelname of dataset

                    # Dataset path
                    baseline_tuples_dict[framework_name, k, 'BL_INPUT'] = model_name + '/develop-' + partition_line[-1]

                    # Data version
                    baseline_versions[framework_name, k, 'BL_INPUT'] = ['develop-' + partition_line[-1]]

        # Nested baseline data dictionary (paths).            
        for (framework, tag_name, input_type), data_version in baseline_tuples_dict.items():
            baseline_data_dict.setdefault(framework, {}).setdefault(tag_name, {})[input_type] = data_version

        # Nested baseline data dictionary (dates)       
        for (framework, tag_name, input_type), data_version in baseline_versions.items():
            baseline_data_dates.setdefault(framework, {}).setdefault(tag_name, {})[input_type] = data_version

        # === Preprocesses for input dataset version. ===
        # Dictionary of directory path of input timestamp utilized for given tag version.
        input_tuples_dict = {}
        
        # Dictionary of directory paths to input datasets w/ input timestamp name.
        input_data_dict = {}
        
        # Dictionary of release versions' input datasets.
        input_versions = {}
        
        # Dictionary of input dataset timestamps. 
        input_data_dates = {}

        # Detect all INPUTDATA_ROOT mentions with data directory.
        # Note: There are three different variables defined to distinguish the input data categories (e.g. "main" root, WW3, BMIC)
        for k,v_list in raw_tag2dataset['INPUTDATA_ROOT'].items():
            for line in v_list:
                partition_line = line.split('/')
                if partition_line[-1] =='}':

                    # Update dict w/ input datasets' directory path.
                    input_tuples_dict[framework_name, k, 'INPUTDATA_ROOT'] = partition_line[-3] + '/' + partition_line[-2].replace('}','')

                    # Update dict w/ input datasets' timestamp.
                    input_versions[framework_name, k, 'INPUTDATA_ROOT'] = partition_line[-2].replace('}','')

                else:

                    # Update dict w/ input datasets' directory path.
                    input_tuples_dict[framework_name, k, 'INPUTDATA_ROOT'] = partition_line[-2] + '/' + partition_line[-1].replace('}','')

                    # Update dict w/ input datasets' timestamp.
                    input_versions[framework_name, k, 'INPUTDATA_ROOT'] = partition_line[-1].replace('}','')

        # Detect all INPUTDATA_ROOT_WW3 mentions with data directory.
        for k,v_list in raw_tag2dataset['INPUTDATA_ROOT_WW3'].items():
            for line in v_list:
                partition_line = line.split('/')

                # Dataset path.
                input_tuples_dict[framework_name, k, 'INPUTDATA_ROOT_WW3'] = partition_line[-1]

                # Data version.
                input_versions[framework_name, k, 'INPUTDATA_ROOT_WW3'] = partition_line[-1]

        # Detect all INPUTDATA_ROOT_BMIC mentions with data directory.
        for k,v_list in raw_tag2dataset['INPUTDATA_ROOT_BMIC'].items():
            for line in v_list:
                partition_line = line.split('/')

                # Dataset path.
                input_tuples_dict[framework_name, k, 'INPUTDATA_ROOT_BMIC'] = partition_line[-1].replace('}','')

                # Data version.
                input_versions[framework_name, k, 'INPUTDATA_ROOT_BMIC'] = partition_line[-1].replace('}','')

        # Generate nested dictionary. Update w/ input datasets' directory paths.
        for (framework, tag_name, input_type), data_version in input_tuples_dict.items():
            input_data_dict.setdefault(framework, {}).setdefault(tag_name, {})[input_type] = data_version

        # Generate nested dictionary. Update w/ input datasets' timestamp dates.
        for (framework, tag_name, input_type), data_version in input_versions.items():
            input_data_dates.setdefault(framework, {}).setdefault(tag_name, {})[input_type] = data_version

        # Save baseline & input data details per release version maps to pickle files.
        self.save2pickle(baseline_data_dict, f'{framework_name}_baseline_data')
        self.save2pickle(input_data_dict, f'{framework_name}_input_data')
        self.save2pickle(baseline_data_dates, f'{framework_name}_baseline_data_dates')
        self.save2pickle(input_data_dates, f'{framework_name}_input_data_dates')
        
        return baseline_data_dict, input_data_dict, baseline_data_dates, input_data_dates

    def save2pickle(self, data2save, fn):
        """
        Save data to pickle file.
        
        Args:
            data2save (dict, str, tuple, list, pd.DataFrame): Data to save.
            fn (str): Filename for pickle file. 
        
        Return : None
        
        """
        with open(fn + '.pkl', 'wb') as file:
            pickle.dump(data2save, file)
            
        return
    
    def read_pickle(self, fn):
        """
        Read data from pickle file.
        
        Args:
            fn (str): Filename of pickle file. 
        
        Return (dict, str, tuple, list, pd.DataFrame): Pickle file's content. 
        
        """
        with open(fn + '.pkl', 'rb') as file:
            data = pickle.load(file)
            
        return data
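The lines matched in `rt.sh` use the shell form `NAME=${NAME:-default}`, which means "keep NAME if it is already set, otherwise use the default". The sketch below shows one possible way to parse such a line and pick out the dataset directory. It is an alternative illustration, not the logic used in `preprocess_tag2data`; the helper names and the rule of skipping path components that start with `$` are assumptions made for this example.

```python
import re

# Matches lines of the form  NAME=${NAME:-default}
DEFAULT_ASSIGNMENT = re.compile(
    r"^\s*(?P<name>\w+)=\$\{(?P=name):-(?P<default>.*)\}\s*$"
)


def parse_default(line):
    """Return (variable_name, default_value), or None if the line has another form."""
    match = DEFAULT_ASSIGNMENT.match(line)
    if match is None:
        return None
    return match.group("name"), match.group("default")


def dataset_component(default_path):
    """Return the last path component that is not a pure shell variable reference."""
    parts = [p for p in default_path.rstrip("/").split("/") if p and not p.startswith("$")]
    return parts[-1] if parts else None


line = "  RTPWD=${RTPWD:-$DISKNM/NEMSfv3gfs/develop-20201215/${RT_COMPILER^^}}"
name, default = parse_default(line)
print(name, dataset_component(default))
# RTPWD develop-20201215
```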

# Demo

## Github Repositories of Interest.

| Repo. Description | GitHub Location |
| :- | :- |
| __Unified Forecast System (UFS) Model Repository__ | _ufs-community/ufs-weather-model_|
| __Short-Range Weather (SRW) Model Repository__ | _ufs-community/ufs-srweather-app_ |
| __Medium-Range Weather (MRW) Model Repository__ | _ufs-community/ufs-mrweather-app_ |

# Instantiate Wrapper

In [2]:
# Demo.
tag2data_wrapper = get_tag2data()

# UFS weather model repository.
ufs_model_repo = tag2data_wrapper.ufs_model_repo
ufs_model_details = tag2data_wrapper.get_repo_details(ufs_model_repo)

# Locating dataset version per release version of UFS weather model repository.
tag_file_text, raw_tag2dataset = tag2data_wrapper.get_tag2data(ufs_model_repo, ufs_model_details)

# Preprocess raw text of the tags to determine their unique dataset names.
framework_name = 'rt'
baseline_data_dict, input_data_dict, baseline_data_dates, input_data_dates = tag2data_wrapper.preprocess_tag2data(raw_tag2dataset, framework_name)

# Mapped Timestamped Baseline Dataset to UFS Repository Code Version/Tag

In [3]:
# For baseline dataset paths:
# baseline_data_dict['rt']

# For baseline dataset timestamps:
bl2tag_dict = baseline_data_dates['rt']
bl2tag_df = pd.DataFrame(bl2tag_dict).T
bl2tag_df[['BL_INPUT1', 'BL_INPUT2']] = pd.DataFrame(bl2tag_df.BL_INPUT.tolist(), index=bl2tag_df.index)
bl2tag_df

| Tag | BL_INPUT | BL_INPUT1 | BL_INPUT2 |
| :- | :- | :- | :- |
| ufs-v2.0.0 | [ufs-public-release-v2-20210212] | ufs-public-release-v2-20210212 | |
| ufs-v1.1.0 | [ufs-public-release-20200728] | ufs-public-release-20200728 | |
| ufs-v1.0.0 | [ufs-public-release-20200224] | ufs-public-release-20200224 | |
| release/P8a | [develop-20211222] | develop-20211222 | |
| release/P7c | [develop-20210820] | develop-20210820 | |
| datm_mom6_cice6_cmeps | [develop-20201215] | develop-20201215 | |
| ccpp_ipd_comparison | [develop-20200923] | develop-20200923 | |
| Prototype-6.0beta | [develop-20210217] | develop-20210217 | |
| GFSv16_CCPP | [develop-20201214] | develop-20201214 | |
| GFS.v16.2.0 | [develop-20200626] | develop-20200626 | |
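The cell above spreads each tag's `BL_INPUT` list across the numbered columns `BL_INPUT1` and `BL_INPUT2`, leaving a column empty when a tag has fewer entries. The same idea can be sketched in plain Python, without pandas; `split_into_columns` is a hypothetical helper written for this illustration.

```python
def split_into_columns(values_by_tag, prefix, width):
    """Spread each tag's list over `width` numbered columns, padding with None."""
    rows = {}
    for tag, values in values_by_tag.items():
        padded = list(values[:width]) + [None] * (width - len(values))
        rows[tag] = {f"{prefix}{i}": value for i, value in enumerate(padded, start=1)}
    return rows


baseline = {"ufs-v2.0.0": ["ufs-public-release-v2-20210212"]}
print(split_into_columns(baseline, "BL_INPUT", 2))
# {'ufs-v2.0.0': {'BL_INPUT1': 'ufs-public-release-v2-20210212', 'BL_INPUT2': None}}
```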


# Mapped Timestamped Input Dataset to UFS Repository Code Version/Tag

In [5]:
# For input dataset paths:
#input_data_dict['rt']

# For input dataset timestamps:
input2tag_dict = input_data_dates['rt']
input2tag_df = pd.DataFrame(input2tag_dict).T
input2tag_df 

| Tag | INPUTDATA_ROOT | INPUTDATA_ROOT_WW3 | INPUTDATA_ROOT_BMIC |
| :- | :- | :- | :- |
| release/P8a | input-data-20211210 | WW3_input_data_20211113 | BM_IC-20210717 |
| release/P7c | input-data-20210717 | WW3_input_data_20210621 | BM_IC-20210717 |
| datm_mom6_cice6_cmeps | input-data-20201201 | | |
| Prototype-6.0beta | input-data-20210212 | WW3_input_data_20201220 | |
| GFSv16_CCPP | input-data-20201201 | | |


# Evaluate Input & Baseline Paths Saved as Pickle Files.

In [6]:
rt_baseline = tag2data_wrapper.read_pickle('rt_baseline_data')
rt_input = tag2data_wrapper.read_pickle('rt_input_data')
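For readers unfamiliar with pickle files, the round trip performed by `save2pickle` and `read_pickle` can be sketched as follows. This is a standalone illustration that writes to a temporary directory; the function names and the sample dictionary (taken from the values shown in this notebook) are for demonstration only. Note that pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path


def save_pickle(data, path):
    """Serialize a Python object to a pickle file."""
    with open(path, "wb") as handle:
        pickle.dump(data, handle)


def load_pickle(path):
    """Load a Python object back from a pickle file."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


original = {"rt": {"release/P8a": {"INPUTDATA_ROOT": "NEMSfv3gfs/input-data-20211210"}}}

with tempfile.TemporaryDirectory() as tmp_dir:
    pickle_path = Path(tmp_dir) / "rt_input_data.pkl"
    save_pickle(original, pickle_path)
    restored = load_pickle(pickle_path)

print(restored == original)
# True
```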

#### Map of UFS Version Code to Baseline Dataset 'BL_INPUT' Paths

In [7]:
rt_baseline['rt']

{'ufs-v2.0.0': {'BL_INPUT': ['NEMSfv3gfs/ufs-public-release-v2-20210212/${COMPILER^^}}',
   'NEMSfv3gfs/ufs-public-release-v2-20210212']},
 'ufs-v1.1.0': {'BL_INPUT': ['NEMSfv3gfs/ufs-public-release-20200728',
   'RTPWD=${RTPWD:-$DISKNM/ufs-public-release-20200728/${COMPILER^^}}']},
 'ufs-v1.0.0': {'BL_INPUT': ['NEMSfv3gfs/ufs-public-release-20200224',
   'RTPWD=${RTPWD:-$DISKNM/ufs-public-release-20200224/${COMPILER^^}}']},
 'release/P8a': {'BL_INPUT': 'NEMSfv3gfs/develop-20211222'},
 'release/P7c': {'BL_INPUT': 'NEMSfv3gfs/develop-20210820'},
 'datm_mom6_cice6_cmeps': {'BL_INPUT': ['NEMSfv3gfs/develop-20201215',
   'NEMSfv3gfs/develop-20201215/${RT_COMPILER^^}}']},
 'ccpp_ipd_comparison': {'BL_INPUT': ['NEMSfv3gfs/develop-20200923',
   'NEMSfv3gfs/develop-20200923/${RT_COMPILER^^}}']},
 'Prototype-6.0beta': {'BL_INPUT': ['NEMSfv3gfs/develop-20210217',
   'NEMSfv3gfs/develop-20210217/${RT_COMPILER^^}}']},
 'GFSv16_CCPP': {'BL_INPUT': ['NEMSfv3gfs/develop-20201214',
   'NEMSfv3gfs/deve

#### Map of UFS Version Code to Baseline Dataset 'RTPWD' Path

In [8]:
raw_tag2dataset['RTPWD']

{'ufs-v2.0.0': ['  RTPWD=${RTPWD:-$DISKNM/NEMSfv3gfs/ufs-public-release-v2-20210212/${COMPILER^^}}',
  '  RTPWD=${RTPWD:-$DISKNM/NEMSfv3gfs/ufs-public-release-v2-20210212}'],
 'ufs-v1.1.0': ['  RTPWD=${RTPWD:-$DISKNM/ufs-public-release-20200728/${COMPILER^^}}',
  '  RTPWD=${RTPWD:-$DISKNM/NEMSfv3gfs/ufs-public-release-20200728}'],
 'ufs-v1.0.0': ['  RTPWD=${RTPWD:-$DISKNM/ufs-public-release-20200224/${COMPILER^^}}',
  '  RTPWD=${RTPWD:-$DISKNM/NEMSfv3gfs/ufs-public-release-20200224}'],
 'release/P8a': ['  RTPWD=${RTPWD:-$DISKNM/NEMSfv3gfs/develop-${BL_DATE}/${RT_COMPILER^^}}',
  '  RTPWD=${RTPWD:-$DISKNM/NEMSfv3gfs/develop-${BL_DATE}}'],
 'release/P7c': ['  RTPWD=${RTPWD:-$DISKNM/NEMSfv3gfs/develop-${BL_DATE}/${RT_COMPILER^^}}',
  '  RTPWD=${RTPWD:-$DISKNM/NEMSfv3gfs/develop-${BL_DATE}}'],
 'datm_mom6_cice6_cmeps': ['  RTPWD=${RTPWD:-$DISKNM/NEMSfv3gfs/develop-20201215/${RT_COMPILER^^}}',
  '  RTPWD=${RTPWD:-$DISKNM/NEMSfv3gfs/develop-20201215}'],
 'ccpp_ipd_comparison': ['  RTPWD=${RT
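For tags such as release/P8a, the `RTPWD` line refers to `${BL_DATE}` rather than spelling out the date, so the two values have to be combined. A minimal sketch of that substitution is shown below; `expand_variables` is a hypothetical helper, and it deliberately leaves other shell expansions (such as `${RT_COMPILER^^}`) untouched.

```python
import re


def expand_variables(text, values):
    """Replace simple ${NAME} references with known values; leave the rest as-is."""
    def replace(match):
        return values.get(match.group(1), match.group(0))

    return re.sub(r"\$\{(\w+)\}", replace, text)


path = "NEMSfv3gfs/develop-${BL_DATE}/${RT_COMPILER^^}"
print(expand_variables(path, {"BL_DATE": "20211222"}))
# NEMSfv3gfs/develop-20211222/${RT_COMPILER^^}
```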

#### Map of UFS Version Code to Input Dataset 'INPUTDATA_ROOT', 'INPUTDATA_ROOT_WW3', 'INPUTDATA_ROOT_BMIC' Path

In [9]:
rt_input['rt'] 

{'release/P8a': {'INPUTDATA_ROOT': 'NEMSfv3gfs/input-data-20211210',
  'INPUTDATA_ROOT_WW3': 'WW3_input_data_20211113',
  'INPUTDATA_ROOT_BMIC': 'BM_IC-20210717'},
 'release/P7c': {'INPUTDATA_ROOT': 'NEMSfv3gfs/input-data-20210717',
  'INPUTDATA_ROOT_WW3': 'WW3_input_data_20210621',
  'INPUTDATA_ROOT_BMIC': 'BM_IC-20210717'},
 'datm_mom6_cice6_cmeps': {'INPUTDATA_ROOT': 'NEMSfv3gfs/input-data-20201201'},
 'Prototype-6.0beta': {'INPUTDATA_ROOT': 'NEMSfv3gfs/input-data-20210212',
  'INPUTDATA_ROOT_WW3': 'WW3_input_data_20201220'},
 'GFSv16_CCPP': {'INPUTDATA_ROOT': 'NEMSfv3gfs/input-data-20201201'}}
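The nested layout shown above is produced from a flat dictionary keyed by `(framework, tag, variable)` tuples, using chained `setdefault` calls. The standalone sketch below isolates that step; the function name `nest` is hypothetical and the sample entries are copied from the output above.

```python
def nest(flat):
    """Convert {(framework, tag, variable): value} into a three-level nested dict."""
    nested = {}
    for (framework, tag, variable), value in flat.items():
        # setdefault returns the existing inner dict or creates an empty one.
        nested.setdefault(framework, {}).setdefault(tag, {})[variable] = value
    return nested


flat = {
    ("rt", "release/P8a", "INPUTDATA_ROOT"): "NEMSfv3gfs/input-data-20211210",
    ("rt", "release/P8a", "INPUTDATA_ROOT_WW3"): "WW3_input_data_20211113",
    ("rt", "GFSv16_CCPP", "INPUTDATA_ROOT"): "NEMSfv3gfs/input-data-20201201",
}
print(nest(flat)["rt"]["release/P8a"]["INPUTDATA_ROOT_WW3"])
# WW3_input_data_20211113
```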

#### Map of UFS Version Code to Input Dataset 'INPUTDATA_ROOT' Path

In [10]:
raw_tag2dataset['INPUTDATA_ROOT']

{'ufs-v2.0.0': [],
 'ufs-v1.1.0': [],
 'ufs-v1.0.0': [],
 'release/P8a': ['INPUTDATA_ROOT=${INPUTDATA_ROOT:-$DISKNM/NEMSfv3gfs/input-data-20211210}'],
 'release/P7c': ['INPUTDATA_ROOT=${INPUTDATA_ROOT:-$DISKNM/NEMSfv3gfs/input-data-20210717}'],
 'datm_mom6_cice6_cmeps': ['INPUTDATA_ROOT=${INPUTDATA_ROOT:-$DISKNM/NEMSfv3gfs/input-data-20201201/}'],
 'ccpp_ipd_comparison': [],
 'Prototype-6.0beta': ['INPUTDATA_ROOT=${INPUTDATA_ROOT:-$DISKNM/NEMSfv3gfs/input-data-20210212}'],
 'GFSv16_CCPP': ['INPUTDATA_ROOT=${INPUTDATA_ROOT:-$DISKNM/NEMSfv3gfs/input-data-20201201/}'],
 'GFS.v16.2.0': [],
 'GFS.v16.0.17': [],
 'GFS.v16.0.16': [],
 'GFS.v16.0.15': [],
 'GFS.v16.0.14': [],
 'GFS.v16.0.13': [],
 'GFS.v16.0.12': [],
 'GFS.v16.0.11': [],
 'GFS.v16.0.10': [],
 'GFS.v16.0.9': [],
 'GFS.v16.0.8': [],
 'GFS.v16.0.7': [],
 'GFS.v16.0.6': [],
 'GFS.v16.0.5': [],
 'GFS.v16.0.4': [],
 'GFS.v16.0.3': [],
 'GFS.v16.0.2': [],
 'GFS.v16.0.1': [],
 'GFS.v16.0.0': [],
 'GFS_v15.2.1': [],
 'GFS_v15.2.0': [],

#### Map of UFS Version Code to Input Dataset 'INPUTDATA_ROOT_WW3' Path

In [11]:
raw_tag2dataset['INPUTDATA_ROOT_WW3']

{'ufs-v2.0.0': [],
 'ufs-v1.1.0': [],
 'ufs-v1.0.0': [],
 'release/P8a': ['INPUTDATA_ROOT_WW3=${INPUTDATA_ROOT}/WW3_input_data_20211113'],
 'release/P7c': ['INPUTDATA_ROOT_WW3=${INPUTDATA_ROOT}/WW3_input_data_20210621'],
 'datm_mom6_cice6_cmeps': [],
 'ccpp_ipd_comparison': [],
 'Prototype-6.0beta': ['INPUTDATA_ROOT_WW3=${INPUTDATA_ROOT}/WW3_input_data_20201220'],
 'GFSv16_CCPP': [],
 'GFS.v16.2.0': [],
 'GFS.v16.0.17': [],
 'GFS.v16.0.16': [],
 'GFS.v16.0.15': [],
 'GFS.v16.0.14': [],
 'GFS.v16.0.13': [],
 'GFS.v16.0.12': [],
 'GFS.v16.0.11': [],
 'GFS.v16.0.10': [],
 'GFS.v16.0.9': [],
 'GFS.v16.0.8': [],
 'GFS.v16.0.7': [],
 'GFS.v16.0.6': [],
 'GFS.v16.0.5': [],
 'GFS.v16.0.4': [],
 'GFS.v16.0.3': [],
 'GFS.v16.0.2': [],
 'GFS.v16.0.1': [],
 'GFS.v16.0.0': [],
 'GFS_v15.2.1': [],
 'GFS_v15.2.0': [],
 'GFS_v15.1.4': [],
 'GFS_v15.1.3': [],
 'GFS_v15.1.2': [],
 'GFS_v15.1.1': []}

#### Map of UFS Version Code to Input Dataset 'INPUTDATA_ROOT_BMIC' Path

In [12]:
raw_tag2dataset['INPUTDATA_ROOT_BMIC']

{'ufs-v2.0.0': [],
 'ufs-v1.1.0': [],
 'ufs-v1.0.0': [],
 'release/P8a': ['INPUTDATA_ROOT_BMIC=${INPUTDATA_ROOT_BMIC:-$DISKNM/NEMSfv3gfs/BM_IC-20210717}'],
 'release/P7c': ['INPUTDATA_ROOT_BMIC=${INPUTDATA_ROOT_BMIC:-$DISKNM/NEMSfv3gfs/BM_IC-20210717}'],
 'datm_mom6_cice6_cmeps': [],
 'ccpp_ipd_comparison': [],
 'Prototype-6.0beta': [],
 'GFSv16_CCPP': [],
 'GFS.v16.2.0': [],
 'GFS.v16.0.17': [],
 'GFS.v16.0.16': [],
 'GFS.v16.0.15': [],
 'GFS.v16.0.14': [],
 'GFS.v16.0.13': [],
 'GFS.v16.0.12': [],
 'GFS.v16.0.11': [],
 'GFS.v16.0.10': [],
 'GFS.v16.0.9': [],
 'GFS.v16.0.8': [],
 'GFS.v16.0.7': [],
 'GFS.v16.0.6': [],
 'GFS.v16.0.5': [],
 'GFS.v16.0.4': [],
 'GFS.v16.0.3': [],
 'GFS.v16.0.2': [],
 'GFS.v16.0.1': [],
 'GFS.v16.0.0': [],
 'GFS_v15.2.1': [],
 'GFS_v15.2.0': [],
 'GFS_v15.1.4': [],
 'GFS_v15.1.3': [],
 'GFS_v15.1.2': [],
 'GFS_v15.1.1': []}
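Most tags in the raw outputs above map to an empty list, meaning the searched variable does not appear in that tag's `rt.sh`. A small filter, sketched below with a hypothetical helper name, keeps only the tags that matched, which can make the raw dictionaries easier to inspect.

```python
def tags_with_matches(raw_matches):
    """Keep only the tags whose list of matched lines is non-empty."""
    return {tag: lines for tag, lines in raw_matches.items() if lines}


raw = {
    "ufs-v2.0.0": [],
    "release/P8a": ["INPUTDATA_ROOT_BMIC=${INPUTDATA_ROOT_BMIC:-$DISKNM/NEMSfv3gfs/BM_IC-20210717}"],
    "GFS_v15.1.1": [],
}
print(list(tags_with_matches(raw)))
# ['release/P8a']
```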

__Remarks:__

- For the UFS RT framework datasets, I do not have access to the "trunk-YYYYMMDD" datasets. They do not exist on Hera or Orion, but they do exist on other HPCs such as Cheyenne, Jet, gaea, wcoss_cray, and theia.

- For the SRW application: see ufs-srweather-app-develop/docs/UserGuide/source/InputOutputFiles.rst for ICs and LBCs. The external model, dates, and cycles are stated in a configuration file created by the user, so the model analysis and model forecast datasets used to generate the ICs and LBCs through the SRW preprocessor depend on the user's request. Mapping the model analysis and model forecast datasets should therefore be structured as follows: {external_model_name: {date: {cycle_number}}}
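The proposed SRW structure, with a set of cycle numbers at the innermost level, could be built along the following lines. This is only a sketch of the data structure: the helper name is hypothetical, and the model names, dates, and cycles in the sample records are illustrative values, not extracted from any repository.

```python
from collections import defaultdict


def build_external_model_map(records):
    """Group (external_model_name, date, cycle_number) records into nested form."""
    mapping = defaultdict(lambda: defaultdict(set))
    for model, date, cycle in records:
        mapping[model][date].add(cycle)
    # Convert to plain dicts so the result prints and compares cleanly.
    return {model: dict(dates) for model, dates in mapping.items()}


# Illustrative records only.
records = [
    ("FV3GFS", "20190615", "00"),
    ("FV3GFS", "20190615", "12"),
    ("HRRR", "20200810", "00"),
]
srw_map = build_external_model_map(records)
print(sorted(srw_map["FV3GFS"]["20190615"]))
# ['00', '12']
```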