
# Demo: Categorizing UFS Input Data Files by UFS Weather Model Application-to-Physics Suite & Regression Tests

### __Introduction__

Developers and users of the UFS Weather Model need to be able to determine which data files are required for a given UFS Weather Model Application-to-Physics Suite build, as well as for a given regression test of a UFS weather model application. Currently, no reference table maps each UFS Application-to-Physics Suite build to its required data files, and no reference table maps each application's regression tests to their required data files. 
In the current structure of the UFS weather model framework, these details are hidden within its regression test framework scripts: each unique regression test (listed in the regression test framework's **rt.conf** script) is represented by a unique "test" script (within the ufs-weather-model repo's **/tests/tests**). Each "test" script then calls configuration files residing in the following folders of the ufs-weather-model repo:

- **/fv3_conf**
- **/parm**

Note: Each unique regression test can only be applicable to a unique UFS Weather Model Application build. 
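
The chain described above (a build listed in **rt.conf**, followed by the tests that run against it) can be illustrated with a short Python sketch. This is not the tool's implementation. It assumes that rt.conf uses pipe-delimited `COMPILE | <build options> | ...` and `RUN | <test name> | ...` lines and that the build options carry `-DAPP=` and `-DCCPP_SUITES=` flags; the sample text and the function name `map_builds_to_tests` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical rt.conf excerpt; the pipe-delimited layout is an assumption.
SAMPLE_RT_CONF = """\
# comment lines are ignored
COMPILE | -DAPP=S2SW -DCCPP_SUITES=FV3_GFS_v16_coupled_nsstNoahmpUGWPv1 | | fv3 |
RUN     | cpld_control_p8 | | fv3 |
RUN     | cpld_2threads_p8 | | |
COMPILE | -DAPP=ATMW -DCCPP_SUITES=FV3_GFS_v16 | | fv3 |
RUN     | control_atmwav | | fv3 |
"""


def map_builds_to_tests(rt_conf_text):
    """Return {(app, physics_suites): [test names]} from rt.conf-style text."""
    builds = defaultdict(list)
    current_build = None
    for line in rt_conf_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = [field.strip() for field in line.split("|")]
        if fields[0] == "COMPILE" and len(fields) > 1:
            # A COMPILE line starts a new build; remember its app and suite.
            app = re.search(r"-DAPP=(\S+)", fields[1])
            suites = re.search(r"-DCCPP_SUITES=(\S+)", fields[1])
            current_build = (
                app.group(1) if app else None,
                suites.group(1) if suites else None,
            )
        elif fields[0] == "RUN" and len(fields) > 1 and current_build is not None:
            # A RUN line belongs to the most recent COMPILE line.
            builds[current_build].append(fields[1])
    return dict(builds)


build_to_tests = map_builds_to_tests(SAMPLE_RT_CONF)
for build, tests in build_to_tests.items():
    print(build, tests)
```

Keying the result on the `(app, physics suite)` pair mirrors how the demo cells later index the mapping by a tuple such as `('ATM', 'FV3_GFS_v16_RRTMGP')`.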

### __Purpose__ 

The purpose of the tool is to extract additional information about the input and baseline datasets residing on the RDHPCS in relation to the UFS Weather Model Application-to-Physics Suite builds and their corresponding sets of regression tests. The goal is to give developers and users a mapping of the data files required for a given UFS Application-to-Physics Suite build, as well as for a given regression test of a UFS application. 

In this demonstration, the tool is applied to the UFS input and baseline datasets residing on the RDHPCS platform Orion.

### __Capabilities__ 

The tool will be able to perform the following actions:

- Apply feature engineering to obtain additional information regarding the data files. 
  
- Extract all data filenames, mapping each to its corresponding relative directory path, 
  root folder, filename, file size, file format, compiler (if applicable), CNTL folder (if applicable),
  "input" or "restart" file type, resolution (km) category, and resolution (C resolution) category. 
 
- Categorize & sort each data file into a given UFS Weather Model Application-to-Physics Suite build & Regression Tests.

- Generate a reference table/map of each data file to the UFS Weather Model Application-to-Physics Suite builds & Regression Tests.
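
As an illustration of the filename-extraction capability, the sketch below walks a directory tree and records a few of the attributes listed above. It is a minimal sketch rather than the tool's code: the function name `describe_files`, the rule that a path containing "restart" marks a restart file, and the rule that a `C<number>` token marks the C resolution are all assumptions made for the example.

```python
import os
import re
import tempfile

# Assumed convention: a token such as "C96" or "C384" marks the C resolution.
C_RES_PATTERN = re.compile(r"(?<![A-Za-z0-9])[Cc](\d+)(?![A-Za-z0-9])")


def describe_files(data_root):
    """Walk data_root and return one metadata record per file."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(data_root):
        for filename in sorted(filenames):
            full_path = os.path.join(dirpath, filename)
            rel_path = os.path.relpath(full_path, data_root)
            parts = rel_path.split(os.sep)
            c_res = C_RES_PATTERN.search(rel_path)
            records.append({
                "relative_path": rel_path,
                "root_folder": parts[0] if len(parts) > 1 else "",
                "filename": filename,
                "filesize_bytes": os.path.getsize(full_path),
                "file_format": os.path.splitext(filename)[1].lstrip("."),
                # Heuristic (assumption): "restart" anywhere in the path.
                "file_type": "restart" if "restart" in rel_path.lower() else "input",
                "c_resolution": f"C{c_res.group(1)}" if c_res else None,
            })
    return records


# Demonstrate on a throwaway directory with one hypothetical file.
with tempfile.TemporaryDirectory() as root:
    sample_dir = os.path.join(root, "FV3_input_data", "INPUT")
    os.makedirs(sample_dir)
    with open(os.path.join(sample_dir, "C96_grid.tile1.nc"), "wb") as handle:
        handle.write(b"\x00" * 16)
    file_records = describe_files(root)

print(file_records[0])
```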

### __Future Capabilities__  
- This tool can be used as a skeleton framework for querying information regarding the data files as they pertain to a unique UFS Weather Model Application-to-Physics Suite build &/or Regression Test.

### __Procedural Steps to Utilize Tool__

#### ___Environment Setup___

1. Install miniconda on your machine. Note: Miniconda is a smaller version of Anaconda that only includes conda along with a small set of necessary and useful packages. With Miniconda, you can install only what you need, without all the extra packages that Anaconda comes packaged with:

Download the latest Miniconda installer (e.g., the Python 3.9 version):
- __wget https://repo.anaconda.com/miniconda/Miniconda3-py39_4.9.2-Linux-x86_64.sh__

Check the integrity of the downloaded file with SHA-256:
- __sha256sum Miniconda3-py39_4.9.2-Linux-x86_64.sh__

Compare the result against the SHA-256 hash listed at the following link: https://docs.conda.io/en/latest/miniconda.html
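
Where `sha256sum` is not available, the same integrity check can be done with Python's standard `hashlib` module. The helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the published hash must still be copied by hand from the Miniconda page.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Compare a file's digest with a hash copied from the download page."""
    return sha256_of_file(path) == published_hex.strip().lower()
```

For example, `matches_published_hash("Miniconda3-py39_4.9.2-Linux-x86_64.sh", "<hash copied from the Miniconda page>")` returns `True` only when the download is intact.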

Install Miniconda in Linux:
- __bash Miniconda3-py39_4.9.2-Linux-x86_64.sh__

Next, the Miniconda installer will prompt you for an install location. Press ENTER to accept the default location, i.e., your $HOME directory. If you don't want to install in the default location, press CTRL+C to cancel the installation or specify an alternate installation directory. If you chose the default location, the installer will display “PREFIX=/var/home/<user>/miniconda3” and continue the installation.

For the installation to take effect, run the following command: 
- __source ~/.bashrc__

Next, you will see the prefix (base) in front of your terminal/shell prompt, indicating that conda's base environment is activated.

2.	Once you have conda installed on your machine, perform the following to create a conda environment:

To create a new environment (if a YAML file is not provided)
- __conda create -n [Name of your conda environment you wish to create]__

__(OR)__

To ensure you are running Python 3.9:
- __conda create -n myenv python=3.9__

__(OR)__

To create a new environment from an existing YAML file (if a YAML file is provided):
- __conda env create -f environment.yml__

__*Note:__ A .yml file is a text file that lists the dependencies of the given conda environment and the channels from which to install them. To create the environment from the file, run the command from the directory where the environment.yml file lives.

4.	Activate the new environment via: 
- __conda activate [Name of your conda environment you wish to activate]__

5.	Verify that the new environment was installed correctly via:
- __conda info --envs__

__*Note:__
    - From this point on, you must activate the conda environment before executing any .py script(s) or Jupyter notebooks, using the following command: __conda activate [Name of your conda environment]__
    - To deactivate a conda environment: 
        - __conda deactivate__
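
Once the environment is active, a quick check from inside Python can confirm that the expected interpreter is in use. The sketch below relies on the `CONDA_DEFAULT_ENV` environment variable that conda sets on activation; the function name `describe_python_environment` and the default requirement of Python 3.9 are assumptions for illustration.

```python
import os
import sys


def describe_python_environment(required=(3, 9)):
    """Summarize the running interpreter and the active conda environment."""
    return {
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        "meets_requirement": sys.version_info[:2] >= required,
        # None when no conda environment is active in this shell.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        "interpreter": sys.executable,
    }


environment_report = describe_python_environment()
print(environment_report)
```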

#### ___Link Home Directory to Dataset Location on RDHPCS Platform___ 

6.	Unfortunately, there is no way to navigate to the /work/ filesystem from within the Jupyter interface. The best workaround is to create a symbolic link in your home folder that points to the /work/ filesystem. Run the following command from a Linux terminal on Orion to create the link: 

- __ln -s /work /home/[Your user account name]/work__

Now, when you navigate to the /home/[Your user account name]/work directory in Jupyter, it will take you to the /work/ directory, allowing you to reach from Jupyter any data in the /work/ filesystem that you have permission to access. The same procedure works for any filesystem available from the root directory. 

__*Note:__ On Orion, users must create a symbolic link from their home directory to the main directory containing the datasets of interest.
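
The same link can be created from Python with `os.symlink`, which is convenient when the setup is scripted. The helper `link_into_home` is hypothetical; the demonstration runs in a temporary sandbox so that it does not touch a real home directory.

```python
import os
import tempfile


def link_into_home(target_dir, link_path):
    """Create link_path -> target_dir unless the correct link already exists."""
    if os.path.islink(link_path):
        if os.path.realpath(link_path) == os.path.realpath(target_dir):
            return link_path  # Already linked to the right place.
        raise FileExistsError(f"{link_path} already links elsewhere")
    if os.path.exists(link_path):
        raise FileExistsError(f"{link_path} exists and is not a symbolic link")
    os.symlink(target_dir, link_path, target_is_directory=True)
    return link_path


# Sandbox stand-ins for /work and /home/<user>.
with tempfile.TemporaryDirectory() as sandbox:
    fake_work = os.path.join(sandbox, "work")
    fake_home = os.path.join(sandbox, "home")
    os.makedirs(fake_work)
    os.makedirs(fake_home)
    created_link = link_into_home(fake_work, os.path.join(fake_home, "work"))
    link_resolves = os.path.realpath(created_link) == os.path.realpath(fake_work)
    # Calling it again is harmless because the link is already correct.
    repeat_call_ok = link_into_home(fake_work, created_link) == created_link
```

On Orion the equivalent call would be `link_into_home("/work", os.path.expanduser("~/work"))`.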

#### ___Open & Run Data Analytics Tool on Jupyter Notebook___

7.	Open OnDemand has a built-in file explorer and file transfer application available directly from its dashboard via ...
    - Log in to https://orion-ood.hpc.msstate.edu/ 
    - In the Open OnDemand Interface, select __Interactive Apps__ > __Jupyter Notebook__
    - Set the following configurations to run Jupyter:


#### ___Additional Information___

__To create a .yml file, execute the following commands:__

- Activate the environment to export: 
    - __conda activate myenv__

- Export your active environment to a new file:
    - __conda env export > environment.yml__


### __Reference(s)__
Latest UFS Weather Model Guide:
- https://ufs-weather-model.readthedocs.io/en/latest/InputsOutputs.html


# Demo

In [1]:
if __name__ == '__main__':

    # Instantiate the scraper wrapper.
    from script_scraper import ScriptScraper
    scraper_wrapper = ScriptScraper(local_repo_folder='ufs-wm-repo-033022')

    # Read the referencing script.
    appsphys2test_txt = scraper_wrapper.read_appsphys2test()
    #rtconf_txt = scraper_wrapper.read_script()

    # Generate a dictionary mapping each app-to-physics suite to its test names.
    appsphys2test_dict = scraper_wrapper.convert_list2dict(appsphys2test_txt)
    appsphys2test_dict = scraper_wrapper.get_app2test(appsphys2test_dict)
    scraper_wrapper.get_rtinfo(appsphys2test_dict)

    # Read the files within /fv3_conf, /parm, & /tests.
    raw_data_dict = scraper_wrapper.read_raw_filenames()
    input_data_dict = scraper_wrapper.read_tests_fv3_parms(prefix_list=['cp', 'mv', 'rsync', 'ln'])

    # Associate each app-to-physics suite with its test parameters, fv3_conf filename, & parm filename.
    appsphys2test_dict = scraper_wrapper.get_appsphys2testparams(appsphys2test_dict.copy(), input_data_dict)

    # Convert dictionary to dataframe & save as pickle.
    appsphys2test_df = scraper_wrapper.convert_dict2df(appsphys2test_dict)

    # Unique global variables across the entire corpus of fv3_conf files.
    unique_vars_fv3conf = scraper_wrapper.get_unique_vars(input_data_dict, filetype='fv3_conf')



Total unique UFS App-to-Physics Suite builds:
 35

Total number of tests (if performing tests for all UFS App-to-Physics Suite builds):
 604

Total unique tests overall (per current rt.conf):
 118

List of unique tests present (per current rt.conf):
 {'hafs_regional_atm', 'control_fhzero', 'rap_unified_drag_suite_debug', 'cpld_restart_c384_p8', 'hafs_regional_atm_thompson_gfdlsf', 'datm_cdeps_mx025_gefs', 'control_wam', 'cpld_decomp_p8', 'hrrr_control', 'rap_control', 'datm_cdeps_control_gefs', 'rap_restart', 'control_stochy_debug', 'control_thompson_debug', 'regional_netcdf_parallel', 'control_rrtmgp_2threads', 'rap_decomp', 'rap_2threads', 'cpld_restart_c96_p8', 'hafs_regional_telescopic_2nests_atm', 'datm_cdeps_control_cfsr', 'rap_sfcdiff', 'rap_diag_debug', 'rap_control_debug', 'datm_cdeps_3072x1536_cfsr', 'cpld_mpi_p8', 'control_2threads', 'cpld_restart_c192_p8', 'cpld_control_c96_p8', 'rap_sfcdiff_debug', 'regional_noquilt', 'hafs_regional_1nest_atm', 'datm_cdeps_bulk_cfsr', 'con

# Demo: Dictionary of source-to-destination paths of data files being copied, linked, moved, or synced per fv3_conf.
The majority of the file paths listed in the fv3_conf files are defined by global variables. These global variables are set within the following scripts:

- __/ufs-weather-model-develop/ufs-weather-model-develop/default_vars.sh__
    - Within a unique 'default_vars.sh' method, there is a unique set of exported variables. => Table: Unique 'default_vars.sh' method-to-exported variables
    
    - Some of the exported variables are set by variables that depend on the 'MACHINE_ID' used by the user (listed at the beginning of 'default_vars.sh') => Table: Machine-to-Machine dependent variables
    
    - Each unique test will call a unique method in 'default_vars.sh' & overwrite some of those default variables in its '/tests file'. => Table: '/tests file'-to-overwriting default variables
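
To make the idea of a source-to-destination dictionary concrete, the sketch below pulls the paths out of copy, move, sync, and link commands in a shell-style configuration text. It is a minimal sketch under stated assumptions, not the `ScriptScraper` implementation: the function name `extract_transfers`, the sample text, and the placeholder variables `DATA_ROOT` and `BASELINE_ROOT` are hypothetical, and the last path argument is treated as the destination.

```python
import shlex

TRANSFER_COMMANDS = ("cp", "mv", "rsync", "ln")

# Hypothetical excerpt; variable names and paths are placeholders.
SAMPLE_FV3_CONF = """\
# stage the inputs
cp ${DATA_ROOT}/input/grid_file.nc ./INPUT
ln -sf ${BASELINE_ROOT}/${CNTL_DIR}/RESTART/restart_file.nc ./INPUT/restart_file.nc
echo "not a transfer"
"""


def extract_transfers(script_text, commands=TRANSFER_COMMANDS):
    """Return (command, source, destination) tuples for transfer commands."""
    transfers = []
    for line in script_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        tokens = shlex.split(line)
        if not tokens or tokens[0] not in commands:
            continue
        # Drop option flags such as -r or -sf; keep only the path arguments.
        paths = [token for token in tokens[1:] if not token.startswith("-")]
        if len(paths) < 2:
            continue
        destination = paths[-1]
        for source in paths[:-1]:
            transfers.append((tokens[0], source, destination))
    return transfers


transfers = extract_transfers(SAMPLE_FV3_CONF)
for command, source, destination in transfers:
    print(f"{command}: {source} -> {destination}")
```

The variable references are left unexpanded on purpose, since their values depend on the machine and on the test that overrides them.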
    
For simplicity in the preliminary stage of ingesting data for regression testing, once the data is placed in an established cloud bucket (key-value object store), the fv3_conf files for the latest UFS weather model release version will be altered to incorporate s3.boto for downloading files from the bucket. The input_data_dict['fv3_conf'] dictionary will be used as a reference for calling the cloud data files by each data file's key. 
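
A rough picture of key-based retrieval is sketched below. It assumes an S3-compatible client that exposes `download_file(Bucket, Key, Filename)`, as the boto3 S3 client does; the bucket name, the object key, and the helper `download_by_key` are hypothetical. A small stand-in client is used so that the sketch runs without boto3 or cloud credentials.

```python
import os
import tempfile


def download_by_key(s3_client, bucket, key_to_local_path):
    """Download each object key to its local path; return the local paths."""
    downloaded = []
    for key, local_path in key_to_local_path.items():
        os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
        s3_client.download_file(bucket, key, local_path)
        downloaded.append(local_path)
    return downloaded


class RecordingClient:
    """Stand-in for an S3 client so the sketch runs without credentials."""

    def __init__(self):
        self.calls = []

    def download_file(self, Bucket, Key, Filename):
        self.calls.append((Bucket, Key, Filename))
        with open(Filename, "wb") as handle:
            handle.write(b"")  # Pretend the object was fetched.


with tempfile.TemporaryDirectory() as workdir:
    client = RecordingClient()
    wanted = {
        "input-data/example_file.nc": os.path.join(workdir, "INPUT", "example_file.nc"),
    }
    fetched = download_by_key(client, "example-bucket", wanted)
    all_present = all(os.path.exists(path) for path in fetched)
```

With boto3 installed and credentials configured, `boto3.client("s3")` could be passed in place of `RecordingClient()`.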

In [2]:
# Test parameters set per /tests file.
input_data_dict['tests']

# Data files mentioned per /fv3_conf file.
#input_data_dict['fv3_conf']

# Data files mentioned per /parm file.
#input_data_dict['parm']

defaultdict(dict,
            {'cpld_control_p7_rrtmgp': {'CNTL_DIR': 'cpld_control_p7_rrtmgp',
              'RESTART_N': '12',
              'RESTART_INTERVAL': '${RESTART_N} -1',
              'eps_imesh': '2.5e-1',
              'USE_CICE_ALB': '.false.',
              'MIN_SEAICE': '1.0e-11',
              'IOPT_SFC': '1',
              'DNATS': '1',
              'FIELD_TABLE': 'field_table_gfsv16',
              'KNOB_UGWP_VERSION': '1',
              'DO_UGWP_V0': '.false.',
              'DO_UGWP_V1': '.true.',
              'DO_GSL_DRAG_LS_BL': '.true.',
              'FSICL': '99999',
              'FSICS': '99999',
              'CDMBWD': "'1.0,2.2,1.0,1.0'",
              'DIAG_TABLE': 'diag_table_template',
              'FV3_RUN': 'cpld_control_run.IN',
              'DO_RRTMGP': '.true.',
              'CCPP_SUITE': 'FV3_GFS_v16_coupled_p7_rrtmgp',
              'INPUT_NML': 'cpld_control_rrtmgp.nml.IN',
              'Test Info': ['export_fv3', 'export_cpl']},
        

# Demo: Mapping of UFS Application-to-Physics Suite Build to Regression Tests & their Corresponding Configuration Files, CNTL Folder, & Regression Test Parameters.

Maps each UFS Application-to-Physics Suite build to its corresponding:
- Applicable Tests

Maps each regression test to its corresponding:
- CNTL folder
- Configuration Files (FV3_conf)
- Parameters (parm)
- Test Details
- Test Type
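
The flattening from a nested mapping to one row per test can be sketched as follows. The nested layout (build tuple, then test type, then test name, then parameters) follows the indexing used in the demo cells, and the column names follow the table output; the function name `flatten_mapping` and the assumption that the CNTL folder, FV3 file, and parm file come from the `CNTL_DIR`, `FV3_RUN`, and `INPUT_NML` parameters are illustrative only.

```python
def flatten_mapping(nested):
    """Turn {(app, suite): {test_type: {test_name: params}}} into row dicts."""
    rows = []
    for (app, suite), by_type in nested.items():
        for test_type, by_name in by_type.items():
            for test_name, params in by_name.items():
                rows.append({
                    "UFS_App": app,
                    "Physics_Suite": suite,
                    "Test Type": test_type,
                    "Test Name": test_name,
                    "Test Info": params,
                    "CNTL Folder": params.get("CNTL_DIR", ""),
                    "FV3 File": params.get("FV3_RUN", ""),
                    "Parm File": params.get("INPUT_NML", ""),
                })
    return rows


# Small sample shaped like one entry of the mapping.
sample_mapping = {
    ("ATMW", "FV3_GFS_v16"): {
        "tests": {
            "control_atmwav": {
                "CNTL_DIR": "control_atmwav",
                "NPZ": "127",
                "FV3_RUN": "control_run.IN",
                "INPUT_NML": "control.nml.IN",
            },
        },
    },
}

mapping_rows = flatten_mapping(sample_mapping)
print(mapping_rows[0]["Test Name"], mapping_rows[0]["FV3 File"])
```

If pandas is available, `pandas.DataFrame(mapping_rows)` turns these records into a table with the same column names.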

In [3]:
# Read map from pickle file.
appsphys2test_df = scraper_wrapper.read_pickle('./ufs_repo_mapped_data/rt_appsphys2test_df')
#appsphys2test_df.head(50)
appsphys2test_df

| | UFS_App | Physics_Suite | Test Type | Test Name | Test Info | CNTL Folder | FV3 File | Parm File |
|---|---|---|---|---|---|---|---|---|
| 0 | S2SW | FV3_GFS_v16_coupled_nsstNoahmpUGWPv1 | tests | cpld_control_p8 | {'CNTL_DIR': 'cpld_control_p8', 'RESTART_N': '... | cpld_control_p8 | cpld_control_run.IN | |
| 1 | S2SW | FV3_GFS_v16_coupled_nsstNoahmpUGWPv1 | tests | cpld_2threads_p8 | {'CNTL_DIR': 'cpld_control_p8', 'RESTART_N': '... | cpld_control_p8 | cpld_control_run.IN | |
| 2 | S2SW | FV3_GFS_v16_coupled_nsstNoahmpUGWPv1 | tests | cpld_decomp_p8 | {'CNTL_DIR': 'cpld_control_p8', 'RESTART_N': '... | cpld_control_p8 | cpld_control_run.IN | |
| 3 | S2SW | FV3_GFS_v16_coupled_nsstNoahmpUGWPv1 | tests | cpld_mpi_p8 | {'CNTL_DIR': 'cpld_control_p8', 'RESTART_N': '... | cpld_control_p8 | cpld_control_run.IN | |
| 4 | S2SW | FV3_GFS_v16_coupled_nsstNoahmpUGWPv1 | tests | cpld_control_p7_rrtmgp | {'CNTL_DIR': 'cpld_control_p7_rrtmgp', 'RESTAR... | cpld_control_p7_rrtmgp | cpld_control_run.IN | cpld_control_rrtmgp.nml.IN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 599 | NG-GODAS | | tests | datm_cdeps_3072x1536_cfsr | {'CNTL_DIR': 'datm_cdeps_3072x1536_cfsr', 'DAT... | datm_cdeps_3072x1536_cfsr | cpld_datm_cdeps.IN | |
| 600 | NG-GODAS | | debug_tests | datm_cdeps_debug_cfsr | {'CNTL_DIR': 'datm_cdeps_debug_cfsr', 'DATM_SR... | datm_cdeps_debug_cfsr | cpld_datm_cdeps.IN | |
| 601 | ATMW | FV3_GFS_v16 | tests | control_atmwav | {'CNTL_DIR': 'control_atmwav', 'NPZ': '127', '... | control_atmwav | control_run.IN | control.nml.IN |
| 602 | ATMW | FV3_GFS_v16 | tests | control_c384gdas_wav | {'CNTL_DIR': 'control_c384gdas_wav', 'FHMAX': ... | control_c384gdas_wav | control_run.IN | control_gdas.nml.IN |


# Demo: Select UFS Application-to-Physics Suite Build & Regression Test of Interest 

In [4]:
# App-to-Physics combo to Test file dictionary
unique_app2phys = ('ATM', 'FV3_GFS_v16_RRTMGP')
test_type = 'debug_tests'
test_name = 'control_thompson_no_aero_debug'
fv3_file = appsphys2test_dict[unique_app2phys][test_type][test_name]['FV3_RUN']
parm_file = appsphys2test_dict[unique_app2phys][test_type][test_name]['INPUT_NML']

print("\nUFS APPLICATION-to-PHYSICS SUITE:\n", unique_app2phys)
print("\nTEST NAME:\n", test_name)
print("\nFV3:\n", fv3_file)
print("\nPARM:\n", parm_file)

# Parameters of the Regression Test of Interest.
print("\nParameters of the Given App-to-Physics Regression Test:\n", appsphys2test_dict[unique_app2phys][test_type][test_name])

# FV3 files dictionary associated w/ the Regression Test of Interest.
unique_fv3_file = fv3_file
print("\nFV3 Mentioned Files Being Linked, Synced, Copied:\n", input_data_dict['fv3_conf'][unique_fv3_file])

# Parm files associated w/ the Regression Test of Interest.
unique_parm_file = parm_file
print("\nParm Mentioned Files:\n", input_data_dict['parm'][unique_parm_file])


UFS APPLICATION-to-PHYSICS SUITE:
 ('ATM', 'FV3_GFS_v16_RRTMGP')

TEST NAME:
 control_thompson_no_aero_debug

FV3:
 control_run.IN

PARM:
 control_thompson.nml.IN

Parameters of the Given App-to-Physics Regression Test:
 {'CNTL_DIR': 'control_thompson_no_aero_debug', 'NPZ': '127', 'NPZP': '128', 'DT_ATMOS': '600', 'SYEAR': '2021', 'SMONTH': '03', 'SDAY': '22', 'SHOUR': '06', 'OUTPUT_GRID': "'gaussian_grid'", 'NSTF_NAME': "'2,0,0,0,0'", 'FHMAX': '1', 'OUTPUT_FH': '0 1', 'IMP_PHYSICS': '8', 'DNATS': '0', 'DO_SAT_ADJ': '.false.', 'LRADAR': '.true.', 'NSRADAR_RESET': '3600.0', 'LTAEROSOL': '.false.', 'HYBEDMF': '.false.', 'SATMEDMF': '.true.', 'DO_MYNNEDMF': '.false.', 'IMFSHALCNV': '2', 'IMFDEEPCNV': '2', 'IAER': '5111', 'ICLIQ_SW': '2', 'IOVR': '3', 'LHEATSTRG': '.true.', 'DO_TOFD': '.T', 'FV3_RUN': 'control_run.IN', 'CCPP_SUITE': 'FV3_GFS_v16_thompson', 'INPUT_NML': 'control_thompson.nml.IN', 'FIELD_TABLE': 'field_table_thompson_noaero_tke', 'Test Info': ['export_fv3']}

FV3 Mentioned 