In [None]:
#%load_ext autoreload
#%autoreload 2

<a name="top"></a>
# <center>Pegasus WMS Workflow MATLAB Example</center>

## Abstract

- This Jupyter Notebook tool provides a template for running a Pegasus Workflow Management System (WMS) workflow, comprising MATLAB executables, on UB-HPC, the generally accessible high performance compute cluster of the University at Buffalo (UB) Center for Computational Research (CCR).

- This tool's repository is located at https://github.com/GhubGateway/Ghub_Pegasus_WMS_MATLAB_Example.

- The Ghub tool name for this template is ghubex3. The files provided by this template are specific to this tool. Update or replace them with files specific to your tool as required. See the `Create Your Tool On Ghub` section for more details.

## Overview

- Enter the latitude and longitude coordinates in decimal degrees. Click the `Run Workflow` button to run the workflow which converts the coordinates to UTM.

## User Guide

### [**Steps for using this tool**](#steps_for_using_this_tool)<br />

1. [Enter the Latitude and Longitude Coordinates in Decimal Degrees](#step_1)<br />
2. [Run the Workflow](#step_2)<br />
3. [View Workflow Progress](#step_3)<br />
4. [View Workflow Results](#step_4)<br />
5. [View Log Output](#step_5)<br />

### [**Create Your Tool On Ghub**](#createyourtool)<br />

### [**Background**](#background)<br />



In [None]:
# As of 03/2024, tested with the Jupyter Notebooks (202210) tool and the Python3 (ipykernel)

# Setup and preprocessing:

import sys
import os
import getpass
import platform
import shutil
import atexit
import math
import numpy as np
import pandas as pd
import time

import ipywidgets as widgets
from IPython.display import display, HTML, Markdown, clear_output, Image, Javascript
#import xml.etree.ElementTree as et

import hublib
#print (help(hublib))
import hublib.ui as ui
#print (help(ui))
import hublib.use
#print (help(hublib.use))

#print(sys.path)

# Set up the environment for this notebook

# Setup paths to executables
scriptpath = os.path.realpath(" ")
        
# Get the parent dirs
self_tooldir = os.path.dirname(scriptpath)

# Setup path to python and bash scripts
self_bindir = os.path.join(self_tooldir, "bin")

# Add to PYTHONPATH
sys.path.insert (1, self_bindir)

# Set up path to the current data directory
self_datadir = os.path.join(self_tooldir, "data")

# Set up path to the current doc directory
self_docdir = os.path.join(self_tooldir, "doc")

# Set up path to the current session directory
self_workingdir = os.getcwd()

# Set up path to the user's home directory
self_homedir = os.path.expanduser("~")

# Initialize the dated run directory.
# Workflow results are not available until after a workflow is executed via Pegasus and completes
self_rundir = ""

self_user = getpass.getuser()

# Configuration parameters

import Configuration as cfg

# Version of Pegasus
# Note: when switching the version of Pegasus, delete ~/.pegasus/workflow.db
%use pegasus-5.0.1
from launchWrapper import launchWrapper

np.set_printoptions(threshold=np.inf) 

self_log_filepath = os.path.join(self_workingdir, 'ghubex3_log_file.txt')
self_log_snapshot_filepath = os.path.join(self_workingdir, 'ghubex3_log_snapshot_file.txt')
self_log_backup_filepath = os.path.join(self_workingdir, 'ghubex3_log_backup_file.txt')

widget_border_style = '1px solid black'
widget_output_border_style = '1px solid black'

BOLD = '\033[1m'
SUCCESS = '\033[92m'
WARNING = '\033[93m'
FAIL = '\033[91m'
END = '\033[0m'

dropdown_str_width = 16

dropdown_width = '965px'
dropdown_height = '30px'
button_width = '250px'
button_height = '40px'
ui_string_width = '96.5%'
ui_dropdown_width = '96.2%'

# Clean up: remove files from the data/results folder and the bin/__pycache__ folder
def exit_handler():
    
    for file in os.listdir(self_workingdir):
        
        if os.path.isfile(file):
            if file.endswith(".txt"):
                # Keep the README, the workflow results and the current log file
                if file != "README.txt" and not file.endswith('utm.txt') and not file.endswith('deg.txt') \
                    and file != os.path.basename(self_log_filepath):
                    #print ("Deleting: %s\n" %file)
                    os.remove(file)
            elif file.endswith(".yml"):
                #print ("Deleting: %s\n" %file)
                os.remove(file)
            elif file.endswith(".stdout"):
                #print ("Deleting: %s\n" %file)
                os.remove(file)
            elif file.endswith(".stderr"):
                #print ("Deleting: %s\n" %file)
                os.remove(file)

    #dirpath = os.path.join(self_bindir, "__pycache__")
    #if (os.path.exists(dirpath)):
        #print ("Deleting: %s\n" %dirpath)
        #shutil.rmtree(dirpath)
        
    FH1.flush()
    FH1.close()

atexit.register(exit_handler);   

In [None]:
# prevent In[] and Out[] from displaying on left
#HTML('''
#<style>.prompt{width: 0px; min-width: 0px; visibility: collapse}</style>
#''')

In [None]:
#https://api.jquery.com/ready/
HTML('''
<script>
    function scroll_to_top() {
        Jupyter.notebook.scroll_to_top();
    } 
    $( window ).on( "load", scroll_to_top() );
</script>
''')

In [None]:
# Button styles
HTML('''
<style>.buttontextclass { color:black ; font-size:130%}</style>
''')

In [None]:
# Initialize

# Note: ~/.pegasus/workflow.db is not consistent between Pegasus 4.8.1 and Pegasus 5.0.1
pegasus_workflow_db_filepath = os.path.join(self_homedir, '.pegasus', 'workflow.db')
#print ('pegasus_workflow_db_filepath: ', pegasus_workflow_db_filepath)
if os.path.exists(pegasus_workflow_db_filepath):
    if os.path.exists(pegasus_workflow_db_filepath + '.save'):
        os.remove(pegasus_workflow_db_filepath + '.save')
    shutil.copy (pegasus_workflow_db_filepath, pegasus_workflow_db_filepath + '.save')
    os.remove(pegasus_workflow_db_filepath)
    
if os.path.exists(self_log_filepath):
    shutil.copy (self_log_filepath, self_log_backup_filepath)
    
FH1 = open(self_log_filepath, 'w')

show_log_output_button = widgets.Button(description="Show Log Output", disabled=False,\
    layout=widgets.Layout(width=button_width, height=button_height),\
    style= {'button_color':'lightgreen','font_weight':'bold'})

# Utility Functions

def log_info (message):
    if show_log_output_button.description == 'Hide Log Output': 
        with log_output:
            print (message)    
    FH1.write('%s\n' %message)
    FH1.flush()
        
def log_status (output_widget, message):
    
    with output_widget:
        print (message)
    log_info (message)
    
def log_success (output_widget, message):
    
    with output_widget:
        print ('%s%s%s' %(SUCCESS,message,END))
    log_info (message)
    
def log_warning (output_widget, message):
    
    with output_widget:
        print ('%s%s%s' %(WARNING,message,END))
    log_info (message)
    
def log_error (output_widget, message):
    
    with output_widget:
        print ('%s%s%s' %(FAIL,message,END))
    log_info (message)


if (1): #cfg.VERBOSE == True:
    
    log_info ('Operating System Platform: ' + platform.system() + ' ' + platform.release())
    log_info ('\n')

    log_info ('Environment:\n')
    log_info ('scriptpath: ' + scriptpath)
    log_info ('tooldir: ' + self_tooldir)
    log_info ('bindir: ' + self_bindir)
    log_info ('datadir: ' + self_datadir)
    log_info ('workingdir: ' + self_workingdir)
    log_info ('homedir: ' + self_homedir)
    log_info ('user: ' + self_user)
    log_info ('\n')
    
    #print (type(sys.path)) # <class 'list'>
    #print (sys.path)
    log_info ('sys.path: ' + ' '.join(str(path)+'\n' for path in sys.path))
    log_info ('\n')
    
    #print (type(os.environ["PATH"])) # <class 'str'>
    #print (os.environ["PATH"])
    log_info ('os.environ["PATH"]: ' + os.environ["PATH"])
    log_info ('\n')



In [None]:
environ = dict(os.environ)
#print (type(environ))
#print (environ)
key = 'SESSION'
if key in environ:
    session_num = str(environ[key])
else:
    session_num = 'session number unknown'
message = 'Ghub session number: ' + str(session_num)
#print ('%s%s%s' %(SUCCESS,message,END), flush=True)
log_info (message)


In [None]:
# #219F hex = #8607 decimal
# This also works for an up arrow: [$\tiny\uparrow$](#top)

<a name="step_1"></a>
## Step 1: Enter the Latitude and Longitude Coordinates [&#8607;](#top)

Enter the latitude and longitude coordinates in decimal degrees.

Enter Latitude.   [Degrees North -90 to 90]<br />
Enter Longitude.  [Degrees East -180 to 180]
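
As a quick sanity check on the values entered above, the sketch below validates the coordinate ranges and estimates the UTM zone that a coordinate pair falls in. It is an illustration only and does not reproduce the deg2utm MATLAB executable: the function names `validate_coordinates` and `utm_zone` are hypothetical, and the special zone exceptions around Norway and Svalbard are deliberately ignored.

```python
import math

# Latitude band letters from 80 degrees south to 84 degrees north.
# The letters I and O are skipped by convention.
BANDS = "CDEFGHJKLMNPQRSTUVWX"


def validate_coordinates(latitude, longitude):
    """Raise ValueError if the coordinates are outside the accepted ranges."""
    if not -90.0 <= latitude <= 90.0:
        raise ValueError("Latitude must be between -90 and 90 degrees north")
    if not -180.0 <= longitude <= 180.0:
        raise ValueError("Longitude must be between -180 and 180 degrees east")


def utm_zone(latitude, longitude):
    """Return (zone number, latitude band letter) for a coordinate pair.

    Hypothetical helper: standard 6-degree zones and 8-degree bands only,
    without the Norway and Svalbard exceptions.
    """
    validate_coordinates(latitude, longitude)
    if not -80.0 <= latitude <= 84.0:
        raise ValueError("UTM is only defined between 80 S and 84 N")
    # Zones are 6 degrees wide and numbered 1 to 60, starting at 180 W.
    number = min(int(math.floor((longitude + 180.0) / 6.0)) + 1, 60)
    # Bands are 8 degrees tall; the last band, X, is extended to 84 N.
    index = min(int(math.floor((latitude + 80.0) / 8.0)), len(BANDS) - 1)
    return number, BANDS[index]


# Default coordinates used by this notebook (Buffalo, NY).
print(utm_zone(42.886447, -78.878369))  # (17, 'T')
```

The zone reported by the workflow in utm.txt can be compared against this estimate, keeping in mind that the exact output format of the MATLAB executable may differ.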


In [None]:
# Default latitude and longitude to Buffalo, NY latitude and longitude coordinates.
#https://www.gps-latitude-longitude.com/gps-coordinates-of-buffalo-ny
# Also see: https://www.latlong.net/lat-long-utm.html
latitude = ui.Number(
    name = 'Latitude',
    description = 'Latitude [degrees north -90 to 90]',
    units = '',
    value = '42.886447',
    min = '-90.0',
    max = '90.0'
)
longitude = ui.Number(
    name = 'Longitude',
    description = 'Longitude [degrees east -180 to 180]',
    units = '',
    value = '-78.878369',
    min = '-180.0',
    max = '180.0'
)
coordinates_form = ui.Form([latitude,
             longitude], name = 'Coordinates')


In [None]:
display(coordinates_form)

In [None]:
 # Run Workflow

self_numsamples = 0

maxwalltime = ui.Number(
    name = 'Maximum Walltime',
    description = 'Maximum Walltime [min]',
    units = 'min',
    value = '10.0',
    min = '5.0',
    max = '60.0'
)

workflow_run_options_form = ui.Form([maxwalltime], name = 'Workflow Run Options')

def run_workflow(p):
    
    # print (p) #Button
    global self_workflow_succeeded
    self_workflow_succeeded = False
        
    workflow_progress.clear_output()
    workflow_results.clear_output()
        
    with workflow_progress:
        
        runWorkflowButton.disabled = True
        show_log_output_button.disabled = True
        
        start_time = time.time()

        try:
            
            self_workflow_results1_filepath = os.path.join(self_workingdir, "utm.txt")
            self_workflow_results2_filepath = os.path.join(self_workingdir, "deg.txt")

            # Remove results from a previous run so that stale files
            # are not mistaken for the results of this run
            for file in os.listdir(self_workingdir):
                if os.path.isfile(file):
                    if file.endswith('utm.txt') or file.endswith('deg.txt'):
                        os.remove(file)

            #Note: Workflow execution time depends on the current UB CCR workload.
            log_status (workflow_progress, "Pegasus workflow in progress. This will take approximately 15 minutes...")
            
            #'''
            launchWrapper (" ", \
                self_tooldir, self_bindir, self_datadir, self_workingdir, self_rundir, \
                latitude.value, longitude.value, int(maxwalltime.value))
            #'''
            
            log_status (workflow_progress, '\nWorkflow elapsed time: ' + str((time.time() - start_time)/60.0) + ' [min]\n')
            
            # Check if the results files were created and transferred from CCR 
            # to determine if workflow completed successfully

            if os.path.exists(self_workflow_results1_filepath) and os.path.exists(self_workflow_results2_filepath):

                log_status (workflow_progress, 'Workflow completed successfully\n')
                self_workflow_succeeded = True
                
                with workflow_results:
                    
                    print ('Workflow Results:\n')
        
                    print('UTM Coordinates:\n')
                    f = open(self_workflow_results1_filepath, 'r')
                    output = f.read()
                    f.close()
                    #print (type(output))
                    output = output.split('\n')
                    #print (type(output))
                    print ('x [UTM Easting]:  ' + output[0])
                    print ('y [UTM Northing]: ' + output[1])
                    print ('UTMZONE:          ' + output[2])
                    
                    print('\nLatitude and Longitude Coordinates for Verification Check:\n')
                    f = open(self_workflow_results2_filepath, 'r')
                    output = f.read()
                    f.close()
                    #print (type(output))
                    output = output.split('\n')
                    #print (type(output))
                    print ('Latitude  [Degrees North -90 to 90]:  ' + output[0])
                    print ('Longitude [Degrees East -180 to 180]: ' + output[1])

            else:

                log_error (workflow_progress, 'Workflow did not complete successfully')
                log_error (workflow_progress, '%s and/or %s not generated by the workflow\n' \
                       %(self_workflow_results1_filepath, self_workflow_results2_filepath))
                self_workflow_succeeded = False

                filepath = os.path.join(self_workingdir, 'pegasus.analysis')
                if (os.path.exists(filepath)):
                    log_info("pegasus.analysis:\n")
                    f = open(filepath, 'r')
                    output = f.read()
                    f.close()
                    log_info (output)
       
            finish_workflow_processing()
        
        except Exception as e:
        
            log_error (workflow_progress, 'Workflow Exception: %s\n' %str(e))
       
        runWorkflowButton.disabled = False
        show_log_output_button.disabled = False


# Abort
# Select Kernel Interrupt
#if self_tW.is_alive() == True:
   #self_tW.terminate()

runWorkflowButton = widgets.Button(description="Run Workflow", disabled=False,\
    layout=widgets.Layout(width=button_width, height=button_height),\
    style= {'button_color':'lightgreen','font_weight':'bold'})
runWorkflowButton.add_class("buttontextclass")
runWorkflowButton.on_click (run_workflow)
#help (runWorkflowButton)

# Note: See /apps/share64/debian7/anaconda/anaconda-6/lib/python3.7/site-packages/hublib/ui/pathselect.py,
# file property initialized to None, when a file is selected gets set to the selected file.


<a name="step_2"></a>
## Step 2: Run the Workflow [&#8607;](#top)

 Click the `Run Workflow` button to run the workflow which converts the coordinates from decimal degrees to UTM.
 
- The MATLAB executables are encapsulated as a workflow by the launchWrapper.py script in the tool's bin directory. The Pegasus Workflow Management System (WMS) automates and manages the execution of the workflow jobs, including staging the jobs, distributing the work, submitting the jobs to run in parallel on CCR's UB-HPC compute cluster, as well as handling data flow dependencies and overcoming job failures. See the `Background` section for more information on the Pegasus WMS.<br />

- The deg2utm MATLAB executable converts the coordinates to UTM and creates the file utm.txt. The utm2deg MATLAB executable reads utm.txt and converts the coordinates back to decimal degrees and creates the file deg.txt. The utm.txt and deg.txt files are returned from CCR and the results are displayed in the `View Workflow Results` section when the workflow completes. 

- You will receive an email when the workflow completes and the results are ready for review.<br />

- If an error is encountered while running the workflow, the cause of the error will be written to the log output file, ghubex3_log_file.txt. See the `View Log Output` section for more information.<br />
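
The round trip described above (decimal degrees to UTM and back) lends itself to a simple verification check. The sketch below assumes that deg.txt holds the latitude on its first line and the longitude on its second line as plain numbers, which matches how this notebook prints the file but is not guaranteed for every MATLAB output format. The helper names and the tolerance value are hypothetical.

```python
def parse_values(text, count):
    """Return the first `count` non-empty lines of a results file as floats."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < count:
        raise ValueError("Expected at least %d values, found %d" % (count, len(lines)))
    return [float(value) for value in lines[:count]]


def round_trip_ok(latitude, longitude, deg_text, tolerance=1e-4):
    """Check that the round-tripped coordinates match the entered ones.

    `tolerance` is in decimal degrees and is an assumed value, not one
    taken from the tool.
    """
    lat_out, lon_out = parse_values(deg_text, 2)
    return (abs(lat_out - latitude) <= tolerance
            and abs(lon_out - longitude) <= tolerance)


# Example with literal file contents standing in for deg.txt.
example_deg_text = "42.886447\n-78.878369\n"
print(round_trip_ok(42.886447, -78.878369, example_deg_text))  # True
```

In the notebook, the text passed to `round_trip_ok` would come from reading the returned deg.txt file in the session directory.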


In [None]:
display(workflow_run_options_form)
display(runWorkflowButton)

In [None]:
def send_user_email(workflow_succeeded):

    # Reference: JMS crevasseoib tool:
    # Fall back to the session number resolved earlier if SESSION is not set
    job_num = str(os.environ.get('SESSION', session_num))
    
    email_subject = 'ghubex3 session #' + session_num + '.'
    
    if workflow_succeeded:
        email_text = 'Your ghubex3 job is complete!\r'
        email_text = email_text+'\rOutput files can be accessed on theghub.org in the following directory:'
        email_text = email_text+'\r' + str(self_workingdir)
    else:
        email_text = 'ghubex3 job #' + str(job_num) + ' Failed.'
        email_text = email_text+'\rPlease check theghub.org for further information, in the directory:'
        email_text = email_text+'\r' + str(self_workingdir)        
        
    email_cmd = 'submit --progress silent mail2self -t "'+email_text+'" -s "'+email_subject+'"'
    
    # email debugging
    # submit blocks
    #start_time = time.time()
    os.system(email_cmd)
    #elapsed_time = time.time() - start_time
    #print ('email elapsed time: ', elapsed_time)
    
def finish_workflow_processing():
    
    try:

        log_info ('\nfinish_workflow_processing...')
        
        # ghub_exercise1-workflow.dax is created by Wrapper.py
        #filepath = os.path.join(self_workingdir, 'ghub_exercise1-workflow.dax')
        #if os.path.exists(filepath):
            #print ("Deleting: %s\n" %filepath)
            #os.remove(filepath)

        for file in os.listdir(self_workingdir):
            if os.path.isfile(file):
                if file.endswith('.stdout'):
                    #if file.startswith('matlab-'):
                        #log_info ('file ' + file + ':\n')
                        #f = open(file,'r')
                        #for line in f:
                            #log_info (line)
                        #f.close()
                    os.remove(file)
                    
        for file in os.listdir(self_workingdir):
            if os.path.isfile(file):
                if file.endswith('.stderr'):
                    if file.startswith('matlab-'):
                        log_info ('file ' + file + ':\n')
                        f = open(file,'r')
                        for line in f:
                            log_info (line)
                        f.close()
                    os.remove(file)
         
        filepath = os.path.join(self_workingdir, 'pegasus.analysis')
        if (os.path.exists(filepath)):
            filesize = os.path.getsize(filepath)
            log_info ('pegasus.analysis filesize: ' + str(filesize))
            log_info ('pegasus.analysis:\n')
            f = open(filepath, 'r')
            output = f.read()
            f.close()
            log_info (output)
            os.remove(filepath)
        
        filepath = os.path.join(self_workingdir, "pegasusstatus.txt")
        if os.path.exists(filepath):
            #print ("Deleting: %s\n" %filepath)
            os.remove(filepath)

        filepath = os.path.join(self_workingdir, "pegasusjobstats.csv")
        if os.path.exists(filepath):
            #print ("Deleting: %s\n" %filepath)
            os.remove(filepath)

        filepath = os.path.join(self_workingdir, "pegasussummary-time.csv")
        if os.path.exists(filepath):
            #print ("Deleting: %s\n" %filepath)
            os.remove(filepath)

        filepath = os.path.join(self_workingdir, "pegasussummary.csv")
        if os.path.exists(filepath):
            #print ("Deleting: %s\n" %filepath)
            os.remove(filepath)

        # send email to user
        send_user_email(self_workflow_succeeded)
        
        log_info ('finish_workflow_processing done.')
        
    except Exception as e:
        log_error (workflow_progress, "EXCEPTION: %s\n" % str(e))


<a name="step_3"></a>
## Step 3: View Workflow Progress [&#8607;](#top)


In [None]:
workflow_progress = widgets.Output(layout={'border': widget_output_border_style})
display(workflow_progress)

<a name="step_4"></a>
## Step 4: View Workflow Results [&#8607;](#top)


In [None]:
workflow_results = widgets.Output(layout={'border': widget_output_border_style})
display(workflow_results)

<a name="step_5"></a>
## Step 5: View Log Output [&#8607;](#top)

- If an error is encountered while running this tool,
the cause of the error will be written to the log output file, ghubex3_log_file.txt.

- Click the `Show Log Output` button to open the `Log Output` window and view the log output file.
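
When the log grows long, it can be convenient to look only at its most recent lines. The following is a minimal sketch of such a helper; `tail_log` is a hypothetical function that is not part of this tool, and the default of 20 lines is an arbitrary choice.

```python
from collections import deque


def tail_log(path, max_lines=20):
    """Return the last `max_lines` lines of a text file, without newlines."""
    with open(path, "r") as handle:
        # A bounded deque keeps only the most recent lines in memory.
        return [line.rstrip("\n") for line in deque(handle, maxlen=max_lines)]
```

For example, `tail_log('ghubex3_log_file.txt', 10)` would return the last ten lines of the log file when called from the session directory.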


In [None]:
def show_log_output(change):
    
    if os.path.exists(self_log_filepath):
            
        if show_log_output_button.description == 'Show Log Output':
        
            show_log_output_button.description = 'Hide Log Output'
        
            with log_output:
            
                if os.path.exists(self_log_filepath):
                    print("%s: \n\n" %self_log_filepath)
                    f = open(self_log_filepath,'r')
                    for line in f:
                        print(line.rstrip())
                    f.close()
                else:
                    log_error (log_output, '%s does not exist. Please contact us.' %self_log_filepath)
        else:
        
            show_log_output_button.description = 'Show Log Output'
            log_output.clear_output()
    else:
        log_error (log_output, '%s does not exist. Please contact us.' %self_log_filepath)

show_log_output_button.add_class("buttontextclass")
show_log_output_button.on_click(show_log_output)
display (show_log_output_button)

In [None]:
log_output = widgets.Output(layout={'border': widget_output_border_style})
display (log_output)

In [None]:
# Download from Ghub
#def flush_log_file():
    #FH1.flush()
#display(HTML('<h4>Download File: %s</h4>' %os.path.basename(self_log_filepath)))
#downloadTXTButton = hublib.ui.Download(os.path.relpath(self_log_filepath, os.getcwd()),
    #label = 'Download Log', style='success', icon='fa-arrow-circle-down', cb=flush_log_file)
#display(downloadTXTButton);

<a name="createyourtool"></a>
## Create Your Tool On Ghub [&#8607;](#top)

### Host GIT repository on HUB

Follow the instructions on the https://theghub.org/tools/create web page.  Enter a name for your tool. Select the Repository Host, Host GIT repository on HUB. Select the Publishing Option, Jupyter Notebook. 

Note: when a new tool is created you will receive an email with a link to the tool's status page. The tool's status page will allow you to let the Ghub administrators know when you are ready to update, install, approve or publish your tool.

Note: published tools are launched from the Ghub Dashboard's My Tools component.

### Update Your Tool

1) Launch the Workspace 10 Tool from the Ghub Dashboard's My Tools component and in an xterm terminal window enter:

	git clone https://github.com/GhubGateway/Ghub_Pegasus_WMS_MATLAB_Example ghubex3

	git clone https://theghub.org/tools/<your tool name>/git/<your tool name> <your tool name>

2) Copy ghubex3/matlabBuild.ipynb to \<your tool name\>/matlabBuild.ipynb.<br />
3) Copy template files from the ghubex3 src, bin and remotebin directories to your tool's src, bin and remotebin directories.<br />
4) Update / replace the scripts in your tool's src directory with the scripts required for your tool.<br />
5) Update the matlabBuild.sh script in your tool's remotebin directory with the script required for your tool.<br />
6) Update the launchWrapper.py script in your tool's bin directory with the script required to plan the workflow for your tool. See Ghub_Pegasus_WMS_Workflow_MATLAB_Example.pdf in the ghubex3 doc directory for more information on Ghub Pegasus WMS workflows.<br />

### Compile the MATLAB Executables for Your Tool

1) Launch the Jupyter Notebooks (202210) tool from the Ghub Dashboard's My Tools component and open the \<your tool name\>/matlabBuild.ipynb Jupyter Notebook. See [Compile MATLAB Executables](matlabBuild.ipynb) for the version of matlabBuild.ipynb used to build the MATLAB executables for the ghubex3 tool.<br />
2) Update the self_binfiles list in the matlabBuild.ipynb notebook with the MATLAB executables required for your tool.<br />
3) Save the notebook update.<br />
4) Click the Appmode button.<br />
5) Click the Run Workflow button to compile the MATLAB executables. The MATLAB scripts in your tool's src directory are compiled on CCR and the returned MATLAB executables are moved to your tool's bin directory.<br />

### Launch the MATLAB Executables for Your Tool

1) Launch the Jupyter Notebooks (202210) tool from the Ghub Dashboard's My Tools component and open the \<your tool name\>/\<your tool name\>.ipynb Jupyter Notebook.<br />
2) Update \<your tool name\>/\<your tool name\>.ipynb with the user interface required for your tool.<br />
3) Save the notebook updates.<br />
4) Click the Appmode button.<br />
5) Click the Run Workflow button to launch the MATLAB executables.<br />

### Commit Your Tool Updates

1) Enter `git add <file>` to stage a new file or an updated existing file.<br />
2) Enter `git commit -m "commit message"` to commit the staged changes with a description of your updates.<br />
3) Enter `git push origin master` to push your updates to the Git repository on Ghub.<br />



<a name="background"></a>
## Background [&#8607;](#top)

- Compiled MATLAB executables are encapsulated as a workflow. The Pegasus Workflow Management System (WMS) automates and manages the execution of the workflow jobs, including staging the jobs, distributing the work, submitting the jobs to run in parallel on CCR's UB-HPC compute cluster, as well as handling data flow dependencies and overcoming job failures. See https://pegasus.isi.edu/documentation/index.html for more information on the Pegasus Workflow Management System (WMS).

- The submit command enables Ghub users to execute code on CCR's UB-HPC compute cluster. See https://theghub.org/kb/development/using-submit for more information on the submit command. See https://help.hubzero.org/documentation/current/tooldevs/grid/pegasuswf for more information on submitting a pegasus-plan for a Pegasus WMS workflow.

- This Jupyter-based tool uses Python 3. See https://theghub.org/resources?alias=jupyterexamples for more information on developing Jupyter-based tools on Ghub.

- This tool is deployed on Debian 10 to run in Tool or App mode style. See https://theghub.org/kb/development/deploy-styles-for-jupyter-tools for more information on deploying Jupyter-based tools on Ghub.
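
To illustrate what handling data flow dependencies means for this tool's two jobs, the sketch below orders jobs so that each one runs only after the jobs that produce its input files. It is plain Python and does not use the Pegasus API; Pegasus derives this ordering itself when the workflow is planned. The `order_jobs` function is hypothetical, and the empty input list for deg2utm is an assumption, since its coordinates are passed in by the notebook.

```python
def order_jobs(jobs):
    """Return job names in an order that respects file dependencies.

    `jobs` maps a job name to a dict with "inputs" and "outputs" file lists.
    """
    # Map each output file to the job that produces it.
    producer = {name_of_file: job_name
                for job_name, job in jobs.items()
                for name_of_file in job["outputs"]}
    # A job depends on every job that produces one of its input files.
    remaining = {job_name: {producer[f] for f in job["inputs"] if f in producer}
                 for job_name, job in jobs.items()}
    ordered = []
    while remaining:
        # Jobs whose dependencies have all been scheduled are ready to run.
        ready = sorted(name for name, deps in remaining.items()
                       if deps <= set(ordered))
        if not ready:
            raise ValueError("Cyclic dependency between jobs")
        ordered.extend(ready)
        for name in ready:
            del remaining[name]
    return ordered


jobs = {
    "utm2deg": {"inputs": ["utm.txt"], "outputs": ["deg.txt"]},
    "deg2utm": {"inputs": [], "outputs": ["utm.txt"]},
}
print(order_jobs(jobs))  # ['deg2utm', 'utm2deg']
```

Because utm2deg reads the utm.txt file written by deg2utm, the two jobs in this example run one after the other; jobs without shared files could run in parallel.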