# EPMT Analysis Example Notebook

This notebook shows how EPMT can be used to analyze the performance characteristics (metadata) of FRE postprocessing (`frepp`) jobs, as well as `refineDiag` and analysis scripts. It walks through API queries that extract and display performance-related quantities for a given FRE experiment. This introduction covers only a small subset of EPMT's analysis capabilities, since EPMT captures voluminous job-, process-, and thread-level data.

EPMT-gathered metadata is stored in a database, with access provided by the EPMT Query API. The API provides functions that can return usable data in various formats: `pandas.DataFrame` objects (easy graphing and inspection), Python `dict` objects (dictionary, unique key/value pairs), job number lists (`terse`; fast), and `orm` database objects (fast/powerful/flexible/lazy). Data in these formats can then be rendered into plots/graphs/figures using standard Python libraries (e.g. `matplotlib`). EPMT also comes with some graphing functions for immediate use out-of-the-box, available in the `ui/` directory. Some of these will be used in this notebook.

## Starting EPMT and opening this notebook
To begin with this notebook, open a terminal, `ssh` in to any JHAN-enabled analysis node (e.g. `an201`), and run:

```
module load epmt
epmt notebook -- --no-browser --ip=`hostname` --notebook-dir=/home/First.Last/path/to/your/notebooks
```
Above, the `--no-browser` option is required to keep a web browser from opening immediately over the `ssh` session, and the `--ip` option is helpful for printing a full URL that can be used to reach these notebooks from a web browser on a local machine. Navigate to the output URL in any browser. The page that comes up should list the `notebook-dir` directory specified above. To begin working through this notebook, click on `Analysis-Example.ipynb`.

### Further Info
Documentation of EPMT's Query and Graphics APIs can be found in the `Query-API` and `Graphing-API` notebooks, also in this repository.

## Import Relevant Python Modules
First, import the `epmt_query` and `ui.graphing` modules to access the commands for retrieving and plotting data. Importing the `pandas` package as well allows the use of `DataFrame` objects, a convenient format for data analysis. To import, click on the code cell below, then press `shift`+`enter` to execute the commands in the cell.

In [16]:
# import  epmt query and graphing modules
print('importing epmt_query')
import epmt_query as eq
print('importing ui.graphing')
import ui.graphing as gr

# import pandas. optional but helpful 'display.max_columns' arg shows all DataFrame columns when printing
print('importing pandas')
import pandas
pandas.set_option('display.max_columns', None)

importing epmt_query
importing ui.graphing
importing pandas


## Retrieving and Peeking at Job Metadata
For this example notebook, let's grab the metadata of 1000 jobs currently in the DB. Note that, in general, EPMT job information is not stored permanently, and old (>2 months) data are periodically removed.

To get the desired metadata through EPMT, we use the `eq` object's `get_jobs` function with a limit argument (`limit=1000`) and a desired output format (`fmt='dict'`). One can also restrict the retrieved jobs to those carrying a specific tag, such as an experiment name (e.g. `tags='exp_name:c96_am4p0'`).
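
To make the idea of a tag filter concrete, here is a minimal sketch in plain Python, run against made-up job records rather than the EPMT database. The helper names `parse_tag_string` and `filter_by_tags` and the sample data are hypothetical, and the `key:value;key:value` string layout is an assumption based on the tag examples in this notebook. It is not EPMT's actual implementation.

```python
def parse_tag_string(tag_string):
    """Turn 'key1:value1;key2:value2' into a dict (hypothetical helper)."""
    tags = {}
    for pair in tag_string.split(';'):
        pair = pair.strip()
        if not pair:
            continue  # tolerate trailing or doubled semicolons
        key, _, value = pair.partition(':')
        tags[key.strip()] = value.strip()
    return tags


def filter_by_tags(jobs, tag_string):
    """Keep jobs whose 'tags' dict contains every requested key/value pair."""
    wanted = parse_tag_string(tag_string)
    return [job for job in jobs
            if all(job.get('tags', {}).get(k) == v for k, v in wanted.items())]


# made-up records that mimic the fmt='dict' layout used in this notebook
sample_jobs = [
    {'jobid': '1', 'tags': {'exp_name': 'c96_am4p0', 'exp_component': 'atmos'}},
    {'jobid': '2', 'tags': {'exp_name': 'c96_am4p0', 'exp_component': 'ocean'}},
    {'jobid': '3', 'tags': {'exp_name': 'other_exp', 'exp_component': 'atmos'}},
]

matching = filter_by_tags(sample_jobs, 'exp_name:c96_am4p0;exp_component:atmos')
print([job['jobid'] for job in matching])
```

A job is kept only when every requested key/value pair is present in its tags, so adding more pairs to the string narrows the selection.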

<!-- 
For this example notebook, the metadata of a 2-year AM5 model run by user `Chris.Blanton`, with output corresponding to 
```
/home/Chris.Blanton/am5/2022.01/c96L33_am4p0/gfdl.ncrc4-intel21-prod-openmp/stdout/postProcess
```
is provided. Note that in general EPMT job information is not stored permanently, and old (>2 months) data are periodically removed. 

To get the metadata through EPMT, we use the `eq` object's `get_jobs` function with a experimental name tag (`tags='exp_name:c96_am4p0'`) and a desired output format (`fmt='dict'`). -->

<!-- freppscripts, refineDiag, and analysis -->
<!-- e.g. the pp jobs were tagged like (e.g. atmos 1980) -->
<!-- exp_component:atmos; -->
<!-- exp_name:c96L33_am4p0; -->
<!-- exp_time:19800101; -->
<!-- exp_platform:gfdl.ncrc4-intel21; -->
<!-- exp_target:prod-openmp; -->
<!-- exp_seg_months:12; -->
<!-- script_name:c96L33 -->
<!-- retrieve desired jobs to analyze -->
<!-- g_am4p0_atmos_19800101' -->
<!-- let's retrieve all jobs for one experiment -->
<!-- modify this to retrieve jobs for another experiment -->

In [23]:
# retrieve all jobs corresponding to below criteria/tag(s)
# potential criteria are before/after job start/end/creation times,
# potential tags are exp_component, exp_name, exp_time, exp_platform, and exp_target (i.e. command line args to frepp)
# potential output formats are dict, pandas, terse, and orm
jobs_all = eq.get_jobs(limit=1000, before=-20,
                       fmt='dict')
print(f'number of elements in jobs_all={len(jobs_all)}')
# peek at the first job's start time; slicing avoids an IndexError if no jobs came back
[job['start'] for job in jobs_all[:1]]

number of elements in jobs_all=0


[]

What does one recorded job's worth of metadata look like? We can print a single job entry in the usual way, but Python's `yaml` library makes the printout more digestible with little effort:

In [None]:
from yaml import dump
## large output warning! comment out the print below to skip it
print(dump(data=jobs_all[0], default_flow_style=False))
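
If PyYAML is not available, the standard-library `json` module gives a similar indented view. This is a minimal sketch on a made-up job record (the field names and values are illustrative, not real EPMT output). `default=str` is used on the assumption that job records may contain values, such as `datetime` objects, that JSON cannot serialize directly.

```python
import json
from datetime import datetime

# made-up record standing in for jobs_all[0]
sample_job = {
    'jobid': '12345',
    'start': datetime(2022, 1, 1, 12, 0, 0),
    'duration': 3600.0,
    'tags': {'exp_name': 'c96_am4p0', 'exp_component': 'atmos'},
}

# indent=2 puts one key per line; default=str converts non-JSON types to strings
pretty = json.dumps(sample_job, indent=2, sort_keys=True, default=str)
print(pretty)
```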

We can get a further idea of what kind of data is in these jobs by calling `get_job_tags`:

In [None]:
jobs_all_tags=eq.get_job_tags(jobs=jobs_all)
print(f'type(jobs_all_tags)={type(jobs_all_tags)}')
for key in jobs_all_tags:
    print(f'key={key}')

If we print a specific key in the `jobs_all_tags` dict, we see the values assigned to that key across jobs. The print statement below shows us that across these 1000 jobs, the `exp_platform` tag took on only six different values.

In [None]:
print(f'jobs_all_tags[exp_platform]=\n{jobs_all_tags["exp_platform"]}')
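
Conceptually, this kind of summary collects every value seen for each tag key across the jobs. The sketch below does that by hand on made-up records. `collect_tag_values` is a hypothetical helper, and the real `get_job_tags` may return its values in a different container or order.

```python
from collections import defaultdict


def collect_tag_values(jobs):
    """Map each tag key to the sorted list of distinct values seen across jobs."""
    values = defaultdict(set)
    for job in jobs:
        for key, value in job.get('tags', {}).items():
            values[key].add(value)
    return {key: sorted(vals) for key, vals in values.items()}


# made-up records; the platform and experiment names are illustrative
sample_jobs = [
    {'tags': {'exp_platform': 'gfdl.ncrc4-intel21', 'exp_name': 'expA'}},
    {'tags': {'exp_platform': 'gfdl.ncrc4-intel21', 'exp_name': 'expB'}},
    {'tags': {'exp_platform': 'gfdl.ncrc3-intel19', 'exp_name': 'expC'}},
]

tag_values = collect_tag_values(sample_jobs)
for key, vals in tag_values.items():
    print(f'{key}: {len(vals)} distinct value(s) -> {vals}')
```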

What about `exp_name`? In this case there are a lot of possible values, so this approach is not as helpful. It might be wise to loop over the retrieved jobs to get the info we need.

In [None]:
print(f'jobs_all_tags[exp_name]=\n{jobs_all_tags["exp_name"]}')

## Selecting Relevant Metadata 

Note that `get_jobs` is grabbing the oldest 1000 jobs *ingested by the DB*. To start picking at this, let's sort the jobs by experimental component (`exp_component`) while we exclude jobs from other users. One possible way of doing this is to loop over the dictionary data itself as below.

In [None]:
# username of the person whose jobs we'd like to analyze, if desired
name=None
#name='Ian.Laflotte'
### a convenient way to grab one's own username
##import os
##name = os.environ.get('USER')

# sort jobs into refineDiag (rd), analysis (ana) and postprocessing (pp) categories
jobs_rd = []
jobs_ana = []
jobs_pp = []

found_exp_components={}
found_exp_names={} # count times we find an exp_name

for job in jobs_all:
    
    # skip jobs that are not the user's
    if name is not None and job['user'] != name:
        continue
    
    # count how many times each exp_name appears
    job_exp_name=job['tags']['exp_name']
    found_exp_names[job_exp_name]=found_exp_names.get(job_exp_name,0)+1

    # count how many times each exp_component appears
    job_exp_component=job['tags']['exp_component']
    found_exp_components[job_exp_component]=found_exp_components.get(job_exp_component,0)+1

    # then, separate the jobs into pp, refineDiag, and analysis jobs
    if job_exp_component == 'refineDiag':
        jobs_rd.append(job)
    elif job_exp_component == 'analysis':
        jobs_ana.append(job)
    else:
        ## if desired, print out exp_component of jobs other than the two above
        #print('pp exp_component for this job is '+str(job['tags']['exp_component']))
        jobs_pp.append(job)

print(f'number of elements in jobs_rd ={len(jobs_rd )}')
print(f'number of elements in jobs_ana={len(jobs_ana)}')
print(f'number of elements in jobs_pp ={len(jobs_pp )}')
print('\n')

print(f'found {len(found_exp_components)} unique exp_components')
sorted_found_exp_components= sorted(found_exp_components.items(),key=lambda x:x[1],reverse=True)
print(f'top five:\n{sorted_found_exp_components[:5]}')
print('\n')

print(f'found {len(found_exp_names)} unique exp_names')
sorted_found_exp_names= sorted(found_exp_names.items(),key=lambda x:x[1],reverse=True)
print(f'top five:\n{sorted_found_exp_names[:5]}')
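
The same tallies can be written more compactly with `collections.Counter` from the standard library, whose `most_common` method returns the counts already sorted in descending order. A minimal sketch on made-up records (the experiment and component names are illustrative):

```python
from collections import Counter

# made-up records that mimic the fmt='dict' layout used in this notebook
sample_jobs = [
    {'tags': {'exp_name': 'expA', 'exp_component': 'atmos'}},
    {'tags': {'exp_name': 'expA', 'exp_component': 'refineDiag'}},
    {'tags': {'exp_name': 'expB', 'exp_component': 'atmos'}},
    {'tags': {'exp_name': 'expA', 'exp_component': 'analysis'}},
]

# one pass per tag; Counter handles first-seen and repeat values alike
name_counts = Counter(job['tags']['exp_name'] for job in sample_jobs)
component_counts = Counter(job['tags']['exp_component'] for job in sample_jobs)

print(f'found {len(name_counts)} unique exp_names')
print(f'top five:\n{name_counts.most_common(5)}')
print(f'found {len(component_counts)} unique exp_components')
print(f'top five:\n{component_counts.most_common(5)}')
```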

## Plotting/Graphing Job Metadata 1

Now the fun part! A convenient function for making quick plots is the `gr` object's `graph_experiment` function, whose code can be found in `ui/graphing.py`. `graph_experiment` calls `get_jobs` with the given experiment name, but only selects the job numbers that also belong to `jobs_pp` (or to the `rd` or `ana` jobs). It uses the `pandas.DataFrame` format to average job times across experiment components, and then displays the results as a bar graph. Run the cells below to show the wall-clock (`duration`) and CPU times for each group of jobs.

In [None]:
# look at the type+name of a more-frequent exp_name
choice=0
print(type(sorted_found_exp_names[choice]))
print(sorted_found_exp_names[choice])

# it's a tuple, grab only the string
target_exp_name=sorted_found_exp_names[choice][0]
print(type(target_exp_name))
print(target_exp_name)

# title for plots
base_title=f'Avg Wall / CPU Time, {target_exp_name}'
print(f'base_title={base_title}')

In [None]:
# average duration and cpu_time by experimental component for post-processing jobs
# note: if this doesn't work, it might be that target_exp_name has no postprocessing.
# in that case, set exp_name to a different one by changing the "choice" int in the
# previous cell, or just typing your desired exp_name in the call below
gr.graph_experiment(exp_name=target_exp_name, 
                    jobs=jobs_pp, 
                    metric=['duration','cpu_time'],
                    title=f'{base_title} post-processing')
                    

In [None]:
# average duration and cpu_time for refineDiag jobs
# note: if this doesn't work, it might be that target_exp_name has no jobs of this component.
# in that case, set exp_name to a different one by changing the "choice" int, or just typing
# your desired exp_name in the call below
gr.graph_experiment(exp_name=target_exp_name, 
                    jobs=jobs_rd, 
                    metric=['duration','cpu_time'],
                    title=f'{base_title} refineDiag')


In [None]:
# average duration and cpu_time for analysis jobs
# note: if this doesn't work, it might be that target_exp_name has no jobs of this component.
# in that case, set exp_name to a different one by changing the "choice" int, or just typing
# your desired exp_name in the call below
gr.graph_experiment(exp_name=target_exp_name, 
                    jobs=jobs_ana, 
                    metric=['duration','cpu_time'],
                    title=f'{base_title} analysis')
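
For readers who want to see the averaging step spelled out, the sketch below groups made-up job records by `exp_component` and averages `duration` and `cpu_time` in plain Python. It illustrates the idea only: it is not the code in `ui/graphing.py`, and the sample values and their units are invented.

```python
from collections import defaultdict
from statistics import mean

# made-up records; real jobs carry many more fields
sample_jobs = [
    {'duration': 100.0, 'cpu_time': 80.0, 'tags': {'exp_component': 'atmos'}},
    {'duration': 300.0, 'cpu_time': 120.0, 'tags': {'exp_component': 'atmos'}},
    {'duration': 50.0, 'cpu_time': 40.0, 'tags': {'exp_component': 'ocean'}},
]

# group the metric values by component
by_component = defaultdict(lambda: {'duration': [], 'cpu_time': []})
for job in sample_jobs:
    component = job['tags']['exp_component']
    for metric in ('duration', 'cpu_time'):
        by_component[component][metric].append(job[metric])

# average each metric within each component
averages = {
    component: {metric: mean(vals) for metric, vals in metrics.items()}
    for component, metrics in by_component.items()
}
for component, avg in averages.items():
    print(f"{component}: duration={avg['duration']}, cpu_time={avg['cpu_time']}")
```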
                   

## Plotting/Graphing Job Metadata 2

Another function for making quick plots is the `gr` object's `graph_components` function, the code of which can also be seen in file `ui/graphing.py`. This function will do no averaging of the times, instead sorting them in descending order by wall clock time and displaying the results across individual job IDs.

Below, the CPU time and duration are plotted for individual `refineDiag` jobs.

In [None]:
# duration and cpu_time for refineDiag jobs by job ID
# note: if this doesn't work, it might be that target_exp_name has no jobs of this component.
# in that case, set exp_name to a different one by changing the "choice" int, or just typing
# your desired exp_name in the call below
gr.graph_components(jobs=jobs_rd, 
                    exp_name=target_exp_name, 
                    exp_component='refineDiag', 
                    metric=['duration','cpu_time'])

Now we try to do the same for jobs with `exp_component='analysis'`.

In [None]:
# duration and cpu_time for analysis jobs by job ID
gr.graph_components(jobs=jobs_ana, 
                    exp_name=target_exp_name, 
                    exp_component='analysis', 
                    metric=['duration','cpu_time'])
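
The ordering that `graph_components` is described as applying, longest wall-clock time first with one entry per job ID, can be reproduced on made-up records with `sorted`. This is an illustrative sketch, not the plotting code itself, and the job IDs and timings are invented.

```python
# made-up records standing in for jobs_rd or jobs_ana
sample_jobs = [
    {'jobid': '101', 'duration': 120.0, 'cpu_time': 90.0},
    {'jobid': '102', 'duration': 450.0, 'cpu_time': 300.0},
    {'jobid': '103', 'duration': 60.0, 'cpu_time': 55.0},
]

# longest wall-clock time first, one row per job ID
ranked = sorted(sample_jobs, key=lambda job: job['duration'], reverse=True)
for job in ranked:
    print(f"{job['jobid']}: duration={job['duration']}, cpu_time={job['cpu_time']}")
```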
