In [None]:
!mkdir -p ~/agave/funwave-tvd-jenkins-pipeline

%cd ~/agave/funwave-tvd-jenkins-pipeline

!pip3 install --upgrade setvar

import re
import os
import sys
from setvar import *

# This cell enables inline plotting in the notebook
%matplotlib inline

import matplotlib
import numpy as np
import matplotlib.pyplot as plt

!auth-tokens-refresh

In [None]:
!echo ${AGAVE_STORAGE_SYSTEM_ID}

# Validating Your Build
* Automation is great, but things can quickly go wrong.
* This is why CI/CD strongly emphasizes good testing practices.
* Testing is central to CI/CD
  * It allows you to assess the viability of each commit
  * It determines whether or not code can be successfully integrated
  * It allows for code to be automatically deployed to prod, without the direct oversight of a tightly controlled group of developers.
* Production pipelines have multiple testing guards
  * Types of tests: Unit, Functional, Acceptance, Benchmarks, ...

In this notebook, we'll validate the performance of Funwave by running a strong scaling study. We've already collected a few points for the study with processor counts of 1, 2, and 4. We'll visualize these benchmarks with Matplotlib, run another benchmark with Jenkins to add more points, and finally plot our benchmarking results including the latest data.

# Visualize Initial Benchmarks

We've stored output files from FUNWAVE in /home/jovyan/notebooks/build/np_{1, 2, 4}. The output files are named DATE_COMMIT_RUN.out.

In [None]:
!ls /home/jovyan/notebooks/build/np_1

## Collecting data
The general flow for gathering and plotting our data is as follows:
1. Scan for output files to see what benchmarks have already been run
2. Collect data from each output file and store it into a dictionary
3. Plot data from the dictionary

We'll need a way to find out what benchmarks have already been run. get_xpoints takes a directory where Funwave output data is stored and returns the filenames of all the output files it finds. We've suffixed all our output files with .out so they're easy to find. We're also sorting these points by date then run number so they'll appear in order when we plot them.

In [None]:
def get_xpoints(directory):
    """Create a list of x-axis points to plot based on available runs
    Sorts by date then run number"""
    xpoints = []
    for filename in os.listdir(directory):
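        # Output files are suffixed with .out
        if filename.endswith(".out"):
            xpoints.append(filename)
    # Sort points by date then run number (compared numerically, so run 10 follows run 2)
    xpoints = sorted(xpoints, key=lambda x: (x.split("_")[0], int(x.split("_")[2].split(".")[0])))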
    return xpoints
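
The effect of the sort key is easiest to see on a few names. Below is a small sketch; the filenames are made up for illustration and only follow the DATE_COMMIT_RUN.out pattern, and `run_sort_key` is a hypothetical helper, not part of the notebook. It parses the run number as an integer, because comparing the raw text would place "10.out" before "2.out".

```python
def run_sort_key(filename):
    """Return (date, run_number) for a name like DATE_COMMIT_RUN.out."""
    stem = filename[:-len(".out")]       # drop the .out suffix
    date, commit, run = stem.split("_")  # assumes exactly three fields
    return (date, int(run))              # ISO dates sort correctly as text

# Hypothetical filenames, for illustration only
example_files = [
    "2018-10-23_d789c5d_10.out",
    "2018-10-22_a1b2c3d_1.out",
    "2018-10-23_d789c5d_2.out",
]
sorted_files = sorted(example_files, key=run_sort_key)
print(sorted_files)
```

This relies on dates being written as YYYY-MM-DD; other date formats would need to be parsed before sorting.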

The next step in gathering our data is to write a function to get timing results from Funwave output files. get_time_from_output takes a filename and returns a float of the simulation time if one exists.

In [None]:
# Takes an output filename and returns a simulation time
def get_time_from_output(output_filename):
    """output_filename: string of the output filename
    returns a float of simulation time if it exists,
    otherwise NaN so the results stay numeric and plottable"""

    with open(output_filename, 'r') as output:
        for line in output:
            if "simulation" in line.lower():
                # The time is the third whitespace-separated token on the line
                split_line = line.lower().split()
                return float(split_line[2])
    # The with block closes the file, so no explicit close() is needed
    print("No timing result found in " + output_filename)
    return float('nan')
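
As a quick check of the parsing logic, here is a self-contained sketch that writes a sample log to a temporary file and reads the time back. The log line format ("Simulation takes 57.45 seconds") is an assumption made for illustration, and `parse_simulation_time` is a hypothetical stand-in for the function above. The sketch returns NaN when no timing line is found, which keeps a list of timings numeric.

```python
import math
import os
import tempfile

def parse_simulation_time(path):
    """Return the third token of the first line containing 'simulation'
    as a float, or NaN if no such line exists."""
    with open(path, "r") as handle:
        for line in handle:
            if "simulation" in line.lower():
                return float(line.split()[2])
    return float("nan")

# Assumed log contents, for illustration only
sample_log = "Normal Termination!\nSimulation takes 57.45 seconds\n"
empty_log = "Normal Termination!\n"

with tempfile.TemporaryDirectory() as tmpdir:
    good_path = os.path.join(tmpdir, "good.out")
    bad_path = os.path.join(tmpdir, "bad.out")
    with open(good_path, "w") as handle:
        handle.write(sample_log)
    with open(bad_path, "w") as handle:
        handle.write(empty_log)
    parsed_time = parse_simulation_time(good_path)
    missing_time = parse_simulation_time(bad_path)

print(parsed_time, math.isnan(missing_time))
```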

In [None]:
def get_metadata_from_output(output):
    """output: list output from funwave split by lines
    ex: ["line1", "line2", ...]
    returns the date and commit of the run"""
    date = ''
    commit = ''
    for line in output:
        if "funwave run date" in line.lower():
            line = ' '.join(line.lower().split())
            split_line = line.split()
            date = split_line[-1]
        if "funwave commit hash" in line.lower():
            line = ' '.join(line.lower().split())
            split_line = line.split()
            commit = split_line[-1]
    return date + '_' + commit

Now that we know what benchmarks have been run and can also get timing data, we can store our data. gather_data uses the get_xpoints and get_time_from_output functions to store our benchmarking data into a dictionary.

In [None]:
def gather_data(directories):
    """Use get_time_from_output and get_xpoints to
    create results dictionary and x ticks list
    directories = list of directories to get data from ['np_1', 'np_2', ...]
    results: dict of floats containing simulation times from funwave
    ex: 
    results[1] = [57.45267612299358, 58.964640157995746, 57.09651633500471]
    results[2] = [16.213947223004652, 16.57105119198968, 15.723207671995624]
    results['xpoints'] = ['day1_commit1_run#', 'day2_commit2_run#', 'day3_commit2_run#']
    """  
    
    results = {}
    results['xpoints'] = get_xpoints(directories[0])
    
    for directory in directories:
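        # Parse the processor count from the directory name, e.g. 'np_4' -> 4.
        # Named nprocs to avoid shadowing the numpy import (np).
        nprocs = int(directory.rstrip('/').split('_')[-1])
        results[nprocs] = []
        for filename in results['xpoints']:
            output_file = os.path.join(directory, filename)
            results[nprocs].append(get_time_from_output(output_file))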
    print(results)
    return results
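
Once the timings are in this dictionary layout, the usual strong scaling metrics follow directly: speedup is the baseline time divided by the time on np processors, and efficiency is speedup divided by np (relative to the baseline). The sketch below is one way to compute them; `strong_scaling_metrics` is a hypothetical helper, and the numbers are the example values from the docstring above, not new measurements.

```python
def strong_scaling_metrics(results, run_index, baseline_np=1):
    """Return {np: (speedup, efficiency)} for one run.

    speedup(np)    = T(baseline_np) / T(np)
    efficiency(np) = speedup(np) * baseline_np / np
    """
    baseline_time = results[baseline_np][run_index]
    metrics = {}
    for nprocs, timings in results.items():
        if nprocs == 'xpoints':  # skip the list of run labels
            continue
        speedup = baseline_time / timings[run_index]
        efficiency = speedup * baseline_np / nprocs
        metrics[nprocs] = (speedup, efficiency)
    return metrics

# Example values copied from the gather_data docstring
example_results = {
    1: [57.45267612299358, 58.964640157995746, 57.09651633500471],
    2: [16.213947223004652, 16.57105119198968, 15.723207671995624],
    'xpoints': ['day1_commit1_run#', 'day2_commit2_run#', 'day3_commit2_run#'],
}
metrics = strong_scaling_metrics(example_results, run_index=0)
for nprocs, (speedup, efficiency) in sorted(metrics.items()):
    print("np={}: speedup={:.2f}, efficiency={:.2f}".format(nprocs, speedup, efficiency))
```

An efficiency near 1.0 indicates close to ideal scaling; values well below 1.0 suggest communication or serial overhead is limiting the run.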

We've implemented two plotting functions for our strong scaling study. The first function plots strong scaling results for a single commit.

In [None]:
def plot_funwave_single_commit(results, date_commit_run):
    """Make a strong scaling plot from a single run
    results: dict of floats containing simulation times from funwave
    ex: 
    results[1] = [57.45267612299358, 58.964640157995746, 57.09651633500471]
    results[2] = [16.213947223004652, 16.57105119198968, 15.723207671995624]
    results['xpoints'] = ['day1_commit1_run#', 'day2_commit2_run#', 'day3_commit2_run#']
    The lists in results should be the same length!
    """
    
    np_list = [1, 2, 4]
    index = results['xpoints'].index(date_commit_run + '.out')
    timings = []
    
    for np in np_list:
        timings.append(results[np][index])
        
    plt.plot(np_list, timings, marker='o', markersize=12, linewidth=2)
    plt.title("Funwave Strong Scaling for {date_commit}".format(date_commit=date_commit_run))
    plt.ylabel('Total Simulation Time (s)')
    plt.xlabel('NP')

We can track the performance of our application over time with plot_funwave_over_time. The x-axis of this plot is each unique run of our benchmark. The y-axis is the total simulation time. We structured the data in the dictionary such that creating a plot over time doesn't require much data manipulation.

In [None]:
def plot_funwave_over_time(results):
    """Plots strong scaling results over time
    results: dict of floats containing simulation times from funwave
    ex: 
    results[1] = [57.45267612299358, 58.964640157995746, 57.09651633500471]
    results[2] = [16.213947223004652, 16.57105119198968, 15.723207671995624]
    results['xpoints'] = ['day1_commit1_run#', 'day2_commit2_run#', 'day3_commit2_run#']
    The lists in results should be the same length!
    """
    fig, ax = plt.subplots()
    
    # Format x labels by stripping the .out suffix
    x_labels = [x.split('.')[0] for x in results['xpoints']]
    for np in [1, 2, 4]:
        plt.plot(x_labels, results[np], marker='o', markersize=12, linewidth=2, label=np)

    plt.title('Funwave Strong Scaling Over Time')
    plt.ylabel('Total Simulation Time (s)')
    plt.xlabel('Date_Commit_Run#')
    plt.xticks(rotation=30, horizontalalignment='right')
    plt.legend(loc='upper left', title="NP", bbox_to_anchor=(1,1))
    plt.show()

## Putting the visualization together
We can now plot our results in a few steps:
* Create a list of directories that the 'gather_data' function needs to find the data.
* Plot the data with 'plot_funwave_over_time' or 'plot_funwave_single_commit'. Both take the dictionary returned by 'gather_data' as input; the single-commit plot also takes the name of the run to plot.

In [None]:
directories = ['/home/jovyan/notebooks/build/' + np for np in ['np_1', 'np_2', 'np_4']]
results = gather_data(directories)

In [None]:
plot_funwave_over_time(results)

In [None]:
plot_funwave_single_commit(results, '2018-10-23_d789c5d_2')

## Adding a Benchmark
* Let's add a simple benchmark to validate performance after each build.
* We'll make a new feature branch for this benchmark

In [None]:
!ssh sandbox "cd ~/FUNWAVE-TVD && git checkout -b benchmark"

* We're going to need new input files in order to run a strong scaling study

In [None]:
script_content = """
VERSION=$(cat version.txt | paste -sd "." -)

export BASE_DIR=$PWD 
for np in {1,2,4}; do
  cd ${BASE_DIR}/np_${np}
  docker run funwave-tvd:\${VERSION} mpirun -np ${np} /home/install/FUNWAVE-TVD/src/funwave_vessel
done
"""

with open('funwave-benchmark-wrapper.txt', 'w') as benchmark_wrapper:
    benchmark_wrapper.write(script_content)


!files-upload -S ${AGAVE_STORAGE_SYSTEM_ID} -F funwave-benchmark-wrapper.txt /home/jovyan/FUNWAVE-TVD/build/

In [None]:
writefile("funwave-benchmark-app.txt","""
{  
   "name":"${AGAVE_USERNAME}-${MACHINE_NAME}-funwave-dbenchmark",
   "version":"1.0",
   "label":"Benchmarks the funwave docker image",
   "shortDescription":"Funwave docker benchmark",
   "longDescription":"",
   "deploymentSystem":"${AGAVE_STORAGE_SYSTEM_ID}",
   "deploymentPath":"/home/jovyan/FUNWAVE-TVD/",
   "templatePath":"build/funwave-benchmark-wrapper.txt",
   "testPath":"version.txt",
   "executionSystem":"${AGAVE_EXECUTION_SYSTEM_ID}",
   "executionType":"CLI",
   "parallelism":"SERIAL",
   "modules":[],
   "inputs":[],
   "parameters":[{
     "id" : "code_version",
     "value" : {
       "visible":true,
       "required":true,
       "type":"string",
       "order":0,
       "enquote":false,
       "default":"latest"
     },
     "details":{
         "label": "Version of the code",
         "description": "Version of the code to benchmark",
         "argument": null,
         "showArgument": false,
         "repeatArgument": false
     },
     "semantics":{
         "argument": null,
         "showArgument": false,
         "repeatArgument": false
     }
   }],
   "outputs":[]
}
""")
!files-upload -S ${AGAVE_STORAGE_SYSTEM_ID} -F funwave-benchmark-app.txt /home/jovyan/FUNWAVE-TVD/build/

# Automating Our Benchmark

## Parameterized Builds

## Triggering Pipelines From Other Pipelines

# DRYing Out Jenkinsfiles In Production
Both our build and benchmark Jenkinsfiles follow the same basic workflow _(checkout code, setup Agave CLI, build and submit job to execution environment)_, which has resulted in a lot of duplicated code that varies only by a keyword. For the sake of maintainability, we should find a way to remove this duplication from our pipeline.

## Jenkins Shared Libraries
A [Jenkins Shared Library](https://jenkins.io/doc/book/pipeline/shared-libraries/) is a repository of code that is centrally managed, and made accessible to multiple pipelines. These pipelines can call shared libraries as functions, and pass parameters to specify build options.

# Commit Your Benchmark, Watch It Run
* Let's merge our benchmark back into the `dev` branch and watch it run!

In [None]:
!ssh sandbox "set -x && cd ~/FUNWAVE-TVD && git add build"
!ssh sandbox "cd ~/FUNWAVE-TVD && git commit -m 'Added benchmark app.'"

In [None]:
!ssh sandbox "cd ~/FUNWAVE-TVD && git checkout dev && git merge --squash benchmark"

## Adding our new data to the plot
Now that we've run our benchmark, it's time to replot our data and see what has changed. 

In [None]:
directories = ['/home/jovyan/notebooks/build/' + np for np in ['np_1', 'np_2', 'np_4']]
results = gather_data(directories)
plot_funwave_over_time(results)
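
Beyond reading the plot, the same dictionary can be checked automatically for slowdowns. The sketch below compares the latest run with the previous one for each processor count. `find_regressions` is a hypothetical helper, the 10% tolerance is an arbitrary assumption, and the timings are made up for illustration.

```python
def find_regressions(results, tolerance=0.10):
    """Compare the latest run with the previous one for each processor count.

    Returns {np: relative_change} for every np whose latest time is more
    than `tolerance` (a fraction) slower than the previous run.
    """
    regressions = {}
    for nprocs, timings in results.items():
        # Skip the label list and any series too short to compare
        if nprocs == 'xpoints' or len(timings) < 2:
            continue
        previous, latest = timings[-2], timings[-1]
        change = (latest - previous) / previous
        if change > tolerance:
            regressions[nprocs] = change
    return regressions

# Made-up timings, for illustration only
example_results = {
    1: [57.0, 58.0, 70.0],
    2: [30.0, 31.0, 31.5],
    'xpoints': ['run1.out', 'run2.out', 'run3.out'],
}
regressions = find_regressions(example_results)
for nprocs, change in regressions.items():
    print("np={} slowed down by {:.1%}".format(nprocs, change))
```

A check like this could fail the build when the returned dictionary is not empty, though a single pair of runs is noisy; averaging several runs before comparing would be more robust.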

## What's left?
* Automating our plotting process

The plotting process shown above can be added to Jenkins if desired. Ideally, these functions would be added to *Jenkins shared libraries* so that our pipeline is readable and steps can be easily swapped out or modified. 

* Making the benchmarks easily reproducible

The benchmarks shown above are close to reproducible. However, we can still add metadata to make them more meaningful. The most important metadata we've left out is *where* the benchmarks were run. 
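
As a starting point for recording where a benchmark ran, here is a minimal sketch using only the Python standard library. `collect_run_metadata` and its field names are assumptions chosen for illustration, not part of the existing pipeline.

```python
import json
import os
import platform
from datetime import datetime, timezone

def collect_run_metadata():
    """Gather basic facts about the machine running the benchmark."""
    return {
        "hostname": platform.node(),
        "machine": platform.machine(),    # hardware architecture string
        "system": platform.system(),      # operating system name
        "release": platform.release(),    # OS or kernel release
        "cpu_count": os.cpu_count(),      # logical CPUs visible to Python
        "python_version": platform.python_version(),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }

metadata = collect_run_metadata()
print(json.dumps(metadata, indent=2, sort_keys=True))
```

Note that inside a Docker container the hostname is typically the container ID rather than the host's name, so the execution system ID may be a more useful label to store alongside these fields.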
