# Practicing Good Enough Software and Change Management Practices
 


In [198]:
!mkdir -p ~/agave

%cd ~/agave

import re
import os
import sys
import json
from setvar import *
from time import sleep

loadvar()
!auth-tokens-refresh || auth-tokens-create

import runagavecmd as r # created in notebook "05 - Hands on with The Agave CLI.ipynb"
import importlib
importlib.reload(r)

%load_ext rpy2.ipython

/home/jovyan/agave
AGAVE_APP_DEPLOYMENT_PATH=agave-deploy
AGAVE_APP_NAME=training-dooley
AGAVE_EXECUTION_SYSTEM_ID=sandbox-exec-dooley
AGAVE_STORAGE_HOME_DIR=/home/jovyan
AGAVE_STORAGE_SYSTEM_ID=sandbox-storage-dooley
AGAVE_STORAGE_WORK_DIR=/home/jovyan
AGAVE_SYSTEM_SITE_DOMAIN=localdomain
CMD=jobs-output-get 4347833118371933720-242ac114-0001-007 fork-command-1.out
INPUTS={}
JOB_FILE=job-remote-5058.txt
JOB_ID=Invalid Credentials
MACHINE_NAME=sandbox
MACHINE_USERNAME=jovyan
OUTPUT=Invalid Credentials
PBTOK=
REMOTE_COMMAND=ls /usr/install
REQUESTBIN_URL=https://requestbin.agaveapi.co/1or1rh91
SCRATCH_DIR=/home/jovyan
STAT=Invalid Credentials
VM_IPADDRESS=52.15.62.13
Token for sandbox:dooley successfully refreshed and cached for 14400 seconds
f07b3ea320341d27334e602b13f61473
The rpy2.ipython extension is alre

## Software  



> Place a brief explanatory comment at the start of every program  


Our previous notebook walked us through building and testing the FUNWAVE-TVD app a couple of different ways. We can take a further step towards improving the user experience by including a static test dataset with the source code, so every user can run the exact same build, test, and validation commands and be confident that their results match the expected ones. This lines up favorably with the Good Enough recommendation to:  

> Provide a simple example or test data set
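
A test dataset is most useful when it comes with a record of what the correct output looks like. One lightweight option is to store a reference SHA-256 digest of the expected output next to the data and compare each new run against it. The sketch below is our own illustration (the helper names and the idea of a stored reference digest are assumptions, not part of FUNWAVE-TVD):

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_reference(output_path, reference_digest):
    """True if the file's digest equals a stored reference digest (case-insensitive)."""
    return sha256_of(output_path) == reference_digest.strip().lower()
```

Bit-for-bit checksums are strict: a different compiler or optimization level can change the last digits of floating-point output, so for numerical results a tolerance-based comparison of the values may be the better check.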

Let's take a moment to add the `input.txt` file we used in our previous runs to the application repository on our sandbox.

In [88]:
%%bash
files-mkdir -N "data" -S $AGAVE_STORAGE_SYSTEM_ID   ./FUNWAVE-TVD
files-copy -D "FUNWAVE-TVD/data/input.txt"  -S $AGAVE_STORAGE_SYSTEM_ID  ./input.txt

Successfully created folder data
Successfully copied ./input.txt to FUNWAVE-TVD/data/input.txt


### Sample data

We will also add it to our Agave app deployment directory so that it is available wherever the application is.

In [105]:
%%bash
files-copy -D "$AGAVE_APP_DEPLOYMENT_PATH/input.txt"  -S $AGAVE_STORAGE_SYSTEM_ID ./input.txt

Successfully copied ./input.txt to agave-deploy/input.txt


Now that we have a sample dataset, let's update our app definition to include the sample dataset as the default input. In doing so, we can guarantee that users have a predictable experience the first time they run our app.

In [106]:
# read old app definition
with open("fork-app.txt") as fh:
    appJson = json.load(fh)

# point the default value of our `datafile` input at the sample data in the deployment directory
appJson['inputs'][0]['value']['default'] = "agave://{}/{}/input.txt".format(appJson['deploymentSystem'], appJson['deploymentPath'])

# add some semantic info about the file
appJson['inputs'][0]['semantics'] = {'ontology': ["text"]}

# save the definition to a new file
with open("fork-app-with-default-input.json", "w") as fh:
    json.dump(appJson, fh, indent=2)

The new input definition for our app now has the new default value and ontological information.

In [107]:
print(json.dumps(appJson['inputs'][0],indent=2))

{
  "id": "datafile",
  "details": {
    "label": "Data file",
    "description": "",
    "argument": null,
    "showArgument": false
  },
  "value": {
    "default": "agave://sandbox-storage-dooley/agave-deploy/input.txt",
    "order": 0,
    "required": false,
    "validator": "",
    "visible": true
  },
  "semantics": {
    "ontology": [
      "text"
    ]
  }
}


### Publishing

At some point you will want to publish the results of your research. At that time, you will need to provide references to your data, code, and results. The Good Enough recommendation is to: 

> Submit code to a reputable DOI-issuing repository

Ironically, we are using GitHub for this tutorial, and GitHub does not currently issue DOIs. There are, however, plenty of other options, including one recommended by the GitHub team itself.

* GitHub's guide [Making Your Code Citable](https://guides.github.com/activities/citable-code/) shows how to get a free DOI for your GitHub repository from [Zenodo](https://zenodo.org/).  
* [Figshare](https://figshare.org) provides DOIs for any data hosted there. 
* [The Journal of Open Source Software](https://joss.theoj.org) provides free publishing and DOIs for software documented in the journal in the form of a short paper.
* [DataCite](https://datacite.org) provides search, discovery, and DOIs for published data.

One of the easiest ways to get started with these services is to add a [CodeMeta](https://codemeta.github.io/) file to your project, so it can be discovered independently of programming language, publishing service, and dependency-system syntax. We included a sample CodeMeta file for our FUNWAVE-TVD application in this repository. 

In [129]:
print(readfile("../notebooks/etc/codemeta.json"))

Reading file `../notebooks/etc/codemeta.json'
{
  "@context": "https://raw.githubusercontent.com/codemeta/codemeta/master/codemeta.jsonld",
  "@type": "SoftwareSourceCode",
  "identifier": "FUNWAVE-TVD",
  "description": "FUNWAVE–TVD is the TVD version of the fully nonlinear Boussinesq wave model (FUNWAVE) initially developed by Kirby et al. (1998)",
  "name": "FUNWAVE-TVD",
  "codeRepository": "https://github.com/fengyanshi/FUNWAVE-TVD",
  "issueTracker": "https://github.com/fengyanshi/FUNWAVE-TVD/issues",
  "license": "https://opensource.org/licenses/BSD-2-Clause",
  "version": "3.3",
  "programmingLanguage": {
    "@type": "ComputerLanguage",
    "name": "Fortrain",
    "version": "77",
    "url": "https://gcc.gnu.org/fortran/"
  },
  "runtimePlatform": "GNU Fortran 77",
  "author": [
    {
      "@type": "Person",
      "givenName": "Fengyan",
      "familyName": "Shi",
      "email": "fyshi@udel.edu",
      "@id": "https://orcid.org/0000-0003-1568-2449"
    },
    {
      "@type":

Looking briefly at the format, you will notice that it is JSON with a good bit of linked data (JSON-LD) included. If you are familiar with Schema.org, you might recognize many of the fields as Schema.org types and properties. This definition is by no means complete; adding more fields is left as an exercise. You can consult the [CodeMeta User Guide](https://codemeta.github.io/terms/) for a full list of terms. 
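
Before reaching for a full validator, a quick presence check can catch obviously missing fields. This is a minimal Python sketch: the list of terms it looks for is our own illustrative choice rather than a CodeMeta requirement, and it does not check JSON-LD correctness the way a real validator does.

```python
import json

# Terms our example file is expected to carry. This list is an
# illustrative choice, not something the CodeMeta spec mandates.
RECOMMENDED_TERMS = ["name", "description", "codeRepository", "license", "version", "author"]


def missing_terms(codemeta_text, terms=RECOMMENDED_TERMS):
    """Return the terms that are absent or empty in a CodeMeta JSON document."""
    doc = json.loads(codemeta_text)
    if "@context" not in doc:
        raise ValueError("not a JSON-LD document: missing @context")
    return [term for term in terms if not doc.get(term)]
```

For example, `missing_terms(open("../notebooks/etc/codemeta.json").read())` would return an empty list when every listed term has a non-empty value.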

The [codemetar](https://ropensci.github.io/codemetar/) library for the R language provides several tools to help you author and validate CodeMeta files. Its validation tool is particularly helpful. Here is an example of its usage.

In [158]:
%%R

library(codemetar) 

codemeta_validate(codemeta = "../notebooks/etc/codemeta.json", context=NULL)

[1] TRUE


Let's upload the `codemeta.json` file to our repository directory to include with our source.

In [159]:
!files-upload --filetoupload=../notebooks/etc/codemeta.json  --systemid=$AGAVE_STORAGE_SYSTEM_ID  ./FUNWAVE-TVD

Uploading ../notebooks/etc/codemeta.json...
######################################################################## 100.0%


## Keeping Track of Changes  

Tracking how a piece of software changes over time can be challenging even for the people most invested in it. For users, it can be downright intimidating, bordering on impossible. The Good Enough recommendation says to:  

> Create, maintain, and use a checklist for saving and sharing changes to the project  

One way to do that is by keeping a changelog. Changelogs are structured text documents that describe the major, and sometimes minor, changes to a project over time and across releases. There are many ways to structure a changelog. For reasons we will quickly see, we recommend the format found at [keepachangelog.com](http://keepachangelog.com/). Their changelog format is a machine-parsable Markdown format which leverages semantic versioning. 

While we could write the changelog by hand, for existing or active projects this quickly becomes enough of a hassle that we stop keeping it up to date. Luckily, we can use tools such as [gitchangelog](https://github.com/vaab/gitchangelog) to generate and maintain the changelog from our commit history. The result is a file named `CHANGELOG.md` that we can add and commit to our repository.


In [165]:
# let's run the gitchangelog utility on the repository
r.runagavecmd('cp .gitchangelog.rc /home/jovyan/FUNWAVE-TVD/.gitchangelog.rc && ' + 
              'cd /home/jovyan/FUNWAVE-TVD && ' + 
              'gitchangelog | tee CHANGELOG.md', 
              "https://raw.githubusercontent.com/agavetraining/pearc18/master/etc/.gitchangelog.rc")

REMOTE_COMMAND=cp .gitchangelog.rc /home/jovyan/FUNWAVE-TVD/.gitchangelog.rc && cd /home/jovyan/FUNWAVE-TVD && gitchangelog | tee CHANGELOG.md
REQUESTBIN_URL=https://requestbin.agaveapi.co/1762h2g1

 ** QUERY STRING FOR REQUESTBIN **
https://requestbin.agaveapi.co/1762h2g1?inspect

INPUTS={"datafile":"https://raw.githubusercontent.com/agavetraining/pearc18/master/etc/.gitchangelog.rc"}
JOB_FILE=job-remote-5058.txt
Writing file `job-remote-5058.txt'
OUTPUT=Successfully submitted job 4347833118371933720-242ac114-0001-007
JOB_ID=4347833118371933720-242ac114-0001-007
STAT=PENDING
STAT=PROCESSING_INPUTS
STAT=STAGED
STAT=SUBMITTING
STAT=SUBMITTING
STAT=SUBMITTING
STAT=SUBMITTING
STAT=SUBMITTING
STAT=SUBMITTING
STAT=FINISHED
CMD=jobs-output-get 4347833118371933720-242ac114-0001-007 fork-command-1.out
All done! Output follows.
Reading file `fork-command-1.out'
# Changelog


## (unreleased)

### Other

* Added Onyx makefiles for sediment and vessel-sediment. [matt.malej@gmail.com]

* Block Dmas

Another Good Enough software recommendation is:

> Share changes frequently  

Having a changelog is much more useful when it is up to date. Rather than counting on ourselves and our colleagues to remember to rebuild it each time, let's set up a git hook that reruns `gitchangelog` and updates our changelog whenever we merge our develop branch back into master. The following script should handle that for us. 

```bash
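#!/bin/sh

# post-merge hook - regenerates CHANGELOG.md from the commit history
# whenever a merge (e.g. develop back into master) completes.

# To install, copy to your project's .git/hooks folder, and 
# `chmod +x post-merge`

# changed PATH_PREFIX: succeeds if any file whose path starts with
# PATH_PREFIX differs between the pre-merge commit (ORIG_HEAD) and HEAD.
changed() {
  git diff --name-only ORIG_HEAD HEAD | grep -q "^$1"
}

# Regenerate the changelog in the repository root
gitchangelog > CHANGELOG.md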
```
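
Installing a hook amounts to copying it into `.git/hooks` and marking it executable. For a local clone, the same two steps can be done from Python. This is a minimal sketch; `install_hook` is a hypothetical helper, not part of any Git tooling:

```python
import shutil
import stat
from pathlib import Path


def install_hook(script, repo_dir, name="post-merge"):
    """Copy a hook script into <repo_dir>/.git/hooks/<name> and make it executable."""
    hooks_dir = Path(repo_dir) / ".git" / "hooks"
    hooks_dir.mkdir(parents=True, exist_ok=True)
    target = hooks_dir / name
    shutil.copyfile(script, target)
    # Roughly `chmod +x`: add the execute bit for user, group, and others.
    mode = target.stat().st_mode
    target.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return target
```

For example, `install_hook("scripts/git_hooks/post-merge", "FUNWAVE-TVD")` would install the script into a local checkout of the repository.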

We can install the script on our sandbox by copying it to the repository's `.git/hooks` folder and assigning execute privileges.

In [None]:
%%bash

files-upload --filetoupload scripts/git_hooks/post-merge --systemid $AGAVE_STORAGE_SYSTEM_ID  "./FUNWAVE-TVD/.git/hooks"
ssh sandbox "chmod +x FUNWAVE-TVD/.git/hooks/post-merge"



## Manual Versioning  

The Good Enough recommendation is to:  

> Copy the entire project whenever a significant change has been made      

If your entire project, including data, is under version control, this may be overkill, since you can just as easily branch the repository. However, as we are seeing, our overall digital R&D lifecycle has other considerations upstream of managing the code itself. 
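
When a full copy is the right tool, for example to freeze the exact inputs behind a published figure, it helps to make the copies consistent. A minimal Python sketch; `snapshot_project` and its naming scheme are our own assumptions:

```python
import shutil
from datetime import datetime
from pathlib import Path


def snapshot_project(project_dir, snapshots_dir, label="", now=None):
    """Copy the whole project into a timestamped directory and return its path.

    The .git directory is skipped because the history is already versioned.
    Keep snapshots_dir outside project_dir so the copy does not copy itself.
    """
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    name = f"{Path(project_dir).name}-{stamp}" + (f"-{label}" if label else "")
    target = Path(snapshots_dir) / name
    # copytree refuses to overwrite an existing target, so old snapshots are safe.
    shutil.copytree(project_dir, target, ignore=shutil.ignore_patterns(".git"))
    return target
```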


## Maintaining your Agave app 

Keeping your Agave app in sync with the code it represents gives your users confidence that what they think they are running is what actually runs. Aside from the standard revision increment that happens whenever you update your app definition, additional information can be added to better inform users of its activity, utility, and reliability. 


### Updating your app definition  

The lowest-hanging fruit is simply to update your app definition. If any ontological terms, descriptions, etc. have changed, the `app.json` file should be updated to reflect the latest information. We can automate this with the same `post-merge` hook we used to update our changelog. 

Adding the following code to the end of that file will result in our app being updated whenever we merge into our master branch. 

```bash
...

AGAVE_APP_DEPLOYMENT_SYSTEM=$(jq -r '.deploymentSystem' app.json)
AGAVE_APP_DEPLOYMENT_PATH=$(jq -r '.deploymentPath' app.json)

# Re-upload and re-register the app only if app.json changed in this merge
if changed 'app.json'; then
  files-upload -F app.json -S "$AGAVE_APP_DEPLOYMENT_SYSTEM" "$AGAVE_APP_DEPLOYMENT_PATH"
  apps-addupdate -F app.json
fi

...

```  
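
The decision the hook makes can also be expressed in Python, which is handy for testing the logic without touching the Agave services. This sketch only builds the two command strings from the snippet above rather than running them; `plan_app_update` is a hypothetical helper, and its `startswith` test mirrors the prefix match in the hook's `changed` function:

```python
import json
import shlex


def plan_app_update(app_json_text, changed_paths):
    """Return the CLI commands the hook would run, or [] if app.json is unchanged."""
    app = json.loads(app_json_text)
    system = app["deploymentSystem"]  # same field the hook reads with jq
    path = app["deploymentPath"]
    if not any(p.startswith("app.json") for p in changed_paths):
        return []
    return [
        f"files-upload -F app.json -S {shlex.quote(system)} {shlex.quote(path)}",
        "apps-addupdate -F app.json",
    ]
```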


### Updating your app metadata

If you maintain additional metadata for your app, such as a CodeMeta, YAML, or CWL file, push any new or changed content to the app's metadata at this point as well. 

```bash
...

# Look up an existing codemeta file stored in the metadata API.
lookup_agave_codemeta_metadata_uuid() {
   metadata-list --query '{"name": "codemeta"}' -P 5547200605505711640-242ac117-0001-005  --filter=uuid -v | jq -r '.[] | .uuid' | grep -v "null" | head -n 1
}


AGAVE_APP_ID=$(jq -r '.name + "-" + .version' app.json)
AGAVE_APP_UUID=$(apps-list -v --filter=uuid "$AGAVE_APP_ID" | jq -r '.uuid')

# Push the codemeta definition up to the server whenever this changes
if changed 'codemeta.json'; then
  files-upload -F codemeta.json -S "$AGAVE_APP_DEPLOYMENT_SYSTEM" "$AGAVE_APP_DEPLOYMENT_PATH"

  # Lookup Agave metadata item for the codemeta definition if it exists
  AGAVE_CODEMETA_METADATA_ID=$(lookup_agave_codemeta_metadata_uuid)

  # Wrap codemeta.json in a metadata record associated with the app, then
  # update the entry. If no entry exists (empty ID), this creates one.
  jq --arg app_uuid "$AGAVE_APP_UUID" '{"name": "codemeta", "value": ., "associationIds": [$app_uuid]}' codemeta.json | metadata-addupdate -F - $AGAVE_CODEMETA_METADATA_ID

fi


...
```
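
The jq filter in that step simply wraps the CodeMeta document in a metadata record. The same transformation in Python, as a sketch assuming the Agave metadata field names `name`, `value`, and `associationIds` (`codemeta_metadata_record` is a hypothetical helper):

```python
import json


def codemeta_metadata_record(codemeta_text, app_uuid):
    """Wrap a CodeMeta document in a metadata record associated with an app UUID."""
    return {
        "name": "codemeta",                  # lookup key used by the hook's query
        "value": json.loads(codemeta_text),  # the CodeMeta document itself
        "associationIds": [app_uuid],        # ties the record to the app
    }
```

Serializing the result with `json.dumps` gives a document that could be piped to `metadata-addupdate -F -`, as the hook does.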


### Mirroring your app tags

The tags you assign to your app probably will not change very often, but if a new feature requires changing or reorganizing existing tags, make those updates at this point as well.  
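
Mirroring tags boils down to a set difference between the tags currently registered and the ones the new release should carry. A minimal sketch, assuming both are available as plain lists of strings (for example from the `tags` list of the old and new app definitions); `tag_changes` is a hypothetical helper:

```python
def tag_changes(registered, desired):
    """Return (to_add, to_remove) so the registered tags match the desired ones."""
    registered, desired = set(registered), set(desired)
    # Sorted lists keep the output stable for logs and tests.
    return sorted(desired - registered), sorted(registered - desired)
```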

### Deprecating your old app(s)

Some releases, such as rollbacks, yanked releases, and security patches, may justify deprecating or disabling a previous app version. This step is where that would happen.

### (Re)running benchmarks

Once your build, unit, and integration tests pass, your app and its data are updated, and your assets are published to their new location, benchmark jobs can be run to measure application performance before and after the release.

```bash
...

# Kick off the benchmark suites on every merge to master
AGAVE_JOB_TEMPLATE=$(jobs-template --cache --allfields "$AGAVE_APP_ID")
BENCHMARK_MERGE_REV=$(date +%s)

# Submit one job per sub-directory of benchmarks/
for dir in benchmarks/*/; do
  [ -d "$dir" ] || continue   # no benchmark directories: glob did not match
  i=$(basename "$dir")
  BENCHMARK_DIR="agave://$AGAVE_APP_DEPLOYMENT_SYSTEM/$(pwd)/benchmarks/$i"
  BENCHMARK_JOB_NAME="benchmark-$i"
  BENCHMARK_ARCHIVE_PATH="benchmarks/$BENCHMARK_MERGE_REV/$i"

  echo "$AGAVE_JOB_TEMPLATE" | \
    jq --arg job_archive_path "$BENCHMARK_ARCHIVE_PATH" \
       --arg job_name "$BENCHMARK_JOB_NAME" \
       --arg job_input_dir "$BENCHMARK_DIR" \
       '.name=$job_name | .inputs.datafile=$job_input_dir | .archive=true | .archivePath=$job_archive_path' | \
    jobs-submit -F -

done

...
```
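
The per-benchmark edits to the job template can also be sketched in Python, which makes them easy to test before anything is submitted. This sketch only builds the job requests; `benchmark_jobs` is a hypothetical helper, and `remote_dir` (the benchmarks path on the deployment system) is our own parameter rather than something the hook defines:

```python
import copy
import time
from pathlib import Path


def benchmark_jobs(template, deployment_system, local_dir, remote_dir, merge_rev=None):
    """Build one job request per benchmark sub-directory of local_dir.

    Applies the same edits as the jq filter in the hook: name,
    inputs.datafile, archive, and archivePath. Nothing is submitted.
    """
    if merge_rev is None:
        merge_rev = int(time.time())  # like `date +%s`
    jobs = []
    for bench in sorted(p for p in Path(local_dir).iterdir() if p.is_dir()):
        job = copy.deepcopy(template)  # never mutate the shared template
        job["name"] = f"benchmark-{bench.name}"
        job.setdefault("inputs", {})["datafile"] = f"agave://{deployment_system}/{remote_dir}/{bench.name}"
        job["archive"] = True
        job["archivePath"] = f"benchmarks/{merge_rev}/{bench.name}"
        jobs.append(job)
    return jobs
```

Each resulting dict could be serialized with `json.dumps` and piped to `jobs-submit -F -`, as the hook does.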


### Updating analytics and published data  

When your benchmarks complete, their results can be published along with other app assets as metadata, archived with the benchmark jobs themselves, or saved as part of a document kept with the code.  
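
A simple before/after summary is the relative change in mean runtime between releases. A minimal sketch, assuming you have already extracted wall-clock runtimes (in any consistent unit) from the benchmark jobs of each release:

```python
from statistics import mean


def percent_change(before_runtimes, after_runtimes):
    """Relative change in mean runtime between two releases, in percent.

    Negative values mean the new release is faster.
    """
    before, after = mean(before_runtimes), mean(after_runtimes)
    if before == 0:
        raise ValueError("baseline mean runtime must be non-zero")
    return 100.0 * (after - before) / before
```

With only a few runs per release, noise can easily exceed a few percent, so it is worth recording the spread of the runtimes alongside the mean.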

## Version Control Systems  

Let's just quickly acknowledge that we need to use some form of version control if we're going to talk about anything else related to proper software development in an open science community. The Good Enough recommendation says it all: 

> Use a version control system


Using a version control system does not mean everything should go into it. While GitHub, GitLab, and Bitbucket now offer large file support (for example through Git LFS), that does not mean we should commit large binary files without careful consideration.

> Consider what not to put under version control 

As a rule of thumb, lean away from committing large binary files and anything that doesn't lend itself well to text diffs.
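
That rule of thumb can be turned into a quick audit of a working tree. A minimal Python sketch that flags files that are large or look binary; the 10 MB default threshold is an arbitrary assumption, and the NUL-byte test is a common heuristic (Git uses a similar one), not a guarantee:

```python
from pathlib import Path


def looks_binary(path, sniff_bytes=8192):
    """Heuristic: a NUL byte near the start of a file suggests binary content."""
    with open(path, "rb") as fh:
        return b"\0" in fh.read(sniff_bytes)


def skip_candidates(root, max_bytes=10 * 1024 * 1024):
    """List files under root that are large or binary (candidates for .gitignore)."""
    flagged = []
    for path in sorted(Path(root).rglob("*")):
        if ".git" in path.parts or not path.is_file():
            continue  # ignore Git's own files and directories
        if path.stat().st_size > max_bytes or looks_binary(path):
            flagged.append(path.relative_to(root).as_posix())
    return flagged
```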



Before we finish this notebook, let's go ahead and commit the changes to our repository.

In [None]:
# Alternative: remote in and commit the directory over SSH (left commented out)
#ssh sandbox "cd FUNWAVE-TVD && git add -A data && git commit -m 'Adding example dataset' ."

Finally, we will add the new files to version control.

In [123]:
r.runagavecmd('cd /home/jovyan/FUNWAVE-TVD && ' + 
              'git status && ' + 
              'git add -A data CHANGELOG.md codemeta.json && ' + 
              'git status  && ' + 
              'git commit -m "Adding example dataset, changelog, and codemeta file"')

REMOTE_COMMAND=cd /home/jovyan/FUNWAVE-TVD && git add -A data CHANGELOG.md codemeta.json
REQUESTBIN_URL=https://requestbin.agaveapi.co/s8ono0s8

 ** QUERY STRING FOR REQUESTBIN **
https://requestbin.agaveapi.co/s8ono0s8?inspect

INPUTS={}
JOB_FILE=job-remote-5058.txt
Writing file `job-remote-5058.txt'
OUTPUT=Successfully submitted job 1416183485880209896-242ac114-0001-007
JOB_ID=1416183485880209896-242ac114-0001-007
STAT=PENDING
STAT=STAGED
STAT=SUBMITTING
STAT=SUBMITTING
STAT=SUBMITTING
STAT=SUBMITTING
STAT=SUBMITTING
STAT=SUBMITTING
STAT=CLEANING_UP
STAT=FINISHED
CMD=jobs-output-get 1416183485880209896-242ac114-0001-007 fork-command-1.out
All done! Output follows.
Reading file `fork-command-1.out'



If everything succeeded, you should see your version number incremented and a build job now running in your job history.

In [199]:
# WITH the singularity check in the user's .profile and .bashrc files
r.runagavecmd("ls /usr/install")

REMOTE_COMMAND=ls /usr/install
REQUESTBIN_URL=https://requestbin.agaveapi.co/1fzhazm1

 ** QUERY STRING FOR REQUESTBIN **
https://requestbin.agaveapi.co/1fzhazm1?inspect

INPUTS={}
JOB_FILE=job-remote-5058.txt
Writing file `job-remote-5058.txt'
OUTPUT=Successfully submitted job 5818937556024496616-242ac114-0001-007
JOB_ID=5818937556024496616-242ac114-0001-007
STAT=PENDING
STAT=STAGED
STAT=SUBMITTING
STAT=SUBMITTING
STAT=SUBMITTING
STAT=SUBMITTING
STAT=RUNNING
STAT=FINISHED
CMD=jobs-output-get 5818937556024496616-242ac114-0001-007 fork-command-1.out
All done! Output follows.
Reading file `fork-command-1.out'



In [200]:
!jobs-output-get -P 5818937556024496616-242ac114-0001-007 fork-command-1.err

ls: cannot access '/usr/install': No such file or directory


In [201]:
# WITHOUT the singularity check in the user's .profile file
r.runagavecmd("ls /usr/install")

REMOTE_COMMAND=ls /usr/install
REQUESTBIN_URL=https://requestbin.agaveapi.co/1880ovh1

 ** QUERY STRING FOR REQUESTBIN **
https://requestbin.agaveapi.co/1880ovh1?inspect

INPUTS={}
JOB_FILE=job-remote-5058.txt
Writing file `job-remote-5058.txt'
OUTPUT=Successfully submitted job 1163765710547775000-242ac114-0001-007
JOB_ID=1163765710547775000-242ac114-0001-007
STAT=PENDING
STAT=STAGED
STAT=SUBMITTING
STAT=SUBMITTING
STAT=SUBMITTING
STAT=SUBMITTING
STAT=SUBMITTING
STAT=FINISHED
CMD=jobs-output-get 1163765710547775000-242ac114-0001-007 fork-command-1.out
All done! Output follows.
Reading file `fork-command-1.out'



In [202]:
!jobs-output-get -P 1163765710547775000-242ac114-0001-007 fork-command-1.err

ls: cannot access '/usr/install': No such file or directory


In [206]:
!ssh sandbox ls /usr/install

ls: cannot access '/usr/install': No such file or directory
