<h2>Setting Stuff Up</h2>

Here we import some packages that we'll need in various places. We'll also load all the variables we set in config.

In [1]:
!mkdir -p ~/agave

%cd ~/agave

!pip3 install --upgrade setvar

import re
import os
import sys
import json
import requests
from agavepy.agave import Agave
from urllib.parse import urlparse
from setvar import *
from time import sleep

requests.packages.urllib3.disable_warnings()

# This cell enables inline plotting in the notebook
%matplotlib inline

import matplotlib
import numpy as np
import matplotlib.pyplot as plt

setvar("""
AGAVE_APP_NAME=training-${AGAVE_USERNAME}
AGAVE_STORAGE_SYSTEM_ID=sandbox-storage-${AGAVE_USERNAME}
AGAVE_EXECUTION_SYSTEM_ID=sandbox-exec-${AGAVE_USERNAME}
AGAVE_SYSTEM_SITE_DOMAIN=localdomain
MACHINE_NAME=sandbox
MACHINE_USERNAME=jovyan
AGAVE_STORAGE_HOME_DIR=/home/${MACHINE_USERNAME}
SCRATCH_DIR=/home/${MACHINE_USERNAME}
AGAVE_STORAGE_WORK_DIR=/home/${MACHINE_USERNAME}
AGAVE_APP_DEPLOYMENT_PATH=agave-deploy
""")
loadvar()

AGAVE_APP_NAME = "training-"+os.environ['AGAVE_USERNAME']
AGAVE_STORAGE_SYSTEM_ID = "sandbox-storage-"+os.environ['AGAVE_USERNAME']
AGAVE_EXECUTION_SYSTEM_ID = "sandbox-exec-"+os.environ['AGAVE_USERNAME']
AGAVE_SYSTEM_SITE_DOMAIN = "localdomain"
MACHINE_NAME = "sandbox"
MACHINE_USERNAME = "jovyan"
AGAVE_STORAGE_HOME_DIR = "/home/"+os.environ['MACHINE_USERNAME']
SCRATCH_DIR = "/home/"+os.environ['MACHINE_USERNAME']
AGAVE_STORAGE_WORK_DIR = "/home/"+os.environ['MACHINE_USERNAME']
AGAVE_APP_DEPLOYMENT_PATH = "agave-deploy"

/home/jovyan/agave
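
The `setvar` block above relies on shell-style `${NAME}` expansion, where later lines can reference earlier ones. As a rough illustration of that idea using only the standard library (this is not how the `setvar` package is implemented, and `expand_assignments` is a hypothetical helper), each `KEY=value` line can be expanded against the variables defined so far:

```python
from string import Template

def expand_assignments(text, env):
    """Expand KEY=value lines in order; later lines may reference earlier keys.

    Hypothetical helper for illustration only. `env` is a plain dict that is
    updated in place and returned.
    """
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, value = line.partition("=")
        # safe_substitute leaves unknown ${NAME} references untouched
        env[key.strip()] = Template(value).safe_substitute(env)
    return env

example = expand_assignments(
    """
    MACHINE_USERNAME=jovyan
    AGAVE_STORAGE_HOME_DIR=/home/${MACHINE_USERNAME}
    """,
    {},
)
print(example["AGAVE_STORAGE_HOME_DIR"])
```

Because unknown references are left as-is rather than raising, a missing variable such as `AGAVE_USERNAME` would show up as a literal `${AGAVE_USERNAME}` in the result, which makes it easy to spot.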

## Tenant configuration  

If you are running against a privately hosted Agave tenant, you need to configure the tutorial to run against your tenant. You can do that by setting the `AGAVE_TENANTS_API_BASEURL` and `AGAVE_TENANT_ID` environment variables. 

If you specified the correct address to your Tenants API, you should be able to discover the tenants available for you through the CLI.

Select the tenant you would like to use by setting the `AGAVE_TENANT` environment variable. You may either give the name of the tenant or leave the variable blank to select the default tenant.

In [11]:
!tenants-init

You are now configured to interact with the APIs at https://sandbox.agaveplatform.org/


<h3>Create the Client</h3>

In this next step we delete the client if it exists. Chances are, yours doesn't yet. We put this command here in case, for some reason, you want to re-create your client later on. If you delete the client you intend to create before you create it, no harm is done.

In [15]:
!clients-delete -u "$AGAVE_USERNAME" -p "$AGAVE_PASSWORD" $AGAVE_APP_NAME

Successfully deleted client training-dooley


In this step we create the client. Clients provide a way of encapsulating resources connected to a single project. Through the client, you will receive a token which you can use to run most of the Agave commands.

In [16]:
!clients-create -u "$AGAVE_USERNAME" -p "$AGAVE_PASSWORD" -N "$AGAVE_APP_NAME" -S

[1;0m[1;0mSuccessfully created client training-dooley
key: KrTTfIZ1OQ4Hie3ecO6aFK88LhEa 
secret: EuaogZcXq3_jMvGQCOQqtCNtgiQa[0m[0m


Create the token for your client. You will, from this point on, use this token to run the remainder of the Agave commands in this tutorial.

In [None]:
!auth-tokens-create -u $AGAVE_USERNAME -p "$AGAVE_PASSWORD" 

In [None]:
!auth-tokens-refresh -V

Now that we have a valid set of API keys, let's authenticate and get a valid token we can use.

In [None]:
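# Read the token cache written by the CLI; the file is closed automatically
with open(os.path.join(os.environ['AGAVE_CACHE_DIR'], 'current')) as tokenCacheFile:
    agTokenCache = json.load(tokenCacheFile)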
ag = Agave(token=agTokenCache['access_token'], 
           refresh_token=agTokenCache['refresh_token'], 
           username=agTokenCache['username'],
           password=os.environ['AGAVE_PASSWORD'],
           api_key=agTokenCache['apikey'], 
           api_secret=agTokenCache['apisecret'],
           api_server=agTokenCache['baseurl'], 
           client_name=agTokenCache['client_name'],
           verify=False)
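
The cell above assumes the cache file exists and contains every key it reads. A minimal sketch of a more defensive loader follows; `load_token_cache` and `REQUIRED_KEYS` are hypothetical names, and the key list is simply the set of keys the cell above uses, not a documented schema.

```python
import json
import os
import tempfile

# Keys the cell above reads from the cache file.
REQUIRED_KEYS = ("access_token", "refresh_token", "username", "apikey",
                 "apisecret", "baseurl", "client_name")

def load_token_cache(cache_dir):
    """Read `<cache_dir>/current` and fail early if a key is missing."""
    path = os.path.join(cache_dir, "current")
    with open(path) as handle:
        cache = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if key not in cache]
    if missing:
        raise KeyError("token cache is missing: " + ", ".join(missing))
    return cache

# Demonstration against a throwaway cache file with placeholder values.
with tempfile.TemporaryDirectory() as demo_dir:
    with open(os.path.join(demo_dir, "current"), "w") as handle:
        json.dump({key: "placeholder" for key in REQUIRED_KEYS}, handle)
    demo_cache = load_token_cache(demo_dir)
print(sorted(demo_cache))
```

In the notebook you would call it as `load_token_cache(os.environ['AGAVE_CACHE_DIR'])` and pass the resulting values to `Agave(...)` as before.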


In [None]:
ag.token.token_info

In [None]:
[client.name for client in ag.clients.list() if client['name'] == AGAVE_APP_NAME]

## Following Along at Home  

If you are following along at home using the docker-compose stack, you will need to run the following cell to get the hostname and port of your TCP tunnel so that Agave can contact your system without a public IP address.

In [73]:
from urllib.parse import urlparse

if os.environ.get('USE_TUNNEL') == 'True': 
    # fetch the hostname and port of the reverse tunnel running in the sandbox 
    # so Agave can connect to our local sandbox
    # The jq filter is double-quoted so it survives inside the single-quoted remote command
    ngrokUrl = !ssh -q -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null sandbox 'curl -s http://localhost:4040/api/tunnels | jq -r ".tunnels[0].public_url"'
    parsedNgrokUrl = urlparse(ngrokUrl[0])
    AGAVE_SYSTEM_HOST = parsedNgrokUrl.hostname
    AGAVE_SYSTEM_PORT = parsedNgrokUrl.port
    
    print("Agave will be connecting to your sandbox at {}:{}".format(AGAVE_SYSTEM_HOST,AGAVE_SYSTEM_PORT))

    

Agave will be connecting to your sandbox at 0.tcp.ngrok.io:17667
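
To make the parsing step above concrete, here is a small sketch of what `urlparse` extracts from a tunnel URL. It assumes the tunnel reports a URL of the form `tcp://host:port` (inferred from the output above); `tunnel_endpoint` is a hypothetical helper that also rejects values without a host and port, such as the `null` that `jq -r` prints when no tunnel is running.

```python
from urllib.parse import urlparse

def tunnel_endpoint(public_url):
    """Return (hostname, port) for a tunnel URL such as tcp://host:port."""
    parsed = urlparse(public_url)
    if parsed.hostname is None or parsed.port is None:
        raise ValueError("expected a URL with host and port: %r" % public_url)
    return parsed.hostname, parsed.port

host, port = tunnel_endpoint("tcp://0.tcp.ngrok.io:17667")
print(host, port)
```

Note that the port comes back as an `int`, which is the type used for the `port` field in the system definitions below.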


We will recycle the SSH keys already in place to log in from your notebook server.

In [74]:
PUB_KEY = readfile("/home/jovyan/.ssh/id_rsa.pub")
PRIV_KEY = readfile("/home/jovyan/.ssh/id_rsa")

Reading file `/home/jovyan/.ssh/id_rsa.pub'
Reading file `/home/jovyan/.ssh/id_rsa'


In this next cell, we create the JSON definition used to describe the storage machine.

In [418]:
storageSystemDefinition = {
    "id": AGAVE_STORAGE_SYSTEM_ID,
    "name": "{} storage {}".format(MACHINE_NAME, MACHINE_USERNAME),
    "description": "The {} computer".format(MACHINE_NAME),
    "site": AGAVE_SYSTEM_SITE_DOMAIN,
    "type": "STORAGE",
    "storage": {
        "host": AGAVE_SYSTEM_HOST,
        "port": AGAVE_SYSTEM_PORT,
        "protocol": "SFTP",
        "rootDir": "/",
        "homeDir": AGAVE_STORAGE_HOME_DIR,
        "auth": {
          "username" : MACHINE_USERNAME,
          "publicKey" : PUB_KEY,
          "privateKey" : PRIV_KEY,
          "type" : "SSHKEYS"
        }
    }
}

Here, we tell Agave about the machine. You can re-run the previous cell and the next one if you want to change the definition of your storage machine.

In [76]:
storageSystem = ag.systems.add(body=storageSystemDefinition)

Next we list files on the new system, the Python equivalent of the Agave `files-list` command. This checks that we've set up the storage machine correctly.

In [420]:
fileListing = ag.files.list(systemId=storageSystem.id, filePath=".", limit=5)
[ f.name for f in fileListing ]

['.', '.bash_history', '.bash_logout', '.bashrc', '.cache']

<h2>Setting up the Execution Machine</h2>

You may not always wish to store your data on the same machine you run your jobs on. However, in this tutorial, we will assume that you do. The description for the execution machine is much like the storage machine. However, there are a few more pieces of information you'll need to provide. In this example, we are going to call commands directly on the host as opposed to using a batch queue scheduler. It is slightly simpler.

In [78]:
# Edit any parts of this file that you know need to be changed for your machine.
executionSystem = {
    "id": AGAVE_EXECUTION_SYSTEM_ID,
    "name": "{} {}".format(MACHINE_NAME, MACHINE_USERNAME),
    "description": "The {} computer".format(MACHINE_NAME),
    "site": AGAVE_SYSTEM_SITE_DOMAIN,
    "public": False,
    "status": "UP",
    "type": "EXECUTION",
    "executionType": "CLI",
    "scheduler" : "FORK",
    "environment": None,
    "scratchDir" : SCRATCH_DIR,
    "queues": [
        {
            "name": "fork",
            "default": True,
            "maxJobs": 10,
            "maxUserJobs": 10,
            "maxNodes": 6,
            "maxProcessorsPerNode": 6,
            "minProcessorsPerNode": 1,
            "maxRequestedTime": "00:30:00"
        }
    ],
    "login": {
        "auth": {
          "username" : MACHINE_USERNAME,
          "publicKey" : PUB_KEY,
          "privateKey" : PRIV_KEY,
          "type" : "SSHKEYS"
        },
        "host": AGAVE_SYSTEM_HOST,
        "port": AGAVE_SYSTEM_PORT,
        "protocol": "SSH"
    },
    "maxSystemJobs": 50,
    "maxSystemJobsPerUser": 50,
    "storage": {
        "host": AGAVE_SYSTEM_HOST,
        "port": AGAVE_SYSTEM_PORT,
        "protocol": "SFTP",
        "rootDir": "/",
        "homeDir": AGAVE_STORAGE_HOME_DIR,
        "auth": {
          "username" : MACHINE_USERNAME,
          "publicKey" : PUB_KEY,
          "privateKey" : PRIV_KEY,
          "type" : "SSHKEYS"
        }
    },
    "workDir": AGAVE_STORAGE_WORK_DIR
}
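
The storage and execution definitions repeat the same `auth` and `storage` blocks. One way to reduce that duplication is sketched below; `ssh_keys_auth` and `sftp_storage` are hypothetical helpers, and they only reproduce the fields already used in the cells above.

```python
def ssh_keys_auth(username, public_key, private_key):
    """Build the SSHKEYS auth block used by the storage and login sections."""
    return {
        "username": username,
        "publicKey": public_key,
        "privateKey": private_key,
        "type": "SSHKEYS",
    }

def sftp_storage(host, port, home_dir, auth, root_dir="/"):
    """Build a storage section matching the layout used above."""
    return {
        "host": host,
        "port": port,
        "protocol": "SFTP",
        "rootDir": root_dir,
        "homeDir": home_dir,
        "auth": dict(auth),  # copy so sections do not share one dict
    }

auth = ssh_keys_auth("jovyan", "<public key>", "<private key>")
storage = sftp_storage("0.tcp.ngrok.io", 17667, "/home/jovyan", auth)
print(storage["auth"]["type"])
```

With helpers like these, the `"storage"` entry of both definitions could be written as `sftp_storage(AGAVE_SYSTEM_HOST, AGAVE_SYSTEM_PORT, AGAVE_STORAGE_HOME_DIR, auth)`.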

In [79]:
executionSystem = ag.systems.add(body=executionSystem)

In [80]:
# Test to see if this worked...
fileListing = ag.files.list(systemId=executionSystem.id, filePath=".", limit=5)
[ f.name for f in fileListing ]

['.', '.bash_logout', '.bashrc', '.cache', '.ngrok2']

<h3>Create the Application</h3>
Agave allows us to describe custom allocations, limiting users to running a specific job. In this case, we're going to create a simple "fork" app that just takes the command we want to run as a job parameter. The wrapper file is a shell script we will run on the execution machine. If we were using a batch scheduler, this would be our batch file.

In [198]:
writefile("fork-wrapper.txt","""
#!/bin/bash
\${command}
""")

Writing file `fork-wrapper.txt'
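
To see what the `${command}` placeholder in the wrapper is for, the sketch below fills it in locally with `string.Template`. This only illustrates the idea of replacing a placeholder with a job parameter value; it is an assumption-laden stand-in and not Agave's actual template processing. `render_wrapper` is a hypothetical helper.

```python
from string import Template

# Same two lines as the wrapper file written above
wrapper_template = "#!/bin/bash\n${command}\n"

def render_wrapper(template_text, parameters):
    """Fill ${name} placeholders with job parameter values."""
    return Template(template_text).substitute(parameters)

rendered = render_wrapper(wrapper_template, {"command": "echo hello"})
print(rendered)
```

`substitute` raises `KeyError` when a placeholder has no matching parameter, which mirrors why the `command` parameter is marked as required in the app definition later on.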


Using Agave commands, we make a directory on the storage server and deploy our wrapper file there.

In [122]:
remoteDeploymentDir = ag.files.manage(systemId = storageSystem.id, 
                                      filePath = ".", 
                                      body = {
                                          "action": "mkdir", 
                                          "path": AGAVE_APP_DEPLOYMENT_PATH
                                      })



In [203]:
uploadedWrapperFileItem = ag.files.importData(
    systemId=storageSystem.id,
    filePath=AGAVE_APP_DEPLOYMENT_PATH,
    fileName='fork-wrapper.txt',
    fileToUpload=open("fork-wrapper.txt", 'rb'))

print("File successfully uploaded to",uploadedWrapperFileItem['path'])


File successfully uploaded to agave-deploy/fork-wrapper.txt


All Agave applications require a test file. The test file is a free-form text file which allows you to specify what resources you might need to test your application.

In [206]:
writefile("fork-test.txt","""
command=date
fork-wrapper.txt
""")

Writing file `fork-test.txt'


In [207]:
uploadedTestFileItem = ag.files.importData(
    systemId=storageSystem.id,
    filePath=AGAVE_APP_DEPLOYMENT_PATH,
    fileName="fork-test.txt",
    fileToUpload=open("fork-test.txt",'rb'))

print("File successfully uploaded to",uploadedTestFileItem['path'])


File successfully uploaded to agave-deploy/fork-test.txt


Like everything else in Agave, we describe our application with JSON. We specify which machines the application will use, what method it will use for submitting jobs, job parameters and files, etc.

In [153]:
appDefinition = {  
   "name": "{}-{}-fork".format(os.environ['AGAVE_USERNAME'], MACHINE_NAME),
   "version":"1.0",
   "label":"Runs a command",
   "shortDescription":"Runs a command",
   "deploymentSystem": AGAVE_STORAGE_SYSTEM_ID,
   "deploymentPath": AGAVE_APP_DEPLOYMENT_PATH,
   "templatePath":"fork-wrapper.txt",
   "testPath":"fork-test.txt",
   "executionSystem": AGAVE_EXECUTION_SYSTEM_ID,
   "executionType":"CLI",
   "parallelism":"SERIAL",
   "inputs":[
         {   
         "id":"datafile",
         "details":{  
            "label":"Data file",
            "argument": None,
            "showArgument":False
         },
         "value":{  
            "default":"/dev/null",
         }
      }   
   ],
   "parameters":[{
     "id" : "command",
     "value" : {
       "required":True,
       "type":"string",
       "default":"/bin/date",
     },
     "details":{
         "label": "Command to run",
         "description": "This is the actual command you want to run. ex. df -h -d 1"
     }
   }]
}

In [253]:
app = ag.apps.add(body=appDefinition)
AGAVE_APP_ID = app.id
AGAVE_APP_ID


'dooley-sandbox-fork-1.0'
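
The id returned above looks like the app's `name` and `version` joined with a hyphen, which is also how the script later in this notebook builds the `appId`. A tiny sketch of that convention, inferred from the output shown here rather than from the API documentation (`app_id_for` is a hypothetical helper):

```python
def app_id_for(name, version):
    """Join an app name and version the way the returned id above appears."""
    return "{}-{}".format(name, version)

demo_definition = {"name": "dooley-sandbox-fork", "version": "1.0"}
print(app_id_for(demo_definition["name"], demo_definition["version"]))
```

When the app object is at hand, `app.id` remains the safer source for the id.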

<h2>Running Jobs</h2>
Now that we have specified our application using Agave, it is time to try running jobs. To start a job we, once again, create a JSON document. It describes the app, what resource to run on, and how and when to send notifications. Notifications are delivered by callback URL. Email is the easiest type to configure, but we show here how to send webhook notifications to the popular [RequestBin](https://requestb.in/). 

Before we configure our notification, we need to create a requestbin to use. There are convenience commands to interact with requestbin built into the Agave CLI. We will use those to get our URL.

In [157]:
requestbinUrl = !requestbin-create 
os.environ['REQUESTBIN_URL'] = requestbinUrl[0].strip()

Now that we have a URL to receive webhooks from our job, let's look at our job request. The way this job is configured, it will send the requestbin notifications for every job event until the job reaches a terminal state. For a full list of job events, please see http://docs.agaveplatform.org/#job-monitoring

In [160]:
jobRequest = {
   "name": "fork-command-1",
   "appId": app.id,
   "executionSystem": executionSystem.id,
   "archive": False,
   "notifications": [
    {
      "url": requestbinUrl[0].strip()+"?event=${EVENT}&jobid=${JOB_ID}",
      "event":"*",
      "persistent": True,
      "policy": {
         "retryStrategy": "DELAYED",
         "retryLimit": 2,
         "retryRate": 5,
         "retryDelay": 5,
         "saveOnFailure": True
      }
    }
   ],
   "parameters": {
      "command":"echo hello"
   }
}
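
Since the same request shape is reused later, it can help to build it with a function. The sketch below is one possible way to do that; `build_job_request` is a hypothetical helper and the example values are taken from the outputs shown in this notebook.

```python
import json

def build_job_request(app_id, execution_system, command,
                      webhook_url=None, name="fork-command-1"):
    """Assemble a job request dict shaped like the one above."""
    request = {
        "name": name,
        "appId": app_id,
        "executionSystem": execution_system,
        "archive": False,
        "parameters": {"command": command},
    }
    if webhook_url:
        request["notifications"] = [{
            # ${EVENT} and ${JOB_ID} are left literal for the service to fill in
            "url": webhook_url + "?event=${EVENT}&jobid=${JOB_ID}",
            "event": "*",
            "persistent": True,
        }]
    return request

demo_request = build_job_request("dooley-sandbox-fork-1.0",
                                 "sandbox-exec-dooley", "echo hello")
print(json.dumps(demo_request, indent=2))
```

The printed JSON is also what you would save to a file if you preferred to submit the job from the CLI instead of from Python.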

We submit the job with `ag.jobs.submit` and keep the returned job id, which we'll use in several subsequent steps. The commented-out lines show the CLI alternative: because the setvar() command can evaluate `$()` style bash shell substitutions, it can capture the output of the submit command and parse it for the JOB_ID.

In [231]:
job = ag.jobs.submit(body=jobRequest)
job.id
# setvar("""
# # Capture the output of the job submit command
# OUTPUT=$(jobs-submit -F job.txt)
# # Parse out the job id from the output
# JOB_ID=$(echo $OUTPUT | cut -d' ' -f4)
# """)

'2541425605197426200-242ac114-0001-007'

<h2>Job Monitoring and Output</h2>

While the job is running, the requestbin you registered will receive webhooks from Agave every time a job event occurs. To monitor this in real time, evaluate the next cell and visit the printed URL in your browser:

In [232]:
print ('%s?inspect'%requestbinUrl[0])

https://requestbin.agaveapi.co/191t2li1?inspect


Of course, you can also monitor the job status by polling. Note that the notifications you receive via email and webhook are less wasteful of resources. However, we show you this for completeness.

In [235]:
# Poll up to 40 times, 5 seconds apart, until the job reaches a terminal state
for _ in range(40):
    stat = ag.jobs.get(jobId=job.id)
    print(stat.status)
    if stat.status in ("FINISHED", "FAILED"):
        job = stat
        break
    sleep(5.0)

FINISHED
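
The polling loop above can be wrapped in a reusable function. The sketch below takes the status lookup as a callable so it can be tried without a live job; `wait_for_job` and `TERMINAL_STATES` are hypothetical names, and the terminal states are just the two that the loop above checks.

```python
import time

TERMINAL_STATES = {"FINISHED", "FAILED"}  # the two states checked above

def wait_for_job(fetch_status, interval=5.0, max_polls=40, sleep=time.sleep):
    """Poll `fetch_status()` until a terminal state or the poll budget ends.

    Returns the last status seen; the caller decides how to treat a
    non-terminal result (for example, as a timeout).
    """
    status = None
    for attempt in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATES:
            break
        if attempt < max_polls - 1:
            sleep(interval)  # no need to wait after the final poll
    return status

# Stand-in for `lambda: ag.jobs.get(jobId=job.id).status`
fake_statuses = iter(["PENDING", "RUNNING", "FINISHED"])
print(wait_for_job(lambda: next(fake_statuses), sleep=lambda seconds: None))
```

Against a real job you would pass `lambda: ag.jobs.get(jobId=job.id).status` and keep the default `sleep`.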


The jobs-history command gives you a record of the steps your job went through. If your job fails for some reason, this is your best diagnostic.

In [236]:
jobHistory = ag.jobs.getHistory(jobId=job.id)
[ h.description for h in jobHistory ]

['Job accepted and queued for submission.',
 'Skipping staging. No input data associated with this job.',
 'Preparing job for submission.',
 'Attempt 1 to submit job',
 'Fetching app assets from agave://sandbox-storage-dooley/agave-deploy',
 'Staging runtime assets to agave://sandbox-exec-dooley//home/jovyan/dooley/job-2541425605197426200-242ac114-0001-007-fork-command-1',
 'CLI job successfully forked as process id 2803',
 'CLI job successfully forked as process id 2803',
 'Job receieved duplicate RUNNING notification',
 'Job completed execution',
 'Job completed. Skipping archiving at user request.']

This command shows you the IDs of the last 5 jobs you ran.

In [237]:
[ j.id for j in ag.jobs.list(limit=5) ]

['2541425605197426200-242ac114-0001-007',
 '8216704886581423640-242ac114-0001-007',
 '5093995656431791640-242ac114-0001-007']

This next command provides you with a list of all the files generated by your job. You can use it to figure out which files you want to retrieve with jobs-output-get.

In [238]:
import pandas as pd  # needed for the DataFrame below; not imported earlier

jobOutputfileItems = ag.jobs.listOutputs(jobId=job.id, filePath=".")
df = pd.DataFrame.from_records(jobOutputfileItems, columns=jobOutputfileItems[0].keys())
df[['type','length','name']]

    type  length  name
0   file  80      .agave.archive
1   file  397     .agave.log
2   file  0       fork-command-1.err
3   file  2495    fork-command-1.ipcexe
4   file  6       fork-command-1.out
5   file  5       fork-command-1.pid
6   file  29      fork-test.txt
7   file  22      fork-wrapper.txt
8   file  0       tmpudzhvpjn


If you don't have Pandas available, simple columnized data works well too.

In [239]:
print("{:<10} {:<15} {:<35}".format("Type", "Length", "Name"))
print("-------------------------------------------------------------------------------")
for f in jobOutputfileItems:
    print("{:<10} {:<15} {:<35}".format(f.type, f.length, f.name))

Type       Length          Name                               
-------------------------------------------------------------------------------
file       80              .agave.archive                     
file       397             .agave.log                         
file       0               fork-command-1.err                 
file       2495            fork-command-1.ipcexe              
file       6               fork-command-1.out                 
file       5               fork-command-1.pid                 
file       29              fork-test.txt                      
file       22              fork-wrapper.txt                   
file       0               tmpudzhvpjn                        


Retrieve the standard output.

In [240]:
ag.files.download(systemId=job.executionSystem, filePath=job['outputPath']+"/fork-command-1.out").text

'hello\n'

Retrieve the standard error output.

In [241]:
ag.files.download(systemId=job.executionSystem, filePath=job['outputPath']+"/fork-command-1.err").text

''

<h3>Automating</h3>
Because we're working in Python, we can simply glue the above steps together and create a script to run jobs for us and fetch the standard output. Let's do that next.

In [283]:
%%writefile runagavecmd.py
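import os
import json
import requests
from setvar import *
from agavepy.agave import Agave
# Note: `async` is a reserved word from Python 3.7 on, so this import only
# parses on older interpreters.
from agavepy.async import AgaveAsyncResponse
requests.packages.urllib3.disable_warnings()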

def runagavecmd(cmd,infile=None):
    
    setvar("REQUESTBIN_URL=$(requestbin-create)")
    print("")
    print(" ** QUERY STRING FOR REQUESTBIN **")
    print('%s?inspect'%os.environ['REQUESTBIN_URL'])
    print("")
    requestbinUrl = os.environ['REQUESTBIN_URL']
    print("")
    
    # The input file is an optional parameter, both
    # to our function and to the Agave application.
    if infile is None:
        jobInputs = {}
    else:
        jobInputs = {"datafile": infile}
    
    # Create the Json for the job file.
    jobRequest = {
       "name": "fork-command-1",
       "appId": "{}-{}-fork-1.0".format(os.environ['AGAVE_USERNAME'], os.environ['MACHINE_NAME']),
       "executionSystem": os.environ['AGAVE_EXECUTION_SYSTEM_ID'],
       "archive": False,
       "notifications": [
        {
          "url": requestbinUrl+'?event=${EVENT}&jobid=${JOB_ID}',
          "event": "*",
          "persistent": True
        }
       ],
       "parameters": {
         "command": cmd
       },
       "inputs": jobInputs
     }
    
    agTokenCache = json.loads(open(os.environ['AGAVE_CACHE_DIR'] + '/current').read())

    ag = Agave(token=agTokenCache['access_token'], 
           refresh_token=agTokenCache['refresh_token'], 
           api_key=agTokenCache['apikey'], 
           api_secret=agTokenCache['apisecret'],
           api_server=agTokenCache['baseurl'], 
           client_name=agTokenCache['client_name'],
           verify=False)
    
    print("Submitting job request...")
    job = ag.jobs.submit(body=jobRequest)
    print("Job sucessfully submitted as " + job.id)

    ajob = AgaveAsyncResponse(ag, response=job)
    
    print("Waiting for job to complete... ")
    j = ajob.result()
    
    print("All done! Output follows.")
    
    print("=" * 70)
    
    # get the updated job description so the output path is present
    job = ag.jobs.get(jobId=job.id)
    
    # Fetch the job output from the remote machine
    print(ag.files.download(systemId=job.executionSystem, filePath=job['outputPath']+"/fork-command-1.out").text)
    
    return job
     

Overwriting runagavecmd.py


In [284]:
import importlib
import runagavecmd as r
# Reload so edits to runagavecmd.py are picked up without restarting the kernel
importlib.reload(r)

<module 'runagavecmd' from '/home/jovyan/agave/runagavecmd.py'>

In [285]:
job = r.runagavecmd("lscpu")

REQUESTBIN_URL=https://requestbin.agaveapi.co/1lalxnj1

 ** QUERY STRING FOR REQUESTBIN **
https://requestbin.agaveapi.co/1lalxnj1?inspect


Submitting job request...
Job sucessfully submitted as 1531628834847714840-242ac114-0001-007
Waiting for job to complete... 
FINISHED
All done! Output follows.
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    1
Core(s) per socket:    2
Socket(s):             1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 158
Model name:            Intel(R) Core(TM) i7-7820HQ CPU @ 2.90GHz
Stepping:              9
CPU MHz:               2904.000
BogoMIPS:              5808.00
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
Flags:                 fpu vme de pse tsc msr pae mce cx8 ap

<h2>Permissions and Sharing</h2>

List the users and the permissions they have on the given job.

In [287]:
jobPems = ag.jobs.listPermissions(jobId=job.id)
[ "{} {}".format(p.username, p.permission) for p in jobPems ]


["dooley {'read': True, 'write': True}"]

In [291]:
# Now pair off with your neighbor and share your jobs with each other.
# For now, just give read access.
#!jobs-pems-update -u training002 -p READ ${JOB_ID}
pem = ag.jobs.updatePermissions(jobId=job.id, body={"username":"training002","permission":"READ"})
pem["permission"]

{'read': True, 'write': False}

In [295]:
# Now let's see if we can see our neighbor's job
#shared_job = !jobs-search --filter=id owner.neq=${AGAVE_USERNAME} 
sharedJob = ag.jobs.list(search={"owner.neq":os.environ['AGAVE_USERNAME']}, filter="id")[0]
os.environ['SHARED_JOB_ID'] = sharedJob.id
print(os.environ['SHARED_JOB_ID'])

1653314879115030040-242ac114-0001-007


Permissions are just that, permitting someone to do something. You said your neighbor could view your job. Let's see what that means.

In [296]:
# You already searched for the job and found it, so you should be able to list
# and view the details
ag.jobs.get(jobId=sharedJob.id)

{'_links': {'app': {'href': 'https://sandbox.agaveplatform.org/apps/v2/crcollaboratory-shelob-stevenrbrandt2-2.0'},
  'archiveData': {'href': 'https://sandbox.agaveplatform.org/jobs/v2/1653314879115030040-242ac114-0001-007/outputs/listings'},
  'archiveSystem': {'href': 'https://sandbox.agaveplatform.org/systems/v2/shelob-exec-stevenrbrandt2'},
  'executionSystem': {'href': 'https://sandbox.agaveplatform.org/systems/v2/shelob-exec-stevenrbrandt2'},
  'history': {'href': 'https://sandbox.agaveplatform.org/jobs/v2/1653314879115030040-242ac114-0001-007/history'},
  'metadata': {'href': 'https://sandbox.agaveplatform.org/meta/v2/data/?q=%7B%22associationIds%22%3A%221653314879115030040-242ac114-0001-007%22%7D'},
  'notifications': {'href': 'https://sandbox.agaveplatform.org/notifications/v2/?associatedUuid=1653314879115030040-242ac114-0001-007'},
  'owner': {'href': 'https://sandbox.agaveplatform.org/profiles/v2/syuan'},
  'permissions': {'href': 'https://sandbox.agaveplatform.org/jobs/v2/1

In [307]:
# You should also be able to view the history. Here we'll just return the first few 
# events. Notice that the job's history events are visible to us as well.
ag.jobs.getHistory(jobId=sharedJob.id, limit=3)

[{'created': datetime.datetime(2018, 7, 15, 17, 10, 43, tzinfo=tzoffset(None, -18000)),
  'createdBy': 'syuan',
  'description': 'Job accepted and queued for submission.',
  'status': 'PENDING'},
 {'created': datetime.datetime(2018, 7, 15, 17, 10, 44, tzinfo=tzoffset(None, -18000)),
  'createdBy': 'syuan',
  'description': 'Attempt 1 to stage job inputs',
  'status': 'PROCESSING_INPUTS'},
 {'created': datetime.datetime(2018, 7, 15, 17, 10, 44, tzinfo=tzoffset(None, -18000)),
  'createdBy': 'syuan',
  'description': 'Identifying input files for staging',
  'status': 'PROCESSING_INPUTS'}]

In [None]:
# You can also view their job output
#[ f.name for f in ag.jobs.listOutputs(jobId=sharedJob.id, filePath='.') ]
ag.jobs.listOutputs(jobId=sharedJob.id, filePath='.')

In [317]:
# What if we no longer want to see the job? Let's delete it.
ag.jobs.delete(jobId=sharedJob.id)

D'oh! We can't delete the shared job because we weren't granted write permission.

In [312]:
# Let's grant write access and see what we can do
#!jobs-pems-update -u training002 -p READ_WRITE ${JOB_ID}
pem = ag.jobs.updatePermissions(jobId=job.id, body={"username":"training002","permission": "READ_WRITE"})
pem["permission"]

{'read': True, 'write': True}

In [None]:
# Now let's see if we can delete the shared job
ag.jobs.delete(jobId=sharedJob.id)

In [318]:
# Wait, now we don't have anything to work with. 
# No worries. Agave doesn't really delete anything. Your job is still there.
# We just need to restore it.
resp = ag.jobs.manage(jobId=sharedJob.id, body={"action":"restore"})

In [324]:
# Of course, you can always go back and see what happened by querying the job history
jobHistory = ag.jobs.getHistory(jobId=sharedJob.id)
[ h.description for h in jobHistory[-2:] ]

['Job was deleted by user dooley', 'Job was restored by dooley']

In [325]:
# Now let's try to rerun the job
resubmittedJob = ag.jobs.manage(jobId=sharedJob.id, body={"action":"resubmit"})

In [330]:
# Well, what app did they use in the job?
sharedJobDetails = ag.jobs.get(jobId=sharedJob.id)

In [332]:
# Hmm, do we have access to the app?
#! apps-pems-list $SHARED_JOB_APP
sharedJobApp = ag.apps.get(appId=sharedJobDetails.appId)

In [342]:
# Oh, we don't have permission to even view the app. Guess our job permissions
# don't extend to the application. Let's be a good neighbor and share our apps
# with each other
sharedAppPem = ag.apps.updateApplicationPermissions(appId=sharedJobApp.id, 
                                                    body={"username":"training002","permission": "READ"})
sharedAppPem['permission']

{'execute': False, 'read': True, 'write': False}

In [343]:
# Now do we have access to the app?
# ! apps-pems-list $SHARED_JOB_APP
[ p.permission for p in ag.apps.listPermissions(appId=sharedJobApp.id) if p.username == os.environ['AGAVE_USERNAME'] ]

[{'execute': True, 'read': True, 'write': True}]

In [346]:
# Score! But wait, do I need execute to run? We should grant that too.
sharedAppPem = ag.apps.updateApplicationPermissions(appId=sharedJobApp.id, 
                                                    body={"username":"training002","permission": "EXECUTE"})
sharedAppPem['permission']

{'execute': True, 'read': False, 'write': False}

In [368]:
# Now do we have access to the app?
for p in ag.apps.listPermissions(appId=sharedJobApp.id):
    if p.username == os.environ['AGAVE_USERNAME']:
        print(p.permission)

{'read': True, 'write': True, 'execute': True}


In [349]:
# I guess permissions aren't hierarchical. Now I can execute it (I think), but I can't
# read it. How about we grant READ_EXECUTE instead?
sharedAppPem = ag.apps.updateApplicationPermissions(appId=sharedJobApp.id, 
                                                    body={"username":"training002","permission": "READ_EXECUTE"})
sharedAppPem['permission']

{'execute': True, 'read': True, 'write': False}
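
The outputs above suggest that each permission name sets exactly the flags it mentions and clears the rest, so `EXECUTE` alone drops `read`. The sketch below mimics that behavior locally. It is an illustration inferred from the responses shown in this notebook, not the service's implementation; `permission_flags` is a hypothetical helper that only understands names built from `READ`, `WRITE` and `EXECUTE`.

```python
def permission_flags(name, flags=("read", "write", "execute")):
    """Translate a name such as READ_EXECUTE into per-flag booleans."""
    granted = {part.lower() for part in name.split("_")}
    unknown = granted - set(flags)
    if unknown:
        raise ValueError("unrecognised permission parts: %s" % sorted(unknown))
    return {flag: flag in granted for flag in flags}

print(permission_flags("EXECUTE"))
print(permission_flags("READ_EXECUTE"))
```

Seen this way, granting a new permission replaces the previous grant instead of adding to it, which is why the combined `READ_EXECUTE` name is needed.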

In [350]:
# Now do we have access to the app?
[ p.permission for p in ag.apps.listPermissions(appId=sharedJobApp.id) if p.username == os.environ['AGAVE_USERNAME'] ]

[{'execute': True, 'read': True, 'write': True}]

In [None]:
# So now we can rerun our neighbor's job, right?
resubmittedJob = ag.jobs.manage(jobId=sharedJob.id, body={"action":"resubmit"})

In [366]:
# Drat. Why can't we run now? Do we have system access?
for r in ag.systems.listRoles(systemId=sharedJobDetails.executionSystem):
    print("{} {}".format(r.username, r.role))

stevenrbrandt OWNER
syuan USER
training002 USER


In [365]:
# OK, let's skip to the end. We'll just realize we should grant a USER rather than GUEST role
# on the system
#!systems-roles-addupdate -u training002 -r USER $AGAVE_EXECUTION_SYSTEM_ID
systemRoles = ag.systems.updateRoleForUser(systemId=sharedJobDetails.executionSystem, 
                                           username="training002",
                                           body={"role": "USER"})
"{} {}".format(systemRoles[0]['username'], systemRoles[0]['role'])

'training002 USER'

In [367]:
# that should work, right?
for r in ag.systems.listRoles(systemId=sharedJobDetails.executionSystem):
    print("{} {}".format(r.username, r.role))

stevenrbrandt OWNER
syuan USER
training002 USER


In [None]:
# So can we run the job now?
resubmittedJob = ag.jobs.manage(jobId=sharedJob.id, body={"action":"resubmit"})
resubmittedJob.id

In [371]:
# yay. wait, who owns the data?
for p in ag.jobs.listPermissions(jobId=resubmittedJob.id):
    print(p.username + " " + json.dumps(p.permission))

dooley {"read": true, "write": true}


In [372]:
# mine, mine, mine, mine, mine, mine, mine, mine, mine, mine, mine, mine, mine
# kill it, we're moving on.
ag.jobs.manage(jobId=sharedJob.id, body={"action":"stop"})

Now we have a job we've run and data we've generated. Let's see how we can share the data
with our colleagues. We have already seen that sharing the job with another user will 
grant that user access to the job outputs. That is a bit of an all or nothing approach. 
We can also share individual files and folders a couple of different ways. Let's use 
the standard out file from our job. 

The hypermedia section of the job output listing has the fully qualified url for the file/folder.

In [398]:
jobStandardOutFileItem = ag.jobs.listOutputs(jobId=job.id, filePath="fork-command-1.out")
jobStandardOutFileUrl = jobStandardOutFileItem[0]['_links']['self']['href']
jobStandardOutFileUrl

'https://sandbox.agaveplatform.org/jobs/v2/8210673309388238360-242ac114-0001-007/outputs/media/fork-command-1.out'

We can create a disposable PostIt URL to the job file with the PostIts service. PostIts are handy when you want to share a link to a file or API call, but only for a limited time and/or number of uses. Here we create a PostIt that will expire after 3 uses or one day, whichever comes first.

In [402]:
postit = ag.postits.create(body={
    "method": "GET",
    "maxUses": 3,
    "lifetime": 86400,
    "url": jobStandardOutFileUrl})

# click on the resulting link a few times to see it working.
print (postit['_links']['self']['href'])

https://sandbox.agaveplatform.org/postits/v2/8fda2b69b33c71472bd89044bae772bc
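
The `lifetime` value above is given in seconds (86400 is one day). If you prefer to express lifetimes as durations, a small helper can do the conversion. This is a sketch only; `postit_body` is a hypothetical helper and the URL is a placeholder.

```python
from datetime import timedelta

def postit_body(url, max_uses, lifetime, method="GET"):
    """Build a PostIt request body; `lifetime` is a timedelta."""
    return {
        "method": method,
        "maxUses": max_uses,
        "lifetime": int(lifetime.total_seconds()),  # seconds, as in 86400 above
        "url": url,
    }

demo_body = postit_body("https://example.org/some/file", 3, timedelta(days=1))
print(demo_body["lifetime"])
```

The result can be passed as `ag.postits.create(body=postit_body(jobStandardOutFileUrl, 3, timedelta(days=1)))`, matching the call in the cell above.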


We can also share job data via the Files API. Let's share the job standard out file with each other.

In [396]:
job = ag.jobs.get(jobId=job.id)
sharedJobOutputPems = ag.files.updatePermissions(systemId=job['executionSystem'], 
                                                 filePath=job['outputPath']+"/fork-command-1.out",
                                                 body={"username": "training002", "permission": "READ"})[0]
print("{} {}".format(sharedJobOutputPems.username, sharedJobOutputPems.permission))

training002 {'read': True, 'write': False, 'execute': False}


Finally, we can publish any file or folder and make it publicly available on the web. 

In [404]:
worldJobOutputPems = ag.files.updatePermissions(systemId=job['executionSystem'], 
                                                 filePath=job['outputPath']+"/fork-command-1.out",
                                                 body={"username": "WORLD", "permission": "READ"})[0]
print("{} {}".format(worldJobOutputPems.username, worldJobOutputPems.permission))

WORLD {'read': True, 'write': False, 'execute': False}


Published data does not require authentication and, as such, is available through a slightly different url structure than unpublished data.

In [417]:
print("{}/files/v2/download/{}/system/{}/{}".format(os.environ['AGAVE_TENANT_BASEURL'],
                                                     os.environ['AGAVE_USERNAME'],
                                                     job['executionSystem'],
                                                     job['outputPath']+"/fork-command-1.out"))

https://sandbox.agaveplatform.org/files/v2/downloads/dooley/system/sandbox-exec-dooley//home/jovyan/dooley/job-8210673309388238360-242ac114-0001-007-fork-command-1/fork-command-1.out


That wraps up our primer on sharing job data. Next up, data management.

### Managing Data

You can also use Agave to manage your data.


## Using the Agave ToGo web portal  

Follow the link below to run your job from a web portal.

In [None]:
!echo http://togo.agaveplatform.org/app/#/apps/${AGAVE_USERNAME}-${MACHINE_NAME}-fork-1.0/run