In [None]:
__author__ = 'M. Fitzpatrick <fitz@noao.edu>. M. Graham <graham@noao.edu>' # single string; emails in <>
__version__ = '20180604' # yyyymmdd; version datestamp of this notebook
__datasets__ = ['']      # datasets used

## How to use the Data Lab *Storage Manager* Service

*Revised:  June 04, 2018*

This notebook documents how to use the Data Lab _virtual storage system_ via the Storage Manager service. This can be done either from a Python script (e.g. within this notebook) or from the command line using the <i>datalab</i> command.  The Storage Manager provides a simpler interface to the VOSpace system used on the backend to implement the Data Lab virtual storage.

### The Storage Manager Web Service Interface

The Storage Manager service simplifies access to the Data Lab virtual storage system. This section describes the Storage Manager service interface in case we want to write our own code against it rather than use one of the provided tools. The Storage Manager service accepts an HTTP GET or POST call and then makes the appropriate call to the VOSpace backend for the requested operation:

| Endpoint | Description | Parameters | Client Interface |
|----------|-------------|------------|:----------------:|
| /get | Retrieve a file | name | res = get (fr \[, to\]) |
| /put | Upload a file | name | put (fr, to) |
| /cp | Copy a file/directory | from, to | cp (fr, to) |
| /ln | Link a file/directory | from, to | ln (src, target) |
| /ls | Get a file/directory listing | name | res = ls ([name]) |
| /access | Determine file accessibility | path, mode | bool = access (path \[,mode\]) |
| /stat | File status info | path | stat_dict = stat (path) |
| /mkdir | Create a directory | name | mkdir (name) |
| /mv | Move/rename a file/directory | from, to | mv (fr, to) |
| /rm | Delete a file | name | rm (name) |
| /rmdir | Delete a directory | name | rmdir (name) |
| /tag | Annotate a file/directory | name, tag | tag (name) |
| /resolve | Resolve shorthand to full URI | name | res = resolve (name) |
| /setProperty | Set a node property | name, prop, val | setProperty (name) |
| /getProperty | Get a node property | name, prop | val = getProperty (name) |
| | | | |
| /available | Service availability endpoint | N/A | N/A |
| /profiles | List available service profiles | profile, format | res = get_profiles (profile) |
| /debug | Toggle service debug flag | N/A | N/A |

For example, a call to <i>http://datalab.noao.edu/storage/get?name=vos://mag.csv</i> will retrieve the file '_mag.csv_' from the root directory of the user's virtual storage.  Likewise, a Python call using the _storeClient_ interface such as "_storeClient.get('vos://mag.csv')_" would get the same file.

#### Virtual Storage Identifiers

Files in the virtual storage are usually identified via the prefix "_vos://_". This shorthand identifier is resolved to the user's home directory in the service's storage space.  As a convenience, the prefix may be omitted when the parameter refers to a node in the virtual storage. Navigation above a user's home directory is not supported; however, subdirectories within the space may be created and used as needed.

If the '_vos_' prefix is instead replaced by the name of another user, and the remainder of the path grants public or group read/write access, then that user's space is accessed.  _Public spaces_ are specially created user accounts in which all files are world-readable; otherwise it is typically only a user's "_/public_" directory that can be accessed (e.g. '_sarah://public/foo.fits_' will access the '_foo.fits_' file from user '_sarah_').  Users can make any file (or directory) public by moving it to (or creating a link in) their "/public" directory.
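
To make these naming rules concrete, here is a small local sketch. `split_storage_id()` is a hypothetical helper, not part of the datalab client; it only mimics the conventions described above (a `vos` prefix means your own space, any other prefix names another user or public space, and a missing prefix means your own space). The service itself does the real resolution, e.g. via the `/resolve` endpoint.

```python
import re

# Hypothetical helper, not part of the datalab client: it only mimics the
# naming rules described above.  The service performs the real resolution.
_PREFIX = re.compile(r'^([A-Za-z0-9_.-]+)://(.*)$')

def split_storage_id(ident):
    """Split a storage identifier into (space, path).

    'vos' means the current user's own space; any other prefix is taken to
    be the name of another user (or public space).  With no prefix at all,
    the identifier is assumed to refer to the user's own space.
    """
    m = _PREFIX.match(ident)
    space, rest = (m.group(1), m.group(2)) if m else ('vos', ident)
    return space, '/' + rest.lstrip('/')   # normalize to one leading slash

for ident in ('vos://mag.csv', 'public/foo.fits', 'sarah://public/foo.fits'):
    print(ident, '->', split_storage_id(ident))
```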

#### User Authentication and Public Spaces
The Storage Manager service requires a Data Lab token to be passed as the value of the HTTP header keyword "X-DL-AuthToken" in any call to the service. This token identifies the user and which space to access.  If no token is supplied, anonymous access is assumed, which permits access only to public storage spaces.
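
For anyone writing their own client against the web service, the sketch below shows how a request could be assembled with only the Python standard library: the endpoint and `name` parameter come from the table above and the token travels in the `X-DL-AuthToken` header. The base URL is the one from the earlier example and is an assumption about your deployment; `build_storage_request()` is a hypothetical helper that builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Base URL taken from the example above; treat it as an assumption and
# adjust it for the service deployment you are actually using.
STORAGE_URL = 'http://datalab.noao.edu/storage'

def build_storage_request(endpoint, params, token=None):
    """Build (but do not send) a GET request to a Storage Manager endpoint.

    If a token is given it is sent in the X-DL-AuthToken header; otherwise
    no header is added and the service should treat the call as anonymous.
    """
    url = '%s/%s?%s' % (STORAGE_URL, endpoint.strip('/'), urlencode(params))
    headers = {'X-DL-AuthToken': token} if token else {}
    return Request(url, headers=headers, method='GET')

req = build_storage_request('/get', {'name': 'vos://mag.csv'}, token='<your token>')
print(req.full_url)        # the query value is URL-encoded
print(req.header_items())
```

Sending it would then be a matter of `urllib.request.urlopen(req)`; the response body and status codes are whatever the service returns, which this sketch does not model.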

## From Python code

The Storage Manager service can be called from Python code using the <i>datalab</i> client module. This provides methods to access the various Storage Manager functions in the <i>storeClient</i> subpackage. 

### Initialization
This is the setup that is required to use the Storage Manager. The first thing to do is import the relevant Python modules and also retrieve our Data Lab security token.
***

### Standard Notebook Setup

In [None]:
# Standard notebook imports
from __future__ import print_function   # Py2/Py3 compatibility
import getpass                          # for Data Lab login
from dl import authClient               # for Data Lab login
from dl import storeClient              # Storage Manager client interface

#### Data Lab Login

Logging into the Data Lab is only necessary when a user identity is required, i.e. when you wish to access protected resources such as virtual storage or MyDB.  If no token is provided, the _anonymous_ user is assumed.  Once you log in, the token is saved and automatically passed to interface methods that require it; if you wish to supply a different token, each interface method accepts a _token_ parameter.

The first time this notebook is executed, log in as the example _demo00_ user (passwd: _balatad_) in the cell below. Subsequent runs may skip this cell.  Note that this token will persist for other notebooks as well.

In [None]:
# Get the login token for the 'demo00' scratch user
token = authClient.login ('demo00',getpass.getpass('Account password: '))
if not authClient.isValidToken (token):
    print ('Error: invalid user login (%s)' % token)
else:
    print ("Login token:   %s" % token)

anon_tok = 'anonymous.0.0.anon_access'                            # EXAMPLES ONLY

***
## Client Interface Summary
### Listing a file/directory

We can see all the files that are in a specific directory or get a full listing for a specific file.  In this case, we'll list the default virtual storage directory to use as a basis for changes we'll make below.  Listing the virtual storage is done using the _**storeClient.ls()**_ method:

        listing = storeClient.ls (name, token=None, format='csv', verbose=False)
where:

        name - A vos identifier of the space to be listed
        token - Data Lab auth token to use when overriding login token
        format - Requested listing format
        verbose - Verbose output?

Multiple _format_ options are supported:

        raw - return the raw XML description of the storage nodes
        csv - return a CSV listing of directory contents
       list - return a single-column listing of directory contents
      short - return a multi-column short listing of contents (similar to 'ls')
       long - return a verbose long listing of contents (similar to 'ls -l')
  
Note that some formats are suitable only for displaying the listing.

In [None]:
listing = storeClient.ls ('vos://public', format='short', verbose=True)
print (listing)

The _**storeClient.ls()**_ method (along with others in this interface) can be called with a variable number of arguments.  In previous versions of the interface the token was required to be the first argument; this is now optional.  The examples below show the use of implicit and explicit token arguments, pattern matching, and the optional prefix.  Uncomment as desired to see each example.

In [None]:
#listing = storeClient.ls ()                          # default CSV listing of root directory
listing = storeClient.ls (format='short')              # multi-column short listing of root directory, no name
#listing = storeClient.ls ('', format='list')          # single-column listing of root directory, no prefix
#listing = storeClient.ls ('', format='short')         # multi-column short listing of root directory, no prefix
#listing = storeClient.ls ('/', format='long')         # long listing of root directory, no prefix
#listing = storeClient.ls ('/public', format='long')   # long listing of sub-directory, no prefix
#listing = storeClient.ls ('vos://t*')                 # default CSV output with pattern matching
#listing = storeClient.ls (token, '')                  # default CSV output w/ supplied token arg (deprecated)
#listing = storeClient.ls ('vos://', token=token)      # default CSV output using the 'token' keyword argument
print (listing)

#### File Existence and Info

Aside from simply listing files, it's possible to test whether a named file already exists or to determine more information about it. Existence testing is done using the _**storeClient.access()**_ method, and the _**storeClient.stat()**_ method returns more information:

         bool_val = storeClient.access (path [,mode=None,token=None])
        stat_dict = storeClient.stat (path)

In [None]:
# A simple file existence test:
#if storeClient.access ('public'):                  # simple filename check
#if storeClient.access ('/public'):                 # leading path
if storeClient.access ('vos://public'):             # full URI
    print ('User "public" directory exists')
if storeClient.access ('vos://public', mode='w'):   # available modes: 'r','w','rw'
    print ('User "public" directory is group/world writable')
else:
    print ('User "public" directory is not group/world writable')
    
if storeClient.access ('vos://tmp'):             # full URI
    print ('User "tmp" directory exists')        
if storeClient.access ('vos://tmp', mode='w'):   # available modes: 'r','w','rw'
    print ('User "tmp" directory is group/world writable')
else:
    print ('User "tmp" directory is not group/world writable')

The _**stat()**_ method returns a dictionary of node status values.  Returned fields include:

        name            Name of node
        groupread       List of group/owner names w/ read access
        groupwrite      List of group/owner names w/ write access
        publicread      Publicly readable (0=False, 1=True)
        owner           Owner name
        ctime           Creation time
        perms           Formatted unix-like permission string
        target          Node target if LinkNode
        size            Size of file node (bytes)
        type            Node type (container|data|link)
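
As a sketch of how the returned dictionary might be consumed, the hypothetical helper below (not part of `storeClient`) turns a _stat()_ result into a one-line summary. It relies only on the field names listed above and, as the next cell does, treats an empty dictionary as a missing node. The value types (e.g. whether `size` and `publicread` come back as strings or integers) are not documented here, so the helper only formats them and does no arithmetic; the example dictionary is made up for illustration.

```python
def describe_node(stat):
    """Summarize a stat() dictionary as a one-line string (hypothetical helper)."""
    if not stat:                                   # empty dict: node not found
        return 'node does not exist'
    kind = stat.get('type', 'unknown')             # container | data | link
    line = '%s %s (%s, owner=%s)' % (stat.get('perms', '?'), stat.get('name', '?'),
                                     kind, stat.get('owner', '?'))
    if kind == 'data':
        line += ', %s bytes' % stat.get('size', '?')
    elif kind == 'link':
        line += ' -> %s' % stat.get('target', '?')
    if ('%s' % (stat.get('publicread', '0'),)) == '1':   # 0=False, 1=True
        line += ', public'
    return line

# Hypothetical stat() result, used only to exercise the helper.
example = {'name': 'mags.csv', 'type': 'data', 'size': '1234', 'perms': '-rw-r--r--',
           'owner': 'demo00', 'publicread': '1'}
print(describe_node(example))
```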

In [None]:
# More information about a file:  
#stat = storeClient.stat('/doesnt_exist')
stat = storeClient.stat('/public')
if len(stat) == 0:
    print ('Error: Node does not exist')
else:
    print ('  %s  %s   %s   %s' % (stat['perms'],stat['owner'],stat['ctime'],stat['name'])) 

##### Listing Public VOSpaces

Public storage can be listed or accessed in the same way, except that the '_vos_' URI prefix is replaced with the name of the user or storage space.  For example, to list the contents of the "_/public_" directory for user '_demo00_':

In [None]:
print (storeClient.ls ('demo00://public', format='long'))                  # as current user
print (storeClient.ls ('demo00://public', format='long', token=anon_tok))  # as anonymous user

The *public* directory shown here is visible to all Data Lab users and provides a means of sharing data without having to set up special access.  Conversely, the *tmp* directory is read-protected and provides a convenient temporary directory to be used in a workflow.

### Uploading and Downloading Files

Now we want to upload a new data file from our local disk to the virtual storage for our account, or perhaps retrieve a file from virtual storage to our local disk or directly into the notebook.  Within the Jupyter environment the '_local disk_' refers to storage on the notebook server. Other methods are available for copying a file from your laptop/desktop computer or for saving query results to virtual storage, or for accessing remote files automatically.  

Uploading/downloading files is done using the _**storeClient.put()**_ and _**storeClient.get()**_ methods respectively:

        resp = storeClient.put (fr, to, token=None, verbose=True)
        resp = storeClient.get (fr, to, token=None, verbose=True)
where for _**put()**_:

        fr - Source path to upload (e.g. './foo.fits')
        to - Destination URI of file (e.g. 'vos://foo.fits', defaults to 'vos://' if not given)
where for _**get()**_:

        fr - URI path to download (e.g. 'vos://foo.fits')
        to - Name of the local file to save; if not provided, the file contents are returned
and for both methods:

        token - Data Lab auth token to use when overriding login token
        verbose - Verbose output?

Both methods return a _[requests](http://docs.python-requests.org/en/master/)_ _Response_ object that can be used to check for status codes and error messages.
        
#### Uploading Files

Assuming a local file called 'mags.csv' exists in the notebook directory, it can be uploaded to virtual storage as follows:

In [None]:
if not storeClient.access ('vos://test'):
    resp = storeClient.mkdir ('vos://test')
resp = storeClient.put ('./mags.csv', to='vos://mags.csv')
print (resp)

In [None]:
# List the directory before we upload
print ('Before upload: /public\n' + storeClient.ls ('/public',format='short'))

print ('Transferring file "mags.csv" to public directory under same name...')
resp = storeClient.put ('./mags.csv', to='public')
#resp = storeClient.put ('./mags.csv', to='public/')
#resp = storeClient.put ('./mags.csv', to='/public')
#resp = storeClient.put ('./mags.csv', to='/public/')
#resp = storeClient.put ('./mags.csv', to='vos://public')
#resp = storeClient.put ('./mags.csv', to='vos://public/')
print (resp)
print ('Put successful' if resp == 'OK' else ('Put error: %s' % (resp,)))
print ('After upload: /public\n' + storeClient.ls ('/public',format='short'))

print ('Transferring file "mags.csv" to public directory under different name...')
resp = storeClient.put ('./mags.csv', to='public/xyzzy.csv')
#resp = storeClient.put ('./mags.csv', to='/public/xyzzy.csv')
#resp = storeClient.put ('./mags.csv', to='vos://public/xyzzy.csv')
print ('Put successful' if resp == 'OK' else ('Put error: %s' % (resp,)))

print ('\nTransferring multiple files "*.csv"...')
resp = storeClient.put ('./*.csv', to='vos://public')        # put the files to vospace
print ('Transfer status: ' + str(resp))

# List the directory again to verify it is there
print ('After upload: /public\n' + storeClient.ls ('/public',format='short'))

#### Downloading a file

Let's say we want to download a file from our virtual storage space, in this case a query result that we saved to it in the "_How to Use the Data Lab Query Manager Service_" notebook.  When called with a single argument, the contents of the remote file are returned as the result of the _**get()**_ method call:

In [None]:
try:
    data = storeClient.get ('public/mags.csv')                        # simple name
    #data = storeClient.get ('/public/mags.csv')                      # path prefix
    #data = storeClient.get ('vos://public/mags.csv')                 # full URI
    #data = storeClient.get (token,'vos://public/mags.csv')            # explicit token
    #data = storeClient.get (token, fr='vos://public/mags.csv',to='') # deprecated syntax
    #data = storeClient.get ('/does_not_exist')                       # test access to non-existent file
except Exception as e:
    print ('Download error: ' + str(e))
else:
    print (data)

To save the file to local disk, a second argument may be given to name the local file.

In [None]:
resp = storeClient.get ('vos://public/mags.csv', './new_mymags.csv')
#resp = storeClient.get ('mags.csv', 'new_mymags.csv')
#resp = storeClient.get ('/mags.csv', './new_mymags.csv')
#resp = storeClient.get (token, fr = 'vos://mags.csv', to = './new_mymags.csv') # deprecated syntax

# When specifying an output name, 'resp' is an array of "OK"
# or error messages, one for each downloaded object.
print ("Response: " + repr(resp))

# Use an escape to verify the local file was created then clean up
!ls -l new_mymags.csv ; rm -f new_mymags.csv

Multiple files may be downloaded by using a filename template:

In [None]:
!mkdir -p ./testdir                                            # create a local download directory
try:
    storeClient.get ('vos://public/*.csv', './testdir')       # get all public CSV files to this dir
    #resp = storeClient.get ('vos://public/*.csv', './does_not_exist') # test download to non-existent directory
except Exception as e:
    print ('Download error: ' + str(e))
!ls -l testdir ; /bin/rm -rf testdir                           # list it to verify and cleanup
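
Filename templates such as `'*.csv'` or `'vos://t*'` select several nodes at once. Assuming the service interprets them like Unix shell wildcards (an assumption; this notebook does not specify the exact matching rules), you can preview locally which names a template would select before running a bulk _get()_, _cp()_, _mv()_ or _rm()_. `preview_template()` is a hypothetical helper built on the standard `fnmatch` module and contacts no service; the name list is made up for illustration.

```python
from fnmatch import fnmatchcase

def preview_template(names, template):
    """Return the names that a shell-style template would select.

    Assumes '*', '?' and '[...]' behave as in Unix shell globs, matched
    case-sensitively; this runs locally and contacts no service.
    """
    return [n for n in names if fnmatchcase(n, template)]

# Hypothetical directory contents, for illustration only.
names = ['mags.csv', 'xyzzy.csv', 'foo.fits', 'tmp', 'test']
print(preview_template(names, '*.csv'))
print(preview_template(names, 't*'))
```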

#### Saving Python Objects as Files

Limited support for saving Python objects to files in virtual storage is provided by the _**storeClient.saveAs()**_ method.  This function saves the _str()_ representation of the data to a named file in storage.

        resp = storeClient.saveAs (data, name, token=None)
where:

        data - Data object to be saved
        name - Path to file to create in storage (e.g. 'vos://foo.csv')


In [None]:
data = {'foo': 1, 'bar': 2}                     # test dictionary
resp = storeClient.saveAs (data, "vos://test1.dict")
print ('SaveAs response:  %s' % (resp))

print(storeClient.ls('vos://',format='short'))  # verify it was saved

sdata = storeClient.get('test1.dict')           # read it back (without shadowing the builtin str)
print ("String version: '%s'" % sdata)
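
Because _saveAs()_ stores only the _str()_ representation, what comes back from _get()_ is text, not the original object. For simple literals (dicts, lists, numbers, strings) the standard `ast.literal_eval()` can safely turn that text back into a Python object. The sketch below simulates the round trip locally instead of calling the service; applying `ast.literal_eval()` to the string returned by `storeClient.get('test1.dict')` should work the same way, assuming the file holds exactly the saved text. Objects whose _str()_ is not a valid Python literal cannot be recovered this way.

```python
import ast

data = {'foo': 1, 'bar': 2}
saved = '%s' % (data,)               # same text as str(data), i.e. what saveAs() writes
restored = ast.literal_eval(saved)   # safely parse a Python literal back into an object
print(type(saved).__name__, '->', type(restored).__name__, restored == data)
```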

#### Creating and Deleting Directories

We can create a directory on the remote storage to be used for saving data; deleting this directory later will also automatically remove its contents.  Directories are created and removed using the _**storeClient.mkdir()**_ and _**storeClient.rmdir()**_ methods.

        storeClient.mkdir (name, token=None)
        storeClient.rmdir (name, token=None)
where _name_ is the URI or pathname of the directory to create/remove.

In [None]:
dirname = 'testdir'              # simple name
#dirname = '/testdir'            # path prefix
#dirname = 'vos://testdir'       # URI

# Create a directory if it doesn't already exist
if not storeClient.access (dirname):
    print ('Creating test dir: ' + storeClient.mkdir (token, name=dirname))
else:
    print ('Error: test directory already exists')
    
# Now delete it
if storeClient.access (dirname):
    print ('Removing test dir: ' + storeClient.rmdir (token, name=dirname))
else:
    print ('Error: test directory does not exist')

#### Copying a File or Directory

Nodes (i.e. files or directories) in virtual storage may be copied to a new name or to other directories.
Files and directories are copied using the _**storeClient.cp()**_ method.

        resp = storeClient.cp (fr, to, token=None)
where both the _fr_ and _to_ arguments are required: _fr_ is the URI or pathname of the file or directory to copy, and _to_ is the URI or pathname of the destination.

In [None]:
if storeClient.access ('/tmp/mags.csv'):
    storeClient.rm ('/tmp/mags.csv')
resp = storeClient.cp ('mags.csv', '/tmp')
#resp = storeClient.cp ('*.csv', '/tmp', verbose=True)
#resp = storeClient.cp (token, 'vos://mags.csv', 'vos://tmp/mags.csv')
#resp = storeClient.cp (fr='vos://mags.csv', to='vos://tmp/mags.csv')
#resp = storeClient.cp (token, fr='vos://mags.csv', to='vos://tmp/mags.csv')
print ('Copy response: %s' % (resp))

When copying multiple files to a directory, the response is an array of the individual response objects:

In [None]:
resp = storeClient.cp ('*.csv', '/tmp', verbose=True)
print (resp)

junk=storeClient.rm ('vos://tmp/mags.csv')          # clean up
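
Since single-file calls return one value and multi-file calls return a list, a small helper can normalize the two before checking for errors. `failed_responses()` is a hypothetical sketch, not part of `storeClient`; following the comment in the download example, it assumes each entry is either the string "OK" or an error message. If your client version returns _requests_ _Response_ objects instead, check their `status_code` rather than the text.

```python
def failed_responses(resp):
    """Return the entries of a storeClient response that are not "OK".

    Accepts either a single response or a list of per-file responses and
    assumes each entry is the string "OK" or an error message.
    """
    items = resp if isinstance(resp, (list, tuple)) else [resp]
    return [r for r in items if ('%s' % (r,)).strip() != 'OK']

errors = failed_responses(['OK', 'OK', 'Error: no such node'])   # hypothetical responses
print('all OK' if not errors else errors)
```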

#### Linking to a file/directory

Sometimes we want to create a link to a file or directory.  In this case, the link named by the *'fr'* parameter is created and points to the file/container named by the *'target'* parameter.  Links are created using the _**storeClient.ln()**_ method:

        stat = storeClient.ln (fr, target, token=None, verbose=True)
where

            fr - Name/URI of the link to create
        target - Name/URI of the link target
        
The _**ln()**_ method returns an "OK" string if the link was created; otherwise it throws an Exception to indicate the error. For example, to create a link named 'link.csv' that points to the 'mags.csv' file in our top-level directory:

In [None]:
try:
    if storeClient.access ('vos://link.csv'):
        storeClient.rm ('vos://link.csv')
    stat = storeClient.ln (token, fr = 'vos://link.csv', target = 'vos://mags.csv')
except Exception as e:
    print ('Link Error: ' + str(e))
else:
    print ('Link status = %s, info: %s' % (stat,storeClient.ls('link.csv',format='json')))
print ("Root dir after link:  " + storeClient.ls (token, name='vos://'))

#### Moving/Renaming a File or Directory

Files and directories are moved (or renamed) using the _**storeClient.mv()**_ method.

        resp = storeClient.mv (fr, to, token=None)
where both the _fr_ and _to_ arguments are required: _fr_ is the URI or pathname of the file or directory to move, and _to_ is its destination.  If the '_to_' parameter is an existing directory, the '_fr_' file/directory is moved into it; otherwise it is renamed to '_to_'.
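
The move-versus-rename rule above can be stated as a small function. `mv_destination()` is a hypothetical sketch (not part of `storeClient`) that computes where a node would end up; the `is_dir` predicate is supplied by the caller, for example `lambda p: storeClient.stat(p).get('type') == 'container'`, which relies on the _stat()_ fields described earlier and on an empty dictionary for missing nodes.

```python
import posixpath

def mv_destination(fr, to, is_dir):
    """Return the path a node would end up at after mv(fr, to).

    Models the rule above: if `to` is an existing directory, the node is
    moved into it under its own name; otherwise `fr` is renamed to `to`.
    `is_dir` is a caller-supplied predicate for "is an existing directory".
    """
    if is_dir(to):
        return posixpath.join(to, posixpath.basename(fr.rstrip('/')))
    return to

is_results = lambda p: p in ('/results', 'vos://results')    # hypothetical directory check
print(mv_destination('/mags.csv', '/results', is_results))
print(mv_destination('mags.csv', '/results/zz.csv', is_results))
```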

In [None]:
if not storeClient.access ('/results'):                              # make sure the target directory exists
    storeClient.mkdir ('/results')
print ('Before move: access() to /results/zz.csv = %s' % storeClient.access('/results/zz.csv'))

resp = storeClient.mv(fr='mags.csv', to='/results/zz.csv')            # simple names, keyword args      FIXME
#resp = storeClient.mv('vos://zzz.csv', 'vos://results')              # full URI
#resp = storeClient.mv(fr='vos://zzz.csv', to='vos://results')        # full URI, keyword args
#resp = storeClient.mv(token, 'vos://zzz.csv', 'results')             # explicit token (deprecated), mixed names
#resp = storeClient.mv(token, fr='vos://zzz.csv', to='vos://results') # explicit token (deprecated)
print ('Move response:  %s' % (resp))

print ('After move: access() to /results/zz.csv = %s' % storeClient.access('/results/zz.csv'))

# FIXME -- Seems to be a bug in moving to the toplevel dir w/out filename
#resp = storeClient.cp('vos://results/mags.csv', '/', verbose=True)  # put it back for next time
#resp = storeClient.cp('vos://results/mags.csv', 'vos://', verbose=True)  # put it back for next time
resp = storeClient.cp('vos://results/zz.csv', '/mags.csv', verbose=True)  # put it back for next time
print ('Copy response:  %s' % (resp))

Moving multiple files can be done using template strings as above:

In [None]:
print ('Root listing before move: \n' + storeClient.ls (format='short'))
print ('Public listing before move: \n' + storeClient.ls ('/public',format='short'))

storeClient.mv ('*.csv','/public')                     # move the files

print ('Root listing after move: \n' + storeClient.ls (format='short'))
print ('Public listing after move: \n' + storeClient.ls ('/public',format='short'))

junk = storeClient.mv ('/public/*.csv','vos://')             # put them back for next time

#### Deleting a File

We can delete a file using the _**storeClient.rm()**_ method:

        resp = storeClient.rm (name, token=None, verbose=False)
where _name_ is required and specified as the URI or pathname of the file to remove.  Filename templates may be used to delete multiple files.

In [None]:
# Upload a file to use as a test
resp = storeClient.put ('./mags.csv', to='/rmtest.csv', verbose=False)
print ("Before: " + storeClient.ls ('vos://'))

# Delete the test file.  Note this returns status=204 on success
resp = storeClient.rm ('vos://rmtest.csv')
#resp = storeClient.rm (token, name = 'vos://mags.csv')
print ('Remove response:  %s' % (resp))

Now let's try deleting a non-existent file, i.e. the file we just removed:

In [None]:
if not storeClient.access ('vos://rmtest.csv'):           # verify it doesn't exist first
    resp = storeClient.rm ('vos://rmtest.csv')
    if resp != "OK":
        print ('Remove error: %s ' % (resp))  
    else:
        print ('Removed non-existent file')

Multiple files may be removed using filename templates.  Note that in this case the _rm()_ method returns an array of _Response_ objects.

In [None]:
storeClient.mkdir ('rmtest')                               # create some test data
storeClient.cp ('*.csv','rmtest',verbose=False)
print ('Test directory: \n' + storeClient.ls ('vos:///rmtest',format='short'))

resp = storeClient.rm ('/rmtest/*.csv')
print ('Remove status: ' + repr(resp) + '\n----------\n')

Let's see what happens if we try to remove the directory:

In [None]:
print ('Root directory before removing "rmtest" dir: \n' + storeClient.ls ('vos://',format='short'))
storeClient.cp ('*.csv','rmtest',verbose=False)
print ('rmtest directory before removing "rmtest" dir: \n' + storeClient.ls ('vos:///rmtest',format='short'))

resp = storeClient.rm ('/rmtest')
print ('Remove status: ' + repr(resp))
print ('Root directory after removing "rmtest" dir: \n' + storeClient.ls ('vos://',format='short'))

#### Deleting a directory

We can also delete a directory using the _**storeClient.rmdir()**_ method:

        resp = storeClient.rmdir (name, token=None, verbose=False)
where _name_ is required and specified as the URI or pathname of the directory to remove.  Filename templates may be used to delete multiple directories.  Deleting a directory also deletes the contents of that directory:

In [None]:
if storeClient.access ('vos://rmtest'):                 # remove the test directory created above
    print (storeClient.rmdir ('vos://rmtest'))

#### Tagging a file/directory

We can _tag_ any file or directory with arbitrary metadata.

In [None]:
# FIXME -- This doesn't appear to work
storeClient.tag(token, name = 'vos://results', tag = 'The results from my analysis')

NOTE: We need a method to retrieve tags or include them in the listing.

#### Clean up the demo directory of remaining files

In [None]:
storeClient.rm (token, name = 'vos://newmags.csv')
storeClient.rm (token, name = 'vos://results')
storeClient.ls (token, name = 'vos://')

### Using the datalab command

The <i>datalab</i> command provides an alternative, command-line way to work with the Storage Manager through its storage subcommands.
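
The subcommands below take `key=value` arguments. When scripting them from Python rather than through `!` shell escapes, one option is to build the argument list and hand it to `subprocess.run()`, which avoids shell quoting entirely. `datalab_cmd()` and `as_shell()` are hypothetical helpers written for this sketch, not part of the datalab package; `as_shell()` only produces an equivalent line for display or pasting into a terminal.

```python
import shlex

def datalab_cmd(subcommand, **params):
    """Build an argument list for a `datalab` subcommand in key=value style."""
    return ['datalab', subcommand] + ['%s=%s' % (k, v) for k, v in params.items()]

def as_shell(cmd):
    """Quote an argument list so it can be pasted into a shell."""
    return ' '.join(shlex.quote(arg) for arg in cmd)

cmd = datalab_cmd('get', fr='vos://mags.csv', to='./mags.csv')
print(as_shell(cmd))
# To actually run it (requires the datalab CLI to be installed and logged in):
# import subprocess; subprocess.run(cmd, check=True)
```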

#### Initialization
We need to be logged into the Data Lab to use the Storage Manager.

In [None]:
!datalab login user=demo00 password=...

#### Downloading a file

Let's say we want to download a file from our virtual storage space:

In [None]:
!datalab get fr="vos://mags.csv" to="./mags.csv"

#### Uploading a file

Now we want to upload a new data file from our local disk:

In [None]:
!datalab put fr="./newmags.csv" to="vos://newmags.csv"

#### Copying a file/directory

We want to put a copy of the file in a remote work directory:

In [None]:
!datalab cp fr="vos://newmags.csv" to="vos://tmp/newmags.csv"

#### Linking to a file/directory

Sometimes we want to create a link to a file or directory:

In [None]:
!datalab ln fr="vos://tmp/mags.csv" to="vos://mags.csv"

#### Listing a file/directory

We can see all the files that are in a specific directory or get a full listing for a specific file:

In [None]:
!datalab ls name="vos://tmp"

#### Creating a directory

We can create a directory:

In [None]:
!datalab mkdir name="vos://results"

#### Moving/renaming a file/directory

We can move a file or directory:

In [None]:
!datalab mv fr="vos://tmp/newmags.csv" to="vos://results"

#### Deleting a file

We can delete a file:

In [None]:
!datalab rm name="vos://tmp/mags.csv"

#### Deleting a directory

We can also delete a directory:

In [None]:
!datalab rmdir name="vos://tmp"

#### Tagging a file/directory

We can tag any file or directory with arbitrary metadata:

In [None]:
!datalab tag name="vos://results" tag="The results from my analysis"