# 1 - Introduction to Catalog and Basic Python Client Interactions

## Before starting, complete the installation steps below

### INSTALL REQUIREMENTS
1) Install Globus Catalog-client (https://github.com/globusonline/catalog-client)
git clone https://github.com/globusonline/catalog-client
cd catalog-client
python setup.py install --user

2) Install Globus Transfer API (https://github.com/globusonline/transfer-api-client-python)
git clone https://github.com/globusonline/transfer-api-client-python
cd transfer-api-client-python
python setup.py install --user

### Globus Online - Catalog Command Line Client
[https://github.com/globusonline/catalog-client]

Catalog User Interface: https://catalog-alpha.globuscs.info/
Contact: Ben Blaiszik (blaiszik@uchicago.edu)

### OBTAIN GLOBUS CREDENTIALS
Sign up at https://www.globus.org/SignUp. **These are the credentials you will use with Catalog.**


# Catalog Data Model

* <b> Catalogs </b>
    * Have specified "vocabularies" or tag definitions.
        * e.g. beam_energy - float, description - text
    * Catalogs contain many datasets
        
* <b> Datasets </b>
    * Datasets can have tags added and ACLs specified
    * Datasets contain many members
* <b> Members </b>
    * Members can have tags added
    * Generally point to a data file or directory on a Globus endpoint
    * Could also point to a more general URI

<img src="img/catalog-model.png" width=70%>
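The hierarchy above can be pictured with plain nested dictionaries. This is only an illustration of how catalogs, datasets, and members nest. The keys `annotation_defs`, `datasets`, `members`, and `annotations` are hypothetical names chosen for this sketch, not fields returned by the Catalog service.

```python
# Illustrative only: a nested-dict picture of the Catalog data model.
# The keys "annotation_defs", "datasets", "members" and "annotations"
# are hypothetical names, not fields of the real service.
catalog = {
    "name": "Example Catalog",
    "annotation_defs": {"beam_energy": "float8", "description": "text"},
    "datasets": [
        {
            "name": "Example Dataset",
            "annotations": {"beam_energy": 1.1},
            "members": [
                {
                    "data_type": "file",
                    "data_uri": "globus://go#ep1/~/test.tst",
                    "annotations": {},
                },
            ],
        },
    ],
}


def count_members(catalog):
    """Return the total number of members across all datasets."""
    return sum(len(dataset["members"]) for dataset in catalog["datasets"])


print(count_members(catalog))
```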

# Imports and Authentication
* This cell seems to fail in some versions of IPython Notebook. If it does, paste the code into an IPython shell instead (started with `ipython -i`).

In [102]:
import os
from globusonline.catalog.client.catalog_wrapper import CatalogWrapper
from globusonline.catalog.client.operators import Op
from globusonline.catalog.client.rest_client import RestClientError

# Store authentication data in a local file
token_file = os.path.join(os.path.expanduser("~"), ".ssh", "gotoken.txt")
wrap = CatalogWrapper(token_file=token_file)
client = wrap.catalogClient

# Create a Catalog and Save the ID

In [103]:
catalog_info = { 
                    "config": {
                        "name":"Ben Demo Catalog"
                    }
               }
_,response = client.create_catalog(catalog_info)
catalog_id = response['id']
response

{u'config': {u'content_read_users': [u'*'],
  u'content_write_users': [u'*'],
  u'name': u'Ben Demo Catalog',
  u'owner': u'u:blaiszik',
  u'read_users': [u'*'],
  u'write_users': []},
 u'id': 146}

# Create a Dataset within the New Catalog and Save the Dataset ID

In [104]:
dataset_info = {"name":"New Dataset"}
_,response = client.create_dataset(catalog_id, dataset_info)
dataset_id = response['id']
response

{u'annotations_present': [u'created',
  u'id',
  u'modified',
  u'modified by',
  u'name',
  u'owner',
  u'readok',
  u'writeok'],
 u'created': u'2015-11-05 19:19:57.878085+00:00',
 u'id': 49,
 u'modified': u'2015-11-05 19:19:57.878085+00:00',
 u'modified by': u'u:blaiszik',
 u'name': u'New Dataset',
 u'owner': u'u:blaiszik',
 u'writeok': True}

# Add a Member to the New Dataset and Save the Member ID

In [105]:
member_info = {"data_type":"file", "data_uri":"globus://go#ep1/~/test.tst"}
_,response = client.create_member(catalog_id, dataset_id, member_info)
member_id = response['id']

In [106]:
response

{u'code': u'Created',
 u'id': 50,
 u'message': u'Members created successfully',
 u'request_id': u'Jt0oisiaz'}
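The `data_uri` above packs an endpoint and a path into one string. A minimal sketch of pulling them apart follows. It assumes the layout `globus://<endpoint>/<path>`, with the endpoint being everything up to the first `/`. The helper `split_globus_uri` is hypothetical and not part of the catalog client. The string is split by hand because a general-purpose URL parser may treat the `#` in the endpoint name as the start of a fragment.

```python
def split_globus_uri(data_uri):
    """Split 'globus://<endpoint>/<path>' into (endpoint, path).

    Assumes the endpoint is everything between 'globus://' and the
    first '/' that follows it.
    """
    prefix = "globus://"
    if not data_uri.startswith(prefix):
        raise ValueError("not a globus:// URI: %r" % data_uri)
    endpoint, _, path = data_uri[len(prefix):].partition("/")
    return endpoint, "/" + path


endpoint, path = split_globus_uri("globus://go#ep1/~/test.tst")
print(endpoint)
print(path)
```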

# Get all Members in a Dataset

In [107]:
_, response = client.get_members(catalog_id, dataset_id)
for member in response:
    print("[%s] %s  %s" % (member['id'], member['data_type'], member['data_uri']))

[50] file  globus://go#ep1/~/test.tst


In [108]:
response

[{u'data_type': u'file',
  u'data_uri': u'globus://go#ep1/~/test.tst',
  u'dataset_reference': [u'49'],
  u'id': 50}]

# Get all Datasets in a Catalog

In [109]:
_,response = client.get_datasets(catalog_id)
for dataset in response:
    print("[%s] %s" % (dataset['id'], dataset['name']))

[49] New Dataset


In [110]:
response

[{u'created': u'2015-11-05 19:19:57.878085+00',
  u'favorite': None,
  u'id': 49,
  u'label': None,
  u'modified': u'2015-11-05 19:19:57.878085+00',
  u'modified by': u'u:blaiszik',
  u'name': u'New Dataset',
  u'owner': u'u:blaiszik',
  u'readok': True,
  u'writeok': True}]

# List all Catalogs in the Database

In [111]:
_,response = client.get_catalogs()
for catalog in response:
    print("[%s] %s" % (catalog['id'], catalog['config']['name']))

[1] Test
[16] RaviTest
[17] NexPy-Test
[18] Climate_Ocean
[34] Materials Catalog
[35] XPCS
[37] ISI-MIP
[38] ESG-ANL
[39] Tomography
[40] Microscopy
[42] Mwilde-Catalog-0
[49] Beamline: 2-BM-B
[50] Beamline: 32-ID-C
[51] APS Facility Catalog
[48] simanalyze
[52] CuSn Nanotomography
[63] Genomics
[62] Proteomics
[64] Wozniak Test
[65] Sector1APS
[68] SwiftProvenanceTest
[69] SwiftProvenanceTest
[70] Swift Provenance
[72] IME Nealey
[89] ematter
[76] NeXus demo 1
[77] NeXus demo 2
[80] APITEST
[94] Data Exchenge
[87] Tomography Test
[93] NeXus_Production
[95] NeXus_Production_2013_LSMO
[96] NeXus_Production_2014_BFAP
[97] bfap_test
[99] acl_test
[118] test1
[108] Sector 1 Test Catalog
[109] New Test
[110] MikeDemo
[111] myname123456
[112] myname123456
[113] asdfg3871263192
[114] demo catalog
[119] XPCS8IDI
[122] NeXus_Production_2015_V2O5
[123] Demo Catalog
[124] s8idi_test
[125] XPCS8IDI_2015_2
[127] NeXus_Production_2015_2
[140] Ben test
[138] CLASSE-test
[139] XPCS 2016
[141] junk1
[1
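Catalog names in the listing above are not unique (for example, `SwiftProvenanceTest` appears under two IDs), so a lookup by name should expect more than one match. The sketch below works on sample data shaped like the `get_catalogs()` response. `find_catalog_ids` is a hypothetical helper, not part of the client.

```python
def find_catalog_ids(catalogs, name):
    """Return the IDs of every catalog whose config name equals `name`."""
    return [c["id"] for c in catalogs if c["config"]["name"] == name]


# Sample data in the same shape as the get_catalogs() response.
sample_catalogs = [
    {"id": 68, "config": {"name": "SwiftProvenanceTest"}},
    {"id": 69, "config": {"name": "SwiftProvenanceTest"}},
    {"id": 70, "config": {"name": "Swift Provenance"}},
]

print(find_catalog_ids(sample_catalogs, "SwiftProvenanceTest"))
```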

# Add an Annotation Definition and Apply it to a Dataset
* Available annotation types: `text`, `int8`, `float8`, `boolean`, `timestamptz`, `date`


In [112]:
help(client.create_annotation_def)

Help on method create_annotation_def in module globusonline.catalog.client.dataset_client:

create_annotation_def(self, catalog_id, annotation_name, value_type, multivalued=False, unique=False) method of globusonline.catalog.client.dataset_client.DatasetClient instance



In [113]:
new_annotations = [ {"name":"beam_energy", "type":"float8"},
                    {"name":"reference", "type":"text"}, 
                    {"name":"sample_number", "type":"int8"}]
responses = []
for annotation in new_annotations:
    _,response = client.create_annotation_def(catalog_id, annotation['name'],annotation['type'])
    responses.append(response)

In [114]:
responses

[{u'multivalued': False,
  u'name': u':beam_energy',
  u'read users': u'*',
  u'readpolicy': u'anonymous',
  u'unique': False,
  u'value_type': u'float8',
  u'writepolicy': u'anonymous'},
 {u'multivalued': False,
  u'name': u':reference',
  u'read users': u'*',
  u'readpolicy': u'anonymous',
  u'unique': False,
  u'value_type': u'text',
  u'writepolicy': u'anonymous'},
 {u'multivalued': False,
  u'name': u':sample_number',
  u'read users': u'*',
  u'readpolicy': u'anonymous',
  u'unique': False,
  u'value_type': u'int8',
  u'writepolicy': u'anonymous'}]
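It can help to check the `type` field of each definition locally before sending it to the service. The sketch below uses the list of annotation types given at the top of this section. `check_annotation_defs` is a hypothetical helper, and how the service itself responds to an unsupported type is not shown here.

```python
# Annotation types listed at the top of this section.
VALID_TYPES = ("text", "int8", "float8", "boolean", "timestamptz", "date")


def check_annotation_defs(annotation_defs):
    """Raise ValueError if any definition uses an unsupported type."""
    for definition in annotation_defs:
        if definition["type"] not in VALID_TYPES:
            raise ValueError(
                "annotation %r has unsupported type %r"
                % (definition["name"], definition["type"])
            )
    return True


new_annotations = [
    {"name": "beam_energy", "type": "float8"},
    {"name": "reference", "type": "text"},
    {"name": "sample_number", "type": "int8"},
]
print(check_annotation_defs(new_annotations))
```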

In [115]:
help(client.add_dataset_annotations)

Help on method add_dataset_annotations in module globusonline.catalog.client.dataset_client:

add_dataset_annotations(self, catalog_id, dataset_id, annotations_dict) method of globusonline.catalog.client.dataset_client.DatasetClient instance



In [116]:
_,response = client.add_dataset_annotations(catalog_id, dataset_id, {"beam_energy":"1.1", "reference":"this is a reference", 
                                                                     "sample_number":1})
response

{u'code': u'Added',
 u'message': u'Annotations added successfully',
 u'request_id': u'xIkRgsF6K'}

# Retrieve Annotations on a Dataset

In [117]:
_, annotation_list = client.get_annotation_defs(catalog_id)
catalog_annotations = [annotation['name'] for annotation in annotation_list]

_,response = client.get_dataset_annotations(catalog_id, dataset_id, catalog_annotations)
response

[{u'beam_energy': 1.1,
  u'created': u'2015-11-05 19:19:57.878085+00',
  u'data_id': None,
  u'data_type': None,
  u'data_uri': None,
  u'dataset_reference': None,
  u'favorite': None,
  u'id': 49,
  u'label': None,
  u'modified': u'2015-11-05 19:19:57.878085+00',
  u'modified by': u'u:blaiszik',
  u'name': u'New Dataset',
  u'owner': u'u:blaiszik',
  u'readok': True,
  u'reference': u'this is a reference',
  u'sample_number': 1,
  u'share-endpoint': None,
  u'share-users': None,
  u'writeok': True}]
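Because every annotation defined in the catalog was requested, the response includes many keys whose value is `None`. A small filter, sketched below on sample data, keeps only the fields that are actually set. `set_annotations` is a hypothetical helper name.

```python
def set_annotations(record):
    """Return a copy of `record` without the keys whose value is None."""
    return dict((key, value) for key, value in record.items()
                if value is not None)


# A shortened copy of the response above.
sample = {
    "beam_energy": 1.1,
    "data_uri": None,
    "id": 49,
    "label": None,
    "name": "New Dataset",
    "sample_number": 1,
}
print(sorted(set_annotations(sample)))
```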

# Query for Datasets in a Catalog

In [118]:
help(client.get_datasets)

Help on method get_datasets in module globusonline.catalog.client.dataset_client:

get_datasets(self, catalog_id, last_id=None, limit=100, selector_list=None) method of globusonline.catalog.client.dataset_client.DatasetClient instance
    Get a paged list of datasets the user has permission to view.
    Paging is done based on last id from the previous page, not numeric
    offset.
    
    @return: list of dataset dictionaries



### Valid Operators

In [119]:
Op

{'ABSENT': ':absent:',
 'EQUAL': '=',
 'FULLTEXT': ':word:',
 'GEQ': ':geq:',
 'GT': ':gt:',
 'LEQ': ':leq:',
 'LIKE': ':like:',
 'LT': ':lt:',
 'NOT_EQUAL': '!=',
 'NOT_FULLTEXT': ':!word:',
 'NOT_REGEXP': ':!regexp:',
 'NOT_REGEXP_CASE_INSENSITIVE': ':!ciregexp:',
 'REGEXP': ':regexp:',
 'REGEXP_CASE_INSENSITIVE': ':ciregexp:',
 'SIMTO': ':simto:',
 'TAGGED': ''}

In [120]:
_, response = client.get_datasets(catalog_id, selector_list=[("beam_energy", Op['GT'], 1)])
response

[{u'created': u'2015-11-05 19:19:57.878085+00',
  u'favorite': None,
  u'id': 49,
  u'label': None,
  u'modified': u'2015-11-05 19:19:57.878085+00',
  u'modified by': u'u:blaiszik',
  u'name': u'New Dataset',
  u'owner': u'u:blaiszik',
  u'readok': True,
  u'writeok': True}]
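To make the meaning of a selector such as `("beam_energy", Op['GT'], 1)` concrete, the sketch below evaluates the same kind of tuple against local dictionaries. It covers only the comparison operators from the table above and treats multiple selectors as a logical AND, which is an assumption about the service's behaviour. `LOCAL_OPS` and `matches` are hypothetical names.

```python
import operator

# Subset of the operator table above, mapped to Python comparisons.
LOCAL_OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    ":gt:": operator.gt,
    ":geq:": operator.ge,
    ":lt:": operator.lt,
    ":leq:": operator.le,
}


def matches(record, selector_list):
    """Return True if `record` satisfies every (name, op, value) selector."""
    for name, op, value in selector_list:
        # A missing or unset annotation never matches a comparison.
        if record.get(name) is None:
            return False
        if not LOCAL_OPS[op](record[name], value):
            return False
    return True


datasets = [
    {"id": 49, "beam_energy": 1.1},
    {"id": 50, "beam_energy": 0.5},
    {"id": 51},
]
selectors = [("beam_energy", ":gt:", 1)]
print([d["id"] for d in datasets if matches(d, selectors)])
```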