## SquamataSB - Jupyter notebook for interacting with the ScienceBase API

The ScienceBase Python library (sciencebasepy) can be found at:
https://github.com/usgs/sciencebasepy

Documentation on the sb_json format can be found at https://my.usgs.gov/confluence/display/sciencebase/ScienceBase+Item+Core+Model

Documentation on the hidden properties of the item JSON can be found at https://code.chs.usgs.gov/sciencebase/dev-docs/wikis/APIs/Catalog/Item-Hidden-Properties

This notebook performs the following operations:
- Log into ScienceBase
- List attributes of an existing ScienceBase item in JSON format
- Create and delete ScienceBase items
- Upload and delete files
- Log out of ScienceBase

## Future development plans for SquamataSB

- Currently working with existing items; add requests here.

### Instructions
- This is a test of some ScienceBasePy (pysb) calls. Follow the guidelines outlined below.

### To execute a cell, select it, then hold Shift and press Enter

**The 'r' prefix marks a raw string literal, in which backslashes are not treated as escape characters. Use it for Windows paths.**
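As a small illustration of why the 'r' prefix matters for Windows paths, the sketch below compares a raw string with a regular one. The path `C:\new_data\table.csv` is a made-up example, not a path used by this notebook.

```python
# Hypothetical Windows path, chosen because it contains \n and \t sequences.
raw_path = r"C:\new_data\table.csv"    # raw string: backslashes are kept literally
plain_path = "C:\new_data\table.csv"   # regular string: \n and \t become newline and tab

print(repr(raw_path))
print(repr(plain_path))

# A regular string with doubled backslashes is equivalent to the raw string.
escaped_path = "C:\\new_data\\table.csv"
print(raw_path == escaped_path)
```

Without the prefix, the regular string silently contains a newline and a tab, so the path no longer points at the intended file.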

v1.0 
- Log into ScienceBase.
- List attributes for existing ScienceBase item in JSON format

v1.1 Exploring CRUD (Create-Read-Update-Delete)
- Create a parent item
- Create a child item
- Edit ScienceBase item attributes



In [1]:
# Phil Brown (pbrown@usgs.gov) September 2019 Beta 
# Working Python 3 Notebook used to show examples of ScienceBasePy function calls

In [4]:
# Test Cell
print ("Jupyter is working.") #To run this cell, hold down Shift and press Enter.

Jupyter is working.


## Install ScienceBasePy and its associated packages into your Python installation
- Uncomment the code below to install the Python modules required for this Jupyter Notebook.

In [5]:
# Install a pip package in the current Jupyter kernel
# The below code will install sciencebasepy from a location on your hard drive
# https://github.com/usgs/sciencebasepy
import os, ssl
import sys

# Uncomment the lines below to install sciencebasepy from a directory or online - !!! be sure to change the path appropriately !!!
#if (not os.environ.get('PYTHONHTTPSVERIFY', '') and
    #getattr(ssl, '_create_unverified_context', None)): 
    #ssl._create_default_https_context = ssl._create_unverified_context
    #!{sys.executable} -m pip install C:\SBpy\sciencebasepy-master 


Processing c:\sbpy\sciencebasepy-master
Building wheels for collected packages: sciencebasepy
  Running setup.py bdist_wheel for sciencebasepy: started
  Running setup.py bdist_wheel for sciencebasepy: finished with status 'done'
  Stored in directory: C:\Users\pbrown\AppData\Local\pip\Cache\wheels\6a\e5\49\eb3d0b5a3ba71df71d57ea31c37ddc4232b29c60324d0a1d76
Successfully built sciencebasepy
Installing collected packages: sciencebasepy
Successfully installed sciencebasepy-1.6.4


In [7]:
# Install a pip package in the current Jupyter kernel
# The below code will install msgpack as required by sciencebasepy from a location on your hard drive
# https://github.com/msgpack/msgpack-python
import os, ssl
import sys

# Uncomment the lines below to install msgpack from a directory or online - !!! be sure to change the path appropriately !!!
#if (not os.environ.get('PYTHONHTTPSVERIFY', '') and
    #getattr(ssl, '_create_unverified_context', None)): 
    #ssl._create_default_https_context = ssl._create_unverified_context
    #!{sys.executable} -m pip install C:\msgpack\msgpack-python-master  

Processing c:\msgpack\msgpack-python-master
Building wheels for collected packages: msgpack
  Running setup.py bdist_wheel for msgpack: started
  Running setup.py bdist_wheel for msgpack: finished with status 'done'
  Stored in directory: C:\Users\pbrown\AppData\Local\pip\Cache\wheels\c8\84\ed\9c996789900fc68156b0183be9155692c023770863d466cce7
Successfully built msgpack
Installing collected packages: msgpack
Successfully installed msgpack-0.6.2


In [8]:
# Load required libraries
# Standard library
import csv
import datetime
import fileinput
import glob
import json
import os
import pickle
import re
import shutil
import sys
import time
import zipfile
from shutil import copyfile

# Third-party packages
import dateutil.parser
import numpy as np
import pandas as pd
import requests
import sciencebasepy as pysb
from lxml import etree
##from pymdwizard.core.xml_utils import XMLRecord
##from pymdwizard.core.xml_utils import XMLNode

# Jupyter display and widget helpers
from ipywidgets import *
import ipywidgets as widgets  # used in place of the deprecated IPython.html.widgets import
from IPython.display import display, HTML, Javascript



## Set Directory Paths
Set the directory paths in the cell below. They include:
- Parent Path: the path to the information that will be loaded to the ScienceBase data release landing page.
- Child Path: the path to the child items that need to be uploaded to the parent page.
- ScienceBase ID: the catalog ID that ScienceBase uses in the URL to identify the parent page of the data release.

In [6]:
#Set Data Paths - perhaps we'll get a user form to do this some day?
sciencebaseParentPath = r"C:\CurrentWork\DataReleases\Silverton\Data" #The 'r' prefix makes this a raw string, so backslashes are kept as-is. Use it for paths.
sciencebaseChildPath = r"C:\CurrentWork\DataReleases\Silverton"
sciencebaseID = "USA-Colorado-Silverton_Caldera-2018-Template-v1-11.xml"

In [37]:
#Check Paths for the fun of it
print ('The ScienceBase Data Path is: ' + '"' + sciencebaseParentPath + '"')
# sciencebaseDataPath + "\\" + mtMetaDataTemplateName

The ScienceBase Data Path is: "C:\CurrentWork\DataReleases\Silverton\Data"
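The check above only prints the path string. A minimal sketch for confirming that the configured directories actually exist before uploading anything might look like the following. `missing_directories` is a hypothetical helper, and the demo uses a temporary directory rather than the paths set above.

```python
import os
import tempfile

def missing_directories(paths):
    """Return the entries of `paths` that are not existing directories."""
    return [p for p in paths if not os.path.isdir(p)]

# Demo with a temporary directory standing in for the real data paths.
with tempfile.TemporaryDirectory() as parent_dir:
    bogus_dir = os.path.join(parent_dir, "does_not_exist")
    problems = missing_directories([parent_dir, bogus_dir])
    print("Missing directories:", problems)
```

In this notebook the call would be `missing_directories([sciencebaseParentPath, sciencebaseChildPath])`, and an empty list would mean both paths are usable.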


## Now, let's log into ScienceBase. 


In [38]:
#Initialize ScienceBase session - this is the manual way - use the cell below instead

#Uncomment the lines below to log in to ScienceBase manually

#username = 'Add User Name Here'
#password = 'Add Password Here'
#sb = pysb.SbSession(env=None).login(username, password)


In [10]:

sb = pysb.SbSession()

username = input("Username:  ")
sb.loginc(str(username))

#Check to see if login is successful
sb.is_logged_in()

#Get the ScienceBase Login session info
sb.get_session_info()




Username:  pbrown@usgs.gov
········


{'displayName': 'Philip J Brown II',
 'email': 'pbrown@usgs.gov',
 'fullDisplayName': 'Philip J Brown II [pbrown@usgs.gov]',
 'isLoggedIn': True,
 'jossoSessionId': '8A09823D9C795E32FEF394E8DE92E201',
 'username': 'pbrown@usgs.gov'}

In [11]:
#Now we can start testing ScienceBasePy functions

#Check to see if login is successful
sb.is_logged_in()

#Get the ScienceBase Login session info
sb.get_session_info()

{'displayName': 'Philip J Brown II',
 'email': 'pbrown@usgs.gov',
 'fullDisplayName': 'Philip J Brown II [pbrown@usgs.gov]',
 'isLoggedIn': True,
 'jossoSessionId': '8A09823D9C795E32FEF394E8DE92E201',
 'username': 'pbrown@usgs.gov'}

## Get public or private items and list their attributes

In [12]:
# Get a public item.  No need to log in.

# Villa Grove:5ce5c305e4b0bc180232eb80
# Tooele: 59de8dbee4b05fe04ccd3ada

item_json = sb.get_item('5ad0e39de4b0e2c2dd1eb0dd')
print ("Public Item: \t" + str(item_json))


Public Item: 	{'link': {'rel': 'self', 'url': 'https://www.sciencebase.gov/catalog/item/5ad0e39de4b0e2c2dd1eb0dd'}, 'relatedItems': {'link': {'url': 'https://www.sciencebase.gov/catalog/itemLinks?itemId=5ad0e39de4b0e2c2dd1eb0dd', 'rel': 'related'}}, 'id': '5ad0e39de4b0e2c2dd1eb0dd', 'title': 'pbrown@usgs.gov', 'provenance': {'dateCreated': '2018-04-13T17:06:37Z', 'lastUpdated': '2019-05-22T21:45:42Z', 'lastUpdatedBy': 'pbrown@usgs.gov', 'createdBy': 'pbrown@usgs.gov'}, 'hasChildren': True, 'parentId': '4f4e4772e4b07f02db47e231', 'contacts': [{'name': 'Philip J Brown II', 'oldPartyId': 1233, 'type': 'Author', 'contactType': 'person', 'email': 'pbrown@usgs.gov', 'active': True, 'primaryLocation': {'name': 'Philip J Brown II/GD/USGS/DOI - Primary Location', 'building': 'DFC Bldg 20', 'buildingCode': 'KAC', 'officePhone': '303-236-1310', 'faxPhone': '303-236-1425', 'streetAddress': {'line1': 'W 6th Ave Kipling St', 'city': 'Lakewood', 'state': 'CO', 'zip': '80225'}, 'mailAddress': {'line1'

In [18]:
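# Get a private item.  Need to log in first.
# Current private AMT data release Silverton: 5c5c72b9e4b070828902cb07
# The ID passed below is the Villa Grove item; substitute a private item ID such as the one above to test private access.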
item_json = sb.get_item('5ce5c305e4b0bc180232eb80')
print ("Public Item: \t" + str(item_json))

Public Item: 	{'link': {'rel': 'self', 'url': 'https://www.sciencebase.gov/catalog/item/5ce5c305e4b0bc180232eb80'}, 'relatedItems': {'link': {'url': 'https://www.sciencebase.gov/catalog/itemLinks?itemId=5ce5c305e4b0bc180232eb80', 'rel': 'related'}}, 'id': '5ce5c305e4b0bc180232eb80', 'identifiers': [{'type': 'DOI', 'scheme': 'https://www.sciencebase.gov/vocab/category/item/identifier', 'key': 'doi:10.5066/P9DXYPXH'}], 'title': 'High Resolution Aeromagnetic Survey, Villa Grove, Colorado, USA, 2011', 'summary': 'This data release includes data collected from the Villa Grove helicopter magnetic survey in northern San Luis Valley and Poncha Pass region in south-central Colorado, USA. The survey area extends over the northern part of Great Sand Dunes National Park, Poncha Pass and vicinity, and into the southern end of the Upper Arkansas Valley. It includes the communities of Crestone, Villa Grove, Saguache, and Salida. Several U.S. Geological Survey programs (including the National Cooperat

In [14]:

item_json = sb.get_item(sb.get_my_items_id())
print("\tMy Items: \n\n" + str(item_json))
# Next, let's work with the returned dictionary and list only the items


	My Items: 

{'link': {'rel': 'self', 'url': 'https://www.sciencebase.gov/catalog/item/5ad0e39de4b0e2c2dd1eb0dd'}, 'relatedItems': {'link': {'url': 'https://www.sciencebase.gov/catalog/itemLinks?itemId=5ad0e39de4b0e2c2dd1eb0dd', 'rel': 'related'}}, 'id': '5ad0e39de4b0e2c2dd1eb0dd', 'title': 'pbrown@usgs.gov', 'provenance': {'dateCreated': '2018-04-13T17:06:37Z', 'lastUpdated': '2019-05-22T21:45:42Z', 'lastUpdatedBy': 'pbrown@usgs.gov', 'createdBy': 'pbrown@usgs.gov'}, 'hasChildren': True, 'parentId': '4f4e4772e4b07f02db47e231', 'contacts': [{'name': 'Philip J Brown II', 'oldPartyId': 1233, 'type': 'Author', 'contactType': 'person', 'email': 'pbrown@usgs.gov', 'active': True, 'primaryLocation': {'name': 'Philip J Brown II/GD/USGS/DOI - Primary Location', 'building': 'DFC Bldg 20', 'buildingCode': 'KAC', 'officePhone': '303-236-1310', 'faxPhone': '303-236-1425', 'streetAddress': {'line1': 'W 6th Ave Kipling St', 'city': 'Lakewood', 'state': 'CO', 'zip': '80225'}, 'mailAddress': {'line1':

Note: we still need a better way to parse, create, and update these JSON records.
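Because the result of `sb.get_item` prints as a nested Python dictionary, one way to make it easier to read is to flatten it into dotted key paths. The sketch below is an assumption about how that could be done, not part of sciencebasepy. `flatten_item` is a hypothetical helper, and `sample_item` is a trimmed copy of fields shown in the output above.

```python
def flatten_item(value, prefix=""):
    """Flatten nested dicts and lists into {path: leaf_value}.

    Empty dicts and empty lists produce no entries.
    """
    flat = {}
    if isinstance(value, dict):
        for key, child in value.items():
            child_prefix = f"{prefix}.{key}" if prefix else key
            flat.update(flatten_item(child, child_prefix))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten_item(child, f"{prefix}[{index}]"))
    else:
        flat[prefix] = value
    return flat

# Trimmed sample based on the item output printed earlier in this notebook.
sample_item = {
    "link": {"rel": "self",
             "url": "https://www.sciencebase.gov/catalog/item/5ad0e39de4b0e2c2dd1eb0dd"},
    "id": "5ad0e39de4b0e2c2dd1eb0dd",
    "title": "pbrown@usgs.gov",
    "hasChildren": True,
    "contacts": [{"name": "Philip J Brown II", "type": "Author"}],
}

for path, leaf in flatten_item(sample_item).items():
    print(f"{path}: {leaf}")
```

In a logged-in session, `flatten_item(item_json)` would give one line per attribute, which is easier to scan than the raw dictionary dump.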

In [34]:
#Try reading json dictionary
item_json['link']
#print("Title(s):\n " + str(item_json['link']))



SyntaxError: EOL while scanning string literal (<ipython-input-34-dea1f8f03f76>, line 6)

In [27]:
# Really need to figure out the best way to parse the SB JSON
# https://datatofish.com/load-json-pandas-dataframe/
import pandas as pd
#pd.read_json (r'Path where you saved the JSON file\File Name.json')
df = pd.read_json (r"C:\SB_Batch\jsontest.json")
#print (df)

ValueError: Expected object or value
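The `ValueError: Expected object or value` above often means that pandas could not find valid JSON at the given path. A hedged alternative is to load the file with the standard-library `json` module first, which reports a missing file and malformed JSON as separate errors. `load_json_file` is a hypothetical helper, and the demo writes its own small temporary file instead of reading `C:\SB_Batch\jsontest.json`.

```python
import json
import os
import tempfile

def load_json_file(path):
    """Return the parsed JSON from `path`, or None (with a message) on failure."""
    try:
        with open(path, "r", encoding="utf-8") as handle:
            return json.load(handle)
    except FileNotFoundError:
        print(f"File not found: {path}")
    except json.JSONDecodeError as err:
        print(f"Invalid JSON in {path}: {err}")
    return None

# Demo: one valid file and one missing file inside a temporary folder.
with tempfile.TemporaryDirectory() as tmp_dir:
    good_path = os.path.join(tmp_dir, "item.json")
    with open(good_path, "w", encoding="utf-8") as handle:
        json.dump({"title": "This is a test item", "hasChildren": False}, handle)

    loaded = load_json_file(good_path)
    missing = load_json_file(os.path.join(tmp_dir, "missing.json"))

print(loaded["title"])
```

Once the file loads cleanly as a dictionary, it can be inspected directly or handed to pandas for tabular work.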

## Let's try to create an SB object without having to load a JSON or XML file

In [30]:
# Create a new item.  The minimum required is a title for the new item, and the parent ID
new_item = {'title': 'This is a new pbrown test item',
    'parentId': sb.get_my_items_id(),
    'provenance': {'annotation': 'Python ScienceBase REST test script'}}
new_item = sb.create_item(new_item)
print ("NEW ITEM: " + str(new_item))

NEW ITEM: {'link': {'rel': 'self', 'url': 'https://www.sciencebase.gov/catalog/item/5d93d633e4b0c4f70d0d76d5'}, 'relatedItems': {'link': {'url': 'https://www.sciencebase.gov/catalog/itemLinks?itemId=5d93d633e4b0c4f70d0d76d5', 'rel': 'related'}}, 'id': '5d93d633e4b0c4f70d0d76d5', 'title': 'This is a new pbrown test item', 'provenance': {'annotation': 'Python ScienceBase REST test script', 'dateCreated': '2019-10-01T22:41:55Z', 'lastUpdated': '2019-10-01T22:41:55Z', 'lastUpdatedBy': 'pbrown@usgs.gov', 'createdBy': 'pbrown@usgs.gov'}, 'hasChildren': False, 'parentId': '5ad0e39de4b0e2c2dd1eb0dd', 'permissions': {'read': {'acl': ['USER:pbrown@usgs.gov'], 'inherited': True, 'inheritsFromId': '5ad0e39de4b0e2c2dd1eb0dd'}, 'write': {'acl': ['USER:pbrown@usgs.gov'], 'inherited': True, 'inheritsFromId': '5ad0e39de4b0e2c2dd1eb0dd'}}, 'distributionLinks': [], 'locked': False}
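The dictionary passed to `sb.create_item` above can also be built by a small helper, which is convenient when creating several child items under one parent. This is only a sketch: `build_item` is a hypothetical function, only the `title`, `parentId`, and `provenance` keys come from the cell above, and the parent ID in the demo is the one shown in the output above.

```python
def build_item(title, parent_id, annotation=None):
    """Return a dict with the fields used by the create_item cell above."""
    if not title:
        raise ValueError("A title is required for a new item")
    if not parent_id:
        raise ValueError("A parent ID is required for a new item")
    item = {"title": title, "parentId": parent_id}
    if annotation:
        item["provenance"] = {"annotation": annotation}
    return item

# Parent ID copied from the output above; the child titles are made up.
parent_id = "5ad0e39de4b0e2c2dd1eb0dd"
children = [
    build_item(f"Test child item {n}", parent_id, "Python ScienceBase REST test script")
    for n in range(1, 4)
]
for child in children:
    print(child)
```

Each dictionary could then be passed to `sb.create_item(child)` in a logged-in session.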


In [40]:
# Upload a file to the newly created item
new_item = sb.upload_file_to_item(new_item, r"C:\SB_Batch\jsontest.json")
print ("FILE UPDATE: " + str(new_item))

FILE UPDATE: {'link': {'rel': 'self', 'url': 'https://www.sciencebase.gov/catalog/item/5d93d82fe4b0c4f70d0d76df'}, 'relatedItems': {'link': {'url': 'https://www.sciencebase.gov/catalog/itemLinks?itemId=5d93d82fe4b0c4f70d0d76df', 'rel': 'related'}}, 'id': '5d93d82fe4b0c4f70d0d76df', 'title': 'This is a new pbrown test item', 'provenance': {'annotation': 'Python ScienceBase REST test script', 'dateCreated': '2019-10-01T22:50:23Z', 'lastUpdated': '2019-10-01T22:50:23Z', 'lastUpdatedBy': 'pbrown@usgs.gov', 'createdBy': 'pbrown@usgs.gov'}, 'hasChildren': False, 'parentId': '5ad0e39de4b0e2c2dd1eb0dd', 'files': [{'name': 'jsontest.json', 'title': None, 'contentType': 'application/json', 'contentEncoding': None, 'pathOnDisk': '__disk__7c/1b/f9/7c1bf97a177b057d3842735c5df81ed0f8a497c8', 'processed': None, 'processToken': None, 'imageWidth': None, 'imageHeight': None, 'size': 4293, 'dateUploaded': '2019-10-01T22:45:16Z', 'uploadedBy': 'pbrown@usgs.gov', 'originalMetadata': None, 'useForPreview':

In [41]:
# Delete the newly created item
ret = sb.delete_item(new_item)
print ("DELETE: " + str(ret))
print ("FILE UPDATE: " + str(new_item))

DELETE: True
FILE UPDATE: {'link': {'rel': 'self', 'url': 'https://www.sciencebase.gov/catalog/item/5d93d82fe4b0c4f70d0d76df'}, 'relatedItems': {'link': {'url': 'https://www.sciencebase.gov/catalog/itemLinks?itemId=5d93d82fe4b0c4f70d0d76df', 'rel': 'related'}}, 'id': '5d93d82fe4b0c4f70d0d76df', 'title': 'This is a new pbrown test item', 'provenance': {'annotation': 'Python ScienceBase REST test script', 'dateCreated': '2019-10-01T22:50:23Z', 'lastUpdated': '2019-10-01T22:50:23Z', 'lastUpdatedBy': 'pbrown@usgs.gov', 'createdBy': 'pbrown@usgs.gov'}, 'hasChildren': False, 'parentId': '5ad0e39de4b0e2c2dd1eb0dd', 'files': [{'name': 'jsontest.json', 'title': None, 'contentType': 'application/json', 'contentEncoding': None, 'pathOnDisk': '__disk__7c/1b/f9/7c1bf97a177b057d3842735c5df81ed0f8a497c8', 'processed': None, 'processToken': None, 'imageWidth': None, 'imageHeight': None, 'size': 4293, 'dateUploaded': '2019-10-01T22:45:16Z', 'uploadedBy': 'pbrown@usgs.gov', 'originalMetadata': None, 'us

In [None]:
# Upload multiple files to create a new item
# Looping through a list of files with the call above may be the best approach for a batch file upload.
ret = sb.upload_files_and_create_item(sb.get_my_items_id(), ['sciencebasepy.py','readme.md'])
print(str(ret))
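For a batch upload, the list of files handed to `sb.upload_files_and_create_item` can be gathered from a folder rather than typed by hand. The sketch below only builds that list. `collect_files` is a hypothetical helper, and the demo uses a temporary folder with made-up file names.

```python
import glob
import os
import tempfile

def collect_files(folder, pattern="*"):
    """Return a sorted list of regular files in `folder` matching `pattern`."""
    matches = glob.glob(os.path.join(folder, pattern))
    return sorted(path for path in matches if os.path.isfile(path))

# Demo: three placeholder files plus a subfolder that should be skipped.
with tempfile.TemporaryDirectory() as folder:
    for name in ("survey.csv", "metadata.xml", "notes.txt"):
        with open(os.path.join(folder, name), "w", encoding="utf-8") as handle:
            handle.write("placeholder")
    os.mkdir(os.path.join(folder, "subfolder"))

    all_files = [os.path.basename(p) for p in collect_files(folder)]
    xml_files = [os.path.basename(p) for p in collect_files(folder, "*.xml")]

print(all_files)
print(xml_files)
```

In a logged-in session, the result could be passed along as `sb.upload_files_and_create_item(sb.get_my_items_id(), collect_files(sciencebaseParentPath))`.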

In [42]:
# List file info from the newly created item
ret = sb.get_item_file_info(new_item)
for fileinfo in ret:
    print ("File " + fileinfo["name"] + ", " + str(fileinfo["size"]) + "bytes, download URL " + fileinfo["url"])

File jsontest.json, 4293bytes, download URL https://www.sciencebase.gov/catalog/file/get/5d93d82fe4b0c4f70d0d76df?f=__disk__7c%2F1b%2Ff9%2F7c1bf97a177b057d3842735c5df81ed0f8a497c8


## Let's list and search my ScienceBase items

In [None]:
# Search
items = sb.find_items_by_any_text(username)
while items and 'items' in items:
    for item in items['items']:
        print (item['title'])
    items = sb.next(items)
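The paging loop above can be wrapped in a generator so that results are handled one item at a time. This sketch assumes only what the cell above uses: a first page from a search call and `sb.next(items)` for later pages. `iter_items` and `FakeSession` are hypothetical; the fake session stands in for a logged-in `SbSession` so the loop can be run offline.

```python
def iter_items(session, first_page):
    """Yield every item across result pages, following session.next()."""
    page = first_page
    while page and "items" in page:
        for item in page["items"]:
            yield item
        page = session.next(page)

class FakeSession:
    """Hypothetical offline stand-in that serves pre-built pages in order."""

    def __init__(self, pages):
        self._pages = list(pages)

    def next(self, current_page):
        # Return the page that follows current_page, or None after the last one.
        index = self._pages.index(current_page)
        later = self._pages[index + 1:]
        return later[0] if later else None

pages = [
    {"items": [{"title": "Item A"}, {"title": "Item B"}]},
    {"items": [{"title": "Item C"}]},
]
titles = [item["title"] for item in iter_items(FakeSession(pages), pages[0])]
print(titles)
```

With a real session the call would be `iter_items(sb, sb.find_items_by_any_text(username))`.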

# -----------------------------------------------------------------------------------------------------------

## Log out of ScienceBase

In [46]:

# Logout
sb.logout()
#Check to see if logout is successful
status = sb.is_logged_in()
print('Current login status is ' + str(status))

Current login status is False
