This notebook provides an example of making a batch of DOI updates based on records in ScienceBase. In this case, we had previously reserved DOIs, all pointing at ScienceBase Items as their de-referencing URLs, and we now need to finalize the DOIs and make them public. It works with Brandon Serna's usgs_datatools package, whose latest version works with the new DOI REST API. Most of the pieces here should be fairly easy for others to reuse when they need to do something similar.

The first few blocks here do all the same things Brandon has shown in other examples, setting up a session with the DOI tool to do work.

In [1]:
import os
import json
import requests
import getpass
from IPython.display import display

from usgs_datatools import doi

In [2]:
username = 'sbristol@usgs.gov'
password = getpass.getpass('USGS AD Password: ')

USGS AD Password: ········


In [3]:
doi_session = doi.DoiSession(env='production')

In [4]:
doi_session.doi_authenticate(username, password)

<usgs_datatools.doi.DoiSession at 0x1083e07f0>

In this case, I'm starting from the existing ScienceBase Items, where we previously recorded the reserved DOIs as one of the identifiers. I need to validate that the critical information associated with each DOI matches what's in the ScienceBase Item, resetting a couple of attributes along the way and making the DOI public. I could run it all in one process that goes through each ScienceBase Item, connects to the DOI tool, and does the work, but since we are running this in a Notebook environment, we can simply build a stripped-down data object in memory and then run against it.

We're dealing with two different ScienceBase collections with 1,719 items in each one, so we need to paginate through ScienceBase search results to build out what we need. You could also do this with pysb and would need to use something like that if dealing with restricted items. Because we just finished the data release process on these and made the collections public, we can simply access the ScienceBase REST API with requests.

In [5]:
sbItems = []
for collectionID in ['527d0a83e4b0850ea0518326', '5951527de4b062508e3b1e79']:

    # First page of child items for this collection, 100 at a time
    nextLink = 'https://www.sciencebase.gov/catalog/items?max=100&format=json&fields=title,identifiers&parentId='+collectionID

    while nextLink is not None:
        sbResult = requests.get(nextLink).json()

        # Keep following nextlink until ScienceBase stops supplying one
        if 'nextlink' in sbResult.keys():
            nextLink = sbResult['nextlink']['url']
        else:
            nextLink = None

        for item in sbResult['items']:
            thisSBItem = {'id':item['id']}
            thisSBItem['title'] = item['title']
            # Pull the reserved DOI from the item's identifiers (None if there isn't one)
            thisSBItem['doi'] = next((i['key'] for i in item.get('identifiers', []) if i['type'] == 'doi'), None)
            sbItems.append(thisSBItem)
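
The same pagination pattern can be pulled out into a small generator, which makes it easier to reuse and to test without hitting ScienceBase. This is only a sketch: `iter_items`, `fetch`, and `fake_pages` are names I made up for illustration, and the fake pages simply mimic the `items` and `nextlink` keys used in the cell above.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_items(first_url: str, fetch: Callable[[str], Dict]) -> Iterator[Dict]:
    """Yield every item from a paginated result set, following 'nextlink' until it is absent."""
    url: Optional[str] = first_url
    while url is not None:
        page = fetch(url)
        yield from page.get('items', [])
        # A page without a 'nextlink' entry ends the loop
        url = page.get('nextlink', {}).get('url')


# Hypothetical stand-in for requests.get(url).json(), so the sketch runs offline
fake_pages = {
    'page1': {'items': [{'id': 'a'}, {'id': 'b'}], 'nextlink': {'url': 'page2'}},
    'page2': {'items': [{'id': 'c'}]},
}

ids = [item['id'] for item in iter_items('page1', fake_pages.__getitem__)]
print(ids)
```

Against the live API, `fetch` could be something like `lambda url: requests.get(url).json()`, with the first URL built the same way as in the cell above.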


In [6]:
# See if we pulled the expected number of items - 3,438
print(len(sbItems))

3438
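
The count alone doesn't tell us whether every item actually carries a DOI identifier, since the lookup above falls back to None when it doesn't find one. A quick check along these lines could catch that before any calls go to the DOI tool. It's a sketch with made-up sample records; `split_by_doi` and `sample_items` are hypothetical names, and the DOI values are placeholders.

```python
def split_by_doi(items):
    """Separate items that have a DOI string from those where the lookup came back empty."""
    with_doi = [i for i in items if i.get('doi')]
    without_doi = [i for i in items if not i.get('doi')]
    return with_doi, without_doi


# Placeholder records shaped like the entries in sbItems
sample_items = [
    {'id': 'x1', 'title': 'Example A', 'doi': 'doi:10.5066/EXAMPLE1'},
    {'id': 'x2', 'title': 'Example B', 'doi': None},
]

ready, missing = split_by_doi(sample_items)
print(len(ready), 'items with a DOI;', len(missing), 'without')
for item in missing:
    print('No DOI identifier on item', item['id'])
```

Running the same function over `sbItems` would show whether any items need attention before the update loop.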


In [8]:
for index,sbItem in enumerate(sbItems):
    # Dry run: only preview the first six items
    if index > 5:
        break

    thisDOIRecord = doi_session.get_doi(sbItem['doi'])

    # Potentially problematic response: the DOI record comes back as a JSON string under 'message'
    display(thisDOIRecord)

    thisDOIDict = json.loads(thisDOIRecord['message'])
    
    # Create a dictionary to conduct the update containing all the information it's supposed to have
    thisDoi = sbItem.copy()
    thisDoi.pop('doi')
    thisDoi.pop('id')
    # Set some hard parameters
    thisDoi['pubdate'] = '2018'
    thisDoi['ipdsNumbers'] = [{'ipdsNumber': '082267', 'ipdsType': 'DATA_RELEASE'}]
    thisDoi['dataSourceId'] = 59507
    thisDoi['dataSourceName'] = 'Core Science Analytics, Synthesis and Libraries'
    
    # Check to make sure the DOI URL matches the ScienceBase Item
    if thisDOIDict['url'].split('/')[-1] != sbItem['id']:
        thisDoi['url'] = 'https://www.sciencebase.gov/catalog/item/'+sbItem['id']
    
    # This will be the DOI update package plus the parameter to make it public
    display(thisDoi)
        

{'error': 200,
 'message': '{"doi":"doi:10.5066/F7K935WV","title":"Philadelphia Vireo (Vireo philadelphicus) Habitat Map","pubDate":"2017","url":"https://www.sciencebase.gov/catalog/item/58fa530ee4b0b7ea545252af","resourceType":"Model","date":null,"dateType":null,"description":"This dataset represents a species habitat distribution model for Philadelphia Vireo.  These habitat maps are created by applying a <a href=\\"https://www.sciencebase.gov/catalog/item/527d0a83e4b0850ea0518326\\">deductive habitat model</a> to remotely-sensed data layers within a species\' range.","subject":"USGS Data Release","username":"dwieferich@usgs.gov","status":"reserved","noDataReleaseAvailableReason":null,"noPublicationIdAvailable":false,"dataSourceId":59507,"dataSourceName":"Core Science Analytics, Synthesis and Libraries","linkCheckingStatus":null,"formatTypes":[],"authors":[{"authorName":"Prior-Magee, Julie","orcId":"0000-0003-4031-1885","nameType":"Personal","position":0},{"authorName":"McKerrow, Alex

{'dataSourceId': 59507,
 'dataSourceName': 'Core Science Analytics, Synthesis and Libraries',
 'ipdsNumbers': [{'ipdsNumber': '082267', 'ipdsType': 'DATA_RELEASE'}],
 'pubdate': '2018',
 'title': 'Philadelphia Vireo (Vireo philadelphicus) bPHVIx_CONUS_2001v1 Habitat Map'}

{'error': 200,
 'message': '{"doi":"doi:10.5066/F75719FH","title":"Shadow Chipmunk (Tamias senex) Habitat Map","pubDate":"2017","url":"https://www.sciencebase.gov/catalog/item/58fa7982e4b0b7ea54525b25","resourceType":"Model","date":null,"dateType":null,"description":"This dataset represents a species habitat distribution model for Shadow Chipmunk.  These habitat maps are created by applying a <a href=\\"https://www.sciencebase.gov/catalog/item/527d0a83e4b0850ea0518326\\">deductive habitat model</a> to remotely-sensed data layers within a species\' range.","subject":"USGS Data Release","username":"dwieferich@usgs.gov","status":"reserved","noDataReleaseAvailableReason":null,"noPublicationIdAvailable":false,"dataSourceId":59507,"dataSourceName":"Core Science Analytics, Synthesis and Libraries","linkCheckingStatus":null,"formatTypes":[],"authors":[{"authorName":"McKerrow, Alexa","orcId":"0000-0002-8312-2905","nameType":"Personal","position":1},{"authorName":"Prior-Magee, Julie","orcId":"00

{'dataSourceId': 59507,
 'dataSourceName': 'Core Science Analytics, Synthesis and Libraries',
 'ipdsNumbers': [{'ipdsNumber': '082267', 'ipdsType': 'DATA_RELEASE'}],
 'pubdate': '2018',
 'title': 'Shadow Chipmunk (Tamias senex) mSHCHx_CONUS_2001v1 Habitat Map'}

{'error': 200,
 'message': '{"doi":"doi:10.5066/F7H130D4","title":"Chihuahuan Grasshopper Mouse (Onychomys arenicola) Habitat Map","pubDate":"2017","url":"https://www.sciencebase.gov/catalog/item/58fa63ade4b0b7ea545257ad","resourceType":"Model","date":null,"dateType":null,"description":"This dataset represents a species habitat distribution model for Chihuahuan Grasshopper Mouse.  These habitat maps are created by applying a <a href=\\"https://www.sciencebase.gov/catalog/item/527d0a83e4b0850ea0518326\\">deductive habitat model</a> to remotely-sensed data layers within a species\' range.","subject":"USGS Data Release","username":"dwieferich@usgs.gov","status":"reserved","noDataReleaseAvailableReason":null,"noPublicationIdAvailable":false,"dataSourceId":59507,"dataSourceName":"Core Science Analytics, Synthesis and Libraries","linkCheckingStatus":null,"formatTypes":[],"authors":[{"authorName":"Prior-Magee, Julie","orcId":"0000-0003-4031-1885","nameType":"Personal","position":0},{"authorNa

{'dataSourceId': 59507,
 'dataSourceName': 'Core Science Analytics, Synthesis and Libraries',
 'ipdsNumbers': [{'ipdsNumber': '082267', 'ipdsType': 'DATA_RELEASE'}],
 'pubdate': '2018',
 'title': 'Chihuahuan Grasshopper Mouse (Onychomys arenicola) mCGMOx_CONUS_2001v1 Habitat Map'}

{'error': 200,
 'message': '{"doi":"doi:10.5066/F7FJ2F43","title":"Pine Grosbeak (Pinicola enucleator) Habitat Map","pubDate":"2017","url":"https://www.sciencebase.gov/catalog/item/58fa5318e4b0b7ea545252b2","resourceType":"Model","date":null,"dateType":null,"description":"This dataset represents a species habitat distribution model for Pine Grosbeak.  These habitat maps are created by applying a <a href=\\"https://www.sciencebase.gov/catalog/item/527d0a83e4b0850ea0518326\\">deductive habitat model</a> to remotely-sensed data layers within a species\' range.","subject":"USGS Data Release","username":"dwieferich@usgs.gov","status":"reserved","noDataReleaseAvailableReason":null,"noPublicationIdAvailable":false,"dataSourceId":59507,"dataSourceName":"Core Science Analytics, Synthesis and Libraries","linkCheckingStatus":null,"formatTypes":[],"authors":[{"authorName":"McKerrow, Alexa","orcId":"0000-0002-8312-2905","nameType":"Personal","position":1},{"authorName":"Prior-Magee, Julie","orcId":

{'dataSourceId': 59507,
 'dataSourceName': 'Core Science Analytics, Synthesis and Libraries',
 'ipdsNumbers': [{'ipdsNumber': '082267', 'ipdsType': 'DATA_RELEASE'}],
 'pubdate': '2018',
 'title': 'Pine Grosbeak (Pinicola enucleator) bPIGRx_CONUS_2001v1 Habitat Map'}

{'error': 200,
 'message': '{"doi":"doi:10.5066/F7H41PR5","title":"Henslow\'s Sparrow (Ammodramus henslowii) bHESPx_CONUS_2001v1 Habitat Map","pubDate":"2018","url":"https://www.sciencebase.gov/catalog/item/58fa4d4fe4b0b7ea545250a1","resourceType":"Model","date":null,"dateType":null,"description":"This dataset represents a species habitat distribution model for Henslow\'s Sparrow.  These habitat maps are created by applying a <a href=\\"https://www.sciencebase.gov/catalog/item/527d0a83e4b0850ea0518326\\">deductive habitat model</a> to remotely-sensed data layers within a species\' range.","subject":"USGS Data Release","username":"sbristol@usgs.gov","status":"reserved","noDataReleaseAvailableReason":null,"noPublicationIdAvailable":false,"dataSourceId":59507,"dataSourceName":"Core Science Analytics, Synthesis and Libraries","linkCheckingStatus":null,"formatTypes":[],"authors":[{"authorName":"Prior-Magee, Julie","orcId":"0000-0003-4031-1885","nameType":"Personal","position":0},{"authorNam

{'dataSourceId': 59507,
 'dataSourceName': 'Core Science Analytics, Synthesis and Libraries',
 'ipdsNumbers': [{'ipdsNumber': '082267', 'ipdsType': 'DATA_RELEASE'}],
 'pubdate': '2018',
 'title': "Henslow's Sparrow (Ammodramus henslowii) bHESPx_CONUS_2001v1 Habitat Map"}

{'error': 200,
 'message': '{"doi":"doi:10.5066/F7XG9PJP","title":"Southern Red-backed Vole (Myodes gapperi) Habitat Map","pubDate":"2017","url":"https://www.sciencebase.gov/catalog/item/58fa7bb7e4b0b7ea54525b72","resourceType":"Model","date":null,"dateType":null,"description":"This dataset represents a species habitat distribution model for Southern Red-backed Vole.  These habitat maps are created by applying a <a href=\\"https://www.sciencebase.gov/catalog/item/527d0a83e4b0850ea0518326\\">deductive habitat model</a> to remotely-sensed data layers within a species\' range.","subject":"USGS Data Release","username":"dwieferich@usgs.gov","status":"reserved","noDataReleaseAvailableReason":null,"noPublicationIdAvailable":false,"dataSourceId":59507,"dataSourceName":"Core Science Analytics, Synthesis and Libraries","linkCheckingStatus":null,"formatTypes":[],"authors":[{"authorName":"McKerrow, Alexa","orcId":"0000-0002-8312-2905","nameType":"Personal","position":1},{"authorName":"Prior-Magee

{'dataSourceId': 59507,
 'dataSourceName': 'Core Science Analytics, Synthesis and Libraries',
 'ipdsNumbers': [{'ipdsNumber': '082267', 'ipdsType': 'DATA_RELEASE'}],
 'pubdate': '2018',
 'title': 'Southern Red-backed Vole (Myodes gapperi) mSRBVx_CONUS_2001v1 Habitat Map'}
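
The logic in that loop can also be written as a standalone function, so the update package can be checked against canned responses before anything is sent to the DOI tool. This is a sketch, not part of usgs_datatools: `build_update`, `FIXED_FIELDS`, and the sample item and responses are all made up for illustration, and the responses only mimic the `message` envelope shown in the output above. Unlike the loop, it also tolerates a trailing slash or a null URL.

```python
import copy
import json

# The hard-coded values used in the loop above
FIXED_FIELDS = {
    'pubdate': '2018',
    'ipdsNumbers': [{'ipdsNumber': '082267', 'ipdsType': 'DATA_RELEASE'}],
    'dataSourceId': 59507,
    'dataSourceName': 'Core Science Analytics, Synthesis and Libraries',
}
ITEM_URL_BASE = 'https://www.sciencebase.gov/catalog/item/'


def build_update(sb_item, doi_response):
    """Build the update dictionary for one item from its DOI tool response."""
    # The DOI record arrives as a JSON string under 'message'
    record = json.loads(doi_response['message'])

    update = {'title': sb_item['title']}
    update.update(copy.deepcopy(FIXED_FIELDS))

    # Only include a URL when the registered one doesn't end in this item's ID
    registered_url = record.get('url') or ''
    if registered_url.rstrip('/').split('/')[-1] != sb_item['id']:
        update['url'] = ITEM_URL_BASE + sb_item['id']
    return update


# Placeholder item and responses, not real records
sb_item = {'id': 'abc123', 'title': 'Example Habitat Map', 'doi': 'doi:10.5066/EXAMPLE'}
matching = {'error': 200, 'message': json.dumps({'url': ITEM_URL_BASE + 'abc123'})}
mismatched = {'error': 200, 'message': json.dumps({'url': ITEM_URL_BASE + 'other'})}

print(build_update(sb_item, matching))
print(build_update(sb_item, mismatched))
```

Keeping the comparison in one function means the same check runs for the six-item preview and for the full 3,438-item batch.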