This notebook explores querying a system we are calling iSAID (Integrated Science Assessment Information Database) for information about people. The functionality is built into the Isaid class of the [pylinkedcmd](https://github.com/skybristol/pylinkedcmd) package. The functions shown here work against an experimental information cache exposed via a GraphQL endpoint that is accessible through functions in this codebase. The database contains cached information about all USGS staff from the following sources:

* ScienceBase Directory
* USGS Pubs Warehouse
* USGS Profile Pages

Information from these sources is synthesized to generate the beginnings of an automated research record for each staff member. The ScienceBase Directory is used as the most complete and reasonable access point for USGS staff records. Processes are run against Pubs Warehouse metadata to optimize the data for analyzing co-author connections and associated organizational affiliations. Information is scraped from USGS Profile Pages into a data structure because those pages currently have no API or other form of programmatic access.
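To make the idea of synthesizing a per-person record more concrete, here is a minimal sketch of merging records from three sources on a shared email address. It is not the iSAID implementation: the function name, the field names (`email`, `author_emails`, `title`) and the sample data are all assumptions made up for illustration, and the real cache may key and structure its data differently.

```python
def assemble_records(directory, publications, profiles):
    """Merge per-source records into one record per email address.

    All field names here are hypothetical, chosen only for this sketch.
    """
    records = {}
    # The directory supplies the base list of staff.
    for person in directory:
        records[person["email"]] = {
            "directory": person,
            "publications": [],
            "profile": None,
        }
    # Attach publication titles to every listed author found in the directory.
    for pub in publications:
        for email in pub.get("author_emails", []):
            if email in records:
                records[email]["publications"].append(pub["title"])
    # Attach scraped profile information where a matching person exists.
    for profile in profiles:
        if profile["email"] in records:
            records[profile["email"]]["profile"] = profile
    return records


# Made-up sample data, not drawn from any USGS system.
sample_directory = [{"email": "a@example.gov", "name": "Person A"},
                    {"email": "b@example.gov", "name": "Person B"}]
sample_publications = [{"title": "Report 1",
                        "author_emails": ["a@example.gov", "b@example.gov"]},
                       {"title": "Report 2",
                        "author_emails": ["a@example.gov"]}]
sample_profiles = [{"email": "b@example.gov", "expertise": ["hydrology"]}]

merged = assemble_records(sample_directory, sample_publications, sample_profiles)
```

Keying everything on the directory entry mirrors the description above, where the directory is treated as the most complete list of staff and the other sources add detail to it.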

In [11]:
import pylinkedcmd
from json2html import json2html
from IPython.display import HTML, display
import ipywidgets as widgets
from ipywidgets import interact, interact_manual
import random

cmd_isaid = pylinkedcmd.pylinkedcmd.Isaid()

I'll eventually build a better mechanism for interacting with these data. In the near term, if you want to look for personnel records from a specific organization, pick a name from the output list of organizations and pass it as the criteria. The get_people() function responds to search criteria on anything in the database, so with a little sleuthing you can run many other interesting searches.
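As a rough sketch of the lookup described above, the helper below wraps a `get_people()` call with the `criteria` and `parameter` keywords used in this notebook and returns a sorted list of email addresses. The helper name `emails_for_organization` and the `StubIsaid` class are hypothetical: the stub only stands in for the real Isaid class so the example runs offline, and its exact-match filtering is an assumption, not a description of how the GraphQL endpoint matches criteria.

```python
class StubIsaid:
    """Hypothetical offline stand-in for pylinkedcmd's Isaid class."""

    def __init__(self, people):
        self._people = people

    def get_people(self, criteria, parameter):
        # Assumption: exact match on a single field. The real service may differ.
        return [p for p in self._people if p.get(parameter) == criteria]


def emails_for_organization(client, org_name):
    """Return sorted email addresses for people in the named organization."""
    people = client.get_people(criteria=org_name, parameter="organization_name")
    # Skip records with a missing email so the list can be sorted safely.
    return sorted(p["identifier_email"] for p in people if p.get("identifier_email"))


# Made-up records for illustration only.
stub_client = StubIsaid([
    {"organization_name": "Example Science Center", "identifier_email": "z@example.gov"},
    {"organization_name": "Example Science Center", "identifier_email": "a@example.gov"},
    {"organization_name": "Example Science Center", "identifier_email": None},
    {"organization_name": "Other Program", "identifier_email": "m@example.gov"},
])

example_emails = emails_for_organization(stub_client, "Example Science Center")
```

With the real client, the same helper could presumably be called as `emails_for_organization(cmd_isaid, "Alaska Science Center")`, and passing a different `parameter` to `get_people()` is how searches on other fields would be expressed.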

In [10]:
organizations = cmd_isaid.get_organizations()
org_name_list = [i["organization_name"] for i in organizations if i["organization_name"] is not None]
org_name_list.sort()
org_name_list

['Alaska Science Center',
 'Arizona Water Science Center',
 'Astrogeology Science Center',
 'California Water Science Center',
 'Caribbean-Florida Water Science Center',
 'Central Energy Resources Science Center',
 'Central Midwest Water Science Center',
 'Climate Adaptation Science Centers',
 'Coastal and Marine Hazards and Resources Program',
 'Colorado Water Science Center',
 'Columbia Environmental Research Center',
 'Contaminant Biology Program',
 'Cooperative Research Units',
 'Core Research Center',
 'Dakota Water Science Center',
 'Deprecated[17464] Management Services Operations, Sacramento',
 'Earth Resources Observation and Science (EROS) Center',
 'Earthquake Hazards Program',
 'Earthquake Science Center',
 'Eastern Energy Resources Science Center',
 'Eastern Mineral and Environmental Resources Science Center',
 'Energy Resources Program',
 'Federal Geographic Data Committee',
 'Florence Bascom Geoscience Center',
 'Forest and Rangeland Ecosystem Science Center',
 'Fort Col

In [12]:
random_org = random.choice(org_name_list)
print(random_org)

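# Fetch everyone in the selected organization and collect their email addresses,
# skipping records without an email so the list can be sorted.
people = cmd_isaid.get_people(criteria=random_org, parameter="organization_name")
email_list = [i["identifier_email"] for i in people if i["identifier_email"] is not None]
email_list.sort()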

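# Build a dropdown of emails; selecting one renders that person's assembled record as an HTML table.
@interact
def show_org_people(email=email_list):
    person_doc = cmd_isaid.assemble_person_record(email)
    display(HTML(json2html.convert(json=person_doc)))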

Landslide Hazards Program


interactive(children=(Dropdown(description='email', options=('jgodt@usgs.gov', 'sslaughter@usgs.gov'), value='…