# Infocard Prototype

This prototype sketches possible interfaces that pull data from name authority databases (Library of Congress Name Authority File, SNAC, VIAF, Wikidata) to supplement the Primary Source Coop.

Links (identifier -> data it can supply):

- SNAC ID -> SNAC Bio, Resources
- LCNAF ID -> Wikidata Q ID, VIAF (example record, Lydia Maria Child: https://id.loc.gov/authorities/names/n80001490.html)
- Wikidata -> Image (P18), Gender (P21), Occupation (P106), Position held (P39)
- VIAF -> Works
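Once a Wikidata item has been fetched as JSON, the four properties listed above can be read from its `claims`. This is a minimal sketch that assumes the payload format returned by Wikidata's `Special:EntityData/{QID}.json` endpoint. `extract_claims` and the field names in `WIKIDATA_PROPS` are hypothetical choices for this prototype.

```python
# Hypothetical mapping from the Wikidata properties listed above to infocard fields.
WIKIDATA_PROPS = {
    "P18": "image",
    "P21": "gender",
    "P106": "occupation",
    "P39": "position_held",
}


def extract_claims(entity_json: dict, qid: str, props: dict = WIKIDATA_PROPS) -> dict:
    """Collect raw values for selected properties from a Special:EntityData payload."""
    claims = entity_json["entities"][qid].get("claims", {})
    fields = {}
    for pid, field in props.items():
        values = []
        for statement in claims.get(pid, []):
            snak = statement.get("mainsnak", {})
            # Skip "unknown value" / "no value" statements, which carry no datavalue.
            if snak.get("snaktype") != "value":
                continue
            value = snak["datavalue"]["value"]
            # Item values are dicts with an "id" (a Q-id); P18 values are plain strings.
            values.append(value["id"] if isinstance(value, dict) else value)
        fields[field] = values
    return fields
```

Item-valued properties (P21, P106, P39) come back as Q-ids that still need a label lookup. P18 yields a Wikimedia Commons file name, not an image URL.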

In [17]:
```python
import re

import requests
import pandas as pd
import numpy as np
import xml.etree.ElementTree as ET
from bs4 import BeautifulSoup

abs_dir = '/Users/quinn.wi/Documents/Data/JQA_pre-2021-04-14/'
```

In [2]:
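```python
%%time

# Read the names list (slow: about 70 s in the run below).
names_data = pd.read_excel(abs_dir + 'DJQA_Names-List_singleSheet.xlsx', index_col=None)

# Replace whitespace in column names with underscores (e.g. LC_Name_Authority).
# regex=True is needed because pandas >= 2.0 treats the pattern literally by default.
names_data.columns = names_data.columns.str.replace(r'\s', '_', regex=True)

# names_data = names_data.query('(Last_Name == "Randolph") & (First_Name == "John")')

# Keep only people who have a Library of Congress name authority ID.
names_data = names_data.dropna(subset=['LC_Name_Authority'])
```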

CPU times: user 1min 10s, sys: 200 ms, total: 1min 10s
Wall time: 1min 10s
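The `LC_Name_Authority` values feed straight into the id.loc.gov URLs used below, so it may be worth normalizing them first. The spreadsheet's cell contents aren't shown here. This sketch assumes a cell holds either a bare LCNAF identifier such as `n80001490` or a full id.loc.gov URI, and that identifiers consist of a lowercase `n` prefix (plus up to two letters) followed by digits. `normalize_lcnaf` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumption: LCNAF identifiers are "n" plus up to two letters, then 6-10 digits.
LCNAF_PATTERN = re.compile(r"\b(n[a-z]{0,2}\d{6,10})\b")


def normalize_lcnaf(value) -> Optional[str]:
    """Extract a bare LCNAF identifier from a cell value, or return None."""
    if value is None:
        return None
    # str() also handles NaN floats from pandas, which become "nan" and fail the match.
    text = str(value).strip()
    match = LCNAF_PATTERN.search(text)
    return match.group(1) if match else None
```

Applied to the DataFrame, this could look like `names_data['LC_Name_Authority'].map(normalize_lcnaf)`.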


Expected data structure, with one entry per person (`xml_id`) holding the record returned by each source:

```json
{xml_id: 
    {
        loc_id: {info_keys: info_values},
        snac_id: {info_keys: info_values},
        viaf_id: {info_keys: info_values},
        wiki_id: {info_keys: info_values}
    }
}
```
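The parse functions defined below each return a dictionary keyed by the source's own identifier, such as `{'n80001490': {...}}` from `parseLOC`. One record in this shape could therefore be assembled by merging their results under a single `xml_id`. This is a minimal sketch: `assemble_record` and the XML ID `"child-lydia"` are hypothetical, and the sample values come from the Lydia Maria Child output later in this notebook.

```python
def assemble_record(xml_id: str, *source_results: dict) -> dict:
    """Merge {identifier: info} dicts from several sources under one xml_id."""
    merged = {}
    for result in source_results:
        for identifier, info in result.items():
            # Identifiers from different authorities should not collide; fail loudly if they do.
            if identifier in merged:
                raise ValueError(f"duplicate identifier: {identifier}")
            merged[identifier] = info
    return {xml_id: merged}


record = assemble_record(
    "child-lydia",  # hypothetical XML ID
    {"n80001490": {"name": "Child, Lydia Maria", "birthDeath": "1802-1880"}},
    {"84910652": {"name": "Child, Lydia Maria, 1802-1880", "birthDeath": "1802-1880"}},
)
```

`locSoup` keys its results by source name (`'wiki'`, `'viaf'`) rather than by identifier, so its output would need a small adapter before it fits this shape.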

In [40]:
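```python
%%time

def parseLOC(identifier):
    """Return name, life dates, and gender term from the LOC MADS/XML record."""
    # Look up the MADS/XML serialization of the authority record.
    url = f"https://id.loc.gov/authorities/names/{identifier}.madsxml.xml"
    response = requests.get(url, timeout=30)
    response.raise_for_status()

    # Parse the XML from bytes and read the namespace URI from the root tag ("{uri}name").
    root = ET.fromstring(response.content)
    namespace = re.match(r"\{(.*)\}", root.tag)
    ns = {"ns": namespace.group(1)}

    def text_of(path):
        # Return the element's text, or None when the element is missing.
        element = root.find(path, ns)
        return element.text if element is not None else None

    # Gather information; the first namePart holds the name itself.
    namePart = text_of('.//ns:name[@authority="naf"]/ns:namePart')
    birthDeathDate = text_of('.//ns:name[@authority="naf"]/ns:namePart[@type="date"]')
    genderedTerm = text_of('.//ns:genderTerm')

    return {identifier: {"name": namePart, "birthDeath": birthDeathDate, "genderedTerm": genderedTerm}}

# Read the LOC HTML page to get the Wikidata and VIAF links.
def locSoup(identifier):
    url = f"https://id.loc.gov/authorities/names/{identifier}.html"
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')

    wiki = soup.find('span', {'href': re.compile(r'http://www\.wikidata\.org/entity/.*')})
    viaf = soup.find('a', {'href': re.compile(r'http://viaf\.org/viaf/.*')})

    # Compare with None: an empty Tag is falsy in BeautifulSoup.
    return {'wiki': {'url': wiki['href'] if wiki is not None else None},
            'viaf': {'url': viaf['href'] if viaf is not None else None}}


def parseSNAC(identifier):
    # Look up the SNAC constellation and download it as JSON.
    url = f"https://snaccooperative.org/download/{identifier}?type=constellation_json"
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    constellation = response.json()

    namePart = constellation['nameEntries'][0]['original']
    # Life dates appear at the end of the name heading, e.g. "..., 1802-1880".
    dates = re.search(r'\d{4}-\d{4}', namePart)
    birthDeathDate = dates.group(0) if dates else None

    return {identifier: {'name': namePart, 'birthDeath': birthDeathDate}}
```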

CPU times: user 7 µs, sys: 0 ns, total: 7 µs
Wall time: 10 µs
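To check the namespace handling in `parseLOC` without a network call, the same kind of lookup can be run against a hand-written MADS fragment. The snippet below is an abbreviated, illustrative record built from the values in this notebook's output, not a verbatim id.loc.gov response. It assumes the MADS v2 namespace `http://www.loc.gov/mads/v2`, and `parse_mads` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

# Illustrative, abbreviated MADS record (not a verbatim id.loc.gov response).
SAMPLE_MADS = b"""<?xml version="1.0" encoding="UTF-8"?>
<mads:mads xmlns:mads="http://www.loc.gov/mads/v2">
  <mads:authority>
    <mads:name type="personal" authority="naf">
      <mads:namePart>Child, Lydia Maria</mads:namePart>
      <mads:namePart type="date">1802-1880</mads:namePart>
    </mads:name>
  </mads:authority>
</mads:mads>"""


def parse_mads(xml_bytes: bytes) -> dict:
    root = ET.fromstring(xml_bytes)
    # The root tag looks like "{namespace-uri}localname"; capture the URI.
    ns = {"ns": re.match(r"\{(.*)\}", root.tag).group(1)}
    name = root.find('.//ns:name[@authority="naf"]', ns)
    # A namePart without a type attribute holds the name; type="date" holds life dates.
    parts = {part.get("type", "name"): part.text for part in name.findall("ns:namePart", ns)}
    return {"name": parts.get("name"), "birthDeath": parts.get("date")}


print(parse_mads(SAMPLE_MADS))  # {'name': 'Child, Lydia Maria', 'birthDeath': '1802-1880'}
```

Keying the `namePart` elements by their `type` attribute avoids depending on document order to find the name.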


In [41]:
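```python
%%time

# LOC identifier for Lydia Maria Child
locID = "n80001490"

# SNAC identifier for the same person
snacID = '84910652'

data = {}

data['loc'] = parseLOC(locID)
data['snac'] = parseSNAC(snacID)

# Fetch the LOC HTML page once and reuse it for both links.
links = locSoup(locID)
data['wiki'] = links['wiki']
data['viaf'] = links['viaf']

data
```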

CPU times: user 103 ms, sys: 11.4 ms, total: 115 ms
Wall time: 5.91 s


```python
{'loc': {'n80001490': {'name': 'Child, Lydia Maria',
   'birthDeath': '1802-1880',
   'genderedTerm': 'female'}},
 'snac': {'84910652': {'name': 'Child, Lydia Maria, 1802-1880',
   'birthDeath': '1802-1880'}},
 'wiki': {'url': 'http://www.wikidata.org/entity/Q443132'},
 'viaf': {'url': 'http://viaf.org/viaf/sourceID/LC%7Cn++80001490#skos:Concept'}}
```
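Both sources report 1802-1880 here, but they store the dates differently: SNAC embeds them in the name heading, while LOC keeps them in a separate `namePart`. In addition, the `\d{4}-\d{4}` pattern in `parseSNAC` would miss an open-ended range such as `1950-`. A small cross-check could parse either form and flag disagreements. In this sketch, `life_dates` and `dates_agree` are hypothetical helpers, and the pattern assumes plain four-digit years.

```python
import re
from typing import Optional, Tuple

# Assumption: dates are plain four-digit years, "YYYY-YYYY" or open-ended "YYYY-".
DATES = re.compile(r"(\d{4})-(\d{4})?")


def life_dates(text: Optional[str]) -> Optional[Tuple[str, Optional[str]]]:
    """Return (birth, death) from a string such as 'Child, Lydia Maria, 1802-1880'."""
    if not text:
        return None
    match = DATES.search(text)
    return (match.group(1), match.group(2)) if match else None


def dates_agree(data: dict) -> bool:
    """Compare LOC and SNAC life dates in a dict shaped like `data` above."""
    loc_info = next(iter(data["loc"].values()))
    snac_info = next(iter(data["snac"].values()))
    loc_dates = life_dates(loc_info["birthDeath"])
    # Parse SNAC's full heading so open-ended ranges are not lost.
    snac_dates = life_dates(snac_info["name"])
    return loc_dates is not None and loc_dates == snac_dates
```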

In [42]:
```python
%%time

# Wikidata entity URI taken from the LOC page
wiki_url = data['wiki']['url']

wiki_url
```

CPU times: user 12 µs, sys: 1 µs, total: 13 µs
Wall time: 15.3 µs


'http://www.wikidata.org/entity/Q443132'
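The Wikidata item ID is the last path segment of this URI, and it is what the Wikidata lookups in the Links list need. The following is a minimal sketch that assumes Wikidata's `Special:EntityData/{QID}.json` URL pattern and uses only the standard library. `qid_from_uri` and `fetch_entity` are hypothetical helpers, and the User-Agent string is a placeholder.

```python
import json
import re
import urllib.request

QID_PATTERN = re.compile(r"/entity/(Q\d+)$")


def qid_from_uri(uri: str) -> str:
    """Extract the Q-id from a URI such as http://www.wikidata.org/entity/Q443132."""
    match = QID_PATTERN.search(uri)
    if match is None:
        raise ValueError(f"not a Wikidata entity URI: {uri}")
    return match.group(1)


def fetch_entity(uri: str) -> dict:
    """Download the item's JSON from Wikidata's Special:EntityData endpoint."""
    url = f"https://www.wikidata.org/wiki/Special:EntityData/{qid_from_uri(uri)}.json"
    # Wikimedia asks clients to send a descriptive User-Agent; this one is a placeholder.
    request = urllib.request.Request(url, headers={"User-Agent": "infocard-prototype (placeholder contact)"})
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


print(qid_from_uri('http://www.wikidata.org/entity/Q443132'))  # Q443132
```

With network access, `fetch_entity(wiki_url)` would return the item's full JSON, including the `claims` that hold P18, P21, P106, and P39.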