# Extracting and visualizing spatial data from loc.gov JSON API with folium

Digital mapping has become an increasingly accessible and applicable tool for humanities scholarship. Embedded within digital collections available from the Library of Congress website are geographic data, including the locations of items and their environmental contexts. In this tutorial, we demonstrate how loc.gov JSON API users can find and store spatial information about their items of interest in order to plot them as points on a map interface, all within a Jupyter notebook.

### Rights and access

Rights and restrictions, including copyright, affect how you can use images, particularly if you want to publish, display, or otherwise distribute them. You can read more in About Copyright and the Collections and Copyright and Other Restrictions That Apply to Publication/Distribution of Images: Assessing the Risk of Using a P&P Image.

### Data quality

Consistency and accuracy of geographic information stored with digital content on the Library of Congress website vary within and across collections. This tutorial was designed as a method of exploring existing spatially referenced data on items. Finding and analyzing meaningful patterns may require additional data corrections, attributes, and geocoding frameworks to ensure optimal coverage.

# SHOW THE RESULTS FIRST:


### Example: Historic American Engineering Record

One of the remarkable aspects of the Library of Congress's digital collections is the outsized contribution of just a few contributors, such as Justine Christianson.

Most of the time spent on data analysis of any kind goes to cleaning and grappling with the data. Mapping coordinates stored with the Historic American Engineering Record (HAER) is no exception: expect gaps, errors, and duplicates. Spatial methods are nonetheless important for a number of purposes, and in this notebook we will "spatialize" items by extracting the "latlongs" from the JSON for a portion of HAER and plotting them on a map.

This expanding collection documents more than 7,600 structures, sites, and industrial processes through large-format photographs, measured drawings, and written historical reports. Included are a wide range of industries and transportation facilities throughout the United States and its territories such as factories, mills, bridges, canals, and railroads.

HABS/HAER/HALS: http://www.loc.gov/pictures/collection/hh/

In [None]:
### The following guide will demonstrate how to map the Historic American Engineering Record (HAER). 
### With minor changes, the same process of spatial data extraction and visualization could be applied
### to other collections containing explicit geographic information. 

##### [insert map jpg]

Import a few Python packages for pulling, cleaning, and visualizing our data. 

In [1]:
# The recommended convention in Python's own documentation is to import everything at the top, and on separate lines.

import pandas as pd ## for data manipulation
import requests ## for API requests
import folium ## for visualizing data on a Leaflet.js map

# Data consistency and information about collections

## Identifying items

Getting up to speed with use of the loc.gov JSON API and Python to access the collection was a breeze, thanks to data exploration resources from former LC Labs resident Laura Wrubel, Software development librarian at GWU.

Grab more tips for loc.gov JSON API calls and URL parameterization from Laura's 'Accessing images for analysis' notebook:
https://github.com/lwrubel/data-exploration/blob/master/Accessing%20images%20for%20analysis.ipynb

__About the Historic American Buildings Survey/Historic American Engineering Record/Historic American Landscapes Survey__

The Historic American Buildings Survey (HABS) and the Historic American Engineering Record (HAER) collections are among 
the largest and most heavily used in the Prints and Photographs Division of the Library of Congress. 
Since 2000, documentation from the Historic American Landscapes Survey (HALS) has been added to the holdings. 
The collections document achievements in architecture, engineering, and landscape design in the United States and its territories 
through a comprehensive range of building types, engineering technologies, and landscapes, including examples as diverse as 
the Pueblo of Acoma, houses, windmills, one-room schools, the Golden Gate Bridge, and buildings designed by Frank Lloyd Wright.

### Justine Christianson and the Historic American Buildings Survey

The Library of Congress has a digitized collection of baseball cards from the late 19th and early 20th centuries. Here's an example:

## Gathering geography



In [5]:
# Many of the prints & photographs in HAER are tagged with geographic coordinates ('latlong')
# Using the requests package we imported, we can easily 'get' data for an item as JSON and parse it for our latlong:

get_any_item = requests.get("https://www.loc.gov/item/al0006/?fo=json")
print('latlong: {}'.format(get_any_item.json()['item']['latlong']))

latlong: 32.45977,-86.47767
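
The `latlong` value comes back as a single comma-separated string, so it has to be split and converted to numbers before it can be plotted. The following is a minimal sketch, assuming the "latitude,longitude" order shown in the output above. `parse_latlong` is a hypothetical helper introduced here for illustration; it is not part of the loc.gov API or of Laura's notebook.

```python
def parse_latlong(latlong):
    """Turn a 'lat,long' string into a (lat, long) tuple of floats.

    Returns None when the value is missing, malformed, or out of range.
    """
    if not isinstance(latlong, str):
        return None
    parts = latlong.split(",")
    if len(parts) != 2:
        return None
    try:
        lat, lon = float(parts[0]), float(parts[1])
    except ValueError:
        return None
    # valid latitudes span -90..90 and valid longitudes span -180..180
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    return lat, lon


print(parse_latlong("32.45977,-86.47767"))  # (32.45977, -86.47767)
```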


In [10]:
# To retrieve this sort of data point for a set of search results, we can use Laura's get_image_urls function. 
# This will allow us to store the latlong from each item in a list, working through each page of the search.

def get_image_urls(url, items=None):
    '''
    Retrieves the item URLs (ids) for search results, from which lat_longs can later be extracted.
    Skips over items that are for the collection as a whole or web pages about the collection.
    Handles pagination.
    '''
    # avoid a mutable default argument, which would be shared between calls
    if items is None:
        items = []
    # request pages of 100 results at a time
    params = {"fo": "json", "c": 100, "at": "results,pagination"}
    call = requests.get(url, params=params)
    data = call.json()
    results = data['results']
    for result in results:
        # original_format may be missing, so fall back to an empty list
        original_format = result.get("original_format") or []
        # skip the collection-level result and web pages about the collection
        if "collection" not in original_format and "web page" not in original_format:
            # store the item's URL (its "id"), which we can request as JSON later
            item = result.get("id")
            items.append(item)
    if data["pagination"]["next"] is not None: # make sure we haven't hit the end of the pages
        next_url = data["pagination"]["next"]
        #print("getting next page: {0}".format(next_url))
        # recurse into the next page using this function's own name
        get_image_urls(next_url, items)

    return items
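
The function above follows `pagination.next` by calling itself once per results page. The same walk can be written as a loop, which also makes the logic easy to try out without network access. This is a sketch only, assuming each page is a dict shaped like the responses used above (`results` plus `pagination.next`). `collect_item_ids` and its injectable `fetch_page` argument are hypothetical names introduced here, and the two fake pages are made-up data. Against the live API, `fetch_page` could be something like `lambda u: requests.get(u, params={"fo": "json"}).json()`.

```python
def collect_item_ids(first_url, fetch_page):
    """Follow pagination links and collect the `id` of each item-level result.

    `fetch_page` is any callable that takes a URL and returns the parsed
    JSON for that page as a dict.
    """
    item_ids = []
    url = first_url
    while url:
        data = fetch_page(url)
        for result in data.get("results", []):
            formats = result.get("original_format") or []
            # skip collection-level records and web pages about the collection
            if "collection" in formats or "web page" in formats:
                continue
            if result.get("id"):
                item_ids.append(result["id"])
        # a missing or None "next" link ends the loop
        url = data.get("pagination", {}).get("next")
    return item_ids


# Two fake pages standing in for API responses (made-up data for illustration)
fake_pages = {
    "page1": {
        "results": [
            {"id": "http://www.loc.gov/item/al0006/", "original_format": ["photo, print, drawing"]},
            {"id": "http://www.loc.gov/collections/example/", "original_format": ["collection"]},
        ],
        "pagination": {"next": "page2"},
    },
    "page2": {
        "results": [
            {"id": "http://www.loc.gov/item/mt0074/", "original_format": ["photo, print, drawing"]},
        ],
        "pagination": {"next": None},
    },
}

print(collect_item_ids("page1", fake_pages.get))
```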

To demonstrate with a manageable example, I'll use a search that targets items from HAER with the subject 'Urban Growth'.

In [15]:
url = "https://www.loc.gov/search/?fa=subject:urban+growth&fo=json"

This is the base URL we will use for the API requests we'll be making as we run the function.
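
Search URLs like this one can also be assembled in code rather than copied from the browser. The sketch below assumes, based on the URLs used in this notebook, that facet filters go in the `fa` parameter and that several filters are joined with `|` (which appears as `%7C` once encoded). `build_search_url` is a hypothetical helper, not part of any loc.gov client library.

```python
from urllib.parse import urlencode


def build_search_url(base, facets, fmt="json"):
    """Join facet filters with '|' into one `fa` parameter and append `fo`."""
    query = {"fa": "|".join(facets), "fo": fmt}
    # keep ':' and ',' readable; spaces become '+', and '|' becomes %7C
    return base + "?" + urlencode(query, safe=":,")


print(build_search_url("https://www.loc.gov/search/", ["subject:urban growth"]))
# https://www.loc.gov/search/?fa=subject:urban+growth&fo=json
```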

In [14]:
# Once you've found a search that targets the items that interest you, copy the URL. 
# That will be the base URL for your API request.

print('Item links for Urban Growth: {}'.format(get_image_urls(url, items=[])))

Item links for Urban Growth: ['http://www.loc.gov/item/2009617010/', 'http://www.loc.gov/item/mt0074/', 'http://www.loc.gov/item/ut0208/', 'http://www.loc.gov/item/co0214/', 'http://www.loc.gov/item/ct0415/', 'http://www.loc.gov/item/nj0984/', 'http://www.loc.gov/item/co0164/', 'http://www.loc.gov/item/co0073/', 'http://www.loc.gov/item/nj1002/', 'http://www.loc.gov/item/mt0165/', 'http://www.loc.gov/item/mt0039/', 'http://www.loc.gov/item/ma0547/', 'http://www.loc.gov/item/co0039/', 'http://www.loc.gov/item/ma0544/', 'http://www.loc.gov/item/co0217/', 'http://www.loc.gov/item/ct0371/', 'http://www.loc.gov/item/ma0543/', 'http://www.loc.gov/item/nj1004/', 'http://www.loc.gov/item/ma0972/', 'http://www.loc.gov/item/co0069/', 'http://www.loc.gov/item/ut0130/', 'http://www.loc.gov/item/or0150/', 'http://www.loc.gov/item/ma1293/', 'http://www.loc.gov/item/nj0874/', 'http://www.loc.gov/item/md1573/', 'http://www.loc.gov/item/ma1290/', 'http://www.loc.gov/item/mo0508/', 'http://www.loc.gov/i

In [23]:
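import datetime as dt ## for timestamps; ideally imported in the first cell with the other packages

lastl = pd.read_csv('latlongtest2.csv')
t0 = dt.datetime.now()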

In [67]:
lastl = lastl[['latitude', 'longitude']]

In [16]:
#lastl

In [69]:
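NYC_COORD = [40.7128, -74.0059]

# center the map on NYC_COORD; pan or zoom out to see markers elsewhere in the country
map_nyc = folium.Map(location=NYC_COORD, zoom_start=12,
                     tiles='cartodbpositron', width=640, height=480)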

In [70]:
len(lastl['latitude'])

27

In [71]:
type(lastl)
lastlist = lastl.values.tolist()
lastlist

[[42.64135, -71.31376999999999],
 [39.767309999999995, -75.57489],
 [41.4860186, -73.0509432],
 [39.435590000000005, -75.68289],
 [42.372915, -71.234792],
 [42.6453, -71.31752],
 [42.65071, -71.31443],
 [42.64492, -71.31347],
 [42.6503, -71.32569000000001],
 [48.19983, -114.31416000000002],
 [42.34571, -71.05859],
 [43.13769, -72.44866999999999],
 [41.95842570000001, -73.4934565],
 [42.35413, -71.09141],
 [42.64583, -71.30649],
 [42.6486, -71.33192],
 [41.82573, -71.40908],
 [33.52213, -92.78894],
 [42.647819, -71.307349],
 [42.64124, -71.41597],
 [39.102579999999996, -94.5929],
 [47.52036, -111.28955],
 [39.29063, -76.61259],
 [32.45977, -86.47766999999999],
 [39.74944, -75.54334],
 [39.7391666667, -104.984722222],
 [45.57691, -122.74565]]

In [72]:
range(len(lastlist))

range(0, 27)

In [73]:
for point in lastlist:
    folium.CircleMarker(point, radius=1, color='#0080bb', fill_color='#0080bb').add_to(map_nyc)

In [74]:
map_nyc
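
Because the points are spread across the country, a fixed center and zoom level may hide most of them. One option is to compute the bounding box of the coordinates and pass it to folium's `fit_bounds` method. This is a sketch rather than the approach used for the map above; `bounding_box` is a hypothetical helper, and the three sample points are rounded from the list printed earlier.

```python
def bounding_box(points):
    """Return [[south, west], [north, east]] for a list of [lat, long] pairs."""
    lats = [lat for lat, lon in points]
    lons = [lon for lat, lon in points]
    return [[min(lats), min(lons)], [max(lats), max(lons)]]


sample = [[42.64135, -71.31377], [48.19983, -114.31416], [32.45977, -86.47767]]
bounds = bounding_box(sample)
print(bounds)  # [[32.45977, -114.31416], [48.19983, -71.31377]]

# In the notebook, where map_nyc already exists:
# map_nyc.fit_bounds(bounds)
```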

In [None]:
# folium builds on the data wrangling strengths of the Python ecosystem and the mapping strengths of the Leaflet.js library. 
# Manipulate your data in Python, then visualize it in a Leaflet map 

### Spelling out in a narrative way how I made decisions about subsetting the collection and gathering data

In [11]:
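# store the item URLs returned for the Urban Growth search
item_ids = get_image_urls(url, items=[])
len(item_ids)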

76

In [12]:
p1 = {"fo" : "json"}

ite = set()

# loop through the item urls (item_ids) and extract latlong

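for item in item_ids:
    r3 = requests.get(item, params=p1)
    try:
        data = r3.json()
        results = data['item']['latlong']
        ite.add(results)
    except (ValueError, KeyError, TypeError):
        # skip responses that are not valid JSON and items without a latlong
        pass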
    
ite

KeyboardInterrupt: 

### Narrative here: excluded duplicates; how many have duplicates? how many didn't have latlong?
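
Storing the coordinates in a set silently drops duplicates and items without a `latlong`, so the counts are lost. A small tally can answer both questions before deduplicating. This sketch assumes each record is the parsed JSON for one item, with the coordinate under `item.latlong` as in the cells above. `summarize_latlongs` is a hypothetical helper, and the sample records are made up for illustration.

```python
from collections import Counter


def summarize_latlongs(item_records):
    """Count items lacking `latlong` and coordinate values shared by several items."""
    found = []
    missing = 0
    for data in item_records:
        item = data.get("item")
        latlong = item.get("latlong") if isinstance(item, dict) else None
        if latlong:
            found.append(latlong)
        else:
            missing += 1
    counts = Counter(found)
    duplicated = {value: n for value, n in counts.items() if n > 1}
    return {"unique": len(counts), "missing": missing, "duplicated": duplicated}


# Made-up records for illustration only
records = [
    {"item": {"latlong": "32.45977,-86.47767"}},
    {"item": {"latlong": "32.45977,-86.47767"}},
    {"item": {"latlong": "42.64135,-71.31377"}},
    {"item": {"title": "no coordinates here"}},
]
print(summarize_latlongs(records))
# {'unique': 2, 'missing': 1, 'duplicated': {'32.45977,-86.47767': 2}}
```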

#### Beat the dead horse about what's possible, how items don't all have some info, etc.

In [77]:
# convert latlong set to list
latlong_list = list(ite)

# convert list to pandas dataframe
df = pd.DataFrame(latlong_list)

# split coordinates into two columns
df = df[0].str.split(',', expand=True)

# rename columns with latitude and longitude
df = df.rename(columns={0:'latitude', 1:'longitude'})

# print df to csv
# df.to_csv('latlongtest2.csv')
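
After the split, both columns still hold strings. Writing to CSV and reading it back converts them, but the conversion can also be done directly, which gives a chance to drop malformed or out-of-range values. This is a minimal sketch with made-up sample strings; the name `coords` and the range check are illustrative additions, not part of the original workflow.

```python
import pandas as pd

# the third value is deliberately malformed
latlong_strings = ["32.45977,-86.47767", "42.64135,-71.31377", "not,a number"]

coords = pd.DataFrame(latlong_strings)[0].str.split(",", expand=True)
coords = coords.rename(columns={0: "latitude", 1: "longitude"})

# errors="coerce" turns unparseable values into NaN instead of raising
coords = coords.apply(pd.to_numeric, errors="coerce").dropna()

# keep only rows that fall inside valid latitude/longitude ranges
in_range = coords["latitude"].between(-90, 90) & coords["longitude"].between(-180, 180)
coords = coords[in_range].reset_index(drop=True)

print(coords.dtypes)
print(len(coords))  # 2
```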

In [12]:
# crawl the HAER portion of the collection for latlongs
r2 = requests.get("https://www.loc.gov/collections/historic-american-buildings-landscapes-and-engineering-records/?fa=contributor:historic+american+engineering+record&fo=json")
#r2.json()

# let's grab latlongs for whole of HABS

In [76]:
# running our function to get item URLs again - this time for all search results for HABS by Justine Christianson in California
url2 = "https://www.loc.gov/collections/historic-american-buildings-landscapes-and-engineering-records/?fa=location:california%7Ccontributor:christianson,+justine"
item_ids2 = get_image_urls(url2, items=[])
#item_ids

In [77]:
len(item_ids2)
# might need to shrink this puppy down

154

In [78]:
p1 = {"fo" : "json"}

ite2 = set()

# loop through the item urls (item_ids) and extract latlong

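for item in item_ids2:
    r3 = requests.get(item, params=p1)
    try:
        data = r3.json()
        results = data['item']['latlong']
        ite2.add(results)
    except (ValueError, KeyError, TypeError):
        # skip responses that are not valid JSON and items without a latlong
        pass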

In [None]:
# convert latlong set to list
latlong_list2 = list(ite2)

# convert list to pandas dataframe
df2 = pd.DataFrame(latlong_list2)

# split coordinates into two columns
df2 = df2[0].str.split(',', expand=True)

# rename columns with latitude and longitude
df2 = df2.rename(columns={0:'latitude', 1:'longitude'})

# print df to csv
df2.to_csv('latlongtest_cali500.csv')

# Conclusion

In [4]:
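import cartoframes ## for writing the data to a CARTO account

# APIKEY must hold your CARTO API key; define it before running this cell
df = pd.read_csv('latlongtest2.csv')
cc = cartoframes.CartoContext(base_url="https://cmoffett.carto.com",
                              api_key=APIKEY)
cc.write(df, 'acadia_biodiversity')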

NameError: name 'APIKEY' is not defined