# Checking ISBNs in Bobcat using Selenium

**Background**
The library received a donation of a large book collection, and we need to check this list of books against current holdings, both here and in the NYU Library system generally. This is most easily done by checking ISBNs against our library management system (LMS).

**Problem addressed**
Checking the book list against the LMS manually, i.e. entering each item into the online library catalog, works but is time-consuming. Let's estimate this at 3 minutes per book for lookup and recording (i.e. yes/no as to whether it is already held in the library system). At that rate, 1,000 books would take 50 hours, so for a large collection the time adds up quickly.

**Proposed solution**
Two points to start: 1. we already record ISBNs from donated books as part of the inventory process and store them in a list (currently in a spreadsheet); 2. the library's online catalog can search for books by ISBN. Accordingly, my proposed solution is to use a Python script to iterate over the list of ISBNs and retrieve catalog information automatically.

**Code outline**
- Spreadsheet data, i.e. the ISBNs, is moved to a plaintext list (one ISBN per line), which is read into memory using Python.
- A browser session is started using `selenium`/`geckodriver`.
- ISBNs are cleaned, padded with leading zeros (which a spreadsheet may drop), and validated using `isbnlib`.
- A URL composed of a base URL and a query parameter for the ISBN is opened in the browser, which loads the library catalog's search results page.
- The rendered HTML of the results page is checked for the text "No records found". If it is present, the book is not in the catalog and is assigned False; otherwise it is assigned True.
- Matches, i.e. True/False for each ISBN, are stored in a list and written to a CSV file.

**Future direction**
Direct API access to the LMS to retrieve book data by ISBN would be a preferable way to accomplish this goal. This does not seem possible at present (and if it is, please let me know!). PJB, 11.17.17

```python
# Imports

import csv
import time

from selenium import webdriver

from isbnlib import is_isbn10, is_isbn13, clean
```

```python
# Statics
# Note: this URL works, but could probably be cleaned up
# Note: this search may be restricted to NYU IP addresses

base_url = "http://bobcat.library.nyu.edu/primo-explore/search?search_scope=all&sortby=rank&vid=NYU&lang=en_US&query=isbn,exact,"

infile = "data/isbns.txt"
```
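The hand-written `base_url` bundles several query parameters into one string. One way to make them explicit (and easier to change) is to build the URL with `urllib.parse.urlencode`. This is a sketch: `build_search_url` and `BOBCAT_SEARCH` are names introduced here, and the parameter values are simply copied from `base_url`. Passing `safe=","` keeps the commas unescaped, so the result is the same string as `base_url + isbn`.

```python
from urllib.parse import urlencode

# Search endpoint, copied from base_url (everything before the "?").
BOBCAT_SEARCH = "http://bobcat.library.nyu.edu/primo-explore/search"


def build_search_url(isbn: str) -> str:
    """Build the Bobcat ISBN search URL from named parameters (hypothetical helper)."""
    params = {
        "search_scope": "all",
        "sortby": "rank",
        "vid": "NYU",
        "lang": "en_US",
        "query": f"isbn,exact,{isbn}",
    }
    # safe="," leaves the commas literal, matching the original base_url.
    return f"{BOBCAT_SEARCH}?{urlencode(params, safe=',')}"
```

A dict keeps insertion order, so the parameters appear in the same order as in the original string.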

```python
# Read a txt file of ISBNs, one per line, skipping blank lines

with open(infile, "r") as f:
    isbns = [line.strip() for line in f if line.strip()]
```
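The ISBNs currently go from the spreadsheet to a plaintext file by hand. As an alternative sketch, assuming the sheet is exported as CSV with a header column named `isbn` (a hypothetical name; adjust it to the real sheet), the standard `csv` module can read that column directly. Values are read as strings, so the reader itself does not alter them; any leading zeros the spreadsheet already dropped still need padding.

```python
import csv
from typing import Iterable, List


def parse_isbn_rows(lines: Iterable[str], column: str = "isbn") -> List[str]:
    """Return the non-empty values of `column` from CSV lines (header row first)."""
    isbns = []
    for row in csv.DictReader(lines):  # DictReader skips fully blank lines
        value = (row.get(column) or "").strip()
        if value:
            isbns.append(value)
    return isbns


def read_isbns_from_csv(path: str, column: str = "isbn") -> List[str]:
    """Read ISBNs from a CSV file; utf-8-sig also accepts a leading BOM from Excel."""
    with open(path, newline="", encoding="utf-8-sig") as f:
        return parse_isbn_rows(f, column)
```

With this, the read cell could become `isbns = read_isbns_from_csv("data/isbns.csv")`, where the path is likewise hypothetical.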

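```python
# Functions to validate and pad ISBNs

def validate_isbn(isbn):
    """Return True if isbn is a valid ISBN-13 or ISBN-10."""
    return is_isbn13(isbn) or is_isbn10(isbn)

def pad_isbn(isbn):
    """Left-pad short ISBNs with zeros to 10 characters.

    A spreadsheet may store an ISBN-10 as a number and drop its leading zeros.
    """
    isbn = clean(isbn)
    if len(isbn) < 10:
        return isbn.zfill(10)  # pad the cleaned value, not the raw input
    return isbn
```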
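`isbnlib` does the validation in the notebook. For reference, the standard check-digit rules behind it can be sketched with the standard library alone: an ISBN-10 is valid when its digits, weighted 10 down to 1 (a final `X` counts as 10), sum to a multiple of 11; an ISBN-13 is valid when its digits, weighted alternately 1 and 3, sum to a multiple of 10. This is an illustrative re-implementation, not `isbnlib`'s code, and it checks only the check digit (`isbnlib` may apply further checks). It also shows why padding matters: a nine-digit value fails until its leading zero is restored.

```python
DIGITS = set("0123456789")


def _strip(isbn: str) -> str:
    """Remove hyphens and spaces; upper-case so a trailing 'x' becomes 'X'."""
    return isbn.replace("-", "").replace(" ", "").upper()


def isbn10_is_valid(isbn: str) -> bool:
    s = _strip(isbn)
    if len(s) != 10 or not set(s[:9]) <= DIGITS:
        return False
    if not (s[9] in DIGITS or s[9] == "X"):
        return False
    values = [int(c) for c in s[:9]] + [10 if s[9] == "X" else int(s[9])]
    # Weights run 10, 9, ..., 1; a valid ISBN-10 gives a multiple of 11.
    total = sum(w * v for w, v in zip(range(10, 0, -1), values))
    return total % 11 == 0


def isbn13_is_valid(isbn: str) -> bool:
    s = _strip(isbn)
    if len(s) != 13 or not set(s) <= DIGITS:
        return False
    # Weights alternate 1, 3, 1, 3, ...; a valid ISBN-13 gives a multiple of 10.
    total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s))
    return total % 10 == 0
```

For example, `"306406152"` fails `isbn10_is_valid`, while `"306406152".zfill(10)` (that is, `"0306406152"`) passes.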

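```python
# Create browser instance
# Note: this requires Firefox & geckodriver to be installed
# Selenium 4 takes the driver path via a Service object
# (the older executable_path argument was deprecated and later removed).

from selenium import webdriver
from selenium.webdriver.firefox.service import Service

browser = webdriver.Firefox(service=Service('/usr/local/bin/geckodriver'))
```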

```python
# Function for finding ISBN matches in Bobcat (via Selenium)

def check_bobcat_isbn(isbn):
    """Return True if Bobcat shows records for isbn, else False.

    ISBNs that are invalid even after padding are reported as False
    without a lookup.
    """
    if validate_isbn(isbn):
        url = base_url + isbn  # Build URL string
    elif validate_isbn(pad_isbn(isbn)):
        url = base_url + pad_isbn(isbn)  # Build URL string
    else:
        return False

    browser.get(url)  # Open URL in browser instance; should trap response errors

    # Primo renders results with JavaScript, so give the page time to load.
    # If it has not finished after 2 s, the check below reports a false "Found".
    time.sleep(2)

    # page_source is the current (rendered) DOM in the browser
    return "No records found" not in browser.page_source
```
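The fixed `time.sleep(2)` either wastes time on fast loads or, on a slow load, checks the page before Primo has rendered, which counts as "Found". A minimal stdlib sketch of polling instead: keep re-reading the page until a known marker appears or a timeout passes. `wait_for_any` and `NOT_FOUND_TEXT` are names introduced here; Selenium's own `WebDriverWait(...).until(...)` offers the same pattern.

```python
import time
from typing import Callable, Iterable, Optional

NOT_FOUND_TEXT = "No records found"  # the string check_bobcat_isbn looks for


def wait_for_any(get_html: Callable[[], str], markers: Iterable[str],
                 timeout: float = 10.0, interval: float = 0.5) -> Optional[str]:
    """Poll get_html() until one of `markers` appears in the returned HTML.

    Returns the first matching marker, or None if `timeout` seconds pass first.
    """
    markers = list(markers)
    deadline = time.monotonic() + timeout
    while True:
        html = get_html()
        for marker in markers:
            if marker in html:
                return marker
        if time.monotonic() >= deadline:
            return None
        time.sleep(interval)
```

In the notebook this would replace `time.sleep(2)` with something like `wait_for_any(lambda: browser.page_source, [NOT_FOUND_TEXT, RESULTS_MARKER])`, where `RESULTS_MARKER` is a hypothetical string that appears only once Primo has rendered at least one result. A `None` return then means "undetermined" rather than "found".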


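```python
# Optional: a small sample for testing.
# Running this cell replaces the ISBNs read from infile; skip it for a full run.
isbns = ['9785990589834', '9785446904327', '9783963270109', '9781781792834']
```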

```python
# Iterate over ISBNs and find matches

matches = []

for isbn in isbns:
    isbn = pad_isbn(isbn.upper())  # upper() normalizes a lowercase 'x' check digit
    match = check_bobcat_isbn(isbn)
    print(f'Checking ISBN {isbn}. Result: {"Found" if match else "Not Found"}')
    matches.append((isbn, match))
```
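The loop prints one line per ISBN; for a large collection, a summary is handier. A small sketch with `collections.Counter`, where `summarize_matches` is a hypothetical helper that assumes `matches` holds `(isbn, match)` tuples as built above.

```python
from collections import Counter


def summarize_matches(matches):
    """Count found / not found results in a list of (isbn, match) tuples."""
    return Counter("found" if match else "not found" for _, match in matches)
```

The ISBNs still to be checked by hand are then `[isbn for isbn, match in matches if not match]`.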

```python
# Export to CSV (the csv module expects output files opened with newline='')

with open('matches.csv', 'w', newline='') as out:
    csv_out = csv.writer(out, quotechar="'")
    csv_out.writerow(['isbn', 'match'])
    csv_out.writerows(matches)
```

```python
# Close browser instance

browser.quit()
```