## → Chapter Five  
*The Magic of Open APIs and ISBN Lookups*

🔺After many failed attempts with Chinese websites, I decided to see how many titles I could find using just open databases based in the U.S., and planned to manually fill in the rest.  
Even if only a fraction of the books were found, it would still save time.

🔺To my surprise, I realized the managers had misunderstood the process and accidentally misled me. When they said they couldn’t retrieve Chinese book info by ISBN, what they really meant was that the data wouldn’t auto-populate in Shopify through a barcode scan, so they gave up and resorted to manual entry. It became clear that these resources had always been available; the people before me just didn’t know where or how to look.  


🔺But in reality, many open platforms do provide structured ISBN data and return a wide range of results. Out of roughly 700 titles:  
- **Zotero** (via WorldCat) returned ~200  
- **ISBNdb** returned ~520  
- **isbnsearch.org** worked well for quick scraping using BeautifulSoup  

🔺I first used the **ISBNdb API**, which returned reliable metadata including publisher, date, genre, and description.  
**Zotero** provided bulk query options through its reference management interface.  
**isbnsearch.org** allowed lightweight scraping for additional results.

Below is a tutorial on how to:  
- Use the **ISBNdb API** to retrieve detailed book information by ISBN
- Scrape **isbnsearch.org** with BeautifulSoup to retrieve basic book information by ISBN (without book descriptions)
- Look up book information by ISBN in bulk through **Zotero**

---



### **→ I. Querying ISBNdb API for Book Information**  

**→ 1.❗Requirements:**  
🔺You must register on [ISBNdb.com](https://isbndb.com/) and purchase a plan to access the API.  
🔺Once registered, you will receive a personal API key, which you'll use for authorization.

**→ 2. ISBNdb and API explained**  
🔺[ISBNdb](https://isbndb.com/) is a subscription-based API that provides structured metadata for books.
It supports queries by ISBN and is useful for bulk cataloging projects.

🔺An **API** (Application Programming Interface) is a tool that allows different software systems to talk to each other.  
In this case, we are sending a request from our Python script to **ISBNdb.com**, which is a website that stores metadata for books.  
In response, the API sends us structured data about the book we ask for, based on its **ISBN** (International Standard Book Number).
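
🔺Because every request is keyed on an ISBN, it can help to check the scanned values before spending API calls on them. The block below is a minimal sketch, not part of my original workflow: `is_valid_isbn13` is a hypothetical helper name, and it applies the standard ISBN-13 check-digit rule (the first twelve digits are weighted 1, 3, 1, 3, ... and the thirteenth digit must bring the total to a multiple of 10).

```python
def is_valid_isbn13(isbn):
    """Return True if `isbn` is a 13-digit ISBN with a correct check digit.

    Hypothetical helper for illustration; hyphens and spaces are ignored.
    """
    digits = str(isbn).replace('-', '').replace(' ', '')
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return False
    # weights alternate 1, 3, 1, 3, ... over the first twelve digits
    total = sum(int(d) * (1 if i % 2 == 0 else 3)
                for i, d in enumerate(digits[:12]))
    expected_check = (10 - total % 10) % 10
    return expected_check == int(digits[12])
```

A barcode misread would then show up as `False` before any request is sent, for example by filtering the list with `[i for i in isbn_list if is_valid_isbn13(i)]`.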

🔺Using the code below, I was able to:   
- Send each ISBN in my list to the ISBNdb API using the format `https://api2.isbndb.com/book/{isbn}`  
- Save the returned data to a structured CSV

```python
import pandas as pd
import requests
import time

# define range and API key
# start_idx and end_idx select which slice of the ISBN list to query in one run
# (end_idx is exclusive, so 234 and 235 query a single entry)
start_idx = 234
end_idx = 235
API_KEY = 'your_api_key_here'  # replace with your actual API key

# warn if the key is empty or still the placeholder
if API_KEY in ('', 'your_api_key_here'):
    print('Error: you need to subscribe to ISBNdb.com and provide your API key.')

# set headers
HEADERS = {
    'accept': 'application/json',
    'Authorization': API_KEY,
    'Content-Type': 'application/json',
}

# query a single ISBN; returns the "book" record, or an empty dict on failure
def get_book_info(isbn):
    url = f'https://api2.isbndb.com/book/{isbn}'
    try:
        # the timeout keeps one stalled request from hanging the whole run
        response = requests.get(url, headers=HEADERS, timeout=10)
        if response.status_code == 200:
            return response.json().get("book", {})
        else:
            print(f"ISBN {isbn} not found or error ({response.status_code})")
            return {}
    except Exception as e:
        print(f"Error fetching ISBN {isbn}: {e}")
        return {}

# process the ISBN list from a csv
def process_isbn_csv(csv_path, start_idx=0, end_idx=None):
    # read every column as text so ISBNs are not parsed as numbers
    df = pd.read_csv(csv_path, dtype=str)
    if 'CODECONTENT' not in df.columns:
        raise ValueError("CSV must contain a 'CODECONTENT' column with ISBNs.")

    isbn_list = df['CODECONTENT'].astype(str).tolist()
    if end_idx is None:
        end_idx = len(isbn_list)

    results = []
    for idx, isbn in enumerate(isbn_list[start_idx:end_idx], start=start_idx):
        isbn = isbn.strip()
        print(f"[{idx}] Fetching: {isbn}")
        book_data = get_book_info(isbn)
        book_data['isbn_searched'] = isbn  # keep the queried ISBN even when nothing is found
        results.append(book_data)
        time.sleep(1)  # to respect API rate limits

    # flatten nested JSON fields into columns
    result_df = pd.json_normalize(results)
    return result_df

# run script (make sure the csv file path, start id, end id are all correct)
if __name__ == "__main__":
    result_df = process_isbn_csv('../data/scannedResults.csv', start_idx, end_idx)
    print(result_df.head())
    result_df.to_csv("isbndbResults.csv", index=False)
```
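
🔺Since `start_idx` and `end_idx` have to be redefined by hand for each run, one option is to generate the ranges up front. This is only a sketch under my own assumptions: `batch_ranges` is a hypothetical helper, and the batch size is whatever fits your plan's limits.

```python
def batch_ranges(total, batch_size):
    """Return (start_idx, end_idx) pairs that cover range(total).

    Hypothetical helper; end_idx is exclusive, matching list slicing.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    return [(start, min(start + batch_size, total))
            for start in range(0, total, batch_size)]
```

Each pair can be passed to `process_isbn_csv(csv_path, start, end)`, with every batch saved to its own file so that a failure partway through does not lose earlier results.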

### **→ II. Scraping isbnsearch.org for book data by ISBNs**

**→ 1. Website Structure**  

🔺I used BeautifulSoup to scrape isbnsearch.org because I did not run into any access restrictions on the site. I only introduced a one-second pause between requests to be polite. 

🔺As in my earlier BeautifulSoup work, I needed to find the URL structure of the pages holding each book's information and then extract that information from the HTML. Scraping this site was even simpler, because its URL pattern leads directly to a book's info page given an ISBN. 

🔺With the code below, I was able to: 

- Generate a URL for each ISBN using the pattern `https://isbnsearch.org/isbn/{isbn}`
- Use `requests` and `BeautifulSoup` to fetch and parse HTML
- Extract key fields from the `<div class="bookinfo">` section
- Save the results to a structured csv


```python
import pandas as pd
import requests
from bs4 import BeautifulSoup
import time

# path to csv with scanned ISBNs
csv_path = '../data/scannedResults.csv'

# load ISBNs as text so they are not parsed as numbers, and skip empty rows
df = pd.read_csv(csv_path, dtype=str)
isbn_list = df['CODECONTENT'].dropna().str.strip().tolist()

# base URL for isbnsearch
base_url = 'https://isbnsearch.org/isbn/'

# map each label shown on the page to the column it fills
field_labels = {
    'ISBN-13:': 'isbn_13',
    'ISBN-10:': 'isbn_10',
    'Author:': 'author',
    'Edition:': 'edition',
    'Binding:': 'binding',
    'Publisher:': 'publisher',
    'Published:': 'published',
}

# list to store results
results = []

for idx, isbn in enumerate(isbn_list):
    url = base_url + isbn
    print(f"[{idx}] Scraping: {url}")

    try:
        response = requests.get(url, timeout=10)
        if response.status_code != 200:
            print(f"Failed to fetch ISBN {isbn} (status {response.status_code})")
            continue

        soup = BeautifulSoup(response.content, 'html.parser')
        info_div = soup.find('div', class_='bookinfo')

        if not info_div:
            print(f"No data found for ISBN {isbn}")
            continue

        title_tag = info_div.find('h1')
        data = {
            'isbn': isbn,
            'title': title_tag.get_text(strip=True) if title_tag else None,
        }
        # start every field as None so missing values still get a column
        data.update(dict.fromkeys(field_labels.values()))

        # parse each <p> field by matching its leading label
        for p in info_div.find_all('p'):
            text = p.get_text(strip=True)
            for label, column in field_labels.items():
                if text.startswith(label):
                    data[column] = text[len(label):].strip()
                    break

        results.append(data)

    except Exception as e:
        print(f"Error scraping ISBN {isbn}: {e}")

    finally:
        # pause after every request, including failed ones, to be polite
        time.sleep(1)

# convert to DataFrame and save
result_df = pd.DataFrame(results)
result_df.to_csv('../data/isbnsearch_results.csv', index=False, encoding='utf-8-sig')
print("Data saved to isbnsearch_results.csv")
```
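
🔺Each scraped row carries both an `isbn_13` and an `isbn_10` value, which makes a simple consistency check possible. The sketch below is an optional extra rather than something I ran in the original project: `isbn13_to_isbn10` is a hypothetical helper that derives the ISBN-10 from a 978-prefixed ISBN-13 using the standard mod-11 check digit, so it can be compared with the scraped `isbn_10`. ISBN-13s with a 979 prefix have no ISBN-10 equivalent, so the function returns `None` for them.

```python
def isbn13_to_isbn10(isbn13):
    """Convert a 978-prefixed ISBN-13 to its ISBN-10 form, or return None.

    Hypothetical helper; the ISBN-13 check digit itself is not verified here.
    """
    digits = str(isbn13).replace('-', '').replace(' ', '')
    if len(digits) != 13 or not (digits.isascii() and digits.isdigit()):
        return None
    if not digits.startswith('978'):
        return None
    core = digits[3:12]  # the nine digits shared by both forms
    # ISBN-10 weights run from 10 down to 2 over the nine core digits
    total = sum(int(d) * (10 - i) for i, d in enumerate(core))
    check = (11 - total % 11) % 11
    return core + ('X' if check == 10 else str(check))
```

A row whose scraped `isbn_10` differs from `isbn13_to_isbn10(row['isbn_13'])` is worth a manual look before it goes into the catalog.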

### **→ III. Bulk ISBN Lookup in Zotero**

**→ 1. Downloading Zotero**

[Zotero](https://www.zotero.org/) is a free and open-source reference management tool used by researchers to collect, organize, and cite sources. It also allows you to add books by ISBN, DOI, or other identifiers and automatically pulls metadata from a number of databases. For example, many of the books I retrieved via bulk ISBN import were matched through **WorldCat**.
You can [download Zotero on the official website](https://www.zotero.org/download/). 

**→ 2. Combine ISBNs into a single string to query in Zotero**

Zotero supports bulk lookup of ISBNs, but I first needed to combine all the values into a single comma-separated string in Python:

```python
import pandas as pd

df = pd.read_csv('../data/scannedResults.csv')
isbn_string = ",".join(df['CODECONTENT'].dropna().astype(str).tolist())
print(isbn_string)
```
I then copied the string into the input box for "Add Items by Identifier" in Zotero and waited for Zotero to process and load the results.


**→ 3. Export Library**

Once all the books had been looked up, I exported my library to a CSV file named **zoteroExport.csv** and saved it in the data folder.
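
With results from all three sources saved, the remaining question is which scanned ISBNs none of them found, since those are the ones left for manual entry. The following is a minimal sketch under my own assumptions: `normalize_isbn` and `coverage_report` are hypothetical helpers, and the ISBN lists are assumed to have already been read out of each results file.

```python
import re


def normalize_isbn(raw):
    """Strip hyphens, spaces and other separators; keep digits and X."""
    return re.sub(r'[^0-9Xx]', '', str(raw)).upper()


def coverage_report(scanned, found_by_source):
    """Count hits per source and list the scanned ISBNs no source returned.

    `scanned` is an iterable of scanned ISBNs; `found_by_source` maps a
    source name to the ISBNs it returned. Both names are illustrative.
    """
    scanned_set = {normalize_isbn(i) for i in scanned}
    hits_per_source = {}
    covered = set()
    for source, isbns in found_by_source.items():
        hits = scanned_set & {normalize_isbn(i) for i in isbns}
        hits_per_source[source] = len(hits)
        covered |= hits
    missing = sorted(scanned_set - covered)
    return hits_per_source, missing
```

Normalizing first matters because the same ISBN can appear with or without hyphens depending on the source.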
