# University of Southampton Digital Library OAI-PMH API

This notebook demonstrates how to obtain basic catalogue records from the University of Southampton Digital Library using our OAI-PMH endpoint. OAI-PMH stands for [Open Archives Initiative Protocol for Metadata Harvesting](https://www.openarchives.org/OAI/openarchivesprotocol.html) and specifies a standardised way to obtain records from a repository.
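An OAI-PMH request is an ordinary HTTP GET whose `verb` parameter names one of the six operations defined by the protocol (`Identify`, `ListMetadataFormats`, `ListSets`, `ListIdentifiers`, `ListRecords`, `GetRecord`). As a minimal sketch, the snippet below builds the URL for the `ListRecords` request used later in this notebook. The helper name `build_oai_url` is made up for illustration and is not part of any library.

```python
from urllib.parse import urlencode

# Endpoint used later in this notebook.
base_url = "https://digital.epexio.com/uosViewer/repositories/records/oai"


def build_oai_url(base_url, verb, **arguments):
    """Return the GET URL for an OAI-PMH request (hypothetical helper)."""
    query = {"verb": verb, **arguments}
    return f"{base_url}?{urlencode(query)}"


# Ask for records in unqualified Dublin Core (the "oai_dc" prefix).
list_records_url = build_oai_url(base_url, "ListRecords", metadataPrefix="oai_dc")
print(list_records_url)
```

Pasting the printed URL into a browser should show the raw XML that the harvesting function parses.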

In [4]:
import xml.etree.ElementTree as ET
import subprocess
import sys

# Ensure 'requests' is installed and imported
try:
    import requests
except ImportError:
    subprocess.check_call([sys.executable, "-m", "pip", "install", "requests"])
    import requests

In [6]:
# Let's create a function that takes the base URL of the endpoint, a metadata prefix naming the metadata standard
# we want the records in (Dublin Core, "oai_dc", by default) and an optional maximum number of pages of results
# to fetch. It returns the harvested <record> elements as a list.

def harvest_oai_records(base_url, metadata_prefix="oai_dc", max_pages=None):
    records = []
    params = {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix
    }
    page_count = 0
    
    while True:
        # A timeout stops the loop hanging indefinitely if the server does not respond.
        response = requests.get(base_url, params=params, timeout=30)
        if response.status_code != 200:
            print(f"Error: HTTP {response.status_code}")
            break
            
        root = ET.fromstring(response.content)
        records.extend(root.findall(".//{http://www.openarchives.org/OAI/2.0/}record"))
        
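        # A non-empty resumptionToken means more pages of results are available.
        token_element = root.find(".//{http://www.openarchives.org/OAI/2.0/}resumptionToken")
        if token_element is not None and token_element.text:
            # Follow-up requests carry only the verb and the token;
            # the protocol does not allow metadataPrefix to be sent again.
            params = {
                "verb": "ListRecords",
                "resumptionToken": token_element.text
            }
            page_count += 1
            # Stop once the requested number of pages has been fetched.
            if max_pages is not None and page_count >= max_pages:
                break
        else:
            break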
    return records
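
The function above only checks the HTTP status code. OAI-PMH also reports protocol-level problems (for example an expired resumption token, or no matching records) inside the XML body as `<error code="...">` elements, so a response can carry an error and no records. The sketch below shows one way such errors could be detected. The helper name `find_oai_errors` and the sample response, including its `example.org` request URL, are made up for illustration.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def find_oai_errors(xml_bytes):
    """Return (code, message) pairs for any <error> elements in a response."""
    root = ET.fromstring(xml_bytes)
    return [
        (error.get("code"), (error.text or "").strip())
        for error in root.findall(f"{OAI_NS}error")
    ]


# Hand-written sample response, not real output from the endpoint.
sample_response = b"""<?xml version="1.0" encoding="UTF-8"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2024-01-01T00:00:00Z</responseDate>
  <request>https://example.org/oai</request>
  <error code="badResumptionToken">The resumption token is invalid or expired.</error>
</OAI-PMH>"""

errors = find_oai_errors(sample_response)
print(errors)
```

Inside `harvest_oai_records`, a check like this could be run on `response.content` before the records are collected, stopping the loop when the returned list is not empty.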

Let's call the function to get 2 pages of results and then print out how many records we have.

In [7]:
base_url = "https://digital.epexio.com/uosViewer/repositories/records/oai"
records = harvest_oai_records(base_url, max_pages=2)

print(f"Retrieved {len(records)} records.")


Retrieved 400 records.
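
Each item in `records` is an `xml.etree.ElementTree.Element`, so individual fields can be read with namespace-aware lookups. The sketch below pulls the OAI identifier and any Dublin Core titles out of a single record. The helper name `summarise_record` and the sample record are made up for illustration, and which Dublin Core elements a real record contains depends on the repository.

```python
import xml.etree.ElementTree as ET

# Namespace prefixes for the OAI-PMH envelope and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def summarise_record(record):
    """Return the identifier and titles of one <record> element (hypothetical helper)."""
    identifier = record.findtext("oai:header/oai:identifier", default="", namespaces=NS)
    titles = [el.text for el in record.findall(".//dc:title", NS) if el.text]
    return {"identifier": identifier, "titles": titles}


# Hand-written sample record, not real output from the endpoint.
sample_record = ET.fromstring("""
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:1</identifier>
    <datestamp>2024-01-01</datestamp>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Example title</dc:title>
    </oai_dc:dc>
  </metadata>
</record>
""")

summary = summarise_record(sample_record)
print(summary)
```

With the harvested data, `[summarise_record(r) for r in records[:5]]` would give a quick preview of the first five records.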
