# This notebook has definitions and functions used in common by the other notebooks

Data in [zbMath Open](https://www.zbmath.org/) can be accessed through the [zbMath Open OAI-PMH](https://oai.zbmath.org/) service, which implements the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [Schubotz and Teschke, 2021]. The service is open, subject to certain [terms and conditions](https://oai.zbmath.org/static/terms-and-conditions.html).

To load these definitions and functions into another notebook, run `%run definitions_and_functions.ipynb` in a code cell, e.g. once at the top of the notebook.

## Common URLs, endpoints and namespaces

In [2]:
# API URLs
API_URL='https://oai.zbmath.org/v1/' # base URL of the API
LIST_IDENTIFIERS="{}?verb=ListIdentifiers&metadataPrefix=oai_zb_preview".format(API_URL) # ListIdentifiers endpoint
LIST_RECORDS="{}?verb=ListRecords&metadataPrefix=oai_zb_preview".format(API_URL) # ListRecords endpoint
GET_RECORD="{}?verb=GetRecord&metadataPrefix=oai_zb_preview".format(API_URL) # GetRecord endpoint
FILTER = '{}helper/filter?metadataPrefix=oai_zb_preview'.format(API_URL) # helper/filter endpoint

# API namespaces
OAI_NS = 'http://www.openarchives.org/OAI/2.0/' # the OAI namespace
OAI_ZB_PREVIEW_NS = 'https://zbmath.org/OAI/2.0/oai_zb_preview/'
ZBMATH_NS = 'https://zbmath.org/zbmath/elements/1.0/'

# text shown in zbMath Open when there's a license conflict
CONFLICT_TXT = 'zbMATH Open Web Interface contents unavailable due to conflicting licenses.'
# which tags to keep
TAGS = ['author', 'author_ids', 'document_title', 'source', 'classifications', 'keywords', 'doi', 'publication_year']
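
As an illustration of how these constants are meant to be combined, the sketch below builds a `GetRecord` request URL by appending the `identifier` argument that OAI-PMH requires for that verb. The helper name `get_record_url` and the identifier value are hypothetical placeholders, not taken from the service, and no network request is made.

```python
from urllib.parse import urlencode

# Repeated from the cell above so the snippet runs on its own
API_URL = 'https://oai.zbmath.org/v1/'
GET_RECORD = "{}?verb=GetRecord&metadataPrefix=oai_zb_preview".format(API_URL)

def get_record_url(identifier):
    """Hypothetical helper: append a URL-encoded identifier to GET_RECORD."""
    return '{}&{}'.format(GET_RECORD, urlencode({'identifier': identifier}))

# 'oai:zbmath.org:0000000' is a made-up identifier used only for illustration
url = get_record_url('oai:zbmath.org:0000000')
print(url)
```

Encoding the identifier with `urlencode` keeps the query string valid even when the identifier contains reserved characters such as `:`.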

## Defines a function to handle namespaces and tag names

In [4]:
def ns(tag_name, namespace=OAI_NS):
    """
    Returns a fully qualified tag name in ElementTree's '{namespace}tag' notation.
    @param tag_name local name of the tag
    @param namespace URL of a namespace (OAI_NS is default)
    """
    return '{{{}}}{}'.format(namespace, tag_name)
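
A minimal sketch of why the qualified name matters: ElementTree stores namespaced tags as `{namespace}tag`, so a lookup by the bare tag name finds nothing. The XML fragment and its identifier are hand-written for illustration only.

```python
import xml.etree.ElementTree as ET

# Repeated from the cells above so the snippet runs on its own
OAI_NS = 'http://www.openarchives.org/OAI/2.0/'

def ns(tag_name, namespace=OAI_NS):
    return '{{{}}}{}'.format(namespace, tag_name)

qualified = ns('identifier')

# Hand-written fragment with a made-up identifier
header = ET.fromstring(
    '<header xmlns="{}"><identifier>oai:zbmath.org:0000000</identifier></header>'.format(OAI_NS)
)

found = header.find(ns('identifier')).text  # lookup by the qualified name succeeds
missing = header.find('identifier')         # the bare name does not match: None
print(qualified, found, missing)
```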

## Defines a function to parse a record's XML into a Python dict

In [5]:
import xml.etree.ElementTree as ET

def parse_record(xml_record, verbose=False):
    """
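    Parse bibliographic record details from an XML Element.
    @param xml_record an OAI-PMH <record> Element
    @param verbose print a message when a licensing conflict is found
    @returns dict with 'id' plus the tags listed in TAGS, or None if any tag has a licensing conflict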
    """
    new_entry = {}
    # zbMath identifier
    zb_id = xml_record.find(ns('header')).find(ns('identifier')).text 
    new_entry['id'] = zb_id
    # read tags
    zb_preview = xml_record.find(ns('metadata')).find(ns('zbmath', OAI_ZB_PREVIEW_NS))
    for tag in TAGS:
        value = zb_preview.find(ns(tag, ZBMATH_NS))
        if value is not None:
            if len(value):
                # element has children: collect their texts
                # (a child without text contributes an empty string instead of None)
                texts = [child.text or '' for child in value]
                text = ';'.join(texts) # multiple values are rendered as a semicolon-separated string
            else:
                # element content is simple text
                text = value.text
                
            if text == CONFLICT_TXT:
                # License conflict
                if verbose:
                    print('Licensing conflict for id "{}" tag "{}"'.format(zb_id, tag))
                return None
            
            new_entry[tag] = text
    return new_entry
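
The sketch below shows, on a hand-written record, the element layout that `parse_record` walks and the flat dict it is expected to produce. The sample XML, its identifier, and the child element name `keyword` are assumptions made for illustration; only the nesting (`header/identifier`, `metadata/zbmath`, then one element per tag) is taken from the function above. The lookups are repeated by hand so the snippet runs on its own.

```python
import xml.etree.ElementTree as ET

# Repeated from the cells above so the snippet runs on its own
OAI_NS = 'http://www.openarchives.org/OAI/2.0/'
OAI_ZB_PREVIEW_NS = 'https://zbmath.org/OAI/2.0/oai_zb_preview/'
ZBMATH_NS = 'https://zbmath.org/zbmath/elements/1.0/'

def ns(tag_name, namespace=OAI_NS):
    return '{{{}}}{}'.format(namespace, tag_name)

# Hand-written sample record (made-up identifier and values)
SAMPLE = """
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header><identifier>oai:zbmath.org:0000000</identifier></header>
  <metadata>
    <p:zbmath xmlns:p="https://zbmath.org/OAI/2.0/oai_zb_preview/"
              xmlns:z="https://zbmath.org/zbmath/elements/1.0/">
      <z:document_title>An example title</z:document_title>
      <z:keywords>
        <z:keyword>graphs</z:keyword>
        <z:keyword>trees</z:keyword>
      </z:keywords>
    </p:zbmath>
  </metadata>
</record>
"""

record = ET.fromstring(SAMPLE)

# Same traversal as parse_record: identifier from the header, tags from the metadata
zb_id = record.find(ns('header')).find(ns('identifier')).text
preview = record.find(ns('metadata')).find(ns('zbmath', OAI_ZB_PREVIEW_NS))
title = preview.find(ns('document_title', ZBMATH_NS)).text
keywords = preview.find(ns('keywords', ZBMATH_NS))

entry = {
    'id': zb_id,
    'document_title': title,
    # an element with children is flattened to a semicolon-separated string
    'keywords': ';'.join(child.text for child in keywords),
}
print(entry)
```

With the cells above loaded, `parse_record(ET.fromstring(SAMPLE))` should yield the same dict, since tags from `TAGS` that are absent from the record are simply skipped.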