# Import data from zbMATH
Data in [zbMATH Open](https://www.zbmath.org/) can be accessed through the [zbMATH Open OAI-PMH](https://oai.zbmath.org/) service, which implements the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) [Schubotz and Teschke, 2021]. The service is open, subject to certain [terms and conditions](https://oai.zbmath.org/static/terms-and-conditions.html).

**Contrary to the documentation, the API always returns XML, not JSON.**

## Get a list of publication IDs
The request below uses the `ListIdentifiers` verb with a `from` date to select records. The response body comes back as XML text.

In [1]:
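import os
import requests

API_URL = 'https://oai.zbmath.org/v1/'  # the API endpoint
REQUEST_URL = "{}?{}".format(API_URL, 'verb=ListIdentifiers&from=2022-01-01T00%3A00%3A00Z&metadataPrefix=oai_zb_preview')

# Pass headers by keyword: the second positional argument of requests.get is `params`
headers = {'accept': 'text/xml'}  # reported to have no effect: the response is XML either way

response = requests.get(REQUEST_URL, headers=headers)
response.raise_for_status()  # stop early on an HTTP error

os.makedirs('data', exist_ok=True)  # make sure the output directory exists
with open('data/response.xml', 'w', encoding='utf-8') as f:
    f.write(response.text)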
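
The query string in the request above is written out by hand, with the colons in the timestamp already percent-encoded. As a minimal sketch, assuming only the standard library, the same URL can be built from a dictionary so that the encoding is done automatically. The helper name `build_request_url` is hypothetical and not part of any zbMATH client.

```python
from urllib.parse import urlencode

API_URL = 'https://oai.zbmath.org/v1/'  # the API endpoint used above

def build_request_url(params):
    """Return API_URL with `params` appended as a percent-encoded query string.

    Hypothetical helper for illustration only.
    """
    return '{}?{}'.format(API_URL, urlencode(params))

# 'from' is a Python keyword, so the arguments are passed as a dictionary
url = build_request_url({
    'verb': 'ListIdentifiers',
    'from': '2022-01-01T00:00:00Z',
    'metadataPrefix': 'oai_zb_preview',
})
print(url)
```

Keeping the arguments in a dictionary makes it easier to change the start date or the metadata prefix without touching the encoding.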


Parse the response into an XML tree and collect the identifiers in a pandas DataFrame.

In [None]:
import xml.etree.ElementTree as ET
import pandas as pd

def ns(tag_name):
    """Returns a fully qualified tag name"""
    oai_ns = 'http://www.openarchives.org/OAI/2.0/' # the OAI namespace
    return '{{{}}}{}'.format(oai_ns, tag_name)

# parse the tree, get a list of identifiers
tree = ET.parse('data/response.xml')
list_ids = tree.getroot().find(ns('ListIdentifiers'))
if list_ids is None:
    # OAI-PMH reports problems in an <error> element instead of <ListIdentifiers>
    raise ValueError('response contains no ListIdentifiers element')
entries = list_ids.findall(ns('header'))

# put identifiers in a pandas dataframe
# (DataFrame.append was removed in pandas 2.0, so collect the values first)
entries_df = pd.DataFrame({'id': [entry.find(ns('identifier')).text for entry in entries]})
entries_df.head()
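
In OAI-PMH, a `ListIdentifiers` response may be split into several pages. An incomplete page carries a `resumptionToken` element, and the next page is requested with `verb=ListIdentifiers&resumptionToken=...` as the only arguments. Whether and how the zbMATH service paginates a given query is an assumption here. The sketch below only shows how identifiers and the token could be read from one page. The XML in `SAMPLE` is hand-written for illustration and is not real zbMATH output, and `parse_page` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

OAI_NS = {'oai': 'http://www.openarchives.org/OAI/2.0/'}  # the OAI namespace

# Hand-written sample page for illustration, NOT real zbMATH output
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:example.org:1</identifier></header>
    <header><identifier>oai:example.org:2</identifier></header>
    <resumptionToken>token-123</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>"""

def parse_page(xml_text):
    """Return (identifiers, resumption_token) for one ListIdentifiers page.

    The token is None when the element is missing or empty,
    which marks the last page of a list.
    """
    root = ET.fromstring(xml_text)
    ids = [
        header.findtext('oai:identifier', namespaces=OAI_NS)
        for header in root.iterfind('oai:ListIdentifiers/oai:header', OAI_NS)
    ]
    token = root.findtext('oai:ListIdentifiers/oai:resumptionToken',
                          namespaces=OAI_NS)
    return ids, (token or None)

ids, token = parse_page(SAMPLE)
print(ids, token)
```

A harvesting loop would call such a function repeatedly, requesting the next page with the returned token until it is `None`.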

## References
M. Schubotz and O. Teschke, zbMATH Open: Towards standardized machine interfaces to expose bibliographic metadata. EMS Magazine 119, 50–53 (2021). https://euromathsoc.org/magazine/articles/mag-12

