# Collect BibTeX for Conference Proceedings

This script uses the [CrossRef REST API](https://github.com/CrossRef/rest-api-doc) to retrieve DOIs for a batch of conference articles. For each DOI, it calls the API again to retrieve the individual item record, transformed into BibTeX.

In [97]:
import requests
import urllib.request
import ijson

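First, download records for 2000 conference proceedings articles from the CrossRef API and retrieve the DOI for each entry. Fetch them 200 at a time using the `rows` and `offset` parameters. Note that paging with `offset` returns records in the API's default order rather than as a random draw.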

In [102]:
headers = {"User-Agent": "Virginia Tech DLRL (waingram@vt.edu)"}

rows = 200
offset = 0
dois = []
while rows + offset <= 2000:
    url = f'https://api.crossref.org/works?filter=type:proceedings-article&rows={rows}&offset={offset}'
    # Fetch each page once, sending the contact details in the User-Agent header.
    req = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(req) as f:
        # Stream-parse the JSON response and pull the DOI out of each item.
        dois += [item["DOI"] for item in ijson.items(f, 'message.items.item')]
    offset += rows
print(len(dois))

2000
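
The paging arithmetic in the cell above can be factored into small helpers that are easy to check without touching the network. This is a minimal sketch, not part of the original notebook: `page_offsets`, `works_url`, and `sample_url` are hypothetical names. `sample_url` assumes the API's `sample` parameter, which the CrossRef documentation describes as returning a random selection of up to 100 records per request; it is shown only as one possible way to get a random draw.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.crossref.org/works"  # endpoint used in the notebook


def page_offsets(total, rows):
    """Return the offset of each page needed to cover `total` records."""
    return range(0, total, rows)


def works_url(rows, offset, record_type="proceedings-article"):
    """Build the query URL for one page of results (hypothetical helper)."""
    params = {"filter": f"type:{record_type}", "rows": rows, "offset": offset}
    # Keep ':' unescaped so the filter reads the same as in the notebook.
    return f"{BASE_URL}?{urlencode(params, safe=':')}"


def sample_url(n, record_type="proceedings-article"):
    """Build a URL asking for `n` random records (assumes the `sample` parameter)."""
    params = {"filter": f"type:{record_type}", "sample": n}
    return f"{BASE_URL}?{urlencode(params, safe=':')}"


if __name__ == "__main__":
    # Print the ten page URLs the notebook loop would request.
    for offset in page_offsets(2000, 200):
        print(works_url(200, offset))
```

With `total=2000` and `rows=200`, `page_offsets` produces the same ten offsets (0 through 1800) that the `while` loop steps through.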


For each DOI, call the CrossRef API again to get the BibTeX, and save the results to a file.

In [104]:
file_name = 'data/crossref_random_proceedings.bibtex'
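with open(file_name, 'w', encoding='utf-8') as fd:
    for doi in dois:
        # Request the record for this DOI, transformed to BibTeX.
        url = f'https://api.crossref.org/works/{doi}/transform/application/x-bibtex'
        r = requests.get(url, headers=headers)
        # Only write the body when the request succeeded; failed DOIs are skipped.
        if r.ok:
            fd.write(r.content.decode('utf-8') + '\n')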
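
Some DOIs contain characters such as `#`, `<`, or `?` that change the meaning of a URL when interpolated unescaped, and requests that fail are silently skipped by the loop above. The sketch below, under those assumptions, shows one way to percent-encode a DOI before building the transform URL and to count how many entries actually reached the file. `bibtex_url` and `count_entries` are hypothetical helpers, not part of the original notebook.

```python
from urllib.parse import quote

BASE_URL = "https://api.crossref.org/works"  # endpoint used in the notebook


def bibtex_url(doi):
    """Build the BibTeX transform URL for a DOI (hypothetical helper).

    The '/' separating prefix and suffix is kept as-is; other reserved
    characters are percent-encoded.
    """
    return f"{BASE_URL}/{quote(doi, safe='/')}/transform/application/x-bibtex"


def count_entries(bibtex_text):
    """Count BibTeX entries by counting lines that begin with '@'.

    Leading whitespace is ignored. This is a rough check, not a parser.
    """
    return sum(
        1 for line in bibtex_text.splitlines() if line.lstrip().startswith("@")
    )


if __name__ == "__main__":
    print(bibtex_url("10.1000/a#b"))
    sample = "@inproceedings{a,\n title={X}\n}\n @inproceedings{b,\n title={Y}\n}\n"
    print(count_entries(sample))
```

Comparing `count_entries` on the saved file's contents against `len(dois)` gives a quick indication of how many requests were skipped.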