# Poking the DataCite API with Python

This notebook can be used for minting and deleting DataCite DOIs with the DataCite Member API. To execute a code cell, select it and click the play (black triangle) button in the toolbar. You can also use `Ctrl + Enter`.

Depending on the platform you're running this notebook on, code cells may take some time to execute, especially those that send requests to the DataCite API. Be patient; during bulk operations you might like to get up and do some stretches or get a coffee.

## Initial setup

Create a copy of `datacite-api-config.example.json` named `datacite-api-config.json` and modify it to contain your DataCite Member API username and password. For example:
```json
{
    "username" : "myusername",
    "password" : "mypassword"
}
```

By default this notebook runs on the test API. Change the line `use_test_api = True` to `use_test_api = False` to use the live API.

Then execute the following code cell to perform initial setup. You will need to execute this code cell once at the beginning of every session.

In [None]:
import csv
import datetime
import glob
import json
import os
import requests
import urllib.parse

if not os.path.isfile('doilog.csv'):
    # newline='' stops the csv module from writing blank rows on some platforms.
    with open('doilog.csv', 'w', newline='') as csvfile:
        logwriter = csv.writer(csvfile)
        logwriter.writerow([ 'timestamp', 'username', 'filename', 'doi', 'action' ])

with open('datacite-api-config.json') as f:
    data = json.load(f)
    username = data["username"]
    password = data["password"]

use_test_api = True

if use_test_api:
    api_endpoint = 'https://api.test.datacite.org/dois'
else:
    api_endpoint = 'https://api.datacite.org/dois'
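
If the setup cell fails, the cause is usually a missing or incomplete config file. As a rough sketch of the same setup logic with that check made explicit, the two helpers below load the credentials and pick the endpoint. The function names `load_credentials` and `select_endpoint` are hypothetical and are not part of this notebook; the endpoints and the config keys are the ones used in the setup cell.

```python
import json

# Endpoints as used in the setup cell.
TEST_ENDPOINT = 'https://api.test.datacite.org/dois'
LIVE_ENDPOINT = 'https://api.datacite.org/dois'


def load_credentials(config_path):
    """Return (username, password) from a JSON config file, or raise ValueError."""
    with open(config_path) as f:
        config = json.load(f)
    # Treat absent and empty values the same way: both are unusable.
    missing = [key for key in ('username', 'password') if not config.get(key)]
    if missing:
        raise ValueError('Missing in ' + config_path + ': ' + ', '.join(missing))
    return config['username'], config['password']


def select_endpoint(use_test_api=True):
    """Return the test endpoint unless the live API is explicitly requested."""
    return TEST_ENDPOINT if use_test_api else LIVE_ENDPOINT
```

Calling `load_credentials('datacite-api-config.json')` would then give the same `username` and `password` values as the setup cell, and `select_endpoint(use_test_api)` the same `api_endpoint`.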

# Bulk

## Mint bulk Draft DOIs

The following code cell will go through all files in the `bulk-mint` directory and attempt to mint a DOI for each one. The DataCite API will reject anything that is not a valid JSON file containing DataCite metadata.

Put one or more DataCite metadata JSON files in the `bulk-mint` directory, and then execute the following code cell.

In [None]:
path = 'bulk-mint'
url = api_endpoint
headers = {
    'Content-Type': 'application/vnd.api+json',
}
print('Bulk minting DOIs for files in ' + path)
dois = []

with os.scandir(path) as it:
    for entry in it:
        if not entry.name.startswith('.') and entry.is_file():
            print('Attempting to mint DOI for ' + path + '/' + entry.name)
            # Read the metadata in a with-block so the file handle is always closed.
            with open(path + '/' + entry.name, 'rb') as metadata_file:
                data = metadata_file.read()
            response = requests.post(url, auth=(username, password), data=data, headers=headers)
            # Check the numeric status code rather than relying on a 'Status' response header.
            if response.status_code == 201:
                doi = response.json()['data']['id']
                timestamp = datetime.datetime.now().replace(microsecond=0).astimezone().isoformat()
                print(doi + ' minted')
                dois.append(doi)
                with open('doilog.csv', 'a', newline='') as csvfile:
                    logwriter = csv.writer(csvfile)
                    logwriter.writerow([ timestamp, username, path + '/' + entry.name, doi , 'created' ])
            else:
                print('DOI not minted')
                print(response.headers)
print('Done')
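
Because the API rejects anything that is not valid JSON, it can save a round trip to check the files locally before minting. The following is a minimal sketch of such a pre-flight check. The helper name `find_invalid_json` is hypothetical, and it only checks that each file parses as JSON; it does not check that the content is valid DataCite metadata.

```python
import json
import os


def find_invalid_json(directory):
    """Return sorted names of non-hidden files in directory that do not parse as JSON."""
    invalid = []
    with os.scandir(directory) as it:
        for entry in it:
            # Same filter as the minting cell: skip hidden files and subdirectories.
            if entry.name.startswith('.') or not entry.is_file():
                continue
            try:
                with open(entry.path, encoding='utf-8') as f:
                    json.load(f)
            except (json.JSONDecodeError, UnicodeDecodeError):
                invalid.append(entry.name)
    return sorted(invalid)
```

For example, `find_invalid_json('bulk-mint')` returns an empty list when every file in the directory parses cleanly.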

## Delete bulk Draft DOIs

The following code cell will attempt to delete all of the DOIs in the `dois` list, which is filled automatically by the bulk minting process.

If you want to specify your own list of DOIs to delete, put them in the list in the second cell, e.g. `dois = ['10.80335/sr34-9h64', '10.80335/5375-5t54', '10.80335/2r04-6k46', '10.80335/drgg-dp97', '10.80335/7yky-cd07']`. Then execute the second code cell before executing the first code cell.

In [None]:
print('Bulk deleting DOIs in list')
for doi in dois:
    url = api_endpoint + '/' + urllib.parse.quote_plus(doi)
    print('Attempting to delete ' + doi)
    # DELETE sends no request body, so no Content-Type header is needed.
    response = requests.delete(url, auth=(username, password))
    if response.status_code == 204:
        timestamp = datetime.datetime.now().replace(microsecond=0).astimezone().isoformat()
        # timestamp = response.headers['Date']
        print(doi + ' deleted')
        with open('doilog.csv', 'a', newline='') as csvfile:
            logwriter = csv.writer(csvfile)
            logwriter.writerow([ timestamp, username, '', doi , 'deleted' ])
    else:
        print(doi + ' not deleted')
        print(response.headers)
print('Done')

In [None]:
dois = ['10.25813/XXAA1DXP7Z', '10.25813/XXAA1NGBPY', '10.25813/XXAA1PDTJM', '10.25813/XXAA55TEJV', '10.25813/XXAA7H5EMR', '10.25813/XXAA999888', '10.25813/XXAA999889', '10.25813/XXAA999999', '10.25813/XXAA9AH7X4', '10.25813/XXAAAYYUJD', '10.25813/XXAAC8I3CU', '10.25813/XXAACAXKKN', '10.25813/XXAACGSIA0', '10.25813/XXAAFCNGO6', '10.25813/XXAAFUNWPC', '10.25813/XXAAGPVEKV', '10.25813/XXAAHMY4H7', '10.25813/XXAAJ1LHLA', '10.25813/XXAAJ7HM6G', '10.25813/XXAAJIWVHW', '10.25813/XXAALLMCAB', '10.25813/XXAALNYMIO', '10.25813/XXAAMQXQWO', '10.25813/XXAAN3CSAU', '10.25813/XXAANFC4UR', '10.25813/XXAANLNAUC', '10.25813/XXAANV39JY', '10.25813/XXAAOA1PVQ', '10.25813/XXAAP5893X', '10.25813/XXAAPKPOEI', '10.25813/XXAAPQ6PDC', '10.25813/XXAATVMR4Z', '10.25813/XXAAU6PX8G', '10.25813/XXAAU8BIAC', '10.25813/XXAAV7XLNX', '10.25813/XXAAVDUFBT', '10.25813/XXAAVNRADO', '10.25813/XXAAVYDJWF', '10.25813/XXAAXIW4A3', '10.25813/XXAAXJ3JR9', '10.25813/XXAAXMHJB2', '10.25813/XXAAZQSM8Z']
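
The delete cell builds each request URL by percent-encoding the DOI with `urllib.parse.quote_plus`, so that the `/` between prefix and suffix is not read as a path separator. Here is a small sketch of that step on its own. The helper name `doi_url` is hypothetical, and the endpoint is the test endpoint from the setup cell.

```python
import urllib.parse

api_endpoint = 'https://api.test.datacite.org/dois'  # test endpoint from the setup cell


def doi_url(doi):
    """Build the URL for a single DOI, percent-encoding the '/' in the DOI."""
    return api_endpoint + '/' + urllib.parse.quote_plus(doi)


print(doi_url('10.80335/sr34-9h64'))
# https://api.test.datacite.org/dois/10.80335%2Fsr34-9h64
```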

## Register, publish, or hide bulk Draft DOIs

The following code cell will attempt to register, publish, or hide all of the DOIs in the `dois` list, which is filled automatically by the bulk minting process.

Once in Registered or Findable state, a DOI can't be set back to Draft state. This also means that once in Registered or Findable state, a DOI *cannot be deleted*. This is serious, mum.

The only option for removing a Findable DOI from the public record is to hide it.

In [None]:
action = 'register' # Set this to register, publish, or hide

payload = json.dumps({'data': {'attributes': {'event': action}}})
headers = {
    'Content-Type': 'application/vnd.api+json',
}
print('Bulk ' + action + ' DOIs in list')
for doi in dois:
    url = api_endpoint + '/' + urllib.parse.quote_plus(doi)
    print('Attempting to ' + action + ' ' + doi)
    response = requests.put(url, auth=(username, password), data=payload, headers=headers)
    # Check the numeric status code rather than relying on a 'Status' response header.
    if response.status_code == 200:
        timestamp = datetime.datetime.now().replace(microsecond=0).astimezone().isoformat()
        # timestamp = response.headers['Date']
        print(doi + ' ' + action + ' successful')
        with open('doilog.csv', 'a', newline='') as csvfile:
            logwriter = csv.writer(csvfile)
            logwriter.writerow([ timestamp, username, '', doi , action ])
    else:
        print(doi + ' ' + action + ' unsuccessful')
        print(response.headers)
print('Done')
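
Since registering or publishing cannot be undone, it may be worth catching a mistyped `action` before any request is sent. The sketch below builds the same request body as the cell above but refuses anything other than the three events named in this section. The names `VALID_EVENTS` and `event_payload` are hypothetical.

```python
import json

# The three events this section describes.
VALID_EVENTS = ('register', 'publish', 'hide')


def event_payload(action):
    """Return the JSON request body for a state-change event, or raise ValueError."""
    if action not in VALID_EVENTS:
        raise ValueError('action must be one of ' + ', '.join(VALID_EVENTS))
    return json.dumps({'data': {'attributes': {'event': action}}})


print(event_payload('register'))
# {"data": {"attributes": {"event": "register"}}}
```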

## Bulk download DOI metadata

The following code cell will attempt to download the metadata for the DOIs specified in the array and save the metadata to JSON files in the `downloaded` directory. Subdirectories for each prefix will be created, and files will be named with the suffix. For example `downloaded/10.80335/1337.json`.

One use for this downloaded metadata is for a baseline to use when updating DOIs with new metadata. If you are going to use the JSON for this purpose, it is recommended that you delete metadata fields from the JSON if you are *not* going to update them. The next section contains a code block that will delete metadata fields that you should avoid changing.

In [None]:
dois = ['10.80335/xxaa1dxp7z']

path = 'downloaded'
if not os.path.exists(path):
    os.mkdir(path)

url = api_endpoint
headers = {
    'Content-Type': 'application/vnd.api+json',
}
print('Bulk downloading metadata for DOIs in array')
for doi in dois:
    url = api_endpoint + '/' + doi
    print('Getting metadata for ' + doi)
    response = requests.get(url, auth=(username, password), headers=headers)
    # Skip DOIs that could not be fetched so an error body is not saved as metadata.
    if response.status_code != 200:
        print('Could not get metadata for ' + doi)
        print(response.headers)
        continue
    data = response.json()
    prefix = doi.split('/')[0]
    suffix = doi.split('/')[1]
    print('Writing to ' + path + '/'+ prefix + '/' + suffix + '.json')
    if not os.path.exists(path + '/'+ prefix):
        os.mkdir(path + '/'+ prefix)
    with open(path + '/'+ prefix + '/' + suffix + '.json', 'w+') as jsonfile:
        json.dump(data, jsonfile, indent=3)
print('Done')
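
The file naming rule described above (one subdirectory per prefix, one file per suffix) can be sketched as a small function. The name `metadata_path` is hypothetical. It assumes the suffix contains no further `/` characters; a suffix that did would end up as nested directories.

```python
import os


def metadata_path(doi, root='downloaded'):
    """Return the file path for a DOI's metadata: root/prefix/suffix.json."""
    # Split only on the first '/', which separates prefix from suffix.
    prefix, suffix = doi.split('/', 1)
    return os.path.join(root, prefix, suffix + '.json')


print(metadata_path('10.80335/1337'))
```

On Linux or macOS this prints `downloaded/10.80335/1337.json`; on Windows the separators are backslashes.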

## Sanitise downloaded JSON

The following code will go through all the downloaded JSON files and delete the metadata fields you should avoid trying to update.

You can add or remove items between the points specified.

In [None]:
path = 'downloaded'

print('Sanitising downloaded JSON')

for filename in glob.iglob(path + '/**', recursive=True):
    if os.path.isfile(filename) and filename.endswith('.json'): # filter dirs
        print('Processing '+ filename)
        with open(filename) as file:
            data = json.load(file)

# Add or remove items after this point. Use a # character before a line to stop that line of code from running.

        data['data'].pop('relationships', None)
        data['data']['attributes'].pop('xml', None)
        data['data']['attributes'].pop('contentUrl', None)
        data['data']['attributes'].pop('metadataVersion', None)
        data['data']['attributes'].pop('schemaVersion', None)
        data['data']['attributes'].pop('source', None)
        data['data']['attributes'].pop('isActive', None)
        data['data']['attributes'].pop('state', None)
        data['data']['attributes'].pop('reason', None)
        data['data']['attributes'].pop('landingPage', None)
        data['data']['attributes'].pop('viewCount', None)
        data['data']['attributes'].pop('viewsOverTime', None)
        data['data']['attributes'].pop('downloadCount', None)
        data['data']['attributes'].pop('downloadsOverTime', None)
        data['data']['attributes'].pop('referenceCount', None)
        data['data']['attributes'].pop('citationCount', None)
        data['data']['attributes'].pop('citationsOverTime', None)
        data['data']['attributes'].pop('partCount', None)
        data['data']['attributes'].pop('partOfCount', None)
        data['data']['attributes'].pop('versionCount', None)
        data['data']['attributes'].pop('versionOfCount', None)
        data['data']['attributes'].pop('created', None)
        data['data']['attributes'].pop('registered', None)
        data['data']['attributes'].pop('published', None)
        data['data']['attributes'].pop('updated', None)
        
# Add or remove items before this point
        
        with open(filename, 'w') as file:
            json.dump(data, file, indent=3)
print('Done')
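
If you would rather keep the field names in one list than comment individual lines in and out, the same clean-up can be written as a loop. This is only a sketch: `sanitise` and `READ_ONLY_ATTRIBUTES` are hypothetical names, and the list here is shortened for illustration, so extend it with the other fields from the cell above as needed.

```python
import copy

# A shortened selection of the attributes removed by the cell above.
READ_ONLY_ATTRIBUTES = ['xml', 'state', 'created', 'registered', 'published', 'updated']


def sanitise(record, attributes=READ_ONLY_ATTRIBUTES):
    """Return a copy of a downloaded record without relationships and the listed attributes."""
    cleaned = copy.deepcopy(record)  # leave the caller's dictionary untouched
    cleaned['data'].pop('relationships', None)
    for key in attributes:
        cleaned['data']['attributes'].pop(key, None)
    return cleaned
```

Working on a copy means the original record is still available for comparison after sanitising.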

## Bulk update DOIs

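The following code cell will attempt to update the metadata of every DOI that has a metadata file in the `bulk-mint` directory. Each file must contain its DOI in the `data.id` field, because that is where the cell reads the DOI from.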

In [None]:
path = 'bulk-mint'
action = 'update'
url = api_endpoint
headers = {
    'Content-Type': 'application/vnd.api+json',
}
print('Bulk updating metadata for files in ' + path)
dois = []
with os.scandir(path) as it:
    for entry in it:
        if not entry.name.startswith('.') and entry.is_file():
            print('Attempting to upload metadata for ' + entry.name)
            # Load the metadata in a with-block so the file handle is always closed.
            with open(path + '/' + entry.name) as metadata_file:
                metadata = json.load(metadata_file)
            doi = metadata['data']['id']
            print('DOI on record is ' + doi)
            url = api_endpoint + '/' + doi
            response = requests.put(url, auth=(username, password), data=json.dumps(metadata), headers=headers)
            # Check the numeric status code rather than relying on a 'Status' response header.
            if response.status_code == 200:
                timestamp = datetime.datetime.now().replace(microsecond=0).astimezone().isoformat()
                # timestamp = response.headers['Date']
                print(doi + ' ' + action + ' successful')
                with open('doilog.csv', 'a', newline='') as csvfile:
                    logwriter = csv.writer(csvfile)
                    logwriter.writerow([ timestamp, username, '', doi , action ])
            else:
                print(doi + ' ' + action + ' unsuccessful')
                print(response.headers)
print('Done')
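
Every successful operation in this notebook is appended to `doilog.csv`, so the log can be used to rebuild a list of DOIs in a later session. The sketch below returns the DOIs whose most recent logged action is anything other than `deleted`. The function name `dois_still_minted` is hypothetical; the column names are the ones written by the setup cell.

```python
import csv


def dois_still_minted(log_path='doilog.csv'):
    """Return DOIs whose most recent logged action is not 'deleted', in first-seen order."""
    last_action = {}
    with open(log_path, newline='') as csvfile:
        # Rows are in chronological order, so later rows overwrite earlier ones.
        for row in csv.DictReader(csvfile):
            last_action[row['doi']] = row['action']
    return [doi for doi, action in last_action.items() if action != 'deleted']
```

For example, `dois = dois_still_minted()` would give a starting list for the register, publish, or hide cell. It only reflects what this notebook logged, not the current state held by DataCite.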

## Troubleshooting

If you are trying to work out why your DOI was not minted or deleted, execute the following code cell.

In [None]:
print(response.headers)
print('\n')
print(response.text)
json.loads(response.text)
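
The raw response can be long, so it may help to pull out just the error messages. The sketch below assumes that a failed request returns a JSON body with a top-level `errors` list whose entries carry a `title`, as is common for JSON:API style responses. That shape is an assumption here, so check it against what the troubleshooting cell prints. The helper name `error_titles` is hypothetical.

```python
import json


def error_titles(body_text):
    """Return the 'title' of each entry in a top-level 'errors' list, or [] if there is none."""
    try:
        body = json.loads(body_text)
    except ValueError:
        # Empty or non-JSON body, e.g. after a successful delete.
        return []
    errors = body.get('errors') if isinstance(body, dict) else None
    if not isinstance(errors, list):
        return []
    return [e.get('title', '') for e in errors if isinstance(e, dict)]
```

After a failed request, `error_titles(response.text)` gives a short list of messages to read before digging into the full output.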