# DOI Negotiation

<div class="alert alert-warning" role="alert" style="margin: 10px">
Libraries we will use:<br>
requests<br>
json
</div>

In [1]:
import requests
import json

## 1 Introduction

DOIs provide a persistent link to content. They identify many types of work, from journal articles to research data sets. Typically, someone interacting with DOIs will be a researcher, who will resolve DOIs found in scholarly references to content using a DOI resolver. Such researchers may not even realise they are using DOIs and a DOI resolver since they may follow links with embedded DOIs.

Yet DOIs can provide more than a permanent, indirect link to content. DOI registration agencies such as CrossRef, DataCite and mEDRA collect bibliographic metadata about the works they link to. This metadata can be retrieved from a DOI resolver too, using content negotiation to request a particular representation of the metadata.

For some DOIs content negotiation can be used to retrieve different representations of a work. For example, some DataCite DOIs identify data sets that may be available in a number of data formats and container formats.
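
A DOI appears in several spellings in this notebook's outputs (`doi:`, `info:doi/`, `http://dx.doi.org/`). The following is a minimal sketch, not an official normalisation routine, of how such spellings could be turned into a resolver link. The function name `doi_to_url` is hypothetical, and the sketch assumes the DOI needs no URL escaping.

```python
def doi_to_url(identifier, resolver="https://doi.org/"):
    """Turn a DOI written in one of several common spellings into a resolver URL."""
    doi = identifier.strip()
    # Remove a known prefix if present, so that only the bare DOI remains.
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "info:doi/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Every DOI starts with the directory indicator "10.".
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {identifier!r}")
    return resolver + doi


print(doi_to_url("doi:10.1126/science.169.3946.635"))
```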



## 2 Redirection

The DOI resolver at doi.org will normally redirect a user to the resource location of a DOI. For example, the DOI "10.1126/science.169.3946.635" redirects to a landing page describing the article, "The Structure of Ordinary Water". Content negotiated requests to doi.org that ask for a content type which isn't "text/html" will be redirected to a metadata service hosted by the DOI's registration agency. CrossRef, DataCite and mEDRA support content negotiated DOIs via https://data.crossref.org, https://data.datacite.org and http://data.medra.org respectively.

<div class="alert alert-warning" role="alert" style="margin: 10px">

       GET "Accept: text/html"
https://doi.org/10.1126/science.169.3946.635<br>

                   |<br>
                   |<br>
                   |<br>
                   V<br>
<br>
       Publisher landing page
https://www.sciencemag.org/content/169/3946/635
</div>

Normal browser requests or explicit requests for text/html redirect to the content's landing page.

<div class="alert alert-warning" role="alert" style="margin: 10px">

             GET "Accept: application/rdf+xml"
https://doi.org/10.1126/science.169.3946.635<br>

                   |<br>
                   |<br>
                   |<br>
                   V<br>
<br>
CrossRef metadata service
http://data.crossref.org/10.1126/science.169.3946.635
</div>

Requests for a data type redirect to a registration agency's metadata service.
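
The two redirects sketched above can be observed directly by asking the HTTP client not to follow them. The helper below is a hedged sketch: `redirect_target` is a hypothetical name, and the HTTP function is passed in as a parameter so that nothing touches the network until the helper is called.

```python
def redirect_target(url, accept, http_get):
    """Return (status code, Location header) of the first response only.

    http_get is any callable with the signature of requests.get; it is
    called with allow_redirects=False so the redirect is not followed.
    """
    response = http_get(url, headers={"Accept": accept}, allow_redirects=False)
    return response.status_code, response.headers.get("Location")
```

With `requests` imported as in the first cell, `redirect_target("https://doi.org/10.1126/science.169.3946.635", "application/rdf+xml", requests.get)` would perform the real lookup and return the address the resolver redirects to.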

## 3 What is Content Negotiation?

Content negotiation allows a user to request a particular representation of a web resource. DOI resolvers use content negotiation to provide different representations of metadata associated with DOIs.

A content negotiated request to a DOI resolver is much like a standard HTTP request, except server-driven negotiation will take place based on the list of acceptable content types a client provides.

### 3.1 The Accept Header

Making a content negotiated request requires the HTTP "Accept" header. It lists the content types that are acceptable to the client (those that it knows how to parse), each with an optional "quality" value, "q", indicating its relative suitability. For example, a client that wishes to receive citeproc JSON if it is available, but which can also handle RDF XML if citeproc JSON is unavailable, would make a request with an Accept header listing both "application/citeproc+json" and "application/rdf+xml". The two requests below show the effect of the header: the first sends no Accept header and receives the HTML landing page, while the second asks for "application/rdf+xml" and receives metadata.
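
A header that lists both content types with quality values could be assembled as in this small sketch. The helper name `build_accept_header` and the chosen "q" values are illustrative assumptions.

```python
def build_accept_header(preferences):
    """Join (content_type, q) pairs into a single Accept header value."""
    return ", ".join(f"{content_type};q={q}" for content_type, q in preferences)


headers = {
    "Accept": build_accept_header([
        ("application/citeproc+json", 1.0),  # preferred representation
        ("application/rdf+xml", 0.5),        # acceptable fallback
    ])
}
print(headers["Accept"])
```

The resulting dictionary can be passed as the `headers` argument of `requests.get`.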

In [2]:
import requests

url = "https://doi.org/10.1126/science.169.3946.635"  # DOI resolver URL
r = requests.get(url)  # GET without an Accept header
print(r.status_code)
print(r.text)


200
<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE html>
<html class="pl"
  lang="en"
  dir="ltr" version="HTML+RDFa+MathML 1.1"
  xmlns="http://www.w3.org/1999/xhtml"
  xmlns:content="http://purl.org/rss/1.0/modules/content/"
  xmlns:dc="http://purl.org/dc/terms/"
  xmlns:foaf="http://xmlns.com/foaf/0.1/"
  xmlns:og="http://ogp.me/ns#"
  xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
  xmlns:sioc="http://rdfs.org/sioc/ns#"
  xmlns:sioct="http://rdfs.org/sioc/types#"
  xmlns:skos="http://www.w3.org/2004/02/skos/core#"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
  xmlns:mml="http://www.w3.org/1998/Math/MathML">
  <head prefix="og: http://ogp.me/ns# article: http://ogp.me/ns/article# book: http://ogp.me/ns/book#">
    <link type="text/css" rel="stylesheet" href="https://science.sciencemag.org/sites/default/files/advagg_css/css__F4-OCWbNhE6xeJ7aSfPhPCmzR85_Y1I0voq8bwkMa20__g7o-5hEqFktDmV4N-91Sqyv0AROzwVdTtOVfHgJrQ3E__LV0Cr1Sz-DtUjju_GHS6PqcGO-LafuNW-zcXCeDAoB0.css" media="all" />

In [3]:
url = "https://doi.org/10.1126/science.169.3946.635"  # DOI resolver URL
headers = {'Accept': 'application/rdf+xml;q=0.5'}  # content type accepted in the response
r = requests.get(url, headers=headers)  # GET with the Accept header
print(r.status_code)
print(r.text)

200
<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://purl.org/dc/terms/"
    xmlns:j.1="http://prismstandard.org/namespaces/basic/2.1/"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:j.2="http://purl.org/ontology/bibo/"
    xmlns:j.3="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://dx.doi.org/10.1126/science.169.3946.635">
    <j.1:startingPage>635</j.1:startingPage>
    <owl:sameAs rdf:resource="doi:10.1126/science.169.3946.635"/>
    <owl:sameAs rdf:resource="info:doi/10.1126/science.169.3946.635"/>
    <j.0:identifier>10.1126/science.169.3946.635</j.0:identifier>
    <j.0:publisher>American Association for the Advancement of Science (AAAS)</j.0:publisher>
    <j.0:creator>
      <j.3:Person rdf:about="http://id.crossref.org/contributor/h-s-frank-3new7r2ulpnaj">
        <j.3:name>H. S. Frank</j.3:name>
        <j.3:familyName>Frank</j.3:familyName>
        <j.3:givenName>H. S.</j.3:givenName>
      </j.3:Person>
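
The RDF XML above (truncated in the output) can be read with the standard library. The sketch below works on a shortened, hand-closed copy of that response, so it is an assumption about the overall shape of the full document. The prefixes `dcterms` and `foaf` are chosen here for readability; only the namespace URIs have to match.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the response above, closed by hand so that it is well-formed.
RDF_SAMPLE = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:j.0="http://purl.org/dc/terms/"
    xmlns:j.3="http://xmlns.com/foaf/0.1/">
  <rdf:Description rdf:about="http://dx.doi.org/10.1126/science.169.3946.635">
    <j.0:identifier>10.1126/science.169.3946.635</j.0:identifier>
    <j.0:publisher>American Association for the Advancement of Science (AAAS)</j.0:publisher>
    <j.0:creator>
      <j.3:Person rdf:about="http://id.crossref.org/contributor/h-s-frank-3new7r2ulpnaj">
        <j.3:name>H. S. Frank</j.3:name>
      </j.3:Person>
    </j.0:creator>
  </rdf:Description>
</rdf:RDF>"""

# Map our own prefixes to the namespace URIs used in the document.
NS = {
    "dcterms": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

root = ET.fromstring(RDF_SAMPLE)
identifier = root.findtext(".//dcterms:identifier", namespaces=NS)
creators = [name.text for name in root.findall(".//foaf:name", NS)]
print(identifier, creators)
```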
  

## JSON - JavaScript Object Notation
When a browser and a server exchange data, the data can only be text.

JSON is text: any JavaScript object can be converted into JSON and sent to the server, and any JSON received from the server can be converted back into a JavaScript object. In this respect it plays a role similar to XML.

This way we can work with the data as objects, with no complicated parsing or translation.
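
The same round trip works in Python with the standard `json` module. The following sketch uses a small hand-written record (an excerpt chosen for illustration) to show that serialising produces text and parsing restores an equal object.

```python
import json

record = {
    "DOI": "10.1126/science.169.3946.635",
    "volume": "169",
    "author": [{"given": "H. S.", "family": "Frank"}],
}

text = json.dumps(record)    # Python object -> JSON text
restored = json.loads(text)  # JSON text -> Python object

print(type(text).__name__, type(restored).__name__)
print(restored == record)
```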

Several Python libraries can read and write JSON, so the information is easy to parse. Repeating the previous request, we can ask for the answer in JSON format and store it in a variable:

In [11]:
url = "https://doi.org/10.1126/science.169.3946.635"  # DOI resolver URL
headers = {'Accept': 'application/json'}  # content type accepted in the response
r = requests.get(url, headers=headers)  # GET with the Accept header
print("Status code: %s" % r.status_code)  # 200 means the request was OK
print("Response: %s" % r.text)

Status code: 200
Response: {"indexed":{"date-parts":[[2019,5,31]],"date-time":"2019-05-31T13:23:10Z","timestamp":1559308990636},"reference-count":0,"publisher":"American Association for the Advancement of Science (AAAS)","issue":"3946","content-domain":{"domain":[],"crossmark-restriction":false},"published-print":{"date-parts":[[1970,8,14]]},"DOI":"10.1126\/science.169.3946.635","type":"article-journal","created":{"date-parts":[[2006,10,5]],"date-time":"2006-10-05T12:56:56Z","timestamp":1160053016000},"page":"635-641","source":"Crossref","is-referenced-by-count":73,"title":"The Structure of Ordinary Water: New data and interpretations are yielding new insights into this fascinating substance","prefix":"10.1126","volume":"169","author":[{"given":"H. S.","family":"Frank","sequence":"first","affiliation":[]}],"member":"221","container-title":"Science","original-title":[],"language":"en","link":[{"URL":"https:\/\/syndication.highwire.org\/content\/doi\/10.1126\/science.169.3946.635","conte

### 3.2 Response Codes

| Code | Meaning |
|------|---------|
| 200 | The request was OK. |
| 204 | The request was OK but there was no metadata available. |
| 404 | The DOI requested doesn't exist. |
| 406 | Can't serve any requested content type. |

Individual metadata services may use additional response codes, but they will always use the codes above in the cases described.
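
A client can turn these codes into readable messages before it tries to parse a response body. This is a minimal sketch; the names `STATUS_MEANINGS`, `describe_status` and `has_metadata` are hypothetical, and any code outside the table is simply reported as service-specific.

```python
STATUS_MEANINGS = {
    200: "The request was OK.",
    204: "The request was OK but there was no metadata available.",
    404: "The DOI requested doesn't exist.",
    406: "Can't serve any requested content type.",
}


def describe_status(status_code):
    """Return the documented meaning, or a generic note for other codes."""
    return STATUS_MEANINGS.get(status_code, "Service-specific response code.")


def has_metadata(status_code):
    """Only a 200 response carries a metadata body worth parsing."""
    return status_code == 200


for code in (200, 204, 404, 406, 500):
    print(code, describe_status(code))
```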

If a DOI supports several of the content types specified by the client, the content type with the highest "q" value (or, if no "q" values are specified, the one that appears first in the "Accept" header) will be returned.
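
The selection rule can be modelled in a few lines. This is a simplified sketch of server-driven negotiation, not the resolver's actual implementation: it ignores wildcards such as `*/*`, and the function names are hypothetical.

```python
def parse_accept(accept_header):
    """Split an Accept header into (content_type, q) pairs; q defaults to 1.0."""
    entries = []
    for part in accept_header.split(","):
        pieces = [piece.strip() for piece in part.split(";")]
        content_type, q = pieces[0], 1.0
        for param in pieces[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        entries.append((content_type, q))
    return entries


def preferred_type(accept_header, supported):
    """Pick the supported type with the highest q; ties go to the earliest entry."""
    candidates = [
        (q, -position, content_type)
        for position, (content_type, q) in enumerate(parse_accept(accept_header))
        if content_type in supported
    ]
    # No overlap between requested and supported types corresponds to a 406.
    return max(candidates)[2] if candidates else None


header = "application/citeproc+json;q=1.0, application/rdf+xml;q=0.5"
print(preferred_type(header, {"application/rdf+xml", "application/citeproc+json"}))
print(preferred_type(header, {"application/rdf+xml"}))
```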



After asking for a JSON response, if we get a 200 we can parse the received text into a Python object:

In [12]:
data = json.loads(r.text)  # data is now a Python dictionary
print(data)

{'indexed': {'date-parts': [[2019, 5, 31]], 'date-time': '2019-05-31T13:23:10Z', 'timestamp': 1559308990636}, 'reference-count': 0, 'publisher': 'American Association for the Advancement of Science (AAAS)', 'issue': '3946', 'content-domain': {'domain': [], 'crossmark-restriction': False}, 'published-print': {'date-parts': [[1970, 8, 14]]}, 'DOI': '10.1126/science.169.3946.635', 'type': 'article-journal', 'created': {'date-parts': [[2006, 10, 5]], 'date-time': '2006-10-05T12:56:56Z', 'timestamp': 1160053016000}, 'page': '635-641', 'source': 'Crossref', 'is-referenced-by-count': 73, 'title': 'The Structure of Ordinary Water: New data and interpretations are yielding new insights into this fascinating substance', 'prefix': '10.1126', 'volume': '169', 'author': [{'given': 'H. S.', 'family': 'Frank', 'sequence': 'first', 'affiliation': []}], 'member': '221', 'container-title': 'Science', 'original-title': [], 'language': 'en', 'link': [{'URL': 'https://syndication.highwire.org/content/doi/1

In order to know the different elements in the JSON, we can run a loop:
<div class="alert alert-warning" role="alert" style="margin: 10px">
Remember "tags" in XML?
</div>

In [13]:
for elem in data:
    print(elem)

indexed
reference-count
publisher
issue
content-domain
published-print
DOI
type
created
page
source
is-referenced-by-count
title
prefix
volume
author
member
container-title
original-title
language
link
deposited
score
subtitle
short-title
issued
references-count
journal-issue
URL
relation
ISSN
container-title-short


Values are retrieved by key, exactly as with a Python dictionary (key-value pairs):

In [15]:
data['URL']

'http://dx.doi.org/10.1126/science.169.3946.635'
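
Nested values are reached by chaining keys and list indexes. The sketch below uses a hand-copied excerpt of the response shown earlier, so the variable `record` is a stand-in for `data`. It also shows `dict.get`, which avoids a `KeyError` when a field is absent.

```python
# Excerpt of the Crossref response shown earlier, copied by hand.
record = {
    "published-print": {"date-parts": [[1970, 8, 14]]},
    "author": [{"given": "H. S.", "family": "Frank", "sequence": "first", "affiliation": []}],
    "page": "635-641",
}

first_author = record["author"][0]  # a list of dictionaries, one per author
full_name = first_author["given"] + " " + first_author["family"]

year, month, day = record["published-print"]["date-parts"][0]

# get() returns a default instead of raising KeyError for a missing key.
abstract = record.get("abstract", "(no abstract)")

print(full_name, year, abstract)
```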

You can also print both keys and values to inspect the JSON structure:

In [16]:
for elem in data:
    print(elem,":", data[elem])

indexed : {'date-parts': [[2019, 5, 31]], 'date-time': '2019-05-31T13:23:10Z', 'timestamp': 1559308990636}
reference-count : 0
publisher : American Association for the Advancement of Science (AAAS)
issue : 3946
content-domain : {'domain': [], 'crossmark-restriction': False}
published-print : {'date-parts': [[1970, 8, 14]]}
DOI : 10.1126/science.169.3946.635
type : article-journal
created : {'date-parts': [[2006, 10, 5]], 'date-time': '2006-10-05T12:56:56Z', 'timestamp': 1160053016000}
page : 635-641
source : Crossref
is-referenced-by-count : 73
title : The Structure of Ordinary Water: New data and interpretations are yielding new insights into this fascinating substance
prefix : 10.1126
volume : 169
author : [{'given': 'H. S.', 'family': 'Frank', 'sequence': 'first', 'affiliation': []}]
member : 221
container-title : Science
original-title : []
language : en
link : [{'URL': 'https://syndication.highwire.org/content/doi/10.1126/science.169.3946.635', 'content-type': 'unspecified', 'cont
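
The loop above only shows the top level. To see the nesting as well, a recursive walk can list the path to every leaf value. This is a sketch on a small hand-written sample; `key_paths` is a hypothetical helper, and empty lists or dictionaries produce no path.

```python
def key_paths(value, prefix=""):
    """Yield a path such as 'author[0].family' for every leaf value."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from key_paths(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from key_paths(child, f"{prefix}[{index}]")
    else:
        yield prefix


sample = {
    "indexed": {"date-parts": [[2019, 5, 31]], "timestamp": 1559308990636},
    "reference-count": 0,
    "author": [{"given": "H. S.", "family": "Frank", "affiliation": []}],
}

for path in key_paths(sample):
    print(path)
```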

## 4 Formatted Citations

CrossRef, DataCite and similar services support formatted citations via the text/x-bibliography content type. These are the output of the Citation Style Language processor, citeproc-js. The content type can take two additional parameters, "style" and "locale", to customise its response format. A "style" can be chosen from the list of style names found in the CSL style repository. Many styles are supported, including common styles such as apa and harvard3. The request below asks for the bibtex style:

In [4]:
url = "https://doi.org/10.1126/science.169.3946.635"  # DOI resolver URL
headers = {'Accept': 'text/x-bibliography; style=bibtex'}  # formatted citation in BibTeX style
r = requests.get(url, headers=headers)  # GET with the Accept header
r.encoding = 'utf-8'  # assumption: the citation is UTF-8; avoids a garbled page-range dash
print(r.status_code)
print(r.text)

200
 @article{Frank_1970, title={The Structure of Ordinary Water: New data and interpretations are yielding new insights into this fascinating substance}, volume={169}, ISSN={1095-9203}, url={http://dx.doi.org/10.1126/science.169.3946.635}, DOI={10.1126/science.169.3946.635}, number={3946}, journal={Science}, publisher={American Association for the Advancement of Science (AAAS)}, author={Frank, H. S.}, year={1970}, month={Aug}, pages={635–641}}
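
The Accept value for a formatted citation can be built from its parameters. In this sketch the helper name `bibliography_accept` is hypothetical, and the optional `locale` parameter is an assumption that does not appear in the cell above.

```python
def bibliography_accept(style, locale=None):
    """Build an Accept value for a formatted citation in the given CSL style."""
    value = f"text/x-bibliography; style={style}"
    if locale:
        # Assumed second parameter; omitted unless the caller asks for it.
        value += f"; locale={locale}"
    return value


headers = {"Accept": bibliography_accept("apa")}
print(headers["Accept"])
```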



### Let's try with a DOI at Zenodo

In [5]:
url = "https://doi.org/10.5281/zenodo.842715"  # DOI resolver URL
headers = {'Accept': 'application/vnd.citationstyles.csl+json;q=1.0'}  # content type accepted in the response
r = requests.get(url, headers=headers)  # GET with the Accept header (content negotiation only reads data)
print(r.status_code)
print(r.text)

200
{
  "type": "dataset",
  "id": "https://doi.org/10.5281/zenodo.842715",
  "categories": [
    "Cuerda del Pozo",
    "Reservoir",
    "Freshwater",
    "Water Quality",
    "AMT",
    "beginDate:'2010-01-01'",
    "endDate:'2010-12-31'",
    "location:'CdP'",
    "attributeLabel:'Temp'",
    "attributeLabel:'Press'",
    "attributeLabel:'Cond'",
    "attributeLabel:'Salinity'",
    "attributeLabel:'DO'",
    "attributeLabel:'rawO2'",
    "attributeLabel:'OxySat'",
    "attributeLabel:'ph'",
    "attributeLabel:'redox'",
    "gnd:2010-01-01"
  ],
  "author": [
    {
      "family": "Aguilar",
      "given": "Fernando"
    },
    {
      "family": "Marco",
      "given": "Jesús"
    },
    {
      "family": "Monteoliva",
      "given": "Agustín"
    }
  ],
  "issued": {
    "date-parts": [
      [
        2017,
        8,
        14
      ]
    ]
  },
  "abstract": "AMT data from Cuerda del Pozo Reservoir in 2010. It includes: Temperature, Pressure, Conductivity, Dissolved Oxygen, ra

## Exercise 1
Show title and description

In [24]:
data = json.loads(r.text)
print("Title: " + data['title'])
print("Description: " + data['abstract'])

Title: Amt Cuerda Del Pozo 2010
Description: AMT data from Cuerda del Pozo Reservoir in 2010. It includes: Temperature, Pressure, Conductivity, Dissolved Oxygen, raw O2, Oxygen saturation, ph and redex values.
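
The same record can be condensed into a one-line summary. The sketch below uses a hand-copied excerpt of the Zenodo response as a stand-in for `data`. The function `summarize` and its output layout are illustrative choices, not a citation style.

```python
# Excerpt of the CSL JSON returned for the Zenodo DOI, copied by hand.
record = {
    "type": "dataset",
    "id": "https://doi.org/10.5281/zenodo.842715",
    "author": [
        {"family": "Aguilar", "given": "Fernando"},
        {"family": "Marco", "given": "Jesús"},
        {"family": "Monteoliva", "given": "Agustín"},
    ],
    "issued": {"date-parts": [[2017, 8, 14]]},
    "title": "Amt Cuerda Del Pozo 2010",
}


def summarize(record):
    """Return 'Authors (year). Title. Identifier' using defaults for missing fields."""
    authors = "; ".join(f"{a['family']}, {a['given']}" for a in record.get("author", []))
    date_parts = record.get("issued", {}).get("date-parts", [[]])[0]
    year = date_parts[0] if date_parts else "n.d."
    title = record.get("title", "(untitled)")
    return f"{authors} ({year}). {title}. {record.get('id', '')}"


print(summarize(record))
```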


# Resolving PIDs
With Handle

By default, the Handle proxy redirects you to the address stored in the PID's URL field.

In [36]:
import requests

url = "http://hdl.handle.net/1895.22/1013"  # PID resolver URL
r = requests.get(url)  # GET, following the redirect
print(r.status_code)
print(r.text)

200
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252">
<META NAME="Generator" CONTENT="Microsoft Word 97">
<TITLE>License for PYTHON 1.6.1</TITLE>
</HEAD>
<BODY LINK="#0000ff" VLINK="#800080">

<FONT FACE="Century Schoolbook">
<P>&nbsp;</P>
<P ALIGN="CENTER">PYTHON 1.6.1</P>
<P ALIGN="CENTER">PYTHON 1.6.1 LICENSE AGREEMENT<BR>
</P>
<P>1. This LICENSE AGREEMENT is between the Corporation for National Research Initiatives, having an office at 1895 Preston White Drive, Reston, VA 20191 ("CNRI"), and the Individual or Organization ("Licensee") accessing and otherwise using Python 1.6.1 software in source or binary form and its associated documentation.</P>

<P>2. Subject to the terms and conditions of this License Agreement, CNRI hereby grants Licensee a nonexclusive, royalty-free, world-wide license to reproduce, analyze, test, perform and/or display publicly, prepare derivative works, distribute, and otherwise use Python 1.6.1 alone or in any derivat

The Handle System proxy offers several options that we can control:

http://www.handle.net/proxy_servlet.html

For example, we can tell the server not to redirect to the URL field:

In [25]:
import requests
import json

url = "http://hdl.handle.net/1895.22/1013?noredirect"  # PID URL with ?noredirect
r = requests.get(url)  # GET without extra headers
print(r.status_code)
print(r.text)

200
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html><head><title>Handle Proxy</title></head>

<body bgcolor="#ffffff">

<a href="http://www.handle.net">
<img src="/static/images/res_tool.gif" width="270" height="40" alt="Handle.net Logo" border=0></a>

<table width="100%">
<tbody>
<tr><th colspan="4" align="left" bgcolor="#dddddd">Handle Values for: 1895.22/1013</th></tr>
<tr><td align="left" valign="top">Index</td><td align="left" valign="top">Type</td><td align="left" valign="top">Timestamp</td><td align="left" valign="top">Data</td></tr>
<tr bgcolor="#dddddd"><td align="left" valign="top"><b>100</b></td><td align="left" valign="top"><b><a href="http://hdl.handle.net/0.TYPE/HS_ADMIN">HS_ADMIN</a></b></td><td valign="top"><span style='white-space:nowrap'>2015-04-03&nbsp;21:20:10Z</span></td>
<td>handle=200/1; index=300; [create hdl,delete hdl,create derived prefix,delete derived prefix,read val,modify val,del val,add val,modify admin,del admin,add admin,list]</t

### Query Parameters

The proxy server's REST API is CORS-compliant; JSONP callbacks are also supported using a "callback" query parameter.

The presence of the "pretty" query parameter instructs the server to pretty-print the JSON output.

The "auth" query parameter instructs the proxy server to bypass its cache and query a primary handle server directly for the newest handle data.

The "cert" query parameter instructs the proxy server to request an authenticated response from the source handle server. Not generally needed by end users.

The "type" and "index" query parameters allow the resolution response to be restricted to specific types and indexes of interest. "Type" is the key defined by the user to store a metadata term. "Index" is a number associated with that term. Multiple "type" and "index" parameters are allowed, and values are returned which match any of the specified types or indexes.

For example, http://hdl.handle.net/api/handles/4263537/4000?type=URL&type=EMAIL&callback=processResponse yields the response

```JSON
processResponse({
   "responseCode":1,
   "handle":"4263537/4000",
   "values":[
      {
         "index":1,
         "type":"URL",
         "data":{ "format":"string", "value":"http://www.handle.net/index.html" },
         "ttl":86400,
         "timestamp":"2001-11-21T16:21:35Z"
      },
      {
         "index":2,
         "type":"EMAIL",
         "data":{ "format":"string", "value":"hdladmin@cnri.reston.va.us" },
         "ttl":86400,
         "timestamp":"2000-04-10T22:41:46Z"
      }
   ]
});
```
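
The request URL in the example above can be assembled from its parts rather than typed by hand. This is a sketch using the standard library. The helper name `handle_api_url` is hypothetical, and it covers only the "type", "index" and "callback" parameters.

```python
from urllib.parse import urlencode


def handle_api_url(handle, types=(), indexes=(), callback=None,
                   base="http://hdl.handle.net/api/handles/"):
    """Build a Handle REST API URL with repeated type/index filters."""
    params = [("type", t) for t in types] + [("index", i) for i in indexes]
    if callback:
        params.append(("callback", callback))
    query = urlencode(params)  # repeated keys are kept in order
    return base + handle + ("?" + query if query else "")


print(handle_api_url("4263537/4000", types=["URL", "EMAIL"], callback="processResponse"))
```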

<div class="alert alert-warning" role="alert" style="margin: 10px">
Remember!<br>
If you do not specify the Content-type, the server will act as if it had received a browser request and will return HTML
</div>

In [123]:
import requests
import json

url = "http://hdl.handle.net/api/handles/4263537/4000?type=URL&type=EMAIL&callback=processResponse"  # Handle REST API URL with type filters and a JSONP callback
headers = {'Content-Type': 'application/json'}  # header sent with the request
r = requests.get(url, headers=headers)  # GET with headers
print(r.text) 


processResponse({"responseCode":1,"handle":"4263537/4000","values":[{"index":1,"type":"URL","data":{"format":"string","value":"http://www.handle.net/index.html"},"ttl":86400,"timestamp":"2015-04-03T21:20:22Z"},{"index":2,"type":"EMAIL","data":{"format":"string","value":"hdladmin@cnri.reston.va.us"},"ttl":86400,"timestamp":"2015-04-03T21:20:22Z"}]});
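
Because a "callback" was requested, the body above is JSONP rather than plain JSON, so `json.loads` cannot read it directly. One way to handle this (a sketch, with the hypothetical helper `unwrap_jsonp`) is to strip the callback wrapper first. Omitting the "callback" parameter from the request avoids the problem altogether.

```python
import json

# The response printed above, copied as a string.
jsonp_text = ('processResponse({"responseCode":1,"handle":"4263537/4000","values":['
              '{"index":1,"type":"URL","data":{"format":"string","value":"http://www.handle.net/index.html"},'
              '"ttl":86400,"timestamp":"2015-04-03T21:20:22Z"},'
              '{"index":2,"type":"EMAIL","data":{"format":"string","value":"hdladmin@cnri.reston.va.us"},'
              '"ttl":86400,"timestamp":"2015-04-03T21:20:22Z"}]});')


def unwrap_jsonp(text, callback):
    """Remove 'callback( ... );' around a JSON payload and parse the payload."""
    text = text.strip().rstrip(";")
    prefix = callback + "("
    if not (text.startswith(prefix) and text.endswith(")")):
        raise ValueError("text is not a JSONP call to " + callback)
    return json.loads(text[len(prefix):-1])


payload = unwrap_jsonp(jsonp_text, "processResponse")
# Map each value's type to its data, e.g. URL -> address.
values_by_type = {value["type"]: value["data"]["value"] for value in payload["values"]}
print(values_by_type)
```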


# Exercise 2

* 1: Try to find *TWO* different repositories that manage PIDs or DOIs (e.g. journals, figshare.com, DataONE, zenodo, digital.CSIC, etc.)
* 2: Check an example file or the documentation to get the identifier of a resource
* 3: Resolve the identifier using the Python requests library
* 4: Show relevant information, like author, title, description...
* 5: Which kind of "types" are defined in the PIDs?
* 6: Is there any difference managing the DOIs/PIDs between the different repositories? (textual answer)

In [152]:
# First repository: DataCite API (https://support.datacite.org/docs/api-get-doi) -> gives us a JSON object
# Second repository: arXiv API (https://arxiv.org/) -> gives us an XML response

import requests
import json
import xml.etree.ElementTree as ET

url1 = "https://api.datacite.org/dois/10.5438/0014"
url2 = "http://export.arxiv.org/api/query?id_list=1904.00010"

headers = {'Content-Type': 'application/json'}

r1 = requests.get(url1, headers=headers)
r2 = requests.get(url2, headers=headers)
data1 = json.loads(r1.text)
data2 = ET.ElementTree(ET.fromstring(r2.text))

#print("Response code 1: " + str(r1.text))
#print("Response code 2: " + str(data2))

print("\n====== FIRST RESOURCE ======")
print("Id: " + data1['data']['id'])
print("\nTitle: " + str(data1['data']['attributes']['titles'][0]['title']))
print("\nPublisher: " + data1['data']['attributes']['publisher'])
print("\nPublication year: " + str(data1['data']['attributes']['publicationYear']))

print("\nList of contributors: ")
contributors = data1['data']['attributes']['contributors']
for contributor in contributors:
    print(contributor['name'])
    
print("\nAll tags: ")
for tag in data1['data']['attributes']:
    print(tag)

print("\n====== SECOND RESOURCE ======")
namespaces = {'w3': 'http://www.w3.org/2005/Atom', 'arxiv': 'http://arxiv.org/schemas/atom'}

print("Title(s) :")
for element in data2.findall('.//w3:title',namespaces):
    print(str(element.text))
    
print("\nSummary :")
for element in data2.findall('.//w3:summary',namespaces):
    print(str(element.text))
    
print("Author(s) :")
for element in data2.findall('.//w3:name',namespaces):
    print(str(element.text))
    
print("\nDOI :")
for element in data2.findall('.//arxiv:doi',namespaces):
    print(str(element.text))
    
print("\nAll tags :")
for element in data2.iter():
    print(str(element.tag))


Id: 10.5438/0014

Title: DataCite Metadata Schema Documentation for the Publication and Citation of Research Data v4.1

Publisher: DataCite

Publication year: 2017

List of contributors: 
Smaele, Madeleine de
Starr, Joan
Ashton, Jan
Barton, Amy
Birt, Noris
Dietiker, Stefanie
Elliot, Jannean
Fenner, Martin
Hugo, Wim
Jakobsson, Stefan
Bernal Martínez, Isabel
Rücknagel, Jessica
Yahia, Mohamed
Ziedorn, Frauke
Zolly, Lisa

All tags: 
doi
prefix
suffix
identifiers
creators
titles
publisher
container
publicationYear
subjects
contributors
dates
language
types
relatedIdentifiers
sizes
formats
version
rightsList
descriptions
geoLocations
fundingReferences
xml
url
contentUrl
metadataVersion
schemaVersion
source
isActive
state
reason
created
registered
published
updated

Title(s) :
ArXiv Query: search_query=&id_list=1904.00010&start=0&max_results=10
Deconfined quantum critical point in one dimension

Summary :
  We perform a numerical study of a spin-1/2 model with $\mathbb{Z}_2 \times
\mathbb{

We observe a large difference between the two repositories: the HTTP requests do not return the same kind of object, even when the same header is used. The DataCite API returns a JSON object (or at least a string that can be parsed into one), while the arXiv API returns an XML document that we can read using the techniques seen in previous classes.
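
One way to cope with this difference is to choose the parser from the response's Content-Type header. The sketch below is a hedged illustration on inline sample strings; `parse_body` is a hypothetical helper and the media types in the usage lines are assumptions. With `requests`, the header value is available as `r.headers.get("Content-Type", "")`.

```python
import json
import xml.etree.ElementTree as ET


def parse_body(text, content_type):
    """Parse a response body as JSON or XML depending on its media type."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json" or media_type.endswith("+json"):
        return json.loads(text)
    if media_type.endswith("/xml") or media_type.endswith("+xml"):
        return ET.fromstring(text)
    raise ValueError("unsupported media type: " + media_type)


json_result = parse_body('{"data": {"id": "10.5438/0014"}}', "application/json; charset=utf-8")
xml_result = parse_body(
    '<feed xmlns="http://www.w3.org/2005/Atom"><entry>'
    '<title>Deconfined quantum critical point in one dimension</title>'
    '</entry></feed>',
    "application/atom+xml",
)

print(json_result["data"]["id"])
print(xml_result.findtext(".//a:title", namespaces={"a": "http://www.w3.org/2005/Atom"}))
```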