Ecole des Chartes implementation demo
===

The Ecole des Chartes implementation runs on the Capitains.org software suite, using [Nautilus](http://github.com/capitains/nautilus).

Julien Pilla has provided a nice API browsing interface: http://dev.chartes.psl.eu/dts-demo/ (source code: https://github.com/chartes/dts-demo).

## Configuration

The following cell is used so that we don't have to rewrite too many cells if the address of the DTS API changes.


In [1]:
import requests
from urllib.parse import urljoin

URI = "http://dev.chartes.psl.eu/api/nautilus/dts"

## Getting the available endpoints

The DTS entry point lists the available endpoints and their URLs. This means that, for each implementation of DTS, this single URL gives you all the information you need to perform arbitrary queries. The Ecole des Chartes Capitains implementation provides the three endpoints:

In [2]:
entry_request = requests.get(URI)
ENDPOINTS = entry_request.json()
ENDPOINTS

{'navigation': '/api/nautilus/dts/navigation',
 'documents': '/api/nautilus/dts/document',
 '@context': 'dts/EntryPoint.jsonld',
 '@id': '/api/nautilus/dts',
 '@type': 'EntryPoint',
 'collections': '/api/nautilus/dts/collections'}

As you can see, all three endpoints have been given URIs, relative to the server. Because we do not know yet which text we want to read, we'll start by browsing the catalog.
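Since these endpoint paths are relative, they have to be resolved against the API address before they can be requested, which is what `urljoin` does in the cells of this notebook. A minimal sketch of doing it once for all endpoints, assuming the entry-point JSON has the shape shown above; `resolve_endpoints` is a hypothetical helper, not part of Nautilus or DTS.

```python
from urllib.parse import urljoin


def resolve_endpoints(api_uri, entry_point):
    """Resolve the relative endpoint paths of a DTS entry point against the API URI.

    JSON-LD keywords ('@context', '@id', '@type') describe the entry point
    itself rather than endpoints, so they are skipped.
    """
    return {
        name: urljoin(api_uri, path)
        for name, path in entry_point.items()
        if not name.startswith("@")
    }


# Entry point copied from the response shown above
sample_entry_point = {
    "navigation": "/api/nautilus/dts/navigation",
    "documents": "/api/nautilus/dts/document",
    "@context": "dts/EntryPoint.jsonld",
    "@id": "/api/nautilus/dts",
    "@type": "EntryPoint",
    "collections": "/api/nautilus/dts/collections",
}

absolute_endpoints = resolve_endpoints(
    "http://dev.chartes.psl.eu/api/nautilus/dts", sample_entry_point)
print(absolute_endpoints["collections"])
```

Because the paths start with `/`, `urljoin` replaces the whole path of the API URI instead of appending to it.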

## Browsing the root of the catalog

The root of the data catalog is the result of the basic GET request on the `collections` endpoint :

In [3]:
ROOT_COLLECTION  = requests.get(
    urljoin(URI, ENDPOINTS["collections"])).json()
ROOT_COLLECTION

{'totalItems': 1,
 '@context': {'@vocab': 'https://www.w3.org/ns/hydra/core#',
  'dts': 'https://w3id.org/dts/api#'},
 'member': [{'totalItems': 17,
   '@id': 'default',
   '@type': 'Collection',
   'title': 'Default collection'}],
 '@id': 'defaultTic',
 '@type': 'Collection',
 'title': 'None'}

The root collection has 1 item, whose `@id` is `default`. Let's see what hides behind this mysterious name!
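To see what a collection contains without reading the whole JSON, one can list its `member` entries. A small sketch using the root collection shown above as sample data; `list_members` is a hypothetical helper.

```python
def list_members(collection):
    """Return (@id, @type, title, totalItems) for each member of a DTS collection response."""
    return [
        (m.get("@id"), m.get("@type"), m.get("title"), m.get("totalItems"))
        for m in collection.get("member", [])
    ]


# Root collection copied from the response shown above
root = {
    "totalItems": 1,
    "member": [{"totalItems": 17, "@id": "default",
                "@type": "Collection", "title": "Default collection"}],
    "@id": "defaultTic",
    "@type": "Collection",
    "title": "None",
}

for member_id, member_type, title, count in list_members(root):
    print(f"{member_id} ({member_type}): {title}, {count} items")
```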

## Requesting a specific collection

Requesting a specific collection is simple: go to the `collections` endpoint and set the `id` query parameter to the `@id` property of your item.
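The cells below build these URLs by concatenating `"?id=..."` by hand. A sketch of the same construction with `urlencode`, which percent-encodes the identifier (`:` becomes `%3A`, as in the URLs the server itself returns further down); `collection_url` is a hypothetical helper.

```python
from urllib.parse import urlencode, urljoin


def collection_url(api_uri, collections_path, collection_id):
    """Build a collections endpoint URL for one collection, with the id URL-encoded."""
    return urljoin(api_uri, collections_path) + "?" + urlencode({"id": collection_id})


API = "http://dev.chartes.psl.eu/api/nautilus/dts"
print(collection_url(API, "/api/nautilus/dts/collections", "default"))
print(collection_url(API, "/api/nautilus/dts/collections", "urn:cts:frenchLit:pos2001"))
```

With `requests`, passing `params={"id": collection_id}` to `requests.get` should produce an equivalent query string.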

### default?

We first want to get the `default` collection: it is at the URI http://dev.chartes.psl.eu/api/nautilus/dts/collections?id=default

In [4]:
Default_Collection  = requests.get(urljoin(URI, ENDPOINTS["collections"]+"?id=default")).json()
Default_Collection

{'totalItems': 17,
 '@context': {'ns2': 'http://www.w3.org/2004/02/skos/core#',
  '@vocab': 'https://www.w3.org/ns/hydra/core#',
  'dts': 'https://w3id.org/dts/api#'},
 'member': [{'totalItems': 28,
   '@id': 'urn:cts:frenchLit:pos2010',
   '@type': 'Collection',
   'title': "Positions de thèses de l'École nationale des chartes promotion 2010"},
  {'totalItems': 27,
   '@id': 'urn:cts:frenchLit:pos2015',
   '@type': 'Collection',
   'title': "Positions de thèses de l'École nationale des chartes promotion 2015"},
  {'totalItems': 24,
   '@id': 'urn:cts:frenchLit:pos2011',
   '@type': 'Collection',
   'title': "Positions de thèses de l'École nationale des chartes promotion 2011"},
  {'totalItems': 28,
   '@id': 'urn:cts:frenchLit:pos2014',
   '@type': 'Collection',
   'title': "Positions de thèses de l'École nationale des chartes promotion 2014"},
  {'totalItems': 27,
   '@id': 'urn:cts:frenchLit:pos2005',
   '@type': 'Collection',
   'title': "Positions de thèses de l'École nationale de

### Theses from 2001!

17 collections! Much better! Today, I am in the mood for reading theses from the Ecole des Chartes... Let's see what they did in 2001!

In [5]:
Theses_2001_Collection  = requests.get(
    urljoin(URI, ENDPOINTS["collections"]+"?id=urn:cts:frenchLit:pos2001")).json()
Theses_2001_Collection

{'totalItems': 24,
 '@context': {'@vocab': 'https://www.w3.org/ns/hydra/core#',
  'dts': 'https://w3id.org/dts/api#',
  'ns1': 'http://purl.org/dc/elements/1.1/',
  'ns3': 'http://www.w3.org/2004/02/skos/core#',
  'cts': 'http://chs.harvard.edu/xmlns/cts/'},
 'member': [{'totalItems': 1,
   '@id': 'urn:cts:frenchLit:pos2001.pos200121',
   '@type': 'Collection',
   'title': 'L’Église cathare du Carcassès (1167-début du '},
  {'totalItems': 1,
   '@id': 'urn:cts:frenchLit:pos2001.pos200111',
   '@type': 'Collection',
   'title': 'Édition critique de lettres de Guy Patin (Bibl. nat. de France, Baluze 148)'},
  {'totalItems': 1,
   '@id': 'urn:cts:frenchLit:pos2001.pos200123',
   '@type': 'Collection',
   'title': 'L’Exposition universelle de 1867 : apothéose du Second Empire et de la génération de 1830'},
  {'totalItems': 1,
   '@id': 'urn:cts:frenchLit:pos2001.pos200110',
   '@type': 'Collection',
   'title': 'Les Français acteurs et spectateurs de l’histoire de Hawaii (1837-1898)'},
  {

### La bibliothèque médiévale du collège des Cholets

There are two things to note:
1. There are more collections. We'll look at `La bibliothèque médiévale du collège des Cholets` (`@id=urn:cts:frenchLit:pos2001.pos200118`).
2. There is Qualified Dublin Core metadata in `dts:dublincore`!
    1. The publisher is described!
    2. There is even a Sudoc identifier (http://www.sudoc.fr/013565311)!
    3. There is a machine-readable date! (Or at least something close?)

Wait, this one contains yet another collection. Let's go and see where this ends!
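In JSON-LD, a Dublin Core value may come as a plain string, as an `{'@id': ...}` link or as an `{'@value': ...}` literal, optionally wrapped in a list. A hypothetical helper that flattens these shapes, assuming they are the only ones used; the sample is the `dts:dublincore` block of the Cholets collection requested in the next cell.

```python
def dublincore_values(collection, term):
    """Collect the plain values of one Dublin Core term (e.g. 'dct:relation')."""
    raw = collection.get("dts:dublincore", {}).get(term, [])
    if not isinstance(raw, list):
        raw = [raw]  # a single value may not be wrapped in a list
    values = []
    for item in raw:
        if isinstance(item, dict):
            # Links carry '@id', literals carry '@value'
            values.append(item.get("@id", item.get("@value")))
        else:
            values.append(item)
    return values


# Excerpt of the Cholets collection response
cholets = {"dts:dublincore": {"dct:relation": [{"@id": "http://www.sudoc.fr/149664052"}]}}
print(dublincore_values(cholets, "dct:relation"))
```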

In [6]:
Cholets_Collection  = requests.get(
    urljoin(URI, ENDPOINTS["collections"]+"?id=urn:cts:frenchLit:pos2001.pos200118")
).json()
Cholets_Collection

{'totalItems': 1,
 'dts:dublincore': {'dct:relation': [{'@id': 'http://www.sudoc.fr/149664052'}]},
 '@context': {'@vocab': 'https://www.w3.org/ns/hydra/core#',
  'dts': 'https://w3id.org/dts/api#',
  'ns3': 'http://www.w3.org/2004/02/skos/core#',
  'dct': 'http://purl.org/dc/terms/',
  'ns1': 'http://purl.org/dc/elements/1.1/',
  'cts': 'http://chs.harvard.edu/xmlns/cts/'},
 'member': [{'totalItems': 0,
   'dts:passage': '/api/nautilus/dts/document?id=urn%3Acts%3AfrenchLit%3Apos2001.pos200118.positionThese-fr1',
   'dts:references': '/api/nautilus/dts/navigation?id=urn%3Acts%3AfrenchLit%3Apos2001.pos200118.positionThese-fr1',
   'dts:citeDepth': 2,
   'dts:extensions': {'cts:label': [{'@language': 'fr',
      '@value': 'La bibliothèque médiévale du collège des Cholets'}],
    'ns3:prefLabel': [{'@language': 'fr',
      '@value': 'La bibliothèque médiévale du collège des Cholets'}],
    'ns1:language': 'fr'},
   '@id': 'urn:cts:frenchLit:pos2001.pos200118.positionThese-fr1',
   '@type':

### Edition of La bibliothèque médiévale du collège des Cholets
Wait, the next one seems more complicated. Let's request this single collection and read what's in there:

In [7]:
Cholets_Edition_Collection  = requests.get(
    urljoin(URI, ENDPOINTS["collections"]+"?id=urn%3Acts%3AfrenchLit%3Apos2001.pos200118.positionThese-fr1")
).json()
Cholets_Edition_Collection

{'totalItems': 0,
 '@context': {'@vocab': 'https://www.w3.org/ns/hydra/core#',
  'dts': 'https://w3id.org/dts/api#',
  'ns1': 'http://purl.org/dc/elements/1.1/',
  'ns3': 'http://www.w3.org/2004/02/skos/core#',
  'cts': 'http://chs.harvard.edu/xmlns/cts/'},
 'dts:passage': '/api/nautilus/dts/document?id=urn%3Acts%3AfrenchLit%3Apos2001.pos200118.positionThese-fr1',
 'dts:references': '/api/nautilus/dts/navigation?id=urn%3Acts%3AfrenchLit%3Apos2001.pos200118.positionThese-fr1',
 'dts:citeDepth': 2,
 'dts:extensions': {'cts:label': [{'@language': 'fr',
    '@value': 'La bibliothèque médiévale du collège des Cholets'}],
  'ns3:prefLabel': [{'@language': 'fr',
    '@value': 'La bibliothèque médiévale du collège des Cholets'}],
  'ns1:language': 'fr'},
 '@id': 'urn:cts:frenchLit:pos2001.pos200118.positionThese-fr1',
 '@type': 'Resource',
 'dts:citeStructure': {'dts:citeStructure': [{'dts:citeType': 'Partie'}],
  'dts:citeType': 'Section'},
 'title': 'La bibliothèque médiévale du collège des 

There are a few things we can see:

- There are `prefLabel` values again, and some values for properties of the `cts` namespace.
- More importantly, the `@type` is not `Collection` anymore but `Resource`! This means the current item can actually be read: it's not only metadata. Good to know, hmm?
- Do you see `dts:citeDepth`? It means the text has two levels of citation. For this collection, the data curator made them explicit in `dts:citeStructure`:
    1. The first level is named `Section`. This level has a second level:
        1. The second level, inside each `Section`, is named `Partie`.
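The nested `dts:citeStructure` can be read level by level to get the citation names in order. A sketch assuming the structure has the shape shown in the response above (a dict whose children sit in a list under the same key); `cite_types` is a hypothetical helper and only follows the first child structure at each level.

```python
def cite_types(resource):
    """Return the citation level names of a DTS Resource, from the top level down."""
    if resource.get("@type") != "Resource":
        return []  # a plain Collection carries no citation structure in these responses
    names = []
    level = resource.get("dts:citeStructure")
    while level:
        names.append(level.get("dts:citeType"))
        child = level.get("dts:citeStructure")
        if isinstance(child, list):
            # This sketch follows only the first sub-structure at each level
            child = child[0] if child else None
        level = child
    return names


# Excerpt of the Resource response shown above
edition = {
    "@type": "Resource",
    "dts:citeDepth": 2,
    "dts:citeStructure": {
        "dts:citeStructure": [{"dts:citeType": "Partie"}],
        "dts:citeType": "Section",
    },
}
names = cite_types(edition)
print(names)
print(len(names) == edition["dts:citeDepth"])
```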

Now we have two really interesting links, `dts:passage` and `dts:references`. Let's see what's in there!

## What are the passages that I can single out in the Edition of La bibliothèque médiévale du collège des Cholets?

To answer this long but quite clear question, there is only one thing to do: follow the `dts:references` URI we see here.

But wait, look at the URI! It's actually a simple construction:

- We use `navigation` from `ENDPOINTS`.
- We add the `@id` of the Resource we are interested in, URL-encoded, as the `id` parameter!
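That construction can be sketched as a small function; `navigation_url` is a hypothetical helper, and `ref` and `level` are the optional query parameters used in the cells below.

```python
from urllib.parse import urlencode, urljoin


def navigation_url(api_uri, navigation_path, resource_id, ref=None, level=None):
    """Build a navigation endpoint URL: the endpoint path plus the Resource @id."""
    params = {"id": resource_id}
    if ref is not None:
        params["ref"] = ref
    if level is not None:
        params["level"] = level
    # urlencode percent-encodes ':' in the URN, as the server does
    return urljoin(api_uri, navigation_path) + "?" + urlencode(params)


API = "http://dev.chartes.psl.eu/api/nautilus/dts"
RESOURCE = "urn:cts:frenchLit:pos2001.pos200118.positionThese-fr1"
print(navigation_url(API, "/api/nautilus/dts/navigation", RESOURCE))
print(navigation_url(API, "/api/nautilus/dts/navigation", RESOURCE, level=2))
```

The first URL should match the `dts:references` value of the Resource once it is joined with the API address.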

### All the *Sections*!

In [8]:
Sections_Cholets = requests.get(
    urljoin(URI, Cholets_Edition_Collection["dts:references"])).json()
Sections_Cholets

{'@context': {'@vocab': 'https://www.w3.org/ns/hydra/core#',
  'dts': 'https://w3id.org/dts/api#'},
 'dts:passage': '/api/nautilus/dts/document?id=urn%3Acts%3AfrenchLit%3Apos2001.pos200118.positionThese-fr1{&ref}{&start}{&end}',
 'dts:citeType': 'Section',
 'dts:citeDepth': 2,
 'member': [{'ref': '1'},
  {'ref': '2'},
  {'ref': '3'},
  {'ref': '4'},
  {'ref': '5'},
  {'ref': '6'}],
 '@id': '/api/nautilus/dts/navigation?groupBy=1&id=urn%3Acts%3AfrenchLit%3Apos2001.pos200118.positionThese-fr1&level=1',
 'dts:level': 1}

There are 6 Sections! Good. Let's see how many Parties there are!

### All the *Parties*!

But wait, didn't we say there was a second level? Let's see all the *Parties* in this text, by adding `level=2` to the query!

In [9]:
Parties_Cholets = requests.get(
    urljoin(URI, Cholets_Edition_Collection["dts:references"] + "&level=2")).json()
Parties_Cholets

{'@context': {'@vocab': 'https://www.w3.org/ns/hydra/core#',
  'dts': 'https://w3id.org/dts/api#'},
 'dts:passage': '/api/nautilus/dts/document?id=urn%3Acts%3AfrenchLit%3Apos2001.pos200118.positionThese-fr1{&ref}{&start}{&end}',
 'dts:citeType': 'Partie',
 'dts:citeDepth': 2,
 'member': [{'ref': '3.1'},
  {'ref': '3.2'},
  {'ref': '3.3'},
  {'ref': '3.4'},
  {'ref': '4.1'},
  {'ref': '4.2'},
  {'ref': '4.3'}],
 '@id': '/api/nautilus/dts/navigation?groupBy=1&id=urn%3Acts%3AfrenchLit%3Apos2001.pos200118.positionThese-fr1&level=2',
 'dts:level': 2}

See? Apparently, the Sections `1`, `2`, `5` and `6` have no children.
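This conclusion comes from comparing the two listings: a level-1 ref has children only if some level-2 ref starts with it. A small sketch of that comparison using the `ref` values returned above; the dotted `parent.child` form of the refs is an assumption based on this response, and `childless_refs` is a hypothetical helper.

```python
def childless_refs(parent_refs, child_refs):
    """Return the parent refs that no child ref points to.

    Assumes child refs are written 'parent.child' (e.g. '3.1').
    """
    parents_with_children = {ref.rsplit(".", 1)[0] for ref in child_refs}
    return [ref for ref in parent_refs if ref not in parents_with_children]


level_1 = ["1", "2", "3", "4", "5", "6"]
level_2 = ["3.1", "3.2", "3.3", "3.4", "4.1", "4.2", "4.3"]
print(childless_refs(level_1, level_2))
```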

### Error message when something does not exist!
Let's check that, though:

In [10]:
section_1_url = urljoin(URI, Cholets_Edition_Collection["dts:references"] + "&ref=1")
print(section_1_url)
Cholets_Section_1 = requests.get(section_1_url).json()
Cholets_Section_1

# This is a bug that needs to be addressed: we should get an error message with a 404 status (or similar), not a body that cannot be decoded as JSON!

http://dev.chartes.psl.eu/api/nautilus/dts/navigation?id=urn%3Acts%3AfrenchLit%3Apos2001.pos200118.positionThese-fr1&ref=1


JSONDecodeError: Expecting value: line 1 column 1 (char 0)
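This `JSONDecodeError` means that `.json()` was called on a body that is not JSON, most likely an empty one. Until the server returns a proper error, a client can check the response before decoding it. A minimal sketch that only needs the status code and the body text (for example `response.status_code` and `response.text` from `requests`); `parse_dts_json` and `DTSResponseError` are hypothetical names.

```python
import json


class DTSResponseError(Exception):
    """Raised when a DTS endpoint answers without usable JSON."""


def parse_dts_json(status_code, body):
    """Decode a DTS JSON body, failing with an explicit message instead of a bare JSONDecodeError."""
    if not 200 <= status_code < 300:
        raise DTSResponseError(f"HTTP {status_code}: {body[:200]!r}")
    if not body.strip():
        raise DTSResponseError(f"HTTP {status_code} with an empty body")
    return json.loads(body)


# Hypothetical usage with requests:
#   response = requests.get(section_1_url)
#   data = parse_dts_json(response.status_code, response.text)
try:
    parse_dts_json(200, "")
except DTSResponseError as error:
    print(error)
```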

## Getting the text

Now that we know which passages are available, why not get the text of those passages?

Let's see... We build the URL the same way as the navigation query, but this time we use `documents` from the entry point!
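The `dts:passage` value returned by the navigation endpoint (as opposed to the one on the Resource) is a URI template: `{&ref}{&start}{&end}` are RFC 6570 form-style query continuations. Instead of appending `&ref=...` by hand, one can expand that template. A minimal hand-rolled sketch that only handles single-variable `{&name}` expressions, not a full RFC 6570 implementation; `expand_passage_template` is a hypothetical helper.

```python
import re
from urllib.parse import quote


def expand_passage_template(template, **params):
    """Expand '{&name}' expressions: '&name=value' if a value is given, else ''."""
    def expand(match):
        name = match.group(1)
        value = params.get(name)
        if value is None:
            return ""
        return f"&{name}={quote(str(value), safe='')}"

    return re.sub(r"\{&(\w+)\}", expand, template)


# Template copied from the navigation responses above
template = ("/api/nautilus/dts/document?id=urn%3Acts%3AfrenchLit%3A"
            "pos2001.pos200118.positionThese-fr1{&ref}{&start}{&end}")
print(expand_passage_template(template, ref="1"))
print(expand_passage_template(template, start="3.1", end="3.4"))
```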

### Getting an excerpt


In [11]:
# Section 1 of the edition (the introduction)
section_1_passage_url = urljoin(URI, Cholets_Edition_Collection["dts:passage"] + "&ref=1")
print(section_1_passage_url)
Cholets_Section_1_Text = requests.get(section_1_passage_url)
print(Cholets_Section_1_Text.text)

http://dev.chartes.psl.eu/api/nautilus/dts/document?id=urn%3Acts%3AfrenchLit%3Apos2001.pos200118.positionThese-fr1&ref=1
<TEI xmlns="http://www.tei-c.org/ns/1.0" xmlns:py="http://codespeak.net/lxml/objectify/pytype" xml:lang="fr" xml:id="position-200118" py:pytype="TREE"><dts:fragment xmlns:dts="https://w3id.org/dts/api#"><text><body xml:lang="fr" n="urn:cts:frenchLit:pos2001.pos200118.positionThese-fr1"><div type="introduction" n="1"><head>Introduction</head><p>Les historiens ne se sont intéressés que récemment aux bibliothèques de collèges, dont la documentation est moins riche et plus dispersée
          que les fonds monastiques ou de particuliers. Les travaux pionniers d’Elisabeth Pellegrin sur les bibliothèques des collèges parisiens de
          Dormans-Beauvais, de Hubant et de Fortet, ont donné lieu à des articles très riches, mais les monographies sur le sujet restent rares.</p><p>Le collège des Cholets, étudié en 1971 par Elisabeth Rabut dans sa thèse d’Ecole des chartes, a 
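The passage comes back as TEI XML wrapped in a `dts:fragment`, so standard XML tools can read it. A sketch with `xml.etree.ElementTree` that lists the `<head>` of each `<div>`, run here on a shortened, well-formed version of the response above; elements inside `dts:fragment` still inherit the TEI default namespace. `section_heads` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

TEI_NS = "{http://www.tei-c.org/ns/1.0}"


def section_heads(tei_xml):
    """Return (n, head text) for each TEI <div> that has a <head> child."""
    root = ET.fromstring(tei_xml)
    heads = []
    for div in root.iter(f"{TEI_NS}div"):
        head = div.find(f"{TEI_NS}head")
        if head is not None:
            heads.append((div.get("n"), "".join(head.itertext()).strip()))
    return heads


# Shortened, well-formed version of the passage returned above
sample = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    '<dts:fragment xmlns:dts="https://w3id.org/dts/api#"><text><body>'
    '<div type="introduction" n="1"><head>Introduction</head>'
    '<p>Les historiens ne se sont intéressés que récemment aux bibliothèques de collèges</p>'
    '</div></body></text></dts:fragment></TEI>'
)
print(section_heads(sample))
```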

### Getting the whole text

That's nice! The text seems fairly small, so why not request the whole text?

In [12]:
# Without ref, start or end, the document endpoint returns the whole text
full_text_url = urljoin(URI, Cholets_Edition_Collection["dts:passage"])
print(full_text_url)
Cholet_Full_Text = requests.get(full_text_url)
print(Cholet_Full_Text.text)

http://dev.chartes.psl.eu/api/nautilus/dts/document?id=urn%3Acts%3AfrenchLit%3Apos2001.pos200118.positionThese-fr1
<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:lang="fr" xml:id="position-200118"><teiHeader><fileDesc><titleStmt><title xml:lang="fr" type="main">La bibliothèque médiévale du collège des Cholets</title><author key="200118_rebmeister" ref="060918705">Karine Rebmeister</author></titleStmt><editionStmt><edition>École des chartes</edition></editionStmt><publicationStmt><publisher>École des chartes</publisher><date when="2001"/><availability status="restricted"><licence target="http://creativecommons.org/licenses/by-nc-nd/3.0/fr/"/></availability></publicationStmt><seriesStmt><title>Positions des thèses</title><idno type="ISSN">0755-2976</idno><idno type="URI">http://www.sudoc.fr/013565311</idno></seriesStmt><sourceDesc><bibl><title>Positions des thèses soutenues par les élèves de la promotion de 2001 pour obtenir le diplôme d’archiviste paléographe</title>, <pubPlace>Paris</pub