Beta maṣāḥǝft
===

The Beta maṣāḥǝft implementation is a specific implementation of the DTS API built by Pietro Liuzzo.

## Configuration

The following cell defines the address of the DTS API in one place, so that only a single cell needs to be changed if that address ever moves.


In [1]:
import requests
import requests_cache
from pprint import pprint
from urllib.parse import urljoin

URI = "https://betamasaheft.eu/api/dts"

# define headers for HTTP requests
h = {'User-Agent': 'DTS Client'}

## Getting the available endpoints

The DTS entry point is a listing of the available endpoints and their URLs. This means that for each implementation of DTS, this single URL gives you all the information you need to build arbitrary queries. The Beta maṣāḥǝft implementation provides the three core endpoints (`collections`, `navigation` and `document`), along with `indexes` and `annotations`:

In [2]:
entry_request = requests.get(URI, headers=h)
ENDPOINTS = entry_request.json()
ENDPOINTS

{'indexes': '/api/dts/indexes',
 'annotations': '/api/dts/annotations',
 '@context': 'api/dts/contexts/EntryPoint.jsonld',
 'navigation': '/api/dts/navigation',
 'collections': '/api/dts/collections',
 '@type': 'EntryPoint',
 'document': '/api/dts/document',
 '@id': '/api/dts'}
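
The values in this response are paths rather than full URLs, so they have to be resolved against the base address before they can be requested. A minimal sketch of how `urljoin` does this, using a hand-copied subset of the response above instead of a live request (the names `endpoints` and `resolved` are only illustrative):

```python
from urllib.parse import urljoin

URI = "https://betamasaheft.eu/api/dts"

# Subset of the entry point response shown above, copied by hand.
endpoints = {
    "collections": "/api/dts/collections",
    "navigation": "/api/dts/navigation",
    "document": "/api/dts/document",
}

# A path starting with "/" replaces the whole path of the base URL,
# so only the scheme and host of URI are kept.
resolved = {name: urljoin(URI, path) for name, path in endpoints.items()}

for name, url in resolved.items():
    print(f"{name}: {url}")
```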

As you can see, every endpoint has been given a URI. Because we do not yet know which text we want to see, we'll browse from here:

## Browsing the root of the catalog

The root of the data catalog is the result of a plain GET request on the `collections` endpoint:

In [3]:
ROOT_COLLECTION  = requests.get(
    urljoin(URI, ENDPOINTS["collections"]),
    headers=h
).json()
ROOT_COLLECTION

{'@context': {'lawd': 'http://lawd.info/ontology/',
  'dts': 'https://w3id.org/dts/api#',
  'saws': 'http://purl.org/saws/ontology#',
  '@vocab': 'https://www.w3.org/ns/hydra/core#',
  'fabio': 'http://purl.org/spar/fabio',
  'foaf': 'http://xmlns.com/foaf/0.1/',
  'edm': 'http://www.europeana.eu/schemas/edm/',
  'sc': 'http://iiif.io/api/presentation/2#',
  'svcs': 'http://rdfs.org/sioc/services#',
  'tei': 'http://www.tei-c.org/ns/1.0',
  'ecrm': 'http://erlangen-crm.org/current/',
  'doap': 'http://usefulinc.com/ns/doap#',
  'crm': 'http://www.cidoc-crm.org/cidoc-crm/',
  'dc': 'http://purl.org/dc/terms/'},
 'dts:dublincore': {'dc:publisher': ['Akademie der Wissenschaften in Hamburg',
   'Hiob-Ludolf-Zentrum für Äthiopistik'],
  'dc:description': [{'@lang': 'en',
    '@value': "The project Beta maṣāḥǝft: Manuscripts of Ethiopia and Eritrea (Schriftkultur des christlichen Äthiopiens: eine multimediale Forschungsumgebung) is a long-term project funded within the framework of the Acade

The root collection has 2 items: let's look at the one made of manuscripts (`@id=https://betamasaheft.eu/transcriptions`)!
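
The items of a collection can be listed by iterating over its members. A minimal sketch, assuming the response follows the usual DTS layout in which a `member` array holds entries with an `@id` and a `title`. The output above is truncated, so the sample dictionary is a hypothetical stand-in for `ROOT_COLLECTION`: only the manuscripts `@id` comes from this notebook, and `list_members` is an illustrative helper name.

```python
def list_members(collection):
    """Return (id, title) pairs for the direct members of a DTS collection."""
    return [
        (member.get("@id"), member.get("title", ""))
        for member in collection.get("member", [])
    ]

# Hypothetical stand-in for ROOT_COLLECTION. The second @id and both
# titles are invented placeholders, not values returned by the server.
sample_collection = {
    "@type": "Collection",
    "member": [
        {"@id": "https://betamasaheft.eu/transcriptions",
         "title": "Manuscripts (placeholder title)"},
        {"@id": "https://example.org/other-collection",
         "title": "Other collection (placeholder title)"},
    ],
}

members = list_members(sample_collection)
for member_id, title in members:
    print(member_id, "-", title)
```

In the notebook itself, the same helper could be called as `list_members(ROOT_COLLECTION)`, provided the response really uses these keys.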

## Requesting a specific collection

Requesting a specific collection is simple: you query the `collections` endpoint and pass the `@id` property of your item as the `id` parameter.
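
A minimal sketch of that construction, assuming the `collections` path returned by the entry point above. The helper name `collection_url` is hypothetical, and `safe=":/"` is a choice made here to keep the identifier readable in the query string:

```python
from urllib.parse import urlencode, urljoin

URI = "https://betamasaheft.eu/api/dts"
COLLECTIONS_PATH = "/api/dts/collections"  # value of ENDPOINTS["collections"]

def collection_url(collection_id):
    """Build the Collections endpoint URL for a given collection @id."""
    # safe=":/" leaves the colon and slashes of the identifier untouched;
    # without it, urlencode would percent-encode them.
    query = urlencode({"id": collection_id}, safe=":/")
    return urljoin(URI, COLLECTIONS_PATH) + "?" + query

url = collection_url("https://betamasaheft.eu/transcriptions")
print(url)
```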

### Manuscripts Collections

We first want to get the Manuscripts collection: it is available at the URI https://betamasaheft.eu/api/dts/collections?id=https://betamasaheft.eu/transcriptions

In [4]:
collection_id = "https://betamasaheft.eu/transcriptions"

In [14]:
# build the request URI once, then reuse it for the request and the print
request_uri = urljoin(URI, ENDPOINTS["collections"] + f"?id={collection_id}")
Manuscript_Collection = requests.get(request_uri, headers=h).json()
print(request_uri)
pprint(Manuscript_Collection)
# Let's check page X where we have BLorient718

https://betamasaheft.eu/api/dts/collections?id=https://betamasaheft.eu/transcriptions
{'@context': {'@vocab': 'https://www.w3.org/ns/hydra/core#',
              'crm': 'http://www.cidoc-crm.org/cidoc-crm/',
              'dc': 'http://purl.org/dc/terms/',
              'doap': 'http://usefulinc.com/ns/doap#',
              'dts': 'https://w3id.org/dts/api#',
              'ecrm': 'http://erlangen-crm.org/current/',
              'edm': 'http://www.europeana.eu/schemas/edm/',
              'fabio': 'http://purl.org/spar/fabio',
              'foaf': 'http://xmlns.com/foaf/0.1/',
              'lawd': 'http://lawd.info/ontology/',
              'saws': 'http://purl.org/saws/ontology#',
              'sc': 'http://iiif.io/api/presentation/2#',
              'svcs': 'http://rdfs.org/sioc/services#',
              'tei': 'http://www.tei-c.org/ns/1.0'},
 '@id': 'https://betamasaheft.eu/transcriptions',
 '@type': 'Collection',
 'dts:dublincore': {'dc:description': [{'@lang': 'en',
           

### London, British Library, BL Oriental 718

There are a lot of collections, so let's look at this one in detail: `https://betamasaheft.eu/BLorient718` (it is available in one of the pages and is an interesting, fully working collection).

In [7]:
BLorient718_coll_id = "https://betamasaheft.eu/BLorient718"

In [16]:
request_uri = urljoin(URI, ENDPOINTS["collections"]+f"?id={BLorient718_coll_id}")

In [17]:
BLorient718  = requests.get(request_uri,headers=h).json()

In [18]:
pprint(BLorient718)

{'@context': {'@vocab': 'https://www.w3.org/ns/hydra/core#',
              'crm': 'http://www.cidoc-crm.org/cidoc-crm/',
              'dc': 'http://purl.org/dc/terms/',
              'doap': 'http://usefulinc.com/ns/doap#',
              'dts': 'https://w3id.org/dts/api#',
              'ecrm': 'http://erlangen-crm.org/current/',
              'edm': 'http://www.europeana.eu/schemas/edm/',
              'fabio': 'http://purl.org/spar/fabio',
              'foaf': 'http://xmlns.com/foaf/0.1/',
              'lawd': 'http://lawd.info/ontology/',
              'saws': 'http://purl.org/saws/ontology#',
              'sc': 'http://iiif.io/api/presentation/2#',
              'svcs': 'http://rdfs.org/sioc/services#',
              'tei': 'http://www.tei-c.org/ns/1.0'},
 '@id': 'https://betamasaheft.eu/BLorient718',
 '@type': 'Resource',
 'description': 'The transcription of manuscript London, British Library, BL '
                'Oriental 718 in Beta maṣāḥǝft ',
 'dts:citeDepth': 1,
 'dts:c

So, there are a few things we can see:

- The `dts:dublincore` is quite well filled:
    - There are multiple languages in the text.
    - The creators of the edition are Alessandro Bausi and Nafisa Valieva.
    - Contributors are named.
- The data provider has given a direct download link through `dts:download`, in case that is more convenient for the user.
- See the `dts:citeDepth`? The text has up to 3 levels! (The outputs in this notebook report different values: 1 here and 4 in the navigation responses below.)
    1. The first level is named `folio`.
        1. The second level, inside `folio`, is named `page`.
            1. The third level, inside `page`, is named `column`.
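
The nesting of levels described above can be walked recursively. A minimal sketch, assuming the levels are exposed as nested `dts:citeStructure` entries that each carry a `dts:citeType`. The response above is truncated, so the sample below is a hypothetical reconstruction from the description and not the server's actual output; `cite_levels` is an illustrative helper name.

```python
def cite_levels(structure, depth=1):
    """Yield (depth, citeType) pairs from a nested citation structure."""
    for entry in structure:
        yield depth, entry["dts:citeType"]
        # Recurse into the child levels, if there are any.
        yield from cite_levels(entry.get("dts:citeStructure", []), depth + 1)

# Hypothetical sample mirroring the folio > page > column description.
sample_structure = [
    {
        "dts:citeType": "folio",
        "dts:citeStructure": [
            {
                "dts:citeType": "page",
                "dts:citeStructure": [{"dts:citeType": "column"}],
            }
        ],
    }
]

levels = list(cite_levels(sample_structure))
for depth, cite_type in levels:
    print("  " * (depth - 1) + cite_type)
```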

Now, we have two really interesting links, `dts:references` and `dts:passage`. Let's go see what's in there!

## Which folios do you have?

To answer this long but quite clear title, there is only one thing to do: follow the `dts:references` URI we see here.

But wait, see the URI? It's actually a simple construction:

- We use `navigation` from `ENDPOINTS`.
- We add the `@id` of the Resource we are interested in as the `id` parameter.
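
A minimal sketch of this construction, assuming the `navigation` path returned by the entry point. The helper name `navigation_url` is hypothetical; its optional `level` argument mirrors the `level` query parameter used further down in this notebook.

```python
from urllib.parse import urljoin

URI = "https://betamasaheft.eu/api/dts"
NAVIGATION_PATH = "/api/dts/navigation"  # value of ENDPOINTS["navigation"]

def navigation_url(resource_id, level=None):
    """Build a Navigation endpoint URL for a resource, optionally with a level."""
    url = urljoin(URI, NAVIGATION_PATH) + f"?id={resource_id}"
    if level is not None:
        url += f"&level={level}"
    return url

folios_url = navigation_url("https://betamasaheft.eu/BLorient718")
level_url = navigation_url("https://betamasaheft.eu/BLorient718", level=1)
print(folios_url)
print(level_url)
```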

### All the *folio*s !

In [43]:
Folios = requests.get(
    urljoin(URI, BLorient718["dts:references"]),
    headers=h
).json()

In [45]:
pprint(Folios)

{'@base': '/api/dts/navigation',
 '@context': {'@vocab': 'https://www.w3.org/ns/hydra/core#',
              'dc': 'http://purl.org/dc/terms/',
              'dts': 'https://w3id.org/dts/api#'},
 '@id': '/api/dts/navigation?id=https://betamasaheft.eu/BLorient718',
 'dc:hasVersion': 'version set to , no version links retrieved from GitHub.',
 'dts:citeDepth': 4,
 'dts:citeType': 'textpart',
 'dts:level': 1,
 'dts:passage': '/api/dts/document?id=https://betamasaheft.eu/BLorient718',
 'member': [{'dts:citeType': 'folio',
             'dts:dublincore': {'dc:source': [{'@id': 'https://betamasaheft.eu/api/iiif/BLorient718/canvas/p1',
                                               '@type': 'sc:Canvas'},
                                              {'@id': 'https://betamasaheft.eu/api/iiif/BLorient718/canvas/p2',
                                               '@type': 'sc:Canvas'}]},
             'dts:ref': '1'},
            {'dts:citeType': 'folio',
             'dts:dublincore': {'dc:source'

                                              {'@id': 'https://betamasaheft.eu/api/iiif/BLorient718/canvas/p97',
                                               '@type': 'sc:Canvas'}]},
             'dts:ref': '96'},
            {'dts:citeType': 'folio',
             'dts:dublincore': {'dc:source': [{'@id': 'https://betamasaheft.eu/api/iiif/BLorient718/canvas/p97',
                                               '@type': 'sc:Canvas'},
                                              {'@id': 'https://betamasaheft.eu/api/iiif/BLorient718/canvas/p98',
                                               '@type': 'sc:Canvas'}]},
             'dts:ref': '97'},
            {'dts:citeType': 'folio',
             'dts:dublincore': {'dc:source': [{'@id': 'https://betamasaheft.eu/api/iiif/BLorient718/canvas/p98',
                                               '@type': 'sc:Canvas'},
                                              {'@id': 'https://betamasaheft.eu/api/iiif/BLorient718/canvas/p99',
             


### Give me all the columns !

In [42]:
Columns = requests.get(
    urljoin(URI, BLorient718["dts:references"] + "&level=1"),
    headers=h
).json()

In [46]:
pprint(Columns)

{'@base': '/api/dts/navigation',
 '@context': {'@vocab': 'https://www.w3.org/ns/hydra/core#',
              'dc': 'http://purl.org/dc/terms/',
              'dts': 'https://w3id.org/dts/api#'},
 '@id': '/api/dts/navigation?id=https://betamasaheft.eu/BLorient718',
 'dc:hasVersion': 'version set to , no version links retrieved from GitHub.',
 'dts:citeDepth': 4,
 'dts:citeType': 'textpart',
 'dts:level': 1,
 'dts:passage': '/api/dts/document?id=https://betamasaheft.eu/BLorient718',
 'member': [{'dts:citeType': 'folio',
             'dts:dublincore': {'dc:source': [{'@id': 'https://betamasaheft.eu/api/iiif/BLorient718/canvas/p1',
                                               '@type': 'sc:Canvas'},
                                              {'@id': 'https://betamasaheft.eu/api/iiif/BLorient718/canvas/p2',
                                               '@type': 'sc:Canvas'}]},
             'dts:ref': '1'},
            {'dts:citeType': 'unit', 'dts:ref': '1.tei:ab[1]'},
            {'dts

                                              {'@id': 'https://betamasaheft.eu/api/iiif/BLorient718/canvas/p84',
                                               '@type': 'sc:Canvas'}]},
             'dts:ref': '83'},
            {'dts:citeType': 'unit', 'dts:ref': '83.tei:ab[1]'},
            {'dts:citeType': 'folio',
             'dts:dublincore': {'dc:source': [{'@id': 'https://betamasaheft.eu/api/iiif/BLorient718/canvas/p84',
                                               '@type': 'sc:Canvas'},
                                              {'@id': 'https://betamasaheft.eu/api/iiif/BLorient718/canvas/p85',
                                               '@type': 'sc:Canvas'}]},
             'dts:ref': '84'},
            {'dts:citeType': 'unit', 'dts:ref': '84.tei:ab[1]'},
            {'dts:citeType': 'folio',
             'dts:dublincore': {'dc:source': [{'@id': 'https://betamasaheft.eu/api/iiif/BLorient718/canvas/p85',
                                               '@type': 'sc:Canvas

In [41]:
# let's print only the first 20 results
for m in Columns['member'][:20]:
    print(m['dts:citeType'], m['dts:ref'])

folio 1
unit 1.tei:ab[1]
folio 2
unit 2.tei:ab[1]
folio 3
unit 3.tei:ab[1]
folio 4
unit 4.tei:ab[1]
folio 5
unit 5.tei:ab[1]
folio 6
unit 6.tei:ab[1]
folio 7
unit 7.tei:ab[1]
folio 8
unit 8.tei:ab[1]
folio 9
unit 9.tei:ab[1]
folio 10
unit 10.tei:ab[1]
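
Since the members mix two citation types, it can help to separate them. A minimal sketch, using the first few `dts:citeType` and `dts:ref` pairs copied by hand from the output above (the `dts:dublincore` blocks are left out); `refs_of_type` is an illustrative helper name.

```python
from collections import Counter

# First members of the navigation response, copied from the output above.
sample_members = [
    {"dts:citeType": "folio", "dts:ref": "1"},
    {"dts:citeType": "unit", "dts:ref": "1.tei:ab[1]"},
    {"dts:citeType": "folio", "dts:ref": "2"},
    {"dts:citeType": "unit", "dts:ref": "2.tei:ab[1]"},
]

def refs_of_type(members, cite_type):
    """Return the references of all members with the given citation type."""
    return [m["dts:ref"] for m in members if m.get("dts:citeType") == cite_type]

# Count how many members there are of each citation type.
counts = Counter(m["dts:citeType"] for m in sample_members)
folio_refs = refs_of_type(sample_members, "folio")

print(counts)
print(folio_refs)
```

In the notebook, `refs_of_type(Columns["member"], "unit")` would apply the same filter to the full response.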


## Getting the text

Now that we can see which passages are available, why not get the text of those passages?

Let's see... We build this the same way as the Navigation query, but this time we use `document` from the entry point!

### Getting an excerpt

In [54]:
request_uri = urljoin(URI, BLorient718["dts:passage"]+"&ref=1")

In [55]:
BLorient718_F_1 = requests.get(request_uri,headers=h)

In [59]:
# Why don't we receive a JSON answer this time?
# Because we are querying the `document` endpoint, whose
# serialization format is TEI/XML!
BLorient718_F_1.headers['Content-Type']

'application/tei+xml; charset=utf-8'

In [57]:
print(request_uri)

https://betamasaheft.eu/api/dts/document?id=https://betamasaheft.eu/BLorient718&ref=1


In [58]:
print(BLorient718_F_1.text)

<TEI xmlns="http://www.tei-c.org/ns/1.0">
    <dts:fragment xmlns:dts="https://w3id.org/dts/api#">
        <div type="textpart" subtype="folio" n="1" xml:id="f1">
               <ab>
                  <pb n="1r"/>
                  <cb n="a"/>
                  <hi rend="rubric">በስመ፡ አብ፡ ወወልድ፡ ወመንፈስ፡ ቅዱስ፡ ፩ </hi> አምላክ፡ ሥሉስ፡ ዘኢይሰደቅ፡ ዋሕድ፨ ዕሩ<hi rend="rubric">ይ፡ ታሉት፡ ዘኢይነፍድ፤ ዘሀሎ፡ እምቅድመ፡ ክ</hi> ዋኒሁ፡ እንዘ፡ ኢይትበዓድ፨ ወእምህላዊሁ፡ እንዘ፡
            ኢየሐፅፅ፡ ወኢይፈደፍድ፨ በአናስረ፡ ዓ <hi rend="rubric">ለም፡ ዘኢይተረጐም፡ ወኢይትዔለድ፨ ወዘ</hi> ኢይትፈለጥ፡ ወልድ፡
            እምአቡሁ፡ ወመንፈስ፡ እምወልድ፨ ዘአምጽአ፡ ዓለመ፡ በቃለ፡ ጽ <hi rend="rubric">ውዓ፡ እምኀበ፡ ኢሀሎ፡ በአሐቲ፡ ምክር፡
              ወበአ</hi> ሐቲ፡ ፈቃድ፨ ዘሣረራ፡ ለምድር፡ በልቡና፡ <cb n="b"/> ዘኢ <hi rend="rubric">ይኄለድ፨ ወሰማይኒ፡
              ዘአንበራ፡ በዓየር፡ እሳት፡ </hi> ዘይነድድ፨ ወለእሳትኒ፡ ዘሰፍሖ፡ በዓየር፡ ሰማይ፡ <hi rend="rubric">እንበለ፡ ገሢሥ፡
              በእድ፤ ወለነፋስ፡ ዘረበቦ፡ ዲበ፡ </hi> ሰረገላ፡ ጽልመት፡ እንበለ፡ መሠረት፡ ወድድ፤ አርጊዖ፡ ማየ፡ ዘረሰየ፡ ሰማየ፡ ወሰቀሎ፡
            ከመ፡ ቀመር፡ <hi rend="rubric">ወዓምድ፨ ዘረሰዮሙ፡ ለመላእክቲሁ፡ </hi> መንፈሰ፡ ወለእለ፡ ይትለአክዎ፡ 
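
Because the answer is TEI/XML, it can be processed with any XML parser. A minimal sketch with the standard library's `xml.etree.ElementTree`, run on a shortened, hand-made fragment with the same structure as the response above instead of the live response (the variable names are illustrative):

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Shortened, hand-made fragment with the same structure as the response above.
sample_xml = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <dts:fragment xmlns:dts="https://w3id.org/dts/api#">
    <div type="textpart" subtype="folio" n="1" xml:id="f1">
      <ab><pb n="1r"/><cb n="a"/><hi rend="rubric">በስመ፡ አብ፡</hi> አምላክ፡ <cb n="b"/>ዘኢ</ab>
    </div>
  </dts:fragment>
</TEI>"""

root = ET.fromstring(sample_xml)

# Column breaks are empty <cb/> elements; collect their n attributes.
columns = [cb.get("n") for cb in root.iterfind(".//tei:cb", TEI_NS)]

# Rubricated passages are wrapped in <hi rend="rubric">.
rubrics = [hi.text for hi in root.iterfind(".//tei:hi[@rend='rubric']", TEI_NS)]

# itertext() walks all text nodes and ignores the markup;
# split/join then collapses the indentation whitespace.
plain_text = " ".join("".join(root.itertext()).split())

print(columns)
print(rubrics)
print(plain_text)
```

With the live response, `ET.fromstring(BLorient718_F_1.content)` should work the same way, assuming the server returns well-formed XML.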

### Getting the whole text

That's nice! But this excerpt is fairly small, so why not request the whole text?

In [60]:
request_uri = urljoin(URI, BLorient718["dts:passage"])

In [61]:
BLorient718_full_text = requests.get(request_uri, headers=h)

In [62]:
print(request_uri)

https://betamasaheft.eu/api/dts/document?id=https://betamasaheft.eu/BLorient718


In [63]:
print(BLorient718_full_text.text)

<TEI xmlns="http://www.tei-c.org/ns/1.0" xml:id="BLorient718" xml:lang="en" type="mss">
  <teiHeader xml:base="https://betamasaheft.eu/">
      <fileDesc>
         <titleStmt>
            <title xml:lang="en" xml:id="t1"> 'The 'Gadla Lālibalā'  collection of
          textual units'</title>

            <editor role="generalEditor" key="AB"/>
            <editor key="NV"/>
            <funder>Akademie der Wissenschaften in Hamburg</funder>
         </titleStmt>
         <editionStmt>
            <p> </p>
         </editionStmt>
         <publicationStmt>
            <authority>Hiob-Ludolf-Zentrum für Äthiopistik</authority>
            <publisher>Die Schriftkultur des christlichen Äthiopiens und Eritreas: Eine multimediale
          Forschungsumgebung / Beta maṣāḥǝft</publisher>
            <pubPlace>Hamburg</pubPlace>
            <availability>
               <licence target="http://creativecommons.org/licenses/by-sa/4.0/"> This file is licensed
            under the Creative Commons 