# Using the eScriptorium Connector

This library provides a simple API to access the eScriptorium online platform.

Install with `pip install escriptorium-connector` and import it as follows.

In [4]:
from escriptorium_connector import EscriptoriumConnector
import os
from dotenv import load_dotenv
import io

## Instantiating the connector

It is good practice to keep your login credentials out of the notebook itself. Here we use `python-dotenv` to load the user's login credentials and the address of the eScriptorium instance from a `.env` file.

The URL of the eScriptorium instance is passed to the connector along with the URL of its API and the user's login credentials. The connector takes care of obtaining the user's API key and any cookies needed for asynchronous requests.

In [5]:
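load_dotenv()  # read the variables defined in a local .env file
url = str(os.getenv('ESCRIPTORIUM_URL'))
# the API URL is built by appending 'api/', so ESCRIPTORIUM_URL should end with a slash
api = f'{url}api/'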
username = str(os.getenv('ESCRIPTORIUM_USERNAME'))
password = str(os.getenv('ESCRIPTORIUM_PASSWORD'))
escr = EscriptoriumConnector(url, api, username, password)
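One caveat with the cell above: `str(os.getenv(...))` turns a missing variable into the literal string `'None'`, so a typo in the `.env` file only surfaces later as a failed login. A minimal sketch of a stricter loader follows; the helper name `read_connection_settings` is hypothetical and not part of the connector.

```python
import os


def read_connection_settings(env=os.environ):
    """Return (url, api, username, password) or raise if a variable is missing."""
    names = ("ESCRIPTORIUM_URL", "ESCRIPTORIUM_USERNAME", "ESCRIPTORIUM_PASSWORD")
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Normalise to exactly one trailing slash so that f"{url}api/" is well formed.
    url = env["ESCRIPTORIUM_URL"].rstrip("/") + "/"
    return url, f"{url}api/", env["ESCRIPTORIUM_USERNAME"], env["ESCRIPTORIUM_PASSWORD"]
```

After `load_dotenv()`, the result could be unpacked straight into the constructor, for example `escr = EscriptoriumConnector(*read_connection_settings())`.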

## Usage

Once instantiated, the connector provides well-documented and convenient methods to interact with your data on eScriptorium.

For example, you can fetch a list of all your documents.

In [6]:
my_documents = escr.get_documents()
document_pk = my_documents[0]["pk"]


In [7]:
print(len(my_documents))
print(my_documents[0])

12
{'pk': 299, 'name': 'Geniza_complex', 'project': 'htr4pgp_project', 'transcriptions': [{'pk': 363, 'name': 'kraken:PGPsico1_19'}, {'pk': 361, 'name': 'kraken:PGPsimple74_4'}, {'pk': 360, 'name': 'manual'}], 'main_script': 'Hebrew', 'read_direction': 'rtl', 'line_offset': 0, 'valid_block_types': [{'pk': 263, 'name': 'Arabic'}, {'pk': 2, 'name': 'Main'}, {'pk': 268, 'name': 'Oblique_135'}, {'pk': 269, 'name': 'Oblique_225'}, {'pk': 613, 'name': 'Oblique_315'}, {'pk': 267, 'name': 'Oblique_45'}, {'pk': 264, 'name': 'Upside_Down'}, {'pk': 612, 'name': 'Vertical_Bottom_Up_90'}, {'pk': 611, 'name': 'Vertical_Top_Down_270'}], 'valid_line_types': [{'pk': 3, 'name': 'Correction'}], 'parts_count': 1693, 'tags': [], 'created_at': '2021-04-13T10:03:05.047075Z', 'updated_at': '2021-11-16T22:34:03.778208Z'}
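Each document is returned as a plain dictionary, so the usual Python tools apply. As a small illustration, the sketch below looks up the pk of a transcription layer by its name, using a trimmed copy of the dictionary printed above. The helper `find_transcription_pk` is hypothetical and not part of the connector.

```python
def find_transcription_pk(document, name):
    """Return the pk of the transcription layer called `name`, or None."""
    for transcription in document.get("transcriptions", []):
        if transcription["name"] == name:
            return transcription["pk"]
    return None


# Trimmed copy of the document dictionary printed above.
document = {
    "pk": 299,
    "name": "Geniza_complex",
    "transcriptions": [
        {"pk": 363, "name": "kraken:PGPsico1_19"},
        {"pk": 361, "name": "kraken:PGPsimple74_4"},
        {"pk": 360, "name": "manual"},
    ],
}
manual_pk = find_transcription_pk(document, "manual")
print(manual_pk)
```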


## Convenience

The connector provides several conveniences. Perhaps the most broadly useful is that it automatically follows the paging information in each response until the full dataset has been retrieved. Here, the request for document parts returns the complete list of parts in one call, so there is no need to check whether further pages of results are available.

In [9]:
# 18 is the pk of the document whose parts we want
document_parts = escr.get_document_parts(18)

In [10]:
print(len(document_parts))
selected_part = document_parts[1]
print(selected_part)

1670
{'pk': 5651, 'name': '', 'filename': 'PGPID_3020_MS-TS-00010-J-00004-00008-000-00002.jpg', 'title': 'Element 2', 'typology': None, 'image': {'uri': '/media/documents/18/PGPID_3020_MS-TS-00010-J-00004-00008-000-00002.jpg', 'size': [2000, 1846], 'thumbnails': {'card': '/media/documents/18/PGPID_3020_MS-TS-00010-J-00004-00008-000-00002.jpg.180x180_q85_crop-smart.jpg', 'large': '/media/documents/18/PGPID_3020_MS-TS-00010-J-00004-00008-000-00002.jpg.1000x1000_q85.jpg'}}, 'image_file_size': 477991, 'bw_image': None, 'workflow': {'convert': 'done', 'segment': 'done', 'transcribe': 'done'}, 'order': 1, 'recoverable': False, 'transcription_progress': 100, 'source': ''}
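To make the paging convenience concrete, here is a rough sketch of what following the paging information involves. It assumes the API returns pages shaped like `{"results": [...], "next": <url or None>}`, which is an assumption about the response format rather than something shown in this notebook, and it is not the connector's actual implementation. `collect_all_pages` and the fake pages are purely illustrative.

```python
def collect_all_pages(fetch_page, first_url):
    """Follow 'next' links until exhausted and return all results in one list."""
    results = []
    url = first_url
    while url is not None:
        page = fetch_page(url)  # expected to return a dict with 'results' and 'next'
        results.extend(page["results"])
        url = page.get("next")
    return results


# Stand-in for HTTP requests: two fake pages keyed by URL.
fake_pages = {
    "page1": {"results": [{"pk": 1}, {"pk": 2}], "next": "page2"},
    "page2": {"results": [{"pk": 3}], "next": None},
}
all_parts = collect_all_pages(fake_pages.__getitem__, "page1")
print(len(all_parts))
```

With the connector, none of this bookkeeping is needed: a single call to `get_document_parts` already returns the complete list.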


## Downloading Files

The connector makes it easy to download and upload images. Exported XML files, such as PAGE XML transcriptions, can be downloaded as well.

In [None]:
image = escr.get_image(selected_part["image"]["uri"])

In [None]:
from IPython.display import Image, display
display(Image(image))
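If you want to keep the image rather than just display it, the bytes can be written to disk. The sketch below assumes that `get_image` returns the raw image bytes, as the `Image(image)` call above suggests. `save_image` is a hypothetical helper, and the three placeholder bytes stand in for a real download.

```python
import tempfile
from pathlib import Path


def save_image(image_bytes, filename, directory):
    """Write image bytes into `directory` and return the resulting path."""
    target = Path(directory) / Path(filename).name  # drop any directory components
    target.write_bytes(image_bytes)
    return target


with tempfile.TemporaryDirectory() as tmp:
    saved = save_image(b"\xff\xd8\xff", "example.jpg", tmp)
    saved_size = saved.stat().st_size
    print(saved.name, saved_size)
```

In the notebook, the call might look like `save_image(image, selected_part["filename"], "downloads")`, provided the `downloads` directory already exists.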

In [None]:
import zipfile

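transcriptions = escr.get_document_transcriptions(document_pk)
# export the first four parts; the part pks should belong to the same document as document_pk
part_pks = [x["pk"] for x in document_parts[0:4]]
page_xmls_zipped = escr.download_part_pagexml_transcription(document_pk, part_pks, transcriptions[0]["pk"])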

# the connector returns the bytes of a zip archive, so we unzip it in memory
with zipfile.ZipFile(io.BytesIO(page_xmls_zipped)) as archive:
    page_xmls = [archive.read(x) for x in archive.infolist()]
print(len(page_xmls))
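A plain list loses track of which XML belongs to which page. A small variation, sketched here with a hypothetical helper `unzip_to_dict`, keeps the archive's filenames as dictionary keys. The in-memory archive below only stands in for the bytes returned by the connector.

```python
import io
import zipfile


def unzip_to_dict(zip_bytes):
    """Map each archive member's filename to its raw bytes."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        return {
            info.filename: archive.read(info)
            for info in archive.infolist()
            if not info.is_dir()
        }


# Build a small in-memory archive to stand in for the connector's response.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("page_1.xml", "<PcGts/>")
    archive.writestr("page_2.xml", "<PcGts/>")
files = unzip_to_dict(buffer.getvalue())
print(sorted(files))
```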

In [None]:
import xml.dom.minidom as md

# Let's see what it looks like:
for page_xml in page_xmls:
    dom = md.parse(io.BytesIO(page_xml))
    pretty_xml = dom.toprettyxml()
    # toprettyxml() can emit whitespace-only lines; drop them:
    pretty_xml = os.linesep.join([s for s in pretty_xml.splitlines() if s.strip()])
    print(pretty_xml)
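Beyond pretty-printing, you will often want just the transcribed text. The following sketch pulls the text of each line out of a PAGE XML file with the standard library. It assumes each `TextLine` carries its text in a line-level `TextEquiv/Unicode` element and ignores namespaces so that it works across PAGE schema versions. The sample document and the helper names are illustrative, not taken from an actual export.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def extract_line_texts(page_xml_bytes):
    """Return the Unicode text of each TextLine, in document order."""
    root = ET.fromstring(page_xml_bytes)
    texts = []
    for element in root.iter():
        if local_name(element.tag) != "TextLine":
            continue
        # Take the first Unicode element found beneath this line, if any.
        unicode_el = next(
            (child for child in element.iter() if local_name(child.tag) == "Unicode"),
            None,
        )
        texts.append((unicode_el.text or "") if unicode_el is not None else "")
    return texts


sample = b"""<?xml version="1.0" encoding="UTF-8"?>
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2019-07-15">
  <Page imageFilename="example.jpg">
    <TextRegion id="r1">
      <TextLine id="l1"><TextEquiv><Unicode>first line</Unicode></TextEquiv></TextLine>
      <TextLine id="l2"><TextEquiv><Unicode>second line</Unicode></TextEquiv></TextLine>
    </TextRegion>
  </Page>
</PcGts>"""
print(extract_line_texts(sample))
```

Applied to the notebook's data, this would be `extract_line_texts(page_xmls[0])`. Files that also contain word-level `Unicode` elements would need a stricter lookup than the one sketched here.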