# Login Session


To access protected resources, we first log in by sending a POST request with the account credentials to the server's login endpoint. We then check the response status code; 200 indicates a successful login. On success, we extract the session ID (the `JSESSIONID` cookie) from the response cookies. Later requests must send this ID so that the server treats them as authenticated.

In [72]:

import requests

# login credentials
login_data={'user': 'tyiesha.r.b@gmail.com','pw': ''}

# Make the POST request to log in
response = requests.post('https://transkribus.eu/TrpServer/rest/auth/login', data=login_data)

# Check the response status
if response.status_code == 200:
    # Extract the session ID from the response cookies
    session_id = response.cookies.get('JSESSIONID')
    print('Login successful!')
else:
    print('Failed to log in. Status code:', response.status_code)
    print('Response:', response.text)




Login successful!
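
Every cell in this notebook repeats the login and then passes the session ID by hand. As a minimal sketch of an alternative, a `requests.Session` could perform the login once and resend the cookies it received on later requests. The helper name `open_session` and its `session` argument are illustrative and not part of the Transkribus API, and the sketch assumes the server accepts the cookie in the same way as in the cells of this notebook.

```python
import requests

LOGIN_URL = 'https://transkribus.eu/TrpServer/rest/auth/login'


def open_session(user, password, session=None):
    """Log in once and return a session that keeps the login cookies.

    `open_session` is a hypothetical helper written for this notebook.
    """
    if session is None:
        session = requests.Session()
    response = session.post(LOGIN_URL, data={'user': user, 'pw': password})
    # Raise requests.HTTPError on a 4xx/5xx status instead of printing it
    response.raise_for_status()
    return session
```

With a session returned by `open_session`, a later call such as `session.get(collection_url)` would send the stored cookies without an explicit `Cookie` header.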


# Collection List

This cell repeats the login step: a POST request to the login endpoint authenticates with the server, and on success the session ID is extracted from the response cookies. The script then builds the API URL for the collection from its ID and sends a GET request, with the session ID in the `Cookie` header, to list the documents in that collection. If the response status is 200, the JSON body is parsed into a list of document metadata and printed.

In [71]:
import requests

#login credentials
login_data = {'user': 'tyiesha.r.b@gmail.com', 'pw': ''}

# collection ID
collection_id = '285162'

# POST request to log in
login_response = requests.post('https://transkribus.eu/TrpServer/rest/auth/login', data=login_data)

# Check if login was successful
if login_response.status_code == 200:
    print('Login successful!')

    # Extract the session ID from the login response
    session_id = login_response.cookies.get('JSESSIONID')

    # URL to get the list of documents in the collection
    collection_url = f'https://transkribus.eu/TrpServer/rest/collections/{collection_id}/list'

    # GET request to get the list of documents in the collection
    collection_response = requests.get(collection_url, headers={'Cookie': f'JSESSIONID={session_id}'})

    # Check if getting the collection was successful
    if collection_response.status_code == 200:
        documents = collection_response.json()
        print('Documents in Collection:')
        print(documents)
    else:
        print('Failed to get the list of documents in the collection. Status code:', collection_response.status_code)
        print('Response:', collection_response.text)
else:
    print('Failed to log in. Status code:', login_response.status_code)
    print('Response:', login_response.text)



Login successful!
Documents in Collection:
[{'type': 'trpDocMetadata', 'docId': 1858821, 'title': 'Germknödel (Sample Document)', 'uploadTimestamp': 1709495586880, 'uploader': 'tyiesha.r.b@gmail.com', 'uploaderId': 223639, 'nrOfPages': 1, 'pageId': 65999920, 'url': 'https://files.transkribus.eu/Get?fileType=view&id=DAEQZNPMDNXLWKBWOHOHPWDD', 'thumbUrl': 'https://files.transkribus.eu/Get?fileType=thumb&id=DAEQZNPMDNXLWKBWOHOHPWDD', 'status': 0, 'fimgStoreColl': 'TrpDoc_DEA_889295', 'origDocId': 889295, 'collectionList': {'colList': [{'colId': 285162, 'colName': 'tyiesha.r.b@gmail.com Collection', 'description': 'tyiesha.r.b@gmail.com', 'crowdsourcing': False, 'elearning': False, 'nrOfDocuments': 0}]}, 'attributes': [], 'mainColId': 285162, 'isInMain': True}, {'type': 'trpDocMetadata', 'docId': 1865455, 'title': 'manuscript', 'uploadTimestamp': 1709752965775, 'uploader': 'tyiesha.r.b@gmail.com', 'uploaderId': 223639, 'nrOfPages': 1, 'pageId': 66206911, 'url': 'https://files.transkribus.e
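
The raw list is hard to read when printed in full. As a small sketch, the parsed JSON could be reduced to the fields needed in later cells. The helper name `summarize_documents` is hypothetical, and the sample entries are shortened to three fields taken from the output above.

```python
def summarize_documents(documents):
    """Return (docId, title, nrOfPages) for each document entry."""
    return [(doc['docId'], doc['title'], doc['nrOfPages']) for doc in documents]


# Shortened sample using values from the output above
sample_documents = [
    {'docId': 1858821, 'title': 'Germknödel (Sample Document)', 'nrOfPages': 1},
    {'docId': 1865455, 'title': 'manuscript', 'nrOfPages': 1},
]

for doc_id, title, pages in summarize_documents(sample_documents):
    print(f'{doc_id}: {title} ({pages} page(s))')
```

The `docId` values are the ones used as `document_id` in the cells that follow.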

# Return Full Document Data

To return the full document data, we first define the login credentials and the collection and document IDs. A POST request to the server's login endpoint authenticates, and on success the session ID is extracted from the response cookies. With this session ID in the `Cookie` header, a GET request retrieves the full document identified by the collection and document IDs. If the retrieval succeeds, the JSON response is parsed and printed; it contains the document metadata (`md`) and the page list (`pageList`).

In [None]:
import requests

# login credentials
login_data = {'user': 'tyiesha.r.b@gmail.com', 'pw': ''}

# Define the collection ID and document ID
collection_id = '285162'
document_id = '1865455'

# POST request to log in
login_response = requests.post('https://transkribus.eu/TrpServer/rest/auth/login', data=login_data)
if login_response.status_code == 200:
    session_id = login_response.cookies.get('JSESSIONID')

    # Make the GET request to retrieve the full document
    full_document_url = f'https://transkribus.eu/TrpServer/rest/collections/{collection_id}/{document_id}/fulldoc'
    response = requests.get(full_document_url, headers={'Cookie': f'JSESSIONID={session_id}'})

    # Check if getting the document data was successful
    if response.status_code == 200:
        full_document_data = response.json()
        print('Full Document Data:')
        print(full_document_data)
    else:
        print('Failed to retrieve full document. Status code:', response.status_code)
        print('Response:', response.text)
else:
    print('Failed to log in. Status code:', login_response.status_code)
    print('Response:', login_response.text)



Full Document Data:
{'md': {'nrOfRegions': 3, 'nrOfTranscribedRegions': 0, 'nrOfWordsInRegions': 0, 'nrOfLines': 38, 'nrOfTranscribedLines': 37, 'nrOfWordsInLines': 297, 'nrOfWords': 296, 'nrOfTranscribedWords': 296, 'nrOfCharsInLines': 1177, 'nrOfNew': 0, 'nrOfInProgress': 1, 'nrOfDone': 0, 'nrOfFinal': 0, 'nrOfGT': 0, 'docId': 1865455, 'title': 'manuscript', 'uploadTimestamp': 1709752965775, 'uploader': 'tyiesha.r.b@gmail.com', 'uploaderId': 223639, 'nrOfPages': 1, 'pageId': 66206911, 'url': 'https://files.transkribus.eu/Get?fileType=view&id=OFUWSLEXLXBDIEIJCDJWVCRX', 'thumbUrl': 'https://files.transkribus.eu/Get?fileType=thumb&id=OFUWSLEXLXBDIEIJCDJWVCRX', 'status': 0, 'fimgStoreColl': 'TrpDoc_DEA_1865455', 'origDocId': 0, 'collectionList': {'colList': [{'colId': 285162, 'colName': 'tyiesha.r.b@gmail.com Collection', 'description': 'tyiesha.r.b@gmail.com', 'crowdsourcing': False, 'elearning': False, 'nrOfDocuments': 0}]}, 'attributes': [], 'mainColId': 285162}, 'pageList': {'pages':
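
The `md` block carries counters and a timestamp that can be turned into readable values. The sketch below assumes that `uploadTimestamp` is given in milliseconds since the Unix epoch, which is consistent with the dates shown elsewhere in this notebook but is not confirmed here. The helper names are hypothetical, and the sample dictionary holds three values copied from the output above.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def upload_time(md):
    """Convert `uploadTimestamp` to a UTC datetime (assumes epoch milliseconds)."""
    return EPOCH + timedelta(milliseconds=md['uploadTimestamp'])


def transcribed_line_ratio(md):
    """Fraction of lines in the document that have a transcription."""
    if md['nrOfLines'] == 0:
        return 0.0
    return md['nrOfTranscribedLines'] / md['nrOfLines']


# Values copied from the 'md' block in the output above
sample_md = {'uploadTimestamp': 1709752965775, 'nrOfLines': 38, 'nrOfTranscribedLines': 37}

print(upload_time(sample_md).isoformat())
print(f'{transcribed_line_ratio(sample_md):.1%}')
```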

# Job List

To retrieve the job list, we send a POST request to the login endpoint with the login credentials. On a successful login, the session ID is extracted from the response cookies. A GET request to the jobs endpoint then retrieves the list of jobs, with the session ID included in the request headers for authentication. If the response status code is 200, the JSON data containing the list of jobs is parsed and displayed.

In [None]:
import requests

# login credentials
login_data = {'user': 'tyiesha.r.b@gmail.com', 'pw': ''}


# POST request to log in
login_response = requests.post('https://transkribus.eu/TrpServer/rest/auth/login', data=login_data)
if login_response.status_code == 200:
    session_id = login_response.cookies.get('JSESSIONID')

    # GET request to retrieve the list of jobs
    jobs_list_url = 'https://transkribus.eu/TrpServer/rest/jobs/list'
    response = requests.get(jobs_list_url, headers={'Cookie': f'JSESSIONID={session_id}'})

    # Check if Job list returns successfully and print the output
    if response.status_code == 200:
        jobs_list = response.json()
        print('Jobs List:')
        print(jobs_list)
    else:
        print('Failed to retrieve jobs list. Status code:', response.status_code)
        print('Response:', response.text)
else:
    print('Failed to log in. Status code:', login_response.status_code)
    print('Response:', login_response.text)



Jobs List:
[{'jobId': '8282766', 'docId': 1867036, 'pageNr': -1, 'type': 'Create Document', 'state': 'FINISHED', 'success': True, 'description': 'Done, duration: 3s 504ms', 'userName': 'tyiesha.r.b@gmail.com', 'userId': 223639, 'createTime': 1709835403058, 'startTime': 1709835403401, 'endTime': 1709835406905, 'jobData': '#Thu Mar 07 19:16:43 CET 2024\ncolId=285162\n', 'resumable': False, 'jobImpl': 'UploadImportJob', 'moduleUrl': 'http://dea-bl04:8080/UtilityModule-trpProd-2.12.0', 'moduleName': 'UtilityModule', 'moduleVersion': '2.12.0', 'started': '2024-03-07T19:16:43.401+01:00', 'ended': '2024-03-07T19:16:46.905+01:00', 'created': '2024-03-07T19:16:43.058+01:00', 'batchId': 0, 'pageid': 0, 'tsid': 0, 'parent_jobid': 0, 'parent_batchid': 0, 'colId': 285162, 'progress': 1, 'totalWork': 1, 'nrOfErrors': 0, 'docTitle': 'default', 'priority': 0}, {'jobId': '8282681', 'docId': 1866954, 'pageNr': -1, 'type': 'Create Document', 'state': 'FINISHED', 'success': True, 'description': 'Done, dur
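
A long job list is easier to scan when it is summarized. As a sketch, the jobs could be counted by their `state` field and checked for runs that finished without success. Only the state `FINISHED` appears in the output above, so any other state names are not assumed here. The helper names are hypothetical, and the sample entries are shortened to fields visible in the output.

```python
from collections import Counter


def count_jobs_by_state(jobs):
    """Count jobs per value of their `state` field."""
    return Counter(job['state'] for job in jobs)


def unsuccessful_jobs(jobs):
    """Return finished jobs whose `success` flag is false."""
    return [job for job in jobs if job['state'] == 'FINISHED' and not job['success']]


# Shortened sample using values from the output above
sample_jobs = [
    {'jobId': '8282766', 'type': 'Create Document', 'state': 'FINISHED', 'success': True},
    {'jobId': '8282681', 'type': 'Create Document', 'state': 'FINISHED', 'success': True},
]

print(count_jobs_by_state(sample_jobs))
print(unsuccessful_jobs(sample_jobs))
```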

# Job Details


A POST request with the credentials is sent to the login endpoint; a 200 status code indicates successful authentication. On a successful login, the session ID is extracted from the response cookies. The URL for the job details is built by appending the job ID to the jobs endpoint, and a GET request to that URL returns the details of that job. Including the session ID in the request headers authenticates the request.

In [None]:
import requests

# login credentials
login_data = {'user': 'tyiesha.r.b@gmail.com', 'pw': ''}


# POST request to log in
login_response = requests.post('https://transkribus.eu/TrpServer/rest/auth/login', data=login_data)
if login_response.status_code == 200:
    session_id = login_response.cookies.get('JSESSIONID')

    # GET details of a specific job
    job_id = '8273703'
    job_details_url = f'https://transkribus.eu/TrpServer/rest/jobs/{job_id}'
    job_details_response = requests.get(job_details_url, headers={'Cookie': f'JSESSIONID={session_id}'})

    if job_details_response.status_code == 200:
        job_details = job_details_response.json()
        print('Job Details:')
        print(job_details)
    else:
        print('Failed to retrieve job details. Status code:', job_details_response.status_code)
        print('Response:', job_details_response.text)

else:
    print('Failed to log in. Status code:', login_response.status_code)
    print('Response:', login_response.text)



Job Details:
{'jobId': '8273703', 'docId': 1865455, 'pageNr': -1, 'pages': '1', 'type': 'PyLaia Decoding', 'state': 'FINISHED', 'success': True, 'description': 'Done, duration: 40s 885ms', 'userName': 'tyiesha.r.b@gmail.com', 'userId': 223639, 'createTime': 1709754289538, 'startTime': 1709754293167, 'endTime': 1709754334052, 'jobData': '#Wed Mar 06 20:44:53 CET 2024\ndoNotDeleteWorkDir=false\nwriteKwsIndex=false\ndoLinePolygonSimplification=true\nkeepOriginalLinePolygons=false\nmodelId=53042\nwriteLineConfScore=false\nwriteWordConfScores=false\nb2pBackend=Legacy\nuserRoles=fields-recognition,super-models,User,named-entity-recognition,smart-search,tables-recognition,collaboration-tools,export\nclearLines=false\nisNextGen=false\nuserEmail=tyiesha.r.b@gmail.com\ndoWordSeg=true\nworkDir=/tmp/HTR/PyLaia/trpProd/Decode/pylaiaDecode_8273703\nbatchSize=10\nuseExistingLinePolygons=false\nnBest=1\n', 'resumable': False, 'jobImpl': 'PyLaiaDecodingJob', 'moduleUrl': 'http://srv6113:8080/PyLaiaModu
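
The `jobData` field in the output is a single string of `key=value` lines with a leading `#` comment line. A minimal sketch for splitting it into a dictionary follows; it handles only this simple layout and no escape sequences. The helper names are hypothetical, the timestamps are assumed to be epoch milliseconds, and the sample keeps only a few of the keys shown above.

```python
def parse_job_data(job_data):
    """Parse the `jobData` string into a dict, skipping '#' comment lines."""
    properties = {}
    for line in job_data.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        key, _, value = line.partition('=')
        properties[key] = value
    return properties


def job_duration_ms(job):
    """Run time in milliseconds (assumes epoch-millisecond timestamps)."""
    return job['endTime'] - job['startTime']


# Shortened sample using values from the output above
sample_job = {
    'startTime': 1709754293167,
    'endTime': 1709754334052,
    'jobData': '#Wed Mar 06 20:44:53 CET 2024\nmodelId=53042\nbatchSize=10\nnBest=1\n',
}

print(parse_job_data(sample_job['jobData'])['modelId'])
print(job_duration_ms(sample_job))
```

For this sample the computed duration is 40885 ms, which agrees with the `description` field in the output ('Done, duration: 40s 885ms').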

# Job Errors

As before, the session ID is extracted after a successful login and used for subsequent requests. A GET request then retrieves the errors associated with a specific job: the request URL is built from the job ID, and the session ID in the request headers authenticates the request. In the output below, `total` is 0, which suggests that no errors were recorded for this job.

In [None]:
import requests

# login credentials
login_data = {'user': 'tyiesha.r.b@gmail.com', 'pw': ''}


# POST request to log in
login_response = requests.post('https://transkribus.eu/TrpServer/rest/auth/login', data=login_data)
if login_response.status_code == 200:
    session_id = login_response.cookies.get('JSESSIONID')

    # Retrieve details of errors for a specific job
    job_id = '8273703'
    job_errors_url = f'https://transkribus.eu/TrpServer/rest/jobs/{job_id}/errors'
    job_errors_response = requests.get(job_errors_url, headers={'Cookie': f'JSESSIONID={session_id}'})

    if job_errors_response.status_code == 200:
        job_errors = job_errors_response.json()
        print('Job Errors:')
        print(job_errors)
    else:
        print('Failed to retrieve job errors. Status code:', job_errors_response.status_code)
        print('Response:', job_errors_response.text)

else:
    print('Failed to log in. Status code:', login_response.status_code)
    print('Response:', login_response.text)




Job Errors:
{'type': 'jobErrorList', 'total': 0, 'index': 0, 'nValues': -1}


## Upload Image

A POST request authenticates the user with the login credentials. If it succeeds, the session ID is extracted for subsequent requests. A GET request then retrieves the list of collections and, if successful, prints them. Next, the metadata for the document is defined (title, author, genre, writer, and the list of page file names), and a POST request creates an upload in the collection given by `collId`. The script checks the status code of each response and prints a message indicating success or failure.

In [None]:
import requests

# login
login_data = {'user': 'tyiesha.r.b@gmail.com', 'pw': ''}
login_response = requests.post('https://transkribus.eu/TrpServer/rest/auth/login', data=login_data)

# Check if authentication was successful
if login_response.status_code == 200:
    session_id = login_response.cookies.get('JSESSIONID')
    # The session ID is passed to later requests as a JSESSIONID parameter
    session_params = {'JSESSIONID': session_id}
    print("Authentication successful!")
else:
    print("Failed to authenticate. Status code:", login_response.status_code)
    print("Response:", login_response.text)
    # Stop here: the requests below need session_params
    raise SystemExit("Login failed")
# Get list of collections
collections_response = requests.get('https://transkribus.eu/TrpServer/rest/collections/list', params=session_params)

# Check if retrieving collections was successful
if collections_response.status_code == 200:
    print("Collections retrieved successfully!")
    print(collections_response.text)
else:
    print("Failed to retrieve collections. Status code:", collections_response.status_code)
    print("Response:", collections_response.text)
    # Exit the script or handle the error as needed

# Define metadata for the document
metadata = {
    'md': {
        'title': 'default',
        'author': 'default',
        'genre': 'default',
        'writer': 'default'
    },
    'pageList': {
        'pages': [
            {
                'fileName': 'default.jpg',
                'pageNr': 1
            }
        ]
    }
}
# Upload the document
upload_response = requests.post('https://transkribus.eu/TrpServer/rest/uploads?collId=285162', params=session_params, json=metadata)

# Check if upload was successful
if upload_response.status_code == 200:
    print("Upload process started successfully!")
    print(upload_response.text)
else:
    print("Failed to start upload process. Status code:", upload_response.status_code)
    print("Response:", upload_response.text)






Authentication successful!
Collections retrieved successfully!
[{"type":"trpCollection","colId":285162,"colName":"tyiesha.r.b@gmail.com Collection","description":"tyiesha.r.b@gmail.com","created":"2024-03-03T20:53:06.139+01:00","crowdsourcing":false,"elearning":false,"pageId":66001375,"url":"https://files.transkribus.eu/Get?fileType=view&id=HSRQNWRELKZQUMHJTHHMCUYE","thumbUrl":"https://files.transkribus.eu/Get?fileType=thumb&id=HSRQNWRELKZQUMHJTHHMCUYE","nrOfDocuments":5,"role":"Owner","accountingStatus":1}]
Upload process started successfully!
<?xml version="1.0" encoding="UTF-8" standalone="yes"?><trpUpload><md><docId>-1</docId><title>default</title><author>default</author><uploadTimestamp>0</uploadTimestamp><genre>default</genre><writer>default</writer><uploaderId>0</uploaderId><nrOfPages>0</nrOfPages><collectionList/></md><pageList><pages><fileName>default.jpg</fileName><pageUploaded>false</pageUploaded><pageNr>1</pageNr></pages></pageList><uploadId>1877199</uploadId><created>2024-
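
The upload response is XML and contains the `uploadId` that the image upload needs. As a sketch, the ID could be read with the standard library instead of being copied by hand. The helper name `extract_upload_id` is hypothetical, and the sample string is a shortened, hand-closed version built from elements visible in the output above, since the printed response is cut off.

```python
import xml.etree.ElementTree as ET


def extract_upload_id(xml_text):
    """Return the integer in the <uploadId> element of a trpUpload response."""
    root = ET.fromstring(xml_text)
    upload_id = root.findtext('uploadId')
    if upload_id is None:
        raise ValueError('no <uploadId> element in response')
    return int(upload_id)


# Shortened, hand-closed sample built from elements visible in the output above
sample_response = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<trpUpload><md><title>default</title></md>'
    '<uploadId>1877199</uploadId></trpUpload>'
)

print(extract_upload_id(sample_response))
```

In the notebook, `extract_upload_id(upload_response.text)` would give the value to place at the end of the `rest/uploads/` URL in the image upload request.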

The next cell uploads the image file that belongs to the document. A PUT request sends the image to the upload identified by its upload ID, which is returned as `uploadId` in the response to the previous upload request; in the code below the ID is written directly into the URL. The session ID is included in the request parameters (`session_params`) for authentication, and the `files` parameter carries the image file. The script checks the response status code; if the upload succeeded, it prints the server's response.

In [None]:
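# Upload the image
# Note: the ID at the end of the URL must be the uploadId returned by the previous upload request
with open('default.jpg', 'rb') as image_file:
    image = {'img': image_file}
    image_upload_response = requests.put('https://transkribus.eu/TrpServer/rest/uploads/1867036', files=image, params=session_params)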

# Check if image upload was successful
if image_upload_response.status_code == 200:
    print("Image uploaded successfully!")
    print(image_upload_response.text)
else:
    print("Failed to upload image. Status code:", image_upload_response.status_code)
    print("Response:", image_upload_response.text)

# Transcribe (xml)

A POST request authenticates the user with the login credentials. On success, the session ID is extracted from the response cookies. The script then defines the collection ID, document ID, and page number and builds the URL of the transcript endpoint from them. A GET request is sent to this endpoint, with the session ID passed as a cookie for authentication. If the request succeeds (status code 200), the response body, which contains the transcript as PAGE XML, is printed.

In [53]:
import requests

# Login
login_data = {'user': 'tyiesha.r.b@gmail.com', 'pw': ''}
login_response = requests.post('https://transkribus.eu/TrpServer/rest/auth/login', data=login_data)

if login_response.status_code == 200:
    # Extract the session ID from the response
    session_id = login_response.cookies.get('JSESSIONID')

    # Define the collection ID, document ID, page number
    coll_id = '285162'
    doc_id = '1865455'
    page_number = '1'

    # Define the URL for the transcript endpoint without the transcript ID
    transcript_url = f'https://transkribus.eu/TrpServer/rest/collections/{coll_id}/{doc_id}/{page_number}/text'

    # Send a GET request to the transcript endpoint
    response = requests.get(transcript_url, cookies={'JSESSIONID': session_id})

    # Check if the request was successful
    if response.status_code == 200:
        # Extract the transcript from the response
        transcript = response.text
        print("Transcript:")
        print(transcript)
    else:
        print(f"Failed to retrieve transcript. Status code: {response.status_code}")

else:
    print(f"Authentication failed. Status code: {login_response.status_code}")



Transcript:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15 http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15/pagecontent.xsd">
    <Metadata>
        <Creator>prov=READ-COOP:name=PyLaia@TranskribusPlatform:version=2.12.0:model_id=53042:lm=none:date=06_03_2024_20:45</Creator>
        <Created>2024-03-06T20:22:45.750+01:00</Created>
        <LastChange>2024-03-06T20:53:17.510+01:00</LastChange>
        <TranskribusMetadata docId="1865455" pageId="66206911" pageNr="1" tsid="150640457" status="NEW" userId="223639" imgUrl="https://files.transkribus.eu/Get?id=OFUWSLEXLXBDIEIJCDJWVCRX&amp;fileType=view" xmlUrl="https://files.transkribus.eu/Get?id=OPIVRFMGJYXVWYXEVAOCTVYI" imageId="52926126"/>
    </Metadata>
    <Page imageFilename="66206911.jpg" imageWidth