-------------------------------------------------------------------------------
### Document Translation (Asynchronous) using SAP Translation Hub
-------------------------------------------------------------------------------



---------------------------------------------------------------------------
What this script does
1) Loads OAuth credentials and the API base URL from a .env file.
2) Fetches an access token via the client-credentials flow.
3) Creates an async job for document translation.
4) Polls for the job status.
5) Lists the existing jobs.
6) Fetches the translated result using the Results API.
7) Deletes a given job at any point in time.
--------------------------------------------------------------------------------------



--------------------------------------------------------------------------------------------------
Environment variables expected in .env
- AUTH_URL          : OAuth token URL from your service key (…/oauth/token)
- CLIENT_ID         : clientid from service key
- CLIENT_SECRET     : clientsecret from service key
- DOCTRANS_BASE     : e.g. https://document-translation.api.<region>.translationhub.cloud.sap
-------------------------------------------------------------------------------------------------


A) Asynchronous (for larger docs)


Good for longer files that exceed the character/size limits of synchronous translation.

Prerequisites

- Python 3.10+

- Libraries given below
    - pip install python-dotenv requests python-docx

- .env file in the working directory with required values
- input file or text you want to translate

In [1]:
import base64
import mimetypes
import os
import time
from io import BytesIO
from pathlib import Path
from urllib.parse import urlencode

import requests
from docx import Document
from dotenv import load_dotenv

# Load variables from .env in the current directory
load_dotenv()

# Input document; the path is built from parts so it also works outside Windows
in_path = Path("sample_doc") / "SAP_BTP_Translation_Hub_DE_long_.pdf"
mime, _ = mimetypes.guess_type(str(in_path))

In [2]:
# Check that all required environment variables are set
def require_env(name: str) -> str:
    v = os.getenv(name)
    if not v:
        raise RuntimeError(f"Missing env: {name}")
    return v

# access from .env in current directory
auth_url     = require_env("AUTH_URL")
client_id     = require_env("CLIENT_ID")
client_secret = require_env("CLIENT_SECRET")
translation_base_url = require_env("DOCTRANS_BASE")

In [44]:
# Fetch an access token via the client-credentials grant
def get_token() -> str:
    payload = {"grant_type": "client_credentials"}
    headers = {"Content-Type": "application/x-www-form-urlencoded"}

    # POST to the token endpoint; client ID and secret are sent as HTTP Basic auth
    response = requests.post(
        auth_url,
        data=payload,
        headers=headers,
        auth=(client_id, client_secret),
        timeout=30,
    )

    # Fail loudly instead of falling through to an undefined access_token
    if response.status_code != 200:
        raise RuntimeError(
            f"Failed to obtain access token: {response.status_code} - {response.text}"
        )

    access_token = response.json().get("access_token")
    if not access_token:
        raise RuntimeError("Token response did not contain an access_token")
    print("Access token obtained successfully.")
    return access_token

In [45]:
# Get the access token by calling get_token()
access_token = get_token()

Access token obtained successfully.
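
Access tokens expire, and a long-running async job can outlive the token that started it. The following is a minimal sketch of a cache that re-fetches the token shortly before expiry. `TokenCache`, `fetch`, `skew_seconds`, and `clock` are hypothetical names, not part of this notebook or of any SAP library, and the sketch assumes the token response carries the standard OAuth 2.0 `expires_in` field (in seconds), which this notebook does not show.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Caches a bearer token and re-fetches it shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], dict],
        skew_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch = fetch          # returns the parsed token response (a dict)
        self._skew = skew_seconds    # refresh this many seconds before expiry
        self._clock = clock          # injectable so the cache can be tested
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            body = self._fetch()
            self._token = body["access_token"]
            # Assumption: "expires_in" is given in seconds. If it is missing,
            # the lifetime is treated as 0 and every call fetches a new token.
            lifetime = float(body.get("expires_in", 0))
            self._expires_at = now + max(lifetime - self._skew, 0.0)
        return self._token
```

In this notebook, `fetch` could be a small function that performs the same `requests.post` call as `get_token()` and returns `response.json()`; each request would then use `cache.get()` instead of the module-level `access_token`.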


### 1. Create an Async Job (multipart upload)
Starts a long-running translation by uploading your document and its parameters (such as sourceLanguage, targetLanguage, and model) in one request. The service returns a job ID that you use later for status checks and for downloading the result.



The API accepts the request in three forms: as multipart/form-data (“formData”), as the raw content to translate sent with its actual Content-Type (“RAW”), or wrapped in a JSON object. This notebook uses the form-data approach because a single multipart request carries the binary file together with the parameter fields.

In [46]:
def create_job(file_path: Path, source="de-DE", target="en-US", model="default") -> str:
    mime, _ = mimetypes.guess_type(str(file_path))
    with file_path.open("rb") as f:
        files = {"file": (file_path.name, f, mime or "application/octet-stream")}
        r = requests.post(
            f"{translation_base_url}/api/v1/translation/jobs",
            headers={
                "Authorization": f"Bearer {access_token}",
                "Accept": "application/json",   
            },
            data={
                "sourceLanguage": source,
                "targetLanguage": target,   
                "model": model,
            },
            files=files,   # <-- multipart; requests sets Content-Type
            timeout=120,
        )
    r.raise_for_status()
    job_id = r.json().get("id") or r.json().get("jobId")
    if not job_id:
        raise RuntimeError(f"Job ID not found in response: {r.text}")
    return job_id


In [47]:
# 1) create job (multipart)
job_id = create_job(in_path, source="de-DE", target="en-US", model="default")
print("Created job:", job_id)

Created job: a1cdab55-0a2d-4a18-bfe2-3a1aab725420


### 2. Get Job Status (polling until complete)

Retrieves the latest state of your job (e.g., pending, in progress, completed, or failed). You call this repeatedly until the job is ready, then move on to downloading the result.


Status of jobs
- PENDING
- ERROR
- DONE

The “enableRedirect” query parameter controls whether a status request redirects to the result once the file translation has completed:
- Omitted or set to true: a status request for a job with status “DONE” returns HTTP 303, with the Location header containing the URL of the result (/api/v1/translation/jobs/{jobid}/result).
- Present and set to false: a status request for a job with status “DONE” returns HTTP 200 without a Location header.

This notebook sets it to false.


In [48]:
def wait_until_done(job_id: str) -> dict:
    """Fetch the current status of a job (one request; call again to re-check)."""
    # enableRedirect=false makes a DONE job answer with HTTP 200 instead of a 303 redirect
    params = {"enableRedirect": "false"}
    qs = "?" + urlencode(params)
    url = translation_base_url.rstrip("/") + "/api/v1/translation/jobs/" + job_id + qs

    r = requests.get(
        url,
        headers={"Authorization": f"Bearer {access_token}"},
        timeout=30,
    )
    r.raise_for_status()
    return r.json()


In [49]:
info = wait_until_done(job_id)
print("Final status:", info)


Final status: {'id': 'a1cdab55-0a2d-4a18-bfe2-3a1aab725420', 'progress': 0.0, 'status': 'PENDING', 'sourceFilename': 'SAP_BTP_Translation_Hub_DE_long_.pdf', 'sourceLanguage': 'de-DE', 'targetLanguage': 'en-US', 'sourceSize': 3891374, 'uploadedOn': '2025-09-15T12:13:31.131391300Z', 'model': 'default'}


In [50]:
info = wait_until_done(job_id)
print("Final status:", info)


Final status: {'id': 'a1cdab55-0a2d-4a18-bfe2-3a1aab725420', 'progress': 1.0, 'status': 'DONE', 'sourceFilename': 'SAP_BTP_Translation_Hub_DE_long_.pdf', 'sourceLanguage': 'de-DE', 'targetLanguage': 'en-US', 'sourceSize': 3891374, 'uploadedOn': '2025-09-15T12:13:31.131391300Z', 'model': 'default'}
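
The two cells above check the status by hand, once while the job is PENDING and once after it is DONE. A polling loop can automate that. This is a minimal sketch, not part of the original notebook: `poll_until_finished`, `check`, `interval`, `timeout`, `sleep`, and `clock` are hypothetical names, and it treats DONE and ERROR (from the status list above) as the only terminal states.

```python
import time
from typing import Callable

# Terminal states taken from the status list in this section
TERMINAL_STATES = {"DONE", "ERROR"}


def poll_until_finished(
    check: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Call check() until the job reaches a terminal state or the timeout passes."""
    deadline = clock() + timeout
    while True:
        info = check()
        if info.get("status") in TERMINAL_STATES:
            return info
        if clock() >= deadline:
            raise TimeoutError(
                f"Job still {info.get('status')!r} after {timeout} seconds"
            )
        # Wait before asking again so the service is not flooded with requests
        sleep(interval)
```

With the function defined in this section, a call could look like `info = poll_until_finished(lambda: wait_until_done(job_id))`. The caller still has to inspect `info["status"]`, because the loop also returns when the job ends in ERROR.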


### 3. List Your Jobs
Returns a list of all translation jobs created by the current user that are known to the system.


To use the user-specific APIs of SAP BTP Document Translation (such as listing translations or checking job history), you must authenticate with a user-specific JWT.
These JWTs can only be obtained via the OAuth 2.0 Authorization Code or Password Credentials flow; other token types won’t work.

For more details, see https://help.sap.com/docs/translation-hub/sap-translation-hub/user-specific-json-web-tokens

In [None]:
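def list_jobs(top=20, skip=0, include_count=True) -> dict:
    """List translation jobs, paged with $top and $skip.

    Per the note above, access_token must be a user-specific JWT here;
    the client-credentials token from get_token() is not sufficient.
    """
    params = {"$top": top, "$skip": skip}
    if include_count:
        params["$count"] = "true"
    r = requests.get(
        f"{translation_base_url}/api/v1/translation/jobs",
        headers={"Authorization": f"Bearer {access_token}", "Accept": "application/json"},
        params=params,
        timeout=60,
    )
    r.raise_for_status()
    return r.json()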
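
Because `list_jobs` returns one page at a time, walking the full job history means advancing `skip` until a short page comes back. The sketch below shows that loop in isolation. `iter_jobs` and `fetch_page` are hypothetical names, and since this notebook never shows the shape of the list response, the sketch assumes the caller supplies a `fetch_page(top, skip)` adapter that returns the jobs of one page as a plain list.

```python
from typing import Callable, Iterator, List


def iter_jobs(
    fetch_page: Callable[[int, int], List[dict]],
    page_size: int = 20,
) -> Iterator[dict]:
    """Yield jobs page by page until a page comes back shorter than page_size."""
    skip = 0
    while True:
        page = fetch_page(page_size, skip)
        yield from page
        # A short (or empty) page means there is nothing left to fetch
        if len(page) < page_size:
            return
        skip += page_size
```

The adapter would wrap `list_jobs(top=top, skip=skip)` and pull the job list out of the returned JSON, wherever the service places it.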


### 4. Download the Result
Retrieves the translated file for a completed job.

In [56]:
def download_result_as_text(job_id: str) -> dict:
    """Fetch the result of a finished job as JSON (metadata plus base64 "data")."""
    url = f"{translation_base_url}/api/v1/translation/jobs/{job_id}/result"
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Accept": "application/json",
    }

    response = requests.get(url, headers=headers, timeout=120)

    if response.status_code == 200:
        return response.json()
    raise RuntimeError(f"Error: {response.status_code}, {response.text}")

In [57]:
out_path=download_result_as_text(job_id)
print(out_path)

{'contentType': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document', 'encoding': 'base64', 'sourceLanguage': 'de-DE', 'targetLanguage': 'en-US', 'model': 'default', 'data': 'UEsDBBQACAgIADpiL1sAAAAAAAAAAAAAAAATAAAAW0NvbnRlbnRfVHlwZXNdLnhtbLWUy27CMBBF93xF5C1KDF1UVUVg0ceyZUE/wNgTsOqXPA6Fv++EQCpVFGihm0jJzL3nzjjJaLK2JltBRO1dyYbFgGXgpFfaLUr2NnvO71iGSTgljHdQsg0gm4x7o9kmAGYkdliyZUrhnnOUS7ACCx/AUaXy0YpEt3HBg5DvYgH8ZjC45dK7BC7lqfFg49EjVKI2KXta0+M2SASDLHtoGxtWyUQIRkuRqM5XTn2j5DtCQcptDy51wD41MH6Q0FR+Bux0r7SZqBVkUxHTi7DUxT98VFx5WVtSFsdtDuT0VaUldPrGLUQvAZFWbk3RVazQrn8qB6aNAbx+itb3GJ600+gDciJdzIfmXBSonEIEiEkfZ7ejQ0oU9j+G3zmfjJDofYf2Orw4xtbmJLIiwEzMDVx/7M76rGOXPv4hw/4rbdRnHvYXscbk7cVztza/edNcbecQaUnX33lnvQ/Btz/Wce8TUEsHCBWDooZhAQAAmQUAAFBLAwQUAAgICAA6Yi9bAAAAAAAAAAAAAAAACwAAAF9yZWxzLy5yZWxzrZLPTsMwDIfve4rI99XdQAihpbsgpN0QKg9gJe4f0SZR4sH29gQEgkqj7LBjnJ+/fLay2R7GQb1yTL13GlZFCYqd8bZ3rYbn+mF5CyoJOUuDd6zhyAm21WLzxANJ7kldH5LKEJc0dCLhDjGZjkdKhQ/s8k3j40iSj7HFQOaFWsZ1Wd5g/M2AasJUO6sh7uwK

In [58]:
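# Decode the base64 payload; the result's contentType says it is a .docx file
blob = base64.b64decode(out_path.get("data"))
# Open the document from memory with python-docx and print its paragraphs
doc = Document(BytesIO(blob))
print("\n".join(p.text for p in doc.paragraphs))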

SAP BTP Translation Hub – A Comprehensive Overview Introduction
SAP Business Technology Platform (BTP) Translation Hub is a cloud service  from SAP that helps companies translate content in many languages quickly, consistently, and cost-effectively.
In a world where companies are increasingly working globally, it is becoming increasingly important that content is not only available in one language. Employees, partners, and customers expect information to be available in their native language. This is exactly where the Translation Hub starts.


Central Functions
Classic Machine Translation
The service uses modern machine translation technologies. This means that texts are automatically transferred from one language to another. Numerous language pairs are available – from English to German to Chinese – Spanish.
Domain-specific glossaries
Companies often have their own technical terms, product names, or abbreviations. To ensure that these are not distorted during translation, Translation 
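
The cell above only prints the translated text. To keep the translated file itself, the decoded bytes can be written to disk with an extension derived from the `contentType` field of the result. This is a minimal sketch: `save_result`, `KNOWN_EXTENSIONS`, `stem`, and `out_dir` are hypothetical names, and it relies only on the `contentType`, `encoding`, and `data` fields visible in the output above.

```python
import base64
import mimetypes
from pathlib import Path

# Explicit mapping for the types relevant here; other types fall back to the
# mimetypes module and finally to ".bin"
KNOWN_EXTENSIONS = {
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": ".docx",
    "application/pdf": ".pdf",
}


def save_result(result: dict, stem: str, out_dir: Path) -> Path:
    """Decode the base64 "data" field and write it to out_dir/<stem><ext>."""
    if result.get("encoding") != "base64":
        raise ValueError(f"Unexpected encoding: {result.get('encoding')!r}")
    content_type = result.get("contentType", "")
    ext = (
        KNOWN_EXTENSIONS.get(content_type)
        or mimetypes.guess_extension(content_type)
        or ".bin"
    )
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{stem}{ext}"
    path.write_bytes(base64.b64decode(result["data"]))
    return path
```

In this notebook the call could be `save_result(out_path, in_path.stem + "_en-US", Path("translated"))`, where the output folder name is only an example.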

### 5. Delete Job

Removes a job once you have downloaded the result and archived it in your system.

Note: When using user-specific JSON Web Tokens, only jobs created by the same user can be deleted. When not using user-specific JSON Web Tokens, only jobs created without user-specific JSON Web Tokens can be deleted.

In [59]:
def delete_job(job_id: str) -> None:
    r = requests.delete(
        f"{translation_base_url}/api/v1/translation/jobs/{job_id}",
        headers={"Authorization": f"Bearer {access_token}"},
        timeout=30,
    )
    r.raise_for_status()

In [60]:
delete_job(job_id)
print("Deleted job:", job_id)

Deleted job: a1cdab55-0a2d-4a18-bfe2-3a1aab725420
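
If a step between job creation and deletion raises an exception, the job stays on the service. One way to guard against that is a context manager that always deletes the job on exit. This is a sketch under that assumption, not part of the original notebook; `temporary_job`, `create`, and `delete` are hypothetical names.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def temporary_job(
    create: Callable[[], str],
    delete: Callable[[str], None],
) -> Iterator[str]:
    """Create a job, hand its ID to the with-block, and always delete it afterwards."""
    job_id = create()
    try:
        yield job_id
    finally:
        # Runs on normal exit and when the with-block raises
        delete(job_id)
```

With the functions from this notebook, usage could look like `with temporary_job(lambda: create_job(in_path), delete_job) as job_id:` followed by the status polling and result download inside the block.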
