---------------------------------------------------------------------------
---------------------------------------------------------------------------
### Document Translation (Synchronous) using SAP Translation Hub
---------------------------------------------------------------------------
---------------------------------------------------------------------------


---------------------------------------------------------------------------
What this script does
1) Loads OAuth credentials and API base URL from a .env file.
2) Fetches an access token via client-credentials.
3) Sends a document to translate using one of three request styles:
   - JSON Body Content → JSON response (translated text in the data field)
   - FORM-DATA → RAW bytes (translated file as binary stream)
   - RAW upload (no multipart) → JSON response (base64 file inside JSON)
4) Reads the translated DOCX in memory and prints its text.
---------------------------------------------------------------------------



---------------------------------------------------------------------------
Environment variables expected in .env
- AUTH_URL          : OAuth token URL from your service key (…/oauth/token)
- CLIENT_ID         : clientid from service key
- CLIENT_SECRET     : clientsecret from service key
- DOCTRANS_BASE     : e.g. https://document-translation.api.<region>.translationhub.cloud.sap
--------------------------------------------------------------------------
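A quick way to see which of these values are still missing is sketched below. It uses only the variable names listed above; the helper name `missing_env_vars` and the placeholder values are illustrative assumptions, not part of the service or of python-dotenv.

```python
from typing import Mapping

# Variable names taken from the list above.
REQUIRED_VARS = ("AUTH_URL", "CLIENT_ID", "CLIENT_SECRET", "DOCTRANS_BASE")


def missing_env_vars(environ: Mapping[str, str]) -> list[str]:
    """Return the required variable names that are absent or blank."""
    return [name for name in REQUIRED_VARS if not environ.get(name, "").strip()]


# A plain dict stands in for os.environ here; the values are placeholders.
example_env = {"AUTH_URL": "https://example.invalid/oauth/token", "CLIENT_ID": "abc"}
print(missing_env_vars(example_env))
```

In the notebook itself, the same function could be called with `os.environ` after `load_dotenv()` to report every missing value at once instead of stopping at the first one.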


A) Synchronous (for small docs)


Suited to short files: the synchronous endpoint enforces character and size limits.
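Because the synchronous call has limits, a local size check before uploading can save a failed round trip. The sketch below is only an illustration: `ASSUMED_SYNC_MAX_BYTES` is a made-up placeholder, not the service's real limit, and both function names are hypothetical. Look up the actual limits in the service documentation before relying on it.

```python
from pathlib import Path

# Placeholder value for illustration only; NOT the documented service limit.
ASSUMED_SYNC_MAX_BYTES = 1_000_000


def fits_sync_limit(size_bytes: int, max_bytes: int = ASSUMED_SYNC_MAX_BYTES) -> bool:
    """Return True if a non-empty payload of this size is within the limit."""
    return 0 < size_bytes <= max_bytes


def file_fits_sync_limit(path: Path, max_bytes: int = ASSUMED_SYNC_MAX_BYTES) -> bool:
    """Apply the same check to a file on disk, using its size in bytes."""
    return fits_sync_limit(path.stat().st_size, max_bytes)


print(fits_sync_limit(250_000))
print(fits_sync_limit(5_000_000))
```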

Prerequisites

- Python 3.10+

- Libraries (install with pip):
    - pip install python-dotenv requests python-docx PyPDF2

- .env file in the working directory with required values
- input file or text you want to translate

In [8]:
import base64
import mimetypes
import os
import time
from io import BytesIO
from pathlib import Path
from urllib.parse import urlencode

import requests
from docx import Document
from dotenv import load_dotenv
from PyPDF2 import PdfReader

# Load variables from the .env file in the current directory
load_dotenv()

# Build the input path with pathlib so it works on both Unix and Windows
in_path = Path("sample_doc").joinpath("SAP_BTP_Translation_Hub_DE.pdf")
# Guess the MIME type from the file extension (used for the multipart upload)
mime, _ = mimetypes.guess_type(str(in_path))

In [9]:
# Check that a required environment variable is set and return its value
def require_env(name: str) -> str:
    v = os.getenv(name)
    if not v:
        raise RuntimeError(f"Missing env: {name}")
    return v


# Read the required values (loaded from .env in the current directory)
auth_url = require_env("AUTH_URL")
client_id = require_env("CLIENT_ID")
client_secret = require_env("CLIENT_SECRET")
translation_base_url = require_env("DOCTRANS_BASE")

In [10]:
# Fetch an OAuth access token using the client-credentials grant
def get_token() -> str:
    # Prepare the payload and headers
    payload = {"grant_type": "client_credentials"}
    headers = {"Content-Type": "application/x-www-form-urlencoded"}

    # POST to the token URL; client ID and secret are sent via HTTP Basic auth
    response = requests.post(
        auth_url,
        data=payload,
        headers=headers,
        auth=(client_id, client_secret),
        timeout=30,
    )

    # Fail loudly instead of returning an undefined variable
    if response.status_code != 200:
        raise RuntimeError(
            f"Failed to obtain access token: {response.status_code} - {response.text}"
        )

    access_token = response.json().get("access_token")
    if not access_token:
        raise RuntimeError("Token response did not contain an access_token")
    print("Access token obtained successfully.")
    return access_token

In [11]:
# Get the access token by calling get_token()
access_token = get_token()

Access token obtained successfully.
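Tokens from a client-credentials grant expire, so a longer-running script may want to reuse a token until shortly before it expires rather than fetching one per request. The following is a minimal sketch under some assumptions: the `TokenCache` class is hypothetical, and it expects a fetch function that returns the full token JSON (including the standard OAuth 2.0 `expires_in` field), whereas `get_token()` above returns only the token string.

```python
import time
from typing import Callable, Optional


class TokenCache:
    """Caches an access token and refetches it shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], dict],
        skew_seconds: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # returns the token endpoint's JSON as a dict
        self._skew = skew_seconds    # safety margin before the real expiry
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        if self._token is None or self._clock() >= self._expires_at:
            payload = self._fetch()
            self._token = payload["access_token"]
            # expires_in is the token lifetime in seconds; if it is absent,
            # the token is treated as already expired and refetched next time.
            lifetime = float(payload.get("expires_in", 0))
            self._expires_at = self._clock() + lifetime - self._skew
        return self._token


# Demonstration with a fake fetcher, so no network call is made.
calls = []


def fake_fetch() -> dict:
    calls.append(1)
    return {"access_token": f"token-{len(calls)}", "expires_in": 3600}


cache = TokenCache(fake_fetch)
first = cache.get()
second = cache.get()  # served from the cache, no second fetch
print(first, second, len(calls))
```

To use this with the notebook, the fetch function would wrap the same `requests.post` call as `get_token()` but return `response.json()` instead of the bare token.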


### Option 1) JSON Body Content → JSON response

When to use:

- You prefer structured responses (status + metadata + data field with the file).

- You’re building UI/backend code that already handles base64 decoding.

- Easiest to debug/log because you can inspect JSON.
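Code that consumes the JSON response can branch on the `encoding` field to decide whether `data` needs base64 decoding. The sketch below assumes the response carries `encoding` and `data` fields as in the output shown in this notebook; the value `"base64"` for encoded payloads and the helper name `extract_data` are assumptions made for illustration.

```python
import base64


def extract_data(payload: dict) -> bytes:
    """Return the translated content as bytes, decoding base64 when flagged."""
    data = payload.get("data")
    if data is None:
        raise ValueError("response has no 'data' field")
    encoding = payload.get("encoding", "plain")
    if encoding == "base64":  # assumed label for base64-encoded content
        return base64.b64decode(data)
    # Plain text is returned as UTF-8 bytes so callers always get one type
    return data.encode("utf-8")


# Hand-made payloads standing in for resp.json()
plain = {"encoding": "plain", "data": "Hello"}
packed = {
    "encoding": "base64",
    "data": base64.b64encode(b"binary file bytes").decode("ascii"),
}
print(extract_data(plain))
print(extract_data(packed))
```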

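Important header rule: let requests set Content-Type for you. Passing json=... sets application/json.
For the multipart upload in Option 2, do not set Content-Type: multipart/form-data yourself; requests sets it (with the proper boundary) when you pass files=....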

In [12]:
reader = PdfReader(in_path)
text = "\n".join(page.extract_text() or "" for page in reader.pages)
print(text)

SAP BTP Translation Hub  ist ein Cloud -Service, der Unternehmen dabei unterstützt, 
Inhalte schnell und konsistent zu übersetzen.  
Er verbindet klassische maschinelle Übersetzung mit domänenspezifischen Glossaren 
und Qualitätsprüfungen.  
Über APIs können Apps und Workflows nahtlos angebunden werden.  
Layouts und Dateiformate bleiben bei Dokumentübersetzungen weitgehend erhalten.  
Für kurze Texte  steht zudem eine LLM-gestützte Übersetzung  bereit, wenn 
natürlichere Formulierungen gefragt sind.  
Entwickler greifen per OAuth-gesicherte REST -Schnittstellen  auf den Service zu.  
Sprachen, Domänen und Provider lassen sich je Use -Case steuern.  
So skaliert Übersetzung vom Prototyp  bis zur Unternehmensweite  
 


In [13]:
headers = {
    "Authorization": f"Bearer {access_token}",
    "Accept": "application/json",
}

# The text extracted from the PDF is sent as plain text in the "data" field
body = {
    "sourceLanguage": "de-DE",
    "targetLanguage": "en-US",
    "model": "default",
    "data": text,
}
resp = requests.post(
    f"{translation_base_url}/api/v1/translation",
    json=body,
    headers=headers,
    timeout=120,
)
# Raise on HTTP errors before trying to parse the body
resp.raise_for_status()
print(resp.json())


{'contentType': 'text/html', 'encoding': 'plain', 'sourceLanguage': 'de-DE', 'targetLanguage': 'en-US', 'model': 'default', 'data': 'SAP BTP Translation Hub is a cloud service that helps companies \nto translate content quickly and consistently.  \nIt combines classic machine translation with domain-specific glossaries \nand quality inspections.  \nApps and workflows can be connected seamlessly using APIs.  \nLayouts and file formats are largely retained for document translations.  \nFor short texts, an LLM-supported translation is also available if: \nmore natural formulations.  \nDevelopers access the service using OAuth secured REST interfaces.  \nLanguages, domains, and providers can be controlled for each use case.  \nHow translation scales from prototype to enterprise-wide  \n '}


In [14]:
print(resp.json().get("data"))

SAP BTP Translation Hub is a cloud service that helps companies 
to translate content quickly and consistently.  
It combines classic machine translation with domain-specific glossaries 
and quality inspections.  
Apps and workflows can be connected seamlessly using APIs.  
Layouts and file formats are largely retained for document translations.  
For short texts, an LLM-supported translation is also available if: 
more natural formulations.  
Developers access the service using OAuth secured REST interfaces.  
Languages, domains, and providers can be controlled for each use case.  
How translation scales from prototype to enterprise-wide  
 


### Option 2) FORM-DATA upload → RAW bytes

When to use:

- You want the translated file directly as the HTTP body (no JSON wrapping).
- Ideal for streaming to disk or sending the bytes onwards without decoding.
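Streaming to disk can be sketched as a small helper that writes an iterable of byte chunks to a file. The helper name `save_chunks` and the file name are illustrative; with requests, such chunks would typically come from `resp.iter_content(chunk_size=8192)` on a response requested with `stream=True`.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def save_chunks(chunks: Iterable[bytes], dest: Path) -> int:
    """Write byte chunks to dest and return the total number of bytes written."""
    total = 0
    with dest.open("wb") as out:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                out.write(chunk)
                total += len(chunk)
    return total


# Demonstration with in-memory chunks instead of a live HTTP response
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "translated.docx"
    written = save_chunks([b"abc", b"", b"defg"], target)
    print(written, target.read_bytes())
```

Writing chunk by chunk keeps memory use flat, which matters more as files grow; for the small documents handled by the synchronous endpoint, reading `resp.content` in one go is also fine.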

In [15]:
headers = {
    "Authorization": f"Bearer {access_token}",
    # No Content-Type here: requests adds multipart/form-data with the boundary
    "Accept": "application/octet-stream",
}

params = {
    "sourceLanguage": "de-DE",
    "targetLanguage": "en-US",
    "model": "default",
}

# Upload the file as multipart/form-data; the language settings go in as form fields
with open(in_path, "rb") as f:
    files = {"file": (in_path.name, f, mime)}
    resp = requests.post(
        f"{translation_base_url}/api/v1/translation",
        data=params,
        headers=headers,
        files=files,
        timeout=120,
    )
resp.raise_for_status()

# The response body is the translated DOCX; read it in memory
doc = Document(BytesIO(resp.content))
print("\n".join(p.text for p in doc.paragraphs))


SAP BTP Translation Hub  is a cloud service that helps companies translate content quickly and consistently.
It combines classic machine translation with domain-specific glossaries and quality checks.
Apps and workflows can be connected seamlessly using APIs.
Layouts and file formats are largely retained for document translations.
For short texts, an LLM-supported translation  is also available if more natural formulations are required.
Developers access the service using OAuth secured REST interfaces. Languages, domains, and providers can be controlled for each use case.
How translation scales from prototype  to enterprise-wide


### Option 3) RAW upload (no multipart) → JSON response

When to use:
- Your client can’t easily build multipart bodies.
- You want to POST the file bytes directly with an accurate Content-Type.
- You still prefer JSON back (base64 data).
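An accurate Content-Type can be derived from the file name rather than hard-coded. The sketch below uses the standard library's `mimetypes` module; the helper name `content_type_for` is hypothetical, and whether the service accepts the `application/octet-stream` fallback is an assumption that should be checked.

```python
import mimetypes
from pathlib import Path


def content_type_for(path: Path, default: str = "application/octet-stream") -> str:
    """Guess the MIME type from the file extension, falling back to a default."""
    guessed, _encoding = mimetypes.guess_type(path.name)
    return guessed or default


print(content_type_for(Path("sample_doc") / "SAP_BTP_Translation_Hub_DE.pdf"))
print(content_type_for(Path("notes.unknownext")))
```

The result can be placed in the `Content-Type` header of the raw upload in place of a literal string.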

In [16]:
headers = {
    "Authorization": f"Bearer {access_token}",
    "Content-Type": "application/pdf",
    "Accept": "application/json",
}

params = {
    "sourceLanguage": "de-DE",
    "targetLanguage": "en-US",
    "model": "default",
}

# The language settings travel in the query string because the body is the raw file
qs = "?" + urlencode(params)
url = translation_base_url.rstrip("/") + "/api/v1/translation" + qs

# Post the file bytes directly as the request body (no multipart wrapper)
with open(in_path, "rb") as f:
    resp = requests.post(
        url,
        headers=headers,
        data=f.read(),
        timeout=120,
    )
resp.raise_for_status()

# The JSON "data" field holds the translated DOCX as base64
blob = base64.b64decode(resp.json().get("data"))
doc = Document(BytesIO(blob))
print("\n".join(p.text for p in doc.paragraphs))


SAP BTP Translation Hub  is a cloud service that helps companies translate content quickly and consistently.
It combines classic machine translation with domain-specific glossaries and quality checks.
Apps and workflows can be connected seamlessly using APIs.
Layouts and file formats are largely retained for document translations.
For short texts, an LLM-supported translation  is also available if more natural formulations are required.
Developers access the service using OAuth secured REST interfaces. Languages, domains, and providers can be controlled for each use case.
How translation scales from prototype  to enterprise-wide


###  Summary

This notebook covered the following:
1) Authorization against the Translation Hub service.
2) Document translation from German to English in three different ways:
   - JSON Body Content
   - FORM-DATA
   - RAW upload