<a href="https://colab.research.google.com/github/diyasini13/translationllm/blob/main/%5BSharing%5D_ContentTranslation_Translate_API.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### **Content Translation**
> This notebook focuses on content translation using Google's Cloud Translation API. For content translation with a large language model, see https://colab.sandbox.google.com/drive/10AWZyocjY5b7L6qaho46lNpRCzFgmnbm

Cloud Translation - Advanced provides a Document Translation API for directly translating formatted documents such as PDF and DOCX. Compared to plain-text translation, Document Translation preserves the original formatting and layout of the translated document, which helps retain context such as paragraph breaks and images.

For more details please refer to https://cloud.google.com/translate/docs/advanced/translate-documents#translate_v3beta1_translate_document-python

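As a small illustration of the formats named above, the sketch below maps a file extension to its MIME type. The helper name `guess_document_mime_type` and the table are hypothetical and cover only PDF and DOCX, the two formats mentioned here; the service may accept more, so check the supported-formats page. The value could be passed in the optional `mime_type` field of the document input config, which this notebook leaves unset.

```python
from pathlib import PurePosixPath

# Hypothetical lookup table limited to the two formats named in this notebook.
DOCUMENT_MIME_TYPES = {
    ".pdf": "application/pdf",
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}


def guess_document_mime_type(file_name: str) -> str:
    """Return the MIME type for a PDF or DOCX file name, or raise ValueError."""
    suffix = PurePosixPath(file_name).suffix.lower()
    if suffix not in DOCUMENT_MIME_TYPES:
        raise ValueError(f"Unsupported document type for this sketch: {file_name!r}")
    return DOCUMENT_MIME_TYPES[suffix]
```
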
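> Prerequisites
  - Enable the Cloud Translation API for your project
  - Assign the following roles to the user
    - Cloud Translation API Admin (to access the Cloud Translation API)
    - Storage Object Admin (to manage objects in buckets; creating the bucket itself also requires a role that includes `storage.buckets.create`, such as Storage Admin)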

In [None]:
# @title Set up environment
# https://cloud.google.com/translate/docs/setup
# Install the latest client; the v3 document translation call used below is not available in 2.0.1
!pip install --upgrade google-cloud-translate
!pip install --upgrade google-cloud-storage

**Colab only**: Run the following cell to restart the kernel, or use the restart button. On Vertex AI Workbench, restart the kernel using the button at the top.

In [None]:
# @title Restart runtime
# Automatically restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

In [None]:
# @title Set up authentication
from google.colab import auth as google_auth
PROJECT_ID = "gdc-ai-playground" # @param {type:"string"}
# google_auth.authenticate_user(project_id=PROJECT_ID)

!gcloud auth application-default login
!gcloud auth application-default set-quota-project $PROJECT_ID
# !gcloud auth application-default revoke

In [None]:
# @title Storage bucket utility function

from google.cloud import storage

def create_bucket_class_location(bucket_name):
    """
    Create a new bucket in the US multi-region with the default storage
    class. Returns the new bucket, or None if the bucket already exists.
    """
    global PROJECT_ID
    storage_client = storage.Client(project=PROJECT_ID)
    bucket = storage_client.bucket(bucket_name)

    if bucket.exists():
        print(f"Bucket {bucket_name} already exists. Please use different name or if this has been created by you please ignore")
        return

    new_bucket = storage_client.create_bucket(bucket, location="us")

    print(
        "Created bucket {} in {} with storage class {}".format(
            new_bucket.name, new_bucket.location, new_bucket.storage_class
        )
    )
    return new_bucket

def upload_blob(bucket_name, source_file_name, destination_blob_name):
    """Uploads a file to the bucket."""
    global PROJECT_ID
    storage_client = storage.Client(project=PROJECT_ID)
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)

    blob.upload_from_filename(source_file_name)

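Bucket creation fails when the name breaks Cloud Storage naming rules, so a local pre-check can save a round trip. The sketch below is a simplified approximation rather than the full rule set: it assumes names of 3 to 63 characters made of lowercase letters, digits, dashes, underscores, and dots, starting and ending with a letter or digit, and it rejects the reserved `goog` prefix and names containing `google`. The helper name `is_plausible_bucket_name` is hypothetical.

```python
import re

# Simplified pattern: 3-63 characters, lowercase letters, digits, '.', '_', '-',
# beginning and ending with a letter or digit. Dotted names longer than 63
# characters are not handled by this sketch.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]")


def is_plausible_bucket_name(name: str) -> bool:
    """Return True if the name passes a simplified Cloud Storage naming check."""
    if not _BUCKET_NAME_RE.fullmatch(name):
        return False
    # Names may not start with "goog" or contain "google".
    if name.startswith("goog") or "google" in name:
        return False
    return True
```

A caller could run `is_plausible_bucket_name(bucket_name)` before `create_bucket_class_location(bucket_name)` and stop early on `False`.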

In [None]:
# @title Create Storage bucket to store documents
bucket_name = "translate-api-docs2" # @param {type:"string"}
full_bucket_name = f"gs://{bucket_name}"

create_bucket_class_location(bucket_name)
# !gcloud storage buckets create gs://translate-api-docs

Bucket translate-api-docs2 already exists. Please use different name or if this has been created by you please ignore


In [None]:
# @title Download source files to bucket
file_url = "https://www.colorado.edu/amath/sites/default/files/attached-files/ch12_0.pdf" # @param {type:"string"}
file_parts =  file_url.split("/")
if len(file_parts) > 0:
  file_name = file_parts[-1]
else:
  raise Exception("File URL not valid")

!wget $file_url -O $file_name

# !wget https://www.colorado.edu/amath/sites/default/files/attached-files/ch12_0.pdf -O ch12_0.pdf
# !gsutil cp ch12_0.pdf gs://$bucket_name/ch12_0.pdf
upload_blob(bucket_name, file_name, file_name)

# !wget https://abc.xyz/assets/20/ef/844a05b84b6f9dbf2c3592e7d9c7/2023q2-alphabet-earnings-release.pdf -O 2023q2-alphabet-earnings-release.pdf
# !gsutil cp 2023q2-alphabet-earnings-release.pdf gs://$bucket_name/2023q2-alphabet-earnings-release.pdf

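In the download cell, `file_url.split("/")` always returns at least one element, so its `else` branch never runs, and any query string would end up in the file name. A minimal alternative sketch using only the standard library follows; the helper name `file_name_from_url` is hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlparse


def file_name_from_url(url: str) -> str:
    """Return the last path component of a URL, ignoring query and fragment."""
    path = urlparse(url).path
    name = posixpath.basename(unquote(path))
    if not name:
        # Covers URLs that end in "/" or have no path at all.
        raise ValueError(f"URL has no file name component: {url!r}")
    return name
```

With the URL used in this notebook, `file_name_from_url(file_url)` yields `ch12_0.pdf`, which can then be passed to `upload_blob`.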

In [None]:
# @title Translate API Utility function
# Translate text: https://cloud.google.com/translate/docs/advanced/translate-text-advance
# Translate Document: https://cloud.google.com/translate/docs/advanced/translate-documents#translate_a_document_from
# Supported languages: https://cloud.google.com/translate/docs/languages
# IAM roles: https://cloud.google.com/translate/docs/access-control


# Imports the Google Cloud Translation library
from google.cloud import translate

# Translate a single string with the text translation API
def translate_text(
    text: str
    , project_id: str
    , source_language_code: str = "en-US"
    , target_language_code: str = "fr"
) -> translate.TranslateTextResponse:
    """Translate text, print each translation, and return the API response."""

    client = translate.TranslationServiceClient()

    location = "global"

    parent = f"projects/{project_id}/locations/{location}"

    # Translate text from the source language to the target language
    # Detail on supported types can be found here:
    # https://cloud.google.com/translate/docs/supported-formats
    response = client.translate_text(
        request={
            "parent": parent,
            "contents": [text],
            "mime_type": "text/plain",  # mime types: text/plain, text/html
            "source_language_code": source_language_code,
            "target_language_code": target_language_code,
        }
    )

    # Display the translation for each input text provided
    for translation in response.translations:
        print(f"Translated text: {translation.translated_text}")

    return response

# Translate a document stored in Cloud Storage
def translate_document(
    gcs_src_file_path: str
    , gcs_dest_file_prefix: str
    , project_id: str
    , source_language_code: str = "en-US"
    , target_language_code: str = "fr"
) -> None:
    """Translate a document in Cloud Storage and write the result to Cloud Storage."""

    client = translate.TranslationServiceClient()

    location = "global"

    parent = f"projects/{project_id}/locations/{location}"

    response = client.translate_document(
        request={
            "parent": parent,
            "source_language_code": source_language_code,
            "target_language_code": target_language_code,
            "document_input_config": {
              "gcs_source": {
                "input_uri": gcs_src_file_path
              }
            },
            "document_output_config": {
              "gcs_destination": {
                "output_uri_prefix": gcs_dest_file_prefix
              }
            },
            "is_translate_native_pdf_only": True
        }
    )

    # The translated document is written under gcs_dest_file_prefix
    print(f"Translation written to: {gcs_dest_file_prefix}")

    # return response

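The request dictionary passed to `translate_document` can be assembled and checked before any API call is made. The sketch below mirrors the field names used in the utility function above; the helper name `build_document_request`, the `gs://` check, and the trailing-slash normalization (matching how the prefix is written in this notebook's calls) are illustrative assumptions, not requirements taken from the API reference.

```python
from typing import Any, Dict


def build_document_request(
    project_id: str,
    gcs_src_file_path: str,
    gcs_dest_file_prefix: str,
    source_language_code: str = "en-US",
    target_language_code: str = "fr",
    location: str = "global",
) -> Dict[str, Any]:
    """Build the request dict for translate_document after basic URI checks."""
    for uri in (gcs_src_file_path, gcs_dest_file_prefix):
        if not uri.startswith("gs://"):
            raise ValueError(f"Expected a gs:// URI, got {uri!r}")
    # Assumption: keep the destination prefix in the 'gs://bucket/dir/' form
    # used elsewhere in this notebook.
    if not gcs_dest_file_prefix.endswith("/"):
        gcs_dest_file_prefix += "/"
    return {
        "parent": f"projects/{project_id}/locations/{location}",
        "source_language_code": source_language_code,
        "target_language_code": target_language_code,
        "document_input_config": {"gcs_source": {"input_uri": gcs_src_file_path}},
        "document_output_config": {
            "gcs_destination": {"output_uri_prefix": gcs_dest_file_prefix}
        },
        "is_translate_native_pdf_only": True,
    }
```

The result could be passed as `client.translate_document(request=build_document_request(...))`.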
In [None]:
# @title Translate text English --> Hindi
dummy_text = "Create a shorter version of a document that incorporates pertinent information from the original text. For example, you might want to summarize a chapter from a textbook"
translate_text(text = dummy_text
              ,project_id=PROJECT_ID
               ,source_language_code="en-us"
               ,target_language_code="hi")

In [None]:
# @title Translate text English --> French
dummy_text = "Create a shorter version of a document that incorporates pertinent information from the original text. For example, you might want to summarize a chapter from a textbook"
translate_text(text = dummy_text
              ,project_id=PROJECT_ID
               ,source_language_code="en-us"
               ,target_language_code="fr")

In [None]:
# @title Translate text English --> Chinese
dummy_text = "Create a shorter version of a document that incorporates pertinent information from the original text. For example, you might want to summarize a chapter from a textbook"
translate_text(text = dummy_text
              ,project_id=PROJECT_ID
               ,source_language_code="en-us"
               ,target_language_code="zh")

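The three cells above differ only in the target language code, so the calls can be driven from a list. The sketch below takes the translation function as a parameter so that it stays independent of the client library; the helper name `translate_to_many` is hypothetical.

```python
from typing import Any, Callable, Dict, Iterable


def translate_to_many(
    text: str,
    target_language_codes: Iterable[str],
    translate_fn: Callable[[str, str], Any],
) -> Dict[str, Any]:
    """Call translate_fn(text, code) for each code and collect results by code."""
    results: Dict[str, Any] = {}
    for code in target_language_codes:
        results[code] = translate_fn(text, code)
    return results


# Possible use with this notebook's helper (not executed here):
# translate_to_many(
#     dummy_text,
#     ["hi", "fr", "zh"],
#     lambda t, code: translate_text(
#         text=t,
#         project_id=PROJECT_ID,
#         source_language_code="en-us",
#         target_language_code=code,
#     ),
# )
```
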
In [None]:
# @title Translate document English --> French
source_language_code="en-us"
target_language_code = "fr"
translate_document(gcs_src_file_path = f"gs://{bucket_name}/ch12_0.pdf"
               , gcs_dest_file_prefix = f"gs://{bucket_name}/dest_{target_language_code}/"
               , project_id=PROJECT_ID
               , source_language_code=source_language_code
               , target_language_code=target_language_code)

In [None]:
# @title Translate document English --> Hindi
source_language_code="en-us"
target_language_code = "hi"
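# Assumes 2023q2-alphabet-earnings-release.pdf is already in the bucket (see the commented-out commands in the download cell)
translate_document(gcs_src_file_path = f"gs://{bucket_name}/2023q2-alphabet-earnings-release.pdf"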
               , gcs_dest_file_prefix = f"gs://{bucket_name}/dest_{target_language_code}/"
               , project_id=PROJECT_ID
               , source_language_code=source_language_code
               , target_language_code=target_language_code)