# Thai to English Language Translation with Vertex AI Gemini 1.5 Pro

In this notebook, we use the Vertex AI Gemini 1.5 Pro model to translate Thai text to English. Gemini 1.5 Pro is a multimodal language model that accepts text, audio, video, and image input. Our focus here is text translation.

## Getting Started

**Warning:** Rate limits restrict the number of requests per minute (RPM), which can make it difficult to translate more than five rows at once. Refer to [Vertex AI quotas](https://cloud.google.com/vertex-ai/generative-ai/docs/quotas) for details, and request a quota increase before running the full dataset; the request should be approved shortly. Also make sure the environment is set up as described in the [Vertex AI setup guide](https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstarts/quickstart-multimodal#set-up-your-environment).
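
When a request does hit the rate limit, one common way to cope is to retry with exponential backoff. The sketch below is a generic helper, not part of the Vertex AI SDK: `call_with_backoff` and its parameters are hypothetical names, and which exception types to retry on is left to the caller.

```python
import random
import time


def call_with_backoff(fn, retry_on=(Exception,), max_attempts=5,
                      base_delay=1.0, max_delay=60.0, sleep=time.sleep):
    """Call fn(); on a retryable error, wait and try again.

    The wait doubles after each failure (base_delay, 2 * base_delay, ...),
    is capped at max_delay, and gets up to 10% random jitter added.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, 0.1 * delay))
```

A call such as `call_with_backoff(lambda: model.generate_content(prompt), retry_on=(ResourceExhausted,))` would wrap a single request. That usage assumes quota errors surface as `google.api_core.exceptions.ResourceExhausted`, which is worth confirming against the errors you actually observe.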

### Install Vertex AI SDK for Python
To use Vertex AI, we need to install the Vertex AI SDK for Python.

In [None]:
!pip3 install --upgrade --user --quiet google-cloud-aiplatform

## Set Google Cloud Project Information and Initialize Vertex AI SDK

In [1]:
import vertexai
from vertexai import generative_models

In [2]:
project_id = "translation-428305"
vertexai.init(project=project_id, location='us-central1')
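
The project ID above is hardcoded. As a minimal sketch, it could instead be read from an environment variable so the notebook runs unchanged across projects. The helper name `resolve_project_id` is hypothetical, and `GOOGLE_CLOUD_PROJECT` is assumed here as the variable name by convention.

```python
import os


def resolve_project_id(default=None, env_var="GOOGLE_CLOUD_PROJECT"):
    """Return the project ID from the environment, else the default."""
    project = os.environ.get(env_var) or default
    if not project:
        raise ValueError(f"Set {env_var} or pass a default project ID")
    return project
```

The result can then be passed along as `vertexai.init(project=resolve_project_id(), location='us-central1')`.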

## Load the Gemini 1.5 Pro Model

In [3]:
model = generative_models.GenerativeModel(model_name="gemini-1.5-pro")

## Define Safety Settings
For language translation tasks, the source text might contain words considered harmful or inappropriate by Gemini, which can cause errors if not handled. Therefore, we will disable safety blocks.

In [4]:
safety_config = [
        generative_models.SafetySetting(
            category=generative_models.HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT,
            threshold=generative_models.HarmBlockThreshold.BLOCK_NONE,
        ),
        generative_models.SafetySetting(
            category=generative_models.HarmCategory.HARM_CATEGORY_HARASSMENT,
            threshold=generative_models.HarmBlockThreshold.BLOCK_NONE,
        ),
        generative_models.SafetySetting(
            category=generative_models.HarmCategory.HARM_CATEGORY_HATE_SPEECH,
            threshold=generative_models.HarmBlockThreshold.BLOCK_NONE,
        ),
        generative_models.SafetySetting(
            category=generative_models.HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT,
            threshold=generative_models.HarmBlockThreshold.BLOCK_NONE,
        ),
    ]

## Translation Function
Define a function that translates text from Thai to English. The prompt follows the few-shot prompting style described in [Google's Gemini API prompting strategies](https://ai.google.dev/gemini-api/docs/prompting-strategies): it shows the model a worked example so that it returns only the translated text, which avoids tedious post-processing.

In [None]:
from tqdm import tqdm
import pandas as pd

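def translate_column(df, column_name, model):
    """Translate df[column_name] from Thai to English.

    Returns a pandas Series of translations aligned with df.index.
    """
    translated_texts = []
    successful_translation = 0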

    example_prompt = """
    Please translate the following Thai YouTube comments from Thai to English and only return the translated text:
    [Thai Text]
    ก็สมควรติดแหละ เหมือนไม่ได้กลัวเลยแมสไม่ใส่ไม่เว้นระยะ
    [English Text]
    It's no wonder they got infected. It's like they weren't scared at all, not wearing a mask and not social distancing.

    [Thai Text]
    {text}
    [English Text]
    """
    
    for text in tqdm(df[column_name], desc="Translating"):
        # Report progress after every 100 successful translations.
        if translated_texts and successful_translation % 100 == 0:
            print(f"Successful translation: {successful_translation}")
        prompt = example_prompt.format(text=text)
        # safety_config is the module-level list from the safety settings cell.
        # Any API error propagates to the caller and stops the run.
        response = model.generate_content(prompt, safety_settings=safety_config)
        translated_texts.append(response.text.strip())
        successful_translation += 1
    print(f"Successful translation: {successful_translation}")
    return pd.Series(translated_texts, index=df.index)
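
The prompt in `translate_column` carries a single worked example. If more examples are wanted, the same layout can be assembled from a list of pairs. The sketch below is one possible way to do that; `build_few_shot_prompt` is a hypothetical helper, and the second example pair is borrowed from this notebook's own test-set output purely for illustration.

```python
def build_few_shot_prompt(examples, text):
    """Build a Thai-to-English prompt from (thai, english) example pairs."""
    parts = [
        "Please translate the following Thai YouTube comments from Thai "
        "to English and only return the translated text:"
    ]
    for thai, english in examples:
        parts.append(f"[Thai Text]\n{thai}\n[English Text]\n{english}\n")
    # The final block is left open so the model completes the translation.
    parts.append(f"[Thai Text]\n{text}\n[English Text]\n")
    return "\n".join(parts)


examples = [
    (
        "ก็สมควรติดแหละ เหมือนไม่ได้กลัวเลยแมสไม่ใส่ไม่เว้นระยะ",
        "It's no wonder they got infected. It's like they weren't scared "
        "at all, not wearing a mask and not social distancing.",
    ),
    (
        "ใส่แมสตลอดตอนออกจากบ้าน",
        "I always wear a mask when I leave the house.",
    ),
]
prompt = build_few_shot_prompt(examples, "วันนึงคนเป็นเกือบแสน ถ้าไม่ใส่แมสจะเท่าไหร่")
```

Building the prompt with f-strings rather than a shared template also keeps each example pair easy to add, remove, or reorder.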

## Load Dataset
Load the datasets containing Thai comments.

In [6]:
import pandas as pd

In [7]:
thai_train_df = pd.read_csv('mask_train.csv')
thai_test_df = pd.read_csv('mask_test.csv')
thai_train_df.shape, thai_test_df.shape

((2129, 7), (231, 8))

## Translate Dataframe
Translate the `comment_text` column in each dataset.

In [9]:
thai_test_df['english_comment_text'] = translate_column(thai_test_df.copy(deep=True), "comment_text", model)

Translating:  43%|████▎     | 100/231 [03:09<03:53,  1.78s/it]

Successful translation: 100


Translating:  87%|████████▋ | 200/231 [06:12<00:53,  1.71s/it]

Successful translation: 200


Translating: 100%|██████████| 231/231 [07:10<00:00,  1.86s/it]

Successful translation: 231





In [10]:
thai_test_df[['comment_text', 'english_comment_text']]

| | comment_text | english_comment_text |
|---|---|---|
| 0 | หลังสงกราณ์ถอดหน้ากากอนามัยได้หรอ แทบไม่น่าเชื... | Can we take off our masks after Songkran? I ca... |
| 1 | ตำรวจต้องกวดขันและจับด้วย. เมื่อวานนี้ผมเดินไป... | The police need to be stricter and arrest them... |
| 2 | บอกต่อกันหน่อยจร้า ถ้ามีคนเอาหน้ากากอนามัยมาแ... | Tell each other, if someone offers you free ma... |
| 3 | ก็สมควรติดแหละ เหมือนไม่ได้กลัวเลยแมสไม่ใส่ไม่... | They deserve to get infected. They don't seem ... |
| 4 | แบบนี้ต้องถอดแมสออกให้ลุงเกเร แล้ว​เอาผ้าอนามั... | This person should take off their mask and giv... |
| ... | ... | ... |
| 226 | รัฐบาล​โง่ๆ​ ก็แค่ปิดประเทศ​ ไม่ให้คนนอกเข้า​ ... | The stupid government just closes the country,... |
| 227 | วันนึงคนเป็นเกือบแสน ถ้าไม่ใส่แมสจะเท่าไหร่ | Almost a hundred thousand cases a day, imagine... |
| 228 | ใส่แมสตลอดตอนออกจากบ้าน | I always wear a mask when I leave the house. |
| 229 | หน้ากากอนามัยผมใส่2ชั่นว่าจะใส่3ชั่นกว่าห้ายใจ... | I wear two masks. I was going to wear three, b... |


In [11]:
thai_test_df.to_csv('mask_test_translated.csv', index=False)
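
`translate_column` lets the first exception propagate, so one failed request late in a long run discards every translation collected so far. A minimal sketch of a more tolerant loop is shown below. It is an alternative to the notebook's approach, not the function used above: `translate_all` is a hypothetical helper, and `translate_one` stands for any callable that maps a Thai string to an English string, for example a wrapper around the `model.generate_content(...)` call.

```python
def translate_all(texts, translate_one):
    """Translate each text, recording failures instead of aborting.

    Returns (results, failures): results has one entry per input
    (None where the call failed); failures maps position -> error message.
    """
    results, failures = [], {}
    for i, text in enumerate(texts):
        try:
            results.append(translate_one(text))
        except Exception as exc:  # broad on purpose: keep the batch alive
            results.append(None)
            failures[i] = repr(exc)
    return results, failures
```

Because `results` stays aligned with the input order, the failed positions in `failures` can be retried later without repeating the requests that already succeeded.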

In [12]:
thai_train_df['english_comment_text'] = translate_column(thai_train_df.copy(deep=True), "comment_text", model)

Translating:   5%|▍         | 100/2129 [03:00<1:09:21,  2.05s/it]

Successful translation: 100


Translating:   9%|▉         | 200/2129 [06:04<56:45,  1.77s/it]  

Successful translation: 200


Translating:  14%|█▍        | 300/2129 [08:55<41:26,  1.36s/it]  

Successful translation: 300


Translating:  19%|█▉        | 400/2129 [11:45<49:15,  1.71s/it]  

Successful translation: 400


Translating:  23%|██▎       | 500/2129 [14:41<41:26,  1.53s/it]  

Successful translation: 500


Translating:  28%|██▊       | 600/2129 [18:07<58:32,  2.30s/it]  

Successful translation: 600


Translating:  33%|███▎      | 700/2129 [21:28<46:57,  1.97s/it]  

Successful translation: 700


Translating:  38%|███▊      | 800/2129 [24:35<45:24,  2.05s/it]

Successful translation: 800


Translating:  42%|████▏     | 900/2129 [27:48<31:32,  1.54s/it]

Successful translation: 900


Translating:  47%|████▋     | 1000/2129 [31:00<37:25,  1.99s/it]

Successful translation: 1000


Translating:  52%|█████▏    | 1100/2129 [34:17<33:08,  1.93s/it]

Successful translation: 1100


Translating:  56%|█████▋    | 1200/2129 [37:14<28:12,  1.82s/it]

Successful translation: 1200


Translating:  61%|██████    | 1300/2129 [40:33<26:21,  1.91s/it]

Successful translation: 1300


Translating:  66%|██████▌   | 1400/2129 [43:34<22:39,  1.87s/it]

Successful translation: 1400


Translating:  70%|███████   | 1500/2129 [46:44<18:05,  1.73s/it]

Successful translation: 1500


Translating:  75%|███████▌  | 1600/2129 [50:04<20:32,  2.33s/it]

Successful translation: 1600


Translating:  80%|███████▉  | 1700/2129 [53:09<12:43,  1.78s/it]

Successful translation: 1700


Translating:  85%|████████▍ | 1800/2129 [56:21<10:35,  1.93s/it]

Successful translation: 1800


Translating:  89%|████████▉ | 1900/2129 [59:25<05:37,  1.47s/it]

Successful translation: 1900


Translating:  94%|█████████▍| 2000/2129 [1:02:25<04:01,  1.87s/it]

Successful translation: 2000


Translating:  99%|█████████▊| 2100/2129 [1:05:17<00:42,  1.47s/it]

Successful translation: 2100


Translating: 100%|██████████| 2129/2129 [1:06:11<00:00,  1.87s/it]

Successful translation: 2129





In [13]:
thai_train_df[['comment_text', 'english_comment_text']]

| | comment_text | english_comment_text |
|---|---|---|
| 0 | มันเป็นคำแก้ตัว หน้ากากอนามัยก็ไม่ใส่ยังจะมาทำ... | It's an excuse. You don't even wear a mask and... |
| 1 | เห็นใจทั้ง 2 ฝ่าย แต่ฝั่งผู้หญิงควรควบคุมอารมณ... | I sympathize with both sides, but the woman sh... |
| 2 | แค่ไม่เอาหน้ากาก ไม่ใส่หน้า ถึงกับจะไล่ออกจากป... | Just because they didn't put on a mask, didn't... |
| 3 | ขนาดใส่แมสยังติดถ้าถอดออกมาแล้ว......ช่างคิดดี... | Even with a mask on, you can still get infecte... |
| 4 | ตอนนี้ไม่ถอดหน้ากากอนามัยเด็ดขาดเพราะไม่มั่นใจ | I will definitely not take off my mask now bec... |
| ... | ... | ... |
| 2124 | ถอดแมสเดี๋ยว​ก็ติดนุบนับหลอก​\n\nทำไรกานจ๊ะ | Take your mask off and you'll get infected, do... |
| 2125 | ฉีด 5 เข็มก็ยังติด ใส่แมสและสเปรย์แอลกอฮอลช่วย... | Even with 5 doses, you can still get infected.... |
| 2126 | เสรีภาพไม่ฉีดวัคซีนก็ได้ แต่คนไม่ฉีดวัคซีนมั่น... | You have the freedom not to get vaccinated, bu... |
| 2127 | ไม่ฉีดก็ไม่ต้องใส่แมสจะได้เสรี | If you don't get vaccinated, you shouldn't hav... |


In [14]:
thai_train_df.to_csv('mask_train_translated.csv', index=False)