<h1> Title: Translation</h1>

<strong>Overview: This notebook compares three approaches to translating comments into English.</strong><br>
It covers:<br>
1.0 Context Translation<br>
2.0 Word-by-word Translation<br>
3.0 API Translation

In [None]:
import re
import pandas as pd
import numpy as np
from langid import classify
from langdetect import detect
from deep_translator import GoogleTranslator
from openai import OpenAI

In [38]:
df = pd.read_csv('Datasets/comments.csv',nrows=20)
df

Unnamed: 0,kind,commentId,channelId,videoId,authorId,textOriginal,parentCommentId,likeCount,publishedAt,updatedAt
0,youtube#comment,2895557,15366,0,2425288,"The, uh, *shape* of the containers is somethin...",,4,2022-09-23 19:12:24+00:00,2022-09-23 19:12:24+00:00
1,youtube#comment,101047,29145,0,3378074,And with perfect people like you â¤ï¸ðŸŒ¸,2214515.0,1,2021-11-11 03:33:45+00:00,2021-11-11 03:33:45+00:00
2,youtube#comment,2555,30692,0,3456989,Please don't call me sir😅😅 best part is you re...,2452518.0,1,2020-02-12 15:27:17+00:00,2020-02-12 15:27:17+00:00
3,youtube#comment,1822478,15366,0,3390312,Lol All you need is to put your hair in two po...,,33,2022-09-23 19:26:44+00:00,2022-09-23 19:26:44+00:00
4,youtube#comment,2539,30692,0,259614,"It's ""for the record"" by Ooyy",1275651.0,0,2020-02-13 16:16:00+00:00,2020-02-13 16:16:00+00:00
5,youtube#comment,2997236,4942,0,1872074,All that to look just above average teanage boy,,0,2024-03-22 14:25:29+00:00,2024-03-22 14:25:29+00:00
6,youtube#comment,2515,30692,0,780506,Alright,1558784.0,0,2020-02-12 13:10:35+00:00,2020-02-12 13:10:35+00:00
7,youtube#comment,4662277,15366,0,2916157,Please show us what these are like in the sun.,,0,2022-09-25 23:36:49+00:00,2022-09-25 23:36:49+00:00
8,youtube#comment,96146,51730,0,2607618,"This is a great point, thank you!",2309738.0,0,2021-10-17 04:26:39+00:00,2021-10-17 04:26:39+00:00
9,youtube#comment,4275012,15366,0,2230360,Pretty brave to try it anyway after dreaming t...,,1,2022-09-23 20:27:08+00:00,2022-09-23 20:27:08+00:00


## 1.0 Context Translation

Strength:
- Preserves overall meaning and flow

Weakness:
- Fails on code-mixed input: one language is detected per comment, so segments in a non-dominant language may be mistranslated or left untouched
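
One possible mitigation for the code-mixed weakness, which is not part of this notebook's pipeline, is to split a comment into runs of the same writing system and detect the language of each run separately. The sketch below is a rough heuristic: `script_of` and `split_by_script` are hypothetical helpers that use the first word of each character's Unicode name as a script label. It cannot separate two languages that share a script, such as English and Spanish.

```python
import unicodedata

def script_of(ch):
    """Rough script label for one character, or None for neutral characters."""
    if not ch.isalpha():
        return None  # spaces, digits, punctuation and emoji stay with the current run
    name = unicodedata.name(ch, "")
    # Heuristic: the first word of the Unicode name, e.g. "LATIN", "CJK", "CYRILLIC".
    return name.split(" ")[0].lower() if name else "unknown"

def split_by_script(text):
    """Group consecutive characters of the same script into (script, segment) pairs."""
    runs = []  # each entry is [script, segment]
    for ch in text:
        script = script_of(ch)
        if runs and (script is None or script == runs[-1][0]):
            runs[-1][1] += ch
        elif runs and runs[-1][0] is None:
            # Leading neutral characters adopt the first real script that follows.
            runs[-1][0] = script
            runs[-1][1] += ch
        else:
            runs.append([script, ch])
    return [(script, segment) for script, segment in runs]

segments = split_by_script("I love 你好 so much")
print([script for script, _ in segments])
```

Each segment could then be passed to `context_translate` on its own, at the cost of losing some sentence-level context.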

In [44]:
def context_translate(text):
    text = text.strip()
    try:
        lang, _ = classify(text)
    except Exception:
        lang = "unknown"

    if lang != "en" and lang != "unknown":
        try:
            translated = GoogleTranslator(source=lang, target="en").translate(text)
            return translated, lang
        except Exception as e:
            print(f"⚠️ Translation failed: {e}", lang)
            return text, lang
    return text, lang

In [45]:
print("=== Context Translation ===")
context_results = []
for text in df['textOriginal']:  
    translated, lang = context_translate(str(text))
    context_results.append((text, lang, translated))

# Print results
for original, lang, translated in context_results:
    print(f"[{lang}] {original}  -->  {translated}")

=== Context Translation ===
⚠️ Translation failed: zh --> No support for the provided language.
Please select on of the supported languages:
{'afrikaans': 'af', 'albanian': 'sq', 'amharic': 'am', 'arabic': 'ar', 'armenian': 'hy', 'assamese': 'as', 'aymara': 'ay', 'azerbaijani': 'az', 'bambara': 'bm', 'basque': 'eu', 'belarusian': 'be', 'bengali': 'bn', 'bhojpuri': 'bho', 'bosnian': 'bs', 'bulgarian': 'bg', 'catalan': 'ca', 'cebuano': 'ceb', 'chichewa': 'ny', 'chinese (simplified)': 'zh-CN', 'chinese (traditional)': 'zh-TW', 'corsican': 'co', 'croatian': 'hr', 'czech': 'cs', 'danish': 'da', 'dhivehi': 'dv', 'dogri': 'doi', 'dutch': 'nl', 'english': 'en', 'esperanto': 'eo', 'estonian': 'et', 'ewe': 'ee', 'filipino': 'tl', 'finnish': 'fi', 'french': 'fr', 'frisian': 'fy', 'galician': 'gl', 'georgian': 'ka', 'german': 'de', 'greek': 'el', 'guarani': 'gn', 'gujarati': 'gu', 'haitian creole': 'ht', 'hausa': 'ha', 'hawaiian': 'haw', 'hebrew': 'iw', 'hindi': 'hi', 'hmong': 'hmn', 'hungaria
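
The failure above appears to come from a mismatch between language codes: the detector reports `zh`, while the supported list in the error message only contains `zh-CN` and `zh-TW`. A minimal sketch of a workaround is to normalise the detected code before building the translator. The mapping and the helper `normalize_lang_code` are assumptions for illustration. Only codes visible in the error message are listed, and choosing `zh-CN` over `zh-TW` is arbitrary.

```python
# Detector codes that differ from the codes in the translator's supported list.
# Assumption: this mapping is illustrative and probably incomplete.
LANGID_TO_TRANSLATOR = {
    "zh": "zh-CN",  # the detector does not distinguish simplified from traditional
    "he": "iw",     # the supported list shows Hebrew under the code 'iw'
}

def normalize_lang_code(lang):
    """Return the translator's code for a detected language, or the code unchanged."""
    return LANGID_TO_TRANSLATOR.get(lang, lang)

print(normalize_lang_code("zh"))  # zh-CN
print(normalize_lang_code("fr"))  # fr
```

Inside `context_translate`, the call would then use `source=normalize_lang_code(lang)`. Another option may be to pass `source="auto"` and let the translation service detect the language itself.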

## 2.0 Word-by-word Translation

Strength:
- Simple: each word is translated on its own

Weakness:
- Loses context, which leads to awkward or incorrect phrasing
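
To make the loss of context concrete, here is a toy illustration that needs no network access. The glossary is hypothetical and hand-written, not taken from any translation service. The French sentence "Il fait froid" idiomatically means "It is cold", but translating word by word gives a literal and misleading result.

```python
# Hypothetical toy glossary (French -> English), for illustration only.
TOY_GLOSSARY = {"il": "he", "fait": "makes", "froid": "cold"}

def toy_word_by_word(sentence, glossary):
    """Translate each whitespace-separated word independently; keep unknown words."""
    return " ".join(glossary.get(word.lower(), word) for word in sentence.split())

print(toy_word_by_word("Il fait froid", TOY_GLOSSARY))  # prints "he makes cold"
```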

In [46]:
def word_translate(text):
    words = text.split()
    try:
        lang, _ = classify(text)
    except Exception:
        lang = "unknown"

    # English or undetected text is returned without calling the translator
    if lang in ("en", "unknown"):
        return lang, " ".join(words)

    try:
        # Build the translator once; this raises if the language code is unsupported
        translator = GoogleTranslator(source=lang, target="en")
    except Exception:
        return lang, " ".join(words)

    translated_words = []
    for word in words:
        try:
            translated = translator.translate(word)
        except Exception:
            translated = None
        # Fall back to the original word when translation fails or returns nothing
        translated_words.append(str(translated) if translated else word)

    return lang, " ".join(translated_words)

In [47]:
print("=== Word-by-Word Translation ===")
word_results = []
for text in df['textOriginal']:
    lang, translated_text = word_translate(str(text)) 
    word_results.append((text, lang, translated_text))

# Print results
for original, lang, translated_text in word_results:
    print(f"[{lang}] {original}  -->  {translated_text}")

=== Word-by-Word Translation ===
[en] The, uh, *shape* of the containers is something else 😳  -->  The, uh, *shape* of the containers is something else 😳
[la] And with perfect people like you â¤ï¸ðŸŒ¸  -->  And with perfect People like you Â ¤ï¸o¸
[en] Please don't call me sir😅😅 best part is you reply to every component  -->  Please don't call me sir😅😅 best part is you reply to every component
[en] Lol All you need is to put your hair in two ponytails and you'll look like Harley Quinn! You could definitely pull that look off!  -->  Lol All you need is to put your hair in two ponytails and you'll look like Harley Quinn! You could definitely pull that look off!
[en] It's "for the record" by Ooyy  -->  It's "for the record" by Ooyy
[en] All that to look just above average teanage boy  -->  All that to look just above average teanage boy
[en] Alright  -->  Alright
[en] Please show us what these are like in the sun.  -->  Please show us what these are like
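
The second comment above is tagged `la`, most likely because its trailing characters are mis-decoded emoji bytes rather than real text. A possible preprocessing step, not used in this notebook, is to try reversing that mis-decoding before language detection. The sketch below assumes the text was UTF-8 that had been decoded as Windows-1252. `repair_mojibake` is a hypothetical helper, and a dedicated library would be more robust.

```python
def _to_byte(ch):
    """Map one character back to the single byte it was mis-decoded from."""
    try:
        return ch.encode("cp1252")
    except UnicodeEncodeError:
        # Bytes that cp1252 leaves undefined usually survive as C1 control characters.
        return ch.encode("latin-1")

def repair_mojibake(text):
    """Undo a UTF-8-read-as-cp1252 mix-up; return the input unchanged if that fails."""
    try:
        return b"".join(_to_byte(ch) for ch in text).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

print(repair_mojibake("cafÃ©"))    # café
print(repair_mojibake("Alright"))  # Alright
```

Text that is already correct either contains characters outside the single-byte range or does not form valid UTF-8 once re-encoded, so it is returned as is.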

## 3.0 API Translation

Strength:
- High accuracy for mixed/complex text

Weakness:
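- Token limits: `max_tokens` caps the length of each response, and long inputs must fit within the model's context window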
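
One way to work within the token limit is to split long comments into smaller pieces and translate each piece separately. The sketch below uses a character budget as a crude stand-in for a token count, which is an assumption, because the real token count depends on the model's tokenizer. `chunk_text` and its `max_chars` default are hypothetical.

```python
def chunk_text(text, max_chars=2000):
    """Split text on whitespace into chunks of at most max_chars characters.

    A single word longer than max_chars is kept whole as its own chunk.
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks

print(chunk_text("aa bb cc dd", max_chars=5))  # ['aa bb', 'cc dd']
```

Splitting on whitespace keeps words intact, but it can cut a sentence in half, so some context is lost at chunk boundaries.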

In [None]:
API_KEY = "your_openai_api_key_here"  # placeholder; supply your own key
client = OpenAI(api_key=API_KEY)

def api_translation(text, target_language="English"):
    # Detect the dominant language of the input
    try:
        lang, _ = classify(text)
    except Exception as e:
        lang = "unknown"
        print(f"⚠️ Language detection failed: {e}")

    # Skip the API call only when the text is already English and English is the target
    if lang == "en" and target_language == "English":
        return text

    try:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",  # alternative: "gpt-4-1106-preview"
            messages=[
                {
                    "role": "system",
                    "content": (
                        "You are a professional multilingual translator. "
                        f"Translate the following mixed-language text into {target_language}, keeping the meaning accurate. "
                        "Preserve slang, proper nouns, and domain-specific terms. Return only the translated result."
                    ),
                },
                {
                    "role": "user",
                    "content": f"Translate to {target_language}:\n\n{text}",
                },
            ],
            temperature=0.3,
            max_tokens=1000,  # caps the length of the returned translation
        )

        # The message content can be empty; fall back to the original text in that case
        content = response.choices[0].message.content
        return content.strip() if content else text

    except Exception as e:
        # Chain the original exception so the root cause stays visible in the traceback
        raise RuntimeError(f"Translation failed: {e}") from e

In [51]:
print("=== API Translation===")
mixed_results = []
for text in df['textOriginal']:
    translated = api_translation(str(text))
    mixed_results.append((text, translated))

# Print results
for original, translated in mixed_results:
    print(f"{original}  -->  {translated}")

=== API Translation===
The, uh, *shape* of the containers is something else 😳  -->  The, uh, *shape* of the containers is something else 😳
And with perfect people like you â¤ï¸ðŸŒ¸  -->  And with perfect people like you ❤️🌟
Please don't call me sir😅😅 best part is you reply to every component  -->  Please don't call me sir😅😅 best part is you reply to every component
Lol All you need is to put your hair in two ponytails and you'll look like Harley Quinn! You could definitely pull that look off!  -->  Lol All you need is to put your hair in two ponytails and you'll look like Harley Quinn! You could definitely pull that look off!
It's "for the record" by Ooyy  -->  It's "for the record" by Ooyy
All that to look just above average teanage boy  -->  All that to look just above average teanage boy
Alright  -->  Alright
Please show us what these are like in the sun.  -->  Please show us what these are like in the sun.
This is a great point, thank you!  --> 