#Translation Using a Transformer


##Install the Requirements:


---



Install the libraries needed to run the code that follows:
- sentencepiece: used for tokenization and text processing
- sacremoses: provides an interface for tokenizing and normalizing the input text
- tqdm: displays a progress bar while training iterations are running
- accelerate (installed with `-U` to upgrade to the latest version): used to speed up model training on the GPU

In [None]:
!pip install sentencepiece
!pip install sacremoses
!pip install tqdm
!pip install accelerate -U

Collecting sentencepiece
  Downloading sentencepiece-0.1.99-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.3 MB)
Installing collected packages: sentencepiece
Successfully installed sentencepiece-0.1.99
Collecting sacremoses
  Downloading sacremoses-0.1.1-py3-none-any.whl (897 kB)
Installing collected packages: sacremoses
Successfully installed sacremoses-0.1.1
Collecting accelerate
  Downloading accelerate-0.25.0-py3-none-any.whl (265 kB)
Installing collected packages: accelerate
Successfully installed accelerate-0.25.0


In [None]:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
from torch.optim import AdamW  # transformers.AdamW is deprecated in favor of the PyTorch implementation
from transformers import Seq2SeqTrainingArguments, Seq2SeqTrainer
import torch
from torch.utils.data import DataLoader
from tqdm import tqdm
from sklearn.model_selection import train_test_split
from torch.utils.data import Dataset
import pandas as pd
import re
import unicodedata

**AutoTokenizer:** Instantiates a tokenizer automatically based on the model name. The tokenizer processes text and splits it into tokens that the language model can understand.

**AutoModelForSeq2SeqLM:** Instantiates a seq2seq (sequence-to-sequence) model automatically based on the model name. Seq2seq models are typically used for machine translation and other text generation tasks.

**AdamW:** An optimizer that is a variant of the Adam algorithm with decoupled weight decay. It is used to update the model's parameters during training.


#Connect Google Drive to be able to use the dataset and the model

We previously uploaded the translation dataset, which has 27,024 rows, to Google Drive. In this section we load it back and apply some normalization: converting Unicode characters to their closest ASCII form, and normalizing each string (lowercasing it and separating words from punctuation).

In [None]:
from google.colab import drive
drive.mount('/content/drive')
file_path = '/content/drive/MyDrive/DeepLearning_UAS/data/eng-indo.txt'

with open(file_path, 'r') as fp:
    text = fp.read()

def unicodeToAscii(s):
    return ''.join(
        c for c in unicodedata.normalize('NFD', s)
        if unicodedata.category(c) != 'Mn'
    )

def normalizeString(s):
    s = unicodeToAscii(s.lower().strip())
    s = re.sub(r"([.!?])", r" \1", s)
    s = re.sub(r"[^a-zA-Z.!?]+", r" ", s)
    return s

# Each line holds an English sentence and its Indonesian translation, separated by 4 spaces
text_dict = {"English": [], "Indonesian": []}
for line in text.splitlines():
    split_text = re.split(r" {4}", line)
    text_dict["English"].append(normalizeString(split_text[0]))
    text_dict["Indonesian"].append(normalizeString(split_text[1]))

df = pd.DataFrame.from_dict(text_dict)
df

Mounted at /content/drive


| | English | Indonesian |
|---|---|---|
| 0 | run ! | lari ! |
| 1 | who ? | siapa ? |
| 2 | wow ! | wow ! |
| 3 | help ! | tolong ! |
| 4 | jump ! | lompat ! |
| ... | ... | ... |
| 27020 | former dutch international koeman signed a two... | mantan pemain internasional belanda koeman men... |
| 27021 | valencia were then fourth in the table four po... | valencia kemudian beradadi urutan keempat pada... |
| 27022 | spanish media also reported on monday the club... | media spanyol pada senin juga melaporkan bahwa... |
| 27023 | the reports said club delegate salvador gonzal... | laporan tersebut menyebutkan delegasi klub sal... |
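
To make the effect of the normalization easier to see, here is a minimal, self-contained sketch that applies the same two regular expressions to single strings. The snake_case names and the `parse_pair` helper are illustrative assumptions, not part of the notebook. `parse_pair` returns `None` for a line without the 4-space separator instead of raising an `IndexError`, and it splits only on the first separator, which differs slightly from `re.split` when a line contains several.

```python
import re
import unicodedata


def unicode_to_ascii(s):
    # Decompose accented characters (NFD) and drop the combining marks (category "Mn")
    return "".join(
        c for c in unicodedata.normalize("NFD", s)
        if unicodedata.category(c) != "Mn"
    )


def normalize_string(s):
    s = unicode_to_ascii(s.lower().strip())
    s = re.sub(r"([.!?])", r" \1", s)     # put a space before . ! ?
    s = re.sub(r"[^a-zA-Z.!?]+", " ", s)  # collapse digits, commas, quotes, etc. into one space
    return s


def parse_pair(line, sep="    "):
    # Hypothetical helper: split on the first 4-space separator only,
    # and return None when the separator is missing
    parts = line.split(sep, 1)
    if len(parts) != 2:
        return None
    return normalize_string(parts[0]), normalize_string(parts[1])


print(normalize_string("Héllo, World! How's it going?"))
print(parse_pair("Run!    Lari!"))
print(parse_pair("a line without a separator"))
```

Note that the second regular expression also removes digits and apostrophes, so numbers and contractions in the dataset do not survive normalization.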


#Model Training and Saving the Model
Save the trained model to Google Drive so that the training results are preserved.

#Load the trained translation model and use it


In [None]:

# Load the fine-tuned model
model_path = "/content/drive/MyDrive/DeepLearning_UAS/results/fine-tuned-model"
model = AutoModelForSeq2SeqLM.from_pretrained(model_path)

# Load the fine-tuned tokenizer
tokenizer_path = "/content/drive/MyDrive/DeepLearning_UAS/results/fine-tuned-tokenizer"
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)

# Example English sentence
english_sentence = "They gathered old photos, ticket stubs, and drawings, about the adventures they would shared ."

# Split the text into sentences on a period followed by a space
sentences = english_sentence.split('. ')

# Translate and concatenate the results
translated_sentences = []

for sentence in sentences:
    # Tokenize the input sentence
    inputs = tokenizer(sentence, return_tensors="pt")

    # Translate the input sentence
    translation = model.generate(**inputs)

    # Decode the translated sentence
    translated_sentence = tokenizer.batch_decode(translation, skip_special_tokens=True)[0]

    # Append the translated sentence to the list
    translated_sentences.append(translated_sentence)

# Join the translated sentences into a complete translation
translated_text = '. '.join(translated_sentences)

print("English:", english_sentence)
print("Translated:", translated_text)


English: They gathered old photos, ticket stubs, and drawings, about the adventures they would shared .
Translated: Mereka mengumpulkan foto-foto lama, potongan tiket, dan gambar-gambar, tentang petualangan mereka akan berbagi.
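
The cell above splits on `'. '`, which drops the period from each piece and does not treat `!` or `?` as sentence endings. As one possible alternative, the sketch below splits after any of the three marks and keeps the punctuation attached. The function name `split_sentences` and the sample text are assumptions made for illustration.

```python
import re


def split_sentences(text):
    # Split after ., ! or ? when followed by whitespace; the lookbehind keeps
    # the punctuation attached to the sentence it ends
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


sample = "They gathered old photos. Then they laughed! Did they share them?"
sentences = split_sentences(sample)
print(sentences)

# Each piece keeps its own punctuation, so a plain space rejoins them
print(" ".join(sentences))
```

Each element of `sentences` could then be passed to the tokenizer and `model.generate` in the same loop as above, with the translated pieces joined by a single space.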


#Use our summarizer model


---

We previously uploaded our model to Hugging Face.

In [None]:
import os
import requests

API_URL = "https://api-inference.huggingface.co/models/Mr-FineTuner/FInal_Project_Deep_learning_Summarizer_for_indonesian_Language"
# Read the token from the environment rather than hardcoding it in the notebook
# (the HF_TOKEN variable name is an assumption; set it before running this cell)
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
	"inputs": translated_text,
})
print(translated_text)
print(output)

Mereka mengumpulkan foto-foto lama, potongan tiket, dan gambar-gambar, tentang petualangan mereka akan berbagi.
[{'generated_text': 'Sebuah foto-foto lama, potongan tiket dan gambar-gambar, tentang petualangan mereka akan berbagi.'}]
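
The response above is a list with one dictionary per input. When a request fails, for example while the model is still loading, the Inference API typically returns a dictionary with an `error` field instead. The sketch below is one way to handle both shapes. The helper name `extract_text` and the sample error message are assumptions, and no network call is made.

```python
def extract_text(output, key="generated_text"):
    # Success case, as in the output above: a list with one dict per input
    if isinstance(output, list) and output and isinstance(output[0], dict) and key in output[0]:
        return output[0][key]
    # Assumed failure case: a dict carrying an "error" message
    if isinstance(output, dict) and "error" in output:
        raise RuntimeError(f"Inference API error: {output['error']}")
    raise ValueError(f"Unexpected response shape: {output!r}")


# Response copied from the cell output above
ok = [{'generated_text': 'Sebuah foto-foto lama, potongan tiket dan gambar-gambar, tentang petualangan mereka akan berbagi.'}]
print(extract_text(ok))

# Illustrative error payload (the message text is made up for this example)
try:
    extract_text({"error": "model is loading"})
except RuntimeError as exc:
    print(exc)
```

Passing `key="translation_text"` would cover responses from a translation model, which use that field name instead.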


##Using Google Translate with our summarizer model, plus a text-to-speech feature


##Install the Requirements:


---



Install the libraries needed to use the Google Translate API and text-to-speech:
- googletrans: must be installed in order to use the Google Translate API
- pydub: used to load the audio file produced by the text-to-speech step

In [None]:
! pip install googletrans==4.0.0-rc1
! pip install pydub

Collecting googletrans==4.0.0-rc1
  Downloading googletrans-4.0.0rc1.tar.gz (20 kB)
  Preparing metadata (setup.py) ... done
Collecting httpx==0.13.3 (from googletrans==4.0.0-rc1)
  Downloading httpx-0.13.3-py3-none-any.whl (55 kB)
Collecting hstspreload (from httpx==0.13.3->googletrans==4.0.0-rc1)
  Downloading hstspreload-2023.1.1-py3-none-any.whl (1.5 MB)
Collecting chardet==3.* (from httpx==0.13.3->googletrans==4.0.0-rc1)
  Downloading chardet-3.0.4-py2.py3-none-any.whl (133 kB)
Collecting idna==2.* (from httpx==0.13.3->googletrans==4.0.0-rc1)
  Downloading idna-2.10-py2.py3-none-any.whl (58 kB)

Collecting pydub
  Downloading pydub-0.25.1-py2.py3-none-any.whl (32 kB)
Installing collected packages: pydub
Successfully installed pydub-0.25.1


# Google Translate
---
Set the destination language to "id" (Indonesian).



In [None]:
from googletrans import Translator

translator = Translator()
dest = 'id'  # destination language code: Indonesian
text_to_translate = 'They gathered old photos, ticket stubs, and drawings, reminiscing about the adventures they d shared . '
# text_to_translate = 'Once upon a time in a small town, there was a curious boy named Jake '
translated_text = translator.translate(text_to_translate, dest=dest)
print(translated_text.text)  # the Indonesian translation
print(f'current language: {dest}')  # prints: current language: id


Mereka mengumpulkan foto -foto lama, potongan tiket, dan gambar, mengenang petualangan yang mereka bagikan.
current language: id




---
#Using our summarizer model to summarize the translated text


In [None]:
import os
import requests

# This is our summarization model

API_URL = "https://api-inference.huggingface.co/models/Mr-FineTuner/Summary-model-better"
# Read the token from the environment rather than hardcoding it in the notebook
# (the HF_TOKEN variable name is an assumption; set it before running this cell)
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
	"inputs": translated_text.text,
})

print(translated_text.text)
print(output)

Mereka mengumpulkan foto -foto lama, potongan tiket, dan gambar, mengenang petualangan yang mereka bagikan.
[{'generated_text': 'Sebuah foto-foto lama, potongan tiket, dan gambar, mengenang petualangan yang mereka bagian dari seluruh dunia.'}]


#Using the ElevenLabs API for text-to-speech

In [None]:
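import os
import requests

CHUNK_SIZE = 1024
url = "https://api.elevenlabs.io/v1/text-to-speech/cEj9Ae7aq7XpVnBlUhGO"

headers = {
  "Accept": "audio/mpeg",
  "Content-Type": "application/json",
  # Read the key from the environment rather than hardcoding it in the notebook
  # (the ELEVENLABS_API_KEY variable name is an assumption)
  "xi-api-key": os.environ["ELEVENLABS_API_KEY"]
}

data = {
  "text": translated_text.text,
  # Note: eleven_monolingual_v1 targets English; a multilingual model
  # may pronounce Indonesian text better
  "model_id": "eleven_monolingual_v1",
  "voice_settings": {
    "stability": 0.5,
    "similarity_boost": 0.5
  }
}

# stream=True lets iter_content write the audio in chunks as it arrives
response = requests.post(url, json=data, headers=headers, stream=True)
response.raise_for_status()  # stop here if the API returned an error instead of audio
with open('output.mp3', 'wb') as f:
    for chunk in response.iter_content(chunk_size=CHUNK_SIZE):
        if chunk:
            f.write(chunk)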


#Play the audio

In [None]:


from pydub import AudioSegment
from IPython.display import display, Audio
import numpy as np

mp3_path = '/content/output.mp3'

# Read the MP3 file
audio = AudioSegment.from_file(mp3_path, format="mp3")

# Convert audio to NumPy array
audio_array = np.array(audio.get_array_of_samples())

# Display the audio player
display(Audio(data=audio_array, rate=audio.frame_rate))


#Using another translation model from Helsinki-NLP as a comparison to our previous result
https://huggingface.co/Helsinki-NLP/opus-mt-en-id

* This is the translation model we used as the base before fine-tuning.

In [None]:
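import os
import requests


API_URL = "https://api-inference.huggingface.co/models/Helsinki-NLP/opus-mt-en-id"
# Read the token from the environment rather than hardcoding it in the notebook
# (the HF_TOKEN variable name is an assumption; set it before running this cell)
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()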

output1 = query({
	# "inputs": "The crisp morning air filled the countryside as the sun peeked over the horizon, casting a golden hue across the fields. Dew glistened on blades of grass, a testament to the quiet night that had passed. Birds chirped cheerfully, heralding the arrival of a new day. In the distance, the silhouette of a farmhouse stood stoically against the waking sky, its windows reflecting the warm glow of the sunrise. Nature seemed to be awakening, embracing the world with a serene and timeless beauty.",
	# "inputs": "",
	"inputs": 'in the quaint village nestled between rolling hills and meandering streams, life unfolded with a timeless rhythm. Each morning, the sun painted the sky in hues of pink and gold, casting a warm glow upon cobblestone streets. The aroma of freshly baked bread wafted from the local bakery, inviting townsfolk to start their day with a sense of comfort. Children laughed and played in the vibrant town square, their joy echoing through the air. As the day progressed, the villagers tended to their gardens, traded stories at the market, and gathered in the cozy inn for evening tales. In this idyllic haven, the passage of time seemed to slow, creating a haven where community and nature harmoniously coexisted.',
})
print(output1)



[{'translation_text': 'Setiap pagi, matahari melukis langit dengan warna merah muda dan emas, melemparkan cahaya hangat di jalan-jalan batu. aroma roti panggang yang baru dipanggang dari toko roti lokal, mengundang warga kota untuk memulai hari mereka dengan rasa nyaman. anak-anak tertawa dan bermain di alun-alun kota yang hidup, kegembiraan mereka bergema melalui udara. seperti hari kemajuan, penduduk desa cenderung ke kebun-kebun mereka, bertukar cerita di pasar, dan berkumpul di tempat yang nyaman untuk dongeng malam.'}]


In [None]:
hasil = (output1[0]['translation_text'])
print(hasil)

Setiap pagi, matahari melukis langit dengan warna merah muda dan emas, melemparkan cahaya hangat di jalan-jalan batu. aroma roti panggang yang baru dipanggang dari toko roti lokal, mengundang warga kota untuk memulai hari mereka dengan rasa nyaman. anak-anak tertawa dan bermain di alun-alun kota yang hidup, kegembiraan mereka bergema melalui udara. seperti hari kemajuan, penduduk desa cenderung ke kebun-kebun mereka, bertukar cerita di pasar, dan berkumpul di tempat yang nyaman untuk dongeng malam.
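
The translation above begins at "Setiap pagi" and ends with the evening tales, so the first and last sentences of the English input are missing. One possible cause is that the input is too long for a single request. The sketch below groups sentences into smaller chunks that could each be sent separately. The function name `chunk_sentences` and the `max_chars` budget are illustrative assumptions, not documented limits of the model.

```python
import re


def chunk_sentences(text, max_chars=200):
    # max_chars is an illustrative budget, not a limit documented for the model
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the budget: close the current chunk
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


text = "One short sentence. Another short sentence. A third one follows here."
for chunk in chunk_sentences(text, max_chars=45):
    print(chunk)
```

Each chunk could then be passed to `query({"inputs": chunk})` and the translated pieces joined with a space. A single sentence longer than `max_chars` is kept whole as its own chunk rather than being cut mid-sentence.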


#Using our summarizer model to summarize the translated text


In [None]:
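import os
import requests

API_URL = "https://api-inference.huggingface.co/models/Mr-FineTuner/Summary-model-better"
# Read the token from the environment rather than hardcoding it in the notebook
# (the HF_TOKEN variable name is an assumption; set it before running this cell)
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()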

output = query({
	"inputs": hasil,
})
print(hasil)
print(output)

Setiap pagi, matahari melukis langit dengan warna merah muda dan emas, melemparkan cahaya hangat di jalan-jalan batu. aroma roti panggang yang baru dipanggang dari toko roti lokal, mengundang warga kota untuk memulai hari mereka dengan rasa nyaman. anak-anak tertawa dan bermain di alun-alun kota yang hidup, kegembiraan mereka bergema melalui udara. seperti hari kemajuan, penduduk desa cenderung ke kebun-kebun mereka, bertukar cerita di pasar, dan berkumpul di tempat yang nyaman untuk dongeng malam.
[{'generated_text': 'Setiap pagi, matahari melukis langit dengan warna merah muda dan emas, melemparkan cahaya hangat di jalan-jalan batu.'}]


#Thank You