<a href="https://colab.research.google.com/github/Sofia-Amouei/Sofia-Amouei/blob/main/Machine_Translations.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


## Prepared by **"Sofia Amouei 4013074508, Machine Translations"**
## Text mining & Web mining Course Project, Decision Science & Computer Engineering student, Khorazmi University; January 11, 2024

# *Extracting Python code for Rule-Based Machine Translation (RBMT), Statistical Machine Translation (SMT), and Neural Machine Translation (NMT) systems requires reviewing various sources, as these are complex systems developed over many years by numerous researchers and engineers. In this research, each translation system was examined separately for English-to-Spanish translation, and its code is analyzed in this Colab notebook:*

# **1-Rule-Based Machine Translation (RBMT):**

These systems are an early form of machine translation technology that relies heavily on linguistic rules and dictionaries. Unlike modern statistical or neural machine translation systems, RBMT translates text by applying grammatical and syntactic rules of the source and target languages. These rules are typically handcrafted by linguists.

To create a simple example of an RBMT system, let's consider a basic Python script for translating a very limited set of sentences from **English to Spanish**. This system will be highly simplified and will not represent the complexity found in full-scale RBMT systems.

In [30]:
# A Simple Rule-Based Machine Translation Example: English to Spanish

# Dictionary for word-to-word translation
english_to_spanish_dict = {
    "cat": "gato",
    "dog": "perro",
    "eats": "come",
    "sleeps": "duerme",
    "the": "el"  # Assuming only masculine nouns for simplicity
}

# Simple rule for sentence structure
# English and Spanish both use Subject-Verb-Object order, so a short
# Determiner-Noun-Verb sentence keeps its word order in translation.

def translate_sentence(sentence):
    words = sentence.lower().split()
    translated_words = []

    if len(words) == 3:  # Simple sentences with Determiner, Noun, Verb
        determiner, noun, verb = words
        # Translate each word, marking out-of-dictionary words as UNKNOWN
        determiner_spanish = english_to_spanish_dict.get(determiner, "UNKNOWN")
        noun_spanish = english_to_spanish_dict.get(noun, "UNKNOWN")
        verb_spanish = english_to_spanish_dict.get(verb, "UNKNOWN")

        # Keep the Spanish word order: Determiner-Noun-Verb
        translated_words = [determiner_spanish, noun_spanish, verb_spanish]

    return ' '.join(translated_words)

# Example usage
english_sentence = "The cat eats"
spanish_translation = translate_sentence(english_sentence)
print(f"English: {english_sentence} | Spanish: {spanish_translation}")


English: The cat eats | Spanish: el gato come


This code is a basic representation and works only for very specific sentence structures. Real-world RBMT systems are far more complex and involve extensive rules for grammar, syntax, and context handling. They may also include morphological analysis, part-of-speech tagging, and other linguistic processes. The development of such systems requires deep linguistic expertise and extensive language-specific development.
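To give a rough idea of what "more rules" can look like, the sketch below adds part-of-speech tags and two handcrafted rules to the toy system: adjective-noun reordering and gender agreement. The lexicon entries (`cow`, `black`), the tag names, and the function name `translate_with_rules` are hypothetical additions for illustration and are not part of the original example.

```python
# Hypothetical mini-lexicon: each entry carries a part-of-speech tag.
# Nouns have a fixed gender; determiners and adjectives have one form per gender.
LEXICON = {
    "the":    {"pos": "DET", "m": "el", "f": "la"},
    "cat":    {"pos": "NOUN", "es": "gato", "gender": "m"},
    "cow":    {"pos": "NOUN", "es": "vaca", "gender": "f"},
    "black":  {"pos": "ADJ", "m": "negro", "f": "negra"},
    "eats":   {"pos": "VERB", "es": "come"},
    "sleeps": {"pos": "VERB", "es": "duerme"},
}
UNKNOWN_ENTRY = {"pos": "UNK", "es": "UNKNOWN"}


def translate_with_rules(sentence):
    # Step 1 (tagging): look up every word, marking unknown words.
    entries = [LEXICON.get(w, UNKNOWN_ENTRY) for w in sentence.lower().split()]

    # Step 2 (reordering rule): English ADJ NOUN becomes Spanish NOUN ADJ.
    i = 0
    while i < len(entries) - 1:
        if entries[i]["pos"] == "ADJ" and entries[i + 1]["pos"] == "NOUN":
            entries[i], entries[i + 1] = entries[i + 1], entries[i]
            i += 2
        else:
            i += 1

    # Step 3 (agreement rule): determiners and adjectives copy the noun's gender.
    output = []
    for i, entry in enumerate(entries):
        if entry["pos"] == "DET":
            # A determiner agrees with the next noun to its right.
            noun = next((e for e in entries[i + 1:] if e["pos"] == "NOUN"), None)
            output.append(entry[noun["gender"] if noun else "m"])
        elif entry["pos"] == "ADJ":
            # After reordering, an adjective agrees with the noun just before it.
            prev = entries[i - 1] if i > 0 else None
            is_noun = prev is not None and prev["pos"] == "NOUN"
            output.append(entry[prev["gender"] if is_noun else "m"])
        else:
            output.append(entry["es"])
    return " ".join(output)


print(translate_with_rules("The black cow sleeps"))
```

Even with only two rules, the lexicon already needs extra fields (tag, gender, inflected forms), which hints at why full RBMT systems require so much manual linguistic work.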

# **2-Statistical Machine Translation (SMT):**

Statistical Machine Translation (SMT) is a type of machine translation that uses statistical models to translate text from one language to another. Unlike Rule-Based Machine Translation (RBMT), which relies on linguistic rules, SMT learns to translate by analyzing large volumes of bilingual text data. The core idea is to find patterns and probabilities of words and phrases in one language corresponding to those in another.
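As a rough illustration of how such probabilities can be learned from bilingual text, the sketch below runs a few rounds of expectation-maximization in the style of IBM Model 1, a classic word-alignment model. The three-sentence corpus and the function names `train_ibm_model1` and `best_translation` are hypothetical, and the sketch omits the NULL word and other refinements of the full model.

```python
from collections import defaultdict

# Hypothetical toy parallel corpus: (English words, Spanish words)
corpus = [
    ("the cat".split(), "el gato".split()),
    ("the dog".split(), "el perro".split()),
    ("a cat".split(), "un gato".split()),
]


def train_ibm_model1(corpus, iterations=10):
    """Estimate t[(spanish, english)] = P(spanish word | english word)."""
    spanish_vocab = {s for _, es in corpus for s in es}
    # Start from a uniform distribution over the Spanish vocabulary.
    t = defaultdict(lambda: 1.0 / len(spanish_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected (spanish, english) co-alignments
        total = defaultdict(float)  # expected alignments per English word
        # E-step: distribute each Spanish word over the English words
        # of the same sentence pair, in proportion to the current t.
        for en, es in corpus:
            for s in es:
                norm = sum(t[(s, e)] for e in en)
                for e in en:
                    fraction = t[(s, e)] / norm
                    count[(s, e)] += fraction
                    total[e] += fraction
        # M-step: re-normalize the expected counts into probabilities.
        for (s, e), c in count.items():
            t[(s, e)] = c / total[e]
    return t


def best_translation(t, english_word):
    """Return the Spanish word with the highest learned probability."""
    candidates = {s: p for (s, e), p in t.items() if e == english_word}
    return max(candidates, key=candidates.get)


t = train_ibm_model1(corpus)
print(best_translation(t, "cat"), best_translation(t, "the"))
```

No dictionary is supplied here: the model only sees which words appear together in aligned sentence pairs, and the repeated co-occurrence of "cat" with "gato" is what raises that probability.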

SMT typically involves the following components:

**1-Language Model:** Determines the probability of a sequence of words in the target language.

**2-Translation Model:** Determines the probability of a source language phrase translating to a target language phrase.

**3-Decoder:** Finds the best translation by combining probabilities from the language and translation models.

Creating a full-fledged SMT system from scratch is complex and requires extensive data and computational resources. However, we can provide a simplified Python example to illustrate the basic concept.

In this example, we'll use a small predefined dictionary for phrase translations and a simple model **(English to Spanish)** for choosing translations based on probabilities.

In [31]:
import random

# Example Translation Model: A simple dictionary with probabilities
# Format: { "source phrase": [(translation, probability), ...]}
translation_model = {
    "hello": [("hola", 0.6), ("buenos días", 0.4)],
    "world": [("mundo", 1.0)],
}

# Sampling step: randomly choose a translation in proportion to its probability
# (a stand-in for scoring candidates with a real language model)
def choose_translation(phrase):
    if phrase in translation_model:
        translations = translation_model[phrase]
        total = sum(prob for _, prob in translations)
        rand = random.uniform(0, total)
        current = 0
        for translation, prob in translations:
            current += prob
            if rand <= current:
                return translation
    return "UNKNOWN"

# Example Decoder: Function to translate a sentence
def translate_sentence(sentence):
    translated_sentence = []
    for word in sentence.lower().split():
        translated_sentence.append(choose_translation(word))
    return ' '.join(translated_sentence)

# Example usage
english_sentence = "hello world"
spanish_translation = translate_sentence(english_sentence)
print(f"English: {english_sentence} | Spanish: {spanish_translation}")


English: hello world | Spanish: hola mundo


This code is a very basic illustration and does not represent the complexity and sophistication of actual SMT systems. Because `choose_translation` samples at random, repeated runs can also print "buenos días mundo" instead of "hola mundo". Real SMT systems, like those used by Google Translate in its early years, involve training on large bilingual corpora, sophisticated probabilistic models, and complex algorithms for decoding and translation. SMT has largely been surpassed by Neural Machine Translation (NMT) models in recent years, which offer improvements in translation quality and efficiency.
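To make the interplay of the three components listed earlier more concrete, here is a minimal sketch of a decoder that scores every candidate translation by combining a translation model with a bigram language model and keeps the best one. All probabilities, the extra entry `tierra`, the `UNSEEN` floor, and the function names are made-up assumptions for illustration; the search is exhaustive and does no reordering, which real decoders cannot afford.

```python
import itertools
import math

# Hypothetical translation model: P(target phrase | source word)
TRANSLATION_MODEL = {
    "hello": {"hola": 0.6, "buenos días": 0.4},
    "world": {"mundo": 0.7, "tierra": 0.3},
}

# Hypothetical bigram language model: P(word | previous word),
# where "<s>" marks the start of the sentence.
BIGRAM_MODEL = {
    ("<s>", "hola"): 0.5,
    ("<s>", "buenos"): 0.3,
    ("buenos", "días"): 0.9,
    ("hola", "mundo"): 0.4,
    ("hola", "tierra"): 0.05,
    ("días", "mundo"): 0.1,
    ("días", "tierra"): 0.05,
}
UNSEEN = 1e-4  # floor probability for bigrams missing from the table


def language_model_logprob(words):
    """Log-probability of a target word sequence under the bigram model."""
    logprob, previous = 0.0, "<s>"
    for word in words:
        logprob += math.log(BIGRAM_MODEL.get((previous, word), UNSEEN))
        previous = word
    return logprob


def decode(sentence):
    """Return the candidate maximizing log P(translation) + log P(fluency)."""
    options = [
        TRANSLATION_MODEL.get(word, {"UNKNOWN": 1.0}).items()
        for word in sentence.lower().split()
    ]
    best, best_score = "", -math.inf
    # Enumerate every combination of per-word translation choices.
    for combination in itertools.product(*options):
        candidate = " ".join(phrase for phrase, _ in combination)
        tm_score = sum(math.log(prob) for _, prob in combination)
        lm_score = language_model_logprob(candidate.split())
        if tm_score + lm_score > best_score:
            best, best_score = candidate, tm_score + lm_score
    return best


print(decode("hello world"))
```

Unlike random sampling, this decoder is deterministic: the language model penalizes word sequences it considers unlikely, so the same input always yields the same highest-scoring output.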

# **3-Neural Machine Translation (NMT):**
Neural Machine Translation (NMT) is a sophisticated approach to machine translation that utilizes deep neural networks, particularly **sequence-to-sequence (seq2seq) models**. These models have significantly improved the quality of machine translation by effectively handling long-range dependencies and nuances in language. NMT systems are trained on large bilingual datasets and learn to translate by finding complex patterns in the data.

A full-fledged NMT system is complex and requires substantial computational resources for training. However, we can use an already trained system instead. The cells below first install the **Hugging Face Transformers** library, which includes pre-trained NMT models, and then translate text from **English to Spanish** with the googletrans library, an unofficial client for Google Translate.

In [32]:
!pip install transformers
!pip install sentencepiece
import sentencepiece as spm
spm.SentencePieceProcessor()
from transformers import pipeline




In [33]:
!pip install googletrans==4.0.0-rc1




In [34]:
from googletrans import Translator, LANGUAGES

# Initialize the Google Translate API translator
translator = Translator()

# Translate text from English to Spanish
english_text = "This is an example of neural machine translation."
translated = translator.translate(english_text, src='en', dest='es')

print(f"English: {english_text} | Spanish: {translated.text}")


English: This is an example of neural machine translation. | Spanish: Este es un ejemplo de traducción al automóvil neural.


In this code:

We create an instance of `Translator` from the googletrans library.

We then use the `translate` method to translate the given **English text to Spanish**, specifying **src='en'** for the source language and **dest='es'** for the destination language.

Finally, we print the original English text and its Spanish translation.

This approach is simpler than running a pre-trained model locally, as it requires no model download or setup beyond the googletrans library itself. It does, however, need network access, because googletrans sends the text to the Google Translate web service.
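For comparison, a local alternative could use the `pipeline` function from the Hugging Face Transformers library installed earlier. The following is a minimal sketch, not the notebook's original method: the model name `Helsinki-NLP/opus-mt-en-es` is an assumption (one publicly available English-to-Spanish model), and the function name `translate_en_to_es` and its `translator` parameter are hypothetical.

```python
def translate_en_to_es(text, translator=None):
    """Translate English text to Spanish with a pre-trained NMT model.

    `translator` may be any callable that behaves like a Transformers
    translation pipeline: it takes a string and returns a list of dicts
    with a "translation_text" key. Passing one in avoids a model download.
    """
    if translator is None:
        # Lazy import: building the pipeline downloads the model on first use.
        from transformers import pipeline

        # Assumed model name; any compatible English-to-Spanish model would do.
        translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-es")

    result = translator(text)
    return result[0]["translation_text"]
```

Calling `translate_en_to_es("This is an example of neural machine translation.")` without a `translator` argument builds the pipeline on the first call, which requires the `transformers` and `sentencepiece` packages and a one-time model download. After that, translation runs locally.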

## Machine translation is a vibrant area of research and development, and GitHub hosts a variety of projects ranging from academic prototypes to industry-grade frameworks. Here are some notable GitHub repositories in the field of machine translation:

**1-T2T (Tensor2Tensor) by Google Research:**
Repository: github.com/tensorflow/tensor2tensor

Description: Tensor2Tensor is a library for deep learning models and datasets designed by Google Research. It includes implementations of many state-of-the-art models, including those for machine translation.

**2-OpenNMT:**
Repository: github.com/OpenNMT

Description: OpenNMT is an open-source ecosystem for neural machine translation and neural sequence learning. Started at Harvard, it has been developed and maintained by a wide range of contributors. OpenNMT provides implementations in both PyTorch (OpenNMT-py) and TensorFlow (OpenNMT-tf).

**3-Marian NMT:**
Repository: github.com/marian-nmt/marian

Description: Marian is an efficient, free Neural Machine Translation framework mainly being developed by the Microsoft Translator team. It is particularly geared towards translation, but can be used for any sequence-to-sequence tasks.

**4-Fairseq by Facebook AI Research:**
Repository: github.com/pytorch/fairseq

Description: Fairseq is a sequence-to-sequence learning toolkit from Facebook AI Research that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks.

**5-Transformers by Hugging Face:**
Repository: github.com/huggingface/transformers

Description: Although not exclusively for machine translation, this library provides general-purpose architectures for natural language understanding and generation. It includes many pre-trained models that can be used for machine translation.

**6-Joey NMT:**
Repository: github.com/joeynmt/joeynmt

Description: Joey NMT is a minimalist NMT toolkit for educational purposes. It's designed to be simple, transparent, and easy to modify and extend.

**7-T5 (Text-To-Text Transfer Transformer) by Google Research:**
Repository: github.com/google-research/text-to-text-transfer-transformer

Description: T5 reframes all NLP tasks as a text-to-text problem, and it provides a unified framework that can be used for translation, summarization, question answering, and other tasks.

These projects represent a mix of different approaches and are suitable for various levels of expertise, from beginners to advanced practitioners in machine learning and NLP.

"Thanks for your attention"