<a href="https://colab.research.google.com/github/Eaby/NLP_Codes/blob/main/NU_IUI_Machine_Translation.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# ***Machine Translation Application***

********************************************************************************

# **Task 1: Machine Translation**

This program is a command-line utility for translating text from one language to another with Google Translate, accessed through the unofficial googletrans library. It can also extract text from images using OCR (Optical Character Recognition) and translate the extracted text.

**Libraries:**

googletrans (for translation)

tabulate (for formatting language lists)

pytesseract (for OCR)

PIL (Python Imaging Library, installed as the Pillow package, for opening images)
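Before running the notebook, it can help to confirm that these libraries are importable and that the Tesseract binary is on the PATH. The following is a minimal sketch using only the standard library; the helper names `missing_dependencies` and `tesseract_available` are hypothetical and not part of the program.

```python
import importlib.util
import shutil

# Import names, not pip names: the Pillow package is imported as PIL.
REQUIRED_MODULES = ["googletrans", "tabulate", "pytesseract", "PIL"]

def missing_dependencies(modules=REQUIRED_MODULES):
    """Return the import names that cannot be found in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]

def tesseract_available():
    """pytesseract wraps the tesseract executable, so check that it is on PATH."""
    return shutil.which("tesseract") is not None

print("Missing Python packages:", missing_dependencies() or "none")
print("tesseract binary found:", tesseract_available())
```

If either check fails, the install cells further down should be run first.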

**Translator Setup:**

The **googletrans** library is used to create a **Translator** object for language translation.
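The program relies on two calls on the Translator object: `detect(text)`, whose result exposes a `lang` code, and `translate(text, src=..., dest=...)`, whose result exposes the translated `text`. The sketch below illustrates that flow under some assumptions: the translator is passed in as a parameter (the program itself uses a module-level object), `FakeTranslator` is a hypothetical offline stand-in for `googletrans.Translator`, and `LANGUAGE_NAMES` is a tiny made-up subset of the real language table.

```python
from collections import namedtuple

# Hypothetical subset; the real program uses googletrans.LANGUAGES.
LANGUAGE_NAMES = {"en": "english", "es": "spanish", "fr": "french"}

def translate_text(translator, text, target_language="en"):
    """Detect the source language, translate, and return
    (translated_text, source_name, target_name).
    On failure, return (error_message, None, None)."""
    try:
        src = translator.detect(text).lang
        result = translator.translate(text, src=src, dest=target_language)
        return (result.text,
                LANGUAGE_NAMES.get(src, src),
                LANGUAGE_NAMES.get(target_language, target_language))
    except Exception as exc:  # e.g. network errors or unsupported codes
        return str(exc), None, None

# Offline stand-in (an assumption for illustration, not the real client).
Detected = namedtuple("Detected", "lang")
Translated = namedtuple("Translated", "text")

class FakeTranslator:
    def detect(self, text):
        return Detected(lang="es")

    def translate(self, text, src, dest):
        return Translated(text=f"[{src}->{dest}] {text}")

print(translate_text(FakeTranslator(), "hola", "en"))
# → ('[es->en] hola', 'spanish', 'english')
```

Passing the translator in makes the function testable without network access; swapping `FakeTranslator()` for a real `Translator()` should give live translations, assuming the service is reachable.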

**Functions:**

**translate_text(text, target_language='en')** translates the given text to the specified target language.

**extract_text_from_image(image_path)** extracts text from an image using OCR.

**list_languages()** lists available target languages.

**print_translation(source_language, target_language, original_text, translated_text)** displays translation details.
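The language list is laid out column by column, so codes read in order down each column rather than across each row. A minimal sketch of that index arithmetic in isolation is shown here; `column_major_rows` is a hypothetical helper name, and the program performs the same computation inline inside `list_languages()` with eight columns.

```python
def column_major_rows(items, cols):
    """Arrange items into rows so that reading down each column preserves order."""
    rows = (len(items) + cols - 1) // cols  # ceiling division
    table = []
    for i in range(rows):
        row = []
        for j in range(cols):
            index = i + j * rows  # item at row i of column j
            if index < len(items):
                row.append(items[index])
        table.append(row)
    return table

print(column_major_rows(list("abcdefg"), cols=3))
# → [['a', 'd', 'g'], ['b', 'e'], ['c', 'f']]
```

With seven items and three columns, the last column is shorter, which is why some rows end up with fewer cells.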

>The program starts with a welcome message and an interactive menu loop.

>>Users can choose between three options:

>>>Translate text manually entered by the user.

>>>Upload an image, extract text, and translate it.

>>>Quit the program.

>>Before each translation, the program lists the available target languages, asks the user for a target language code, and then displays the translation details.
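The menu loop described above can be sketched as follows. This is a simplified illustration, not the program's `main()`: the `read` and `write` parameters are assumptions added so the loop can be driven by scripted input, and the two actions are only recorded rather than performed.

```python
def run_menu(read=input, write=print):
    """Loop until the user picks option 3; return the list of chosen actions."""
    actions = {"1": "text", "2": "image"}
    history = []
    while True:
        write("1. Text Input\n2. Image File\n3. Quit")
        choice = read("Please enter your choice (1/2/3): ").strip()
        if choice == "3":
            break
        if choice not in actions:
            write("Invalid choice. Please select 1, 2, or 3.")
            continue
        history.append(actions[choice])
    return history

# Scripted session: text, an invalid entry, image, then quit.
scripted = iter(["1", "9", "2", "3"])
print(run_menu(read=lambda prompt: next(scripted), write=lambda msg: None))
# → ['text', 'image']
```

Calling `run_menu()` with no arguments falls back to the built-in `input` and `print`, which matches how the interactive version behaves.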

The program handles user input validation and provides informative messages for various scenarios.
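One piece of that validation is checking the target language code against the table of supported languages. A minimal sketch follows, assuming a small made-up `LANGUAGES` subset in place of `googletrans.LANGUAGES`; the helper `normalize_language_code` is hypothetical, and trimming and lowercasing the input is a suggested refinement rather than something the description above states.

```python
# Hypothetical subset of the real language table, for illustration only.
LANGUAGES = {"en": "english", "es": "spanish", "zh-cn": "chinese (simplified)"}

def normalize_language_code(raw, languages=LANGUAGES):
    """Return a valid lowercase code, or None if the input is not recognised."""
    code = raw.strip().lower()
    return code if code in languages else None

for entry in [" ES ", "ZH-CN", "xx"]:
    code = normalize_language_code(entry)
    if code is None:
        print(f"{entry!r}: invalid language code")
    else:
        print(f"{entry!r}: {code} ({LANGUAGES[code]})")
```

Returning `None` for unknown codes lets the caller print an error and return to the menu instead of sending an unsupported code to the translator.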

It's a user-friendly command-line interface for translating text between languages, whether the text is entered manually or extracted from images. It demonstrates how several Python libraries can be combined to provide translation (through Google Translate) and OCR.

In [None]:
!pip install googletrans==4.0.0-rc1

In [None]:
!pip install tabulate

In [None]:
!pip install pytesseract Pillow

In [None]:
!apt-get install -y tesseract-ocr

In [None]:
#************************************************************************************************
# Developer: Eaby Kollonoor Babu
# Version: 2.1
# Last Updated: 2023-09-09
# Contact: eaby.asha@gmail.com

# Description
"""
Machine Translation Application in Python (for educational purposes only)

A command-line utility for translating text from one language to another
using the Google Translate API. It also includes the ability to extract text from images
using OCR (Optical Character Recognition) and translate that extracted text.
"""

# License and Copyright Notice
"""
Copyright (c) 2023 Eaby Kollonoor Babu

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), for the sole
purpose of educational and non-commercial use, without restriction, including
without limitation the rights to use, copy, modify, merge, publish, distribute,
or sublicense copies of the Software, and to permit persons to whom the Software
is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS" FOR EDUCATIONAL USE ONLY, WITHOUT WARRANTY OF ANY
KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT. IN NO
EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR
OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
FROM, OUT OF, OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
IN THE SOFTWARE.
"""

# Changelog/Release Notes
"""
Changelog:

- Version 1.0 (2023-08-22): Basic text translation testing only.
- Version 1.1 (2023-08-25): Text translation with Google Translate integration.
- Version 1.2 (2023-08-27): Output text formatted and tabulated.
- Version 2.1 (2023-09-09): Added the option to choose between text and image inputs.

"""

# Feedback
"""
For questions or feedback, feel free to email me at eaby.asha@gmail.com
"""
#************************************************************************************************

from googletrans import Translator, LANGUAGES
from tabulate import tabulate
import pytesseract
from PIL import Image
from google.colab import files

translator = Translator()

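def translate_text(text, target_language='en'):
    """Translate text; return (translated_text, source_name, target_name).

    On failure, return (error_message, None, None).
    """
    try:
        # Detect the source language, then translate from it explicitly
        src_language = translator.detect(text).lang
        translated_text = translator.translate(text, src=src_language, dest=target_language)
        # The detected code may not match a LANGUAGES key exactly (e.g. different case),
        # so fall back to the raw code instead of raising KeyError
        source_name = LANGUAGES.get(str(src_language).lower(), str(src_language))
        target_name = LANGUAGES.get(target_language, target_language)
        return translated_text.text, source_name, target_name
    except Exception as e:
        return str(e), None, None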

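def extract_text_from_image(image_path):
    """Return the text found in the image, or an empty string if OCR fails."""
    try:
        # Use Tesseract OCR to extract text from the image
        with Image.open(image_path) as image:
            extracted_text = pytesseract.image_to_string(image)
        # Strip whitespace so an image with no readable text yields an empty string
        return extracted_text.strip()
    except Exception as e:
        # Report the error here rather than returning it, so it is never translated as text
        print(f"OCR error: {e}")
        return ""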

def list_languages():
    lang_items = list(LANGUAGES.items())
    num_langs = len(lang_items)

    # Divide the list of languages into eight columns
    cols = 8
    rows = (num_langs + cols - 1) // cols
    table = []

    for i in range(rows):
        row_data = []
        for j in range(cols):
            index = i + j * rows
            if index < num_langs:
                code, lang = lang_items[index]
                row_data.append(f"{code}: {lang}")
        table.append(row_data)

    table_str = tabulate(table, tablefmt="fancy_grid")
    print("\033[1m\033[94m  Available target languages: \033[0m")
    print(table_str)

def print_translation(source_language, target_language, original_text, translated_text):
    print("\033[1m\033[94m Translation Details: \033[0m")
    print(f"\033[1m\033[94m Source Language: \033[0m {source_language}")
    print(f"\033[1m\033[94m Target Language: \033[0m {target_language}")
    print(f"\033[1m\033[94m Original Text: \033[0m {original_text}")
    print(f"\033[1m\033[94m Translated Text:\033[0m {translated_text}")

def main():
    print("\033[1m\033[94m        *****  Welcome to the Multilingual Translator!  *****\033[0m")
    while True:
        print("\n\n\033[1m\033[94mSelect translation source:\033[0m")
        print("\n\033[1m\033[94m1. Text Input\033[0m")
        print("\033[1m\033[94m2. Image File\033[0m")
        print("\033[1m\033[94m3. Quit\033[0m")
        choice = input("\nPlease Enter your choice (1/2/3): ")

        if choice == '1':
            user_input = input("\nPlease Enter text to translate (q to quit): ")
            if user_input.lower() == 'q':
                break

            list_languages()
            target_language = input("\nEnter the target language code (e.g., 'en' for English): ").strip().lower()

            if target_language not in LANGUAGES:
                print("\n\033[1m\033[94mInvalid language code. Please select a valid language.\033[0m")
                continue

            translated_text, source_language, target_language_name = translate_text(user_input, target_language)
            if source_language:
                print_translation(source_language, target_language_name, user_input, translated_text)
            else:
                # On failure, translate_text returns the error message as its first value
                print(f"\n\033[1m\033[94m Translation failed:\033[0m {translated_text}")

        elif choice == '2':
            uploaded = files.upload()
            # Get the first uploaded file name (assuming only one file is uploaded)
            uploaded_files = list(uploaded.keys())

            if len(uploaded_files) == 0:
                print("\n\033[1m\033[94m No files uploaded. Please upload an image.\033[0m")
                continue

            image_path = uploaded_files[0]
            extracted_text = extract_text_from_image(image_path)

            if extracted_text:
                print(f"\n\033[1m\033[94m Extracted text from the image:\033[0m {extracted_text}")
                list_languages()
                target_language = input("\033[1m\033[94m Enter the target language code (e.g., 'es' for Spanish):\033[0m ").strip().lower()

                if target_language not in LANGUAGES:
                    print("\n\033[1m\033[94m Invalid language code. Please select a valid language.\033[0m")
                    continue

                translated_text, source_language, target_language_name = translate_text(extracted_text, target_language)
                if source_language:
                    print_translation(source_language, target_language_name, extracted_text, translated_text)
                else:
                    # On failure, translate_text returns the error message as its first value
                    print(f"\n\033[1m\033[94m Translation failed:\033[0m {translated_text}")
            else:
                print("\n\033[1m\033[94m Text extraction from the image failed.\033[0m")

        elif choice == '3':
            break
        else:
            print("\n\033[1m\033[94m Invalid choice. Please select 1, 2, or 3.\033[0m")

if __name__ == "__main__":
    main()