<a href="https://colab.research.google.com/github/SandeepKonduruFeb12/aiml/blob/master/silver/A3TranslatorApp.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Problem statement:** Create a translator app that takes your voice/speech as input in a language you select (English and four Indian languages are offered) and converts it to speech in an output language you select (the same five choices are offered).

# Speech Translator Application Summary

This notebook implements a Speech Translator Application that translates spoken input in one language into spoken output in another. It addresses the problem statement above: the user picks an input language and an output language from English and four Indian languages, speaks, and hears the translation played back.

## Functionality:
1.  **Speech Input**: Captures user's voice through the microphone in a selected source language.
2.  **Speech-to-Text Conversion**: Transcribes the captured speech into text.
3.  **Text Translation**: Translates the transcribed text from the source language to a selected target language.
4.  **Text-to-Speech Conversion**: Converts the translated text into spoken audio in the target language.
5.  **Audio Output**: Plays the translated speech back to the user.

## Technologies and Libraries Used:
*   **`SpeechRecognition`**: Used for converting spoken audio into text. This notebook calls its `recognize_google` method, which sends the audio to Google's Web Speech API.
*   **`gTTS` (Google Text-to-Speech)**: Utilized for converting translated text into natural-sounding speech in various languages.
*   **`googletrans`**: Employed for performing text translation between different languages, specifically using Google Translate's capabilities.
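*   **`PyAudio`**: Required by `SpeechRecognition` only for its `Microphone` class. It is installed here, but the final version does not record through it, because the Colab runtime cannot reach the user's local microphone.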
*   **`pydub`**: Used for audio manipulation, specifically for converting the recorded WebM audio format (from Colab's JavaScript) into a WAV format compatible with `SpeechRecognition`.
*   **`IPython.display`**: Enables displaying rich output within the Colab environment, particularly for playing audio files.
*   **`google.colab`**: Provides utilities specific to Google Colab. `output.eval_js` runs the recording JavaScript and returns its result to Python. `widgets` is imported but not used in the final version of audio recording.
*   **JavaScript (Embedded)**: Custom JavaScript code records audio directly in the browser, which is the only place the user's microphone is reachable from a Colab session.

## Problem Statement Fulfillment:
*   **Input/Output Language Choices**: The app provides a selection of languages for both input and output, including English (`en`) and four Indian languages: Hindi (`hi`), Tamil (`ta`), Telugu (`te`), and Kannada (`kn`).
*   **End-to-End Voice Translation**: It seamlessly integrates speech recognition, text translation, and text-to-speech synthesis to achieve a complete voice-to-voice translation experience.

## Key Features and Considerations:
*   **Colab Audio Recording**: A custom `record_audio_colab` function, using embedded JavaScript, captures microphone input in the browser, because the Colab runtime cannot open the user's microphone directly.
*   **Audio Format Conversion**: Recorded WebM audio is converted to WAV format using `pydub` to ensure compatibility with `SpeechRecognition`.
*   **Temporary File Management**: The application handles the creation and removal of temporary audio files (`translated_audio.mp3`, `temp_colab_audio.wav`) to keep the environment clean.
*   **User-Friendly Interface**: The `run_translator_app` function provides clear prompts for language selection and real-time feedback during the translation process.
*   **Error Handling**: Basic error handling is implemented for speech recognition, translation, and text-to-speech conversion to gracefully manage potential issues.


## How the Solution Was Implemented

The implementation of this Speech Translator Application involved several key steps and addressing specific challenges, particularly related to audio input within the Google Colab environment:

1.  **Environment Setup**: Initially, necessary Python libraries (`SpeechRecognition`, `gTTS`, `googletrans`, `pydub`) were installed using `!pip install`. The `googletrans` library required careful handling due to changes in its internal structure, leading to an `ImportError` for `service_urls`, which was resolved by removing unnecessary imports.

2.  **Language Options**: A Python dictionary (`language_options`) was created to map human-readable language names (e.g., "English", "Hindi") to their corresponding ISO 639-1 language codes (e.g., "en", "hi"). This ensured a flexible and user-friendly language selection mechanism.

3.  **Speech Input and Recognition**: This was the most complex part due to Colab's limitations:
    *   Direct microphone access via `speech_recognition.Microphone` (which relies on `PyAudio`) failed with an `OSError: No Default Input Device Available` as Colab notebooks do not have direct access to local microphones.
    *   Attempts to use `google.colab.widgets.audio.record_audio()` also resulted in `AttributeError`s, indicating this module was not directly available or used incorrectly.
    *   **Solution**: A custom JavaScript function (`recordAudio`) was embedded and executed via `google.colab.output.eval_js`. This JavaScript uses the browser's `navigator.mediaDevices.getUserMedia` and `MediaRecorder` APIs to capture audio in WebM format. The recording is returned to Python as a base64 data URL, which is then decoded into raw bytes.
    *   **Audio Format Conversion**: The WebM audio bytes were incompatible with `SpeechRecognition`'s `AudioFile` class. To resolve this, `ffmpeg` (installed via `!apt-get`) and `pydub` (installed via `!pip install`) were used to convert the WebM data to WAV format before feeding it to `SpeechRecognition`.
    *   The recording duration was increased from 3 to 5 seconds to improve the chances of successful speech recognition.

4.  **Text Translation**: The `translate_text` function was implemented using the `googletrans.Translator` class. It takes the recognized text and the target language code, performs the translation, and returns the translated text. Error handling was included for translation failures.

5.  **Text-to-Speech Conversion**: The `text_to_speech` function utilized `gTTS` to convert the translated text into an MP3 audio file. This file was temporarily saved and then played back using `IPython.display.Audio`, providing audible output to the user. Temporary files were managed by deleting them after playback.

6.  **Application Orchestration**: The `run_translator_app` function serves as the main interface. It prompts the user to select input and output languages from the defined `language_options`, then sequentially calls the `recognize_speech_from_mic`, `translate_text`, and `text_to_speech` functions to complete the end-to-end speech-to-speech translation process. Input validation for language selection and comprehensive error messages were included to guide the user through the process.
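
The cell that defines `language_options` (step 2 above) does not appear in this export. A minimal sketch of what it might look like follows, assuming the names and ordering shown in the menu of the sample run. The `language_code` helper is hypothetical and only illustrates a tolerant lookup.

```python
# Assumed reconstruction of the mapping described in step 2; the original cell is not shown.
# Keys follow the menu order printed by run_translator_app; values are ISO 639-1 codes.
language_options = {
    "English": "en",
    "Hindi": "hi",
    "Tamil": "ta",
    "Telugu": "te",
    "Kannada": "kn",  # note: "ka" is the ISO 639-1 code for Georgian
}


def language_code(name):
    """Hypothetical helper: case-insensitive lookup that returns None for unknown names."""
    lookup = {key.lower(): code for key, code in language_options.items()}
    return lookup.get(name.strip().lower())
```

Because Python dictionaries preserve insertion order, the menu numbers printed by the app follow the order of the entries above.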

In [6]:
!pip install SpeechRecognition gTTS googletrans==4.0.0-rc1 pydub
print("Libraries installed successfully.")

Libraries installed successfully.


In [15]:
!apt-get update
!apt-get install -y portaudio19-dev
!pip install PyAudio

print("PyAudio and its dependencies installed successfully.")

Get:1 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Hit:2 http://archive.ubuntu.com/ubuntu jammy InRelease
Get:3 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:4 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,632 B]
Get:5 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease [1,581 B]
Hit:6 https://cli.gi

In [8]:
from googletrans import Translator

def translate_text(text, dest_language_code):
    if not text:
        print("No text to translate.")
        return None

    translator = Translator()
    try:
        # The src parameter can be left as default (auto-detect) or explicitly set if needed
        translated = translator.translate(text, dest=dest_language_code)
        print(f"Translated text: {translated.text}")
        return translated.text
    except Exception as e:
        print(f"Error during translation: {e}")
        return None

print("Text translation function defined.")

Text translation function defined.
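
Since the app already knows the input language, the `src` argument can be passed explicitly instead of relying on auto-detection, which may misjudge short utterances. The sketch below is one way to do that. `translate_with_source` is a hypothetical name, and the translator is passed in as a parameter (for example `googletrans.Translator()`) so the function can be tried with a stand-in object and no network access.

```python
def translate_with_source(text, src_code, dest_code, translator):
    """Translate `text` from `src_code` to `dest_code`, or return None on failure.

    `translator` is any object exposing translate(text, src=..., dest=...) that
    returns an object with a `.text` attribute, e.g. googletrans.Translator().
    """
    if not text:
        return None
    if src_code == dest_code:
        # Same language on both sides: nothing to translate, so skip the request.
        return text
    try:
        return translator.translate(text, src=src_code, dest=dest_code).text
    except Exception as exc:  # network and parsing failures surface in several forms
        print(f"Error during translation: {exc}")
        return None


# Possible usage inside the app (assumes googletrans is installed):
# translated = translate_with_source(recognized_text, "te", "hi", Translator())
```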


In [9]:
from gtts import gTTS
from IPython.display import Audio, display
import os

def text_to_speech(text, lang_code):
    if not text:
        print("No text to convert to speech.")
        return

    try:
        # Create a gTTS object
        tts = gTTS(text=text, lang=lang_code, slow=False)

        # Save the audio to a temporary file
        audio_file = "translated_audio.mp3"
        tts.save(audio_file)

        # Play the audio file
        print(f"Playing audio in {lang_code}...")
        display(Audio(audio_file, autoplay=True))

        # Clean up the temporary file.
        # Audio(filename) reads the file and embeds its contents in the cell output
        # when the object is constructed, so the file can be removed right away.
        os.remove(audio_file)

    except Exception as e:
        print(f"Error during text-to-speech conversion: {e}")

print("Text-to-speech function defined.")


Text-to-speech function defined.
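
The function above removes `translated_audio.mp3` only if both saving and displaying succeed. One way to make the cleanup unconditional, and to avoid a fixed file name, is a small context manager built on the standard library. This is a sketch, and `temporary_audio_path` is a hypothetical helper that is not part of the notebook.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_audio_path(suffix=".mp3"):
    """Yield a unique temporary file path and delete the file on exit, even on error."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    os.close(fd)  # only the path is needed; the caller writes to it by name
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)


# Possible usage inside text_to_speech (assumes gTTS and IPython are available):
# with temporary_audio_path() as audio_file:
#     gTTS(text=text, lang=lang_code, slow=False).save(audio_file)
#     display(Audio(audio_file, autoplay=True))
```

The same helper, called with `suffix=".wav"`, would also fit the intermediate WAV file used during speech recognition.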


In [12]:
def run_translator_app(language_options):
    print("Welcome to the Speech Translator App!")

    # Display language options to the user
    print("\nAvailable Languages:")
    lang_names = list(language_options.keys())
    for i, lang in enumerate(lang_names):
        print(f"{i+1}. {lang}")

    # Get source language input from user
    while True:
        try:
            source_choice = int(input("\nEnter the number for your INPUT language: "))
            if 1 <= source_choice <= len(lang_names):
                source_lang_name = lang_names[source_choice - 1]
                source_lang_code = language_options[source_lang_name]
                print(f"Selected input language: {source_lang_name}")
                break
            else:
                print("Invalid choice. Please enter a number from the list.")
        except ValueError:
            print("Invalid input. Please enter a number.")

    # Get target language input from user
    while True:
        try:
            target_choice = int(input("Enter the number for your OUTPUT language: "))
            if 1 <= target_choice <= len(lang_names):
                target_lang_name = lang_names[target_choice - 1]
                target_lang_code = language_options[target_lang_name]
                print(f"Selected output language: {target_lang_name}")
                break
            else:
                print("Invalid choice. Please enter a number from the list.")
        except ValueError:
            print("Invalid input. Please enter a number.")

    print("\n--- Starting Translation Process ---")

    # Step 1: Recognize speech from microphone
    print(f"Listening for speech in {source_lang_name}...")
    recognized_text = recognize_speech_from_mic(source_lang_code)

    if recognized_text:
        # Step 2: Translate the recognized text
        print(f"Translating from {source_lang_name} to {target_lang_name}...")
        translated_text = translate_text(recognized_text, target_lang_code)

        if translated_text:
            # Step 3: Convert translated text to speech and play
            print(f"Converting translated text to speech in {target_lang_name}...")
            text_to_speech(translated_text, target_lang_code)
        else:
            print("Translation failed. Cannot convert to speech.")
    else:
        print("No speech recognized. Cannot proceed with translation.")

    print("\n--- Translation Process Complete ---")

print("Translator app core function defined.")

# To run the app, uncomment the line below:
# run_translator_app(language_options)


Translator app core function defined.
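
The two selection loops in `run_translator_app` differ only in their prompt text, so they could share one helper. The sketch below assumes a hypothetical `choose_language` function. The `input_fn` parameter is an addition for illustration, so the loop can be exercised without a live prompt.

```python
def choose_language(prompt, language_options, input_fn=input):
    """Ask until the user enters a valid menu number; return (name, code)."""
    names = list(language_options)
    while True:
        raw = input_fn(prompt)
        try:
            choice = int(raw)
        except ValueError:
            print("Invalid input. Please enter a number.")
            continue
        if 1 <= choice <= len(names):
            name = names[choice - 1]
            return name, language_options[name]
        print("Invalid choice. Please enter a number from the list.")


# Possible usage inside run_translator_app:
# source_lang_name, source_lang_code = choose_language(
#     "\nEnter the number for your INPUT language: ", language_options)
```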


In [27]:
from google.colab import output as colab_output

print("Colab output module imported.")

Colab output module imported.


In [33]:
from google.colab import widgets as colab_widgets

print("Colab widgets module imported.")

Colab widgets module imported.


In [45]:
import speech_recognition as sr
import os
import io
from IPython.display import Javascript, display
from base64 import b64decode
from google.colab import output # Explicitly import output
from pydub import AudioSegment

# Helper function to record audio using Colab's JavaScript capabilities
def record_audio_colab():
    js_code = '''
        async function recordAudio() {
            const sleep  = time => new Promise(resolve => setTimeout(resolve, time))
            const b2text = blob => new Promise(resolve => {
                const reader = new FileReader()
                reader.onloadend = () => resolve(reader.result)
                reader.readAsDataURL(blob)
            })

            const options = { mimeType: 'audio/webm; codecs=opus' }
            const stream = await navigator.mediaDevices.getUserMedia({ audio: true })
            const recorder = new MediaRecorder(stream, options)
            let blob
            let data = []

            recorder.ondataavailable = e => data.push(e.data)
            recorder.start()
            await sleep(5000) // Record for 5 seconds (increased from 3 seconds)
            recorder.stop()
            await sleep(200) // Give recorder time to finalize blob
            blob = new Blob(data, { type: options.mimeType }) // Blob expects `type`, not `mimeType`
            return await b2text(blob)
        }
    '''
    # Displaying the script only defines recordAudio(); eval_js below calls it
    # once and waits for the returned data URL.
    display(Javascript(js_code))
    # `output` is google.colab.output, imported explicitly at the top of this cell.
    s = output.eval_js('recordAudio()')
    if s:
        b = b64decode(s.split(',')[1])
        return b
    return None

def recognize_speech_from_mic(source_language_code):
    r = sr.Recognizer()
    print(f"Click 'Run' for the audio recorder widget and speak clearly in {source_language_code} for about 5 seconds!")

    audio_bytes_webm = record_audio_colab()

    if audio_bytes_webm is None:
        print("Audio recording failed or no audio captured.")
        return None

    # Convert WebM audio bytes to WAV format using pydub
    try:
        audio_segment = AudioSegment.from_file(io.BytesIO(audio_bytes_webm), format="webm")
        temp_audio_file_wav = "temp_colab_audio.wav"
        audio_segment.export(temp_audio_file_wav, format="wav")
    except Exception as e:
        print(f"Error converting audio to WAV: {e}")
        return None

    try:
        # Load the WAV audio file and recognize it
        with sr.AudioFile(temp_audio_file_wav) as source:
            audio = r.record(source)  # read the entire audio file

        text = r.recognize_google(audio, language=source_language_code)
        print("You said: " + text)
        return text
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio (try speaking more clearly or for longer).")
        return None
    except sr.RequestError as e:
        print(f"Could not request results from Google Speech Recognition service; {e}")
        return None
    finally:
        # Clean up the temporary file
        if os.path.exists(temp_audio_file_wav):
            os.remove(temp_audio_file_wav)

print("Speech recognition function updated with increased recording duration and clear instructions.")

Speech recognition function updated with increased recording duration and clear instructions.
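
`record_audio_colab` extracts the audio with `s.split(',')[1]`, which assumes the JavaScript always returns a well-formed base64 data URL. A slightly more defensive decoder could look like the sketch below. `decode_data_url` is a hypothetical helper, not part of the notebook, and it uses only the standard library.

```python
import base64


def decode_data_url(data_url):
    """Return (mime_type, payload_bytes) for a base64 data URL, or None if malformed."""
    if not data_url or not data_url.startswith("data:"):
        return None
    header, separator, payload = data_url.partition(",")
    if not separator or not header.endswith(";base64"):
        return None
    mime_type = header[len("data:"):-len(";base64")]
    try:
        return mime_type, base64.b64decode(payload, validate=True)
    except ValueError:  # binascii.Error is a subclass of ValueError
        return None


# Possible usage inside record_audio_colab:
# decoded = decode_data_url(s)
# audio_bytes = decoded[1] if decoded else None
```

Returning the MIME type alongside the bytes lets the caller check what the browser reported before handing the data to `pydub`.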


In [46]:
print("Initiating the Speech Translator App...")
run_translator_app(language_options)
print("Speech Translator App session ended.")

Initiating the Speech Translator App...
Welcome to the Speech Translator App!

Available Languages:
1. English
2. Hindi
3. Tamil
4. Telugu
5. Kannada

Enter the number for your INPUT language: 4
Selected input language: Telugu
Enter the number for your OUTPUT language: 2
Selected output language: Hindi

--- Starting Translation Process ---
Listening for speech in Telugu...
Click 'Run' for the audio recorder widget and speak clearly in te for about 5 seconds!


<IPython.core.display.Javascript object>

You said: మీరు ఏం చెప్పాలనుకుంటున్నా
Translating from Telugu to Hindi...
Translated text: आप कहना क्या चाहते हैं?
Converting translated text to speech in Hindi...
Playing audio in hi...



--- Translation Process Complete ---
Speech Translator App session ended.
