<a href="https://colab.research.google.com/github/Tharindupriyaharshana/H-rverstehenPro/blob/main/H%C3%B6rverstehenPro.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


**HörverstehenPro**

HörverstehenPro is a Gradio-based web application that helps learners of German understand spoken German. It combines several AI models for speech recognition, grammar correction, translation, and part-of-speech analysis.


Learners can use it to practice listening, check the grammar of the transcribed speech, read an English translation, and see which words in the transcript are nouns and verbs.

Here's a breakdown of each part:

In [None]:
!pip install --upgrade pip
!pip install --upgrade git+https://github.com/huggingface/transformers.git accelerate datasets[audio]
!pip install gradio
!pip install git+https://github.com/openai/whisper.git
!pip install transformers
!pip install torch
!pip install soundfile
!pip install spacy
!python -m spacy download de_core_news_sm




In [None]:
!pip install sentencepiece


Imports: The code begins by importing the libraries it needs:



*   gradio for creating the web interface.
*   tempfile, soundfile, and numpy for handling audio files and arrays.
*   difflib for comparing the original and corrected text.
*   transformers, torch, and whisper for loading and running the pre-trained AI models.
*   spacy for natural language processing (NLP) tasks in German.









**The models used**

- Whisper Model: OpenAI's openai/whisper-large-v3 model is loaded for speech recognition.

- spaCy Model: The de_core_news_sm model, spaCy's small German pipeline, is loaded for grammatical analysis.

- T5 Grammar Correction Model: The aiassociates/t5-small-grammar-correction-german model corrects German grammar.

- Translation Model: The Helsinki-NLP/opus-mt-de-en model translates German to English.


**Function Definitions:**

- correct_grammar: Corrects grammar in a given German text using the T5 model.
- generate_diff: Generates a word-by-word textual difference between the original and corrected text.
- transcribe_and_correct: Transcribes audio to text, corrects its grammar, and shows the differences.
- transcribe_and_translate: Handles audio input, transcribes it to German text, corrects grammar, translates to English, and extracts nouns and verbs.
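
To illustrate the diff step, here is a minimal sketch of a word-level comparison with `difflib.ndiff`. The helper name `word_diff` and the sample sentences are made up for this example, and dropping the `?` hint lines is a choice of this sketch, not something the notebook does.

```python
import difflib


def word_diff(original: str, corrected: str) -> list[str]:
    """Compare two sentences word by word and return ndiff-style lines."""
    lines = difflib.ndiff(original.split(), corrected.split())
    # ndiff may emit "? ..." hint lines that mark changed characters;
    # this sketch drops them so only the word-level changes remain.
    return [line for line in lines if not line.startswith("?")]


# Hypothetical input: a sentence with a wrong article form and its correction
diff = word_diff("Ich habe ein Hund", "Ich habe einen Hund")
print("\n".join(diff))
```

Lines starting with two spaces are unchanged words, `- ` marks a word only in the original, and `+ ` marks a word only in the corrected text.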


**Audio Processing:**

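The code accepts audio as a file path, a NumPy array, or a (sample_rate, audio_data) tuple. Array input is first written to a temporary WAV file, and the file is then loaded with whisper.load_audio, which decodes it to 16 kHz mono samples ready for transcription.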
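
As an alternative illustration of the array case, the sketch below normalizes a `(sample_rate, audio_data)` tuple in memory instead of going through a temporary file. The function name `to_mono_float32` is hypothetical, and the assumed input is signed integer PCM with shape `(num_samples,)` or `(num_samples, channels)`. It does not resample, so audio recorded at another rate would still have to be converted to 16 kHz before transcription.

```python
import numpy as np


def to_mono_float32(audio):
    """Convert (sample_rate, samples) to mono float32 samples in [-1, 1]."""
    sample_rate, samples = audio
    samples = np.asarray(samples)

    if np.issubdtype(samples.dtype, np.integer):
        # Scale integer PCM by the largest magnitude its dtype can hold
        info = np.iinfo(samples.dtype)
        scale = float(max(abs(info.min), info.max))
        samples = samples.astype(np.float32) / scale
    else:
        samples = samples.astype(np.float32)

    if samples.ndim == 2:
        # Shape (num_samples, channels): average the channels to get mono
        samples = samples.mean(axis=1)

    return sample_rate, samples


# Hypothetical two-sample stereo recording in 16-bit PCM
stereo = np.array([[16384, -16384], [32767, 32767]], dtype=np.int16)
rate, mono = to_mono_float32((48000, stereo))
print(rate, mono.shape, mono.dtype)
```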

**Language Detection and Transcription:**

Instead of relying on Whisper's automatic language detection, the code tells the transcription pipeline that the audio is German.
The audio is then transcribed to German text.

**Translation and Grammatical Analysis:**

The grammar-corrected German text is translated into English.
spaCy's German model is used to extract nouns and verbs from the original transcription.
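
The extraction step is a filter on each token's coarse part-of-speech tag. The sketch below shows it on hand-tagged stand-in tokens so that it runs without downloading a model. The helper `words_by_pos` and the sample tokens are assumptions made for this example; in the notebook the tokens come from `nlp(text)`, and spaCy assigns the tags.

```python
from types import SimpleNamespace


def words_by_pos(tokens, pos):
    """Collect the text of every token whose coarse POS tag equals `pos`."""
    return [token.text for token in tokens if token.pos_ == pos]


# Stand-ins for spaCy tokens, tagged by hand for "Der Hund läuft im Park"
tokens = [
    SimpleNamespace(text="Der", pos_="DET"),
    SimpleNamespace(text="Hund", pos_="NOUN"),
    SimpleNamespace(text="läuft", pos_="VERB"),
    SimpleNamespace(text="im", pos_="ADP"),
    SimpleNamespace(text="Park", pos_="NOUN"),
]

nouns = words_by_pos(tokens, "NOUN")
verbs = words_by_pos(tokens, "VERB")
print(nouns, verbs)
```

Note that spaCy's coarse tag set gives auxiliary verbs the separate tag `AUX`, so a filter on `VERB` alone leaves them out.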



In [None]:
import gradio as gr
import tempfile
import soundfile as sf
import numpy as np
from transformers import pipeline, WhisperForConditionalGeneration, WhisperProcessor
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
import torch
import difflib
import whisper
import spacy

# Load Whisper large-v3 model
model_id = "openai/whisper-large-v3"
model = WhisperForConditionalGeneration.from_pretrained(model_id)
processor = WhisperProcessor.from_pretrained(model_id)

# Load the German model
nlp = spacy.load("de_core_news_sm")


# Initialize Whisper pipeline
whisper_pipeline = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    device=0 if torch.cuda.is_available() else -1
)


# Load T5 German Grammar Correction model
tokenizer = AutoTokenizer.from_pretrained("aiassociates/t5-small-grammar-correction-german")
t5_model = AutoModelForSeq2SeqLM.from_pretrained("aiassociates/t5-small-grammar-correction-german")

# Load the translation model (you can choose an appropriate translation model)
translation_model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-de-en")
translation_tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-de-en")


def correct_grammar(text):
    inputs = tokenizer.encode("grammar: " + text, return_tensors="pt", padding=True)
    outputs = t5_model.generate(inputs, max_length=512)
    corrected_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return corrected_text


def generate_diff(original, corrected):
    diff = difflib.ndiff(original.split(), corrected.split())
    diff_text = '\n'.join(diff)
    return diff_text

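def transcribe_and_correct(audio):
    # Transcribe with the Whisper pipeline, forcing German as the language
    transcribed_text = whisper_pipeline(audio, generate_kwargs={"language": "german"})["text"]
    corrected_text = correct_grammar(transcribed_text)
    diff_text = generate_diff(transcribed_text, corrected_text)
    return transcribed_text, corrected_text, diff_text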


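def transcribe_and_translate(audio):
    # With live=True, Gradio may call this with None when the input is cleared
    if audio is None:
        return "", "", "", "", ""

    # Create a temporary file to save the audio if it's a NumPy array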
    if isinstance(audio, np.ndarray) or (isinstance(audio, tuple) and isinstance(audio[1], np.ndarray)):
        # If audio is a tuple, it contains (sample_rate, audio_data)
        if isinstance(audio, tuple):
            sample_rate, audio_data = audio
        else:
            sample_rate = 16000  # Whisper expects 16 kHz audio
            audio_data = audio

        # Write audio data to a temporary file
        with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as tmp_file:
            sf.write(tmp_file.name, audio_data, sample_rate)
            tmp_file_path = tmp_file.name

        # Load the audio file with Whisper
        audio = whisper.load_audio(tmp_file_path)
    else:
        # If it's a file path, use it directly
        audio = whisper.load_audio(audio)

    # Transcribe in the original language (German); passing the language
    # explicitly skips Whisper's automatic language detection
    original_transcription_result = whisper_pipeline(audio, generate_kwargs={"language": "german"})
    original_transcribed_text = original_transcription_result["text"]

    # Correct grammar in the original transcription
    corrected_text = correct_grammar(original_transcribed_text)


    # Use the translation model to translate from German to English.
    # opus-mt-de-en is a Marian model and takes the source text without a task prefix.
    input_ids = translation_tokenizer.encode(corrected_text, return_tensors="pt", padding=True)
    translated_ids = translation_model.generate(input_ids, max_length=512)
    translated_text = translation_tokenizer.decode(translated_ids[0], skip_special_tokens=True)


    # Run spaCy's German pipeline on the original transcription for POS tagging
    doc = nlp(original_transcribed_text)

    # Extract nouns and verbs, joined into strings for display in the textboxes
    nouns = ", ".join(token.text for token in doc if token.pos_ == "NOUN")
    verbs = ", ".join(token.text for token in doc if token.pos_ == "VERB")


    return original_transcribed_text, corrected_text, translated_text, nouns, verbs

iface = gr.Interface(
    fn=transcribe_and_translate,
    inputs=gr.Audio(),
    outputs=[
        gr.Textbox(label="Original Transcribed Text (German)"),
        gr.Textbox(label="Grammar Corrected Text"),
        gr.Textbox(label="Translated to English"),
        gr.Textbox(label="Nouns"),
        gr.Textbox(label="Verbs")
    ],
    title='HörverstehenPro',
    description='Your German Listening Assistant by GIVE A NAME',
    live=True
)

# Launch with sharing enabled
iface.launch(share=True, debug=True)




The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
  return self.fget.__get__(instance, owner)()


Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://69c48b5e068b9c5b55.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


Traceback (most recent call last):
  File "/usr/local/lib/python3.10/dist-packages/gradio/queueing.py", line 495, in call_prediction
    output = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/route_utils.py", line 232, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1561, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/dist-packages/gradio/blocks.py", line 1179, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/dist-packages/anyio/to_thread.py", line 33, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 877, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/dist-packages/anyio/_backends/_asyncio.py", line 807, in run
    re