## Student Helper App (Bilingual - Arabic & English)

This notebook builds a bilingual (Arabic and English) Student Helper App. It chains several pretrained models and pipelines to handle transcription, summarization, translation, question answering, question generation, and text-to-speech conversion.

### Key Features and Models Used:

- **Audio Transcription**:
  - We use the **Whisper** model to transcribe audio inputs into text.
  
- **Summarization**:
  - We use **facebook/bart-large-cnn** from Hugging Face to generate concise summaries from the transcribed content.
  
- **Translation**:
  - For Arabic ↔ English translation, we use the **facebook/nllb-200-distilled-600M** model with the language codes `arb_Arab` and `eng_Latn`.
  
- **Question-Answering**:
  - We use Hugging Face's question-answering pipeline with the **deepset/roberta-base-squad2** model to extract answers from the transcribed content.

- **Question Generation**:
  - To generate questions from the content, we use the **valhalla/t5-small-qg-prepend** model through the pipeline from [patil-suraj/question_generation](https://github.com/patil-suraj/question_generation).

- **Text-to-Speech (TTS)**:
  - We use **gTTS (Google Text-to-Speech)** to convert the summary into audio in the user's preferred language.

This notebook provides a streamlined workflow to assist students in summarizing, translating, and interacting with academic content in both English and Arabic, including an audio-based interface for improved accessibility.
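The workflow described above can be sketched as a single function that takes each model as a plain callable. This is a minimal illustration, not the notebook's actual `main` function: the name `run_workflow` and its parameters are hypothetical, and the stand-in callables in the usage note replace the real Whisper, NLLB, and BART models.

```python
from typing import Callable, Dict


def run_workflow(
    audio_path: str,
    preferred_language: str,
    transcribe: Callable[[str], str],
    detect_language: Callable[[str], str],
    translate: Callable[[str, str, str], str],
    summarize: Callable[[str], str],
) -> Dict[str, str]:
    """Hypothetical orchestration: transcribe, normalize to English, summarize, translate back."""
    text = transcribe(audio_path)
    source_language = detect_language(text)

    # The summarization and Q&A models work on English text, so normalize first.
    if source_language != "English":
        text = translate(text, source_language, "English")

    summary = summarize(text)

    # Return the summary in the language the user asked for.
    if preferred_language != "English":
        summary = translate(summary, "English", preferred_language)

    return {"transcript_en": text, "summary": summary}
```

Passing the models in as arguments keeps the control flow easy to test with stand-ins, for example `run_workflow("lecture.mp3", "Arabic", transcribe=lambda p: "some text", detect_language=lambda t: "English", translate=lambda t, s, d: t, summarize=lambda t: t)`.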

**Authors:** Shahad Albalawi, Nojood Alnahdi

#### Dependencies


In [1]:
!pip install git+https://github.com/openai/whisper.git
!pip install wget
!pip install transformers
!pip install torch
!pip install langdetect
!pip install gradio
!python -m nltk.downloader punkt
!pip install -U transformers==3.0.0
!git clone https://github.com/patil-suraj/question_generation.git
!pip install pytube moviepy
!pip install yt-dlp
!pip install gtts

Collecting git+https://github.com/openai/whisper.git
  Cloning https://github.com/openai/whisper.git to /tmp/pip-req-build-k28x55gx
  Running command git clone --filter=blob:none --quiet https://github.com/openai/whisper.git /tmp/pip-req-build-k28x55gx
  Resolved https://github.com/openai/whisper.git to commit 25639fc17ddc013d56c594bfbf7644f2185fad84
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Collecting tiktoken (from openai-whisper==20240930)
  Downloading tiktoken-0.7.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.6 kB)
Collecting triton>=2.0.0 (from openai-whisper==20240930)
  Downloading triton-3.0.0-1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (1.3 kB)
Downloading triton-3.0.0-1-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (209.4 MB)

In [2]:
%cd question_generation

/content/question_generation


In [3]:
import whisper
import torch
from transformers import pipeline
from transformers.utils import logging
logging.set_verbosity_error()
from langdetect import detect
import gradio as gr
import os
from gtts import gTTS
from moviepy.editor import VideoFileClip
import yt_dlp

#### Load the utilized models

In [4]:
# Load the pre-trained Whisper model (e.g., 'tiny','base', 'small', 'medium', 'large')
whispermodel = whisper.load_model("medium")

100%|█████████████████████████████████████| 1.42G/1.42G [00:23<00:00, 66.3MiB/s]
  checkpoint = torch.load(fp, map_location=device)



In [5]:
# Load the summarizer pipeline 'facebook/bart-large-cnn' model
summarizer = pipeline(task="summarization",
                      model="facebook/bart-large-cnn",
                      torch_dtype=torch.bfloat16)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.







In [6]:
# Load the translator pipeline 'facebook/nllb-200-distilled-600M' model
translator = pipeline(task="translation", model="facebook/nllb-200-distilled-600M")
languages = {
    "English": "eng_Latn",
    "Arabic": "arb_Arab",
}
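The `languages` mapping above pairs display names with NLLB language codes, and the rest of the notebook calls the translator with `src_lang` and `tgt_lang` and reads `[0]['translation_text']` from the result. A small wrapper can centralize that pattern. This is a sketch: `translate_text` and `LANGUAGE_CODES` are hypothetical names, and the `translator` argument is assumed to behave like the pipeline loaded above.

```python
from typing import Callable, Dict, List

# NLLB language codes, mirroring the notebook's `languages` dict.
LANGUAGE_CODES: Dict[str, str] = {"English": "eng_Latn", "Arabic": "arb_Arab"}


def translate_text(
    translator: Callable[..., List[dict]], text: str, source: str, target: str
) -> str:
    """Translate `text` between two named languages using a pipeline-like callable."""
    if not text or not text.strip():
        return ""  # nothing to translate
    if source == target:
        return text  # skip the model call entirely
    result = translator(
        text, src_lang=LANGUAGE_CODES[source], tgt_lang=LANGUAGE_CODES[target]
    )
    return result[0]["translation_text"]
```

With the real pipeline this would be called as `translate_text(translator, "Good morning", "English", "Arabic")`.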


In [7]:
# Load the question-answering pipeline 'deepset/roberta-base-squad2' model
qa_pipeline = pipeline(task="question-answering", model="deepset/roberta-base-squad2")


In [8]:
# Move the model to GPU if available
if torch.cuda.is_available():
    whispermodel = whispermodel.to("cuda")
    print("Model moved to GPU.")
else:
    print("GPU not available. Model remains on CPU.")

Model moved to GPU.


In [9]:
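# Import the question-generation pipeline factory from pipelines.py in the cloned
# patil-suraj/question_generation repo. It is aliased so that it does not shadow
# transformers.pipeline, which was imported earlier.
from pipelines import pipeline as qg_pipeline
question_generator = qg_pipeline("question-generation", model="valhalla/t5-small-qg-prepend", qg_format="prepend")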


#### Gradio

In [10]:
!wget --no-check-certificate -O video-example.mp4 "https://drive.google.com/uc?export=download&id=1o6hO2tYTxgudQSwhSD1E0wVwZ_N6qR1l"
!wget --no-check-certificate -O audio-example.mp3 "https://drive.google.com/uc?export=download&id=1BcE0aITKjABWcN6JFs5lS1GFUjCQU_7Y"

--2024-10-01 15:51:21--  https://drive.google.com/uc?export=download&id=1o6hO2tYTxgudQSwhSD1E0wVwZ_N6qR1l
Resolving drive.google.com (drive.google.com)... 142.251.2.138, 142.251.2.113, 142.251.2.100, ...
Connecting to drive.google.com (drive.google.com)|142.251.2.138|:443... connected.
HTTP request sent, awaiting response... 303 See Other
Location: https://drive.usercontent.google.com/download?id=1o6hO2tYTxgudQSwhSD1E0wVwZ_N6qR1l&export=download [following]
--2024-10-01 15:51:21--  https://drive.usercontent.google.com/download?id=1o6hO2tYTxgudQSwhSD1E0wVwZ_N6qR1l&export=download
Resolving drive.usercontent.google.com (drive.usercontent.google.com)... 142.250.141.132, 2607:f8b0:4023:c0b::84
Connecting to drive.usercontent.google.com (drive.usercontent.google.com)|142.250.141.132|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5395237 (5.1M) [video/mp4]
Saving to: ‘video-example.mp4’


2024-10-01 15:51:26 (29.2 MB/s) - ‘video-example.mp4’ saved [5395237/5395237]

In [11]:
# Example inputs for the Gradio demo

example_audio = "audio-example.mp3"
example_video = "video-example.mp4"
example_youtube = "https://www.youtube.com/watch?v=J4RqCSD--Dg&ab_channel=LearnFree"

In [12]:
def interactive_qa(question):
    global transcription, languageG

    # Check if transcription is available
    if not transcription:
        return "Transcript is empty. Please input the transcript first."

    question = question.strip()
    if not question:
        return "Please enter a question."

    try:
        # Translate question if necessary
        if languageG != 'English':
            question = translator(question, src_lang=languages[languageG], tgt_lang=languages['English'])[0]['translation_text']

        # Get answer from QA pipeline
        result = qa_pipeline(question=question, context=transcription)
        # Fall back to a placeholder if the result has no 'answer' key
        answer = result.get('answer', 'No answer found.')
        score = result.get('score', 0.0)

        # Translate answer back if necessary
        if languageG != 'English':
            answer = translator(answer, src_lang=languages['English'], tgt_lang=languages[languageG])[0]['translation_text']

        return f"**Answer**: {answer}\n\n**Confidence**: {score:.2f}"

    except Exception as e:
        return f"An error occurred: {str(e)}"
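`interactive_qa` above always returns the top-scoring span, even when the confidence is very low. One option is to suppress answers below a cutoff. The sketch below assumes the result is a dict with `answer` and `score` keys, as used above. The function name `format_qa_result` and the default threshold of 0.3 are illustrative assumptions, not tuned values.

```python
def format_qa_result(result: dict, min_score: float = 0.3) -> str:
    """Format a QA result, hiding answers whose score falls below `min_score`."""
    answer = (result.get("answer") or "").strip()
    score = float(result.get("score", 0.0))

    # An empty span or a low score is treated as "no reliable answer".
    if not answer or score < min_score:
        return f"No confident answer found (score {score:.2f})."

    return f"**Answer**: {answer}\n\n**Confidence**: {score:.2f}"
```

A suitable threshold depends on the content, so it is worth checking against a few known question and answer pairs before relying on it.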

In [13]:
def translator_text(summary, data, language):
    # Return as-is if the language is English
    if language == 'English':
        return summary, data

    translated_summary = None
    translated_data = []

    # Translate summary if it's provided
    if summary is not None:
        translated_summary = translator(summary, src_lang=languages["English"], tgt_lang=languages['Arabic'])[0]['translation_text']
    else:
        translated_summary = "No summary requested."

    # Translate data if provided
    if data is not None:
        for item in data:
            question = item.get('question', '')
            answer = item.get('answer', '')

            # Translate both question and answer if they are present
            translated_question = translator(question, src_lang=languages["English"], tgt_lang=languages['Arabic'])[0]['translation_text'] if question else ''
            translated_answer = translator(answer, src_lang=languages["English"], tgt_lang=languages['Arabic'])[0]['translation_text'] if answer else ''

            translated_data.append({
                'question': translated_question,
                'answer': translated_answer
            })
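    else:
        # Return None rather than a message string: main() iterates over this value,
        # and a string would be walked character by character and raise a TypeError.
        translated_data = None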

    return translated_summary, translated_data

In [14]:
def extract_audio_from_video(video_file, output_audio="/content/extracted_audio.mp3"):
    try:
        # Use 'with' to ensure proper cleanup
        with VideoFileClip(video_file) as video_clip:
            video_clip.audio.write_audiofile(output_audio)
        return output_audio
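    except Exception as e:
        # Return None instead of an error string, so the message is never mistaken for a file path
        print(f"Error extracting audio: {e}")
        return None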

In [15]:
def download_audio_from_youtube(youtube_url, output_path="/content/downloaded_audio.mp3"):
    ydl_opts = {
        'format': 'bestaudio/best',
        'outtmpl': 'temp_audio.%(ext)s',
        'postprocessors': [{
            'key': 'FFmpegExtractAudio',
            'preferredcodec': 'mp3',
            'preferredquality': '192',
        }],
        'quiet': True,
        'no_warnings': True,
    }

    try:
        with yt_dlp.YoutubeDL(ydl_opts) as ydl:
            ydl.download([youtube_url])
        os.rename('temp_audio.mp3', output_path)
        print(f"Audio successfully downloaded to {output_path}")
        return output_path
    except Exception as e:
        print(f"Error downloading audio: {e}")
        return None
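Before calling `yt_dlp`, the link can be given a cheap sanity check so that obviously invalid input fails fast. The sketch below uses only the standard library. `extract_video_id` is a hypothetical helper, and it deliberately recognizes just two common URL shapes (`youtube.com/watch?v=...` and `youtu.be/...`), so other valid YouTube URL forms would be rejected.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(url: str) -> Optional[str]:
    """Return the video ID for two common YouTube URL shapes, else None."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()

    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        return parsed.path.strip("/") or None

    # Standard links: https://www.youtube.com/watch?v=<id>&...
    if host in {"youtube.com", "www.youtube.com", "m.youtube.com"} and parsed.path == "/watch":
        ids = parse_qs(parsed.query).get("v")
        return ids[0] if ids else None

    return None
```

For the notebook's example link, `extract_video_id("https://www.youtube.com/watch?v=J4RqCSD--Dg&ab_channel=LearnFree")` returns `"J4RqCSD--Dg"`.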

In [16]:
def content_input_update(content_type):
    # Define visibility mappings for different content types
    visibility_map = {
        "Audio Upload": (True, False, False),
        "Video Upload": (False, False, True),
        "YouTube Link": (False, True, False),
    }

    # Get the visibility settings based on content type, default to all False if not found
    visible_audio, visible_youtube, visible_video = visibility_map.get(content_type, (False, False, False))

    # Return the updates
    return (
        gr.update(visible=visible_audio),
        gr.update(visible=visible_youtube),
        gr.update(visible=visible_video)
    )

In [17]:
# Global Variables
transcription = None
languageG = None

#Helper functions needed for gradio
def transcribe_content(content_type, audio_path, youtube_link, video):
    """Transcribe audio from different content types."""
    if content_type == "Audio Upload" and audio_path:
        return whispermodel.transcribe(audio_path)["text"]
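    elif content_type == "YouTube Link" and youtube_link:
        audio_file = download_audio_from_youtube(youtube_link)
    elif content_type == "Video Upload" and video:
        # Accept either a plain path string or a file object exposing .name,
        # since the value type depends on the Gradio version.
        audio_file = extract_audio_from_video(getattr(video, "name", video))
    else:
        return None

    # Both helpers can fail, so only transcribe when an audio file was actually produced
    if audio_file and os.path.isfile(audio_file):
        return whispermodel.transcribe(audio_file)["text"]
    return None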

def generate_summary_and_qna(summarize, qna, number):
    """Generate summary and Q&A if requested."""
    summary_text = None
    extracted_data = None

    # Generate summary if requested
    if summarize:
        summary = summarizer(transcription, min_length=10, max_length=150)
        summary_text = summary[0]['summary_text']

    # Generate Q&A if requested
    if qna:
        questions = question_generator(transcription)
        extracted_data = [{'question': item['question'], 'answer': item['answer'].replace('<pad> ', '')} for item in questions]
        # gr.Number yields a float by default, and slice bounds must be integers
        extracted_data = extracted_data[:max(0, int(number or 0))]
    return summary_text, extracted_data

def create_audio_summary(summary, language):
    """Create audio summary using gTTS."""
    if summary and summary != 'No summary requested.':
        tts = gTTS(text=summary, lang='ar' if language == 'Arabic' else 'en')
        audio_path = "output_audio.mp3"
        tts.save(audio_path)
        return audio_path
    return None
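`generate_summary_and_qna` passes the whole transcript to the summarizer in one call. `facebook/bart-large-cnn` accepts at most 1024 input tokens, so long lectures may need to be split first. The sketch below groups sentences into chunks and summarizes each chunk separately. The helper names are hypothetical, and the word budget is only a rough stand-in for a token count, so the default of 400 words is an assumption rather than a measured limit.

```python
import re
from typing import Callable, List


def chunk_sentences(text: str, max_words: int = 400) -> List[str]:
    """Group sentences into chunks of at most `max_words` words.

    A single sentence longer than `max_words` becomes its own oversized chunk.
    """
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


def summarize_long(text: str, summarize: Callable[[str], str], max_words: int = 400) -> str:
    """Summarize each chunk with `summarize` and join the partial summaries."""
    return " ".join(summarize(chunk) for chunk in chunk_sentences(text, max_words))
```

With the notebook's pipeline, `summarize` could be a small wrapper such as `lambda chunk: summarizer(chunk, min_length=10, max_length=150)[0]['summary_text']`. Joined per-chunk summaries can read less smoothly than a single summary, which is the trade-off of this approach.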

In [19]:
def main(content_type, audio_path, youtube_link, video, language, summarize, qna, number):
    global transcription
    global languageG
    languageG = language

    #1: Transcribe content based on the selected content type
    transcription = transcribe_content(content_type, audio_path, youtube_link, video)
    if not transcription:
        return "No transcription available.", "No Q&A requested.", None

    #2: Translate the transcription to English if it is written in Arabic, so it can be used in the pipelines.
    input_language = detect(transcription)
    input_language = 'Arabic' if input_language == 'ar' else 'English'
    if input_language != 'English':
        transcription = translator(transcription, src_lang=languages[input_language], tgt_lang=languages['English'])[0]['translation_text']

    #3: Summarize the transcription and generate Q&A with the question_generator pipeline
    summary_text, generated_qna = generate_summary_and_qna(summarize, qna, number)

    #4: Translate the summary and Q&A into the preferred language of the user.
    summary, qna = translator_text(summary_text, generated_qna, language)

    #5: Generate audio from the summary to be in the user's preferred language.
    audio_path = create_audio_summary(summary, language)

    #6: Prepare Q&A output
    qna_output = (
        "\n\n".join(
            f"**Question:** {item['question']}\n**Answer:** {item['answer']}"
            if language == "English"
            else f"**السؤال:** {item['question']}\n**الجواب:** {item['answer']}"
            for item in qna
        ) if qna else "No Q&A requested."
    )

    return summary, qna_output, audio_path


# Gradio interface
with gr.Blocks() as demo:
    # Title and description
    gr.Markdown(
        """
        # Student Helper App
        This app assists students by allowing them to upload audio, video, or YouTube links for automatic transcription.
        It can translate content, summarize it, and generate Q&A questions to help with studying.
        The app is ideal for students who want to review lectures, study materials, or any educational content more efficiently.
        """
    )

    # Step 1: Choose content type
    content_type = gr.Radio(
        choices=["Audio Upload", "Video Upload", "YouTube Link"],
        label="Select Content Type",
        value="Audio Upload"
    )

    # Input components for file, video, and YouTube link
    file_input = gr.Audio(label="Upload an Audio File", visible=True, type="filepath")
    youtube_input = gr.Textbox(label="Enter YouTube Link", visible=False, placeholder="https://www.youtube.com/watch?v=example")
    video_input = gr.File(label="Upload a Video", visible=False, type="filepath")


    # Step 2: Ask for preferred language
    language = gr.Radio(choices=["Arabic", "English"], label="Preferred Language", value="English")

    # Step 3: Ask if user wants a summary
    summarize = gr.Checkbox(label="Summarize the content?")

    # Step 4: Ask if user wants Q&A
    qna = gr.Checkbox(label="Generate Q&A about the content?")
    number = gr.Number(label="How many questions do you want at maximum?", value=5)

    # Add examples for testing
    examples = [
        ["Audio Upload", example_audio, None, None, "English", True, True, 5],
        ["Video Upload", None, None, example_video, "Arabic", True, False, 3],
        ["YouTube Link", None, example_youtube, None, "English", False, True, 2]
    ]
    gr.Examples(
        examples=examples,
        inputs=[content_type, file_input, youtube_input, video_input, language, summarize, qna, number],
        label="Try These Examples"
    )

    # Output result display
    with gr.Tab("Summary"):
        summary_output = gr.Textbox(label="Summary", interactive=False)
        audio_output = gr.Audio(label="Audio Summary")

    with gr.Tab("Q&A"):
        qna_output = gr.Markdown(label="Q&A Request")

    with gr.Tab("Interactive Q&A"):
        user_question = gr.Textbox(label="Ask a Question", placeholder="Enter your question here...")
        qa_button = gr.Button("Get Answer")
        qa_response = gr.Markdown(label="Answer")

        # Define the interaction for Q&A
        qa_button.click(interactive_qa, inputs=[user_question], outputs=qa_response)

    # Dynamic update of input box based on content type selection
    content_type.change(content_input_update, inputs=[content_type], outputs=[file_input, youtube_input, video_input])

    # Submit button
    submit_btn = gr.Button("Submit")

    # Action when submit button is clicked
    submit_btn.click(main, inputs=[content_type, file_input, youtube_input, video_input, language, summarize, qna, number], outputs=[summary_output, qna_output, audio_output])

demo.launch(debug=True)

Setting queue=True in a Colab notebook requires sharing enabled. Setting `share=True` (you can turn this off by setting `share=False` in `launch()` explicitly).

Colab notebook detected. This cell will run indefinitely so that you can see errors and logs. To turn off, set debug=False in launch().
Running on public URL: https://fe23a4cdc06fedab94.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)


Keyboard interruption in main thread... closing server.
Killing tunnel 127.0.0.1:7860 <> https://fe23a4cdc06fedab94.gradio.live
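Step 2 of `main` relies on `langdetect.detect`, which raises an exception when the text has no usable features (for example, an empty string) and can be unstable on very short inputs. A script-based check is one possible fallback for the Arabic versus English decision. The sketch below is an assumption-laden heuristic, not part of the notebook: `guess_language` is a hypothetical name, it only inspects the basic Arabic Unicode block (U+0600 to U+06FF), and it defaults to English when no letters are present.

```python
def guess_language(text: str) -> str:
    """Return 'Arabic' if most letters fall in the Arabic Unicode block, else 'English'."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return "English"  # default when there is nothing to inspect

    arabic = sum(1 for ch in letters if "\u0600" <= ch <= "\u06FF")
    return "Arabic" if arabic / len(letters) > 0.5 else "English"
```

It could be used inside a `try`/`except` around `detect(transcription)`, so that the heuristic only runs when `langdetect` fails.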


