***AI-Powered Automated Lecture Summarization, Q&A Generator, and Interactive Learning Assistant***

---

**Comprehensive NLP Tasks**:
The code includes a range of NLP tasks:

- Text extraction from Dataset, PDF
- Text preprocessing
- Summarization using T5 Transformer with multiple languages
- Text-to-Speech (TTS) with multiple languages
- Keyword extraction using TF-IDF
- Named Entity Recognition (NER) using spaCy
- Question-Answering using DistilBERT
- Flashcard generation with MCQs
- Sentiment analysis using a pretrained transformer
- Saving results to CSV/JSON

**Pretrained Models**:
T5 handles summarization and DistilBERT handles Q&A, so no model training is required.

**Modularity**:
Each step is wrapped in its own function, which makes the pipeline easy to test and modify.

**Interactive**:
The flashcard quiz prompts the user for answers and keeps score, which makes the tool more engaging.

**Language Detection**:
langdetect auto-detects the language of the text for TTS.

In [None]:
# Install Required Libraries
!pip install transformers nltk PyPDF2 gtts langdetect spacy scikit-learn googletrans==4.0.0-rc1
!pip install pypdf pdfplumber

# Import Libraries
import json
import random

import nltk
nltk.download('punkt_tab')  # Sentence tokenizer data used by nltk.sent_tokenize

import IPython.display as ipd
import pandas as pd
import pdfplumber
import PyPDF2
import spacy
import torch
from google.colab import files
from googletrans import Translator
from gtts import gTTS
from langdetect import detect  # For language detection
from sklearn.feature_extraction.text import TfidfVectorizer
from transformers import pipeline, T5Tokenizer, T5ForConditionalGeneration

In [None]:
# Load spaCy's English model for NER
nlp = spacy.load("en_core_web_sm")

# Available Language Options
language_options = {
    "1": "en",  # English
    "2": "hi",  # Hindi
    "3": "fr",  # French
    "4": "es",  # Spanish
    "5": "de",  # German
    "6": "zh",  # Chinese (googletrans may expect "zh-cn" here)
    "7": "ta"   # Tamil
}


In [None]:
# Load TED Talks Dataset (expects ted_main.csv and transcripts.csv in the working directory)
ted_df = pd.read_csv("ted_main.csv")
transcripts_df = pd.read_csv("transcripts.csv")

# Merge datasets on 'url'
df = ted_df.merge(transcripts_df, on='url')

In [None]:
# Step 1: Upload PDF File
def upload_pdf():
    uploaded = files.upload()
    return list(uploaded.keys())[0]  # Get the uploaded file name

In [None]:
# Step 2: Extract Text from PDF
def extract_text_from_pdf(pdf_path):
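    with open(pdf_path, "rb") as pdf_file:
        reader = PyPDF2.PdfReader(pdf_file)
        # Extract each page once and skip pages with no extractable text
        page_texts = [page.extract_text() for page in reader.pages]
        # Join pages with a newline so words at page boundaries do not fuse
        text = "\n".join(t for t in page_texts if t)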
    return text.strip()
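
Text pulled from a PDF often carries layout artifacts such as words hyphenated across line breaks and hard line wraps inside paragraphs. The sketch below shows one way to tidy such text before preprocessing. `clean_extracted_text` is a hypothetical helper, not part of the notebook, and its rules are assumptions that may need tuning for a given document.

```python
import re

def clean_extracted_text(text):
    """Tidy raw PDF text (hypothetical helper; the rules are assumptions)."""
    # Rejoin words split by a hyphen at the end of a line: "summari-\nzation"
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Turn single line breaks inside a paragraph into spaces
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()

sample = "Lecture summari-\nzation helps\nstudents.\n\nNext   paragraph."
print(clean_extracted_text(sample))
```

Note that the first rule also joins genuine hyphenated compounds that happen to fall at a line end, so it trades a few lost hyphens for fewer broken words.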

In [None]:
# Step 3: Preprocess Text
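def preprocess_text(text):
    # Lowercase, then split into sentences and rejoin them with single spaces,
    # which normalizes the whitespace between sentences
    text = text.lower()
    sentences = nltk.sent_tokenize(text)
    return " ".join(sentences)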

In [None]:
# Step 4: Summarization Model (Using T5 Transformer)
model_name = "t5-small"
tokenizer = T5Tokenizer.from_pretrained(model_name)
model = T5ForConditionalGeneration.from_pretrained(model_name)

# (max_length, min_length) in tokens for each summary size
SUMMARY_LENGTHS = {'short': (50, 30), 'medium': (150, 100), 'detailed': (300, 200)}

def summarize_text(text, summary_length='medium'):
    input_text = "summarize: " + text
    # Input beyond 512 tokens is truncated, so only the start of a long document is summarized
    inputs = tokenizer.encode(input_text, return_tensors="pt", max_length=512, truncation=True)

    # Any other value falls back to the detailed setting
    max_len, min_len = SUMMARY_LENGTHS.get(summary_length, SUMMARY_LENGTHS['detailed'])
    summary_ids = model.generate(inputs, max_length=max_len, min_length=min_len, length_penalty=2.0, num_beams=4, early_stopping=True)

    summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)
    return summary
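
Because `summarize_text` truncates its input to 512 tokens, a long lecture is summarized from its opening section only. One possible workaround is to split the text into chunks and summarize each chunk. The sketch below groups sentences into chunks by word count; `chunk_sentences` is a hypothetical helper, and using words as a stand-in for tokens (with a limit of 350) is an assumption, not a measured value.

```python
def chunk_sentences(sentences, max_words=350):
    """Group sentences into chunks of at most max_words words each.

    A single sentence longer than max_words becomes its own chunk.
    """
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n_words = len(sentence.split())
        # Start a new chunk when adding this sentence would exceed the budget
        if current and count + n_words > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks

sentences = ["one two three.", "four five.", "six seven eight nine."]
print(chunk_sentences(sentences, max_words=5))
```

Each chunk could then be passed to `summarize_text` and the partial summaries joined, at the cost of one model call per chunk.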

In [None]:
# Step 5: Translate and Convert to Speech
def translate_text(text, source_lang, target_lang="en"):
    translator = Translator()
    translated_text = translator.translate(text, src=source_lang, dest=target_lang).text
    return translated_text

def text_to_speech(text):
    # Ask user for speech language choice
    print("\nSelect the language for speech output:")
    for key, lang in language_options.items():
        print(f"{key}. {lang.upper()}")

    choice = input("Enter the number corresponding to your preferred language: ")
    speech_lang = language_options.get(choice, "en")  # Default to English

    # Translate if needed
    if speech_lang != "en":
        translator = Translator()
        translated_text = translator.translate(text, dest=speech_lang).text
        print(f"\nTranslated Summary in {speech_lang.upper()}: {translated_text}")
    else:
        translated_text = text

    # Convert to speech
    tts = gTTS(translated_text, lang=speech_lang)
    audio_filename = f"summary_audio_{speech_lang}.mp3"
    tts.save(audio_filename)

    print(f"\nSummary audio saved as '{audio_filename}'. Playing now...")
    return ipd.Audio(audio_filename)

In [None]:
# Step 6: Keyword Extraction using TF-IDF
def extract_keywords_tfidf(text, top_n=10):
    vectorizer = TfidfVectorizer(stop_words='english', max_features=top_n)
    tfidf_matrix = vectorizer.fit_transform([text])
    feature_names = vectorizer.get_feature_names_out()
    scores = tfidf_matrix.sum(axis=0).A1
    keywords = [(feature_names[i], scores[i]) for i in range(len(feature_names))]
    keywords = sorted(keywords, key=lambda x: x[1], reverse=True)
    return keywords
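
Since `extract_keywords_tfidf` fits the vectorizer on a single document, every term has the same inverse document frequency, and the ranking reduces to term frequency. The sketch below reimplements the scoring in plain Python to make this visible. It assumes scikit-learn's default settings (smoothed IDF, L2 normalization) and takes pre-tokenized input, which is a simplification of what `TfidfVectorizer` does.

```python
import math
from collections import Counter

def smoothed_idf(n_docs, doc_freq):
    """IDF with smoothing, as in scikit-learn's default (smooth_idf=True)."""
    return math.log((1 + n_docs) / (1 + doc_freq)) + 1

def tfidf_scores(docs):
    """Return one {term: L2-normalized tf-idf} dict per tokenized document."""
    n_docs = len(docs)
    # Number of documents each term appears in
    doc_freq = Counter(term for doc in docs for term in set(doc))
    results = []
    for doc in docs:
        counts = Counter(doc)
        raw = {t: c * smoothed_idf(n_docs, doc_freq[t]) for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in raw.values()))
        results.append({t: v / norm for t, v in raw.items()})
    return results

# One document: every term has idf == 1, so the ranking is plain term frequency
single = tfidf_scores([["ai", "ai", "data", "privacy"]])[0]
print(sorted(single.items(), key=lambda kv: kv[1], reverse=True))

# Several documents (for example, sentences): a term shared by all is down-weighted
multi = tfidf_scores([["ai", "data"], ["ai", "privacy"], ["ai", "jobs"]])
print(multi[0])
```

Fitting on a list of sentences instead of one string would let IDF separate distinctive terms from ones that appear everywhere.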

In [None]:
# Step 7: Named Entity Recognition (NER) for Extracting Key Entities
def extract_named_entities(text):
    doc = nlp(text)
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return entities
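
The flat list of `(text, label)` pairs returned above can be hard to scan for a long lecture. As a small illustration, the sketch below groups such pairs by label and drops repeated mentions. `group_entities` is a hypothetical helper and the sample pairs are made up.

```python
from collections import defaultdict

def group_entities(entities):
    """Group (text, label) pairs by label, dropping repeated mentions."""
    grouped = defaultdict(list)
    for text, label in entities:
        if text not in grouped[label]:
            grouped[label].append(text)
    return dict(grouped)

# Example pairs in the shape returned by extract_named_entities (values are made up)
entities = [("Google", "ORG"), ("2023", "DATE"), ("Google", "ORG"), ("Paris", "GPE")]
print(group_entities(entities))
```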

In [None]:
# Step 8: Question Answering (Using DistilBERT fine-tuned on SQuAD)
qa_pipeline = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")



In [None]:
# Step 9: Enhanced Flashcard Generation with MCQs
def generate_questions_with_choices(text):
    questions = [
        {"question": "What is the main idea of the text?",
         "choices": ["Idea 1", "Idea 2", "Idea 3", "Idea 4"],
         "answer": "Idea 1"},

        {"question": "What are the key points discussed?",
         "choices": ["Key Point 1", "Key Point 2", "Key Point 3", "Key Point 4"],
         "answer": "Key Point 1"},

        {"question": "What is the conclusion?",
         "choices": ["Conclusion 1", "Conclusion 2", "Conclusion 3", "Conclusion 4"],
         "answer": "Conclusion 1"},

        {"question": "What are the challenges mentioned?",
         "choices": ["Challenge 1", "Challenge 2", "Challenge 3", "Challenge 4"],
         "answer": "Challenge 1"},

        {"question": "What solutions are proposed?",
         "choices": ["Solution 1", "Solution 2", "Solution 3", "Solution 4"],
         "answer": "Solution 1"}
    ]
    # Answer each question with the QA pipeline and use the extracted span as the correct choice
    for q in questions:
        try:
            ans = qa_pipeline(question=q['question'], context=text)['answer'].strip()
        except Exception:
            q['answer'] = "No answer found."
            continue
        if ans:
            # Replace the first placeholder with the model's answer; the other
            # choices remain placeholder distractors
            q["choices"][0] = ans
            q["answer"] = ans
            random.shuffle(q["choices"])  # So the correct option is not always first
    return questions
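
The answer choices above are fixed placeholders. One way to get more plausible options is to draw distractors from other terms in the same text, for example the keywords from the TF-IDF step. The sketch below is a minimal illustration of that idea; `build_choices` and its parameters are hypothetical and not part of the notebook.

```python
import random

def build_choices(answer, candidates, n_choices=4, seed=None):
    """Return a shuffled list with the answer plus distractors drawn from candidates."""
    rng = random.Random(seed)
    # Candidate distractors: unique, and different from the answer (case-insensitive)
    pool = []
    for c in candidates:
        if c.lower() != answer.lower() and c not in pool:
            pool.append(c)
    # Take as many distractors as available, up to n_choices - 1
    distractors = rng.sample(pool, min(n_choices - 1, len(pool)))
    choices = [answer] + distractors
    rng.shuffle(choices)
    return choices

keywords = ["automation", "privacy", "data", "employment", "privacy"]
choices = build_choices("privacy", keywords, seed=0)
print(choices)
```

Passing a `seed` makes the shuffle repeatable, which helps when testing; leaving it as `None` gives a different order on each run.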



In [None]:
# Step 10: Convert Q&A into Flashcard Format with MCQs
def generate_flashcards_with_mcq(qa_list):
    flashcards = [{"Question": q["question"],
                   "Choices": q["choices"],
                   "Answer": q["answer"]} for q in qa_list]
    return flashcards

In [None]:
# Step 11: Gamified Flashcard Quiz with MCQs
def flashcard_quiz_with_mcq(flashcards):
    print("\n🎮 Welcome to the Flashcard Quiz! 🎮")
    print("Try to answer the questions by choosing one of the options.")

    score = 0
    random.shuffle(flashcards)  # Shuffle flashcards for randomness

    for card in flashcards:
        print(f"\nQ: {card['Question']}")

        # Display multiple-choice options
        for i, choice in enumerate(card['Choices']):
            print(f"{i+1}. {choice}")

        user_answer = input("Select the correct option (1/2/3/4): ").strip()

        # Check if the selected answer is correct
        try:
            index = int(user_answer) - 1
            if index < 0:
                raise IndexError  # Reject 0 and negatives, which would wrap around to the end of the list
            if card['Choices'][index] == card['Answer']:
                print("✅ Correct!")
                score += 1
            else:
                print(f"❌ Incorrect. The correct answer is: {card['Answer']}")
        except (ValueError, IndexError):
            print("Invalid choice! Skipping question.")
            print(f"Correct answer: {card['Answer']}")

    print(f"\n🏆 Quiz Completed! Your Score: {score}/{len(flashcards)} 🏆")
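
The quiz mixes scoring with `input()` and `print()`, which makes it awkward to test outside a notebook. As a sketch of one alternative, the scoring can be pulled into a pure function that takes the user's selections as a list. `score_answers` is a hypothetical helper that assumes flashcards in the same dictionary format used above.

```python
def score_answers(flashcards, selections):
    """Count correct answers given 1-based option numbers, one per flashcard.

    Invalid selections (non-numeric or out of range) count as incorrect.
    """
    score = 0
    for card, selection in zip(flashcards, selections):
        try:
            index = int(selection) - 1
        except ValueError:
            continue
        if 0 <= index < len(card["Choices"]) and card["Choices"][index] == card["Answer"]:
            score += 1
    return score

cards = [
    {"Question": "Q1", "Choices": ["a", "b", "c", "d"], "Answer": "b"},
    {"Question": "Q2", "Choices": ["a", "b", "c", "d"], "Answer": "d"},
    {"Question": "Q3", "Choices": ["a", "b", "c", "d"], "Answer": "a"},
]
print(score_answers(cards, ["2", "0", "x"]))
```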


In [None]:

# Step 12: Sentiment Analysis
# Pin the checkpoint explicitly (this is the model the pipeline otherwise picks by default)
sentiment_analyzer = pipeline(
    "sentiment-analysis",
    model="distilbert/distilbert-base-uncased-finetuned-sst-2-english",
)

def analyze_sentiment(text):
    # truncation=True keeps long inputs within the model's maximum sequence length
    sentiment = sentiment_analyzer(text, truncation=True)[0]
    return sentiment['label'], sentiment['score']

In [None]:
!pip install pymupdf


In [None]:
import fitz  # PyMuPDF
from sklearn.metrics import accuracy_score
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

# Step 1: Path to the PDF to evaluate
uploaded_pdf = "/content/tamil.pdf"  # Update with your actual PDF path

# Step 2: Extract text from PDF
def extract_pdf_text(path):
    doc = fitz.open(path)
    text = ""
    for page in doc:
        text += page.get_text()
    doc.close()
    return text

pdf_text = extract_pdf_text(uploaded_pdf)

# Step 3: Split into sentences (naive split on '.'; fragments of 20 characters or fewer are dropped)
sentences = [s.strip() for s in pdf_text.split('.') if len(s.strip()) > 20]

# Step 4: Provide true labels (manually), one per sentence in order
true_labels = [
    "POSITIVE", "NEGATIVE", "NEUTRAL", "NEGATIVE", "POSITIVE", "NEUTRAL"
]

# Keep sentences and labels the same length, even if the PDF yields fewer sentences than labels
sample_sentences = sentences[:len(true_labels)]
true_labels = true_labels[:len(sample_sentences)]

# Step 5: Load a three-class sentiment model (NEGATIVE/NEUTRAL/POSITIVE).
# Note: this checkpoint was trained on English tweets, so predictions on
# non-English text such as a Tamil PDF are unlikely to be meaningful.
# Distinct names are used so the T5 `model`/`tokenizer` and the earlier
# `sentiment_analyzer` are not overwritten.
sentiment_model_name = "cardiffnlp/twitter-roberta-base-sentiment"
sentiment_tokenizer = AutoTokenizer.from_pretrained(sentiment_model_name)
sentiment_model = AutoModelForSequenceClassification.from_pretrained(sentiment_model_name)

# Step 6: Create pipeline
three_class_analyzer = pipeline("sentiment-analysis", model=sentiment_model, tokenizer=sentiment_tokenizer)

# Step 7: Map LABEL_0/1/2 to human-readable classes
label_map = {
    "LABEL_0": "NEGATIVE",
    "LABEL_1": "NEUTRAL",
    "LABEL_2": "POSITIVE"
}

# Step 8: Predict sentiments
predicted_labels = [label_map[three_class_analyzer(text, truncation=True)[0]['label']] for text in sample_sentences]

# Optional: Print both predictions and true labels for comparison
print("True labels:     ", true_labels)
print("Predicted labels:", predicted_labels)

# Step 9: Calculate accuracy
accuracy = accuracy_score(true_labels, predicted_labels)
print(f"✅ Sentiment Accuracy with NEUTRAL class: {accuracy * 100:.2f}%")


True labels:      ['POSITIVE', 'NEGATIVE', 'NEUTRAL', 'NEGATIVE', 'POSITIVE', 'NEUTRAL']
Predicted labels: ['NEUTRAL', 'NEUTRAL', 'NEUTRAL', 'NEUTRAL', 'NEUTRAL', 'NEUTRAL']
✅ Sentiment Accuracy with NEUTRAL class: 33.33%
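
The 33.33% figure above can be reproduced by hand, and doing so shows what it means: every prediction is NEUTRAL, so the accuracy simply equals the share of sentences whose true label is NEUTRAL. The sketch below recomputes the accuracy from the two printed lists and counts the (true, predicted) pairs using only the standard library.

```python
from collections import Counter

true_labels = ["POSITIVE", "NEGATIVE", "NEUTRAL", "NEGATIVE", "POSITIVE", "NEUTRAL"]
predicted_labels = ["NEUTRAL"] * 6

# Accuracy: fraction of positions where the prediction equals the true label
correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
accuracy = correct / len(true_labels)
print(f"Accuracy: {accuracy * 100:.2f}%")

# Count (true, predicted) pairs to see which classes are confused
confusion = Counter(zip(true_labels, predicted_labels))
for (true, pred), n in sorted(confusion.items()):
    print(f"true={true:<8} predicted={pred:<8} count={n}")
```

A model that outputs one class for every input is not discriminating between sentences, so this result says little about sentiment quality on this PDF.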


In [None]:
# Step 13: Save Results to CSV or JSON
def save_results_to_csv(summaries, sentiment, flashcards, filename="results.csv"):
    # Save summaries to a DataFrame
    summary_data = {
        "Summary Type": ["Short", "Medium", "Detailed"],
        "Summary": [summaries['short'], summaries['medium'], summaries['detailed']],
    }
    summary_df = pd.DataFrame(summary_data)

    # Save sentiment analysis results
    sentiment_data = {
        "Sentiment Label": [sentiment['label']],
        "Sentiment Score": [sentiment['score']],
    }
    sentiment_df = pd.DataFrame(sentiment_data)

    # Save flashcards to a DataFrame
    flashcard_data = []
    for flashcard in flashcards:
        flashcard_data.append({
            "Question": flashcard['Question'],
            "Choices": ", ".join(flashcard['Choices']),
            "Answer": flashcard['Answer']
        })
    flashcard_df = pd.DataFrame(flashcard_data)

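    # Stack the three tables; they have different columns, so each row
    # leaves the columns of the other two tables empty (NaN)
    final_df = pd.concat([summary_df, sentiment_df, flashcard_df], ignore_index=True)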

    # Save to CSV
    final_df.to_csv(filename, index=False)
    print(f"Results saved to {filename}")

def save_results_to_json(summaries, sentiment, flashcards, filename="results.json"):
    results = {
        "summaries": summaries,
        "sentiment": sentiment,
        "flashcards": flashcards
    }

    # Save to JSON (UTF-8, keeping non-ASCII characters readable)
    with open(filename, 'w', encoding='utf-8') as json_file:
        json.dump(results, json_file, indent=4, ensure_ascii=False)
    print(f"Results saved to {filename}")
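
Concatenating DataFrames with different columns, as `save_results_to_csv` does, produces a table in which most cells are empty. As a sketch of one alternative layout, the results can be flattened into rows of `(section, field, value)` so that every row uses the same three columns. `results_to_rows` is a hypothetical helper, and the sample values are made up.

```python
import csv
import io

def results_to_rows(summaries, sentiment, flashcards):
    """Flatten the three result types into (section, field, value) rows."""
    rows = [("summary", name, text) for name, text in summaries.items()]
    rows += [("sentiment", key, str(value)) for key, value in sentiment.items()]
    for i, card in enumerate(flashcards, start=1):
        rows.append(("flashcard", f"question_{i}", card["Question"]))
        rows.append(("flashcard", f"choices_{i}", ", ".join(card["Choices"])))
        rows.append(("flashcard", f"answer_{i}", card["Answer"]))
    return rows

rows = results_to_rows(
    {"short": "s", "medium": "m", "detailed": "d"},
    {"label": "POSITIVE", "score": 0.99},
    [{"Question": "Q1", "Choices": ["a", "b"], "Answer": "a"}],
)

# Write to an in-memory buffer here; a real file handle works the same way
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["section", "field", "value"])
writer.writerows(rows)
print(buffer.getvalue())
```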


In [None]:
# Step 14: Main Execution - Save results to CSV and JSON

# Text-to-Speech helper that takes the language code directly instead of prompting
def speak_in_language(text, lang='en'):
    tts = gTTS(text, lang=lang)
    tts.save("summary_audio.mp3")  # Save the summary speech
    return ipd.Audio("summary_audio.mp3")  # Return audio for playback in Colab

if __name__ == "__main__":
    # Step 1: Upload PDF
    pdf_path = upload_pdf()

    # Step 2: Ask user for PDF language
    print("\nSelect the language of the uploaded PDF:")
    for key, lang in language_options.items():
        print(f"{key}. {lang.upper()}")

    pdf_lang_choice = input("Enter the number corresponding to the PDF's language: ")
    pdf_lang = language_options.get(pdf_lang_choice, "en")  # Default to English

    # Step 3: Extract text from PDF
    print("\nExtracting text from PDF...")
    pdf_text = extract_text_from_pdf(pdf_path)

    if not pdf_text:
        print("Error: No text extracted from the PDF. Please check the file.")
    else:
        # Step 4: Ask for language choice for summarization
        print("\nSelect the language for summarization:")
        for key, lang in language_options.items():
            print(f"{key}. {lang.upper()}")

        summary_lang_choice = input("Enter the number corresponding to the summarization language: ")
        summary_lang = language_options.get(summary_lang_choice, "en")  # Default to English

        # Step 5: Translate PDF text to English (if needed) for summarization
        if pdf_lang != "en":
            print("\nTranslating PDF text to English for summarization...")
            pdf_text = translate_text(pdf_text, source_lang=pdf_lang, target_lang="en")

        # Step 6: Preprocess the text (once, after any translation)
        print("\nPreprocessing text...")
        preprocessed_text = preprocess_text(pdf_text)

        # Step 7: Generate the medium summary and show it in the selected language
        print("\nGenerating summary...")
        medium_summary = summarize_text(preprocessed_text, summary_length='medium')

        displayed_summary = medium_summary
        if summary_lang != "en":
            print(f"\nTranslating Summary to {summary_lang.upper()}...")
            displayed_summary = translate_text(medium_summary, source_lang="en", target_lang=summary_lang)

        print("\n--- Summary ---")
        print(displayed_summary)

        # Step 8: Generate the short and detailed summaries (the medium one is reused)
        print("\nGenerating summaries...")
        short_summary = summarize_text(preprocessed_text, summary_length='short')
        detailed_summary = summarize_text(preprocessed_text, summary_length='detailed')

        for title, summary in [("Short", short_summary), ("Medium", medium_summary), ("Detailed", detailed_summary)]:
            print(f"\n--- {title} Summary ---")
            print(summary)

        # Step 9: Convert Summary to Speech in the language chosen at the prompt
        ipd.display(text_to_speech(medium_summary))

        # Step 10: Convert Summary to Speech in the detected language and save
        detected_lang = detect(preprocessed_text)  # Detect language of the text
        print(f"\nDetected language: {detected_lang}")

        print("\nPlaying summary as speech...")
        audio = speak_in_language(detailed_summary, detected_lang)  # You can select any of the summaries here
        ipd.display(audio)  # Play the audio in Colab
        print("\nSummary audio saved as 'summary_audio.mp3'.")

        # Step 11: Sentiment Analysis of the text
        print("\nAnalyzing sentiment of the text...")
        sentiment_label, sentiment_score = analyze_sentiment(preprocessed_text)
        sentiment = {'label': sentiment_label, 'score': sentiment_score}
        print(f"Sentiment Label: {sentiment_label} | Sentiment Score: {sentiment_score}")

        # Step 12: Generate Q&A with MCQs
        print("\nGenerating questions and answers with multiple-choice options...")
        questions_with_choices = generate_questions_with_choices(preprocessed_text)

        # Step 13: Create flashcards with MCQs
        flashcards_with_mcq = generate_flashcards_with_mcq(questions_with_choices)
        print("\nFlashcards generated for quiz:")
        for flashcard in flashcards_with_mcq:
            print(flashcard['Question'])
            print(f"Choices: {flashcard['Choices']}")
            print(f"Answer: {flashcard['Answer']}")

        # Step 14: Play a gamified flashcard quiz
        flashcard_quiz_with_mcq(flashcards_with_mcq)

        # Step 15: Save results
        summaries = {
            "short": short_summary,
            "medium": medium_summary,
            "detailed": detailed_summary
        }
        save_results_to_csv(summaries, sentiment, flashcards_with_mcq, filename="results.csv")
        save_results_to_json(summaries, sentiment, flashcards_with_mcq, filename="results.json")


Saving spanish.pdf to spanish.pdf

Select the language of the uploaded PDF:
1. EN
2. HI
3. FR
4. ES
5. DE
6. ZH
7. TA
Enter the number corresponding to the PDF's language: 4

Extracting text from PDF...

Preprocessing text...

Select the language for summarization:
1. EN
2. HI
3. FR
4. ES
5. DE
6. ZH
7. TA
Enter the number corresponding to the summarization language: 1

Translating PDF text to English for summarization...

Preprocessing text...

Generating summary...

--- Summary ---
ai has transformed the way we interact with the technology and the world around us. it is possible to develop more that analyzes large volumes of data, identify patterns and make decisions with high precision. ai has promoted automation in industries such as manufacturing and trade, optimizing processes and reducing costs. despite its great benefits, artificial intelligence presents challenges important, such as data privacy, algorithmic discrimination and impact on the employment. despite its great benefi


Detected language: en

Playing summary as speech...



Summary audio saved as 'summary_audio.mp3'.

Analyzing sentiment of the text...
Sentiment Label: POSITIVE | Sentiment Score: 0.9960676431655884

Generating questions and answers with multiple-choice options...

Flashcards generated for quiz:
What is the main idea of the text?
Choices: ['Idea 1', 'Idea 2', 'Idea 3', 'Idea 4']
Answer: Idea 1
What are the key points discussed?
Choices: ['Key Point 1', 'Key Point 2', 'Key Point 3', 'Key Point 4']
Answer: Key Point 1
What is the conclusion?
Choices: ['Conclusion 1', 'Conclusion 2', 'Conclusion 3', 'Conclusion 4']
Answer: Conclusion 1
What are the challenges mentioned?
Choices: ['Challenge 1', 'Challenge 2', 'Challenge 3', 'Challenge 4']
Answer: Challenge 1
What solutions are proposed?
Choices: ['Solution 1', 'Solution 2', 'Solution 3', 'Solution 4']
Answer: Solution 1

🎮 Welcome to the Flashcard Quiz! 🎮
Try to answer the questions by choosing one of the options.

Q: What solutions are proposed?
1. Solution 1
2. Solution 2
3. Solution 3
4. 

In [None]:
import pickle

# Assuming 'model' is your trained AI model
model_filename = "lecture_summarization_model.pkl"

# Save the model as a pickle file
with open(model_filename, "wb") as file:
    pickle.dump(model, file)

print(f"Model saved as {model_filename}")


Model saved as lecture_summarization_model.pkl


In [None]:
# Load the model from the pickle file
with open(model_filename, "rb") as file:
    loaded_model = pickle.load(file)

print("Model loaded successfully")


Model loaded successfully


In [None]:
from sklearn.metrics import accuracy_score

# Sample manually labeled test data
test_data = [
    ("The product was amazing and I loved it!", "POSITIVE"),
    ("I am very disappointed with the service.", "NEGATIVE"),
    ("Such a wonderful experience!", "POSITIVE"),
    ("It was a total waste of time and money.", "NEGATIVE"),
    ("Highly recommend this to everyone!", "POSITIVE"),
    ("The quality is terrible.", "NEGATIVE"),
    ("I’m extremely satisfied.", "POSITIVE"),
    ("The customer support was rude and unhelpful.", "NEGATIVE"),
    ("Very happy with my purchase.", "POSITIVE"),
    ("This is the worst thing I’ve bought.", "NEGATIVE")
]

# Run predictions
predicted_labels = [analyze_sentiment(text)[0].upper() for text, _ in test_data]
true_labels = [label for _, label in test_data]

# Compute accuracy
accuracy = accuracy_score(true_labels, predicted_labels)
print(f"✅ Accuracy of DistilBERT on test set: {accuracy * 100:.2f}%")


✅ Accuracy of DistilBERT on test set: 100.00%
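
The test sentences above are short, while the main pipeline passes a whole document to `analyze_sentiment` in one call. One option for long text is to score it sentence by sentence and combine the results. The sketch below shows a simple signed average; `aggregate_sentiment` is a hypothetical helper, the averaging rule is an assumption, and the sample results are made up.

```python
def aggregate_sentiment(results):
    """Combine per-sentence {'label', 'score'} results into one label and score.

    POSITIVE scores count as +score and NEGATIVE as -score; the sign of the
    mean decides the overall label (a mean of exactly 0 counts as POSITIVE).
    """
    if not results:
        raise ValueError("results must not be empty")
    signed = [r["score"] if r["label"] == "POSITIVE" else -r["score"] for r in results]
    mean = sum(signed) / len(signed)
    return ("POSITIVE" if mean >= 0 else "NEGATIVE"), abs(mean)

# Made-up per-sentence results in the shape the sentiment pipeline returns
results = [
    {"label": "POSITIVE", "score": 0.9},
    {"label": "NEGATIVE", "score": 0.6},
    {"label": "POSITIVE", "score": 0.75},
]
label, score = aggregate_sentiment(results)
print(label, round(score, 2))
```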
