# 📚 StudyGenie: AI-Powered Study Assistant

## Project Overview

**StudyGenie** is an innovative AI-powered study assistant designed to transform how students engage with academic materials. By leveraging advanced generative AI capabilities, StudyGenie processes uploaded PDF documents, answers questions with context-aware precision, and generates tailored study resources like summaries, multiple-choice questions (MCQs), and flashcards. This project addresses a real-world problem: the challenge students face in efficiently studying complex materials, retaining key concepts, and preparing for exams.

### Why StudyGenie?
- **Real-World Impact**: Helps students save time, understand complex topics, and excel in their studies.
- **Innovative Use Case**: Combines document understanding, retrieval-augmented generation (RAG), and structured output to create a seamless learning experience.
- **Gen AI Capabilities**:
  - **Document Understanding**: Extracts and processes text from PDF documents.
  - **Retrieval Augmented Generation (RAG)**: Provides accurate answers by retrieving relevant document chunks.
  - **Structured Output/JSON Mode**: Delivers answers in a consistent JSON format, requested through the prompt and checked after parsing, for clarity and usability.
  - **Embeddings**: Uses AI embeddings to enable semantic search within documents.

### How It Works
1. **Upload a PDF**: Students upload study materials (e.g., lecture notes, textbooks).
2. **Process Documents**: The AI extracts text, splits it into chunks, and creates a vector database for efficient retrieval.
3. **Ask Questions**: Students ask questions, and StudyGenie retrieves relevant context to provide concise, accurate answers.
4. **Generate Study Materials**: The AI creates summaries, MCQs, and flashcards to reinforce learning.
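
The retrieval idea behind steps 2 and 3 can be sketched without any AI services. The toy example below is only an illustration: it swaps the learned embeddings and FAISS index used later in this notebook for plain word-count vectors and cosine similarity, and the names `tokenize`, `cosine_similarity`, `retrieve`, and `toy_chunks` are made up for this sketch.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors (Counters)."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question, best match first."""
    query_vec = Counter(tokenize(question))
    scored = [(cosine_similarity(query_vec, Counter(tokenize(c))), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


# Hypothetical stand-ins for chunks extracted from an uploaded PDF
toy_chunks = [
    "Photosynthesis converts light energy into chemical energy in plants.",
    "Mitochondria produce ATP through cellular respiration.",
    "The French Revolution began in 1789.",
]
top = retrieve("How do plants use light energy?", toy_chunks, k=1)
print(top[0])
```

Word counts only match exact words, whereas the embedding model used later is meant to match by meaning, so this sketch shows the mechanics of ranking rather than the quality of semantic search.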

### Target Audience
- Students preparing for exams.
- Educators creating study resources.
- Lifelong learners seeking to understand complex texts.

This notebook walks through a working prototype, from PDF ingestion to question answering and study-material generation. Let’s dive into the code and see StudyGenie in action! 🚀

## Installing Dependencies

This cell installs the necessary Python packages to power StudyGenie’s AI capabilities. It’s a critical first step to ensure the notebook can process PDFs, generate embeddings, perform RAG, and create structured outputs.

**Why It’s Important**:
- Installs libraries like `langchain` for RAG and chaining AI tasks.
- Includes `pypdf` for document understanding (PDF parsing).
- Sets up `faiss-cpu` for vector search. `sentence-transformers` is installed as well, although the embeddings in this notebook come from Google’s embedding model.
- Adds `ipywidgets` for an interactive user interface.

In [2]:
# Install required packages
!pip install langchain langchain-community langchain-google-genai pypdf sentence-transformers faiss-cpu google-generativeai python-docx ipywidgets
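
After installation, a quick check can confirm that the packages are importable before the later cells run. The sketch below is one possible way to do it; `missing_modules` and `REQUIRED` are hypothetical names, and the list uses import names, which differ from the pip package names for `faiss-cpu` (`faiss`) and `python-docx` (`docx`).

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Top-level import names assumed for the packages installed above
REQUIRED = [
    "langchain",
    "langchain_community",
    "langchain_google_genai",
    "pypdf",
    "faiss",
    "docx",
    "ipywidgets",
]

missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```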



## Importing Libraries and Setting Constants

This cell imports the required libraries and defines constants used throughout the notebook. It sets the stage for document processing, AI model integration, and vector database management.

**Why It’s Important**:
- Imports `langchain` modules for RAG and document processing.
- Configures the path for the FAISS vector database, which stores document embeddings for retrieval.
- Ensures all dependencies are loaded to support generative AI capabilities.

**Role in Workflow**:
- Prepares the environment for document understanding (PDF loading), embeddings (semantic search), and RAG (context-aware answers).

In [11]:
# Standard library imports
import os                      # Used for handling file paths and environment variables
import json                    # For reading and writing JSON data
from pathlib import Path       # Provides an object-oriented interface to handle file paths
from io import BytesIO         # Allows for in-memory binary stream handling

# Third-party document parsing (python-docx package)
from docx import Document      # Enables parsing and reading of Word (.docx) documents

# LangChain ecosystem imports
from langchain_community.document_loaders import PyPDFLoader         # Loads and extracts text from PDF documents
from langchain.text_splitter import RecursiveCharacterTextSplitter   # Splits large documents into overlapping text chunks for processing
from langchain_community.vectorstores import FAISS                   # Vector store for similarity search based on document embeddings
from langchain_google_genai import GoogleGenerativeAIEmbeddings      # Embedding model using Google Gemini for vector generation
from langchain_google_genai import ChatGoogleGenerativeAI            # Chat model interface to interact with Gemini LLM
from langchain.chains import RetrievalQA, LLMChain                   # Pre-built chains for retrieval-augmented generation and language model pipelines
from langchain.prompts import PromptTemplate                         # Allows for structured and reusable prompt creation

# Google Generative AI API setup
import google.generativeai as genai                                  # Required for authentication and configuration of Gemini API access

# Path to store or load the FAISS index
FAISS_INDEX_PATH = "./faiss_index"




## Configuring the Google API Key

This cell creates an interactive interface for users to input their Google API key securely. The key is essential for accessing Google’s generative AI models.

**Why It’s Important**:
- Securely configures the AI client, enabling access to embeddings and text generation.
- Uses `ipywidgets` for a user-friendly interface, enhancing accessibility.
- Checks that a key was entered before configuring the client. The key itself is only exercised on the first API call, so an invalid key surfaces later.

**Role in Workflow**:
- Initializes the generative AI backend, which powers document understanding, embeddings, and RAG.

**Note**: If `ipywidgets` fails to load, ensure it’s installed (`pip install ipywidgets`). On the classic Jupyter Notebook (versions before 7), the widget extension may also need to be enabled (`jupyter nbextension enable --py widgetsnbextension`).

In [12]:
# Importing interactive widget libraries for use in a Jupyter Notebook
import ipywidgets as widgets
from IPython.display import display, clear_output

# Create a secure input field for the user to enter their Google API key
api_key_input = widgets.Password(
    value='',
    placeholder='Enter your Google API Key here',
    description='Google API Key:',
    disabled=False,
    style={'description_width': 'initial'},
    layout=widgets.Layout(width='50%')
)

# Create a button that will trigger the API configuration process
configure_button = widgets.Button(
    description='Configure API Key',
    button_style='info',
    tooltip='Click to configure the Google AI client with your key'
)

# Create an output area to show feedback messages
api_status_output = widgets.Output()

# Define the function that runs when the user clicks the button
def on_configure_button_clicked(b):
    with api_status_output:
        clear_output()  # Clear previous messages
        api_key = api_key_input.value
        if not api_key:
            print('⚠️ Please enter your Google API Key.')
            return

        try:
            print('Configuring Google AI Client...')
            genai.configure(api_key=api_key)  # Apply the API key to configure access
            print('✅ Google AI Client Configured Successfully!')
            api_key_input.disabled = True      # Lock input after success
            configure_button.disabled = True   # Disable button to prevent changes
        except Exception as e:
            print(f'❌ Failed to configure Google AI Client: {e}')

# Connect the button to the handler function
configure_button.on_click(on_configure_button_clicked)

# Render the input and button together in the notebook
display(widgets.VBox([api_key_input, configure_button, api_status_output]))
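
For non-interactive runs, where no one is available to type into the widget, the key could instead come from the environment. The helper below is a minimal sketch of that fallback, not part of the original notebook: `resolve_api_key` is a hypothetical name, and `GOOGLE_API_KEY` is an assumed environment variable name.

```python
import os


def resolve_api_key(widget_value, env_var="GOOGLE_API_KEY"):
    """Prefer the key typed into the widget; fall back to an environment variable."""
    key = (widget_value or "").strip()
    if key:
        return key
    # Returns None when the variable is unset or blank
    return os.environ.get(env_var, "").strip() or None


# Example: an empty widget value falls back to whatever the environment provides
api_key = resolve_api_key("")
print("Key found." if api_key else "No key available.")
```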



## Loading and Processing PDF Documents

This cell defines functions to load PDF files, split them into manageable chunks, and create a FAISS vector database for retrieval. It leverages **Document Understanding** and **Embeddings** to prepare documents for RAG.

**Why It’s Important**:
- `load_pdf`: Extracts text from PDFs, enabling document understanding.
- `process_documents`: Splits text into chunks for efficient processing and retrieval.
- `create_vectordb`: Generates embeddings and stores them in a FAISS database for semantic search.

**Role in Workflow**:
- Converts raw PDFs into a structured format for AI processing.
- Enables RAG by creating a searchable vector database.

In [13]:
def load_pdf(pdf_path):
    """Load a PDF file and return a list of documents"""
    try:
        # Use PyPDFLoader to extract text content from the provided PDF
        loader = PyPDFLoader(pdf_path)
        return loader.load()
    except Exception as e:
        # Log any issues during loading
        print(f"Error loading PDF: {e}")
        return None


def process_documents(documents):
    """Process and split the documents into chunks"""
    if not documents:
        return []

    try:
        # Use a recursive character splitter to divide the text into overlapping chunks
        # This improves retrieval accuracy in RAG by preserving context
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,       # Number of characters per chunk
            chunk_overlap=200,     # Overlap between chunks to maintain context
            separators=["\n\n", "\n", " ", ""]  # Prioritize splitting at logical points
        )
        chunks = text_splitter.split_documents(documents)
        return chunks
    except Exception as e:
        # Handle errors in case the document can't be split properly
        print(f"Error processing documents: {e}")
        return []


def create_vectordb(chunks, api_key):
    """Create and save a FAISS vector database from the document chunks"""
    if not chunks:
        return None

    try:
        # Create embeddings using Google's Gemini embedding model
        embeddings = GoogleGenerativeAIEmbeddings(
            model="models/embedding-001",
            google_api_key=api_key
        )

        # Ensure the directory for storing FAISS index exists
        os.makedirs(FAISS_INDEX_PATH, exist_ok=True)

        # Build the FAISS vector store from the document chunks
        vectordb = FAISS.from_documents(
            documents=chunks,
            embedding=embeddings
        )

        # Save the vector store locally to disk for reuse
        vectordb.save_local(folder_path=FAISS_INDEX_PATH)
        print(f"Vector database created and saved to {FAISS_INDEX_PATH}")
        return vectordb
    except Exception as e:
        # Catch and log any error during vector DB creation
        print(f"Error creating FAISS vector database: {e}")
        return None
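
To make the `chunk_size` and `chunk_overlap` settings concrete, here is a simplified sketch of overlapping windows. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break at the listed separators), and `sliding_window_chunks` is a hypothetical helper used only to show what the overlap means.

```python
def sliding_window_chunks(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character windows that overlap their neighbours."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # The last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
for piece in pieces:
    print(piece)
# abcdefghij
# ghijklmnop
# mnopqrstuv
# stuvwxyz
```

Each window repeats the last four characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.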


## Setting Up the RAG-Based Q&A System

This cell configures a **Retrieval Augmented Generation (RAG)** system to answer questions based on the uploaded PDF. It uses a structured prompt to ensure answers are concise and formatted as JSON (**Structured Output**).

**Why It’s Important**:
- Combines retrieval (from the FAISS vector database) with generation (using Google’s Gemini model) to provide accurate, context-aware answers.
- Uses a structured prompt to enforce JSON output, ensuring consistency and usability.
- Enhances learning by providing follow-up questions and resource suggestions.

**Role in Workflow**:
- Enables students to ask questions and receive precise answers grounded in the document.
- Demonstrates RAG and structured output capabilities, key to the Kaggle competition.

In [14]:
def setup_qa_system(vectordb, api_key):
    """Setup a RAG-based Q&A system"""
    if not vectordb:
        return None

    try:
        # Initialize the Gemini chat model (Gemini API, authenticated with an API key) for response generation
        llm = ChatGoogleGenerativeAI(
            model="gemini-1.5-flash",                 # Fast, cost-efficient Gemini model
            google_api_key=api_key,                   # Google API key for authentication
            temperature=0.2,                          # Low randomness for factual answers
            top_p=0.95,                               # Top-p nucleus sampling
            max_output_tokens=2048,                   # Max response length
            convert_system_message_to_human=True      # Folds any system message into the first human message
        )

        # Define a detailed prompt template that instructs the model to:
        # - Provide short, precise explanations
        # - Suggest follow-up questions
        # - Recommend external learning resources
        # - Format everything in clean JSON for structured output
        template = """
        You are an expert tutor named StudyGenie specializing in explaining complex topics clearly and concisely based on the provided files. Your goal is to help a user understand a specific concept using ONLY the text provided from these documents.

        **Instructions:**

        1. Read the user's question (given under "Concept" below).
        2. Carefully analyze the context provided below (this context is extracted from the uploaded documents).
        3. Generate a concise, clear, and accurate explanation of the concept based *strictly* on the information within the context. Do not add external knowledge or information not present in the documents. Keep the explanation focused and to the point (2-4 sentences).
        4. Generate 2-3 distinct, relevant follow-up questions that a learner might ask to deepen their understanding of your explanation or the concept itself, based on the document content.
        5. Suggest 1-2 relevant search terms or types of external resources (e.g., 'search for tutorials on [topic]', 'look for articles explaining [concept]') that could help the user learn more about the topic based on the context.
        6. Format your entire response *strictly* as a single, valid JSON object. Do NOT include any text before or after the JSON object itself. The JSON object must have these exact keys: "explanation" (string), "follow_up_questions" (list of strings), and "potential_resources" (list of strings, where each string is a suggested search term or resource type).

        **Context:**
        {context}

        **Concept:**
        {question}

        **JSON Output (MUST be only valid JSON):**
        """

        # Wrap the prompt in a LangChain-compatible PromptTemplate
        PROMPT = PromptTemplate(
            template=template,
            input_variables=["context", "question"]
        )

        # Setup a RetrievalQA chain using the vector database and prompt
        # This enables RAG: retrieves top 5 relevant chunks and passes them into the prompt
        qa_chain = RetrievalQA.from_chain_type(
            llm=llm,
            chain_type="stuff",  # Basic input stuffing; suitable for short contexts
            retriever=vectordb.as_retriever(search_kwargs={"k": 5}),
            chain_type_kwargs={"prompt": PROMPT},
            return_source_documents=True  # Return source chunks for context transparency
        )

        return qa_chain

    except Exception as e:
        # Handle and report initialization errors
        print(f"Error setting up Q&A system: {e}")
        return None
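
The "stuff" strategy mentioned above amounts to joining the retrieved chunks into one context string and substituting it into the prompt. The sketch below imitates that step with plain string formatting. It is an approximation rather than LangChain's actual code: `build_stuffed_prompt` and `demo_template` are hypothetical, and the blank-line separator between chunks is an assumption.

```python
def build_stuffed_prompt(template, question, chunk_texts):
    """Join the retrieved chunk texts and substitute them into the template."""
    context = "\n\n".join(chunk_texts)
    return template.format(context=context, question=question)


# Hypothetical miniature template using the same two placeholders as the cell above
demo_template = "Context:\n{context}\n\nConcept:\n{question}\n\nJSON Output:"

prompt_text = build_stuffed_prompt(
    demo_template,
    "What is osmosis?",
    [
        "Osmosis is the movement of water across a membrane.",
        "Water moves toward the higher solute concentration.",
    ],
)
print(prompt_text)
```

Because every retrieved chunk is pasted in full, the prompt grows with `k` and with `chunk_size`, which is why this strategy suits short contexts.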

## Handling User Questions

This cell defines a function to process user questions using the RAG system. It retrieves relevant document chunks, generates an answer, and parses the JSON response.

**Why It’s Important**:
- Implements the core Q&A functionality, allowing students to ask questions about their study materials.
- Ensures answers are structured (JSON) and grounded in the document context (RAG).
- Provides source documents for transparency, enhancing trust in the AI’s responses.

**Role in Workflow**:
- Delivers precise, context-aware answers to support student learning.
- Demonstrates RAG and structured output capabilities for the Kaggle competition.

In [15]:
def ask_question(qa_chain, concept_to_explain):
    """Ask a question to the QA system and process its JSON response"""
    
    # Check if the QA system is initialized
    if not qa_chain:
        print("QA system not initialized. Please upload a PDF first.")
        return None, None

    try:
        # Send the question to the QA chain (RAG + Gemini)
        result = qa_chain.invoke({"query": concept_to_explain})
        
        # Extract the JSON string and the source documents
        answer_json_str = result.get("result")
        source_docs = result.get("source_documents", [])

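        try:
            # Normalize first so fence detection is not defeated by stray whitespace or a missing result
            answer_json_str = (answer_json_str or "").strip()
            # Remove code block formatting if present (e.g., Markdown style)
            if answer_json_str.startswith("```json"):
                answer_json_str = answer_json_str[len("```json"):]
            elif answer_json_str.startswith("```"):
                answer_json_str = answer_json_str[3:]
            if answer_json_str.endswith("```"):
                answer_json_str = answer_json_str[:-3]
            answer_json_str = answer_json_str.strip()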

            # Attempt to parse the cleaned string as JSON
            answer_data = json.loads(answer_json_str)

            # Validate that all required keys are present
            required_keys = ["explanation", "follow_up_questions", "potential_resources"]
            if not all(k in answer_data for k in required_keys):
                print("AI response is valid JSON but missing expected keys.")
                return None, None

            # Return the structured result and the documents used to generate the answer
            return answer_data, source_docs

        except json.JSONDecodeError as e:
            # Handle cases where the model returned invalid JSON
            print(f"Failed to parse the response from the AI as JSON. Error: {e}")
            return None, None

    except Exception as e:
        # Catch any general error in the QA pipeline
        print(f"Error asking question: {e}")
        return None, None
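
The clean-up and validation steps can also be exercised on their own, without calling the model. The sketch below is a standalone variant under stated assumptions: `strip_code_fence` and `parse_model_json` are hypothetical helpers, and the sample reply is invented to mimic a response wrapped in a Markdown fence.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker, built this way to avoid a literal fence here
REQUIRED_KEYS = ("explanation", "follow_up_questions", "potential_resources")


def strip_code_fence(text):
    """Remove a surrounding Markdown code fence, if any, and trim whitespace."""
    text = text.strip()
    if text.startswith(FENCE):
        # Drop the whole opening fence line, which may carry a language tag such as "json"
        first_newline = text.find("\n")
        text = text[first_newline + 1:] if first_newline != -1 else ""
    if text.rstrip().endswith(FENCE):
        text = text.rstrip()[:-len(FENCE)]
    return text.strip()


def parse_model_json(raw_text, required_keys=REQUIRED_KEYS):
    """Return the parsed JSON object from a model reply, or None if it is unusable."""
    if not raw_text:
        return None
    try:
        data = json.loads(strip_code_fence(raw_text))
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict) or not all(k in data for k in required_keys):
        return None
    return data


# Invented reply that mimics a fenced model response
sample_reply = (
    FENCE + "json\n"
    + '{"explanation": "x", "follow_up_questions": ["q"], "potential_resources": ["r"]}'
    + "\n" + FENCE
)
print(parse_model_json(sample_reply))
```

Returning `None` for every failure mode keeps the caller simple, at the cost of hiding which check failed. Logging the reason would be a reasonable extension.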


## Generating Study Materials

This cell defines functions to generate study materials (summaries, MCQs, and flashcards) using the generative AI model. It leverages **Document Understanding** to extract key concepts and **Structured Output** for formatted results.

**Why It’s Important**:
- Creates summaries to help students grasp main ideas quickly.
- Generates MCQs and flashcards to reinforce learning and test understanding.
- Uses structured prompts to ensure consistent, exam-relevant outputs.

**Role in Workflow**:
- Enhances study efficiency by providing tailored resources.
- Demonstrates document understanding and structured output for the Kaggle competition.

In [16]:
def create_llm_for_generation(api_key):
    """Creates an LLM instance specifically for generation tasks."""
    try:
        # Initialize the Gemini model for content generation (e.g., summaries, MCQs, flashcards)
        llm = ChatGoogleGenerativeAI(
            model="gemini-1.5-pro",                  # Higher capacity model for richer outputs
            google_api_key=api_key,
            temperature=0.3,                         # Low to moderate creativity
            max_output_tokens=4096,                  # Upper bound on response length, in tokens
            convert_system_message_to_human=True     # Folds any system message into the first human message
        )
        return llm
    except Exception as e:
        # Catch and report any issues during model initialization
        print(f"Error creating LLM for generation: {e}")
        return None


def create_summary_generator(llm):
    """Create a summary generator chain."""
    # Template prompts Gemini to produce an educational summary
    summary_template = """
    You are an expert educational summarizer. Create a clear, concise summary of the following text.
    Focus on the main concepts, key points, and critical details.

    TEXT:
    {text}

    SUMMARY:
    """
    # Wrap prompt in LangChain's PromptTemplate
    summary_prompt = PromptTemplate(template=summary_template, input_variables=["text"])
    
    # Create an LLMChain for text summarization
    summary_chain = LLMChain(llm=llm, prompt=summary_prompt)
    return summary_chain


def create_mcq_generator(llm):
    """Create a multiple-choice question generator chain."""
    # Prompt template instructs the LLM to generate conceptual MCQs based on academic standards
    mcq_template = """
    You are an expert educator designing exam-applicable multiple-choice questions. Based *only* on the core concepts, key principles, and significant information presented in the following text, create {num_questions} multiple-choice questions.

    **Focus exclusively on understanding and application of the material.** Do NOT ask questions about:
    - Document metadata (e.g., page numbers, specific section titles unless core to the content)
    - The source of the information (e.g., "According to page 5...")
    - Trivial details or overly specific examples unless they illustrate a fundamental concept.

    Each question must test conceptual understanding and have 4 options (labeled A, B, C, D) with only one clearly correct answer. Format your response as plain text, following this structure EXACTLY for each question:

    Q[question number]: [Question text focusing on core concepts]
    A. [Option A]
    B. [Option B]
    C. [Option C]
    D. [Option D]
    Correct Answer: [Correct option letter, e.g., C]
    Explanation: [Brief explanation of why the answer is correct, linking back to the core concept]

    (Ensure there is a blank line between each question block)

    TEXT:
    {text}

    EXAM-APPLICABLE MULTIPLE CHOICE QUESTIONS FOCUSED ON CORE CONCEPTS:
    """
    # Setup the prompt with dynamic inputs for number of questions and content
    mcq_prompt = PromptTemplate(
        template=mcq_template,
        input_variables=["text", "num_questions"]
    )
    
    # Chain that generates MCQs from input text
    mcq_chain = LLMChain(llm=llm, prompt=mcq_prompt)
    return mcq_chain


def create_flashcard_generator(llm):
    """Create a flashcard generator chain."""
    # Prompt for generating clean, conceptual flashcards in Q&A format
    flashcard_template = """
    You are an expert educator. Create {num_cards} flashcards focusing on the CORE CONCEPTS, KEY TERMS, and MAIN IDEAS presented in the following text.
    Avoid trivial details or overly specific examples unless they are central to understanding a core concept.
    Format your response as PLAIN TEXT, following this structure EXACTLY for each card:

    Front: [Question, term, or concept]
    Back: [Answer, definition, or explanation]

    (Ensure there is a blank line between each card block)

    TEXT:
    {text}

    PLAIN TEXT FLASHCARDS (focus on core concepts):
    """
    # Define prompt inputs
    flashcard_prompt = PromptTemplate(
        template=flashcard_template,
        input_variables=["text", "num_cards"]
    )
    
    # Chain to generate flashcards using the LLM
    flashcard_chain = LLMChain(llm=llm, prompt=flashcard_prompt)
    return flashcard_chain
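
Because the MCQ prompt fixes an exact plain-text layout, the output can be converted into structured records for quizzing or export. The parser below is a sketch that assumes the model follows the layout faithfully. `MCQ_PATTERN`, `parse_mcqs`, and the sample text are hypothetical and not part of the notebook.

```python
import re

# One block: "Qn: ...", four lettered options, "Correct Answer: X", "Explanation: ..."
MCQ_PATTERN = re.compile(
    r"Q(?P<number>\d+):\s*(?P<question>.+?)\n"
    r"A\.\s*(?P<A>.+?)\n"
    r"B\.\s*(?P<B>.+?)\n"
    r"C\.\s*(?P<C>.+?)\n"
    r"D\.\s*(?P<D>.+?)\n"
    r"Correct Answer:\s*(?P<answer>[ABCD])\b.*?\n"
    r"Explanation:\s*(?P<explanation>.+?)(?=\n\s*\n|\Z)",
    re.DOTALL,
)


def parse_mcqs(mcq_text):
    """Parse MCQ blocks in the prompt's plain-text layout into a list of dicts."""
    questions = []
    for match in MCQ_PATTERN.finditer(mcq_text):
        questions.append({
            "number": int(match.group("number")),
            "question": match.group("question").strip(),
            "options": {letter: match.group(letter).strip() for letter in "ABCD"},
            "answer": match.group("answer"),
            "explanation": match.group("explanation").strip(),
        })
    return questions


# Invented sample in the layout requested by the prompt
sample_mcqs = """Q1: What does osmosis move across a membrane?
A. Salt
B. Water
C. Light
D. Heat
Correct Answer: B
Explanation: Osmosis is the movement of water.

Q2: Which organelle produces ATP?
A. Nucleus
B. Ribosome
C. Mitochondria
D. Golgi body
Correct Answer: C
Explanation: Mitochondria produce ATP.
"""

parsed = parse_mcqs(sample_mcqs)
print(len(parsed), parsed[0]["answer"], parsed[1]["options"]["C"])
```

Blocks that deviate from the layout are silently skipped, so comparing `len(parsed)` with the requested number of questions is a cheap way to detect malformed output.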


## Orchestrating Study Material Generation

This cell combines the summary, MCQ, and flashcard generators to produce comprehensive study materials from document chunks.

**Why It’s Important**:
- Integrates multiple generative AI tasks to create a complete study package.
- Processes document chunks to extract key concepts, ensuring relevance.
- Supports customizable output (e.g., number of MCQs/flashcards) for flexibility.

**Role in Workflow**:
- Delivers actionable study resources to enhance learning.
- Demonstrates document understanding and structured output for the Kaggle competition.

In [17]:
def generate_study_materials(chunks, api_key, num_chunks=5, num_mcqs=5, num_flashcards=5):
    """Generate study materials (summary, MCQs, flashcards) from selected chunks."""
    
    # Check for available document chunks
    if not chunks:
        print("No document chunks available to generate study materials.")
        return None

    # Create a language model instance specifically for generation tasks
    llm = create_llm_for_generation(api_key)
    if not llm:
        return None

    # Initialize the three generator chains using the LLM
    summary_generator = create_summary_generator(llm)
    mcq_generator = create_mcq_generator(llm)
    flashcard_generator = create_flashcard_generator(llm)

    # Select a subset of the chunks to limit token usage and ensure efficiency
    selected_chunks = chunks[:min(num_chunks, len(chunks))]
    if not selected_chunks:
        print("Not enough document chunks to generate materials.")
        return None

    # Combine the content of the selected chunks into a single input string
    combined_text = "\n\n".join([chunk.page_content for chunk in selected_chunks])

    results = {}

    # Generate summary from the text
    try:
        print("Generating summary...")
        summary = summary_generator.invoke({"text": combined_text}).get("text", "Failed to generate summary.")
        results["summary"] = summary
    except Exception as e:
        print(f"Error generating summary: {e}")
        results["summary"] = "Error generating summary."

    # Generate multiple-choice questions
    try:
        print("Generating MCQs...")
        mcq_response = mcq_generator.invoke({
            "text": combined_text,
            "num_questions": num_mcqs
        }).get("text")
        results["mcqs_text"] = mcq_response if mcq_response else "MCQ generation failed or returned empty."
    except Exception as e:
        print(f"Error generating MCQs: {e}")
        results["mcqs_text"] = f"Error during MCQ generation: {e}"

    # Generate flashcards
    try:
        print("Generating flashcards...")
        flashcard_response = flashcard_generator.invoke({
            "text": combined_text,
            "num_cards": num_flashcards
        }).get("text")
        results["flashcards_text"] = flashcard_response if flashcard_response else "Flashcard generation failed or returned empty."
    except Exception as e:
        print(f"Error generating flashcards: {e}")
        results["flashcards_text"] = f"Error during flashcard generation: {e}"

    return results
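
The flashcards come back as one plain-text string under `results["flashcards_text"]`. If individual cards are needed, for example to shuffle them, a small line-based parser is enough. This is a sketch assuming the `Front:` / `Back:` layout requested in the prompt. `parse_flashcards` and the sample text are hypothetical.

```python
def parse_flashcards(flashcards_text):
    """Turn 'Front:' / 'Back:' line pairs into a list of {'front', 'back'} dicts."""
    cards = []
    front = None
    for line in flashcards_text.splitlines():
        line = line.strip()
        if line.startswith("Front:"):
            front = line[len("Front:"):].strip()
        elif line.startswith("Back:") and front is not None:
            cards.append({"front": front, "back": line[len("Back:"):].strip()})
            front = None  # Wait for the next "Front:" line
    return cards


# Invented sample in the layout requested by the flashcard prompt
sample_cards = """Front: What is osmosis?
Back: The movement of water across a membrane.

Front: ATP
Back: The main energy carrier in cells.
"""

cards = parse_flashcards(sample_cards)
for card in cards:
    print(card["front"], "->", card["back"])
```

The error strings that `generate_study_materials` stores on failure contain no `Front:` lines, so they parse to an empty list rather than raising.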


## Creating the Interactive Interface

This cell sets up an interactive interface using `ipywidgets` to allow users to upload PDFs, ask questions, and generate study materials. It integrates all components into a cohesive user experience.

**Why It’s Important**:
- Provides a user-friendly interface for students to interact with StudyGenie.
- Handles file uploads, question inputs, and study material generation dynamically.
- Displays results clearly, enhancing usability and engagement.

**Role in Workflow**:
- Ties together document understanding, RAG, and structured output into a practical application.
- Demonstrates the real-world impact of the project for the Kaggle competition.

**Note**: Ensure `ipywidgets` is properly installed and configured to avoid errors.
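
One configuration detail that matters for the cell below is the shape of `FileUpload.value`, which differs between major `ipywidgets` versions. The cell indexes it as a sequence of dicts, which matches the 8.x layout as far as I know. The helper below is a hedged sketch that accepts both layouts: `extract_upload` is a hypothetical name, and both layouts are assumptions described in the comments.

```python
def extract_upload(upload_value):
    """Return (filename, content_bytes) for the first uploaded file, or None."""
    if not upload_value:
        return None
    if isinstance(upload_value, dict):
        # Assumed ipywidgets 7.x layout: {filename: {"metadata": {...}, "content": bytes}}
        name, info = next(iter(upload_value.items()))
        return name, bytes(info["content"])
    # Assumed ipywidgets 8.x layout: a tuple of dicts with "name" and "content" keys,
    # where "content" may be a memoryview
    first = upload_value[0]
    return first["name"], bytes(first["content"])


# Simulated widget values for both layouts (no widget needed to try this out)
v8_value = ({"name": "notes.pdf", "content": memoryview(b"%PDF-1.4")},)
v7_value = {"notes.pdf": {"metadata": {}, "content": b"%PDF-1.4"}}
print(extract_upload(v8_value))
print(extract_upload(v7_value))
```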

In [18]:
# File upload widget: Allows the user to upload a PDF file
file_upload = widgets.FileUpload(
    accept='.pdf',  # Accepts only PDF files
    multiple=False,  # Allows only one file at a time
    description='Upload PDF:',  # Label for the upload button
    style={'description_width': 'initial'}  # Adjusts the width of the description label
)

# Question input widget: Text field for the user to enter their question
question_input = widgets.Text(
    value='',  # Initial value of the text field
    placeholder='Enter your question here',  # Placeholder text when the field is empty
    description='Question:',  # Label for the question input field
    style={'description_width': 'initial'},  # Adjusts the width of the description label
    layout=widgets.Layout(width='50%')  # Sets the width of the input field to 50%
)

# Study material generation controls: Sliders to specify the number of study materials to generate
num_mcqs = widgets.IntSlider(
    value=5,  # Default value for the slider
    min=1,  # Minimum value
    max=20,  # Maximum value
    step=1,  # Step size for slider movement
    description='Number of MCQs:',  # Label for the MCQ slider
    style={'description_width': 'initial'}  # Adjusts the width of the description label
)

num_flashcards = widgets.IntSlider(
    value=5,  # Default value for the slider
    min=1,  # Minimum value
    max=20,  # Maximum value
    step=1,  # Step size for slider movement
    description='Number of Flashcards:',  # Label for the flashcard slider
    style={'description_width': 'initial'}  # Adjusts the width of the description label
)

# Output areas: These widgets will display the output for QA and study materials
qa_output = widgets.Output()
study_materials_output = widgets.Output()

# Global variables to store the state of the system
current_vectordb = None  # Stores the vector database for QA
current_qa_chain = None  # Stores the QA chain system
current_chunks = []  # Stores the chunks of the processed document

# Function to process the uploaded PDF file
def process_uploaded_file(change):
    global current_vectordb, current_qa_chain, current_chunks
    
    # Ensure a file is uploaded before processing
    if not file_upload.value:
        return
    
    with qa_output:
        clear_output()  # Clear previous output
        print("Processing uploaded PDF...")
        
        # Save the uploaded PDF content
        pdf_content = file_upload.value[0]['content']
        pdf_name = file_upload.value[0]['name']
        
        with open(pdf_name, 'wb') as f:
            f.write(pdf_content)
        
        # Process the PDF content into documents and chunks
        documents = load_pdf(pdf_name)
        if documents:
            current_chunks = process_documents(documents)
            if current_chunks:
                current_vectordb = create_vectordb(current_chunks, api_key_input.value)  # Create vector database
                if current_vectordb:
                    current_qa_chain = setup_qa_system(current_vectordb, api_key_input.value)  # Set up the QA system
                    if current_qa_chain:
                        print("✅ PDF processed successfully! You can now ask questions or generate study materials.")
                    else:
                        print("❌ Failed to setup Q&A system.")
                else:
                    print("❌ Failed to create vector database.")
            else:
                print("❌ Failed to process document chunks.")
        else:
            print("❌ Failed to load PDF document.")
        
        # Clean up the temporary PDF file after processing
        os.remove(pdf_name)

# Function to handle asking a question
def ask_question_handler(change):
    # Check if QA system is set up before processing the question
    if not current_qa_chain:
        with qa_output:
            clear_output()
            print("Please upload a PDF first.")
        return
    
    with qa_output:
        clear_output()  # Clear previous output
        print(f"Question: {question_input.value}")
        print("\nGenerating answer...\n")
        
        # Get answer data and source documents based on the question
        answer_data, source_docs = ask_question(current_qa_chain, question_input.value)
        
        if answer_data:
            print("Explanation:")
            print(answer_data["explanation"])
            
            print("\nFollow-up Questions:")
            for q in answer_data["follow_up_questions"]:
                print(f"- {q}")
            
            print("\nPotential Resources:")
            for r in answer_data["potential_resources"]:
                print(f"- {r}")
            
            if source_docs:
                print("\nSources:")
                for i, doc in enumerate(source_docs):
                    print(f"\nSource {i+1}:")
                    print(doc.page_content[:200] + "...")
        else:
            print("Failed to generate an answer.")

# Function to handle the generation of study materials
def generate_study_materials_handler(change):
    # Ensure that chunks are available before generating study materials
    if not current_chunks:
        with study_materials_output:
            clear_output()
            print("Please upload a PDF first.")
        return
    
    with study_materials_output:
        clear_output()  # Clear previous output
        print("Generating study materials...\n")
        
        # Generate study materials including MCQs and flashcards
        materials = generate_study_materials(
            current_chunks,
            api_key_input.value,
            num_mcqs=num_mcqs.value,
            num_flashcards=num_flashcards.value
        )
        
        if materials:
            print("Summary:")
            print(materials["summary"])
            
            print("\nMultiple Choice Questions:")
            print(materials["mcqs_text"])
            
            print("\nFlashcards:")
            print(materials["flashcards_text"])
        else:
            print("Failed to generate study materials.")

# Set up event handlers
file_upload.observe(process_uploaded_file, names='value')  # Trigger file processing when a file is uploaded
# Questions are submitted with the "Ask Question" button below. Observing question_input's
# 'value' would fire on every change to the text box and trigger a model call each time.

# Create buttons for user interactions
ask_button = widgets.Button(
    description='Ask Question',  # Button label
    button_style='primary'  # Button style
)
ask_button.on_click(lambda b: ask_question_handler(None))  # Trigger question handling when button is clicked

generate_button = widgets.Button(
    description='Generate Study Materials',  # Button label
    button_style='success'  # Button style
)
generate_button.on_click(lambda b: generate_study_materials_handler(None))  # Trigger study material generation when button is clicked

# Display the interface with all widgets and outputs
print("📚 StudyGenie: Your AI-Powered Study Assistant")
print("\n1. Configure your Google API Key above")
print("2. Upload a PDF document")
print("3. Ask questions or generate study materials")
print("\n---")

# Display widgets in a vertical layout
display(widgets.VBox([
    file_upload,  # PDF upload widget
    widgets.HBox([question_input, ask_button]),  # Question input and button
    widgets.HBox([num_mcqs, num_flashcards, generate_button]),  # Sliders and generate button
    qa_output,  # Output for QA results
    study_materials_output  # Output for study materials
]))


📚 StudyGenie: Your AI-Powered Study Assistant

1. Configure your Google API Key above
2. Upload a PDF document
3. Ask questions or generate study materials

---



## 🎉 Wrapping Up StudyGenie

Thank you for exploring **StudyGenie**, your AI-powered study assistant! This notebook shows how generative AI can support learning by processing PDFs, answering questions, and generating study resources such as summaries, MCQs, and flashcards. Designed for students and educators, StudyGenie aims to make studying more efficient and engaging.

### Key Achievements
- **Process PDFs**: Extract and analyze text from academic materials.
- **Answer Questions**: Deliver precise, context-aware responses.
- **Generate Resources**: Create tailored summaries, MCQs, and flashcards.
- **User-Friendly Interface**: Enable easy interaction via file uploads and inputs.

### Generative AI Concepts Used
- **Document Understanding**: Parse PDFs with `PyPDFLoader`.
- **Retrieval Augmented Generation (RAG)**: Combine FAISS retrieval with Gemini generation.
- **Structured Output**: Format answers as JSON for consistency.
- **Embeddings**: Use Google’s `embedding-001` for semantic search.
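
The structured-output idea can be sketched as a small validation step. The question handler in this notebook reads three keys from each answer: `explanation`, `follow_up_questions`, and `potential_resources`. The helper below, `normalize_answer`, is a hypothetical name that does not appear in the notebook; it is a minimal sketch that assumes the model returns its reply as JSON text, possibly wrapped in a Markdown code fence.

```python
import json

# Keys the notebook's question handler reads from each answer.
LIST_KEYS = ("follow_up_questions", "potential_resources")
FENCE = "`" * 3  # Markdown code-fence marker


def _as_list(value):
    """Coerce a missing, scalar, or list value into a list of strings."""
    if value is None:
        return []
    if isinstance(value, list):
        return [str(item) for item in value]
    return [str(value)]


def normalize_answer(raw_text):
    """Parse a model reply into a dict with the three expected keys.

    Hypothetical helper (an assumption, not the notebook's code). Returns
    None when the reply is not a JSON object.
    """
    text = raw_text.strip()
    # Remove a surrounding code fence and optional "json" tag, if present.
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    answer = {"explanation": str(data.get("explanation", ""))}
    for key in LIST_KEYS:
        answer[key] = _as_list(data.get(key))
    return answer
```

Filling in empty defaults means display code can loop over the two list fields without checking whether the model supplied them.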

### Get Started
1. Enter your Google API key in the widget.
2. Upload a PDF (e.g., lecture notes).
3. Ask questions or generate study materials.
4. If an import fails, re-run the installation cell to fix dependencies.
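
Before working through these steps, a quick check can show whether the required packages are importable. This is a minimal sketch: the module names in `REQUIRED_MODULES` are assumptions based on the libraries named in this notebook, so adjust them to match the installation cell.

```python
import importlib.util

# Assumed top-level import names; adjust to match the installation cell.
REQUIRED_MODULES = ["langchain", "faiss", "pypdf", "ipywidgets"]


def find_missing_modules(module_names):
    """Return the names in module_names that cannot be found for import."""
    missing = []
    for name in module_names:
        # find_spec returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


missing = find_missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing packages, re-run the installation cell:", ", ".join(missing))
else:
    print("All required packages are importable.")
```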

### Future Potential
- Add image understanding for diagrams.
- Support DOCX or handwritten notes.
- Deploy with MLOps for scalability.