# Software Engineering Study Assistant - RAG Pipeline

This notebook implements a Retrieval-Augmented Generation (RAG) chatbot designed to help software engineering students with:
- **Understanding complex topics** from lecture notes and textbooks
- **Solving previous year exam questions** with detailed explanations
- **Getting contextual answers** from course materials and PDFs
- **Study assistance** with proper references to source materials

**Technology Stack:**
- **PyMuPDF** for PDF lecture notes extraction
- **LangChain's RecursiveCharacterTextSplitter** for intelligent text chunking
- **Sentence Transformers** (all-MiniLM-L6-v2) for semantic embeddings
- **ChromaDB** for fast similarity search across study materials
- **LangChain's retriever** for relevant content retrieval
- **Gemini** (the notebook configures `gemini-1.5-flash` and refers to it as Gemini Pro) for generating comprehensive answers with context

## Study Assistant Pipeline Flow

```
Lecture Notes PDFs → PyMuPDF → Text Extraction → RecursiveCharacterTextSplitter → Knowledge Chunks
                                                           ↓
                                              Sentence Transformers → Semantic Embeddings → ChromaDB Knowledge Base
                                                           ↓
Student Question/Problem → Query Embedding → LangChain Retriever → Relevant Study Materials
                                                           ↓
                        Gemini Pro ← Context + Question → Detailed Answer with References
```
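
The flow above can be sketched end to end in plain Python. This is a toy illustration under loud assumptions, not the notebook's implementation: word-count vectors stand in for Sentence Transformers embeddings, an in-memory list stands in for ChromaDB, and the final prompt is printed rather than sent to Gemini. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine the retrieved context and the question into one prompt string."""
    context = "\n---\n".join(context_chunks)
    return f"Context from Study Materials:\n{context}\n\nQuestion: {question}\nAnswer:"


# Made-up knowledge chunks for illustration
chunks = [
    "Function points measure software size from user-visible functionality.",
    "A greedy algorithm makes the locally optimal choice at each step.",
    "COCOMO estimates development effort from project size in lines of code.",
]
question = "What is a greedy algorithm?"
top = retrieve(question, chunks, k=1)
print(build_prompt(question, top))
```

Only the chunk about greedy algorithms shares words with the question, so it is the one placed in the prompt. The real pipeline follows the same retrieve-then-generate shape with semantic embeddings instead of word overlap.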

**Use Cases:**
- "Explain object-oriented programming concepts"
- "How do I solve this data structures problem?"
- "What are the key points about software testing methodologies?"
- "Help me understand this previous year question on algorithms"

## Installation

Install all required packages using the requirements.txt file:

In [1]:
# Install required packages for RAG pipeline using requirements.txt
!pip install -r requirements.txt

Collecting langchain>=0.1.0 (from -r requirements.txt (line 8))
  Using cached langchain-0.3.27-py3-none-any.whl.metadata (7.8 kB)
Collecting langchain-community (from -r requirements.txt (line 9))
  Using cached langchain_community-0.3.29-py3-none-any.whl.metadata (2.9 kB)
Collecting langchain-google-genai>=1.0.0 (from -r requirements.txt (line 10))
  Using cached langchain_google_genai-2.1.11-py3-none-any.whl.metadata (6.7 kB)
Collecting sentence-transformers>=2.2.0 (from -r requirements.txt (line 13))
  Using cached sentence_transformers-5.1.0-py3-none-any.whl.metadata (16 kB)
Collecting chromadb>=0.4.0 (from -r requirements.txt (line 16))
  Using cached chromadb-1.0.21-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (7.3 kB)
Collecting google-generativeai>=0.3.0 (from -r requirements.txt (line 19))
  Using cached google_generativeai-0.8.5-py3-none-any.whl.metadata (3.9 kB)
Collecting langchain-core<1.0.0,>=0.3.72 (from langchain>=0.1.0->-r requirements.txt (line 8

## Import Libraries

In [2]:
import fitz  # PyMuPDF
import os
import io
from langchain.text_splitter import RecursiveCharacterTextSplitter
from sentence_transformers import SentenceTransformer
import chromadb
from chromadb.config import Settings
from langchain_community.vectorstores import Chroma
from langchain_community.embeddings import SentenceTransformerEmbeddings
from langchain.schema import Document
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.chains import RetrievalQA, ConversationalRetrievalChain
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferWindowMemory
from langchain.schema import BaseMessage, HumanMessage, AIMessage
import uuid
from typing import List

## Gemini API Key Setup

Get your free Gemini API key from [Google AI Studio](https://makersuite.google.com/app/apikey)

In [None]:
import os
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

# Get Gemini API key from environment variable
GEMINI_API_KEY = os.getenv("GEMINI_API_KEY")

# Verify API key is set
if not GEMINI_API_KEY:
    print("⚠️  Please set your Gemini API key!")
    print("1. Create a .env file in this directory")
    print("2. Add this line: GEMINI_API_KEY=your_actual_api_key_here")
    print("3. Or use: cp .env.example .env and edit it")
    print("Get your free API key from: https://makersuite.google.com/app/apikey")
else:
    print("✅ Gemini API key loaded from .env file")
    print(f"API key starts with: {GEMINI_API_KEY[:8]}...")

# Set environment variable for Google Generative AI.
# Only assign when a key is present: os.environ rejects None with a TypeError.
if GEMINI_API_KEY:
    os.environ["GOOGLE_API_KEY"] = GEMINI_API_KEY

✅ Gemini API key configured
API key starts with: AIzaSyCA...
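
The setup cell echoes the first characters of the key. A more conservative pattern is to fail fast when the variable is missing and to show only a masked form. The sketch below is one possible way to do that; `require_env`, `mask_secret`, and the `DEMO_API_KEY` variable are hypothetical names used only for this example.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value


def mask_secret(secret: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a secret."""
    if len(secret) <= visible:
        return "*" * len(secret)
    return "*" * (len(secret) - visible) + secret[-visible:]


# DEMO_API_KEY is a made-up variable holding a fake value for this example
os.environ["DEMO_API_KEY"] = "not-a-real-key-1234"
print(mask_secret(require_env("DEMO_API_KEY")))
```

Raising an error instead of printing a warning stops later cells from running with a missing key.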


## PDF Text Extraction with PyMuPDF

This section extracts text directly from text-based PDFs: files created from digital documents (Word, LaTeX, Google Docs, etc.) that contain embedded text rather than page images.

**Supported PDF Types:**
- Documents created from Word processors
- LaTeX-generated PDFs  
- Google Docs exports
- Any PDF with embedded text data

The pipeline uses PyMuPDF for fast and accurate text extraction from digital documents.

In [42]:
import fitz  # PyMuPDF
import os

def extract_text_from_pdf(pdf_path: str) -> str:
    """
    Extract text from text-based PDFs using PyMuPDF
    Use this for PDFs created from digital documents (Word, LaTeX, Google Docs, etc.)
    """
    if not os.path.exists(pdf_path):
        print(f"Warning: PDF file not found: {pdf_path}")
        return ""
    
    print(f"Processing: {os.path.basename(pdf_path)}")
    print(f"  → Using direct text extraction")
    
    doc = fitz.open(pdf_path)
    text = ""
    
    for page_num in range(len(doc)):
        page = doc[page_num]
        page_text = page.get_text()
        
        if page_text.strip():  # Only add non-empty pages
            text += f"\n\n--- Lecture Page {page_num + 1} ---\n\n"
            text += page_text
    
    doc.close()
    return text

# Add your text-based PDFs here (created from Word, LaTeX, Google Docs, etc.)
pdf_paths = [
    "./assets/metrics3.pdf", 
    "./assets/Lecture#7.pdf",
    "./assets/Sample.pdf",
    "./assets/GreedyAlgorithms.pdf"
]

extracted_texts = {}

print("=== Processing Text-based PDFs ===")
for pdf_path in pdf_paths:
    if os.path.exists(pdf_path):
        text = extract_text_from_pdf(pdf_path)
        extracted_texts[os.path.basename(pdf_path)] = text
        print(f"Extracted {len(text)} characters from {os.path.basename(pdf_path)}")
    else:
        print(f"PDF file not found: {pdf_path}")

# Optional extraction summary (set SHOW_SUMMARY = True to print it)
SHOW_SUMMARY = False
if SHOW_SUMMARY:
    if extracted_texts:
        total_chars = sum(len(text) for text in extracted_texts.values())
        first_text = next(iter(extracted_texts.values()))
        print("\n=== EXTRACTION SUMMARY ===")
        print(f"Total extracted content: {total_chars} characters")
        print(f"PDFs processed: {len(extracted_texts)}")
        print(f"First 500 characters:\n{first_text[:500]}...")
    else:
        print("\nNo PDF files were processed.")
        print("Please add your text-based PDFs to the pdf_paths list")

=== Processing Text-based PDFs ===
Processing: metrics3.pdf
  → Using direct text extraction
Extracted 17980 characters from metrics3.pdf
Processing: Lecture#7.pdf
  → Using direct text extraction
Extracted 19743 characters from Lecture#7.pdf
Processing: Sample.pdf
  → Using direct text extraction
Extracted 20737 characters from Sample.pdf
Processing: GreedyAlgorithms.pdf
  → Using direct text extraction
Extracted 19614 characters from GreedyAlgorithms.pdf

### Understanding Text-based PDF Processing

**Text-based PDFs**: Created from digital documents (Word, LaTeX, Google Docs, etc.) - contain actual text data that can be directly extracted.

**Processing Features:**
- Fast direct text extraction using PyMuPDF
- Returns the embedded text as plain text; visual formatting such as fonts and layout is not carried over
- Works with all standard PDF formats containing embedded text
- Preserves page structure with clear page separators

**Best Results With:**
- Documents created from word processors (Word, Google Docs, etc.)
- LaTeX-generated academic papers and textbooks
- Exported PDFs from presentation software
- Any PDF with selectable/copyable text

**Note**: This pipeline is optimized for text-based PDFs. If you have scanned documents (images of text), you would need OCR functionality, which can be added later if needed.
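
A scanned PDF typically shows up as pages whose extracted text is empty or nearly empty. The sketch below is a simple heuristic for flagging such files before indexing. It works on a list of per-page strings (in the notebook these would come from `page.get_text()`); the function names and thresholds are assumptions chosen for illustration.

```python
def find_textless_pages(page_texts: list[str], min_chars: int = 20) -> list[int]:
    """Return 1-based page numbers whose extracted text is shorter than min_chars."""
    return [
        number
        for number, text in enumerate(page_texts, start=1)
        if len(text.strip()) < min_chars
    ]


def looks_scanned(page_texts: list[str], min_chars: int = 20,
                  max_empty_ratio: float = 0.5) -> bool:
    """Heuristic: treat the PDF as scanned if most pages yield almost no text."""
    if not page_texts:
        return False
    empty_pages = find_textless_pages(page_texts, min_chars)
    return len(empty_pages) / len(page_texts) > max_empty_ratio


# Made-up page texts for illustration
pages = ["Cost Estimation Process: effort, development time, size.", "", "   ", "\n"]
print(find_textless_pages(pages))  # [2, 3, 4]
print(looks_scanned(pages))        # True
```

Files flagged this way are candidates for an OCR step rather than direct extraction.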

## Text Chunking with LangChain's RecursiveCharacterTextSplitter

In [43]:
def create_overlapping_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[Document]:
    """
    Split lecture notes into overlapping chunks for better retrieval
    """
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
        length_function=len,
        separators=["\n\n", "\n", " ", ""]
    )
    
    chunks = text_splitter.split_text(text)
    
    documents = []
    for i, chunk in enumerate(chunks):
        doc = Document(
            page_content=chunk,
            metadata={
                "chunk_id": i,
                "source": "Unknown",  # default, will be overwritten later
                "chunk_size": len(chunk),
                "content_type": "lecture_notes"
            }
        )
        documents.append(doc)
    
    return documents

documents = []

for source, text in extracted_texts.items():
    docs = create_overlapping_chunks(text)
    for doc in docs:
        doc.metadata["source"] = source
    documents.extend(docs)

unknown_sources = [doc for doc in documents if doc.metadata.get("source") == "Unknown"]
print(f"Chunks with Unknown source: {len(unknown_sources)}")

Chunks with Unknown source: 0
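
To see what `chunk_size` and `chunk_overlap` mean, the sketch below uses a plain fixed-size sliding window. It is deliberately simpler than `RecursiveCharacterTextSplitter`, which also tries to break on the listed separators; the function name is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000,
                          chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size chunks where consecutive chunks share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


print(sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2))
# ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the last two characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.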


## Initialize Sentence Transformers for Embeddings

In [44]:
# Initialize Sentence Transformers embeddings
embedding_model = SentenceTransformerEmbeddings(model_name="all-MiniLM-L6-v2")

print("Sentence Transformers (all-MiniLM-L6-v2) model loaded")
print(f"Embedding dimension: 384")  # all-MiniLM-L6-v2 produces 384-dimensional embeddings

Sentence Transformers (all-MiniLM-L6-v2) model loaded
Embedding dimension: 384
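
Embeddings are compared by vector similarity. The sketch below shows cosine similarity on small made-up vectors and checks a standard identity: for unit-length vectors, the squared Euclidean distance equals `2 - 2 * cosine similarity`, so both measures rank neighbours the same way. The helper names are hypothetical, and the 3-dimensional vectors are stand-ins for real 384-dimensional embeddings.

```python
import math


def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    ua, ub = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(ua, ub))


def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Made-up vectors for illustration
query = [1.0, 2.0, 0.0]
chunk_a = [2.0, 4.0, 0.0]  # same direction as the query
chunk_b = [0.0, 0.0, 3.0]  # orthogonal to the query

print(round(cosine_similarity(query, chunk_a), 6))
print(round(cosine_similarity(query, chunk_b), 6))

# Check the identity on unit-length vectors
ua, ub = normalize(query), normalize(chunk_b)
print(math.isclose(squared_l2(ua, ub), 2 - 2 * cosine_similarity(ua, ub), abs_tol=1e-9))
```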


## Create ChromaDB Vector Store

In [45]:
# Set up ChromaDB knowledge base for study materials
persist_directory = "./chroma_db"

# Create or load ChromaDB vector store for educational content
if 'documents' in locals() and documents:
    # Stable IDs let a re-run overwrite existing chunks instead of appending duplicates.
    # Entries stored earlier under other IDs remain until ./chroma_db is deleted.
    chunk_ids = [f"{doc.metadata['source']}-{doc.metadata['chunk_id']}" for doc in documents]

    vectorstore = Chroma.from_documents(
        documents=documents,
        embedding=embedding_model,
        ids=chunk_ids,
        persist_directory=persist_directory,
        collection_name="software_engineering_knowledge_base"
    )
    
    # Persist the database (recent chromadb versions persist automatically;
    # the call is kept for older versions)
    vectorstore.persist()
    
    print(f"Software Engineering Knowledge Base created with {len(documents)} chunks")
    print(f"Database persisted to: {persist_directory}")
    print("Your study materials are now ready for questions!")

    # Sample a few stored documents; kept inside this branch so `vectorstore` is always defined
    print("Loaded vectorstore document metadata samples:")
    results = vectorstore.similarity_search("test", k=3)  # just a quick search to get some docs

    for i, doc in enumerate(results, 1):
        print(f"Document {i} metadata: {doc.metadata}")
        print(f"Content preview: {doc.page_content[:150]}...\n")
else:
    print("No study materials available for knowledge base creation")

Software Engineering Knowledge Base created with 124 chunks
Database persisted to: ./chroma_db
Your study materials are now ready for questions!
Loaded vectorstore document metadata samples:
Document 1 metadata: {'source': 'Lecture#7.pdf', 'content_type': 'lecture_notes', 'chunk_size': 900, 'chunk_id': 11}
Content preview: Cost Estimation Process
Errors
Effort
Development Time
Size Table
Lines of Code
Number of Use Case
Function Point
Estimation Process
Number of Personn...

Document 2 metadata: {'chunk_size': 900, 'source': 'Lecture#7.pdf', 'chunk_id': 11, 'content_type': 'lecture_notes'}
Content preview: Cost Estimation Process
Errors
Effort
Development Time
Size Table
Lines of Code
Number of Use Case
Function Point
Estimation Process
Number of Personn...

Document 3 metadata: {'chunk_id': 11, 'content_type': 'lecture_notes', 'source': 'Lecture#7.pdf', 'chunk_size': 900}
Content preview: Cost Estimation Process
Errors
Effort
Development Time
Size Table
Lines of Code
Number of Use Cas
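
The three sample results above are the same chunk (`Lecture#7.pdf`, chunk 11), which suggests the collection contains duplicate entries, for example from running the cell more than once against the persisted `./chroma_db`. One common remedy is to give every chunk a deterministic ID so that repeated ingestion can be detected. The sketch below shows the idea in plain Python; the ID format and function names are assumptions for illustration.

```python
import hashlib


def stable_chunk_id(source: str, chunk_index: int, text: str) -> str:
    """Build a deterministic ID from the source name, chunk index, and a content hash."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
    return f"{source}:{chunk_index}:{digest}"


def deduplicate(chunks: list[dict]) -> list[dict]:
    """Keep the first chunk for each stable ID, preserving the original order."""
    seen = set()
    unique = []
    for chunk in chunks:
        chunk_id = stable_chunk_id(chunk["source"], chunk["chunk_id"], chunk["text"])
        if chunk_id not in seen:
            seen.add(chunk_id)
            unique.append(chunk)
    return unique


# Made-up records mimicking the duplicated search results
chunks = [
    {"source": "Lecture#7.pdf", "chunk_id": 11, "text": "Cost Estimation Process"},
    {"source": "Lecture#7.pdf", "chunk_id": 11, "text": "Cost Estimation Process"},
    {"source": "Lecture#7.pdf", "chunk_id": 12, "text": "Function Point"},
]
print(len(deduplicate(chunks)))  # 2
```

Because the ID depends only on the chunk itself, ingesting the same PDF twice yields the same IDs both times.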

## Create LangChain Retriever

In [46]:
# Create a retriever for study materials
if 'vectorstore' in locals():
    retriever = vectorstore.as_retriever(
        search_type="similarity_score_threshold",
        search_kwargs={
            "score_threshold": 0.5,  # minimum relevance score (0 to 1, higher means more similar)
            "k": 5
        }
    )
    
    print("Study Materials Retriever created")
    print("Search type: similarity")
    print("Number of chunks retrieved per query: 5")
    
    # Test the retriever with a typical student question
    test_query = "What is Function point?"
    retrieved_docs = retriever.invoke(test_query)
    print(f"\nTest retrieval for '{test_query}':")
    print(f"Retrieved {len(retrieved_docs)} relevant study materials")
    if retrieved_docs:
        print(f"First retrieved content preview:\n{retrieved_docs[0].page_content[:200]}...")
        print(f"Source: {retrieved_docs[0].metadata.get('source', 'Unknown')}")
else:
    print("Knowledge base not available for retriever creation")

No relevant docs were retrieved using the relevance score threshold 0.5


Study Materials Retriever created
Search type: similarity
Number of chunks retrieved per query: 5

Test retrieval for 'What is Function point?':
Retrieved 0 relevant study materials
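
The test query returned nothing because no chunk reached the 0.5 relevance threshold. That threshold may simply be too strict for this embedding model and distance metric. A quick diagnostic is to look at the raw scores and count how many results would pass at different thresholds. The sketch below does this on made-up scores; in the notebook the real values could be obtained with `vectorstore.similarity_search_with_relevance_scores(test_query, k=5)`. The helper name is hypothetical.

```python
def passing_counts(scores: list[float], thresholds: list[float]) -> dict[float, int]:
    """For each threshold, count how many scores are at or above it."""
    return {t: sum(score >= t for score in scores) for t in thresholds}


# Hypothetical relevance scores for the top-5 chunks of one query (not measured values)
scores = [0.46, 0.41, 0.38, 0.22, 0.15]
print(passing_counts(scores, [0.5, 0.4, 0.3, 0.2]))
# {0.5: 0, 0.4: 2, 0.3: 3, 0.2: 4}
```

If real scores cluster below 0.5 as in this made-up example, lowering the threshold or switching to plain top-k retrieval would let the retriever return context.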


## Initialize Gemini Pro with API Key

In [47]:
# Initialize Gemini Pro LLM with API key
try:
    llm = ChatGoogleGenerativeAI(
        model="gemini-1.5-flash",
        google_api_key=GEMINI_API_KEY,
        temperature=0.3,
        max_output_tokens=1024
    )
    
    print("Gemini Pro LLM initialized with API key")
    print(f"Model: gemini-1.5-flash")
    print(f"Temperature: 0.3")
    print(f"Max output tokens: 1024")
    
except Exception as e:
    print(f"Error initializing Gemini Pro: {e}")
    print("Please ensure you have:")
    print("1. Valid Gemini API key")
    print("2. Correct API key format")
    print("3. Get your key from: https://makersuite.google.com/app/apikey")

Gemini Pro LLM initialized with API key
Model: gemini-1.5-flash
Temperature: 0.3
Max output tokens: 1024




## Create Conversational RAG Chain with Memory

In [48]:
# Initialize conversation memory to remember chat history
memory = ConversationBufferWindowMemory(
    k=5,  # Remember last 5 exchanges
    memory_key="chat_history",
    return_messages=True,
    output_key="answer"
)

# Define a custom prompt template for conversational educational assistance
qa_prompt_template = """
You are an AI Study Assistant for Software Engineering students. Your role is to help students understand concepts, solve problems, and prepare for exams using their course materials.

Given the following conversation history and a follow-up question, provide a helpful response using the context from study materials.

Instructions:
- Consider the conversation history for better context
- Provide clear, detailed explanations suitable for students
- Include examples when helpful for understanding
- Reference the source materials when possible
- For previous year questions, provide step-by-step solutions
- If you need to make assumptions, state them clearly
- If the context doesn't contain enough information, say so and suggest what additional materials might help
- Build upon previous responses when relevant

Context from Study Materials:
{context}

Follow-up Question: {question}
Study Assistant Response:"""

qa_prompt = PromptTemplate(
    template=qa_prompt_template,
    input_variables=["context", "question"]
)

# Create the Conversational Study Assistant RAG chain
if 'llm' in locals() and 'retriever' in locals():
    rag_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=retriever,
        memory=memory,
        return_source_documents=True,
        combine_docs_chain_kwargs={"prompt": qa_prompt},
        verbose=False
    )
    
    print("Conversational Software Engineering Study Assistant created successfully")
    print("Memory: Remembers last 5 question-answer exchanges")
    print("Chain type: Conversational Retrieval")
    print("Returns source documents: Yes")
    print("Ready for conversational study sessions!")
else:
    print("Cannot create Study Assistant - missing LLM or retriever")

Conversational Software Engineering Study Assistant created successfully
Memory: Remembers last 5 question-answer exchanges
Chain type: Conversational Retrieval
Returns source documents: Yes
Ready for conversational study sessions!
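
The idea behind a windowed memory with `k=5` is that only the most recent exchanges are passed along as conversation context. The sketch below is a simplified stand-in built on `collections.deque`; it does not reproduce the internals of `ConversationBufferWindowMemory`, and the class and method names are hypothetical.

```python
from collections import deque


class WindowMemory:
    """Keeps only the most recent k question-answer exchanges."""

    def __init__(self, k: int = 5):
        # A bounded deque silently drops the oldest exchange once it is full
        self.exchanges = deque(maxlen=k)

    def save(self, question: str, answer: str) -> None:
        """Record one exchange."""
        self.exchanges.append((question, answer))

    def as_text(self) -> str:
        """Render the remembered exchanges as a transcript for a prompt."""
        return "\n".join(f"Student: {q}\nAssistant: {a}" for q, a in self.exchanges)


memory_demo = WindowMemory(k=2)
memory_demo.save("What is the COCOMO model?", "A cost estimation model.")
memory_demo.save("What are its different modes?", "Organic, semi-detached, embedded.")
memory_demo.save("What is a greedy algorithm?", "It picks the locally best option.")
print(memory_demo.as_text())
```

With `k=2`, the first exchange about COCOMO is no longer in the transcript after the third question is saved.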


## Test the RAG Pipeline

In [49]:
def ask_study_question(question: str):
    """
    Ask a study-related question using the conversational RAG pipeline
    """
    if 'rag_chain' not in globals():
        print("Study Assistant not available")
        return
    
    try:
        # Get response from conversational RAG chain
        result = rag_chain.invoke({"question": question})
        
        print(f"Student Question: {question}")
        print(f"\nStudy Assistant Answer:\n{result['answer']}")
        
        # Show source materials referenced
        print(f"\nSource Materials Referenced ({len(result['source_documents'])})")
        print("-" * 50)
        for i, doc in enumerate(result['source_documents'], 1):
            source = doc.metadata.get('source', 'Unknown')
            chunk_id = doc.metadata.get('chunk_id', 'N/A')
            print(f"\nSource {i}: {source} (Chunk {chunk_id})")
            print(f"Content Preview: {doc.page_content[:150]}...")
            
        # Show current conversation history length
        if hasattr(rag_chain, 'memory') and rag_chain.memory:
            history = rag_chain.memory.chat_memory.messages
            print(f"\nConversation History: {len(history)//2} exchanges stored")
            
    except Exception as e:
        print(f"Error during study query: {e}")

def show_conversation_history():
    """
    Display the current conversation history
    """
    if 'rag_chain' not in globals():
        print("Study Assistant not available")
        return
        
    if hasattr(rag_chain, 'memory') and rag_chain.memory:
        history = rag_chain.memory.chat_memory.messages
        print(f"Conversation History ({len(history)//2} exchanges):")
        print("=" * 60)
        
        for i in range(0, len(history), 2):
            if i + 1 < len(history):
                question = history[i].content
                answer = history[i + 1].content
                print(f"\nQ{(i//2)+1}: {question}")
                print(f"A{(i//2)+1}: {answer[:200]}...")
                print("-" * 40)
    else:
        print("No conversation history available")

def clear_conversation_history():
    """
    Clear the conversation history to start fresh
    """
    if 'rag_chain' not in globals():
        print("Study Assistant not available")
        return
        
    if hasattr(rag_chain, 'memory') and rag_chain.memory:
        rag_chain.memory.clear()
        print("Conversation history cleared. Starting fresh!")
    else:
        print("No conversation memory to clear")

# Test the Study Assistant with a sample question
if 'rag_chain' in locals():
    # Test with a typical software engineering question
    ask_study_question("What's semi-detached mode of COCOMO? What formula is used to calculate effort in this mode?")
else:
    print("Study Assistant not ready for testing")

Student Question: What's semi-detached mode of COCOMO? What formula is used to calculate effort in this mode?

Study Assistant Answer:
The provided text only gives an example calculation using the Basic COCOMO model in what it refers to as "semi-detached mode," but it doesn't define what "semi-detached mode" actually means within the context of COCOMO.  The text focuses solely on the calculation itself.  Therefore, I cannot answer your question about what semi-detached mode is.

To understand what "semi-detached mode" signifies in the COCOMO model, you need to consult additional resources, such as:

* **The original COCOMO documentation:**  Look for the publication that originally defined the COCOMO model.  This will provide the precise definition of different modes (organic, semi-detached, embedded) and their implications.
* **Your course textbook or lecture slides:**  There's likely a more complete explanation of COCOMO modes beyond the single example provided in the excerpt.  Chec

## Interactive Conversational Q&A Session

In [50]:
# Start a conversational study session
# Ask sequential questions to see how the assistant remembers context

if 'rag_chain' in locals():
    # First question
    print("=== Starting Conversational Study Session ===")
    ask_study_question("What is the COCOMO model?")
    
    print("\n" + "="*80 + "\n")
    
    # Follow-up question that references the previous answer
    ask_study_question("What are its different modes?")
    
    print("\n" + "="*80 + "\n")
    
    # Another follow-up that builds on previous context
    ask_study_question("Can you give me the formula for effort calculation in basic mode?")
    
    print("\n" + "="*80 + "\n")
    
    # Show conversation history
    show_conversation_history()
    
else:
    print("\nStudy Assistant not ready. Please ensure all previous cells ran successfully.")

=== Starting Conversational Study Session ===
Student Question: What is the COCOMO model?

Study Assistant Answer:
The COCOMO (Constructive Cost Model) is a procedural software cost estimation model.  As described in the provided lecture notes (Pages 44-45), it's actually a family of models, offering three increasingly detailed levels of accuracy:

1. **Basic COCOMO:** This is the simplest model, providing a rapid, high-level estimate of development effort and time.  It uses a single equation to estimate effort based on the size of the project (usually measured in lines of code).  The provided text doesn't detail the specific equation, but that would be found in further lecture notes or the textbook.

2. **Intermediate COCOMO:** This model builds upon the basic model by incorporating various attributes of the software project and development environment that influence the development effort. These attributes are called cost drivers and are used to adjust the basic COCOMO estimate.  Aga

In [51]:
# Ask your own follow-up questions here
your_question = "How is the basic mode different from the semi-detached mode we discussed?"  # Modify this

if 'rag_chain' in locals():
    print(f"\nAsking follow-up: {your_question}")
    print("=" * 80)
    ask_study_question(your_question)
    
    print("\n" + "="*50)
    print("Conversation Summary:")
    show_conversation_history()
else:
    print("\nStudy Assistant not ready. Please ensure all previous cells ran successfully.")


Asking follow-up: How is the basic mode different from the semi-detached mode we discussed?
Student Question: How is the basic mode different from the semi-detached mode we discussed?

Study Assistant Answer:
The provided study materials describe three COCOMO models (Basic, Intermediate, and Detailed) but make no mention of a "semi-detached COCOMO mode".  There's no information in the given text to explain the differences between the Basic COCOMO model and a non-existent model.

To answer your question accurately, I need additional information.  Specifically, I need access to the lecture notes or textbook that defines "semi-detached COCOMO mode".  It's possible this is a typo, a term used in a different context, or from a different software estimation model altogether.  Please provide the relevant materials.

Source Materials Referenced (5)
--------------------------------------------------

Source 1: Lecture#7.pdf (Chunk 20)
Content Preview: --- Lecture Page 44 ---

COCOMO I
• Embedd

## Conversational Software Engineering Study Assistant Summary

This notebook implements a **conversational** study assistant for software engineering students with windowed conversation memory:

### New Conversational Features:
1. **Conversation Memory** - Remembers last 5 question-answer exchanges using `ConversationBufferWindowMemory`
2. **Context Awareness** - LLM can reference previous questions and build upon earlier responses
3. **Follow-up Questions** - Ask related questions without repeating context
4. **Chat History Management** - View, track, and clear conversation history

### Core Technologies:
1. **PyMuPDF** - Extracts text from lecture notes, textbooks, and previous year papers
2. **RecursiveCharacterTextSplitter** - Creates intelligent chunks for better knowledge retrieval
3. **Sentence Transformers** (all-MiniLM-L6-v2) - Semantic understanding of technical concepts
4. **ChromaDB** - Fast search across your entire study material collection
5. **ConversationalRetrievalChain** - Memory-enabled retrieval with conversation context
6. **Gemini** (`gemini-1.5-flash`, labelled Gemini Pro in this notebook) - Provides detailed explanations with conversation history awareness

### Conversational Functions:
- `ask_study_question(question)` - Ask questions with memory
- `show_conversation_history()` - View your chat history  
- `clear_conversation_history()` - Start a fresh conversation

### Perfect for Natural Study Sessions:
- **Follow-up Questions**: "What about the intermediate mode?" (after asking about COCOMO)
- **Building Context**: "Can you give an example?" (after explanations)
- **Comparative Questions**: "How is this different from what we discussed earlier?"
- **Clarifications**: "Can you explain that last part in more detail?"

### Example Conversational Flow:
1. "What is COCOMO model?"
2. "What are its different modes?" 
3. "Can you give me the formula for basic mode?"
4. "How is it different from intermediate mode?"

The assistant keeps the last 5 exchanges in memory and uses them to give context-aware responses.
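
A follow-up such as "What are its different modes?" only makes sense together with earlier turns. Conversational retrieval chains typically handle this by asking the LLM to rewrite the follow-up as a standalone question before retrieval. The sketch below builds such a rewrite prompt; the prompt wording and the function name are illustrative assumptions, not LangChain's actual template.

```python
def build_condense_prompt(history: list[tuple[str, str]], follow_up: str) -> str:
    """Build a prompt asking an LLM to turn a follow-up into a standalone question."""
    transcript = "\n".join(f"Student: {q}\nAssistant: {a}" for q, a in history)
    return (
        "Rewrite the follow-up question as a standalone question, "
        "using the conversation for context.\n\n"
        f"Conversation:\n{transcript}\n\n"
        f"Follow-up question: {follow_up}\n"
        "Standalone question:"
    )


# Made-up history for illustration
history = [("What is the COCOMO model?", "COCOMO is a software cost estimation model.")]
print(build_condense_prompt(history, "What are its different modes?"))
```

An LLM given this prompt would be expected to produce something like a question that names COCOMO explicitly, which is then used for retrieval.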

**To get started:**
1. Get your free Gemini API key from [Google AI Studio](https://makersuite.google.com/app/apikey)
2. Create a `.env` file in the notebook directory containing `GEMINI_API_KEY=your_actual_api_key_here`
3. Add your lecture notes PDFs to the `pdf_paths` list
4. Run all cells to build your conversational knowledge base
5. Start asking questions and enjoy natural conversation flow!

**Pro Tips:**
- Memory stores last 5 exchanges automatically
- Ask follow-up questions naturally
- Use `show_conversation_history()` to review your session
- Clear history with `clear_conversation_history()` for new topics
- Perfect for deep-dive study sessions on complex topics

## Conversational Features Demo

In [52]:
# Demonstration of conversational features
if 'rag_chain' in locals():
    print("=== Conversational Study Assistant Features ===")
    print("\n1. ask_study_question(question) - Ask questions with memory")
    print("2. show_conversation_history() - View chat history")
    print("3. clear_conversation_history() - Start fresh conversation")
    
    print("\n=== Example Conversational Flow ===")
    print("Try asking related questions to see how the assistant remembers context:")
    print("1. 'What is COCOMO model?'")
    print("2. 'What are its different modes?'") 
    print("3. 'Can you give me the formula for the basic mode?'")
    print("4. 'How is it different from the intermediate mode?'")
    
    # Show current memory state
    if hasattr(rag_chain, 'memory') and rag_chain.memory:
        history_count = len(rag_chain.memory.chat_memory.messages) // 2
        print(f"\nCurrent conversation history: {history_count} exchanges")
    
    print("\nThe assistant will remember previous questions and build upon them!")
else:
    print("Study Assistant not ready for conversational features")

=== Conversational Study Assistant Features ===

1. ask_study_question(question) - Ask questions with memory
2. show_conversation_history() - View chat history
3. clear_conversation_history() - Start fresh conversation

=== Example Conversational Flow ===
Try asking related questions to see how the assistant remembers context:
1. 'What is COCOMO model?'
2. 'What are its different modes?'
3. 'Can you give me the formula for the basic mode?'
4. 'How is it different from the intermediate mode?'

Current conversation history: 5 exchanges

The assistant will remember previous questions and build upon them!


In [53]:
ask_study_question("What's greedy algorithm?")

Student Question: What's greedy algorithm?

Study Assistant Answer:
A greedy algorithm is a simple, intuitive algorithm that makes the locally optimal choice at each stage with the hope of finding a global optimum.  It doesn't consider the big picture or future consequences; it simply makes the best choice available *right now*.  This approach works well for problems exhibiting two key properties:

1. **Greedy Choice Property:**  This means that there's always an optimal solution that includes the best immediate choice.  In other words, making the locally optimal choice at each step doesn't prevent us from finding the globally optimal solution.  We can make the choice that seems best at the moment and then solve the remaining subproblem.  This leads to a top-down approach where the problem is iteratively reduced to smaller subproblems.

2. **Optimal Substructure Property:** This means that an optimal solution to the problem contains within it optimal solutions to its subproblems.  If w

In [54]:
ask_study_question("could you please explain more simply in simpler words, perhaps with an example")

Student Question: could you please explain more simply in simpler words, perhaps with an example

Study Assistant Answer:
Okay, let's simplify the explanation of greedy algorithms.

Imagine you're making a change with coins (say, US currency). You want to use the fewest number of coins possible.  A greedy approach would be to always choose the largest coin denomination that's less than or equal to the remaining amount.

For example, if you need to make change for $0.87:

1. You'd first take a $0.50 coin (the largest coin less than or equal to $0.87).
2. You'd have $0.37 left.  The next largest coin is a $0.25 coin.
3. You'd have $0.12 left.  The next largest is a $0.10 coin.
4. You'd have $0.02 left.  Two $0.01 coins complete the change.

This is a greedy algorithm because at each step, you make the choice that seems best *at that moment* – choosing the largest possible coin.  You don't look ahead to see if a different choice might lead to fewer coins overall.  In this case, the greedy

In [55]:
ask_study_question("what if a question cannot be solved with this algorithm?")

Student Question: what if a question cannot be solved with this algorithm?

Study Assistant Answer:
The provided study material focuses solely on identifying whether a greedy algorithm *can* be used to solve a problem.  It doesn't offer alternatives when a greedy approach is unsuitable.  To answer your question about alternative algorithms, we need additional information on the type of optimization problem you're facing.

However, I can give you some general categories of algorithms that are often used when a greedy approach fails:

* **Dynamic Programming:**  If the problem exhibits optimal substructure (like greedy problems), but the greedy choice property doesn't hold, dynamic programming is a powerful technique.  It systematically explores all possible subproblem solutions and combines them to find the optimal solution for the overall problem.  This is often more computationally expensive than a greedy algorithm but guarantees an optimal solution.  Examples include the knapsack pro