<a href="https://colab.research.google.com/github/VMadhav007/ML/blob/main/LangchainW_llm.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [3]:
!pip install langchain-core langchain-text-splitters langchain-mistralai langchain-huggingface langchain-community langgraph faiss-cpu python-dotenv pypdf

Collecting langchain-mistralai
  Downloading langchain_mistralai-0.2.10-py3-none-any.whl.metadata (2.0 kB)
Collecting langchain-huggingface
  Downloading langchain_huggingface-0.2.0-py3-none-any.whl.metadata (941 bytes)
Collecting langchain-community
  Downloading langchain_community-0.3.24-py3-none-any.whl.metadata (2.5 kB)
Collecting langgraph
  Downloading langgraph-0.4.8-py3-none-any.whl.metadata (6.8 kB)
Collecting faiss-cpu
  Downloading faiss_cpu-1.11.0-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (4.8 kB)
Collecting python-dotenv
  Downloading python_dotenv-1.1.0-py3-none-any.whl.metadata (24 kB)
Collecting pypdf
  Downloading pypdf-5.6.0-py3-none-any.whl.metadata (7.2 kB)
Collecting httpx-sse<1,>=0.3.1 (from langchain-mistralai)
  Downloading httpx_sse-0.4.0-py3-none-any.whl.metadata (9.0 kB)
Collecting dataclasses-json<0.7,>=0.5.7 (from langchain-community)
  Downloading dataclasses_json-0.6.7-py3-none-any.whl.metadata (25 kB)
Collecting pydantic-settings<3.0.0,>=2.4.0 (fro

In [4]:
import os
from dotenv import load_dotenv
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_mistralai import ChatMistralAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_core.documents import Document
from typing import List
import time

In [9]:
load_dotenv()

# Environment variables
MISTRAL_API_KEY = "api-key"
PDF_PATH = r"/content/1.pdf"
FAISS_INDEX_PATH = "faiss_index"

In [12]:
def load_and_process_pdf():
    """Load and process the PDF document"""
    print("Loading PDF...")
    loader = PyPDFLoader(PDF_PATH)
    docs = loader.load()
    print(f"The Length of the Documents is: {len(docs)}")
    #print(f"{docs[0].page_content[:200]}\n")
    #print(f"Metadata: {docs[0].metadata}")

    return docs

def create_text_splits(docs):
    """Create text splits from documents"""
    print("Creating text splits...")
    text_splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        add_start_index=True
    )
    all_splits = text_splitter.split_documents(docs)
    print(f"Created {len(all_splits)} text chunks")
    return all_splits

def setup_vector_store(all_splits):
    """Set up the vector store with embeddings"""
    print("Setting up embeddings and vector store...")
    embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

    # Create vector store
    vector_store = FAISS.from_documents(all_splits, embeddings, distance_strategy="COSINE")
    print("FAISS Vector Store created successfully")

    # Save the vector store for future use
    try:
        vector_store.save_local(FAISS_INDEX_PATH)
        print(f"Vector store saved to {FAISS_INDEX_PATH}")
    except Exception as e:
        print(f"Could not save vector store: {e}")

    return vector_store, embeddings

def load_existing_vector_store():
    """Load existing vector store if available"""
    try:
        embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
        vector_store = FAISS.load_local(FAISS_INDEX_PATH, embeddings, allow_dangerous_deserialization=True, distance_strategy="COSINE")
        print("Loaded existing vector store")
        return vector_store, embeddings
    except Exception as e:
        print(f"Could not load existing vector store: {e}")
        return None, None

def setup_mistral_llm():
    """Initialize Mistral AI LLM"""
    if not MISTRAL_API_KEY:
        raise ValueError("MISTRAL_API_KEY not found in environment variables")

    llm = ChatMistralAI(
        model="mistral-large-latest",
        api_key=MISTRAL_API_KEY,
        temperature=0.3
    )
    print("Mistral AI LLM initialized")
    return llm

def create_rag_chain(vector_store, llm):
    """Create the RAG chain for question answering"""

    # Create retriever
    retriever = vector_store.as_retriever(
        search_type="mmr",
        search_kwargs={"k": 4, "fetch_k": 8}
    )

    # Create prompt template
    #The {context} and {question} are NOT Python f-string variables. They are LC template placeholders that work
    # differently compared to Python String Substitutions.
    # They are parsed and substituted during the invocation phase in the code snippets below when the rag_chain object is created.
    #
    prompt_template = ChatPromptTemplate.from_template("""
    You are an AI assistant that answers questions based on the provided context from a PDF document about AI and Product Management.

    Context: {context}

    Question: {question}

    Instructions:
    - Answer the question based primarily on the provided context
    - If the context doesn't contain enough information, say so clearly
    - Provide specific details and examples from the context when available
    - Keep your answer comprehensive but concise
    - If you reference specific information, try to indicate which part of the document it comes from

    Answer:
    """)

    def format_docs(docs):
        """Format retrieved documents for context"""
        formatted = []
        for i, doc in enumerate(docs):
            page_info = f"Page {doc.metadata.get('page', 'unknown')}" if doc.metadata else "Source unknown"
            formatted.append(f"[{page_info}]: {doc.page_content}")
        return "\n\n".join(formatted)

    # Create the RAG chain
    #Here we use the LC's Chain Composition to send the output of one step as an input to the next step separated by | - Pipe Operator
    #More details will be in the README
    rag_chain = (
        {
            "context": retriever | format_docs,
            "question": RunnablePassthrough()
        }
        | prompt_template
        | llm
        | StrOutputParser()
    )

    return rag_chain, retriever

def chat_with_pdf(rag_chain, retriever):
    """Interactive chat function"""
    print("\n" + "="*60)
    print("🤖 AI PDF Chat Assistant")
    print("="*60)
    print("You can now ask questions about your PDF!")
    print("Type 'quit', 'exit', or 'bye' to end the conversation.")
    print("Type 'help' for available commands.")
    print("-"*60)

    while True:
        try:
            question = input("\n💬 Your question: ").strip()

            if not question:
                continue

            if question.lower() in ['quit', 'exit', 'bye']:
                print("\n👋 Goodbye! Thanks for using the AI PDF Chat Assistant!")
                break

            if question.lower() == 'help':
                print("\n📋 Available commands:")
                print("  • Ask any question about the PDF content")
                print("  • 'quit', 'exit', 'bye' - End conversation")
                print("  • 'help' - Show this help message")
                print("  • 'sources' - Show sources for last question")
                continue

            if question.lower() == 'sources':
                print("\n📚 Retrieving relevant sources...")
                try:
                    docs = retriever.get_relevant_documents(question)
                    for i, doc in enumerate(docs, 1):
                        page_info = f"Page {doc.metadata.get('page', 'unknown')}"
                        print(f"\n--- Source {i} ({page_info}) ---")
                        print(doc.page_content[:300] + "..." if len(doc.page_content) > 300 else doc.page_content)
                except Exception as e:
                    print(f"Error retrieving sources: {e}")
                continue

            print("\n🤔 Thinking...")
            start_time = time.time()

            # Get answer from RAG chain
            answer = rag_chain.invoke(question)

            response_time = time.time() - start_time

            print(f"\n🤖 Answer (responded in {response_time:.2f}s):")
            print("-" * 50)
            print(answer)
            print("-" * 50)

        except KeyboardInterrupt:
            print("\n\n👋 Conversation interrupted. Goodbye!")
            break
        except Exception as e:
            print(f"\n❌ Error: {e}")
            print("Please try again with a different question.")

def main():
    """Main function to run the RAG system"""
    try:
        # Try to load existing vector store first
        vector_store, embeddings = load_existing_vector_store()

        if vector_store is None:
            # If no existing vector store, create new one
            docs = load_and_process_pdf()
            all_splits = create_text_splits(docs)
            vector_store, embeddings = setup_vector_store(all_splits)

        # Initialize Mistral LLM
        llm = setup_mistral_llm()

        # Create RAG chain
        rag_chain, retriever = create_rag_chain(vector_store, llm)

        print("\n✅ System ready!")

        # Test with some example questions
        print("\n" + "="*60)
        print("🧪 TESTING THE SYSTEM")
        print("="*60)

        test_questions = [
            "What is the impact of AI in Product Management?",
            "What are the responsibilities of an os?",
            "How does AI transform product development processes?"
        ]

        for i, question in enumerate(test_questions, 1):
            print(f"\n📝 Test Question {i}: {question}")
            print("-" * 50)
            try:
                answer = rag_chain.invoke(question)
                print(f"🤖 Answer: {answer[:300]}..." if len(answer) > 300 else f"🤖 Answer: {answer}")
            except Exception as e:
                print(f"❌ Error: {e}")

        # Start interactive chat
        chat_with_pdf(rag_chain, retriever)

    except Exception as e:
        print(f"❌ Error in main: {e}")
        print("Please check your environment variables and file paths.")

if __name__ == "__main__":
    main()

Loaded existing vector store
Mistral AI LLM initialized

✅ System ready!

🧪 TESTING THE SYSTEM

📝 Test Question 1: What is the impact of AI in Product Management?
--------------------------------------------------
🤖 Answer: The provided context does not contain any information about the impact of AI in Product Management. The document primarily discusses topics related to operating systems, such as the need for an operating system, Direct Memory Access (DMA) structure, and credits for the slides used in a course. There...

📝 Test Question 2: What are the responsibilities of an os?
--------------------------------------------------
🤖 Answer: Based on the provided context, the responsibilities of an operating system (OS) are:

1. **Execute User Programs**: The OS is responsible for running user programs and making it easier for users to solve problems (Page 6).

2. **User Convenience**: It makes the computer system more convenient to use...

📝 Test Question 3: How does AI transform produ

  docs = retriever.get_relevant_documents(question)



📚 Retrieving relevant sources...

--- Source 1 (Page 5) ---
Genesis
OPERATING SYSTEMS

--- Source 2 (Page 31) ---
sureshjamadagni@pes.edu
THANK YOU
Suresh Jamadagni
Department of Computer Science Engineering

--- Source 3 (Page 30) ---
Direct Memory Access Structure
OPERATING SYSTEMS
• Used for high-speed I/O devices able to 
transmit information at close to memory 
speeds
• Device controller transfers blocks of data 
from buffer storage directly to main memory 
without CPU intervention
• Only one interrupt is generated per block,...

--- Source 4 (Page 10) ---
What Operating Systems Do
OPERATING SYSTEMS
n Depends on the point of view user and system
n Users want convenience, ease of use and good performance 
l Don’t care about resource utilization
n But shared computer such as mainframe or minicomputer must keep all 
users happy.
n Maximize resource utili...


👋 Conversation interrupted. Goodbye!
