<a href="https://colab.research.google.com/github/Maddi-Jahnavi-goud/Machine_Learning_FAQ_chatbot/blob/main/machineLearning_chatbot.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [5]:
!pip install langchain langchain-community chromadb pypdf sentence-transformers openai -q
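
The main cell below splits the PDF text with `chunk_size=1000` and `chunk_overlap=200`. As a rough illustration of what the overlap means, here is a minimal fixed-window sketch. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraph and line breaks), and the function name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Cut text into fixed-size windows; neighbouring windows share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Hypothetical 2500-character sample, only for illustration
sample = "".join(str(i % 10) for i in range(2500))
chunks = sliding_chunks(sample)
print([len(c) for c in chunks])              # [1000, 1000, 900]
print(chunks[0][-200:] == chunks[1][:200])   # True: the shared 200 characters
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.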

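The main cell also embeds each chunk as a vector (384 dimensions for `all-MiniLM-L6-v2`) and retrieves the chunks closest to the question. The sketch below shows the idea with brute-force cosine similarity over hand-made 3-dimensional vectors. The vectors, texts, and the helper `top_k` are assumptions for illustration only; Chroma uses its own index, and its distance metric is configurable and may not be cosine.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Toy "vector store": (chunk text, made-up embedding)
toy_store = [
    ("Supervised learning uses labelled examples.", [0.9, 0.1, 0.0]),
    ("Decision trees split on feature values.", [0.2, 0.8, 0.1]),
    ("The syllabus lists five units.", [0.0, 0.1, 0.9]),
]
query = [0.8, 0.3, 0.0]  # made-up embedding of a question
print(top_k(query, toy_store, k=2))
```

In the notebook, the retrieved chunks are what the language model receives as context alongside the question and the chat history.
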
In [6]:
import os
from google.colab import userdata
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.memory import ConversationBufferMemory
from langchain_community.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.schema import Document

# --- Step 3: Set up the API Key and Environment ---
# IMPORTANT: Before running, you must add your OpenRouter API key to Colab's secrets.
# 1. Click the 'Key' icon on the left sidebar.
# 2. Add a new secret named "OPENROUTER_API_KEY" and paste your key as the value.
try:
    os.environ["OPENROUTER_API_KEY"] = userdata.get("OPENROUTER_API_KEY")
    api_key_found = True
except Exception as e:
    print("ERROR: Could not find the OPENROUTER_API_KEY secret.")
    print("Please add your OpenRouter API key to Colab's secrets (on the left sidebar) and try again.")
    api_key_found = False


if api_key_found:
    # --- Step 4: Load and Process the PDF Document ---
    # Path to the PDF on Google Drive (Drive must be mounted for this path to exist).
    pdf_path = "/content/drive/MyDrive/MACHINE LEARNING(R17A0534).pdf"
    if not os.path.exists(pdf_path):
        print(f"ERROR: The file '{pdf_path}' was not found.")
        print("Please mount Google Drive and make sure the PDF exists at that path.")
    else:
        print("Loading and processing the PDF... this may take a moment.")
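        # Load the PDF as one Document per page; chunking is done by the splitter below,
        # so load() is used instead of load_and_split() to avoid splitting twice.
        loader = PyPDFLoader(pdf_path)
        pages = loader.load()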

        # Split the document into smaller chunks for processing
        pdf_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200,
            length_function=len
        )
        docs = pdf_splitter.split_documents(pages)
        # Keep only the chunk text; this drops per-page metadata such as source and page number.
        documents = [Document(page_content=doc.page_content) for doc in docs]

        # --- Step 5: Create Text Embeddings and Vector Store ---
        # This converts the text chunks into numerical vectors for similarity searching.
        print("Creating text embeddings and vector store...")
        embeddings = HuggingFaceEmbeddings(
            model_name="sentence-transformers/all-MiniLM-L6-v2",
            model_kwargs={'device': 'cpu'}
        )
        vector_db = Chroma.from_documents(
            documents,
            embedding=embeddings
        )

        # --- Step 6: Set Up the Conversational AI Model ---
        # This configures the chatbot's "brain" and memory.
        print("Setting up the conversational AI...")
        # Set up conversational memory to remember the chat history
        memory = ConversationBufferMemory(
            memory_key="chat_history",
            return_messages=True
        )

        # Initialize the Language Model (LLM) through OpenRouter
        llm = ChatOpenAI(
            model="openai/gpt-3.5-turbo",
            temperature=0.2,
            openai_api_base="https://openrouter.ai/api/v1",
            max_tokens=500,
            openai_api_key=os.environ["OPENROUTER_API_KEY"]
        )

        # Combine the retriever (from the vector store) and the LLM into a conversational chain
        qa_chain = ConversationalRetrievalChain.from_llm(
            llm=llm,
            retriever=vector_db.as_retriever(),
            memory=memory
        )

        print("\n✅ Setup complete! The chatbot is ready.")
        print("You can now ask questions about the Machine Learning (R17A0534) PDF.")
        print("Type 'Exit' to end the chat.")
        print("-" * 50)

        # --- Step 7: Start the Real-time Interaction Loop ---
        while True:
            try:
                question = input("User: ")
                if question.lower().strip() == "exit":
                    print("Bot: Thank you for chatting. Come back any time you have more questions.")
                    break
                if not question.strip():
                    continue

                # Get the answer from the QA chain (invoke() replaces the deprecated direct call)
                answer = qa_chain.invoke({"question": question})
                print("Bot:", answer["answer"])

            except Exception as e:
                print(f"An error occurred: {e}")
                break

Loading and processing the PDF... this may take a moment.
Creating text embeddings and vector store...


'(ReadTimeoutError("HTTPSConnectionPool(host='huggingface.co', port=443): Read timed out. (read timeout=10)"), '(Request ID: bea44ccb-6b6b-4097-8158-ac39a25d8972)')' thrown while requesting HEAD https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/config.json
Retrying in 1s [Retry 1/5].


Setting up the conversational AI...

✅ Setup complete! The chatbot is ready.
You can now ask questions about the Reliance Industries 2024-25 Annual Report.
Type 'Exit' to end the chat.
--------------------------------------------------
User: what is machine learning?
Bot: Machine learning is the process of programming computers to optimize a performance criterion using example data or past experience. It involves creating a model with parameters that can be optimized through training data or past experiences. The model can be predictive for making future predictions, descriptive for gaining knowledge from data, or both. Arthur Samuel, a pioneer in computer gaming and artificial intelligence, defined machine learning as giving computers the ability to learn without being explicitly programmed. However, there is no universally accepted definition for machine learning, and different authors may define it differently.
User:  What are the components of a learning process? Explain each compo