
# %% [markdown]
# # AI-Powered Insurance Policy Information Chatbot
#
# This notebook implements an insurance policy information chatbot that can answer questions about different types of insurance policies using a knowledge base and LLM capabilities.

# %%
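# Note: the import paths used below match older LangChain releases; newer releases
# move these classes into separate packages such as langchain-community and langchain-openai.
!pip install -q langchain openai faiss-cpu pypdf tiktoken gradio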

# %%
import os
from langchain.chains import RetrievalQA
from langchain.document_loaders import PyPDFLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
import gradio as gr
import warnings
warnings.filterwarnings('ignore')

# %% [markdown]
# ## Step 1: Set Up Knowledge Base
#
# We'll create a knowledge base from insurance policy PDFs. The file names below are placeholders; replace them with your actual insurance policy documents.

# %%
# Upload your insurance policy PDFs to Colab first
# Here we'll use sample paths - replace with your actual files
policy_files = [
    "health_insurance.pdf",
    "auto_insurance.pdf",
    "life_insurance.pdf",
    "home_insurance.pdf"
]

# In a real deployment, you would upload the PDFs to Colab or read them from cloud storage.
# This demo assumes the files are already available in the working directory.

# %%
def create_knowledge_base(pdf_paths):
    """Create a vector database from insurance policy PDFs"""
    documents = []
    for pdf_path in pdf_paths:
        try:
            loader = PyPDFLoader(pdf_path)
            documents.extend(loader.load())
        except Exception as e:
            print(f"Could not load {pdf_path}: {e}")
            continue

    if not documents:
        raise ValueError("No documents were loaded. Please check your PDF files.")

    # Split documents into chunks
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
    texts = text_splitter.split_documents(documents)

    # Create embeddings and vector store
    embeddings = OpenAIEmbeddings()
    db = FAISS.from_documents(texts, embeddings)

    return db
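
# %% [markdown]
# To make the `chunk_size` and `chunk_overlap` parameters concrete, here is a minimal sketch of overlapping chunking in plain Python. `split_with_overlap` is a hypothetical helper, not part of LangChain. It cuts at fixed character offsets, whereas `CharacterTextSplitter` first splits on a separator (by default a blank line) and then merges the pieces, so real chunk boundaries will differ.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper for illustration only; it ignores separators.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


# Small numbers make the overlap easy to see
sample = "".join(str(i % 10) for i in range(25))
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for chunk in chunks:
    print(chunk)
```

# The last 4 characters of each chunk reappear at the start of the next one, which is what keeps a sentence that straddles a boundary retrievable from either side.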

# %% [markdown]
# ## Step 2: Initialize the Chatbot System

# %%
def initialize_chatbot(openai_api_key):
    """Initialize the chatbot with knowledge base and LLM"""
    os.environ["OPENAI_API_KEY"] = openai_api_key

    # Create knowledge base (in a real scenario, you might want to pre-create this)
    try:
        db = create_knowledge_base(policy_files)
    except ValueError as e:
        print(f"Knowledge base creation failed: {e}")
        # Fallback to a simple LLM without knowledge base
        return OpenAI(temperature=0)

    # Create retriever chain
    retriever = db.as_retriever(search_kwargs={"k": 3})

    qa = RetrievalQA.from_chain_type(
        llm=OpenAI(temperature=0),
        chain_type="stuff",
        retriever=retriever,
        return_source_documents=True
    )

    return qa
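
# %% [markdown]
# The retriever with `k=3` and the `"stuff"` chain type can be pictured as two steps: rank the stored chunks by similarity to the question, then paste the best ones into a single prompt. The sketch below uses hand-written toy vectors and hypothetical helpers (`top_k_chunks`, `build_stuff_prompt`). Cosine similarity stands in for whatever distance metric the FAISS index uses, and the prompt wording is an assumption, not LangChain's actual template.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_chunks(query_vec, indexed_chunks, k=3):
    """Return the k chunk texts whose vectors are most similar to the query."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


def build_stuff_prompt(question, chunks):
    """Concatenate ("stuff") all retrieved chunks into one prompt string."""
    context = "\n\n".join(chunks)
    return (
        "Use the following context to answer the question.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )


# Toy index: (chunk text, made-up embedding vector)
toy_index = [
    ("Collision coverage pays for damage to your car.", [0.9, 0.1, 0.0]),
    ("Term life policies pay a fixed death benefit.", [0.0, 0.2, 0.9]),
    ("Comprehensive coverage includes theft and hail.", [0.8, 0.3, 0.1]),
    ("Home policies cover the dwelling and contents.", [0.1, 0.9, 0.2]),
]
query_vec = [1.0, 0.2, 0.0]  # pretend embedding of an auto-insurance question

selected = top_k_chunks(query_vec, toy_index, k=2)
prompt = build_stuff_prompt("What does auto insurance cover?", selected)
print(prompt)
```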

# %% [markdown]
# ## Step 3: Chatbot Response Generation

# %%
def generate_response(query, chat_history, qa_system):
    """Generate a response to user query using the QA system"""
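    # Guard against an empty (None) history
    chat_history = chat_history or []
    try:
        if isinstance(qa_system, RetrievalQA):
            # Get response from QA system
            result = qa_system({"query": query})
            answer = result["result"]

            # Include source documents for reference
            sources = "\n\nSources:\n" + "\n".join(
                [f"- {doc.metadata.get('source', 'unknown')} (page {doc.metadata.get('page', 'N/A')})"
                 for doc in result["source_documents"]]
            )
        else:
            # initialize_chatbot fell back to a bare LLM: it takes a prompt string,
            # returns a string, and has no source documents to cite
            answer = qa_system(query)
            sources = ""

        full_response = answer + sources

        # Fallback mechanism if answer is not confident (case-insensitive phrase match)
        lowered = answer.lower()
        if "i don't know" in lowered or "not provided in the context" in lowered:
            full_response += "\n\nI couldn't find a complete answer in our documents. Would you like me to connect you with a human agent for further assistance?"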

    except Exception as e:
        print(f"Error generating response: {e}")
        full_response = "I encountered an error processing your request. Please try again or contact our customer support."

    chat_history.append((query, full_response))
    return chat_history, chat_history
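
# %% [markdown]
# The fallback check and the source list can be pulled out into small functions, which makes them easy to test without calling the LLM. This is a sketch under assumptions: `needs_escalation`, `format_sources`, and `FALLBACK_PHRASES` are hypothetical names, and the metadata dictionaries mimic the `source` and `page` keys used in the response code above.

```python
# Phrases that suggest the model could not answer from the retrieved context
FALLBACK_PHRASES = ("i don't know", "not provided in the context")


def needs_escalation(answer):
    """Return True when the answer contains a known low-confidence phrase."""
    lowered = answer.lower()
    return any(phrase in lowered for phrase in FALLBACK_PHRASES)


def format_sources(metadata_list):
    """Build a bullet list of sources, dropping repeated (file, page) pairs."""
    seen = []
    for meta in metadata_list:
        entry = (meta.get("source", "unknown"), meta.get("page", "N/A"))
        if entry not in seen:
            seen.append(entry)
    return "\n".join(f"- {source} (page {page})" for source, page in seen)


print(needs_escalation("I don't know."))
print(format_sources([
    {"source": "auto_insurance.pdf", "page": 1},
    {"source": "auto_insurance.pdf", "page": 1},
    {"source": "home_insurance.pdf"},
]))
```

# Phrase matching is a heuristic: an answer that is wrong but confidently worded will not trigger the escalation message.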

# %% [markdown]
# ## Step 4: User Interface with Gradio

# %%
def create_chatbot_interface(openai_api_key):
    """Create the Gradio interface for the chatbot"""
    qa_system = initialize_chatbot(openai_api_key)

    with gr.Blocks(title="Insurance Policy Chatbot", theme=gr.themes.Soft()) as demo:
        gr.Markdown("""
        # Insurance Policy Information Chatbot
        Welcome to our AI-powered insurance assistant. Ask me anything about our policies, coverage options, premiums, or claims processes.
        """)

        chatbot = gr.Chatbot(label="Chat History", height=500)
        msg = gr.Textbox(label="Your Question", placeholder="Type your question about insurance policies here...")
        clear = gr.Button("Clear Chat")

        def respond(message, chat_history):
            chat_history, _ = generate_response(message, chat_history, qa_system)
            return "", chat_history

        msg.submit(respond, [msg, chatbot], [msg, chatbot])
        clear.click(lambda: None, None, chatbot, queue=False)

    return demo

# %% [markdown]
# ## Step 5: Run the Chatbot
#
# To run this chatbot:
# 1. Upload your insurance policy PDFs to Colab
# 2. Enter your OpenAI API key
# 3. Launch the interface

# %%
# Read the API key from the OPENAI_API_KEY environment variable instead of hardcoding it.
# For a quick demo you can still replace the placeholder below with your key.
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY", "your-api-key-here")

# Create and launch the interface
if OPENAI_API_KEY != "your-api-key-here":
    demo = create_chatbot_interface(OPENAI_API_KEY)
    demo.launch(share=True)
else:
    print("Please enter your OpenAI API key to proceed.")
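
# %% [markdown]
# One way to keep the key out of the notebook is to resolve it from the environment and treat the placeholder as "not set". The sketch below assumes a hypothetical helper, `resolve_api_key`, which accepts any mapping so it can be exercised without touching the real environment.

```python
import os

PLACEHOLDER = "your-api-key-here"


def resolve_api_key(env=None, variable="OPENAI_API_KEY"):
    """Return the key stored under `variable`, or None if it is missing.

    `env` defaults to os.environ; passing a plain dict is handy for testing.
    """
    env = os.environ if env is None else env
    key = (env.get(variable) or "").strip()
    # An empty value or the untouched placeholder counts as "no key"
    if not key or key == PLACEHOLDER:
        return None
    return key


print(resolve_api_key({"OPENAI_API_KEY": " sk-test "}))  # whitespace is stripped
print(resolve_api_key({}))
```

# In an interactive session, `getpass.getpass()` from the standard library is one way to prompt for the key without echoing it to the output cell.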

# %% [markdown]
# ## Methodology and Approach
#
# ### Why This Approach Was Selected:
#
# 1. *Knowledge Base Integration*:
#    - Using PDFs as the knowledge base allows easy updating of policy information
#    - FAISS vector store enables efficient similarity search for relevant policy information
#
# 2. *LLM for Natural Language Processing*:
#    - OpenAI's LLM provides strong language understanding capabilities
#    - RetrievalQA chain combines document retrieval with LLM processing, so answers are grounded in the retrieved policy text
#
# 3. *Fallback Mechanism*:
#    - The system flags answers that contain phrases such as "I don't know" and offers escalation to a human agent
#    - Exceptions raised while generating a response are caught and replaced with a generic error message
#
# 4. *User Interface*:
#    - Gradio provides a simple, clean interface that can be embedded in websites/apps
#    - Chat history is shown in the interface; each question is still answered independently, because the RetrievalQA chain does not receive earlier turns
#
# ### Implementation Notes:
#
# - In a production environment, you would:
#   - Store the vector database persistently rather than recreating it each time
#   - Use proper API key management (not hardcoded)
#   - Add more sophisticated conversation management
#   - Include user authentication if needed
#   - Add more insurance-specific features (premium calculators, etc.)
#
# - The current implementation focuses on the core functionality while demonstrating all key requirements.
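
# %% [markdown]
# As a starting point for the "more sophisticated conversation management" item above, recent turns can be folded into the text sent to the QA system. This is a rough sketch with a hypothetical helper, `build_contextual_query`; it assumes the `(question, answer)` tuple history used by the Gradio chatbot in this notebook. LangChain also provides a `ConversationalRetrievalChain` aimed at this problem, which may be a better fit than manual string building.

```python
def build_contextual_query(question, chat_history, max_turns=2):
    """Prefix the question with the most recent (user, assistant) turns."""
    # Note: chat_history[-0:] would return the whole list, hence the explicit check
    recent = chat_history[-max_turns:] if max_turns > 0 else []
    if not recent:
        return question

    lines = []
    for user_msg, bot_msg in recent:
        lines.append(f"User: {user_msg}")
        lines.append(f"Assistant: {bot_msg}")
    transcript = "\n".join(lines)
    return f"Previous conversation:\n{transcript}\n\nCurrent question: {question}"


history = [
    ("Do you offer auto insurance?", "Yes, we offer several auto policies."),
    ("What does collision cover?", "Damage to your car from a crash."),
    ("Is theft covered?", "Theft falls under comprehensive coverage."),
]
print(build_contextual_query("And what is the deductible?", history, max_turns=2))
```

# Because the combined text is also what gets embedded for retrieval, long histories can pull in chunks related to earlier questions; keeping `max_turns` small limits that effect.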