<a href="https://colab.research.google.com/github/Nishown2000/AI_Project_Collab/blob/main/IPsec_Agent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Multi-Agent RAG System for IPSec using LangChain and LangGraph with FastAPI

This notebook demonstrates how to build a multi-agent Retrieval-Augmented Generation (RAG) system focused on the topic of IPSec. It utilizes the LangChain and LangGraph frameworks for orchestrating different agents (one for web scraping and indexing, another for retrieval and response generation).

The system is exposed as a web API using FastAPI and made publicly accessible through an ngrok tunnel. A simple HTML/CSS/JavaScript chat interface, embedded directly in the Colab output, talks to that API.

**Key Components:**

*   **LangChain**: For building the RAG pipeline (Loading data, splitting text, creating embeddings, vector store).
*   **LangGraph**: For orchestrating the workflow between different "agents" or steps (e.g., decide if scraping is needed, perform scraping, then retrieve and generate).
*   **Google Generative AI**: Used for the LLM (Gemini) and Embeddings models.
*   **ChromaDB**: An in-memory or persistent vector database for storing and searching document embeddings.
*   **FastAPI**: A modern, fast (high-performance) web framework for building the API.
*   **ngrok**: Used to create a secure tunnel to the local FastAPI server, making it accessible from the internet.
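*   **Playwright**: A browser-automation library installed for scraping JavaScript-heavy pages. Note that the `WebBaseLoader` used in this notebook fetches pages with `requests` and parses them with BeautifulSoup; it does not drive Playwright.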

The workflow allows the system to not only answer questions based on pre-indexed data but also dynamically decide if it needs to scrape more information based on the user's query and the current knowledge base.
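The decision flow described above can be sketched in plain Python, with stub functions standing in for the retriever, the LLM-based sufficiency check, and the scraper. All names here (`answer_question`, `toy_retrieve`, `toy_is_sufficient`, `toy_scrape`) are hypothetical, and retrying retrieval after a scrape is an assumption of this sketch; the notebook itself expresses the flow as LangGraph nodes.

```python
from typing import Callable, Dict

def answer_question(
    question: str,
    retrieve: Callable[[str], str],
    is_sufficient: Callable[[str, str], bool],
    scrape: Callable[[str], None],
    max_scrapes: int = 1,
) -> str:
    """Answer from the index, scraping more data only when the answer looks weak."""
    answer = retrieve(question)
    for _ in range(max_scrapes):
        if is_sufficient(question, answer):
            break
        scrape(question)             # add new documents to the knowledge base
        answer = retrieve(question)  # retry against the enlarged index
    return answer

# Toy stand-ins so the sketch runs without any external service.
knowledge: Dict[str, str] = {}

def toy_retrieve(q: str) -> str:
    return knowledge.get(q, "I don't have enough information.")

def toy_is_sufficient(q: str, a: str) -> bool:
    return "enough information" not in a

def toy_scrape(q: str) -> None:
    knowledge[q] = "ESP provides confidentiality; AH provides integrity only."

result = answer_question("AH vs ESP?", toy_retrieve, toy_is_sufficient, toy_scrape)
print(result)
```

Capping the number of scrape attempts (`max_scrapes`) keeps the loop from running indefinitely when new data never improves the answer.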

### Library Installation

This cell uses the `!pip install -qU` command to quietly install or upgrade several Python libraries required for this multi-agent RAG (Retrieval-Augmented Generation) system.

*   **`langchain-google-genai`**: Provides integrations with Google's Generative AI models (like Gemini) within the LangChain framework.
*   **`beautifulsoup4`**: A library for parsing HTML and XML documents, often used by web scrapers.
*   **`chromadb`**: The client library for Chroma, an open-source vector database used here to store and retrieve document embeddings for RAG.
*   **`langsmith`**: A platform for observing and evaluating language model applications. The `langsmith` library enables tracing and logging of LangChain runs.
*   **`flask`, `uvicorn`, `fastapi`, `flask-cors`**: Libraries for building and serving web applications. Only FastAPI and uvicorn (the ASGI server) are used by the code in this notebook; Cross-Origin Resource Sharing is handled by FastAPI's own `CORSMiddleware`, so `flask` and `flask-cors` are installed but not imported.
*   **`python-dotenv`**: Used for loading environment variables from a `.env` file (though in Colab, `userdata` and `os.environ` are often used directly as shown later).
*   **`langchain-community`**: Contains various third-party integrations for LangChain.
*   **`langgraph`**: A library built on LangChain for building robust and stateful multi-actor applications by modeling steps as a graph.
*   **`langchain-chroma`**: The specific LangChain integration for ChromaDB.
*   **`pyngrok`**: A Python wrapper for ngrok, used to create public URLs for the locally running FastAPI server, making it accessible from outside the Colab environment.
*   **`playwright`**: A library for browser automation, useful for scraping dynamic, JavaScript-rendered content. `WebBaseLoader` from `langchain_community` does not use it (it relies on `requests` and BeautifulSoup), so Playwright only comes into play if you switch to a Playwright-based loader.
*   **`playwright install`**: A command to install the necessary browser binaries for Playwright.

The warnings about missing libraries for Playwright are common in standard Colab environments and might not prevent basic scraping, but could affect advanced browser functionalities.

In [None]:
# --- 1. Install Libraries ---

!pip install -qU langchain-google-genai beautifulsoup4 chromadb langsmith flask uvicorn python-dotenv langchain-community langgraph fastapi flask-cors langchain-chroma

# For ngrok (exposing FastAPI):
!pip install -qU pyngrok

# For Playwright (robust scraping):
!pip install -qU playwright
!playwright install

# If you prefer localtunnel over ngrok (sometimes simpler, no auth token needed)
# !npm install -g localtunnel # Requires Node.js, so usually `ngrok` is easier in Colab if not pre-installed


### Import Libraries and Apply nest_asyncio

This cell imports all the necessary classes, functions, and modules from the libraries installed in the previous step.

*   It includes standard Python libraries like `os`, `getpass`, `asyncio`, and `typing` for type hinting.
*   It imports core components from `langchain`, `langchain_google_genai`, `langchain_chroma`, and `langchain_community` for building the RAG pipeline (LLM, embeddings, text splitting, document loading, vector store, chains, prompts, messages, tools).
*   Imports `StateGraph` and `END` from `langgraph` for defining the multi-agent workflow.
*   Imports `ngrok` from `pyngrok` for creating the public tunnel.
*   Imports `FastAPI`, `HTTPException`, and `BaseModel` from `fastapi` for building the web API.
*   Imports `uvicorn` for running the FastAPI application.
*   Imports `threading` (though `asyncio` is primarily used for concurrency here).
*   Imports `HTML`, `display` from `IPython.display` to render HTML output directly in the Colab notebook (used for the chat UI).
*   Imports `CORSMiddleware` from `fastapi.middleware.cors` to handle Cross-Origin Resource Sharing, allowing the frontend (UI) to communicate with the backend (FastAPI) from a different origin (domain/port).
*   **`import nest_asyncio; nest_asyncio.apply()`**: This is crucial in environments like Google Colab where an asyncio event loop might already be running. `nest_asyncio` allows nested use of `asyncio.run`, which is necessary to run the uvicorn server (which uses asyncio) within the existing Colab environment.

The `print("Libraries installed and imported.")` statement confirms that the imports were successful. The `USER_AGENT` warning is informational and doesn't usually cause issues.
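As background for why `nest_asyncio` is needed, the following standard-library sketch reproduces the underlying error: calling `asyncio.run` while an event loop is already running raises a `RuntimeError`. The function names `inner` and `outer` are illustrative only. Run it as a plain script; inside a notebook, where a loop is already active, the top-level `asyncio.run` call would itself fail in the same way.

```python
import asyncio

async def inner() -> int:
    return 42

async def outer() -> str:
    """Try to start a second event loop while one is already running."""
    coro = inner()
    try:
        asyncio.run(coro)
        return "no error"
    except RuntimeError as exc:
        coro.close()  # avoid a "coroutine was never awaited" warning
        return str(exc)

message = asyncio.run(outer())
print(message)
```

`nest_asyncio.apply()` patches the loop so that this kind of nested call is permitted, which is what lets uvicorn start from a notebook cell.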

In [None]:
# --- Import Libraries ---

import os
import getpass
import asyncio
from typing import List, Dict, Any, TypedDict, Union
from google.colab import userdata

# LangChain imports
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
# from langchain_community.vectorstores import Chroma # Deprecated import
from langchain_chroma import Chroma # Updated import
from langchain_community.document_loaders import WebBaseLoader
from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage
from langchain_core.tools import tool
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.messages import SystemMessage  # langchain.schema is a legacy import path
from langchain_core.runnables import RunnablePassthrough

# LangGraph imports (for orchestration)
from langgraph.graph import StateGraph, END

# For ngrok tunneling
from pyngrok import ngrok

# FastAPI imports
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

# Import uvicorn
import uvicorn

# Import Thread
import threading

# For displaying HTML in Colab
from IPython.display import HTML, display

# For CORS in FastAPI
from fastapi.middleware.cors import CORSMiddleware

# Apply nest_asyncio to allow nested event loops (needed for running FastAPI inside Colab's async environment)
import nest_asyncio
nest_asyncio.apply()

print("Libraries installed and imported.")



Libraries installed and imported.


### Environment Variable Setup

This cell handles the configuration of necessary API keys and environment variables.

*   It prioritizes loading sensitive keys (`GOOGLE_API_KEY`, `LANGCHAIN_API_KEY`, `LANGCHAIN_PROJECT`, `NGROK_AUTH_TOKEN`) from **Colab Secrets** using `userdata.get()`. This is the recommended secure way to store keys in Colab.
*   If a key is not found in Secrets, it falls back to prompting the user for input using `getpass.getpass()` (for sensitive keys like API keys, which masks input) or `input()` (for less sensitive info like project name).
*   `os.environ[...] = ...` sets these values as environment variables, which are then accessed by the libraries (like LangChain and ngrok).
*   `os.environ["LANGCHAIN_TRACING_V2"] = "true"` enables LangSmith tracing for observing the execution of the LangChain/LangGraph components.
*   `ngrok.set_auth_token(os.environ["NGROK_AUTH_TOKEN"])` explicitly sets the ngrok authentication token using the value retrieved from environment variables. This is necessary for ngrok to establish a tunnel if you have a registered account.
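*   `os.environ["CHROMA_TELEMETRY_DISABLED"] = "True"` is meant to disable telemetry reporting for the ChromaDB library. Chroma's documented opt-out is the `ANONYMIZED_TELEMETRY` setting, so this variable alone may not have any effect.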

This cell ensures that the application has the necessary credentials and configurations to interact with external services like Google GenAI, LangSmith, and ngrok.
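The lookup order described above (secret store, then existing environment variable, then interactive prompt) can be factored into one helper. This is a minimal sketch, not the notebook's code: `load_setting`, `fake_secrets`, and the `DEMO_*` names are hypothetical, and the secret store and prompt are injected as callables so the example runs anywhere.

```python
import os
from typing import Callable, Optional

def load_setting(
    name: str,
    secret_getter: Optional[Callable[[str], str]] = None,
    prompt: Optional[Callable[[str], str]] = None,
) -> str:
    """Resolve a setting from a secret store, then os.environ, then a prompt."""
    value = None
    if secret_getter is not None:
        try:
            value = secret_getter(name)
        except Exception:
            value = None  # secret missing or access denied
    if not value:
        value = os.environ.get(name)
    if not value and prompt is not None:
        value = prompt(f"Enter {name}: ")
    if not value:
        raise RuntimeError(f"{name} is not configured")
    os.environ[name] = value  # make it visible to libraries that read the environment
    return value

# Demonstration with a fake secret store (hypothetical values).
fake_secrets = {"DEMO_API_KEY": "abc123"}
key = load_setting("DEMO_API_KEY", secret_getter=fake_secrets.__getitem__)
project = load_setting(
    "DEMO_PROJECT",
    secret_getter=fake_secrets.__getitem__,
    prompt=lambda _msg: "demo-project",
)
print(key, project)
```

In Colab, one would pass `userdata.get` as `secret_getter` and `getpass.getpass` (or `input` for non-sensitive values) as `prompt`.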

In [None]:
# --- 2. Environment Variable Setup for Colab ---
# Instead of dotenv, we'll directly set or prompt

# Google API Key
# If you have it as a secret in Colab, you can access it directly.
# Otherwise, use getpass for interactive input.
try:
    # Attempt to load from Colab Secrets (recommended)
    os.environ["GOOGLE_API_KEY"] = userdata.get('GOOGLE_API_KEY')
    print("Google API Key loaded from Colab Secrets.")
except Exception:
    # Fallback to getpass if not using Colab Secrets or running locally
    if not os.environ.get("GOOGLE_API_KEY"):
        os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter your Google API Key: ")
        print("Google API Key set via input.")

# LangSmith API Key
# Ensure you set these in your Colab secrets if you want them to persist between sessions
# Or manually input them if you prefer.
try:
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_API_KEY"] = userdata.get('LANGCHAIN_API_KEY')
    os.environ["LANGCHAIN_PROJECT"] = userdata.get('LANGCHAIN_PROJECT')
    print("LangSmith environment variables loaded from Colab Secrets.")
except Exception:
    print("LangSmith environment variables not found in Colab Secrets. Please set manually:")
    os.environ["LANGCHAIN_TRACING_V2"] = "true"
    os.environ["LANGCHAIN_API_KEY"] = getpass.getpass("Enter your LangSmith API Key: ")
    os.environ["LANGCHAIN_PROJECT"] = input("Enter your LangSmith Project Name (e.g., 'Colab_MultiAgentRAG'): ")
    print("LangSmith environment variables set via input.")

# NGROK AUTH Token
# If you have it as a secret in Colab, you can access it directly.
# Otherwise, use getpass for interactive input.
try:
    os.environ["NGROK_AUTH_TOKEN"] = userdata.get('NGROK_AUTH_TOKEN')
    print("NGROK AUTH Token loaded from Colab Secrets.")
except Exception:
    if not os.environ.get("NGROK_AUTH_TOKEN"):
        os.environ["NGROK_AUTH_TOKEN"] = getpass.getpass("Enter your NGROK AUTH Token: ")
        print("NGROK AUTH Token set via input.")

# Register the ngrok auth token before any tunnel is opened.
# You can get a token from https://dashboard.ngrok.com/get-started/your-authtoken
ngrok.set_auth_token(os.environ["NGROK_AUTH_TOKEN"]) # Use the environment variable

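# Disable ChromaDB telemetry
os.environ["CHROMA_TELEMETRY_DISABLED"] = "True"
# Chroma's documented opt-out is ANONYMIZED_TELEMETRY; set it too, before any client is created.
os.environ["ANONYMIZED_TELEMETRY"] = "False"
print("ChromaDB telemetry disabled.")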


### Google Drive Mount (Optional)

This cell is commented out and provides instructions on how to mount your Google Drive to the Colab environment.

*   `from google.colab import drive` imports the necessary module.
*   `drive.mount('/content/drive')` performs the mounting operation, prompting you to authorize Colab to access your Drive files.
*   `PERSIST_DIR = "/content/drive/MyDrive/chroma_db_multi_agent"` defines a path on your Google Drive where you could store the ChromaDB database.
*   The commented-out `os.makedirs(PERSIST_DIR)` would create this directory if it doesn't exist.

Mounting Google Drive is useful if you want the data indexed in your ChromaDB vector store (`vectorstore`) to persist between Colab sessions. If you don't mount Drive, the database will be stored in a temporary directory (`./chroma_db_temp`) and will be lost when the Colab runtime terminates.

In [None]:
# --- 3. Google Drive Mount (Optional, for persistent ChromaDB) ---
# If you want your RAG database to persist between Colab sessions, run this cell
# and follow the instructions to mount your Google Drive.

# from google.colab import drive
# drive.mount('/content/drive')
# PERSIST_DIR = "/content/drive/MyDrive/chroma_db_multi_agent" # Path on your Google Drive
# if not os.path.exists(PERSIST_DIR):
#     os.makedirs(PERSIST_DIR)
# print(f"ChromaDB will persist to: {PERSIST_DIR}")

### Initialize LLM, Embeddings, and Vector Store

This cell initializes the core components for the RAG system.

*   `llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0.1)`: Initializes the chat language model using Google's Gemini 1.5 Flash model. `temperature=0.1` makes the model's responses more deterministic and focused.
*   `embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")`: Initializes the embedding model, also from Google. This model is used to convert text (document chunks and user queries) into numerical vectors (embeddings) which are then stored in the vector database.
*   `PERSIST_DIR_CHROMA = "./chroma_db_temp"`: Defines the directory where the ChromaDB database will be stored. By default, it uses a temporary local directory.
*   The commented-out section shows how you would use the `PERSIST_DIR` variable if you had mounted Google Drive and wanted to use that for persistence.
*   `if not os.path.exists(PERSIST_DIR_CHROMA): os.makedirs(PERSIST_DIR_CHROMA)`: Creates the persistence directory if it doesn't already exist.
*   `vectorstore = Chroma(embedding_function=embeddings, persist_directory=PERSIST_DIR_CHROMA)`: Initializes the Chroma vector store. It's configured to use the Google embeddings and the specified persistence directory.
*   `print(...)` confirms the initialization and the location of the ChromaDB persistence directory.

The errors about `chromadb.telemetry` are usually harmless. They come from Chroma's telemetry client failing to send usage events, not from the vector store itself. Telemetry is meant to be turned off by the variable set in the Environment Variable Setup cell.
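To make the role of the embedding model concrete, here is a toy similarity search over hand-written vectors. The three-dimensional vectors and chunk texts are made up for illustration; real embedding models return vectors with hundreds of dimensions, and the distance metric Chroma actually applies depends on how the collection is configured. Cosine similarity is used here only as a familiar example.

```python
import math
from typing import Dict, List

def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Return the cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Made-up "embeddings" for three text chunks.
chunk_vectors: Dict[str, List[float]] = {
    "ESP encrypts the packet payload": [0.9, 0.1, 0.0],
    "IKE negotiates security associations": [0.1, 0.9, 0.1],
    "Pasta should be cooked al dente": [0.0, 0.1, 0.9],
}

# Pretend embedding of the query "How does IPSec encrypt data?"
query_vector = [0.8, 0.2, 0.0]

# Rank chunks from most to least similar to the query.
ranked = sorted(
    chunk_vectors,
    key=lambda text: cosine_similarity(query_vector, chunk_vectors[text]),
    reverse=True,
)
best_match = ranked[0]
print(best_match)
```

A vector store performs the same kind of ranking at scale: the query is embedded once, and the stored chunks closest to it are returned as context for the LLM.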

In [None]:
# --- 4. Initialize LLM, Embeddings, and Vector Store ---

llm = ChatGoogleGenerativeAI(model="gemini-1.5-flash", temperature=0.1)
embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")

# Use a temporary directory for ChromaDB if not mounting Google Drive
# Otherwise, use the PERSIST_DIR from the Google Drive mount section
PERSIST_DIR_CHROMA = "./chroma_db_temp" # Default to temporary
# if 'PERSIST_DIR' in locals(): # Check if Google Drive path was set
#     PERSIST_DIR_CHROMA = PERSIST_DIR

if not os.path.exists(PERSIST_DIR_CHROMA):
    os.makedirs(PERSIST_DIR_CHROMA)

vectorstore = Chroma(embedding_function=embeddings, persist_directory=PERSIST_DIR_CHROMA)
# vectorstore.persist() # Ensures directory structure is created - Deprecated

print(f"LangChain, Google models, and ChromaDB initialized. ChromaDB persistence at: {PERSIST_DIR_CHROMA}")

ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientStartEvent: capture() takes 1 positional argument but 3 were given
ERROR:chromadb.telemetry.product.posthog:Failed to send telemetry event ClientCreateCollectionEvent: capture() takes 1 positional argument but 3 were given


LangChain, Google models, and ChromaDB initialized. ChromaDB persistence at: ./chroma_db_temp


### Agent 1: Scraping and Indexing Agent Components

This cell defines the components related to the first agent, responsible for scraping web content and indexing it into the vector store.

*   `initial_ipsec_urls`: A list of predefined URLs related to IPSec that will be scraped initially.
*   `ScrapingAgentState(TypedDict)`: Defines a dictionary structure (`TypedDict`) to represent the state for a potential future, more complex LangGraph agent specifically for scraping. It includes fields for the topic, URLs to process, and status. (Note: The current implementation directly uses a function `scrape_and_index_urls_func` rather than a full LangGraph agent state and nodes, but this TypedDict hints at a possible expansion).
*   `scrape_and_index_urls_func(urls: List[str]) -> str`: This is the core function for the scraping and indexing logic. It is declared `async`, but the work inside it is blocking.
    *   It iterates through the provided list of URLs.
    *   For each URL, it uses `WebBaseLoader` from `langchain_community.document_loaders` to fetch and load the content. `loader.load()` is synchronous, so the event loop is blocked while each page downloads. A URL that fails to load is logged and skipped.
    *   It aggregates all loaded documents (`all_documents`).
    *   It uses `RecursiveCharacterTextSplitter` to break down the loaded documents into smaller chunks. This is important because language models have context window limits, and smaller chunks are more effective for retrieval. `chunk_size=1000` means chunks are roughly 1000 characters long, and `chunk_overlap=200` means there's a 200-character overlap between consecutive chunks to maintain context.
    *   `vectorstore.add_documents(chunks)`: Adds the processed text chunks and their corresponding embeddings (generated by the `embeddings` model initialized in the previous section) to the ChromaDB vector store.
    *   It returns a status message indicating success or failure.
*   `perform_initial_scrape()`: This async function is intended to run once at the start of the application to populate the vector store with initial data from `initial_ipsec_urls`.
    *   It checks if the vector store is empty before scraping to avoid redundant work on subsequent runs. Note the use of `vectorstore._collection.count()`, which accesses an internal ChromaDB attribute; a more robust check might involve adding metadata to documents and querying based on that.
    *   If the vector store is empty, it calls `scrape_and_index_urls_func` with the predefined URLs.

This cell sets up the foundational logic for acquiring external knowledge and making it searchable via the vector database.
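The effect of `chunk_size=1000` and `chunk_overlap=200` can be illustrated with a simplified fixed-window splitter. This is a stand-in, not the actual `RecursiveCharacterTextSplitter` algorithm: the real splitter prefers to break on paragraph and sentence separators, so its chunks are at most `chunk_size` long rather than exactly that length. The function name `chunk_text` is hypothetical.

```python
from typing import List

def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Split text into fixed-size windows that overlap by chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "".join(str(i % 10) for i in range(2500))
chunks = chunk_text(sample)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

The last 200 characters of each chunk reappear at the start of the next one, so a sentence that straddles a boundary is still retrievable in one piece.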

In [None]:
# --- 5. Agent 1: Scraping and Indexing Agent ---

initial_ipsec_urls = [
    "https://www.cloudflare.com/learning/network-layer/what-is-ipsec/",
    "https://www.cisco.com/c/en/us/products/security/ipsec.html",
    "https://en.wikipedia.org/wiki/IPsec",
    "https://www.geeksforgeeks.org/ipsec-security-architecture/",
    "https://www.networkacademy.io/ccna/security/ipsec-vpn",
    "https://www.imperva.com/learn/security/ipsec/"
]

class ScrapingAgentState(TypedDict):
    """Represents the state of the scraping agent's graph."""
    topic: str
    urls_to_scrape: List[str]
    status: str # "pending", "success", "failed"

async def scrape_and_index_urls_func(urls: List[str]) -> str:
    """
    Asynchronously scrapes content from a list of URLs, processes it, and indexes it into the RAG database.
    Returns a status message indicating success or failure.
    """
    print(f"\n--- Scraping Agent: Initiating scrape for {len(urls)} URLs ---")
    try:
        all_documents = []
        for url in urls:
            print(f"Scraping: {url}")
            try:
                loader = WebBaseLoader(url)
                docs = loader.load() # Use load() for synchronous loading
                all_documents.extend(docs)
            except Exception as e:
                print(f"Error scraping {url}: {e}")
                continue

        if not all_documents:
            return "Failed to scrape any content from the provided URLs."

        text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
        chunks = text_splitter.split_documents(all_documents)
        print(f"Split into {len(chunks)} chunks.")

        vectorstore.add_documents(chunks)
        #vectorstore.persist()
        print(f"Successfully indexed {len(chunks)} chunks into the RAG database.")
        return f"Successfully scraped and indexed content from {len(all_documents)} documents."
    except Exception as e:
        print(f"An error occurred in scraping_agent: {e}")
        return f"Scraping and indexing failed: {str(e)}"

async def perform_initial_scrape():
    """Performs an initial scrape of predefined IPSec URLs at startup."""
    print("\n--- Performing initial scrape of IPSec data ---")
    # Skip the scrape when the store already holds data, to avoid re-indexing on every restart.
    # For large datasets, a flag file or document metadata would be a more robust signal.
    try:
        # `_collection` is an internal attribute of the LangChain Chroma wrapper, so this check
        # may break across versions. count() is a synchronous call, so no await is needed.
        if vectorstore._collection.count() == 0:
            print("Vector store is empty. Proceeding with initial scrape.")
            # The scrape function is declared async, so it must be awaited.
            status = await scrape_and_index_urls_func(initial_ipsec_urls)
            print(f"Initial scrape status: {status}")
        else:
            print("Vector store already contains data. Skipping initial scrape.")
    except Exception as e:
        print(f"Error checking vector store or during initial scrape: {e}")
        print("Proceeding without initial scrape.")

print("Scraping Agent components defined.")

Scraping Agent components defined.


### Agent 2: RAG and Response Agent Components

This cell defines the components for the second agent, which handles retrieving information from the vector store and generating a response to the user's query. This agent is structured as a LangGraph workflow.

*   `AgentState(TypedDict)`: Defines the state for the main LangGraph workflow. This state dictionary passes information between the nodes (steps) of the graph. It includes:
    *   `question`: The user's current query.
    *   `chat_history`: The history of the conversation, used to provide context for the RAG retrieval.
    *   `rag_response`: The initial response generated after the first RAG lookup.
    *   `urls_to_scrape`: A list of URLs suggested for scraping if the RAG response is insufficient (used by the `check_for_scraping` node).
    *   `scrape_needed`: A boolean flag indicating whether the `check_for_scraping` node determined that more data is needed.
    *   `final_answer`: The final response to be sent to the user.
    *   `status_message`: A string indicating the current status or result of a node's execution.
*   `retrieve_and_generate(state: AgentState) -> AgentState`: This async function is a node in the LangGraph. It takes the current state, performs RAG, and updates the state.
    *   It extracts the `question` and `chat_history` from the state.
    *   `create_history_aware_retriever(...)`: Creates a LangChain retriever that first uses an LLM to analyze the chat history and current question to generate a better search query for the retriever.
    *   `create_stuff_documents_chain(...)`: Creates a LangChain chain that takes retrieved documents and the original question/chat history and "stuffs" them into a prompt for the LLM to generate a final answer based on the context.
    *   `create_retrieval_chain(...)`: Combines the history-aware retriever and the document chain into a single retrieval chain.
    *   It `await retrieval_chain.ainvoke(...)` to asynchronously execute the RAG process.
    *   It extracts the generated `answer` and updates the `rag_response` and `status_message` in the state. It also sets `scrape_needed` to `False` initially, as the next step is to check if scraping *is* needed.
*   `check_for_scraping(state: AgentState) -> AgentState`: This async function is another node. It evaluates the initial RAG response to decide if more information is required.
    *   It uses an LLM (`llm.ainvoke`) with a specific prompt to classify the `rag_response` as "SUFFICIENT" or "SCRAPE_NEEDED".
    *   If "SCRAPE_NEEDED", it uses another LLM call to generate potential search queries for scraping.
    *   It sets the `scrape_needed` flag and potentially `urls_to_scrape` in the state. Note: Currently, it hardcodes `example_ipsec_urls` regardless of the LLM's suggested queries; this could be improved to use the generated queries.
*   `call_scraping_agent(state: AgentState) -> AgentState`: This async function node calls the `scrape_and_index_urls_func` defined in the scraping agent section to fetch and index new data.
    *   It takes the `urls_to_scrape` from the state.
    *   It `await scrape_and_index_urls_func(...)` to run the scraping process.
    *   It updates the `status_message` based on the scraping result and sets `scrape_needed` back to `False` after the attempt.
*   `finalize_response(state: AgentState) -> AgentState`: This async function node prepares the final answer.
    *   In this simple implementation, it just takes the `rag_response` and sets it as the `final_answer` in the state. A more complex version might refine the answer here.

These functions define the individual steps and decision points for the RAG agent's workflow.
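Two details of the nodes above are easy to miss: each node returns only the keys it changes, and the sufficiency check depends on parsing free-form LLM text. The sketch below illustrates both with plain dictionaries, assuming LangGraph's default behavior of overwriting each returned key in the shared state. The helper names `apply_update`, `needs_scrape`, and `split_queries` are hypothetical and do not appear in the notebook.

```python
from typing import Any, Dict, List

def apply_update(state: Dict[str, Any], update: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a node's partial update into the state; untouched keys carry over."""
    return {**state, **update}

def needs_scrape(decision_text: str) -> bool:
    """Interpret the evaluator's reply, tolerating case and surrounding whitespace."""
    return "SCRAPE_NEEDED" in decision_text.strip().upper()

def split_queries(raw: str) -> List[str]:
    """Turn a comma-separated LLM reply into a clean list of search queries."""
    return [q.strip().strip("'\"") for q in raw.split(",") if q.strip()]

state: Dict[str, Any] = {"question": "What is IKEv2?", "rag_response": "", "scrape_needed": False}
state = apply_update(state, {"rag_response": "I don't have enough information."})
state = apply_update(state, {"scrape_needed": needs_scrape(" scrape_needed\n")})
queries = split_queries("'IKEv2 handshake, IKEv2 vs IKEv1, IPSec key exchange'")
print(state["scrape_needed"], queries)
```

A helper like `split_queries` is one way the suggested search queries could feed a URL-discovery step in place of the hardcoded `example_ipsec_urls` list.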

In [None]:
# --- 6. Agent 2: RAG and Response Agent ---

class AgentState(TypedDict):
    """Represents the state of the overall multi-agent graph."""
    question: str
    chat_history: List[Union[HumanMessage, AIMessage]]
    rag_response: str
    urls_to_scrape: List[str]
    scrape_needed: bool
    final_answer: str
    status_message: str

async def retrieve_and_generate(state: AgentState) -> AgentState:
    """Retrieves relevant documents from the vector store and generates a response."""
    print("\n--- Agent 2: Retrieving and Generating Response ---")
    question = state["question"]
    chat_history = state["chat_history"]

    history_aware_retriever = create_history_aware_retriever(
        llm, vectorstore.as_retriever(), ChatPromptTemplate.from_messages([
            MessagesPlaceholder("chat_history"),
            ("user", "{input}"),
            ("user", "Given the above conversation, generate a search query to look up in order to get information relevant to the conversation"),
        ])
    )

    document_chain = create_stuff_documents_chain(
        llm,
        ChatPromptTemplate.from_messages([
            ("system", "You are a helpful AI assistant. Answer the user's question based on the provided context. If you don't know the answer, state that you don't have enough information."),
            MessagesPlaceholder("chat_history"),
            ("user", "{input}"),
            ("user", "Context:\n{context}"),
        ]),
    )

    retrieval_chain = create_retrieval_chain(history_aware_retriever, document_chain)

    try:
        response = await retrieval_chain.ainvoke({"input": question, "chat_history": chat_history}) # Use ainvoke for async
        rag_response = response["answer"]
        #print(f"RAG Response Attempt: {rag_response}")
    except Exception as e:
        print(f"Error during RAG retrieval/generation: {e}")
        rag_response = "I encountered an error trying to retrieve information from my database."

    return {"rag_response": rag_response, "scrape_needed": False, "status_message": "Performed initial RAG search."}

async def check_for_scraping(state: AgentState) -> AgentState:
    """
    Determines if more information is needed via scraping.
    """
    print("\n--- Agent 2: Checking if scraping is needed ---")
    rag_response = state["rag_response"]
    question = state["question"]

    # Build the message list directly: llm.ainvoke() accepts a list of messages, a string, or a
    # PromptValue, but not a ChatPromptTemplate. This also keeps any braces in the question or
    # response from being parsed as template variables.
    check_messages = [
        SystemMessage(
            content=(
                "You are an expert evaluator. Your task is to determine if the provided "
                "RAG response is sufficient to fully answer the user's question. "
                "If the answer seems incomplete, too generic, or explicitly states it lacks information, "
                "you should indicate that more scraping is needed. Otherwise, indicate it's sufficient.\n\n"
                "Respond ONLY with 'SCRAPE_NEEDED' if more scraping is required, otherwise respond with 'SUFFICIENT'."
            )
        ),
        HumanMessage(content=f"User Question: {question}\n\nExisting RAG Response: {rag_response}\n\nIs this response SUFFICIENT or is SCRAPE_NEEDED?"),
    ]

    try:
        decision = (await llm.ainvoke(check_messages)).content.strip().upper() # Use ainvoke
        print(f"LLM Decision on sufficiency: {decision}")
        if "SCRAPE_NEEDED" in decision:
            # As above, pass a plain list of messages rather than a ChatPromptTemplate.
            search_query_messages = [
                SystemMessage(
                    content=(
                        "Based on the user's question and the insufficient RAG response, "
                        "suggest 3-5 specific search queries (comma-separated) that would help find "
                        "more comprehensive information. Focus on keywords for web scraping. "
                        "Example: 'IPSec components, IPSec architecture, AH and ESP protocols'"
                    )
                ),
                HumanMessage(content=f"User Question: {question}\n\nInsufficient RAG Response: {rag_response}\n\nSuggest search queries:"),
            ]
            search_queries_str = (await llm.ainvoke(search_query_messages)).content.strip() # Use ainvoke
            print(f"Suggested search queries for scraping: {search_queries_str}")

            example_ipsec_urls = [
                "https://www.cloudflare.com/learning/network-layer/what-is-ipsec/",
                "https://www.cisco.com/c/en/us/products/security/ipsec.html",
                "https://en.wikipedia.org/wiki/IPsec",
                "https://www.geeksforgeeks.org/ipsec-security-architecture/"
            ]
            urls_for_scraping = example_ipsec_urls

            return {"scrape_needed": True, "urls_to_scrape": urls_for_scraping, "status_message": "Scraping deemed necessary."}
        else:
            return {"scrape_needed": False, "status_message": "Existing information sufficient."}
    except Exception as e:
        print(f"Error during sufficiency check: {e}")
        return {"scrape_needed": False, "status_message": f"Error during sufficiency check: {str(e)}"}

async def call_scraping_agent(state: AgentState) -> AgentState:
    """
    Asynchronously calls the scraping agent to acquire and index new data.
    """
    print("\n--- Agent 2: Calling Scraping Agent ---")
    urls_to_scrape = state["urls_to_scrape"]
    if not urls_to_scrape:
        print("No URLs provided for scraping. Skipping.")
        return {"scrape_needed": False, "status_message": "No URLs to scrape."}

    try:
        scraping_result = await scrape_and_index_urls_func(urls_to_scrape)
        print(f"Scraping Result: {scraping_result}")
        if "successfully" in scraping_result.lower():
            return {"status_message": "Scraping completed successfully. Re-evaluating.", "scrape_needed": False}
        else:
            return {"status_message": f"Scraping failed: {scraping_result}", "scrape_needed": False}
    except Exception as e:
        print(f"Error calling scraping agent function: {e}")
        return {"status_message": f"Error calling scraping agent function: {str(e)}", "scrape_needed": False}

async def finalize_response(state: AgentState) -> AgentState:
    """Prepares the final answer for the user."""
    print("\n--- Agent 2: Finalizing Response ---")
    return {"final_answer": state["rag_response"], "status_message": "Final answer prepared."}
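
The sufficiency check above asks the model for comma-separated search queries but only prints the raw reply. As a minimal sketch of how such a reply could be turned into a clean list, the hypothetical helper `parse_search_queries` (not part of the notebook) splits on commas, strips stray quotes, and drops duplicates:

```python
from typing import List


def parse_search_queries(raw: str, max_queries: int = 5) -> List[str]:
    """Split a comma-separated LLM reply into clean, de-duplicated queries.

    Hypothetical helper: the notebook itself only prints the raw string.
    """
    queries: List[str] = []
    seen = set()
    for part in raw.split(","):
        # Drop surrounding whitespace and any quotes copied from the prompt example.
        query = part.strip().strip("'\"").strip()
        key = query.lower()
        if query and key not in seen:
            seen.add(key)
            queries.append(query)
    return queries[:max_queries]


if __name__ == "__main__":
    reply = "'IPSec components, IPSec architecture, AH and ESP protocols'"
    print(parse_search_queries(reply))
```

The cap of five queries mirrors the "3-5" range requested in the prompt; it is an assumption of this sketch rather than something the notebook enforces.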

### LangGraph Setup for Agent 2 Workflow

This cell defines the structure and flow of the RAG and Response Agent using LangGraph.

*   `workflow = StateGraph(AgentState)`: Initializes a `StateGraph` with the defined `AgentState`. This graph will manage the flow of the state dictionary between nodes.
*   `workflow.add_node("node_name", async_function)`: Adds each of the async functions defined in cell 7 as a named node in the graph.
*   `workflow.set_entry_point("retrieve_and_generate")`: Specifies the first node to be executed when the graph is invoked.
*   `workflow.add_conditional_edges(...)`: Defines edges (transitions) between nodes based on the state.
    *   From `retrieve_and_generate`, it transitions to `check_for_scraping` if `state["scrape_needed"]` is true (meaning the initial RAG might be insufficient), otherwise it transitions to `finalize_response`.
    *   From `check_for_scraping`, it transitions to `call_scraping_agent` if `state["scrape_needed"]` is true (meaning scraping was deemed necessary), otherwise it transitions to `finalize_response`.
*   `workflow.add_edge("call_scraping_agent", "retrieve_and_generate")`: Creates a direct edge from `call_scraping_agent` back to `retrieve_and_generate`. This means after attempting to scrape and index new data, the workflow returns to the RAG step to potentially generate a better answer with the new information.
*   `workflow.add_edge("finalize_response", END)`: Defines the final step. When the state reaches the `finalize_response` node, the workflow ends (`END`).
*   `app_langgraph = workflow.compile()`: Compiles the defined graph into a runnable object (`app_langgraph`) that can be invoked asynchronously with `ainvoke` to run the workflow.

This cell effectively wires together the individual agent components into a directed graph that represents the logic flow for processing a user query, potentially performing retrieval, checking for needed information, scraping if necessary, and then finalizing a response.
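
The routing described above can be traced without LangGraph installed. The following is a minimal sketch in plain Python, not the notebook's implementation: the two routing functions mirror the conditional edges, while `trace_path`, `stub_nodes`, and the `max_steps` cap are hypothetical names and choices made for illustration.

```python
from typing import Callable, Dict, List

State = Dict[str, object]


def route_after_retrieve(state: State) -> str:
    # Mirrors the conditional edge leaving "retrieve_and_generate".
    return "check_for_scraping" if state["scrape_needed"] else "finalize_response"


def route_after_check(state: State) -> str:
    # Mirrors the conditional edge leaving "check_for_scraping".
    return "call_scraping_agent" if state["scrape_needed"] else "finalize_response"


def trace_path(nodes: Dict[str, Callable[[State], State]], state: State, max_steps: int = 10) -> List[str]:
    """Walk the graph's edges with plain functions and return the visited node names."""
    routers = {
        "retrieve_and_generate": route_after_retrieve,
        "check_for_scraping": route_after_check,
        "call_scraping_agent": lambda _state: "retrieve_and_generate",  # fixed edge
    }
    current, path = "retrieve_and_generate", []
    while len(path) < max_steps:
        path.append(current)
        state.update(nodes[current](state))  # nodes return partial state updates
        if current == "finalize_response":  # fixed edge to END
            break
        current = routers[current](state)
    return path


def stub_nodes() -> Dict[str, Callable[[State], State]]:
    """Hypothetical stand-ins: retrieval is weak until a scrape has happened."""
    def retrieve(state: State) -> State:
        scraped = bool(state.get("scraped"))
        return {"rag_response": "detailed" if scraped else "thin", "scrape_needed": not scraped}

    def check(state: State) -> State:
        return {"scrape_needed": state["rag_response"] == "thin"}

    def scrape(state: State) -> State:
        return {"scraped": True, "scrape_needed": False}

    def finalize(state: State) -> State:
        return {"final_answer": state["rag_response"]}

    return {
        "retrieve_and_generate": retrieve,
        "check_for_scraping": check,
        "call_scraping_agent": scrape,
        "finalize_response": finalize,
    }


if __name__ == "__main__":
    print(trace_path(stub_nodes(), {"scrape_needed": True}))
```

With these stubs the walk visits retrieval, the sufficiency check, scraping, retrieval again, and then finalization, which is the loop the edge from `call_scraping_agent` back to `retrieve_and_generate` is meant to produce.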

In [None]:
# --- 7. LangGraph Setup for Agent 2's Workflow ---

workflow = StateGraph(AgentState)

workflow.add_node("retrieve_and_generate", retrieve_and_generate)
workflow.add_node("check_for_scraping", check_for_scraping)
workflow.add_node("call_scraping_agent", call_scraping_agent)
workflow.add_node("finalize_response", finalize_response)

workflow.set_entry_point("retrieve_and_generate")

workflow.add_conditional_edges(
    "retrieve_and_generate",
    lambda state: "scrape_needed" if state["scrape_needed"] else "finalize_response",
    {
        "scrape_needed": "check_for_scraping",
        "finalize_response": "finalize_response",
    },
)

workflow.add_conditional_edges(
    "check_for_scraping",
    lambda state: "call_scraping" if state["scrape_needed"] else "finalize_response",
    {
        "call_scraping": "call_scraping_agent",
        "finalize_response": "finalize_response",
    },
)

workflow.add_edge("call_scraping_agent", "retrieve_and_generate")
workflow.add_edge("finalize_response", END)

app_langgraph = workflow.compile()

print("RAG and Response Agent (LangGraph workflow) defined.")

RAG and Response Agent (LangGraph workflow) defined.


### Initial Scrape Execution (For Testing RAG)

This commented-out cell shows how to trigger the initial data scraping process.

*   `asyncio.run(perform_initial_scrape())`: Calls the `perform_initial_scrape` async function (defined in cell 6) using `asyncio.run()`, which runs a coroutine from synchronous code and blocks until it completes. A notebook kernel already runs an event loop, so `asyncio.run()` raises a `RuntimeError` there unless `nest_asyncio.apply()` has been called earlier; awaiting the coroutine directly (`await perform_initial_scrape()`) also works in a notebook cell.
*   The purpose is to populate the vector store with initial IPSec data when the notebook is first run or when this cell is executed.
*   The print statements indicate the start and completion of the initial scrape process.

This ensures that the RAG system has some base knowledge about IPSec available from the beginning, even before any user queries are processed.
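
Whether `asyncio.run()` can be used depends on whether an event loop is already running in the current thread. A minimal sketch of a helper that handles both cases follows; `run_coroutine_blocking` and `fake_initial_scrape` are hypothetical names, and the latter is only a placeholder for `perform_initial_scrape()`.

```python
import asyncio
import threading
from typing import Any, Coroutine


def run_coroutine_blocking(coro: Coroutine[Any, Any, Any]) -> Any:
    """Run `coro` to completion from synchronous code, with or without a running loop."""
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        # No loop is running in this thread, so asyncio.run() is safe.
        return asyncio.run(coro)

    # A loop is already running (e.g. in a notebook kernel): run the coroutine
    # on a fresh loop in a worker thread and wait for it to finish.
    outcome: dict = {}

    def worker() -> None:
        try:
            outcome["value"] = asyncio.run(coro)
        except BaseException as exc:  # re-raised in the caller below
            outcome["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


async def fake_initial_scrape() -> str:
    # Placeholder for perform_initial_scrape(); it only simulates async work.
    await asyncio.sleep(0)
    return "Initial IPSec data loaded."


if __name__ == "__main__":
    print(run_coroutine_blocking(fake_initial_scrape()))
```

The worker-thread branch blocks the calling thread while it waits, and it only suits coroutines that do not depend on objects bound to the original loop. In a notebook, a top-level `await` is usually the simpler choice.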

In [None]:
# --- Perform initial scrape at application start ---
# This ensures data is in the vector store before the API is ready.
# Use asyncio.run for calling async function in a sync context (like top-level script execution)
# If this were a FastAPI `startup` event, you'd use `await perform_initial_scrape()`.
#asyncio.run(perform_initial_scrape())
#print("Initial IPSec data loaded (if not already present).")


--- Performing initial scrape of IPSec data ---
Vector store already contains data. Skipping initial scrape.
Initial IPSec data loaded (if not already present).


### FastAPI Application Definition

This cell defines the FastAPI web application that will expose an API endpoint for interacting with the multi-agent RAG system.

*   `app = FastAPI(...)`: Initializes a FastAPI application instance with a title.
*   **`app.add_middleware(CORSMiddleware, ...)`**: Configures Cross-Origin Resource Sharing (CORS).
    *   `allow_origins=["*"]`: Allows requests from *any* origin. This is useful in development (like in Colab with ngrok) but should be restricted in production.
    *   `allow_credentials=True`: Allows cookies and authorization headers to be included in cross-origin requests.
    *   `allow_methods=["*"]`: Allows all standard HTTP methods (GET, POST, PUT, DELETE, OPTIONS, etc.). The middleware answers CORS preflight (`OPTIONS`) requests itself, which is what lets the browser's preflight for the JSON `POST` to `/query` succeed.
    *   `allow_headers=["*"]`: Allows all headers in cross-origin requests.
    *   This middleware is crucial for the HTML/JavaScript UI (running in your browser's origin) to be able to make API calls to the FastAPI server (running locally in Colab and exposed via the ngrok tunnel).
*   `class QueryRequest(BaseModel):`: Defines a Pydantic model for validating the incoming request body to the `/query` endpoint. It expects a JSON object with a `question` string and an optional `chat_history` list of dictionaries.
*   `@app.post("/query")`: This decorator defines an API endpoint at the `/query` path that accepts POST requests.
*   `async def process_query(request: QueryRequest):`: This is the asynchronous function that will handle incoming POST requests to `/query`. It takes the validated request body as a `QueryRequest` object.
    *   It processes the incoming `chat_history` from the request into LangChain's `HumanMessage` and `AIMessage` objects.
    *   It initializes the `initial_state` dictionary for the LangGraph workflow, populating it with the user's `question` and the processed `chat_history`. `scrape_needed` is set to `True` initially to trigger the workflow's check.
    *   `final_state = await app_langgraph.ainvoke(initial_state)`: This is the core logic, asynchronously invoking the LangGraph workflow (`app_langgraph`) with the initial state. The workflow executes its nodes and transitions based on the state.
    *   It extracts the `final_answer` and `status_message` from the `final_state` returned by the graph.
    *   It constructs the `updated_chat_history` to include the current question and the generated answer.
    *   It returns a JSON response containing the original question, the generated answer, the final status message, and the updated chat history (formatted back into dictionaries).
*   `@app.get("/")`: Defines a simple GET endpoint at the root path (`/`).
*   `async def read_root():`: Returns a simple welcome message as JSON for the root endpoint.

This cell sets up the HTTP interface for the multi-agent system, allowing external clients (like the chat UI) to send queries and receive responses.
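
The handler recognizes only the `type` values `"Human"` and `"AI_Nishown"` in `chat_history`; entries with any other value are skipped without an error. The round trip can be sketched as follows. To keep the example free of dependencies, small dataclasses stand in for LangChain's `HumanMessage` and `AIMessage`, and the function names `history_to_messages` and `messages_to_history` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class HumanMessage:  # stand-in for LangChain's HumanMessage
    content: str


@dataclass
class AIMessage:  # stand-in for LangChain's AIMessage
    content: str


Message = Union[HumanMessage, AIMessage]


def history_to_messages(history: List[Dict[str, str]]) -> List[Message]:
    """Convert request dictionaries to message objects, skipping unknown types."""
    messages: List[Message] = []
    for item in history:
        if item.get("type") == "Human":
            messages.append(HumanMessage(content=item.get("content", "")))
        elif item.get("type") == "AI_Nishown":
            messages.append(AIMessage(content=item.get("content", "")))
    return messages


def messages_to_history(messages: List[Message]) -> List[Dict[str, str]]:
    """Convert message objects back to the dictionaries returned in the response."""
    return [
        {"type": "Human" if isinstance(m, HumanMessage) else "AI_Nishown", "content": m.content}
        for m in messages
    ]


if __name__ == "__main__":
    history = [
        {"type": "Human", "content": "What is IPSec?"},
        {"type": "AI_Nishown", "content": "A suite of protocols for securing IP traffic."},
        {"type": "human", "content": "This entry is dropped: the tag is case-sensitive."},
    ]
    print(messages_to_history(history_to_messages(history)))
```

Because the comparison is case-sensitive, a client that sends `"human"` or `"ai"` loses its history silently; sending back the `updated_chat_history` from the previous response avoids this.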

In [None]:
# --- 8. FastAPI Application ---

app = FastAPI(title="Multi-Agent IPsec RAG System")

# Add CORS middleware
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # Allows all origins
    allow_credentials=True,
    allow_methods=["*"],  # Allows all methods (GET, POST, PUT, DELETE, OPTIONS, etc.)
    allow_headers=["*"],  # Allows all headers
)

class QueryRequest(BaseModel):
    question: str
    chat_history: List[Dict[str, str]] = []

@app.post("/query")
async def process_query(request: QueryRequest):
    """
    Processes a user query using the multi-agent RAG system.
    """
    initial_chat_history = []
    for msg in request.chat_history:
        if msg.get("type") == "Human":
            initial_chat_history.append(HumanMessage(content=msg.get("content", "")))
        elif msg.get("type") == "AI_Nishown":
            initial_chat_history.append(AIMessage(content=msg.get("content", "")))

    initial_state = {
        "question": request.question,
        "chat_history": initial_chat_history,
        "rag_response": "",
        "urls_to_scrape": [],
        "scrape_needed": True, # Start with scrape_needed=True to always check initially
        "final_answer": "",
        "status_message": "Starting query processing."
    }

    try:
        final_state = await app_langgraph.ainvoke(initial_state)

        answer = final_state.get("final_answer", "Sorry, I could not generate a comprehensive answer.")
        status_message = final_state.get("status_message", "Processing complete.")

        updated_chat_history = initial_chat_history + [
            HumanMessage(content=request.question),
            AIMessage(content=answer)
        ]

        return {
            "question": request.question,
            "answer": answer,
            "status": status_message,
            "updated_chat_history": [
                {
                    "type": "Human" if isinstance(msg, HumanMessage) else "AI_Nishown",
                    "content": msg.content,
                }
                for msg in updated_chat_history
            ],
        }

    except Exception as e:
        print(f"Error during query processing: {e}")
        raise HTTPException(status_code=500, detail=f"Internal server error: {str(e)}")

@app.get("/")
async def read_root():
    return {"message": "Welcome to the Multi-Agent IPsec RAG System API. Use /query to ask questions."}

### Run FastAPI with ngrok GUI (Option 1)

This cell contains the code to start the FastAPI server and the ngrok tunnel, and importantly, **it also includes the HTML/CSS/JavaScript code to display the chat UI directly within the Colab output.**

*   `async def run_fastapi_and_ui():`: An async function encapsulating the startup logic.
*   `port = 8000`: Defines the port on which FastAPI will run locally.
*   `public_url = ngrok.connect(port).public_url`: Uses `pyngrok` to start an ngrok tunnel that forwards traffic from a public ngrok URL to the local port 8000. It gets the generated public URL.
*   `print(...)`: Prints the ngrok public URL and the URL for the FastAPI interactive documentation (Swagger UI).
*   **`chat_ui_html = f"""..."""`**: This multiline f-string contains the full HTML, CSS (`<style>`), and JavaScript (`<script>`) for the chat interface.
    *   The HTML defines the basic structure: a container, header, message area, and input area.
    *   The CSS styles the elements to create a chat-like appearance, including fixing the header and input, making the message area scrollable (`overflow-y: auto`), and styling the message bubbles. Crucially, the `.chat-messages` CSS should ensure it can grow or has a defined height within the flex container to enable scrolling.
    *   The JavaScript handles user interaction:
        *   Getting references to UI elements.
        *   Storing `chatHistory` to maintain conversation context.
        *   `API_BASE_URL = '{public_url}'`: **This is key.** It dynamically inserts the ngrok public URL obtained from the Python code into the JavaScript, so the frontend knows where to send API requests.
        *   `sendMessage()`: An async function triggered by the send button or pressing Enter. It gets the user input, adds it to the UI, disables the input/button, shows a typing indicator, makes a `fetch` POST request to the `/query` endpoint of the FastAPI server (using `API_BASE_URL`), handles the response (adding the AI's answer to the UI and updating `chatHistory`), and finally re-enables input and hides the indicator.
        *   `addMessage()`: A helper function to append messages to the chat UI and ensure it scrolls to the bottom.
        *   Event listeners are attached to the send button click and input field keypress (for Enter).
*   **`display(HTML(chat_ui_html))`**: This is the command that renders the generated HTML directly in the Colab output cell. This is why you see the interactive chat interface appear below this cell's execution.
*   `uvicorn.run(app, host="0.0.0.0", port=port, log_level="info")`: This starts the uvicorn ASGI server, hosting the FastAPI application (`app`) on all available interfaces (`0.0.0.0`) on the specified `port`. **This call is blocking**, meaning the cell will appear to be continuously running as long as the server is active. Stopping the cell execution will stop the server.
*   `asyncio.run(run_fastapi_and_ui())`: Executes the main async function `run_fastapi_and_ui` in the Colab environment.

This cell provides a convenient way to run the entire application (backend and frontend UI display) from a single point in the notebook.
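
Embedding JavaScript in a Python f-string requires doubling every literal brace, while single braces mark the Python substitutions. A minimal sketch of that escaping, using a hypothetical `render_api_snippet` function and a placeholder URL instead of a real ngrok address:

```python
def render_api_snippet(public_url: str) -> str:
    """Return a small JavaScript snippet with the API base URL filled in by Python."""
    return f"""
const API_BASE_URL = '{public_url}';
async function ping() {{
    const response = await fetch(`${{API_BASE_URL}}/query`, {{ method: 'POST' }});
    return response.ok;
}}
"""


if __name__ == "__main__":
    # Placeholder URL; the notebook obtains the real one from ngrok.
    print(render_api_snippet("https://example.invalid"))
```

In the rendered output `{public_url}` is replaced by the URL, each `{{` or `}}` becomes a single brace, and `${{API_BASE_URL}}` becomes the JavaScript template-literal placeholder `${API_BASE_URL}`.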

In [None]:
# --- 9. Run FastAPI with ngrok GUI (Option 1) ---
# This part starts the server.

async def run_fastapi_and_ui():
    port = 8000
    public_url = ngrok.connect(port).public_url
    print(f"Ngrok Tunnel URL: {public_url}")
    print(f"Access FastAPI Docs at: {public_url}/docs")

    # Define the HTML/CSS/JavaScript for the chat UI
    # Note: Using f-string for public_url in JS, correctly escaped for template literals
    chat_ui_html = f"""
    <!DOCTYPE html>
    <html>
    <head>
        <title>IPsec Chat</title>
        <style>
            body {{ font-family: Arial, sans-serif; margin: 0; padding: 0; background-color: #f4f7f6; display: flex; justify-content: center; align-items: center; min-height: 100vh; overflow: hidden; /* Prevent body scroll */ }}
            .chat-container {{
                width: 100%;
                max-width: 700px;
                background-color: #fff;
                border-radius: 8px;
                box-shadow: 0 4px 8px rgba(0,0,0,0.1);
                overflow: hidden;
                display: flex;
                flex-direction: column;
                height: 80vh; /* Adjust height as needed */
                max-height: 90vh; /* Add max height for larger screens */
            }}
            .chat-header {{ background-color: #4CAF50; color: white; padding: 15px; text-align: center; font-size: 1.2em; flex-shrink: 0; /* Prevent header from shrinking */ }}
            .chat-messages {{
                flex-grow: 1;
                padding: 20px;
                overflow-y: auto; /* Make this area scrollable vertically */
                background-color: #e9ecef;
                border-bottom: 1px solid #ddd;
                scroll-behavior: smooth; /* Smooth scrolling */
                /* Optional: Define a min/max height here if flex-grow isn't enough */
                min-height: 0; /* Allow flex item to shrink below content size */
            }}
            .message {{ margin-bottom: 15px; display: flex; }}
            .message.user {{ justify-content: flex-end; }}
            .message.ai {{ justify-content: flex-start; }}
            .message-bubble {{
                max-width: 70%;
                padding: 10px 15px;
                border-radius: 20px;
                word-wrap: break-word;
                line-height: 1.4;
            }}
            .message.user .message-bubble {{ background-color: #DCF8C6; color: #333; }}
            .message.ai .message-bubble {{ background-color: #FFF; color: #333; border: 1px solid #ddd; }}
            .chat-input-area {{ display: flex; padding: 15px; border-top: 1px solid #ddd; background-color: #f9f9f9; flex-shrink: 0; /* Prevent input area from shrinking */ }}
            .chat-input {{ flex-grow: 1; padding: 10px; border: 1px solid #ccc; border-radius: 20px; outline: none; font-size: 1em; }}
            .send-button {{ background-color: #4CAF50; color: white; border: none; padding: 10px 20px; border-radius: 20px; margin-left: 10px; cursor: pointer; font-size: 1em; transition: background-color 0.3s ease; }}
            .send-button:hover {{ background-color: #45a049; }}
            .status-message {{ text-align: center; margin-top: 10px; font-size: 0.9em; color: #555; }}
            .typing-indicator {{ margin-left: 10px; font-size: 0.9em; color: #888; display: none; }}
        </style>
    </head>
    <body>
        <div class="chat-container">
            <div class="chat-header">IPsec Chatbot</div>
            <div class="chat-messages" id="chatMessages">
                <div class="message ai"><div class="message-bubble">Hello! I'm AI_Nishown, a multi-agent RAG system. Ask me anything about IPSec!</div></div>
            </div>
            <div class="chat-input-area">
                <input type="text" id="chatInput" class="chat-input" placeholder="Type your message...">
                <button id="sendButton" class="send-button">Send</button>
                <div id="typingIndicator" class="typing-indicator">Thinking...</div>
            </div>
        </div>

        <script>
            const chatMessages = document.getElementById('chatMessages');
            const chatInput = document.getElementById('chatInput');
            const sendButton = document.getElementById('sendButton');
            const typingIndicator = document.getElementById('typingIndicator');
            let chatHistory = []; // Stores conversation for context

            // Use the public_url from Python for API calls
            const API_BASE_URL = '{public_url}';

            async function sendMessage() {{
                const message = chatInput.value.trim();
                if (message === '') return;

                // Add user message to UI
                addMessage(message, 'user');
                chatInput.value = ''; // Clear input
                sendButton.disabled = true; // Disable button during processing
                typingIndicator.style.display = 'inline'; // Show typing indicator

                try {{
                    const response = await fetch(`${{API_BASE_URL}}/query`, {{ // Note the /query endpoint
                        method: 'POST',
                        headers: {{
                            'Content-Type': 'application/json'
                        }},
                        body: JSON.stringify({{
                            question: message,
                            chat_history: chatHistory
                        }})
                    }});

                    const data = await response.json();

                    if (response.ok) {{
                        addMessage(data.answer, 'ai'); // 'ai' matches the .message.ai CSS class
                        chatHistory = data.updated_chat_history; // Update chat history from backend
                    }} else {{
                        addMessage(`Error: ${{data.detail || response.statusText}}`, 'ai');
                    }}
                }} catch (error) {{
                    console.error('Error sending message:', error);
                    addMessage('Sorry, an error occurred. Please try again.', 'ai');
                }} finally {{
                    sendButton.disabled = false; // Re-enable button
                    typingIndicator.style.display = 'none'; // Hide typing indicator
                    chatMessages.scrollTop = chatMessages.scrollHeight; // Scroll to bottom
                }}
            }}

            function addMessage(text, sender) {{
                const messageDiv = document.createElement('div');
                messageDiv.classList.add('message', sender);
                const bubbleDiv = document.createElement('div');
                bubbleDiv.classList.add('message-bubble');
                bubbleDiv.style.whiteSpace = 'pre-wrap'; // Preserve newlines from the AI
                bubbleDiv.textContent = text; // textContent avoids interpreting the text as HTML

                messageDiv.appendChild(bubbleDiv);
                chatMessages.appendChild(messageDiv);
                chatMessages.scrollTop = chatMessages.scrollHeight; // Scroll to bottom
            }}

            sendButton.addEventListener('click', sendMessage);
            chatInput.addEventListener('keypress', function(e) {{
                if (e.key === 'Enter') {{
                    sendMessage();
                }}
            }});
        </script>
    </body>
    </html>
    """
    display(HTML(chat_ui_html))

    # Start Uvicorn server directly in the main Colab thread.
    # This will block the cell, but keep the server running and accessible via ngrok.
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=port, log_level="info")

# To run the FastAPI app and display the UI in Colab
# This will start the FastAPI server and keep the Colab cell "running"
# indefinitely as long as the server is active.
asyncio.run(run_fastapi_and_ui())

### Run FastAPI with ngrok HTTPS (Option 2 - Alternative)

This cell provides an alternative way to run the FastAPI application and ngrok tunnel, focusing on the server execution without embedding the UI HTML directly.

*   `async def run_fastapi():`: An async function to start the server.
*   `port = 8000`: Defines the local port.
*   `ngrok.connect(port)`: Starts the ngrok tunnel. In current `pyngrok` releases this call returns an `NgrokTunnel` object rather than a string, so the URL itself is read from the object's `public_url` attribute, as in Option 1.
*   `print(...)`: Prints the public ngrok URL and Swagger UI link.
*   `config = uvicorn.Config(app, host="0.0.0.0", port=port, log_level="info")`: Creates a Uvicorn configuration object for the FastAPI app.
*   `server = uvicorn.Server(config)`: Creates a Uvicorn server instance.
*   `await server.serve()`: Starts the Uvicorn server asynchronously. This line will block the cell execution until the server is stopped.
*   `#await run_fastapi()`: The call to execute the async function is commented out.

This cell is an alternative to the previous one if you wanted to run just the backend server and access it, for example, from an external application or perhaps a separate HTML file loaded elsewhere, rather than having the UI displayed directly in the Colab output.

In [None]:
# --- 9. Run FastAPI with ngrok HTTPS (Option 2) ---

print("FastAPI app defined. Now starting Uvicorn and ngrok...")

async def run_fastapi():
    port = 8000
    # Start ngrok tunnel
    public_url = ngrok.connect(port).public_url  # connect() returns a tunnel object; keep only its URL string
    print(f"Ngrok Tunnel URL: {public_url}")
    print(f"Access Swagger UI at: {public_url}/docs")

    # Run the Uvicorn server inside the current event loop.
    # `server.serve()` is a coroutine, so it can be awaited directly in a notebook cell.
    config = uvicorn.Config(app, host="0.0.0.0", port=port, log_level="info")
    server = uvicorn.Server(config)
    await server.serve()

# To run the FastAPI app in Colab
# Await the coroutine directly in the notebook environment
#await run_fastapi()

FastAPI app defined. Now starting Uvicorn and ngrok...


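### Examples for Option 2

These requests reach the server from inside the Colab VM; from another machine, replace `http://127.0.0.1:8000` with the ngrok public URL.

**First Query:**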
```
curl -X POST "http://127.0.0.1:8000/query" \
-H "Content-Type: application/json" \
-d '{
  "question": "What are the main components of IPSec?"
}'
```

**Second Query:**
```
curl -X POST "http://127.0.0.1:8000/query" \
-H "Content-Type: application/json" \
-d '{
  "question": "Tell me about the Authentication Header (AH) and Encapsulating Security Payload (ESP) in IPSec.",
  "chat_history": [
    {"type": "Human", "content": "What are the main components of IPSec?"},
    {"type": "AI_Nishown", "content": "IPSec has several key components, including security protocols like Authentication Header (AH) and Encapsulating Security Payload (ESP), Security Associations (SAs), and key management protocols like IKE. Would you like more detail on any of these?"}
  ]
}'
```
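
The same requests can be issued from Python. The sketch below uses only the standard library; `build_query_request` and `ask` are hypothetical helper names, and the 120-second timeout is an arbitrary choice to allow for a scrape during the request.

```python
import json
import urllib.request
from typing import Dict, List, Optional


def build_query_request(base_url: str, question: str,
                        chat_history: Optional[List[Dict[str, str]]] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /query endpoint."""
    payload = {"question": question, "chat_history": chat_history or []}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(base_url: str, question: str,
        chat_history: Optional[List[Dict[str, str]]] = None) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_query_request(base_url, question, chat_history)
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))
```

A follow-up question can reuse the history returned by the server, for example `first = ask(url, "What are the main components of IPSec?")` followed by `ask(url, "Tell me about AH and ESP.", first["updated_chat_history"])`, where `url` is the server's base URL.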

