<a href="https://colab.research.google.com/github/JessicaArauj/AI_Agent_with_Gemini/blob/main/AI_Agent_with_Gemini.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Install

In [None]:
!pip install -q --upgrade langchain langchain-google-genai google-generativeai
!pip install -q --upgrade langchain_community faiss-cpu langchain-text-splitters PyMuPDF
!pip install -q --upgrade langgraph

# Imports

In [None]:
from pathlib import Path
from typing import Dict, List, Literal, TypedDict, Optional

from google.colab import userdata
from pydantic import BaseModel, Field

from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.document_loaders import PyMuPDFLoader
from langchain_community.vectorstores import FAISS
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Variables

In [None]:
GOOGLE_API_KEY = userdata.get('GEMINI_API_KEY')

# Defining model and temperature

In [None]:
llm = ChatGoogleGenerativeAI(
    model = "gemini-2.5-flash",
    temperature = 1,
    api_key = GOOGLE_API_KEY
)

# Chat test

In [None]:
resp_test = llm.invoke("Who are you in detail?")
#print(resp_test)
print(resp_test.content)

I am a **large language model**, specifically a conversational AI.

Here's a more detailed breakdown:

1.  **My Core Identity:**
    *   **AI (Artificial Intelligence):** I am a computer program designed to simulate intelligent conversation.
    *   **Large Language Model (LLM):** This means I've been trained on a massive dataset of text and code. This training allows me to understand, process, and generate human-like language.
    *   **Trained by Google:** I am a product of Google's research and development in AI.

2.  **What I Can Do (My Capabilities):**
    *   **Understand and Generate Text:** I can comprehend your questions and prompts, and then generate relevant, coherent, and contextually appropriate responses.
    *   **Answer Questions:** I can provide information on a vast array of topics, drawing from the data I was trained on.
    *   **Provide Information:** I can explain concepts, define terms, provide facts, and offer summaries.
    *   **Engage in Conversation:** I can

# Screening prompt

In [None]:
screening_prompt = (
    "You are a service desk triage agent for internal policies at Lanx Capital. "
    "Given the user's message, return ONLY a JSON object with:\n"
    "{\n"
    '  "decision": "HIGH_SCALABLE" | "REQUEST_INFORMATION" | "OPEN_TICKET",\n'
    '  "urgency": "LOW" | "MEDIUM" | "HIGH",\n'
    '  "missing_fields": ["..."]\n'
    "}\n"
    "Rules:\n"
    ' - "HIGH_SCALABLE": objective, clear questions about rules or procedures '
    'described in the policies (e.g. "I need a refund for my home office internet", '
    '"How does the food policy work when traveling?")\n'
    ' - "REQUEST_INFORMATION": a vague message, or one that lacks the information needed to '
    'identify the topic or context (e.g. "I need help with a policy", '
    '"I have a general question")\n'
    ' - "OPEN_TICKET": requests for an exception, release, approval, or special access, '
    'or an explicit request to open a ticket '
    '(e.g. "I want an exception to work remotely for 5 days")\n'
    "Analyze the message and decide on the most appropriate action."
)

In [None]:
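class screeningOut(BaseModel):
    # Schema that with_structured_output asks the model to fill.
    decision: Literal["HIGH_SCALABLE", "REQUEST_INFORMATION", "OPEN_TICKET"]
    urgency: Literal["LOW", "MEDIUM", "HIGH"]
    missing_fields: List[str] = Field(default_factory=list)  # details the user still has to provide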
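`with_structured_output` lets LangChain parse the model's reply straight into `screeningOut`, so the JSON layout in the prompt mainly guides the model. If you ever need to validate a raw JSON reply yourself (for example, with a model that does not support structured output), a minimal stdlib-only sketch of the same checks could look like this. `parse_screening_reply` is a hypothetical helper, not part of LangChain or Pydantic.

```python
import json

# Allowed values mirror the Literal types declared in screeningOut.
DECISIONS = {"HIGH_SCALABLE", "REQUEST_INFORMATION", "OPEN_TICKET"}
URGENCIES = {"LOW", "MEDIUM", "HIGH"}


def parse_screening_reply(raw: str) -> dict:
    """Hypothetical helper: parse a raw JSON triage reply and validate its fields."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    decision = data.get("decision")
    urgency = data.get("urgency")
    if decision not in DECISIONS:
        raise ValueError(f"invalid decision: {decision!r}")
    if urgency not in URGENCIES:
        raise ValueError(f"invalid urgency: {urgency!r}")
    # A missing or null field becomes an empty list (screeningOut also defaults to []).
    missing = data.get("missing_fields") or []
    if not isinstance(missing, list) or not all(isinstance(f, str) for f in missing):
        raise ValueError("missing_fields must be a list of strings")
    return {"decision": decision, "urgency": urgency, "missing_fields": missing}


reply = '{"decision": "REQUEST_INFORMATION", "urgency": "LOW", "missing_fields": ["policy name"]}'
print(parse_screening_reply(reply))
# → {'decision': 'REQUEST_INFORMATION', 'urgency': 'LOW', 'missing_fields': ['policy name']}
```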

# Screening LLM

In [None]:
llm_screening = ChatGoogleGenerativeAI(
    model = "gemini-2.5-flash",
    temperature = 1,
    api_key = GOOGLE_API_KEY
)

# Screening function

In [None]:
screening_chain = llm_screening.with_structured_output(screeningOut)

def screening(message: str) -> Dict:
    # Send the triage instructions plus the user's message; the chain returns a screeningOut instance.
    output: screeningOut = screening_chain.invoke([
        SystemMessage(content=screening_prompt),
        HumanMessage(content=message),
    ])

    return output.model_dump()

# Examples

In [None]:
tests = ("What are the general principles questions?",
         "What are the Segregation of Activities Policy questions?",
         "What are the Information Security and Confidentiality Policy questions?",
         "What are the Personal Investments Policy questions?",
         "What are the Order Allotment Policy questions?",
         "What are the Compliance, Risk Management, and Internal Controls Policy questions?",
         "What are the Anti-Money Laundering Policy questions?",
         "What are the Training questions?"
        )

# Interaction

In [None]:
for message_tests in tests:
    print(
        f"Questions: {message_tests}\n"
        f" -> Response: {screening(message_tests)}\n"
    )

Questions: What are the general principles questions?
 -> Response: {'decision': 'REQUEST_INFORMATION', 'urgency': 'LOW', 'missing_fields': []}

Questions: What are the Segregation of Activities Policy questions?
 -> Response: {'decision': 'HIGH_SCALABLE', 'urgency': 'LOW', 'missing_fields': []}

Questions: What are the Information Security and Confidentiality Policy questions?
 -> Response: {'decision': 'HIGH_SCALABLE', 'urgency': 'LOW', 'missing_fields': []}

Questions: What are the Personal Investments Policy questions?
 -> Response: {'decision': 'HIGH_SCALABLE', 'urgency': 'LOW', 'missing_fields': []}

Questions: What are the Order Allotment Policy questions?
 -> Response: {'decision': 'HIGH_SCALABLE', 'urgency': 'LOW', 'missing_fields': []}

Questions: What are the Compliance, Risk Management, and Internal Controls Policy questions?
 -> Response: {'decision': 'HIGH_SCALABLE', 'urgency': 'LOW', 'missing_fields': []}

Questions: What are the Anti-Money Laundering Policy questions?
 

# Reading and loading documents

In [None]:
docs = []

file_count = 0

# Load every PDF in /content/ (the Colab working directory); each page becomes one Document.
for pdf_path in Path("/content/").glob("*.pdf"):
    try:
        loader = PyMuPDFLoader(str(pdf_path))
        docs.extend(loader.load())
        file_count += 1
        print(f"File uploaded successfully: {pdf_path.name}")
    except Exception as e:
        print(f"Could not load {pdf_path.name}: {e}")

print(f"Total files uploaded: {file_count}")
print(f"Total pages loaded: {len(docs)}")

File uploaded successfully: Manual de Ética, Conduta e Politicas Internas Lanx Capital.pdf
Total files uploaded: 1
Total pages loaded: 18


# Splitting documents into chunks

In [None]:
splitter = RecursiveCharacterTextSplitter(chunk_size = 300, chunk_overlap = 30)

chunks = splitter.split_documents(docs)
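`chunk_size = 300` and `chunk_overlap = 30` are measured in characters (the splitter's default `length_function` is `len`). `RecursiveCharacterTextSplitter` first tries to cut on paragraph breaks, then line breaks, then spaces, and only then between characters, so real chunks are often shorter than 300 characters. The sketch below ignores those separators and shows only the sliding-window idea behind the two parameters; `sliding_window_chunks` is a hypothetical helper, not the LangChain algorithm.

```python
def sliding_window_chunks(text: str, chunk_size: int = 300, chunk_overlap: int = 30) -> list[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # each new window starts 270 characters after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(650))
parts = sliding_window_chunks(sample)
print([len(p) for p in parts])  # → [300, 300, 110]
```

The 30 characters shared by neighbouring chunks reduce the chance that a sentence relevant to a question is cut in half with neither half retrievable on its own.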

In [None]:
for chunk in chunks:
    print(chunk)
    print("\n")

page_content='MANUAL DE ÉTICA, CONDUTA
E POLÍTICAS INTERNAS
Setembro de 2019' metadata={'producer': 'Nitro Pro  (11. 0. 3. 134)', 'creator': 'Nitro Pro  (11. 0. 3. 134)', 'creationdate': '2019-09-27T17:24:32+00:00', 'source': '/content/Manual de Ética, Conduta e Politicas Internas Lanx Capital.pdf', 'file_path': '/content/Manual de Ética, Conduta e Politicas Internas Lanx Capital.pdf', 'total_pages': 18, 'format': 'PDF 1.4', 'title': '', 'author': '', 'subject': '', 'keywords': '', 'moddate': '2019-09-27T14:24:35-03:00', 'trapped': '', 'modDate': "D:20190927142435-03'00'", 'creationDate': 'D:20190927172432Z', 'page': 0}


page_content='2
Índice
I.
INTRODUÇÃO .............................................................................................................................. 3
1.
Conteúdo................................................................................................................................. 3
2.' metadata={'producer': 'Nitro Pro  (11. 0. 3. 134)', 'crea

# Embeddings

In [None]:
embeddings = GoogleGenerativeAIEmbeddings(
    model = "models/gemini-embedding-001",
    google_api_key = GOOGLE_API_KEY
)

# Creating vector store

In [None]:
vectorstore = FAISS.from_documents(chunks, embeddings)

retriever = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.3, "k": 4},
)
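With `search_type="similarity_score_threshold"`, the retriever takes at most `k = 4` chunks and then drops any whose relevance score is below `0.3`; if nothing passes, it returns an empty list, which `question_political_RAG` treats as "I don't know". LangChain's FAISS wrapper derives that relevance score from the raw FAISS distance (L2 by default), so real scores differ from the cosine similarities in this toy sketch. `retrieve_with_threshold` and the 2-D vectors are illustrative assumptions, not real Gemini embeddings.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_with_threshold(query_vec, doc_vecs, k=4, score_threshold=0.3):
    """Take the k best-scoring documents, then keep only those scoring >= score_threshold."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [(doc_id, score) for doc_id, score in scored[:k] if score >= score_threshold]


doc_vecs = {"travel": [1.0, 0.0], "security": [0.0, 1.0], "mixed": [1.0, 1.0]}
hits = retrieve_with_threshold([1.0, 0.1], doc_vecs)
print([doc_id for doc_id, _ in hits])  # → ['travel', 'mixed']
```

Raising the threshold makes "I don't know" more frequent but reduces answers built on loosely related chunks; lowering it does the opposite.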

# Prompts

In [None]:
prompt_rag = ChatPromptTemplate.from_messages([
    (
        "system",
        "You are a service desk triage agent for internal policies at Lanx Capital. "
        "Answer ONLY based on the context provided. "
        "If the context is not enough to answer, reply exactly 'I don't know'.",
    ),
    (
        "human",
        # Template variables use curly braces; create_stuff_documents_chain fills {context}.
        "Question: {input}\n\nContext:\n{context}",
    ),
])

document_chain = create_stuff_documents_chain(llm, prompt_rag)
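`create_stuff_documents_chain` "stuffs" every retrieved chunk into the `context` variable: by default it renders each document as its `page_content` and joins them with a blank line (`"\n\n"`) before filling the prompt. Prompt variables must be written in curly braces (`{input}`, `{context}`); text in parentheses is sent to the model literally. A stdlib sketch of the same formatting step, with a hypothetical `render_human_message` helper and plain strings standing in for LangChain `Document` objects:

```python
HUMAN_TEMPLATE = "Question: {input}\n\nContext:\n{context}"


def render_human_message(question: str, chunks: list[str], separator: str = "\n\n") -> str:
    """Join all chunk texts into one context string and fill the human-message template."""
    context = separator.join(chunks)
    return HUMAN_TEMPLATE.format(input=question, context=context)


print(render_human_message("Can employees trade stocks personally?", ["First chunk.", "Second chunk."]))
```

Because every chunk goes into a single prompt, "stuffing" works well for a handful of short chunks such as the `k = 4` retrieved here, but grows expensive as `k` or `chunk_size` increases.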


In [None]:
def question_political_RAG(question: str) -> Dict:
    docs_related = retriever.invoke(question)

    if not docs_related:
        return {
            "answer": "I don't know",
            "quotes": [],
            "context_found": False,
        }

    answer_user = document_chain.invoke(
        {"input": question, "context": docs_related}
    )

    txt = (answer_user or "").strip()

    if txt.rstrip(".!?") == "I don't know":
        return {
            "answer": txt,
            "quotes": [],
            "context_found": False,
        }

    return {
        "answer": txt,
        "quotes": docs_related,
        "context_found": True,
    }
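The exact-match check `txt.rstrip(".!?") == "I don't know"` misses replies that use a typographic apostrophe (’), different capitalization, or surrounding quotes. A slightly more tolerant check, sketched as a hypothetical `is_unknown_answer` helper that could replace that comparison:

```python
def is_unknown_answer(text: str) -> bool:
    """True if the reply is a bare "I don't know", ignoring case, quotes, and final punctuation."""
    normalized = (text or "").strip().strip("\"'").rstrip(".!?").strip()
    normalized = normalized.replace("\u2019", "'").lower()  # map ’ to a plain apostrophe
    return normalized == "i don't know"


print(is_unknown_answer("I don\u2019t know."))  # → True
print(is_unknown_answer("I don't know the exact limit, but the policy says..."))  # → False
```

Only a bare refusal counts; a partial answer that happens to start with "I don't know" is still treated as an answer.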

# Test RAG

In [None]:
tests_1 = ("What are the general principles questions?",
           "What are the Segregation of Activities Policy questions?",
           "What are the Information Security and Confidentiality Policy questions?",
           "What are the Personal Investments Policy questions?",
           "What are the Order Allotment Policy questions?",
           "What are the Compliance, Risk Management, and Internal Controls Policy questions?",
           "What are the Anti-Money Laundering Policy questions?",
           "What are the Training questions?"
        )

In [None]:
for message_tests in tests_1:
    response = question_political_RAG(message_tests)

    # question_political_RAG returns the text under the "answer" key.
    print(f"Question: {message_tests}\n"
          f" -> Response: {response['answer']}")

    if response["context_found"]:
        print(f" -> Quotes: {response['quotes']}\n")
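Printing `response['quotes']` dumps whole `Document` objects, metadata included. A more readable option is to show the page and a short snippet of each chunk. PyMuPDFLoader stores a zero-based `page` in the metadata (the first chunk printed earlier has `'page': 0`), so the sketch adds 1. `FakeDoc` is a hypothetical stand-in for LangChain's `Document`, with only the `page_content` and `metadata` attributes used here.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    # Hypothetical stand-in for langchain_core.documents.Document.
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_quotes(docs, max_chars: int = 80) -> list[str]:
    """Render each retrieved chunk as 'p. N: snippet' with a one-based page number."""
    lines = []
    for doc in docs:
        page = doc.metadata.get("page")
        label = f"p. {page + 1}" if isinstance(page, int) else "p. ?"
        snippet = " ".join(doc.page_content.split())  # collapse line breaks and repeated spaces
        if len(snippet) > max_chars:
            snippet = snippet[: max_chars - 3] + "..."
        lines.append(f"{label}: {snippet}")
    return lines


docs_demo = [FakeDoc("MANUAL DE ÉTICA, CONDUTA\nE POLÍTICAS INTERNAS", {"page": 0})]
print(format_quotes(docs_demo))  # → ['p. 1: MANUAL DE ÉTICA, CONDUTA E POLÍTICAS INTERNAS']
```

In the test loop, `for line in format_quotes(response["quotes"]): print(f" -> {line}")` would replace the raw dump.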

# Agent state

In [None]:
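class AgentState(TypedDict, total=False):
    question: str                   # the user's message, read by every node
    screening: dict                 # output of screening()
    question_political_RAG: dict    # not set by the nodes in this section
    response: Optional[str]         # text returned to the user
    quotes: list                    # retrieved chunks (LangChain Document objects)
    rag_success: bool               # True when the RAG step found supporting context
    action_finish: str              # final action taken by the agent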
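Each node returns only the keys it changes, and `total=False` lets those partial dictionaries type-check as `AgentState`. In LangGraph, a state key without a reducer annotation is simply overwritten by the latest value a node returns. The sketch below imitates that merge with plain dictionaries; `apply_update` is a hypothetical helper, not LangGraph's implementation.

```python
def apply_update(state: dict, update: dict) -> dict:
    """Overwrite every key present in the update and keep all other keys unchanged."""
    return {**state, **update}  # builds a new dict, so the previous state is not mutated


state = {"question": "How does the food policy work when traveling?"}
state = apply_update(state, {"screening": {"decision": "HIGH_SCALABLE", "urgency": "LOW", "missing_fields": []}})
state = apply_update(state, {"response": "...", "rag_success": True, "action_finish": "HIGH_SCALABLE"})
print(sorted(state))  # → ['action_finish', 'question', 'rag_success', 'response', 'screening']
```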

# Screening node

In [None]:
def screening_node(state: AgentState) -> AgentState:
    print("Running node screening...")

    return {"screening": screening(state["question"])}
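The screening result decides which node runs next. In LangGraph this is usually wired with `add_conditional_edges`, passing a function that reads the state and returns the name of the next node. A minimal sketch of such a router, assuming the nodes are registered under the hypothetical names `"high_scalable"`, `"request_information"`, and `"open_ticket"`:

```python
# Hypothetical node names; they must match the names used when adding nodes to the graph.
ROUTES = {
    "HIGH_SCALABLE": "high_scalable",
    "REQUEST_INFORMATION": "request_information",
    "OPEN_TICKET": "open_ticket",
}


def route_after_screening(state: dict) -> str:
    """Map the screening decision to the name of the next node."""
    decision = (state.get("screening") or {}).get("decision")
    # Ask for more details if the decision is missing or unexpected.
    return ROUTES.get(decision, "request_information")


print(route_after_screening({"screening": {"decision": "OPEN_TICKET"}}))  # → open_ticket
```

With a `StateGraph(AgentState)` named `graph`, this would be attached as `graph.add_conditional_edges("screening", route_after_screening)`; falling back to a request for information is a design assumption, chosen because it never answers or escalates on an unclear decision.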

# High scalable node

In [None]:
def high_scalable_node(state: AgentState) -> AgentState:
    print("Running node high scalable...")

    response_rag = question_political_RAG(state["question"])

    update: AgentState = {
        "response": response_rag["answer"],
        "quotes": response_rag.get("quotes", []),
        "rag_success": response_rag["context_found"],
    }

    # Mark the request as resolved only when the answer is grounded in the policies.
    if response_rag["context_found"]:
        update["action_finish"] = "HIGH_SCALABLE"

    return update
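`high_scalable_node` sets `action_finish` only when the RAG step found supporting context, which leaves room for a follow-up step when it did not. One possible design, stated here as an assumption rather than the notebook's own wiring, is to escalate unanswered questions to the ticket node. `"end"` is a placeholder label; in LangGraph it would be mapped to the `END` constant through the `path_map` argument of `add_conditional_edges`.

```python
def route_after_rag(state: dict) -> str:
    """Hypothetical router: finish if RAG answered from the policies, otherwise open a ticket."""
    if state.get("rag_success"):
        return "end"
    return "open_ticket"


print(route_after_rag({"rag_success": False}))  # → open_ticket
```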

# Request information node

In [None]:
def request_information_node(state: AgentState) -> AgentState:
    print("Running node request information...")

    # Ask for the fields the screening step flagged as missing, or for general context.
    missing = state["screening"].get("missing_fields", [])
    detail = (
        ", ".join(missing)
        if missing
        else "the specific topic and context"
    )

    return {
        "response": f"Please provide more information about {detail}.",
        "quotes": [],
        "action_finish": "REQUEST_INFORMATION",
    }

# Open ticket node

In [None]:
def open_ticket_node(state: AgentState) -> AgentState:
    print("Running node open ticket...")


