# Lesson 4
## New .env variables
You need to add three new variables to your .env file, if you haven't done so yet:

```env
HOSPITAL_AGENT_MODEL=gpt-3.5-turbo-1106
HOSPITAL_CYPHER_MODEL=gpt-3.5-turbo-1106
HOSPITAL_QA_MODEL=gpt-3.5-turbo-0125
```
Your .env file now includes variables that specify which LLM you’ll use for different components of your chatbot. You’ve specified these models as environment variables so that you can easily switch between different OpenAI models without changing any code. Keep in mind, however, that each LLM might benefit from a unique prompting strategy, so you might need to modify your prompts if you plan on using a different suite of LLMs.
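
As a minimal sketch of how these variables might be consumed, the helper below reads the model name for a given component from the environment. The function name `get_model_name` and the fallback default are assumptions made for illustration, and the sketch assumes the `.env` file has already been loaded into the process environment (for example with `python-dotenv`).

```python
import os


def get_model_name(component: str, default: str = "gpt-3.5-turbo-0125") -> str:
    """Return the model configured for a component: 'agent', 'cypher', or 'qa'.

    Hypothetical helper: the variable naming follows the .env entries above,
    while the fallback default is an assumption made for this sketch.
    """
    return os.getenv(f"HOSPITAL_{component.upper()}_MODEL", default)


# Simulate a loaded .env file for the purpose of the example.
os.environ["HOSPITAL_CYPHER_MODEL"] = "gpt-3.5-turbo-1106"
cypher_model = get_model_name("cypher")
```

Switching models then only requires editing the `.env` file, which is the point of keeping the names out of the code.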

## RAG search.
We will demonstrate how to implement a RAG pipeline where user questions about hospital reviews are answered using both retrieval (vector search) and generation (LLM).
To do that, we will create a document chain plus an LLM chain that together implement RAG to handle queries about the reviews, and we will use Neo4j as the vector database.

### Vector Search Indexes
Vector databases are specifically designed to store and query high-dimensional vectors, which are numerical representations of unstructured data like text, images, or audio. Vector search indexes enable efficient similarity searches (semantic searches), relying on the fact that semantically similar objects are represented by vectors that are close to each other in the vector space.
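
To illustrate what "close to each other in the vector space" means, here is a small sketch that ranks texts by cosine similarity. The three-dimensional vectors are made-up toy values, not real embeddings, and a vector index uses far more efficient search structures than this exhaustive comparison.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings"; real models produce hundreds or thousands of dimensions.
documents = {
    "The nurses were kind and attentive": [0.9, 0.1, 0.0],
    "Staff treated me with great care": [0.8, 0.2, 0.1],
    "The parking garage was full": [0.0, 0.1, 0.9],
}
query_vector = [0.85, 0.15, 0.05]  # pretend embedding of "Was the staff friendly?"

# Sort the texts from most to least similar to the query.
ranked = sorted(
    documents,
    key=lambda text: cosine_similarity(query_vector, documents[text]),
    reverse=True,
)
```

The two staff-related sentences rank above the parking one even though they share few words with the query, which is the behaviour a semantic search relies on.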

Neo4j, a graph database, has added vector search capabilities. Neo4j's [Vector search indexes](https://neo4j.com/docs/cypher-manual/current/indexes-for-vector-search/) allow developers to create vector indexes. These indexes work with node or relationship properties that contain vector embeddings.

### Creating the embeddings
In LangChain, you can use Neo4jVector to create the review embeddings; it returns a "vector store" that you can query. The code that creates the index is shown after the parameter descriptions.

The method takes several parameters:
- embedding: an instance of OpenAIEmbeddings, which generates the embeddings for the text data. Which embedding model to use depends on your project: the type of text data, the accuracy you need, and the computational resources available. We use OpenAI here because we already have an OpenAI API key, but LangChain offers other implementations of its Embeddings interface, for example:
    - HuggingFaceEmbeddings: generates embeddings locally with sentence-transformers models from Hugging Face
    - a custom class that implements the Embeddings interface, which lets you plug in your own embedding model (Word2Vec, GloVe, FastText, BERT, and so on)
- url, username, password: these are used to connect to the Neo4j database, and are retrieved from environment variables (NEO4J_URI, NEO4J_USERNAME, NEO4J_PASSWORD).
- index_name: the name of the vector index to create ("reviews").
- node_label: the label of the nodes in the graph database that will be indexed ("Review").
- text_node_properties: a list of properties on the nodes that contain text data to be indexed (["physician_name", "patient_name", "text", "hospital_name"]).
- embedding_node_property: the property on the nodes where the generated embeddings will be stored ("embedding").

---
> ⚠️ **Note:** `from_existing_graph` is primarily used when your text data is already in the Neo4j graph, but the embeddings and the vector index might not yet exist or need to be generated by the LangChain process. It seems to automate querying Neo4j to get the relevant text data, calculating embeddings for that text, storing those newly generated embeddings back into the specified node property in Neo4j, and creating the necessary vector index if it doesn't exist. Therefore, if the embeddings and the corresponding vector index already exist and are fully populated in your Neo4j database, the method you would typically use is `Neo4jVector.from_existing_index`, not `from_existing_graph`.
---



In [8]:
import os
from langchain_community.vectorstores.neo4j_vector import Neo4jVector
from langchain_openai import OpenAIEmbeddings


neo4j_vector_index = Neo4jVector.from_existing_graph(
    embedding=OpenAIEmbeddings(),
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"),
    password=os.getenv("NEO4J_PASSWORD"),
    index_name="reviews",
    node_label="Review",
    text_node_properties=[
        "physician_name",
        "patient_name",
        "text",
        "hospital_name",
    ],
    embedding_node_property="embedding",
)



In [2]:
neo4j_vector_index

<langchain_community.vectorstores.neo4j_vector.Neo4jVector at 0x7fcbf4cc10d0>

If you filled the database in the previous lesson, after running this you will see that the embeddings have been created and indexed for the review text:

![embeddings.png](embeddings.png)

This allows you to answer questions like "Which hospitals have had positive reviews?" It also allows the LLM to tell you which patient and physician wrote reviews matching your question.

### The retriever
To use Retrieval Augmented Generation, we need to create a retriever based on the vector store we just created.

In [9]:

from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI
from langchain.prompts import (
    PromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    ChatPromptTemplate,
)
HOSPITAL_QA_MODEL = os.getenv("HOSPITAL_QA_MODEL") # gpt-3.5-turbo-0125

# chat prompt template with system and human messages
review_template = """Your job is to use patient
reviews to answer questions about their experience at a hospital. Use
the following context to answer questions. Be as detailed as possible, but
don't make up any information that's not from the context. If you don't know
an answer, say you don't know.
{context}
"""

review_system_prompt = SystemMessagePromptTemplate(
    prompt=PromptTemplate(input_variables=["context"], template=review_template)
)

review_human_prompt = HumanMessagePromptTemplate(
    prompt=PromptTemplate(input_variables=["question"], template="{question}")
)
messages = [review_system_prompt, review_human_prompt]

review_prompt = ChatPromptTemplate(
    input_variables=["context", "question"], messages=messages
)

# the document chain, based on our vector index
reviews_vector_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model=HOSPITAL_QA_MODEL, temperature=0),
    chain_type="stuff",
    # k is a search setting, so it is passed through search_kwargs
    retriever=neo4j_vector_index.as_retriever(search_kwargs={"k": 12}),
)

# the complete review chain: replace the default prompt with our review prompt
reviews_vector_chain.combine_documents_chain.llm_chain.prompt = review_prompt

The last part sets up a RetrievalQA chain, which is built around what LangChain calls a document chain.

**The retriever/document chain**

* `reviews_vector_chain` is the instance of `RetrievalQA`.
* `from_chain_type` defines the chain type, that is, how the retrieved documents are handed to the LLM (all of them in full, only summaries, one at a time, ...). In this case the chain type is `"stuff"`, the standard one, which passes all retrieved documents as context in a single prompt. The available chain types are `"stuff"`, `"map_reduce"`, `"refine"`, and `"map_rerank"`; the last three are more complex alternatives that help avoid overflowing the context window.
* The `llm` parameter specifies the language model used for question answering, together with its temperature.
* The `retriever` parameter specifies the retriever that fetches the relevant documents from the database. In this case it is `neo4j_vector_index.as_retriever(search_kwargs={"k": 12})`, a retriever that uses the Neo4j vector index to gather documents. The `k` entry in `search_kwargs` specifies the number of documents to retrieve (12).

**Combining the documents chain and LLM chain**

* `combine_documents_chain` is the attribute of the `RetrievalQA` chain that holds the documents chain, which merges the retrieved documents into the prompt. Its `llm_chain` attribute is the LLM chain that uses the language model to generate answers.
* The `llm_chain.prompt` attribute holds the prompt used by the LLM chain. In this case, it's set to the `review_prompt` instance that was defined earlier.
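
To make the "stuff" strategy concrete, here is a simplified sketch in plain Python of what happens conceptually: every retrieved document is concatenated into one context string, which then fills the `{context}` slot of the prompt. The `stuff_documents` helper, the shortened template, and the sample reviews are hypothetical; this is not LangChain's actual implementation.

```python
def stuff_documents(documents: list[str], separator: str = "\n\n") -> str:
    """Join all retrieved documents into one context string."""
    return separator.join(documents)


# Shortened stand-in for the system prompt used in this lesson.
system_template = (
    "Use the following context to answer questions.\n"
    "{context}"
)

# Hypothetical retrieved reviews standing in for the retriever's output.
retrieved = [
    "The staff was efficient and the facilities were clean.",
    "I waited a long time before being seen.",
]

system_message = system_template.format(context=stuff_documents(retrieved))
```

Because everything is sent in one prompt, the context grows with the number of retrieved documents, which is why the other chain types exist for larger result sets.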


In [10]:
query = """What have patients said about hospital efficiency?
        Mention details from specific reviews."""

response = reviews_vector_chain.invoke(query)
response.get("result")

"Patients have generally praised the efficiency of the hospital staff and the cleanliness of the facilities. For example, Kevin Cox mentioned that the hospital staff at Wallace-Hamilton was efficient, and the facilities were clean. Similarly, Kim Franklin also noted that the hospital staff at Wallace-Hamilton was efficient, and the facilities were clean. These reviews highlight a positive aspect of the hospital's operations in terms of efficiency and cleanliness."

## DB query search.
In this part we will create the Cypher generation chain that you’ll use to answer queries about structured hospital system data.

The Neo4j Cypher chain will accept a user’s natural language query, convert it to a Cypher query, run the Cypher query in Neo4j, and use the query results to respond to the user’s query. You’ll leverage LangChain’s `GraphCypherQAChain` for this.

If we were using a regular SQL database, we would use LangChain's `SQLDatabase` instead of `Neo4jGraph` and `SQLDatabaseChain` instead of `GraphCypherQAChain`.

### DB Connection neo4j-style
To get started, we first need a database connection that the LLM can use, created with `Neo4jGraph`:

In [11]:
import os
from langchain_community.graphs import Neo4jGraph

graph = Neo4jGraph(
    url=os.getenv("NEO4J_URI"),
    username=os.getenv("NEO4J_USERNAME"),
    password=os.getenv("NEO4J_PASSWORD"),
)

graph.refresh_schema()

The Neo4jGraph object is a LangChain wrapper that allows LLMs to execute queries on your Neo4j instance. You instantiate graph using your Neo4j credentials, and you call graph.refresh_schema() to sync any recent changes to your instance.

### Cypher generation template
We want the LLM to generate a Cypher query (Cypher is Neo4j's query language) based on a human prompt.

This needs heavy prompt engineering and serious trial and error:
- We need the answer to contain the query and only the query.
- It should only read data, never insert, update, or delete.
- We would rather it use aliases, so the query is readable for debugging purposes.
- We need to tell it beforehand how certain statuses and business particulars that are not self-evident from the schema map to the DB, e.g.:
    - A visit is concluded if it has a discharge date.
    - A payment was rejected/pending if it does not have the service_date in the COVERED_BY relationship.
    - Use abbreviations when filtering on hospital states (e.g. "Texas" is "TX", "Colorado" is "CO", "North Carolina" is "NC", ...).
    - List some categorical values of a few node properties so it knows them beforehand.
    - etc.
    
All of the detail you provide in your prompt template improves the LLM’s chance of generating a correct Cypher query for a given question. If you’re curious about how necessary all this detail is, try creating your own prompt template with as few details as possible. Then run questions through your Cypher chain and see whether it correctly generates Cypher queries.
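
Since a prompt instruction alone cannot guarantee that the LLM never writes to the database, one possible extra safeguard is to inspect the generated query before running it. The sketch below is a hypothetical helper, not part of LangChain or Neo4j: the keyword list is an assumption and may not be exhaustive, and the check is deliberately conservative, so it can reject a valid read query that merely mentions one of these words.

```python
import re

# Cypher keywords that modify the graph. This list is an assumption for the
# sketch and does not cover every way of writing (e.g. procedures).
WRITE_CLAUSES = re.compile(
    r"\b(CREATE|MERGE|DELETE|DETACH|SET|REMOVE|DROP)\b", re.IGNORECASE
)


def is_read_only(cypher: str) -> bool:
    """Return True if no write keyword appears anywhere in the query."""
    return WRITE_CLAUSES.search(cypher) is None


safe = is_read_only("MATCH (p:Patient) RETURN p.name LIMIT 1")
unsafe = is_read_only("MATCH (n) DETACH DELETE n")
```

A more robust option is to connect with a database user that only has read permissions, so that the restriction is enforced by Neo4j itself.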

In [12]:
from langchain.prompts import PromptTemplate

cypher_generation_template = """
Task:
Generate Cypher query for a Neo4j graph database.

Instructions:
Use only the provided relationship types and properties in the schema.
Do not use any other relationship types or properties that are not provided.

Schema:
{schema}

Note:
Do not include any explanations or apologies in your responses.
Do not respond to any questions that might ask anything other than
for you to construct a Cypher statement. Do not include any text except
the generated Cypher statement. Make sure the direction of the relationship is
correct in your queries. Make sure you alias both entities and relationships
properly. Do not run any queries that would add to or delete from
the database. Make sure to alias all statements that follow as with
statement (e.g. WITH v as visit, c.billing_amount as billing_amount)
If you need to divide numbers, make sure to
filter the denominator to be non zero.

Examples:
# Who is the oldest patient and how old are they?
MATCH (p:Patient)
RETURN p.name AS oldest_patient,
       duration.between(date(p.dob), date()).years AS age
ORDER BY age DESC
LIMIT 1

# Which physician has billed the least to Cigna
MATCH (p:Payer)<-[c:COVERED_BY]-(v:Visit)-[t:TREATS]-(phy:Physician)
WHERE p.name = 'Cigna'
RETURN phy.name AS physician_name, SUM(c.billing_amount) AS total_billed
ORDER BY total_billed
LIMIT 1

# Which state had the largest percent increase in Cigna visits
# from 2022 to 2023?
MATCH (h:Hospital)<-[:AT]-(v:Visit)-[:COVERED_BY]->(p:Payer)
WHERE p.name = 'Cigna' AND v.admission_date >= '2022-01-01' AND
v.admission_date < '2024-01-01'
WITH h.state_name AS state, COUNT(v) AS visit_count,
     SUM(CASE WHEN v.admission_date >= '2022-01-01' AND
     v.admission_date < '2023-01-01' THEN 1 ELSE 0 END) AS count_2022,
     SUM(CASE WHEN v.admission_date >= '2023-01-01' AND
     v.admission_date < '2024-01-01' THEN 1 ELSE 0 END) AS count_2023
WITH state, visit_count, count_2022, count_2023,
     (toFloat(count_2023) - toFloat(count_2022)) / toFloat(count_2022) * 100
     AS percent_increase
RETURN state, percent_increase
ORDER BY percent_increase DESC
LIMIT 1

# How many non-emergency patients in North Carolina have written reviews?
MATCH (r:Review)<-[:WRITES]-(v:Visit)-[:AT]->(h:Hospital)
WHERE h.state_name = 'NC' and v.admission_type <> 'Emergency'
RETURN count(*)

String category values:
Test results are one of: 'Inconclusive', 'Normal', 'Abnormal'
Visit statuses are one of: 'OPEN', 'DISCHARGED'
Admission Types are one of: 'Elective', 'Emergency', 'Urgent'
Payer names are one of: 'Cigna', 'Blue Cross', 'UnitedHealthcare', 'Medicare',
'Aetna'

A visit is considered open if its status is 'OPEN' and the discharge date is
missing.
Use abbreviations when
filtering on hospital states (e.g. "Texas" is "TX",
"Colorado" is "CO", "North Carolina" is "NC",
"Florida" is "FL", "Georgia" is "GA", etc.)

Make sure to use IS NULL or IS NOT NULL when analyzing missing properties.
Never return embedding properties in your queries. You must never include the
statement "GROUP BY" in your query. Make sure to alias all statements that
follow as with statement (e.g. WITH v as visit, c.billing_amount as
billing_amount)
If you need to divide numbers, make sure to filter the denominator to be non
zero.

The question is:
{question}
"""

cypher_generation_prompt = PromptTemplate(
    input_variables=["schema", "question"], template=cypher_generation_template
)


---
> ⚠️ **Note:** The above prompt template provides the LLM with four examples of valid Cypher queries for your graph. Giving the LLM a few examples and then asking it to perform a task is known as few-shot prompting, and it’s a simple yet powerful technique for improving generation accuracy. However, few-shot prompting might not be sufficient for Cypher query generation, especially if you have a complicated graph. One way to improve this is to create a vector database that embeds example user questions/queries and stores their corresponding Cypher queries as metadata. When a user asks a question, you inject Cypher queries from semantically similar questions into the prompt, providing the LLM with the most relevant examples needed to answer the current question.
---
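
The idea of injecting only the most relevant examples can be sketched as follows. To keep the example self-contained, it uses simple word overlap (Jaccard similarity) as a stand-in for embedding similarity; a real implementation would compare embeddings in a vector store. The helper names are hypothetical, and the two stored examples are taken from the prompt template above.

```python
def tokenize(text: str) -> set[str]:
    """Lowercase a question and split it into a set of words."""
    return set(text.lower().replace("?", "").split())


def jaccard(a: set[str], b: set[str]) -> float:
    """Word-overlap similarity, a stand-in for embedding similarity."""
    return len(a & b) / len(a | b) if a | b else 0.0


EXAMPLES = [
    {
        "question": "Who is the oldest patient and how old are they?",
        "cypher": (
            "MATCH (p:Patient)\n"
            "RETURN p.name AS oldest_patient,\n"
            "       duration.between(date(p.dob), date()).years AS age\n"
            "ORDER BY age DESC\n"
            "LIMIT 1"
        ),
    },
    {
        "question": (
            "How many non-emergency patients in North Carolina "
            "have written reviews?"
        ),
        "cypher": (
            "MATCH (r:Review)<-[:WRITES]-(v:Visit)-[:AT]->(h:Hospital)\n"
            "WHERE h.state_name = 'NC' and v.admission_type <> 'Emergency'\n"
            "RETURN count(*)"
        ),
    },
]


def select_examples(question: str, k: int = 1) -> list[dict[str, str]]:
    """Return the k stored examples most similar to the user's question."""
    query_tokens = tokenize(question)
    return sorted(
        EXAMPLES,
        key=lambda ex: jaccard(query_tokens, tokenize(ex["question"])),
        reverse=True,
    )[:k]


best = select_examples("How many emergency patients in Texas have written reviews?")
```

The selected question/query pairs would then be formatted into the "Examples:" section of the Cypher prompt in place of the fixed list.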

### Cypher results interpretation template
Next, you define the prompt template for the question-answer component of your chain. This template tells the LLM to use the results of running the Cypher query generated with the previous prompt to produce a nicely formatted answer to the user’s query. Again, this needs careful prompting, though not as heavy as for Cypher generation.

In [13]:
qa_generation_template = """You are an assistant that takes the results
from a Neo4j Cypher query and forms a human-readable response. The
query results section contains the results of a Cypher query that was
generated based on a user's natural language question. The provided
information is authoritative, you must never doubt it or try to use
your internal knowledge to correct it. Make the answer sound like a
response to the question.

Query Results:
{context}

Question:
{question}

If the provided information is empty, say you don't know the answer.
Empty information looks like this: []

If the information is not empty, you must provide an answer using the
results. If the question involves a time duration, assume the query
results are in units of days unless otherwise specified.

When names are provided in the query results, such as hospital names,
beware  of any names that have commas or other punctuation in them.
For instance, 'Jones, Brown and Murray' is a single hospital name,
not multiple hospitals. Make sure you return any list of names in
a way that isn't ambiguous and allows someone to tell what the full
names are.

Never say you don't have the right information if there is data in
the query results. Always use the data in the query results.

Helpful Answer:
"""

qa_generation_prompt = PromptTemplate(
    input_variables=["context", "question"], template=qa_generation_template
)

### The GraphCypherQAChain
This chain integrates both parts: generating the query and interpreting the results retrieved by running it.

Here’s a breakdown of the parameters used in GraphCypherQAChain.from_llm():

- cypher_llm: The LLM used to generate Cypher queries.
- qa_llm: The LLM used to generate an answer given Cypher query results.
- graph: The Neo4jGraph object that connects to your Neo4j instance. The database connection if you will.
- verbose: Whether intermediate steps your chain performs should be printed.
- qa_prompt: The prompt template for responding to questions/queries.
- cypher_prompt: The prompt template for generating Cypher queries.
- validate_cypher: If true, the Cypher query will be inspected for errors and corrected before running. Note that this doesn’t guarantee the Cypher query will be valid. Instead, it corrects simple syntax errors that are easily detectable using regular expressions.
- top_k: The number of query results to include in qa_prompt.
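
Conceptually, these parameters fit together as a three-step pipeline. The sketch below shows that flow with stub callables in place of the two LLMs and the database; `run_cypher_qa` is a hypothetical function written for this illustration, the returned value is made up, and none of this is how `GraphCypherQAChain` is actually implemented.

```python
from typing import Callable


def run_cypher_qa(
    question: str,
    generate_cypher: Callable[[str], str],
    run_query: Callable[[str], list[dict]],
    answer: Callable[[str, list[dict]], str],
    top_k: int = 100,
) -> str:
    """Question -> Cypher -> query results (at most top_k rows) -> answer."""
    cypher = generate_cypher(question)  # role of cypher_llm + cypher_prompt
    rows = run_query(cypher)[:top_k]    # role of graph, truncated to top_k
    return answer(question, rows)       # role of qa_llm + qa_prompt


# Stubs standing in for the two LLM calls and the database.
result = run_cypher_qa(
    "How many visits have there been?",
    generate_cypher=lambda q: "MATCH (v:Visit) RETURN count(v) AS visits",
    run_query=lambda cypher: [{"visits": 42}],  # made-up value
    answer=lambda q, rows: f"There have been {rows[0]['visits']} visits.",
)
```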

In [14]:
from langchain_openai import ChatOpenAI
from langchain.chains import GraphCypherQAChain

HOSPITAL_QA_MODEL = os.getenv("HOSPITAL_QA_MODEL")
HOSPITAL_CYPHER_MODEL = os.getenv("HOSPITAL_CYPHER_MODEL")

hospital_cypher_chain = GraphCypherQAChain.from_llm(
    cypher_llm=ChatOpenAI(model=HOSPITAL_CYPHER_MODEL, temperature=0),
    qa_llm=ChatOpenAI(model=HOSPITAL_QA_MODEL, temperature=0),
    graph=graph,
    verbose=True,
    qa_prompt=qa_generation_prompt,
    cypher_prompt=cypher_generation_prompt,
    validate_cypher=True,
    top_k=100,
)

In [15]:
question = """What is the average visit duration for emergency visits in North Carolina?"""
response = hospital_cypher_chain.invoke(question)
response.get("result")



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (v:Visit)-[:AT]->(h:Hospital)
WHERE h.state_name = 'NC' AND v.admission_type = 'Emergency' AND v.status = 'DISCHARGED'
WITH v, duration.between(date(v.admission_date), date(v.discharge_date)).days AS visit_duration
RETURN AVG(visit_duration) AS average_visit_duration
Full Context:
[{'average_visit_duration': 15.072972972972991}]

> Finished chain.


'The average visit duration for emergency visits in North Carolina is approximately 15.07 days.'

In [16]:
question = """Which state had the largest percent increase in Medicaid visits from 2022 to 2023?"""
response = hospital_cypher_chain.invoke(question)
response.get("result")



> Entering new GraphCypherQAChain chain...
Generated Cypher:
MATCH (h:Hospital)<-[:AT]-(v:Visit)-[:COVERED_BY]->(p:Payer)
WHERE p.name = 'Medicaid' AND v.admission_date >= '2022-01-01' AND
v.admission_date < '2024-01-01'
WITH h.state_name AS state, COUNT(v) AS visit_count,
     SUM(CASE WHEN v.admission_date >= '2022-01-01' AND
     v.admission_date < '2023-01-01' THEN 1 ELSE 0 END) AS count_2022,
     SUM(CASE WHEN v.admission_date >= '2023-01-01' AND
     v.admission_date < '2024-01-01' THEN 1 ELSE 0 END) AS count_2023
WITH state, visit_count, count_2022, count_2023,
     (toFloat(count_2023) - toFloat(count_2022)) / toFloat(count_2022) * 100
     AS percent_increase
RETURN state, percent_increase
ORDER BY percent_increase DESC
LIMIT 1
Full Context:
[{'state': 'TX', 'percent_increase': 8.823529411764707}]

> Finished chain.


'The state with the largest percent increase in Medicaid visits from 2022 to 2023 is Texas (TX) with a percent increase of 8.82%.'

## Wait Time Functions/Search
This part simulates the LLM interacting with other APIs, truly agent-like. It will answer questions about hospital wait times. You’ll write two functions for this:
- one that simulates finding the current wait time at a hospital
- and another that finds the hospital with the shortest wait time.

### Hospital name retriever
First, we need to get the hospital names from the same Neo4j database.

In [17]:
import os
from langchain_community.graphs import Neo4jGraph

def _get_current_hospitals() -> list[str]:
    """Fetch a list of current hospital names from a Neo4j database."""
    graph = Neo4jGraph(
        url=os.getenv("NEO4J_URI"),
        username=os.getenv("NEO4J_USERNAME"),
        password=os.getenv("NEO4J_PASSWORD"),
    )

    current_hospitals = graph.query(
        """
        MATCH (h:Hospital)
        RETURN h.name AS hospital_name
        """
    )

    return [d["hospital_name"].lower() for d in current_hospitals]

_get_current_hospitals()[:5]

['burke, griffin and cooper',
 'walton llc',
 'garcia ltd',
 'jones, brown and murray',
 'boyd plc']

### Time functions
(Simulated) 

In [18]:
import numpy as np
from typing import Any
def _get_current_wait_time_minutes(hospital: str) -> int:
    """
    Get the current wait time at a hospital in minutes.
    This will be a real API call in an application.
    """
    current_hospitals = _get_current_hospitals()

    if hospital.lower() not in current_hospitals:
        return -1

    return np.random.randint(low=0, high=600)


def get_current_wait_times(hospital: str) -> str:
    """Get the current wait time at a hospital formatted as a string."""
    wait_time_in_minutes = _get_current_wait_time_minutes(hospital)

    if wait_time_in_minutes == -1:
        return f"Hospital '{hospital}' does not exist."

    hours, minutes = divmod(wait_time_in_minutes, 60)

    if hours > 0:
        return f"{hours} hours {minutes} minutes"
    else:
        return f"{minutes} minutes"
    

def get_most_available_hospital(_: Any) -> dict[str, float]:
    """Find the hospital with the shortest wait time."""
    current_hospitals = _get_current_hospitals()

    current_wait_times = [
        _get_current_wait_time_minutes(h) for h in current_hospitals
    ]

    best_time_idx = np.argmin(current_wait_times)
    best_hospital = current_hospitals[best_time_idx]
    best_wait_time = current_wait_times[best_time_idx]

    return {best_hospital: best_wait_time}


In [19]:
get_current_wait_times("Wallace-Hamilton")

'1 hours 34 minutes'

In [20]:
get_most_available_hospital(None)

{'bell, mcknight and willis': 34}

## Creation of the agent
We are going to create the agent. Depending on the query you give it, your agent needs to decide between your Cypher/DB query chain, your reviews/RAG chain, and your wait times/API functions. Just another Tuesday for an AI agent.

Start by loading your agent’s dependencies, reading in the agent model name from an environment variable, and loading a prompt template from LangChain Hub: https://smith.langchain.com/hub/hwchase17/openai-functions-agent

HOSPITAL_AGENT_MODEL is the LLM that will act as your agent’s brain, deciding which tools to call and what inputs to pass them.

Instead of defining your own prompt for the agent, which you can certainly do, you load a predefined prompt from LangChain Hub. LangChain hub lets you upload, browse, pull, test, and manage prompts. In this case, the default prompt for OpenAI function agents works great.

### The agent prompt
The prompt is:
```python
ChatPromptTemplate(
    input_variables=['agent_scratchpad', 'input'], 
    input_types={'chat_history': typing.List[typing.Union[langchain_core.messages.ai.AIMessage, 
                                            langchain_core.messages.human.HumanMessage, 
                                            langchain_core.messages.chat.ChatMessage, 
                                            langchain_core.messages.system.SystemMessage, 
                                            langchain_core.messages.function.FunctionMessage, 
                                            langchain_core.messages.tool.ToolMessage]], 
                'agent_scratchpad': typing.List[... again all messages ...]
    }, 
    messages=[
        SystemMessagePromptTemplate(
            prompt=PromptTemplate(
                    input_variables=[], 
                    template='You are a helpful assistant'
                )
        ), 
        MessagesPlaceholder(
            variable_name='chat_history', 
            optional=True
        ), 
        HumanMessagePromptTemplate(
            prompt=PromptTemplate(
                    input_variables=['input'], 
                    template='{input}'
                )
        ), 
        MessagesPlaceholder(
            variable_name='agent_scratchpad'
        )
    ]
)
```
**Overall Structure**

This template creates a structured conversation flow with four main components arranged in order:

- System instructions
- Chat history (optional)
- Current user input
- Agent scratchpad

**Key Components Explained**

1. agent_scratchpad
This is the "working memory" of the agent during tool usage. It contains the sequence of:
- Tool calls the agent makes
- Tool responses from those calls
- Agent reasoning between tool uses

Example flow:
```shell
User: "What's the weather in Paris?"
Agent: I'll check the weather for you.
[Tool Call: weather_api("Paris")]
[Tool Response: "22°C, sunny"]
Agent: The weather in Paris is 22°C and sunny.
```
The scratchpad captures this entire reasoning chain, allowing the agent to:

- Track what tools it has used
- Remember tool results
- Make multi-step reasoning decisions
- Avoid repeating the same tool calls

2. chat_history
This maintains the conversation context across multiple turns. It stores the complete conversation history as a list of messages (Human, AI, System, etc.).
Purpose:
- Enables follow-up questions ("What about tomorrow?")
- Maintains context ("Paris" from previous question)
- Allows referencing earlier parts of conversation

3. Input Variables & Types
- input: The current user question/request
- agent_scratchpad: The tool usage history for current turn

4. Message Flow

- SystemMessagePromptTemplate: Sets the agent's role ("You are a helpful assistant")
- MessagesPlaceholder(chat_history): Injects previous conversation turns (marked optional=True)
- HumanMessagePromptTemplate: The current user input
- MessagesPlaceholder(agent_scratchpad): The agent's tool usage and reasoning for this turn

**Why This Structure?**

This template enables ReAct-style reasoning (Reasoning + Acting):

- Agent receives input
- Can use tools to gather information
- Reasons about tool results
- Provides final response
- All while maintaining conversation context

The scratchpad is essential because it allows the LLM to see its own "thought process" and tool interactions, enabling more sophisticated multi-step problem solving.
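
The role of the scratchpad can be sketched with a toy agent loop. Everything here is hypothetical: `fake_model` stands in for the LLM, the single tool returns a made-up value, and the loop is a simplification for illustration, not the code that LangChain's `AgentExecutor` runs.

```python
from typing import Callable

# Hypothetical tool registry; the real agent uses the chains and functions
# defined in this lesson.
TOOLS: dict[str, Callable[[str], str]] = {
    "Waits": lambda hospital: "34 minutes",  # made-up value for the sketch
}


def fake_model(user_input: str, scratchpad: list[tuple[str, str, str]]) -> dict:
    """Stand-in for the LLM: request a tool first, then answer from its result."""
    if not scratchpad:
        return {"tool": "Waits", "tool_input": "Wallace-Hamilton"}
    _, _, observation = scratchpad[-1]
    return {"final_answer": f"The current wait time is {observation}."}


def run_agent(user_input: str, max_steps: int = 5) -> str:
    """Loop until the model returns a final answer, recording each tool call."""
    scratchpad: list[tuple[str, str, str]] = []  # (tool, tool_input, observation)
    for _ in range(max_steps):
        decision = fake_model(user_input, scratchpad)
        if "final_answer" in decision:
            return decision["final_answer"]
        observation = TOOLS[decision["tool"]](decision["tool_input"])
        scratchpad.append((decision["tool"], decision["tool_input"], observation))
    return "Stopped: step limit reached."


answer = run_agent("What is the wait time at Wallace-Hamilton?")
```

On the second pass through the loop the model sees the tool result in the scratchpad, which is what lets it answer instead of calling the tool again.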

In [22]:
import os
from langchain_openai import ChatOpenAI
from langchain.agents import (
    create_openai_functions_agent,
    Tool,
    AgentExecutor,
)
from langchain import hub


HOSPITAL_AGENT_MODEL = os.getenv("HOSPITAL_AGENT_MODEL")

hospital_agent_prompt = hub.pull("hwchase17/openai-functions-agent")
hospital_agent_prompt

ChatPromptTemplate(input_variables=['agent_scratchpad', 'input'], input_types={'chat_history': typing.List[typing.Union[langchain_core.messages.ai.AIMessage, langchain_core.messages.human.HumanMessage, langchain_core.messages.chat.ChatMessage, langchain_core.messages.system.SystemMessage, langchain_core.messages.function.FunctionMessage, langchain_core.messages.tool.ToolMessage]], 'agent_scratchpad': typing.List[typing.Union[langchain_core.messages.ai.AIMessage, langchain_core.messages.human.HumanMessage, langchain_core.messages.chat.ChatMessage, langchain_core.messages.system.SystemMessage, langchain_core.messages.function.FunctionMessage, langchain_core.messages.tool.ToolMessage]]}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='You are a helpful assistant')), MessagesPlaceholder(variable_name='chat_history', optional=True), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['input'], template='{input}')), MessagesPlaceholder(

### The Tools

Your agent has four tools available to it: Experiences, Graph, Waits, and Availability. The Experiences and Graph tools call .invoke() from their respective chains, while Waits and Availability call the wait time functions you defined. Notice that many of the tool descriptions have few-shot prompts, telling the agent when it should use the tool and providing it with an example of what inputs to pass.

As with chains, good prompt engineering is crucial for your agent’s success. You have to clearly describe each tool and how to use it so that your agent isn’t confused by a query and doesn’t choose the wrong tool.

In [25]:
tools = [
    Tool(
        name="Experiences",
        func=reviews_vector_chain.invoke,
        description="""Useful when you need to answer questions
        about patient experiences, feelings, or any other qualitative
        question that could be answered about a patient using semantic
        search. Not useful for answering objective questions that involve
        counting, percentages, aggregations, or listing facts. Use the
        entire prompt as input to the tool. For instance, if the prompt is
        "Are patients satisfied with their care?", the input should be
        "Are patients satisfied with their care?".
        """,
    ),
    Tool(
        name="Graph",
        func=hospital_cypher_chain.invoke,
        description="""Useful for answering questions about patients,
        physicians, hospitals, insurance payers, patient review
        statistics, and hospital visit details. Use the entire prompt as
        input to the tool. For instance, if the prompt is "How many visits
        have there been?", the input should be "How many visits have
        there been?".
        """,
    ),
    Tool(
        name="Waits",
        func=get_current_wait_times,
        description="""Use when asked about current wait times
        at a specific hospital. This tool can only get the current
        wait time at a hospital and does not have any information about
        aggregate or historical wait times. Do not pass the word "hospital"
        as input, only the hospital name itself. For example, if the prompt
        is "What is the current wait time at Jordan Inc Hospital?", the
        input should be "Jordan Inc".
        """,
    ),
    Tool(
        name="Availability",
        func=get_most_available_hospital,
        description="""
        Use when you need to find out which hospital has the shortest
        wait time. This tool does not have any information about aggregate
        or historical wait times. This tool returns a dictionary with the
        hospital name as the key and the wait time in minutes as the value.
        """,
    ),
]

### Instantiate the agent
"Always two, there are: the agent and the executor"

You create the agent with create_openai_functions_agent(). This creates an agent that relies on OpenAI's function calling to pass inputs to functions. It does this by returning JSON objects that store function inputs and their corresponding values.

To create the agent runtime, you pass your agent and tools into AgentExecutor. Setting return_intermediate_steps and verbose to True lets you see the agent’s thought process and the tools it calls.

In [26]:
chat_model = ChatOpenAI(
    model=HOSPITAL_AGENT_MODEL,
    temperature=0,
)

hospital_rag_agent = create_openai_functions_agent(
    llm=chat_model,
    prompt=hospital_agent_prompt,
    tools=tools,
)

hospital_rag_agent_executor = AgentExecutor(
    agent=hospital_rag_agent,
    tools=tools,
    return_intermediate_steps=True,
    verbose=True,
)

In [31]:
response = hospital_rag_agent_executor.invoke(
    {"input": "What is the wait time at Wallace-Hamilton?"}
)

print(f'{response["input"]}=')
print(f'{response["output"]}=')
response["intermediate_steps"]



> Entering new AgentExecutor chain...

Invoking: `Waits` with `Wallace-Hamilton`


7 hours 40 minutes
The current wait time at Wallace-Hamilton is 7 hours and 40 minutes.

> Finished chain.
What is the wait time at Wallace-Hamilton?=
The current wait time at Wallace-Hamilton is 7 hours and 40 minutes.=


[(AgentActionMessageLog(tool='Waits', tool_input='Wallace-Hamilton', log='\nInvoking: `Waits` with `Wallace-Hamilton`\n\n\n', message_log=[AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{"__arg1":"Wallace-Hamilton"}', 'name': 'Waits'}})]),
  '7 hours 40 minutes')]
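
The raw intermediate_steps list can be hard to scan. The following is a minimal sketch of a helper that condenses it, assuming each entry is an (action, observation) pair whose action exposes tool and tool_input attributes, as the output above suggests. summarize_steps and the FakeAction stand-in are hypothetical names, used so the example runs without LangChain.

```python
from typing import Any, NamedTuple


class FakeAction(NamedTuple):
    """Hypothetical stand-in exposing the two attributes the helper reads."""
    tool: str
    tool_input: str


def summarize_steps(steps: list[tuple[Any, Any]]) -> list[str]:
    """Return one readable line per (action, observation) pair."""
    return [
        f"{action.tool}({action.tool_input!r}) -> {observation!r}"
        for action, observation in steps
    ]


# Stand-in data mirroring the single step shown in the output above.
steps = [(FakeAction("Waits", "Wallace-Hamilton"), "7 hours 40 minutes")]
summary = summarize_steps(steps)
print(summary[0])
```

With a real run, you would pass response["intermediate_steps"] instead of the stand-in list.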

In [32]:
response = hospital_rag_agent_executor.invoke(
    {"input": "Which hospital has the shortest wait time?"}
)

print(f'{response["input"]}=')
print(f'{response["output"]}=')
response["intermediate_steps"]



> Entering new AgentExecutor chain...

Invoking: `Availability` with `shortest wait time`


{'boyd plc': 37}
The hospital with the shortest wait time is Boyd PLC, with a wait time of 37 minutes.

> Finished chain.
Which hospital has the shortest wait time?=
The hospital with the shortest wait time is Boyd PLC, with a wait time of 37 minutes.=


[(AgentActionMessageLog(tool='Availability', tool_input='shortest wait time', log='\nInvoking: `Availability` with `shortest wait time`\n\n\n', message_log=[AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{"__arg1":"shortest wait time"}', 'name': 'Availability'}})]),
  {'boyd plc': 37})]

In [33]:
response = hospital_rag_agent_executor.invoke(
    {"input": "What have patients said about their quality of rest during their stay?"}
)

print(f'{response["input"]}=')
print(f'{response["output"]}=')
response["intermediate_steps"]



> Entering new AgentExecutor chain...

Invoking: `Experiences` with `quality of rest during their stay`


{'query': 'quality of rest during their stay', 'result': "Monique Foster at Pearson LLC mentioned that the hospital's cleanliness and hygiene standards were commendable. However, she found it challenging to get a good night's sleep due to constant interruptions for routine checks. Mrs. Kelly Berry DVM at Vaughn PLC also faced difficulty with rest due to higher-than-expected noise levels in the ward. Jesse Tucker at Wallace-Hamilton had a similar experience with constant interruptions during the night affecting their ability to rest. David Kim, also at Wallace-Hamilton, mentioned that while the hospital staff was accommodating, the disruptive noise levels in the hallway affected his ability to rest. Overall, it seems that the quality of rest during their stay was impacted by various factors such as noise levels and interruptions for routine ch

[(AgentActionMessageLog(tool='Experiences', tool_input='quality of rest during their stay', log='\nInvoking: `Experiences` with `quality of rest during their stay`\n\n\n', message_log=[AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{"__arg1":"quality of rest during their stay"}', 'name': 'Experiences'}})]),
  {'query': 'quality of rest during their stay',
   'result': "Monique Foster at Pearson LLC mentioned that the hospital's cleanliness and hygiene standards were commendable. However, she found it challenging to get a good night's sleep due to constant interruptions for routine checks. Mrs. Kelly Berry DVM at Vaughn PLC also faced difficulty with rest due to higher-than-expected noise levels in the ward. Jesse Tucker at Wallace-Hamilton had a similar experience with constant interruptions during the night affecting their ability to rest. David Kim, also at Wallace-Hamilton, mentioned that while the hospital staff was accommodating, the disruptive noise leve