# Step 1: Get Familiar With LangChain
You instantiate a ChatOpenAI model using GPT 3.5 Turbo as the base LLM, and you set temperature to 0.

OpenAI offers a variety of models with varying price points, capabilities, and performance. GPT 3.5 Turbo is a great model to start with because it performs well in many use cases and is cheaper than more recent models like GPT 4 and beyond.

---
> ⚠️ **Note:** It’s a common misconception that setting temperature=0 guarantees deterministic responses from GPT models. While responses are closer to deterministic when temperature=0, there’s no guarantee that you’ll get the same response for identical requests. Because of this, GPT models might output slightly different results than what you see in the examples throughout this tutorial.
---
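
To make the role of temperature more concrete, here is a minimal sketch of the textbook formulation: token scores are divided by the temperature before a softmax turns them into probabilities. The scores and the function name are made up for illustration, and this is not a description of how OpenAI implements sampling internally.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    # Subtract the largest value before exponentiating, for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
warm = softmax_with_temperature(logits, 1.0)
cold = softmax_with_temperature(logits, 0.05)
print([round(p, 3) for p in warm])
print([round(p, 3) for p in cold])
```

As the temperature approaches 0, almost all probability lands on the top-scoring token. Even so, the note above still applies: identical requests are not guaranteed to return identical responses.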

## Simple query with system and human messages

In [4]:
import dotenv
from langchain_openai import ChatOpenAI

dotenv.load_dotenv()

chat_model = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)

from langchain.schema.messages import HumanMessage, SystemMessage

# reuse the chat_model instance created above; no extra import is needed

messages = [
     SystemMessage(
         content="""You're an assistant knowledgeable about
         healthcare. Only answer healthcare-related questions."""
     ),
     HumanMessage(content="What is Medicaid managed care?"),
 ]
chat_model.invoke(messages).content

'Medicaid managed care is a system in which states contract with managed care organizations (MCOs) to provide healthcare services to Medicaid beneficiaries. These MCOs are responsible for coordinating and delivering healthcare services to enrollees in exchange for a fixed monthly payment per enrollee. Medicaid managed care aims to improve access to care, enhance quality, and control costs for Medicaid beneficiaries.'

## Chains
### Prompt templates


In [5]:
from langchain.prompts import (
    PromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    ChatPromptTemplate
)

review_system_template_str = """Your job is to use patient
reviews to answer questions about their experience at a
hospital. Use the following context to answer questions.
Be as detailed as possible, but don't make up any information
that's not from the context. If you don't know an answer, say
you don't know.

Patient reviews:

{context}
"""

review_system_prompt = SystemMessagePromptTemplate(
    prompt=PromptTemplate.from_template(review_system_template_str)
)
print(review_system_prompt)

review_human_prompt = HumanMessagePromptTemplate(
    prompt=PromptTemplate.from_template("{question}")
)
print(review_human_prompt)


messages = [review_system_prompt, review_human_prompt]
review_chat_prompt_template = ChatPromptTemplate.from_messages(messages)
review_chat_prompt_template

prompt=PromptTemplate(input_variables=['context'], template="Your job is to use patient\nreviews to answer questions about their experience at a\nhospital. Use the following context to answer questions.\nBe as detailed as possible, but don't make up any information\nthat's not from the context. If you don't know an answer, say\nyou don't know.\n\nPatient reviews:\n\n{context}\n")
prompt=PromptTemplate(input_variables=['question'], template='{question}')


ChatPromptTemplate(input_variables=['context', 'question'], messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], template="Your job is to use patient\nreviews to answer questions about their experience at a\nhospital. Use the following context to answer questions.\nBe as detailed as possible, but don't make up any information\nthat's not from the context. If you don't know an answer, say\nyou don't know.\n\nPatient reviews:\n\n{context}\n")), HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['question'], template='{question}'))])

### Chains and LangChain Expression Language (LCEL)
The glue that connects chat models, prompts, and other objects in LangChain is the chain. A chain is nothing more than a sequence of calls between objects in LangChain. The recommended way to build chains is to use the LangChain Expression Language (LCEL).

In [6]:
# define the chain
review_chain = review_chat_prompt_template | chat_model

context = "I had a great stay!"
question = "Did the patient have a positive experience?"
# invoke it passing the parameters of the template for the first step
print(review_chain.invoke({"context": context, "question": question}))


content='Yes, the patient had a positive experience during their stay at the hospital.'


In general, the LCEL allows you to create arbitrary-length chains with the pipe symbol (|). For instance, if you wanted to format the model’s response, then you could add an output parser to the chain:

In [7]:
from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()

review_chain = review_chat_prompt_template | chat_model | output_parser

context = "I had a great stay!"
question = "Did the patient have a positive experience?"
# invoke it passing the parameters of the template for the first step
print(review_chain.invoke({"context": context, "question": question}))



Yes, the patient had a positive experience during their stay at the hospital.
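
To see why the pipe symbol can glue steps together, here is a toy sketch of the idea in plain Python. The `Step` class and the three stand-in components are hypothetical and are not LangChain's implementation; the sketch only shows that `a | b` can build a new object that runs `a` and feeds its output to `b`.

```python
class Step:
    """Toy stand-in for a chain component; not LangChain's Runnable class."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` builds a new step that runs a, then feeds its output to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stand-ins for a prompt template, a chat model, and an output parser.
prompt = Step(lambda d: f"Reviews: {d['context']}\nQuestion: {d['question']}")
fake_model = Step(lambda text: {"content": f"(model saw {len(text)} characters)"})
parser = Step(lambda message: message["content"])

toy_chain = prompt | fake_model | parser
inputs = {"context": "I had a great stay!", "question": "Was it positive?"}
result = toy_chain.invoke(inputs)
print(result)
```

Each `|` returns another `Step`, which is why the chain can grow to any length.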


## Retrieval-augmented generation
The goal of our review_chain is to answer questions about patient experiences in the hospital from their reviews. So far, we manually passed reviews in as context for the question. While this can work for a small number of reviews, it doesn’t scale well. Moreover, even if you can fit all reviews into the model’s context window, there’s no guarantee it will use the correct reviews when answering a question.

To overcome this, you need a retriever. The process of retrieving relevant documents and passing them to a language model to answer questions is known as retrieval-augmented generation (RAG).

For this example, you’ll store all the reviews in a vector database called ChromaDB. If you’re unfamiliar with this database tool and topics, then check out [Embeddings and Vector Databases with ChromaDB](https://realpython.com/chromadb-vector-database/) before continuing.

In our example, we take the reviews, which we have in CSV ("human-readable") format, and *embed* them in the vector database.
Embedding means converting the content of a document into a dense numerical representation called a vector. These vectors are lists of numbers and can have hundreds or thousands of dimensions. The main goal of creating embeddings is to represent data like text, images, or audio in a numerical format that computational algorithms can process more easily, which lets you compare documents to each other based on their content.

Embeddings underpin both how LLMs represent text and how retrieval-augmented generation works. In a RAG pipeline, when a user submits a query, that query is also converted into an embedding vector, and a similarity search is performed in the vector database to find the most relevant document embeddings. The documents associated with those embeddings are then retrieved and serve as context for the language model, helping it generate a more accurate and informed response that goes beyond its original training data.
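
As a rough illustration of comparing vectors, the sketch below ranks two made-up document vectors against a made-up query vector using cosine similarity. The three-dimensional vectors and their labels are invented, and cosine similarity is only one common measure; the metric a vector database actually uses depends on its configuration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional embeddings; real ones have hundreds or thousands of dimensions.
query_vec = [0.9, 0.1, 0.0]
doc_vecs = {
    "communication complaint": [0.8, 0.2, 0.1],
    "parking praise": [0.0, 0.1, 0.9],
}

# Sort document labels from most to least similar to the query.
ranked = sorted(
    doc_vecs,
    key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
    reverse=True,
)
print(ranked)
```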

---
> ⚠️ **Note:** In practice, if you’re embedding a large document, you should use a text splitter. Text splitters break the document into smaller chunks before running them through an embedding model. This is important because embedding models have a fixed-size context window, and as the size of the text grows, an embedding’s ability to accurately represent the text decreases. For this example, you can embed each review individually because they’re relatively small.
---
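
To give a feel for what a text splitter does, here is a minimal character-based sketch. It is not one of LangChain's splitters, and the function name, chunk size, and overlap are assumptions chosen for illustration.

```python
def split_text(text, chunk_size=200, overlap=20):
    """Split text into fixed-size character chunks that overlap slightly."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


long_document = "word " * 100  # 500 characters of placeholder text
chunks = split_text(long_document, chunk_size=200, overlap=20)
print(len(chunks), [len(c) for c in chunks])
```

The overlap keeps a little shared text between neighboring chunks, so a sentence cut at a boundary still appears mostly intact in one of them. Real splitters usually also try to break on paragraph or sentence boundaries.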

### Feed the vector database
- We'll use LangChain's own CSVLoader.
- We'll use its Chroma vector database connector.
- We need an empty folder called chroma_data at the same level as the notebook.

Keep in mind that every time we run the "feeder" cell, the data is added again, so the embedded review data *will be duplicated*. To reload from scratch, empty the chroma_data folder and restart the Jupyter kernel.
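
If you'd rather not empty the folder by hand, a small helper along these lines could do it. The name `reset_directory` is hypothetical, and the sketch runs on a throwaway temporary folder; in the notebook you would pass the chroma_data path and run it before the loading cell, ideally on a fresh kernel so nothing still holds the old files open.

```python
import shutil
import tempfile
from pathlib import Path


def reset_directory(path):
    """Delete a directory and everything in it, then recreate it empty."""
    target = Path(path)
    if target.exists():
        shutil.rmtree(target)
    target.mkdir(parents=True)
    return target


# Demonstrated on a temporary folder that contains one stale file.
demo_dir = Path(tempfile.mkdtemp()) / "chroma_data"
demo_dir.mkdir()
(demo_dir / "stale.bin").write_text("old data")

reset_directory(demo_dir)
print(list(demo_dir.iterdir()))
```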

In [8]:
import dotenv
from langchain_community.document_loaders.csv_loader import CSVLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

REVIEWS_CSV_PATH = "data/reviews.csv"
REVIEWS_CHROMA_PATH = "chroma_data"

dotenv.load_dotenv()

loader = CSVLoader(file_path=REVIEWS_CSV_PATH, source_column="review")
reviews = loader.load()

reviews_vector_db = Chroma.from_documents(
    reviews, OpenAIEmbeddings(), persist_directory=REVIEWS_CHROMA_PATH
)

Now our review data is transformed into vectors and stored in Chroma, ready for RAG.

Retrieval Augmented Generation (RAG) is a technique designed to enhance the capabilities of Large Language Models (LLMs). It addresses a key limitation of LLMs: their reliance on fixed training datasets, which can result in outdated or incomplete information.
RAG works by combining an LLM with an external knowledge base, such as a vector database. When a user submits a query, the RAG system performs the following steps:

- The system receives the user query.

- The query is converted into a vector representation (embedding).

- A similarity search is conducted in the external knowledge base (like a vector database) using the query embedding to find the most relevant information or documents.

- The most relevant data or documents are retrieved from the database.

- This retrieved information is then incorporated into the prompt that is sent to the LLM. This provides crucial context to the model.

- The LLM processes both the original user query and the retrieved information.

- Finally, the LLM generates a response or takes an action based on the provided context and query.
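
The steps above can be sketched end to end in plain Python. In this toy version, counting shared words stands in for the embedding and similarity search, and the final prompt is printed instead of being sent to an LLM. The reviews and function names are made up for illustration.

```python
def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    for mark in "?.,!":
        text = text.replace(mark, " ")
    return set(text.lower().split())


def retrieve(query, documents, k=2):
    """Rank documents by how many words they share with the query; keep the top k."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Put the retrieved documents into the prompt as context for the question."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Patient reviews:\n{context}\n\nQuestion: {query}"


reviews = [
    "The communication with the staff was unclear.",
    "The parking garage was easy to find.",
    "Nurses were kind but communication about my plan was poor.",
]
question = "Has anyone complained about communication with staff?"
prompt = build_prompt(question, reviews)
print(prompt)
```

Only the two reviews that mention communication reach the prompt, which is the point of the retrieval step: the model sees relevant context rather than everything.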


RAG significantly enhances LLMs in several ways:

- It allows LLMs to access and utilize up-to-date information.

- It enables LLMs to provide domain-specific expertise or answer questions about data not included in their original training.

- It helps reduce hallucination (generating false information) by grounding the LLM's responses in retrieved facts.

- It offers a cost-effective alternative to fine-tuning or retraining models for incorporating new knowledge.

- It leads to the generation of more accurate, informed, specific, and detailed responses that go beyond the LLM's pre-trained knowledge.

### Explaining RAG with NotebookLM sources
NotebookLM is widely described as using Retrieval-Augmented Generation (RAG) to manage and interact with the sources you provide.

RAG is a key aspect of NotebookLM because it grounds responses in your data. Unlike general LLMs, which might pull information from their vast training data, RAG ensures that NotebookLM's responses are grounded in the specific documents, notes, web pages, or other sources you've uploaded. This reduces "hallucinations" (the AI making up facts) and makes the output more reliable and relevant to your context.

How it Works (Simplified):
1. Retrieval: When you ask a question or interact with NotebookLM, it first uses an information retrieval system to search through your uploaded sources. It finds the most relevant passages, chunks of text, or even specific quotes.
2. Augmentation: These retrieved snippets are then provided as additional context to the underlying Large Language Model (LLM), which is typically Gemini 1.5 Pro in NotebookLM's case.
3. Generation: The LLM then uses this specific, retrieved information, along with its general language understanding, to generate a coherent and accurate response.

RAG enables NotebookLM to effectively summarize documents, create study guides, identify key themes, and spark new ideas based on your provided material.
In essence, NotebookLM is designed as an "end-user customizable RAG product," allowing you to create a personalized AI expert based on your own knowledge base.

### Semantic search in a vector database

In [9]:
import dotenv
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

REVIEWS_CHROMA_PATH = "chroma_data/"

dotenv.load_dotenv()

# connect to the vector database
reviews_vector_db = Chroma(
     persist_directory=REVIEWS_CHROMA_PATH,
     embedding_function=OpenAIEmbeddings(),
 )

# semantic search in the database itself: return the three most similar reviews (k=3)
question = """Has anyone complained about communication with the hospital staff?"""
relevant_docs = reviews_vector_db.similarity_search(question, k=3)

relevant_docs[0].page_content

'review_id: 73\nvisit_id: 7696\nreview: I had a frustrating experience at the hospital. The communication between the medical staff and me was unclear, leading to misunderstandings about my treatment plan. Improvement is needed in this area.\nphysician_name: Maria Thompson\nhospital_name: Little-Spencer\npatient_name: Terri Smith'

In [10]:
relevant_docs[1].page_content

'review_id: 521\nvisit_id: 631\nreview: I had a challenging time at the hospital. The medical care was adequate, but the lack of communication between the staff and me left me feeling frustrated and confused about my treatment plan.\nphysician_name: Samantha Mendez\nhospital_name: Richardson-Powell\npatient_name: Kurt Gordon'
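
Each `page_content` string is a series of `key: value` lines, so you can turn it back into a dictionary with a few lines of Python. This is a minimal sketch: `parse_review` is a hypothetical helper, the sample is an abbreviated version of the first result above, and the approach assumes no field value contains a newline.

```python
def parse_review(page_content):
    """Split 'key: value' lines into a dict of field names to values."""
    fields = {}
    for line in page_content.split("\n"):
        key, _, value = line.partition(": ")
        fields[key] = value
    return fields


# Abbreviated copy of the first search result shown above.
sample = (
    "review_id: 73\nvisit_id: 7696\n"
    "review: I had a frustrating experience at the hospital.\n"
    "physician_name: Maria Thompson\nhospital_name: Little-Spencer\n"
    "patient_name: Terri Smith"
)
parsed = parse_review(sample)
print(parsed["physician_name"], "|", parsed["hospital_name"])
```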

### Passing a vector database retriever to the chain
We now use this search to pass in the reviews that are most similar to the question. We don't run the search ourselves; instead, we pass a retriever to the chain:

- create a retriever pointed to our vector reviews db
- format the chat prompt
    - System prompt
    - Human prompt
- chat model
- create the chain


In [11]:
import dotenv
from langchain_openai import ChatOpenAI
from langchain.prompts import (
    PromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
    ChatPromptTemplate,
)
from langchain_core.output_parsers import StrOutputParser
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain.schema.runnable import RunnablePassthrough

REVIEWS_CHROMA_PATH = "chroma_data/"

dotenv.load_dotenv()

# retriever

reviews_vector_db = Chroma(
    persist_directory=REVIEWS_CHROMA_PATH,
    embedding_function=OpenAIEmbeddings()
)

# retrieve up to 10 reviews per query (k is passed through search_kwargs)
reviews_retriever = reviews_vector_db.as_retriever(search_kwargs={"k": 10})

# system prompt. It will receive the reviews as {context}
review_system_template_str = """Your job is to use patient reviews to answer questions about their experience at a hospital. Use the following context to answer questions.
Be as detailed as possible, but don't make up any information that's not from the context. If you don't know an answer, say you don't know.
Patient reviews:
{context}
"""
review_system_prompt_template = SystemMessagePromptTemplate(
    prompt=PromptTemplate.from_template(review_system_template_str)
)

# human prompt. It will receive the question as {question}
review_human_prompt_template = HumanMessagePromptTemplate(
    prompt=PromptTemplate.from_template("{question}")
)

# compose the chat prompt with human and system
messages = [review_system_prompt_template, review_human_prompt_template]
review_chat_prompt_template = ChatPromptTemplate.from_messages(messages)

# the chat model
chat_model = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)

# the chains. In the second one, the retriever fills {context} and
# RunnablePassthrough hands the question to the prompt unchanged
normal_review_chain = review_chat_prompt_template | chat_model | StrOutputParser()
retriever_review_chain = (
    {"context": reviews_retriever, "question": RunnablePassthrough()}
    | review_chat_prompt_template
    | chat_model
    | StrOutputParser()
)

question = """Has anyone complained about
            communication with the hospital staff?"""
print(normal_review_chain.invoke({"question": question, "context": "I had a great stay!"}))
print(retriever_review_chain.invoke(question))

Based on the patient review provided ("I had a great stay!"), there is no indication of any complaints about communication with the hospital staff. The patient's experience seems to have been positive overall.
Yes, several patients have complained about communication with the hospital staff. For example, Terri Smith mentioned that the communication between the medical staff and her was unclear, leading to misunderstandings about her treatment plan. Ryan Jacobs also noted that the lack of communication from the staff created frustration during his stay at the hospital. Kurt Gordon expressed feeling frustrated and confused about his treatment plan due to the lack of communication between the staff and himself. Additionally, Michele Jones mentioned that the lack of communication about changes in her treatment plan created unnecessary stress.
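
The dictionary at the start of `retriever_review_chain` can look odd at first. Conceptually, every entry receives the same input, and the results are collected under the same keys for the prompt template. The toy sketch below imitates that behavior with plain functions; `fan_out`, `fake_retriever`, and `passthrough` are hypothetical stand-ins, not LangChain objects.

```python
def fan_out(mapping, value):
    """Call every entry of the mapping with the same input and collect the results."""
    return {key: func(value) for key, func in mapping.items()}


def fake_retriever(question):
    # Hypothetical stand-in for reviews_retriever: returns canned documents.
    return ["Review A about communication", "Review B about communication"]


def passthrough(value):
    # Stand-in for RunnablePassthrough: hands the input on unchanged.
    return value


prompt_inputs = fan_out(
    {"context": fake_retriever, "question": passthrough},
    "Any complaints about communication?",
)
print(prompt_inputs)
```

The resulting dictionary has exactly the `context` and `question` keys the prompt template expects, which is why the chain can be invoked with just the question string.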


In [12]:
print(retriever_review_chain.invoke("""Which doctors were more liked?"""))

Based on the patient reviews provided, it seems that both Dr. Karen Smith from Wallace-Hamilton and Dr. Matthew Marks from Little-Spencer were well-liked by the patients. Dr. Karen Smith was praised for being skilled, while Dr. Matthew Marks was appreciated for his positive attitude and going above and beyond to make the patient feel comfortable and informed about their treatment. Both physicians received positive feedback from the patients they treated.


## Agents
An agent is a system that uses an LLM as a reasoning engine to parse what's asked of it and determine which actions to take. Agents facilitate the interaction between language models and various tools: they act as intermediaries that turn the model's decisions into tool calls and task execution.
A simple agent could decide whether to look into the patient reviews or the current hospital wait times (via an API call, for example), depending on what's asked.

### Fake API/Function to return waiting times
In get_current_wait_time(), you pass in a hospital name, check if it’s valid, and then generate a random number to simulate a wait time. In reality, this would be some sort of database query or API call.


In [13]:
import random
import time

def get_current_wait_time(hospital: str) -> int | str:
    """Dummy function to generate fake wait times"""

    if hospital not in ["A", "B", "C", "D"]:
        return f"Hospital {hospital} does not exist"

    # Simulate API call delay
    time.sleep(1)

    return random.randint(0, 10000)

### Tools and Hubs
A tool is essentially a capability or function that the LLM, acting as the agent's reasoning engine, can decide to use to perform a specific action. Tools let agents go beyond generating text: through APIs or other hooks, they can interact with the external world or perform tasks the LLM cannot do on its own. The LLM within the agent uses its reasoning capabilities to determine which tool to use and what input that tool needs. This process is often referred to as tool calling.

A tool can:
- Call external APIs.
- Query databases (like vector databases or SQL databases) dynamically.
- Interact with external services (such as a search engine like Tavily).
- Fetch specific data or information (useful for RAG).
- Run custom code or logic.
- Look up information or perform calculations.

The tool is defined by 
- a name
- the function it calls 
- a description of what it does so the LLM will know when and how to use it

Notice how each description in the code below gives the agent instructions as to when it should call the tool. This is where good prompt engineering skills are paramount to ensuring the LLM calls the correct tool with the correct inputs.
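
To make the three parts of a tool concrete, here is a toy sketch in plain Python. `SimpleTool` mirrors the name, function, and description listed above, but it is a hypothetical class and not LangChain's `Tool`; the two demo tools and their return values are made up.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SimpleTool:
    """Toy tool record with a name, a function, and a description."""
    name: str
    func: Callable[[str], object]
    description: str


def describe_tools(tools):
    """Text an LLM could be shown so it can pick a tool by name."""
    return "\n".join(f"{t.name}: {t.description}" for t in tools)


def call_tool(tools, name, tool_input):
    """Run the tool the model asked for, or report that it does not exist."""
    by_name = {t.name: t for t in tools}
    if name not in by_name:
        return f"Unknown tool: {name}"
    return by_name[name].func(tool_input)


demo_tools = [
    SimpleTool("Waits", lambda hospital: 42, "Current wait time in minutes for one hospital."),
    SimpleTool("Echo", lambda text: text, "Returns its input unchanged."),
]
print(describe_tools(demo_tools))
print(call_tool(demo_tools, "Waits", "C"))
```

The description is the only part the model reads when deciding what to call, so it carries most of the weight. The name and function matter once the decision has been made.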

LangChain Hub: originally hosted at https://github.com/hwchase17/langchain-hub and now at https://smith.langchain.com/hub, the hub is a collection of artifacts for working with LangChain primitives such as prompts, chains, and agents. Its goal is to be a central resource for sharing and discovering high-quality prompts, chains, and agents that combine to form complex LLM applications.
In our case, we will use the [openai-functions-agent](https://smith.langchain.com/hub/hwchase17/openai-functions-agent) prompt.

In [17]:

from langchain_core.output_parsers import StrOutputParser
from langchain.schema.runnable import RunnablePassthrough
from langchain.agents import (
    create_openai_functions_agent,
    Tool,
    AgentExecutor,
)
from langchain import hub

# we use our retriever_review_chain we defined in the previous cell
# for the review related queries
retriever_review_chain = (
    {"context": reviews_retriever, "question": RunnablePassthrough()}
    | review_chat_prompt_template
    | chat_model
    | StrOutputParser()
)

# tool definitions: one to answer review questions and one to get wait times
tools = [
    Tool(
        name="Reviews",
        func=retriever_review_chain.invoke,
        description="""Useful when you need to answer questions about patient reviews or experiences at the hospital.
        Not useful for answering questions about specific visit details such as payer, billing, treatment, diagnosis,
        chief complaint, hospital, or physician information. 
        Pass the entire question as input to the tool. For instance, if the question is "What do patients think about the triage system?",
        the input should be "What do patients think about the triage system?"
        """,
    ),
    Tool(
        name="Waits",
        func=get_current_wait_time,
        description="""Use when asked about current wait times at a specific hospital. This tool can only get the current
        wait time at a hospital and does not have any information about aggregate or historical wait times. This tool returns wait times in
        minutes. Do not pass the word "hospital" as input, only the hospital name itself. For instance, if the question is
        "What is the wait time at hospital A?", the input should be "A".
        """,
    ),
]

# essentially a system message ("You are a helpful assistant") and a human "{input}" message,
# plus placeholders for the chat history and the agent scratchpad
hospital_agent_prompt = hub.pull("hwchase17/openai-functions-agent")

agent_chat_model = ChatOpenAI(model="gpt-3.5-turbo-1106", temperature=0)

# create the agent with our tool list
hospital_agent = create_openai_functions_agent(
    llm=agent_chat_model,
    prompt=hospital_agent_prompt,
    tools=tools,
)

# This is the runtime for the agent.
# Setting return_intermediate_steps and verbose to True
# will allow you to see the agent’s thought process and the tools it calls.
hospital_agent_executor = AgentExecutor(
    agent=hospital_agent,
    tools=tools,
    return_intermediate_steps=True,
    verbose=True,
)

In [15]:
hospital_agent_executor.invoke(
     {"input": "What is the current wait time at hospital C?"}
 )



> Entering new AgentExecutor chain...

Invoking: `Waits` with `C`


1284
The current wait time at hospital C is 1284 minutes.

> Finished chain.


{'input': 'What is the current wait time at hospital C?',
 'output': 'The current wait time at hospital C is 1284 minutes.',
 'intermediate_steps': [(AgentActionMessageLog(tool='Waits', tool_input='C', log='\nInvoking: `Waits` with `C`\n\n\n', message_log=[AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{"__arg1":"C"}', 'name': 'Waits'}})]),
   1284)]}
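
The intermediate step above shows that the model's tool request arrives as a JSON string inside `function_call`. Here is a minimal sketch of reading it with the standard library. The dictionary is copied from the output above, and treating `__arg1` as the key for the tool's single input is an assumption based on that output alone.

```python
import json

# Copied from the `function_call` entry in the intermediate step above.
function_call = {"arguments": '{"__arg1":"C"}', "name": "Waits"}

# The arguments are a JSON string, so they need to be decoded first.
arguments = json.loads(function_call["arguments"])
tool_name = function_call["name"]
tool_input = arguments["__arg1"]
print(tool_name, tool_input)
```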

In [18]:
hospital_agent_executor.invoke(
     {"input": "What have patients said about their comfort at the hospital?"}
)



> Entering new AgentExecutor chain...

Invoking: `Reviews` with `What have patients said about their comfort at the hospital?`


Patients have mentioned both positive and negative aspects of their comfort at the hospital. One patient appreciated the hospital's dedication to patient comfort, mentioning well-designed private rooms and comfortable furnishings that made their recovery more bearable and contributed to an overall positive experience. However, other patients have complained about uncomfortable beds affecting their ability to get a good night's sleep during their stay.
Patients have mentioned both positive and negative aspects of their comfort at the hospital. One patient appreciated the hospital's dedication to patient comfort, mentioning well-designed private rooms and comfortable furnishings that made their recovery more bearable and contributed to an overall positive experience. However, other patients have complained

{'input': 'What have patients said about their comfort at the hospital?',
 'output': "Patients have mentioned both positive and negative aspects of their comfort at the hospital. One patient appreciated the hospital's dedication to patient comfort, mentioning well-designed private rooms and comfortable furnishings that made their recovery more bearable and contributed to an overall positive experience. However, other patients have complained about uncomfortable beds affecting their ability to get a good night's sleep during their stay.",
 'intermediate_steps': [(AgentActionMessageLog(tool='Reviews', tool_input='What have patients said about their comfort at the hospital?', log='\nInvoking: `Reviews` with `What have patients said about their comfort at the hospital?`\n\n\n', message_log=[AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{"__arg1":"What have patients said about their comfort at the hospital?"}', 'name': 'Reviews'}})]),
   "Patients have mentioned b