<a href="https://colab.research.google.com/github/adeshmukh/gaiip/blob/main/notebooks/Intro_to_Observability_LangSmith_Phoenix_LangFuse.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Intro to Observability of Gen AI applications

This notebook introduces observability for Gen AI applications built with the LangChain framework. It illustrates three popular observability tools: LangSmith, Arize Phoenix, and LangFuse.

## Setup

In [None]:
%%capture

!pip install langchain              # Base LangChain
!pip install langchain-core         # Core LangChain components
!pip install langchain-community    # Extended list of LangChain components
!pip install langchain-openai       # LangChain bindings for OpenAI LLM
!pip install langchain-chroma       # LangChain bindings for ChromaDB
!pip install langchainhub           # Hub for prompt templates

!pip install langsmith              # LangSmith for observability
!pip install arize-phoenix          # Phoenix for observability
!pip install langfuse               # LangFuse for observability

!pip install openai                 # OpenAI LLM provider
!pip install google-search-results  # Serp API web search provider
!pip install pypdf                  # Parse PDF docs
!pip install chromadb               # Vector DB

!pip install tiktoken               # Tokenizer for OpenAI LLM
!pip install nest-asyncio           # Async IO in notebook environments

Initialize secrets as environment variables.

In [None]:
from google.colab import userdata
import os

# API keys we will need for this demo.
api_key_names = [
    'LANGCHAIN_API_KEY',
    'OPENAI_API_KEY',
    'SERPAPI_API_KEY',
    'LANGFUSE_SECRET_KEY',  # Optional, only needed for the LangFuse portion.
  ]

for api_key in api_key_names:
  os.environ[api_key]=userdata.get(api_key)

## 1. LangSmith

First, enable tracing and provide the tracing endpoint.

In [None]:
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'

### Automatic instrumentation


In this example, we are using LangChain components without explicitly using any langsmith functions/decorators. This example illustrates that standard LangChain components are automatically observable with LangSmith once the environment variables are configured appropriately.

In [None]:
from langchain_openai import ChatOpenAI

llm = ChatOpenAI()
llm.invoke("Hello, world!")

AIMessage(content='Hello there! How can I assist you today?', response_metadata={'token_usage': {'completion_tokens': 10, 'prompt_tokens': 11, 'total_tokens': 21}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-ddcfbefa-94e1-49d4-ba8c-48ab351bacbf-0', usage_metadata={'input_tokens': 11, 'output_tokens': 10, 'total_tokens': 21})

After running the above code cell, check the LangSmith dashboard for a corresponding trace.

*NOTE: If you don't see a trace in LangSmith, then something's not right with your setup. Pause here and fix any issues before proceeding with the rest of the notebook.*
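
When a trace does not show up, the settings configured earlier in this notebook are a reasonable first thing to check. The sketch below is a hypothetical helper (`tracing_config_problems` is not part of LangSmith or LangChain) that inspects a mapping of environment variables and lists anything that looks wrong. It assumes the tracing flag and the API key are the likely culprits; network or account issues are out of its scope.

```python
from typing import List, Mapping


def tracing_config_problems(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems found in the tracing-related settings."""
    problems = []
    # The notebook sets this flag to the string 'true' to enable tracing.
    if env.get("LANGCHAIN_TRACING_V2", "").lower() != "true":
        problems.append("LANGCHAIN_TRACING_V2 is not set to 'true'")
    # An unset or blank key means traces cannot be attributed to an account.
    if not env.get("LANGCHAIN_API_KEY", "").strip():
        problems.append("LANGCHAIN_API_KEY is missing or empty")
    return problems


# Example with a fake environment; no real secrets are involved.
fake_env = {"LANGCHAIN_TRACING_V2": "true", "LANGCHAIN_API_KEY": ""}
print(tracing_config_problems(fake_env))
```

In the notebook itself, the same check can be run against the live environment with `tracing_config_problems(os.environ)`.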

### Instrumenting with LangSmith SDK & Decorator

The tracing SDK wraps non-LangChain components, such as the OpenAI client, to make them observable.

The `@traceable` decorator enables tracing for any function or method in your application.
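
To build intuition for what a tracing decorator does, here is a minimal standard-library sketch. It is not how LangSmith implements `@traceable`; the names `simple_trace` and `TRACE_LOG` are made up for illustration, and the "backend" is just an in-memory list rather than a remote service.

```python
import functools
import time
from typing import Any, Callable, Dict, List

# In-memory stand-in for a tracing backend (hypothetical).
TRACE_LOG: List[Dict[str, Any]] = []


def simple_trace(func: Callable) -> Callable:
    """Record the name, inputs, output or error, and duration of each call."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        record: Dict[str, Any] = {"name": func.__name__, "args": args, "kwargs": kwargs}
        start = time.perf_counter()
        try:
            record["output"] = func(*args, **kwargs)
            return record["output"]
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            # Runs on both success and failure, so every call leaves a record.
            record["seconds"] = time.perf_counter() - start
            TRACE_LOG.append(record)

    return wrapper


@simple_trace
def shout(text: str) -> str:
    return text.upper()


shout("hello")
print(TRACE_LOG[-1]["name"], TRACE_LOG[-1]["output"])
```

The key design point is that the wrapped function is unchanged: the decorator observes inputs and outputs from the outside, which is why it can be applied to arbitrary application code.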

In [None]:
import openai
from langsmith.wrappers import wrap_openai
from langsmith import traceable

# Auto-trace LLM calls in-context
client = wrap_openai(openai.Client())

@traceable # Auto-trace this function
def llm_chat(user_input: str):
    result = client.chat.completions.create(
        messages=[{"role": "user", "content": user_input}],
        model="gpt-3.5-turbo"
    )
    return result.choices[0].message.content

countries = ['United States', 'Denmark', 'Egypt', 'China', 'India', 'Vietnam', 'Australia']

for country in countries:
  print(llm_chat(f'What is the capital of {country}?'))

The capital of the United States is Washington, D.C.
Copenhagen
The capital of Egypt is Cairo.
The capital of China is Beijing.
The capital of India is New Delhi.
The capital of Vietnam is Hanoi.
The capital of Australia is Canberra.


Once again, check the LangSmith dashboard and observe the traces generated for each of the calls to `llm_chat` in the loop. So, in total you should observe `7` calls corresponding to the `7` items in the countries list above.

*Q: Was the output from the LLM what you expected? If not, you can try adjusting the prompt or experiment with the prompt here and observe the LangSmith dashboard.*

### ReAct Agent

A ReAct agent operates in a Thought-Action-Observation loop until it reaches its goal of a satisfactory answer to the original question, or until it exhausts the maximum number of iterations.

We will construct a simple ReAct agent that uses 3 tools:
1. **Retriever**: To search specific documents that we indexed in an in-memory Vector DB.
2. **Search**: To search current information on the web.
3. **Calculator**: To evaluate math expressions.
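
The loop described above can be sketched without any LLM at all. In this toy version a scripted policy stands in for the model, and two tiny tools stand in for the retriever and calculator. All names here (`scripted_policy`, `run_agent`, `TOOLS`) are hypothetical, and the budget figures are the ones quoted in this notebook's outputs; a real agent decides each step from the model's response instead of a script.

```python
from decimal import Decimal
from typing import Callable, Dict, List, Optional, Tuple

# Toy data: requested budgets in billions of USD, as quoted in this notebook.
BUDGET_REQUESTS = {"NASA": "27.2", "NSF": "11.3"}


def lookup(agency: str) -> str:
    return BUDGET_REQUESTS[agency]


def add(expression: str) -> str:
    left, right = expression.split("+")
    return str(Decimal(left.strip()) + Decimal(right.strip()))


TOOLS: Dict[str, Callable[[str], str]] = {"lookup": lookup, "add": add}

# (thought, tool name or "finish", tool input or final answer)
Step = Tuple[str, str, str]


def scripted_policy(observations: List[str]) -> Step:
    """Stand-in for the LLM: choose the next step from the observations so far."""
    if len(observations) == 0:
        return ("I need NASA's request.", "lookup", "NASA")
    if len(observations) == 1:
        return ("I need NSF's request.", "lookup", "NSF")
    if len(observations) == 2:
        return ("Now add them.", "add", f"{observations[0]} + {observations[1]}")
    return ("I have the total.", "finish", f"${observations[2]} billion")


def run_agent(policy: Callable[[List[str]], Step], max_iterations: int = 5) -> Optional[str]:
    observations: List[str] = []
    for _ in range(max_iterations):
        thought, action, action_input = policy(observations)  # Thought + Action
        if action == "finish":
            return action_input
        observations.append(TOOLS[action](action_input))  # Observation
    return None  # Iteration budget exhausted without an answer


print(run_agent(scripted_policy))
```

Calling `run_agent(scripted_policy, max_iterations=2)` returns `None`, which mirrors an agent that runs out of tries before reaching an answer.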

#### Initialize Vector DB used for the Retriever tool

For this demo, we will fetch PDF files that contain information pertaining to the US National Budget for the fiscal year 2024. This will be loaded into a Vector DB (Chroma) and made available to the agent as a LangChain tool.

First, we will clone the repo that contains the PDFs.

In [None]:
%%capture

!rm -rf ./repo
!git clone --depth 1 https://github.com/adeshmukh/gaiip.git ./repo

Now we load the PDFs as chunks of text using PyPDFLoader and store them in an in-memory list.

In [None]:
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
import os
import logging

# Ignore pypdf warnings
logging.getLogger('pypdf').setLevel(logging.ERROR)

base_path = './repo/pdfs'
pdf_documents = os.listdir(base_path)
docs = []

# Load docs as LangChain Document objects
for pdf_document in pdf_documents:
    pdf_loader = PyPDFLoader(f'{base_path}/{pdf_document}')
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
    pages = pdf_loader.load_and_split(text_splitter=splitter)
    docs.extend(pages)
    print(f'Loaded {len(pages):2} pages from {pdf_document}')

Loaded  7 pages from us-national-budget-civil-fy24.pdf
Loaded  7 pages from us-national-budget-treasury-fy24.pdf
Loaded  7 pages from us-national-budget-nsf-fy24.pdf
Loaded 31 pages from us-national-budget-health-fy24.pdf
Loaded  6 pages from us-national-budget-ssa-fy24.pdf
Loaded  6 pages from us-national-budget-nasa-fy24.pdf
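
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of fixed-size chunking with overlap. It is not the algorithm `RecursiveCharacterTextSplitter` uses: that splitter prefers to break on separators such as paragraphs and lines, so its chunks are usually shorter than `chunk_size`. The function name `chunk_text` is hypothetical.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_size <= 0 or not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # This window already reached the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
```

Each chunk repeats the last character of the previous one, so text that straddles a boundary still appears intact in at least one chunk when the overlap is large enough.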


Now we will initialize the Vector DB (Chroma) with the list of text chunks that we loaded previously.

Then we will create a LangChain tool that wraps the Vector DB. The tool is an adapter that makes the retriever easy to plug into the ReAct agent.

In [None]:
%%capture

from langchain.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

embedding_function = OpenAIEmbeddings(model='text-embedding-3-large')

# Initialize the vector DB with the docs
vector_db = Chroma.from_documents(docs, embedding_function)

# Create a retriever adapter for this vector db
retriever = vector_db.as_retriever(search_type='mmr')
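
The `mmr` search type stands for maximal marginal relevance, which trades off relevance to the query against redundancy among the results already chosen. The sketch below shows the idea in pure Python on toy 2-D vectors. The function names and the `lam` weight are illustrative, and LangChain's own implementation and default parameters may differ.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mmr(query: Sequence[float], docs: List[Sequence[float]], k: int, lam: float = 0.5) -> List[int]:
    """Return indices of k docs, greedily balancing relevance against redundancy."""
    selected: List[int] = []
    candidates = list(range(len(docs)))
    while candidates and len(selected) < k:

        def score(i: int) -> float:
            relevance = cosine(query, docs[i])
            # Similarity to the closest already-selected doc (0 if none selected yet).
            redundancy = max((cosine(docs[i], docs[j]) for j in selected), default=0.0)
            return lam * relevance - (1 - lam) * redundancy

        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


query = [1.0, 1.0]
docs = [[1.0, 0.9], [1.0, 0.8], [0.0, 1.0]]  # docs 0 and 1 are near-duplicates

print(mmr(query, docs, k=2, lam=1.0))  # pure relevance ranking
print(mmr(query, docs, k=2, lam=0.5))  # relevance balanced with diversity
```

With `lam=1.0` the two near-duplicate vectors are returned; with `lam=0.5` the second pick switches to the less similar vector, which is the behavior that makes MMR useful when many chunks say nearly the same thing.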

Let's try using the retriever as a standalone utility, i.e. without integrating it in a ReAct agent.

In [None]:
# Try querying the retriever to see how it functions
hits = retriever.invoke('What is the total requested budget for NASA and NSF?')

for hit in hits[:2]:  # Peek at the top 2 hits
    print('-------------------------------------------')
    print(hit.page_content, hit.metadata, sep='\n\n')

-------------------------------------------
127NATIONAL SCIENCE FOUNDATIONThe National Science Foundation (NSF) is responsible for science education and promoting the 
progress of science and innovation.  The President’s 2024 Budget for NSF advances the goals 
of the CHIPS and Science Act by:  strengthening U.S. leadership in emerging technologies; 
expanding the science, technology, engineering, and mathematics (STEM) workforce while 
advancing equity; boosting research and development (R&D) including research infrastructure; 
and combating the climate crisis. 
The Budget requests $11.3 billion in discretionary budget authority for 2024, a $1.8 billion or 
18.6-percent increase from the 2023 enacted level to support the CHIPS and Science Act.
The President’s 2024 Budget:  
• Ensures the Future Is Made in America.   NSF plays a key role in the CHIPS and Science 
Act with its focus on U.S. leadership in new technologies—from artiﬁcial intelligence to

{'page': 0, 'source': './repo/pdfs/

After running the above cell successfully, check the LangSmith dashboard for corresponding traces.

#### ReAct agent without Vector DB

First, we create a ReAct agent with just the `search` and `llm-math` tools and observe the output.

In [None]:
from langchain import hub
from langchain.agents import (
    create_react_agent,
    load_tools,
    AgentType,
    AgentExecutor,
)
from langchain.callbacks.tracers import ConsoleCallbackHandler

from langchain_openai import ChatOpenAI

LLM_MODEL = 'gpt-3.5-turbo'

# More OpenAI models at: https://platform.openai.com/docs/models/
# LLM_MODEL = 'gpt-4-turbo'
# LLM_MODEL = 'gpt-4o'

# Initialize the LLM
llm = ChatOpenAI(
    model=LLM_MODEL,
    temperature=0,  # temperature = 0 makes the output more deterministic
)

# Fetch the ReAct agent prompt template from LangChain hub.
prompt_template = hub.pull('hwchase17/react')

# Tools that agents can use to augment their capabilities
tools = load_tools(['serpapi', 'llm-math'], llm=llm)

In [None]:
# Let's take a quick look at the tools
print('\n'.join(str(t) for t in tools))

name='Search' description='A search engine. Useful for when you need to answer questions about current events. Input should be a search query.' func=<bound method SerpAPIWrapper.run of SerpAPIWrapper(search_engine=<class 'serpapi.google_search.GoogleSearch'>, params={'engine': 'google', 'google_domain': 'google.com', 'gl': 'us', 'hl': 'en'}, serpapi_api_key='<redacted>', aiosession=None)> coroutine=<bound method SerpAPIWrapper.arun of SerpAPIWrapper(search_engine=<class 'serpapi.google_search.GoogleSearch'>, params={'engine': 'google', 'google_domain': 'google.com', 'gl': 'us', 'hl': 'en'}, serpapi_api_key='<redacted>', aiosession=None)>
name='Calculator' description='Useful for when you need to answer questions about math.' func=<bound method Chain.run of LLMMathChain(llm_chain=LLMChain(prompt=PromptTemplate(input_variables=['question'], template='Translate a math problem into a

In [None]:
# Initialize the ReAct agent
react_agent = create_react_agent(llm, tools, prompt_template)
agent_executor = AgentExecutor.from_agent_and_tools(agent=react_agent, tools=tools)

In [None]:
query = """
What is the total requested budget for NASA and NSF for Fiscal Year 2024?
Important instructions:
1. You must not round the final answer.
2. Answer must be human-readable but brief.
"""

response = agent_executor.invoke(
    {'input': query},
    # config={"callbacks": [ConsoleCallbackHandler()]},  # Print events internal to the ReAct agent.
)
print(response['output'])

The total requested budget for NASA and NSF for Fiscal Year 2024 is $36.23 billion.


#### ReAct agent with retriever tool
Now we activate the retriever tool by adding it to the list of tools. This lets us compare the agent's output with and without the custom retriever tool.

In [None]:
from langchain.tools.retriever import create_retriever_tool

# Wrap the retriever in a LangChain "tool"
query_us_budget_fy2024 = create_retriever_tool(
    retriever,
    'us_fiscal_budget_fy2024',
    'Searches and returns excerpts from the US national budget for the fiscal year 2024.',
)
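
Conceptually, a tool is a small adapter: a name and a description the agent can read, plus a function that takes a string and returns a string. The sketch below mimics that shape with a fake retriever. `SimpleTool`, `make_retriever_tool`, and `fake_retrieve` are hypothetical names, not LangChain APIs; the `'\n\n'` separator matches the `document_separator` visible in the tool's printed representation later in this notebook.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SimpleTool:
    name: str
    description: str  # The agent relies on this text to decide when to use the tool
    func: Callable[[str], str]


def make_retriever_tool(
    retrieve: Callable[[str], List[str]],
    name: str,
    description: str,
    separator: str = "\n\n",
) -> SimpleTool:
    """Adapt a 'query -> list of excerpts' function into a 'string -> string' tool."""

    def run(query: str) -> str:
        return separator.join(retrieve(query))

    return SimpleTool(name=name, description=description, func=run)


def fake_retrieve(query: str) -> List[str]:
    # Stand-in for a vector DB lookup.
    return [f"excerpt A about {query}", f"excerpt B about {query}"]


tool = make_retriever_tool(
    fake_retrieve,
    "us_fiscal_budget_fy2024",
    "Searches and returns excerpts from the US national budget for the fiscal year 2024.",
)
print(tool.func("NSF"))
```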

In [None]:
if query_us_budget_fy2024 not in tools:
  tools.insert(0, query_us_budget_fy2024)

# Let's take a quick look at the tools
print("\n".join(str(t) for t in tools))

name='us_fiscal_budget_fy2024' description='Searches and returns excerpts from the US national budget for the fiscal year 2024.' args_schema=<class 'langchain_core.tools.RetrieverInput'> func=functools.partial(<function _get_relevant_documents at 0x7a4e0e3975b0>, retriever=VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x7a4e0ddd0280>, search_type='mmr'), document_prompt=PromptTemplate(input_variables=['page_content'], template='{page_content}'), document_separator='\n\n') coroutine=functools.partial(<function _aget_relevant_documents at 0x7a4e0e397760>, retriever=VectorStoreRetriever(tags=['Chroma', 'OpenAIEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x7a4e0ddd0280>, search_type='mmr'), document_prompt=PromptTemplate(input_variables=['page_content'], template='{page_content}'), document_separator='\n\n')
name='Search' description='A search engine. Useful for when you

Now we need to re-create `react_agent` and `agent_executor` with this updated list of `tools`.

In [None]:
react_agent = create_react_agent(llm, tools, prompt_template)
agent_executor = AgentExecutor.from_agent_and_tools(agent=react_agent, tools=tools)

response = agent_executor.invoke(
    {'input': query},
    # config={"callbacks": [ConsoleCallbackHandler()]},
)
print(response['output'])

The total requested budget for NASA and NSF for Fiscal Year 2024 is $38.5 billion.


*Q: Are the answers the same? If not, which is the correct answer?*

*Q: What does the trace say about the agent execution?*

## 2. Phoenix

Phoenix is a free, open-source observability tool from Arize.

Arize also has a SaaS observability solution available as a paid option. The SaaS solution has more features and is perhaps a better fit for production systems than the open-source solution.

Launch the Phoenix server.

In [None]:
import nest_asyncio
import phoenix as px


nest_asyncio.apply()
session = px.launch_app()

🌍 To view the Phoenix app in your browser, visit https://zkwjimvu5fd3-496ff2e9c6d22116-6006-colab.googleusercontent.com/
📖 For more information on how to use Phoenix, check out https://docs.arize.com/phoenix


Instrument LangChain for Phoenix observability.


In [None]:
from phoenix.trace.langchain import LangChainInstrumentor

LangChainInstrumentor().instrument()

We will use the same agent components that were used in the previous example.

In [None]:
# Initialize the ReAct agent
# react_agent = create_react_agent(llm, tools, prompt_template)
# agent_executor = AgentExecutor.from_agent_and_tools(agent=react_agent, tools=tools)
response = agent_executor.invoke(
    {'input': query},
    # config={"callbacks": [ConsoleCallbackHandler()]},  # Print events internal to the ReAct agent.
)
print(response['output'])

The total requested budget for NASA and NSF for Fiscal Year 2024 is $38.5 billion.


## 3. LangFuse



In [None]:
os.environ['LANGFUSE_PUBLIC_KEY'] = 'pk-lf-66a42a10-d9b1-4e8b-94a5-21aaf0c587d3'
os.environ['LANGFUSE_HOST'] = "https://us.cloud.langfuse.com"

LangFuse uses the callback mechanism in LangChain to instrument the execution.
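
A callback mechanism means the framework notifies registered handlers at key points of a run, and each handler decides what to do with the event. The sketch below shows the pattern in plain Python. The hook names `on_start` and `on_end`, the `RecordingHandler` class, and the `invoke` function are all hypothetical; LangChain's real handler interface has more hooks and different names.

```python
from typing import Any, Callable, Dict, List, Tuple


class RecordingHandler:
    """Hypothetical handler that collects events instead of sending them to a backend."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, str, Any]] = []

    def on_start(self, name: str, inputs: Any) -> None:
        self.events.append(("start", name, inputs))

    def on_end(self, name: str, outputs: Any) -> None:
        self.events.append(("end", name, outputs))


def invoke(step: Callable[[Any], Any], inputs: Any, config: Dict[str, list]) -> Any:
    """Run a step, notifying every handler listed under config['callbacks']."""
    handlers = config.get("callbacks", [])
    for handler in handlers:
        handler.on_start(step.__name__, inputs)
    outputs = step(inputs)
    for handler in handlers:
        handler.on_end(step.__name__, outputs)
    return outputs


def answer(inputs: Dict[str, str]) -> Dict[str, str]:
    return {"output": inputs["input"].strip().upper()}


handler = RecordingHandler()
result = invoke(answer, {"input": " hi "}, config={"callbacks": [handler]})
print(result, [event[0] for event in handler.events])
```

Because handlers are passed per call, instrumentation can be switched on for a single invocation without touching global state.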

In [None]:
from langfuse.callback import CallbackHandler

langfuse_handler = CallbackHandler()

response = agent_executor.invoke(
    {'input': query},
    config={"callbacks": [langfuse_handler]},
)

print(response['output'])

The requested budget for NASA is $27.2 billion and for NSF is $11.3 billion for Fiscal Year 2024.
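
This last run reports the two agency figures separately, which makes it easy to check the earlier totals for arithmetic consistency. The sketch below only verifies the sum of the figures the agent reported; it does not independently verify the figures themselves.

```python
from decimal import Decimal

# Figures as reported in the agent output above (billions of USD).
nasa = Decimal("27.2")
nsf = Decimal("11.3")
total = nasa + nsf
print(f"${total} billion")

# Totals returned by the earlier runs in this notebook.
answers = {
    "without retriever": Decimal("36.23"),
    "with retriever": Decimal("38.5"),
}
for label, value in answers.items():
    print(f"{label}: matches sum = {value == total}")
```

`Decimal` is used instead of `float` so that the comparison is exact rather than subject to binary rounding.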
