# LangSmith and Evaluation Overview with AI Makerspace

Today we'll be looking at an amazing tool:

[LangSmith](https://docs.smith.langchain.com/)!

This tool will help us monitor, test, debug, and evaluate our LangChain applications - and more!

## Task 1: Dependencies and Env Variables

In [1]:
!pip install langchain_core langchain_huggingface langchain_openai langchain_community langchain-qdrant qdrant-client langsmith openai tiktoken cohere -qU


Once again, we're going to provide our Hugging Face token as our `OPENAI_API_KEY`. The Hugging Face endpoint exposes an OpenAI-compatible API, so this lets LangChain work with it as if it were an OpenAI endpoint!

In [2]:
import os
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your Hugging Face API Key:")

Enter your Hugging Face API Key:··········


In [3]:
HF_LLM_URL = "YOUR LLM URL HERE" + "/v1/"

In [4]:
os.environ["OPENAI_BASE_URL"] = HF_LLM_URL

## Task 2: Basic RAG Chain

Now we'll set up our basic RAG chain. First up, we need a model!

### OpenAI Model (Pointed at the Hugging Face Inference Endpoint)


We'll use the `NousResearch/Hermes-2-Pro-Llama-3-8B` model, served from our Hugging Face Inference Endpoint and called through LangChain's OpenAI client, so that we have a model strong enough for decent evaluation later!

Notice that we can tag our resources - this will help us keep track of which resources were used where later on!

In [5]:
from langchain_openai.chat_models import ChatOpenAI

base_llm = ChatOpenAI(model="tgi", tags=["base_llm"])

#### Asyncio Bug Handling

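This is necessary for Colab: the notebook already runs an event loop, and `nest_asyncio` lets the loader below run its own async work inside it.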

In [6]:
import nest_asyncio
nest_asyncio.apply()

### SiteMap Loader

We'll use a `SitemapLoader` to scrape the LangChain blog posts.

In [7]:
from langchain_community.document_loaders import SitemapLoader

documents = SitemapLoader(web_path="https://blog.langchain.dev/sitemap-posts.xml").load()

Fetching pages: 100%|##########| 225/225 [00:05<00:00, 38.25it/s]


In [8]:
documents[0].metadata["source"]

'https://blog.langchain.dev/customers-wordsmith/'

### RecursiveCharacterTextSplitter

We're going to use a relatively naive text splitting strategy today!

In [9]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

split_documents = RecursiveCharacterTextSplitter(
    chunk_size = 750,
    chunk_overlap = 20
).split_documents(documents)

In [10]:
len(split_documents)

3524
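To make `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap in plain Python. It is a simplification, not how `RecursiveCharacterTextSplitter` works internally: the real splitter first tries to break on separators such as paragraph and line breaks, while this hypothetical `split_with_overlap` helper just slides a character window.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Slide a fixed-size character window; neighbours share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=2)
print(chunks)  # ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz']
```

Each chunk repeats the last two characters of the previous one, which is the role the overlap of 20 plays for the 750-character chunks above.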

### Embeddings

We'll be using Snowflake's `Snowflake/snowflake-arctic-embed-m` model as our embedding model today!

In [14]:
HF_EMBED_URL = "YOUR EMBED MODEL URL HERE"

In [15]:
from langchain_huggingface.embeddings import HuggingFaceEndpointEmbeddings

embedding_model = HuggingFaceEndpointEmbeddings(
    model=HF_EMBED_URL,
    task="feature-extraction",
    huggingfacehub_api_token=os.environ["OPENAI_API_KEY"],
)

### Qdrant VectorStore Retriever

Now we can use a Qdrant VectorStore to embed and store our documents and then convert it to a retriever so it can be used in our chain!

In [16]:
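from langchain_community.vectorstores import Qdrant

BATCH_SIZE = 32  # embed the documents in small batches to keep each request small

# Create the in-memory collection from the first batch...
vectorstore = Qdrant.from_documents(
    split_documents[:BATCH_SIZE],
    embedding_model,
    location=":memory:",
    collection_name="LangChain Blogs",
)

# ...then add the remaining documents one batch at a time.
for i in range(BATCH_SIZE, len(split_documents), BATCH_SIZE):
    vectorstore.add_documents(split_documents[i:i + BATCH_SIZE])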

In [17]:
base_retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
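As a rough picture of what a retriever with `k` results does, the sketch below ranks a few toy vectors by cosine similarity and keeps the top `k`. This is an illustration under assumptions: the vectors, document ids, and the `top_k` helper are hypothetical, cosine similarity is assumed as the metric, and a real vector store such as Qdrant uses an index rather than scoring every vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, doc_vectors, k):
    """Return the ids of the k documents most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), doc_id)
        for doc_id, vector in doc_vectors.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy 2-dimensional "embeddings" (real embeddings have hundreds of dimensions).
toy_vectors = {
    "doc_a": [1.0, 0.0],
    "doc_b": [0.8, 0.6],
    "doc_c": [0.0, 1.0],
}
print(top_k([1.0, 0.1], toy_vectors, k=2))  # ['doc_a', 'doc_b']
```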

### Prompt Template

All we have left is a prompt template, which we'll create here!

In [18]:
from langchain_core.prompts import ChatPromptTemplate

base_rag_prompt_template = """\
Using the provided context, please answer the user's question. If you don't know the answer based on the context, say you don't know.

Context:
{context}

Question:
{question}
"""

base_rag_prompt = ChatPromptTemplate.from_template(base_rag_prompt_template)
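To see what the model will actually receive, the placeholders can be filled in by hand. The sketch below uses plain `str.format` as a rough stand-in for the prompt template's formatting step. The context sentence is made-up sample text, and in the chain `context` will hold the retrieved documents rather than a hand-written string.

```python
template = """\
Using the provided context, please answer the user's question. If you don't know the answer based on the context, say you don't know.

Context:
{context}

Question:
{question}
"""

# Made-up sample values, used only to show the rendered prompt.
rendered = template.format(
    context="LangSmith is a platform for tracing and evaluating LLM applications.",
    question="What is LangSmith?",
)
print(rendered)
```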

### LCEL Chain

Now that we have:

- Embeddings Model
- Generation Model
- Retriever
- Prompt

We're ready to build our LCEL chain!

Keep in mind that we're returning our source documents with our queries - while this isn't necessary, it's a great thing to get into the habit of doing.

In [19]:
from operator import itemgetter
from langchain_core.runnables import RunnablePassthrough, RunnableParallel
from langchain_core.output_parsers import StrOutputParser

base_rag_chain = (
    # Invoke the chain with: {"question": "<<SOME USER QUESTION>>"}
    # "context"  : the "question" value is piped into base_retriever, which returns the matching documents
    # "question" : the "question" value is passed through unchanged
    {"context": itemgetter("question") | base_retriever, "question": itemgetter("question")}
    # RunnablePassthrough.assign keeps every key from the previous step and (re)assigns "context"
    # to its own value, so this step leaves the data unchanged
    | RunnablePassthrough.assign(context=itemgetter("context"))
    # "response" : "context" and "question" fill the prompt, which is piped into the LLM and parsed to a string
    # "context"  : the retrieved documents from the previous step, returned alongside the response
    | {"response": base_rag_prompt | base_llm | StrOutputParser(), "context": itemgetter("context")}
)
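The data flow of this chain can be sketched with ordinary functions. In the sketch below, `fake_retriever` and `fake_llm` are hypothetical stand-ins for `base_retriever` and `base_llm`, so it only illustrates the shape of the inputs and outputs, not LangChain's actual execution.

```python
def fake_retriever(question: str) -> list[str]:
    """Hypothetical stand-in for base_retriever: returns a canned snippet."""
    return [f"snippet about: {question}"]


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for base_llm: reports the prompt length."""
    return f"answer based on a {len(prompt)}-character prompt"


def rag_chain_sketch(inputs: dict) -> dict:
    # Step 1: fan the incoming question out into "context" and "question".
    step_one = {
        "context": fake_retriever(inputs["question"]),
        "question": inputs["question"],
    }
    # Step 2: build the prompt from both values and call the model,
    # while carrying the retrieved context through to the output.
    prompt = f"Context:\n{step_one['context']}\n\nQuestion:\n{step_one['question']}\n"
    return {"response": fake_llm(prompt), "context": step_one["context"]}


result = rag_chain_sketch({"question": "What is LangSmith?"})
print(sorted(result))  # ['context', 'response']
```

The real `base_rag_chain` has the same input and output shape: a dict with a `"question"` key goes in, and a dict with `"response"` and `"context"` comes out.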

Let's test it out!

In [20]:
base_rag_chain.invoke({"question" : "What is a good way to evaluate agents?"})["response"]

"Based on the provided context, it doesn't directly provide a good way to evaluate agents. However, one of the documents mentions evaluating models and selecting the best performing evaluator for each model. This may indirectly suggest that evaluating agents can be done by comparing their performance and selecting the best one. For more specific information on evaluating agents, it would be best to look for resources outside of the provided context."

## Task 3: Setting Up LangSmith

Now that we have a chain - we're ready to get started with LangSmith!

We'll set the following environment variables so that our Colab notebook starts reporting traces to LangSmith.

If all you needed was simple monitoring - this is all you would need to do!

In [21]:
from uuid import uuid4

unique_id = uuid4().hex[0:8]

os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = f"LangSmith AIMS@TMLS- {unique_id}"

### LangSmith API

In order to use LangSmith - you will need a key!

Join [here](https://www.langchain.com/langsmith)!

In [22]:
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass('Enter your LangSmith API key: ')

Enter your LangSmith API key: ··········
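Since tracing depends on the environment variables set above, a quick sanity check before running the chain can be handy. The `missing_langsmith_vars` helper below is a hypothetical convenience, not part of LangSmith. It only reports which of this notebook's variables are unset or empty.

```python
import os

# The variables this notebook sets for LangSmith.
NOTEBOOK_VARS = ("LANGCHAIN_TRACING_V2", "LANGCHAIN_PROJECT", "LANGCHAIN_API_KEY")


def missing_langsmith_vars(env=None) -> list[str]:
    """Return the names of the notebook's variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in NOTEBOOK_VARS if not env.get(name)]


# Example with an explicit mapping, so the check does not touch real credentials.
example_env = {"LANGCHAIN_TRACING_V2": "true", "LANGCHAIN_PROJECT": "demo"}
print(missing_langsmith_vars(example_env))  # ['LANGCHAIN_API_KEY']
```

Calling `missing_langsmith_vars()` with no argument checks the notebook's real environment instead.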


Let's test out our first generation!

In [23]:
base_rag_chain.invoke({"question" : "What is LangSmith?"}, {"tags" : ["Demo Run"]})['response']

'LangSmith is a tool that helps with the full workflow, particularly in the process of fine-tuning open-source LLMs (Large Language Models). It is also used for production monitoring and automations, as mentioned in another blog post. Users can benefit from its efficiency, and it can be tried for free by signing up on the LangSmith website. The tool has been praised for its integration with LangChain in various pipelines.'

### Evaluation

We can now set up a simple "LLM-As-A-Judge" style evaluation.

In essence, this process can be boiled down to two steps:

1. Generate an output with your LCEL Chain
2. Prompt an LLM to evaluate it against some defined metric.

In [26]:
response = base_rag_chain.invoke({"question" : "Why is LangSmith a good tool for evaluation?"}, {"tags" : ["Demo Run"]})['response']

In [27]:
response

'LangSmith is a good tool for evaluation because it helps streamline and optimize the process, making it more efficient. As mentioned in the context, it can be used for tasks like pairwise evaluations and aligning LLMs as a judge with human preferences. Additionally, it integrates well with the LangChain pipeline, providing a seamless workflow experience. The tool has been recommended by users who already use LangChain in their pipeline, showcasing its effectiveness and usefulness.'

### Evaluation Chain

Now we can construct a simple chain that takes the question and the response as input - and outputs a "Y" or an "N".

We'll start with the prompt.

In [28]:
EVAL_PROMPT = """\
Given a question and a response - you must indicate if the question is fully answered by the response or not.

You can indicate that a response fully answered the question by saying "Y".
You can indicate that a response did not fully answer the question by saying "N".

Question: {question}
Response: {response}
"""

eval_prompt = ChatPromptTemplate.from_template(EVAL_PROMPT)

Now, let's add the LLM to our chain and call it with the original question - and the generated response from above!

In [29]:
eval_chain = eval_prompt | base_llm | StrOutputParser()

In [32]:
eval_chain.invoke({
    "question" : "Why is LangSmith a good tool for evaluation?",
    "response" : response
})

'Y'
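A single "Y" is only one data point, so one natural next step is to run the same two steps over several questions and report a pass rate. The sketch below assumes chains with the same `.invoke()` input and output shapes as `base_rag_chain` and `eval_chain`. The `parse_verdict` and `pass_rate` helpers and the `StubChain` class are hypothetical, and the stubs stand in for the real chains so the example runs without an endpoint.

```python
def parse_verdict(raw: str) -> bool:
    """Leniently map the judge's raw text to True (starts with "Y") or False."""
    return raw.strip()[:1].upper() == "Y"


def pass_rate(questions, rag_chain, judge_chain) -> float:
    """Fraction of questions the judge marks as fully answered."""
    if not questions:
        return 0.0
    passed = 0
    for question in questions:
        # Step 1: generate a response for the question.
        response = rag_chain.invoke({"question": question})["response"]
        # Step 2: ask the judge whether the response fully answers it.
        verdict = judge_chain.invoke({"question": question, "response": response})
        passed += parse_verdict(verdict)
    return passed / len(questions)


class StubChain:
    """Hypothetical stand-in exposing the same .invoke() shape as the chains above."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, inputs):
        return self.fn(inputs)


stub_rag = StubChain(lambda inputs: {"response": inputs["question"].upper()})
stub_judge = StubChain(lambda inputs: "Y" if "LANGSMITH" in inputs["response"] else "N\n")
print(pass_rate(["What is LangSmith?", "What is Qdrant?"], stub_rag, stub_judge))  # 0.5
```

With the real chains, this would be called as `pass_rate(questions, base_rag_chain, eval_chain)`. Note that the verdict parsing is deliberately lenient and would need tightening if the judge tends to add explanations.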