# Conversational Threads

Many LLM applications have a chatbot-like interface in which the user and the application hold a multi-turn conversation. To track these conversations, you can use the Threads feature in LangSmith.

This is relevant to our RAG application, which should maintain context from prior conversations with users.

### Setup

In [2]:
from dotenv import load_dotenv
load_dotenv()  

import os
print("LangSmith Key Set:", os.getenv("LANGCHAIN_API_KEY") is not None)


LangSmith Key Set: True


### Group traces into threads


A Thread is a sequence of traces representing a single conversation. Each response is represented as its own trace, but these traces are linked together by being part of the same thread.

To link traces into a thread, attach a special metadata key to each trace. The key's value is the unique identifier for that conversation, and the key name must be one of:

- session_id
- thread_id
- conversation_id

The value should be a UUID.
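
As a minimal sketch of the rule above, the following builds the metadata payload for one conversation. The helper `make_thread_metadata` and the constant `ALLOWED_THREAD_KEYS` are hypothetical names introduced here for illustration and are not part of LangSmith. The returned dictionary has the same shape as the `langsmith_extra` argument used when the application is called later in this notebook.

```python
import uuid

# The three key names listed above
ALLOWED_THREAD_KEYS = {"session_id", "thread_id", "conversation_id"}

def make_thread_metadata(key="thread_id", value=None):
    """Hypothetical helper: build a metadata payload that tags a trace with a thread."""
    if key not in ALLOWED_THREAD_KEYS:
        raise ValueError(f"key must be one of {sorted(ALLOWED_THREAD_KEYS)}")
    if value is None:
        # Start a new conversation with a fresh UUID
        value = str(uuid.uuid4())
    else:
        # Reuse an existing conversation; uuid.UUID raises ValueError on a malformed ID
        value = str(uuid.UUID(str(value)))
    return {"metadata": {key: value}}

extra = make_thread_metadata()
print(extra)
```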

In [5]:
import uuid
# One identifier per conversation, kept as a string for use in trace metadata
thread_id = str(uuid.uuid4())

In [None]:
from langsmith import traceable
from langchain_groq import ChatGroq
from typing import List
import nest_asyncio
from utils import get_vector_db_retriever

# Initialize Groq client with your model
groq_client = ChatGroq(model="llama-3.3-70b-versatile", temperature=0.0)

nest_asyncio.apply()
retriever = get_vector_db_retriever()

@traceable(run_type="chain")
def retrieve_documents(question: str):
    return retriever.invoke(question)

@traceable(run_type="chain")
def generate_response(question: str, documents):
    formatted_docs = "\n\n".join(doc.page_content for doc in documents)
    rag_system_prompt = """You are an assistant for question-answering tasks. 
Use the following pieces of retrieved context to answer the latest question in the conversation. 
If you don't know the answer, just say that you don't know. 
Use three sentences maximum and keep the answer concise.
"""
    messages = [
        {
            "role": "system",
            "content": rag_system_prompt
        },
        {
            "role": "user",
            "content": f"Context: {formatted_docs} \n\n Question: {question}"
        }
    ]
    return call_groq(messages)

@traceable(run_type="llm")
def call_groq(
    messages: List[dict]
):
    # ChatGroq.invoke accepts a list of role/content message dicts as its input
    response = groq_client.invoke(input=messages)
    return response

@traceable(run_type="chain")
def langsmith_rag(question: str):
    documents = retrieve_documents(question)
    response = generate_response(question, documents)
    # Groq returns an AIMessage-like object, access content directly
    return response.content


### Now let's run our application twice with this thread_id

In [10]:
import uuid
thread_id = str(uuid.uuid4())

In [11]:
question = "What is 4+5?"
ai_answer = langsmith_rag(question, langsmith_extra={"metadata": {"thread_id": thread_id}})
print(ai_answer)

The answer to 4+5 is 9. I don't see any relevant information in the provided context to help with this question. The context appears to be related to a tracing hierarchy and logging, which is unrelated to basic arithmetic.


In [14]:
question = "What is the weather in Tangier?"
ai_answer = langsmith_rag(question, langsmith_extra={"metadata": {"thread_id": thread_id}})
print(ai_answer)

It's 90 degrees and sunny. 
This answer is based on a pre-existing dataset of weather information. 
The dataset contains answers to specific questions, including the weather in Tangier.


### Let's take a look in LangSmith!
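
Both runs above carried the same thread_id, so they should appear together as one thread in the project. As a rough mental model of that grouping (a sketch only, not LangSmith's actual implementation), the code below collects traces that share an identifier under any of the three accepted metadata keys. The function `group_by_thread`, the dictionary layout of each trace, and the order in which keys are checked are all assumptions made for illustration.

```python
from collections import defaultdict

# Accepted metadata key names; the lookup order here is an assumption
THREAD_KEYS = ("session_id", "thread_id", "conversation_id")

def group_by_thread(traces):
    """Hypothetical helper: map each thread identifier to the names of its traces."""
    threads = defaultdict(list)
    for trace in traces:
        metadata = trace.get("metadata", {})
        # Use the first accepted key present on the trace, if any
        thread_value = next((metadata[k] for k in THREAD_KEYS if k in metadata), None)
        if thread_value is not None:
            threads[thread_value].append(trace["name"])
    return dict(threads)

# Toy stand-ins for the two runs above, plus one run without a thread key
example_traces = [
    {"name": "What is 4+5?", "metadata": {"thread_id": "abc"}},
    {"name": "What is the weather in Tangier?", "metadata": {"thread_id": "abc"}},
    {"name": "unrelated run", "metadata": {}},
]
grouped = group_by_thread(example_traces)
print(grouped)
```

Traces without any of the accepted keys are left out of every thread in this sketch, which is why the third run does not appear in the result.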