## Chat with Me

The goal is to build a simple application (without a UI) that accepts a user's question, searches my web page for relevant documents, feeds both the question and the retrieved documents into a language model, and returns an answer. I use LangChain for this.

**Highlights**:
- LangSmith can trace essentially every step. What is most interesting for applications is that it collects data on **latency**, token **usage**, and cost per request.
- It also breaks each request down into its constituent parts, including the augmented prompt. This metadata can probably help to optimize the splits and other parameters. I don't yet know the limits of the LangSmith API.
- The usual roadblocks for applications are still there, i.e. privacy and security. It'd be cool to try a local model, e.g. the new student llama, for the inference part.

#### Set up the environment and tools

In [None]:
import ssl
print(ssl.OPENSSL_VERSION)
# libressl 2.6.4

> 🚫 My current env has LibreSSL and not OpenSSL. This leads to an API connection error according to [openai](https://help.openai.com/en/articles/6897191-apiconnectionerror). I could install OpenSSL and recompile Python, but I prefer Conda since its Python distribution comes precompiled. So I created the env with conda.
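
A small guard can make this check explicit before any API call is attempted. This is only a sketch: the helper name `is_libressl` is my own, and it does nothing more than inspect the version string reported by the `ssl` module.

```python
import ssl


def is_libressl(version_string: str) -> bool:
    """Return True when the SSL backend identifies itself as LibreSSL."""
    return version_string.lower().startswith("libressl")


# Inspect the backend this interpreter was compiled against.
if is_libressl(ssl.OPENSSL_VERSION):
    print("LibreSSL detected, consider a Python build linked against OpenSSL")
else:
    print("SSL backend:", ssl.OPENSSL_VERSION)
```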

In [1]:
import ssl
print(ssl.OPENSSL_VERSION)

OpenSSL 3.3.2 3 Sep 2024


I was about to use an Anthropic LLM but couldn't find an embedding model for it. It seems like LangChain still uses `OpenAIEmbeddings` no matter which LLM is used. So if I have to pay for the OpenAI API anyway, I'd rather use it for both the LLM and the embeddings. Let's check them out.

In [2]:
import getpass
import os

# prompt for the openai api key and store it in an environment variable
os.environ["OPENAI_API_KEY"] = getpass.getpass()
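
The cell above prompts for the key on every run. A possible variation, sketched here under my own assumptions, only prompts when the variable is not already set. The helper `ensure_secret` and the variable name `DEMO_API_KEY` are hypothetical and used purely for illustration.

```python
import getpass
import os


def ensure_secret(name: str) -> str:
    """Return the value of an environment variable, prompting only if it is missing."""
    if not os.environ.get(name):
        # getpass hides the typed value, so the secret never shows up in the notebook
        os.environ[name] = getpass.getpass(f"{name}: ")
    return os.environ[name]


# DEMO_API_KEY is a placeholder so that this example runs without prompting
os.environ.setdefault("DEMO_API_KEY", "not-a-real-key")
value = ensure_secret("DEMO_API_KEY")
print(len(value) > 0)
```

In the notebook, the same helper could be called with `"OPENAI_API_KEY"` instead of the placeholder name.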

In [3]:
# check if ssl version is okay, try an openai embeddings
from langchain_openai import OpenAIEmbeddings  

embed = OpenAIEmbeddings(model="text-embedding-3-small")

input_text = "The meaning of life is 42"  
vector = embed.embed_query("hello")  
print(vector[:3])

[0.016751619055867195, -0.055799614638090134, 0.005647437181323767]


In [4]:
# quick test of the chat model, use gpt-4o-mini which is cost efficient
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini")
llm.invoke("Hello, world!")

AIMessage(content='Hello! How can I assist you today?', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 9, 'prompt_tokens': 11, 'total_tokens': 20, 'completion_tokens_details': {'audio_tokens': None, 'reasoning_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': None, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_f59a81427f', 'finish_reason': 'stop', 'logprobs': None}, id='run-c8b125fa-2fd0-43ee-b09d-a517078b2fad-0', usage_metadata={'input_tokens': 11, 'output_tokens': 9, 'total_tokens': 20, 'input_token_details': {'cache_read': 0}, 'output_token_details': {'reasoning': 0}})
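
LangSmith collects latency and token usage automatically, but the same numbers can also be read locally. The sketch below times a call and summarizes a `usage_metadata` dictionary with the values copied from the output above. The helpers `timed` and `token_summary` and the stand-in `fake_llm` are my own and hypothetical; in the notebook one would time `llm.invoke` and read `usage_metadata` from the returned message.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def token_summary(usage_metadata: dict) -> dict:
    """Keep only the three token counters, defaulting to 0 when one is missing."""
    keys = ("input_tokens", "output_tokens", "total_tokens")
    return {key: usage_metadata.get(key, 0) for key in keys}


def fake_llm(prompt: str) -> str:
    # hypothetical stand-in for llm.invoke, so the example runs offline
    return "Hello! How can I assist you today?"


# token counts copied from the AIMessage printed above
usage = {"input_tokens": 11, "output_tokens": 9, "total_tokens": 20}

reply, seconds = timed(fake_llm, "Hello, world!")
print(token_summary(usage), f"{seconds:.6f}s")
```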

Both are working. Let's set up LangChain and import the libraries.

In [5]:
# enable LangChain tracing, curious to see how it works
os.environ["LANGCHAIN_TRACING_V2"] = "true"

# prompt for the langchain api key and store it in an environment variable
os.environ["LANGCHAIN_API_KEY"] = getpass.getpass()

In [6]:
# silence the warning "USER_AGENT environment variable not set, consider setting it to identify your requests."
os.environ['USER_AGENT'] = getpass.getpass()

> 🚫 Chroma has a compatibility issue with Python 3.13, so I went back to 3.10.

In [7]:
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_chroma import Chroma

#### Let's begin loading the content

LangChain has a huge catalogue of [loaders](https://python.langchain.com/docs/integrations/document_loaders/). There are only a few source types (e.g. web, JSON, PDF), but each type comes with many loaders. I'll start with the web loader and load the content of my page.
It should return a list of `Document` objects, each holding the page text (`page_content`) and its `metadata`.

I'd also like to include the subpages, e.g. my Events page. I could add each one as a separate link and load it, but there should be a smarter way to do that. For now I'll stick to the main page.
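
One possible direction for the subpages, sketched with the standard library only: collect the links on the main page that stay under the same base URL, then hand the resulting list to the loader. The HTML snippet and the `events/` path below are made up for illustration, and `same_site_links` is a hypothetical helper, not something LangChain provides.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(html: str, base_url: str) -> list:
    """Return absolute links that live under base_url, without duplicates."""
    parser = LinkCollector()
    parser.feed(html)
    base = urlparse(base_url)
    links = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href).split("#")[0]  # drop in-page anchors
        parsed = urlparse(absolute)
        under_base = parsed.netloc == base.netloc and parsed.path.startswith(base.path)
        if under_base and absolute != base_url and absolute not in links:
            links.append(absolute)
    return links


# made-up HTML; the real page may be structured differently
sample_html = (
    '<a href="events/">Events</a> '
    '<a href="https://github.com/">Github</a> '
    '<a href="#top">Top</a>'
)
base_url = "https://amirkhalilzadeh.github.io/wp/"
print(same_site_links(sample_html, base_url))
```

The resulting list, together with the main page, could then be passed as `web_paths` to the loader.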

In [8]:
# let's get the contents of my page

loader = WebBaseLoader(web_paths=("https://amirkhalilzadeh.github.io/wp/",),)
docs = loader.load()

# let's see the content and metadata
print('number of characters in the content: ', len(docs[0].page_content))
display('here are all of the content:', docs[0].page_content)
print('here is the metadata:\n', docs[0].metadata)

number of characters in the content:  1732


'here are all of the content:'

'\n\n\n\n\nAmir Khalilzadeh\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nHome\n\n\n\nEvents\n\n\n\n\n \n \n\n\n\n\n\n\n\n\n\n\n\n\nAmir Khalilzadeh\n\n\n\n\n\n\n\nLinkedIn\n\n\n\nGithub\n\n\n Email\n\n\n\n\n\n\nabout me\nI am a product lead at the Continuing Education of EPFL and UNIL. I co-lead and teach a COS program, Applied Data Science and Machine Learning, to private individuals and industry professionals. I also (co)-deliver on-demand workshops to companies and organizations.\nI spend most of my day helping our learners understand the underlying principles of data pipelines and ML models, identify issues in their Python code, and guide them in framing their ML projects with data that they bring from their work or else where.\n\n\nEvent updates 🤖🛠️\n\n2025/05 | a short course on LLMs is under construction 🚧 🤩\n2024/10 | delivered a 2-days workshop at Logitech\n2024/09 | delivered an NLP masterclass\n\n2024/03 | delivered a half-day hands-on workshop at AMLD\n2

here is the metadata:
 {'source': 'https://amirkhalilzadeh.github.io/wp/', 'title': 'Amir Khalilzadeh', 'language': 'en'}


#### Split the content

GPT-4o mini has a 128k-token context window, so my page, with only about 1.7k characters, doesn't even scratch the surface and splitting is not really necessary. I'll do it anyway to see how it works, and I guess smaller chunks will help the embeddings match queries more precisely.

In [9]:
# chunk the contents of the page
text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20, add_start_index=False)

splits = text_splitter.split_documents(docs)
display(len(splits))
splits[0]

27

Document(metadata={'source': 'https://amirkhalilzadeh.github.io/wp/', 'title': 'Amir Khalilzadeh', 'language': 'en'}, page_content='Amir Khalilzadeh\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nHome\n\n\n\nEvents')

In [10]:
# get the length of each split and sum them to compare with the original character count
lengths = [len(split.page_content) for split in splits]
sum(lengths)


1753

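The sum (1753) is slightly more than the total number of characters (1732). That is plausible: `chunk_overlap=20` lets neighbouring chunks share some text, while the splitter also seems to drop whitespace at chunk boundaries (the first split starts at the name, not at the leading newlines).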
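
To see why overlap inflates the sum, here is a deliberately simple fixed-size splitter. It is not what `RecursiveCharacterTextSplitter` does (that one splits on separators first); `sliding_chunks` is a hypothetical helper that only illustrates the overlap arithmetic with the same `chunk_size` and `chunk_overlap` values.

```python
def sliding_chunks(text: str, chunk_size: int = 100, overlap: int = 20) -> list:
    """Cut text into fixed-size windows where each window repeats the last
    `overlap` characters of the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


# 250 characters of toy text
text = "".join(str(i % 10) for i in range(250))
chunks = sliding_chunks(text)
total = sum(len(chunk) for chunk in chunks)

# every boundary between two chunks contributes `overlap` duplicated characters
print(len(text), len(chunks), total)
```

With three chunks there are two boundaries, so the sum exceeds the text length by 2 × 20 characters.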

#### Store the embeddings

It's now time to store the information in the database, which essentially means storing the embeddings. I'll go for `text-embedding-3-small` because it is the [cheapest](https://openai.com/api/pricing/). Its vector length is 1536 (vs 3072 for the large version). The price-performance trade-off of small vs large is great, see [here](https://platform.openai.com/docs/guides/embeddings/what-are-embeddings), and there is more on the OpenAI embeddings [here](https://openai.com/index/new-embedding-models-and-api-updates/), published in January 2024.

In [11]:
# store the chunks in a vector database
vectorstore = Chroma.from_documents(documents=splits, 
                                    embedding=OpenAIEmbeddings(model="text-embedding-3-small"))
vectorstore

<langchain_chroma.vectorstores.Chroma at 0x110de23e0>

#### Retrieve information

By default, the vector store retriever uses similarity search, and the number of results to return can be set with `k`. I started with `k=1`, but that clearly failed to find relevant content for the following query.

In [12]:
# Retrieve similar information from the vector database
retriever = vectorstore.as_retriever(search_kwargs={"k": 6})

# lets see how the retriever works for a sample query
out = retriever.invoke("who is the author of the page?")
out

[Document(metadata={'language': 'en', 'source': 'https://amirkhalilzadeh.github.io/wp/', 'title': 'Amir Khalilzadeh'}, page_content='about me'),
 Document(metadata={'language': 'en', 'source': 'https://amirkhalilzadeh.github.io/wp/', 'title': 'Amir Khalilzadeh'}, page_content='developer and instructor.'),
 Document(metadata={'language': 'en', 'source': 'https://amirkhalilzadeh.github.io/wp/', 'title': 'Amir Khalilzadeh'}, page_content='Amir Khalilzadeh\n\n\n\n\n\n\n\nLinkedIn\n\n\n\nGithub\n\n\n Email'),
 Document(metadata={'language': 'en', 'source': 'https://amirkhalilzadeh.github.io/wp/', 'title': 'Amir Khalilzadeh'}, page_content='Amir Khalilzadeh\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nHome\n\n\n\nEvents'),
 Document(metadata={'language': 'en', 'source': 'https://amirkhalilzadeh.github.io/wp/', 'title': 'Amir Khalilzadeh'}, page_content='them in framing their ML projects with data that they bring from their work or else where.'),
 Document(metadata={'langu
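
Conceptually, the retriever ranks stored vectors by how close they are to the query vector and returns the top `k`. The toy sketch below uses cosine similarity on made-up 3-dimensional vectors. The metric, the vectors and the helper names are my assumptions for illustration; the real index and distance computation are handled by Chroma.

```python
import math


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec, store: dict, k: int = 2) -> list:
    """Return the k keys whose vectors are most similar to the query."""
    ranked = sorted(store, key=lambda name: cosine(query_vec, store[name]), reverse=True)
    return ranked[:k]


# made-up vectors standing in for chunk embeddings
toy_store = {
    "about me": [0.9, 0.1, 0.0],
    "events": [0.1, 0.9, 0.0],
    "contact": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]

print(top_k(query, toy_store, k=1))
print(top_k(query, toy_store, k=2))
```

A larger `k` trades precision for recall: more chunks reach the prompt, so a relevant one is less likely to be missed.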

#### Augment the input query

LangChain has a hub for prompts, e.g. `rag-prompt` (quoted below), but I'll use a custom one.

*You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.*


In [13]:
# constructs a prompt 
from langchain_core.prompts import PromptTemplate

prompt = """Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, check the [page](https://amirkhalilzadeh.github.io/wp/) yourself but it's likely that there is no information about your question.
Use three sentences maximum and keep the answer concise.

{context}

Question: {question}

Helpful Answer:"""

custom_prompt = PromptTemplate.from_template(prompt)

#### Chain things together

Let's chain everything together and see if it works. The **prompt** is fixed, the **context** comes from the retriever, and the **question** is the user input. The **llm** receives the question augmented with the context and the prompt instructions, all of which fit comfortably into its context window, so it has all the information it needs to generate an answer.

In [14]:
def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)

# fill the prompt with the retrieved context and the question, call the llm, and parse the output to a string
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | custom_prompt
    | llm
    | StrOutputParser()
)
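
To make the data flow of the chain explicit, here is the same sequence written as plain function calls. Everything in it is a stand-in: `fake_retriever` and `fake_llm` are hypothetical replacements for the real retriever and model, and the template is a shortened copy of the custom prompt, so the example runs offline.

```python
PROMPT = """Use the following pieces of context to answer the question at the end.

{context}

Question: {question}

Helpful Answer:"""


def fake_retriever(question: str) -> list:
    # hypothetical stand-in for retriever.invoke; returns chunk texts
    return ["about me", "Amir Khalilzadeh"]


def format_chunks(chunks: list) -> str:
    """Join the retrieved chunks into one context block."""
    return "\n\n".join(chunks)


def fake_llm(prompt_text: str) -> dict:
    # hypothetical stand-in for the chat model; reports the prompt size
    return {"content": f"(model reply to a {len(prompt_text)}-character prompt)"}


def parse(message: dict) -> str:
    """Extract the text of the reply, as the string output parser does for the chain."""
    return message["content"]


def rag_pipeline(question: str):
    context = format_chunks(fake_retriever(question))                 # retrieve
    prompt_text = PROMPT.format(context=context, question=question)   # augment
    return parse(fake_llm(prompt_text)), prompt_text                  # generate


answer, prompt_text = rag_pipeline("who is the author of the page?")
print(prompt_text)
```

Printing the filled prompt is a cheap way to check what the model actually sees before spending tokens on a real call.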

#### Time to try it out

In [15]:
rag_chain.invoke("who is the author of the page?")

'The author of the page is Amir Khalilzadeh.'

In [16]:
rag_chain.invoke("how many workshops he has delivered so far?")

'He has delivered a total of six workshops so far.'

In [19]:
rag_chain.invoke("does he have any workshop planned for next year, 2025?")

'Yes, there is a short course on LLMs under construction for May 2025.'

In [21]:
rag_chain.invoke("who was the audience for his most recent workshop?")

'The audience for his most recent workshop, delivered in October 2024, was likely professionals from Logitech.'

In [22]:
# cleanup
vectorstore.delete_collection()

There are lots of things to do to improve the QA chat and make it more practical, but for now I'll stop here.