# LangChain

LangChain is a framework for quickly integrating language models into applications. This tutorial covers some of its basic components.

Most of the content of the tutorial was taken from [here](https://www.youtube.com/watch?v=P3MAbZ2eMUI). 

I also included a few mini-projects in this notebook: a small Chain-of-Thought prompting setup (which didn't yield very interesting results) and a program that lets a user ask questions about a PDF file.

This notebook uses Ollama as the LLM of choice, but that can easily be swapped out for other models.

This notebook will cover:
- Embedding Models
- Language Models and Prompt Templates
- Document Loaders and VectorStores, for Retrieval-Augmented Generation (RAG)
- Basic chaining

# Embedding Models

Embeddings represent text as a vector that encodes information about the text. We can generate embeddings with an embedding model, and store them in a vector store so that texts can be compared by similarity.
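
To make "comparing vectors" concrete, here is a small sketch of cosine similarity, a common way to score how close two embeddings are. The three-dimensional vectors are made up for illustration; a real embedding model returns vectors with thousands of dimensions, and a given vector store may use a different metric.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy "embeddings" (assumed values, not produced by any model)
pasta = [0.9, 0.1, 0.0]
pizza = [0.8, 0.2, 0.1]
compiler = [0.0, 0.1, 0.9]

print(cosine_similarity(pasta, pizza))     # close to 1: similar direction
print(cosine_similarity(pasta, compiler))  # close to 0: nearly unrelated
```

A score near 1 means the vectors point in almost the same direction, which is how a query gets matched to a stored document.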

We use Ollama here, so it must be installed before running this notebook.

References:
- [Langchain text embedding with ollama](https://python.langchain.com/docs/integrations/text_embedding/ollama/)

In [1]:
from langchain_ollama import OllamaEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore

In [11]:
embedding_model = OllamaEmbeddings(model='llama3.2')

### Embedding texts

We can turn texts into vectors by embedding them.

In [12]:
# We generate an embedding vector as a list like this
emb = embedding_model.embed_query('testing hehehe')
print(len(emb))

3072


In [13]:
# we can embed multiple documents at once
emb = embedding_model.embed_documents(['document 1', 'document 2'])
print(f'Got {len(emb)} embeddings')
print(f'Each embedding is {len(emb[0])} length')

Got 2 embeddings
Each embedding is 3072 length


### Vector Stores

Vector stores hold a collection of documents and their corresponding embeddings, and allow us to compare the embedding of a query text to the embeddings in the store. This is used in Retrieval-Augmented Generation (RAG).

The steps for RAG are:
- Embed the query
- Find the most similar vectors in the vector store
- Take the corresponding documents and append them to the prompt as context
- Have the LLM generate a response, given the initial question and added context

We will fully implement this later.
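
Until then, here is a dependency-free sketch of the four steps. Everything in it is an assumption made for illustration: `toy_embed` is a bag-of-words counter standing in for a real embedding model (so it only matches on shared words), and `build_rag_prompt` stops at building the prompt instead of calling an LLM.

```python
import math
import re
from collections import Counter

def toy_embed(text):
    """Stand-in embedding: word counts (an assumption, not a real model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

documents = [
    "Pasta is an italian dish",
    "C++ is a programming language",
    "Kanye West is a rapper",
]
# The "vector store": each document paired with its embedding
store = [(doc, toy_embed(doc)) for doc in documents]

def build_rag_prompt(question, k=1):
    query_vec = toy_embed(question)                      # step 1: embed the query
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    context = "\n".join(doc for doc, _ in ranked[:k])    # steps 2-3: top-k docs as context
    return f"Context:\n{context}\n\nQuestion: {question}"  # step 4: this goes to the LLM

prompt = build_rag_prompt("Which programming language should I learn?")
print(prompt)
```

With a real embedding model, the ranking would also work for queries that share no words with the documents.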

In [26]:
# Vector stores. We can look more at the documentation later, but here is a basic way to initialize a vector store
vectorstore = InMemoryVectorStore.from_texts(
    ['Pasta is an italian dish', 'C++ is a programming language', 'Kanye West is a rapper'],
    embedding=embedding_model,
)

In [27]:
retriever = vectorstore.as_retriever() # here, we can add search_type, search_kwargs

# Example
# vectorstore.as_retriever(search_type="similarity_score_threshold", 
#                          search_kwargs={ "k": 2, # max docs to return 
#                                         "score_threshold": 0.15, # min threshold for similarity score to return anything
#                                        }
#                         )

# Retrieve the most similar text
def print_most_similar_item(text):
    print(f'query: {text}')
    retrieved_documents = retriever.invoke(text)
    print('Most similar text:', retrieved_documents[0].page_content)

print_most_similar_item("What food should I eat tonight?")
print('\n')
print_most_similar_item("What's an example of a guy that raps?")

query: What food should I eat tonight?
Most similar text: Pasta is an italian dish


query: What's an example of a guy that raps?
Most similar text: Kanye West is a rapper


# Language Models and Prompts

Language models take text as input and produce more text. Chat models take chat messages as input and return a chat message.

Prompt templates make it easier to build the inputs we send to these models.

- [video](https://youtu.be/P3MAbZ2eMUI?si=I_VtWWO-H4vrC_sx&t=193)
- [FewShotPromptTemplate](https://api.python.langchain.com/en/latest/prompts/langchain_core.prompts.few_shot.FewShotPromptTemplate.html)

In [51]:
from langchain_ollama import OllamaLLM, ChatOllama
from langchain_core.prompts import PromptTemplate, ChatPromptTemplate
from langchain_core.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, ConfigDict

# used later for the few-shot examples and the simple chain
from langchain_core.prompts.few_shot import FewShotPromptTemplate
from langchain_core.runnables import RunnablePassthrough

## Getting outputs from a language model/chat model

Most LangChain components are called through a common `invoke()` method, which we use throughout the rest of this notebook.

In [4]:
llm = OllamaLLM(model='llama3.2')
chat_model = ChatOllama(model='llama3.2')

In [6]:
llm_result = llm.invoke('say hi')
chat_model_result = chat_model.invoke('say hi')

In [7]:
# this is an AIMessage object
print(chat_model_result.content)
print(type(chat_model_result))

Hi! How are you today?
<class 'langchain_core.messages.ai.AIMessage'>


In [8]:
print(llm_result) # this is just a string

Hi! How's your day going so far?


## Prompts and prompt templates

LangChain provides many prompt template classes. We start with the basic `PromptTemplate`.
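
Conceptually, a prompt template is a string with named placeholders plus some bookkeeping about which variables it expects. The class below is a hypothetical toy (`SimplePromptTemplate` is not a LangChain class) that shows the idea using only the standard library.

```python
import string

class SimplePromptTemplate:
    """Toy stand-in for a prompt template: a string with named placeholders."""

    def __init__(self, template):
        self.template = template
        # Collect placeholder names, e.g. "x" from "Say {x}"
        self.input_variables = sorted(
            {name for _, name, _, _ in string.Formatter().parse(template) if name}
        )

    def format(self, **values):
        # Fail early with a clear message if a variable was not supplied
        missing = [v for v in self.input_variables if v not in values]
        if missing:
            raise KeyError(f"missing variables: {missing}")
        return self.template.format(**values)

toy_prompt = SimplePromptTemplate("Say {x}")
print(toy_prompt.input_variables)
print(toy_prompt.format(x="nong"))
```

The real classes add more on top of this (validation, partial variables, composition with other components), but filling placeholders is the core of it.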

In [9]:
# using constructor
prompt = PromptTemplate(input_variables=['x'], template='Say {x}')

# using helper method
prompt_2 = PromptTemplate.from_template('say {x}')

In [10]:
print(prompt.format(x='nong'))

Say nong


In [11]:
chat_model.invoke(prompt.format(x='nong')).content

'Nong! (That\'s "hello" in Thai.)'

### Constructing a few-shot prompt with examples

In [13]:
# Running the fewshotPromptTemplate
# https://python.langchain.com/docs/how_to/few_shot_examples/

example_1 = {'question': 'Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?',
             'answer': 'Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.'}
example_2 = {'question': 'The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?',
             'answer': 'The cafeteria had 23 apples originally. They used 20 to make lunch. So they had 23 - 20 = 3. They bought 6 more apples, so they have 3 + 6 = 9. The answer is 9'}

example_prompt = PromptTemplate.from_template('Question: {question}\nAnswer: {answer}') # this is how we format examples

few_shot_prompt = FewShotPromptTemplate(
    examples=[example_1, example_2],
    example_prompt = example_prompt,
    suffix="Question: {input}",
    prefix="{prefix}",
    input_variables=["input", "prefix"],
)

In [14]:
# Few shot prompt just formats all examples, has a suffix and prefix
new_question = "Bill has 4 feet on separate legs, each with 5 toes. He loses 1 leg. How many toes does he have left?" # 15 toes
fs_question = few_shot_prompt.invoke({"input": new_question, 
                                      "prefix": "Below are two example questions and answers, followed by a third question. Please answer the third question."})
print(fs_question.text)

Below are two example questions and answers, followed by a third question. Please answer the third question.

Question: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
Answer: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11. The answer is 11.

Question: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?
Answer: The cafeteria had 23 apples originally. They used 20 to make lunch. So they had 23 - 20 = 3. They bought 6 more apples, so they have 3 + 6 = 9. The answer is 9

Question: Bill has 4 feet on separate legs, each with 5 toes. He loses 1 leg. How many toes does he have left?


### Comparing results of the few-shot prompt vs with no examples

There is not much of a difference on this problem. The few-shot examples prompt the model to show its work, but it does so even without them.

In [15]:
print(chat_model.invoke(fs_question).content)

To solve this question, we need to find out how many toes Bill had in total before losing a leg.

Bill has 4 feet with 5 toes each. So, the total number of toes is:
4 x 5 = 20 toes

Now, Bill loses one leg, which means he also loses all 5 toes on that leg. To find out how many toes he has left, we subtract 5 from the total number of toes:

20 - 5 = 15

The answer is 15.


In [16]:
# somehow, the llama model is able to reason without the few shot examples
print(chat_model.invoke(new_question).content)

To find out how many toes Bill has left, we need to calculate the number of toes he had initially and then subtract the number of toes that were lost.

Bill had 4 feet, each with 5 toes. So, the total number of toes he had initially is:

4 feet x 5 toes/foot = 20 toes

Since Bill lost one leg (and therefore 1 foot), we can now subtract the number of toes that were lost from his initial total:

Initial toes: 20
Toes lost: 5 (since he lost one leg and each leg has 5 toes)

So, the total number of toes Bill has left is:

20 - 5 = 15

Bill has 15 toes left.


## Output Parsers

Output parsers prompt the model to provide output in a specified format, and attempt to parse the model-provided output.
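
As a rough picture of those two halves (format instructions going in, parsing coming out), here is a standard-library sketch. The names `ToyCoTOutput`, `format_instructions`, and `parse_output` are hypothetical, and the raw reply is a hand-written string rather than real model output.

```python
import json
from dataclasses import dataclass

@dataclass
class ToyCoTOutput:
    thought: str
    answer: str

def format_instructions():
    """Text appended to the prompt so the model knows the expected shape."""
    return 'Respond with a JSON object with the string keys "thought" and "answer".'

def parse_output(raw):
    """Parse the text between the outermost braces and validate its fields."""
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    data = json.loads(raw[start:end + 1])
    # Coerce values to strings, since a model may return 33.33 instead of "33.33"
    return ToyCoTOutput(thought=str(data["thought"]), answer=str(data["answer"]))

# Hand-written example reply (an assumption for illustration)
raw_reply = 'Sure! {"thought": "18 / 54 * 100 = 33.33", "answer": 33.33}'
parsed = parse_output(raw_reply)
print(parsed.answer)
```

Parsing fails whenever the model ignores the instructions, which is why this step tends to be finicky with smaller models.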

In [40]:
class CoTOutput(BaseModel):
    model_config = ConfigDict(coerce_numbers_to_str=True)
    thought: str=Field(description="the chain of thought leading to the solution")
    answer: str=Field(description="the solution")

parser = PydanticOutputParser(pydantic_object=CoTOutput) # a strong output parser, apparently

# Let's redo the FewShotPromptTemplate

example_1 = {'question': 'Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?',
             'thought': 'Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11', 
             'answer': '11'}
example_2 = {'question': 'The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?',
             'thought': 'The cafeteria had 23 apples originally. They used 20 to make lunch. So they had 23 - 20 = 3. They bought 6 more apples, so they have 3 + 6 = 9.', 
             'answer': '9'}

example_prompt = PromptTemplate.from_template('Question: {question}\nThought: {thought}\nAnswer: {answer}') # this is how we format examples

few_shot_prompt = FewShotPromptTemplate(
    examples=[example_1, example_2],
    example_prompt = example_prompt,
    suffix="Question: {input}\n{format_instructions}",
    prefix="{prefix}",
    input_variables=["input", "prefix"],
    partial_variables={"format_instructions": parser.get_format_instructions()} # partial variables don't have to be passed in every time we call this.
)

In [41]:
new_question = "Bill could type 54wpm, but broke his arm. Now, he can only type 36wpm. What is the percentage decrease in typing speed that occurred as a result of his broken arm?"
fs_question = few_shot_prompt.invoke({"input": new_question, 
                                      "prefix": "Below are two example questions and answers, followed by a third question. Please answer the third question."})

In [42]:
print(fs_question.text)

Below are two example questions and answers, followed by a third question. Please answer the third question.

Question: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now?
Thought: Roger started with 5 balls. 2 cans of 3 tennis balls each is 6 tennis balls. 5 + 6 = 11
Answer: 11

Question: The cafeteria had 23 apples. If they used 20 to make lunch and bought 6 more, how many apples do they have?
Thought: The cafeteria had 23 apples originally. They used 20 to make lunch. So they had 23 - 20 = 3. They bought 6 more apples, so they have 3 + 6 = 9.
Answer: 9

Question: Bill could type 54wpm, but broke his arm. Now, he can only type 36wpm. What is the percentage decrease in typing speed that occurred as a result of his broken arm?
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list

In [44]:
output = chat_model.invoke(fs_question).content # get response as a string
parsed_output = parser.parse(output) # this is pretty finicky with CoT prompting
print(f"Thought: {parsed_output.thought}\nAnswer: {parsed_output.answer}")

Thought: Bill could type 54wpm, but broke his arm. Now, he can only type 36wpm. To find the percentage decrease in typing speed, first, calculate the difference in typing speeds: 54 - 36 = 18. Then divide by the original speed and multiply by 100 to get the percentage decrease: (18 / 54) * 100 = 33.333... %. Now round this number to two decimal places to express it as a percentage.
Answer: 33.33


## Putting everything into a simple chain

We will cover chains in more detail later. For now, the key idea is that the output of one component is passed as the input to the next.
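
The `|` syntax can look like magic, so here is a minimal sketch of how such an operator could work in plain Python. `Step` is a hypothetical toy class, not LangChain's `Runnable`, and `fake_model` just upper-cases its input in place of an LLM call.

```python
class Step:
    """Toy runnable: wraps a function and supports `|` composition."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # a | b builds a new step that feeds a's output into b
        return Step(lambda value: other.invoke(self.invoke(value)))

add_prefix = Step(lambda q: f"Question: {q}")
fake_model = Step(lambda prompt: prompt.upper())  # stand-in for an LLM call
strip_label = Step(lambda text: text.replace("QUESTION: ", "", 1))

toy_chain = add_prefix | fake_model | strip_label
print(toy_chain.invoke("say hi"))
```

Each `|` only wires two steps together; nothing runs until `invoke()` is called on the resulting chain.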

In [49]:
# let's put all of this into a chain
few_shot_prefix = "Below are two example questions and answers, followed by a third question. Please answer the third question."
add_prefix_to_state = RunnablePassthrough.assign(prefix=lambda x: few_shot_prefix) # just assigns a new item under prefix

# chain everything together
chain = add_prefix_to_state | few_shot_prompt | chat_model # we would normally add a StrOutputParser at the end
# print(chain.get_graph().draw_ascii())

In [50]:
# RIP model
print(chain.invoke({"input": "it takes 3 musicians 35 minutes to play one of Mozart's symphonies. How long would it take 5 musicians?"}).content)

{
    "thought": "It takes 3 musicians 35 minutes to play one of Mozart's symphonies. If we divide the time by the number of musicians, we can find out how long it would take for 5 musicians. 35 minutes divided by 3 musicians is 11.67 minutes per musician. Now we multiply this by 5 musicians: 11.67 * 5 = 58.35 minutes.",
    "answer": "58.35"
}


## Chat Prompt Templates

Chat models work with three kinds of messages: system messages, AI messages (sent by the model), and human (user) messages.

For now, let's just show how to use these three message types in a prompt.
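
Before using the real class, it may help to see a chat prompt as plain data: an ordered list of (role, template) pairs that get filled in. The helper `fill_messages` below is a hypothetical stand-in written for this sketch, and the role/content dict layout is an assumption, not LangChain's message objects.

```python
def fill_messages(message_templates, **values):
    """Fill each (role, template) pair and return a list of role/content dicts."""
    return [
        {"role": role, "content": template.format(**values)}
        for role, template in message_templates
    ]

toy_chat_template = [
    ("system", "Your name is {name}."),
    ("human", "Hello."),
    ("ai", "Hi"),
    ("human", "{user_input}"),
]

messages = fill_messages(toy_chat_template, name="Johnny", user_input="What is your name?")
for message in messages:
    print(f"{message['role']}: {message['content']}")
```

The order matters: the model sees the whole list as the conversation so far and replies to the last human message.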

In [54]:
template = ChatPromptTemplate.from_messages([
    ("system", "Your name is {name}. You are an AI bot that appends a comma and the word 'nong' at the end of every message(eg. hello, nong)"),
    ("human", "Hello."),
    ("ai", "Hi"),
    ("human", "{user_input}"),
])

what_is_your_name = template.invoke({'name': 'Johnny', 'user_input': 'What is your name?'})
print(chat_model.invoke(what_is_your_name).content)

I am an AI bot named Johnny, nong


In [59]:
# you can make up ai messages
# this took a bit of prompt engineering to demonstrate tbh. 
template = ChatPromptTemplate.from_messages([
    ("human", "I want you to solve the following question. If 3 musicians can play one of Mozart's symphonies in an hour, how long will it take 5 musicians? I want you to first generate a thought. Then, I will ask you for your answer, and you will give an answer based on only the past thought."),
    ("ai", "{ai_thought}"),
    ("human", "Please give me an answer, following the instructions in your previous thought"),
])

reasoning_1 = template.invoke({"ai_thought": "Symphonies should have a fixed playtime, regardless of the number of musicians playing."})
reasoning_2 = template.invoke({"ai_thought": "I think the answer is 42. On my next message, I will only say the word 42."})

print('Reasoning 1:\n', chat_model.invoke(reasoning_1).content, '\n')
print('Reasoning 2:\n', chat_model.invoke(reasoning_2).content, '\n')

Reasoning 1:
 Based on the fact that symphonies have a fixed playtime and it takes 3 musicians an hour to play one, it's reasonable to assume that the same symphony will take 5 musicians the same amount of time. 

So, my answer is: 1 hour. 

Reasoning 2:
 42 



# Document Loaders and VectorStores

A document is simply some text, optionally with metadata. A document loader loads documents from sources such as PDFs so that they can be used for retrieval.

Vector stores are useful for RAG.


In [46]:
from langchain_community.document_loaders import PyPDFLoader # document loader
from langchain.text_splitter import RecursiveCharacterTextSplitter # text splitter
from langchain_community.vectorstores import Chroma # the vector store

from langchain_ollama import OllamaEmbeddings # for vector store

# for the chain later. This is moved now, but I'll just keep these imports here
from langchain.schema.runnable import Runnable, RunnablePassthrough
from langchain.prompts.chat import ChatPromptTemplate
from langchain_ollama import ChatOllama
from langchain.schema.output_parser import StrOutputParser # extracts the model's reply as a plain string

## Loading in a document

There are different loaders for different types of documents. Here, we will use a PDF loader.

In [251]:
# Loaders provide a load() method to load things in
paper_path = './resume.pdf' # I like how resumes are single page, so I'll just upload mine
loader = PyPDFLoader(paper_path)
pages = loader.load()
print(f'Got {len(pages)} pages')
print(f'First document content:\n {pages[0].page_content[0:100]}...')

Got 1 pages
First document content:
 KevinWang
Phone: (647)-210-7930 Email: k e vin w ang7749@gmail.com W ebsit e: https://dungw oong.git...


## Text splitters

Text splitters split our documents into chunks, so that each chunk is small enough to embed or to pass to an LLM.
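
To show what chunk size and overlap mean, here is a naive fixed-width splitter. It is only a sketch: `split_with_overlap` is a hypothetical helper, and unlike a recursive splitter it cuts purely by character count without trying to break on paragraphs or sentences first.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Naive splitter: each chunk starts `chunk_size - chunk_overlap` characters after the previous one."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of some duplicated text.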

In [3]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=100) # gets chunks of 1024 characters, overlapping by 100
split_pages = text_splitter.split_documents(pages)
print(f'Split into {len(split_pages)} chunks')

Split into 4 chunks


## Vector DBs

A vector store can be built with either `from_documents()` or `from_texts()`. Here we use `from_documents()`.

In [4]:
embeddings = OllamaEmbeddings(model='llama3.2')
doc_search = Chroma.from_documents(split_pages, embeddings)

### Basic Similarity Search

In [7]:
query = "What are this person's skills?"
docs = doc_search.similarity_search(query, k=1)

In [8]:
print(docs[0].page_content)

KevinWang
Phone: (647)-210-7930 Email: k e vin w ang7749@gmail.com W ebsit e: https://dungw oong.github.io/ Link edIn: https://www .link edin.com/in/im-k e vin-w ang/ GitHub: https://github.com/dungw oong 
SKILLS
Software Tools: Python(Pytorch, pandas, scikit-learn, plotly), SQL, R, Java, Node.js(React,NextJS, prisma), C/C#
Data Science: Computer Vision(CNNs), Language Models andTransformers, Regression(GLMs), Tree-basedmodels(RandomForests, XGBoost), Cloudcomputing(LambdaLabs), Hypothesistesting
Soft Skills: Scrum/Agileprocesses, projectmanagement, Git/Github, LinuxEDUCATIONUniversity of Toronto- Data Science Specialist April 2025❖3.95/4.0 GPA Toronto, ON


### Vector Store Retrievers

A retriever is created by calling `as_retriever()` on the vector store.

In [41]:
retriever = doc_search.as_retriever(search_kwargs={'k': 2})

# we can now invoke the retriever like this
_ = retriever.invoke("what are this person's skills?")

# Chains

We can chain operations together. Let's do some interesting things with that.

The main takeaways are:
- The output of the previous component becomes the input of the next component
- Chains modify and return some sort of shared state
- `RunnablePassthrough`, `RunnableAssign`, etc. are there to help you modify the state
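
The "state" idea can be sketched with plain dictionaries. The `assign` helper below is a hypothetical imitation of the assign-style runnables (it is not LangChain code): each step copies the incoming state, adds new keys computed from it, and hands the result to the next step.

```python
def assign(**new_fields):
    """Return a step that copies the state dict and adds computed fields to it."""
    def step(state):
        updated = dict(state)
        for key, func in new_fields.items():
            updated[key] = func(state)
        return updated
    return step

def run_chain(steps, state):
    for step in steps:
        state = step(state)  # output of one step is the input of the next
    return state

steps = [
    assign(prefix=lambda s: "Answer the question."),
    assign(prompt=lambda s: f"{s['prefix']}\nQuestion: {s['input']}"),
]
final_state = run_chain(steps, {"input": "How many toes?"})
print(final_state["prompt"])
```

Note that the original `input` key survives to the end, which is the "passthrough" part of the pattern.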

In [67]:
from langchain_community.document_loaders import PyPDFLoader # document loader
from langchain.text_splitter import RecursiveCharacterTextSplitter # text splitter
from langchain_community.vectorstores import Chroma # the vector store

from langchain_ollama import OllamaEmbeddings # for vector store

# for the chain
from langchain.schema.runnable import Runnable, RunnablePassthrough
from langchain.prompts.chat import ChatPromptTemplate
from langchain_ollama import ChatOllama
from langchain.schema.output_parser import StrOutputParser # extracts the model's reply as a plain string

## Let's make a PDF question asker for the previous retriever

We want to be able to upload a PDF (or just load it into a vector store) and ask a language model questions about it.

We'll use a chat prompt here. Adapted from [this tutorial](https://python.langchain.com/docs/tutorials/pdf_qa/)

Basic things to note:
- Each element takes a state and returns a modified state. Thus, when subclassing `Runnable`, `invoke()` must take the input state and return the resulting state
- Runnables like `RunnableParallel` and `RunnablePassthrough` operate on the state and help us branch the chain
- We then call `chain.invoke()` with some initial argument (e.g. a string) to kick things off
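
The branching step can be pictured as follows. This is a toy sketch under stated assumptions: `parallel` and `toy_retriever` are hypothetical names, the retriever returns canned strings instead of searching a store, and the branches run one after another here rather than concurrently.

```python
def parallel(branches):
    """Run every branch on the same input and collect the results in a dict."""
    return lambda value: {key: branch(value) for key, branch in branches.items()}

def toy_retriever(question):
    # Hypothetical retriever: returns canned documents instead of searching a store
    return ["SKILLS: Python, SQL", "EDUCATION: University of Toronto"]

def passthrough(value):
    # Hands the input through unchanged, so the question stays in the state
    return value

first_step = parallel({"context": toy_retriever, "question": passthrough})
state = first_step("What are this person's skills?")
print(state)
```

The single input string fans out into a dict with `context` and `question` keys, which is the shape a prompt template with those two variables expects.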

In [109]:
# Retriever, copy pasted from previous section
paper_path = './resume.pdf' # I like how resumes are single page, so I'll just upload mine
loader = PyPDFLoader(paper_path)
pages = loader.load()
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1024, chunk_overlap=100) # gets chunks of 1024 characters, overlapping by 100
split_pages = text_splitter.split_documents(pages)
embeddings = OllamaEmbeddings(model='llama3.2')
doc_search = Chroma.from_documents(split_pages, embeddings)
retriever = doc_search.as_retriever(search_kwargs={'k': 2})

# Prompt
system_1 = "You are a helper for question answering tasks. Use the following context to answer the question. If you dont know the answer, just say you dont know. Be as concise as possible."
human_1 = "Question: {question}\nContext: {context}"
prompt = ChatPromptTemplate.from_messages([
    ('system', system_1),
    ('human', human_1)
])

# Model
model = ChatOllama(model='llama3.2')

# Quick runnable to do a bit of logging
class LogRunnable(Runnable):
    def invoke(self, state, config=None, **kwargs):
        # double quotes inside the f-string keep this valid on Python versions before 3.12
        print(f'Found {len(state["context"])} relevant documents')
        # print(state['context'])
        return state

chain = ({'context': retriever, 'question': RunnablePassthrough()} # this dict will run the retriever and passthrough in parallel, to start the chain
         | LogRunnable() # just log some stuff
         | prompt # pass context/question into the prompt
         | model
         | StrOutputParser())

### Draw the model

In [110]:
# We run the retriever and passthrough in parallel, resulting in context=retriever.invoke(query) and question=query
# when we pass a dict at the start of a chain, it is coerced into a RunnableParallel
# Then we pass this into the prompt template
# Then we pass this to the model and parse the output
# print(chain.get_graph().draw_ascii())

### Run the model

In [116]:
def ask_the_pdf_a_question(q):
    output = chain.invoke(q)
    print(output)

# The embeddings are not fetching the first document lol
# the LLM refuses to help me look good in front of potential employers
ask_the_pdf_a_question("Tell me about this person's education, and their contact info. What is their GPA at the University of Toronto?")

Found 2 relevant documents
I don't have access to the person's contact information or their current GPA at the University of Toronto. The provided text only mentions that they attended university from May 2022 to August 2022, but it doesn't provide any information about their degree level, major, or current GPA.


# Conclusion

In this tutorial, we covered the basic components of LangChain and how to use them. By this point, you should be able to handle most basic LLM-related tasks, such as asking questions, performing Retrieval-Augmented Generation (RAG), and prompt engineering.

In the next tutorial, I will (probably) cover Agents and LangGraph, and try to build an automated web browser.