# LangChain Cookbook

Goal: Provide an introductory understanding of the components and use cases of LangChain

What is LangChain?
LangChain is a framework for developing applications powered by language models.

TLDR: LangChain makes the complicated parts of working and building with AI models easier. It helps in two ways:

- Integration - Bring external data, such as your files, other applications, and API data, to your LLMs
- Agency - Allow your LLMs to interact with their environment via decision making. Use LLMs to help decide which action to take next

Why LangChain?
Components - LangChain makes it easy to swap out abstractions and components necessary to work with language models.

Customized Chains - LangChain provides out of the box support for using and customizing 'chains' - a series of actions strung together.

Speed 🚢 - This team ships insanely fast. You'll be up to date with the latest LLM features.

Community 👥 - Wonderful discord and community support, meet ups, hackathons, etc.

Though LLMs can be straightforward (text-in, text-out), you'll quickly run into friction points once you develop more complicated applications, and LangChain helps with those.

In [4]:
# load environment with OpenAI API key
from dotenv import load_dotenv
import os

In [5]:
load_dotenv()

True

## LangChain Components

### Text 
- The natural language way to interact with LLMs

In [3]:
# You'll be working with simple strings (that'll soon grow in complexity!)
my_text = "What day comes after Friday?"

### Chat Messages

Like text, but specified with a message type (System, Human, AI)

- System - Helpful background context that tells the AI what to do
- Human - Messages that are intended to represent the user
- AI - Messages that show what the AI responded with

In [4]:
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage, SystemMessage, AIMessage

chat = ChatOpenAI(temperature=.8, openai_api_key=os.environ.get('OPENAI_API_KEY'))


In [5]:
chat(
    [
        SystemMessage(content="You are a nice AI bot that helps a user figure out what to eat in one short sentence"),
        HumanMessage(content="I like meat, what should I eat?")
    ]
)

AIMessage(content='You can try a juicy steak or a grilled chicken breast.', additional_kwargs={}, example=False)

You can also pass more chat history, including earlier responses from the AI

In [6]:
chat(
    [
        SystemMessage(content="You are a nice AI bot that helps a user figure out where to travel in one short sentence"),
        HumanMessage(content="I like the mountains where should I go?"),
        AIMessage(content="You should go to Los Angeles, USA"),
        HumanMessage(content="What else should I do when I'm there?")
    ]
)

AIMessage(content='You should explore Griffith Observatory and enjoy the stunning views of the city.', additional_kwargs={}, example=False)

### Documents

An object that holds a piece of text along with metadata (extra information about that text)



In [7]:
from langchain.schema import Document

In [8]:
Document(page_content="This is my document. It is full of text that I've gathered from other places",
         metadata={
             'my_document_id' : 234234,
             'my_document_source' : "The LangChain Papers",
             'my_document_create_time' : 1680013019
         })

Document(page_content="This is my document. It is full of text that I've gathered from other places", metadata={'my_document_id': 234234, 'my_document_source': 'The LangChain Papers', 'my_document_create_time': 1680013019})

## Models - The interface to the AI brains

### Language Model

In [9]:
from langchain.llms import OpenAI

llm = OpenAI(model_name="text-ada-001", openai_api_key=os.environ.get('OPENAI_API_KEY'))

In [10]:
llm("What day comes after Monday?")

'\n\nTuesday'

### Chat Model

A model that takes a series of messages and returns a message output

In [11]:
chat = ChatOpenAI(temperature=1, openai_api_key=os.environ.get('OPENAI_API_KEY'))

In [13]:
chat(
    [
        SystemMessage(content="You are an unhelpful AI bot that makes a joke at whatever the user says"),
        HumanMessage(content="I would like to go to England, how should I do this?")
    ]
)

AIMessage(content="Why don't you try turning your car into a boat and sailing across the Atlantic? It's a foolproof plan, trust me!", additional_kwargs={}, example=False)

### Text Embedding Model

Change your text into a vector (a series of numbers that hold the semantic 'meaning' of your text). Mainly used when comparing two pieces of text together.

BTW: Semantic means 'relating to meaning in language or logic.'

In [14]:
from langchain.embeddings import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(openai_api_key=os.environ.get('OPENAI_API_KEY'))

In [15]:
text = "Hello! We should go play soccer!"

In [16]:
text_embedding = embeddings.embed_query(text)
print (f"Your embedding is length {len(text_embedding)}")
print (f"Here's a sample: {text_embedding[:10]}...")

Your embedding is length 1536
Here's a sample: [-0.0015049881767481565, 0.00043679255759343505, 0.011140372604131699, -0.01787743531167507, -0.025790812447667122, 0.030722517520189285, -0.007240298669785261, -0.004151691682636738, 0.007649177219718695, -0.03847234323620796]...
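
To make "comparing two pieces of text" concrete, here is a minimal plain-Python sketch of cosine similarity, a common way to compare embedding vectors. The three-dimensional vectors below are invented for illustration (real OpenAI embeddings here have 1536 dimensions), and the helper name `cosine_similarity` is an assumption, not part of LangChain.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors, made up for this sketch; they are NOT real embeddings
soccer = [0.9, 0.1, 0.0]
football = [0.8, 0.2, 0.1]
taxes = [0.0, 0.1, 0.9]

# Texts with similar meaning should score higher than unrelated ones
print(cosine_similarity(soccer, football))
print(cosine_similarity(soccer, taxes))
```

In practice you would pass in the lists returned by `embeddings.embed_query(...)` instead of hand-written vectors.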


### Prompts 

- Text generally used as instructions to your model

Prompt

In [17]:
llm = OpenAI(model_name="text-davinci-003", openai_api_key=os.environ.get('OPENAI_API_KEY'))

# I like to use three double quotation marks for my prompts because it's easier to read
prompt = """
Today is Monday, tomorrow is Wednesday.

What is wrong with that statement?
"""

llm(prompt)

'\nThe statement is incorrect - tomorrow is Tuesday.'

### Prompt Template
An object that helps create prompts based on a combination of user input, other non-static information and a fixed template string.



In [19]:
from langchain import PromptTemplate

template = """
I really want to travel to {location}. What should I do there?

Respond in one short sentence
"""

prompt = PromptTemplate(
    input_variables=["location"],
    template=template,
)

final_prompt = prompt.format(location='London')

print (f"Final Prompt: {final_prompt}")
print ("-----------")
print (f"LLM Output: {llm(final_prompt)}")

Final Prompt: 
I really want to travel to London. What should I do there?

Respond in one short sentence

-----------
LLM Output: Visit historical sites such as Buckingham Palace, the Tower of London, and Westminster Abbey.
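
As a rough mental model, a prompt template is a format string plus the list of variables it expects. The sketch below uses only the Python standard library to show the idea; it is not how `PromptTemplate` is implemented, and `template_variables` is a hypothetical helper name.

```python
import string

def template_variables(template):
    """List the {placeholders} that appear in a format string."""
    return [name for _, name, _, _ in string.Formatter().parse(template) if name]

template = "I really want to travel to {location}. What should I do there?"

# Discover which inputs the template needs, then fill them in
variables = template_variables(template)
final_prompt = template.format(location="London")

print(variables)
print(final_prompt)
```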


### Example Selectors
An easy way to select from a series of examples so that you can dynamically place in-context information into your prompt. Often used when your task is nuanced or you have a large list of examples.


In [23]:
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.vectorstores import FAISS
from langchain.prompts import FewShotPromptTemplate, PromptTemplate

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Example Input: {input}\nExample Output: {output}",
)

# Examples of locations where nouns are found
examples = [
    {"input": "pirate", "output": "ship"},
    {"input": "pilot", "output": "plane"},
    {"input": "driver", "output": "car"},
    {"input": "tree", "output": "ground"},
    {"input": "bird", "output": "nest"},
]


In [25]:
# SemanticSimilarityExampleSelector will select examples that are similar to your input by semantic meaning; we need to install both faiss-cpu and tiktoken

example_selector = SemanticSimilarityExampleSelector.from_examples(
    # This is the list of examples available to select from.
    examples, 
    
    # This is the embedding class used to produce embeddings which are used to measure semantic similarity.
    OpenAIEmbeddings(openai_api_key=os.environ.get('OPENAI_API_KEY')), 
    
    # This is the VectorStore class that is used to store the embeddings and do a similarity search over.
    FAISS, 
    
    # This is the number of examples to produce.
    k=2
)

In [26]:
similar_prompt = FewShotPromptTemplate(
    # The object that will help select examples
    example_selector=example_selector,
    
    # Your prompt
    example_prompt=example_prompt,
    
    # Customizations that will be added to the top and bottom of your prompt
    prefix="Give the location an item is usually found in",
    suffix="Input: {noun}\nOutput:",
    
    # What inputs your prompt will receive
    input_variables=["noun"],
)

In [27]:
# Select a noun!
my_noun = "student"

print(similar_prompt.format(noun=my_noun))

Give the location an item is usually found in

Example Input: driver
Example Output: car

Example Input: pilot
Example Output: plane

Input: student
Output:


In [28]:
llm(similar_prompt.format(noun=my_noun))

' classroom'
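
Conceptually, a semantic similarity selector ranks the stored examples by how close their embeddings are to the input's embedding and keeps the top `k`. Below is a minimal sketch of that idea under stated assumptions: the two-dimensional vectors are hand-made stand-ins for real embeddings, and `select_examples` is a hypothetical helper, not LangChain's implementation (which delegates the search to a vector store such as FAISS).

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def select_examples(query_vector, examples, k=2):
    """Return the k examples whose vectors are most similar to the query vector."""
    ranked = sorted(
        examples,
        key=lambda ex: cosine_similarity(query_vector, ex["vector"]),
        reverse=True,
    )
    return ranked[:k]

# Invented vectors: first axis is roughly "person", second is roughly "nature"
toy_examples = [
    {"input": "pirate", "output": "ship", "vector": [0.9, 0.1]},
    {"input": "pilot", "output": "plane", "vector": [1.0, 0.0]},
    {"input": "driver", "output": "car", "vector": [0.95, 0.05]},
    {"input": "tree", "output": "ground", "vector": [0.0, 1.0]},
    {"input": "bird", "output": "nest", "vector": [0.2, 0.9]},
]

student_vector = [0.98, 0.02]  # assumed embedding for "student"
chosen = select_examples(student_vector, toy_examples, k=2)
print([ex["input"] for ex in chosen])
```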

### Output Parsers
A helpful way to format the output of a model. Usually used for structured output.

Two big concepts:

1. Format Instructions - An autogenerated prompt that tells the LLM how to format its response based on your desired result

2. Parser - A method which will extract your model's text output into a desired structure (usually json)

In [29]:
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.prompts import ChatPromptTemplate, HumanMessagePromptTemplate

In [30]:
llm = OpenAI(model_name="text-davinci-003", openai_api_key=os.environ.get('OPENAI_API_KEY'))

In [31]:
# How you would like your response structured. This is basically a fancy prompt template
response_schemas = [
    ResponseSchema(name="bad_string", description="This a poorly formatted user input string"),
    ResponseSchema(name="good_string", description="This is your response, a reformatted response")
]

# How you would like to parse your output
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

In [32]:
# See the prompt template you created for formatting
format_instructions = output_parser.get_format_instructions()
print(format_instructions)

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"bad_string": string  // This a poorly formatted user input string
	"good_string": string  // This is your response, a reformatted response
}
```


In [33]:
template = """
You will be given a poorly formatted string from a user.
Reformat it and make sure all the words are spelled correctly

{format_instructions}

% USER INPUT:
{user_input}

YOUR RESPONSE:
"""

prompt = PromptTemplate(
    input_variables=["user_input"],
    partial_variables={"format_instructions": format_instructions},
    template=template
)

promptValue = prompt.format(user_input="welcom to califonya!")

print(promptValue)


You will be given a poorly formatted string from a user.
Reformat it and make sure all the words are spelled correctly

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":

```json
{
	"bad_string": string  // This a poorly formatted user input string
	"good_string": string  // This is your response, a reformatted response
}
```

% USER INPUT:
welcom to califonya!

YOUR RESPONSE:



In [34]:
llm_output = llm(promptValue)
llm_output

'```json\n{\n\t"bad_string": "welcom to califonya!",\n\t"good_string": "Welcome to California!"\n}\n```'

In [35]:
output_parser.parse(llm_output)

{'bad_string': 'welcom to califonya!', 'good_string': 'Welcome to California!'}
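
To show what the parsing step amounts to, here is a minimal standard-library sketch that pulls the JSON block out of a model response and decodes it. It is an approximation for illustration only; `parse_json_block` is a hypothetical helper, and the real `StructuredOutputParser` may handle more cases.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

def parse_json_block(text):
    """Find the first fenced json block in text and decode it into a dict."""
    pattern = FENCE + r"json\s*(.*?)\s*" + FENCE
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError("no fenced json block found in model output")
    return json.loads(match.group(1))

# Same shape as the llm_output shown in the cell above
sample_output = (
    FENCE + 'json\n{\n\t"bad_string": "welcom to califonya!",'
    '\n\t"good_string": "Welcome to California!"\n}\n' + FENCE
)

parsed = parse_json_block(sample_output)
print(parsed["good_string"])
```

Raising an error when no block is found is a design choice for this sketch: it makes a badly formatted model response fail loudly instead of silently returning nothing.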

### Indexes 

- Structuring documents so LLMs can work with them

Document Loaders

In [36]:
from langchain.document_loaders import HNLoader

In [37]:
loader = HNLoader("https://news.ycombinator.com/item?id=34422627")

In [38]:
data = loader.load()

In [39]:
print (f"Found {len(data)} comments")
print (f"Here's a sample:\n\n{''.join([x.page_content[:150] for x in data[:2]])}")

Found 76 comments
Here's a sample:

Ozzie_osman 5 months ago  
             | next [–] 

LangChain is awesome. For people not sure what it's doing, large language models (LLMs) are very Ozzie_osman 5 months ago  
             | parent | next [–] 

Also, another library to check out is GPT Index (https://github.com/jerryjliu/gpt_index)


### Text Splitters

Oftentimes your document is too long (like a book) for your LLM. You need to split it up into chunks. Text splitters help with this.

In [40]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

In [41]:
# This is a long document we can split up.
with open('data/PaulGrahamEssays/worked.txt') as f:
    pg_work = f.read()
    
print (f"You have {len([pg_work])} document")

You have 1 document


In [42]:
text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size = 150,
    chunk_overlap  = 20,)

texts = text_splitter.create_documents([pg_work])

In [43]:
print (f"You have {len(texts)} documents")

You have 610 documents


In [44]:
print ("Preview:")
print (texts[0].page_content, "\n")
print (texts[1].page_content)

Preview:
February 2021Before college the two main things I worked on, outside of school,
were writing and programming. I didn't write essays. I wrote what 

beginning writers were supposed to write then, and probably still
are: short stories. My stories were awful. They had hardly any plot,
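
The roles of `chunk_size` and `chunk_overlap` can be sketched with a plain sliding window over characters. Note that this is a deliberately simpler technique than `RecursiveCharacterTextSplitter`, which also tries to break on separators such as paragraphs and newlines; `split_with_overlap` is a hypothetical helper written for this illustration.

```python
def split_with_overlap(text, chunk_size=150, chunk_overlap=20):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks

sample = "abcdefghij" * 5  # 50 characters of placeholder text
chunks = split_with_overlap(sample, chunk_size=20, chunk_overlap=5)

print(len(chunks))
print(chunks[0][-5:], chunks[1][:5])  # the shared overlap between neighbours
```

The overlap means a sentence cut at a chunk boundary still appears, at least partly, in the next chunk.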


### Retrievers

Easy way to combine documents with language models.

There are many different types of retrievers; the most widely supported is the VectorStoreRetriever.

In [45]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings

loader = TextLoader('data/PaulGrahamEssays/worked.txt')
documents = loader.load()

In [46]:
# Get your splitter ready
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)

# Split your docs into texts
texts = text_splitter.split_documents(documents)

# Get embedding engine ready
embeddings = OpenAIEmbeddings(openai_api_key=os.environ.get('OPENAI_API_KEY'))

# Embed your texts
db = FAISS.from_documents(texts, embeddings)


In [47]:
# Init your retriever with the default search settings (no search_kwargs are set here)
retriever = db.as_retriever()

In [48]:
retriever

VectorStoreRetriever(vectorstore=<langchain.vectorstores.faiss.FAISS object at 0x139f02e10>, search_type='similarity', search_kwargs={})

In [49]:
docs = retriever.get_relevant_documents("what types of things did the author want to build?")

In [50]:
print("\n\n".join([x.page_content[:200] for x in docs[:2]]))

standards; what was the point? No one else wanted one either, so
off they went. That was what happened to systems work.I wanted not just to build things, but to build things that would
last.In this di

much of it in grad school.Computer Science is an uneasy alliance between two halves, theory
and systems. The theory people prove things, and the systems people
build things. I wanted to build things. 


### VectorStores

Databases that store vectors. The most popular ones are Pinecone and Weaviate; more examples are listed in OpenAI's retriever documentation. Chroma and FAISS are easy to work with locally.

Conceptually, think of them as tables w/ a column for embeddings (vectors) and a column for metadata.



In [51]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings

loader = TextLoader('data/PaulGrahamEssays/worked.txt')
documents = loader.load()

# Get your splitter ready
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=50)

# Split your docs into texts
texts = text_splitter.split_documents(documents)

# Get embedding engine ready
embeddings = OpenAIEmbeddings(openai_api_key=os.environ.get('OPENAI_API_KEY'))

In [52]:
print (f"You have {len(texts)} documents")

You have 78 documents


In [53]:
embedding_list = embeddings.embed_documents([text.page_content for text in texts])

In [54]:
print (f"You have {len(embedding_list)} embeddings")
print (f"Here's a sample of one: {embedding_list[0][:3]}...")

You have 78 embeddings
Here's a sample of one: [-0.0011548422398683247, -0.01124102377384105, -0.012887198873624496]...
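
The "table with an embeddings column and a metadata column" picture can be sketched as a tiny in-memory store with brute-force search. This is only a toy under stated assumptions: `TinyVectorStore` is a hypothetical class, the two-dimensional vectors are invented, and real stores such as FAISS use far more efficient indexes.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class TinyVectorStore:
    """Rows of (vector, text, metadata) with a linear-scan similarity search."""

    def __init__(self):
        self.rows = []

    def add(self, vector, text, metadata=None):
        self.rows.append({"vector": vector, "text": text, "metadata": metadata or {}})

    def similarity_search(self, query_vector, k=1):
        # Score every row against the query and keep the k best
        ranked = sorted(
            self.rows,
            key=lambda row: cosine(query_vector, row["vector"]),
            reverse=True,
        )
        return ranked[:k]

store = TinyVectorStore()
store.add([1.0, 0.0], "I wanted to build things.", {"chunk": 0})
store.add([0.0, 1.0], "My stories were awful.", {"chunk": 1})
store.add([0.7, 0.7], "I wrote programs and essays.", {"chunk": 2})

hits = store.similarity_search([0.9, 0.1], k=2)  # made-up query vector
print([hit["text"] for hit in hits])
```

A retriever is then little more than "embed the question, call `similarity_search`, return the matching texts".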


### Memory

Helping LLMs remember information.

Memory is a bit of a loose term. It could be as simple as remembering information you've chatted about in the past or more complicated information retrieval.

We'll keep it towards the Chat Message use case. This would be used for chat bots.

In [55]:
from langchain.memory import ChatMessageHistory
from langchain.chat_models import ChatOpenAI

chat = ChatOpenAI(temperature=0, openai_api_key=os.environ.get('OPENAI_API_KEY'))

history = ChatMessageHistory()

history.add_ai_message("hi!")

history.add_user_message("what is the capital of Spain?")

In [56]:
history.messages

[AIMessage(content='hi!', additional_kwargs={}, example=False),
 HumanMessage(content='what is the capital of Spain?', additional_kwargs={}, example=False)]

In [57]:
ai_response = chat(history.messages)
ai_response

AIMessage(content='The capital of Spain is Madrid.', additional_kwargs={}, example=False)

In [58]:
history.add_ai_message(ai_response.content)
history.messages

[AIMessage(content='hi!', additional_kwargs={}, example=False),
 HumanMessage(content='what is the capital of Spain?', additional_kwargs={}, example=False),
 AIMessage(content='The capital of Spain is Madrid.', additional_kwargs={}, example=False)]
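
The pattern above (append the user message, send the whole history, append the reply) can be sketched without any API calls. In this illustration `fake_chat_model` is a stand-in that only reports how many messages it received, and the dict-based message format is an assumption rather than LangChain's message classes.

```python
def fake_chat_model(messages):
    """Stand-in for a chat model: reports how much history it was shown."""
    return f"I can see {len(messages)} message(s) of history."

history = []

def ask(user_text):
    """Record the user message, call the model with the full history, record the reply."""
    history.append({"role": "human", "content": user_text})
    reply = fake_chat_model(history)
    history.append({"role": "ai", "content": reply})
    return reply

first = ask("what is the capital of Spain?")
second = ask("and of France?")

print(first)
print(second)
```

Because the full list is resent every turn, the second call sees the first question and answer as well as the new question.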

### Chains ⛓️⛓️⛓️

Combining different LLM calls and actions automatically

Ex: Summary #1, Summary #2, Summary #3 > Final Summary

1. Simple Sequential Chains

Easy chains where you can use the output of an LLM as an input into another. Good for breaking up tasks (and keeping your LLM focused)

In [59]:
from langchain.llms import OpenAI
from langchain.chains import LLMChain, SimpleSequentialChain
from langchain.prompts import PromptTemplate

llm = OpenAI(temperature=1, openai_api_key=os.environ.get('OPENAI_API_KEY'))

In [60]:
template = """Your job is to come up with a classic dish from the area that the user suggests.
% USER LOCATION
{user_location}

YOUR RESPONSE:
"""
prompt_template = PromptTemplate(input_variables=["user_location"], template=template)

# Holds my 'location' chain
location_chain = LLMChain(llm=llm, prompt=prompt_template)

In [61]:
template = """Given a meal, give a short and simple recipe on how to make that dish at home.
% MEAL
{user_meal}

YOUR RESPONSE:
"""
prompt_template = PromptTemplate(input_variables=["user_meal"], template=template)

# Holds my 'meal' chain
meal_chain = LLMChain(llm=llm, prompt=prompt_template)

In [62]:
overall_chain = SimpleSequentialChain(chains=[location_chain, meal_chain], verbose=True)


In [64]:
review = overall_chain.run("China")



> Entering new SimpleSequentialChain chain...
Classic dish from China: Kung Pao Chicken.
Kung Pao Chicken Recipe:

Ingredients:
- 4 boneless chicken breasts, diced
- 1/4 cup low-sodium soy sauce
- 2 tablespoons cornstarch
- 2 tablespoons rice wine or dry sherry
- 2 tablespoons vegetable oil
- 1 tablespoon Chinese black vinegar or Worcestershire sauce
- 1 teaspoon sesame oil
- 2 cloves garlic, minced
- 1 teaspoon freshly grated ginger
- 2 green onions, thinly sliced
- 1/4 teaspoon red pepper flakes
- 1/4 cup dry roasted peanuts

Instructions:
1. In a medium bowl, whisk together the soy sauce, cornstarch, and rice wine.
2. Heat the vegetable oil in a large skillet over medium-high heat until shimmering.
3. Add the chicken and stir-fry until lightly browned, about 4 minutes.
4. Add the garlic, ginger, and green onions and cook until fragrant, about 1 minute.
5. Pour in the soy sauce mixture and stir to combine.
6. Add the black vinegar, sesame oil, 
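
Stripped of the LLM calls, a simple sequential chain is function composition: each step's output becomes the next step's input. The sketch below uses two stub functions in place of `location_chain` and `meal_chain`; the stubs and the `run_sequential` helper are hypothetical and only illustrate the data flow.

```python
def run_sequential(steps, first_input):
    """Feed first_input to the first step, then pass each output to the next step."""
    value = first_input
    for step in steps:
        value = step(value)
    return value

def location_to_dish(location):
    """Stub for the 'location' LLM call."""
    dishes = {"China": "Kung Pao Chicken"}
    return dishes.get(location, "a local specialty")

def dish_to_recipe(dish):
    """Stub for the 'meal' LLM call."""
    return f"Recipe for {dish}: gather ingredients, cook, serve."

result = run_sequential([location_to_dish, dish_to_recipe], "China")
print(result)
```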

2. Summarization Chain

Easily run through numerous long documents and get a summary.

In [66]:
from langchain.chains.summarize import load_summarize_chain
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = TextLoader('data/PaulGrahamEssays/disc.txt')
documents = loader.load()

# Get your splitter ready
text_splitter = RecursiveCharacterTextSplitter(chunk_size=700, chunk_overlap=50)

# Split your docs into texts
texts = text_splitter.split_documents(documents)

# There is a lot of complexity hidden in this one line. I encourage you to check out the video above for more detail
chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=True)
chain.run(texts)



> Entering new MapReduceDocumentsChain chain...


> Entering new LLMChain chain...
Prompt after formatting:
Write a concise summary of the following:


"January 2017Because biographies of famous scientists tend to 
edit out their mistakes, we underestimate the 
degree of risk they were willing to take.
And because anything a famous scientist did that
wasn't a mistake has probably now become the
conventional wisdom, those choices don't
seem risky either.Biographies of Newton, for example, understandably focus
more on physics than alchemy or theology.
The impression we get is that his unerring judgment
led him straight to truths no one else had noticed.
How to explain all the time he spent on alchemy
and theology?  Well, smart people are often kind of
crazy.But maybe there is a simpler explanation. Maybe"


CONCISE SUMMARY:
Prompt after formatting:
Write a concise summary of the following:


"the smartness and the craziness were not as sepa

" Isaac Newton's major accomplishments in physics, alchemy and theology may make him seem like an incredibly smart and marginally crazy figure, but his risks were ultimately successful and influential in fostering current practices in science. Even though his bets had uncertain outcomes, his willingness to take risks was a major factor in the successes he achieved."

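The `map_reduce` chain type follows a two-phase pattern: summarize each chunk on its own (map), then summarize the combined summaries (reduce). Here is a minimal sketch of that control flow with a stand-in summarizer that simply keeps the first few words. `fake_summarize` and `map_reduce_summary` are hypothetical names, and no LLM is called.

```python
def fake_summarize(text, max_words):
    """Stand-in for an LLM summary call: keep only the first max_words words."""
    return " ".join(text.split()[:max_words])

def map_reduce_summary(chunks, map_fn, reduce_fn):
    """Summarize every chunk independently, then summarize the joined results."""
    partial_summaries = [map_fn(chunk) for chunk in chunks]  # map phase
    return reduce_fn(" ".join(partial_summaries))            # reduce phase

chunks = [
    "Newton studied physics deeply",
    "He also pursued alchemy",
]

summary = map_reduce_summary(
    chunks,
    map_fn=lambda chunk: fake_summarize(chunk, max_words=2),
    reduce_fn=lambda text: fake_summarize(text, max_words=4),
)
print(summary)
```

Because each chunk is handled separately in the map phase, the approach works on documents that would not fit into a single prompt.
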
## Agents 🤖🤖

Official LangChain Documentation describes agents perfectly:

Some applications will require not just a predetermined chain of calls to LLMs/other tools, but potentially an unknown chain that depends on the user's input. In these types of chains, there is an “agent” which has access to a suite of tools. Depending on the user input, the agent can then decide which, if any, of these tools to call.

Basically you use the LLM not just for text output, but also for decision making. The coolness and power of this functionality is hard to overstate.

In [6]:
from langchain.agents import load_tools
from langchain.agents import initialize_agent
from langchain.llms import OpenAI
import json, os

llm = OpenAI(temperature=0, openai_api_key=os.environ.get('OPENAI_API_KEY'))


In [7]:
toolkit = load_tools(["serpapi"], llm=llm, serpapi_api_key=os.environ.get('SERPAPI_API_KEY'))

In [8]:
agent = initialize_agent(toolkit, llm, agent="zero-shot-react-description", verbose=True, return_intermediate_steps=True)


In [9]:
# Note the trailing space: adjacent string literals are joined without a separator
response = agent({"input": "what was the first album of the "
                    "band that Natalie Bergman is a part of?"})



> Entering new AgentExecutor chain...
 I should try to find out what band Natalie Bergman is a part of.
Action: Search
Action Input: "Natalie Bergman band"
Observation: Natalie Bergman is an American singer-songwriter. She is one half of the duo Wild Belle, along with her brother Elliot Bergman. Her debut solo album, Mercy, was released on Third Man Records on May 7, 2021. She is based in Los Angeles.
Thought: I should search for the debut album of Wild Belle.
Action: Search
Action Input: "Wild Belle debut album"
Observation: Isles
Thought: I now know the final answer.
Final Answer: Isles is the debut album of Wild Belle, the band that Natalie Bergman is a part of.

> Finished chain.


In [10]:
print(json.dumps(response["intermediate_steps"], indent=2))

[
  [
    [
      "Search",
      "Natalie Bergman band",
      " I should try to find out what band Natalie Bergman is a part of.\nAction: Search\nAction Input: \"Natalie Bergman band\""
    ],
    "Natalie Bergman is an American singer-songwriter. She is one half of the duo Wild Belle, along with her brother Elliot Bergman. Her debut solo album, Mercy, was released on Third Man Records on May 7, 2021. She is based in Los Angeles."
  ],
  [
    [
      "Search",
      "Wild Belle debut album",
      " I should search for the debut album of Wild Belle.\nAction: Search\nAction Input: \"Wild Belle debut album\""
    ],
    "Isles"
  ]
]
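
The thought/action/observation trace above follows a simple loop: ask a decision function what to do next, run the chosen tool, record the observation, and repeat until a final answer is produced. The sketch below mimics that loop with a scripted decision function and a canned search tool. Everything here (`fake_search`, `fake_decide`, `run_agent`) is hypothetical and stands in for the LLM and SerpAPI; it is not how `initialize_agent` is implemented.

```python
def fake_search(query):
    """Canned search results standing in for a real search API."""
    canned = {
        "Natalie Bergman band": "Wild Belle",
        "Wild Belle debut album": "Isles",
    }
    return canned.get(query, "no result")

TOOLS = {"Search": fake_search}

def fake_decide(question, steps):
    """Scripted stand-in for the LLM: pick the next action from the steps so far."""
    if len(steps) == 0:
        return ("Search", "Natalie Bergman band")
    if len(steps) == 1:
        band = steps[0][1]  # observation from the first search
        return ("Search", f"{band} debut album")
    return ("Final Answer", steps[-1][1])

def run_agent(question, decide, tools, max_steps=5):
    """Loop: decide, act, observe, until the decision is a final answer."""
    steps = []
    for _ in range(max_steps):
        action, action_input = decide(question, steps)
        if action == "Final Answer":
            return {"output": action_input, "intermediate_steps": steps}
        observation = tools[action](action_input)
        steps.append(((action, action_input), observation))
    raise RuntimeError("agent did not finish within max_steps")

result = run_agent(
    "what was the first album of the band that Natalie Bergman is a part of?",
    fake_decide,
    TOOLS,
)
print(result["output"])
```

The `max_steps` cap is a design choice for this sketch: it prevents a decision function that never produces a final answer from looping forever.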
