following https://github.com/gkamradt/langchain-tutorials/blob/main/LangChain%20Cookbook.ipynb

# LangChain Cookbook 👨‍🍳👩‍🍳

In [2]:
# load API keys from .env
from dotenv import load_dotenv

load_dotenv()

True

## Schema

### Text 

...

### Chat

3 types of message:

- System
- Human
- AI
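
As a rough illustration of the three roles, a conversation is an ordered list of role-tagged messages. This is plain Python, not LangChain's own classes; `Message` and `render` are hypothetical names made up for the sketch.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the three message types (system, human, AI).
@dataclass
class Message:
    role: str      # "system", "human" or "ai"
    content: str


def render(messages):
    """Flatten a conversation into a readable transcript, one line per message."""
    return "\n".join(f"{m.role}: {m.content}" for m in messages)


conversation = [
    Message("system", "You are a helpful assistant."),
    Message("human", "What is the capital of France?"),
    Message("ai", "The capital of France is Paris."),
]

print(render(conversation))
```

The system message sets the assistant's behavior, the human message carries the user's input, and the AI message holds the model's reply.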

### Documents

A Document holds both a piece of text and its metadata.

In [1]:
from langchain.schema import Document


document = Document(
    page_content="This is my document. It is full of text that I've gathered from other places",
    metadata={
        'my_document_id' : 234234,
        'my_document_source' : "The LangChain Papers",
        'my_document_create_time' : 1680013019
    }
)

print(f"{document = }")

document = Document(page_content="This is my document. It is full of text that I've gathered from other places", metadata={'my_document_id': 234234, 'my_document_source': 'The LangChain Papers', 'my_document_create_time': 1680013019})


## Models

### Language Models

text in, text out

### Chat Model

list of messages in, message out

### Text Embedding Model

text in, embedding out

## Prompt Templates

Dynamically generate prompts
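
A minimal sketch of the idea using only `str.format`, assuming a template with named placeholders. `template_variables` and `fill` are hypothetical helpers, not part of LangChain.

```python
from string import Formatter


def template_variables(template):
    """Return the placeholder names found in a str.format-style template."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]


def fill(template, **values):
    """Fill the template, failing loudly if a variable is missing."""
    missing = set(template_variables(template)) - set(values)
    if missing:
        raise KeyError(f"missing variables: {sorted(missing)}")
    return template.format(**values)


template = "Give the location a {noun} is usually found in."
print(template_variables(template))
print(fill(template, noun="student"))
```

Checking for missing variables up front is the main thing a template object adds over a bare format string.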

## Example Selectors




In [3]:
from langchain.prompts.example_selector import SemanticSimilarityExampleSelector
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
from langchain.prompts import FewShotPromptTemplate, PromptTemplate
from langchain.llms import OpenAI

llm = OpenAI(model_name="text-davinci-003")

example_prompt = PromptTemplate(
    input_variables=["input", "output"],
    template="Example Input: {input}\nExample Output: {output}",
)

# Examples of nouns and the locations where they are usually found
examples = [
    {"input": "pirate", "output": "ship"},
    {"input": "pilot", "output": "plane"},
    {"input": "driver", "output": "car"},
    {"input": "tree", "output": "ground"},
    {"input": "bird", "output": "nest"},
]


# SemanticSimilarityExampleSelector will select examples that are similar to your input by semantic meaning
example_selector = SemanticSimilarityExampleSelector.from_examples(
    # This is the list of examples available to select from.
    examples, 
    
    # This is the embedding class used to produce embeddings which are used to measure semantic similarity.
    OpenAIEmbeddings(), 
    
    # This is the VectorStore class that is used to store the embeddings and do a similarity search over.
    FAISS, 
    
    # This is the number of examples to produce.
    k=2
)

similar_prompt = FewShotPromptTemplate(
    # The object that will help select examples
    example_selector=example_selector,
    
    # Your prompt
    example_prompt=example_prompt,
    
    # Customizations that will be added to the top and bottom of your prompt
    prefix="Give the location an item is usually found in",
    suffix="Input: {noun}\nOutput:",
    
    # What inputs your prompt will receive
    input_variables=["noun"],
)


In [4]:
# Select a noun
my_noun = "student"

print(similar_prompt.format(noun=my_noun))

Give the location an item is usually found in

Example Input: driver
Example Output: car

Example Input: pilot
Example Output: plane

Input: student
Output:


In [9]:
# Select a noun!
my_noun = "flower"

print(similar_prompt.format(noun=my_noun))

Give the location an item is usually found in

Example Input: tree
Example Output: ground

Example Input: bird
Example Output: nest

Input: flower
Output:


In [8]:
# Select a noun!
my_noun = "doctor"

prompt = similar_prompt.format(noun=my_noun)

print(prompt)
llm(prompt)


Give the location an item is usually found in

Example Input: driver
Example Output: car

Example Input: pirate
Example Output: ship

Input: doctor
Output:


' hospital'
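
The selection step can be pictured as a nearest-neighbour search over embedding vectors. Below is a sketch under loud assumptions: the 2-D vectors are made up by hand (real embeddings come from a model and have many more dimensions), and `cosine` and `select_examples` are hypothetical helpers, not the selector's real implementation.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_examples(query_vec, examples, k=2):
    """Return the k examples whose vectors are most similar to the query."""
    ranked = sorted(
        examples,
        key=lambda ex: cosine(query_vec, ex["vector"]),
        reverse=True,
    )
    return ranked[:k]


# Made-up vectors: first axis ~ "a person with a job", second ~ "something in nature".
examples = [
    {"input": "pirate", "output": "ship", "vector": [0.9, 0.1]},
    {"input": "pilot", "output": "plane", "vector": [1.0, 0.0]},
    {"input": "tree", "output": "ground", "vector": [0.0, 1.0]},
    {"input": "bird", "output": "nest", "vector": [0.1, 0.9]},
]

flower_vec = [0.05, 0.95]  # hypothetical embedding for "flower"
picked = select_examples(flower_vec, examples, k=2)
print([ex["input"] for ex in picked])
```

With these toy vectors, "flower" lands next to "tree" and "bird", which mirrors the selection seen in the `flower` cell.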

### Output Parsers

- Format Instructions: An autogenerated prompt that tells the LLM how to format its response based on your desired result
- Parser: A method that extracts your model's text output into a desired structure (usually JSON)
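
The parser half can be approximated with the standard library. This sketch assumes the model wraps its answer in a fenced `json` block; `parse_json_block` is a hypothetical helper, not LangChain's implementation. The fence string is built indirectly only so that the sketch can itself be shown inside a fenced block.

```python
import json
import re

FENCE = "`" * 3  # three backticks, built indirectly (see note above)


def parse_json_block(text):
    """Extract and decode the first fenced json block in the model output."""
    pattern = FENCE + r"json\s*(.*?)" + FENCE
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError("no fenced json block found in model output")
    return json.loads(match.group(1))


# A model reply shaped like the format instructions ask for.
llm_output = (
    FENCE + "json\n"
    '{\n\t"bad_string": "welcom to califonya!",\n'
    '\t"good_string": "Welcome to California!"\n}\n'
    + FENCE
)

parsed = parse_json_block(llm_output)
print(parsed["good_string"])
```

Raising on a missing block matters in practice, because models do not always follow the format instructions.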


In [10]:
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI

llm = OpenAI()

# How you would like your response structured. This is basically a fancy prompt template
response_schemas = [
    ResponseSchema(name="bad_string", description="This a poorly formatted user input string"),
    ResponseSchema(name="good_string", description="This is your response, a reformatted response")
]

# How you would like to parse your output
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

# See the prompt template you created for formatting
format_instructions = output_parser.get_format_instructions()

# Template with placeholders for format_instructions and user_input
template = """
You will be given a poorly formatted string from a user.
Reformat it and make sure all the words are spelled correctly

{format_instructions}

% USER INPUT:
{user_input}

YOUR RESPONSE:
"""

prompt = PromptTemplate(
    input_variables=["user_input"],
    partial_variables={"format_instructions": format_instructions},
    template=template
)

promptValue = prompt.format(user_input="welcom to califonya!")

print(promptValue)



You will be given a poorly formatted string from a user.
Reformat it and make sure all the words are spelled correctly

The output should be a markdown code snippet formatted in the following schema:

```json
{
	"bad_string": string  // This a poorly formatted user input string
	"good_string": string  // This is your response, a reformatted response
}
```

% USER INPUT:
welcom to califonya!

YOUR RESPONSE:



In [11]:
llm_output = llm(promptValue)
print(llm_output)


```json
{
	"bad_string": "welcom to califonya!",
	"good_string": "Welcome to California!"
}
```


In [12]:
parsed = output_parser.parse(llm_output)

print(f"{type(parsed) = }")
print(f"{parsed = }")

type(parsed) = <class 'dict'>
parsed = {'bad_string': 'welcom to califonya!', 'good_string': 'Welcome to California!'}


## Indexes

### Document Loaders

In [14]:
# hacker news loader
from langchain.document_loaders import HNLoader

loader = HNLoader("https://news.ycombinator.com/item?id=34422627")

data = loader.load()

print (f"Found {len(data)} comments")
print (f"Here's a sample:\n\n{''.join([x.page_content[:150] for x in data[:2]])}")

Found 76 comments
Here's a sample:

Ozzie_osman 3 months ago  
             | next [–] 

LangChain is awesome. For people not sure what it's doing, large language models (LLMs) are very Ozzie_osman 3 months ago  
             | parent | next [–] 

Also, another library to check out is GPT Index (https://github.com/jerryjliu/gpt_index)


## Text Splitting

Often your document is too long (a book, for example) to fit in your LLM's context window, so you need to split it into chunks. Text splitters help with this.
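
A simplified sketch of chunking with overlap, assuming fixed-size character windows. `split_text` is a hypothetical helper; `RecursiveCharacterTextSplitter` is smarter than this, since it prefers to break on separators such as paragraph breaks and newlines before falling back to raw characters.

```python
def split_text(text, chunk_size, chunk_overlap):
    """Cut text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = split_text(text, chunk_size=10, chunk_overlap=3)
print(chunks)
```

The overlap repeats the tail of each chunk at the head of the next, so a sentence cut at a boundary still appears whole in at least one chunk when it is shorter than the overlap.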


In [15]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# This is a long document we can split up.
# I manually copied the text into assets
with open('assets/paulgraham_worked.txt') as f:
    pg_work = f.read()
    
print (f"You have {len([pg_work])} document")

text_splitter = RecursiveCharacterTextSplitter(
    # Set a really small chunk size, just to show.
    chunk_size = 1_000,
    chunk_overlap  = 20,
)

texts = text_splitter.create_documents([pg_work])

You have 1 document


In [16]:
print (f"You have {len(texts)} documents")


You have 103 documents


## Retrievers

In [17]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings

loader = TextLoader('assets/paulgraham_worked.txt')
documents = loader.load()


# Get your splitter ready
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1_000, chunk_overlap=50)

# Split your docs into texts
texts = text_splitter.split_documents(documents)

# Get embedding engine ready
embeddings = OpenAIEmbeddings()

# Embed your texts
db = FAISS.from_documents(texts, embeddings)


# Init your retriever. No k is passed here, so it returns the default number of documents
retriever = db.as_retriever()


In [18]:
docs = retriever.get_relevant_documents("what types of things did the author want to build?")

print("\n\n".join([x.page_content[:200] for x in docs[:2]]))

I started working on the application builder, Dan worked on network infrastructure, and the two undergrads worked on the first two services (images and phone calls). But about halfway through the summ

In this dissatisfied state I went in 1988 to visit Rich Draves at CMU, where he was in grad school. One day I went to visit the Carnegie Institute, where I'd spent a lot of time as a kid. While lookin


## VectorStores

...

## Memory

### Chat Message History

In [19]:
from langchain.memory import ChatMessageHistory
from langchain.chat_models import ChatOpenAI

chat = ChatOpenAI()

history = ChatMessageHistory()

history.add_ai_message("hi!")

history.add_user_message("what is the capital of france?")

print(f"{history.messages = }")

history.messages = [AIMessage(content='hi!', additional_kwargs={}), HumanMessage(content='what is the capital of france?', additional_kwargs={})]


In [20]:
ai_response = chat(history.messages)
ai_response

AIMessage(content='The capital of France is Paris.', additional_kwargs={})

In [21]:
history.add_ai_message(ai_response.content)
print(f"{history.messages = }")

history.messages = [AIMessage(content='hi!', additional_kwargs={}), HumanMessage(content='what is the capital of france?', additional_kwargs={}), AIMessage(content='The capital of France is Paris.', additional_kwargs={})]


## Chains

Combining different LLM calls and actions automatically.
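
Conceptually, a simple sequential chain pipes each step's output into the next step's input. A minimal sketch with plain functions standing in for LLM calls (`run_sequential`, `suggest_dish` and `write_recipe` are hypothetical names; a real chain would call a model at each step):

```python
def run_sequential(steps, first_input):
    """Feed first_input to the first step, then each output to the next step."""
    value = first_input
    for step in steps:
        value = step(value)
    return value


# Hypothetical stand-ins for the two LLM calls.
def suggest_dish(location):
    return f"a classic dish from {location}"


def write_recipe(dish):
    return f"Recipe for {dish}: gather ingredients, cook, serve."


result = run_sequential([suggest_dish, write_recipe], "Madrid")
print(result)
```

Each step takes one string and returns one string, which is the constraint that makes the steps composable in this simple form.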

### 1. Simple Sequential Chain


In [22]:
from langchain.llms import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.chains import SimpleSequentialChain

llm = OpenAI(temperature=1)


## chain 1
template = """Your job is to come up with a classic dish from the area that the user suggests.
% USER LOCATION
{user_location}

YOUR RESPONSE:
"""
prompt_template = PromptTemplate(input_variables=["user_location"], template=template)

# Holds my 'location' chain
location_chain = LLMChain(llm=llm, prompt=prompt_template)

## chain 2
template = """Given a meal, give a short and simple recipe on how to make that dish at home.
% MEAL
{user_meal}

YOUR RESPONSE:
"""
prompt_template = PromptTemplate(input_variables=["user_meal"], template=template)

# Holds my 'meal' chain
meal_chain = LLMChain(llm=llm, prompt=prompt_template)

## combine chains

overall_chain = SimpleSequentialChain(chains=[location_chain, meal_chain], verbose=True)


In [24]:
review = overall_chain.run("Madrid")



> Entering new SimpleSequentialChain chain...
A classic dish from Madrid is Cocido madrileño, a traditional stew made with chorizo, chickpeas, beef, pork, and vegetables.

COCIDO MADRILEÑO 
Ingredients:
-500 g boneless beef chuck, cut into cubes
-500 g pork shoulder, cut into cubes
-150 g Spanish chorizo sausage, cut into thick discs
-2 carrots, diced
-2 potatoes, diced
-2 large onions, diced
-1 can (400 g) of chickpeas, drained
-4 cloves garlic, minced 
-2 bay leaves
-3 tbsp olive oil 
-Salt & pepper to taste

Instructions: 
1. Heat the olive oil in a large pot over medium-high heat.
2. Add the beef, pork, chorizo, carrots, potatoes, onions, garlic, bay leaves, salt, and pepper to the pot.
3. Cook for 5 minutes, stirring occasionally.
4. Add enough water to cover the ingredients, bring to a boil, then reduce heat to low and simmer for 1 hour.
5. Add the chickpeas and cook for an additional 10 minutes. 
6. Enjoy!

> Finished chain.


### 2. Summarization Chain

In [25]:
from langchain.chains.summarize import load_summarize_chain
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = TextLoader('assets/state_of_the_union.txt')
documents = loader.load()

# Get your splitter ready
text_splitter = RecursiveCharacterTextSplitter(chunk_size=700, chunk_overlap=50)

# Split your docs into texts
texts = text_splitter.split_documents(documents)

# There is a lot of complexity hidden in this one line: map_reduce summarizes each chunk, then summarizes those summaries
chain = load_summarize_chain(llm, chain_type="map_reduce", verbose=True)
chain.run(texts)



> Entering new MapReduceDocumentsChain chain...
Prompt after formatting:
Write a concise summary of the following:


"Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  

Last year COVID-19 kept us apart. This year we are finally together again. 

Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. 

With a duty to one another to the American people to the Constitution. 

And with an unwavering resolve that freedom will always triumph over tyranny. 

Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated."


CONCISE SUMMARY:
Prompt after formatting:
Write a concise summary of the following:


"He thought he could roll into Ukraine and the world would roll over. Instead he met a wall

' President Biden is taking action to battle against Russian President Vladimir Putin and is investing in the American economy to create jobs, reduce violence in communities, reduce inflation and the deficit, and provide economic relief. He also seeks to pass gun control laws, support the right to vote and immigrants, and beat the opioid epidemic. Additionally, Biden is pushing for a Unity Agenda for the Nation to reduce cancer death rates and implementing policies to increase fairness and opportunity for Americans.'
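
The `map_reduce` chain type can be sketched in a few lines: summarize every chunk on its own (map), then summarize the combined summaries (reduce). Everything here is illustrative: `map_reduce_summarize` is a hypothetical helper, and `first_sentence` is a toy summarizer standing in for an LLM call.

```python
def map_reduce_summarize(chunks, summarize):
    """Summarize each chunk (map), then summarize the joined summaries (reduce)."""
    partial_summaries = [summarize(chunk) for chunk in chunks]
    return summarize("\n".join(partial_summaries))


# Toy summarizer: keeps only the first sentence. A real one would call an LLM.
def first_sentence(text):
    return text.split(".")[0].strip() + "."


chunks = [
    "Cats sleep a lot. They also hunt.",
    "Dogs love walks. They bark too.",
]

summary = map_reduce_summarize(chunks, first_sentence)
print(summary)
```

The summarizer is called once per chunk plus once for the final reduce step, which is why verbose output shows one prompt per chunk before the final summary.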

## Agents

For applications that may require an **unknown chain** of calls depending on the user's input. In these types of chains, there is an “agent” that has access to a suite of tools. Depending on the user input, the agent decides which of these tools to call, if any.
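
A minimal sketch of that decision step, in plain Python rather than LangChain's agent classes. The tools, `run_agent` and `choose_tool` are all hypothetical; in a real agent an LLM makes the choice instead of the hard-coded rule below.

```python
def run_agent(user_input, tools, choose_tool):
    """Ask choose_tool which tool (if any) fits the input, then call it."""
    tool_name = choose_tool(user_input, list(tools))
    if tool_name is None:
        return "No tool needed."
    return tools[tool_name](user_input)


# Hypothetical tools: each takes the raw user input and returns a string.
tools = {
    "calculator": lambda text: str(sum(int(tok) for tok in text.split() if tok.isdigit())),
    "shout": lambda text: text.upper(),
}


# Hypothetical decision rule standing in for the LLM's reasoning.
def choose_tool(user_input, tool_names):
    if any(tok.isdigit() for tok in user_input.split()):
        return "calculator"
    if user_input.endswith("!"):
        return "shout"
    return None


print(run_agent("add 2 and 40", tools, choose_tool))
print(run_agent("hello!", tools, choose_tool))
print(run_agent("just chatting", tools, choose_tool))
```

The sequence of calls is not fixed in advance: it depends on what the decision step returns for each input.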