<u>Main Modules</u>

1. `Model IO`: Interface with language models; utilities that make models easier to work with.
2. `Retrieval`: Interface with application-specific data
3. `Agents`: Let chains choose which tools to use given high-level directives.

<u>Additional</u>

1. `Chains`: Common, building block compositions
2. `Memory`: Persist application state between runs of a chain
3. `Callbacks`: Log and stream intermediate steps of any chain

## Model IO

### Caching in LLMs

https://python.langchain.com/docs/integrations/llms/llm_caching

Interesting to try next: `SQLAlchemyCache`
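Conceptually, an in-memory LLM cache like the one used below is an exact-match lookup: a repeated prompt is answered from a dictionary and never reaches the model. A minimal sketch of that idea in plain Python, assuming a hypothetical `fake_llm` function in place of Bedrock and a plain dict in place of `InMemoryCache` (this is not LangChain's implementation):

```python
from typing import Callable, Dict, Tuple

# Hypothetical stand-in for a model call; counts how often it is really invoked.
calls = {"n": 0}

def fake_llm(prompt: str) -> str:
    calls["n"] += 1
    return f"response to: {prompt}"

def make_cached(llm: Callable[[str], str], llm_id: str) -> Callable[[str], str]:
    """Wrap `llm` with an exact-match cache keyed on (prompt, llm_id)."""
    cache: Dict[Tuple[str, str], str] = {}

    def cached(prompt: str) -> str:
        key = (prompt, llm_id)
        if key not in cache:      # miss: call the model and remember the answer
            cache[key] = llm(prompt)
        return cache[key]         # hit: no model call

    return cached

cached_llm = make_cached(fake_llm, llm_id="hypothetical-model-temp0")
first = cached_llm("Tell me a joke")
second = cached_llm("Tell me a joke")
```

Because the key contains the prompt text verbatim, even a one-character change in the prompt is a cache miss.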

In [1]:
import boto3
from langchain_community.chat_models import BedrockChat

llm = BedrockChat(
    model_id="anthropic.claude-3-haiku-20240307-v1:0",
    client=boto3.client("bedrock-runtime"),
    model_kwargs={"temperature": 0.0, "max_tokens":128}
)

In [2]:
from langchain.globals import set_llm_cache
from langchain.cache import InMemoryCache

set_llm_cache(InMemoryCache())

In [3]:
%%time

llm.predict("Tell me a joke")



CPU times: user 131 ms, sys: 16.9 ms, total: 148 ms
Wall time: 1.59 s


"Here's a silly joke for you:\n\nWhy can't a bicycle stand up on its own? Because it's two-tired!\n\nHow was that? I tried to come up with a simple, lighthearted pun-based joke. Let me know if you'd like to hear another one."

In [5]:
%%time
llm.predict("Tell me a joke")

CPU times: user 2.78 ms, sys: 0 ns, total: 2.78 ms
Wall time: 2.5 ms


"Here's a silly joke for you:\n\nWhy can't a bicycle stand up on its own? Because it's two-tired!\n\nHow was that? I tried to come up with a simple, lighthearted pun-based joke. Let me know if you'd like to hear another one."

### Output Parsers

<u>Useful</u>

1. StrOutputParser
2. JsonOutputParser, SimpleJsonOutputParser
3. XMLOutputParser
4. AgentOutputParser
    - ReActJsonSingleInputOutputParser
    - ReActSingleInputOutputParser
    - JSONAgentOutputParser
    - XMLAgentOutputParser
    - SelfAskOutputParser
5. RetryOutputParser
6. OutputFixingParser

In [5]:
import boto3
from langchain_community.chat_models import BedrockChat
from langchain_core.prompts import PromptTemplate

model = BedrockChat(
    model_id="anthropic.claude-3-haiku-20240307-v1:0",
    client=boto3.client("bedrock-runtime"),
    model_kwargs={"temperature": 0.0, "max_tokens":128}
)

#### JSON parser

In [12]:
from langchain.output_parsers.json import SimpleJsonOutputParser

json_prompt = PromptTemplate.from_template(
    "Return only a JSON object with an `answer` key that answers the following question: {question}"
)
json_parser = SimpleJsonOutputParser()
json_chain = json_prompt | model | json_parser

In [13]:
list(json_chain.stream({"question": "Who invented the microscope?"}))

[{},
 {'answer': ''},
 {'answer': 'The'},
 {'answer': 'The microsc'},
 {'answer': 'The microscope'},
 {'answer': 'The microscope was'},
 {'answer': 'The microscope was invented'},
 {'answer': 'The microscope was invented by'},
 {'answer': 'The microscope was invented by Hans'},
 {'answer': 'The microscope was invented by Hans L'},
 {'answer': 'The microscope was invented by Hans Lipp'},
 {'answer': 'The microscope was invented by Hans Lippersh'},
 {'answer': 'The microscope was invented by Hans Lippershey'},
 {'answer': 'The microscope was invented by Hans Lippershey,'},
 {'answer': 'The microscope was invented by Hans Lippershey, Zach'},
 {'answer': 'The microscope was invented by Hans Lippershey, Zacharias'},
 {'answer': 'The microscope was invented by Hans Lippershey, Zacharias Jan'},
 {'answer': 'The microscope was invented by Hans Lippershey, Zacharias Janssen'},
 {'answer': 'The microscope was invented by Hans Lippershey, Zacharias Janssen,'},
 {'answer': 'The microscope was inve

In [19]:
from tqdm import tqdm

mem = []
for i in tqdm(range(500)):
    res = None  # reset so a failed call never reports a previous iteration's result
    try:
        res = json_chain.invoke({"question": "Who invented the microscope?"})
    except Exception as e:
        print(i, e)
        mem.append({
            "i": i,
            "error": e,
            "res": res
        })

len(mem)

100%|██████████| 500/500 [06:01<00:00,  1.38it/s]


0

#### Output fixing parser

In [56]:
from langchain.output_parsers import PydanticOutputParser
from langchain_core.pydantic_v1 import BaseModel, Field
from typing import List

class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")


parser = PydanticOutputParser(pydantic_object=Actor)

misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"

parser.parse(misformatted)

OutputParserException: Invalid json output: {'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}
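The string above fails because it uses single quotes, which makes it a Python dict literal rather than JSON. A small standard-library check, independent of LangChain, illustrates the difference (a sketch for intuition, not what `PydanticOutputParser` does internally):

```python
import ast
import json

misformatted = "{'name': 'Tom Hanks', 'film_names': ['Forrest Gump']}"

try:
    json.loads(misformatted)
    is_valid_json = True
except json.JSONDecodeError:
    is_valid_json = False  # JSON requires double-quoted strings

# The same text is a valid Python literal, so literal_eval can read it.
as_dict = ast.literal_eval(misformatted)
well_formed = json.dumps(as_dict)
```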

In [57]:
from langchain.output_parsers import OutputFixingParser

fix_parser = OutputFixingParser.from_llm(parser=parser, llm=llm)

fix_parser.parse(misformatted)

Actor(name='Tom Hanks', film_names=['Forrest Gump'])

In [58]:
# prompt that's used by default
from langchain.output_parsers.prompts import NAIVE_FIX_PROMPT

NAIVE_FIX_PROMPT.pretty_print()

Instructions:
--------------
{instructions}
--------------
Completion:
--------------
{completion}
--------------

Above, the Completion did not satisfy the constraints given in the Instructions.
Error:
--------------
{error}
--------------

Please try again. Please only respond with an answer that satisfies the constraints laid out in the Instructions:


In [81]:
# custom prompt but same input_variables as above

template = """
Instructions:
<instructions>
{instructions}
</instructions>

Completion:
<completion>
{completion}
</completion>

Above, the Completion did not satisfy the constraints given in the Instructions.
<error>
{error}
</error>

Please only respond with completion that satisfies the constraints laid out in the Instructions. Do not generate text beyond given completion.
"""

fix_prompt = PromptTemplate.from_template(template=template)

fix_prompt.pretty_print()


Instructions:
<instructions>
{instructions}
</instructions>

Completion:
<completion>
{completion}
</completion>

Above, the Completion did not satisfy the constraints given in the Instructions.
<error>
{error}
</error>

Please only respond with completion that satisfies the constraints laid out in the Instructions. Do not generate text beyond given completion.



In [82]:
fix_parser_prompt = OutputFixingParser.from_llm(
    parser=parser, llm=llm, prompt=fix_prompt
)

fix_parser_prompt.parse(misformatted)

Actor(name='Tom Hanks', film_names=['Forrest Gump'])

#### Retry parser

In [96]:
from langchain.output_parsers import PydanticOutputParser
from langchain_core.pydantic_v1 import BaseModel, Field

class Action(BaseModel):
    action: str = Field(description="action to take")
    action_input: str = Field(description="input to the action")

# define the Action parser first so the prompt below picks up its format instructions
parser = PydanticOutputParser(pydantic_object=Action)

In [97]:
from langchain.prompts import PromptTemplate

# alternative template; defined here but not used by `prompt` below
template = """Based on the user question, provide an Action and Action Input for what step should be taken.
{format_instructions}
Question: {query}
Response: """

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

prompt_value = prompt.format_prompt(query="who invented computer?")

# valid JSON, but missing the required `action_input` field
bad_response = '{"action": "search"}'
parser.parse(bad_response)

OutputParserException: Failed to parse Action from completion {'action': 'search'}. Got: 1 validation error for Action
action_input
  field required (type=value_error.missing)

In [87]:
from langchain.output_parsers import OutputFixingParser

fix_parser = OutputFixingParser.from_llm(parser=parser, llm=llm, prompt=fix_prompt)
fix_parser.parse(bad_response)

Action(action='search', action_input='')

In [49]:
from langchain.output_parsers import RetryOutputParser

retry_parser = RetryOutputParser.from_llm(parser=parser, llm=llm)
retry_parser.parse_with_prompt(bad_response, prompt_value)

OutputParserException: Invalid json output: Here is the response formatted as a JSON instance that conforms to the provided schema:

{
  "action": "provide information",
  "action_input": "The computer was invented by multiple people over time, with key contributions from pioneers in computer science and engineering. Some of the major figures in the invention of the computer include:

- Charles Babbage - Designed the first mechanical computer, the Analytical Engine, in the 19th century.
- Alan Turing - Developed the theoretical foundations of computer science and the concept of the Turing machine in the 1930s.

In [89]:
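from langchain.output_parsers import RetryWithErrorOutputParser

# note: built with the constructor rather than `from_llm`, so no retry chain is created,
# which is the likely cause of the error below
retry_with_error_parser = RetryWithErrorOutputParser(parser=parser, llm=llm)
retry_with_error_parser.parse_with_prompt(bad_response, prompt_value)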

AttributeError: 'NoneType' object has no attribute 'run'

In [93]:
# prompt that's used by default
NAIVE_COMPLETION_RETRY = """Prompt:
{prompt}
Completion:
{completion}

Above, the Completion did not satisfy the constraints given in the Prompt.
Please try again:"""

NAIVE_COMPLETION_RETRY_WITH_ERROR = """Prompt:
{prompt}
Completion:
{completion}

Above, the Completion did not satisfy the constraints given in the Prompt.
Details: {error}
Please try again:"""

NAIVE_RETRY_PROMPT = PromptTemplate.from_template(NAIVE_COMPLETION_RETRY)
NAIVE_RETRY_WITH_ERROR_PROMPT = PromptTemplate.from_template(
    NAIVE_COMPLETION_RETRY_WITH_ERROR
)

# NAIVE_RETRY_PROMPT.pretty_print()
# NAIVE_RETRY_WITH_ERROR_PROMPT.pretty_print()

In [101]:
# custom prompt but same input_variables as above

template = """
Prompt:
<prompt>
{prompt}
</prompt>

Completion:
<completion>
{completion}
</completion>

Please only respond with completion that satisfies the constraints given in the Prompt. Do not generate text beyond given completion.
"""

retry_prompt = PromptTemplate.from_template(template=template) 

# retry_prompt.pretty_print()

error_template = """
Prompt:
<prompt>
{prompt}
</prompt>

Completion:
<completion>
{completion}
</completion>

Details: {error}

Please only respond with completion that satisfies the constraints given in the Prompt. Do not generate text beyond given completion.
"""

retry_with_error_prompt = PromptTemplate.from_template(template=error_template) 

retry_with_error_prompt.pretty_print()


Prompt:
<prompt>
{prompt}
</prompt>

Completion:
<completion>
{completion}
</completion>

Details: {error}

Please only respond with completion that satisfies the constraints given in the Prompt. Do not generate text beyond given completion.



In [98]:
retry_parser = RetryOutputParser.from_llm(
    parser=parser, llm=llm, prompt=retry_prompt
)
retry_parser.parse_with_prompt(bad_response, prompt_value)

Action(action='search', action_input='who invented computer')

In [102]:
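# below didn't work: the parser is built with the constructor, which likely leaves `retry_chain` unset (None);
# `RetryWithErrorOutputParser.from_llm(...)` is the probable fix (not re-run here)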
from langchain.output_parsers import RetryWithErrorOutputParser

retry_with_error_parser = RetryWithErrorOutputParser(
    parser=parser, llm=llm, prompt=retry_with_error_prompt
)
retry_with_error_parser.parse_with_prompt(bad_response, prompt_value)

AttributeError: 'NoneType' object has no attribute 'run'
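The retry parsers differ from the output-fixing parser in that they resend the original prompt along with the bad completion. A rough sketch of that loop in plain Python, assuming a hypothetical `fake_llm` callable and a hand-written `parse_action` function (neither is a LangChain API; the template text mirrors the default retry-with-error prompt shown earlier):

```python
import json
from typing import Callable, Dict

RETRY_TEMPLATE = (
    "Prompt:\n{prompt}\nCompletion:\n{completion}\n\n"
    "Above, the Completion did not satisfy the constraints given in the Prompt.\n"
    "Details: {error}\nPlease try again:"
)

def parse_action(text: str) -> Dict[str, str]:
    """Hypothetical parser: requires both `action` and `action_input` keys."""
    data = json.loads(text)
    for key in ("action", "action_input"):
        if key not in data:
            raise ValueError(f"missing field: {key}")
    return data

def parse_with_retry(completion: str, prompt: str,
                     llm: Callable[[str], str], max_retries: int = 1) -> Dict[str, str]:
    for attempt in range(max_retries + 1):
        try:
            return parse_action(completion)
        except ValueError as err:  # json.JSONDecodeError is a ValueError subclass
            if attempt == max_retries:
                raise
            # ask the model again, showing it the prompt, its bad answer, and the error
            completion = llm(
                RETRY_TEMPLATE.format(prompt=prompt, completion=completion, error=err)
            )

def fake_llm(retry_prompt: str) -> str:
    # Hypothetical model that answers correctly on the retry.
    return '{"action": "search", "action_input": "who invented computer?"}'

result = parse_with_retry('{"action": "search"}', "who invented computer?", fake_llm)
```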

#### XML output parser & challenges

In [110]:
from langchain_core.prompts import ChatPromptTemplate

template = """Generate the shortened filmography for {actor}.
Please enclose the movies in <movie></movie> tags."""
# prompt = ChatPromptTemplate.from_messages(
#     ("human", template)
# )
prompt = PromptTemplate.from_template(
    template=template
)

prompt_value = prompt.format(actor="Tom Hanks")

print(llm.invoke(prompt_value).content)

Here is the shortened filmography for Tom Hanks:

<movie>Forrest Gump</movie>
<movie>Saving Private Ryan</movie>
<movie>Cast Away</movie>
<movie>Apollo 13</movie>
<movie>Toy Story</movie>
<movie>The Green Mile</movie>
<movie>Catch Me If You Can</movie>
<movie>Captain Phillips</movie>
<movie>Sully</movie>


In [134]:
from langchain.output_parsers import XMLOutputParser

parser = XMLOutputParser()

model = BedrockChat(
    model_id="anthropic.claude-3-haiku-20240307-v1:0",
    client=boto3.client("bedrock-runtime"),
    model_kwargs={"temperature": 0.0, "max_tokens":512}
)

new_template = """Format Instructions:
<format_instructions>
{format_instructions}
</format_instructions>

Follow the Format Instructions and generate the shortened filmography for {actor}. Do not explain. Start with XML tags.
"""

# Follow the Format Instructions and generate the  filmography for {actor}. Do not explain. Start with XML tags.

# Enclose within ```xml ```.

# print(parser.get_format_instructions())
prompt = PromptTemplate(
    template=new_template,
    input_variables=["actor"],
    partial_variables={
        "format_instructions": parser.get_format_instructions(),
    }
)

chain = prompt | model | parser

print(chain.invoke({"actor": "Tom Hanks"}))

{'filmography': [{'film': [{'title': 'Forrest Gump'}, {'year': '1994'}]}, {'film': [{'title': 'Saving Private Ryan'}, {'year': '1998'}]}, {'film': [{'title': 'Cast Away'}, {'year': '2000'}]}, {'film': [{'title': 'The Green Mile'}, {'year': '1999'}]}, {'film': [{'title': 'Toy Story'}, {'year': '1995'}]}]}


In [142]:
parser_mod = XMLOutputParser(tags=["movies", "actor", "film", "name", "genre"])

prompt_mod = PromptTemplate(
    template=new_template,
    input_variables=["actor"],
    partial_variables={
        "format_instructions": parser_mod.get_format_instructions(),
    }
)

# chain = prompt_mod | model | parser_mod
chain = prompt_mod | model

print(chain.invoke({"actor": "Tom Hanks"}))

content='<movies>\n    <actor>\n        <name>Tom Hanks</name>\n        <film>\n            <name>Forrest Gump</name>\n            <genre>Drama, Comedy</genre>\n        </film>\n        <film>\n            <name>Saving Private Ryan</name>\n            <genre>War, Drama</genre>\n        </film>\n        <film>\n            <name>Cast Away</name>\n            <genre>Drama, Adventure</genre>\n        </film>\n        <film>\n            <name>Toy Story</name>\n            <genre>Animation, Comedy, Family</genre>\n        </film>\n    </actor>\n</movies>'


In [141]:
# for a single tag, manual extraction is possible

from langchain_core.output_parsers import StrOutputParser

def _sanitize_output(text: str):
    _, after = text.split("<sql>")
    return after.split("</sql>")[0]


# chain = sql_prompt| model | StrOutputParser() | _sanitize_output

inputs = """<sql>
select *
from products;
</sql>"""

sql_query = _sanitize_output(inputs)
print(sql_query)


select *
from products;



In [144]:
# but what to do when there are many nested tags?
print('<movies>\n    <actor>\n        <name>Tom Hanks</name>\n        <film>\n            <name>Forrest Gump</name>\n            <genre>Drama, Comedy</genre>\n        </film>\n        <film>\n            <name>Saving Private Ryan</name>\n            <genre>War, Drama</genre>\n        </film>\n        <film>\n            <name>Cast Away</name>\n            <genre>Drama, Adventure</genre>\n        </film>\n        <film>\n            <name>Toy Story</name>\n            <genre>Animation, Comedy, Family</genre>\n        </film>\n    </actor>\n</movies>')

<movies>
    <actor>
        <name>Tom Hanks</name>
        <film>
            <name>Forrest Gump</name>
            <genre>Drama, Comedy</genre>
        </film>
        <film>
            <name>Saving Private Ryan</name>
            <genre>War, Drama</genre>
        </film>
        <film>
            <name>Cast Away</name>
            <genre>Drama, Adventure</genre>
        </film>
        <film>
            <name>Toy Story</name>
            <genre>Animation, Comedy, Family</genre>
        </film>
    </actor>
</movies>
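One option for the nested case is to parse the model output with the standard library instead of splitting on tags by hand. A sketch using `xml.etree.ElementTree` on a shortened copy of the output above; the shape of the resulting dict is an illustrative choice, not something `XMLOutputParser` produces:

```python
import xml.etree.ElementTree as ET

xml_text = """<movies>
    <actor>
        <name>Tom Hanks</name>
        <film>
            <name>Forrest Gump</name>
            <genre>Drama, Comedy</genre>
        </film>
        <film>
            <name>Cast Away</name>
            <genre>Drama, Adventure</genre>
        </film>
    </actor>
</movies>"""

root = ET.fromstring(xml_text)
actor = root.find("actor")

# Walk each <film> element and turn the comma-separated genres into a list.
films = [
    {
        "name": film.findtext("name"),
        "genres": [g.strip() for g in film.findtext("genre").split(",")],
    }
    for film in actor.findall("film")
]
result = {"actor": actor.findtext("name"), "films": films}
```

This only works when the model returns well-formed XML with nothing before or after it, so a sanitizing step like `_sanitize_output` may still be needed first.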


#### YAML parser

In [147]:
from typing import List

from langchain.output_parsers import YamlOutputParser
from langchain.prompts import PromptTemplate
from langchain_core.pydantic_v1 import BaseModel, Field

class Joke(BaseModel):
    setup: str=Field(description="question to set up a joke")
    punchline: str=Field(description="answer to resolve the joke")

parser = YamlOutputParser(pydantic_object=Joke)

# print(parser.get_format_instructions())

In [149]:
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

chain = prompt | model | parser

joke_query = "Tell me a joke."
chain.invoke({"query": joke_query})


Joke(setup="Why don't scientists trust atoms?", punchline='Because they make up everything!')

## Retrieval

<u>Flow</u>

Source -> Load -> Transform -> Embed -> Store -> Retrieve

<u>Concepts</u>
1. Document loaders: load Documents from different sources. Document(page_content, metadata)
2. Text Splitting: chunking strategy. ex: optimized logic for code, markdown docs
3. Text Embedding models: embeddings capture the semantic meaning of the text
4. Vector Stores: store and search embeddings
5. Retrievers: retrieval algorithm (semantic search etc.) + vector store
    - parent document retriever
    - self-query retriever
    - ensemble retriever
6. Indexing
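The flow above can be sketched end to end in plain Python. Everything here is a toy stand-in chosen for illustration: the "embedding" is a bag-of-words count, the "vector store" is a list, and splitting is done on sentence boundaries; a real pipeline would use the loaders, splitters, embedding models and vector stores listed above.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: bag-of-words counts (a real pipeline would use an embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Source -> Load -> Transform: split a "document" into sentence chunks
source = (
    "The encoder has six layers. "
    "The decoder also has six layers. "
    "Attention uses queries keys and values."
)
chunks = [s.strip() for s in source.split(".") if s.strip()]

# Embed -> Store: keep (vector, chunk) pairs in a list acting as the vector store
store = [(embed(c), c) for c in chunks]

# Retrieve: rank stored chunks by similarity to the query
def retrieve(query, k=1):
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]

top = retrieve("how does attention use queries")
```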

##### Document Loaders: PDF

In [150]:
! pip install pypdf rapidocr-onnxruntime --quiet

In [151]:
%%time
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("https://arxiv.org/pdf/1706.03762.pdf")
pages = loader.load_and_split()

CPU times: user 1.54 s, sys: 62.7 ms, total: 1.6 s
Wall time: 1.68 s


In [152]:
len(pages)

16

In [157]:
print(pages[3].page_content)

Figure 1: The Transformer - model architecture.
The Transformer follows this overall architecture using stacked self-attention and point-wise, fully
connected layers for both the encoder and decoder, shown in the left and right halves of Figure 1,
respectively.
3.1 Encoder and Decoder Stacks
Encoder: The encoder is composed of a stack of N= 6 identical layers. Each layer has two
sub-layers. The first is a multi-head self-attention mechanism, and the second is a simple, position-
wise fully connected feed-forward network. We employ a residual connection [ 11] around each of
the two sub-layers, followed by layer normalization [ 1]. That is, the output of each sub-layer is
LayerNorm( x+ Sublayer( x)), where Sublayer( x)is the function implemented by the sub-layer
itself. To facilitate these residual connections, all sub-layers in the model, as well as the embedding
layers, produce outputs of dimension dmodel = 512 .
Decoder: The decoder is also composed of a stack of N= 6identical layers.

##### Document loaders: PDF - extract images as text

In [161]:
%%time


loader = PyPDFLoader("https://arxiv.org/pdf/1706.03762.pdf", extract_images=True)
pages = loader.load_and_split()
# pages = loader.load()

len(pages)

CPU times: user 7.78 s, sys: 3.09 s, total: 10.9 s
Wall time: 10.9 s


16

In [162]:
pages

[Document(page_content='Provided proper attribution is provided, Google hereby grants permission to\nreproduce the tables and figures in this paper solely for use in journalistic or\nscholarly works.\nAttention Is All You Need\nAshish Vaswani∗\nGoogle Brain\navaswani@google.comNoam Shazeer∗\nGoogle Brain\nnoam@google.comNiki Parmar∗\nGoogle Research\nnikip@google.comJakob Uszkoreit∗\nGoogle Research\nusz@google.com\nLlion Jones∗\nGoogle Research\nllion@google.comAidan N. Gomez∗ †\nUniversity of Toronto\naidan@cs.toronto.eduŁukasz Kaiser∗\nGoogle Brain\nlukaszkaiser@google.com\nIllia Polosukhin∗ ‡\nillia.polosukhin@gmail.com\nAbstract\nThe dominant sequence transduction models are based on complex recurrent or\nconvolutional neural networks that include an encoder and a decoder. The best\nperforming models also connect the encoder and decoder through an attention\nmechanism. We propose a new simple network architecture, the Transformer,\nbased solely on attention mechanisms, dispensing 

In [164]:
print(pages[3].page_content)

Figure 1: The Transformer - model architecture.
The Transformer follows this overall architecture using stacked self-attention and point-wise, fully
connected layers for both the encoder and decoder, shown in the left and right halves of Figure 1,
respectively.
3.1 Encoder and Decoder Stacks
Encoder: The encoder is composed of a stack of N= 6 identical layers. Each layer has two
sub-layers. The first is a multi-head self-attention mechanism, and the second is a simple, position-
wise fully connected feed-forward network. We employ a residual connection [ 11] around each of
the two sub-layers, followed by layer normalization [ 1]. That is, the output of each sub-layer is
LayerNorm( x+ Sublayer( x)), where Sublayer( x)is the function implemented by the sub-layer
itself. To facilitate these residual connections, all sub-layers in the model, as well as the embedding
layers, produce outputs of dimension dmodel = 512 .
Decoder: The decoder is also composed of a stack of N= 6identical layers.

##### Document loaders: PDF - using Amazon Textract

In [None]:
from langchain_community.document_loaders import AmazonTextractPDFLoader
loader = AmazonTextractPDFLoader("example_data/alejandro_rosalez_sample-small.jpeg")
documents = loader.load()

##### Text Splitters

`langchain-text-splitters`: the text-splitting library from LangChain

`Transform docs` - split a long document into smaller chunks that can fit into your model's context window.
(The goal is not to chunk for chunking's sake; it is to get the data into a format where it can be retrieved for value later.)

Text Splitters `work` as follows:
1. Split the text up into small, semantically meaningful chunks (often sentences)
2. Start combining these small chunks into a larger chunk until you reach a certain size (as measured by some function).
3. Once you reach that size, make that chunk its own piece of text and then start creating a new chunk of text with some overlap 
   (to keep context between chunks).

Customize your text splitter:
1. how the text is split
2. how the chunk size is measured
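The three steps and the two customization points can be sketched as a small merge routine. This is a simplified illustration, not the library's code: `merge_splits` and its parameters are hypothetical names, the input is assumed to be already split into small pieces (step 1), and `length_fn` is the pluggable size measure.

```python
from typing import Callable, List

def merge_splits(pieces: List[str], chunk_size: int, chunk_overlap: int,
                 length_fn: Callable[[str], int] = len, sep: str = " ") -> List[str]:
    chunks: List[str] = []
    current: List[str] = []
    for piece in pieces:
        candidate = sep.join(current + [piece])
        if current and length_fn(candidate) > chunk_size:
            # step 3: the chunk is full, so emit it
            chunks.append(sep.join(current))
            # carry trailing pieces forward as overlap, up to chunk_overlap in size
            overlap: List[str] = []
            for prev in reversed(current):
                if length_fn(sep.join([prev] + overlap)) > chunk_overlap:
                    break
                overlap.insert(0, prev)
            current = overlap
        # step 2: keep combining small pieces into the current chunk
        current.append(piece)
    if current:
        chunks.append(sep.join(current))
    return chunks

sentences = ["aaaa", "bbbb", "cccc", "dddd"]
chunks = merge_splits(sentences, chunk_size=9, chunk_overlap=4)
```

Passing a token counter as `length_fn` instead of `len` switches the size measure from characters to tokens without touching the merge logic.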


`Ref`: https://github.com/FullStackRetrieval-com/RetrievalTutorials/blob/main/tutorials/LevelsOfTextSplitting/5_Levels_Of_Text_Splitting.ipynb

`Levels` of text splitting:
1. Character splitting
2. Recursive Character text splitting
3. Document specific splitting
4. Semantic splitting
5. Agentic splitting
6. Alternative Representation chunking + indexing

How to `evaluate` text splitters:
ChunkViz utility: https://www.chunkviz.com


`Interesting` ones:
1. Text: 
    - CharacterTextSplitter
    - RecursiveCharacterTextSplitter
    - RecursiveJsonSplitter
2. Code: PythonCodeTextSplitter
3. PDFs with tables
4. Multi-modal (text + images)
5. Semantic Chunking
    - SemanticChunker
6. Hypothetical Questions: generate hypothetical questions about raw documents. 
   Helpful when you have sparse unstructured data, like chat messages.
7. Split by tokens: language models have a token limit, so it is a good idea to count tokens when splitting text into chunks. Tokenizers are used for this.
   When you count tokens in your text, use the same tokenizer as the language model.
   - SentenceTransformersTokenTextSplitter
   - NLTKTextSplitter
   - Huggingface's Tokenizer. ex: GPT2TokenizerFast
      from transformers import GPT2TokenizerFast
      from langchain_text_splitters import CharacterTextSplitter
      tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
      text_splitter = CharacterTextSplitter.from_huggingface_tokenizer(
         tokenizer, chunk_size=100, chunk_overlap=0
      )

Graphs:
if the data is rich with entities, relationships, and connections -> then a graph structure would be beneficial
Options:
- Diffbot
- InstaGraph

Webscraping tools:
1. Diffbot - https://python.langchain.com/docs/integrations/document_loaders/diffbot
2. 


##### Text Splitters: Semantic chunking

In [4]:
! pip install langchain_experimental -qU

In [5]:
import json, os

with open("/home/ubuntu/config.json") as f:
    config = json.loads(f.read())
os.environ["COHERE_API_KEY"] = config["cohere_api_key"]

with open("data/state_of_the_union.txt") as f:
    state_of_the_union = f.read()
print(state_of_the_union)

Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans.  

Last year COVID-19 kept us apart. This year we are finally together again. 

Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. 

With a duty to one another to the American people to the Constitution. 

And with an unwavering resolve that freedom will always triumph over tyranny. 

Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. 

He thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. 

He met the Ukrainian people. 

From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. 

Groups of citizens blocking tanks with their bodies. Every

In [6]:
from langchain_experimental.text_splitter import SemanticChunker
from langchain_community.embeddings import CohereEmbeddings

text_splitter = SemanticChunker(CohereEmbeddings())
docs = text_splitter.create_documents([state_of_the_union])

print(f"splits: {len(docs)}")
print(f"first doc content: {docs[0].page_content}")

splits: 26
first doc content: Madam Speaker, Madam Vice President, our First Lady and Second Gentleman. Members of Congress and the Cabinet. Justices of the Supreme Court. My fellow Americans. Last year COVID-19 kept us apart. This year we are finally together again. Tonight, we meet as Democrats Republicans and Independents. But most importantly as Americans. With a duty to one another to the American people to the Constitution. And with an unwavering resolve that freedom will always triumph over tyranny. Six days ago, Russia’s Vladimir Putin sought to shake the foundations of the free world thinking he could make it bend to his menacing ways. But he badly miscalculated. He thought he could roll into Ukraine and the world would roll over. Instead he met a wall of strength he never imagined. He met the Ukrainian people. From President Zelenskyy to every Ukrainian, their fearlessness, their courage, their determination, inspires the world. Groups of citizens blocking tanks with their bo

In [8]:
text_splitter = SemanticChunker(
    CohereEmbeddings(), breakpoint_threshold_type="percentile"
)

TypeError: SemanticChunker.__init__() got an unexpected keyword argument 'breakpoint_threshold_type'

##### Text Embedding Models: caching

In [None]:
# CacheBackedEmbeddings
# - backed by a vector store
# - backed by a ByteStore

In [None]:
# vector store usage (cache kept in a LocalFileStore)
from langchain.embeddings import CacheBackedEmbeddings
from langchain.storage import LocalFileStore
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

underlying_embeddings = OpenAIEmbeddings()

store = LocalFileStore("./cache/")

cached_embedder = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings, store, namespace=underlying_embeddings.model
)

# keys show up only after something has been embedded through cached_embedder
list(store.yield_keys())

# bytestore (in memory)
from langchain.storage import InMemoryByteStore

store = InMemoryByteStore()

cached_embedder = CacheBackedEmbeddings.from_bytes_store(
    underlying_embeddings, store, namespace=underlying_embeddings.model
)
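The idea behind a cache-backed embedder can be sketched without any external service: hash each text, prefix the hash with a namespace so that vectors from different models do not collide, and only call the underlying embedder on a miss. The names `make_cached_embedder` and `toy_embed` are hypothetical, and a dict stands in for the byte store; this is an illustration of the pattern, not LangChain's implementation.

```python
import hashlib
from typing import Callable, Dict, List

def make_cached_embedder(embed_fn: Callable[[str], List[float]],
                         store: Dict[str, List[float]],
                         namespace: str) -> Callable[[List[str]], List[List[float]]]:
    def embed_documents(texts: List[str]) -> List[List[float]]:
        vectors = []
        for text in texts:
            # key = namespace + hash of the text
            key = namespace + hashlib.sha1(text.encode("utf-8")).hexdigest()
            if key not in store:          # miss: compute and store
                store[key] = embed_fn(text)
            vectors.append(store[key])    # hit: reuse the stored vector
        return vectors
    return embed_documents

calls: List[str] = []

def toy_embed(text: str) -> List[float]:
    """Toy embedding that records every real computation."""
    calls.append(text)
    return [float(len(text)), float(text.count(" "))]

store: Dict[str, List[float]] = {}
embedder = make_cached_embedder(toy_embed, store, namespace="toy-model-")
first = embedder(["hello world", "hi"])
second = embedder(["hello world", "hi", "new text"])
```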

##### Vectorstores

<img src="https://python.langchain.com/assets/images/vector_stores-125d1675d58cfb46ce9054c9019fea72.jpg" alt="drawing" width="1000"/>

##### Retrievers

[What]
- Interface that returns documents given an unstructured query.
- Different from a vector store in that a retriever does not need to store documents; it only needs to implement the retrieval logic.

[Types]
- Vectorstore
- Parent Document
- Multi Vector
- Self Query
- Contextual Compression
- Time Weighted Vectorstore
- Multi Query Retriever
- Ensemble
- Long Context Reorder 

[MultiQueryRetriever]
- Distance-based vector database retrieval embeds (represents) queries in high-dimensional space and finds similar embedded documents based on "distance".
- However, retrieval may produce different results with subtle changes in query wording, or if the embeddings do not capture the semantics of the data well.
- Prompt engineering/tuning is sometimes done to address these problems manually, but it can be tedious.
- MultiQueryRetriever automates the prompt tuning by using an LLM to generate multiple queries from different perspectives for a given user input query. For each query, it retrieves a set of relevant documents and takes the unique union across all queries to get a larger set of potentially relevant documents.
- In essence, it generates multiple perspectives on the same question to overcome some of the limitations of distance-based retrieval and get a richer set of results.
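The "unique union" step can be illustrated with a short sketch. The query generator and the retriever are hypothetical stand-ins (a real setup would call an LLM and a vector store), and documents are represented by plain string IDs:

```python
from typing import Callable, List

def multi_query_retrieve(question: str,
                         generate_queries: Callable[[str], List[str]],
                         retrieve: Callable[[str], List[str]]) -> List[str]:
    """Retrieve for every generated query and keep each document once, in first-seen order."""
    seen = set()
    merged: List[str] = []
    for query in generate_queries(question):
        for doc in retrieve(query):
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged

def fake_generate_queries(question: str) -> List[str]:
    # Hypothetical stand-in for the LLM call that rewrites the question.
    return [question, "ways to break a task into steps", "planning methods for agents"]

# Hypothetical retrieval results per query.
index = {
    "task decomposition": ["doc_a", "doc_b"],
    "ways to break a task into steps": ["doc_b", "doc_c"],
    "planning methods for agents": ["doc_c", "doc_d"],
}

def fake_retrieve(query: str) -> List[str]:
    return index.get(query, [])

docs = multi_query_retrieve("task decomposition", fake_generate_queries, fake_retrieve)
```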

[EnsembleRetriever]
- takes a list of retrievers as input
- ensembles the results of their get_relevant_documents()
- reranks the results using the Reciprocal Rank Fusion algorithm
- Places to use:
    - leverage the strengths of different algorithms, e.g. sparse retriever (keyword search) + dense retriever (semantic similarity)
    - multiple sources of data
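Reciprocal Rank Fusion scores each document by summing `weight / (c + rank)` over the rankings it appears in. A small sketch of the formula; the function name, the equal default weights, and the constant `c=60` (a commonly used value) are assumptions for illustration:

```python
from collections import defaultdict
from typing import Dict, List, Optional

def reciprocal_rank_fusion(rankings: List[List[str]],
                           weights: Optional[List[float]] = None,
                           c: int = 60) -> List[str]:
    weights = weights or [1.0] * len(rankings)
    scores: Dict[str, float] = defaultdict(float)
    for ranking, weight in zip(rankings, weights):
        for rank, doc in enumerate(ranking, start=1):
            # documents near the top of any ranking contribute more
            scores[doc] += weight / (c + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc_a", "doc_b", "doc_c"]    # e.g. from a sparse retriever
semantic_hits = ["doc_b", "doc_d", "doc_a"]   # e.g. from a dense retriever
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
```

Documents found by both retrievers (`doc_a`, `doc_b`) end up ahead of documents found by only one.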

[LongContextReorder]
- Regardless of model architecture, performance degrades substantially when you include 10+ retrieved documents.
- From the paper (https://arxiv.org/abs/2307.03172): when models must access relevant information in the middle of long contexts, they tend to ignore the provided documents.
- To mitigate this, reorder the documents after retrieval so that the most relevant ones sit at the beginning and end of the context.
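One way to do such a reordering is to alternate documents between the front and the back of the list, so the least relevant ones land in the middle. This is a sketch of the idea with a hypothetical function name; the exact order produced by `LongContextReorder` may differ.

```python
from typing import List

def reorder_for_long_context(docs_by_relevance: List[str]) -> List[str]:
    """Input is sorted most relevant first; output puts the most relevant
    items at both ends and the least relevant in the middle."""
    front: List[str] = []
    back: List[str] = []
    for i, doc in enumerate(docs_by_relevance):
        if i % 2 == 0:
            front.append(doc)
        else:
            back.append(doc)
    return front + back[::-1]

ranked = ["d1", "d2", "d3", "d4", "d5"]   # d1 is the most relevant
reordered = reorder_for_long_context(ranked)
```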

[MultiVectorRetriever]
- useful to store multiple vectors per document
- methods to create multiple vectors per document include:
    - smaller chunks: split a doc into smaller chunks, and embed those (ParentDocumentRetriever).
    - summary: create summary for each document, embed that along with (or instead of) the document.
    - hypothetical questions: create hypothetical questions that each document would be appropriate to answer, embed those along with (or instead of) the document.

[SelfQueryRetriever]
- a retriever with the ability to query itself
- uses the user-input query for semantic similarity comparison with the contents of stored documents, and also extracts filters on the metadata of stored documents from the user query and executes those filters.
- retriever uses a query-constructing llm chain to write a structured query and then apply that structured query to its underlying vectorstore.
- Flow: user_question -> query_constructor_llm_call -> query_translator -> vectorstore -> retrieved_docs
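The flow can be mimicked with a toy example. All names here are hypothetical: `fake_query_constructor` stands in for the LLM call that writes the structured query, equality filters stand in for the query translator, and a keyword match stands in for vector similarity.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class StructuredQuery:
    query: str                                   # text used for semantic search
    filters: Dict[str, object] = field(default_factory=dict)  # metadata constraints

def fake_query_constructor(user_question: str) -> StructuredQuery:
    """Hypothetical stand-in for the query-constructing LLM call."""
    return StructuredQuery(query="dinosaurs", filters={"year": 1993})

docs = [
    {"text": "A theme park of cloned dinosaurs", "metadata": {"year": 1993}},
    {"text": "Dinosaurs roam a lost island", "metadata": {"year": 2015}},
    {"text": "Toys come alive", "metadata": {"year": 1995}},
]

def retrieve(sq: StructuredQuery) -> List[dict]:
    # Apply metadata filters first, then a toy keyword match in place of vector similarity.
    candidates = [
        d for d in docs
        if all(d["metadata"].get(k) == v for k, v in sq.filters.items())
    ]
    return [d for d in candidates if sq.query.lower() in d["text"].lower()]

hits = retrieve(fake_query_constructor("dinosaur movies from 1993"))
```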


In [8]:
# Custom retriever

from langchain_core.retrievers import BaseRetriever
from langchain_core.callbacks import CallbackManagerForRetrieverRun
from langchain_core.documents import Document
from typing import List

class CustomRetriever(BaseRetriever):
    def _get_relevant_documents(
        self, 
        query: str, 
        *,
        run_manager: CallbackManagerForRetrieverRun,
    ) -> List[Document]:
        
        # your logic, below is just some dummy code
        return [
            Document(page_content=query)
        ]

retriever = CustomRetriever()

retriever.get_relevant_documents("research")

[Document(page_content='research')]

##### Multi Query Retriever

In [19]:
! pip install chromadb --quiet

In [11]:
import json, os

with open("/home/ubuntu/config.json") as f:
    config = json.loads(f.read())
os.environ["COHERE_API_KEY"] = config["cohere_api_key"]

In [14]:
from langchain_community.document_loaders import WebBaseLoader

data = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/").load()
data

[Document(page_content='\n\n\n\n\n\nLLM Powered Autonomous Agents | Lil\'Log\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nLil\'Log\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\nPosts\n\n\n\n\nArchive\n\n\n\n\nSearch\n\n\n\n\nTags\n\n\n\n\nFAQ\n\n\n\n\nemojisearch.app\n\n\n\n\n\n\n\n\n\n      LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\n \n\n\nTable of Contents\n\n\n\nAgent System Overview\n\nComponent One: Planning\n\nTask Decomposition\n\nSelf-Reflection\n\n\nComponent Two: Memory\n\nTypes of Memory\n\nMaximum Inner Product Search (MIPS)\n\n\nComponent Three: Tool Use\n\nCase Studies\n\nScientific Discovery Agent\n\nGenerative Agents Simulation\n\nProof-of-Concept Examples\n\n\nChallenges\n\nCitation\n\nReferences\n\n\n\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer an

In [15]:
# chunking based on length
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500, chunk_overlap=0
)
docs = text_splitter.split_documents(data)


print(f"splits: {len(docs)}")
print(f"first doc content: {docs[0].page_content}")

splits: 130
first doc content: LLM Powered Autonomous Agents | Lil'Log







































Lil'Log






















Posts




Archive




Search




Tags




FAQ




emojisearch.app









      LLM Powered Autonomous Agents
    
Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng


 


Table of Contents



Agent System Overview

Component One: Planning

Task Decomposition

Self-Reflection


Component Two: Memory

Types of Memory

Maximum Inner Product Search (MIPS)


In [16]:
# with semantic chunking

from langchain_community.document_loaders import WebBaseLoader
from langchain_experimental.text_splitter import SemanticChunker
from langchain_community.embeddings import CohereEmbeddings

text_splitter = SemanticChunker(CohereEmbeddings())
docs = text_splitter.split_documents(data)

print(f"splits: {len(docs)}")
print(f"first doc content: {docs[0].page_content}")


splits: 22
first doc content: 





LLM Powered Autonomous Agents | Lil'Log







































Lil'Log






















Posts




Archive




Search




Tags




FAQ




emojisearch.app









      LLM Powered Autonomous Agents
    
Date: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng


 


Table of Contents



Agent System Overview

Component One: Planning

Task Decomposition

Self-Reflection


Component Two: Memory

Types of Memory

Maximum Inner Product Search (MIPS)


Component Three: Tool Use

Case Studies

Scientific Discovery Agent

Generative Agents Simulation

Proof-of-Concept Examples


Challenges

Citation

References





Building agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can b

In [20]:
from langchain_community.vectorstores import Chroma

vectordb = Chroma.from_documents(
    documents=docs,
    embedding=CohereEmbeddings()
)

In [26]:
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_community.chat_models import BedrockChat
import boto3

llm = BedrockChat(
    model_id="anthropic.claude-3-haiku-20240307-v1:0",
    client=boto3.client("bedrock-runtime"),
    model_kwargs={"temperature": 0.0, "max_tokens":512}
)

retriever_from_llm = MultiQueryRetriever.from_llm(
    retriever=vectordb.as_retriever(), llm=llm
)

In [27]:
import logging

logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

In [32]:
%%time

question = "what are the approaches to Task Decomposition?"
response = retriever_from_llm.get_relevant_documents(query=question)

len(response)

response

INFO:langchain.retrievers.multi_query:Generated queries: ['Here are three different versions of the original question to help retrieve relevant documents from a vector database:', '', 'What are the different techniques or methods used for Task Decomposition?', '', 'Approaches to breaking down complex tasks into smaller, more manageable subtasks.', '', 'Strategies and frameworks for decomposing tasks into hierarchical or modular components.']


CPU times: user 153 ms, sys: 9.34 ms, total: 162 ms
Wall time: 2.14 s


[Document(page_content='8. Categorization of human memory. We can roughly consider the following mappings:\n\nSensory memory as learning embedding representations for raw inputs, including text, image or other modalities;\nShort-term memory as in-context learning. It is short and finite, as it is restricted by the finite context window length of Transformer. Long-term memory as the external vector store that the agent can attend to at query time, accessible via fast retrieval. Maximum Inner Product Search (MIPS)#\nThe external memory can alleviate the restriction of finite attention span. A standard practice is to save the embedding representation of information into a vector store database that can support fast maximum inner-product search (MIPS). To optimize the retrieval speed, the common choice is the approximate nearest neighbors (ANN)\u200b algorithm to return approximately top k nearest neighbors to trade off a little accuracy lost for a huge speedup. A couple common choices of 
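
The `Generated queries` log above shows the model's preamble sentence and several empty strings being treated as queries, because the raw output is split on newlines. A minimal sketch of two helpers, both hypothetical and not part of LangChain: one that drops such lines (the "ends with a colon" rule is only a heuristic assumption), and one that merges per-query results into a unique union, which is the idea behind multi-query retrieval.

```python
def clean_queries(raw_lines: list) -> list:
    """Keep only lines that look like real queries."""
    cleaned = []
    for line in raw_lines:
        line = line.strip()
        # Skip blanks and preamble-style lines such as "Here are three versions ...:"
        if not line or line.endswith(":"):
            continue
        cleaned.append(line)
    return cleaned


def unique_union(result_lists: list) -> list:
    """Merge the results of several queries, keeping first-seen order."""
    seen = set()
    merged = []
    for results in result_lists:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


raw = [
    "Here are three different versions of the original question to help retrieve relevant documents from a vector database:",
    "",
    "What are the different techniques or methods used for Task Decomposition?",
    "",
    "Approaches to breaking down complex tasks into smaller, more manageable subtasks.",
    "",
    "Strategies and frameworks for decomposing tasks into hierarchical or modular components.",
]
queries = clean_queries(raw)
merged = unique_union([["chunk-a", "chunk-b"], ["chunk-b", "chunk-c"]])
```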

In [62]:
# custom prompt 

from typing import List

from langchain_core.messages import AIMessage
from langchain_core.output_parsers import XMLOutputParser  # used by the parsing cells below
from langchain_core.prompts import ChatPromptTemplate

prompt_template = """You are an AI language model assistant. Your task is to generate five different versions of the given user question to retrieve relevant documents from a vector database.
By generating multiple perspectives on the user question, your goal is to help the user overcome some of the limitations of the distance-based similarity search.
Provide these alternative questions within <question> XML tags. Entire output should be within <answer> XML tags
Original question: {question}"""

prompt = ChatPromptTemplate.from_template(template=prompt_template)
llm = BedrockChat(
    model_id="anthropic.claude-3-haiku-20240307-v1:0",
    client=boto3.client("bedrock-runtime"),
    model_kwargs={"temperature": 0.0, "max_tokens":512}
)



In [63]:
%%time

# approach 1

question = "What are the approaches to Task Decomposition?"

output_parser = XMLOutputParser(tags=["answer"])
query_chain = prompt | llm | output_parser

response = query_chain.invoke({"question": question})

qns = [x["question"] for x in response["answer"]]
qns

CPU times: user 26.1 ms, sys: 45 µs, total: 26.1 ms
Wall time: 1.96 s


['What are the different techniques for breaking down a complex task into smaller, more manageable subtasks?',
 'What are the common methods used to decompose a high-level task into a series of lower-level steps or actions?',
 'How can a complex problem be divided into smaller, more easily solvable components or subproblems?',
 'What are the various strategies or frameworks for partitioning a larger task into a set of interdependent subtasks?',
 'What are the approaches or principles that can be applied to break down a complex task into a hierarchical structure of subtasks?']

In [64]:
%%time

# approach 2

question = "What are the approaches to Task Decomposition?"

def parse(ai_message: AIMessage) -> List[str]:
    text = ai_message.content
    output_parser = XMLOutputParser(tags=["answer"])
    response = output_parser.parse(text)
    qns = [x["question"] for x in response["answer"]]
    return qns


query_chain = prompt | llm | parse

response = query_chain.invoke({"question": question})
response

CPU times: user 27.4 ms, sys: 4.14 ms, total: 31.5 ms
Wall time: 1.86 s


['What are the different techniques for breaking down complex tasks into smaller, more manageable subtasks?',
 'What are the common methods used to decompose complex problems into simpler, more easily solvable components?',
 'How can a complex task be divided into a series of smaller, more specific sub-tasks to facilitate efficient problem-solving?',
 'What are the various strategies employed to break down a larger objective into a set of more granular, actionable steps?',
 'What are the established approaches for partitioning a complex problem into a hierarchy of more manageable sub-problems?']

In [65]:
retriever = MultiQueryRetriever(
    retriever=vectordb.as_retriever(),
    llm_chain=query_chain,
)

question = "What are the approaches to Task Decomposition?"
response = retriever.get_relevant_documents(query=question)
len(response)

ValidationError: 5 validation errors for MultiQueryRetriever
llm_chain -> prompt
  field required (type=value_error.missing)
llm_chain -> llm
  field required (type=value_error.missing)
llm_chain -> first
  extra fields not permitted (type=value_error.extra)
llm_chain -> last
  extra fields not permitted (type=value_error.extra)
llm_chain -> middle
  extra fields not permitted (type=value_error.extra)

In [67]:
# llm_chain should be of LLMChain type

MultiQueryRetriever?

Init signature:
MultiQueryRetriever(
    *,
    name: Optional[str] = None,
    tags: Optional[List[str]] = None,
    metadata: Optional[Dict[str, Any]] = None,
    retriever: langchain_core.retrievers.BaseRetriever,
    llm_chain: langchain.chains.llm.LLMChain,
    ver

##### Contextual Compression

- Idea is simple: instead of immediately returning retrieved documents as-is, compress them using context of the given query, so that only the relevant information is returned.
- Compressing means both compressing the contents of an individual document and filtering out documents wholesale.
- Need 2 things:
    - a base retriever
    - a document Compressor
- Flow:
    - `input_query -> Contextual Compression Retriever (pass query to base retriever) -> initial docs -> Document Compressor -> shortened docs (final answer)`
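
A toy sketch of this flow in plain Python, assuming a hypothetical `base_retriever` that simply returns a fixed corpus and a keyword-based `compress` step in place of an LLM-backed compressor. It shows both effects named above: shortening individual documents and dropping documents wholesale.

```python
import re

corpus = [
    "Task decomposition breaks a goal into steps. The weather was nice.",
    "Cats sleep a lot.",
]


def base_retriever(query: str) -> list:
    """Stand-in for a vector store retriever: returns every document."""
    return list(corpus)


def compress(docs: list, query: str) -> list:
    """Keep only sentences sharing a word with the query; drop docs with none."""
    terms = set(re.findall(r"\w+", query.lower()))
    compressed = []
    for doc in docs:
        sentences = re.split(r"(?<=[.!?])\s+", doc)
        kept = [s for s in sentences if terms & set(re.findall(r"\w+", s.lower()))]
        if kept:  # a document with no relevant sentence is filtered out entirely
            compressed.append(" ".join(kept))
    return compressed


def contextual_compression_retrieve(query: str) -> list:
    initial_docs = base_retriever(query)
    return compress(initial_docs, query)


shortened = contextual_compression_retrieve("task decomposition")
```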

In [None]:
from langchain.retrievers import ContextualCompressionRetriever
from langchain.retrievers.document_compressors import LLMChainExtractor

# LLMChainExtractor iterates over the initially returned documents and extracts from each only the content that is relevant to the query.
compressor = LLMChainExtractor.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor,
    base_retriever=retriever
)
compressed_docs = compression_retriever.get_relevant_documents(
    "What did the president say about Ketanji Brown Jackson"
)

In [None]:
# The LLMChainFilter is a slightly simpler but more robust compressor that uses an LLM chain to decide which of the initially retrieved documents to filter out and which ones to return, without manipulating the document contents.
from langchain.retrievers.document_compressors import LLMChainFilter

_filter = LLMChainFilter.from_llm(llm)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=_filter, base_retriever=retriever
)

In [None]:
# EmbeddingsFilter
# Making an extra LLM call over each retrieved document is expensive and slow. The EmbeddingsFilter provides a cheaper and faster option by embedding the documents and query and only returning those documents which have sufficiently similar embeddings to the query.

from langchain.retrievers.document_compressors import EmbeddingsFilter
from langchain_community.embeddings import CohereEmbeddings

embeddings_filter = EmbeddingsFilter(embeddings=CohereEmbeddings(), similarity_threshold=0.76)
compression_retriever = ContextualCompressionRetriever(
    base_compressor=embeddings_filter, base_retriever=retriever
)

In [None]:
# stringing compressors and document transformers together

# use DocumentCompressorPipeline - use this to combine multiple compressors in sequence.
# Along with compressors we can add BaseDocumentTransformers to our pipeline, 
# which don’t perform any contextual compression but simply perform some transformation 
# on a set of documents. 
# For example TextSplitters can be used as document transformers to split documents 
# into smaller pieces, and the EmbeddingsRedundantFilter can be used to 
# filter out redundant documents based on embedding similarity between documents.


from langchain.retrievers.document_compressors import DocumentCompressorPipeline
from langchain_community.document_transformers import EmbeddingsRedundantFilter
from langchain_text_splitters import CharacterTextSplitter

embeddings = CohereEmbeddings()  # same embedding model as in the EmbeddingsFilter cell above
splitter = CharacterTextSplitter(chunk_size=300, chunk_overlap=0, separator=". ")
redundant_filter = EmbeddingsRedundantFilter(embeddings=embeddings)
relevant_filter = EmbeddingsFilter(embeddings=embeddings, similarity_threshold=0.76)
pipeline_compressor = DocumentCompressorPipeline(
    transformers=[splitter, redundant_filter, relevant_filter]
)

compression_retriever = ContextualCompressionRetriever(
    base_compressor=pipeline_compressor, base_retriever=retriever
)

compressed_docs = compression_retriever.get_relevant_documents(
    "What did the president say about Ketanji Brown Jackson"
)
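
The pipeline idea (split, drop near-duplicates, keep what is relevant) can be sketched with plain functions applied in sequence. This is only an illustration: `difflib.SequenceMatcher` and word overlap stand in for embedding similarity, and all function names and the `0.9` threshold are hypothetical.

```python
from difflib import SequenceMatcher


def split_sentences(docs: list, query: str) -> list:
    """Transformer step: split documents into smaller pieces."""
    return [piece.strip() for doc in docs for piece in doc.split(". ") if piece.strip()]


def drop_redundant(docs: list, query: str, threshold: float = 0.9) -> list:
    """Transformer step: drop pieces that are near-duplicates of one already kept."""
    kept = []
    for doc in docs:
        if all(SequenceMatcher(None, doc.lower(), k.lower()).ratio() < threshold for k in kept):
            kept.append(doc)
    return kept


def keep_relevant(docs: list, query: str) -> list:
    """Compressor step: keep pieces that share at least one word with the query."""
    terms = set(query.lower().split())
    return [doc for doc in docs if terms & set(doc.lower().split())]


def run_pipeline(docs: list, query: str, steps: list) -> list:
    # Each step receives the output of the previous one, in order.
    for step in steps:
        docs = step(docs, query)
    return docs


raw_docs = ["Agents use tools. Agents use tools", "The sky is blue. Tools extend agents"]
result = run_pipeline(raw_docs, "tools", [split_sentences, drop_redundant, keep_relevant])
```

Order matters here as it does in the real pipeline: splitting first lets the later filters work on small pieces instead of whole documents.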

##### LongContext Reordering

In [None]:
from langchain_community.document_transformers import LongContextReorder

reordering = LongContextReorder()
query = "What are the approaches to Task Decomposition?"  # the question used in the earlier cells
docs = retriever.get_relevant_documents(query)
reordered_docs = reordering.transform_documents(docs)
reordered_docs
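
The motivation for reordering is that models tend to use information at the start and end of a long context better than information in the middle. A minimal sketch of one such reordering, assuming the input is sorted from most to least relevant; it illustrates the idea and is not claimed to be the library's exact implementation.

```python
def reorder_for_long_context(docs_by_relevance: list) -> list:
    """Place the most relevant items at both ends and the least relevant in the middle."""
    reordered = []
    # Walk from least to most relevant, alternately pushing to the front and the back,
    # so later (more relevant) items end up on the outside.
    for i, doc in enumerate(reversed(docs_by_relevance)):
        if i % 2 == 1:
            reordered.append(doc)
        else:
            reordered.insert(0, doc)
    return reordered


ranked = [1, 2, 3, 4, 5]  # 1 = most relevant, 5 = least relevant
print(reorder_for_long_context(ranked))  # [1, 3, 5, 4, 2]
```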

## Agents

[Basics]
- The core idea of agents is to use a language model to choose a sequence of actions to take. In chains, a sequence of actions is hardcoded (in code). 
- In agents, a language model is used as a reasoning engine to determine which actions to take and in which order.
- agents take a self-determined, input-dependent sequence of steps before returning a user-facing output


[AgentExecutor]
- Runtime for an agent.
```python
    next_action = agent.get_action(...)
    while next_action != AgentFinish:
        observation = run(next_action)
        next_action = agent.get_action(..., next_action, observation)
    return next_action
```
- Several complexities this runtime can handle:
    - Handling cases where the agent selects a non-existent tool
    - Handling cases where the tool errors
    - Handling cases where the agent produces output that cannot be parsed into a tool invocation
    - Logging and observability at all levels (agent decisions, tool calls) to stdout and/or to LangSmith.
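
A runnable toy version of the loop above, covering two of the listed complexities (unknown tool, tool error). Everything here is a stand-in: `Action`, `Finish`, `scripted_agent`, and the `calculator` tool are hypothetical and only mimic the shape of the real runtime.

```python
from dataclasses import dataclass


@dataclass
class Action:
    tool: str
    tool_input: str


@dataclass
class Finish:
    output: str


def scripted_agent(question: str, steps: list):
    """Stand-in for the LLM: call the calculator once, then finish."""
    if not steps:
        return Action("calculator", "2 + 3")
    _, observation = steps[-1]
    return Finish(f"The answer is {observation}")


tools = {"calculator": lambda expr: str(sum(int(part) for part in expr.split("+")))}


def run_agent(agent, tools: dict, question: str, max_iterations: int = 5) -> str:
    steps = []  # (action, observation) pairs fed back to the agent
    for _ in range(max_iterations):
        next_step = agent(question, steps)
        if isinstance(next_step, Finish):
            return next_step.output
        tool = tools.get(next_step.tool)
        if tool is None:
            # Non-existent tool: report it back instead of crashing
            observation = f"{next_step.tool} is not a valid tool, try one of {list(tools)}"
        else:
            try:
                observation = tool(next_step.tool_input)
            except Exception as exc:  # tool error becomes an observation
                observation = f"Tool error: {exc}"
        steps.append((next_step, observation))
    return "Agent stopped: iteration limit reached"


answer = run_agent(scripted_agent, tools, "What is 2 + 3?")
```

The iteration cap is a design choice in this sketch: without it, an agent that never emits a finish step would loop forever.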

[AgentTypes]

Dimensions to consider:
1. Intended model type - LLMs or ChatModels
2. Supports chat history?
3. Supports multi-input tools?
4. Supports parallel function calling
    - `having an LLM call multiple tools at the same time can greatly speed up agents when there are tasks that benefit from doing so.`
5. Required model params (OpenAI Function calling)
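
To make the parallel function calling point concrete, here is a small sketch of running several independent tool calls at once with a thread pool. The two tools are hypothetical and use `time.sleep` to imitate network latency; nothing here is LangChain API.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def get_weather(city: str) -> str:
    time.sleep(0.1)  # pretend network call
    return f"sunny in {city}"


def get_time(city: str) -> str:
    time.sleep(0.1)  # pretend network call
    return f"12:00 in {city}"


# Tool calls the model asked for in a single turn
tool_calls = [(get_weather, "Paris"), (get_time, "Tokyo")]


def run_parallel(calls: list) -> list:
    """Run all tool calls concurrently and return results in request order."""
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        futures = [pool.submit(fn, arg) for fn, arg in calls]
        return [future.result() for future in futures]


results = run_parallel(tool_calls)
```

Run sequentially, the two calls would take roughly the sum of their latencies; run in parallel, roughly the maximum of the two.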

Types:
1. OpenAI Tools
2. OpenAI Functions (deprecated; OpenAI favors tools over functions)
3. XML (read the docs; there are a few caveats)
4. Structured Chat
5. JSON Chat
6. ReAct
7. Self Ask with Search


[XML]
- use with regular LLMs, not with chat models.
- use only with unstructured tools. i.e., tools that accept a single string input.

[SelfAsk]
- For this agent, only one tool can be used and it needs to be named “Intermediate Answer”

[Streaming]
- final_answer and/or intermediate_steps

[LATER]
- Add in the prompt: `Only use a tool if needed, otherwise respond with Final Answer`
- Add in the prompt: `Now, complete the below sequence all the way to the "Final Answer:" step:`
- add in the AgentExecutor: 
```python
    agent_executor = AgentExecutor(
        agent=agent,
        tools=tools,
        verbose=True,
        handle_parsing_errors="Check your output and make sure it conforms, use the Action/Action Input syntax",
    )
```
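
What a string-valued `handle_parsing_errors` amounts to can be sketched as follows: when the model output cannot be parsed into an action, the error message is sent back as an observation so the model can retry. The parser below is a simplified, hypothetical one for the Action/Action Input syntax, not LangChain's own.

```python
import re


def parse_react_output(text: str) -> tuple:
    """Parse 'Action: ... / Action Input: ...' or 'Final Answer: ...' from model text."""
    if "Final Answer:" in text:
        return ("finish", text.split("Final Answer:")[-1].strip())
    match = re.search(r"Action:\s*(.*?)\s*\n\s*Action Input:\s*(.*)", text, re.DOTALL)
    if not match:
        raise ValueError(f"Could not parse LLM output: {text!r}")
    return ("action", match.group(1).strip(), match.group(2).strip())


def step(llm_output: str, error_message: str) -> tuple:
    """On a parsing failure, return the error message as the next observation."""
    try:
        return parse_react_output(llm_output)
    except ValueError:
        return ("observation", error_message)


hint = "Check your output and make sure it conforms, use the Action/Action Input syntax"
parsed = step("Thought: I need to search\nAction: search\nAction Input: weather", hint)
recovered = step("I think it is sunny", hint)
```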


In [10]:
from langchain import hub

# openai tools
# prompt = hub.pull("hwchase17/openai-tools-agent")

# xml
# prompt = hub.pull("hwchase17/xml-agent-convo")

# structured chat # TODO: something to try next (Final Answer problem could be solved with this)
# prompt = hub.pull("hwchase17/structured-chat-agent")

# json chat
# prompt = hub.pull("hwchase17/react-chat-json")

# react
# prompt = hub.pull("hwchase17/react")

# selfask
prompt = hub.pull("hwchase17/self-ask-with-search")

prompt.pretty_print()

Question: Who lived longer, Muhammad Ali or Alan Turing?
Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali

Question: When was the founder of craigslist born?
Are follow up questions needed here: Yes.
Follow up: Who was the founder of craigslist?
Intermediate answer: Craigslist was founded by Craig Newmark.
Follow up: When was Craig Newmark born?
Intermediate answer: Craig Newmark was born on December 6, 1952.
So the final answer is: December 6, 1952

Question: Who was the maternal grandfather of George Washington?
Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washin

## Memory

- past messages/interactions

- information introduced earlier in conversation

- world model (entities and their relationships) that is constantly updated

- `ChatMessageHistory` - production ready

- A very simple memory might just return the most recent messages each run.

- A slightly more complex memory system might return a succinct summary of the past K messages.

- An even more sophisticated system might extract entities from stored messages and only return information about entities referenced in the current run.

[Types]
- `ConversationBufferMemory`. This memory allows for storing messages and then extracts the messages in a variable.

- `ConversationBufferWindowMemory` keeps a list of the interactions of the conversation over time. It only uses the last K interactions. This can be useful for keeping a sliding window of the most recent interactions, so the buffer does not get too large. 

- `ConversationEntityMemory`: Entity memory remembers given facts about specific entities in a conversation. It extracts information on entities (using an LLM) and builds up its knowledge about that entity over time (also using an LLM).

- `ConversationKGMemory`: uses a knowledge graph to recreate memory.

- `ConversationSummaryMemory`: creates a summary of the conversation over time.

- `ConversationSummaryBufferMemory` combines the two ideas. It keeps a buffer of recent interactions in memory, but rather than just completely flushing old interactions it compiles them into a summary and uses both. It uses token length rather than number of interactions to determine when to flush interactions.

- `ConversationTokenBufferMemory` keeps a buffer of recent interactions in memory, and uses token length rather than number of interactions to determine when to flush interactions.

- `VectorStoreRetrieverMemory` stores memories in a vector store and queries the top-K most "salient" docs every time it is called. This differs from most of the other Memory classes in that it doesn't explicitly track the order of interactions. In this case, the "docs" are previous conversation snippets. This can be useful to refer to relevant pieces of information that the AI was told earlier in the conversation.
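
The difference between the window and token-buffer strategies can be shown with two toy classes. These are hypothetical sketches, not the LangChain classes, and whitespace word counts stand in for a real tokenizer.

```python
from collections import deque


class WindowMemory:
    """Keep only the last k interactions."""

    def __init__(self, k: int):
        self.turns = deque(maxlen=k)  # old turns fall off automatically

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))


class TokenBufferMemory:
    """Keep as many recent interactions as fit in a token budget."""

    def __init__(self, max_tokens: int):
        self.max_tokens = max_tokens
        self.turns = []

    @staticmethod
    def count_tokens(text: str) -> int:
        return len(text.split())  # crude stand-in for a tokenizer

    def total_tokens(self) -> int:
        return sum(self.count_tokens(h) + self.count_tokens(a) for h, a in self.turns)

    def save(self, human: str, ai: str) -> None:
        self.turns.append((human, ai))
        # Flush the oldest turns until the buffer fits the budget again
        while self.total_tokens() > self.max_tokens and len(self.turns) > 1:
            self.turns.pop(0)


window = WindowMemory(k=2)
for i in range(3):
    window.save(f"question {i}", f"answer {i}")

buffer = TokenBufferMemory(max_tokens=6)
buffer.save("hi there", "hello")
buffer.save("how are you", "fine thanks")
```

A summary-buffer variant would follow the same flushing rule but fold the flushed turns into a running summary instead of discarding them.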

In [2]:
from langchain.memory.prompt import ENTITY_MEMORY_CONVERSATION_TEMPLATE

ENTITY_MEMORY_CONVERSATION_TEMPLATE.pretty_print()

You are an assistant to a human, powered by a large language model trained by OpenAI.

You are designed to be able to assist with a wide range of tasks, from answering simple questions to providing in-depth explanations and discussions on a wide range of topics. As a language model, you are able to generate human-like text based on the input you receive, allowing you to engage in natural-sounding conversations and provide responses that are coherent and relevant to the topic at hand.

You are constantly learning and improving, and your capabilities are constantly evolving. You are able to process and understand large amounts of text, and can use this knowledge to provide accurate and informative responses to a wide range of questions. You have access to some personalized information provided by the human in the Context section below. Additionally, you are able to generate your own text based on the input you receive, allowing you to engage in discussions and provide explanations and de

## Callbacks

- `CallbackHandlers` are objects that implement the `CallbackHandler` interface, which has a method for each event that can be subscribed to. 

- The `CallbackManager` will call the appropriate method on each handler when the event is triggered.
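
The handler/manager relationship described above is a plain observer pattern. A minimal sketch with hypothetical class names (`BaseHandler`, `RecordingHandler`, `Manager`) and a fake LLM that just upper-cases its prompt:

```python
class BaseHandler:
    """One method per event; subclasses override the ones they care about."""

    def on_llm_start(self, prompt: str) -> None:
        pass

    def on_llm_end(self, response: str) -> None:
        pass


class RecordingHandler(BaseHandler):
    def __init__(self):
        self.events = []

    def on_llm_start(self, prompt: str) -> None:
        self.events.append(("start", prompt))

    def on_llm_end(self, response: str) -> None:
        self.events.append(("end", response))


class Manager:
    """Calls the matching method on every registered handler."""

    def __init__(self, handlers: list):
        self.handlers = list(handlers)

    def emit(self, event_name: str, *args) -> None:
        for handler in self.handlers:
            getattr(handler, event_name)(*args)


def fake_llm(prompt: str, manager: Manager) -> str:
    manager.emit("on_llm_start", prompt)
    response = prompt.upper()  # stand-in for the model call
    manager.emit("on_llm_end", response)
    return response


recorder = RecordingHandler()
output = fake_llm("hi", Manager([recorder]))
```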

[Types]
- `StdOutCallbackHandler`: logs all events to stdout


In [None]:
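from typing import Any, Dict, List, Union

from langchain_core.agents import AgentAction, AgentFinish
from langchain_core.messages import BaseMessage
from langchain_core.outputs import LLMResult


class BaseCallbackHandler: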
    """Base callback handler that can be used to handle callbacks from langchain."""

    def on_llm_start(
        self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any
    ) -> Any:
        """Run when LLM starts running."""

    def on_chat_model_start(
        self, serialized: Dict[str, Any], messages: List[List[BaseMessage]], **kwargs: Any
    ) -> Any:
        """Run when Chat Model starts running."""

    def on_llm_new_token(self, token: str, **kwargs: Any) -> Any:
        """Run on new LLM token. Only available when streaming is enabled."""

    def on_llm_end(self, response: LLMResult, **kwargs: Any) -> Any:
        """Run when LLM ends running."""

    def on_llm_error(
        self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
    ) -> Any:
        """Run when LLM errors."""

    def on_chain_start(
        self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs: Any
    ) -> Any:
        """Run when chain starts running."""

    def on_chain_end(self, outputs: Dict[str, Any], **kwargs: Any) -> Any:
        """Run when chain ends running."""

    def on_chain_error(
        self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
    ) -> Any:
        """Run when chain errors."""

    def on_tool_start(
        self, serialized: Dict[str, Any], input_str: str, **kwargs: Any
    ) -> Any:
        """Run when tool starts running."""

    def on_tool_end(self, output: Any, **kwargs: Any) -> Any:
        """Run when tool ends running."""

    def on_tool_error(
        self, error: Union[Exception, KeyboardInterrupt], **kwargs: Any
    ) -> Any:
        """Run when tool errors."""

    def on_text(self, text: str, **kwargs: Any) -> Any:
        """Run on arbitrary text."""

    def on_agent_action(self, action: AgentAction, **kwargs: Any) -> Any:
        """Run on agent action."""

    def on_agent_finish(self, finish: AgentFinish, **kwargs: Any) -> Any:
        """Run on agent end."""

In [3]:
from langchain_core.callbacks import StdOutCallbackHandler
from langchain.chains import LLMChain
import boto3
from langchain_community.chat_models import BedrockChat
from langchain_core.prompts import PromptTemplate

handler = StdOutCallbackHandler()

llm = BedrockChat(
    model_id="anthropic.claude-3-haiku-20240307-v1:0",
    client=boto3.client("bedrock-runtime"),
    model_kwargs={"temperature": 0.0, "max_tokens":128}
)
prompt = PromptTemplate.from_template("1 + {number} = ")

In [4]:
# Constructor callback: First, let's explicitly set the StdOutCallbackHandler when initializing our chain
chain = LLMChain(llm=llm, prompt=prompt, callbacks=[handler])
chain.invoke({"number":2})



> Entering new LLMChain chain...
Prompt after formatting:
1 + 2 = 

> Finished chain.


{'number': 2, 'text': '3'}

In [5]:
# Use verbose flag: Then, let's use the `verbose` flag to achieve the same result
chain = LLMChain(llm=llm, prompt=prompt, verbose=True)
chain.invoke({"number":2})



> Entering new LLMChain chain...
Prompt after formatting:
1 + 2 = 

> Finished chain.


{'number': 2, 'text': '3'}

In [6]:
# Request callbacks: Finally, let's use the request `callbacks` to achieve the same result
chain = LLMChain(llm=llm, prompt=prompt)
chain.invoke({"number":2}, {"callbacks":[handler]})



> Entering new LLMChain chain...
Prompt after formatting:
1 + 2 = 

> Finished chain.


{'number': 2, 'text': '3'}

In [7]:
handler = StdOutCallbackHandler()

llm = BedrockChat(
    model_id="anthropic.claude-3-haiku-20240307-v1:0",
    client=boto3.client("bedrock-runtime"),
    model_kwargs={"temperature": 0.0, "max_tokens":128}
)

prompt = PromptTemplate.from_template("1 + {number} = ")

config = {
    'callbacks' : [handler]
}

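# NOTE: `chain` on the right-hand side is still the LLMChain from the previous cell, which already
# formats `prompt`. The prompt is therefore applied twice, as the output below shows.
# The intended LCEL composition is `prompt | llm`.
chain = prompt | chain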
chain.invoke({"number":2}, config=config)



> Entering new RunnableSequence chain...


> Entering new PromptTemplate chain...

> Finished chain.


> Entering new LLMChain chain...
Prompt after formatting:
1 + text='1 + 2 = ' = 

> Finished chain.

> Finished chain.


{'number': StringPromptValue(text='1 + 2 = '),
 'text': 'The expression "1 + text=\'1 + 2 = \'" is not a valid mathematical expression. It appears to be a combination of a numeric value (1) and a string assignment (text=\'1 + 2 = \').\n\nTo perform a calculation and assign the result to a variable, you would typically write something like this:\n\ntext = \'1 + 2 = \'\nresult = 1 + 2\nprint(text + str(result))\n\nThis would output:\n\n1 + 2 = 3\n\nThe key steps are:\n\n1.'}

In [9]:
# custom callback handlers
from langchain_core.callbacks import BaseCallbackHandler
from langchain_core.messages import HumanMessage


class MyCustomHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(f"My custom handler, token: {token}")


# To enable streaming, we pass in `streaming=True` to the ChatModel constructor
# Additionally, we pass in a list with our custom handler
# chat = ChatOpenAI(max_tokens=25, streaming=True, callbacks=[MyCustomHandler()])
chat = BedrockChat(
    model_id="anthropic.claude-3-haiku-20240307-v1:0",
    client=boto3.client("bedrock-runtime"),
    model_kwargs={"temperature": 0.0, "max_tokens":128},
    streaming=True,
    callbacks=[MyCustomHandler()]
)
chat.invoke([HumanMessage(content="Tell me a joke")])

My custom handler, token: Here
My custom handler, token: 's
My custom handler, token:  a
My custom handler, token:  silly
My custom handler, token:  joke
My custom handler, token:  for
My custom handler, token:  you
My custom handler, token: :
My custom handler, token: 

Why
My custom handler, token:  can
My custom handler, token: 't
My custom handler, token:  a
My custom handler, token:  bicycle
My custom handler, token:  stand
My custom handler, token:  up
My custom handler, token:  on
My custom handler, token:  its
My custom handler, token:  own
My custom handler, token: ?
My custom handler, token:  Because
My custom handler, token:  it
My custom handler, token: 's
My custom handler, token:  two
My custom handler, token: -
My custom handler, token: tired
My custom handler, token: !
My custom handler, token: 

How
My custom handler, token:  was
My custom handler, token:  that
My custom handler, token: ?
My custom handler, token:  I
My custom handler, token:  tried
My custom handler, 

AIMessage(content="Here's a silly joke for you:\n\nWhy can't a bicycle stand up on its own? Because it's two-tired!\n\nHow was that? I tried to come up with a simple, lighthearted pun-based joke. Let me know if you'd like to hear another one.")