### Install the Anthropic SDK (needed for counting tokens)

In [10]:
%pip install anthropic


### Imports

In [3]:
import os
import sys

from langchain.llms import Bedrock

# Make the shared helpers in the parent directory importable.
module_path = ".."
sys.path.append(os.path.abspath(module_path))
from utils import bedrock, print_ww

# Create a Bedrock runtime client; the role and region are read from the environment if set.
bedrock_runtime = bedrock.get_bedrock_client(
    assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),
    region=os.environ.get("AWS_DEFAULT_REGION", None),
)

# Claude v2 on Bedrock; a low temperature keeps the summaries focused.
model = Bedrock(
    model_id="anthropic.claude-v2",
    client=bedrock_runtime,
    model_kwargs={"temperature": 0.3},
)

Create new client
  Using region: us-east-1
boto3 Bedrock client successfully created!
bedrock-runtime(https://bedrock-runtime.us-east-1.amazonaws.com)
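The `Bedrock` wrapper builds the underlying request for you. For reference, the sketch below shows the kind of JSON body Claude v2's text-completion interface on Bedrock accepts: a `prompt` wrapped in `\n\nHuman:` / `\n\nAssistant:` turns, plus sampling settings such as `max_tokens_to_sample` and `temperature`. The helper name `build_claude_v2_body` and its default of 512 tokens are illustrative assumptions, and the commented-out `invoke_model` call is not executed here.

```python
import json


def build_claude_v2_body(user_text: str, temperature: float = 0.3,
                         max_tokens_to_sample: int = 512) -> str:
    """Build a JSON request body in Claude v2's text-completion format (hypothetical helper)."""
    # Claude v2 expects the prompt to start with "\n\nHuman:" and end with
    # "\n\nAssistant:" so that the model writes the assistant turn.
    prompt = f"\n\nHuman: {user_text}\n\nAssistant:"
    return json.dumps({
        "prompt": prompt,
        "max_tokens_to_sample": max_tokens_to_sample,
        "temperature": temperature,
    })


body = build_claude_v2_body("Summarize the 2022 shareholder letter in one sentence.")
print(body)

# With a live client, the body could be sent roughly like this (not executed here):
# response = bedrock_runtime.invoke_model(
#     modelId="anthropic.claude-v2", body=body,
#     accept="application/json", contentType="application/json",
# )
# completion = json.loads(response["body"].read())["completion"]
```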


### Load shareholder letter

In [5]:
shareholder_letter = "./letters/2022-letter.txt"

with open(shareholder_letter, "r") as file:
    letter = file.read()

In [55]:
len(letter.split(' '))

5084
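Note that `split(' ')` yields only a rough word count: it splits on single spaces, so words separated by a newline are counted as one, and runs of spaces produce empty strings. A minimal comparison with `str.split()`, using a made-up sample sentence and a hypothetical `word_counts` helper:

```python
def word_counts(text: str) -> tuple[int, int]:
    """Return (count from split(' '), count from split())."""
    # split(' ') treats only single spaces as separators, so words joined by
    # newlines merge and consecutive spaces produce empty strings.
    naive = len(text.split(' '))
    # split() with no argument splits on any run of whitespace and drops empties.
    robust = len(text.split())
    return naive, robust


print(word_counts("Dear shareholders,\nThis year was hard."))  # (5, 6)
```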

In [56]:
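from langchain.text_splitter import RecursiveCharacterTextSplitter

# Split on paragraph breaks first, then on line breaks. chunk_size and
# chunk_overlap are measured in characters, not tokens.
text_splitter = RecursiveCharacterTextSplitter(
    separators=["\n\n", "\n"], chunk_size=4000, chunk_overlap=100
)

docs = text_splitter.create_documents([letter])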
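`RecursiveCharacterTextSplitter` measures `chunk_size` and `chunk_overlap` in characters by default and tries the separators in order. The simplified, framework-free sketch below illustrates that idea: split on the first separator, greedily merge small pieces up to `chunk_size` while carrying up to `chunk_overlap` characters into the next chunk, and recurse with the next separator for any piece that is still too large. It is not LangChain's implementation and omits several options the real splitter supports; `recursive_split` and `_merge` are hypothetical names.

```python
def _merge(pieces: list[str], sep: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Greedily join pieces with sep into chunks of at most chunk_size characters."""
    chunks: list[str] = []
    current: list[str] = []
    length = 0  # always equals len(sep.join(current))
    for piece in pieces:
        extra = len(piece) + (len(sep) if current else 0)
        if current and length + extra > chunk_size:
            chunks.append(sep.join(current))
            # Keep only a tail of the finished chunk (at most chunk_overlap chars)
            # that still leaves room for the new piece.
            while current and (length > chunk_overlap or length + extra > chunk_size):
                length -= len(current[0]) + (len(sep) if len(current) > 1 else 0)
                current.pop(0)
            extra = len(piece) + (len(sep) if current else 0)
        current.append(piece)
        length += extra
    if current:
        chunks.append(sep.join(current))
    return chunks


def recursive_split(text: str, separators: list[str], chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text on separators[0]; recurse with finer separators for oversized pieces."""
    sep, rest = separators[0], separators[1:]
    chunks: list[str] = []
    small: list[str] = []
    for piece in (p for p in text.split(sep) if p):
        if len(piece) <= chunk_size:
            small.append(piece)
            continue
        # Flush the accumulated small pieces before handling an oversized one.
        if small:
            chunks.extend(_merge(small, sep, chunk_size, chunk_overlap))
            small = []
        if rest:
            chunks.extend(recursive_split(piece, rest, chunk_size, chunk_overlap))
        else:
            chunks.append(piece)  # no finer separator left; keep the piece as-is
    if small:
        chunks.extend(_merge(small, sep, chunk_size, chunk_overlap))
    return chunks


print(recursive_split("aa bb cc dd", [" "], chunk_size=5, chunk_overlap=2))
# ['aa bb', 'bb cc', 'cc dd']
```

Calling `recursive_split(letter, ["\n\n", "\n"], 4000, 100)` would produce chunks similar in spirit to `docs`, though not necessarily identical.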

In [57]:
num_docs = len(docs)

num_tokens_first_doc = model.get_num_tokens(docs[0].page_content)

print(
    f"Now we have {num_docs} documents and the first one has {num_tokens_first_doc} tokens"
)

Now we have 10 documents and the first one has 435 tokens


In [58]:
from langchain.prompts import PromptTemplate
from langchain.output_parsers import XMLOutputParser
from langchain.schema.output_parser import StrOutputParser

# XMLOutputParser is used only for its format instructions, which ask Claude
# to wrap each insight in <insight> tags; the chain itself returns raw text.
xml_parser = XMLOutputParser(tags=['insight'])
str_parser = StrOutputParser()

# Claude v2 prompts alternate "\n\nHuman:" and "\n\nAssistant:" turns.
prompt = PromptTemplate(
    template="""

H: {instructions}: "{document}"
Format help: {format_instructions}.

A:""",
    input_variables=["instructions", "document"],
    partial_variables={"format_instructions": xml_parser.get_format_instructions()},
)

insight_chain = prompt | model | str_parser
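The chain requests XML through `xml_parser.get_format_instructions()` but parses the reply with `StrOutputParser`, so each result is a raw string that still contains the `<insight>` tags. If you want the individual insights as a list, a lenient regex pass is one option; it tolerates prose before or after the tags. A minimal sketch, where `extract_insights` is a hypothetical helper and the sample reply is made up:

```python
import re


def extract_insights(raw: str) -> list[str]:
    """Return the text of every <insight>...</insight> element in a model reply."""
    # re.DOTALL lets an insight span several lines; the non-greedy .*? stops at
    # the first closing tag so adjacent insights are not merged together.
    return [m.strip() for m in re.findall(r"<insight>(.*?)</insight>", raw, re.DOTALL)]


reply = "Here are the insights:\n<insight>AWS keeps growing.</insight>\n<insight>Costs\nare being cut.</insight>"
print(extract_insights(reply))  # ['AWS keeps growing.', 'Costs\nare being cut.']
```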

In [59]:
len(docs)

10

# Option 1. Manually process insights, then summarize

In [41]:
%%time
# Map step: ask Claude for key insights from each chunk, one call at a time.
insights = []
for doc in docs:
    insights.append(
        insight_chain.invoke({
            "instructions": "Provide key insights from the following text",
            "document": doc.page_content,
        })
    )

CPU times: user 71 ms, sys: 9 µs, total: 71 ms
Wall time: 1min 45s
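The loop above makes the ten Bedrock calls one after another, which is why it takes 1 min 45 s of wall time but only milliseconds of CPU time. Because the chunks are independent, the map step can run concurrently. A minimal sketch with `concurrent.futures`, where `fake_insight_call` is a hypothetical stand-in for `insight_chain.invoke` that only sleeps to mimic network latency:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_insight_call(chunk: str) -> str:
    """Hypothetical stand-in for insight_chain.invoke; sleeps to mimic latency."""
    time.sleep(0.1)
    return f"<insight>{chunk[:20]}</insight>"


def map_chunks(chunks: list[str], worker, max_workers: int = 4) -> list[str]:
    """Apply worker to every chunk on a thread pool, preserving input order."""
    # Executor.map returns results in input order even though the calls overlap.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, chunks))


chunks = [f"chunk {i} text" for i in range(8)]
start = time.perf_counter()
results = map_chunks(chunks, fake_insight_call)
elapsed = time.perf_counter() - start
print(results[0])  # <insight>chunk 0 text</insight>
print(f"{len(results)} results in {elapsed:.2f}s")
```

With the real chain, the worker would be something like `lambda text: insight_chain.invoke({"instructions": "Provide key insights from the following text", "document": text})`. LangChain runnables also provide a `batch` method that runs inputs concurrently. Keep the worker count modest, since Bedrock may throttle bursts of requests.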


In [47]:
# Reduce step: a plain prompt without the XML format instructions.
prompt = PromptTemplate(
    template="""

H: {instructions}: "{document}"

A:""",
    input_variables=["instructions", "document"],
)

summary_chain = prompt | model | str_parser

In [52]:
%%time
print(summary_chain.invoke({
    "instructions": "You will be provided with multiple sets of insights. Compile and summarize these insights and provide key takeaways in one concise paragraph. Do not use the original XML tags. Just provide a paragraph with your compiled insights.",
    "document": "\n".join(insights),
}))

 Here are the key insights compiled from the text formatted as XML:

Amazon continuously evolves and adapts its strategy based on changing macroeconomic conditions, emerging technologies, and new market opportunities. Despite challenges, Amazon maintains investments in long-term priorities like AWS, advertising, international expansion, grocery, healthcare, and satellite broadband access. Amazon aims to build customer trust through relevant advertising, cost optimizations, and new capabilities like generative AI. Amazon believes it is still early in its potential growth, with significant room to expand core businesses like retail and AWS as more commerce and computing shifts online. Amazon's culture embraces invention, customer obsession, and long-term thinking, which gives confidence that the company's best days lie ahead. Key future growth drivers include advertising, grocery, healthcare, international markets, new retail initiatives like Buy with Prime, and transformational technolo


# Option 2. Use the map-reduce pattern in LangChain

In [33]:
from langchain.chains.summarize import load_summarize_chain
summary_chain = load_summarize_chain(llm=model, chain_type="map_reduce", verbose=False)

In [38]:
%%time
print(summary_chain.run(docs))

 Here is a concise summary of the key points:

Amazon CEOs remain confident in long-term growth despite current economic challenges. They continue investing in emerging opportunities like cloud computing, advertising, healthcare, and satellite internet that leverage Amazon's strengths. Though optimizing for efficiency and managing costs in the near-term, Amazon's focus stays on customer-centric innovation and expanding into large addressable markets. Amazon has successfully navigated major transitions before, evolving from just books to diverse retail and web services. The company culture values invention, taking risks on new ideas that could unlock big markets. Experience shows patience pays off as fledgling businesses like AWS become highly successful. Amazon will keep adapting its strategy while maintaining its customer obsession and long-term orientation.
CPU times: user 62.8 ms, sys: 631 µs, total: 63.4 ms
Wall time: 1min 39s
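The `map_reduce` chain follows the same two phases as Option 1: summarize each chunk, then combine the partial summaries. LangChain's version can also collapse intermediate summaries that exceed a token limit before the final reduce. The framework-free sketch below illustrates that idea; `map_reduce_summarize`, its greedy grouping of neighbouring summaries, and the `max_rounds` guard are illustrative assumptions, not LangChain's actual implementation. The toy summarizer (keep the first three words) and word-count "tokens" exist only so the example runs without a model.

```python
from typing import Callable


def map_reduce_summarize(
    chunks: list[str],
    summarize: Callable[[str], str],
    count_tokens: Callable[[str], int],
    token_max: int,
    max_rounds: int = 3,
) -> str:
    """Summarize chunks with a map step, optional collapse rounds, then one reduce call."""
    # Map: summarize every chunk independently.
    summaries = [summarize(chunk) for chunk in chunks]

    # Collapse: while the combined summaries exceed the budget, pack neighbours
    # into groups that fit and summarize each group. max_rounds bounds the loop
    # in case summarize() does not shrink its input.
    for _ in range(max_rounds):
        if len(summaries) <= 1 or count_tokens("\n".join(summaries)) <= token_max:
            break
        groups: list[list[str]] = []
        current: list[str] = []
        for summary in summaries:
            if current and count_tokens("\n".join(current + [summary])) > token_max:
                groups.append(current)
                current = []
            current.append(summary)
        groups.append(current)
        summaries = [summarize("\n".join(group)) for group in groups]

    # Reduce: one final call over whatever remains.
    return summarize("\n".join(summaries))


def keep_three_words(text: str) -> str:
    """Toy 'summarizer' that keeps only the first three words."""
    return " ".join(text.split()[:3])


chunks = ["a b c d e", "f g h i j", "k l m n o", "p q r s t"]
print(map_reduce_summarize(chunks, keep_three_words, lambda t: len(t.split()), token_max=6))
# a b c
```

To mirror the notebook, `summarize` would wrap a Bedrock call and `count_tokens` could be `model.get_num_tokens`.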
