# Prompt Decomposition
This notebook contains an example of prompt decomposition: taking one long prompt and breaking it down into smaller parts.  These smaller parts can then be run independently, often in parallel, and often on smaller models.  This can lead to significant performance improvements and cost savings, may increase quality, and makes your prompts much easier to maintain as the workload grows, because each step can be maintained and tested independently, without tweaks to one step affecting all the others (as can happen when everything lives in one huge prompt).

Here are three common situations where prompt decomposition can help:
  1) Preprocessing of RAG or context data.  For example, if context data is large, such as a long support document, consider a prompt that summarizes that content once, so that future queries retrieve the summary rather than the long document. 
  2)  Breaking multi-step prompts into a maintainable DAG.  This can be helpful when a prompt reads like code, with a large number of steps or if/then instructions.  Instead, consider breaking these out and generating a flow diagram, resulting in maintainable individual pieces.
  3)  Streamlining long linear prompts.  Even where prompts have only a single logical flow, if they are long, it can help to break the flow into small, sequential steps.  These steps combined may execute faster than the long original prompt, and they can also be maintained and tested independently. 
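
As a rough illustration of the first and third patterns, the sketch below summarizes two large context blocks in parallel and then asks the question against the summaries only. `decomposed_answer`, `summarize_prompt`, and the `ask` callable are hypothetical names introduced here; `ask` stands in for whatever function sends a prompt to a model and returns its text, and the prompt wording is illustrative only.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

# Hypothetical prompt wording, for illustration only.
def summarize_prompt(text: str) -> str:
    return f"<news>{text}</news>\nPlease summarize the key points."

def decomposed_answer(local: str, world: str, question: str,
                      ask: Callable[[str], str]) -> str:
    """Summarize each context block independently, then answer from the summaries."""
    # Step 1: the two summaries do not depend on each other, so run them in parallel.
    with ThreadPoolExecutor(max_workers=2) as pool:
        local_summary, world_summary = pool.map(
            ask, [summarize_prompt(local), summarize_prompt(world)]
        )
    # Step 2: the final prompt carries only the (much shorter) summaries.
    final_prompt = (
        f"<local_news>{local_summary}</local_news>\n"
        f"<global_news>{world_summary}</global_news>\n"
        f"<question>{question}</question>"
    )
    return ask(final_prompt)
```

Because each step takes `ask` as a parameter, the summarization step and the final question step can be tested separately, and each can use a different model.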

The notebook follows this structure:
  1) Set up the environment
  2) Examples of decomposition

## 1) Set up the environment

In [92]:
!pip install anthropic

Collecting anthropic
  Downloading anthropic-0.21.1-py3-none-any.whl.metadata (16 kB)
Collecting distro<2,>=1.7.0 (from anthropic)
  Downloading distro-1.9.0-py3-none-any.whl.metadata (6.8 kB)
Collecting httpx<1,>=0.23.0 (from anthropic)
  Downloading httpx-0.27.0-py3-none-any.whl.metadata (7.2 kB)
Collecting tokenizers>=0.13.0 (from anthropic)
  Downloading tokenizers-0.15.2-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Collecting typing-extensions<5,>=4.7 (from anthropic)
  Downloading typing_extensions-4.10.0-py3-none-any.whl.metadata (3.0 kB)
Collecting httpcore==1.* (from httpx<1,>=0.23.0->anthropic)
  Downloading httpcore-1.0.4-py3-none-any.whl.metadata (20 kB)
Collecting huggingface_hub<1.0,>=0.16.4 (from tokenizers>=0.13.0->anthropic)
  Downloading huggingface_hub-0.21.4-py3-none-any.whl.metadata (13 kB)
Collecting fsspec>=2023.5.0 (from huggingface_hub<1.0,>=0.16.4->tokenizers>=0.13.0->anthropic)
  Downloading fsspec-2024.3.1-py3-none-any.whl.met

In [93]:
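#use Anthropic's library only to count tokens locally
#(client.count_tokens is available in the anthropic 0.21.x release installed above; newer SDK releases may not provide it)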
from anthropic import Anthropic
client = Anthropic()
def count_tokens(text):
    return client.count_tokens(text)

In [101]:
#for connecting with Bedrock, use Boto3
import boto3, time, json
from botocore.config import Config

#increase the standard timeout limits in boto3, because Bedrock may take a while to respond to large requests.
my_config = Config(
    connect_timeout=60*3,
    read_timeout=60*3,
)
bedrock = boto3.client(service_name='bedrock-runtime',config=my_config)
bedrock_service = boto3.client(service_name='bedrock',config=my_config)

In [141]:
#check that it's working:
models = bedrock_service.list_foundation_models()
for line in models["modelSummaries"]:
    #print (line["modelId"])
    pass
if "anthropic.claude-3" in str(models):
    print("Claude 3 found!")
else:
    print ("Error, no model found.")

Claude 3 found!


In [142]:
MAX_ATTEMPTS = 1 #maximum number of attempts per request (1 means no retries).
session_cache = {} #for this session, do not repeat the same query to Claude.
def ask_claude(messages,system="", DEBUG=False, model="haiku"):
    '''
    Send a prompt to Bedrock, and return [raw_prompt_text, response_text].
    DEBUG is used to see exactly what is being sent to and from Bedrock.
    messages can be a list of role/content dicts, or a string.
    '''
    raw_prompt_text = str(messages)
    
    if isinstance(messages, str):
        messages = [{"role": "user", "content": messages}]
    
    prompt_json = {
        "system":system,
        "messages": messages,
        "max_tokens": 10000,
        "temperature": 0.7,
        "anthropic_version":"bedrock-2023-05-31", #the version string Bedrock documents for Anthropic models
        "top_k": 250,
        "top_p": 0.7,
        "stop_sequences": ["\n\nHuman:"]
    }
    
    #messages is a list of dicts, so serialize it for display rather than joining it as strings.
    if DEBUG: print("sending:\nSystem:\n",system,"\nMessages:\n",json.dumps(messages, indent=2))
    
    if model== "opus":
        modelId = 'error' #no Opus model ID is configured in this notebook
    elif model== "sonnet":
        modelId = 'anthropic.claude-3-sonnet-20240229-v1:0'
    elif model== "haiku":
        modelId = 'anthropic.claude-3-haiku-20240307-v1:0'
    else:
        print ("ERROR:  Bad model, must be opus, sonnet, or haiku.")
        modelId = 'error'
    
    #note: the cache key is the prompt text only, so clear session_cache when switching models.
    if raw_prompt_text in session_cache:
        return [raw_prompt_text,session_cache[raw_prompt_text]]
    attempt = 1
    while True:
        try:
            response = bedrock.invoke_model(body=json.dumps(prompt_json), modelId=modelId, accept='application/json', contentType='application/json')
            response_body = json.loads(response.get('body').read())
            results = response_body.get("content")[0].get("text")
            if DEBUG:print("Received:",results)
            break
        except Exception as e:
            print("Error with calling Bedrock: "+str(e))
            attempt+=1
            if attempt>MAX_ATTEMPTS:
                print("Max attempts reached!")
                results = str(e)
                break
            else:#retry in 10 seconds
                time.sleep(10)
    session_cache[raw_prompt_text] = results
    return [raw_prompt_text,results]
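
The retry loop above waits a fixed 10 seconds between attempts. One common variation is exponential backoff; the following is a minimal sketch, not part of the original notebook. `call_with_backoff` and its parameters are hypothetical, and the `sleep` argument exists only so the delay can be swapped out in tests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

def call_with_backoff(fn: Callable[[], T], max_attempts: int = 3,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn(), retrying on any exception with delays of base_delay * 2**n."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            sleep(base_delay * 2 ** attempt)
    raise ValueError("max_attempts must be at least 1")
```

Unlike `ask_claude`, this sketch re-raises the final error instead of returning it as text, so a failure cannot be mistaken for (or cached as) a model response.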

In [143]:
%%time
#check that it's working:
try:
    query = "Please say the number four."
    #query = [{"role": "user", "content": "Please say the number two."},{"role": "assistant", "content": "Two."},{"role": "user", "content": "Please say the number three."}]
    result = ask_claude(query)
    print(query)
    print(result[1])
except Exception as e:
    print("Error with calling Claude: "+str(e))

Please say the number four.
Four.
CPU times: user 5.2 ms, sys: 235 µs, total: 5.44 ms
Wall time: 355 ms


In [144]:
from queue import Queue, Empty
from threading import Thread

# Threaded function for queue processing.
def thread_request(q, result):
    while True:
        try:
            work = q.get_nowait()           #fetch new work from the Queue
        except Empty:
            break                           #queue drained, so this worker is done
        try:
            data = ask_claude(work[1],model=work[2])
            result[work[0]] = data          #Store data back at correct index
        except Exception as e:
            print('Error with prompt!',str(e))
            result[work[0]] = (str(e))
        #signal to the queue that task has been processed
        q.task_done()
    return True

def ask_claude_threaded(prompts,model="haiku",DEBUG=False):
    '''
    Call ask_claude, but multi-threaded.
    Returns a list of [prompt, response] pairs, in the same order as prompts.
    '''
    q = Queue(maxsize=0)
    num_threads = min(50, len(prompts))
    
    #Populating Queue with tasks
    results = [{} for x in prompts]
    #load up the queue with the prompts to fetch and the index for each job (as a tuple):
    for i in range(len(prompts)):
        #need the index, the prompt, and the model in each queue item.
        q.put((i,prompts[i],model))
        
    #Starting worker threads on queue processing
    for i in range(num_threads):
        #print('Starting thread ', i)
        worker = Thread(target=thread_request, args=(q,results))
        worker.daemon = True      #setting threads as "daemon" allows main program to 
                                  #exit eventually even if these don't finish 
                                  #correctly.
        worker.start()

    #now we wait until the queue has been processed
    q.join()

    if DEBUG:print('All tasks completed.')
    return results
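
For comparison, the same fan-out can be written with the standard library's `concurrent.futures`, which manages the work queue and worker lifecycle internally. This is an alternative sketch rather than the notebook's implementation; `ask_many` is a hypothetical name, and `ask` is any callable that takes a prompt and returns a result (for example, a small wrapper around `ask_claude`).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

def ask_many(prompts: Sequence[str], ask: Callable[[str], object],
             max_workers: int = 50) -> List[object]:
    """Run ask() over all prompts in parallel; results keep the input order."""
    if not prompts:
        return []
    results: List[object] = []
    with ThreadPoolExecutor(max_workers=min(max_workers, len(prompts))) as pool:
        futures = [pool.submit(ask, p) for p in prompts]
        for future in futures:
            try:
                results.append(future.result())
            except Exception as e:
                # Mirror the notebook's behaviour: record the error text and continue.
                results.append(str(e))
    return results
```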

In [145]:
%%time
#test if our threaded Claude calls are working
q1 = [{"role": "user", "content": "Please say the number one."}]
q2 = [{"role": "user", "content": "Please say the number two."}]
q3 = [{"role": "user", "content": "Please say the number three."}]
#print(ask_claude_threaded([q1,q2,q3]))
print(ask_claude_threaded(["Please say the number one.","Please say the number two.","Please say the number three.","Please say the number four.","Please say the number five."],model='sonnet'))



[['Please say the number one.', 'One.'], ['Please say the number two.', 'two'], ['Please say the number three.', 'three'], ['Please say the number four.', 'Four.'], ['Please say the number five.', 'Five.']]
CPU times: user 48.7 ms, sys: 19 ms, total: 67.7 ms
Wall time: 580 ms


## 2) Examples of decomposition

Here we'll consider an example use case: a user who would like to understand what is going on in their area.

### Start by downloading some sample data

In [109]:

import requests, re
from bs4 import BeautifulSoup 
  
url = 'https://www.bbc.com/news'
response = requests.get(url) 
  
soup = BeautifulSoup(response.text, 'html.parser') 
#print (soup)
headlines = soup.find('body').find_all("a",{"data-testid" : "internal-link"}) 

def get_bbc_text(url:str) -> str:
    """Parse a BBC article and return its text as a single newline-separated string"""
    
    response = requests.get(url)

    #article paragraphs appear in the page source as JSON fragments like {"text":"...","
    text = re.findall( r'{"text":"(.*?)","',response.text)
    #print(response.text)
    full_text = ""
    last_one = ""
    for t in text:
        if len(t)<3:continue #skip very short fragments
        if t==last_one: #skip consecutive duplicates
            continue
        last_one = t
        full_text += "\n" + t
    return full_text

articles = []
MAX_TOKENS = 30000
current_token_count = 0
print ("Found articles on BBC front page.  Downloading full text...")
for x in list(dict.fromkeys(headlines)): 
    if len(x.text.strip())>30: #skip categories, only look at headlines
        title = x.text.strip()
        url2 = 'https://www.bbc.com' + x["href"]
        print (url2)
        text = get_bbc_text(url2)
        articles.append(text)
        current_token_count += count_tokens(text)
        if current_token_count>MAX_TOKENS:
            print ("Max tokens reached, ending download.")
            break
        
print ("Done.")
        

Found articles on BBC front page.  Downloading full text...
https://www.bbc.com/news/world-us-canada-68613083
https://www.bbc.com/news/world-europe-68616372
https://www.bbc.com/news/world-europe-guernsey-68615288
https://www.bbc.com/news/world-australia-68572280
https://www.bbc.com/news/business-68619144
https://www.bbc.com/news/world-us-canada-68621004
https://www.bbc.com/news/world-us-canada-68609687
https://www.bbc.com/news/world-us-canada-68486307
https://www.bbc.com/news/world-africa-68606201
https://www.bbc.com/news/world-middle-east-68614549
https://www.bbc.com/news/world-europe-68618722
https://www.bbc.com/news/world-us-canada-68616476
https://www.bbc.com/news/world-us-canada-68621005
https://www.bbc.com/news/uk-scotland-68610426
https://www.bbc.com/news/world-us-canada-68618342
https://www.bbc.com/news/uk-68609361
https://www.bbc.com/news/world-68614782
https://www.bbc.com/news/world-asia-china-68508694
https://www.bbc.com/news/world-europe-40134140
https://www.bbc.com/news/wo
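
To make the extraction step concrete: `get_bbc_text` relies on the article body appearing in the page source as JSON fragments of the form `{"text":"...","`. The sketch below applies the same regular expression and de-duplication to a small made-up string. The sample content and the `extract_text_fragments` name are hypothetical, and the real page markup may change at any time.

```python
import re

def extract_text_fragments(page_source: str) -> str:
    """Collect {"text":"..."} fragments, skipping very short and consecutive duplicate ones."""
    fragments = re.findall(r'{"text":"(.*?)","', page_source)
    lines = []
    last = None
    for fragment in fragments:
        if len(fragment) < 3 or fragment == last:
            continue
        last = fragment
        lines.append(fragment)
    return "\n".join(lines)

# Made-up page source for illustration only.
sample = (
    '{"text":"First paragraph.","type":"p"}'
    '{"text":"First paragraph.","type":"p"}'
    '{"text":"ok","type":"p"}'
    '{"text":"Second paragraph.","type":"p"}'
)
print(extract_text_fragments(sample))
```

A regular expression over raw page source is brittle; it works here only because the fragments happen to follow one fixed pattern.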

In [110]:
print ("Number of articles:",len(articles))
print ("Total length",sum(len(x) for x in articles))
print ("Length in tokens:", count_tokens("\n".join(articles)))

Number of articles: 27
Total length 137979
Length in tokens: 30479
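
The figures above imply roughly 4.5 characters per token for this text (137,979 characters over 30,479 tokens). When a tokenizer is not available, a ratio like this can give a quick budget estimate. The helper below is a rough sketch; `estimate_tokens` and its default ratio are assumptions drawn from this one sample, not a property of the tokenizer.

```python
# Ratio measured from the BBC sample above: characters / tokens.
CHARS_PER_TOKEN = 137979 / 30479

def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Rough token estimate from character count; not a substitute for a real tokenizer."""
    return round(len(text) / chars_per_token)

print(round(CHARS_PER_TOKEN, 2))
```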


In [111]:
url = 'https://www.mercurynews.com/'
response = requests.get(url) 
  
soup = BeautifulSoup(response.text, 'html.parser') 
#print (soup)
headlines = soup.find('body').find_all("a") 

def get_san_jose_text(url:str) -> str:
    """Parse a Mercury News article and return its text as a single newline-separated string"""
    
    response = requests.get(url)
    #print (url)
    #print (response.text)
    local_soup = BeautifulSoup(response.text, 'html.parser') 
    text = local_soup.find('body').getText()
    text = text.split('\n')
    full_text = []
    for t in text:
        t = t.strip()
        if len(t)<60:continue #skip short lines, which are likely menus or other page chrome rather than article text
        full_text.append(t)
    full_text = '\n'.join(full_text)
    
    return full_text

local_articles = []
MAX_TOKENS = 30000
current_token_count = 0
print ("Found articles on San Jose Mercury News front page.  Downloading full text...")
for x in list(dict.fromkeys(headlines)): 
    if len(x.text.strip())>30: #skip categories, only look at headlines
        title = x.text.strip()
        if title in ["Do Not Sell/Share My Personal Information"]:continue
        url2 = x['href']
        text = get_san_jose_text(url2)
        local_articles.append(text)
        current_token_count += count_tokens(text)
        if current_token_count>MAX_TOKENS:
            print ("Max tokens reached, ending download.")
            break

print ("Done.")

Found articles on San Jose Mercury News front page.  Downloading full text...
Max tokens reached, ending download.
Done.


In [112]:
print ("Number of local articles:",len(local_articles))
print ("Total length",sum(len(x) for x in local_articles))
print ("Length in tokens:", count_tokens("\n".join(local_articles)))

Number of local articles: 18
Total length 135067
Length in tokens: 31698


In [132]:
long_prompt_template = """Consider the following information:
<local_news>{{LOCAL_NEWS}}</local_news>
<global_news>{{GLOBAL_NEWS}}</global_news>

Please answer this question: 
<question>{{QUESTION}}</question>
"""

short_prompt_template = """Consider the following news articles:
<news>{{NEWS}}</news>
These articles have been scraped from an HTML page, and may contain extra text scraps that are safe to ignore.
Please summarize key points in each article in brief concise language that preserves important details, and write the key points of each article in its own <article> tag.
"""
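
The templates use `{{NAME}}` placeholders, which later cells fill with chained `str.replace` calls. A small helper can do the same in a single pass and also catch a placeholder that has no value. `fill_template` is a hypothetical helper, not something the notebook defines.

```python
import re

def fill_template(template: str, **values: str) -> str:
    """Replace each {{KEY}} with values[KEY]; raise KeyError if a value is missing."""
    def substitute(match: "re.Match[str]") -> str:
        key = match.group(1)
        if key not in values:
            raise KeyError(f"No value supplied for placeholder {key!r}")
        return values[key]
    # Single pass: text inserted for one placeholder is never rescanned for others.
    return re.sub(r"\{\{([A-Z_]+)\}\}", substitute, template)
```

For example, `fill_template(short_prompt_template, NEWS=some_text)` would produce the same string as `short_prompt_template.replace("{{NEWS}}", some_text)`, provided `some_text` is a string.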

In [155]:
local_news_prompt = "<article>" + "</article>\n<article>".join(local_articles) + "</article>"
global_news_prompt = "<article>" + "</article>\n<article>".join(articles) + "</article>"

question = "Are there any people, places, or ideas mentioned in both the local and global news?"

test_long_prompt = long_prompt_template.replace("{{LOCAL_NEWS}}",local_news_prompt).replace("{{GLOBAL_NEWS}}",global_news_prompt).replace("{{QUESTION}}",question)

print ("Total prompt length in tokens:",count_tokens(test_long_prompt))

Total prompt length in tokens: 62492


In [156]:
%%time
session_cache = {} #don't use cached info, since we want to time this.
long_response = ask_claude(test_long_prompt, model="sonnet")[1]
print(long_response)

Yes, there are a few people, places, and ideas mentioned in both the local and global news sections:

People:
- Princess Anne - There is an article in the local news about an attempted kidnapping of Princess Anne in 1974, and a video in the global news section about her police bodyguard who was shot while thwarting the kidnap attempt.

Places: 
- Gaza - There are articles in both sections discussing the ongoing conflict and humanitarian crisis in Gaza.
- Sudan - Both sections have articles covering the civil war and famine risk in Sudan.

Ideas:
- Immigration laws/policies - There are articles in both sections discussing controversial immigration laws, such as the strict new law proposed in Texas (SB4) and Hong Kong's new national security law (Article 23) that critics say erodes civil liberties.

So while the specific articles differ, there is some overlap in the people, places, and broad topics/ideas covered in the local Bay Area news and global international news sections.
CPU times

## Not bad!  60K tokens processed in about 30 seconds.  Let's see if we can make that faster and cheaper using prompt decomposition.

In [157]:
test_short_prompt_local = short_prompt_template.replace("{{NEWS}}",local_news_prompt)
test_short_prompt_global = short_prompt_template.replace("{{NEWS}}",global_news_prompt)

print ("local tokens:",count_tokens(test_short_prompt_local))
print ("global tokens:",count_tokens(test_short_prompt_global))

local tokens: 31868
global tokens: 30708


In [158]:
%%time
session_cache = {} #don't use cached info, since we want to time this.
local_responce = ask_claude(test_short_prompt_local, model="haiku")[1]
print (local_responce[:200],"...")

<article>
Key points:
- A company operating 10 nursing homes in the Bay Area has settled a lawsuit by local county prosecutors and the State of California alleging it neglected vulnerable patients' me ...
CPU times: user 10.2 ms, sys: 469 µs, total: 10.6 ms
Wall time: 26.9 s
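
Because the summarization prompt asks for each article's key points inside its own `<article>` tag, the response can be split back into per-article summaries. This is a minimal sketch, assuming the model follows that format (it may not always do so); `split_articles` is a hypothetical name and the sample response is made up.

```python
import re
from typing import List

def split_articles(response_text: str) -> List[str]:
    """Return the stripped contents of every <article>...</article> block."""
    blocks = re.findall(r"<article>(.*?)</article>", response_text, flags=re.DOTALL)
    return [block.strip() for block in blocks]

# Made-up response for illustration only.
sample = "<article>\nKey points:\n- A\n</article>\n<article>\nKey points:\n- B\n</article>"
print(len(split_articles(sample)))
```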


In [159]:
%%time
session_cache = {} #don't use cached info, since we want to time this.
global_responce = ask_claude(test_short_prompt_global, model="haiku")[1]
print (global_responce[:200],"...")

<article>
Key points:
- A controversial Texas law known as SB4 would allow local and state police to arrest and prosecute undocumented migrants, upending federal immigration enforcement.
- The law is  ...
CPU times: user 7.3 ms, sys: 170 µs, total: 7.47 ms
Wall time: 35.9 s


In [163]:
local_news_prompt = local_responce
global_news_prompt = global_responce

question = "Are there any people, places, or ideas mentioned in both the local and global news?"

test_long_prompt_decomposed = long_prompt_template.replace("{{LOCAL_NEWS}}",local_news_prompt).replace("{{GLOBAL_NEWS}}",global_news_prompt).replace("{{QUESTION}}",question)

print ("Total prompt length in tokens:",count_tokens(test_long_prompt_decomposed))

Total prompt length in tokens: 4132


In [164]:
%%time
session_cache = {} #don't use cached info, since we want to time this.
decomposed_response = ask_claude(test_long_prompt_decomposed, model="sonnet")[1]
print (decomposed_response)

Based on the news articles provided, there does not appear to be any direct overlap of specific people, places, or ideas mentioned in both the local Bay Area news and the global news stories. The local news focuses on events and issues within the Bay Area region, while the global news covers a diverse range of international stories from various countries around the world.
CPU times: user 4.04 ms, sys: 2.13 ms, total: 6.18 ms
Wall time: 3.42 s
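
Adding up the wall times recorded above gives a sense of the trade-off. Run one after another, as in this notebook, the two Haiku summaries plus the final Sonnet call total about 66 seconds. If the two summaries ran in parallel (for example through `ask_claude_threaded`), the critical path would be about 39 seconds, assuming each parallel call takes as long as it did on its own. The summaries only need to be produced once, so later questions against the same news would pay only for the final call. The arithmetic, using only numbers from the outputs above:

```python
# Wall times in seconds, copied from the cell outputs above.
local_summary_s = 26.9
global_summary_s = 35.9
final_answer_s = 3.42

sequential_s = local_summary_s + global_summary_s + final_answer_s
# Assumption: parallel summaries take as long as the slower of the two.
parallel_s = max(local_summary_s, global_summary_s) + final_answer_s

# Prompt sizes in tokens, also from the outputs above.
original_prompt_tokens = 62492
decomposed_prompt_tokens = 4132
reduction = 1 - decomposed_prompt_tokens / original_prompt_tokens

print(f"sequential: {sequential_s:.2f} s, parallel: {parallel_s:.2f} s")
print(f"final prompt is {reduction:.1%} smaller")
```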


In [165]:
%%time
session_cache = {} #don't use cached info, since we want to time this.
decomposed_response = ask_claude(test_long_prompt_decomposed, model="haiku")[1]
print (decomposed_response)

Yes, there are a few people, places, or ideas mentioned in both the local and global news articles:

1. Famine/Hunger Crisis: The global news article mentions the severe humanitarian crisis and threat of famine in Gaza, as well as the potential for the conflict in Sudan to trigger the world's largest hunger crisis. This relates to the local news article that discusses the need for "single payer" healthcare legislation to curb profiteering in the healthcare system.

2. Discrimination/Exclusion: The global news article discusses the lawsuit against the "Ladies Lounge" exhibit in Australia for excluding men, which relates to the local news article about the nursing home company that allegedly neglected vulnerable patients and exposed them to physical and sexual assaults.

3. Police Misconduct/Civil Rights Violations: The global news article mentions the sentencing of former Mississippi police officers for torturing two black men, which relates to the local news article about the arrest of