# Prompt Decomposition Example
This notebook contains an example of prompt decomposition: taking one long prompt and breaking it down into smaller parts.  These smaller parts can then be run independently, often in parallel, and often on smaller models.  This can lead to significant speedups and cost savings, may increase quality, and makes your prompts much easier to maintain as the workload grows, because each step can be maintained and tested independently, without a tweak to one step affecting all the others (which can happen when everything lives in one huge prompt).

Here are three common times when Prompt Decomposition can be helpful:
  1) Preprocessing of RAG or context data.  For example, if the context data is large, such as a long support document, consider a prompt that summarizes that content once, so that future queries retrieve the summary rather than the long document.
  2) Breaking multi-step prompts into a maintainable DAG.  This can be helpful when a prompt reads like code, with a large number of steps or if/then instructions.  Instead, consider breaking these out and generating a flow diagram, resulting in maintainable individual pieces.
  3) Streamlining long linear prompts.  Even where prompts have only a single logical flow, if they are long, it can help to break the flow into small, sequential steps.  These steps combined may execute faster than the long original prompt, and they can also be maintained and tested independently.
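
As a rough illustration of the pattern behind all three cases, the sketch below splits a task into independent sub-prompts, runs them in parallel, and combines the partial results in one final step.  `fake_llm` is a hypothetical stand-in for a real model call (it only upper-cases its input), and `decompose_and_combine` is an illustrative name, not part of any library.

```python
from concurrent.futures import ThreadPoolExecutor

def fake_llm(prompt):
    # Hypothetical stand-in for a model call; a real version would call an LLM API.
    return prompt.upper()

def decompose_and_combine(chunks, step_template, combine_template, llm=fake_llm):
    # Step 1: build one small prompt per chunk.
    step_prompts = [step_template.replace("{{CHUNK}}", c) for c in chunks]
    # Step 2: run the small prompts in parallel; map() preserves input order.
    with ThreadPoolExecutor(max_workers=max(1, len(step_prompts))) as pool:
        partials = list(pool.map(llm, step_prompts))
    # Step 3: one final prompt combines the partial answers.
    final_prompt = combine_template.replace("{{PARTIALS}}", "\n".join(partials))
    return llm(final_prompt)

result = decompose_and_combine(
    ["part one", "part two"],
    "Summarize: {{CHUNK}}",
    "Combine: {{PARTIALS}}",
)
print(result)
```

Because each step prompt is built from its own template, a step can be edited and tested without touching the others.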

The notebook follows this structure:
  1) Set up the environment
  2) Examples of decomposition

## 1) Set up the environment

In [None]:
!pip install anthropic

In [29]:
#use Anthropic's library only to count tokens locally
from anthropic import Anthropic
client = Anthropic()
def count_tokens(text):
    return client.count_tokens(text)

In [30]:
#for connecting with Bedrock, use Boto3
import boto3, time, json
from botocore.config import Config

#increase the standard time out limits in boto3, because Bedrock may take a while to respond to large requests.
my_config = Config(
    connect_timeout=60*3,
    read_timeout=60*3,
)
bedrock = boto3.client(service_name='bedrock-runtime',config=my_config)
bedrock_service = boto3.client(service_name='bedrock',config=my_config)

In [31]:
#check that it's working:
models = bedrock_service.list_foundation_models()
for line in models["modelSummaries"]:
    #print (line["modelId"])
    pass
if "anthropic.claude-3" in str(models):
    print("Claude 3 found!")
else:
    print ("Error, no model found.")

Claude 3 found!


In [92]:
MAX_ATTEMPTS = 1 #how many times to retry if Claude is not working.
session_cache = {} #for this session, do not repeat the same query to claude.
def ask_claude(messages,system="", DEBUG=False, model="haiku"):
    '''
    Send a prompt to Bedrock, and return the response.  Debug is used to see exactly what is being sent to and from Bedrock.
    messages can be an array of role/message pairs, or a string.
    '''
    raw_prompt_text = str(messages)
    #print ("Calling %s on Bedrock.  Prompt length (tokens):%s"%(model,count_tokens(raw_prompt_text)))
    if type(messages)==str:
        messages = [{"role": "user", "content": messages}]
    
    promt_json = {
        "system":system,
        "messages": messages,
        "max_tokens": 10000,
        "temperature": 0.7,
        "anthropic_version":"",
        "top_k": 250,
        "top_p": 0.7,
        "stop_sequences": ["\n\nHuman:"]
    }
    
    
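    #messages is a list of dicts, so serialize it with json.dumps (str.join would raise a TypeError).
    if DEBUG: print("sending:\nSystem:\n",system,"\nMessages:\n",json.dumps(messages, indent=2))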
    
    if model== "opus":
        modelId = 'error'
    elif model== "sonnet":
        modelId = 'anthropic.claude-3-sonnet-20240229-v1:0'
    elif model== "haiku":
        modelId = 'anthropic.claude-3-haiku-20240307-v1:0'
    else:
        print ("ERROR:  Bad model, must be opus, sonnet, or haiku.")
        modelId = 'error'
    
    if raw_prompt_text in session_cache:
        return [raw_prompt_text,session_cache[raw_prompt_text]]
    attempt = 1
    while True:
        try:
            response = bedrock.invoke_model(body=json.dumps(promt_json), modelId=modelId, accept='application/json', contentType='application/json')
            response_body = json.loads(response.get('body').read())
            results = response_body.get("content")[0].get("text")
            if DEBUG:print("Received:",results)
            break
        except Exception as e:
            print("Error with calling Bedrock: "+str(e))
            attempt+=1
            if attempt>MAX_ATTEMPTS:
                print("Max attempts reached!")
                results = str(e)
                break
            else:#retry in 10 seconds
                time.sleep(10)
    session_cache[raw_prompt_text] = results
    return [raw_prompt_text,results]
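
`ask_claude` above waits a fixed 10 seconds between attempts.  A common alternative is exponential backoff, sketched below under assumptions: `call_with_retries` and `flaky` are hypothetical names for illustration, and the `sleep` parameter exists only so the example can run without actually waiting.

```python
import time

def call_with_retries(fn, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    # Call fn(); on failure wait base_delay * 2**(attempt-1) seconds and retry.
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the real error
            sleep(base_delay * 2 ** (attempt - 1))

# Usage with a hypothetical flaky function that fails twice, then succeeds.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"

delays = []  # record the requested delays instead of sleeping
outcome = call_with_retries(flaky, sleep=delays.append)
print(outcome, delays)
```

Unlike `ask_claude`, this sketch re-raises the last error instead of returning it as text, so a failure cannot be mistaken for (or cached as) a model response.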

In [99]:
%%time
#check that it's working:
session_cache = {} 
try:
    query = "Please say the number four."
    #query = [{"role": "user", "content": "Please say the number two."},{"role": "assistant", "content": "Two."},{"role": "user", "content": "Please say the number three."}]
    result = ask_claude(query)
    print(query)
    print(result[1])
except Exception as e:
    print("Error with calling Claude: "+str(e))

Please say the number four.
Four.
CPU times: user 5.24 ms, sys: 0 ns, total: 5.24 ms
Wall time: 422 ms


In [94]:
from queue import Queue, Empty
from threading import Thread

# Threaded function for queue processing.
def thread_request(q, result):
    while True:
        try:
            work = q.get_nowait()           #fetch new work from the Queue
        except Empty:
            return True                     #no work left, so this thread can exit
        try:
            data = ask_claude(work[1],model=work[2])
            result[work[0]] = data          #Store data back at correct index
        except Exception as e:
            print('Error with prompt!',str(e))
            result[work[0]] = (str(e))
        #signal to the queue that task has been processed
        q.task_done()

def ask_claude_threaded(prompts,model="haiku",DEBUG=False):
    '''
    Call ask_claude, but multi-threaded.
    Returns a list of [prompt, response] pairs, in the same order as prompts.
    '''
    q = Queue(maxsize=0)
    num_threads = min(50, len(prompts))
    
    #Populating Queue with tasks
    results = [{} for x in prompts]
    #load up the queue with the prompts to run and the index for each job (as a tuple):
    for i in range(len(prompts)):
        #need the index, the prompt, and the model in each queue item.
        q.put((i,prompts[i],model))
        
    #Starting worker threads on queue processing
    for i in range(num_threads):
        #print('Starting thread ', i)
        worker = Thread(target=thread_request, args=(q,results))
        worker.daemon = True      #setting threads as "daemon" allows main program to 
                                  #exit eventually even if these don't finish 
                                  #correctly.
        worker.start()

    #now we wait until the queue has been processed
    q.join()

    if DEBUG:print('All tasks completed.')
    return results
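
The same fan-out can be sketched more compactly with the standard library's `concurrent.futures`.  This is an alternative under assumptions, not the method this notebook uses: `ask_fn` stands in for `ask_claude`, and `echo_model` is a hypothetical stub so the sketch runs without Bedrock.

```python
from concurrent.futures import ThreadPoolExecutor

def echo_model(prompt, model="haiku"):
    # Hypothetical stub with the same return shape as ask_claude: [prompt, response].
    return [prompt, f"{model} says: {prompt}"]

def ask_threaded(prompts, ask_fn=echo_model, model="haiku", max_workers=50):
    # Return one [prompt, response] pair per prompt, in input order.
    if not prompts:
        return []
    workers = min(max_workers, len(prompts))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the order the prompts were submitted.
        return list(pool.map(lambda p: ask_fn(p, model=model), prompts))

pairs = ask_threaded(["one", "two", "three"], model="sonnet")
print(pairs)
```

The executor handles thread start-up and shutdown, so no explicit queue or result indexing is needed.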

In [100]:
%%time
#test if our threaded Claude calls are working
session_cache = {} 
#q1 = [{"role": "user", "content": "Please say the number one."}]
#q2 = [{"role": "user", "content": "Please say the number two."}]
#q3 = [{"role": "user", "content": "Please say the number three."}]
#print(ask_claude_threaded([q1,q2,q3]))
print(ask_claude_threaded(["Please say the number one.","Please say the number two.","Please say the number three.","Please say the number four.","Please say the number five."],model='sonnet'))



[['Please say the number one.', 'One.'], ['Please say the number two.', 'two'], ['Please say the number three.', 'three'], ['Please say the number four.', 'Four.'], ['Please say the number five.', 'five']]
CPU times: user 29.9 ms, sys: 4.31 ms, total: 34.2 ms
Wall time: 1.35 s


## 2) Examples of decomposition

Here we'll consider an example use case: a user who would like to understand how many unique characters are in a novel, and learn a bit about the three most common ones.

### Start by downloading the novel.  Here we use Frankenstein by Mary Shelley, as it is in the public domain.

In [36]:
import requests, re
from bs4 import BeautifulSoup 

In [37]:
#grab the text from Project Gutenberg, a collection of public domain works.
#We use Beautiful Soup to parse the HTML of the webpage.
url = "https://www.gutenberg.org/files/84/84-h/84-h.htm"
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
raw_full_text_webpage = soup.text

In [38]:
#Cut the top and bottom of the webpage so that we only have the text of the book.
raw_full_text = raw_full_text_webpage[raw_full_text_webpage.index("Letter 1\n\nTo Mrs. Saville, England."):raw_full_text_webpage.index("*** END OF THE PROJECT GUTENBERG EBOOK FRANKENSTEIN ***")].replace("\r\n"," ").replace("\n", " ")
#encode some misc unicode characters.
full_text = raw_full_text.encode('raw_unicode_escape').decode()
#show that we found the expected length
words_count = len(full_text.split(" "))
pages_count = int(words_count/500)#quick estimate, real page count is dependent on page and font size.
token_count = count_tokens(full_text)
print ("Approximate word count:",words_count)
print ("Approximate page count:",pages_count)
print ("Approximate token count:",token_count)

Approximate word count: 76553
Approximate page count: 153
Approximate token count: 92970


### Now that we have our novel, let's try to find all the unique characters with a single prompt.

In [78]:
long_prompt_template = """Consider the following novel:
<novel>
{{NOVEL}}
</novel>

How many unique characters are there with at least one spoken line of dialog?  Please also provide a brief description of the top three most common characters in separate paragraphs. 
Only count characters that have at least one spoken line of dialog.
"""

long_prompt = long_prompt_template.replace("{{NOVEL}}",full_text)
print ("Approximate prompt token count:",count_tokens(long_prompt))

Approximate prompt token count: 93036


In [59]:
%%time
session_cache = {}#don't use cached info, since we want to time this.
long_responce = ask_claude(long_prompt, model="sonnet")[1]
print(long_responce)

Based on the text, there are 10 unique characters that have at least one spoken line of dialogue:

1. Victor Frankenstein
Victor Frankenstein is the protagonist and narrator for most of the novel. He is a scientist who creates a hideous but sentient creature through an unorthodox scientific experiment. His obsession with his work and the consequences of his creation drive the plot.

2. The Creature/Monster
The Creature, often referred to as the Monster, is Frankenstein's creation. He is intelligent and articulate, but his grotesque appearance causes him to be shunned by society, leading him to seek revenge against his creator.

3. Robert Walton
Robert Walton is the explorer who rescues Victor Frankenstein near the end of the novel. He serves as the narrator for the opening and closing sections of the book, providing a frame for Frankenstein's story.
CPU times: user 11.7 ms, sys: 0 ns, total: 11.7 ms
Wall time: 39.7 s


## Not bad!  93K tokens processed in about 40 seconds.  Let's see if we can make that faster and cheaper using prompt decomposition.
### We'll divide the novel into thirds, run each third in parallel, then write a fourth prompt to combine the results.

In [79]:
short_prompt_template = """Consider the following portion of a novel:
<novel>
{{NOVEL}}
</novel>

Please provide a list of unique characters, each in a character tag.  Inside the character tag should be a name tag with their name,
a count tag with an exact count of times they appear, and a description tag with a brief description of that character.
Only count characters that have at least one spoken line of dialog.
"""

#let's cut the novel into thirds.
third = int(len(full_text)/3)
short_prompt_1 = short_prompt_template.replace("{{NOVEL}}",full_text[:third])
short_prompt_2 = short_prompt_template.replace("{{NOVEL}}",full_text[third:third+third])
short_prompt_3 = short_prompt_template.replace("{{NOVEL}}",full_text[third+third:])
print ("Approximate prompt token count:",count_tokens(short_prompt_1))

Approximate prompt token count: 30740
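
Slicing by character index can cut the novel mid-word or mid-sentence.  One possible refinement, shown here as a minimal sketch and not as what this notebook does, is to move each cut forward to the next sentence end.  `split_on_sentences` is a hypothetical helper, and treating a period followed by a space as a sentence end is a simplifying assumption (it ignores `?`, `!`, and abbreviations).

```python
def split_on_sentences(text, parts=3):
    # Split text into `parts` pieces, moving each cut forward to the next ". ".
    target = len(text) // parts
    chunks, start = [], 0
    for i in range(1, parts):
        # Look for the first sentence end at or after the ideal cut position.
        cut = text.find(". ", max(start, i * target))
        if cut == -1:
            break  # no later sentence boundary; the rest becomes the last chunk
        chunks.append(text[start:cut + 1])  # keep the period with its sentence
        start = cut + 2                     # skip the period and the space
    chunks.append(text[start:])
    return chunks

sample = "Aa bb. Cc dd. Ee ff. Gg hh. Ii jj. Kk ll."
chunks = split_on_sentences(sample, parts=3)
print(chunks)
```

The chunks are no longer exactly equal in length, but each one starts and ends on a sentence boundary, and joining them with single spaces reproduces the input.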


### Now let's run these three prompts in parallel

In [61]:
%%time
session_cache = {}#don't use cached info, since we want to time this.
short_responces = ask_claude_threaded([short_prompt_1,short_prompt_2,short_prompt_3],model='sonnet')
#show the reply from one of the three prompts
print(short_responces[0][1])


Here is a list of unique characters with their names, counts, and descriptions, for characters that have at least one spoken line of dialog:

<character>
  <name>Robert Walton</name>
  <count>3</count>
  <description>The narrator who is leading an expedition to the North Pole and encounters Victor Frankenstein.</description>
</character>

<character>
  <name>Victor Frankenstein</name>
  <count>27</count>
  <description>The protagonist who creates a monster and relates his tragic story to Robert Walton.</description>
</character>

<character>
  <name>Elizabeth Lavenza</name>
  <count>4</count>
  <description>Victor Frankenstein's adopted sister and love interest, who defends Justine Moritz at her trial.</description>
</character>

<character>
  <name>Alphonse Frankenstein</name>
  <count>2</count>
  <description>Victor Frankenstein's father, who writes him a letter about the murder of William.</description>
</character>

<character>
  <name>Justine Moritz</name>
  <count>2</count>
  <de

## So far it's looking good!  We've processed the whole novel in around 17 seconds, down from about 40.  Let's make one final call to produce a result that matches our original long prompt.

In [62]:
final_prompt_template = """Consider the following list of characters from a novel.  Each entry contains the character's name,
a count of the number of times they appeared, and a brief description of that character:
<characters>
{{CHARACTERS}}
</characters>
Some characters may be listed more than once.  Use the name and description to determine that two entries are the same, 
and if they are, sum their counts to support your response.

How many unique characters are there?  Please also provide a brief description of the top three most common characters in separate paragraphs. 
"""

characters = short_responces[0][1]+short_responces[1][1]+short_responces[2][1]

final_prompt = final_prompt_template.replace("{{CHARACTERS}}",characters)
print ("Approximate prompt token count:",count_tokens(final_prompt))

Approximate prompt token count: 1306
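
Because the intermediate responses use `<character>`, `<name>`, and `<count>` tags, they can also be inspected locally before the final call.  The sketch below is a sanity check, not a replacement for the final prompt: it totals counts by exact name only, so it would not merge aliases such as "The Creature" and "The Monster", which is the part the LLM handles.  `sum_character_counts` is a hypothetical helper, and the sample entries are illustrative data in the same tag format, not real model output.

```python
import re
from collections import Counter

def sum_character_counts(tagged_text):
    # Total the <count> values per <name> across all <character> blocks.
    totals = Counter()
    blocks = re.findall(r"<character>(.*?)</character>", tagged_text, flags=re.DOTALL)
    for block in blocks:
        name = re.search(r"<name>(.*?)</name>", block, flags=re.DOTALL)
        count = re.search(r"<count>\s*(\d+)\s*</count>", block)
        if name and count:  # skip incomplete or malformed blocks
            totals[name.group(1).strip()] += int(count.group(1))
    return totals

# Illustrative sample only.
sample = """
<character><name>Robert Walton</name><count>3</count><description>Explorer.</description></character>
<character><name>Victor Frankenstein</name><count>27</count><description>Scientist.</description></character>
<character><name>Robert Walton</name><count>5</count><description>Ship captain.</description></character>
"""
totals = sum_character_counts(sample)
print(totals.most_common(3))
```

Blocks that are cut off or missing a tag are skipped instead of raising an error, which matters when a response is truncated.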


In [63]:
%%time
session_cache = {}#don't use cached info, since we want to time this.
final_responce = ask_claude(final_prompt, model="sonnet")[1]
print(final_responce)

Based on the provided list of characters, there are 11 unique characters:

1. Victor Frankenstein
2. The Creature/Monster/Daemon
3. Robert Walton
4. Elizabeth Lavenza
5. Henry Clerval
6. Justine Moritz
7. Alphonse Frankenstein
8. De Lacey
9. Agatha
10. Felix
11. Safie

The top three most common characters are:

Victor Frankenstein (count: 175) - The protagonist, a young scientist who creates a hideous sapient creature in an unorthodox scientific experiment. He is the central figure of the novel, and his story is narrated to Robert Walton.

The Creature/Monster/Daemon (count: 68) - Frankenstein's creation, who is initially benevolent but becomes murderous after being rejected by his creator and society. The creature's quest for acceptance and companionship drives much of the novel's conflict.

Robert Walton (count: 20) - The explorer who rescues Frankenstein and records his story. He serves as the frame narrator, introducing and concluding Frankenstein's narrative.
CPU times: user 6.83 

### This final prompt took about 5 seconds to run.  The original long prompt took about 40 seconds, and our decomposed version took 17 + 5, or 22 seconds in total.  Almost twice as fast for the same amount of work!
Note that the decomposed version found 11 characters, not 10.  A slight quality improvement like this is fairly common with smaller, more focused prompts, because the LLM has less text to attend to in each call.
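
The timing comparison can be written out as a small calculation.  This is a sketch using the rounded wall times reported in this notebook (about 40 s for the long prompt, about 17 s for the three parallel prompts, about 5 s for the final prompt); `decomposed_latency` is a hypothetical helper name.

```python
def decomposed_latency(parallel_step_seconds, final_step_seconds):
    # Parallel steps overlap, so only the slowest one counts toward wall time.
    return max(parallel_step_seconds) + final_step_seconds

long_prompt_seconds = 40    # single long prompt
parallel_wall_seconds = 17  # wall time for the three prompts run in parallel
final_seconds = 5           # final combining prompt

total = decomposed_latency([parallel_wall_seconds], final_seconds)
speedup = long_prompt_seconds / total
print(f"decomposed: {total} s, speedup: {speedup:.2f}x")
```

The final step cannot overlap with the parallel ones because it depends on their output, so its time is added to the total.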

## For fun, let's repeat the exact same tests as above, but with the smaller, faster Haiku model.

In [65]:
%%time
session_cache = {}#don't use cached info, since we want to time this.
long_responce_haiku = ask_claude(long_prompt, model="haiku")[1]
print(long_responce_haiku)

There are 10 unique characters with at least one spoken line of dialog in the novel:

1. Victor Frankenstein
2. The Monster
3. Walton (the narrator)
4. Elizabeth Lavenza
5. Alphonse Frankenstein (Victor's father)
6. Ernest Frankenstein (Victor's brother)
7. Justine Moritz
8. Henry Clerval
9. The magistrate
10. The old woman (the nurse)

The three most common characters are:

Victor Frankenstein
Victor Frankenstein is the central character of the novel. He is the creator of the monster and the one who narrates the majority of the story. Frankenstein is driven by his ambition and desire for knowledge, which leads him to create the monster. However, he is horrified by his creation and abandons it, setting off a chain of tragic events. Frankenstein is tormented by guilt and remorse over the destruction his creation has caused.

The Monster
The monster, also known as the creature, is the other central character. He is the being that Frankenstein creates and brings to life. The monster is in

### Now the full prompt takes only 11s, down from about 40 with the larger model.

In [None]:
%%time
session_cache = {}#don't use cached info, since we want to time this.
short_responces_haiku = ask_claude_threaded([short_prompt_1,short_prompt_2,short_prompt_3],model='haiku')
characters_haiku = short_responces_haiku[0][1]+short_responces_haiku[1][1]+short_responces_haiku[2][1]
final_prompt_haiku = final_prompt_template.replace("{{CHARACTERS}}",characters_haiku)
final_responce_haiku = ask_claude(final_prompt_haiku, model="haiku")[1]
print(final_responce_haiku)


Based on the provided list of characters, there are 11 unique characters.

The top three most common characters are:

1. Victor Frankenstein:
Victor Frankenstein is the protagonist of the story, who creates a monster and is then tormented by his own creation. He is the central figure in the narrative, recounting his story to the ship captain, Walton. Frankenstein is driven by his ambition to create life, but the consequences of his actions haunt him throughout the novel.

2. The Monster:
The Monster, also known as the Creature, is the being that Frankenstein creates. Abandoned by his creator, the Monster becomes a tormented and vengeful figure, seeking to destroy Frankenstein and his loved ones. The Monster's story is a tragic one, as he longs for companionship and understanding but is rejected by society due to his hideous appearance.

3. Walton:
Walton is the captain of the ship who rescues Frankenstein and hears his story. He serves as a framing device for the narrative, as Frankens

### Here we see that with such a fast model and small prompt, we don't get a performance improvement from decomposition.  Part of the reason is that with the shorter times involved, system operations and data transfer take up a larger share of the total time.