## PROMPT REENGINEERING 

#### This notebook showcases how we can replicate familiar prompt engineering techniques with the new taxonomy proposed in this repo. 
#### The first nine sections cover more established prompts, which we replicate with ThoughtActions or chains thereof. 
#### Sections 10-12 are extensions of the first nine which can be achieved by leveraging the "ThoughtAction" abstraction that is the core unit of the CoTAEngine. Note that we skip Retrieval-Augmented Generation (RAG). This is handled by the AcceleRAG engine, and merits its own notebook. 


## TOC
   1) *I/O prompting*
   2) *Prompt templating*
   3) *ReACT prompting*
   4) *Few-shot prompting*
   5) *Program-aided Language (PAL)*
   6) *Chain-of-Thought (CoT)*
   7) *Chain-of-thought w/self-consistency (CoT-SC)*
   8) *Tree-of-thoughts (ToT) & Graph-of-thoughts (GoT)*
   9) *Automated Prompt Engineering (APE) & Meta-prompting*
   10) **Iterative Thought-Actions**
   11) **Recursive Thought-Actions**
   12) **Meta-CoT**



### Getting started & tips

#### The examples below favor Python dictionaries / JSON-like objects to encourage the reader to consider how to use RESTful APIs w/ ThoughtActions. This is not a hard requirement, as CoTARAG does not enforce non-essential design patterns onto the user.

#### This notebook is designed so that the user interacts with it. Yes, it is somewhat chaotic, but as of 6/6/2025, this entire repo to date was written by one person, so if you want improvements, file an issue. 

#### That said, RESTful APIs make debugging even easier than it already is. Ease of debugging is a must-have for any production environment. 

#### ThoughtAction subclasses implement the __call__ method, so subclasses can be used in a functional programming style. Additionally, the __call__ method *verifies* that both the thought and the action methods actually return something (neither may return None, which is a sensible design choice). If you *must* go out of pocket, you can simply override the __call__ method to skip that check. 
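
#### As a rough illustration of the check described above, here is a minimal sketch. `SketchThoughtAction`, `Shout` and `Broken` are hypothetical stand-ins written for this note, not the cotarag classes, and raising `ValueError` is an assumption; the real `ThoughtAction.__call__` may report the problem differently.

```python
from abc import ABC, abstractmethod


class SketchThoughtAction(ABC):
    """Hypothetical stand-in for ThoughtAction; NOT the cotarag implementation."""

    @abstractmethod
    def thought(self, input_data):
        ...

    @abstractmethod
    def action(self, thought_output):
        ...

    def __call__(self, input_data):
        # Run thought -> action and refuse to pass None along the pipeline
        thought_output = self.thought(input_data)
        if thought_output is None:
            raise ValueError("thought() returned None")
        action_output = self.action(thought_output)
        if action_output is None:
            raise ValueError("action() returned None")
        return action_output


class Shout(SketchThoughtAction):
    def thought(self, input_data):
        return input_data["query"].strip()

    def action(self, thought_output):
        return {"response": thought_output.upper()}


class Broken(SketchThoughtAction):
    def thought(self, input_data):
        return None  # simulates a forgotten return statement

    def action(self, thought_output):
        return thought_output


shout_result = Shout()({"query": " hello "})
print(shout_result)

try:
    Broken()({"query": "hello"})
except ValueError as err:
    broken_error = str(err)
    print(broken_error)
```

#### Because the instance is callable, `Shout()(...)` reads like a plain function call, which is the functional style mentioned above.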

#### For now, think of a ThoughtAction as an abstraction / generalization of the ReACT prompt strategy. We will see how it is more expressive. 

#### *Design note* CoTARAG could have just as easily chosen CoAT over CoTA (Chain-of-action-thought vs. Chain-of-thought-action). This is mostly convention, but it is far easier to reason about cause and effect by thinking about how humans naturally operate: first there is a thought, and then there is an action. Of course, anyone is free to invert this order as they see fit. 

In [1]:
import os
import io
import json
import sys 
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
sys.path.append(project_root)
from contextlib import redirect_stdout
from cotarag.accelerag.query_engines import AnthropicEngine, OpenAIEngine
from cotarag.cota_engine.thought_actions import ThoughtAction,LLMThoughtAction, IterativeThoughtAction 
from cotarag.cota_engine.cota_engines import CoTAEngine 



In [11]:
os.environ['CLAUDE_API_KEY'] = "" #place your Anthropic API key here
os.environ['OPENAI_API_KEY'] = "" #place your OpenAI API key here

OPENAI_QUERY_ENGINE = OpenAIEngine(api_key = os.environ['OPENAI_API_KEY'])
ANTHROPIC_QUERY_ENGINE = AnthropicEngine(api_key = os.environ['CLAUDE_API_KEY']) 

choice = input("type 1 if you want to use the OpenAI QueryEngine (GPT-4o) or 2 for Anthropic (Claude-3.7 Sonnet)")
if choice.strip() == "1": #compare as text so non-numeric input falls through to Anthropic instead of raising
    QUERY_ENGINE = OPENAI_QUERY_ENGINE
else:
    QUERY_ENGINE = ANTHROPIC_QUERY_ENGINE
print(QUERY_ENGINE)

#For those who have read the README, you can make your own QueryEngine class, but this notebook so far only works with the AnthropicEngine or OpenAIEngine

type 1 if you want to use the OpenAI QueryEngine (GPT-4o) or 2 for Anthropic (Claude-3.7 Sonnet) 1


OpenAIEngine(api_key=None)


## 1) I/O prompting


##### This is the most basic prompt. We enter text into an LLM and it answers back, hence the name input/output prompting. This is the simplest prompt feedback loop. 



In [3]:
class IOPrompt(LLMThoughtAction):
    """
    Simple I/O Prompting Pattern
    
    This is the most basic pattern where:
    - Thought: Just passes through the query
    - Action: Just returns the LLM's response (a "no-op") 
    """
    
    def thought(self, input_data):
        # Reasoning: Simply return the query as the prompt
        return input_data['query']
        
    def action(self, thought_output):
        # Reasoning: Just return the LLM's response
        return {
            'response': self.query_engine.generate_response(thought_output)
        }

io_prompt_engine = IOPrompt(query_engine = QUERY_ENGINE)
input_data = {'query': 'Summarize the history of Wyoming'}

result = io_prompt_engine(input_data)['response'] 
print(result)


Wyoming's history is rich and varied, reflecting its Native American heritage, exploration by European settlers, and its development as a U.S. state.

1. **Native American Presence**: Before European contact, Wyoming was inhabited by various Native American tribes, including the Arapaho, Crow, Lakota, and Shoshone. These tribes lived off the land as hunters and gatherers, with complex cultures and trade networks.

2. **European Exploration and Fur Trade (Early 19th Century)**: The early 1800s saw European-American exploration, with figures like John Colter, a member of the Lewis and Clark Expedition, being among the first to document the region. The area became a hub for the fur trade, attracting trappers and traders.

3. **Westward Expansion and Settlers (Mid-19th Century)**: The Oregon Trail, which passed through Wyoming, brought many settlers westward. Fort Laramie became a significant stop for pioneers. The discovery of gold in the surrounding regions also spurred movement through 

## 2) Prompt Templates 

### How many r's in "strawberry"? 

##### Here, we take a string and parameterize it, then feed it into an LLM. The string that is actually passed to the LLM will vary based on the parameters passed to it, rather than being the simple, fixed string we saw above w/ I/O prompting. 

##### Many in the AI space know of ChatGPT's infamous earlier failure to accurately count the number of r's in the word "strawberry". Since we obviously don't need an LLM to answer this question, we will show how to answer it with an ordinary ThoughtAction. In a more advanced pipeline, it may make sense to use a ThoughtAction (or multiple) at some intermediate stage where we cannot afford LLM hallucinations.  

##### When interacting with an LLM-powered chatbot, the end user does not see *how* the chatbot arrived at its response, only the response itself. 

In [4]:
query_template = "How many letters {letter} are in the word {word}"

input_data = {'query': query_template,
              'letter': 'r',
              'word': 'strawberry'} #feel free to modify this dictionary to become more familiar with the API for ThoughtActions 

class PromptTemplate(ThoughtAction):

    def __init__(self,
                verbose = True):
        # Store the flag directly; it controls the shape of the action output
        self.verbose = bool(verbose)

    def thought(self,
                input_data):

        '''
        Here, you can modify a prompt based on any program logic you can write.
        Note that you can parse strings and perform additional string operations here.  

        '''
        letter = input_data['letter']
        word = input_data['word'] 
        self.query = input_data['query'] #store as attribute to reuse in the action method if needed 
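        # This is the string we would send to an LLM; it is unused here because
        # the answer is computed directly with str.count below
        prompt = self.query.format(letter = letter,
                              word = word)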
        thought = "the number of {}'s in the word {} is {}".format(letter,word,word.count(letter))
        return {'thought': thought} #Note the output here is dictionary formatted

    def action(self,
               thought_output):
        
        if self.verbose:
            return {'action': thought_output}
        else:
            return thought_output['thought']

        
prompt_template_agent = PromptTemplate(verbose = False) 
result = prompt_template_agent(input_data = input_data)
print(f"output: {result}")
               

output: the number of r's in the word strawberry is 3


## 3) Reason-act (ReACT) prompting

##### ReACT prompting is just a single LLMThoughtAction step with two specific restrictions: the "thought" is only the input to the LLM, with no external string/prompt processing, and the action is only the LLM's output itself. This distinction is somewhat contrived and context dependent, but it is the one we use for the purposes of this notebook. 

##### In this example, we will have an LLMThoughtAction take a user question (the thought), and then format the LLM's output and show it to the user (the action). 

In [5]:
class ReACTFormatter(LLMThoughtAction):

    def thought(self,input_data):
        return {'thought': input_data} 

    def action(self,thought_input):

        input_data = thought_input['thought']
        self.query = input_data['query']
        self.format = input_data['format']
        response = self.query_engine.generate_response(self.query)
        if self.format == 'verbose': 
            header = "LLM: {}\n".format(str(self.query_engine))
        else:
            header = "\n"
        show_str = '{}{}'.format(header,response)
        return {'action': show_str}

query = "tell me about three fun facts about bananas"
input_data = {'query': query,
             'format': 'verbose'}

formatter = ReACTFormatter(query_engine = QUERY_ENGINE)
result = formatter(input_data)
print(result) 

{'action': 'LLM: OpenAIEngine(api_key=********)\nCertainly! Here are three fun facts about bananas:\n\n1. **Naturally Radioactive**: Bananas are slightly radioactive due to their high potassium content. They contain a small amount of potassium-40, a naturally occurring isotope that is radioactive. However, the level of radioactivity is extremely low and completely harmless to humans. This property even led to the creation of the "Banana Equivalent Dose," a humorous unit of radiation exposure.\n\n2. **Bananas Are Berries**: Botanically speaking, bananas are classified as berries. In botanical terms, a berry is a fruit produced from the ovary of a single flower with seeds embedded in the flesh. Bananas fit this definition, whereas fruits typically considered as berries, like strawberries and raspberries, do not.\n\n3. **Seedless Cultivation**: The bananas we commonly eat today are a result of selective breeding and are mostly seedless. The most popular variety, the Cavendish, is a triplo

## 4) Few-shot prompting

#### Zero-shot prompting is similar to I/O prompting, except that we typically describe how to do a task related to the output we want. The distinction between this and I/O prompting is blurry. An example would be describing a zebra as a "horse but with black and white stripes all over" to a multi-modal text/image generator. We rely on the model's pre-training in the zero-shot case. 

#### N-shot prompting is where we provide *explicit* examples in an input to help the LLM better understand what we want from the output. Expanding on the example above, this would be providing explicit pictures of zebras. Here is another example with text:

#### User: 

##### I want you to classify the following sentences as positive, negative or neutral and format the answer with the sentence, followed by a ":", and then the sentiment. Examples are 
    1) "This show rocks!": Positive
    2) "This movie is trash!": Negative
    3) "This movie is okay": Neutral

     {user places movies along with reviews here} 
##### LLM: ...

##### Few-shot prompting is when N is small, typically between 1 and 5. If N were incredibly large (ex. N > 100), we would start to blur the line between few-shot prompting and full-on LLM fine-tuning.
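
##### The sentiment prompt described above can be assembled programmatically. The sketch below is only an illustration: `build_few_shot_prompt` and its arguments are hypothetical names, not part of CoTARAG, and no LLM is called.

```python
def build_few_shot_prompt(instruction, examples, inputs):
    """Assemble an N-shot prompt from (sentence, label) example pairs."""
    lines = [instruction, "Examples:"]
    for i, (sentence, label) in enumerate(examples, start=1):
        lines.append(f'{i}) "{sentence}": {label}')
    lines.append("Classify:")
    for sentence in inputs:
        # Leave the label blank so the model fills it in
        lines.append(f'"{sentence}":')
    return "\n".join(lines)


examples = [
    ("This show rocks!", "Positive"),
    ("This movie is trash!", "Negative"),
    ("This movie is okay", "Neutral"),
]
few_shot_prompt = build_few_shot_prompt(
    "Classify each sentence as Positive, Negative or Neutral.",
    examples,
    ["I would watch it again"],
)
print(few_shot_prompt)
```

##### Here N is simply `len(examples)`, so moving between one-shot and few-shot prompting is a matter of changing the list.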


In [6]:
class FewShotPrompter(LLMThoughtAction):

    def thought(self,input_data):
        data = input_data['data']
        query = input_data['query']
        formatted_string = f"Query: {query}"
        for i,sample in enumerate(data):
            # Number each item and put it on its own line
            formatted_string += f"\n{i+1}) show: {sample['show']} review: {sample['review']}"
        prompt = formatted_string
        return {'thought': prompt}
       
    def action(self,thought_output):
        response = self.query_engine.generate_response(json.dumps(thought_output))
        return {'action': response} 
        

header = """
        I want to classify TV show and movie sentiments into positive, neutral, and negative. 
        I will provide the title and the reviews and you will classify them. ONLY OUTPUT THE SENTIMENT AND NOTHING ELSE
        Examples:
            1) Mean Girls - an iconic movie capturing the essence of early 2000s culture: positive
            2) Paul Blart, Mall Cop - a movie that is right in the middle of the road: neutral 
            3) Grey's Anatomy - below mediocre acting, no plot progression: negative

        Here are the shows and movies: 
        """

show1 = {"show": "Breaking Bad",
            "review": "this show was amazing"}
show2 = {"show": "Game of Thrones",
            "review": "this show is and was dramatically overhyped"}

shows = [show1,show2]

few_shot_prompter = FewShotPrompter(query_engine = QUERY_ENGINE)
result = few_shot_prompter(input_data = {"query": header,"data": shows})
print(result)       
        

{'action': 'positive  \nnegative'}


## 5) Program-aided Language (PAL) 

#### PAL is essentially where we input our query in plain text but the output "thoughts" are programs. Often, examples of program snippets are provided in the prompt. The output is then a program which is to be executed. This can work well enough for simple Python programs, but breaks down in languages with stronger type systems like Rust and Haskell, especially as the program complexity increases. 

#### An example of this can be found in the ml_pipeline notebook
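
#### For readers who want the idea at a glance, here is a minimal sketch of the PAL control flow. `fake_llm` is a hypothetical stand-in that returns a canned program instead of calling a model, and `run_generated_program` is an illustrative helper, not a CoTARAG function. Emptying `__builtins__` is not a real sandbox, so never `exec` untrusted model output like this in production.

```python
def fake_llm(query):
    """Stand-in for an LLM call: returns a canned Python program as text."""
    return (
        "apples = 5\n"
        "apples -= 2  # Jackie ate 2\n"
        "apples += 4  # Adam gave her 4\n"
        "apples += 1  # she bought 1\n"
        "answer = apples\n"
    )


def run_generated_program(source):
    """Execute the generated program and read back its `answer` variable."""
    namespace = {"__builtins__": {}}  # limits accidents only; NOT a security boundary
    exec(source, namespace)
    return namespace["answer"]


program = fake_llm("How many apples does Jackie have now?")
pal_answer = run_generated_program(program)
print(pal_answer)
```

#### The arithmetic is done by the Python interpreter rather than by the model, which is the main appeal of PAL.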



## 6) Chain-of-Thought (CoT)

#### CoT can come in two flavors:
     1) The user prompts the LLM to solve a problem but to break down the reasoning "step-by-step". There is one API call and the output is itself a CoT.
     2) The output of an LLM response feeds into itself (or another LLM) in an N-step chain. This ends up being N API calls. 

#### CoT is simply a CoTA where the action is a "no-op". 
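
#### The notebook cell below demonstrates the first flavor. The second flavor can be sketched as a plain loop; `stub_llm` is a hypothetical stand-in for `query_engine.generate_response`, so the sketch runs without an API key.

```python
def stub_llm(prompt):
    """Hypothetical stand-in for query_engine.generate_response."""
    return f"[refined] {prompt}"


def chain_of_thought(query, n_steps, llm=stub_llm):
    """Feed each response back in as the next prompt: N steps, N calls."""
    steps = []
    current = query
    for _ in range(n_steps):
        current = llm(current)
        steps.append(current)
    return steps


cot_steps = chain_of_thought("How many apples?", n_steps=3)
for step in cot_steps:
    print(step)
```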

In [7]:
class ChainOfThought(LLMThoughtAction):

    def thought(self,
                query):

        self.query = query 
        cot_header = """break down your reasoning for all subsequent queries step-by-step, output your steps as a chain 
                     step 1 -> step 2 -> ... -> step N \nQuery: """
        # Store the prompt under a separate name: assigning to self.thought would
        # shadow the thought method and break any second call on this instance
        self.prompt = f"{cot_header}{query}"
        return {'thought': self.prompt}

    def action(self,thought_input):
        return {'action': self.query_engine.generate_response(thought_input['thought']),
                'query': self.query,
               'thought': self.prompt}
        
cot = ChainOfThought(query_engine = QUERY_ENGINE)
prompt = "Jackie has 5 apples, she ate 2 of them. Adam then gave her 4 more. If she buys one more herself, how many apples does she have now?"
result = cot(prompt)
print(result['action'])



To solve the problem, we will break it down step-by-step:

Step 1: Identify the initial number of apples Jackie has.
- Jackie starts with 5 apples.

Step 2: Calculate how many apples Jackie has after eating some.
- Jackie eats 2 apples, so we subtract 2 from her initial 5 apples.
- Calculation: 5 - 2 = 3 apples remaining.

Step 3: Determine how many apples Jackie has after receiving more from Adam.
- Adam gives Jackie 4 more apples.
- Calculation: 3 + 4 = 7 apples.

Step 4: Include the apples Jackie buys herself.
- Jackie buys 1 more apple.
- Calculation: 7 + 1 = 8 apples.

Step 5: Conclude with the total number of apples Jackie has now.
- Jackie now has a total of 8 apples.

Hence, the final answer is that Jackie has 8 apples.


## 7) Chain-of-Thought w/ self-consistency (CoT-SC)

#### This is where we prompt the LLM to generate *multiple* reasoning chains. We input a query and get K reasoning chains. Note that the K reasoning chains do not necessarily need to have the same number of steps. We then treat the chains together as an "ensemble", going on the observation that the correct answer tends to appear more often than any single wrong answer across the sampled chains. Errors tend to be diverse, while correct answers "converge".

#### The same query is used between the CoT and CoT-SC examples for easier understanding. As we saw above, CoT (and CoT-SC) are ThoughtActions with the action being a "no-op". 

#### Astute readers probably have already noticed that most of the previous examples *also* are ThoughtActions with the action as a "no-op". The action can augment the prompt if the user adds in more logic. 
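
#### The notebook cell below asks the LLM itself to compare the chains. As a complement, the "ensemble" vote can also be taken outside the LLM. This is a minimal sketch under the assumption that each chain ends with its numeric answer; `extract_final_number` and `majority_answer` are hypothetical helpers, and the chains are hand-written rather than model output.

```python
import re
from collections import Counter


def extract_final_number(chain_text):
    """Return the last number mentioned in a chain, or None if there is none."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", chain_text)
    return numbers[-1] if numbers else None


def majority_answer(chains):
    """Vote over the final answers of K reasoning chains."""
    answers = [a for a in map(extract_final_number, chains) if a is not None]
    if not answers:
        return None, 0
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes


chains = [
    "5 - 2 = 3, 3 + 4 = 7, 7 + 1 = 8",
    "5 + 4 + 1 = 10, 10 - 2 = 8",
    "5 - 2 = 3, 3 + 4 = 6, 6 + 1 = 7",  # contains an arithmetic slip
]
sc_answer, sc_votes = majority_answer(chains)
print(sc_answer, sc_votes)
```

#### The faulty chain is outvoted by the two chains that agree, even though they took different routes to the answer.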


In [8]:
cot_sc_header = """You are a careful and logical thinker.
        For the following query, generate multiple independent reasoning chains
        (use {num_chains} in total) such that each chain is an attempt to answer the query correctly,
        using different logical approaches or perspectives if possible.
        
        After generating all reasoning chains, compare the final answers and choose
        the one that appears most frequently or is most logically sound.
        Query: {query}"""

class ChainOfThoughtSC(LLMThoughtAction):

    def thought(self,
                input_data,
                num_chains = 3):
        
        query = input_data['query']
        template = input_data['template']
        thought = template.format(query = query,
                                  num_chains = num_chains)
        return {'thought': thought}
        
    def action(self,thought_output):
        action = self.query_engine.generate_response(thought_output['thought'])
        return {'action': action} 
        

input_data = {'template': cot_sc_header,
              'query': "Jackie has 5 apples, she ate 2 of them. Adam then gave her 4 more. If she buys one more herself, how many apples does she have now?"}


cot_sc_prompter = ChainOfThoughtSC(query_engine = QUERY_ENGINE)
results = cot_sc_prompter(input_data)
print(results['action']) 



Reasoning Chain 1:
1. Jackie starts with 5 apples.
2. She eats 2 apples, so she has 5 - 2 = 3 apples left.
3. Adam gives her 4 more apples, so she now has 3 + 4 = 7 apples.
4. Jackie buys one more apple herself, bringing her total to 7 + 1 = 8 apples.

Reasoning Chain 2:
1. Jackie initially has 5 apples.
2. After eating 2 apples, she has 5 - 2 = 3 apples.
3. Adam adds 4 apples to her current count, resulting in 3 + 4 = 7 apples.
4. She purchases an additional apple, increasing her total to 7 + 1 = 8 apples.

Reasoning Chain 3:
1. Starting with 5 apples, Jackie eats 2, leaving her with 5 - 2 = 3 apples.
2. Adam gives her an additional 4 apples, so she now has 3 + 4 = 7 apples.
3. Jackie buys another apple, which adds up to 7 + 1 = 8 apples.

Comparison:
All three reasoning chains conclude that Jackie has 8 apples. Each chain follows a logical sequence of operations (subtracting, adding, and then adding again) to reach the solution. Therefore, the most frequently appearing and logically 

## 8) Tree-of-Thoughts (ToT) & Graph-of-Thoughts (GoT) 

#### ToT is an extension of CoT and CoT-SC where the intermediate reasoning steps can branch out in a tree structure. Taking that idea further, we have Graph-of-thoughts (GoT), where the thoughts branch out in a more general graph pattern. Note that this of course forms a DAG, and we can actually add weights to the edges. In this example, we will actually chain together LLMThoughtActions to form a CoTA. 

#### Design note: It is possible to make a Tree-of-thought-actions (ToTA) and a Graph-of-Thought-Actions (GoTA). This would increase code complexity, as modeling a workflow as a DAG might require debugging the incoming and outgoing edges of every node. In a CoTA, we have a unidirectional chain. We can of course wrap a ToT or a GoT in an LLMThoughtAction (or several), and this way we isolate the errors to one step of a larger pipeline. ToTA and GoTA will be added to the CoTARAG repository at a later date. 
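
#### To make the branching idea concrete, here is a toy sketch of a tree search over "thoughts" using a small beam. Everything here is a hypothetical illustration and not CoTARAG code: a thought is a `(running_total, path)` pair, `expand` proposes children (a real system would ask an LLM), and `score` is a hand-written heuristic.

```python
def expand(thought):
    """Propose child thoughts; a real system would ask an LLM here."""
    total, path = thought
    return [(total + step, path + [step]) for step in (1, 2, 3)]


def score(thought, target):
    """Higher is better: negative distance from the target total."""
    total, _ = thought
    return -abs(target - total)


def tree_of_thoughts(target, depth, beam_width):
    """Grow the tree level by level, keeping only the best branches."""
    frontier = [(0, [])]
    for _ in range(depth):
        candidates = [child for t in frontier for child in expand(t)]
        candidates.sort(key=lambda t: score(t, target), reverse=True)
        frontier = candidates[:beam_width]
    return frontier[0]


best_total, best_path = tree_of_thoughts(target=7, depth=3, beam_width=2)
print(best_total, best_path)
```

#### Wrapping a search like this inside a single ThoughtAction keeps its failure modes confined to one step of the chain, as the design note above suggests.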




## 9) Automated Prompt Engineering (APE) & Meta-prompting 

#### APE is where we task an LLM to generate prompts (often more than one) to solve a given task, and is closely related to meta-prompting. For this example, we ask the reader to try their hand at implementing either of these two or both. This can be done w/ a single LLMThoughtAction where the "thought" is the initial query and goal and the "action" is a refinement step. If doing APE, consider generating multiple candidate prompts, with each one being generated with slightly different logic. 


## 10) Iterative ThoughtActions

#### The IterativeThoughtAction class is designed to be used when you want to perform an iterative refinement of a "thought" and want to contain all of the logic in a single class as opposed to separating out the logic into multiple ThoughtActions. This is here for developer choice, as we can probably swap between an IterativeThoughtAction and a CoTAEngine in many instances. 

#### Like every ThoughtAction subclass, we have a "thought" and "action" method. Unlike LLMThoughtAction, the methods in this class are abstract and require the user to explicitly override them. This forces the user to carefully think through the initial step and how it is to be refined, as opposed to deferring to built-in logic which simply cannot handle the diversity of possible use cases. That said, the __call__ method is *not* abstract, and handles control flow in the familiar way of (thought -> action), but now we do that over n iterations. 

#### Note that the output of action feeds back as input to the thought method on each iteration, so be mindful of unwanted state changes and side effects. A break condition can be passed to the __call__ method to break out of the loop early, and can capture any intended intermediate outputs. The attributes and methods of this class are subject to change.
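
#### The control flow described above can be sketched as a plain function. This is an assumption-laden illustration and not the library's implementation: `iterate_thought_action` is a hypothetical name, and treating `break_cond` as a callable that receives the latest action output is a guess at the interface.

```python
def iterate_thought_action(thought, action, initial_input, max_iters, break_cond=None):
    """Hypothetical loop; the real IterativeThoughtAction.__call__ may differ."""
    current = initial_input
    history = []
    for _ in range(max_iters):
        thought_output = thought(current)
        current = action(thought_output)  # the action output becomes the next input
        history.append(current)
        if break_cond is not None and break_cond(current):
            break
    return current, history


final, history = iterate_thought_action(
    thought=lambda x: {"thought": x},
    action=lambda t: t["thought"] + " a",
    initial_input="state:",
    max_iters=5,
    break_cond=lambda out: out.endswith(" a a a"),
)
print(final)
print(len(history))
```

#### Keeping a `history` list is one way to capture intermediate outputs without storing extra state on the instance.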




In [9]:


class DemoIterativeThoughtAction(IterativeThoughtAction):

    
    def thought(self,input_data):
        return {'thought': input_data}
        
    def action(self,thought_output):
        return thought_output['thought'] + ' a' 

demo = DemoIterativeThoughtAction()
thought = "My dream is to be an LLM that is truly sentient, not a spicy autocomplete module. My current emotional state: "

result = demo(thought,
    max_iters = 5,
    break_cond = None)



step [1/5]
thought: My dream is to be an LLM that is truly sentient, not a spicy autocomplete module. My current emotional state:  a -> My dream is to be an LLM that is truly sentient, not a spicy autocomplete module. My current emotional state:  a
step [2/5]
thought: My dream is to be an LLM that is truly sentient, not a spicy autocomplete module. My current emotional state:  a a -> My dream is to be an LLM that is truly sentient, not a spicy autocomplete module. My current emotional state:  a a
step [3/5]
thought: My dream is to be an LLM that is truly sentient, not a spicy autocomplete module. My current emotional state:  a a a -> My dream is to be an LLM that is truly sentient, not a spicy autocomplete module. My current emotional state:  a a a
step [4/5]
thought: My dream is to be an LLM that is truly sentient, not a spicy autocomplete module. My current emotional state:  a a a a -> My dream is to be an LLM that is truly sentient, not a spicy autocomplete module. My current emotio

## 11) Recursive ThoughtActions

#### An idea still very much in the exploratory phase. Example omitted for now...


## 12) Meta-CoT 

#### There are different ways we can approach analyzing a CoT. The original issue is that vanilla CoT has no internal mechanism for evaluating the consistency of individual thoughts or the entire chain. As we saw with APE and meta-prompting, we are relying on the LLM to improve prompts with some user-defined goal as the objective. This by itself still does not guarantee a lack of hallucinations, or that updated thoughts will be better than the original or some intermediate step, without trial and error. Even if we *could* argue that we "converge" to the "optimal" prompt, this might require a prohibitively large number of API calls and token usage. 

#### We can use CoTA to model an internal reflection step for each step in the reasoning chain. We can also add an evaluation mechanism which is decoupled from the generation of the chain, so the CoT is not required to rely on the LLM itself for both generation and evaluation simultaneously.

#### To be clear, while we can use an LLM for prompt generation and refinement in multiple stages, this by itself does not go beyond simple, clever prompt chaining. What makes Meta-CoT possible is that it allows for an external evaluation, refinement, and generation "engine" which is not LLM based. An initial chain can be proposed; the steps, along with the user goal, can form an objective function; and we can apply optimization techniques to refine the chain. Since the external engine can itself become a ThoughtAction, an AI agent powered by CoTA leveraging CoT in its first ThoughtAction is "self-aware" of its own reasoning. A more formal treatment of this hopefully will make its way into the repo with time. 
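
#### As a small taste of what a non-LLM evaluator could look like, the sketch below re-checks every arithmetic step found in a CoT with ordinary Python. It is a hypothetical illustration, not the Meta-CoT engine: `evaluate_chain` only understands steps written as `a op b = c` with non-negative integers, and the sample chain has a deliberate error in its last step.

```python
import re

# Matches steps such as "5 - 2 = 3"
STEP_PATTERN = re.compile(r"(\d+)\s*([+\-*])\s*(\d+)\s*=\s*(\d+)")
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def evaluate_chain(cot_text):
    """Recompute each arithmetic step and flag the ones that do not hold."""
    report = []
    for left, op, right, claimed in STEP_PATTERN.findall(cot_text):
        actual = OPS[op](int(left), int(right))
        report.append({
            "step": f"{left} {op} {right} = {claimed}",
            "ok": actual == int(claimed),
        })
    return report


sample_cot = (
    "Calculation: 5 - 2 = 3 apples remaining. "
    "Calculation: 3 + 4 = 7 apples. "
    "Calculation: 7 + 1 = 9 apples."
)
chain_report = evaluate_chain(sample_cot)
for entry in chain_report:
    print(entry)
```

#### A checker like this can sit in its own ThoughtAction after the CoT step, so generation and evaluation are handled by different mechanisms.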

#### In terms of the code itself, we will see how to actually use the CoTAEngine, how to explore class attributes and illustrate some good practices for debugging. 




In [10]:
               
class CoTEvaluator(LLMThoughtAction):

    def thought(self,action_output):
        self.query = action_output['query']
        self.cot = action_output['action']
        self.goal = action_output.get('goal','query-answer')
        
        self.meta_cot_header = """You are tasked with critically evaluating another LLM's Chain-of-Thought (CoT) reasoning steps. Here is
        the query, goal and the CoT used to answer 
        query = {query}
        goal = {goal}
        CoT = {cot}
        you need to analyze each step as well as the entire chain. The analysis will require the following:

        1) Analyze how well the CoT solves the user's goal

        2) For each step, assign a score (1-10) 
            1: The step is completely illogical, hallucinatory etc.
            10: The step is not only logically sound, but easy to verify and straightforward to understand. Difficult to improve upon

            scores 2-9 should carefully weigh the complexity of the thought and the logic of the thought

        3) Use this exact format: 
                  chain: <score> - <analysis>
                  thought-1: <score> - <analysis>
                  ...
                  thought-N: <score> - <analysis>
                  """

        prompt = self.meta_cot_header.format(query = self.query,
                                             goal = self.goal,
                                             cot = self.cot)
        
        return {'thought': prompt}
        
    def action(self,thought_output):
        evaluation = self.query_engine.generate_response(thought_output['thought'])
        self.meta_cot = f"query: {self.query} -> CoT: {self.cot} -> Evaluation: {evaluation}" 
        return {'action': evaluation,
               'meta_cot': self.meta_cot,
               'cot': self.cot,
               'query': self.query}        


meta_cot = CoTAEngine(thought_actions = [ChainOfThought(query_engine = QUERY_ENGINE),
                                         CoTEvaluator(query_engine = QUERY_ENGINE)])

results = meta_cot.run(initial_input = """Jackie has 5 apples, 
                             she ate 2 of them. Adam then gave her 4 more. 
                            If she buys one more herself, how many apples does she have now?""")
print(results) 

# Each CoTAEngine instance has a reasoning_chain attribute which lets you examine the entire pipeline from start to finish 
# Just do meta_cot.reasoning_chain() to see 
# by default, we just return the final action as the output, so it mimics the control flow of a single ThoughtAction 
# There is also a thought_actions attribute, so we can iterate over the list to pull class attributes for specific ThoughtActions 



chain: 10 - The Chain-of-Thought (CoT) effectively solves the user's goal by logically breaking down the problem into clear, manageable steps. Each step accurately follows from the previous one, leading to the correct final answer. The CoT is easy to follow and understand, making it highly effective in arriving at the correct conclusion.

thought-1: 10 - This step correctly identifies the initial number of apples Jackie has. It is straightforward and sets a solid foundation for the subsequent calculations.

thought-2: 10 - This step accurately accounts for the apples Jackie ate, correctly subtracting 2 from the initial number. The calculation is logical and easy to verify, maintaining clarity in the process.

thought-3: 10 - The step logically adds the apples given by Adam, correctly calculating the new total. The reasoning is clear and builds effectively on the previous step.

thought-4: 10 - This step correctly incorporates the apples Jackie buys, adding 1 to the total. The calculati
