# Automated Prompting

Automated prompting is an advanced prompt engineering technique, introduced by [Zhou et al. (2022)](https://arxiv.org/abs/2211.01910), that uses a GenAI model to automatically generate and evaluate well-structured prompts for use with the same model or with other models.

For this experiment, we generate only a single automated prompt in a zero-shot style using meta prompting, format it to remove unnecessary response elements, and feed it back into the same model.

```python
##
## AUTOMATED PROMPTING
##

from _pipeline import create_payload, model_req

#### (1) Create an inbound prompt.
#### It is deliberately short and concise, with no extra prompting instructions, because those are supplied by the template.
MESSAGE = "A study companion discord bot that is to accept any document, analyse it, and answer questions based on the document."

#### (2) Simulate a workflow template.
#### The template simplifies requirement analysis for the user and makes it reusable for any project or solution.
TEMPLATE_BEFORE = "You are an expert in software requirement analysis. Your task is to generate a well-structured zero-shot prompt that guides an AI model to analyze and extract key requirements for a following solution:"
TEMPLATE_AFTER = ("The generated prompt must instruct the AI to Identify User Interactions (List all distinct interactions that users will perform within the system) "
                  "and Define Supporting Functionalities (Specify the essential system functionalities required to enable these interactions). "
                  "The generated prompt must exclude non-essential details such as system architecture, implementation details, or non-functional requirements, "
                  "and be formatted as a standalone prompt with no additional explanation, introductory text, or metadata. Output only the generated prompt and nothing else.")
PROMPT = TEMPLATE_BEFORE + '\n' + MESSAGE + '\n' + TEMPLATE_AFTER

#### The template above went through several rounds of refinement, with help from ChatGPT using the automated prompting technique.
#### It assigns the model a persona to narrow its response space, names the desired prompting technique (zero-shot), and requests a prompt for requirement analysis.
#### It then states exactly what the requirement analysis needs (user interactions and supporting functionalities) and restricts the output by excluding any non-essential analysis or text.

#### (3) Create the first payload (prompt generation).
payload = create_payload(target="ollama",
                         model="llama3.2:latest",
                         prompt=PROMPT,
                         temperature=0.3,  # Controls how random the output can be. 0.3 keeps the generated prompt consistent while allowing small variations.
                         num_ctx=512,  # Context window in tokens (the model's working memory). 512 fits the meta-prompt plus a concise response, with some room to spare.
                         num_predict=250)  # Maximum number of tokens to generate. 250 keeps the generated prompt concise.

#### After running multiple experiments with various parameters, we found that:
#### Temperature: 0-0.2 keeps answers consistent with little to no variation; 0.3-0.5 adds some variation while keeping the content the same; 0.5-1.0 adds a high degree of variation that often causes the model to go off topic.
#### Context window: highly dependent on the task. 100-200 may be enough for simple prompts and responses (such as the ones in this example), but more complex questions need larger context windows. Setting it too high, however, caused the model to go off topic.
#### Token limit: controls the maximum size of the output. Smaller values give short, concise responses; larger values allow much more detail. The context window needs to be adjusted accordingly, since it must hold both the prompt and the output.

# Send the meta-prompt to the model (first pass)
time, response = model_req(payload=payload)
print("Automatically Generated Prompt:\n" + response)
if time: print(f'Time taken: {time}s')

print("\n\n\n")

#### (4) Create the second payload (requirement analysis), using the generated prompt as input.
automatedPayload = create_payload(target="ollama",
                                  model="llama3.2:latest",
                                  prompt=response,
                                  temperature=0.2,  # Low randomness keeps the requirement analysis consistent.
                                  num_ctx=1024,  # Leaves room for the generated prompt plus a full requirement analysis.
                                  num_predict=500)  # Maximum number of generated tokens; 500 keeps the analysis concise and to the point.

# Send the generated prompt back to the same model (second pass)
automatedTime, automatedResponse = model_req(payload=automatedPayload)
print("Model Response:\n" + automatedResponse)
if automatedTime: print(f'Time taken: {automatedTime}s')

#### Please see below for a detailed analysis of the implemented technique.
```

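The `_pipeline` module behind `create_payload` and `model_req` is not shown in this notebook. The sketch below is a hypothetical stand-in that talks to Ollama's `/api/generate` REST endpoint directly. It assumes a local Ollama server on its default port, `localhost:11434`. The names `build_payload` and `send` are illustrative and are not the authors' helpers. The request body puts sampling settings (`temperature`, `num_ctx`, `num_predict`) under `options` and disables streaming, so the reply arrives as a single JSON object whose `response` field holds the generated text.

```python
import json
import time
import urllib.request

# Assumption: a local Ollama server on its default port. This is a hypothetical
# stand-in for the unseen _pipeline helpers, not the authors' implementation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(model: str, prompt: str, **options) -> dict:
    """Build a non-streaming request body for Ollama's generate endpoint.

    Sampling settings such as temperature, num_ctx and num_predict are
    passed through unchanged under the "options" key.
    """
    return {"model": model, "prompt": prompt, "stream": False, "options": options}


def send(payload: dict, url: str = OLLAMA_URL) -> tuple[float, str]:
    """POST the payload and return (elapsed seconds, generated text)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    start = time.perf_counter()
    with urllib.request.urlopen(request) as reply:
        body = json.load(reply)  # non-streaming replies are one JSON object
    return round(time.perf_counter() - start, 2), body["response"]
```

With a server running, `elapsed, text = send(build_payload("llama3.2:latest", PROMPT, temperature=0.3, num_ctx=512, num_predict=250))` would roughly mirror the first request above.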

# Experiment Conclusions:

To implement the automated prompting technique, we combined prompt templates, meta prompting, and zero-shot prompting.

Prompt templates and meta prompting turn minimal user input into a detailed zero-shot prompt, which is then fed back into the model.

Before assembling this prompting method, we first ran extensive tests on meta prompting and zero-shot prompting to identify the best model parameters for each part of the automated prompting pipeline. Below are our observations regarding model parameters.

## Parameter Settings:

Temperature:
0.0 - 0.2 yield the most consistent results with little to no variation. We have found this range to be the most useful when generating the requirement analysis, as there is no need for any sort of "randomness" or "creativity" for this task.

0.3 - 0.5 still yield consistent results but allow some variation, which we found useful when generating the prompts themselves. Automated prompting normally involves repeatedly generating and evaluating candidate prompts so that the algorithm can pick the best prompt/answer pair, and that selection only works if the candidates differ somewhat.

0.5 - 1.0 yields results with a high degree of variation, which often caused prompts and requirement analyses to drift far off topic or cover irrelevant information. In some cases the AI refused to generate a prompt and instead produced parts of the requirement analysis. This range is better suited to creative tasks.
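As a rough illustration of what the temperature setting does, the sampler divides the model's raw token scores (logits) by the temperature T before applying a softmax. A low T concentrates probability on the top-scoring token, and a high T flattens the distribution, so less likely tokens are sampled more often. The logits below are made up for illustration. Real models do this over the whole vocabulary, and runtimes such as Ollama also apply other sampling settings (for example top-k and top-p) that are not modelled here.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw token scores into sampling probabilities at a given temperature."""
    if temperature <= 0:
        # T = 0 is usually handled separately as greedy decoding (always the top token).
        raise ValueError("temperature must be > 0")
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical scores for four candidate next tokens.
logits = [2.0, 1.0, 0.5, 0.1]
for t in (0.2, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At 0.2 almost all probability sits on the first token. At 1.0 the other tokens keep a noticeable share, which matches the growing variation described above.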

Context Window:
Through our experiments, we have found the context window to be heavily dependent on the size and complexity of the task. Lower context windows benefit small and easy tasks (such as answering basic algebra questions), while larger context windows are required to properly analyze more complex tasks and generate appropriate responses.

We found that a range of 300-500 gave the best results for prompt generation, as it let the AI properly comprehend the given prompt and produce an appropriate result. Smaller sizes made the AI struggle with either comprehending or generating the prompt, while larger sizes used unnecessary resources and sometimes made the AI "overthink" and go off topic.

A range of 750-1000 gave the best results for requirement analysis, for the same reasons.
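The context window and the token limit have to be tuned together. This sketch assumes, as with llama.cpp-based runtimes such as Ollama, that `num_ctx` must hold both the prompt tokens and every generated token. The helpers are hypothetical, and the 4-characters-per-token estimate is only a common rule of thumb for English text, not the exact llama3.2 tokenizer count.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate (rule of thumb, not the model's real tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


def fits_context(prompt_tokens: int, num_predict: int, num_ctx: int) -> bool:
    """True if the prompt plus the maximum generated tokens fits in num_ctx."""
    return prompt_tokens + num_predict <= num_ctx


# Second pass: the generated prompt is at most 250 tokens (the first-pass
# num_predict), and the requirement analysis may add up to 500 more.
print(fits_context(250, 500, 1024))  # True: 750 <= 1024
print(fits_context(250, 500, 512))   # False: 750 > 512
```

In the first pass, `estimate_tokens(PROMPT)` would give a rough prompt size to check against `num_ctx=512` with `num_predict=250`.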

Token Generation Limit:
Through our experiments and research, we found this to be heavily dependent on the task at hand, specifically on how detailed the user wants the output to be. Smaller limits produce concise responses, while higher limits allow much more detail, examples, and explanations.

For prompt generation, 200-300 tokens were plentiful, if not slightly more than needed. Going below 200 sometimes cut details that needed to be in the prompt, and going over 300 sometimes introduced unnecessary details. 250 tokens consistently gave the best results.

For requirement analysis, 400-600 tokens gave the best results, allowing a detailed list of interactions and functionalities along with some descriptions. However, the model would sometimes go off topic and add unrequested sections such as "Extra Considerations" or "Examples". 500 tokens was a good middle ground, although even that occasionally produced these issues.
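A cheap rule-based check could flag these cases automatically, for example to decide whether to retry a request. The sketch below is hypothetical. The required section names come from the meta-prompt, and the unwanted ones are the unrequested sections mentioned above. Plain substring matching is a deliberate simplification and could misfire if those words appear in ordinary body text.

```python
# Hypothetical section lists: the first comes from the meta-prompt's instructions,
# the second from unrequested sections observed during the experiments.
REQUIRED = ("user interactions", "supporting functionalities")
UNWANTED = ("extra considerations", "examples")


def check_analysis(text: str) -> dict:
    """Report which required sections are missing and which unrequested ones appear."""
    lowered = text.lower()  # case-insensitive substring matching
    return {
        "missing": [name for name in REQUIRED if name not in lowered],
        "unwanted": [name for name in UNWANTED if name in lowered],
    }


report = check_analysis(
    "User Interactions:\n- Upload a document\n"
    "Supporting Functionalities:\n- Parse uploaded files\n"
    "Extra Considerations:\n- Rate limiting\n"
)
print(report)
```

An empty `missing` list and an empty `unwanted` list would mark an analysis that followed the prompt. Anything else could trigger a regeneration.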

## Performance and Behavior:

Overall, this implementation performed reasonably well. It successfully generates a well-defined prompt, and that prompt produces a reasonable surface-level requirement analysis. However, the analysis could go into more depth, and the model tends to include information that was not requested. We attribute this to the lack of a prompt evaluation step and to llama3.2 being a smaller, less capable model than this task requires.

In our view, this implementation of automated prompting can give users a good starting point for requirement analysis, but it will not produce a complete analysis by itself.

## Limitations:

This implementation of automated prompting is not without its limitations:

1. It uses the llama3.2 model, which has limited knowledge and reasoning capabilities.
2. There is no prompt evaluation, so the generated prompt is never compared against alternatives and is unlikely to be optimal.
3. Automated prompting in general is not entirely optimal, because the prompt is generated without user involvement and users cannot easily tailor it to their exact requirements.
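Limitation 2 could be addressed the way the original technique works: sample several candidate prompts, score each one, and keep the best. In the sketch below, `generate` stands for any callable that requests one prompt from the model. It could, for example, call the model at a temperature of 0.3 to 0.5 so that candidates differ. `keyword_score` is a hypothetical toy evaluator. In the cited paper, candidates are instead scored by how well the model performs the task when using them, which is more robust than keyword matching.

```python
import itertools
from typing import Callable


def select_best_prompt(
    generate: Callable[[], str],
    score: Callable[[str], float],
    n_candidates: int = 5,
) -> tuple[str, float]:
    """Sample several candidate prompts and keep the highest-scoring one."""
    candidates = [generate() for _ in range(n_candidates)]
    best = max(candidates, key=score)
    return best, score(best)


def keyword_score(prompt: str) -> float:
    """Toy score: reward the requested sections, penalise chatty preambles."""
    text = prompt.lower()
    points = sum(kw in text for kw in ("user interactions", "supporting functionalities"))
    if text.startswith(("here is", "here's", "sure")):
        points -= 1
    return float(points)


# Stub generator standing in for real model calls.
samples = itertools.cycle([
    "Here is a prompt: list user interactions.",
    "Identify user interactions and define supporting functionalities.",
    "Describe the system architecture.",
])
best, best_score = select_best_prompt(lambda: next(samples), keyword_score, n_candidates=3)
print(best_score, best)  # 2.0 Identify user interactions and define supporting functionalities.
```

Each extra candidate costs one more model call, so `n_candidates` trades runtime for a better chance of finding a good prompt.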