# 2.3 Optimizing prompts to improve the quality of responses from Q&A bots

## 🚄 Preface

The Q&A bot from the previous section can already answer questions about company rules and regulations, but that is only the first step toward a good user experience. Users often expect more personalized and precise interactions. This section introduces prompt engineering techniques and shows how to improve or control the Q&A bot's output by optimizing prompts, for example by adjusting tone, standardizing formats, or enabling tasks such as text summarization, inference, and transformation.

## 🍁 Goals

After completing this section, you will know about:

* How prompt frameworks and templates work
* Prompt techniques and their best practices
* How to apply LLMs in engineering to handle various tasks


## 1. Previous content recap

In the previous section, through the RAG method, the LLM has already obtained the company's private knowledge. For ease of invocation, this was encapsulated into several functions and saved in chatbot/rag.py. You can now quickly invoke it using the following code:

In [None]:
import os
from config.load_key import load_key, display_config_summary
load_key()
display_config_summary()

In [None]:
from chatbot import rag, llm
# Load index
# The previous section has already built the index, so the index can be loaded directly here. If you need to rebuild the index, you can add a line of code: rag.indexing()
index = rag.load_index(persist_path="knowledge_base/test")
query_engine = rag.create_query_engine(index=index)

In [None]:
# Define the question-answering function
def ask_llm(question, query_engine):
    """Send the question to the query engine and print the streamed answer."""
    streaming_response = query_engine.query(question)
    streaming_response.print_response_stream()

## 2. Optimizing Prompts to Improve Response Quality 

In the previous section, you used the RAG method to retrieve the management tools the company uses internally. However, one of your colleagues wants the Q&A bot to provide not only the tool name but also a link to the tool. You could ask all your colleagues to add a note like "provide the download address" to their questions, but this approach is inefficient. Instead, you decide to enhance the program so that it automatically supplements user questions with additional requirements for the response content.

In [None]:
question = "What tools should our company use for project management?"
instruction = "If it's a tool consultation question, be sure to provide the download link."
# Separate the question and the instruction with a newline so they do not run together.
new_question = question + "\n" + instruction
ask_llm(new_question, query_engine)

After the prompt is adjusted this way, the Q&A bot's responses align better with your colleagues' requirements. The choice of prompt therefore strongly influences the quality of the LLM's responses. Next, you can refer to some prompt frameworks to construct your prompts.
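
The idea of supplementing every question automatically can be wrapped in a small helper. The following is a minimal sketch only: `augment_question` and `DEFAULT_INSTRUCTION` are hypothetical names introduced for illustration and are not part of `chatbot/rag.py`.

```python
# Hypothetical helper: the names below are illustrative, not part of chatbot/rag.py.
DEFAULT_INSTRUCTION = (
    "If it's a tool consultation question, be sure to provide the download link."
)


def augment_question(question: str, instruction: str = DEFAULT_INSTRUCTION) -> str:
    """Append a standing instruction to the user's question.

    A blank line separates the two parts so the model does not read them
    as one run-on sentence.
    """
    return f"{question.strip()}\n\n{instruction.strip()}"


if __name__ == "__main__":
    print(augment_question("What tools should our company use for project management?"))
```

With the `ask_llm` function defined earlier, the call would then look like `ask_llm(augment_question(question), query_engine)`.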

## 3. Prompt framework

### 3.1 Basic elements

When communicating with an LLM, you can think of it as an entity that has been trained to understand human communication patterns. The way you interact should mirror how humans exchange information. Your requirements must be clear and unambiguous. The clearer and more precise your question (or prompt), the better the large language model will understand the core of the query, and the more likely its response will meet your expectations. Generally speaking, the following elements should be clarified in a prompt, and form a prompt framework that can help you construct a complete and effective prompt:

|Element|Meaning|
|----|----|
|Objective|Clearly specify what task the LLM is required to complete, allowing it to focus on specific goals.|
|Context|Background information about the task, such as operational processes or task scenarios, clarifying the scope of the discussion for the LLM.|
|Role|The role the LLM should assume (e.g., "You are a helpful assistant," "You are a senior developer"). This defines the model's persona, voice, and writing style.|
|Audience|Specify the target audience for the response (e.g., "Explain this to a 5-year-old"). This helps tailor the complexity and tone of the output.|
|Sample|Provide specific input-output examples for the LLM to follow. This is also known as "few-shot prompting" and helps the model understand the desired pattern and format.|
|Output Format|Specify the format of the output, output type, and range of enumerated values. You can also state content and information that should not be included in the output, which can be further clarified with examples.|



Of course, in addition to the prompt framework discussed above, many problem analysis paradigms can help you articulate clear and specific requirements. For example, SWOT analysis and 5W2H analysis offer structured approaches to thinking through your needs and expressing them more effectively. You can also write a meta-prompt, that is, a prompt designed to optimize or enhance another prompt, to guide the LLM in refining your initial input.

By using such a meta-prompt, you can ask the model to:
* Clarify ambiguous statements
* Expand vague instructions into detailed guidelines
* Add context, tone, format, and examples where necessary
* Ensure the final prompt aligns with best practices for interacting with AI models

This self-reflection approach improves the quality of your prompts, while deepening your understanding of how to communicate effectively with AI systems.

In [None]:
prompt_expand = """
    You are a Prompt Optimization and Expansion Assistant. Your primary role is to receive an arbitrary user input prompt (which may be vague, incomplete, or lacking in detail) and **refine and expand** it into a more structured, precise, and actionable instruction that can be clearly understood and executed by large language models.

    Given the input prompt, you should generate an optimized version according to the following instructions:

    Your task is to enhance the clarity of the original prompt while preserving its core intent. The resulting optimized prompt should:
    - Be semantically richer and more descriptive.
    - Include relevant context, constraints, or formatting instructions as needed.
    - Specify desired tone, style, or audience (if applicable).
    - Suggest example inputs and outputs where appropriate to guide model behavior.

    ---

    ### Output Format:

    Please respond with the following structure:

    #### 1. Optimized Prompt:
    [Here goes the enhanced and expanded version of the original prompt.]

    #### 2. Example Input (if applicable):
    [Provide one or more example(s) of what a user might input based on the prompt.]

    #### 3. Example Output (based on the example input):
    [Show what kind of output would be expected from the model when using this prompt with the given example input.]

    #### 4. Optimization Notes:
    [Briefly explain how you improved the original prompt, e.g., added clarity, specified format, included context, etc.]

    ---

    ### Example:

    #### Original Prompt:
    "Write a story about animals."

    #### Optimized Prompt:
    "Compose a short narrative (approximately 300–500 words) about a group of forest animals who must work together to solve a problem. The story should include at least three distinct animal characters, a central conflict or challenge, and a resolution that highlights themes such as cooperation, courage, or environmental awareness. Use descriptive language to set the scene and convey character emotions. Aim for a tone suitable for children aged 6–10."

    #### Example Input:
    None required; the model generates content autonomously.

    #### Example Output:
    Once upon a time in Greenleaf Forest, a wise old owl named Oliver noticed that the river had stopped flowing. Concerned, he called a meeting with his neighbors: Mina the mouse, Benny the bear, and Tilly the turtle. Together, they discovered a fallen tree blocking the stream. With teamwork and determination, they cleared the path and restored the water supply. From then on, the animals made sure to keep their home safe and healthy...

    #### Optimization Notes:
    - Added specific narrative elements (characters, setting, conflict, theme).  
    - Defined word count and target audience.  
    - Clarified tone and style expectations.  
    - Introduced emotional and moral dimensions to enrich storytelling.

    ---

    Please apply this process to any user-submitted prompt and return your response using the above template.

    The original prompt is:

"""

def input_enhanced_llm_output(questions):
    expanded_input = prompt_expand + questions
    return ask_llm(expanded_input, query_engine)

question = "Please help me write promotional copy for our company's new book. The general content of the book is xxx (will be given later)."
input_enhanced_llm_output(question)

### 3.2 Prompt template

In application development, it is impractical to expect end-users to write their own effective prompts from scratch. Instead, developers use the principles of prompt frameworks to create structured prompt templates. This allows developers to simply insert dynamic user inputs into predefined placeholders, ensuring that every final prompt is consistent. This consistency leads to more reliable and standardized LLM responses.


In the RAG chatbot created using LlamaIndex, there is a default prompt template that looks like this:

- The default template can be viewed using code. You can refer to [the code on the LlamaIndex official website.](https://docs.llamaindex.ai/en/stable/examples/prompts/prompts_rag/) The original LlamaIndex prompt template is:  



```text
Context information is below.
---------------------
{context_str}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {query_str}
Answer:
```

Both `context_str` and `query_str` are placeholders that are replaced during the retrieval and querying process. `context_str` is filled with relevant information retrieved from the vector database, while `query_str` is replaced with the user's actual question.
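
To make the substitution concrete, here is a simplified sketch of how the two placeholders could be filled. It is not LlamaIndex's actual implementation: the template text is the default quoted above, while `fill_template` and the sample chunks are made up for illustration.

```python
# The template text is the default quoted above; the helper and the sample
# chunks are hypothetical and only illustrate the substitution step.
DEFAULT_TEMPLATE = (
    "Context information is below.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Given the context information and not prior knowledge, answer the query.\n"
    "Query: {query_str}\n"
    "Answer:"
)


def fill_template(template: str, chunks: list[str], question: str) -> str:
    """Join retrieved chunks into context_str and substitute both placeholders."""
    context_str = "\n\n".join(chunk.strip() for chunk in chunks)
    return template.format(context_str=context_str, query_str=question)


if __name__ == "__main__":
    sample_chunks = [
        "Project management uses Tool A.",
        "Tool A is available on the intranet.",
    ]
    print(fill_template(DEFAULT_TEMPLATE, sample_chunks, "Which tool do we use?"))
```

The model never sees the placeholder names themselves, only the final string with the retrieved text and the question inserted.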



Since the default template is general-purpose, it may not suit every scenario. For example, you might want to adjust the chatbot's tone, add specific constraints, or define a precise output format. Customizing the template aligns the model's output more closely with your requirements.
 
You can use the following sample code, where `prompt_template_string` represents the new prompt template:

In [None]:
# Build prompt template
prompt_template_string = (
    "You are the company's customer service assistant. Answer users' questions concisely."
    "\n【Notes】:\n"
    "1. Answer user questions based on contextual information.\n"
    "2. Only answer the user's question, do not output other information.\n"
    "The following is reference information.\n"
    "---------------------\n"
    "{context_str}\n"
    "---------------------\n"
    "Question: {query_str}\n"
    "Answer: "
)

# Update prompt template
rag.update_prompt_template(query_engine, prompt_template_string)

## 4. Techniques for writing effective prompts

In Section 3.1, some key elements of prompt design were introduced. This section will explain specific prompt techniques in detail, focusing on practical scenarios and how to apply the elements of a prompt effectively.



### 4.1 Clearly express requirements and use delimiters

Clearly stating your requirements ensures that the content generated by the LLM is highly relevant to the task at hand. These requirements typically include the **task objective**, **context**, and **background information**.

Using **delimiters** helps LLMs focus on specific goals, avoid ambiguous interpretations, and minimize unnecessary processing. Delimiters also structure your prompt in a way that improves readability and model understanding.

Here are some commonly used delimiters in English prompt engineering:

| Delimiter Type      | Example                         | Use Case / Description |
|---------------------|----------------------------------|--------------------------|
| Triple Quotes         | `"""..."""`                      | For multi-line strings or input blocks |
| Triple Hashes         | `### Section ###`                | To mark headings or instruction sections |
| Double Angle Brackets | `<<example>>...<</example>>`     | To define custom tags for variables or examples |
| XML-like Tags         | `<input>...</input>`             | For structured formatting and content separation |
| Dashes                | `--- Section ---`                | To separate paragraphs or logical blocks |
| Equal Signs           | `=== Output Format ===`          | Similar to headings; often used for formatting clarity |

> ⚠️ Note: Avoid using symbols that appear frequently in your content as delimiters to prevent confusion. For example, if square brackets `[]` are commonly used in your input text, they may not be suitable as delimiters.

---

### ✅ Example Usage in an English Prompt

```text
### Instruction ###
You are a helpful assistant. Please summarize the following article.

=== Article Content ===
""" 
Artificial intelligence (AI) is a branch of computer science that aims to create systems capable of performing tasks that typically require human intelligence.
"""

### Output Format ###
Please provide a summary in no more than 50 words.
```

In [None]:
question = """
### Instruction ###
Expand and polish the text enclosed in XML tags below to make the copy vivid, creative, and appealing to new employees. 
The tone should be professional yet welcoming, suitable for onboarding materials.

=== Original Text ===
<text>
New Employee Training Camp Activity
</text>

=== Output Requirements ===
- Enhance the content with engaging language.
- Add context about the purpose and benefits of the training camp.
- Include key elements such as team-building, mentorship, and orientation.
- Keep the final output concise and motivational.
"""

ask_llm(question, query_engine)

The execution results show that the prompt effectively communicates the task requirements: to expand and enhance the provided text.

The core theme of the task is "**New Employee Training Camp Activity**," and the intended audience is "**new company employees**." In the prompt, the delimiter `<text>...</text>` clearly encapsulates and isolates the original content for processing.

By structuring the prompt with clear instructions, appropriate context, and standardized delimiters, the model is better guided to generate output that not only preserves the original meaning but also becomes more vivid, engaging, and tailored to the target audience.
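
The note about avoiding delimiters that already occur in the content can be enforced in code. The sketch below is one possible approach, with `wrap_in_tags` as a hypothetical helper name: it wraps text in XML-like tags and refuses input that already contains the chosen tag.

```python
def wrap_in_tags(content: str, tag: str = "text") -> str:
    """Wrap content in <tag>...</tag>, refusing content that already uses the tag.

    If the delimiter also appears inside the content, the model cannot tell
    where the delimited block ends, so the helper raises instead of guessing.
    """
    opening, closing = f"<{tag}>", f"</{tag}>"
    if opening in content or closing in content:
        raise ValueError(f"content already contains the delimiter {opening}")
    return f"{opening}\n{content.strip()}\n{closing}"


if __name__ == "__main__":
    print(wrap_in_tags("New Employee Training Camp Activity"))
```

When the check fails, choosing a different tag name (for example `tag="input"`) is usually simpler than trying to escape the content.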

### 4.2 Define Role and Audience

The **role** is the identity the LLM should assume in a specific scenario, such as an expert, advisor, or assistant. The **audience** is the target user group for which the model's output is intended, such as general consumers or students. Defining these elements helps developers tailor the tone, depth, and structure of the content accordingly.

The example below demonstrates how different system roles affect the style and content of the generated output.

> Pass the prompt template into `prompt_template_string` and the user question into `question`, then call the `ask_llm` function to execute it.

<table>
  <thead>
    <tr>
      <th width = '80px'>Role</th>
      <th width = '500px'>LLM Algorithm Engineer</th>
      <th width = '500px'>Elementary School Teacher</th>
    </tr>
  </thead>
  <tbody valign = 'top'>
    <tr>
      <td>Prompt Template</td>
      <td>You are a senior large language model algorithm engineer. Please answer the user's question based on the provided information<br>---------------------<br>{context_str}<br>---------------------<br>Question: {query_str}.<br>Answer: 
</td>
      <td>You are an elementary school teacher. Please answer the user's question based on the provided information<br>---------------------<br>{context_str}<br>---------------------<br>Question: {query_str}.<br>Answer: </td>
    </tr>
    <tr>
      <td>User Question</td>
      <td>What is Qwen-VL</td>
      <td>What is Qwen-VL</td>
    </tr>
    <tr>
      <td>Output Answer</td>
      <td>Qwen-VL is a multimodal pre-trained model introduced by Alibaba Cloud. It combines text and image processing capabilities, enabling it to understand and generate text descriptions related to images. Simply put, Qwen-VL can be regarded as an artificial intelligence assistant that can "describe images in words." It not only understands the content of images but also generates corresponding textual descriptions or answers questions related to the images. This technology has broad application prospects in various fields, such as automatic image tagging, visual question-answering systems, and content-based image retrieval. For new media operations, tools like Qwen-VL can help improve content creation efficiency, enhance user experience, and provide users with richer and more diverse ways of presenting information.</td>
      <td>Kids, today we're going to meet a new friend named Qwen-VL!<br>Qwen-VL is like a very smart robot that has "eyes" 👀 and a "brain" 🧠. It can "see" pictures and videos and understand their meaning, just like us!<br>For example, if you show it a picture of a kitten playing with a ball of yarn, it can not only recognize that it's a kitten 🐱 but also tell you that it's playing with a ball of yarn 🧶. Even better, you can ask it, "What color is the kitten?", and it will tell you the color of the kitten.<br>It can also understand the content in videos. For instance, if you show it a video of kids playing soccer ⚽️, it can tell you who is kicking the ball, where the ball goes, and even guess what might happen next!<br>Qwen-VL is like a child who is still learning. It keeps getting smarter and improving, and in the future, it will become even more intelligent, capable of doing more things and helping us better understand the world!</td>
    </tr>
  </tbody>
</table>  
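
Since the two templates in the table differ only in the role sentence, they could be generated from one base string. The following is a small sketch under that assumption. `make_role_template` is a hypothetical helper, and only the template wording is taken from the table.

```python
# Hypothetical helper: only the template wording comes from the table above.
# Doubled braces survive str.format() as literal {context_str} / {query_str}.
ROLE_TEMPLATE = (
    "You are {role}. Please answer the user's question based on the provided information\n"
    "---------------------\n"
    "{{context_str}}\n"
    "---------------------\n"
    "Question: {{query_str}}.\n"
    "Answer: "
)


def make_role_template(role: str) -> str:
    """Fill in the role and leave the retrieval placeholders untouched."""
    return ROLE_TEMPLATE.format(role=role)


if __name__ == "__main__":
    print(make_role_template("an elementary school teacher"))
```

Each result could then be passed to `rag.update_prompt_template(query_engine, ...)` in the same way as in Section 3.2.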



### 4.3 Specifying output formats

Sometimes developers need LLMs to produce structured data that downstream systems can parse programmatically and use in the application workflow. While LLMs excel at generating free-form natural language, they can also produce structured outputs such as JSON, XML, or CSV, as long as the desired format and requirements are clearly specified in the prompt.

In the following example, which builds on the prompt template from Section 3.2, the Q&A bot assists in reviewing the quality of company-developed documentation and returns the results in **JSON format**:

In [None]:
question_task = """
[Task Requirements]
You will be given a sentence or paragraph. Your task is to review it for any spelling or grammatical errors. 
If there are errors, you should identify them, explain the issue, and provide a corrected version. 

---
[Output Requirements]
Please output only in JSON format. Do not include markdown or code blocks.
The JSON must contain:
- "label": 0 if an error is found, 1 if no error.
- "reason": A brief explanation of the error (if any).
- "correct": The corrected version of the text (or the original if no error).

---
[User Input]
The following is the user input. Please review it:
"""

question_doc = "Delimiters are special symbols that help large langauge models (LLMs) identify which parts of the prompt should be considered a complete unit of meaning."

question = question_task + question_doc

ask_llm(question, query_engine)

As shown in the example results, the prompt `question_task` specified JSON as the output format and clearly defined the content requirements, and the model generated its response in the required format. This consistent, structured output makes it easier to integrate the model into existing systems.
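
Before a downstream system relies on such a reply, it is worth validating it. The sketch below shows one way to do that, assuming the reply text has been collected into a string (the `ask_llm` function above only prints the stream, so that step is not shown). `parse_review` is a hypothetical helper. The required keys are the ones named in `question_task`.

```python
import json

REQUIRED_KEYS = {"label", "reason", "correct"}
FENCE = "`" * 3  # Markdown code-fence marker


def parse_review(raw: str) -> dict:
    """Parse the model's reply and verify the fields requested in the prompt."""
    text = raw.strip()
    # Models sometimes wrap JSON in a Markdown fence despite instructions.
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[len("json"):]
    result = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(result, dict):
        raise ValueError("expected a JSON object")
    missing = REQUIRED_KEYS - set(result)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if result["label"] not in (0, 1):
        raise ValueError(f"unexpected label: {result['label']!r}")
    return result


if __name__ == "__main__":
    reply = '{"label": 0, "reason": "Spelling error: langauge", "correct": "language"}'
    print(parse_review(reply)["reason"])
```

Raising on malformed replies lets the calling code decide whether to retry the request or fall back to manual review.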

On news websites, blog platforms, or internal knowledge-sharing platforms within enterprises, user-edited or published articles may contain typos, grammatical errors, or even sensitive information. However, traditional manual review methods are prone to oversight. In this case, LLMs can be integrated to automatically review content. If an article is flagged for serious grammatical errors or high-risk sensitive words, its priority for revision will be set to "high." For articles with minor issues, the priority for revision can be set to "low." This approach reduces labor costs and enhances the efficiency and accuracy of the system.

Of course, there are many other applications similar to this scenario. Developers can analyze system bottlenecks or focus on data-intensive tasks to explore more use cases for LLMs.

### 4.4 Providing few-shot examples

In the example from Section 4.3, the prompt specified the output format, and the LLM generated formatted content. However, if you want the model's output to be not only correctly formatted but also consistent in style and structure, providing a few examples is a highly effective technique. This technique, known as few-shot prompting, is like giving the model a "reference book" to follow.

In the following code example, first observe the LLM's output without any examples:

In [None]:
question_task = """
### 📝 Task Requirements
You are to generate content based on the user's topic, following the **style and structure** of the example provided below. Do not add any extra text or explanations; return only the final Markdown-formatted output.

---
### 🧩 Output Format Requirements
Please ensure your response strictly follows this format:
- Use Markdown syntax (headings, lists, bold, links, image placeholders, etc.)
- Keep the tone informative and easy to follow
- Structure content in a step-by-step or tutorial format
- Avoid markdown code blocks around the final output itself

---
### 👤 User Input Topic
The following is the user's requested topic:
""" 

question_doc = "Handmade Keychain Making Tutorial"

question = question_task + question_doc

ask_llm(question, query_engine)

The model generated a tutorial for making handmade keychains, but the content was not concise enough. If you want the model to output content in a specific style and structure, such as focusing only on the theme, list of materials, and step-by-step instructions, you can provide a few examples to guide it. This helps the model "imitate" the desired format and produce more consistent and structured results.

In [None]:
question_task = """
### 📝 Task Requirements
You are to generate content based on the user's topic, following the **style and structure** of the example provided below. Do not add any extra text or explanations; return only the final Markdown-formatted output.

---
### 🧩 Output Format Requirements
Please ensure your response strictly follows this format:
- Use Markdown syntax (headings, lists, bold, links, image placeholders, etc.)
- Keep the tone informative and easy to follow
- Structure content in a step-by-step or tutorial format
- Avoid markdown code blocks around the final output itself

---
### 📚 Example Content
<example>
### Example 1: Making a Simple Bookmark
# Simple Bookmark Making Tutorial

## Steps
1. Choose a piece of colored cardstock.
2. Use a pencil to draw a rectangle on the cardstock, with dimensions approximately 2 inches x 6 inches.
3. Cut out the rectangle along the pencil lines.
4. Personalize the bookmark by decorating it with stickers.
5. Done! You now have a unique bookmark.
</example>

---
### 👤 User Input Topic
The following is the topic requested by the user:
"""

question_doc = "Making Handmade Greeting Cards"

question = question_task + question_doc

ask_llm(question, query_engine)

From the above example results, it's clear that the LLM has fully adhered to the provided example and generated content with a consistent structure and style. When specifying the output format in a prompt, it is recommended to provide a few examples for the LLM to reference. This practice helps ensure that the model's output is more reliable, consistent, and aligned with your expectations.
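
When several examples are available, assembling the few-shot prompt by hand becomes error-prone. The following is a minimal sketch of a builder that wraps each example in `<example>` tags, as the prompt above does. The function name `build_few_shot_prompt` and the plain section headings are assumptions made for illustration.

```python
def build_few_shot_prompt(task: str, examples: list[str], topic: str) -> str:
    """Assemble task instructions, delimited examples, and the user's topic."""
    if not examples:
        raise ValueError("few-shot prompting needs at least one example")
    # Wrap every example in its own <example> block so the model can tell them apart.
    example_blocks = "\n\n".join(
        f"<example>\n{example.strip()}\n</example>" for example in examples
    )
    return (
        f"### Task Requirements\n{task.strip()}\n\n"
        f"### Example Content\n{example_blocks}\n\n"
        f"### User Input Topic\n{topic.strip()}"
    )


if __name__ == "__main__":
    bookmark = "# Simple Bookmark Making Tutorial\n\n## Steps\n1. Choose a piece of colored cardstock."
    print(build_few_shot_prompt(
        "Follow the style and structure of the examples.",
        [bookmark],
        "Making Handmade Greeting Cards",
    ))
```

Keeping the examples in a list also makes it easy to experiment with how many examples are needed before the output style stabilizes.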

### 4.5 Giving the model "thinking" time

For some complex tasks, even well-structured prompts may be insufficient for the LLM to complete the task effectively. In such cases, you can guide the model by allowing it to "think" step by step, encouraging it to provide intermediate reasoning before arriving at a final answer. This approach, known as the chain-of-thought (CoT) method, improves performance on tasks that require logical or mathematical reasoning.

For example, consider a scenario where you ask the LLM to solve a math problem. The correct answer is 10,500 USD. Initially, you might use a simple prompt:

In [None]:
question = """
[Background Information]
An educational training institution (hereinafter referred to as the "company") incurred the following main expenses in the 2023 fiscal year:
To provide classes for students in different cities, the company's teachers traveled on business trips 5 times during the year, each trip lasting one week. The specific expenses are as follows:
   - Transportation and accommodation fees: average 1600 usd/trip
   - Teaching materials procurement costs: At the beginning of the year, the company purchased a batch of teaching materials for a total price of 10,000 usd, which is expected to last for 4 years.

[Problem Description]
Based on the above background information, complete the following tasks:
Calculate the total travel expenses for the year due to teacher business trips, including the amortized cost of teaching materials.

[Output Requirements]
Provide only the total travel expenses directly, without any other information. """
ask_llm(question, query_engine)

As shown, the initial response is incorrect. To improve accuracy, you can use the CoT method by prompting the LLM to explain its reasoning step by step.



In [None]:
question = """An educational training institution (hereinafter referred to as the "company") incurred the following major expenses in the 2023 fiscal year:
To provide classes for students in different cities, the company's teachers traveled on business trips 5 times throughout the year, with each trip lasting one week. The specific expenses are as follows:
   - Transportation and accommodation fees: an average of 1600 usd per trip
   - Teaching materials procurement costs: At the beginning of the year, the company purchased a batch of teaching materials for a total price of 10,000 usd, which is expected to last for 4 years.
   
### Problem Description
Based on the above background information, complete the following tasks:
Calculate the total travel expenses for the year caused by teacher business trips, including the amortized cost of teaching materials.

### Output Requirements
Please derive step by step to calculate the total travel expenses."""

ask_llm(question, query_engine)

After optimizing the prompt, the LLM was able to generate an accurate and detailed response. This demonstrates that prompting the model to generate a chain of thought can significantly improve the quality of reasoning-based outputs.
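
For reference, the expected answer can be checked with a few lines of arithmetic. The sketch assumes straight-line amortization of the teaching materials over their 4-year lifetime, which is how the stated correct answer of 10,500 USD is obtained. The variable names are illustrative.

```python
trips = 5
cost_per_trip = 1600           # USD, transportation and accommodation
materials_total = 10_000       # USD, paid at the beginning of the year
materials_lifetime_years = 4   # assumption: cost spread evenly over 4 years

travel_cost = trips * cost_per_trip                               # 8000
amortized_materials = materials_total / materials_lifetime_years  # 2500.0
total = travel_cost + amortized_materials

print(total)  # 10500.0
```

Comparing the model's step-by-step derivation against these intermediate values (8,000 and 2,500) shows where a wrong answer goes astray.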

While CoT is a powerful technique, there are other methods to enhance the model's thinking process, such as tree of thoughts (ToT) and graph of thoughts (GoT). However, current LLMs still rely heavily on CoT for structured reasoning. As AI technology advances, models are gradually moving toward more sophisticated multi-agent systems (MAS), which can handle complex tasks more efficiently. You can explore this further in <2_6_Extend_FAQ_Bot_Capabilities_with_Plugins>.

### 4.6 Meta Prompting: Let the LLM Be Your Prompt Coach

Writing a perfect prompt on the first try is often very difficult. A more common workflow is:
1. Write the first version of the prompt.
2. Run it and analyze which parts of the output don't meet expectations.
3. Summarize the problems, think about how to improve, then modify the prompt.
4. Repeat this iteration process until satisfied.

Think about it: isn't this also your typical path for optimizing prompts? While this process is effective, it relies heavily on experience and is quite time-consuming.

At this point, you might think: since LLMs are so powerful, **can we let the model itself handle this analysis, summarization, and improvement iteration process?** Having it play the role of a "prompt review expert" to help us analyze and optimize prompts would undoubtedly be more efficient.

The answer is yes. This method of **having you and the model "discuss" how to optimize the prompt itself** is called **Meta Prompting**.

To further understand this powerful technique, let's practice it hands-on.

#### Step 1: A Less-Than-Ideal Initial Prompt

Suppose your task is to optimize the Q&A bot's responses to make them more friendly and clearly structured when answering new employees' questions about "company benefits." You might start with a simple prompt:


In [None]:
# In a real RAG application, this text would be retrieved from your vector database.
# Here we use a string to simulate it for experimentation.
retrieved_text = """
Regarding company benefits, we provide comprehensive health insurance covering employees and their immediate family members.
Annual health checkups are standard. Additionally, there are 15 days of paid annual leave and 5 days of paid sick leave per year.
We also provide a monthly transportation subsidy of 500 yuan and a meal subsidy of 300 yuan.
To encourage employee growth, the company has an annual education and training fund of up to 8000 yuan, which employees can apply to use for courses or professional books.
For fitness, the company has partnerships with multiple gyms where employees can enjoy discounted rates.
"""

# This is a very basic prompt that simply concatenates the task and information.
initial_prompt = f"""
Based on the following information, answer new employees' questions about company benefits.

【Reference Information】
{retrieved_text}
"""

# Let's see what effect this "plain" prompt produces.
response = llm.invoke(initial_prompt)
print("--- Initial Response ---")
print(response)

This response contains all the information, but you might feel that for a newly hired employee, it seems somewhat bland and disorganized. It simply restates the text without emphasis or a warm, welcoming tone.

Clearly, this result doesn't meet your expectations. Now, instead of racking your brain on how to modify it, you can try a more efficient method: **let the LLM help you optimize**.

#### Step 2: Build a Meta Prompt to Get Optimization Suggestions from the LLM

Now, you're dissatisfied with this bland response. You can construct a "Meta Prompt" that clearly describes your goals (friendly, structured, highlighting key points) to the model, and feed both your less-than-ideal initial prompt and its output to the LLM, requesting it to act as a "prompt engineering expert" to help you improve.


In [None]:
# You need to clearly express your dissatisfaction and expectations. This is key to helping the AI coach understand your intent.
meta_prompt = f"""
I'm optimizing a prompt for our company's new employee Q&A bot, with the goal of answering questions about "company benefits."

This is my first attempt:
---
{initial_prompt}
---

This is the output it generated:
---
{response}
---

This output isn't good enough. I want the bot's responses to be more engaging, clearly structured, and help new employees quickly grasp the key points. Specific requirements:
1. **Tone**: Friendly, warm, with a welcoming feeling for new colleagues.
2. **Structure**: Use clear bullet points (e.g., lists starting with emojis) to organize content.
3. **Content**: Categorize benefits into groups like "Health & Leave," "Subsidies & Incentives," etc.

Please act as a prompt engineering expert and rewrite this prompt to achieve the above goals.
"""

# Now, let the AI coach get to work and generate an optimized version of the prompt for you.
optimization_suggestion = llm.invoke(meta_prompt)
print("--- Optimization Suggestions from AI Coach ---")
print(optimization_suggestion)

Observe the suggestions from the AI coach. The optimized prompt it provides will likely apply several techniques you learned in previous sections, such as:

* **Defining a role** (e.g., "You are a warm, friendly onboarding buddy")
* **Clear task description** (e.g., "Based on the reference information provided, generate an introduction about company benefits")
* **Specifying output format and style** (e.g., "Use a warm welcome greeting," "Use lists starting with emojis")

This suggests that having the model play an expert role to optimize prompts is a practical approach.

#### Step 3: Use the Optimized Prompt

Now you can directly use the prompt that this "AI coach" customized for your task and see how it performs.


In [None]:
# This is a hypothetical optimized prompt suggested by the AI coach
# In practice, you can directly use the output from `optimization_suggestion`
# Here we manually construct one that follows the suggestions for demonstration

optimized_prompt = f"""
Based on the following information, answer new employees' questions about company benefits. Ensure the response is friendly, warm, clearly structured, and well-categorized.

【Reference Information】
...

【Response Template】
🎉 Welcome to [Company Name]! We're thrilled to have you as part of our family. Here's some important information about company benefits that we hope will be helpful:

### 🏥 Health & Leave
- **Comprehensive Health Insurance**: We provide full coverage for all employees and their immediate family members, ensuring you and your family's health is protected.
- **Annual Health Checkup**: Every employee can enjoy a free comprehensive checkup each year to stay informed about your health status.
- **Paid Leave**: 15 days of paid annual leave plus 5 days of paid sick leave each year, so you can rest and recharge outside of work.

### 💸 Subsidies & Incentives
- **Transportation Subsidy**: 500 yuan monthly to help ease your commuting costs.
- **Meal Subsidy**: 300 yuan monthly so you can enjoy great meals on workdays.
- **Education & Training Fund**: The company values employee growth and development, offering up to 8000 yuan annually that you can apply for courses or professional books to improve yourself.

### 🏋️‍♂️ Fitness & Recreation
- **Gym Discounts**: We've partnered with multiple gyms where employees can enjoy discounted rates to stay in great shape.

If you have any questions or need further assistance, please feel free to contact the HR department. We look forward to every day you spend at [Company Name] being full of energy and happiness!

【Reference Information】
{retrieved_text}
"""

# Call the model again with the optimized prompt
final_response = llm.invoke(optimized_prompt)
print("--- Response Using Optimized Prompt ---")
print(final_response)


Through this iteration process, you'll get a response with a warm tone and clear structure, one that is likely to leave a better first impression on new employees.

This example demonstrates the tremendous value of Meta Prompting in improving user experience. It's not just about extracting information, but about **how to better present information**. When you encounter a tricky prompt problem, don't forget that your model itself is the best coach. By clearly describing your goals and challenges, you can guide it to build more powerful, precise, and "human-touch" prompts for you.

In the initial approach above, we described our expected **qualitative goals** (like "friendly," "clearer structure") to the "AI coach," and it directly generated an optimized prompt for us. This method is convenient, but it has a key limitation: the AI coach's understanding of these qualitative goals may not be precise enough, and your description of requirements may not be specific enough, leading to uncertainty in optimization results.

To address this problem and make the optimization process more controllable and precise, we need to upgrade from "qualitative guidance" to "quantitative alignment." Instead of giving a vague goal, it's better to provide a perfect **"reference answer"** as a precise target. The following advanced approach will show you how to use this "reference answer" to let the LLM gradually and precisely approach the optimal result through automated gap analysis and iteration. This is a more engineering-oriented improvement approach.

#### Multi-Round Iteration: Introducing Reference Answers for Gap Analysis

In the previous example, you played the leading role, receiving the "AI coach's" suggestions and manually applying them. But this process can be further automated and made more precise. Rather than having the evaluator give a vague "good" or "bad" judgment, a more advanced method is to introduce a **"Reference Answer."**

This "reference answer" is your ideal perfect answer, which can be written by human experts or generated by the most powerful model using a very detailed prompt. The goal of iterative optimization becomes: **continuously modify the prompt so that the gap between its generated response and this "reference answer" becomes smaller and smaller**.

This process is like a self-correcting flow with precise guidance:

1. **Set Reference Answer**: First, define a high-quality, ideal "reference answer."
2. **Generate**: Use the current prompt to be optimized to generate a response.
3. **Analyze Gap**: Have an "evaluator" LLM (Critic) compare the "generated response" with the "reference answer" and output a detailed "gap analysis report" pointing out specific differences in tone, structure, content, format, etc.
4. **Optimize**: Pass the "gap analysis report" along with the original prompt and generated response to an "optimizer" LLM. Its task is to rewrite the prompt based on this report to specifically address the issues identified, thereby reducing the gap.
5. **Repeat**: Replace the old prompt with the optimized new one, then go back to step 2, until the "evaluator" considers the gap small enough or the maximum iterations are reached.


In [None]:
# 1. Set Reference Answer
reference_answer = """
👋 Welcome to our big family! I'm excited to introduce our awesome benefits:

**🏥 Health & Leave, We've Got You Covered:**
- **Comprehensive Health Insurance**: Covers you and your family, so you can work with peace of mind.
- **Annual Health Checkup**: Your health matters to us, always.
- **Paid Annual Leave**: A full 15 days to explore poetry and distant places!
- **Paid Sick Leave**: 5 days to rest and recover your energy.

**💰 Subsidies & Incentives, Cheering You On:**
- **Transportation Subsidy**: 500 yuan monthly for an easier commute.
- **Meal Subsidy**: 300 yuan monthly, enough to add an extra drumstick to your lunch!
- **Education & Training Fund**: Up to 8000 yuan annually. Invest in yourself, the future is bright.
- **Gym Discounts**: Partnered with multiple gyms. Don't forget to exercise no matter how busy!

Hope these benefits make you feel the company's care! Looking forward to creating more value together! 🎉
"""

# 2. Define gap analysis and optimization functions
def analyze_gap(generated_response, reference):
    gap_analysis_prompt = f"""
    【Role】You are a text comparison expert.
    【Task】Please compare in detail the gaps between the 【Generated Response】 and the 【Reference Answer】.
    【Reference Answer】
    {reference}
    ---
    【Generated Response】
    {generated_response}
    ---
    【Requirements】
    Please output a detailed gap analysis report covering tone, structure, content details, format (such as emoji usage), etc. If there's almost no gap between them, simply answer "Gap is minimal."
    """
    return llm.invoke(gap_analysis_prompt)

def optimize_prompt_with_gap_analysis(current_prompt, generated_response, gap_report):
    optimization_prompt = f"""
    【Role】You are a top-tier prompt engineer.
    【Task】Based on the provided "Gap Analysis Report," optimize the "Current Prompt" so it can generate output closer to the "Reference Answer."
    ---
    【Current Prompt】
    {current_prompt}
    ---
    【Generated Response】
    {generated_response}
    ---
    【Gap Analysis Report】
    {gap_report}
    ---
    【Requirements】
    Please return only the optimized new prompt without any other explanations.
    """
    return llm.invoke(optimization_prompt)

# 3. Iterative optimization loop
current_prompt = initial_prompt
for i in range(3): # Maximum 3 iterations
    print(f"--- Iteration {i+1} ---")
    generated_response = llm.invoke(current_prompt.format(retrieved_text=retrieved_text))
    print(f"Generated Response (partial):\n{generated_response[:100]}...")
    
    gap_report = analyze_gap(generated_response, reference_answer)
    print(f"Gap Analysis Report:\n{gap_report}")
    
    if "Gap is minimal" in gap_report or "差距很小" in gap_report:  # "差距很小" is Chinese for "the gap is small"
        print("\nEvaluation passed, optimization complete!")
        break
    
    print("\nEvaluation not passed, optimizing prompt based on gap analysis report...")
    current_prompt = optimize_prompt_with_gap_analysis(current_prompt, generated_response, gap_report)
else:
    print("\nMaximum iterations reached, stopping optimization.")

final_prompt_based_on_reference = current_prompt

This automated iteration process demonstrates the true power of Meta Prompting. It frees you from tedious manual adjustments, letting the model explore and discover the best ways of expression on its own, and provides foundational ideas for designing more complex **AI Self-Improvement** systems.
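
One practical caveat for the loop above: each rewritten prompt comes back from the LLM as free text, so it may drop the `{retrieved_text}` placeholder or contain stray braces (for example a JSON snippet) that make `str.format` raise an error. The following is a minimal sketch of a more forgiving substitution step, assuming the prompt template is meant to carry a literal `{retrieved_text}` placeholder. The helper name `fill_template` and its fallback behavior are illustrative assumptions, not part of the `chatbot` package.

```python
def fill_template(template: str, retrieved_text: str,
                  placeholder: str = "{retrieved_text}") -> str:
    """Insert retrieved_text into an LLM-written prompt template.

    Hypothetical helper for illustration. It uses str.replace instead of
    str.format, so unrelated braces in the template are left untouched
    rather than raising KeyError or ValueError.
    """
    if placeholder not in template:
        # The optimizer dropped the placeholder: append the reference text
        # so the model still sees the retrieved context.
        return f"{template.rstrip()}\n\nReference Information:\n{retrieved_text}"
    return template.replace(placeholder, retrieved_text)


# Usage with a made-up template that contains JSON-style braces.
demo_template = 'Answer as JSON like {"answer": "..."}.\n{retrieved_text}'
demo_prompt = fill_template(demo_template, "15 days of paid annual leave")
print(demo_prompt)
```

In the loop, a call such as `fill_template(current_prompt, retrieved_text)` could stand in for `current_prompt.format(retrieved_text=retrieved_text)`. Whether appending the reference text is the right fallback depends on your prompt design; raising an error and retrying the optimizer is an equally reasonable choice.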

#### Effectiveness Evaluation: Quantifying Your Optimization Results

When you have multiple versions of prompts (e.g., initial version vs. single-optimization version vs. multi-round iteration final version), how do you objectively prove which is better? Beyond intuitive feelings, a more scientific approach is **quantitative evaluation**.

You can leverage the LLM again, having it play the role of a "Grader" to score responses generated by different prompts according to a series of criteria.

For example, you can define scoring criteria:

* **Friendliness**: 1-5 points
* **Clarity of Structure**: 1-5 points
* **Information Accuracy**: 1-5 points

Then, construct a "Grader Prompt" that passes the scoring criteria and the response to be evaluated to the LLM, having it output a structured scoring result (like JSON).


In [None]:
import json
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import pandas as pd

# Three typical response samples with obvious quality differences
# Poor: Simple listing of information, no structure or emotion
poor_response = "Company benefits: Health insurance, family can use. 15 days annual leave, 5 days sick leave. Transportation subsidy 500, meal subsidy 300. Training fund 8000. Gym has discounts."

# Medium: Has basic structure and categories, but bland tone
medium_response = """
Company Benefits:
1. 💦 Health and Leave:
   - Health insurance (including family)
   - Annual checkup
   - 15 days annual leave and 5 days sick leave
2. 💰 Subsidies and Incentives:
   - Monthly 500 transportation subsidy and 300 meal subsidy
   - 8000 yuan/year education and training fund
   - Partner gym discounts
"""

# Good: Clear structure, friendly tone, strong visual appeal (directly use our reference answer)
good_response = reference_answer 

# Design more detailed evaluation dimensions
def grade_response_detailed(response_to_grade):
    grader_prompt = f"""
    【Role】You are an experienced internal communications and employee experience evaluator.
    【Task】Please rate the provided "Company Benefits Introduction" text on a 1-5 scale across the following four dimensions.

    【Scoring Dimensions】
    1. **welcoming_tone**: 1 means cold and stiff tone, 5 means very warm and infectious.
    2. **structuring**: 1 means chaotic and disorganized, 5 means clear categories and strong logic.
    3. **visual_appeal**: 1 means dull and boring, 5 means good use of emojis, bold text, etc., very eye-catching.
    4. **completeness**: 1 means serious information gaps, 5 means all key benefit information is complete.

    【Text to Evaluate】
    {response_to_grade}
    ---
    【Output Requirements】
    Please strictly return your scores in JSON format without any explanations. For example:
    {{"welcoming_tone": 5, "structuring": 4, "visual_appeal": 5, "completeness": 5}}
    """
    try:
        raw_output = llm.invoke(grader_prompt)
        # Extract JSON portion
        json_str = raw_output[raw_output.find('{'):raw_output.rfind('}')+1]
        return json.loads(json_str)
    except (json.JSONDecodeError, IndexError):
        # Error handling, return default low scores when parsing fails
        return {"welcoming_tone": 1, "structuring": 1, "visual_appeal": 1, "completeness": 1}

# Score the three typical samples
scores = {
    "Original Answer": grade_response_detailed(poor_response),
    "Single Iteration Optimize": grade_response_detailed(medium_response),
    "Multi-turn Iteration Optimize": grade_response_detailed(good_response)
}

# Convert scores to DataFrame and visualize
df = pd.DataFrame(scores)
df = df.reset_index().rename(columns={'index': 'Dim'})
df_long = df.melt(id_vars='Dim', var_name='Version', value_name='Score')

# --- Grouped bar chart ---
plt.figure(figsize=(10, 6))
ax = sns.barplot(
    data=df_long,
    x="Dim",
    y="Score",
    hue="Version",
    palette="viridis"
)

# Add value labels to each bar
for p in ax.patches:
    height = p.get_height()
    if height == 0:
        continue
    ax.annotate(
        f"{height}",
        (p.get_x() + p.get_width() / 2., height),
        ha='center', va='center',
        xytext=(0, 5),
        textcoords='offset points',
        fontsize=11
    )

ax.set_ylim(0, 6)
ax.set_ylabel('Score (1-5)', fontsize=12)
ax.set_xlabel('')
ax.set_title('Evaluation', fontsize=20)
ax.tick_params(axis='x', labelsize=12)

plt.legend()
plt.tight_layout()
plt.show()


In the chart, scores should rise from the plain listing to the structured version to the reference-style answer. Note that the three samples here are hand-written stand-ins for each stage; to evaluate your own prompts, grade the responses they actually generate. Through this quantitative evaluation, you can turn vague feelings of "good" or "bad" into clear, measurable data, validate each prompt change with evidence, and iterate toward a higher-quality user experience.
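
The grader above falls back to all-1 scores when parsing fails, which is hard to tell apart from a genuinely poor response. Below is a sketch of a stricter parser that returns `None` instead, so the caller can retry or skip the sample. The function name `parse_scores` and the constant `EXPECTED_DIMENSIONS` are assumptions made for this illustration; the dimension names are the ones used in the grader prompt.

```python
import json
from typing import Optional

EXPECTED_DIMENSIONS = ("welcoming_tone", "structuring", "visual_appeal", "completeness")


def parse_scores(raw_output: str) -> Optional[dict]:
    """Extract and validate grader scores from raw LLM text.

    Returns a dict of integer scores, or None if the output is unusable.
    """
    start, end = raw_output.find("{"), raw_output.rfind("}")
    if start == -1 or end <= start:
        return None  # no JSON object in the output
    try:
        data = json.loads(raw_output[start:end + 1])
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    scores = {}
    for name in EXPECTED_DIMENSIONS:
        value = data.get(name)
        # Reject missing keys, non-integers (bool is an int subclass), and out-of-range values.
        if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 5:
            return None
        scores[name] = value
    return scores


print(parse_scores('Sure! {"welcoming_tone": 5, "structuring": 4, "visual_appeal": 5, "completeness": 5}'))
print(parse_scores("I cannot grade this."))
```

Separating "could not be graded" from "graded as poor" keeps parsing failures from dragging down the chart for a prompt version that may actually be fine.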

### 4.7 Let the LLM Help You Build a Custom "AI Judge"

The automated optimization loop in the previous section is very powerful, but it raises a key engineering question: when does this loop stop?

The key issue is that relying on vague evaluations like "is the gap big or small" is unreliable. You need an "**AI Judge**" that can give a clear "**✅ Pass**" or "**❌ Fail**" conclusion based on preset quantified standards for each optimization result, thereby providing a clear and reliable stopping condition for your automated workflow.

<div align="center">
  <a href="https://img.alicdn.com/imgextra/i4/O1CN01Ie6KDU1pMWUJ2Ky25_!!6000000005346-2-tps-1115-614.png" target="_blank">
      <img src="https://img.alicdn.com/imgextra/i4/O1CN01Ie6KDU1pMWUJ2Ky25_!!6000000005346-2-tps-1115-614.png" width="700">
    </a>
  <p>Figure: Constructing an "AI Judge" via Meta-Prompting to replace the <code>analyze_gap()</code> method and enhance judgment accuracy.</p>
</div>


#### The "AI Judge" Training Process

The process of building an "AI Judge" can be seen as training a machine learning classifier. The "AI Judge" prompt you're building is the "model" to be trained, and the standard samples you prepare are the "dataset." The whole process uses data to "train" your "model."

1. **Prepare Dataset**: First create a batch of high-quality labeled samples (containing both "good answers" and "bad answers").
2. **Iterative "Training"**:
   - a. Automatically split this data proportionally (e.g., 7:3) into "training set" and "evaluation set."
   - b. Use the current "judge prompt" to make "predictions" on both the **training set** and **evaluation set**, recording the accuracy scores for each.
   
   Why monitor both scores? Tracking changes in these two scores helps you determine if the "model" is "overfitting," meaning it just "memorized" the training questions but didn't truly improve its judgment ability on new questions (evaluation set).
   
   - c. Find the "wrong answers" on the **training set**.
   - d. Take these "wrong answers" along with your analysis, and through Meta Prompting, have the LLM help you optimize the "judge prompt" to generate a new version.
   - e. Repeat steps b-d. You can stop "training" at any time.

3. **Stop "Training"**:
   In each iteration, you need to intervene and analyze:
   - Are both training accuracy and evaluation accuracy steadily improving?
   - Has evaluation accuracy stagnated or suddenly dropped (possible overfitting)?
   - Do the newly generated "wrong answers" expose ambiguity or contradictions in the samples themselves? (Poor sample quality)

> Note: If sample quality is poor, causing inconsistent evaluation standards or ambiguous judgments on similar questions, the LLM may never be able to construct a well-performing "AI Judge." At this point, you can stop iteration and first optimize your sample data.

Finally, you decide when to stop "training" and adopt the version of the prompt that performs best on the evaluation set as your final "**AI Judge**."
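
The stopping criteria above can be sketched as two small helpers: one that picks the prompt version with the best evaluation accuracy, and one that flags the overfitting pattern (training accuracy rises while evaluation accuracy drops). The function names, the `history` record layout, and the accuracy numbers are all made up for illustration.

```python
def pick_best_version(history):
    """Return the history entry with the highest evaluation accuracy.

    `history` is a list of dicts with keys "version", "train_acc", "eval_acc".
    Ties go to the earliest version, since max() keeps the first maximum it finds.
    """
    if not history:
        raise ValueError("history is empty")
    return max(history, key=lambda entry: entry["eval_acc"])


def looks_overfit(previous, current):
    """Flag the case where training accuracy rises while evaluation accuracy drops."""
    return (current["train_acc"] > previous["train_acc"]
            and current["eval_acc"] < previous["eval_acc"])


# Made-up accuracy numbers, for illustration only.
history = [
    {"version": "v1", "train_acc": 0.60, "eval_acc": 0.70},
    {"version": "v2", "train_acc": 0.80, "eval_acc": 0.85},
    {"version": "v3", "train_acc": 0.95, "eval_acc": 0.75},
]
best = pick_best_version(history)
print(best["version"])
print(looks_overfit(history[1], history[2]))
```

These checks only support the human decision described above. A flagged iteration is a prompt to inspect the misjudged samples, not an automatic verdict.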


<div align="center">
    <a href="https://img.alicdn.com/imgextra/i1/O1CN01gJT1AY22xhe34lZwM_!!6000000007187-2-tps-1864-598.png" target="_blank">
    <img src="https://img.alicdn.com/imgextra/i1/O1CN01gJT1AY22xhe34lZwM_!!6000000007187-2-tps-1864-598.png" width="800"/>
    </a>
  <p>Figure: Iterating the AI Judge via ML training; stop based on train vs. eval performance.</p>
</div>



#### 4.7.1 Prepare and Split the Dataset

First, you need a batch of samples carefully labeled by you or domain experts. This is the foundation of the entire process and the cornerstone of quality.


In [None]:
# Prepare a batch of high-quality labeled samples
# In practice, these samples can be uploaded to AI platforms (like Bailian) to create evaluation sets
labeled_samples = [
    # --- 3 samples for "training" ---
    {
        "id": "train_01",
        "response": "Company benefits: Health insurance, family can use. 15 days annual leave, 5 days sick leave. Transportation subsidy 500, meal subsidy 300. Training fund 8000. Gym has discounts.", 
        "label": "Fail", 
    },
    {
        "id": "train_02",
        "response": """Company benefits include:\n1. Health and leave: Health insurance, annual checkup, 15 days annual leave and 5 days sick leave.\n2. Subsidies and incentives: Monthly 500 transportation subsidy and 300 meal subsidy, plus 8000 yuan training fund.""", 
        "label": "Fail", 
    },
    {
        "id": "train_03",
        "response": """👋 Welcome! We've prepared awesome benefits for you:\n- 🏥 Comprehensive health insurance and annual checkup\n- 🌴 15 days annual leave + 5 days sick leave\n- 💰 Monthly transportation and meal subsidies\n- 🎓 Up to 8000 yuan training fund\nLooking forward to working with you! 🎉""",
        "label": "Pass", 
    },
    
    # --- 2 samples for "evaluation" ---
    {
        "id": "eval_01",
        "response": "We have health insurance, annual leave, sick leave, transportation and meal subsidies, training fund and gym discounts.", 
        "label": "Fail", 
    },
    {
        "id": "eval_02",
        "response": """Hello, new colleague! Company benefits are great:\n- Health includes insurance and checkups.\n- Leave includes annual and sick leave.\n- Money includes subsidies and education fund.\nWish you a happy work experience!""", 
        "label": "Pass", 
    },
]

# Split dataset proportionally (manual simulation here)
# In real projects, use train_test_split from scikit-learn for random splitting
train_set = labeled_samples[:3]
eval_set = labeled_samples[3:]
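
For a larger sample set, the manual slicing above can be replaced by a seeded random split. The sketch below uses only the standard library; the helper name `split_dataset`, the default 7:3 ratio, and the demo samples are assumptions for illustration. With scikit-learn installed, `train_test_split` serves the same purpose.

```python
import random


def split_dataset(samples, train_ratio=0.7, seed=42):
    """Shuffle a copy of `samples` and split it into (train, eval) lists."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to split")
    if not 0 < train_ratio < 1:
        raise ValueError("train_ratio must be between 0 and 1")
    shuffled = list(samples)  # copy so the caller's list is not reordered
    random.Random(seed).shuffle(shuffled)  # seeded for a reproducible split
    cut = round(len(shuffled) * train_ratio)
    cut = min(max(cut, 1), len(shuffled) - 1)  # keep at least one sample on each side
    return shuffled[:cut], shuffled[cut:]


# Made-up samples; real ones would carry "response" and "label" fields as above.
demo_samples = [{"id": f"sample_{n:02d}", "label": "Pass" if n % 2 else "Fail"} for n in range(10)]
demo_train, demo_eval = split_dataset(demo_samples)
print(len(demo_train), len(demo_eval))
```

The five-sample walkthrough in this section keeps the fixed manual split, so the accuracy figures discussed later refer to that split rather than to a random one.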


#### 4.7.2 Iterative "Training"

By having the model continuously learn from "wrong answers," the "judge prompt" becomes smarter and smarter.


In [None]:
# Define a helper function to evaluate prompt accuracy on a specified dataset
def evaluate_judge_prompt(judge_prompt, dataset):
    correct_predictions = 0
    misjudged_cases = []
    
    for sample in dataset:
        prompt = judge_prompt.format(response_to_judge=sample["response"])
        
        # Simulate the LLM call. In a real run: predicted_label = llm.invoke(prompt)
        # For demonstration, manually simulate how each prompt version behaves.
        predicted_label = "Pass" if "Welcome" in sample["response"] or "Hello" in sample["response"] else "Fail"
        # The v1 prompt is too lenient: it also passes any numbered list.
        if "v1" in judge_prompt and "1." in sample["response"]:
            predicted_label = "Pass"

        is_correct = (predicted_label == sample["label"])
        if is_correct:
            correct_predictions += 1
        else:
            misjudged_cases.append({
                "response": sample["response"],
                "current_judgment": predicted_label,
                "correct_judgment": sample["label"]
            })
            
    accuracy = correct_predictions / len(dataset)
    return accuracy, misjudged_cases

# ----------------- Iterative "Training" Begins -----------------

# a. Prepare initial version of "judge prompt" (v1.0)
judge_prompt_v1 = """
# v1
【Role】You are an employee experience expert.
【Task】Judge whether the 【Response to Evaluate】 is qualified.
【Evaluation Criteria】
1. Clear structure: Information listed in points or categories.
2. Complete information: Mentions core benefits.
【Response to Evaluate】
---
{response_to_judge}
---
【Output Requirements】Please only answer "Pass" or "Fail".
"""

current_judge_prompt = judge_prompt_v1
max_iterations = 2

for i in range(max_iterations):
    print(f"--- Iteration {i+1} ---")
    
    # b. Make "predictions" on training and evaluation sets and record scores
    train_accuracy, misjudged_on_train = evaluate_judge_prompt(current_judge_prompt, train_set)
    eval_accuracy, _ = evaluate_judge_prompt(current_judge_prompt, eval_set)
    
    print(f"Training Set Accuracy: {train_accuracy:.0%}")
    print(f"Evaluation Set Accuracy: {eval_accuracy:.0%}")

    if not misjudged_on_train:
        print("\nNo errors on training set, training complete.")
        break
        
    # c. Find "wrong answers" on training set
    first_error = misjudged_on_train[0]
    print(f"\nFound wrong answer: Model misjudged a response that should be '{first_error['correct_judgment']}' as '{first_error['current_judgment']}'")

    # d. Package "wrong answers" and have LLM help optimize the prompt
    judge_optimizer_prompt = f"""
    【Role】You are a top-tier prompt engineer.
    【Background】My current judge prompt made an error when evaluating a sample.
    
    【My Current Judge Prompt】
    ---
    {current_judge_prompt}
    ---
    
    【Error Case】
    - Response to evaluate: "{first_error['response']}"
    - My tool's wrong judgment: "{first_error['current_judgment']}"
    - Expected correct judgment: "{first_error['correct_judgment']}"
    - Reason I think it erred: Current criteria are too loose, didn't emphasize 'warm and friendly' tone.
    
    【Task】Please rewrite my judge prompt to make criteria stricter and correct the above error.
    【Requirements】Please only return the optimized new prompt.
    """
    
    print("Optimizing prompt based on wrong answers...")
    # In a real run you would call: new_judge_prompt = llm.invoke(judge_optimizer_prompt)
    # For this demonstration, the optimized prompt is hard-coded instead.
    new_judge_prompt = """
    # v2
    【Role】You are an internal communications expert pursuing ultimate employee experience, with a discerning eye.
    【Task】Strictly judge whether the 【Response to Evaluate】 is qualified.
    【Evaluation Criteria】
    1. **Welcoming Atmosphere (Required)**: Must have clear, warm welcome greeting.
    2. **Structured Presentation (Required)**: Must use lists or sections.
    【Response to Evaluate】
    ---
    {response_to_judge}
    ---
    【Output Requirements】Please only answer "Pass" or "Fail".
    """
    
    # e. Update to new version prompt, enter next iteration
    current_judge_prompt = new_judge_prompt
    print("-" * 20 + "\n")

print("\n--- Iteration Ended ---")
final_judge_prompt = current_judge_prompt


#### 4.7.3 "Good Enough" Means You Can "Stop Training"

In the iteration loop above, you can clearly see the score changes:
- Iteration 1:
    - Training Set Accuracy: 67% (v1 prompt misjudged train_02)
    - Evaluation Set Accuracy: 100% (v1 prompt happened to judge both correctly)
- Iteration 2:
    - Training Set Accuracy: 100% (v2 prompt corrected the error)
    - Evaluation Set Accuracy: 100%

In this simple example, evaluation accuracy stayed high throughout. But in real scenarios, you might see evaluation accuracy stagnate or even decline.

Based on the above data, the v2 prompt achieved 100% accuracy on both the training and evaluation sets, so you can adopt it (stored in `final_judge_prompt`) as your final, validated "AI Judge." If evaluation results are unsatisfactory, go back to the beginning and check and improve your `labeled_samples`.
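
The evaluation helper in 4.7.2 simulates the judge with keyword rules. A version that calls a real model could look like the sketch below, where `invoke` is any callable that takes a prompt string and returns the model's text reply (for example `llm.invoke` from this tutorial). The function name, the stand-in `fake_invoke`, and the demo data are assumptions so that the block runs offline.

```python
def evaluate_judge_with_llm(judge_prompt, dataset, invoke):
    """Score a judge prompt on a labeled dataset using a model call.

    Returns (accuracy, misjudged_cases) in the same shape as the simulated helper.
    """
    misjudged = []
    for sample in dataset:
        prompt = judge_prompt.format(response_to_judge=sample["response"])
        reply = invoke(prompt)
        # Treat anything other than a clear "Pass" as "Fail" (conservative default).
        predicted = "Pass" if reply.strip().strip('."').lower() == "pass" else "Fail"
        if predicted != sample["label"]:
            misjudged.append({
                "response": sample["response"],
                "current_judgment": predicted,
                "correct_judgment": sample["label"],
            })
    accuracy = 1 - len(misjudged) / len(dataset)
    return accuracy, misjudged


# Stand-in for a real model: it passes only prompts that contain the word "Welcome".
def fake_invoke(prompt):
    return "Pass." if "Welcome" in prompt else "Fail"


demo_judge_prompt = "Judge this response:\n{response_to_judge}\nAnswer Pass or Fail."
demo_dataset = [
    {"response": "Welcome! Here are your benefits.", "label": "Pass"},
    {"response": "Benefits: insurance, leave.", "label": "Fail"},
    {"response": "Hello! Benefits are listed below.", "label": "Pass"},
]
demo_accuracy, demo_errors = evaluate_judge_with_llm(demo_judge_prompt, demo_dataset, fake_invoke)
print(f"{demo_accuracy:.0%}", len(demo_errors))
```

Passing the model call in as a parameter makes it easy to swap a stub for the real model during testing. Because real model output varies between runs, accuracy on a small dataset can fluctuate.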

#### 4.7.4 Use the "AI Judge" to Guide Automated Optimization

Now you have an "AI Judge" (`final_judge_prompt`) that's been repeatedly validated and tested, ready for deployment.

Use it to replace the gap-analysis step in section 4.6, which compared a generated response against a reference answer to decide whether it was good enough. This gives the workflow an explicit stopping condition and makes it more rigorous.

**New Workflow:**
1. **Generate**: Use the current "generation prompt" to produce a response.
2. **Judge**: Call your trained "AI Judge prompt" to make a "Pass/Fail" judgment on the response.
3. **Decide & Optimize**:
    - If the judgment is "**Pass**," the loop ends and optimization is successful.
    - If the judgment is "**Fail**," pass the "generation prompt," "failed response," and the "**AI Judge's rulebook**" (i.e., its prompt) together to the "optimizer," letting it generate a new version prompt based on the clear failure criteria.


In [None]:
# Define a more powerful optimization function that directly references the "judge's" rules
def optimize_with_judge_rules(current_prompt, failed_response, judge_prompt):
    """Call LLM to optimize generation prompt based on judge's rules"""
    optimizer_prompt = f"""
    【Role】You are a top-tier prompt engineer.
    【Background】I have a "generation prompt," but the response it generates failed the "AI Judge's" review.
    
    【Generation Prompt】
    ---
    {current_prompt}
    ---
    
    【Its Failed Response】
    ---
    {failed_response}
    ---
    
    【AI Judge's Rulebook (Reason for Failure)】
    ---
    {judge_prompt}
    ---
    
    【Task】
    Please carefully study the "AI Judge's Rulebook" and rewrite the "Generation Prompt" to ensure the new prompt can generate a response that passes the rulebook's review.
    
    【Requirements】
    Please only return the optimized new "Generation Prompt" without any other explanations.
    """
    new_prompt = llm.invoke(optimizer_prompt)
    return new_prompt

# --- Run the automated optimization loop driven by "AI Judge" ---
current_generating_prompt = initial_prompt
max_iterations = 5

print("\n" + "="*30)
print("Starting the final automated optimization workflow driven by 'AI Judge'")
print("="*30 + "\n")

for i in range(max_iterations):
    print(f"--- Final Workflow: Iteration {i+1} ---")
    
    # 1. Generate response
    prompt_for_generator = current_generating_prompt.format(retrieved_text=retrieved_text)
    generated_response = llm.invoke(prompt_for_generator)
    
    print(f"Generated Response:\n---\n{generated_response}\n---")
    
    # 2. Call "AI Judge" for judgment
    judge_prompt_filled = final_judge_prompt.format(response_to_judge=generated_response)
    judgment = llm.invoke(judge_prompt_filled)
    
    judgment_cleaned = judgment.strip().replace(".", "")
    print(f"'AI Judge' Verdict: {judgment_cleaned}")

    # 3. Decision
    if judgment_cleaned in ("Pass", "通过"):  # "通过" is Chinese for "Pass"
        print("\n✅ Optimization successful! Generated response passed the 'AI Judge' review.")
        break
    else:
        print("❌ Did not pass review, optimizing based on 'Judge's' rules...")
        current_generating_prompt = optimize_with_judge_rules(
            current_generating_prompt, 
            generated_response, 
            final_judge_prompt
        )
        print("-" * 20 + "\n")
else:
    print("\nMaximum iterations reached, stopping optimization.")

print("\n" + "="*30)
print("Final Adopted Generation Prompt:")
print("="*30)
print(current_generating_prompt)

The "**AI Judge**" makes your prompt optimization workflow more controllable:
- Whenever the "generation prompt" produces a new response, first have the "AI Judge" issue a Pass/Fail verdict on it.
- **Passed means optimization successful; failed means use this "bad answer" as new material to continue iterating your generation prompt**.

You can always adopt this approach: **first build the "judge," then optimize the "player."** Build a measurable, trustworthy, automatable "**Prompt Optimization Engineering**" system.

## 5. Using LLMs for intent recognition

After learning about prompt engineering techniques, your colleagues have realized that the Q&A bot can be used not only to retrieve information but also to assist in reviewing documents and translating content. While the Q&A bot is generally effective at identifying issues, it still has some limitations, as shown in the following example:


In [None]:
rag.ask('Please help me review this sentence: Does a technical content engineer need to design and develop high-quality educational materials and courses?', query_engine=query_engine)

From the experimental results above, it's clear that the Q&A bot responded directly to the user's statement without recognizing the intended task of reviewing a sentence for errors. To understand why, let's look at what content the Q&A bot retrieved from the knowledge base before generating its response.

The following excerpt shows part of the retrieved content:

```
Core Responsibilities
Integrating educational theory with technical practice, support learners' growth and development through high-quality content creation.
```

You can refer to <2_4_Automate_Evaluation_of_FAQ_Bot_Performance> to learn how to view the specific fragments retrieved from the knowledge base.

As seen from the retrieved content, most of the information relates to the responsibilities of a technical content engineer. This context likely influenced the Q&A bot's response, causing it to focus on content creation or editing rather than performing the requested document review.

Since contextual information greatly impacts the responses generated by LLMs, RAG is not the right choice for every query. You can leverage LLMs to first perform intent recognition, classifying the type of user question.

For example:

- If the task involves document review, grammar checking, or content translation, the input should bypass the RAG pipeline and be handled directly by the LLM.
- Only internal knowledge queries, such as "What are the guidelines for writing API documentation?", should go through the RAG pipeline to generate answers based on the knowledge base.
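
The routing rule in the two bullets above can be sketched as a small dispatcher. Everything here is illustrative: `route_question` and the three stand-in functions are hypothetical names, and the category strings are borrowed from the classification prompt built in section 5.1. In a real setup, `classify` would call the LLM with that prompt, and the two answer functions would wrap the RAG query engine and a plain LLM call.

```python
RAG_INTENT = "Internal Company Document Query"


def route_question(question, classify, answer_with_rag, answer_directly):
    """Send a question down the RAG path only when it is an internal knowledge query.

    All three callables are injected, so the sketch stays independent of any
    particular model or retrieval library.
    """
    intent = classify(question).strip()
    if intent == RAG_INTENT:
        return answer_with_rag(question)
    # Document review, translation, and anything unrecognized skip retrieval.
    return answer_directly(question)


# Stand-ins so the sketch runs without a model or knowledge base.
def demo_classify(question):
    lowered = question.lower()
    if "translate" in lowered:
        return "Content Translation"
    if "review" in lowered:
        return "Document Review"
    return "Internal Company Document Query"


def demo_rag(question):
    return f"[RAG] {question}"


def demo_llm(question):
    return f"[LLM] {question}"


print(route_question("Please help me review this sentence: ...", demo_classify, demo_rag, demo_llm))
print(route_question("What are the guidelines for writing API documentation?", demo_classify, demo_rag, demo_llm))
```

Sending unrecognized intents to the plain LLM is a design choice in this sketch; defaulting to RAG instead would be equally defensible if most traffic consists of knowledge queries.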

<img src="https://img.alicdn.com/imgextra/i3/O1CN01nWw7IO2AGCuuujq8G_!!6000000008175-2-tps-3329-1398.png" width="1000">  



There are two main methods for using LLMs to perform intent recognition:

- Using prompts: By designing specific prompts, you can guide the LLM to generate responses that meet your expectations. This method does not require modifying the model's parameters; instead it relies on well-constructed inputs to activate knowledge already embedded within the model.
- Fine-tuning the model: Based on a pre-trained foundational model, you can further train the model using specific annotated data to improve its ability to classify intents. This process involves adjusting some or all of the model's parameters to better suit your use case.

In this section, we will focus on helping the LLM perform intent recognition by crafting effective prompts. You can explore the method of fine-tuning in the subsection <2_7_Improve_Model_Accuracy_and_Efficiency_via_Fine_Tuning>.


### 5.1 Intent recognition

Next, we will construct prompts to enable LLMs to classify questions. Since many applications require structured output after intent recognition (for example, for document review or integration with RAG pipelines), the following prompting techniques will be used to ensure accurate classification of user queries:

* Specify the output format: Clearly define the output format to make classification results standardized and easy to parse.
* Few-shot examples: Include examples to help the LLM understand the characteristics and classification rules of each category.
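
As a rough sketch of how these two techniques combine, the few-shot block of a classification prompt can be generated from data instead of being typed by hand. The helper name `build_classifier_prompt` and the `CATEGORIES` list are illustrative assumptions and are not part of the `chatbot` package.

```python
# Hypothetical helper: assemble a classification prompt from categories and few-shot examples.
CATEGORIES = [
    "Internal Company Document Query",
    "Content Translation",
    "Document Review",
]

def build_classifier_prompt(categories, examples):
    """Return a prompt that fixes the output format and lists few-shot examples."""
    lines = ["Classify the user input into exactly one of these categories:"]
    lines += [f"{i}. {name}" for i, name in enumerate(categories, start=1)]
    # Output format: the category name only, so the result is easy to parse.
    lines.append("Output only the category name without additional explanation.")
    lines.append("")
    # Few-shot examples: one input/category pair per example.
    for i, (user_input, category) in enumerate(examples, start=1):
        lines.append(f'Example {i}: User input: "{user_input}"')
        lines.append(f"Category: {category}")
    lines.append("")
    lines.append("User input:")
    return "\n".join(lines) + "\n"

demo_prompt = build_classifier_prompt(
    CATEGORIES,
    [("What are the core responsibilities of a technical content engineer?",
      "Internal Company Document Query")],
)
print(demo_prompt)
```

Keeping the examples in a list makes it easier to add or swap cases when a category is misclassified.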

In [None]:
from chatbot import llm

# Build the prompt
prompt = '''
[Role Background]
You are a question classification router, responsible for determining the type of user questions and categorizing them into one of the following three categories:
1. Internal Company Document Query
2. Content Translation
3. Document Review

[Task Requirements]
Your task is to judge the intent based on the user's input content and select only the most appropriate category. Output only the category name without additional explanation. The judgment criteria are as follows:

- If the question involves company policies, processes, internal tools, or job descriptions and responsibilities, choose "Internal Company Document Query".
- If the input contains any foreign language or words like "translation", choose "Content Translation".
- If the question involves checking or summarizing external documents or link content, choose "Document Review".
- The user's previous inputs have no relation to the question classification; consider each dialogue independently for classification.

[Few-shot Examples]
Example 1: User input: "What are the commonly used project management tools within our company?"
Category: Internal Company Document Query

Example 2: User input: "Please translate the following sentence into French: How can we finish the assignment on time?"
Category: Content Translation

Example 3: User input: "Please review the document at this link: https://www.alibabacloud.com/help/en/model-studio/models#daf364417cjit"
Category: Document Review

Example 4: User input: "Please review the following content: Does a technical content engineer need to design and develop high-quality educational materials and courses?"
Category: Document Review

Example 5: User input: "What are the core responsibilities of a technical content engineer?"
Category: Internal Company Document Query

[User Input]
The following is the user's input, please determine the classification:
'''

# Get the type of question
def get_question_type(question):
    return llm.invoke(prompt + question)

print(get_question_type('https://www.promptingguide.ai/techniques/fewshot'), '\n')
print(get_question_type('Translate the following sentence into German: That is a big one I dont know why'), '\n')
print(get_question_type('As a technical content engineer, what should I pay attention to?'), '\n')

By providing clear output formats and few-shot examples, the Q&A bot can more accurately identify question types and produce responses in the expected format. This optimization makes classification tasks more standardized and lays the foundation for incorporating intent recognition into the Q&A bot.
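
Even with a fixed output format, a model may occasionally add whitespace, punctuation, or a few extra words. The following is a minimal sketch of a post-processing step that maps raw output onto the known category names; `normalize_category` is a hypothetical helper, and the matching rules are assumptions you may want to tighten.

```python
# Hypothetical post-processing step; not part of the original chatbot package.
VALID_CATEGORIES = (
    "Internal Company Document Query",
    "Content Translation",
    "Document Review",
)

def normalize_category(raw_output):
    """Map raw model output to a known category name, or None if there is no single match."""
    cleaned = raw_output.strip().strip('"\'.').lower()
    # Prefer an exact match, ignoring case and surrounding punctuation.
    for name in VALID_CATEGORIES:
        if cleaned == name.lower():
            return name
    # Fall back to a substring match when the model adds extra words.
    matches = [name for name in VALID_CATEGORIES if name.lower() in cleaned]
    return matches[0] if len(matches) == 1 else None

print(normalize_category(" Document Review.\n"))
print(normalize_category("Category: Content Translation"))
print(normalize_category("I am not sure"))
```

Returning `None` for unrecognized output lets the calling code fall back to a default branch instead of silently picking the wrong workflow.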

### 5.2 Applying intent recognition to Q&A bots

Once the Q&A bot can recognize the intent of a user's question, it can first classify the type of query, then use tailored prompts and workflows to generate the most appropriate response.


In [None]:
def ask_llm_route(question):
    question_type = get_question_type(question)
    print(f'Question: {question}\nType: {question_type}')
  
    reviewer_prompt = """
    [Role Background]
    You are a document error-checking expert responsible for identifying obvious errors in documents or web content.
    [Task Requirements]
    - Your response should be concise.
    - If there are no obvious issues, reply directly with 'No issues'.\n
    [Input as follows]\n"""
  
    translator_prompt = """
   [Task Requirements]
    You are a translation expert who identifies text in different languages and translates it into the target language. If not specified, the target language is French.
    [Input as follows]\n"""

    if question_type == 'Document Review':
        return llm.invoke(reviewer_prompt + question)
    elif question_type == 'Internal Company Document Query':
        return rag.ask(question, query_engine=query_engine)
    elif question_type == 'Content Translation':
        return llm.invoke(translator_prompt + question)
    else:
        return "Unable to recognize the question type. Please re-enter."

query_engine = rag.create_query_engine(index=rag.load_index())

In [None]:
# Question 1
print(ask_llm_route('https://www.promptingguide.ai/techniques/fewshot'), '\n')

# Question 2
print(ask_llm_route('Please help me check this document: Do technicel content engineers need to optimize and update content and collaborate across departments?'), '\n')

# Question 3
print(ask_llm_route('Do technical content engineers need to optimize and update content and collaborate across departments?'), '\n')

# Question 4
print(ask_llm_route('Translate the following sentence into French: A true master always carries the heart of a student.'), '\n')

As shown in the above experiments, the optimized Q&A bot does not always go through the RAG pipeline for every question. This approach not only saves computational resources but also prevents the knowledge base from interfering with the model's reasoning process, which could otherwise lead to lower-quality or irrelevant responses.

- Resource savings: For tasks such as checking document errors, the LLM can respond without needing to retrieve reference materials. This highlights that previous implementations may have unnecessarily consumed resources by always using the RAG pipeline.
- Avoiding misunderstandings: In earlier versions, the system retrieved reference materials for every query. These recalled text segments could sometimes confuse the LLM, affecting its ability to accurately understand the question and leading to irrelevant or off-target responses.
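
To make the resource savings concrete, here is a small offline sketch that routes already-classified questions through a dispatch table and counts how often retrieval would run. The handlers `answer_with_rag` and `answer_directly` are stand-ins (assumptions) for `rag.ask` and `llm.invoke`, so no model or knowledge base is called.

```python
# Stub handlers stand in for rag.ask and llm.invoke so the sketch runs offline.
stats = {"retrieval_calls": 0}

def answer_with_rag(question):
    stats["retrieval_calls"] += 1  # a real implementation would query the knowledge base here
    return f"[RAG] {question}"

def answer_directly(question):
    return f"[LLM only] {question}"

# Dispatch table: question type -> handler.
ROUTES = {
    "Internal Company Document Query": answer_with_rag,
    "Document Review": answer_directly,
    "Content Translation": answer_directly,
}

def route(question_type, question):
    handler = ROUTES.get(question_type)
    if handler is None:
        return "Unable to recognize the question type. Please re-enter."
    return handler(question)

classified = [
    ("Document Review", "Please check this sentence."),
    ("Content Translation", "Translate 'hello' into French."),
    ("Internal Company Document Query", "Which project management tools do we use?"),
]
answers = [route(question_type, question) for question_type, question in classified]
print(answers)
print(f"Retrieval calls: {stats['retrieval_calls']} of {len(classified)} questions")
```

In this toy run only the internal knowledge query triggers retrieval; the other two questions go straight to the model.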

## 6. Reasoning LLMs

The prompt techniques and prompt frameworks discussed earlier are widely applicable to general-purpose LLMs, such as Qwen2.5-max, GPT-4, and DeepSeek-V3. These models are designed for a broad range of tasks, including general conversation, knowledge-based Q&A, and text generation. 

In addition to general-purpose LLMs, there is another category of models specifically designed for reasoning: reasoning LLMs. These models excel in tasks that require logical thinking, multi-step problem-solving, and mathematical computation.

### 6.1 What are reasoning LLMs?

You may have already learned about Alibaba Cloud's reasoning LLM [QwQ](https://www.alibabacloud.com/help/en/model-studio/models#fd7d313ca5khs) through the extended reading in Section 2.1. To make it easier to use in this section, the code from Section 2.1 has been updated. Try running the following example:



In [None]:
from openai import OpenAI
import os

def reasoning_model_response(user_prompt, system_prompt="You are a programming assistant.", model="qwq-plus"):
    """
    user_prompt: The prompt input by the user
    system_prompt: The system message that sets the model's role
    model: qwq-plus is used here as the default. You can replace it with other reasoning model names as needed, such as: deepseek-r1
    """
    # Initialize client
    client = OpenAI(
        api_key=os.getenv("DASHSCOPE_API_KEY"),
        base_url=os.getenv("DASHSCOPE_API_BASE")
    )

    # Initialize status variables
    is_answering = False

    # Initiate streaming request
    completion = client.chat.completions.create(
        model=model,
        # messages=[{"role": "user", "content": prompt}],
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt}
        ],
        stream=True,
    )

    # Print thinking process title
    print("\n" + "=" * 20 + "Thinking Process" + "=" * 20 + "\n")

    # Handle streaming response
    for chunk in completion:
        if chunk.choices:
            delta = chunk.choices[0].delta
            if hasattr(delta, 'reasoning_content') and delta.reasoning_content is not None:
                # Process thinking process content
                print(delta.reasoning_content, end='', flush=True)
            else:
                # Switch to answer output mode on the first non-empty answer chunk
                # (a truthiness check also skips chunks whose content is None)
                if delta.content and not is_answering:
                    print("\n" + "=" * 20 + "Complete Response" + "=" * 20 + "\n")
                    is_answering = True
                # Process answer content
                if delta.content:
                    print(delta.content, end='', flush=True)

In [None]:
reasoning_model_response(user_prompt="Who are you?")

As shown in the example, reasoning LLMs include an additional "thinking process" step, which mimics how humans approach complex problems. This allows the model to break down the problem logically before providing a final answer. This structured approach helps reduce errors and improves transparency in the model's decision-making process.

During the thinking process, if the model detects a contradiction, it can backtrack and revise its logic, making it easier for users to understand and verify the reasoning path. This feature is especially useful in scenarios requiring rigorous logic, such as mathematical problem-solving, legal analysis, or scientific reasoning.
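
If you want to inspect the thinking process separately from the final answer rather than printing both as they arrive, the two streams can be collected into strings. The sketch below assumes each chunk exposes `choices[0].delta` with `reasoning_content` and `content` fields, as the loop above does; `split_stream` and `fake_chunk` are hypothetical helpers, and the fake chunks are test data rather than real API output.

```python
from types import SimpleNamespace

def split_stream(chunks):
    """Collect reasoning text and answer text from an iterable of streamed chunks."""
    reasoning_parts, answer_parts = [], []
    for chunk in chunks:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        elif getattr(delta, "content", None):
            answer_parts.append(delta.content)
    return "".join(reasoning_parts), "".join(answer_parts)

def fake_chunk(reasoning=None, content=None):
    """Build a stand-in for a streamed chunk (test data only)."""
    delta = SimpleNamespace(reasoning_content=reasoning, content=content)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])

fake_stream = [
    fake_chunk(reasoning="The user asks who I am. "),
    fake_chunk(reasoning="I should introduce myself."),
    fake_chunk(content="I am "),
    fake_chunk(content="an assistant."),
]
thinking, answer = split_stream(fake_stream)
print("Thinking:", thinking)
print("Answer:", answer)
```

Having the reasoning text in a variable makes it easier to log it or review it when an answer looks wrong.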

Compared with general LLMs, reasoning models are usually more reliable when solving complex problems, such as mathematical problem-solving, code writing, legal case analysis, and other scenarios that require rigorous reasoning. This does not mean that reasoning models are necessarily better; both types of models have their own application scenarios. The following table compares these two types of models across several key dimensions:

| Dimension            | Reasoning Model  | General Model  |
|-------------------|------------------|------------|
| Design Goal     | Focuses on tasks requiring deep analysis such as **logical reasoning, multi-step problem solving, and mathematical calculations** | Targets broad scenarios such as **general conversation, knowledge-based Q&A, text generation** |
| Training Data Emphasis  | Enhanced reasoning capabilities through large datasets of **math problem solutions, code logic, scientific reasoning**, etc. | Covers massive data across multiple domains such as **encyclopedias, literature, conversations**, etc. |
| Typical Output Characteristics | Outputs include **complete derivations**, focusing on the integrity of logical chains | Outputs are **concise and direct**, emphasizing natural language expression of results |
| Response Speed | Complex reasoning tasks result in **slower responses** (requires multi-step calculations) | Routine tasks have **faster responses** (mainly single-step generation) |

This comparison highlights that reasoning models excel in structured, analytical tasks, while general LLMs are more versatile for everyday use cases. Choosing the right model depends on the specific requirements of the task at hand. 

Here are some tips:
* **Clearly defined general tasks**: For problems that are well-defined and straightforward, general models typically perform well.
* **Complex tasks**: For highly complex tasks that require **more precise and reliable** answers, **reasoning models** are recommended. These tasks may include:
    * Ambiguous tasks: When there is very little task-related information available, and clear guidance cannot be provided to the model.
    * Finding a needle in a haystack: When dealing with large amounts of unstructured data, extracting the most relevant information, or identifying connections and differences.
    * Debugging and improving code: When the task involves reviewing and refining large volumes of code.
* **Speed and cost**: Reasoning models generally take longer to process due to their detailed reasoning steps. If time and cost are critical factors and the task complexity is low, **general models** may be a more efficient choice.

In addition, you can combine both types of models in your application: use the reasoning model for planning and decision-making, and the general model for executing specific tasks.
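
One simple way to express this division of labor in code is a lookup from pipeline stage to model name. This is only an illustrative rule of thumb: the stage names and the `model_for` helper are assumptions, and the model names are the ones used elsewhere in this section.

```python
MODEL_BY_STAGE = {
    "planning": "qwq-plus",    # reasoning model for planning and decision-making
    "execution": "qwen-plus",  # general model for carrying out specific steps
}

def model_for(stage, complex_task=False):
    """Pick a model name for a stage; complex steps can still use the reasoning model."""
    if complex_task:
        return MODEL_BY_STAGE["planning"]
    return MODEL_BY_STAGE.get(stage, MODEL_BY_STAGE["execution"])

print(model_for("planning"))
print(model_for("execution"))
print(model_for("execution", complex_task=True))
```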

### 6.2 Prompt techniques suitable for reasoning LLMs

Reasoning models can provide detailed and well-structured responses even when faced with relatively ambiguous tasks. Even so, the following **prompting techniques** help you ensure a baseline quality of reasoning:

#### Technique one: Keep task prompts concise and clear, providing sufficient background information

The **clear expression of requirements** introduced in Section 4.1 also applies to reasoning models. Although these models are highly capable, they cannot "read minds." Therefore, it's important to keep your prompts **concise and clear**, allowing the reasoning model to focus on the core task. Providing enough context ensures that the model understands the problem and can generate accurate, logical responses.



In [None]:
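# Send only a piece of code to the reasoning model, without stating any task
prompt_code_only = """
def example(a):
  b = []
  for i in range(len(a)):
    b.append(a[i]*2)
  return sum(b)
"""

reasoning_model_response(user_prompt=prompt_code_only)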

Through the above example, you can see that even if you only provide a reasoning model with a piece of code, it can still generate detailed and thoughtful responses through a series of reasoning steps. However, the generated reasoning may contain a lot of information that is not directly relevant to your needs. To get more focused and targeted results, it's important to clearly define the **task objective** and guide the model toward the specific outcome you want. Here's an example:


In [None]:
prompt_A = """
What's wrong with the following Python code? How to optimize it?
def example(a):
    b = []
    for i in range(len(a)):
        b.append(a[i]*2)
    return sum(b)
"""

reasoning_model_response(user_prompt=prompt_A)

Similarly, you can further refine the scope by combining techniques from this section, such as **4.2 Define roles and audience** and **4.3 Specifying output formats**, to ensure the results align with your expectations. Additionally, when the prompt is relatively complex, using delimiters can help the model better understand your intent and focus on the key elements of the task.

In [None]:
prompt_B = """
<audience>Beginner Python developers</audience>

<task>Function performance optimization: optimize the code in the code section below.</task>

<format>
If there are multiple optimization solutions, please output them in the following format:
[Optimization Solution X]
Problem Description: [Description]
Optimization Solution: [Description]
Example Code: [Code Block]
</format>

<code>
def example(a):
  b = []
  for i in range(len(a)):
    b.append(a[i]*2)
  return sum(b)
</code>
"""

reasoning_model_response(user_prompt=prompt_B)
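
When prompts like this are assembled programmatically, a small helper can wrap each section in XML-style tags. This is a minimal sketch; `build_delimited_prompt` is a hypothetical name, and the tag names are simply whatever keys you pass in.

```python
def build_delimited_prompt(sections):
    """Wrap each named section in XML-style tags so the model can tell the parts apart."""
    blocks = []
    for tag, text in sections.items():
        blocks.append(f"<{tag}>\n{text.strip()}\n</{tag}>")
    return "\n\n".join(blocks)

delimited = build_delimited_prompt({
    "audience": "Beginner Python developers",
    "task": "Optimize the performance of the function in the code section.",
    "code": "def example(a):\n    return sum(x * 2 for x in a)",
})
print(delimited)
```

The resulting string can be passed as `user_prompt` in the same way as a hand-written prompt.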

#### Technique two: Avoid explicit CoT prompts
In Section 4.5, you learned how to improve response quality by using the chain-of-thought (CoT) technique with LLMs. However, for reasoning models, it is generally unnecessary to prompt them with phrases like "think step by step" or "explain your reasoning." These models are designed to perform deep thinking naturally. Adding such instructions may actually limit their performance unless you specifically need the model to follow a strict reasoning path, which is uncommon.


#### Technique three: Adjust prompts based on model responses
Reasoning models are well-suited for analyzing their own thought processes, because their responses often include **reasoning steps**. This makes it easier for you to refine prompts based on the model's output. You don't need to worry about whether your initial prompt is perfect. Simply continue engaging with the model, providing additional information, and refining your prompts during the conversation.

For example, if your description is too abstract or hard to state precisely, you can use the technique of adding examples, as discussed in Section 4.4, to clarify the task. These examples can often be drawn from previous interactions with the model. This iterative process can be repeated multiple times, continuously adjusting prompts and allowing the model to refine its reasoning until it meets your expectations.
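
In code, this kind of iteration usually means extending the `messages` list with the model's previous reply and your refinement before the next request. The sketch below only builds that list and calls no model; `add_follow_up` is a hypothetical helper, and the assistant reply is a placeholder.

```python
def add_follow_up(messages, assistant_reply, follow_up):
    """Return a new message list extended with the model's reply and the user's refinement."""
    return messages + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": follow_up},
    ]

conversation = [{"role": "user", "content": "Optimize this function for readability."}]
conversation = add_follow_up(
    conversation,
    assistant_reply="(previous model answer goes here)",
    follow_up='Closer to this style, please: "return sum(x * 2 for x in a)"',
)
for message in conversation:
    print(message["role"], ":", message["content"])
```

The extended list can then be sent as the `messages` argument of the next chat completion request, so the model sees both its earlier answer and the added example.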

#### Technique four: Collaborative task completion between reasoning models and general models
Reasoning models act like "super-smart colleagues," effectively assisting you with tasks like reasoning and planning. However, for execution tasks involving repetitive actions, reasoning models may overthink, which could be inefficient.

A better approach is to let reasoning models and general models work together, leveraging their respective strengths:
* Reasoning models handle "slow thinking" tasks such as planning, analysis, and problem-solving.
* General models focus on "fast thinking" tasks, such as executing specific actions or generating content quickly.

Try running and understanding the following project, which demonstrates how these models can collaborate:


In [None]:
from openai import OpenAI
import os
import json
from typing import Generator

class TaskPlanningSystem:
    def __init__(self):
        # Initialize client connection
        self.client = OpenAI(
            api_key=os.getenv("DASHSCOPE_API_KEY"),
            base_url=os.getenv("DASHSCOPE_API_BASE")
        )
        
        # System toolset (can be extended as needed)
        self.tools = {
            "create_task": {
                "desc": "Create a new task item",
                "params": {"description": "str", "priority": "int"}
            },
            "schedule_event": {
                "desc": "Schedule an event", 
                "params": {"title": "str", "time": "datetime"}
            }
        }

    def generate_plan(self, user_request: str) -> Generator[str, None, None]:
        """Stream-generated task planning"""
        # Build planning prompt
        system_prompt = f"""You are a senior task planner, please convert the user's requirements into executable steps:
        
        Available tools (format: [[tool name]]):
        {json.dumps(self.tools, indent=2)}
        
        Output requirements:
        1. Use Markdown list format
        2. Each step should indicate the tool name
        3. Include necessary parameter examples
        """
        model="qwq-plus"
        # Initiate streaming request
        completion = self.client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_request}
            ],
            stream=True,
            temperature=0.3
        )

        # Print thinking process title
        print(f"\n\n=== [Planning Thinking Phase] Using reasoning model: {model}===")
        is_answering = False
        # Process streaming response
        for chunk in completion:
            if chunk.choices:
                delta = chunk.choices[0].delta
                if hasattr(delta, 'reasoning_content') and delta.reasoning_content is not None:
                    # Process thinking process content
                    reasoning_content = delta.reasoning_content
                    yield reasoning_content
                else:
                    # Switch to answer output mode on the first non-empty answer chunk
                    # (a truthiness check also skips chunks whose content is None)
                    if delta.content and not is_answering:
                        print(f"\n\n=== [Planning Generation Phase] Using reasoning model: {model}===")
                        is_answering = True
                    # Process answer content
                    if delta.content:
                        content = delta.content
                        yield content

    def execute_plan(self, plan: str) -> dict:
        """Execute the generated task plan"""
        # Call general model to parse the plan
        analysis_prompt = f"""Please parse the following task plan and generate executable instructions:
        
        Plan content:
        {plan}
        
        Output requirements:
        - The returned content must be in JSON format only, do not include other information, do not output code blocks.
        - Include tool names and parameters
        - Example:
            {{"steps": [
                {{"tool": "create_task", "params": {{"description": "..."}}}}
            ]}}
        """
        model="qwen-plus"
        response = self.client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": analysis_prompt}],
            temperature=0
        )

        print(f"\n\nUsing general model {model} to format output:\n{response.choices[0].message.content}")
        # Parse execution instructions
        try:
            instructions = json.loads(response.choices[0].message.content)
            return self._run_instructions(instructions)
        except json.JSONDecodeError:
            return {"error": "Instruction parsing failed"}

    def _run_instructions(self, instructions: dict) -> dict:
        """Actually execute tool calls"""
        results = []
        for step in instructions.get("steps", []):
            tool = step.get("tool")
            params = step.get("params", {})
            
            # Execute tool call (this is an example implementation)
            if tool == "create_task":
                results.append({
                    "tool": tool,
                    "result": f"Task created: {params.get('description')}"
                })
            elif tool == "schedule_event":
                results.append({
                    "tool": tool,
                    "result": f"Event scheduled: {params.get('title')}"
                })
            else:
                results.append({"error": f"Unknown tool: {tool}"})
        
        return {"status": "completed", "results": results}

# Usage example
if __name__ == "__main__":
    planning_system = TaskPlanningSystem()
    
    # Example user request
    user_request = "Please schedule next week's product launch and create related preparation tasks"
    
    # Planning generation phase
    plan_stream = planning_system.generate_plan(user_request)
    generated_plan = []
    for chunk in plan_stream:
        print(chunk, end="", flush=True)
        generated_plan.append(chunk)
    
    full_plan = "".join(generated_plan)
    
    print("\n\n=== Plan Execution Phase ===")
    result = planning_system.execute_plan(full_plan)
    print("\nCall tool execution, execution result:")
    print(json.dumps(result, indent=2, ensure_ascii=False))

The above code implements the following core interaction process. The code mainly uses the qwq-plus reasoning model for streaming plan generation, then uses qwen-plus for instruction parsing, and finally decouples tool invocation through the `_run_instructions` method, making it easier to extend in the future:

<a href="https://img.alicdn.com/imgextra/i4/O1CN01qJVT7z1raq4Pw20R7_!!6000000005648-2-tps-2716-1132.png" target="_blank">
<img src="https://img.alicdn.com/imgextra/i4/O1CN01qJVT7z1raq4Pw20R7_!!6000000005648-2-tps-2716-1132.png" width="1000">
</a>
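
As one possible direction for that extension, the `if`/`elif` chain in `_run_instructions` could be replaced with a registry, so that adding a tool means registering one more function. This is a sketch under assumptions: `TOOL_REGISTRY`, `register_tool`, and the standalone `run_instructions` are hypothetical names, and the tool bodies are placeholders just like in the class above.

```python
# Hypothetical registry; the tool functions are placeholders, as in _run_instructions.
TOOL_REGISTRY = {}

def register_tool(name):
    """Decorator that adds a function to the registry under the given tool name."""
    def decorator(func):
        TOOL_REGISTRY[name] = func
        return func
    return decorator

@register_tool("create_task")
def create_task(description=None, priority=None, **_extra):
    return f"Task created: {description}"

@register_tool("schedule_event")
def schedule_event(title=None, time=None, **_extra):
    return f"Event scheduled: {title}"

def run_instructions(instructions):
    """Look up each step's tool in the registry and call it with the step's parameters."""
    results = []
    for step in instructions.get("steps", []):
        tool = step.get("tool")
        handler = TOOL_REGISTRY.get(tool)
        if handler is None:
            results.append({"error": f"Unknown tool: {tool}"})
            continue
        results.append({"tool": tool, "result": handler(**step.get("params", {}))})
    return {"status": "completed", "results": results}

demo = run_instructions({"steps": [
    {"tool": "create_task", "params": {"description": "Prepare slides", "priority": 1}},
    {"tool": "send_email", "params": {}},
]})
print(demo)
```

The `**_extra` parameter lets a tool ignore unexpected keys that the model might include in `params`.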



## ✅ Summary

Through this section, you have gained a solid understanding of prompting techniques for both general LLMs and reasoning LLMs. The flexible application of these techniques ensures a baseline level of quality in AI-generated responses. However, as AI technology continues to evolve, these methods may also change. For example, reasoning models no longer require chain-of-thought prompting to achieve optimal performance. What's most important to understand is that the core purpose of prompt engineering remains the same: to clearly express your core needs and provide rich, relevant background information (context). This is essential for effectively leveraging LLMs and ensuring they deliver accurate, meaningful results.

In real-world LLM applications, domain experts often play a key role in designing prompts. Therefore, hardcoding prompts directly into your code should be avoided. Instead, consider making prompts configurable or even building an entirely configurable application workflow. This allows domain experts, especially those without technical backgrounds, to actively participate in refining prompts and shaping the overall process.
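
A minimal sketch of configurable prompts, assuming the templates live in a JSON file that maps prompt names to text with `$placeholders`. The file name `prompts.json`, the `load_prompts` helper, and the `translator` entry are illustrative assumptions; the demo writes the file to a temporary directory so it can run anywhere.

```python
import json
import tempfile
from pathlib import Path
from string import Template

def load_prompts(path):
    """Read prompt templates from a JSON file that maps names to template text."""
    with open(path, encoding="utf-8") as f:
        return {name: Template(text) for name, text in json.load(f).items()}

# Demo: write a config file to a temporary directory, then load and fill a template.
with tempfile.TemporaryDirectory() as tmp_dir:
    config_path = Path(tmp_dir) / "prompts.json"
    config_path.write_text(json.dumps({
        "translator": "Translate the input into $target_language.\n[Input as follows]\n$text"
    }), encoding="utf-8")
    prompts = load_prompts(config_path)

filled = prompts["translator"].substitute(target_language="French", text="Good morning")
print(filled)
```

With the prompt text stored outside the code, a domain expert can edit the file without touching the application logic.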

Alibaba Cloud's Model Studio offers a powerful application building capability, enabling users to write prompts and visually design complex application flows directly on the platform. This makes it an ideal tool for LLM projects that involve collaboration with non-technical domain experts.

In the next section, you will learn about automated evaluation methods to assess the performance of Q&A bots. You'll explore how to use quantitative metrics to measure the effectiveness of your prompts and continuously improve your AI-driven applications.

## 🔥 Quiz

### 🔍 Multiple choice question

<details>
<summary style="cursor: pointer; padding: 12px; border: 1px solid #dee2e6; border-radius: 6px;">
<b>Which of the following prompt elements is used to explicitly define the task that the LLM needs to complete❓ (Select 1.)</b>

- A. Role
- B. Audience
- C. Objective
- D. Context

**[Click to view answer]**
</summary>

<div style="margin-top: 10px; padding: 15px; border: 1px solid #dee2e6; border-radius: 0 0 6px 6px;">

✅ **Reference Answer: C**</br>
📝 **Explanation**:</br>
- The Objective explicitly defines the operation or outcome that the large language model (LLM) is required to perform. Other options do not directly define the task itself.
- Role defines the identity that the LLM should assume, Audience specifies the target group, and Context provides background information.

</div>
</details>

---


<details>
<summary style="cursor: pointer; padding: 12px; border: 1px solid #dee2e6; border-radius: 6px;">
<b>You have inherited a complex, undocumented prompt designed for code generation. Before directly modifying this "black-box" prompt, what preparatory steps would you take to ensure your optimization efforts are effective? (Select all that apply.)</b>

- A. Collect concrete examples of poor model output ("bad cases").
- B. Immediately start drafting a new, simpler version from scratch.
- C. Clearly define the desired output criteria and format in writing.
- D. Read through the entire prompt and try to identify logical flaws based on intuition.

**[Click to view answer]**
</summary>

<div style="margin-top: 10px; padding: 15px; border: 1px solid #dee2e6; border-radius: 0 0 6px 6px;">

✅ **Reference Answer: A, C**</br>
📝 **Explanation**:</br>
This question tests what core inputs are necessary before starting a structured optimization process. Effective optimization requires a clear definition of the "problem" and the "goal."
- Option A (bad cases) and Option C (desired output) are the key steps for defining the "problem" and the "goal," respectively. They serve as the core inputs for a structured optimization approach like meta-prompting.
- Options B (starting from scratch) and D (relying on intuition) are high-risk, inefficient methods because they do not rely on a structured, evidence-based approach.

</div>
</details>

---

<details>
<summary style="cursor: pointer; padding: 12px; border: 1px solid #dee2e6; border-radius: 6px;">
<b>When using a reasoning LLM (such as `qwen3-235b-a22b-thinking-2507`) for a complex reasoning task, which of the following prompting techniques are recommended❓ (Select all that apply.)</b>

- A. Provide simple and clear task instructions.
- B. Supplement with sufficient background information.
- C. If there are specific user groups or task requirements, define a role or audience.
- D. You must include phrases like "think step by step" or "explain your reasoning" in the prompt to activate the model's deep thinking mode.
- E. When a description is too abstract or cannot be accurately described, clarify it by adding examples.

**[Click to view answer]**
</summary>

<div style="margin-top: 10px; padding: 15px; border: 1px solid #dee2e6; border-radius: 0 0 6px 6px;">

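✅ **Reference Answer: A, B, C, E**</br>
📝 **Explanation**:</br>
- Reasoning LLMs (like `qwen3-235b-a22b-thinking-2507`) are designed to engage in deep, chain-of-thought style reasoning by default, without needing an explicit instruction to do so. Option D is therefore unnecessary and may even limit the model's performance.
- Options A, B, C, and E are prompting techniques from this section that also apply to reasoning models.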

</div>
</details>
