# Basic tutorial of CoolPrompt
*Applies to version 1.1.0 and later*

## Quick Run

In [1]:
from coolprompt.assistant import PromptTuner

prompt_tuner = PromptTuner()

prompt_tuner.run('Summarize an essay about autumn')

[2025-10-15 11:10:04,534] [INFO] [llm.init] - Initializing default model: Qwen/Qwen3-4B-Instruct-2507


Loading checkpoint shards:   0%|          | 0/3 [00:00<?, ?it/s]

Device set to use cuda:0
The following generation flags are not valid and may be ignored: ['temperature', 'top_p', 'top_k']. Set `TRANSFORMERS_VERBOSITY=info` for more details.
[2025-10-15 11:10:15,132] [INFO] [assistant.__init__] - Validating the target model
[2025-10-15 11:10:15,133] [INFO] [assistant.__init__] - PromptTuner successfully initialized
[2025-10-15 11:10:15,135] [INFO] [detector.generate] - Detecting the task by query
`generation_config` default values have been modified to match model-specific defaults: {'do_sample': True}. If this is not desired, please set these values explicitly.
[2025-10-15 11:10:16,263] [INFO] [detector.generate] - Task defined as generation
[2025-10-15 11:10:16,264] [INFO] [assistant.run] - Validating args for PromptTuner running
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already

In [2]:
print(prompt_tuner.final_prompt)


You are a skilled summarizer tasked with distilling the essential ideas of an essay about autumn into a concise, coherent, and accurate summary. The essay explores the theme, characteristics, and cultural or natural significance of autumn. Your goal is to extract the main arguments, key observations, and central themes without introducing external information or altering the original content.

Follow these steps:
1. Identify the central theme of the essay (e.g., the season's natural changes, emotional tone, cultural representations).
2. Extract key characteristics of autumn described in the essay (e.g., falling leaves, harvest, temperature changes, animal behavior).
3. Highlight any cultural, historical, or societal references to autumn (e.g., festivals, traditions, literature).
4. Synthesize these elements into a single, clear, and brief paragraph that reflects the essay’s core message.
5. Ensure the summary is neutral, factual, and directly derived from the text—do not interpret, sp

### What happened step-by-step
1. We initialized a CoolPrompt tuner with the default large language model, Qwen/Qwen3-4B-Instruct-2507
2. We passed in a start prompt to be optimized
3. CoolPrompt automatically detected the preferred task and metric for prompt optimization and evaluation
4. CoolPrompt automatically generated a dataset with target labels and split it into train and test samples
5. The default prompt optimizer, HyPE, suggested the optimized prompt
6. CoolPrompt output the results:
   - Metric scores showing how effective the optimization was
   - The final prompt
   - An interpretation of the prompt optimization

### That is the simplest way to optimize a prompt. Next, let's configure the CoolPrompt Tuner

## Set up a CoolPrompt Tuner

### 1. LLM Choice
The framework is model-agnostic: you can use proprietary, open-source, or custom LLMs with different interfaces through LangChain compatibility.

#### Check [the full list of provider interfaces](https://python.langchain.com/docs/integrations/llms/).

The default LLM in CoolPrompt is Qwen/Qwen3-4B-Instruct-2507. The same model can be defined explicitly as:

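```python
from coolprompt.assistant import PromptTuner
from langchain_huggingface import ChatHuggingFace, HuggingFacePipeline

# Load the default model as a local Hugging Face text-generation pipeline
llm = HuggingFacePipeline.from_model_id(
    model_id="Qwen/Qwen3-4B-Instruct-2507",
    task="text-generation",
    pipeline_kwargs={
        "max_new_tokens": 4000,
        "temperature": 0.01,
        "do_sample": False,
        "return_full_text": False,
    }
)
# Wrap the pipeline in a chat-model interface
target_model = ChatHuggingFace(llm=llm)

prompt_tuner = PromptTuner(target_model=target_model)
```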


You can also use other popular interfaces, such as:

#### Ollama
For rapid experiments with low resources

In [None]:
# Serve the model with Ollama before running this cell
from langchain_ollama.llms import OllamaLLM

my_model = OllamaLLM(
    model="qwen2.5-coder:32b"
)
prompt_tuner = PromptTuner(target_model=my_model)

#### VLLM
A production-ready solution

In [None]:
from langchain_community.llms import VLLM

my_model = VLLM(
    model="Qwen/Qwen3-4B-Instruct-2507",
    trust_remote_code=True,
    dtype='bfloat16',
)

prompt_tuner = PromptTuner(target_model=my_model)

#### ChatOpenAI
For OpenAI and OpenAI-compatible models

In [None]:
from langchain_openai.chat_models import ChatOpenAI

my_model = ChatOpenAI(
    model="paste model_name",
    base_url="paste base_url",
    openai_api_key="paste api_key",
    temperature=0.01,
    max_tokens=4000,
)

prompt_tuner = PromptTuner(target_model=my_model)
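
Pasting an API key directly into a notebook makes it easy to leak. Below is a minimal sketch of one alternative, assuming the settings live in environment variables. The helper `chat_openai_kwargs` and the variable names `COOLPROMPT_MODEL` and `COOLPROMPT_BASE_URL` are hypothetical and chosen only for this illustration; the keyword arguments themselves are the ones used in the cell above.

```python
import os


def chat_openai_kwargs(env=None):
    """Collect ChatOpenAI keyword arguments from environment variables.

    COOLPROMPT_MODEL and COOLPROMPT_BASE_URL are hypothetical names used
    only in this sketch; rename them to match your own setup.
    """
    env = os.environ if env is None else env
    required = ("OPENAI_API_KEY", "COOLPROMPT_MODEL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError(f"Missing environment variables: {', '.join(missing)}")

    kwargs = {
        "model": env["COOLPROMPT_MODEL"],
        "openai_api_key": env["OPENAI_API_KEY"],
        "temperature": 0.01,
        "max_tokens": 4000,
    }
    base_url = env.get("COOLPROMPT_BASE_URL")
    if base_url:  # only needed for OpenAI-compatible endpoints
        kwargs["base_url"] = base_url
    return kwargs
```

The returned dictionary can then be unpacked into the constructor, for example `ChatOpenAI(**chat_openai_kwargs())`.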

### 2. System Model
The `system_model` argument defines the LLM at the core of the **PromptAssistant** module.

**PromptAssistant** is responsible for:
- Automatically detecting the task type and metric for a given start prompt.
- Generating a synthetic dataset and target labels (when real data is not provided).
- Providing optimization feedback.

By default, `system_model` is the same as `target_model`. You can set it to a different LLM as in the code below:

In [42]:
from langchain_community.llms import VLLM

target_model = VLLM(
    model="Qwen/Qwen3-4B-Instruct-2507"
)
system_model = VLLM(
    model="Qwen/Qwen3-30B-A3B-Instruct-2507"
)

prompt_tuner = PromptTuner(
    target_model=target_model,
    system_model=system_model
)

#### IMPORTANT NOTE: `system_model` needs to be a capable instruction-tuned LLM that can reliably generate structured output.
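
One quick way to judge a candidate `system_model` is to ask it for JSON and check whether the reply parses. The sketch below assumes the reply should be a JSON object with known keys. The helper `parse_structured_reply` and the key names are hypothetical and are not part of CoolPrompt, and the two replies are hand-written stand-ins rather than real model output.

```python
import json


def parse_structured_reply(reply, required_keys):
    """Return the parsed JSON object, or None if the reply is not usable."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    if any(key not in data for key in required_keys):
        return None
    return data


# Stand-in replies; in practice these would come from the model under test.
good_reply = '{"task": "generation", "metric": "meteor"}'
bad_reply = "Sure! The task is generation."

print(parse_structured_reply(good_reply, ["task", "metric"]))
print(parse_structured_reply(bad_reply, ["task", "metric"]))
```

A model that often lands in the `None` branch on checks like this is probably a poor fit for the `system_model` role.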

### 3. Task and metrics
CoolPrompt supports two task types, each with its own set of metrics:
- `classification` - general-purpose classification
    - `accuracy`
    - `f1` (f1-macro)
- `generation` - general text generation
    - `bleu`
    - `rouge`
    - `meteor`
    - `bertscore`
    - `geval` (experimental, one metric: *textual accuracy*)

You can set the task and metric for each run as follows:

In [None]:
prompt_tuner.run(
    start_prompt="Classify a sentence sentiment",
    task="classification",
    metric="f1",
)

prompt_tuner.run(
    start_prompt="Summarize the text",
    task="generation",
    metric="rouge",
)
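
The pairing of task types and metrics listed in this section can be written down as a small lookup table and checked before a run starts. This is only a sketch: `SUPPORTED_METRICS` and `check_task_metric` are hypothetical names, and how CoolPrompt validates these arguments internally is not shown in this tutorial.

```python
# Task types and metrics exactly as listed in this section.
SUPPORTED_METRICS = {
    "classification": ("accuracy", "f1"),
    "generation": ("bleu", "rouge", "meteor", "bertscore", "geval"),
}


def check_task_metric(task, metric):
    """Raise ValueError if the metric does not belong to the task."""
    if task not in SUPPORTED_METRICS:
        known = ", ".join(sorted(SUPPORTED_METRICS))
        raise ValueError(f"Unknown task {task!r}; expected one of: {known}")
    if metric not in SUPPORTED_METRICS[task]:
        allowed = ", ".join(SUPPORTED_METRICS[task])
        raise ValueError(f"Metric {metric!r} is not valid for {task}; use one of: {allowed}")
    return task, metric


check_task_metric("classification", "f1")  # passes silently
check_task_metric("generation", "rouge")   # passes silently
```

Calling `check_task_metric("classification", "rouge")` raises a `ValueError`, which catches a mismatched pair before any model call is made.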

### 4. Set up a dataset configuration

#### 4.1 Provide a real dataset

In [41]:
from langchain_openai.chat_models import ChatOpenAI

my_model = ChatOpenAI(
    model="gpt-3.5-turbo",
    openai_api_key="key",
    temperature=0,
    max_tokens=3500,
)

prompt_tuner = PromptTuner(target_model=my_model)

[2025-10-12 12:13:30,157] [INFO] [assistant.__init__] - Validating the target model
[2025-10-12 12:13:30,158] [INFO] [assistant.__init__] - PromptTuner successfully initialized


In [40]:
from datasets import load_dataset

sst2 = load_dataset("stanfordnlp/sst2")
class_dataset = sst2['train']['sentence'][:30]
class_targets = sst2['train']['label'][:30]

prompt_tuner.run(
    start_prompt="Classify sentence sentiment positive or negative",
    task="classification",
    dataset=class_dataset,
    target=class_targets,
    metric="f1"
)

print(prompt_tuner.final_prompt)

[2025-10-12 12:12:49,368] [INFO] [assistant.run] - Validating args for PromptTuner running
[2025-10-12 12:12:50,423] [INFO] [evaluator.__init__] - Evaluator successfully initialized with f1 metric
[2025-10-12 12:12:51,334] [INFO] [assistant.run] - === Starting Prompt Optimization ===
[2025-10-12 12:12:51,335] [INFO] [assistant.run] - Method: hype, Task: classification
[2025-10-12 12:12:51,336] [INFO] [assistant.run] - Metric: f1, Validation size: 0.25
[2025-10-12 12:12:51,337] [INFO] [assistant.run] - Dataset: 30 samples
[2025-10-12 12:12:51,338] [INFO] [assistant.run] - Target: 30 samples
[2025-10-12 12:12:51,338] [INFO] [hype.hype_optimizer] - Running HyPE optimization...
[2025-10-12 12:12:52,655] [INFO] [hype.hype_optimizer] - HyPE optimization completed
[2025-10-12 12:12:52,656] [INFO] [assistant.run] - Running the prompt format checking...
[2025-10-12 12:12:53,423] [INFO] [assistant.run] - Evaluating on given dataset for classification task...
[2025-10-12 12:12:53,424] [INFO] [eva


Train a machine learning model to classify the sentiment of sentences as either positive or negative. Use a dataset of labeled sentences where each sentence is associated with its sentiment label. Aim to create a classifier that can accurately predict the sentiment of new, unseen sentences based on the patterns and features in the training data.
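
The classification run above was scored with `f1`, which section 3 describes as f1-macro: an F1 score is computed for each class, and the per-class scores are averaged with equal weight. Here is a minimal pure-Python sketch of that definition. The function name `macro_f1` and the toy labels are illustrative, and CoolPrompt's own implementation may differ in details.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (inputs must be non-empty)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Each label occurs in y_true or y_pred, so the denominator is > 0.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


# Toy labels in the SST-2 convention (1 = positive, 0 = negative).
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1]
print(macro_f1(y_true, y_pred))
```

Because every class counts equally, a prompt that does well on the majority class but poorly on a rare one is penalized more than it would be under plain accuracy.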



In [37]:
samsum = load_dataset("knkarthick/samsum")
gen_dataset = samsum['train']['dialogue'][:30]
gen_targets = samsum['train']['summary'][:30]

prompt_tuner.run(
    start_prompt="Summarize the text",
    task="generation",
    dataset=gen_dataset,
    target=gen_targets,
)

print(prompt_tuner.final_prompt)

[2025-10-12 12:11:45,024] [INFO] [assistant.run] - Validating args for PromptTuner running
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
[2025-10-12 12:11:46,060] [INFO] [evaluator.__init__] - Evaluator successfully initialized with meteor metric
[2025-10-12 12:11:47,011] [INFO] [assistant.run] - === Starting Prompt Optimization ===
[2025-10-12 12:11:47,012] [INFO] [assistant.run] - Method: hype, Task: generation
[2025-10-12 12:11:47,013] [INFO] [assistant.run] - Metric: meteor, Validation size: 0.25
[2025-10-12 12:11:47,014] [INFO] [assistant.run] - Dataset: 30 samples
[2025-10-12 12:11:47,015] [INFO] [assistant.run] - Target: 30 samples
[2025-10-12 12:11:47,015] [INFO] [hype.hype_optimi


Given a text, create a concise summary by extracting the most important information and presenting it in a condensed form.



#### 4.2 Define a split strategy

`validation_size` - the ratio used for the train/validation split (the logs above show a default of 0.25)

In [None]:
prompt_tuner.run(
    start_prompt="Summarize the text",
    task="generation",
    dataset=gen_dataset,
    target=gen_targets,
    metric="bertscore",
    validation_size=0.4
)

`train_as_test` - use the same data for both training and testing; `validation_size` is ignored

In [None]:
prompt_tuner.run(
    start_prompt="Summarize the text",
    task="generation",
    dataset=gen_dataset,
    target=gen_targets,
    metric="bertscore",
    train_as_test=True
)
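
The two options in this section can be pictured with a small splitting function. This is a sketch under stated assumptions, not CoolPrompt's actual code: the name `split_dataset`, the shuffling with a fixed seed, and the rounding of the validation count are all choices made for this illustration.

```python
import random


def split_dataset(dataset, target, validation_size=0.25,
                  train_as_test=False, seed=0):
    """Return (train_x, train_y, test_x, test_y).

    With train_as_test=True the same samples are used on both sides and
    validation_size has no effect.
    """
    if len(dataset) != len(target):
        raise ValueError("dataset and target must have the same length")
    if train_as_test:
        return list(dataset), list(target), list(dataset), list(target)
    if not 0 < validation_size < 1:
        raise ValueError("validation_size must be between 0 and 1")

    # Shuffle indices so inputs and labels stay aligned after the split.
    indices = list(range(len(dataset)))
    random.Random(seed).shuffle(indices)
    n_valid = round(len(indices) * validation_size)
    valid_idx, train_idx = indices[:n_valid], indices[n_valid:]
    return (
        [dataset[i] for i in train_idx],
        [target[i] for i in train_idx],
        [dataset[i] for i in valid_idx],
        [target[i] for i in valid_idx],
    )
```

With 30 samples and `validation_size=0.4`, this sketch puts 12 samples in the validation part and leaves 18 for training.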

### 5. Choosing an optimizer

#### HyPE

HyPE provides a prompt-refactoring workflow: the initial prompt is injected into a special query template predetermined by our research.

This optimizer needs only one query to the LLM, so it works as a fast and simple tool for improving your prompt.

In [24]:
prompt_tuner.run(
    start_prompt="Summarize the text",
    method='hype'
)

print(prompt_tuner.final_prompt)

[2025-10-12 12:01:33,328] [INFO] [detector.generate] - Detecting the task by query
[2025-10-12 12:01:33,849] [INFO] [detector.generate] - Task defined as generation
[2025-10-12 12:01:33,850] [INFO] [assistant.run] - Validating args for PromptTuner running
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
[2025-10-12 12:01:34,882] [INFO] [evaluator.__init__] - Evaluator successfully initialized with meteor metric
[2025-10-12 12:01:34,883] [INFO] [generator.generate] - Problem description was not provided, so it will be generated automatically
[2025-10-12 12:01:35,845] [INFO] [generator.generate] - Generated problem description: The task is to create a system that can automatically generate a c


Given a text, create a concise summary by extracting the most important information. Ensure the summary is brief yet informative for the user.
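
The idea of injecting the start prompt into a fixed template and querying the model once can be sketched as follows. The template text, the function `single_query_rewrite`, and the stand-in `fake_llm` are all hypothetical: the real HyPE template is not shown in this tutorial, and the fake model exists only so the sketch runs offline.

```python
# Hypothetical template; the real HyPE template is not shown in this tutorial.
META_TEMPLATE = (
    "Rewrite the prompt below so that it is clearer and more specific.\n"
    "Return only the improved prompt.\n\n"
    "Prompt: {start_prompt}"
)


def single_query_rewrite(start_prompt, llm):
    """Inject the prompt into the template and query the model exactly once."""
    query = META_TEMPLATE.format(start_prompt=start_prompt)
    return llm(query).strip()


def fake_llm(query):
    """Stand-in for a real model call so the sketch runs offline."""
    original = query.rsplit("Prompt: ", 1)[1]
    return f"{original}. Be concise and keep the key facts.\n"


print(single_query_rewrite("Summarize the text", fake_llm))
```

Since there is no search loop, the cost is one model call per prompt, which is what makes this style of optimizer fast.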



#### ReflectivePrompt

This method is based on the idea of Reflective Evolution and text-based gradients. It implements short-term and long-term reflections to provide some clarifications and make crossover and mutation operations more precise and effective.

ReflectivePrompt takes two arguments:
- `population_size` (default: 10)
- `num_epochs` (default: 5)


In [25]:
prompt_tuner.run(
    start_prompt="Summarize the text",
    method='reflective',
    num_epochs=2
)

[2025-10-12 12:02:56,733] [INFO] [detector.generate] - Detecting the task by query
[2025-10-12 12:02:57,194] [INFO] [detector.generate] - Task defined as generation
[2025-10-12 12:02:57,194] [INFO] [assistant.run] - Validating args for PromptTuner running
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
[2025-10-12 12:02:58,202] [INFO] [evaluator.__init__] - Evaluator successfully initialized with meteor metric
[2025-10-12 12:02:58,203] [INFO] [generator.generate] - Problem description was not provided, so it will be generated automatically
[2025-10-12 12:02:59,371] [INFO] [generator.generate] - Generated problem description: The task is to create a system that can automatically generate a c
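
To show what `population_size` and `num_epochs` control, here is a generic evolutionary search loop over prompt strings. It is a toy sketch and not the ReflectivePrompt algorithm: the reflection steps are left out, and `evolve_prompts`, the `toy_*` operators, and `HINTS` are hypothetical stand-ins for what would be LLM calls in a real optimizer.

```python
import random

HINTS = ["Be concise.", "Keep key facts.", "Use plain language."]


def toy_mutate(prompt, rng):
    """Append one random hint unless the prompt already contains it."""
    hint = rng.choice(HINTS)
    return prompt if hint in prompt else f"{prompt} {hint}"


def toy_crossover(a, b, rng):
    """Keep parent a and borrow every hint that only parent b has."""
    extra = [h for h in HINTS if h in b and h not in a]
    return " ".join([a] + extra)


def toy_score(prompt):
    """Toy fitness: number of distinct hints present in the prompt."""
    return sum(h in prompt for h in HINTS)


def evolve_prompts(start_prompt, score, mutate, crossover,
                   population_size=10, num_epochs=5, seed=0):
    """Generic evolutionary search; the best candidates survive each epoch."""
    rng = random.Random(seed)
    population = [start_prompt] + [
        mutate(start_prompt, rng) for _ in range(population_size - 1)
    ]
    keep = max(2, population_size // 2)
    for _ in range(num_epochs):
        # Selection: keep the highest-scoring candidates as parents.
        parents = sorted(population, key=score, reverse=True)[:keep]
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=score)


best = evolve_prompts("Summarize the text", toy_score, toy_mutate,
                      toy_crossover, population_size=6, num_epochs=2)
print(best)
```

A larger `population_size` widens the search in each epoch and more `num_epochs` lengthen it. In an LLM-backed optimizer, both increase the number of model calls.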

#### DistillPrompt

This method combines several prompt-transformation techniques that use an LLM to explore the search space and find the best prompt variations for the given task.

DistillPrompt takes one argument:
- `num_epochs` (default: 5)

In [27]:
prompt_tuner.run(
    start_prompt="Summarize the text",
    method='distill',
    num_epochs=2
)

[2025-10-12 12:08:01,668] [INFO] [detector.generate] - Detecting the task by query
[2025-10-12 12:08:02,166] [INFO] [detector.generate] - Task defined as generation
[2025-10-12 12:08:02,166] [INFO] [assistant.run] - Validating args for PromptTuner running
[nltk_data] Downloading package wordnet to /root/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package punkt_tab to /root/nltk_data...
[nltk_data]   Package punkt_tab is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /root/nltk_data...
[nltk_data]   Package omw-1.4 is already up-to-date!
[2025-10-12 12:08:03,729] [INFO] [evaluator.__init__] - Evaluator successfully initialized with meteor metric
[2025-10-12 12:08:03,730] [INFO] [generator.generate] - Problem description was not provided, so it will be generated automatically
[2025-10-12 12:08:04,511] [INFO] [generator.generate] - Generated problem description: The task is to create a system that can automatically generate a c

## Framework Results

#### Metrics before-after

In [29]:
print('Before optimization:', prompt_tuner.init_metric)
print('After optimization:', prompt_tuner.final_metric)

Before optimization: 0.2138408105858377
After optimization: 0.7006453459110112
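
To put the two numbers in perspective, the absolute gain and the ratio can be computed directly. The values below are copied from the output above; in a live session you would read them from `prompt_tuner.init_metric` and `prompt_tuner.final_metric` instead.

```python
init_metric = 0.2138408105858377   # "Before optimization" value printed above
final_metric = 0.7006453459110112  # "After optimization" value printed above

absolute_gain = final_metric - init_metric
ratio = final_metric / init_metric

print(f"Absolute gain: {absolute_gain:.4f}")
print(f"Final / initial: {ratio:.2f}x")
```

For this run the score rises by roughly 0.49, which makes the final score a little over three times the initial one.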


#### Prompts before-after

In [31]:
print('Before optimization:', prompt_tuner.init_prompt)
print('After optimization:', prompt_tuner.final_prompt)

Before optimization: Summarize the text
After optimization: Summarize text by identifying main ideas and key points concisely for clarity and coherence.


#### Generated dataset

In [32]:
print(prompt_tuner.synthetic_dataset)
print(prompt_tuner.synthetic_target)

['A new study shows that regular exercise can improve mental health and reduce the risk of depression.', 'The latest research indicates that eating a balanced diet can lead to better overall health and well-being.', 'Scientists have discovered a new treatment for a common type of cancer that shows promising results in clinical trials.', 'A recent report highlights the importance of sleep for cognitive function and overall productivity.', 'Experts suggest that practicing mindfulness meditation can reduce stress and improve mental clarity.', 'A study found that spending time in nature can have a positive impact on mental health and well-being.', 'Researchers have developed a new technology that could revolutionize renewable energy production.', 'A report highlights the importance of regular physical activity in maintaining a healthy lifestyle and preventing chronic diseases.', "Scientists have identified a gene that may play a role in the development of Alzheimer's disease.", 'Recent stu
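
Printing the full lists gets unwieldy quickly, so it can help to view a few input/target pairs side by side. The sketch below assumes the two lists have equal length and are aligned by index. `preview_pairs` is a hypothetical helper, and the sample target string is a placeholder, not real CoolPrompt output.

```python
def preview_pairs(inputs, targets, n=3, width=60):
    """Return up to n (input, target) pairs with long strings shortened."""
    if len(inputs) != len(targets):
        raise ValueError("inputs and targets differ in length")

    def shorten(text):
        text = str(text)
        return text if len(text) <= width else text[: width - 3] + "..."

    return [(shorten(x), shorten(y)) for x, y in zip(inputs[:n], targets[:n])]


# Stand-in data: the input is the first sample printed above,
# the target is a placeholder written for this sketch.
sample_inputs = [
    "A new study shows that regular exercise can improve mental health "
    "and reduce the risk of depression."
]
sample_targets = ["<placeholder target>"]

for text, label in preview_pairs(sample_inputs, sample_targets):
    print(f"{text!r} -> {label!r}")
```

In a live session the call would be `preview_pairs(prompt_tuner.synthetic_dataset, prompt_tuner.synthetic_target)`.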

#### Optimization feedback

In [34]:
print('Feedback:', prompt_tuner.assistant_feedback)

Feedback: Your original prompt was quite vague and lacked specificity. We enhanced it by adding clarity and focus on the task by specifying to summarize text by identifying main ideas and key points concisely. This helps the AI understand the exact requirements and deliver a more targeted response. The revised prompt emphasizes the importance of conciseness for clarity and coherence, which guides the AI to provide a more structured and coherent summary. Key advice: Always aim for specificity in your prompts to guide the AI effectively.
