**To perform optimization, first initialize a model (or reuse one that is already initialized)**

In [3]:
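from coolprompt.language_model.llm import DefaultLLM

# Initialize the default model with temperature 0.0 (greedy decoding)
# and a cap of 1000 newly generated tokens per call.
model = DefaultLLM.init(langchain_config={
    "max_new_tokens": 1000,
    "temperature": 0.0,
})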


INFO 06-27 06:13:18 __init__.py:207] Automatically detected platform cuda.
INFO 06-27 06:13:26 config.py:549] This model supports multiple tasks: {'generate', 'classify', 'embed', 'score', 'reward'}. Defaulting to 'generate'.
INFO 06-27 06:13:26 llm_engine.py:234] Initializing a V0 LLM engine (v0.7.3) with config: model='t-tech/T-lite-it-1.0', speculative_config=None, tokenizer='t-tech/T-lite-it-1.0', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, tokenizer_revision=None, trust_remote_code=True, dtype=torch.float16, max_seq_len=32768, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=False, quantization=None, enforce_eager=False, kv_cache_dtype=auto,  device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='xgrammar'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=Fals

Loading safetensors checkpoint shards:   0% Completed | 0/4 [00:00<?, ?it/s]


INFO 06-27 06:13:35 model_runner.py:1115] Loading model weights took 14.2426 GB
INFO 06-27 06:13:38 worker.py:267] Memory profiling takes 3.21 seconds
INFO 06-27 06:13:38 worker.py:267] the current vLLM instance can use total_gpu_memory (79.15GiB) x gpu_memory_utilization (0.90) = 71.24GiB
INFO 06-27 06:13:38 worker.py:267] model weights take 14.24GiB; non_torch_memory takes 0.09GiB; PyTorch activation peak memory takes 4.35GiB; the rest of the memory reserved for KV Cache is 52.55GiB.
INFO 06-27 06:13:38 executor_base.py:111] # cuda blocks: 61497, # CPU blocks: 4681
INFO 06-27 06:13:38 executor_base.py:116] Maximum concurrency for 32768 tokens per request: 30.03x
INFO 06-27 06:13:42 model_runner.py:1434] Capturing cudagraphs for decoding. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI. If out-of-memory error occurs during cudagraph capture, consider decreasing `gpu_memory

Capturing CUDA graph shapes: 100%|██████████| 35/35 [00:24<00:00,  1.45it/s]

INFO 06-27 06:14:06 model_runner.py:1562] Graph capturing finished in 24 secs, took 0.19 GiB
INFO 06-27 06:14:06 llm_engine.py:436] init engine (profile, create kv cache, warmup model) took 31.69 seconds





**Next, initialize the tuner and prepare the dataset used for optimization**

In [15]:
from coolprompt.assistant import PromptTuner
tuner = PromptTuner(model=model)

In [16]:
from datasets import load_dataset

sst2 = load_dataset("sst2")

# Use the first 20 training sentences and their labels as a small optimization set.
n_instances = 20
dataset = sst2["train"]["sentence"][:n_instances]
targets = sst2["train"]["label"][:n_instances]

In [17]:
start_prompt = "Please perform Sentiment Classification task."
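Before tuning, it can help to confirm that the inputs and labels line up and that both classes appear in the slice. The following is a minimal sketch, not part of coolprompt: `summarize_labels` is a hypothetical helper, and the three sample sentences are stand-in data. In the notebook you would pass `dataset` and `targets` instead.

```python
from collections import Counter


def summarize_labels(texts, labels):
    """Check that texts and labels are aligned, then count each label."""
    if len(texts) != len(labels):
        raise ValueError(
            f"got {len(texts)} texts but {len(labels)} labels"
        )
    return Counter(labels)


# Hypothetical stand-in data; replace with `dataset` and `targets`.
sample_texts = ["a charming film", "dull and lifeless", "truly moving"]
sample_labels = [1, 0, 1]

label_counts = summarize_labels(sample_texts, sample_labels)
print(label_counts)
```

With only a handful of instances, a slice that contains a single class would make the metric uninformative, so a quick count like this is a cheap safeguard.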

**Now run the optimization; it might take a while if your dataset is large**

In [18]:
final_prompt = tuner.run(
    start_prompt=start_prompt,
    task='classification',
    dataset=dataset,
    target=targets,
    method='distill',
    use_cache=True,
    num_epochs=1
)

[2025-06-27 06:18:39,629] - Starting DistillPrompt optimization...
[2025-06-27 06:18:39,629] - Starting DistillPrompt optimization...
INFO:Distiller:Starting DistillPrompt optimization...
Processed prompts: 100%|██████████| 15/15 [00:12<00:00,  1.23it/s, est. speed input: 60.84 toks/s, output: 783.11 toks/s]
[2025-06-27 06:18:51,811] - Starting round 0
[2025-06-27 06:18:51,811] - Starting round 0
INFO:Distiller:Starting round 0

Processed prompts: 100%|██████████| 4/4 [00:00<00:00,  5.42it/s, est. speed input: 618.42 toks/s
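Because the run can be slow on larger datasets, it may be useful to measure how long it takes. This is a minimal sketch using only the standard library: `timed` is a hypothetical helper, and `fake_run` is a stand-in for the real `tuner.run(...)` call so that the example runs without a model.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn and return its result together with the elapsed seconds."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def fake_run(start_prompt):
    """Stand-in for tuner.run; it only strips whitespace from the prompt."""
    return start_prompt.strip()


result, elapsed = timed(fake_run, " Please perform Sentiment Classification task. ")
print(f"finished in {elapsed:.4f} s")
```

In the notebook, the same wrapper could be applied as `timed(tuner.run, start_prompt=start_prompt, ...)` with the keyword arguments from the cell above.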

In [19]:
print("PROMPT:", final_prompt)

PROMPT: Perform a sentiment classification task by analyzing the sentiment of the given text as either positive, negative, or neutral. Ensure that your analysis is clear, focused, and accurate.


In [20]:
print("INITIAL METRIC:", tuner.init_metric)

INITIAL METRIC: 0.375


In [21]:
print("FINAL METRIC:", tuner.final_metric)

FINAL METRIC: 0.9466666666666667
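The two values printed above can be compared directly. The sketch below copies them from the outputs; `metric_gain` is a hypothetical helper and not part of coolprompt. In the notebook you would pass `tuner.init_metric` and `tuner.final_metric`.

```python
def metric_gain(initial, final):
    """Return the absolute and relative change between two metric values."""
    absolute = final - initial
    # Guard against division by zero when the initial metric is 0.
    relative = absolute / initial if initial else float("inf")
    return absolute, relative


# Values copied from the cell outputs above.
init_metric = 0.375
final_metric = 0.9466666666666667

absolute, relative = metric_gain(init_metric, final_metric)
print(f"absolute gain: {absolute:.3f}")  # absolute gain: 0.572
print(f"relative gain: {relative:.1%}")  # relative gain: 152.4%
```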


**The initial and final prompts are also stored on the tuner object**

In [22]:
print("INITIAL PROMPT:", tuner.init_prompt)

INITIAL PROMPT: Please perform Sentiment Classification task.


In [23]:
print("FINAL PROMPT:", tuner.final_prompt)

FINAL PROMPT: Perform a sentiment classification task by analyzing the sentiment of the given text as either positive, negative, or neutral. Ensure that your analysis is clear, focused, and accurate.
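Since both prompts are plain strings, they can be written to disk for reuse in a later session. This is a minimal sketch using only the standard library: `save_prompts`, `load_prompts`, and the file name are hypothetical, and the prompt strings are copied from the outputs above. In the notebook you would pass `tuner.init_prompt` and `tuner.final_prompt`.

```python
import json
import tempfile
from pathlib import Path


def save_prompts(path, init_prompt, final_prompt):
    """Write both prompts to a JSON file and return the saved record."""
    record = {"init_prompt": init_prompt, "final_prompt": final_prompt}
    Path(path).write_text(json.dumps(record, indent=2), encoding="utf-8")
    return record


def load_prompts(path):
    """Read the prompts back from the JSON file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


# Hypothetical output location in the system temp directory.
out_path = Path(tempfile.gettempdir()) / "coolprompt_prompts.json"

saved = save_prompts(
    out_path,
    "Please perform Sentiment Classification task.",
    "Perform a sentiment classification task by analyzing the sentiment of "
    "the given text as either positive, negative, or neutral. Ensure that "
    "your analysis is clear, focused, and accurate.",
)
restored = load_prompts(out_path)
print(restored["final_prompt"])
```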
