# Few-Shot Prompting

Few-shot prompting enables in-context learning: we include demonstrations in the prompt to steer the model toward better performance. The demonstrations condition the model for the final example, where we want it to generate a response.
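As a minimal sketch of this idea, the snippet below assembles a prompt from a list of (input, output) demonstrations and ends with an unanswered query for the model to complete. The function name `build_few_shot_prompt`, the `Input:`/`Output:` labels, and the sentiment examples are illustrative assumptions, not part of this repository.

```python
from typing import List, Tuple


def build_few_shot_prompt(demos: List[Tuple[str, str]], query: str) -> str:
    """Format (input, output) demonstrations followed by an open query."""
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in demos]
    # The last block leaves "Output:" empty so the model fills it in.
    blocks.append(f"Input: {query}\nOutput:")
    return "\n\n".join(blocks)


# Hypothetical demonstrations for a sentiment task
demos = [
    ("The movie was fantastic.", "positive"),
    ("I want my money back.", "negative"),
]
prompt = build_few_shot_prompt(demos, "The plot dragged, but the acting was great.")
print(prompt)
```

Every demonstration uses the same layout as the final query, so the model sees a consistent pattern to continue.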

## References
* [Touvron et al., 2023](https://arxiv.org/pdf/2302.13971.pdf): report few-shot properties that appear once models are scaled to a sufficient size
* [Kaplan et al., 2020](https://arxiv.org/abs/2001.08361)
* [Brown et al., 2020](https://arxiv.org/abs/2005.14165)


## Running this code on MyBinder.org

Note: remember that you will need to **adjust CONFIG** with **proper URL and API_KEY**!

[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/GenILab-FAU/prompt-eng/HEAD?urlpath=%2Fdoc%2Ftree%2Fprompt-eng%2Ffew_shots.ipynb)



```python
############################################################
## FEW-SHOT PROMPTING: PROJECT OVERVIEW
############################################################

import os
import csv
from datetime import datetime
from _pipeline import create_payload, model_req

# Model settings, defined once so the request and the log entry stay in sync
MODEL = "qwen2"       # Example model; adjust as desired
TEMPERATURE = 0.7     # Balanced creativity

# 1) Prepare the few-shot examples
FEW_SHOT_EXAMPLES = """
Example 1:
INPUT:
"You are an AI that generates project reports.
Please create the 'Project Overview' section covering:
- Title
- Goal
- Problem Statement
- Key Objectives
- Scope
Format it in clear paragraphs, and use Markdown headings."

OUTPUT:
"# Project Overview
## Title
AI-Driven Data Analysis
## Goal
To automate data processing and generate insights...
## Problem Statement
Many organizations struggle with...
## Key Objectives
1. ...
2. ...
## Scope
This project focuses on..."

-------------------------------------------------------

Example 2:
INPUT:
"You are an AI that generates project reports.
Please create the 'Project Overview' section covering:
- Title
- Goal
- Problem Statement
- Key Objectives
- Scope
Format it in clear paragraphs, and use Markdown headings."

OUTPUT:
"# Project Overview
## Title
Natural Language Processing Toolkit
## Goal
To provide an end-to-end solution for text analytics...
## Problem Statement
Text data is abundant but difficult to process...
## Key Objectives
1. ...
2. ...
## Scope
We will address the text ingestion pipeline..."
"""

# 2) Define the NEW request (the actual content you want)
NEW_REQUEST = """
You are an AI that generates project reports.
Please create the "Project Overview" section covering:
- Title
- Goal
- Problem Statement
- Key Objectives
- Scope
Format it in clear paragraphs, and use Markdown headings.

Project details:
- Title: "Automated QA System"
- Goal: "Improve software testing efficiency by 40%"
- Problem Statement: "Manual QA is time-consuming and prone to human error"
- Key Objectives:
  1) Integrate AI-based test generation
  2) Provide automated bug detection
- Scope: "Applicable to web and mobile platforms"
"""

# 3) Combine the examples + new request into one prompt
FEW_SHOT_PROMPT = f"{FEW_SHOT_EXAMPLES}\n\nNow, follow the format shown in the examples:\n\n{NEW_REQUEST}"

# 4) Create the payload for your model (adjust model name/params as needed)
payload = create_payload(
    target="open-webui",
    model=MODEL,
    prompt=FEW_SHOT_PROMPT,
    temperature=TEMPERATURE,
    num_ctx=2048,           # Context window in tokens: must hold both examples, the new request, and the reply (assumed value; adjust to your model)
    num_predict=300         # Upper bound on generated tokens for the overview
)

# 5) Make the request
time_taken, few_shot_response = model_req(payload=payload)

# 6) Display the output and timing
print("===== FEW-SHOT OUTPUT =====")
print(few_shot_response)
if time_taken:
    print(f"\nTime taken: {time_taken}s")

# 7) (Optional) Log the result for future reference
os.makedirs("data", exist_ok=True)  # ensure data folder exists
log_entry = [
    datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
    "few_shot",
    MODEL,
    TEMPERATURE,
    time_taken,
    (few_shot_response or "").replace("\n", "\\n")  # keep each log record on one line; tolerate an empty response
]

with open("data/few_shot_logs.csv", "a", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(log_entry)
```
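Few-shot prompts grow quickly, so it is worth checking that the context window can hold the demonstrations, the new request, and the generated reply. The sketch below is a rough sanity check only: `rough_token_count` and `fits_context` are hypothetical helpers, the 4-characters-per-token ratio is a rule-of-thumb assumption for English text, and it assumes `num_ctx` bounds prompt plus generated tokens. Real counts depend on the model's tokenizer.

```python
def rough_token_count(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate; the true count depends on the tokenizer."""
    return int(len(text) / chars_per_token) + 1


def fits_context(prompt: str, num_ctx: int, num_predict: int) -> bool:
    """Check that the estimated prompt size plus the reply budget fits in num_ctx."""
    return rough_token_count(prompt) + num_predict <= num_ctx


# Stand-in for a few-shot prompt of about 2,400 characters
example_prompt = "x" * 2400

print(fits_context(example_prompt, num_ctx=200, num_predict=300))
print(fits_context(example_prompt, num_ctx=2048, num_predict=300))
```

If the check fails, either raise `num_ctx` (within what the model supports) or shorten the demonstrations.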


Output (truncated):

```
Payload: {'model': 'qwen2', 'messages': [{'role': 'user', 'content': '\nExample 1:\nINPUT:\n"You are an AI that generates project reports. \nPlease create the \'Project Overview\' section covering:\n- Title\n- Goal\n- Problem Statement\n- Key Objectives\n- Scope\nFormat it in clear paragraphs, and use Markdown headings."\n\nOUTPUT:\n"# Project Overview\n## Title\nAI-Driven Data Analysis\n## Goal\nTo automate data processing and generate insights...\n## Problem Statement\nMany organizations struggle with...\n## Key Objectives\n1. ...\n2. ...\n## Scope\nThis project focuses on..."\n\n-------------------------------------------------------\n\nExample 2:\nINPUT:\n"You are an AI that generates project reports.\nPlease create the \'Project Overview\' section covering:\n- Title\n- Goal\n- Problem Statement\n- Key Objectives\n- Scope\nFormat it in clear paragraphs, and use Markdown headings."\n\nOUTPUT:\n"# Project Overview\n## Title\nNatural Language Processing Toolkit\n## Goal\nTo provide an
```

## How to improve it?

Following the findings of [Min et al. (2022)](https://arxiv.org/abs/2202.12837), here are a few more tips about demonstrations (exemplars) for few-shot prompting:

* "the label space and the distribution of the input text specified by the demonstrations are both important (regardless of whether the labels are correct for individual inputs)"
* the format you use also plays a key role in performance: even demonstrations with random labels are much better than no labels at all.
* additional results show that selecting random labels from a true distribution of labels (instead of a uniform distribution) also helps.
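To make the tips above concrete, here is a small sketch of how demonstration labels could be drawn either from the empirical (true) label distribution or uniformly from the label space, while keeping one fixed format for every demonstration. It illustrates the setup these findings describe and is not code from the paper; the helper names, the `Review:`/`Sentiment:` format, and the toy data are all assumptions.

```python
import random
from typing import List


def sample_demo_labels(train_labels: List[str], k: int, rng: random.Random,
                       match_true_distribution: bool = True) -> List[str]:
    """Draw k labels for demonstrations, ignoring the inputs they are paired with."""
    if match_true_distribution:
        # Sampling from the raw label list follows its empirical frequencies.
        return [rng.choice(train_labels) for _ in range(k)]
    # Otherwise sample uniformly over the distinct labels.
    label_space = sorted(set(train_labels))
    return [rng.choice(label_space) for _ in range(k)]


def format_demos(inputs: List[str], labels: List[str]) -> str:
    """Render every demonstration in the same input/label format."""
    return "\n\n".join(
        f"Review: {text}\nSentiment: {label}" for text, label in zip(inputs, labels)
    )


# Toy data: an imbalanced label set (80% positive, 20% negative)
train_labels = ["positive"] * 8 + ["negative"] * 2
inputs = ["Great battery life.", "Stopped working after a week.", "Does the job."]

rng = random.Random(0)  # seeded for repeatable sampling
labels = sample_demo_labels(train_labels, k=len(inputs), rng=rng)
demo_block = format_demos(inputs, labels)
print(demo_block)
```

In practice you would normally keep the correct labels; the random sampling here only shows why label space, input distribution, and format still matter when labels are noisy.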