**Few-Shot Prompting**

	• What it is: You give the model a few worked examples (context → question → answer), then append your new context and question.
	• When to use:
		• You want the model to mimic a specific style (tone, structure, format).
		• You want consistent answers (e.g., always ending with “Sources: …”).
		• You need to teach the model special formatting rules or a domain-specific style.
	• Trade-off: More control over the output, but you must prepare good examples, and every example makes the prompt longer.

Example:

Show 2–3 sample Q&A pairs, then ask a new question with new context.
The model tends to copy the answering style from your examples.

In [2]:
!pip install -q -U llama-index llama-index-llms-groq


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.1.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [4]:
# --- FEW-SHOT ---
import os
from typing import List, Dict
from llama_index.core import PromptTemplate
from llama_index.llms.groq import Groq

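# Placeholder key: used only if GROQ_API_KEY is not already set in the environment.
# Replace it with your own key, and avoid committing real keys to version control.
os.environ.setdefault("GROQ_API_KEY", "your_groq_api_key")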

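def run_llm_fewshot(context: str, query: str, examples: List[Dict[str, str]]) -> str:
    """Answer `query` from `context`, using `examples` as few-shot style demonstrations."""
    # temperature=0 keeps the output as repeatable as possible.
    llm = Groq(model="llama-3.3-70b-versatile", temperature=0)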

    examples_str = "\n".join(
        f"Example {i+1}:\nContext:\n{ex.get('context','')}\n"
        f"Question: {ex.get('question','')}\n"
        f"Answer: {ex.get('answer','')}\n"
        for i, ex in enumerate(examples)
    )

    template_str = (
        "You are an expert AI assistant.\n"
        "Use ONLY the provided context to answer the user's question. "
        "If the context is insufficient or does not mention the answer, reply exactly: "
        "'Not enough information.'\n\n"
        "Follow the style and reasoning illustrated by the examples.\n\n"
        "Examples:\n{examples_str}\n"
        "--- End of Examples ---\n\n"
        "Context:\n{context_str}\n\n"
        "User Question: {query_str}\n\n"
        "Answering Rules:\n"
        "1) Be concise and precise (3–6 sentences, unless the question requires more).\n"
        "2) Use bullet points for lists.\n"
        "3) At the end, include a 'Sources:' section with short snippets or filenames from the context you used.\n\n"
        "Final Answer:"
    )
    prompt = PromptTemplate(template_str).format(
        examples_str=examples_str, context_str=context, query_str=query
    )
    return llm.complete(prompt=prompt).text

In [5]:
# Few-Shot: Add examples so the model mimics your style
shots = [
    {
        "context": "Positional encodings inject order information into sequences.",
        "question": "Why are positional encodings needed?",
        "answer": (
            "They give the model a sense of word order.\n"
            "- Without them, the model treats tokens as a bag of words.\n"
            "- Encodings ensure the sequence structure is preserved.\n"
            "Sources: lecture_notes.txt"
        )
    },
    {
        "context": "Multi-head attention projects queries, keys, and values into multiple subspaces.",
        "question": "What is the benefit of multi-head attention?",
        "answer": (
            "It lets the model learn from different representation subspaces.\n"
            "- Captures diverse relationships.\n"
            "- Improves contextual understanding.\n"
            "Sources: attention_paper.pdf"
        )
    },
]

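context_text = (
    # The trailing ":\n" keeps the filename from running into the next sentence
    # when Python concatenates these adjacent string literals.
    "Context from attention_mechanism.pdf:\n"
    "In the attention mechanism, softmax is used on the similarity scores "
    "between queries and keys to produce attention weights."
)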

query_text = "What does softmax do in attention?"

answer = run_llm_fewshot(context=context_text, query=query_text, examples=shots)

print(answer)

In the attention mechanism, softmax is applied to the similarity scores between queries and keys. This process produces attention weights, which represent the relative importance of each key with respect to the query. The softmax function normalizes the similarity scores, ensuring they add up to 1. This normalization allows the model to focus on the most relevant keys. Key benefits of using softmax include:
* Normalizing the attention weights
* Ensuring the weights add up to 1
Sources: attention_mechanism.pdf
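
Since the point of few-shot prompting here is a consistent format, it can help to check the returned text programmatically. The following is a minimal sketch and not part of the original notebook: `check_answer_format` is a hypothetical helper, and its two checks are assumptions derived from the answering rules in the prompt (a single final line starting with `Sources:`, and at least one bullet line starting with `-` or `*`).

```python
def check_answer_format(answer: str) -> dict:
    """Hypothetical helper: report whether an answer follows the prompt's format rules."""
    # Drop blank lines and surrounding whitespace so the checks look at real content.
    lines = [ln.strip() for ln in answer.strip().splitlines() if ln.strip()]
    last = lines[-1] if lines else ""
    return {
        # Assumption: the 'Sources:' section is a single line at the very end.
        "ends_with_sources": last.startswith("Sources:"),
        # Assumption: bullets start with '-' (as in the examples) or '*'.
        "has_bullets": any(ln.startswith(("-", "*")) for ln in lines),
    }

sample = (
    "Softmax normalizes the similarity scores.\n"
    "* Normalizing the attention weights\n"
    "Sources: attention_mechanism.pdf"
)
print(check_answer_format(sample))
# → {'ends_with_sources': True, 'has_bullets': True}
```

A check like this can flag answers that drift from the expected layout, which may be a signal to add or revise examples.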


**👉 Rule of thumb:**

	• Use Zero-Shot for quick, flexible answers.
	• Use Few-Shot when consistency, formatting, or special style matters.
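
The difference between the two modes can be sketched with one prompt builder that switches on whether examples are supplied. This is an illustrative sketch only: `build_prompt`, `demo_examples`, and the shortened instruction text are hypothetical and are not the template used in the notebook above. No LLM call is made; the code only assembles and compares the prompt strings.

```python
from typing import Dict, List, Optional

def build_prompt(context: str, query: str,
                 examples: Optional[List[Dict[str, str]]] = None) -> str:
    """Hypothetical helper: zero-shot when `examples` is empty, few-shot otherwise."""
    parts = ["Use ONLY the provided context to answer the question."]
    # Each example is rendered in the same layout as the final question.
    for i, ex in enumerate(examples or [], start=1):
        parts.append(
            f"Example {i}:\nContext: {ex['context']}\n"
            f"Question: {ex['question']}\nAnswer: {ex['answer']}"
        )
    # The prompt ends on "Answer:" so the model continues from there.
    parts.append(f"Context: {context}\nQuestion: {query}\nAnswer:")
    return "\n\n".join(parts)

demo_examples = [{
    "context": "Positional encodings inject order information into sequences.",
    "question": "Why are positional encodings needed?",
    "answer": "They give the model a sense of word order.\nSources: lecture_notes.txt",
}]
ctx = "Softmax turns similarity scores into attention weights."
q = "What does softmax do in attention?"

zero_shot = build_prompt(ctx, q)
few_shot = build_prompt(ctx, q, demo_examples)

# The few-shot prompt carries the style examples, so it is necessarily longer.
print(len(zero_shot) < len(few_shot))
print("Example 1:" in zero_shot, "Example 1:" in few_shot)
```

The same function covers both cases, which makes it easy to start zero-shot and add examples only once the output format starts to matter.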