**Zero-Shot Prompting**

	• What it is: You give the model only the instructions, context, and question, with no worked examples.
	• When to use:
		• You don’t have sample Q&A pairs.
		• You need general-purpose answering.
		• You want the model to reason freely while staying grounded in the context.
	• Trade-off: Faster and simpler than supplying examples, but answers may not follow the exact style or format you want.

Example:

Given a context passage about Transformers, ask: “How do they handle long-range dependencies?”

The model derives the answer directly from the context, with no example answers to imitate.
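To make the structure concrete before involving any LLM library, here is a minimal sketch of a zero-shot prompt as plain string assembly. The function name `build_zero_shot_prompt` and the instruction wording are illustrative assumptions, not part of this notebook's actual template. Note that the prompt contains instructions, context, and a question, and nothing else.

```python
def build_zero_shot_prompt(context: str, question: str) -> str:
    """Assemble a zero-shot prompt: instructions + context + question, no examples.

    Hypothetical helper for illustration only.
    """
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n\n"
        "Answer:"
    )


prompt = build_zero_shot_prompt(
    context="Transformers use self-attention, so each token can attend to every other token.",
    question="How do they handle long-range dependencies?",
)
print(prompt)
```

Because no sample Q&A pairs are included, the only levers for controlling style and format are the instructions themselves.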

In [4]:
!pip install -q -U llama-index llama-index-llms-groq


[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m A new release of pip is available: [0m[31;49m25.1.1[0m[39;49m -> [0m[32;49m25.2[0m
[1m[[0m[34;49mnotice[0m[1;39;49m][0m[39;49m To update, run: [0m[32;49mpip install --upgrade pip[0m


In [5]:
# --- ZERO-SHOT ---
import os
from llama_index.core import PromptTemplate
from llama_index.llms.groq import Groq

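# Placeholder: replace with your real key. setdefault keeps a key that is already set in the environment.
os.environ.setdefault("GROQ_API_KEY", "your_groq_api_key")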

def run_llm_zeroshot(context: str, query: str) -> str:
    """Answer `query` from `context` alone, with no example Q&A pairs in the prompt."""
    llm = Groq(model="llama-3.3-70b-versatile", temperature=0)

    template_str = (
        "You are an expert AI assistant.\n"
        "Use ONLY the provided context to answer the user's question. "
        "If the context is insufficient or does not mention the answer, reply exactly: "
        "'Not enough information.'\n\n"
        "Context:\n{context_str}\n\n"
        "User Question: {query_str}\n\n"
        "Answering Rules:\n"
        "1) Be concise and precise (3–6 sentences, unless the question requires more).\n"
        "2) Use bullet points for lists.\n"
        "3) At the end, include a 'Sources:' section with short snippets or filenames from the context you used.\n\n"
        "Final Answer:"
    )
    prompt = PromptTemplate(template_str).format(context_str=context, query_str=query)
    response = llm.complete(prompt=prompt)
    output = response.text
    return output

In [6]:
# Zero-Shot: No examples, just context + query
context_text = (
    "Transformers use a self-attention mechanism that allows each token "
    "to attend to all other tokens in the sequence. This helps capture "
    "long-range dependencies without recurrence."
)

query_text = "How do Transformers handle long-range dependencies?"

ans0 = run_llm_zeroshot(context=context_text, query=query_text)

print(ans0)

Transformers handle long-range dependencies through their self-attention mechanism. This mechanism enables each token in a sequence to attend to all other tokens, regardless of their position. As a result, Transformers can capture dependencies between tokens that are far apart in the sequence without relying on recurrence. The self-attention mechanism is a key component of the Transformer architecture, allowing it to effectively model complex relationships between tokens. This approach helps Transformers to better understand the context and meaning of the input sequence.

Sources: Context snippet on Transformers and self-attention mechanism
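Since zero-shot output may drift from the requested format, it can help to check the response before using it. The sketch below assumes the response follows the template's rules (a trailing 'Sources:' section, or the exact fallback 'Not enough information.'). The helper names `split_answer_and_sources` and `is_fallback` are hypothetical and not part of LlamaIndex.

```python
FALLBACK = "Not enough information."


def split_answer_and_sources(output: str) -> tuple[str, str]:
    """Split a response at the first 'Sources:' marker.

    Returns (answer, sources); sources is "" when the marker is missing.
    """
    answer, marker, sources = output.partition("Sources:")
    return answer.strip(), (sources.strip() if marker else "")


def is_fallback(output: str) -> bool:
    """True if the model replied with the fallback string (optionally quoted)."""
    return output.strip().strip("'\"") == FALLBACK


sample = (
    "Transformers handle long-range dependencies through their self-attention mechanism.\n\n"
    "Sources: Context snippet on Transformers and self-attention mechanism"
)
answer, sources = split_answer_and_sources(sample)
print(answer)
print(sources)
print(is_fallback(sample))
```

An empty `sources` string signals that the model skipped the 'Sources:' section, which is one of the format deviations zero-shot prompts are prone to.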
