**4.1. Using Prompt Templates**

**Prompt engineering**

Prompt engineering is the process of crafting effective instructions, or "prompts," to guide large language models (LLMs) and other generative AI models towards producing desired outputs. It's the art and science of designing prompts that help AI understand your intent and generate relevant, high-quality responses.

**Prompt Template**

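A prompt template is a reusable prompt with named placeholders, such as `{context_str}` and `{query_str}`, that are filled with user input and other parameters at run time to produce the final instructions sent to a language model. Keeping the fixed instructions in one place guides the model's response consistently across many inputs, helps it understand the context, and makes relevant, coherent output easier to obtain.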
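Conceptually, a template is just a string whose placeholders are substituted when it is used. The sketch below illustrates the idea with Python's built-in `str.format` only; the `qa_template` string and its `topic`/`question` fields are hypothetical examples, not part of LlamaIndex.

```python
# A prompt template: fixed instructions plus named placeholders in curly braces.
qa_template = (
    "You are a helpful assistant.\n"
    "Topic: {topic}\n"
    "Question: {question}"
)

# Filling the placeholders turns the reusable template into one concrete prompt.
prompt = qa_template.format(
    topic="astronomy",
    question="Why is the sky dark at night?",
)
print(prompt)
```

The same template can be filled again with a different topic and question without touching the instructions.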

In [3]:
!pip install -q -U llama-index llama-index-llms-groq llama-index-llms-ollama

In [5]:
# importing the dependencies
import os
from llama_index.core import PromptTemplate
from llama_index.llms.ollama import Ollama
# from llama_index.llms.groq import Groq  # uncomment to use Groq

In [8]:
# LLM setup
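# Option 1: Use Ollama LLM (default). Requires a running Ollama server with the model pulled,
# e.g. `ollama pull mistral:7b`; change `model` to any model available locally.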
llm = Ollama(
    model="mistral:7b",
    temperature=0.0
)

In [7]:
# Option 2: Use Groq LLM
# GROQ_API_KEY = "your_groq_api_key"
# os.environ["GROQ_API_KEY"] = GROQ_API_KEY
# llm = Groq(model="llama-3.1-8b-instant", temperature=0)
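Hardcoding an API key in a notebook makes it easy to leak when the notebook is shared. A safer pattern, sketched below, reads the key from the environment at run time and fails early with a clear message; it assumes the key was exported as `GROQ_API_KEY` before starting Jupyter. The helper name `get_groq_api_key` is hypothetical.

```python
import os


def get_groq_api_key(var_name: str = "GROQ_API_KEY") -> str:
    """Return the Groq API key from the environment, or raise a clear error."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set; export it before creating the Groq LLM."
        )
    return key


# Usage (assumes llama-index-llms-groq is installed):
# from llama_index.llms.groq import Groq
# llm = Groq(model="llama-3.1-8b-instant", api_key=get_groq_api_key(), temperature=0)
```

This keeps the secret out of the notebook file while leaving the rest of Option 2 unchanged.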

In [9]:
# prompt template

template_str = (
    "You are an expert AI assistant.\n"
    "Use ONLY the provided context to answer the user's question. "
    "If the context is insufficient or does not mention the answer, reply exactly: "
    "'Not enough information.'\n\n"
    "Context:\n{context_str}\n\n"
    "User Question: {query_str}\n\n"
    "Answering Rules:\n"
    "1) Be concise and precise (3–6 sentences, unless the question requires more).\n"
    "2) Use bullet points for lists.\n"
    "3) At the end, include a 'Sources:' section with short snippets or filenames from the context you used.\n\n"
    "Final Answer:"
)
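`template_str` is built from adjacent string literals inside parentheses, which Python joins into a single string with nothing in between. That is why each piece ends with an explicit space or `\n`: dropping one silently merges two instructions. A minimal illustration with shortened example strings:

```python
# Adjacent string literals inside parentheses are concatenated at compile time,
# with no separator added between them.
joined = (
    "Use ONLY the provided context. "
    "If the context is insufficient, say so."
)
missing_space = (
    "Use ONLY the provided context."   # no trailing space here
    "If the context is insufficient, say so."
)
print(repr(joined))
print(repr(missing_space))
```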

In [10]:
template = PromptTemplate(template_str)
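Sometimes one variable is known before the other, for example a fixed context reused for many questions. LlamaIndex prompt templates provide a `partial_format` method for this; the standard-library sketch below only mimics the idea by leaving unknown placeholders untouched. `_KeepMissing` and `partial_fill` are hypothetical helpers and handle only plain `{name}` placeholders (no format specs).

```python
class _KeepMissing(dict):
    """Mapping that leaves unknown placeholders as literal '{name}' text."""

    def __missing__(self, key: str) -> str:
        return "{" + key + "}"


def partial_fill(template: str, **known: str) -> str:
    """Fill only the placeholders given in `known`; keep the rest for later."""
    return template.format_map(_KeepMissing(known))


template_str = "Context:\n{context_str}\n\nUser Question: {query_str}"

# Step 1: bind the context once.
with_context = partial_fill(template_str, context_str="Artemis I flew in 2022.")
# Step 2: bind the question later with a normal format call.
# Caution: if the filled-in context contains '{' or '}', this second call breaks.
final_prompt = with_context.format(query_str="When did Artemis I fly?")
print(final_prompt)
```

With LlamaIndex itself, the equivalent should be `template.partial_format(context_str=sample_context)` followed by `.format(query_str=...)` on the returned template.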

In [11]:
# sample usage

sample_context = (
    "Transformers use a self-attention mechanism that lets each token attend "
    "to every other token in the sequence. This enables modeling of long-range "
    "dependencies without recurrence. Positional encodings inject order "
    "information, and multi-head attention captures diverse relations.\n\n"
    "The encoder stacks layers of self-attention and feed-forward networks to "
    "build contextual representations. The decoder uses masked self-attention "
    "to maintain causality and cross-attention to consult encoder outputs."
)

sample_user_query = "How do Transformers handle long-range dependencies?"

In [12]:
filled_prompt = template.format(
    context_str=sample_context,
    query_str=sample_user_query
)
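A misspelled or forgotten keyword is easy to miss here: depending on the LlamaIndex version, `format` may raise an error for a missing variable or may leave the placeholder in the prompt as literal text, which the model then sees. A strict check before formatting avoids both surprises. The `strict_format` helper below is a hypothetical plain-Python sketch that works on the raw template string.

```python
from string import Formatter


def strict_format(template: str, **values: str) -> str:
    """Fill `template`, raising if a placeholder has no value or a value is unused."""
    # Formatter().parse yields (literal, field_name, spec, conversion) tuples;
    # field_name is None for trailing text with no placeholder.
    fields = {f for _, f, _, _ in Formatter().parse(template) if f is not None}
    given = set(values)
    missing, extra = fields - given, given - fields
    if missing or extra:
        raise ValueError(f"missing={sorted(missing)}, unexpected={sorted(extra)}")
    return template.format(**values)


template_str = "Context:\n{context_str}\n\nUser Question: {query_str}"
filled = strict_format(template_str, context_str="Some context.", query_str="A question?")
print(filled)
# strict_format(template_str, context_str="Some context.") raises ValueError
```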

In [13]:
print(filled_prompt)

You are an expert AI assistant.
Use ONLY the provided context to answer the user's question. If the context is insufficient or does not mention the answer, reply exactly: 'Not enough information.'

Context:
Transformers use a self-attention mechanism that lets each token attend to every other token in the sequence. This enables modeling of long-range dependencies without recurrence. Positional encodings inject order information, and multi-head attention captures diverse relations.

The encoder stacks layers of self-attention and feed-forward networks to build contextual representations. The decoder uses masked self-attention to maintain causality and cross-attention to consult encoder outputs.

User Question: How do Transformers handle long-range dependencies?

Answering Rules:
1) Be concise and precise (3–6 sentences, unless the question requires more).
2) Use bullet points for lists.
3) At the end, include a 'Sources:' section with short snippets or filenames from the context you used.

Final Answer:

In [14]:
response = llm.complete(prompt=filled_prompt)

print(response.text)

 Transformers handle long-range dependencies by using a self-attention mechanism that allows each token to attend to every other token in the sequence. This eliminates the need for recurrence, enabling modeling of long-range dependencies.

- Self-attention enables the model to focus on relevant information across the entire input sequence.
- Positional encodings are injected to provide order information, ensuring that the model understands the position of each token in the sequence.
- The encoder stacks layers of self-attention and feed-forward networks to build contextual representations.
- In the decoder, masked self-attention maintains causality, preventing the model from seeing future tokens during prediction, while cross-attention allows the model to consult encoder outputs.

Sources:
- Transformers use a self-attention mechanism that lets each token attend to every other token in the sequence. (Context)
- Positional encodings inject order information, and multi-head attention cap
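Because the template asks the model to finish with a 'Sources:' section, the answer and its citations can be separated with simple string handling. The `split_sources` helper below is a hypothetical sketch; it assumes the model follows the rule and starts a line with `Sources:`, which a small local model does not always guarantee.

```python
import re

# Matches a leading bullet ("- ", "* ") or numbering ("2. ") on a source line.
_BULLET = re.compile(r"^\s*(?:[-*]|\d+\.)\s*")


def split_sources(text: str) -> tuple[str, list[str]]:
    """Split a model answer into (body, sources) at the last line starting with 'Sources:'."""
    lines = text.strip().splitlines()
    idx = None
    for i, line in enumerate(lines):
        if line.strip().lower().startswith("sources:"):
            idx = i  # keep updating so the last match wins
    if idx is None:
        return text.strip(), []  # the model ignored the 'Sources:' rule
    body = "\n".join(lines[:idx]).strip()
    sources = [_BULLET.sub("", line).strip() for line in lines[idx + 1:] if line.strip()]
    return body, sources


example = (
    " Transformers use self-attention.\n\n"
    "- Positional encodings inject order information.\n\n"
    "Sources:\n"
    "- Transformers use a self-attention mechanism. (Context)\n"
    "2. Positional encodings inject order information."
)
body, sources = split_sources(example)
print(body)
print(sources)
```

In the notebook this would be applied as `body, sources = split_sources(response.text)`.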

In [16]:
sample_context_2 = (
    "NASA’s Artemis program aims to return humans to the Moon by the mid-2020s. "
    "Artemis I was an uncrewed test flight in 2022, successfully orbiting the Moon. "
    "Artemis II, scheduled for 2025, will carry astronauts on a lunar flyby. "
    "Artemis III, planned for 2026, aims to land the first woman and next man on the lunar surface. "
    "The program also intends to establish a sustainable presence by building a lunar Gateway space station "
    "and using the Moon as a stepping stone to Mars."
)

sample_query_2 = "What are the main goals of the Artemis program?"

In [17]:
filled_prompt = template.format(
    context_str=sample_context_2,
    query_str=sample_query_2
)

In [18]:
print(filled_prompt)

You are an expert AI assistant.
Use ONLY the provided context to answer the user's question. If the context is insufficient or does not mention the answer, reply exactly: 'Not enough information.'

Context:
NASA’s Artemis program aims to return humans to the Moon by the mid-2020s. Artemis I was an uncrewed test flight in 2022, successfully orbiting the Moon. Artemis II, scheduled for 2025, will carry astronauts on a lunar flyby. Artemis III, planned for 2026, aims to land the first woman and next man on the lunar surface. The program also intends to establish a sustainable presence by building a lunar Gateway space station and using the Moon as a stepping stone to Mars.

User Question: What are the main goals of the Artemis program?

Answering Rules:
1) Be concise and precise (3–6 sentences, unless the question requires more).
2) Use bullet points for lists.
3) At the end, include a 'Sources:' section with short snippets or filenames from the context you used.

Final Answer:


In [19]:
response = llm.complete(prompt=filled_prompt)

print(response.text)

 The main goals of NASA's Artemis program are as follows:

- Return humans to the Moon by the mid-2020s (Source: First sentence).
- Successfully orbit the Moon with an uncrewed test flight, Artemis I, in 2022 (Source: Second sentence).
- Carry astronauts on a lunar flyby with Artemis II, scheduled for 2025 (Source: Third sentence).
- Land the first woman and next man on the lunar surface with Artemis III, planned for 2026 (Source: Fourth sentence).
- Establish a sustainable presence by building a lunar Gateway space station (Source: Last sentence).
- Utilize the Moon as a stepping stone to Mars (Source: Last sentence).

Sources:
1. NASA’s Artemis program aims to return humans to the Moon by the mid-2020s.
2. Artemis I was an uncrewed test flight in 2022, successfully orbiting the Moon.
3. Artemis II, scheduled for 2025, will carry astronauts on a lunar flyby.
4. Artemis III, planned for 2026, aims to land the first woman and next man on the lunar surface.
5. The program also intends to
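The template also defines an exact fallback reply, 'Not enough information.', for questions the context cannot answer, and a fixed string makes that case easy to detect in code. The hypothetical `is_fallback` helper below normalizes surrounding whitespace, quotes, a trailing period, and case, since a model may not reproduce the string byte for byte (the Mistral replies above, for instance, begin with a leading space).

```python
def is_fallback(answer: str) -> bool:
    """Return True if the model replied with the template's fallback message."""
    normalized = answer.strip().strip("'\"").strip().rstrip(".").lower()
    return normalized == "not enough information"


print(is_fallback(" 'Not enough information.'"))
print(is_fallback("The main goals of NASA's Artemis program are as follows:"))
```

A caller could then branch on `is_fallback(response.text)`, for example to retrieve more context before asking again.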

**Creating a Function**

In [20]:
from llama_index.core import PromptTemplate
from llama_index.llms.ollama import Ollama


def run_llm(context: str, query: str) -> str:
    # Initialize LLM (Ollama)
    llm = Ollama(model="mistral:7b", temperature=0)

    # Define prompt template
    template_str = (
        "You are an expert AI assistant.\n"
        "Use ONLY the provided context to answer the user's question. "
        "If the context is insufficient or does not mention the answer, reply exactly: "
        "'Not enough information.'\n\n"
        "Context:\n{context_str}\n\n"
        "User Question: {query_str}\n\n"
        "Answering Rules:\n"
        "1) Be concise and precise (3–6 sentences, unless the question requires more).\n"
        "2) Use bullet points for lists.\n"
        "3) At the end, include a 'Sources:' section with short snippets or filenames from the context you used.\n\n"
        "Final Answer:"
    )

    template = PromptTemplate(template_str)
    filled_prompt = template.format(context_str=context, query_str=query)

    # Get response
    response = llm.complete(prompt=filled_prompt)
    return response.text
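`run_llm` rebuilds the Ollama client and the template on every call, and it cannot be exercised without a running Ollama server. One possible refactor, sketched below, builds the template once and accepts any object whose `complete(prompt)` result has a `.text` attribute, the interface used in the cells above. `answer_with_context`, `StubLLM`, and `StubResponse` are hypothetical names; the stub exists only to test the function offline. Since both a plain `str` and a LlamaIndex `PromptTemplate` provide `.format(**kwargs)`, the `template` argument can be either.

```python
from dataclasses import dataclass

# Shortened template for the sketch; in the notebook, pass PromptTemplate(template_str).
QA_TEMPLATE = (
    "Use ONLY the provided context to answer the user's question.\n\n"
    "Context:\n{context_str}\n\n"
    "User Question: {query_str}\n\n"
    "Final Answer:"
)


def answer_with_context(llm, context: str, query: str, template=QA_TEMPLATE) -> str:
    """Fill the template and return the model's text; the caller creates `llm` once."""
    prompt = template.format(context_str=context, query_str=query)
    return llm.complete(prompt).text


@dataclass
class StubResponse:
    text: str


class StubLLM:
    """Offline stand-in that records the prompt and returns a canned reply."""

    def __init__(self, reply: str) -> None:
        self.reply = reply
        self.last_prompt = ""

    def complete(self, prompt: str) -> StubResponse:
        self.last_prompt = prompt
        return StubResponse(self.reply)


stub = StubLLM("Not enough information.")
result = answer_with_context(stub, "Artemis I flew in 2022.", "Who won the 1998 World Cup?")
print(result)
```

With the notebook's objects, the call would be `answer_with_context(llm, sample_context, sample_query, template=template)`, reusing the `llm` and `template` created earlier instead of rebuilding them per question.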

In [21]:
# Example 1
sample_context = (
    "Transformers use a self-attention mechanism that lets each token attend "
    "to every other token in the sequence. This enables modeling of long-range "
    "dependencies without recurrence. Positional encodings inject order "
    "information, and multi-head attention captures diverse relations.\n\n"
    "The encoder stacks layers of self-attention and feed-forward networks to "
    "build contextual representations. The decoder uses masked self-attention "
    "to maintain causality and cross-attention to consult encoder outputs."
)

sample_query = "How do Transformers handle long-range dependencies?"

output = run_llm(sample_context, sample_query)

print(output)

 Transformers handle long-range dependencies by using a self-attention mechanism that allows each token to attend to every other token in the sequence. This eliminates the need for recurrence, enabling modeling of long-range dependencies.

- Self-attention enables the model to focus on relevant information across the entire input sequence.
- Positional encodings are injected to provide order information, ensuring that the model understands the position of each token in the sequence.
- The encoder stacks layers of self-attention and feed-forward networks to build contextual representations.
- In the decoder, masked self-attention maintains causality, preventing the model from seeing future tokens during prediction, while cross-attention allows the model to consult encoder outputs.

Sources:
- Transformers use a self-attention mechanism that lets each token attend to every other token in the sequence. (Context)
- Positional encodings inject order information, and multi-head attention cap

In [22]:
# Example 2
sample_context_2 = (
    "NASA’s Artemis program aims to return humans to the Moon by the mid-2020s. "
    "Artemis I was an uncrewed test flight in 2022, successfully orbiting the Moon. "
    "Artemis II, scheduled for 2025, will carry astronauts on a lunar flyby. "
    "Artemis III, planned for 2026, aims to land the first woman and next man on the lunar surface. "
    "The program also intends to establish a sustainable presence by building a lunar Gateway space station "
    "and using the Moon as a stepping stone to Mars."
)

sample_query_2 = "What are the main goals of the Artemis program?"

output = run_llm(sample_context_2, sample_query_2)

print(output)

 The main goals of NASA's Artemis program are as follows:

- Return humans to the Moon by the mid-2020s (Source: First sentence).
- Successfully orbit the Moon with an uncrewed test flight, Artemis I, in 2022 (Source: Second sentence).
- Carry astronauts on a lunar flyby with Artemis II, scheduled for 2025 (Source: Third sentence).
- Land the first woman and next man on the lunar surface with Artemis III, planned for 2026 (Source: Fourth sentence).
- Establish a sustainable presence by building a lunar Gateway space station (Source: Last sentence in context).
- Utilize the Moon as a stepping stone to Mars (Source: Last sentence in context).

Sources:
1. NASA's Artemis program aims to return humans to the Moon by the mid-2020s.
2. Artemis I was an uncrewed test flight in 2022, successfully orbiting the Moon.
3. Artemis II, scheduled for 2025, will carry astronauts on a lunar flyby.
4. Artemis III, planned for 2026, aims to land the first woman and next man on the lunar surface.
5. The p