# Prompt Templating for Ollama - LangChain
Until 2021, to use an AI model for a specific use-case we would need to fine-tune the model weights themselves. That would require huge amounts of training data and significant compute to fine-tune any reasonably performing model.

Instruction fine-tuned Large Language Models (LLMs) changed this fundamental rule of applying AI models to new use-cases. Rather than needing to either train a model from scratch or fine-tune an existing model, these new LLMs could adapt incredibly well to a new problem or use-case with nothing more than a prompt change.

Prompts allow us to completely change the functionality of an AI pipeline. Through natural language we simply tell our LLM what it needs to do, and with the right AI pipeline and prompting, it often works.

LangChain has many features geared towards helping us build prompts. We can build very dynamic prompting pipelines that modify the structure and content of what we feed into our LLM based on essentially any parameter we like. In this example, we'll explore the essentials of prompting in LangChain and apply them in a demo Retrieval Augmented Generation (RAG) pipeline.

# Basic Prompting
We'll start by looking at the various parts of our prompt. For RAG use-cases we'll typically have four core components; however, this is very use-case dependent and can vary significantly. Nonetheless, for RAG we will typically see:

__Rules for our LLM:__ this part of the prompt sets up the behavior of our LLM and how it should approach responding to user queries, and provides as much information as possible about what we want it to do. We typically place this within the system prompt of a chat LLM.

__Context:__ this part is RAG-specific. The context refers to some external information that we may have retrieved from a web search, database query, or often a vector database. This external information is the Retrieval Augmentation part of RAG. For chat LLMs we'll typically place this inside the chat messages between the assistant and user.

__Question:__ this is the input from our user. In almost all cases the question (or query, or user input) is provided to the LLM, typically through a user message. However, the format and location in which it is provided often change.

__Answer:__ this is the answer from our assistant. Again, this is very typical and we'd expect it in every use-case.
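
As a rough illustration of how these components map onto chat messages, here is a minimal sketch using plain dictionaries. The `build_messages` helper and the `role`/`content` keys are assumptions made for illustration (they follow a common chat-message convention and are not LangChain objects). The context is appended to the system message for simplicity; it could equally be sent in a separate message.

```python
# Hypothetical helper: maps the prompt components onto role-tagged messages.
def build_messages(rules: str, context: str, question: str) -> list[dict]:
    """Return a chat history that ends with the user's question."""
    return [
        # Rules (and, in this sketch, the retrieved context) go in the system message.
        {"role": "system", "content": f"{rules}\n\nContext: {context}"},
        # The question arrives as a user message.
        {"role": "user", "content": question},
        # The answer is not written by us: the model replies as the assistant.
    ]


messages = build_messages(
    rules='Answer using the context below. If you cannot, answer with "I don\'t know".',
    context="Aurelio AI is an AI development studio.",
    question="Does Aurelio AI do anything related to LangChain?",
)
for message in messages:
    print(f"{message['role']}: {message['content']}")
```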

Below is an example of how a RAG prompt may look:




Answer the question based on the context below,                 }
if you cannot answer the question using the                     }--->  (Rules) For Our Prompt
provided information answer with "I don't know"                 }

Context: Aurelio AI is an AI development studio                 }
focused on the fields of Natural Language Processing (NLP)      }
and information retrieval using modern tooling                  }--->   Context AI has
such as Large Language Models (LLMs),                           }
vector databases, and LangChain.                                }

Question: Does Aurelio AI do anything related to LangChain?     }--->   User Question

Answer:                                                         }--->   AI Answer

Here we can see how the AI will approach our question. The rules define the response logic: if the context contains the answer, use the context to answer the question; if not, say "I don't know". The context and question are then passed into the prompt, much like parameters in a function.
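
To make the function analogy concrete, here is a minimal sketch in plain Python, before any LangChain objects are involved. The names `RAG_TEMPLATE` and `build_rag_prompt` are hypothetical and exist only for this illustration; the template text mirrors the example prompt above.

```python
# Hypothetical template mirroring the example RAG prompt.
RAG_TEMPLATE = (
    "Answer the question based on the context below, "
    "if you cannot answer the question using the "
    'provided information answer with "I don\'t know"\n\n'
    "Context: {context}\n\n"
    "Question: {question}\n\n"
    "Answer:"
)


def build_rag_prompt(context: str, question: str) -> str:
    """Fill the named placeholders, like passing arguments to a function."""
    return RAG_TEMPLATE.format(context=context, question=question)


prompt_text = build_rag_prompt(
    context="Aurelio AI is an AI development studio.",
    question="Does Aurelio AI do anything related to LangChain?",
)
print(prompt_text)
```

The rules stay fixed while the context and question change on every call, which is the same separation the LangChain templates below provide.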

In [30]:
prompt = """
Answer the user's query based on the context below.
If you cannot answer the question using the
provided information answer with "I don't know".

Context: {context}
"""

In [31]:
from langchain_core.prompts import ChatPromptTemplate

# build a chat prompt template from (role, template) tuples
prompt_template = ChatPromptTemplate.from_messages([
    ("system", prompt),
    ("user", "{query}"),
])

When we call the template it will expect us to provide two variables, the context and the query. Both of these variables are pulled from the strings we wrote, as LangChain interprets curly-bracket syntax (i.e. {context} and {query}) as indicating a dynamic variable that we expect to be inserted at query time. We can see that these variables have been picked up by our template object by viewing its input_variables attribute:

In [32]:
prompt_template.input_variables

['context', 'query']
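
To build intuition for how curly-bracket placeholders can be discovered, here is a small sketch using only Python's standard library. It is not LangChain's implementation; `find_placeholders` is a hypothetical helper, and sorting the names is a choice made here for a stable result.

```python
from string import Formatter


def find_placeholders(template: str) -> list[str]:
    """Return the unique {name} fields in a format string, sorted alphabetically."""
    names = {
        field_name
        for _, field_name, _, _ in Formatter().parse(template)
        if field_name  # skip literal-only segments (None) and empty fields
    }
    return sorted(names)


system_text = "Answer the user's query based on the context below.\n\nContext: {context}\n"
user_text = "{query}"

found = find_placeholders(system_text + user_text)
print(found)  # ['context', 'query']
```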

We can also view the structure of the messages (currently prompt templates) that the ChatPromptTemplate will construct by viewing the messages attribute:

In [33]:
prompt_template.messages

[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template='\nAnswer the user\'s query based on the context below.\nIf you cannot answer the question using the\nprovided information answer with "I don\'t know".\n\nContext: {context}\n'), additional_kwargs={}),
 HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['query'], input_types={}, partial_variables={}, template='{query}'), additional_kwargs={})]

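Each tuple we passed to from_messages became its own message prompt template: the first value sets the role and the second is the template string. The tuples are shorthand for the more explicit code below, which builds an equivalent template:

In [34]: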
from langchain_core.prompts import (
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)

prompt_template = ChatPromptTemplate.from_messages([
    SystemMessagePromptTemplate.from_template(prompt),
    HumanMessagePromptTemplate.from_template("{query}"),
])

In [35]:
prompt_template.messages

[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=['context'], input_types={}, partial_variables={}, template='\nAnswer the user\'s query based on the context below.\nIf you cannot answer the question using the\nprovided information answer with "I don\'t know".\n\nContext: {context}\n'), additional_kwargs={}),
 HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['query'], input_types={}, partial_variables={}, template='{query}'), additional_kwargs={})]

In [36]:
from langchain_ollama import ChatOllama

# temperature=0.0 makes responses as deterministic as possible
llm = ChatOllama(temperature=0.0, model="llama3.2")

In [37]:
pipeline = (
    {
        "query": lambda x: x["query"],
        "context": lambda x: x["context"]
    }
    | prompt_template
    | llm
)

In [38]:
context = """Aurelio AI is an AI company developing tooling for AI
engineers. Their focus is on language AI with the team having strong
expertise in building AI agents and a strong background in
information retrieval.

The company is behind several open source frameworks, most notably
Semantic Router and Semantic Chunkers. They also have an AI
Platform providing engineers with tooling to help them build with
AI. Finally, the team also provides development services to other
organizations to help them bring their AI tech to market.

Aurelio AI became LangChain Experts in September 2024 after a long
track record of delivering AI solutions built with the LangChain
ecosystem."""

query = "what does Aurelio AI do?"

In [39]:
pipeline.invoke({"query": query, "context": context})

AIMessage(content='Aurelio AI is an AI company that develops tooling for AI engineers, primarily focusing on language AI. They provide various tools and services to help build and deploy AI models, including:\n\n1. Open-source frameworks (e.g., Semantic Router and Semantic Chunkers)\n2. An AI Platform for building with AI\n3. Development services to help other organizations bring their AI tech to market\n\nThey also gained the title of LangChain Experts in September 2024, indicating a strong expertise in the LangChain ecosystem.', additional_kwargs={}, response_metadata={'model': 'llama3.2', 'created_at': '2025-05-14T05:55:55.1818144Z', 'done': True, 'done_reason': 'stop', 'total_duration': 12703891800, 'load_duration': 31710000, 'prompt_eval_count': 201, 'prompt_eval_duration': 627385100, 'eval_count': 105, 'eval_duration': 12041384100, 'model_name': 'llama3.2'}, id='run--f06508c6-a20a-471a-b93d-dbc42f595cd6-0', usage_metadata={'input_tokens': 201, 'output_tokens': 105, 'total_tokens'

# Few Shot Prompting
Many State-of-the-Art (SotA) LLMs are incredible at instruction following, meaning that it takes much less effort to get the intended output or behavior from these models than from older or smaller LLMs. With smaller models, few-shot prompting (providing example inputs and outputs in the prompt) can help steer the output.

Before creating an example, let's first see how to use LangChain's few-shot prompting objects. We will provide multiple examples and feed them in as sequential human and AI messages, so we set up the template like this:

In [40]:
example_prompt = ChatPromptTemplate.from_messages([
    ("human", "{input}"),
    ("ai", "{output}"),
])

Then we define a list of examples with dictionaries containing the correct input and output keys.

In [41]:
examples = [
    {"input": "Here is query #1", "output": "Here is the answer #1"},
    {"input": "Here is query #2", "output": "Here is the answer #2"},
    {"input": "Here is query #3", "output": "Here is the answer #3"},
]

In [42]:
from langchain_core.prompts import FewShotChatMessagePromptTemplate

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)
# here is the formatted prompt
print(few_shot_prompt.format())

Human: Here is query #1
AI: Here is the answer #1
Human: Here is query #2
AI: Here is the answer #2
Human: Here is query #3
AI: Here is the answer #3
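
Conceptually, each example dictionary expands into a human message followed by an AI message. The sketch below reproduces that expansion in plain Python to show the idea; `expand_examples` and `demo_examples` are hypothetical names, and this is not how LangChain implements it internally.

```python
demo_examples = [
    {"input": "Here is query #1", "output": "Here is the answer #1"},
    {"input": "Here is query #2", "output": "Here is the answer #2"},
]


def expand_examples(examples: list[dict]) -> list[tuple[str, str]]:
    """Turn each example into a human message followed by an AI message."""
    messages = []
    for example in examples:
        messages.append(("Human", example["input"]))
        messages.append(("AI", example["output"]))
    return messages


formatted = "\n".join(
    f"{role}: {text}" for role, text in expand_examples(demo_examples)
)
print(formatted)
```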


# Few-Shot Example
Using a tiny LLM limits its ability, so it can struggle when asked for specific behaviors or structured outputs. For example, we'll ask the LLM to summarize the key points about Aurelio AI using markdown and bullet points. Let's see what happens.

In [44]:
new_system_prompt = """
Answer the user's query based on the context below.
If you cannot answer the question using the
provided information answer with "I don't know".

Always answer in markdown format. When doing so please
provide headers, short summaries, follow with bullet
points, then conclude.

Context: {context}
"""

# swap the system prompt in place; the existing pipeline picks up the change
prompt_template.messages[0].prompt.template = new_system_prompt

out = pipeline.invoke({"query": query, "context": context}).content
print(out)

**Overview of Aurelio AI**

Aurelio AI is an AI company that develops tooling for AI engineers, specializing in language AI. They provide a range of services and products to help build and deploy AI solutions.

### Key Activities:

*   Developing open-source frameworks (e.g., Semantic Router and Semantic Chunkers)
*   Creating an AI Platform for building with AI
*   Offering development services to other organizations

### Expertise:

*   Strong expertise in building AI agents
*   Background in information retrieval


We can display our markdown nicely with IPython like so:

In [45]:
from IPython.display import display, Markdown

display(Markdown(out))

**Overview of Aurelio AI**
==========================

Aurelio AI is an AI company that develops tooling for AI engineers, specializing in language AI. They provide a range of services and products to help build and deploy AI solutions.

### Key Activities:

*   Developing open-source frameworks (e.g., Semantic Router and Semantic Chunkers)
*   Creating an AI Platform for building with AI
*   Offering development services to other organizations

### Expertise:

*   Strong expertise in building AI agents
*   Background in information retrieval

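The response is reasonable, but it does not fully follow the requested structure; for example, there is no concluding statement. To show the model exactly the format we want, we can provide a couple of examples:

In [46]: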
examples = [
    {
        "input": "Can you explain gravity?",
        "output": (
            "## Gravity\n\n"
            "Gravity is one of the fundamental forces in the universe.\n\n"
            "### Discovery\n\n"
            "* Gravity was first discovered by Sir Isaac Newton in the late 17th century.\n"
            "* It was said that Newton theorized about gravity after seeing an apple fall from a tree.\n\n"
            "### In General Relativity\n\n"
            "* Gravity is described as the curvature of spacetime.\n"
            "* The more massive an object is, the more it curves spacetime.\n"
            "* This curvature is what causes objects to fall towards each other.\n\n"
            "### Gravitons\n\n"
            "* Gravitons are hypothetical particles that mediate the force of gravity.\n"
            "* They have not yet been detected.\n\n"
            "**To conclude**, Gravity is a fascinating topic and has been studied extensively since the time of Newton.\n\n"
        )
    },
    {
        "input": "What is the capital of France?",
        "output": (
            "## France\n\n"
            "The capital of France is Paris.\n\n"
            "### Origins\n\n"
            "* The name Paris comes from the Latin word \"Parisini\" which referred to a Celtic people living in the area.\n"
            "* The Romans named the city Lutetia, which means \"the place where the river turns\".\n"
            "* The city was renamed Paris in the 3rd century BC by the Celtic-speaking Parisii tribe.\n\n"
            "**To conclude**, Paris is highly regarded as one of the most beautiful cities in the world and is one of the world's greatest cultural and economic centres.\n\n"
        )
    }
]

In [47]:
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

In [48]:
out = few_shot_prompt.format()

display(Markdown(out))

Human: Can you explain gravity?
AI: ## Gravity

Gravity is one of the fundamental forces in the universe.

### Discovery

* Gravity was first discovered by Sir Isaac Newton in the late 17th century.
* It was said that Newton theorized about gravity after seeing an apple fall from a tree.

### In General Relativity

* Gravity is described as the curvature of spacetime.
* The more massive an object is, the more it curves spacetime.
* This curvature is what causes objects to fall towards each other.

### Gravitons

* Gravitons are hypothetical particles that mediate the force of gravity.
* They have not yet been detected.

**To conclude**, Gravity is a fascinating topic and has been studied extensively since the time of Newton.


Human: What is the capital of France?
AI: ## France

The capital of France is Paris.

### Origins

* The name Paris comes from the Latin word "Parisini" which referred to a Celtic people living in the area.
* The Romans named the city Lutetia, which means "the place where the river turns".
* The city was renamed Paris in the 3rd century BC by the Celtic-speaking Parisii tribe.

**To conclude**, Paris is highly regarded as one of the most beautiful cities in the world and is one of the world's greatest cultural and economic centres.



In [49]:
few_shot_prompt

FewShotChatMessagePromptTemplate(examples=[{'input': 'Can you explain gravity?', 'output': '## Gravity\n\nGravity is one of the fundamental forces in the universe.\n\n### Discovery\n\n* Gravity was first discovered by Sir Isaac Newton in the late 17th century.\n* It was said that Newton theorized about gravity after seeing an apple fall from a tree.\n\n### In General Relativity\n\n* Gravity is described as the curvature of spacetime.\n* The more massive an object is, the more it curves spacetime.\n* This curvature is what causes objects to fall towards each other.\n\n### Gravitons\n\n* Gravitons are hypothetical particles that mediate the force of gravity.\n* They have not yet been detected.\n\n**To conclude**, Gravity is a fascinating topic and has been studied extensively since the time of Newton.\n\n'}, {'input': 'What is the capital of France?', 'output': '## France\n\nThe capital of France is Paris.\n\n### Origins\n\n* The name Paris comes from the Latin word "Parisini" which refe

In [50]:
prompt_template = ChatPromptTemplate.from_messages([
    ("system", new_system_prompt),
    few_shot_prompt,
    ("user", "{query}"),
])

In [51]:
pipeline = prompt_template | llm
out = pipeline.invoke({"query": query, "context": context}).content
display(Markdown(out))

## Aurelio AI

Aurelio AI is an AI company that develops tooling for AI engineers, with a focus on language AI.

### Key Products and Services

* **Semantic Router**: An open-source framework for building conversational interfaces.
* **Semantic Chunkers**: Another open-source framework for chunking text into meaningful units.
* **AI Platform**: A platform providing tooling to help engineers build with AI.
* **Development Services**: Aurelio AI provides development services to other organizations to help them bring their AI tech to market.

### Expertise

* **Language AI**: The team has strong expertise in building AI agents and a background in information retrieval.

**To conclude**, Aurelio AI is a company that specializes in developing tooling for language AI and providing services to help organizations build with AI.

We can see that by adding a few examples to our prompt, i.e. few-shot prompting, we get much more control over the exact structure of our LLM response. As LLMs get larger, their ability to follow instructions improves and they tend to need less explicit prompting than we have shown here. However, even for SotA models like gpt-4o, few-shot prompting is still a valid technique that can be used if the LLM is struggling to follow our intended instructions.

# Chain of Thought Prompting
We'll take a look at one more commonly used prompting technique called chain of thought (CoT). CoT is a technique that encourages the LLM to think through the problem step by step before providing an answer. The idea is that by breaking the problem down into smaller steps, the LLM is more likely to arrive at the correct answer and we are less likely to see hallucinations.

To implement CoT we don't need any specific LangChain objects; instead, we simply modify how we instruct our LLM within the system prompt. We will ask the LLM to list the problems that need to be solved, to solve each problem individually, and then to arrive at the final answer.

Let's first test our LLM without CoT prompting.

In [52]:
no_cot_system_prompt = """
Be a helpful assistant and answer the user's question.

You MUST answer the question directly without any other
text or explanation.
"""

no_cot_prompt_template = ChatPromptTemplate.from_messages([
    ("system", no_cot_system_prompt),
    ("user", "{query}"),
])

In [53]:
query = (
    "How many keystrokes are needed to type the numbers from 1 to 500?"
)

no_cot_pipeline = no_cot_prompt_template | llm
no_cot_result = no_cot_pipeline.invoke({"query": query}).content
print(no_cot_result)

2,930
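
As a sanity check on the model's output, the expected total can be computed by brute force. This sketch assumes one keystroke per digit and no separators or Enter key between numbers.

```python
# Each digit typed is one keystroke, so count the digits of every number.
total_keystrokes = sum(len(str(number)) for number in range(1, 501))
print(total_keystrokes)  # 1392
```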


The actual answer is 1392, but the LLM without CoT just hallucinates and gives us a guess. Now, we can add explicit CoT prompting to our system prompt to see if we can get a better result.

In [54]:
# Define the chain-of-thought prompt template
cot_system_prompt = """
Be a helpful assistant and answer the user's question.

To answer the question, you must:

- List systematically and in precise detail all
  subproblems that need to be solved to answer the
  question.
- Solve each sub problem INDIVIDUALLY and in sequence.
- Finally, use everything you have worked through to
  provide the final answer.
"""
cot_prompt_template = ChatPromptTemplate.from_messages([
    ("system", cot_system_prompt),
    ("user", "{query}"),
])

cot_pipeline = cot_prompt_template | llm

In [55]:
cot_result = cot_pipeline.invoke({"query": query}).content
display(Markdown(cot_result))

To calculate the number of keystrokes needed to type the numbers from 1 to 500, we need to break down the problem into smaller subproblems. Here's a step-by-step approach:

**Subproblem 1: Counting single-digit numbers (1-9)**

* Each digit requires 1 keystroke.
* There are 9 single-digit numbers, so the total number of keystrokes for these numbers is 9.

**Subproblem 2: Counting two-digit numbers (10-99)**

* Each digit requires 1 keystroke.
* For each two-digit number, we need to type:
	+ The tens digit (1 keystroke)
	+ The units digit (1 keystroke)
* There are 90 two-digit numbers (from 10 to 99), so the total number of keystrokes for these numbers is 2 x 90 = 180.

**Subproblem 3: Counting three-digit numbers (100-500)**

* Each digit requires 1 keystroke.
* For each three-digit number, we need to type:
	+ The hundreds digit (1 keystroke)
	+ The tens digit (1 keystroke)
	+ The units digit (1 keystroke)
* There are 401 three-digit numbers (from 100 to 500), so the total number of keystrokes for these numbers is 3 x 401 = 1203.

**Subproblem 4: Counting the number '0'**

* We need to type the number '0' once, which requires 1 keystroke.

Now, let's add up the results from each subproblem:

9 (single-digit numbers) + 180 (two-digit numbers) + 1203 (three-digit numbers) + 1 (number '0') = 1393

Therefore, approximately **1393 keystrokes** are needed to type the numbers from 1 to 500.
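
The subproblem totals in the response can be checked with a short sketch that groups the numbers by digit count, mirroring subproblems 1 to 3. The names `digit_groups`, `subtotals`, and `grand_total` are introduced here for illustration only.

```python
# Inclusive ranges grouped by digit count, mirroring subproblems 1-3 above.
digit_groups = {1: range(1, 10), 2: range(10, 100), 3: range(100, 501)}

# Keystrokes per group = how many numbers are in it * digits per number.
subtotals = {digits: len(group) * digits for digits, group in digit_groups.items()}
grand_total = sum(subtotals.values())

print(subtotals)    # {1: 9, 2: 180, 3: 1203}
print(grand_total)  # 1392
```

The three groups already cover every number from 1 to 500, so the extra keystroke for '0' in subproblem 4 is not needed. That spurious step is why the response arrives at 1393 rather than 1392.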

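With CoT prompting the answer is much closer: 1393, just one above the correct 1392. Many recent LLMs also reason step by step without being told to, so let's rerun the same query with a minimal system prompt that contains no CoT instructions:

In [58]: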
system_prompt = """
Be a helpful assistant and answer the user's question.
"""

prompt_template = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("user", "{query}"),
])

pipeline = prompt_template | llm


In [59]:
result = pipeline.invoke({"query": query}).content
display(Markdown(result))

To calculate the number of keystrokes needed to type the numbers from 1 to 500, we need to consider two things:

1. The number of digits in each number
2. The number of keystrokes required for each digit (assuming a standard QWERTY keyboard layout)

Let's break it down:

* Single-digit numbers (1-9): 9 numbers x 1 keystroke = 9 keystrokes
* Two-digit numbers (10-99): 90 numbers x 2 keystrokes = 180 keystrokes
* Three-digit numbers (100-500): 401 numbers x 3 keystrokes = 1203 keystrokes

Now, let's add up the total number of keystrokes:

9 + 180 + 1203 = 1392 keystrokes

So, approximately 1392 keystrokes are needed to type the numbers from 1 to 500.