<a href="https://colab.research.google.com/github/abdulsamadkhan/Llama2_Chat/blob/main/Prompting_Llama_2_0.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction 💻
By the end of this tutorial, you'll be able to understand and use different prompting techniques with Llama 2.0. The tutorial walks you through the following steps:

**1. Establishing the Llama Pipeline 🛠️**
Install the prerequisite libraries and get the model ready to generate responses.

**2. Exploring the Temperature Setting 🌡️**
Explore how different values of the `temperature` parameter affect the responses.

**3. Changing the Number of Tokens Setting (`max_length`) 🔢**
Explore how different values of the parameter `max_length` produce different responses.

**4. Multi-turn Chat in Llama 🔄**
Learn the prompt format and techniques for using Llama in multi-turn chat.

**5. Zero-shot Prompting 🎯**
Learn how to effectively use Llama in zero-shot settings.

**6. Few-shot Prompting 🎯**
Learn how to effectively use Llama in few-shot settings.

**7. Role Prompting 🦸**
Get more consistent, targeted responses with role prompting.

**8. Providing New Information in the Prompt 💡**
Learn how to provide new information to the model so it can generate better and more accurate responses.


🚨🔥 **Important Note:** First, change the runtime to GPU.


# 1. Installation 💻

### Installing the essential libraries:
- 🤗 `Hugging Face Transformers`: Provides a straightforward way to use pre-trained models.
- 🔥 `PyTorch`: Serves as the backbone for deep learning operations.
- ⚡ `Accelerate`: Optimizes PyTorch operations, especially on GPU.


In [None]:
!pip install transformers torch accelerate

Collecting accelerate
  Downloading accelerate-0.27.2-py3-none-any.whl (279 kB)
Installing collected packages: accelerate
Successfully installed accelerate-0.27.2


### 1.1 Prerequisites 📝

To load our desired model, `meta-llama/Llama-2-7b-chat-hf`, we first need to authenticate ourselves on Hugging Face. This ensures we have the correct permissions to fetch the model.

- 🤗 Request access to the model on Hugging Face: [Link](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf).
- Use the Hugging Face CLI to log in and verify your authentication status.


In [None]:
!huggingface-cli login


    _|    _|  _|    _|    _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|_|_|_|    _|_|      _|_|_|  _|_|_|_|
    _|    _|  _|    _|  _|        _|          _|    _|_|    _|  _|            _|        _|    _|  _|        _|
    _|_|_|_|  _|    _|  _|  _|_|  _|  _|_|    _|    _|  _|  _|  _|  _|_|      _|_|_|    _|_|_|_|  _|        _|_|_|
    _|    _|  _|    _|  _|    _|  _|    _|    _|    _|    _|_|  _|    _|      _|        _|    _|  _|        _|
    _|    _|    _|_|      _|_|_|    _|_|_|  _|_|_|  _|      _|    _|_|_|      _|        _|    _|    _|_|_|  _|_|_|_|

    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To login, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Token: 
Add token as git credential? (Y/n) n
Token is valid (permission: read).
Your token has been saved to /root/.ca

In [None]:
!huggingface-cli whoami

abdulsamad
orgs:  HUnivesity,HUNiversity


### 1.2 Loading Model & Tokenizer 🧠

Here, we're loading both the Llama model and its associated tokenizer.
The tokenizer will assist in converting our text prompts into a format that the model can understand and process. 📝


In [None]:
from transformers import AutoTokenizer
import transformers
import torch

model = "meta-llama/Llama-2-7b-chat-hf" # meta-llama/Llama-2-7b-hf

# token=True reuses the token saved by `huggingface-cli login` (replaces the deprecated use_auth_token)
tokenizer = AutoTokenizer.from_pretrained(model, token=True)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


tokenizer_config.json:   0%|          | 0.00/1.62k [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/414 [00:00<?, ?B/s]

### 1.3 Establishing the Llama Pipeline 🛠️

Let's set up a pipeline for text generation. 🚀 This pipeline simplifies the process of feeding prompts to our model and receiving generated text as output.



In [None]:
from transformers import pipeline

llama_pipeline = pipeline(
    "text-generation",  # LLM task
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)

config.json:   0%|          | 0.00/614 [00:00<?, ?B/s]

model.safetensors.index.json:   0%|          | 0.00/26.8k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/2 [00:00<?, ?it/s]

model-00001-of-00002.safetensors:   0%|          | 0.00/9.98G [00:00<?, ?B/s]

model-00002-of-00002.safetensors:   0%|          | 0.00/3.50G [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/188 [00:00<?, ?B/s]

### 1.4 Engagement with Llama

Now that everything is set up, let's see how 🦙 Llama responds to some sample queries. 🎉


In [None]:
def get_llama_response(prompt, do_sample=True, temperature=0.1, top_k=10, num_return_sequences=1, eos_token_id=tokenizer.eos_token_id, max_length=1024):
    """
    Generate a response from the Llama model.

    Parameters:
        prompt (str): The user's input/question for the model.
        do_sample (bool): Whether to use sampling during generation (default is True).
        temperature (float): Sampling temperature (default is 0.1).
        top_k (int): Number of top-k tokens to consider during sampling (default is 10).
        num_return_sequences (int): Number of response sequences to generate (default is 1).
        eos_token_id (int): ID of the end-of-sequence token (default is tokenizer.eos_token_id).
        max_length (int): Maximum total length in tokens, prompt included (default is 1024).

    Returns:
        list[dict]: The pipeline output; each item holds its text under 'generated_text'.
    """
    # Wrap the user prompt in the Llama 2 chat instruction format
    prompt = f"<s>[INST] {prompt} [/INST]"
    response = llama_pipeline(
        prompt,
        do_sample=do_sample,
        top_k=top_k,
        num_return_sequences=num_return_sequences,
        eos_token_id=eos_token_id,
        max_length=max_length,
        temperature=temperature,
    )
    return response


# Example usage:
prompt = 'I tell me how to play cricket ?\n'
response = get_llama_response(prompt=prompt)
print("Chatbot:", response[0]['generated_text'])


Truncation was not explicitly activated but `max_length` is provided a specific value, please use `truncation=True` to explicitly truncate examples to max length. Defaulting to 'longest_first' truncation strategy. If you encode pairs of sequences (GLUE-style) with the tokenizer you can select this strategy more precisely by providing a specific strategy to `truncation`.


Chatbot: <s>[INST] I tell me how to play cricket ?
 [/INST]  Sure, I'd be happy to teach you how to play cricket!

Cricket is a popular bat-and-ball sport played between two teams with 11 players on each team. Here are the basic rules and steps to play cricket:

1. Equipment:
	* Bat (made of wood or composite materials)
	* Ball (made of leather or synthetic materials)
	* Wicket (a set of three vertical stumps and two horizontal sticks)
	* Protective gear (such as helmets, pads, and gloves)
2. Number of players:
	* Each team has 11 players, including a captain and 10 fielders.
3. Objective:
	* The objective of the game is to score runs by hitting the ball and running between the wickets. The team with the most runs at the end of the game wins.
4. How to play:
	* The game starts with one team batting and the other team fielding.
	* The batsman tries to hit the ball and run to the opposite end of the pitch.
	* The bowler (player who delivers the ball) tries to get the batsman out by hitti
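
The recorded output above repeats the prompt before the answer, because the text-generation pipeline returns the prompt together with the completion by default. A minimal sketch for keeping only the reply, assuming the prompt was wrapped in `[INST] ... [/INST]` as `get_llama_response` does; `extract_reply` is a hypothetical helper name, not part of Transformers:

```python
def extract_reply(generated_text: str, marker: str = "[/INST]") -> str:
    """Return the text after the last `marker`, or the whole text if the marker is absent."""
    # rpartition splits on the LAST occurrence, which matters for multi-turn prompts
    _, found, reply = generated_text.rpartition(marker)
    return reply.strip() if found else generated_text.strip()


sample = "<s>[INST] Tell me an interesting fact. [/INST]  Sure! Here's an interesting fact:"
print(extract_reply(sample))
```

The pipeline also accepts a `return_full_text=False` argument, which should make this post-processing unnecessary; the string-based version is shown here because it works on output that has already been generated.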

# 2. Changing the Temperature Setting 🌡️

   - The code below tries three different temperature settings: 0.1, 0.5, and 1.0.
   - Temperature controls how random the sampled responses are. Lower values (closer to 0) produce more deterministic and conservative responses, while higher values (closer to 1) introduce more randomness and diversity.


Comparing the responses side by side shows how temperature influences generation and helps you understand the model's behavior under different settings.
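
To see why lower temperatures make the output more deterministic, here is a small sketch of temperature scaling applied before a softmax. It is a simplified illustration of the idea, not the Transformers implementation, and the logits are made-up scores for three candidate tokens:

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # hypothetical scores, for illustration only
for t in (0.1, 0.5, 1.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At the lowest temperature almost all probability mass lands on the top-scoring token, so sampling nearly always picks it; at higher temperatures the other tokens get a real chance.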


In [None]:
def print_llama_responses(prompt, temperatures):
    """Generate and print one response per temperature value."""
    for temperature in temperatures:
        print(f"----------------- Llama 2.0 response with temperature = {temperature} -------------------")
        response = get_llama_response(prompt=prompt, temperature=temperature)
        print("Chatbot:", response[0]['generated_text'])


# Define the prompt and list of temperatures
prompt = """Write a funncy coverl letter of 5 lines for me? It is about acting"""
temperatures = [0.1, 0.5, 1.0]

# Call the function to print responses for different temperatures
print_llama_responses(prompt, temperatures)


----------------- Llama 2.0 response with temperature = 0.1 -------------------
Chatbot: <s>[INST] Write a funncy coverl letter of 5 lines for me? It is about acting [/INST]  Of course! Here's a funny cover letter for an acting job application:

"Dear Hiring Manager,

I'm not sure if you're looking for a actor, a clown, or a professional pretender, but I'm here to tell you that I'm all of those things and more! I can cry on command, fart on cue, and make a mean face without even trying. Oh, and I'm also great at pretending to be interested in your company's product or service. So, if you're looking for someone to entertain at your next corporate event, I'm your guy! 😂

Sincerely,
[Your Name]"
----------------- Llama 2.0 response with temperature = 0.5 -------------------
Chatbot: <s>[INST] Write a funncy coverl letter of 5 lines for me? It is about acting [/INST]  Sure, here's a funny cover letter for an acting job application:

"Dear Hiring Manager,

I'm not sure if you're looking 

# 3. Changing the Number of Tokens Setting (`max_length`) 🔢

  - The `max_length` parameter caps the total length of the token sequence, that is, the prompt plus the text Llama 2.0 generates in its response.
  - The code below tests different values for `max_length` to observe how the length restriction impacts the generated responses.


Knowing how Llama 2.0 behaves under different token limits helps you choose a limit that leaves enough room for a complete answer.
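
Because `max_length` counts the prompt tokens as well, the room left for the answer shrinks as the prompt grows. The sketch below only illustrates that arithmetic; `new_token_budget` is a hypothetical helper and the prompt size is a made-up number:

```python
def new_token_budget(prompt_tokens: int, max_length: int) -> int:
    """Tokens left for the answer when `max_length` covers prompt + generated tokens."""
    return max(max_length - prompt_tokens, 0)


prompt_tokens = 15  # hypothetical prompt size, for illustration only
for max_len in (50, 100, 200):
    print(max_len, new_token_budget(prompt_tokens, max_len))
```

If you would rather limit only the generated part, the generation API also offers `max_new_tokens`, which does not count the prompt.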


In [None]:
# Example usage:
prompt = "Tell me an interesting fact."
max_lengths_to_test = [50, 100, 200]

for max_len in max_lengths_to_test:
    response = get_llama_response(prompt=prompt, max_length=max_len)
    print("Chatbot:", response[0]["generated_text"])
    print("-" * 40)

Chatbot: <s>[INST] Tell me an interesting fact. [/INST]  Sure! Here's an interesting fact:

Did you know that there is a species of jellyfish that is immortal? The Turritopsis dohr
----------------------------------------
Chatbot: <s>[INST] Tell me an interesting fact. [/INST]  Sure! Here's an interesting fact:

Did you know that there is a species of jellyfish that is immortal? The Turritopsis dohrnii, also known as the "immortal jellyfish," is a type of jellyfish that can transform its body into a younger state through a process called transdifferentiation. This means that it can essentially revert back to
----------------------------------------
Chatbot: <s>[INST] Tell me an interesting fact. [/INST]  Sure! Here's an interesting fact:

Did you know that there is a species of jellyfish that is immortal? The Turritopsis dohrnii, also known as the "immortal jellyfish," is a type of jellyfish that can transform its body into a younger state through a process called transdifferentiation.

### Consequences of Exceeding the max_length Parameter

When the input alone reaches or exceeds `max_length`, generation fails with an error, because no room is left for the model to produce new tokens.

In the scenario below, we try to summarize the text of "The Velveteen Rabbit", which contains 7495 words, while `max_length` is left at the helper's default of 1024 tokens. This results in an error. ❌

To mitigate this issue, you can either split the input text into smaller segments or opt for a model with a higher token limit, if available. Segmenting the input lets you process it in chunks that each fit within the limit. Raising `max_length` helps only up to the model's context window (4096 tokens for Llama 2), so a model with a larger context window is the alternative for longer inputs. 🛠️


In [None]:
# This code downloads the book
import requests

# URL of "The Velveteen Rabbit" on Project Gutenberg (an HTML page)
url = 'https://www.gutenberg.org/files/11757/11757-h/11757-h.htm'

# Send a request to download the file
response = requests.get(url)

# Stop with an error if the download failed, so `text` is never left undefined
response.raise_for_status()

# Save the file as "TheVelveteenRabbit.txt"
with open('TheVelveteenRabbit.txt', 'wb') as file:
    file.write(response.content)

# Read the contents of the file
with open('TheVelveteenRabbit.txt', 'r') as file:
    text = file.read()

# Count the whitespace-separated words (HTML markup included)
num_words = len(text.split())

print(f"Number of words: {num_words}")

Number of words: 7495


In [None]:
prompt = f"""
Give me a summary of the following text in 50 words:\n\n
{text}
"""
response = get_llama_response(prompt=prompt)
print("Chatbot:", response[0]["generated_text"])

ValueError: Input length of input_ids is 1024, but `max_length` is set to 1024. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`.
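
One way around the error above is to split the long text into pieces and summarize each piece separately. The sketch below splits on words, which is only a rough proxy for tokens; `split_into_chunks` and the 500-word default are assumptions for illustration, and a token-accurate version would count tokens with the tokenizer instead:

```python
def split_into_chunks(text: str, max_words: int = 500):
    """Split text into consecutive pieces of at most `max_words` whitespace-separated words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


demo = "one two three four five six seven"
for chunk in split_into_chunks(demo, max_words=3):
    print(chunk)
```

Each chunk could then be sent through `get_llama_response` with a summarization prompt, and the partial summaries combined in a final pass.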

# 4. Multi-turn Chat in Llama 🔄


LLMs (Large Language Models) are stateless: they keep no record of previous messages in a chat. 🧠❌ Let us check this with a simple example.



In [None]:
prompt = """
    I love mathematics? tell me about main areas in it.
"""
response = get_llama_response(prompt)
print("Chatbot:", response[0]["generated_text"])

prompt_2 = """
Which math area mentioned previously is more relevant to machine learning?
"""
response = get_llama_response(prompt_2)
print("Chatbot:", response[0]["generated_text"])


Chatbot: <s>[INST] 
    I love mathematics? tell me about main areas in it.
 [/INST]  Mathematics is a vast and fascinating field that encompasses various areas of study, each with its own set of concepts, techniques, and applications. Here are some of the main areas in mathematics:

1. **Number Theory**: The study of properties of integers and other whole numbers, including prime numbers, congruences, and Diophantine equations.
2. **Algebra**: The study of algebraic structures, such as groups, rings, and fields, and their applications to solving equations, studying symmetry, and modeling real-world phenomena.
3. **Geometry**: The study of shapes, sizes, and positions of objects, including points, lines, angles, and planes. Geometry has applications in physics, engineering, computer science, and art.
4. **Topology**: The study of the properties of spaces that are preserved under continuous deformations, such as stretching and bending. Topology is used in physics, engineering, and compu

**After running the code snippet above, we see that the second response does not use anything from the first one. To carry context forward, we have to explicitly pass the previous exchange in the prompt.**

In [None]:
prompt_1 = """
    I love mathematics? tell me about main areas in it?
"""
response_1 = get_llama_response(prompt_1)
prompt_2 = """
Which math area mentioned previously is more relevant to machine learning?
"""
# Combine the first exchange and the follow-up question into one prompt.
# Note: 'generated_text' already contains the first prompt, so it shows up twice in the printout.
chat_prompt = f"""
<s>[INST] {prompt_1} [/INST]
{response_1[0]['generated_text']}
</s>
<s>[INST] {prompt_2} [/INST]
"""
print(chat_prompt)





<s>[INST] 
    I love mathematics? tell me about main areas in it?
 [/INST]
<s>[INST] 
    I love mathematics? tell me about main areas in it?
 [/INST]  Mathematics is a vast and fascinating field that encompasses various areas of study, each with its own set of concepts, techniques, and applications. Here are some of the main areas in mathematics:

1. **Number Theory**: The study of properties of integers and other whole numbers, including primality, divisibility, and congruences.
2. **Algebra**: The study of algebraic structures, such as groups, rings, and fields, and their applications to solving equations and inequalities.
3. **Geometry**: The study of shapes, sizes, and positions of objects, including points, lines, angles, and planes.
4. **Topology**: The study of the properties of spaces that are preserved under continuous deformations, such as stretching and bending.
5. **Analysis**: The study of limits, derivatives, and integrals, and their applications to solving problems in

In [None]:
response_2 = get_llama_response(chat_prompt)
print("Chatbot:", response_2[0]["generated_text"])


Chatbot: <s>[INST] 
<s>[INST] 
    I love mathematics? tell me about main areas in it?
 [/INST]
<s>[INST] 
    I love mathematics? tell me about main areas in it?
 [/INST]  Mathematics is a vast and fascinating field that encompasses various areas of study, each with its own set of concepts, techniques, and applications. Here are some of the main areas in mathematics:

1. **Number Theory**: The study of properties of integers and other whole numbers, including primality, divisibility, and congruences.
2. **Algebra**: The study of algebraic structures, such as groups, rings, and fields, and their applications to solving equations and inequalities.
3. **Geometry**: The study of shapes, sizes, and positions of objects, including points, lines, angles, and planes.
4. **Topology**: The study of the properties of spaces that are preserved under continuous deformations, such as stretching and bending.
5. **Analysis**: The study of limits, derivatives, and integrals, and their applications to 
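
The printout above contains the first prompt more than once: `generated_text` already includes the prompt, and `get_llama_response` then wraps the whole thing in another `[INST]` block. A minimal sketch of a cleaner multi-turn prompt, assuming the commonly documented Llama 2 chat layout in which each finished turn looks like `<s>[INST] user [/INST] assistant </s>`; `build_chat_prompt` is a hypothetical helper, and the turns below are placeholder strings rather than real model output:

```python
def build_chat_prompt(turns, next_user_message):
    """Build a Llama 2 chat prompt from earlier (user, assistant) turns plus a new user message."""
    parts = []
    for user_msg, assistant_msg in turns:
        # Each completed exchange is closed with </s> before the next one starts
        parts.append(f"<s>[INST] {user_msg.strip()} [/INST] {assistant_msg.strip()} </s>")
    # The final, unanswered user message is left open for the model to complete
    parts.append(f"<s>[INST] {next_user_message.strip()} [/INST]")
    return "".join(parts)


turns = [("Tell me about the main areas of mathematics.",
          "Some main areas are algebra, geometry, and analysis.")]  # placeholder reply
chat_prompt = build_chat_prompt(
    turns, "Which area mentioned previously is more relevant to machine learning?"
)
print(chat_prompt)
```

Because the result is already fully formatted, it would go to `llama_pipeline` directly rather than through `get_llama_response`, which adds its own wrapper. The assistant text passed in should be the reply only, without the echoed prompt.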

# 5. Zero-shot Prompting 🎯
- Here is an example of zero-shot prompting.
- In this scenario, you’re providing a prompt to the model without giving any examples of the completed task.
- The goal is to test whether the model can infer the task based solely on the structure of your prompt.
- Zero-shot prompting relies on the prompt’s structure alone, without any solved task instances.

In [None]:
prompt = """
Message: Hi Abdul, I appreciate the lunch, it was delicious!
Sentiment: ?
"""
response = get_llama_response(prompt=prompt)
print("Chatbot:", response[0]['generated_text'])

Chatbot: <s>[INST] 
Message: Hi Abdul, I appreciate the lunch, it was delicious!
Sentiment: ?
 [/INST]  The sentiment of the message is: Positive


# 6. Few-shot Prompting 🎯
- Here is an example of few-shot prompting.
- In few-shot prompting, you not only outline the structure of the task in your prompt but also include two or more specific examples.
- The goal is to assess whether the model can deduce the task based on both the prompt’s structure and the provided examples.
- Few-shot prompting combines structure and exemplars to guide the model’s understanding.

In [None]:
prompt = """
Message: Hi ! How is life going on!
Sentiment: Neutral

Message: I love studying for machcine learning
Sentiment: Positive

Message: I hate pumkin pie!
Sentiment: ?
"""
response = get_llama_response(prompt=prompt)
print("Chatbot:", response[0]['generated_text'])

Chatbot: <s>[INST] 
Message: Hi ! How is life going on!
Sentiment: Neutral

Message: I love studying for machcine learning
Sentiment: Positive

Message: I hate pumkin pie!
Sentiment: ?
 [/INST]  Sure, here are the sentiment labels for each message:

1. Message: Hi! How is life going on!
Sentiment: Neutral
2. Message: I love studying for machine learning
Sentiment: Positive
3. Message: I hate pumkin pie!
Sentiment: Negative


In [None]:
prompt = """
Message: Hi ! How is life going on!
Sentiment: Neutral

Message: I love studying for machcine learning
Sentiment: Positive

Message: I hate pumkin pie!
Sentiment: ?

Give a one word response, either Postivie Negative or Neutral.
"""
response = get_llama_response(prompt=prompt)
print("Chatbot:", response[0]['generated_text'])

Chatbot: <s>[INST] 
Message: Hi ! How is life going on!
Sentiment: Neutral

Message: I love studying for machcine learning
Sentiment: Positive

Message: I hate pumkin pie!
Sentiment: ?

Give a one word response, either Postivie Negative or Neutral.
 [/INST]  Sure! Here are the one-word responses you requested:

* Message: I love studying for machine learning
Sentiment: Positive
* Message: I hate pumkin pie!
Sentiment: Negative


**Note:** All the examples above used the 7 billion parameter model, llama-2-7b-chat. As you saw in the last example, the 7B model did not stick to a one-word answer: it repeated labels for earlier messages before giving the label for the query.

You can try the larger (70 billion parameter) llama-2-70b-chat model to see if you get a more direct response.
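
Writing few-shot prompts by hand gets repetitive, and the replies above do not always come back as a single word. The sketch below builds the prompt from a list of labeled examples and then pulls a label out of a free-form reply. Both helper names are hypothetical, and taking the label that appears last is a heuristic that happens to fit the replies recorded above, not a general guarantee:

```python
LABELS = ("Positive", "Negative", "Neutral")


def build_few_shot_prompt(examples, query, labels=LABELS):
    """Format labeled (message, sentiment) pairs, then the unlabeled query, then the instruction."""
    blocks = [f"Message: {msg}\nSentiment: {label}" for msg, label in examples]
    blocks.append(f"Message: {query}\nSentiment: ?")
    choices = ", ".join(labels[:-1]) + " or " + labels[-1]
    blocks.append(f"Give a one word response, either {choices}.")
    return "\n\n".join(blocks)


def extract_label(reply, labels=LABELS):
    """Return the label that appears last in the reply, or None if no label appears."""
    lowered = reply.lower()
    best_label, best_pos = None, -1
    for label in labels:
        pos = lowered.rfind(label.lower())
        if pos > best_pos:
            best_label, best_pos = label, pos
    return best_label


examples = [("Hi! How is life going on!", "Neutral"),
            ("I love studying machine learning", "Positive")]
few_shot_prompt = build_few_shot_prompt(examples, "I hate pumpkin pie!")
print(few_shot_prompt)

# A reply shaped like the one recorded above, where the query's label comes last
reply = "Sentiment: Positive\n* Message: I hate pumkin pie!\nSentiment: Negative"
print(extract_label(reply))
```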

# 7. Role Prompting 🦸
- Roles give LLMs context about what type of answers are desired.
- Llama 2 often gives more consistent responses when provided with a role.
- Play with as many examples as you want to understand this phenomenon with LLMs.

In [None]:
prompt = """
How can I answer this question from my friend:
What is the meaning of life?
"""
response = get_llama_response(prompt=prompt)
print("Chatbot:", response[0]['generated_text'])


Chatbot: <s>[INST] 
How can I answer this question from my friend:
What is the meaning of life?
 [/INST]  The question of the meaning of life is a complex and philosophical one that has been debated for centuries. There is no one definitive answer, as the meaning of life can vary depending on an individual's beliefs, values, and experiences. However, here are some possible ways to approach this question:

1. Religious or spiritual perspective: Many people believe that the meaning of life is to fulfill a divine or spiritual purpose. According to this view, life has a higher purpose that is connected to a deity or a higher power. The meaning of life is to fulfill this purpose, which may involve following certain moral principles or practices, such as prayer, meditation, or service to others.
2. Personal growth and fulfillment: From this perspective, the meaning of life is to grow and develop as an individual, to learn and acquire new knowledge and skills, and to pursue one's passions and

- Now, try it by giving the model a "role" and, within the role, a "tone" in which it should respond.

In [None]:
role = """
your role is mulsim Imam.
"""

prompt = f"""
{role}
How can I answer this question from my friend:
What is the meaning of life?
"""
response = get_llama_response(prompt=prompt)
print("Chatbot:", response[0]['generated_text'])

Chatbot: <s>[INST] 

your role is mulsim Imam.

How can I answer this question from my friend:
What is the meaning of life?
 [/INST]  As a Muslim Imam, I would answer this question by explaining the Islamic perspective on the meaning of life. According to Islam, the meaning of life is to worship Allah (God) and to live a righteous life.

The Quran states, "And I (Allah) created not the jinn and the humans except that they should worship Me (Alone)." (Surah Al-Furqan, verse 56)

Worship in Islam encompasses various aspects of a person's life, including the performance of the Five Pillars of Islam:

1. The declaration of faith (Shahada) - declaring that there is no god but Allah and that Muhammad is the Messenger of Allah.
2. Prayer (Salah) - performing the daily prayers as prescribed in the Quran and the Sunnah of the Prophet Muhammad.
3. Fasting (Sawm) - fasting during the month of Ramadan to develop self-control and empathy for those who are less fortunate.
4. Charity (Zakat) - giving
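
The example above puts the role in the user message itself. Llama 2 chat models are also commonly prompted with a system message inside a `<<SYS>>` block at the start of the first `[INST]` section. The sketch below assumes that layout; `with_system_prompt` is a hypothetical helper, and the role text is made up for illustration:

```python
def with_system_prompt(system_message: str, user_message: str) -> str:
    """Prefix the user message with a Llama 2 style <<SYS>> block."""
    return f"<<SYS>>\n{system_message.strip()}\n<</SYS>>\n\n{user_message.strip()}"


role = "You are a career coach. Answer in a warm, encouraging tone."  # hypothetical role
question = "How can I answer this question from my friend: What is the meaning of life?"
prompt = with_system_prompt(role, question)
print(prompt)
```

Since `get_llama_response` wraps its input in `<s>[INST] ... [/INST]`, passing this string as the prompt should produce the system-message layout described above.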

# 8. Providing New Information in the Prompt 💡
- A model's knowledge of the world ends at the moment of its training, so it won't know about more recent events.
- Llama 2 was released for research and commercial use on July 18, 2023, and its training ended some time before that date.
- Ask the model about an event that happened after that date, in this case the 2024 general election in Pakistan, and see how the model responds.

In [None]:
prompt = """
Who won the election 0f 2024 of pakistan ?
"""
response = get_llama_response(prompt=prompt)
print("Chatbot:", response[0]['generated_text'])

Chatbot: <s>[INST] 
Who won the election 0f 2024 of pakistan ?
 [/INST]  I apologize, but I'm a large language model AI and I do not have access to real-time or up-to-date information about current events or elections in Pakistan. The 2024 Pakistani general election is scheduled to take place on July 25, 2024, and the results will be announced by the Election Commission of Pakistan.

I recommend checking the official website of the Election Commission of Pakistan or reputable news sources for the latest information and updates on the election.


- As you can see in the response above, the model does not know the result and treats the election as still upcoming, even though you are now in 2024!
- Another thing to **note**: July 18, 2023 was the date the model was released to the public, and it was trained before that, so it only has information up to that point.

- You can provide the model with information about recent events, in this case text from Wikipedia about the government formed after the 2024 election.

In [None]:
context = """
At a press conference on 13 February 2024, it was announced by the \n
leaders of the PML-N and the PPP that they would form a coalition government with the PML-N's Shehbaz Sharif as prime minister.[19]\n
 The Muttahida Qaumi Movement – Pakistan (MQM-P), the Pakistan Muslim League (Q) (PML-Q), the Istehkam-e-Pakistan Party (IPP) \n
 and the Balochistan Awami Party (BAP) also expressed their intent to join the governmental coalition.[20] On 3 March, \n
 Shehbaz Sharif was re-elected as Prime Minister of Pakistan for the second time, securing 201 votes against the 92 votes \n
 received by the PTI-backed Omar Ayub Khan. With no party securing a majority in the assembly, Shehbaz became prime minister \n
 with support from the PML-N's allies, including the PPP, MQM-P, PML-Q, BAP, IPP and other smaller parties
"""
prompt = f"""
Given the following context, Who won the election 0f 2024 of pakistan ?
context: {context}
"""
response = get_llama_response(prompt=prompt)
print("Chatbot:", response[0]['generated_text'])

Chatbot: <s>[INST] 
Given the following context, Who won the election 0f 2024 of pakistan ?
context: 
At a press conference on 13 February 2024, it was announced by the 

leaders of the PML-N and the PPP that they would form a coalition government with the PML-N's Shehbaz Sharif as prime minister.[19]

 The Muttahida Qaumi Movement – Pakistan (MQM-P), the Pakistan Muslim League (Q) (PML-Q), the Istehkam-e-Pakistan Party (IPP) 
 
 and the Balochistan Awami Party (BAP) also expressed their intent to join the governmental coalition.[20] On 3 March, 

 Shehbaz Sharif was re-elected as Prime Minister of Pakistan for the second time, securing 201 votes against the 92 votes 

 received by the PTI-backed Omar Ayub Khan. With no party securing a majority in the assembly, Shehbaz became prime minister 

 with support from the PML-N's allies, including the PPP, MQM-P, PML-Q, BAP, IPP and other smaller parties

 [/INST]  According to the context provided, Shehbaz Sharif won the election of 2024 
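
The context in the example above carries stray `\n` sequences that turn into extra blank lines in the prompt. A small sketch of a reusable prompt builder that tidies the context first; `build_context_prompt` and the instruction wording are assumptions for illustration, not a fixed recipe:

```python
import re


def build_context_prompt(context: str, question: str) -> str:
    """Place cleaned context text ahead of the question."""
    # Collapse runs of whitespace (including stray newlines) into single spaces
    cleaned = re.sub(r"\s+", " ", context).strip()
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context: {cleaned}\n\n"
        f"Question: {question.strip()}"
    )


context = """
On 3 March, \n
 Shehbaz Sharif was re-elected as Prime Minister of Pakistan
"""
print(build_context_prompt(context, "Who became prime minister?"))
```

Telling the model to rely only on the supplied context is a common way to discourage it from falling back on outdated training data.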

🎉 **Conclusion**

You have reached the end of this tutorial and now know how to prompt Llama 2.0 more effectively. 👍