# Experimenting with Prompt Engineering - Chatbot

In this notebook we will explore common prompt engineering techniques to get the most out of Large Language Models (LLMs). 

First, you will experiment by loading a 7-billion-parameter LLM within the notebook environment and throwing some prompts its way to see how different prompt strategies can affect the model's results. 

After trying different prompt engineering techniques, you will learn how to run a simple chatbot using what you learned. 

Finally, we'll review some pointers for how to put these concepts into practice by building on the sample code and swapping in different LLMs to create your own solutions.

Please refer to the presentation slides here: [Learn and experiment with LLMs in Amazon SageMaker Studio Lab (AIM219)](https://speakerdeck.com/icoxfog417/learn-and-experiment-with-llms-in-amazon-sagemaker-studio-lab-aim219). 

### Working Environment 

This workshop is designed to be run in the RMIT RACE SageMaker Jupyter environment. 

### Libraries
First, if needed, install `ctransformers`, Python bindings for fast inference with Transformer models implemented in C/C++ (using the GGML library). Its API is modelled on the [`transformers` library](https://github.com/huggingface/transformers). The second install cell below rebuilds `ctransformers` from source with CUDA (cuBLAS) support by setting `CT_CUBLAS=1`. If you run this notebook locally or on a GPU instance (e.g. g4 or g5) instead of Studio Lab, please check that the appropriate version of the CUDA runtime (`nvcc`) is installed. If you face a "command not found" error when running `nvcc`, [installing cuda-toolkit](https://anaconda.org/nvidia/cuda-toolkit) after checking which CUDA version is needed via `nvidia-smi` should fix the issue.

In [1]:
%pip install "transformers[torch]"
%pip install langchain

Note: you may need to restart the kernel to use updated packages.
Collecting ctransformers
  Using cached ctransformers-0.2.27-py3-none-any.whl.metadata (17 kB)
Using cached ctransformers-0.2.27-py3-none-any.whl (9.9 MB)
Installing collected packages: ctransformers
Successfully installed ctransformers-0.2.27
Note: you may need to restart the kernel to use updated packages.
Note: you may need to restart the kernel to use updated packages.


Found existing installation: ctransformers 0.2.27
Uninstalling ctransformers-0.2.27:
  Successfully uninstalled ctransformers-0.2.27


In [9]:
!CT_CUBLAS=1 pip install ctransformers --no-binary ctransformers

Collecting ctransformers
  Downloading ctransformers-0.2.27.tar.gz (376 kB)
     376.1/376.1 kB 20.6 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: ctransformers
  Building wheel for ctransformers (pyproject.toml) ... done
  Created wheel for ctransformers: filename=ctransformers-0.2.27-cp310-cp310-linux_x86_64.whl size=2310433 sha256=36de894c8f058e15130f94d4ebcbf94393d8787dc72ae39412cc12743f281be8
  Stored in directory: /home/ec2-user/.cache/pip/wheels/dd/54/e9/32364da8eee84a2b0b412394983c15add18816c507e90f02d8
Successfully built ctransformers
Installing collected packages: ctransformers
Successfully installed ctransformers-0.2.27


In [10]:
from ctransformers import AutoModelForCausalLM, AutoTokenizer

## Mistral-7B-Instruct-v0.1-GGUF Model

The üêã [Mistral-7B-Instruct-v0.1-GGUP](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF) üêã model was fine-tuned on top of Mistral 7B using using a variety of publicly available conversation datasets. 

### Loading the model

The following cell downloads the pretrained model (on first run) and loads it. Because the model file is around 3.8 GB, this can take a minute or so. 

In [11]:
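llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-Instruct-v0.1-GGUF",
    model_file="mistral-7b-instruct-v0.1.Q4_K_M.gguf",  # 4-bit quantized variant
    model_type="mistral",
    # gpu_layers=50,  # optional: offload layers to the GPU (requires the CT_CUBLAS=1 build above)
)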

Fetching 1 files:   0%|          | 0/1 [00:00<?, ?it/s]

# How do large language models work?

## Prompt engineering

Remember that LLMs are trained to predict the next token (roughly, a word or part of a word) given the sequence of tokens that comes before it. 

![LLM-concept](llm-generation.png)
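To make "predict the next word" concrete, here is a toy sketch. It is not how Mistral works internally (a real LLM uses a neural network over subword tokens rather than word counts), and the corpus is made up. It counts which word follows which in a tiny text and then generates by repeatedly appending the most likely next word, which is called greedy decoding.

```python
from collections import Counter, defaultdict

# A tiny hypothetical training corpus; real LLMs train on trillions of tokens.
corpus = (
    "plants use sunlight to make food . "
    "plants use water to make food . "
    "plants release oxygen ."
).split()

# Count how often each word follows each other word (a bigram model).
next_counts: dict[str, Counter] = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_counts[current][following] += 1


def generate(prompt: str, max_new_words: int = 6) -> str:
    """Greedily append the most frequent next word, one word at a time."""
    words = prompt.split()
    for _ in range(max_new_words):
        candidates = next_counts.get(words[-1])
        if not candidates:
            break  # no known continuation for this word
        words.append(candidates.most_common(1)[0][0])
    return " ".join(words)


print(generate("plants"))
```

Each new word depends only on what came before it, which is also why an LLM given an unfinished sentence tends to finish that sentence first.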

The way we prompt an LLM can make a big difference in how it responds and how useful its output is. This depends not only on the type of query, but also on the nuances of how individual models were trained. Let's start by asking the Mistral model to describe the concept of photosynthesis.

## Let's try a basic prompt

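In the following cell, we provide the model with three parameters:
1. the prompt: "Explain photosynthesis"
2. `max_new_tokens`, the maximum length of the model's response in tokens
3. `temperature`, which you can think of as a measure of how *creative* (random) the model can be in its response; a low value such as 0.1 keeps the output close to deterministic

We also pass `stream=True` so that the response is printed token by token as it is generated.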
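A minimal sketch of what temperature does, assuming the common formulation in which the model's raw scores (logits) are divided by the temperature before a softmax turns them into probabilities. The sampling code inside `ctransformers` may differ in detail (for example by also applying top-k or top-p filtering), and the logits below are made-up numbers for three candidate tokens.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three possible next tokens.
tokens = ["Photosynthesis", "Plants", "The"]
logits = [2.0, 1.0, 0.5]

for t in (0.1, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

# Sample one token: at temperature 0.1 the top token is chosen almost every time.
rng = random.Random(0)
print(rng.choices(tokens, weights=softmax_with_temperature(logits, 0.1))[0])
```

At 0.1 nearly all of the probability sits on the top token, which is why the responses in this notebook are fairly repeatable; at 2.0 the alternatives become much more likely.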


<div class="alert alert-block alert-info"><b>Note:</b> If you are running this on a CPU instance of Studio Lab, keep in mind that the following cell may take a couple of minutes to run.</div>

In [12]:
prompt = "Explain photosynthesis"

for text in llm(prompt, max_new_tokens=250, temperature=0.1, stream=True):
    print(text, end="", flush=True)

.

Photosynthesis is the process by which plants, algae, and some bacteria convert sunlight, water, and carbon dioxide into glucose (sugar), oxygen, and other chemical compounds. It is an essential process for life on Earth as it provides the primary source of organic compounds and oxygen.

The process of photosynthesis can be divided into two stages: the light-dependent reactions and the light-independent reactions. During the light-dependent reactions, energy from sunlight is absorbed by pigments in the chloroplasts, primarily chlorophyll. This energy is then used to generate ATP (adenosine triphosphate) and NADPH (nicotinamide adenine dinucleotide phosphate), which are high-energy molecules that are used in the second stage of photosynthesis.

During the light-independent reactions, also known as the Calvin cycle, ATP and NADPH are used to power a series of chemical reactions that convert carbon dioxide into glucose. This process is also known as carbon fixation. The glucose produce

---
<div class="alert alert-block alert-info"><b>The model did pretty well. Notice that it started by generating a period ("."). That's because LLMs are trained to generate the most probable next token given an input, and our prompt was an unfinished sentence. Let's see if we can make the answer more concise; this time we will also end our prompt with a period.</b></div>

## Modifying the basic prompt
Below we will try giving the model more specific instructions.


In [13]:
prompt = "Explain photosynthesis in three sentences."

for text in llm(prompt, max_new_tokens=250, temperature=0.1, stream=True):
    print(text, end="", flush=True)

 Photosynthesis is the process by which plants, algae and some bacteria convert sunlight, water and carbon dioxide into glucose, oxygen and other chemical compounds. It is an essential process for life on earth as it provides the primary source of energy and food for almost all living organisms. The process occurs in two stages; the light-dependent reactions which take place in the thylakoid membranes of chloroplasts, and the light-independent reactions which occur in the stroma of chloroplasts.

---

<div class="alert alert-block alert-info"><b>This response is more concise; our prompt worked quite well.</b></div>


# What is fine-tuning?

When working with large language models, you can enhance model performance through fine-tuning.

Instruction tuning is a type of fine-tuning. We should be careful with the term "fine-tuning" because, in the large language model context, it covers many types of training. For example, continued pre-training is another type of fine-tuning that aims to add knowledge beyond what the base model has; training on domain-specific corpora such as medical articles, social media texts, or data in a specific language are examples. In contrast, instruction tuning aims to refine the model's behavior rather than add knowledge. As a result of instruction tuning, a model can produce accurate responses to short or formatted prompts. It is like teaching the model a call-and-response style.

Recent research makes it possible to "switch" a model's style by swapping additional weights in and out on top of a base model. [LoRA: Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2106.09685) is one way to do this: it freezes the original model weights and fine-tunes only a small set of additional weights. For example, you can generate artworks in different styles by switching LoRA weights for Stable Diffusion, and many such weights can be found online, for example on [CivitAI](https://civitai.com/). (However, we should be mindful of copyright: emulating a specific artist's style in a way that damages their reputation or opportunities may be unlawful.)
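The core idea of LoRA, as described in the paper linked above, is to keep a pretrained weight matrix W (d × k) frozen and learn a low-rank update ΔW = B·A, where B is d × r and A is r × k for a small rank r. The pure-Python sketch below, with made-up toy matrices, compares trainable-parameter counts and applies one update. A real implementation would use a framework such as PyTorch, and the paper also scales ΔW by a factor α/r, which is omitted here.

```python
def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Plain-Python matrix product (rows of a times columns of b)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_param_counts(d: int, k: int, r: int) -> tuple[int, int]:
    """Trainable parameters: full fine-tuning of a d x k matrix vs. LoRA with rank r."""
    return d * k, r * (d + k)


# Example sizes: a 4096 x 4096 projection matrix with rank-8 adapters.
full, lora = lora_param_counts(4096, 4096, 8)
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {lora / full:.2%}")

# Toy frozen weights W (2 x 3) and rank-1 adapters B (2 x 1) and A (1 x 3).
W = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]
B = [[0.5],
     [1.0]]
A = [[2.0, 0.0, -2.0]]

delta = matmul(B, A)  # low-rank update with the same shape as W
W_adapted = [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]
print(W_adapted)  # switching styles = switching B and A; W itself never changes
```

Because only B and A are trained and stored, one base model can be paired with many small adapter files, which is what makes the "switching" described above practical.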

"Fine-tuning", which includes both instruction tuning and continued pre-training, requires datasets, computing resources, and extensive training time and effort. We have already seen that we can change the model's behavior by changing only the prompt. This is usually called prompt engineering (not to be confused with "prompt tuning", which trains learnable prompt embeddings). We need to consider which approach is most appropriate for eliciting the desired responses from a large language model. The course "[Finetuning Large Language Models](https://www.deeplearning.ai/short-courses/finetuning-large-language-models/)" from DeepLearning.AI provides a helpful guide for this decision.

# Instruction fine-tuning

Luckily, `Mistral-7B-Instruct` is already instruction tuned, meaning we can instruct the model to do specific tasks. As described in the [model card](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF), the model expects prompts in the following format:
```
<s>[INST] {prompt} [/INST]
```

Here is an example prompt exchange using the model template:

```
<s>[INST] What is your favourite condiment? [/INST]
Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s>
[INST] Do you have mayonnaise recipes? [/INST]
```
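A small helper can assemble this format from a list of past (question, answer) turns. This is a sketch based only on the template shown above; the function name `build_mistral_prompt` is our own, and for real deployments the model card remains the authoritative reference, since tokenizers may handle special markers such as `<s>` differently from plain text.

```python
def build_mistral_prompt(history: list[tuple[str, str]], question: str) -> str:
    """Format past (question, answer) turns plus a new question in Mistral's [INST] style."""
    prompt = "<s>"
    for past_question, past_answer in history:
        # Each completed turn: instruction, model answer, end-of-sequence marker.
        prompt += f"[INST] {past_question} [/INST]\n{past_answer}</s>\n"
    # The new question is left open so the model writes the answer.
    prompt += f"[INST] {question} [/INST]"
    return prompt


history = [
    ("What is your favourite condiment?",
     "Well, I'm quite partial to a good squeeze of fresh lemon juice."),
]
print(build_mistral_prompt(history, "Do you have mayonnaise recipes?"))
```

With an empty history the helper reduces to the single-turn template `<s>[INST] {prompt} [/INST]`.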

## Let's try it:

The following cell wraps a question in this template and asks the model to keep its answer brief. Note that the question is a leading one: it takes for granted that mistral winds have attracted "super-orcas".

In [14]:
prompt = """
<s>[INST] Please tell me about how mistral winds have attracted super-orcas. Keep your answer brief. [/INST]
"""

print(f"Prompt:\n{prompt}")
print("Response:")
for text in llm(prompt, max_new_tokens=500, temperature=0.1, stop="</s>", stream=True):
    print(text, end="", flush=True)

Prompt:

<s>[INST] Please tell me about how mistral winds have attracted super-orcas. Keep your answer brief. [/INST]

Response:

Mistral winds are strong, cold winds that blow across the Mediterranean Sea from France to Italy. These winds have been known to attract super-orcas, also known as killer whales, due to their nutrient-rich waters and favorable conditions for hunting prey. The mistral winds create upwelling, which brings nutrients from the deep ocean to the surface, providing a plentiful food source for the super-orcas. Additionally, the strong winds help to create waves that can aid in hunting and playing. Overall, the combination of nutrient-rich waters and favorable hunting conditions makes the mistral winds an attractive force for super-orcas.

---
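<div class="alert alert-block alert-info"><b>The model followed the instruction and kept its answer short. Notice, however, that it accepted the premise of the question and produced a confident, plausible-sounding explanation about "super-orcas"; we come back to this kind of problem in the Hallucinations section. Also, manually supplying the full prompt template for each query is cumbersome. In the next section we will look at how to simplify and automate prompt generation dynamically.</b></div>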

# Dynamic prompting
Now that we know how to submit instructions to the Mistral LLM, let's try assembling prompts dynamically: part of the prompt can be fixed, and part can be provided on the fly. We achieve this by creating a prompt template with variables. The value for each variable is supplied in a separate statement, which can be executed further down in the code. Basically, this is a way to decouple the static part of the prompt from the variable part, making the entire prompt dynamic. 
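LangChain's `PromptTemplate` uses Python f-string-style formatting by default, so the idea can be sketched with plain `str.format`. The names below are illustrative; the LangChain version in the next cell adds conveniences on top, such as inferring the variable names from the template (imitated here with the standard-library `string.Formatter`).

```python
from string import Formatter

# Static part of the prompt: the Mistral instruction markup.
TEMPLATE = "<s>[INST] {question} [/INST]"

# Discover the template's variables, similar in spirit to PromptTemplate.from_template.
variables = [name for _, name, _, _ in Formatter().parse(TEMPLATE) if name]
print(variables)

# Variable part: supplied later, at the moment we ask.
questions = [
    "Explain photosynthesis in three sentences",
    "Explain condensation in one sentence",
]

prompts = [TEMPLATE.format(question=q) for q in questions]
for p in prompts:
    print(p)
```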

In [15]:
from langchain.prompts import PromptTemplate

prompt_template = PromptTemplate.from_template(
    """
<s>[INST] {question} [/INST]    
"""
)

In [16]:
prompt = prompt_template.format(question="Explain photosynthesis in three sentences")
print(f"Prompt:\n{prompt}")
print("Response:")
for text in llm(prompt, max_new_tokens=500, temperature=0.1, stop="</s>", stream=True):
    print(text, end="", flush=True)

Prompt:

<s>[INST] Explain photosynthesis in three sentences [/INST]    

Response:
Photosynthesis is the process by which plants, algae and some bacteria convert sunlight, water and carbon dioxide into glucose (sugar), oxygen and other chemical compounds. It is an essential process for the survival of these organisms as it provides them with the energy needed to grow and reproduce. The process occurs in two stages: the light-dependent reactions, which take place in the thylakoid membranes of the chloroplasts, and the light-independent reactions, also known as the Calvin cycle, which occur in the stroma of the chloroplasts.

---
<div class="alert alert-block alert-info"><b>Using prompt templates makes it much easier to automate away the formatting nuances of different LLMs, which can vary widely. Also notice that we did not finish our sentence with a period, and the model did not try to complete it, because this time the template signaled that this is an instruction to be followed.</b></div>


# Few-shot prompts

We can influence the model's output style with what's called few-shot prompting: a technique that shows the model the exact behavior we expect through a few examples. Few-shot prompting can be a powerful way to improve model performance on a given task without additional training or fine-tuning. 

As you can see in the cell below, we provide the model with two examples of a prompt and the expected response, in which the number of sentences in the response matches the request. We also influence the output formatting by demonstrating that we prefer the sentences in each response to be numbered. 

Finally, we append our actual request, a three-sentence explanation of photosynthesis, and let the model complete the pattern.  

In [17]:
prompt = """
<s>[INST] Explain precipitation in two sentences [/INST]
1. In meteorology, precipitation is any product of the condensation of atmospheric water vapor that falls from clouds due to gravitational pull.
2. The main forms of precipitation include drizzle, rain, sleet, snow, ice pellets, graupel and hail.</s>
[INST] Explain condensation in one sentence [/INST]
1. Condensation is the change of the state of matter from the gas phase into the liquid phase, and is the reverse of vaporization.</s>
[INST] Explain photosynthesis in three sentences  [/INST]
"""
for text in llm(prompt, max_new_tokens=250, temperature=0.1, stop="</s>", stream=True):
    print(text, end="", flush=True)

1. Photosynthesis is the process by which plants use sunlight, carbon dioxide, and water to produce glucose and oxygen.
2. It is a vital process for the survival of most life forms on earth as it provides the primary source of energy and food.
3. The process can be simplified into two stages: the light-dependent reactions which take place in the thylakoid membranes of chloroplasts, and the light-independent reactions which occur in the stroma of chloroplasts.

<div class="alert alert-block alert-info"><b>And now with a LangChain template:</b></div>

In [18]:
from langchain.prompts.few_shot import FewShotPromptTemplate
from langchain.prompts.prompt import PromptTemplate

fs_template = PromptTemplate.from_template(
    """<s>[INST] {question} [/INST]
{output}</s>"""
)


examples = [
    {
        "question": "Explain precipitation in two sentences.",
        "output": "1. In meteorology, precipitation is any product of the condensation of atmospheric water vapor that falls from clouds due to gravitational pull.\n2. The main forms of precipitation include drizzle, rain, sleet, snow, ice pellets, graupel and hail.",
    },
    {
        "question": "Explain condensation in one sentence.",
        "output": "1. Condensation is the change of the state of matter from the gas phase into the liquid phase, and is the reverse of vaporization.",
    },
]

prompt_fs = FewShotPromptTemplate(
    examples=examples,
    example_prompt=fs_template,
    suffix="<s>[INST]{question}[/INST]",
    input_variables=["question"],
)

prompt = prompt_fs.format(question="Explain photosynthesis in three sentences.")
print("Prompt:")
print(prompt)

print("Response:")
for text in llm(prompt, max_new_tokens=250, temperature=0.1, stop="</s>", stream=True):
    print(text, end="", flush=True)

Prompt:
<s>[INST] Explain precipitation in two sentences. [/INST]
1. In meteorology, precipitation is any product of the condensation of atmospheric water vapor that falls from clouds due to gravitational pull.
2. The main forms of precipitation include drizzle, rain, sleet, snow, ice pellets, graupel and hail.</s>

<s>[INST] Explain condensation in one sentence. [/INST]
1. Condensation is the change of the state of matter from the gas phase into the liquid phase, and is the reverse of vaporization.</s>

<s>[INST]Explain photosynthesis in three sentences.[/INST]
Response:
 1. Photosynthesis is the process by which plants use sunlight to synthesize foods with the help of chlorophyll pigments.
2. During this process, carbon dioxide and water are converted into glucose and oxygen, respectively.
3. The glucose produced serves as a source of energy for the plant, while the oxygen is released back into the atmosphere.

<div class="alert alert-block alert-info"><b>Using a couple of examples, we were able to teach the model to number the sentences in its response and to limit its response to the number of sentences requested.</b></div>


# Augmenting LLM response with context

Without any real-time context, the model is limited to the knowledge it gained during training, which may be incomplete or out of date. However, we can improve responses by providing relevant context that the model can use when formulating a response. 

To illustrate the benefit of providing context, we will first ask the model what the RMIT University RACE Hub is, without giving it any context. 


In [31]:
prompt = prompt_template.format(
    question="What is RMIT University RACE Hub?"
)

print(f"Prompt:\n{prompt}")
print("Response:")
for text in llm(prompt, max_new_tokens=300, temperature=0.1, stop="</s>", stream=True):
    print(text, end="", flush=True)

Prompt:

<s>[INST] What is RMIT University RACE Hub? [/INST]    

Response:
The RMIT University RACE Hub (Research and Collaboration in Advanced Computing and Engineering) is a research center located at the Royal Melbourne Institute of Technology (RMIT) in Australia. The hub focuses on advanced computing and engineering research, with a particular emphasis on high-performance computing, data analytics, artificial intelligence, and cybersecurity.

The RACE Hub brings together researchers from various disciplines to collaborate on interdisciplinary projects that address complex problems in areas such as climate change, healthcare, finance, and transportation. The hub also provides access to state-of-the-art computing facilities and resources, including high-performance computers, cloud computing platforms, and data storage systems.

In addition to research, the RACE Hub also offers educational programs and workshops for students and professionals interested in advanced computing and eng

---
<div class="alert alert-block alert-info"><b>The response sounds plausible, but we have no way of checking where it came from: without provided context, the LLM answers from whatever "knowledge" it picked up during training, which may be incomplete, out of date, or made up. Let's try providing some context for the model to use as a reference. We will add a new dynamic field, "context", to our prompt template and supply up-to-date information when we ask the question.</b></div>

In [32]:
from langchain.prompts import PromptTemplate

context_template = PromptTemplate.from_template(
    """
<s>[INST] Given the context below, answer the question that follows. If you do not know the answer and the context doesn't contain the answer truthfully say "I don't know".

Context: {context}
Question: {question}[/INST]
"""
)

context = """The RMIT University RACE Hub (Research and Collaboration in Advanced Computing and Engineering) is a research center located at the Royal Melbourne Institute of Technology (RMIT) in Australia. The hub focuses on advanced computing and engineering research, with a particular emphasis on high-performance computing, data analytics, artificial intelligence, and cybersecurity."""


prompt = context_template.format(
    context=context,
    question="What is RMIT University RACE Hub?",
)
print(f"Prompt:\n{prompt}")
print("Response:")
for text in llm(prompt, max_new_tokens=150, temperature=0.1, stop="</s>", stream=True):
    print(text, end="", flush=True)

Prompt:

<s>[INST] Given the context below, answer the question that follows. If you do not know the answer and the context doesn't contain the answer truthfully say "I don't know".

Context: The RMIT University RACE Hub (Research and Collaboration in Advanced Computing and Engineering) is a research center located at the Royal Melbourne Institute of Technology (RMIT) in Australia. The hub focuses on advanced computing and engineering research, with a particular emphasis on high-performance computing, data analytics, artificial intelligence, and cybersecurity.
Question: What is RMIT University RACE Hub?[/INST]

Response:
The RMIT University RACE Hub is a research center located at the Royal Melbourne Institute of Technology (RMIT) in Australia that focuses on advanced computing and engineering research, with a particular emphasis on high-performance computing, data analytics, artificial intelligence, and cybersecurity.

---
<div class="alert alert-block alert-info"><b>Great! The model used the provided context to answer the question. Now let's use the same context and ask the model about something that is not covered in it. </b></div>

In [33]:
prompt = context_template.format(
    context=context, question="What funding is available for RMIT Researchers from RMIT RACE Hub?"
)
print(f"Prompt:\n{prompt}")
print("Response:")
for text in llm(prompt, max_new_tokens=150, temperature=0.1, stop="</s>", stream=True):
    print(text, end="", flush=True)

Prompt:

<s>[INST] Given the context below, answer the question that follows. If you do not know the answer and the context doesn't contain the answer truthfully say "I don't know".

Context: The RMIT University RACE Hub (Research and Collaboration in Advanced Computing and Engineering) is a research center located at the Royal Melbourne Institute of Technology (RMIT) in Australia. The hub focuses on advanced computing and engineering research, with a particular emphasis on high-performance computing, data analytics, artificial intelligence, and cybersecurity.
Question: What funding is available for RMIT Researchers from RMIT RACE Hub?[/INST]

Response:
I don't know the specifics of the funding available for RMIT researchers from the RMIT RACE Hub. However, it is likely that the hub offers various funding opportunities for its researchers to support their research projects and collaborations. It would be best to check with the RMIT RACE Hub directly for more information on their fundin

---
<div class="alert alert-block alert-info"><b>The model followed our direction and admitted that it did not know the answer when the answer could not be determined from the provided context (although it still added some speculation of its own). This is a powerful method that is used heavily in RAG (Retrieval Augmented Generation) applications. For example, a company may want an IT support bot to answer only from internal documentation provided to the model as context. </b></div>
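In a RAG application the context is not typed in by hand: a retrieval step picks the most relevant documents for each question and inserts them into a template like `context_template`. The sketch below uses naive word overlap as a stand-in for the embedding-based vector search a real system would typically use, and the IT-support documents are made-up examples.

```python
import re

# Hypothetical internal IT-support documents (made up for illustration).
documents = [
    "To reset your password, open the self-service portal and choose Forgot password.",
    "VPN access requires the company client and a registered multi-factor device.",
    "Printers on each floor can be added from the Settings > Devices menu.",
]

TEMPLATE = (
    "<s>[INST] Given the context below, answer the question that follows. "
    "If the context doesn't contain the answer, say \"I don't know\".\n\n"
    "Context: {context}\nQuestion: {question} [/INST]"
)


def tokenize(text: str) -> set[str]:
    """Lower-case words, ignoring punctuation."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the question."""
    q_words = tokenize(question)
    ranked = sorted(docs, key=lambda d: len(q_words & tokenize(d)), reverse=True)
    return ranked[:k]


question = "How do I reset my password?"
context = "\n".join(retrieve(question, documents))
prompt = TEMPLATE.format(context=context, question=question)
print(prompt)
# In this notebook the prompt could then be sent to the model, e.g. llm(prompt, stop="</s>").
```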

# Hallucinations

Hallucination is when a model makes up information and presents it as if it were true. Leading questions can often cause hallucinations. Let's see what happens if we ask a leading question that presumes a new school was recently added to RMIT University. 

In [36]:
prompt = prompt_template.format(
    question="What was the new school added to the RMIT University in Melbourne?"
)
print(f"Prompt:\n{prompt}")
print("Response:")
for text in llm(prompt, max_new_tokens=250, temperature=0.1, stop="</s>", stream=True):
    print(text, end="", flush=True)

Prompt:

<s>[INST] What was the new school added to the RMIT University in Melbourne? [/INST]    

Response:
The new school added to RMIT University in Melbourne is the School of Health and Science. This school brings together various health-related disciplines such as nursing, midwifery, allied health, nutrition, exercise science, and public health under one roof. The aim of this school is to provide students with a comprehensive education that prepares them for careers in the health sector.

---
<div class="alert alert-block alert-info"><b>The model confidently names a "School of Health and Science" and describes it in detail, even though the question rests on a premise the model has no way of verifying (you may get a different answer). One approach to combating LLM hallucination is to give the model an "out" by explicitly instructing it not to make up information and to admit when it doesn't know something. Let's try the same question again, adding these instructions to our prompt: </b></div>

In [37]:
prompt = prompt_template.format(
    question="What was the new school added to the RMIT University in Melbourne? Do not make up facts. Say I don't know if you have no information about something or unsure."
)
print(f"Prompt:\n{prompt}")
print("Response:")
for text in llm(prompt, max_new_tokens=250, temperature=0.1, stop="</s>", stream=True):
    print(text, end="", flush=True)

Prompt:

<s>[INST] What was the new school added to the RMIT University in Melbourne? Do not make up facts. Say I don't know if you have no information about something or unsure. [/INST]    

Response:
I apologize, but I do not have any specific information about a new school being added to RMIT University in Melbourne. Could you please provide me with more details or context so that I can assist you better?

---
<div class="alert alert-block alert-info"><b>By giving the model clear instructions and an out, we were able to get the model to admit it didn't have enough context to answer the question rather than making up something.</b></div>

# Chatbot

Let's make a simple chatbot. There is no special library to include and no setting to apply to the LLM; all we need is prompt engineering!

Let's use what we know about prompts and in-context learning to create a simple chatbot. 

In [42]:
question = "Who is the most famous researcher in Australia?"

prompt = prompt_template.format(question=question)
print(f"Prompt:\n{prompt}")
print("Response:")
resp = llm(prompt, max_new_tokens=250, temperature=0.1, stop="</s>")
print(resp)

Prompt:

<s>[INST] Who is the most famous researcher in Australia? [/INST]    

Response:
It's difficult to determine the "most famous" researcher in Australia as it can vary based on the field of study. However, some notable and influential researchers from Australia include:

1. Barry Marshall - A gastroenterologist who discovered the bacterium Helicobacter pylori and its role in gastritis and peptic ulcer disease. He is also known for his work on the development of the "Bernabébé" method for treating children with autism spectrum disorder.
2. Elizabeth Blackburn - A molecular biologist who co-discovered telomeres, which are repetitive DNA sequences at the ends of chromosomes that play a crucial role in cellular aging and division. She was awarded the Nobel Prize in Physiology or Medicine in 2009.
3. Peter Doherty - An immunologist who shared the Nobel Prize in Physiology or Medicine in 1996 with Rolf M. Zinkernagel for their discoveries concerning the specificity of the cell media

### Follow-up questions
Each call to the LLM is independent: the model does not see our earlier question or its own answer unless we include them in the prompt. Let's ask a follow-up question without providing any of the conversation so far.

In [43]:
question = "How old is Barry Marshall?"

prompt = prompt_template.format(question=question)
print(f"Prompt:\n{prompt}")
print("Response:")
for text in llm(prompt, max_new_tokens=300, temperature=0.1, stop="</s>", stream=True):
    print(text, end="", flush=True)

Prompt:

<s>[INST] How old is Barry Marshall? [/INST]    

Response:
I don't have access to real-time information, but I can tell you that Barry Marshall was born on September 24, 1951. Therefore, as of my last update, he would be approximately 70 years old. However, please double check with a reliable source for the most accurate information.

---
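<div class="alert alert-block alert-info"><b>Because this question names Barry Marshall explicitly, the model answers from its training data, but it has no access to the previous exchange. A follow-up that relies on the conversation, such as "How old is he?", would leave the model nothing to resolve "he" against without the chat history.</b></div>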

### Providing chat context
Let's use the context template we created earlier and feed in the previous response about Australian researchers as context. 

In [44]:
prompt = context_template.format(context=resp, question=question)
print(f"Prompt:\n{prompt}")
print("Response:")
for text in llm(prompt, max_new_tokens=300, temperature=0.1, stop="</s>", stream=True):
    print(text, end="", flush=True)

Prompt:

<s>[INST] Given the context below, answer the question that follows. If you do not know the answer and the context doesn't contain the answer truthfully say "I don't know".

Context: It's difficult to determine the "most famous" researcher in Australia as it can vary based on the field of study. However, some notable and influential researchers from Australia include:

1. Barry Marshall - A gastroenterologist who discovered the bacterium Helicobacter pylori and its role in gastritis and peptic ulcer disease. He is also known for his work on the development of the "Bernabébé" method for treating children with autism spectrum disorder.
2. Elizabeth Blackburn - A molecular biologist who co-discovered telomeres, which are repetitive DNA sequences at the ends of chromosomes that play a crucial role in cellular aging and division. She was awarded the Nobel Prize in Physiology or Medicine in 2009.
3. Peter Doherty - An immunologist who shared the Nobel Prize in Physiology or Medici

---
<div class="alert alert-block alert-info"><b>This is a simplified example, but by including the past chat context, we can support a back-and-forth chatbot interaction with the LLM.</b></div>
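Taking this one step further, a chatbot can keep the whole conversation in a list and rebuild the prompt on every turn. The sketch below is a hypothetical `ChatSession` class, not part of any library. It formats the history in the Mistral `[INST]` style, keeps only the most recent turns so the prompt stays within the model's context window, and accepts any callable as the model. In this notebook that could be `lambda p: llm(p, max_new_tokens=250, temperature=0.1, stop="</s>")`; here a fake model stands in so the code runs without downloading anything.

```python
from typing import Callable


class ChatSession:
    """Keeps chat history and rebuilds a Mistral-style prompt for each new message."""

    def __init__(self, model: Callable[[str], str], max_turns: int = 5) -> None:
        self.model = model
        self.max_turns = max_turns  # older turns are dropped to save context space
        self.history: list[tuple[str, str]] = []

    def build_prompt(self, message: str) -> str:
        prompt = "<s>"
        for user_msg, bot_msg in self.history[-self.max_turns:]:
            prompt += f"[INST] {user_msg} [/INST]\n{bot_msg}</s>\n"
        return prompt + f"[INST] {message} [/INST]"

    def ask(self, message: str) -> str:
        reply = self.model(self.build_prompt(message)).strip()
        self.history.append((message, reply))
        return reply


def fake_model(prompt: str) -> str:
    """Stand-in for the real LLM: reports how many instructions it received."""
    return f"I can see {prompt.count('[INST]')} instruction(s)."


chat = ChatSession(fake_model, max_turns=2)
print(chat.ask("Who is the most famous researcher in Australia?"))
print(chat.ask("How old is he?"))
```

Capping `max_turns` is one simple design choice; alternatives include summarizing older turns or counting tokens instead of turns.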

# More examples of LLM use cases:

## The following are optional prompts you can try running to keep exploring.


For simpler questions/prompts we can often get away without strict instruction formatting:

In [45]:
prompt = """Rewrite the following sentence in better English:
I think I want to apply for this position but don't know how, can you help?
"""

for text in llm(prompt, max_new_tokens=150, temperature=0.1, stop="</s>", stream=True):
    print(text, end="", flush=True)

I am considering applying for this position, however, I am uncertain about the process and would appreciate your assistance.

In [46]:
prompt = """I am getting the following error when attempting to run this Python code, can you explain why?
  File "/home/studio-lab-user/sagemaker-studiolab-notebooks/ha.py", line 2, in <module>
    import zip
ModuleNotFoundError: No module named 'zip'
"""

for text in llm(prompt, max_new_tokens=150, temperature=0.1, stop="</s>", stream=True):
    print(text, end="", flush=True)

```
This error is indicating that the Python interpreter cannot find the `zip` module, which is a built-in module in Python. This could be due to a few reasons:

1. The Python interpreter you are using does not have the `zip` module installed. You can check this by running the following command in your terminal or command prompt:
```python
import zip
```
If the `zip` module is not installed, you will get an error message similar to the one you mentioned. To install the `zip` module, you can use pip by running the following command:
```
pip install zipfile
```
2. The Python interpreter you are using

In [47]:
prompt = """Reduct PII from the following paragraph. Replace any PII with ###.
Paragraph:
RACE Hub is located at Building 91, 110 Victoria Street, Carlton, VIC 3053. It's phone number is 111-123-4567.
"""

for text in llm(prompt, max_new_tokens=250, temperature=0.1, stop="</s>", stream=True):
    print(text, end="", flush=True)


Reduced paragraph:
RACE Hub is located at Building 91, 110 Victoria Street, Carlton, ###. Its phone number is ###-###-####.

---
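The last two examples didn't quite work. In the error example, the model treats `zip` as an installable module and suggests `pip install zipfile`, but `zip` is a built-in function that needs no import, and zip-file handling lives in the standard-library `zipfile` module, so nothing needs to be installed. In the PII example, only part of the address was redacted, and the response is labelled "Reduced" (possibly echoing the typo "Reduct" in our prompt). Perhaps the model is not strong enough for this type of task. 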

## Next steps: Want to use a different model with this notebook?

Hugging Face hosts many models that you can drop into code like this. We encourage you to play with the prompts above and with other models. You may need to modify the code above depending on the model you choose, in particular the prompt template and the stop sequence.
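Because each model family expects its own prompt format and stop sequence, one way to keep the prompting code model-agnostic is a small lookup table. The sketch below includes the Mistral format used in this notebook and the ChatML format (which ends each message with `<|im_end|>`) used by some other models. The `PromptFormat` class, `PROMPT_FORMATS` dictionary, and `make_prompt` function are hypothetical names, and you should always confirm the exact template on the model card of the model you swap in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PromptFormat:
    template: str  # must contain a {question} placeholder
    stop: str      # sequence marking the end of the model's answer


# Hypothetical registry: confirm each format on the model card before use.
PROMPT_FORMATS = {
    "mistral-instruct": PromptFormat("<s>[INST] {question} [/INST]", "</s>"),
    "chatml": PromptFormat(
        "<|im_start|>user\n{question}<|im_end|>\n<|im_start|>assistant\n", "<|im_end|>"
    ),
}


def make_prompt(model_family: str, question: str) -> tuple[str, str]:
    """Return (prompt, stop sequence) for the given model family."""
    fmt = PROMPT_FORMATS[model_family]
    return fmt.template.format(question=question), fmt.stop


prompt, stop = make_prompt("chatml", "Explain photosynthesis in three sentences.")
print(prompt)
print("stop:", stop)
```

With `ctransformers`, the pair could then be used as `llm(prompt, stop=stop)` after loading the new model with `AutoModelForCausalLM.from_pretrained(...)` and the matching `model_type`.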

