<a href="https://colab.research.google.com/github/LoniQin/Large-Language-Models-A-Hands-On-Guide/blob/main/Chatbot_with_mistral_7b_instruct.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Chatbot with Mistral-7B-Instruct

The Mistral-7B-Instruct-v0.1 Large Language Model (LLM) is a version of the Mistral-7B-v0.1 generative text model that was instruction fine-tuned on a variety of publicly available conversation datasets.

This instruction model is based on Mistral-7B-v0.1, a transformer model with the following architecture choices:

* Grouped-Query Attention
* Sliding-Window Attention
* Byte-fallback BPE tokenizer
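The first two architecture choices can be illustrated with a small, framework-free sketch. Under sliding-window attention, each token attends only to itself and the previous `window - 1` tokens instead of every earlier token. Under grouped-query attention, consecutive groups of query heads share one key/value head. The window size in the demo is a toy value chosen for readability (the Mistral 7B paper reports a 4096-token window and 32 query heads sharing 8 key/value heads). The function names are hypothetical, and the code illustrates the masking and head-sharing rules only; it is not Mistral's implementation.

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    Causal: j <= i. Sliding window: only the last `window` positions, i.e. i - window < j.
    """
    return [
        [i - window < j <= i for j in range(seq_len)]
        for i in range(seq_len)
    ]


def kv_head_for_query_head(q_head: int, n_heads: int, n_kv_heads: int) -> int:
    """Grouped-query attention: consecutive groups of query heads share one KV head."""
    assert n_heads % n_kv_heads == 0, "query heads must split evenly into groups"
    group_size = n_heads // n_kv_heads
    return q_head // group_size


if __name__ == "__main__":
    # Toy sizes: 6 tokens, window of 3 ("x" = allowed, "." = masked out).
    for row in sliding_window_causal_mask(seq_len=6, window=3):
        print("".join("x" if allowed else "." for allowed in row))

    # Head counts reported for Mistral 7B: 32 query heads, 8 key/value heads.
    print([kv_head_for_query_head(h, n_heads=32, n_kv_heads=8) for h in range(8)])
```

With a window of 3, row 5 of the mask allows only positions 3, 4 and 5, so per-token attention cost stays bounded no matter how long the sequence grows.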

## References
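* [Mistral-7B-Instruct-v0.2 model card](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2) (Hugging Face)
* [Mistral 7B announcement](https://mistral.ai/news/announcing-mistral-7b/) (Mistral AI)
* [Mistral 7B](https://arxiv.org/pdf/2310.06825v1.pdf) (Jiang et al., 2023, arXiv:2310.06825)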


## Install dependencies

In [2]:
!pip install accelerate
!pip install transformers==4.36.2

Collecting accelerate
  Downloading accelerate-0.26.1-py3-none-any.whl (270 kB)
Installing collected packages: accelerate
Successfully installed accelerate-0.26.1
Collecting transformers==4.36.2
  Downloading transformers-4.36.2-py3-none-any.whl (8.2 MB)
Installing collected packages: transformers
  Attempting uninstall: transformers
    Found existing installation: transformers 4.35.2
    Uninstalling transformers-4.35.2:
      Successfully uninstalled transformers-4.35.2
Successfully installed transformers

## Load model

In [1]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.1"
device = "cuda"  # device the input tensors are moved to before generation

# float16 halves weight memory compared with float32;
# device_map="auto" (provided by accelerate) places the layers on the available GPU(s).
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


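Loading with `torch_dtype=torch.float16` stores each weight in 2 bytes instead of the 4 bytes used by float32. The back-of-the-envelope estimate below assumes roughly 7.3 billion parameters (the figure Mistral AI gives in its announcement) and counts the weights only; the helper name `weight_memory_gb` is hypothetical.

```python
# Bytes used to store one parameter in each dtype.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Memory needed for the weights alone, in gigabytes (10**9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


N_PARAMS = 7.3e9  # assumption: approximate parameter count of Mistral 7B

for dtype in ("float32", "float16"):
    print(f"{dtype}: ~{weight_memory_gb(N_PARAMS, dtype):.1f} GB")
```

This prints about 29.2 GB for float32 and 14.6 GB for float16. Generation needs additional memory on top of this for activations and the key/value cache.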

In [3]:

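# A multi-turn conversation; the chat template expects turns to alternate user/assistant, starting with user.
messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

# Wrap the turns in Mistral's [INST] ... [/INST] format and tokenize them.
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)

# Sample up to 1000 new tokens; do_sample=True means the reply varies between runs.
generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
# The decoded text contains the prompt followed by the model's reply.
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])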

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s> [INST] What is your favourite condiment? [/INST]Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!</s>  [INST] Do you have mayonnaise recipes? [/INST] Sure! Here's a simple recipe for classic mayonnaise:

Ingredients:

* 2 large egg yolks
* 1 tablespoon Dijon mustard
* 1 teaspoon white wine vinegar
* 1 teaspoon salt
* 1/4 teaspoon black pepper
* 1/2 to 1 cup light olive oil

Instructions:

1. In a small mixing bowl, whisk together the egg yolks, mustard, vinegar, salt, and pepper until smooth.
2. Slowly drizzle in the olive oil, whisking constantly as you do so, until the mixture thickens and emulsifies.
3. Taste and adjust seasoning as needed.

That's it! This classic mayonnaise recipe can be used as a base for other variations like aioli, remoulade, and even dressing for salads.</s>
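The decoded text above shows how `apply_chat_template` laid out the conversation: each user turn is wrapped in `[INST] ... [/INST]`, and each assistant turn is closed with the end-of-sequence token `</s>`. The sketch below rebuilds that layout in plain Python. It is an approximation of the v0.1 template under the assumptions noted in the comments, not the template itself: the authoritative version is the Jinja template stored in `tokenizer.chat_template`, and the extra spaces around special tokens in the decoded output come from decoding. The function `format_mistral_prompt` is hypothetical, and the assistant message is a shortened stand-in.

```python
BOS, EOS = "<s>", "</s>"


def format_mistral_prompt(messages: list[dict[str, str]]) -> str:
    """Approximate the Mistral-7B-Instruct-v0.1 chat layout for alternating turns."""
    parts = [BOS]
    for index, message in enumerate(messages):
        role, content = message["role"], message["content"]
        # Assumption: turns alternate user, assistant, user, ... starting with user.
        expected = "user" if index % 2 == 0 else "assistant"
        if role != expected:
            raise ValueError(f"message {index} should have role {expected!r}, got {role!r}")
        if role == "user":
            parts.append(f"[INST] {content} [/INST]")
        else:
            # Assistant replies are closed with the end-of-sequence token.
            parts.append(f"{content}{EOS} ")
    return "".join(parts)


example_messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Lemon juice."},
    {"role": "user", "content": "Do you have mayonnaise recipes?"},
]
print(format_mistral_prompt(example_messages))
```

The prompt ends right after the final `[/INST]`, which is where the model starts generating its reply.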


## Usage

In [5]:
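def chat(question, max_new_tokens=1000, role="user"):
  """Send a single-turn prompt and return the decoded text (prompt plus reply).

  The Mistral v0.1 chat template expects the first message to use the "user" role.
  """
  messages = [
    {"role": role, "content": question},
  ]
  # Format the message with the chat template and tokenize it.
  encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")
  model_inputs = encodeds.to(device)
  # Sample a reply of at most max_new_tokens tokens.
  generated_ids = model.generate(model_inputs, max_new_tokens=max_new_tokens, do_sample=True)
  decoded = tokenizer.batch_decode(generated_ids)
  return decoded[0]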
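Because `chat` decodes the whole sequence, its return value repeats the prompt and keeps special tokens such as `<s>` and `</s>`, as the outputs below show. A small post-processing helper can keep only the reply. The helper `extract_reply` is hypothetical and assumes the reply is whatever follows the last `[/INST]` marker. Inside `chat` itself, a more robust alternative is to decode only the newly generated tokens, for example `tokenizer.decode(generated_ids[0, model_inputs.shape[-1]:], skip_special_tokens=True)`.

```python
def extract_reply(decoded: str) -> str:
    """Return the text after the last [/INST] marker, without a trailing </s>."""
    # rpartition returns ("", "", decoded) when the marker is missing,
    # so in that case the whole string is treated as the reply.
    _, _, reply = decoded.rpartition("[/INST]")
    # </s> is absent when generation stopped at max_new_tokens, so removal is conditional.
    reply = reply.strip().removesuffix("</s>")
    return reply.strip()


sample = "<s> [INST] What is 2 + 2? [/INST] 2 + 2 = 4.</s>"
print(extract_reply(sample))  # prints: 2 + 2 = 4.
```

In the notebook it would be used as `print(extract_reply(chat("...")))`. `str.removesuffix` requires Python 3.9 or later.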

### Direct completion

In [6]:
chat("Sebastien is in London today, it’s the middle of July yet it’s raining, so Sebastien is feeling gloomy. He")

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


"<s> [INST] Sebastien is in London today, it’s the middle of July yet it’s raining, so Sebastien is feeling gloomy. He [/INST] seems like he's in a bad mood today. It's understandable, given the weather and the fact that it's the middle of July. The weather can dampen one's spirits significantly. Hopefully, Sebastien can warm up his day by doing something he enjoys, like going to a museum or seeing a show.</s>"

### Chain-of-thought prompting

In [7]:
%%time
question = '''
Suppose Alice originally had 3 apples, then Bob gave Alice 7 apples, then Alice gave
  Cook 5 apples, and then Tim gave Alice 3x the amount of apples Alice had.  How many
  apples does Alice have now?
  Let’s think step by step.
'''
chat(question)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


CPU times: user 1min 29s, sys: 1.3 s, total: 1min 30s
Wall time: 1min 30s


"<s> [INST] \nSuppose Alice originally had 3 apples, then Bob gave Alice 7 apples, then Alice gave\n  Cook 5 apples, and then Tim gave Alice 3x the amount of apples Alice had.  How many\n  apples does Alice have now? \n  Let’s think step by step.\n [/INST] Let's start with the original number of apples Alice had: 3\n\nBob gave Alice 7 apples, so the new number of apples Alice had is 3 + 7 = 10\n\nAlice gave 5 apples to Cook, so the new number of apples Alice had is 10 - 5 = 5\n\nFinally, Tim gave Alice 3x the amount of apples Alice had. Since we know that Alice had 5 apples:\n\nTim gave Alice: 3 \\* 5 = 15 apples\n\nNow you can add the apples that Alice had before receiving any gifts (3 apples) to the apples she received from each person (7 apples from Bob + 5 apples from Cook) plus the apples Tim gave her:\n\nTotal apples: 3 + 7 + 5 + 15 = 25\n\nTherefore, Alice has 25 apples now.</s>"
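The model's intermediate steps are correct, but its final total is not. After giving 5 apples to Cook, Alice has 5; Tim gives her 3 × 5 = 15, so she ends with 5 + 15 = 20. The model instead re-adds apples it had already counted, including the 5 that Alice gave away (3 + 7 + 5 + 15 = 25). Tracking a running total makes the arithmetic easy to check; the sketch assumes "3x the amount of apples Alice had" means three times her count at that moment.

```python
# Track Alice's apples step by step, following the word problem.
alice = 3              # she starts with 3 apples
alice += 7             # Bob gives her 7          -> 10
alice -= 5             # she gives Cook 5         -> 5
from_tim = 3 * alice   # Tim gives 3x her count   -> 15
alice += from_tim      # total                    -> 20
print(alice)  # prints 20
```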

### Explain code step by step

In [11]:
%%time
question = '''
What does this script do?
```
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(('', 0))
s.listen(1)
conn, addr = s.accept()
print('Connected by', addr)
return conn.getsockname()[1]
```
Let’s think step by step.
'''
result = chat(question)
print(result)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s> [INST] 
What does this script do?
```
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(('', 0))
s.listen(1)
conn, addr = s.accept()
print('Connected by', addr)
return conn.getsockname()[1]
```
Let’s think step by step.
 [/INST] This script creates a socket, binds it to an address and port, and then listens for incoming connections. When a connection is accepted, the script gets the address of the connected client and prints it to the console before returning the port number of the connection. Here's a step-by-step breakdown of what the script does:

1. `s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)`: creates a new socket object. `socket.AF_INET` is the address family and specifies that we are working with IPv4 addresses, while `socket.SOCK_STREAM` is the socket type and specifies that we want a connection-oriented, stream-based socket.
2. `s.bind(('', 0))`: binds the socket to an address ('') and a port (0). The address '' means that the socket will listen on al
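The snippet in the prompt is a fragment: it ends with a bare `return`, so it only runs inside a function, and `s.accept()` blocks until a client connects. The sketch below wraps it in a hypothetical function `accept_one_and_report_port`, binds to `127.0.0.1` instead of all interfaces (`''`), and opens a local client connection so that `accept()` returns. Binding to port 0 asks the operating system for any free port, which `getsockname()` then reports.

```python
import socket


def accept_one_and_report_port() -> int:
    """Run the prompt's snippet inside a function, with a local client so accept() returns."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0: the OS picks a free port
        s.listen(1)               # allow one pending connection in the backlog
        port = s.getsockname()[1]

        # Stand-in client: the kernel completes the TCP handshake against the
        # listening socket, so accept() below does not wait forever.
        with socket.create_connection(("127.0.0.1", port)):
            conn, addr = s.accept()  # addr is the client's (host, port)
            with conn:
                print("Connected by", addr)
                # The accepted socket's local port is the port the server listens on.
                return conn.getsockname()[1]


if __name__ == "__main__":
    print(accept_one_and_report_port())
```

A common use of this bind-to-port-0 pattern is discovering a free port at runtime instead of hard-coding one.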

### Recommend a book

In [12]:
result = chat("Can you recommend a fondational and accessible book about quantum physics?")
print(result)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s> [INST] Can you recommend a fondational and accessible book about quantum physics? [/INST] One highly recommended book for understanding the basics of quantum physics is "Quantum Physics: The Theoretical Minimum" by Leonard Susskind and Art Friedman. It's written in a lucid and accessible style that is perfect for readers with little to no prior knowledge of the subject. The book provides a comprehensive introduction to key topics such as wave-particle duality, superposition, entanglement, and measurement. It also includes explanations of more advanced concepts like quantum field theory and the unification of quantum mechanics with general relativity. Overall, this book is an excellent starting point for anyone interested in learning more about the fascinating world of quantum physics.</s>


### Recommend a movie

In [14]:
result = chat("Can you recommend a movie about Marval superhero?")
print(result)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s> [INST] Can you recommend a movie about Marval superhero? [/INST] The Marvel Cinematic Universe has a wide range of films featuring the characters you have grown to love. The first movie that introduced the Marvel character "Marvel Cinematic Universe" is actually "Iron Man" (2008), directed by Jon Favreau, with Robert Downey Jr. (Tony Stark) and Gwyneth Paltrow (Pepper Potts) playing the main roles. If you haven't seen it already, I highly recommend this film as it sets the tone for the entire MCU.</s>


### Make a plan to learn machine learning in the new year

In [None]:
result = chat("Can you help me make a plan to learn machine learning in new year?")
print(result)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
