## Setup

Load libraries:

Note: See Appendix on how to get your Hugging Face 🤗 API key.

In [25]:
import os
import torch
import textwrap
import transformers
from dotenv import load_dotenv
from langchain import HuggingFacePipeline, PromptTemplate, LLMChain

load_dotenv() # take environment variables from .env.
api_key = os.getenv("Huggingface_API_key")

## Check available device

Here, You are checking the device (`cuda`, `mps`, or `cpu`) available on your system. For Mac users, you will get either `cpu` or `mps`. For Windows or Linux users, you will get either `cpu` or `cuda`.

In [2]:
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)

print(f"Using device: {device}")

Using device: cuda


## Configure Model

Here, we would configure our and download the Mixtral-8x7B (M8x7B) model using Hugging Face's transformers. Setting `device_map="auto"` first utilize the GPU(s) memory, then CPU memory if needed, and finally stores data on the disk when both memory types are full. Also, we are loading the 4-bits precision model to save memory.

Link to M8x7B on 🤗: [Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1).

In [3]:
model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1" # the model id on 🤗

model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    token=api_key
)

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    device_map='auto',
    token=api_key,
    load_in_4bit=True
)

Loading checkpoint shards:   0%|          | 0/19 [00:00<?, ?it/s]

* Let's set the model to evaluation mode

In [6]:
# set model to evaluation mode

model.eval()

print("Model set to evaluation mode.")

Model set to evaluation mode.


## Load Tokenizer

We will instantiate a `tokenizer` designed to process natural language input by converting it into token lists compatible with the input layer of the M8x7B LLM. Note that we set `padding_side='left'` because we are working with a *decoder only* model. You can learn mode about decoder only models here on [Hugging Face](https://huggingface.co/learn/nlp-course/chapter1/6?fw=pt).

In [7]:
tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_id,
    token=api_key,
    padding_side='left',
    truncation_side='right'
)

## Instruction Format

To get the most out of M8x7B, you must follow the instruction format as outlined by [MistralAi](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1#instruction-format). The instruction format for M8x7B is:

```{bash}
<s> [INST] Instruction [/INST] Model answer</s> [INST] Follow-up instruction [/INST]
```

Where:

* `<s> `and `</s>` are special tokens used by MixtralAI to signify the beginning of string (BOS) and end of string (EOS).
* Instruction is the user message.
* Model answer is where the model response goes.
* [/INST] and [INST] indicates the start and end of user messages.

Note: for enforcing guardrails, prepend the instruction with your safety/syatem prompt. For this project, I will use the safety prompt used in the [Mistral 7B paper](https://arxiv.org/pdf/2310.06825.pdf).

You can easily get the instruction form using 🤗's `apply_chat_template()` method. Let's see an example.

In [15]:
# 🤗's Approach

chat = [
  {"role": "user", "content": "Hello!"},
  {"role": "assistant", "content": "Hello. How can I help you today?"}
]

print(tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True))

<s>[INST] Hello! [/INST]Hello. How can I help you today?</s>


* Let's create a custom function that takes the user's prompt and dynamically converts it to M8x7B's format.

In [20]:
def text_to_mixtral_template(instruction: str, safety_mode: bool = True) -> str:

    if safety_mode:
        safety_prompt = (
            "Always assist with care, respect, and truth. Respond with utmost utility yet securely. "
            "Avoid harmful,unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity."
        )
        
        instruction = f"{safety_prompt} {instruction}"
    
    chat = [
        {"role": "user", "content": instruction},
        {"role": "assistant", "content": "Assistant: "}
    ]

    return tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=False)

* Let's define a function that formats our text. 

In [28]:
def text_wrap(text: str,) -> str:
    filled_text = textwrap.fill(text, width=50)
    print(filled_text)

formatted_text=text_to_mixtral_template("Hello, I would like to book a flight to Paris.", safety_mode=True)
text_wrap(formatted_text)

<s>[INST] Always assist with care, respect, and
truth. Respond with utmost utility yet securely.
Avoid harmful,unethical, prejudiced, or negative
content. Ensure replies promote fairness and
positivity. Hello, I would like to book a flight
to Paris. [/INST]Assistant: </s>


## Create Text Generation Pipeline

Let's create a text generation pipeline to plug into the LangChain API.

In [30]:
hf_pipeline = transformers.pipeline(
    model=model, 
    tokenizer=tokenizer,
    return_full_text=True,  
    task='text-generation',
    framework="pt",
    temperature=0.7,
    max_new_tokens=512,  
    repetition_penalty=1.1, # 
    do_sample=True, 
)

local_llm =HuggingFacePipeline(
    pipeline=hf_pipeline,
)

## Setup LangChain

In [18]:
input_text="{question}"
question: str = "What is the name of the current president of Nigeria?"
template = text_to_mixtral_template(input_text)

prompt_template = PromptTemplate(template=template, input_variables=["question"])
# llm_chain = LLMChain(prompt=prompt_template, llm=local_llm)
# result=llm_chain.invoke(question)

In [19]:
prompt_template

PromptTemplate(input_variables=['question'], template="<s> [INST] You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. \nIf you don't know the answer to a question, please don't share false information. [/INST] {question} </s>")

In [50]:
input_text="{question}"
question: str = "What is Twitter's new name?"
template = text_to_mixtral_template(input_text)

prompt_template = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt_template, llm=local_llm)
result=llm_chain.invoke(question)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


In [51]:
print_result(result)

As of now, Twitter has not announced any changes to their name. They are still known as Twitter, Inc. If there are any updates in the future, I will make sure to provide you with accurate and reliable information. Is there anything else I can assist you with?


In [41]:
def print_result(result):
    formatted_result = result['text'][result['text'].find("\n"):].strip()
    print(formatted_result)

In [21]:
print(result['text'][result['text'].find("\n"):].strip())

As of my last update, the current president of Nigeria is Muhammadu Buhari. He has been in office since May 29, 2015. However, I recommend checking the most recent sources to confirm as this information might have changed.


In [35]:
prompt_template=PromptTemplate(
    input_variables=["question", "context"],
    template="""<s> [INST] You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. 
    If you don't know the answer to a question, please don't share false information. [/INST] Answer the following question using the provided context. 
    \n context: {context} \n\n question: {question} </s>""",
)

In [36]:
context="""
On 29 May 1999, General Abdulsalami Abubakar stepped down, and handed over power to a former military head of state, Olusegun Obasanjo, after being elected some months prior. Obasanjo served two terms in office.

On 29 May 2007, Umaru Musa Yar'Adua was sworn in as president of the Federal Republic of Nigeria and the 13th head of state completing the first successful transition of power, from one democratically elected president to another in Nigeria. Yar'Adua died on 5 May 2010 at the presidential villa, in Abuja, Nigeria, becoming the second head of state to die there after General Sani Abacha.

On 6 May 2010, Vice President Goodluck Jonathan was sworn in as president of the Federal Republic of Nigeria and the 14th head of state.

On 29 May 2015, Muhammadu Buhari, a former military head of state was sworn in as president of the Federal Republic of Nigeria and the 15th head of state after winning the general election. He also served two terms in office.

On 29 May 2023, Bola Tinubu was sworn in as president of the Federal Republic of Nigeria and the 16th head of state after winning the 2023 Nigerian general election.
"""

In [37]:
llm_chain = LLMChain(prompt=prompt_template, llm=local_llm)
result=llm_chain.invoke({"question": question, "context": context})

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.
A decoder-only architecture is being used, but right-padding was detected! For correct generation results, please set `padding_side='left'` when initializing the tokenizer.


In [42]:
print_result(result)

The current president of Nigeria is Bola Tinubu, who was sworn in as president on 29 May 2023.


## Appendix