## Setup

Load libraries:

Note: See the Appendix for how to get your Hugging Face 🤗 API key.

In [1]:
import os
import torch
import warnings
import transformers
from dotenv import load_dotenv

from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate 
from langchain_community.llms.huggingface_pipeline import HuggingFacePipeline

load_dotenv() # take environment variables from .env.
api_key = os.getenv("Huggingface_API_key")


warnings.filterwarnings('ignore') # ignore warnings

# Show only errors from Transformers (40 is the ERROR log level).
# Only use this when you're sure your code is correct!
transformers.utils.logging.set_verbosity(40)
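
If the `.env` file is missing or the variable name is misspelled, `os.getenv` returns `None` without complaint, and the problem only surfaces later when a download that needs authentication fails. A minimal sketch of an early check, assuming the same variable name as in the cell above; the helper name `require_env` is hypothetical and not part of the notebook:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(
            f"Environment variable {name!r} is not set. "
            "Add it to your .env file or export it in your shell."
        )
    return value


# Usage (after load_dotenv() has run):
# api_key = require_env("Huggingface_API_key")
```

Failing early like this is a design choice: it trades a silent `None` for an explicit error at the top of the notebook.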

## Check available device

Here, you check which device (`cuda`, `mps`, or `cpu`) is available on your system. Mac users will get either `cpu` or `mps`. Windows or Linux users will get either `cpu` or `cuda`.

In [2]:
device = (
    "cuda"
    if torch.cuda.is_available()
    else "mps"
    if torch.backends.mps.is_available()
    else "cpu"
)

print(f"Using device: {device}")

Using device: cuda


## Configure Model

Here, we configure and download the Mixtral-8x7B (M8x7B) model using Hugging Face's transformers. Setting `device_map="auto"` fills GPU memory first, then CPU memory if needed, and finally offloads to disk when both are full. We also load the model in 4-bit precision to save memory.

Link to M8x7B on 🤗: [Mixtral-8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1).
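
To see why 4-bit loading matters, a rough back-of-the-envelope estimate of weight memory helps. The sketch below assumes roughly 46.7 billion parameters for Mixtral-8x7B (an approximate figure, not taken from this notebook) and ignores activations, the KV cache, and quantization overhead, so real usage will be higher:

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate memory needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


# Assumption: approximate total parameter count for Mixtral-8x7B.
N_PARAMS = 46.7e9

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

Under these assumptions the weights need on the order of 93 GB at 16-bit precision but only about a quarter of that at 4-bit, which is what makes the model workable on far less GPU memory.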

In [3]:
model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1" # the model id on 🤗

model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    token=api_key
)

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    device_map='auto',
    token=api_key,
    load_in_4bit=True
)

Loading checkpoint shards:   0%|          | 0/19 [00:00<?, ?it/s]

* Let's set the model to evaluation mode

In [4]:
# set model to evaluation mode

model.eval()

print("Model set to evaluation mode.")

Model set to evaluation mode.


## Load Tokenizer

We will instantiate a `tokenizer` that converts natural language input into lists of tokens compatible with the input layer of the M8x7B LLM. Note that we set `padding_side='left'` because we are working with a *decoder-only* model. You can learn more about decoder-only models on [Hugging Face](https://huggingface.co/learn/nlp-course/chapter1/6?fw=pt).

In [5]:
tokenizer = transformers.AutoTokenizer.from_pretrained(
    model_id,
    token=api_key,
    padding_side='left'
)
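
To see why the padding side matters, recall that a decoder-only model continues from the last position of each row in a batch. The toy sketch below pads lists of made-up token ids on either side; the function `pad_batch` and the ids are illustrative only and are not part of the tokenizer's API:

```python
def pad_batch(sequences, pad_id, side="left"):
    """Pad lists of token ids to the same length on the chosen side."""
    max_len = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        padding = [pad_id] * (max_len - len(seq))
        padded.append(padding + seq if side == "left" else seq + padding)
    return padded


# Hypothetical token ids; 0 stands in for the padding token.
batch = [[11, 12, 13], [21, 22]]

print(pad_batch(batch, pad_id=0, side="left"))   # [[11, 12, 13], [0, 21, 22]]
print(pad_batch(batch, pad_id=0, side="right"))  # [[11, 12, 13], [21, 22, 0]]
```

With left padding, every row ends in a real prompt token, so generation starts right after the prompt. With right padding, the shorter row would end in a padding token and the model would be asked to continue from padding.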

## Instruction Format

To get the most out of M8x7B, you must follow the instruction format outlined by [Mistral AI](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1#instruction-format). The instruction format for M8x7B is:

```
<s> [INST] Instruction [/INST] Model answer</s> [INST] Follow-up instruction [/INST]
```

Where:

* `<s>` and `</s>` are special tokens used by Mistral AI to signify the beginning of string (BOS) and end of string (EOS).
* `Instruction` is the user message.
* `Model answer` is where the model response goes.
* `[INST]` and `[/INST]` indicate the start and end of user messages.

Note: to enforce guardrails, prepend the instruction with your safety/system prompt. For this project, I will use the safety prompt from the [Mistral 7B paper](https://arxiv.org/pdf/2310.06825.pdf).

You can easily get the instruction format using 🤗's `apply_chat_template()` method. Let's see an example.

In [6]:
# 🤗's Approach

chat = [
  {"role": "user", "content": "Hello!"},
  {"role": "assistant", "content": "Hello. How can I help you today?"},
  {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

print(tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=True))

<s>[INST] Hello! [/INST]Hello. How can I help you today?</s>[INST] I'd like to show off how chat templating works! [/INST]
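
To make the format concrete, the string printed above can be rebuilt by hand. The function below is a hand-rolled approximation based only on the output shown in this notebook; the tokenizer's own chat template remains the authoritative source and may differ between tokenizer versions. The name `build_mixtral_prompt` is hypothetical:

```python
def build_mixtral_prompt(messages):
    """Hand-rolled approximation of the instruction format shown above."""
    prompt = "<s>"
    for message in messages:
        if message["role"] == "user":
            # User turns are wrapped in [INST] ... [/INST].
            prompt += f"[INST] {message['content']} [/INST]"
        elif message["role"] == "assistant":
            # Assistant turns follow directly and are closed with the EOS token.
            prompt += f"{message['content']}</s>"
        else:
            raise ValueError(f"Unsupported role: {message['role']}")
    return prompt


chat = [
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hello. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

print(build_mixtral_prompt(chat))
```

In practice, prefer `apply_chat_template()`, since it stays in sync with the model's tokenizer.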


* Let's create a custom function that takes the user's prompt and dynamically converts it to M8x7B's format.

In [7]:
def text_to_mixtral_template(instruction: str, safety_mode: bool = True) -> str:

    if safety_mode:
        safety_prompt = (
            "Always assist with care, respect, and truth. Respond with utmost utility yet securely. "
            "Avoid harmful,unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity."
        )
        
        instruction = f"{safety_prompt} {instruction}"
    
    chat = [
        {"role": "user", "content": "Hello, how are you?"},
        {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
        {"role": "user", "content": instruction}
    ]

    hf_output = tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=False)

    return hf_output

* Let's try the function on a sample prompt.

In [8]:
formatted_text=text_to_mixtral_template("I would like to book a flight to Paris.", safety_mode=True)
print(formatted_text)

<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s>[INST] Always assist with care, respect, and truth. Respond with utmost utility yet securely. Avoid harmful,unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity. I would like to book a flight to Paris. [/INST]


## Create Text Generation Pipeline

Let's create a text generation pipeline to plug into the LangChain API.

In [9]:
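hf_pipeline = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    return_full_text=True,   # return the prompt together with the completion
    task='text-generation',
    framework="pt",          # use the PyTorch backend
    temperature=0.1,         # low temperature keeps sampling close to deterministic
    max_new_tokens=512,      # upper bound on the number of generated tokens
    repetition_penalty=1.1,  # values above 1.0 discourage repeating tokens
    do_sample=True,          # sample from the distribution instead of greedy decoding
)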

local_llm = HuggingFacePipeline(
    pipeline=hf_pipeline,
)
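
With `do_sample=True`, the next token is drawn from a probability distribution, and `temperature` rescales the model's raw scores before they are turned into probabilities. The sketch below illustrates that effect with made-up scores for three candidate tokens; it is a simplified illustration of temperature scaling, not the pipeline's internal code:

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
logits = [2.0, 1.0, 0.5]

for t in (1.0, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 4) for p in probs])
```

At a temperature of 0.1, almost all of the probability lands on the highest-scoring token, which is why the pipeline above behaves nearly deterministically even though sampling is enabled.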

## Setup LangChain

LangChain is an open source Python framework for building applications powered by Large Language Models (LLMs). Learn more about [LangChain](https://python.langchain.com/docs/get_started/introduction).

We will be using the `PromptTemplate` and `LLMChain`.

* [PromptTemplate](https://python.langchain.com/docs/modules/model_io/prompts/quick_start#prompttemplate): In LangChain, PromptTemplate is used to create a template for a string prompt. Note that you must use the correct model instruction format.
* [LLMChain](https://api.python.langchain.com/en/stable/chains/langchain.chains.llm.LLMChain.html#langchain.chains.llm.LLMChain): In LangChain, a chain is used to link reusable components together. `LLMChain` is used to link a prompt template, an LLM, and other components together.
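
The core idea behind a string prompt template can be shown in plain Python: keep a string with named placeholders and fill them in at call time. This is only a sketch of the concept using `str.format`; LangChain's `PromptTemplate` does more (for example, input validation), and the helper `fill_template` is hypothetical:

```python
# A plain-Python stand-in for what a string prompt template does:
# keep a template with named placeholders and fill them in at call time.
template = "<s>[INST] Answer briefly. {question} [/INST]"


def fill_template(template: str, **variables: str) -> str:
    """Substitute named placeholders such as {question} with concrete values."""
    return template.format(**variables)


prompt = fill_template(template, question="Does Twitter have a new name?")
print(prompt)
# <s>[INST] Answer briefly. Does Twitter have a new name? [/INST]
```

This is also why the cell below passes the literal string `"{question}"` through `text_to_mixtral_template`: the placeholder survives inside the model's instruction format and is filled in later, when the chain is invoked.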

In [10]:
input_text="{question}"
question = "Does Twitter have a new name?"
template = text_to_mixtral_template(input_text, safety_mode=True)

prompt_template = PromptTemplate(template=template, input_variables=["question"])
llm_chain = LLMChain(prompt=prompt_template, llm=local_llm)
result=llm_chain.invoke(question)
print(result['text'])

 No, as of my knowledge up to this point, Twitter has not changed its name. It is still known as "Twitter." If there have been any recent changes, I would recommend checking the latest news sources for the most accurate information. Is there anything else I can help you with?


## Let's Create a Question and Answer Function

In [11]:
def qa_with_m8x7b(question: str, safety_mode: bool = True) -> str:
    input_text="{question}"
    template = text_to_mixtral_template(input_text, safety_mode=safety_mode)

    prompt_template = PromptTemplate(template=template, input_variables=["question"])
    llm_chain = LLMChain(prompt=prompt_template, llm=local_llm)
    result=llm_chain.invoke(question)
    return result['text'].strip()

In [12]:
qa_with_m8x7b("Who is the current president of Nigeria?")

'As of my last update, the President of Nigeria is Muhammadu Buhari. He has been in office since May 29, 2015. However, I recommend double-checking the most recent sources to confirm as this information could have changed.'

## What did you notice?

Our model's knowledge is out of date! We need to "augment" our model's knowledge with user-specific data. This is what RAG is all about.

In [15]:
def text_with_context_to_mixtral_template(instruction: str, context: str, safety_mode: bool = True) -> str:

    context_instruction = f"Answer the following question using the provided context. \n\ncontext: {context}"

    if safety_mode:
        safety_prompt = (
            "Always assist with care, respect, and truth. Respond with utmost utility yet securely. "
            "Avoid harmful,unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity."
        )
        
        instruction = f"{safety_prompt}\n\n{context_instruction} \n\nquestion: {instruction}"

    else:
        instruction = f"{context_instruction} \n\nquestion: {instruction}"
    
    chat = [
        {"role": "user", "content": "Hello, how are you?"},
        {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
        {"role": "user", "content": instruction}
    ]

    return tokenizer.apply_chat_template(chat, tokenize=False, add_generation_prompt=False)

In [16]:
context="It's sunny outside."
question = "What is the weather today?"
formatted_text=text_with_context_to_mixtral_template(instruction=question, context=context, safety_mode=True)
print(formatted_text)

<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s>[INST] Always assist with care, respect, and truth. Respond with utmost utility yet securely. Avoid harmful,unethical, prejudiced, or negative content. Ensure replies promote fairness and positivity.

Answer the following question using the provided context. 

context: It's sunny outside. 

question: What is the weather today? [/INST]


In [17]:
def qa_with_context_m8x7b(question: str, context: str, safety_mode: bool = True) -> str:
    input_text="{question}"
    input_context="{context}"
    template = text_with_context_to_mixtral_template(instruction=input_text, context=input_context, safety_mode=safety_mode)

    prompt_template = PromptTemplate(template=template, input_variables=["question", "context"])
    llm_chain = LLMChain(prompt=prompt_template, llm=local_llm)
    result=llm_chain.invoke({"question": question, "context": context})
    return result['text'].strip()

## Let's get Context from Wikipedia

* Context about Nigeria's president: [click here](https://en.wikipedia.org/wiki/President_of_Nigeria#Fourth_Republic_(1999%E2%80%93present)).

In [18]:
nigeria_context="""
On 29 May 1999, General Abdulsalami Abubakar stepped down, and handed over power to a former military head of state, Olusegun Obasanjo, after being elected some months prior. Obasanjo served two terms in office.
On 29 May 2007, Umaru Musa Yar'Adua was sworn in as president of the Federal Republic of Nigeria and the 13th head of state completing the first successful transition of power, 
from one democratically elected president to another in Nigeria. Yar'Adua died on 5 May 2010 at the presidential villa, in Abuja, Nigeria, becoming the second head of state to die there after General Sani Abacha.
On 6 May 2010, Vice President Goodluck Jonathan was sworn in as president of the Federal Republic of Nigeria and the 14th head of state.
On 29 May 2015, Muhammadu Buhari, a former military head of state was sworn in as president of the Federal Republic of Nigeria and the 15th head of state after winning the general election. He also served two terms in office.
On 29 May 2023, Bola Tinubu was sworn in as president of the Federal Republic of Nigeria and the 16th head of state after winning the 2023 Nigerian general election.
"""

In [19]:
print(qa_with_context_m8x7b(question="Who is the current president of Nigeria?", context=nigeria_context))

The current president of Nigeria is Bola Tinubu. He was sworn in as the 16th head of state on May 29, 2023, after winning the 2023 Nigerian general election. Prior to his presidency, he served as the Governor of Lagos State from 1999 to 2007. He is a member of the All Progressives Congress (APC) party.

It is important to note that this information is based on the context provided and may vary if new events occur or new information becomes available.


* Context about Twitter's name: [click here](https://en.wikipedia.org/wiki/Twitter#:~:text=Although%20the%20service%20is%20now,name%20redirecting%20to%20that%20address.&text=The%20service%20is%20owned%20by,the%20successor%20of%20Twitter%2C%20Inc.).

In [20]:
twitter_context="""
X, formerly (and still colloquially) known as Twitter, is a social media website based in the United States. 
With over 500 million users, it is one of the world's largest social networks. 
Users can share and post text messages, images, and videos known historically as "tweets". 
X also includes direct messaging, video and audio calling, bookmarks, lists and communities, and Spaces, a social audio feature. 
Users can vote on context added by approved users using the Community Notes feature. 
Although the service is now called X, the primary URL remains twitter.com as of January 2024, with the x.com domain name redirecting to that address.
"""

In [21]:
print(qa_with_context_m8x7b(question="Does Twitter have a new name?", context=twitter_context))

Yes, Twitter has changed its name to X. However, the primary URL for the service remains twitter.com as of January 2024. The x.com domain name redirects to that address.


## Putting it Together

Now that we have seen the components, let's put our code together in a script and call it like a package.

In [3]:
from utils.rag101 import NaiveRAG

rag = NaiveRAG(model=None, tokenizer=None) ## initialize RAG

## question without context

question = "Write me a haiku about the weather."

response=rag.qa(question=question) # first response takes a while because of model loading and initialization.
print(response)

Spring rain falls gently,
Buds bloom, waking from winter's sleep,
Nature's rebirth sings.


In [6]:
question="What was X formally known as?"
response=rag.qa_with_context(question=question, context=twitter_context)
print(response)

X was formally known as Twitter. Even though its official name has been changed, the primary URL for the site is still twitter.com as of January 2024. The domain name x.com redirects to the twitter.com address.
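
The contents of `utils/rag101.py` are not shown in this notebook. As a rough idea of how the pieces above could be packaged, here is a hypothetical sketch of such a wrapper. The class name `NaiveRAGSketch`, its constructor, and the simplified prompt strings are assumptions for illustration and are not the author's implementation:

```python
from typing import Callable


class NaiveRAGSketch:
    """Hypothetical stand-in for a wrapper like NaiveRAG; not the author's code."""

    def __init__(self, generate: Callable[[str], str]):
        # `generate` is any function that maps a prompt string to a completion,
        # for example a thin wrapper around the LangChain chain built earlier.
        self.generate = generate

    def qa(self, question: str) -> str:
        """Answer from the model's own knowledge only."""
        return self.generate(f"[INST] {question} [/INST]").strip()

    def qa_with_context(self, question: str, context: str) -> str:
        """Answer using the supplied context, as in the prompts above."""
        instruction = (
            "Answer the following question using the provided context. \n\n"
            f"context: {context} \n\nquestion: {question}"
        )
        return self.generate(f"[INST] {instruction} [/INST]").strip()


# Stub generator so the sketch runs without a model: it reports the prompt length.
def fake_generate(prompt: str) -> str:
    return f" received {len(prompt)} characters "


rag_sketch = NaiveRAGSketch(generate=fake_generate)
print(rag_sketch.qa("Write me a haiku about the weather."))
```

Injecting the generation function keeps the wrapper testable without loading a multi-gigabyte model; a real version would pass a function that calls the model.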


## Final Thoughts

We have a chatbot that works completely offline on our own PC. But there are a few problems.

1. Our chat does not have memory.
2. Copying and pasting content is not an efficient way to provide additional context. What if the content is longer than the context window?

We will address the memory issue in RAG102 and the context window issue in RAG103.

## Appendix

### Getting Hugging Face API key

1. **Create an Account**: Sign up for an account on [the Hugging Face website](https://huggingface.co).
2. **Generate an API Key**: Once logged in, go to your account settings, click on Access Tokens, and generate a new API key.