### Installations

In [None]:
!pip install langchain_huggingface
!pip install -q -U bitsandbytes
!pip install -q -U accelerate
!pip install -q -U transformers

### Imports and preliminary checks

In [1]:
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig, BitsAndBytesConfig
import torch
from langchain_huggingface import HuggingFacePipeline

device = 'cuda' if torch.cuda.is_available() else 'cpu'

print("Device:", device)
if device == 'cuda':
  print(torch.cuda.get_device_name(0))

Device: cuda
NVIDIA GeForce RTX 4060 Laptop GPU


In [2]:
from huggingface_hub import login
login()


## Model loading

**Sharding:** Sharding splits a large model checkpoint into smaller files (approximately 1.9 GB each here). The shards are loaded one at a time and assembled into a single model in memory, so peak memory use during loading stays well below what reading one monolithic file would require. This makes it practical to load the LLM on a machine with limited memory.
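To make the idea concrete, here is a toy sketch of greedy sharding. It is not the Hugging Face implementation: the function `shard_checkpoint`, the tensor names, and the 250 MB tensor size are all made up for illustration, chosen so that a 14,000 MB checkpoint with a 1,900 MB limit splits into 8 shards.

```python
def shard_checkpoint(tensor_sizes_mb, max_shard_mb):
    """Greedily group tensors into shards no larger than max_shard_mb."""
    shards, current, current_size = [], {}, 0
    for name, size in tensor_sizes_mb.items():
        # Start a new shard when this tensor would push the current one over the limit.
        if current and current_size + size > max_shard_mb:
            shards.append(current)
            current, current_size = {}, 0
        current[name] = size
        current_size += size
    if current:
        shards.append(current)
    return shards

# Hypothetical checkpoint: 56 tensors of 250 MB each (14,000 MB in total).
tensor_sizes_mb = {f"block_{i}.weight": 250 for i in range(56)}
shards = shard_checkpoint(tensor_sizes_mb, max_shard_mb=1900)

total_mb = sum(tensor_sizes_mb.values())
largest_shard_mb = max(sum(shard.values()) for shard in shards)
print(f"{len(shards)} shards, largest {largest_shard_mb} MB, total {total_mb} MB")
```

Only one shard needs to be held in memory at a time while loading, so the loader's peak footprint is bounded by the largest shard rather than by the whole checkpoint.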

We use the sharded checkpoint to load Mistral 7B, which occupies about 14 GB in 16-bit precision, onto the 8 GB RTX 4060 GPU in pieces of about 1.9 GB each. The weights are quantized to 4 bits as they are loaded, which shrinks the 7B parameters from ~14 GB to roughly 4 GB, so the whole model fits in GPU memory.
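A back-of-the-envelope estimate of the weight footprint, assuming a round 7 billion parameters and counting only the weights (no activations, KV cache, or quantization constants), can be sketched as follows. The helper `weight_gb` is hypothetical and exists only for this illustration.

```python
PARAMS = 7_000_000_000  # assumed round figure for a "7B" model

def weight_gb(num_params, bits_per_param):
    """Size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

fp16_gb = weight_gb(PARAMS, 16)  # 16-bit (half-precision) weights
int4_gb = weight_gb(PARAMS, 4)   # 4-bit quantized weights

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:  {int4_gb:.1f} GB")
```

Under these assumptions the 16-bit weights come to about 14 GB and the 4-bit weights to about 3.5 GB, which is why only the quantized model fits on an 8 GB GPU.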

Other dependencies used:
1. `BitsAndBytesConfig`:
    - `load_in_4bit`: set to `True` to quantize the model weights to 4 bits as they are loaded
    - `bnb_4bit_use_double_quant`: set to `True` to also quantize the quantization constants, saving roughly an additional 0.4 bits per parameter
    - `bnb_4bit_quant_type`: set to `nf4` to store the weights in the 4-bit NormalFloat (NF4) data type
    - `bnb_4bit_compute_dtype`: set to `torch.bfloat16` so that weights are dequantized to bfloat16 for computation, which is faster and lighter than float32
2. `AutoModelForCausalLM.from_pretrained`:
    - loads a pre-trained model given the model path and the `BitsAndBytesConfig` object. We are loading Mistral 7B here.
3. `AutoTokenizer.from_pretrained`:
    - loads the pre-trained tokenizer for Mistral 7B.
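The saving of roughly 0.4 bits per parameter from double quantization can be reproduced with a short calculation. The block sizes below (64 weights per quantization constant, 256 constants per second-level constant) are assumptions taken from the QLoRA paper, not values read from this notebook's configuration.

```python
WEIGHT_BLOCK = 64     # weights sharing one quantization constant (assumed)
CONSTANT_BLOCK = 256  # first-level constants sharing one second-level constant (assumed)

# Without double quantization: one 32-bit constant per block of weights.
overhead_single = 32 / WEIGHT_BLOCK

# With double quantization: 8-bit constants, plus one 32-bit constant per 256 of them.
overhead_double = 8 / WEIGHT_BLOCK + 32 / (WEIGHT_BLOCK * CONSTANT_BLOCK)

saved_bits = overhead_single - overhead_double
print(f"{overhead_single:.3f} -> {overhead_double:.3f} bits/param, saving {saved_bits:.3f}")
```

Under these assumptions the overhead drops from 0.5 to about 0.127 bits per parameter, a saving of about 0.373 bits.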

In [3]:
# Load Mistral 7B: weights from a sharded checkpoint, tokenizer from the original repo
original_model_path = "mistralai/Mistral-7B-Instruct-v0.1"
model_path = "filipealmeida/Mistral-7B-Instruct-v0.1-sharded"

# 4-bit NF4 quantization with double quantization; computation runs in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    quantization_config=bnb_config,
    device_map="auto",  # place layers on the available devices automatically
)
tokenizer = AutoTokenizer.from_pretrained(original_model_path)

Loading checkpoint shards:   0%|          | 0/8 [00:00<?, ?it/s]


## Pipelines for running LLMs
1. HuggingFace Transformers pipeline
    - A simple API that hides most of the underlying complexity for tasks such as Named Entity Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction, and Question Answering.
    - Plain inference with an LLM uses the `text-generation` task.
2. LangChain HuggingFace pipeline
    - Integrates `transformers.pipeline` with LangChain
    - LangChain is widely used for building LLM applications, so we wrap the pipeline in it from the start.

In [4]:
text_generation_pipeline = transformers.pipeline(
    model=model,
    tokenizer=tokenizer,
    task="text-generation",
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
    repetition_penalty=1.1,
    return_full_text=True,
    max_new_tokens=100,
    temperature=0.5,
    do_sample=True,
)
mistral_llm = HuggingFacePipeline(pipeline=text_generation_pipeline)
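The pipeline above samples with `temperature=0.5`. As a rough illustration of what that setting does, the sketch below applies a temperature-scaled softmax to three made-up logits. The function `softmax_with_temperature` and the logit values are hypothetical; this is not the code path `transformers` uses internally, only the underlying formula.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
probs_default = softmax_with_temperature(logits, 1.0)
probs_sharper = softmax_with_temperature(logits, 0.5)

print("temperature 1.0:", [round(p, 3) for p in probs_default])
print("temperature 0.5:", [round(p, 3) for p in probs_sharper])
```

A temperature below 1 concentrates probability on the highest-scoring token, so sampled text becomes more predictable. Values above 1 flatten the distribution instead.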

## Querying the model

In [5]:
text = "What is the purpose of life (answer in 100 words)?"
response = mistral_llm.invoke(text)
print(response)



What is the purpose of life (answer in 100 words)? The purpose of life is to live a fulfilling existence that brings happiness, peace, and satisfaction. This can be achieved through various means such as pursuing one's passions, developing meaningful relationships, contributing to society, and striving for personal growth and self-improvement. Ultimately, the purpose of life is to find joy and contentment in the present moment, while also striving to make a positive impact on the world around us.
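Because the pipeline was created with `return_full_text=True`, the response echoes the prompt before the generated text. A minimal sketch of stripping that echo is shown below. The helper `strip_echoed_prompt` is hypothetical, and the response string is a shortened stand-in for the model output.

```python
def strip_echoed_prompt(prompt, response):
    """Return only the completion when the response begins with the prompt."""
    if response.startswith(prompt):
        return response[len(prompt):].strip()
    # Leave the text unchanged (apart from whitespace) if the prompt is not echoed.
    return response.strip()

prompt = "What is the purpose of life (answer in 100 words)?"
# Shortened stand-in for the full model response.
response = prompt + " The purpose of life is to live a fulfilling existence."

completion = strip_echoed_prompt(prompt, response)
print(completion)
```

Matching on the exact prompt is simple but brittle: it only works if the echoed text is byte-for-byte identical to the prompt that was sent.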


In [6]:
import re
# PromptTemplate comes from langchain_core, which langchain_huggingface installs
from langchain_core.prompts import PromptTemplate
from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()

# Prompt template
prompt_template = '''You are a reliable and trustworthy assistant, providing helpful and respectful responses that precisely address the context.
Answer the question below with the given context:
{context}
{question}
Answer:
'''

question = 'When did Virat Kohli first win?'
context = '''
Virat Kohli was born on November 5, 1988, in Delhi, India. He grew up in Delhi and was one of the first to train at the West Delhi Cricket Academy, created in 1998. In 2002 he played for the Delhi Under-15 team and was the highest run scorer in the 2003–04 Vijay Merchant Trophy, playing for the Delhi Under-17 team.

In February 2006 Kohli made his domestic debut for Delhi in a one-day match against Services (a team representing the Indian armed forces) but did not get a chance to bat. He scored only 10 runs in his first-class debut (first-class cricket refers to matches that last three or more days and feature two sides of 11 players each) against Tamil Nadu in November that year. He scored 90 runs in difficult conditions in a first-class match against Karnataka in December, helping Delhi draw the Test. In April 2007 he scored 35 runs in his T20 domestic debut against Himachal Pradesh.

Kohli captained the Under-19 Indian cricket team to victory at the ICC Under-19 World Cup in Kuala Lumpur, Malaysia, in 2008. His exploits were rewarded with an IPL contract from RCB for $30,000. He also made his international debut in an ODI that year, opening the batting and scoring 12 in a defeat of Sri Lanka in Dambulla, Sri Lanka. In 2009 he scored 405 runs in nine innings in the Emerging Players Tournament in Australia, ensuring that he would be at the top of the national team selectors’ minds.
'''

prompt = PromptTemplate(template=prompt_template, input_variables=['question', 'context'])

llm_chain = prompt | mistral_llm | output_parser

response = llm_chain.invoke({"question":question, "context":context})
print(response)
print()

# Extract only the answer: everything after the "Answer:" marker in the full response
match = re.search(r'Answer:\s*(.*)', response, re.DOTALL)
if match:
    answer = match.group(1).strip()
    print(answer)

You are a reliable and trustworthy assistant, providing helpful and respectful responses that precisely address the context.
Answer the question below with the given context:

Virat Kohli was born on November 5, 1988, in Delhi, India. He grew up in Delhi and was one of the first to train at the West Delhi Cricket Academy, created in 1998. In 2002 he played for the Delhi Under-15 team and was the highest run scorer in the 2003–04 Vijay Merchant Trophy, playing for the Delhi Under-17 team.

In February 2006 Kohli made his domestic debut for Delhi in a one-day match against Services (a team representing the Indian armed forces) but did not get a chance to bat. He scored only 10 runs in his first-class debut (first-class cricket refers to matches that last three or more days and feature two sides of 11 players each) against Tamil Nadu in November that year. He scored 90 runs in difficult conditions in a first-class match against Karnataka in December, helping Delhi draw the Test. In April