
**Model Card for Mistral-7B**: [Click here](https://huggingface.co/mistralai/Mistral-7B-v0.1)


**About Mistral-7B LLM**: [Click here](https://mistral.ai/news/announcing-mistral-7b/)

* The Mistral-7B base model cannot be run as-is on a free-tier Colab instance: it might require 12-15 GB of GPU RAM. So we use a sharded Mistral-7B checkpoint and quantize it for inference on free-tier Colab.


* Sharding divides a large model into smaller, self-contained pieces (shards). This makes it possible to spread the work across devices, improves memory efficiency, and can speed up inference.

* Sharding is particularly advantageous for running extensive models on devices with limited memory, enabling distributed processing for scalability, and facilitating large-scale distributed systems with multiple GPUs.

* The 'Accelerate' library simplifies the sharding process, making it easier to implement distributed inference for improved computational resource utilization and reduced communication overhead.
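To make the idea of shards more concrete, here is a minimal sketch in plain Python of how a checkpoint could be split into pieces under a size limit, together with an index that records which shard holds which weight (similar in spirit to the `pytorch_model.bin.index.json` file that ships with sharded checkpoints). The tensor names, byte sizes, shard file names, and the greedy packing strategy are all illustrative assumptions, not the actual algorithm used by any library.

```python
from typing import Dict, List


def shard_by_size(tensor_sizes: Dict[str, int], max_shard_bytes: int) -> List[Dict[str, int]]:
    """Greedily pack named tensors, in order, into shards of at most max_shard_bytes.

    A single tensor larger than the limit still gets a shard of its own.
    """
    shards: List[Dict[str, int]] = []
    current: Dict[str, int] = {}
    current_size = 0
    for name, size in tensor_sizes.items():
        # Start a new shard when adding this tensor would exceed the limit.
        if current and current_size + size > max_shard_bytes:
            shards.append(current)
            current, current_size = {}, 0
        current[name] = size
        current_size += size
    if current:
        shards.append(current)
    return shards


# Hypothetical tensor names and sizes in bytes (toy numbers, not real Mistral-7B values).
toy_checkpoint = {
    "embed.weight": 400,
    "layer0.attn.weight": 300,
    "layer0.mlp.weight": 500,
    "layer1.attn.weight": 300,
    "layer1.mlp.weight": 500,
    "lm_head.weight": 400,
}

shards = shard_by_size(toy_checkpoint, max_shard_bytes=1000)

# Index that maps every tensor name to the (hypothetical) shard file that stores it.
weight_map = {
    name: f"shard-{i + 1:05d}-of-{len(shards):05d}.bin"
    for i, shard in enumerate(shards)
    for name in shard
}

for name, filename in weight_map.items():
    print(f"{name:<20} -> {filename}")
```

Because each shard is small, a loader can read one file at a time instead of holding the whole checkpoint in memory at once. With the `transformers` library, a sharded copy of a model is usually produced by passing `max_shard_size` to `save_pretrained`.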

## Install Required Libraries ✅

In [1]:
!pip install -q git+https://github.com/huggingface/transformers peft accelerate bitsandbytes safetensors sentencepiece

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
  Building wheel for transformers (pyproject.toml) ... done


In [2]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

## Define the Model Name on the Hugging Face Hub ✅

In [3]:
MODEL_NAME = "bn22/Mistral-7B-Instruct-v0.1-sharded"

## Define the Tokenizer and Load the Quantized Model ✅

In [7]:
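def load_quantized_model(model_name: str):
  """
  Load a causal language model from the Hugging Face Hub with 4-bit quantization.

  :param model_name: Hub repository id of the model to load.
  :return: The quantized model.
  """
  # 4-bit NF4 quantization with double quantization; matrix multiplications run in bfloat16
  bnb_config = BitsAndBytesConfig(
      load_in_4bit=True,
      bnb_4bit_use_double_quant=True,
      bnb_4bit_quant_type="nf4",
      bnb_4bit_compute_dtype=torch.bfloat16,
  )

  # The 4-bit setting is carried by quantization_config, so it is not repeated as a separate keyword
  model = AutoModelForCausalLM.from_pretrained(
      model_name,
      torch_dtype=torch.bfloat16,
      quantization_config=bnb_config
  )

  return model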
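As a rough sanity check on why 4-bit loading helps here, the sketch below estimates the memory taken by the weights alone at different precisions. The parameter count (about 7.24 billion) is an approximation, and the estimate deliberately ignores activations, the KV cache, and the small overhead of quantization constants, so treat the numbers as a back-of-the-envelope guide rather than a measurement.

```python
# Approximate parameter count for Mistral-7B (assumption; rounded figure).
APPROX_PARAMS = 7.24e9

# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "int8": 1.0,
    "nf4": 0.5,  # 4 bits per weight, ignoring quantization constants
}


def weight_memory_gib(num_params: float, dtype: str) -> float:
    """Return the approximate size of the weights in GiB for the given dtype."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>8}: {weight_memory_gib(APPROX_PARAMS, dtype):5.1f} GiB")
```

Under these assumptions the bfloat16 weights land in the 12-15 GB range mentioned at the top of the notebook, while the 4-bit weights take about a quarter of that.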

In [8]:
def get_tokenizer(model_name:str):
  tokenizer = AutoTokenizer.from_pretrained(model_name)
  tokenizer.bos_token_id = 1  # Set beginning of sentence token id
  return tokenizer

In [9]:
model = load_quantized_model(MODEL_NAME)
tokenizer = get_tokenizer(MODEL_NAME)
stop_token_ids = [0]

config.json:   0%|          | 0.00/571 [00:00<?, ?B/s]

pytorch_model.bin.index.json:   0%|          | 0.00/23.9k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/11 [00:00<?, ?it/s]

pytorch_model_00001-of-00010.bin:   0%|          | 0.00/1.54G [00:00<?, ?B/s]

pytorch_model_00002-of-00010.bin:   0%|          | 0.00/1.43G [00:00<?, ?B/s]

pytorch_model_00003-of-00010.bin:   0%|          | 0.00/1.31G [00:00<?, ?B/s]

pytorch_model_00004-of-00010.bin:   0%|          | 0.00/1.83G [00:00<?, ?B/s]

pytorch_model_00005-of-00010.bin:   0%|          | 0.00/1.35G [00:00<?, ?B/s]

pytorch_model_00006-of-00010.bin:   0%|          | 0.00/1.35G [00:00<?, ?B/s]

pytorch_model_00007-of-00010.bin:   0%|          | 0.00/1.54G [00:00<?, ?B/s]

pytorch_model_00008-of-00010.bin:   0%|          | 0.00/1.43G [00:00<?, ?B/s]

pytorch_model_00009-of-00010.bin:   0%|          | 0.00/1.34G [00:00<?, ?B/s]

pytorch_model_00010-of-00010.bin:   0%|          | 0.00/1.33G [00:00<?, ?B/s]

pytorch_model_00011-of-00010.bin:   0%|          | 0.00/33.6M [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/11 [00:00<?, ?it/s]

generation_config.json:   0%|          | 0.00/116 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/963 [00:00<?, ?B/s]

tokenizer.model:   0%|          | 0.00/493k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/1.80M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/72.0 [00:00<?, ?B/s]

## Inference ✅

In [10]:
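# The instruction tags [INST] ... [/INST] are written by hand in the prompt
text = "[INST] What are the ingredients for making Biriyani ? [/INST]"

# Tokenize the prompt into PyTorch tensors without adding extra special tokens
encoded_text = tokenizer(text,
                         return_tensors="pt",
                         add_special_tokens=False)

model_input = encoded_text
print(model_input)

# Sample up to 200 new tokens, then decode the token ids back to text
generated_ids = model.generate(**model_input, max_new_tokens=200, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded)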

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


{'input_ids': tensor([[  733, 16289, 28793,  1824,   460,   272, 13506,   354,  2492, 21562,
         11672,  4499,  1550,   733, 28748, 16289, 28793]]), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}




['[INST] What are the ingredients for making Biriyani ? [/INST] The ingredients for making Biriyani are:\n\n- Basmati rice\n- Meat (such as chicken, lamb, or beef) or vegetables (such as potatoes, carrots, and peas)\n- Ghee or oil\n- Spices such as cumin, coriander, turmeric, cinnamon, cardamom, and cloves\n- Yogurt\n- Onion\n- Garlic\n- Ginger\n- Tomato paste or canned tomatoes\n- Salt to taste\n- Fresh cilantro or coriander leaves for garnish\n\nThese are the basic ingredients for a simple Biriyani. The ingredients and methods may vary depending on the type of Biriyani you are making, such as chicken Biriyani, vegetable Biriyani, or fish Biriyani.</s>']


In [11]:
print(decoded[0])

[INST] What are the ingredients for making Biriyani ? [/INST] The ingredients for making Biriyani are:

- Basmati rice
- Meat (such as chicken, lamb, or beef) or vegetables (such as potatoes, carrots, and peas)
- Ghee or oil
- Spices such as cumin, coriander, turmeric, cinnamon, cardamom, and cloves
- Yogurt
- Onion
- Garlic
- Ginger
- Tomato paste or canned tomatoes
- Salt to taste
- Fresh cilantro or coriander leaves for garnish

These are the basic ingredients for a simple Biriyani. The ingredients and methods may vary depending on the type of Biriyani you are making, such as chicken Biriyani, vegetable Biriyani, or fish Biriyani.</s>
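
The decoded text above still contains the original prompt and the end-of-sequence marker `</s>`. A minimal sketch for keeping only the model's answer is shown below; `extract_answer` is a hypothetical helper written for this notebook, and it assumes a single-turn prompt that ends with `[/INST]`.

```python
def extract_answer(decoded_text: str, eos_token: str = "</s>") -> str:
    """Return the text after the last [/INST] tag, without the trailing EOS marker."""
    # rpartition returns the whole string as the last element if the tag is missing.
    _, _, answer = decoded_text.rpartition("[/INST]")
    answer = answer.strip()
    if answer.endswith(eos_token):
        answer = answer[: -len(eos_token)]
    return answer.strip()


# Shortened sample in the same shape as the decoded output above.
sample = (
    "[INST] What are the ingredients for making Biriyani ? [/INST] "
    "The ingredients for making Biriyani are:\n\n- Basmati rice</s>"
)
print(extract_answer(sample))
```

In the notebook this would be called as `extract_answer(decoded[0])`. Passing `skip_special_tokens=True` to `tokenizer.batch_decode` is another way to drop `</s>`, although the prompt text would still need to be removed.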
