# **Fine-Tuning Mixtral-8x7B**



# **Goal:**


The primary objective of this project was to fine-tune the Mixtral 8x7B Instruct language model on a custom dataset (the text of the Companies Act, 2013) and deploy it using Gradio for real-time interaction. The fine-tuned model aimed to show improved performance on this specific task, while the deployment allowed users to interact with the model through a user-friendly interface.

The first cell installs the required Python packages: bitsandbytes (quantization), transformers (the Hugging Face model library), peft (parameter-efficient fine-tuning), accelerate, and supporting packages for training, logging, and the demo interface.

In [1]:
!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q trl xformers wandb datasets einops gradio sentencepiece

In [2]:
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig,HfArgumentParser,TrainingArguments,pipeline, logging, TextStreamer
from peft import LoraConfig, PeftModel, prepare_model_for_kbit_training, get_peft_model
import os,torch, wandb, platform, gradio, warnings
from datasets import load_dataset
from trl import SFTTrainer
from huggingface_hub import notebook_login

In [3]:
!git config --global credential.helper store
notebook_login()


## Load Dataset

In [4]:
!pip install datasets
from datasets import load_dataset


MATH_dataset = load_dataset("IntellectusAI/Company-law-2013")

In [5]:
print(MATH_dataset)

DatasetDict({
    train: Dataset({
        features: ['Section', 'Title', 'Content'],
        num_rows: 484
    })
})


## Data Splitting

In [6]:
# Assuming you have a "train" split
train_dataset = MATH_dataset["train"]

# Specify the desired number of samples
desired_samples = 10000

# Ensure that the desired number of samples is not greater than the total number of samples
desired_samples = min(desired_samples, len(train_dataset))

# Take the first desired_samples rows
selected_samples = train_dataset.select(list(range(desired_samples)))

# Specify the desired ratio for your train/test split (e.g., 80% train, 20% test)
train_ratio = 0.8

# Calculate the number of samples for the train split
num_samples_train = int(desired_samples * train_ratio)

# Create the train and test splits
train_split = selected_samples.select(list(range(num_samples_train)))
test_split = selected_samples.select(list(range(num_samples_train, desired_samples)))

# train_split and test_split now hold 387 and 97 rows (484 in total, because the dataset has fewer than 10,000 rows)


In [7]:
print(train_split,test_split)

Dataset({
    features: ['Section', 'Title', 'Content'],
    num_rows: 387
}) Dataset({
    features: ['Section', 'Title', 'Content'],
    num_rows: 97
})
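
The split sizes printed above follow directly from the arithmetic in the splitting cell. As a quick sanity check, the sketch below reproduces that arithmetic with plain integers; `split_sizes` is a hypothetical helper introduced here for illustration and is not part of the notebook.

```python
def split_sizes(total_rows, desired_samples, train_ratio):
    """Return (train, test) row counts for a sequential split (hypothetical helper)."""
    # Cap the request at the number of rows that actually exist
    n = min(desired_samples, total_rows)
    # int() truncates, so any fractional row ends up in the test split
    n_train = int(n * train_ratio)
    return n_train, n - n_train

train_rows, test_rows = split_sizes(total_rows=484, desired_samples=10000, train_ratio=0.8)
print(train_rows, test_rows)  # 387 97
```

Because the rows are taken in order without shuffling, the test split consists of the last 97 sections of the dataset. If a random split is preferred, the `train_test_split` method of a `datasets.Dataset` is one possible alternative.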


## Create a Prompt

In [8]:
def create_prompt(sample):
    query = sample['Title']
    response = sample['Content']
    prompt = f"<s>[INST] {query} [/INST]\n"

    # Include the response from the 'response' column
    prompt += f"{response}</s>"

    return prompt

prompt_example = create_prompt(train_split[0])
print(prompt_example)

<s>[INST] Short Title, Extent, Commencement and Application. [/INST]
(1) This Act may be called the Companies Act, 2013.

(2) It extends to the whole of India.

(3) This section shall come into force at once and the remaining provisions of this Act shall come into force on such date as the Central Government may, by notification in the Official Gazette, appoint and different dates may be appointed for different provisions of this Act and any reference in any provision to the commencement of this Act shall be construed as a reference to the coming into force of that provision.

(4) The provisions of this Act shall apply to—
(a) companies incorporated under this Act or under any previous company law;
(b) insurance companies, except in so far as the said provisions are inconsistent with the provisions of the Insurance Act, 1938 (4 of 1938) or the Insurance Regulatory and Development Authority Act, 1999 (41 of 1999);
(c) banking companies, except in so far as the said provisions are incons
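
Before training, it can help to confirm that every formatted sample has the expected shape. The following is a minimal sketch, assuming the same template as `create_prompt`; the `check_prompt` helper and the sample record are hypothetical and only serve as illustration.

```python
def create_prompt(sample):
    # Same template as the notebook's create_prompt
    return f"<s>[INST] {sample['Title']} [/INST]\n{sample['Content']}</s>"

def check_prompt(prompt):
    """Return True if the prompt starts with one instruction block and ends with the EOS marker."""
    return (
        prompt.startswith("<s>[INST] ")
        and prompt.count("[/INST]") == 1
        and prompt.endswith("</s>")
    )

# Hypothetical record, invented for illustration only
sample = {"Section": "0", "Title": "Example title", "Content": "Example content."}
prompt = create_prompt(sample)
print(check_prompt(prompt))  # True
```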

# **Load the base Mixtral 8x7B model with quantization configurations.**


Quantization reduces the memory and compute requirements of a neural network by storing its weights (and sometimes its activations) in lower-precision data types, such as 8-bit integers or, as in this notebook, 4-bit NF4 values. The smaller footprint is what makes it practical to load and fine-tune a model of this size on limited GPU memory, and it also helps when deploying on resource-constrained devices.
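
To give a feel for the savings, the sketch below estimates weight storage at different precisions. It assumes a total of roughly 46.7 billion parameters for Mixtral 8x7B, and it ignores quantization constants, activations, and optimizer state, so the numbers are a rough lower bound rather than a measurement.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

# Assumption: roughly 46.7 billion parameters in total for Mixtral 8x7B
NUM_PARAMS = 46.7e9

for label, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gb(NUM_PARAMS, bits):.1f} GB")
```

Under these assumptions, 4-bit storage needs about a quarter of the memory of 16-bit weights.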


The tokenizer that matches the Mixtral 8x7B checkpoint is loaded to convert input text into token ids. Here, padding_side="left" pads sequences at the start, model_max_length=512 caps the tokenized length, and trust_remote_code=True allows custom code from the model repository to run. The eos_token is reused as the pad_token, and add_eos_token is enabled so that an end-of-sequence (EOS) token is appended to each encoded text. The last line of the cell displays add_bos_token and add_eos_token to confirm that both are enabled.

In [9]:
# Load base model (Mixtral 8x7B Instruct) with 4-bit NF4 quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=False,
)
model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    quantization_config=bnb_config,
    device_map={"": 0},  # place the whole model on GPU 0
)
model.config.use_cache = False  # the KV cache is not needed during training
model.config.pretraining_tp = 1
model.gradient_checkpointing_enable()  # trade extra compute for lower activation memory

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(
    "mistralai/Mixtral-8x7B-Instruct-v0.1",
    padding_side="left",
    model_max_length=512,
    trust_remote_code=True,
)

tokenizer.pad_token = tokenizer.eos_token  # reuse EOS as the padding token
tokenizer.add_eos_token = True
tokenizer.add_bos_token, tokenizer.add_eos_token  # displayed as the cell output



(True, True)

In [10]:
def generate_response(messages):
    # apply_chat_template wraps each turn in the model's [INST] ... [/INST] markers
    # and returns the token ids as a tensor
    encoded_input = tokenizer.apply_chat_template(messages, return_tensors="pt")
    model_inputs = encoded_input.to('cuda')
    # max_new_tokens=10 cuts the reply off after ten tokens; raise it for full answers
    generated_ids = model.generate(model_inputs, max_new_tokens=10, do_sample=True)
    decoded = tokenizer.batch_decode(generated_ids)
    return decoded[0]


In [12]:
messages = [
    {"role": "user", "content": "[INST]What is Definition of Related party?[/INST]"},
    {"role": "assistant", "content": "In accounting and finance, a related party is a person or entity that has a significant influence or control over another entity."},
    {"role": "user", "content": "[INST]If 24 out of every 60 individuals like football and out of those that like it, 50% play it, how many people would you expect play football out of a group of 250?, just give me one word answer in number[/INST]"}
]
response = generate_response(messages)
print(response)

The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


<s> [INST] [INST]What is Definition of Related party?[/INST] [/INST]In accounting and finance, a related party is a person or entity that has a significant influence or control over another entity.</s> [INST] [INST]If 24 out of every 60 individuals like football and out of those that like it, 50% play it, how many people would you expect play football out of a group of 250?, just give me one word answer in number[/INST] [/INST] 20

Here's the reasoning
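
The doubled `[INST] [INST]` markers in the decoded output above appear because `apply_chat_template` already inserts the instruction markers, so the message contents are better left as plain text. The sketch below mimics this behaviour in plain Python; `render_chat` is a hypothetical, simplified stand-in and not the tokenizer's real template, so exact spacing may differ.

```python
def render_chat(messages, bos="<s>", eos="</s>"):
    """Simplified stand-in for a Mistral-style chat template (assumption, not the real template)."""
    text = bos
    for message in messages:
        if message["role"] == "user":
            # The template itself adds the instruction markers
            text += f"[INST] {message['content']} [/INST]"
        else:
            # Assistant turns are closed with the end-of-sequence token
            text += f"{message['content']}{eos}"
    return text

tagged = [{"role": "user", "content": "[INST]What is Definition of Related party?[/INST]"}]
plain = [{"role": "user", "content": "What is Definition of Related party?"}]

print(render_chat(tagged))  # instruction markers appear twice
print(render_chat(plain))   # instruction markers appear once
```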


# Prepare for K-Bit Training



Adapters are small additional trainable components that capture task-specific information while the pre-trained weights stay frozen. The prepare_model_for_kbit_training function prepares the quantized model for training (for example, by freezing the base weights and keeping numerically sensitive layers in higher precision). The following lines instantiate a LoraConfig object that sets the adapter rank (r), the scaling factor (lora_alpha), the dropout rate, and the target modules to which adapters are applied.

The get_peft_model function then wraps the Mixtral 8x7B model with these LoRA adapters, so that only the adapter weights are updated during fine-tuning.
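
The idea behind a LoRA adapter can be illustrated with a toy linear layer: the frozen weight `W` is left untouched and a low-rank product `B A`, scaled by `lora_alpha / r`, is added on top. The sketch below uses plain Python lists and made-up numbers; it is an illustration of the formula, not PEFT's implementation.

```python
def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def lora_forward(W, A, B, x, lora_alpha):
    """y = W x + (lora_alpha / r) * B (A x); only A and B would be trained."""
    r = len(A)                          # adapter rank = number of rows in A
    base = matvec(W, x)                 # frozen pre-trained projection
    update = matvec(B, matvec(A, x))    # low-rank path: d_in -> r -> d_out
    scale = lora_alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# Toy sizes for illustration: d_in = 3, d_out = 2, rank r = 1
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
A = [[0.5, 0.5, 0.5]]        # r x d_in
B = [[0.0], [0.0]]           # d_out x r, zero-initialised so training starts from the base model
x = [1.0, 2.0, 3.0]

print(lora_forward(W, A, B, x, lora_alpha=1))  # [7.0, 2.0]
```

With `B` initialised to zero, the adapted layer reproduces the base layer exactly, and the output only changes once training updates `A` and `B`. In this notebook, lora_alpha and r are both 16, so the scaling factor is 1.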

In [13]:
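# Add LoRA adapters to the selected layers
model = prepare_model_for_kbit_training(model)
peft_config = LoraConfig(
    r=16,               # adapter rank
    lora_alpha=16,      # scaling factor; the update is scaled by lora_alpha / r
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    # Note: "gate_proj" is a Llama/Mistral-style MLP name; Mixtral's expert layers
    # may be named differently, in which case it matches nothing
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj"]
)
model = get_peft_model(model, peft_config)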
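
A back-of-the-envelope count shows how few parameters the adapters add. The sketch below assumes a hidden size of 4096, a key/value projection width of 1024, and 32 decoder layers for Mixtral 8x7B, and it counts only the four attention projections; these dimensions are assumptions and should be checked against the model's config.json.

```python
def lora_param_count(d_in, d_out, r):
    """LoRA adds A (r x d_in) and B (d_out x r) to a d_in -> d_out linear layer."""
    return r * (d_in + d_out)

R = 16             # from the LoraConfig above
HIDDEN = 4096      # assumption: Mixtral hidden size
KV_DIM = 1024      # assumption: 8 key/value heads of dimension 128
NUM_LAYERS = 32    # assumption: number of decoder layers

per_layer = (
    lora_param_count(HIDDEN, HIDDEN, R)    # q_proj
    + lora_param_count(HIDDEN, KV_DIM, R)  # k_proj
    + lora_param_count(HIDDEN, KV_DIM, R)  # v_proj
    + lora_param_count(HIDDEN, HIDDEN, R)  # o_proj
)
total = per_layer * NUM_LAYERS
print(total)            # 13631488
print(total * 4 / 1e6)  # size in MB if stored as 32-bit floats
```

Under these assumptions the adapter holds about 13.6 million parameters, or roughly 54.5 MB in 32-bit floats, which is in line with the 54.6M adapter_model.safetensors file uploaded at the end of this notebook. This suggests that only the attention projections received adapters. Calling `model.print_trainable_parameters()` on the PEFT model is a direct way to confirm the actual count.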

# Monitor the Language Model

Weights & Biases (WandB) is used to monitor and track the training run. It provides experiment tracking, visualization, and collaboration, and logging metrics and parameters during training makes it easier to analyze and compare experiments. Here, wandb.login authenticates the user, and wandb.init sets up the project and starts a run that tracks the fine-tuning of the Mixtral 8x7B model.

In [14]:
# Monitoring the LLM
# Read the API key from the environment instead of hard-coding it in the notebook
wandb.login(key=os.environ["WANDB_API_KEY"])
run = wandb.init(project='Fine tuning mistral 8x7B', job_type="training", anonymous="allow")

Failed to detect the name of this notebook, you can set it manually with the WANDB_NOTEBOOK_NAME environment variable to enable code saving.
wandb: W&B API key is configured. Use `wandb login --relogin` to force relogin
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: Currently logged in as: shraddhasri9648. Use `wandb login --relogin` to force relogin


# Training Configuration

The code sets up the training arguments, initializes an SFTTrainer, trains the model, and saves it. The trained adapter is then pushed to the Hugging Face Model Hub for easy sharing and retrieval.

TrainingArguments collects the key settings for the training process. Here it specifies the output directory, sets the number of training epochs to 20, defines a per-device batch size of 2 with 8 gradient accumulation steps, saves a checkpoint every 1000 steps, logs training progress every 10 steps, and sets the learning rate, weight decay, and gradient clipping. Mixed-precision training (fp16) speeds up computation, and report_to="wandb" sends metrics to WandB for real-time monitoring.

In [18]:
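training_arguments = TrainingArguments(
    output_dir="./results",
    num_train_epochs=20,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,   # 2 x 8 = 16 sequences per optimizer step
    save_steps=1000,
    logging_steps=10,
    learning_rate=2e-4,
    weight_decay=0.001,
    fp16=True,                       # mixed-precision training
    max_grad_norm=0.3,               # gradient clipping threshold
    max_steps=-1,                    # -1: train for num_train_epochs
    warmup_ratio=0.3,
    group_by_length=True,
    lr_scheduler_type="constant",    # constant schedule; "constant_with_warmup" is needed for warmup_ratio to apply
    report_to="wandb",
)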
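
The batch size and gradient accumulation settings combine into an effective batch size per optimizer step. The sketch below spells out that arithmetic, assuming a single GPU (as implied by device_map={"": 0}); the dataset size in the second call is hypothetical.

```python
per_device_train_batch_size = 2
gradient_accumulation_steps = 8
num_devices = 1  # assumption: a single GPU

# Sequences that contribute to each optimizer update
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices
print(effective_batch_size)  # 16

def optimizer_steps_per_epoch(num_sequences, batch_size):
    """Full optimizer steps in one pass over the data (partial batches ignored in this sketch)."""
    return num_sequences // batch_size

# Hypothetical dataset size, for illustration only
print(optimizer_steps_per_epoch(num_sequences=160, batch_size=effective_batch_size))  # 10
```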


The SFTTrainer is initialized with the PEFT-wrapped Mixtral 8x7B model, a maximum sequence length of 1024 tokens, the training and evaluation datasets, and the LoRA configuration (peft_config). It also receives the formatting function (create_prompt) that builds the input prompts, the tokenizer, and the training arguments defined in training_arguments. With packing enabled, several short examples are concatenated into one fixed-length sequence, which reduces padding and makes training on variable-length texts more efficient.

In [19]:
trainer = SFTTrainer(
    model=model,
    max_seq_length = 1024,
    train_dataset=train_split,
    eval_dataset=test_split,
    peft_config=peft_config,
    formatting_func=create_prompt,
    # callbacks=[early_stopping],
    # dataset_text_field="text",
    tokenizer=tokenizer,
    args=training_arguments,
    packing= True)


Generating train split: 0 examples [00:00, ? examples/s]

Generating train split: 0 examples [00:00, ? examples/s]
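
To make the packing idea concrete, the sketch below concatenates tokenized examples, separated by an EOS id, and cuts the resulting stream into fixed-length chunks. It is a simplified illustration with toy token ids; `pack_sequences` is a hypothetical helper, and the trainer's actual packing logic may differ in its details.

```python
def pack_sequences(tokenized_examples, eos_token_id, max_seq_length):
    """Concatenate examples (separated by EOS) and cut the stream into fixed-length chunks."""
    stream = []
    for token_ids in tokenized_examples:
        stream.extend(token_ids)
        stream.append(eos_token_id)
    # Keep only full chunks; the incomplete tail is dropped in this sketch
    num_chunks = len(stream) // max_seq_length
    return [stream[i * max_seq_length:(i + 1) * max_seq_length] for i in range(num_chunks)]

# Toy token ids; 2 is the EOS id reported in the generation log earlier in this notebook
examples = [[5, 6, 7], [8, 9], [10, 11, 12, 13]]
packed = pack_sequences(examples, eos_token_id=2, max_seq_length=4)
print(packed)  # [[5, 6, 7, 2], [8, 9, 2, 10], [11, 12, 13, 2]]
```

Every packed sequence has the same length, so no padding tokens are needed, at the cost of some sequences containing the end of one example and the start of the next.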



# Start the training process.

In [20]:
trainer.train()



| Step | Training Loss |
|------|---------------|
| 10 | 0.6523 |
| 20 | 0.582 |
| 30 | 0.5125 |
| 40 | 0.476 |
| 50 | 0.4598 |
| 60 | 0.4157 |
| 70 | 0.3616 |
| 80 | 0.3402 |
| 90 | 0.2889 |
| 100 | 0.2692 |


TrainOutput(global_step=220, training_loss=0.2685711795633489, metrics={'train_runtime': 8301.3018, 'train_samples_per_second': 0.429, 'train_steps_per_second': 0.027, 'total_flos': 1.007495821614121e+18, 'train_loss': 0.2685711795633489, 'epoch': 19.775280898876403})
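
The numbers in TrainOutput can be cross-checked with a little arithmetic. The sketch below assumes a single GPU and an effective batch size of 16 (2 sequences per device times 8 accumulation steps), and treats the reported epoch value as optimizer steps divided by steps per epoch.

```python
global_step = 220
epochs_completed = 19.775280898876403
effective_batch_size = 2 * 8  # per-device batch size x gradient accumulation steps

steps_per_epoch = global_step / epochs_completed
sequences_per_epoch = steps_per_epoch * effective_batch_size
print(round(steps_per_epoch, 3))   # 11.125
print(round(sequences_per_epoch))  # 178

# Cross-check with the reported throughput: samples/second x runtime in seconds
samples_seen = 0.429 * 8301.3018
print(round(samples_seen))         # 3561
```

Under these assumptions, packing appears to have turned the 387 training rows into roughly 178 sequences of up to 1024 tokens each, and 178 sequences over 20 epochs (3560) is close to the number of samples implied by the reported throughput.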

# **Save the fine-tuned model and push it to the Hugging Face Model Hub.**


In [21]:
trainer.save_model("mixtral_8x7B_law")

In [22]:
model.push_to_hub("IntellectusAI/mixtral_8x7B_law")

adapter_model.safetensors:   0%|          | 0.00/54.6M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/IntellectusAI/mixtral_8x7B_law/commit/dd52f31c44af1d813d08e40b0de1e22081d22df3', commit_message='Upload model', commit_description='', oid='dd52f31c44af1d813d08e40b0de1e22081d22df3', pr_url=None, pr_revision=None, pr_num=None)

# Conclusion:

The project demonstrated the fine-tuning and deployment of the Mixtral 8x7B model, showcasing its adaptability to a specific task and providing a user-friendly interface for real-world applications. The combination of quantization, LoRA adapters, and monitoring tools contributed to an efficient workflow. Ongoing improvements and user feedback will guide future iterations of the project.