<a href="https://colab.research.google.com/github/AdityaShirke8005/Fine_tuning_Llama_2_7b-Material_recommendation_for_road_construction_on_custom_dataset/blob/main/Fine_tuning_Llama_2_7b_on_a_custom_dataset_for_Material_recommendation_and_optimization_for_road_construction.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine-tuning Llama-2-7b on a **custom dataset** for **Material recommendation and optimization for road construction**

We will use the PEFT library from the Hugging Face ecosystem, together with QLoRA, for more memory-efficient fine-tuning.



## Setup

Run the cells below to set up and install the required libraries. For our experiment we will need `accelerate`, `peft`, `transformers`, `datasets` and `trl`, the last of these for its recent [`SFTTrainer`](https://huggingface.co/docs/trl/main/en/sft_trainer). We will use `bitsandbytes` to [quantize the base model into 4bit](https://huggingface.co/blog/4bit-transformers-bitsandbytes). We will also install `einops`, as it is a requirement to load Falcon models, and `wandb` for logging the training run.

In [1]:
!pip install -q -U trl transformers accelerate git+https://github.com/huggingface/peft.git
!pip install -q datasets bitsandbytes einops wandb


The `!pip install -q -U trl transformers accelerate git+https://github.com/huggingface/peft.git` command installs the following packages:

- **trl**: Short for "Transformer Reinforcement Learning". It provides tools for training transformer language models, including the `SFTTrainer` for supervised fine-tuning used in this notebook.
- **transformers**: The Hugging Face Transformers library, which contains pre-trained transformer models and utilities for working with them.
- **accelerate**: A Hugging Face library that handles device placement, mixed precision, and distributed execution so that the same training code runs on different hardware setups.
- **peft** (`git+https://github.com/huggingface/peft.git`): Installs the current development version of **PEFT (Parameter-Efficient Fine-Tuning)** directly from the Hugging Face GitHub repository.

The `!pip install -q datasets bitsandbytes einops wandb` command installs the following packages:

- **datasets**: This package provides easy access to various datasets for natural language processing tasks and other machine learning tasks.
- **bitsandbytes**: This package provides the CUDA routines for 8-bit and 4-bit quantization as well as 8-bit and paged optimizers. It is what makes the 4-bit model loading and the `paged_adamw_32bit` optimizer used below possible.
- **einops**: This package is used for manipulating tensors in a flexible and concise way.
- **wandb**: This stands for "Weights and Biases" and is a tool used for tracking and visualizing the machine learning training process.
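
Because `peft` is installed from the GitHub repository and the other packages are upgraded to their latest releases, the exact versions can differ from one run of this notebook to the next. A minimal sketch for recording what is actually installed, using only the standard library (the helper name `installed_versions` is ours and not part of any of these packages):

```python
from importlib.metadata import PackageNotFoundError, version

PACKAGES = ["trl", "transformers", "accelerate", "peft",
            "datasets", "bitsandbytes", "einops", "wandb"]

def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found

versions = installed_versions(PACKAGES)
for name, ver in versions.items():
    print(f"{name:>14}: {ver or 'not installed'}")
```

Saving this printout next to the results makes it easier to reproduce a run later.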

## Dataset

In [2]:
from datasets import load_dataset

# Material recommendation and optimization for road construction
dataset = load_dataset("json", data_files="Data.json")


In [3]:
dataset

DatasetDict({
    train: Dataset({
        features: ['text'],
        num_rows: 105
    })
})
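
The printout shows a single `text` column, so each record of `Data.json` is expected to hold one complete dialogue as a string. The sketch below writes and reads back one such record with the standard library. The record wording, the one-object-per-line layout, and the `### Human:` marker are illustrative assumptions (only the `### Assistant:` marker appears in the prompt used later in this notebook); the actual rows of `Data.json` are not reproduced here.

```python
import json
import os
import tempfile

# Hypothetical record: the wording is made up purely for illustration.
example = {
    "text": "### Human: Which surface suits a low-traffic rural road?"
            "### Assistant: (recommended material and reasoning go here)"
}

path = os.path.join(tempfile.mkdtemp(), "example_data.json")
with open(path, "w", encoding="utf-8") as f:
    f.write(json.dumps(example) + "\n")  # one JSON object per line

with open(path, encoding="utf-8") as f:
    records = [json.loads(line) for line in f if line.strip()]

print(list(records[0].keys()))
print(len(records))
```

A file in this shape can be passed to `load_dataset("json", data_files=...)` in the same way as in the cell above.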

### Custom Dataset Summary

- The custom dataset consists of 105 rows on material recommendation and optimization for road construction projects in India. Each row is a dialogue between a human (presumably someone involved in a road construction project) and the assistant, discussing a different aspect of material selection and optimization.

- The dataset covers considerations such as durability, cost-effectiveness, noise reduction, and sustainability. Each dialogue describes a road construction segment and the materials recommended for it. Specific materials such as concrete, asphalt, bitumen, and recycled rubber are mentioned, as are techniques such as warm mix asphalt (WMA) that aim to reduce carbon emissions and promote sustainability.

- The dataset is useful for fine-tuning Llama 2 7B because it contains real-world examples and recommendations from which the model can learn how road construction scenarios map to suitable materials. A model trained on it should become better at understanding the requirements of a given project and at giving context-aware recommendations based on factors such as budget constraints, environmental concerns, and functional needs.

- The fine-tuned model is intended to help engineers and stakeholders make informed decisions about material selection and optimization for road construction projects in India, supporting better road design, cost efficiency, and environmental sustainability through more relevant and tailored recommendations.

## Loading the model

In [4]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "TinyPixel/Llama-2-7B-bf16-sharded"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True
)
model.config.use_cache = False


#### Importing Libraries:

- **torch**: The PyTorch library used for deep learning computations.
- **AutoModelForCausalLM**: A class from the Transformers library that loads a pre-trained causal language model.
- **AutoTokenizer**: A class from the Transformers library that loads the appropriate tokenizer for a pre-trained model.
- **BitsAndBytesConfig**: A configuration class from the Transformers library that defines how `bitsandbytes` quantizes the model when it is loaded.

#### Defining the Model and Tokenizer Names:

- **model_name**: A string representing the name of the pre-trained model to be used ("TinyPixel/Llama-2-7B-bf16-sharded").

#### Creating a BitsAndBytesConfig:

- **bnb_config**: An instance of the BitsAndBytesConfig class, which defines the quantization settings for the model. It enables 4-bit quantization with a "nf4" quantization type and sets the computation data type to torch.float16 (half-precision).
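
To see why 4-bit loading matters, a rough weight-only estimate helps. The sketch below assumes a nominal 7 billion parameters and ignores quantization constants, activations, and optimizer state, so real memory usage will be higher:

```python
NUM_PARAMS = 7_000_000_000  # nominal size; an approximation, not an exact count

def weight_gib(num_params, bits_per_param):
    """Memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3

fp16_gib = weight_gib(NUM_PARAMS, 16)
nf4_gib = weight_gib(NUM_PARAMS, 4)

print(f"fp16 weights:  {fp16_gib:.1f} GiB")
print(f"4-bit weights: {nf4_gib:.1f} GiB")
```

Under these assumptions the quantized weights take a quarter of the half-precision footprint, which is what lets a 7B model fit on a single Colab GPU.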

#### Loading the Pre-trained Model:

- **model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config, trust_remote_code=True)**: This line loads the pre-trained model "Llama-2-7B-bf16-sharded" using the AutoModelForCausalLM class. The quantization configuration (bnb_config) is applied while the weights are loaded, so the model is held in 4-bit precision. The trust_remote_code=True parameter allows Transformers to execute custom modeling code shipped in a Hub repository; Llama 2 is supported natively by Transformers, so the flag is not expected to change anything here.

#### Disabling Caching:

- **model.config.use_cache = False**: This line disables the key/value cache that the model otherwise keeps for previously processed tokens to speed up generation. The cache brings no benefit during training, so it is switched off to save memory.

##### Let's also load the tokenizer below

In [5]:
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token


In [6]:
from peft import LoraConfig, get_peft_model

lora_alpha = 16
lora_dropout = 0.1
lora_r = 64

peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM"
)
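
For a sense of scale, the number of trainable LoRA parameters can be worked out by hand. Each adapted linear layer with input size `d_in` and output size `d_out` gains two low-rank matrices holding `r * (d_in + d_out)` parameters in total. The sketch assumes a hidden size of 4096 and 32 decoder layers for Llama-2-7B, and that PEFT falls back to adapting only the `q_proj` and `v_proj` projections because no `target_modules` are given. These are assumptions about the model and library defaults rather than values stated in this notebook.

```python
r = 64
lora_alpha = 16
hidden_size = 4096        # assumed Llama-2-7B hidden size
num_layers = 32           # assumed number of decoder layers
adapted_per_layer = 2     # assumed default targets: q_proj and v_proj

# LoRA adds A (r x d_in) and B (d_out x r) to every adapted layer.
params_per_module = r * (hidden_size + hidden_size)
trainable = params_per_module * adapted_per_layer * num_layers
scaling = lora_alpha / r  # factor applied to the low-rank update

print(trainable)
print(f"{trainable * 4 / 1e6:.1f} MB in float32")
print(scaling)
```

Stored in float32, roughly 33.5 million parameters come to about 134 MB, which is in line with the 134M `adapter_model.bin` uploaded at the end of this notebook.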

## Loading the trainer

Here we will use the [`SFTTrainer` from the TRL library](https://huggingface.co/docs/trl/main/en/sft_trainer), which provides a wrapper around the transformers `Trainer` to easily fine-tune models on instruction-based datasets using PEFT adapters. Let's first load the training arguments below.

In [7]:
from transformers import TrainingArguments

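output_dir = "./results"
per_device_train_batch_size = 4
gradient_accumulation_steps = 4   # effective batch size: 4 * 4 = 16 sequences per optimizer step
optim = "paged_adamw_32bit"       # paged AdamW optimizer provided through bitsandbytes
save_steps = 100
logging_steps = 10
learning_rate = 2e-4
max_grad_norm = 0.3               # gradient clipping threshold
max_steps = 100                   # train for a fixed number of optimizer steps
warmup_ratio = 0.03               # note: the plain "constant" schedule applies no warmup
lr_scheduler_type = "constant"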

training_arguments = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    fp16=True,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=True,
    lr_scheduler_type=lr_scheduler_type,
)
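
These settings interact with the dataset size. As a back-of-the-envelope check (assuming a single GPU and that gradient accumulation simply continues across epoch boundaries), the numbers work out as follows:

```python
import math

num_rows = 105                     # size of the train split
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
max_steps = 100

# Sequences contributing to each optimizer step.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
# Micro-batches needed to pass over the dataset once.
batches_per_epoch = math.ceil(num_rows / per_device_train_batch_size)
# Micro-batches consumed over the whole run.
micro_batches_seen = max_steps * gradient_accumulation_steps
approx_epochs = micro_batches_seen / batches_per_epoch

print(effective_batch_size)        # 16
print(batches_per_epoch)           # 27
print(round(approx_epochs, 2))     # 14.81
```

The result agrees with the `'epoch': 14.81` reported by `trainer.train()` further down, meaning each of the 105 dialogues is seen roughly 15 times.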

#### Then finally pass everything to the trainer

In [12]:
from trl import SFTTrainer

max_seq_length = 512

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset["train"],  # Use dataset["train"] to get the 'train' split.
    peft_config=peft_config,
    dataset_text_field="text",  # Set this to "text" since it is the feature name in your dataset.
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
)


We will also pre-process the model by upcasting the layer norms to float32 for more stable training.

In [13]:
# nn.Module.to() casts floating-point parameters in place, so no reassignment is needed.
for name, module in trainer.model.named_modules():
    if "norm" in name:
        module.to(torch.float32)

## Train the model

Now let's train the model! Simply call `trainer.train()`

In [14]:
trainer.train()

[34m[1mwandb[0m: You can find your API key in your browser here: https://wandb.ai/authorize


| Step | Training Loss |
|------|---------------|
| 10 | 0.3477 |
| 20 | 2.3465 |
| 30 | 419.1957 |
| 40 | 2.0589 |
| 50 | 212.9809 |
| 60 | 1.532 |
| 70 | 132.3012 |
| 80 | 1.3054 |
| 90 | 63.4025 |
| 100 | 1.2406 |


TrainOutput(global_step=100, training_loss=83.67113718748092, metrics={'train_runtime': 968.4361, 'train_samples_per_second': 1.652, 'train_steps_per_second': 0.103, 'total_flos': 5137460017643520.0, 'train_loss': 83.67113718748092, 'epoch': 14.81})
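
The logged losses alternate between small values and very large spikes, so the single `training_loss` figure is hard to interpret on its own. A quick check with the standard library, using the values logged above, suggests that it is simply the mean of the ten logged entries and is dominated by the spikes:

```python
from statistics import mean, median

# Training loss per logging step, copied from the output above.
logged = {10: 0.3477, 20: 2.3465, 30: 419.1957, 40: 2.0589, 50: 212.9809,
          60: 1.532, 70: 132.3012, 80: 1.3054, 90: 63.4025, 100: 1.2406}

avg = mean(logged.values())
mid = median(logged.values())
# Flag entries far above the typical value (the factor of 10 is an arbitrary choice).
spikes = [step for step, loss in logged.items() if loss > 10 * mid]

print(round(avg, 4))   # 83.6711
print(round(mid, 4))   # 2.2027
print(spikes)          # [30, 50, 70, 90]
```

The median of about 2.2 is closer to the typical logged value. The spikes at steps 30, 50, 70 and 90 point to unstable optimization and would be worth investigating before relying on this run.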

In [15]:
model_to_save = trainer.model.module if hasattr(trainer.model, 'module') else trainer.model  # Take care of distributed/parallel training
model_to_save.save_pretrained("outputs")

In [16]:
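# get_peft_model uses only the saved config; in a fresh session the trained adapter weights
# are usually restored with PeftModel.from_pretrained(base_model, "outputs") instead.
lora_config = LoraConfig.from_pretrained('outputs')
model = get_peft_model(model, lora_config)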

In [22]:
text = "Explain different ways to construct roads.### Assistant:"
device = "cuda:0"

inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=400)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Explain different ways to construct roads.### Assistant:
Explain different ways to construct roads.
Assistant: Explain different ways to construct roads.
Road construction is the process of an improving of roads, which includes laying asphalt, concrete, or gravel on the ground.
Road construction is the process of an improving of roads, which includes laying asphalt, concrete, or gravel on the ground. Road construction is important because it improves the safety of the road and makes it easier to drive on.
There are many different ways to construct roads. The most common way is to use asphalt or concrete. Asphalt is a black, sticky substance that is used to make roads. Concrete is a white, hard substance that is used to make roads.
Another way to construct roads is to use gravel. Gravel is a small, round stone that is used to make roads. Gravel is cheaper than asphalt or concrete, but it is not as durable.
The most important thing to remember when constructing roads is to make sure that

### Output Generated by fine-tuned model

Road construction is the process of an improving of roads, which includes laying asphalt, concrete, or gravel on the ground. Road construction is important because it improves the safety of the road and makes it easier to drive on.

There are many different ways to construct roads. The most common way is to use asphalt or concrete.

- Asphalt is a black, sticky substance that is used to make roads. Concrete is a white, hard substance that is used to make roads.

- Another way to construct roads is to use gravel. Gravel is a small, round stone that is used to make roads. Gravel is cheaper than asphalt or concrete, but it is not as durable.
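
`model.generate` returns the prompt followed by the continuation, which is why the raw output above starts by repeating the question. A minimal sketch of a helper that separates the two (the name `extract_answer` is hypothetical, and this is not necessarily how the cleaned-up text above was produced):

```python
def extract_answer(decoded: str, prompt: str) -> str:
    """Return the generated text with the echoed prompt removed."""
    if decoded.startswith(prompt):
        decoded = decoded[len(prompt):]
    return decoded.strip()

prompt = "Explain different ways to construct roads.### Assistant:"
# Stand-in for tokenizer.decode(outputs[0], skip_special_tokens=True).
decoded = prompt + "\nThere are many different ways to construct roads."

print(extract_answer(decoded, prompt))
```

If the decoded text does not begin with the prompt, the helper returns it unchanged apart from trimming whitespace.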

In [23]:
from huggingface_hub import login
login()


In [24]:
model.push_to_hub("llama2-qlora-finetunined-Material-recommendation-and-optimization-for-road-construction")

adapter_model.bin:   0%|          | 0.00/134M [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/Aditya8005/llama2-qlora-finetunined-Material-recommendation-and-optimization-for-road-construction/commit/61677873ad024c5f19ae6f95a4f0075a494139b0', commit_message='Upload model', commit_description='', oid='61677873ad024c5f19ae6f95a4f0075a494139b0', pr_url=None, pr_revision=None, pr_num=None)