### Mount Google Drive
This cell mounts your Google Drive to the Colab environment, allowing you to access and save files directly to your Drive.

In [2]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


### Navigate to Project Folder
Switches the active directory to the `llama2` folder within Google Drive for centralized file management.


In [3]:
%cd /content/drive/MyDrive/llama2

/content/drive/MyDrive/llama2


### Maintain Colab Connection
Executes a JavaScript snippet to periodically click the "Connect" button, ensuring that the Colab session remains active during long-running tasks.


In [4]:
import IPython
from google.colab import output

display(IPython.display.Javascript('''
 function ClickConnect(){
   btn = document.querySelector("colab-connect-button")
   if (btn != null){
     console.log("Click colab-connect-button");
     btn.click()
     }

   btn = document.getElementById('ok')
   if (btn != null){
     console.log("Click reconnect");
     btn.click()
     }
  }

setInterval(ClickConnect,60000)
'''))

print("Done.")

<IPython.core.display.Javascript object>

Done.


### GPU Information
Displays the specifications of the allocated GPU, in this case, a Tesla T4. Includes details like memory usage and temperature, aiding in resource monitoring.


In [5]:
!nvidia-smi

Mon Oct  9 05:50:20 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
| N/A   53C    P8    10W /  70W |      0MiB / 15360MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

# Fine-tune Llama-2-7b on Google Colab
Welcome to this Google Colab notebook, which shows how to fine-tune the recent Llama-2-7b model on a single Colab GPU and turn it into a chatbot.

We will use the PEFT library from the Hugging Face ecosystem, together with QLoRA, for more memory-efficient fine-tuning.

# Setup
Run the cells below to set up and install the required libraries. For our experiment we will need `accelerate`, `peft`, `transformers`, `datasets` and `trl` to leverage the recent `SFTTrainer`. We will use `bitsandbytes` to [quantize the base model into 4bit](https://huggingface.co/blog/4bit-transformers-bitsandbytes). We will also install `einops`, which is a requirement for loading Falcon models, and `wandb`, which the trainer uses for logging.

In [6]:
!pip install -q -U trl transformers accelerate git+https://github.com/huggingface/peft.git
!pip install -q datasets bitsandbytes einops wandb

  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done

# Loading the Original Model

### Model Specification
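Defines the model to be fine-tuned as `TinyPixel/Llama-2-7B-bf16-sharded`, a 7-billion-parameter variant of Llama 2 whose weights are stored in bfloat16 (bf16) and split into small shards. bf16 is a 16-bit floating-point format rather than a quantization scheme; sharding keeps each checkpoint file under about 1 GB, which helps when loading in a memory-constrained environment such as Colab.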

In [7]:
model_name = "TinyPixel/Llama-2-7B-bf16-sharded"

### Model Initialization and Configuration
Imports the required modules and sets up a `BitsAndBytesConfig` for 4-bit quantization with the NF4 data type and float16 compute. Quantization reduces the number of bits used to store each weight, which cuts memory use at the cost of some precision. This is what lets a model as large as Llama 2 7B fit within the resource constraints of Google Colab. The cell then loads the specified model with these settings and sets `use_cache = False`, which turns off the key/value cache used to speed up generation, since it is not needed during training.
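
To see why 4-bit loading matters on a T4 with 15360 MiB of memory, the weight memory can be estimated from the parameter count. The sketch below is a back-of-the-envelope calculation, not a measurement: it assumes a nominal 7 billion parameters and ignores quantization constants, activations, optimizer state, and any layers kept in higher precision. The helper name `weight_memory_gib` is made up for illustration.

```python
def weight_memory_gib(num_params: int, bits_per_param: int) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


# Assumption: nominal "7B" parameter count, not the exact model size.
NOMINAL_PARAMS = 7_000_000_000

fp16_gib = weight_memory_gib(NOMINAL_PARAMS, 16)
nf4_gib = weight_memory_gib(NOMINAL_PARAMS, 4)

print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f" 4-bit weights: {nf4_gib:.1f} GiB")
```

Under these assumptions the 16-bit weights alone come to roughly 13 GiB, which leaves little headroom on the T4, while 4-bit weights need about a quarter of that.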


In [8]:
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True
)
model.config.use_cache = False

Downloading (…)lve/main/config.json:   0%|          | 0.00/626 [00:00<?, ?B/s]

Downloading (…)model.bin.index.json:   0%|          | 0.00/26.8k [00:00<?, ?B/s]

Downloading shards:   0%|          | 0/14 [00:00<?, ?it/s]

Downloading (…)l-00001-of-00014.bin:   0%|          | 0.00/981M [00:00<?, ?B/s]

Downloading (…)l-00002-of-00014.bin:   0%|          | 0.00/967M [00:00<?, ?B/s]

Downloading (…)l-00003-of-00014.bin:   0%|          | 0.00/967M [00:00<?, ?B/s]

Downloading (…)l-00004-of-00014.bin:   0%|          | 0.00/990M [00:00<?, ?B/s]

Downloading (…)l-00005-of-00014.bin:   0%|          | 0.00/944M [00:00<?, ?B/s]

Downloading (…)l-00006-of-00014.bin:   0%|          | 0.00/990M [00:00<?, ?B/s]

Downloading (…)l-00007-of-00014.bin:   0%|          | 0.00/967M [00:00<?, ?B/s]

Downloading (…)l-00008-of-00014.bin:   0%|          | 0.00/967M [00:00<?, ?B/s]

Downloading (…)l-00009-of-00014.bin:   0%|          | 0.00/990M [00:00<?, ?B/s]

Downloading (…)l-00010-of-00014.bin:   0%|          | 0.00/944M [00:00<?, ?B/s]

Downloading (…)l-00011-of-00014.bin:   0%|          | 0.00/990M [00:00<?, ?B/s]

Downloading (…)l-00012-of-00014.bin:   0%|          | 0.00/967M [00:00<?, ?B/s]

Downloading (…)l-00013-of-00014.bin:   0%|          | 0.00/967M [00:00<?, ?B/s]

Downloading (…)l-00014-of-00014.bin:   0%|          | 0.00/847M [00:00<?, ?B/s]

Loading checkpoint shards:   0%|          | 0/14 [00:00<?, ?it/s]

Downloading (…)neration_config.json:   0%|          | 0.00/132 [00:00<?, ?B/s]

In [9]:
text = """
Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Analyze and explain the legal reasoning behind the judgment in the given case.

### Input:
Central Inland Water Transport Corporation Ltd. vs Brojo Nath Ganguly & Anr., 1986 AIR 1571, 1986 SCR (2) 278

### Response:"""
print(text)


Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Analyze and explain the legal reasoning behind the judgment in the given case.

### Input:
Central Inland Water Transport Corporation Ltd. vs Brojo Nath Ganguly & Anr., 1986 AIR 1571, 1986 SCR (2) 278

### Response:


In [10]:
Correct_response = """

### Response:The Supreme Court in this case applied a broad interpretation of the term 'State' under Article 12 of the Constitution.
The court reasoned that a government company undertaking public functions qualifies as 'State' based on factors like government control, public importance of activities etc.
This interpretation was based on previous decisions that have defined 'State' under Article 12 broadly to include various agencies and instrumentalities beyond just statutory bodies.
The court also applied the principle that unreasonable and arbitrary contractual terms can be struck down under Article 14 of the Constitution.
The court found that Rule 9(i) of the service rules, which allowed for termination of service without reason, conferred unfettered power to terminate employment without hearing.
This was deemed arbitrary and violative of principles of natural justice and right to equality under Article 14.
Furthermore, the court held that the right to life and livelihood under Article 21 is affected by arbitrary termination of employment. The court reasoned that the right to livelihood is an integral part of the right to life, and any arbitrary action that affects a person's livelihood would be a violation of Article 21. In conclusion, the court's legal reasoning was based on a broad interpretation of the term 'State', the application of the principle of equality and natural justice under Article 14, and the protection of the right to life and livelihood under Article 21."""

### Tokenization
The code initializes the tokenizer, which converts human-readable text into the numerical token IDs that the model processes. Tokenization is needed for both training and generation. Padding is set to align sequences using the end-of-sequence (EOS) token as the padding token.
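
As a rough illustration of what reusing the EOS token for padding means (the notebook does this with `tokenizer.pad_token = tokenizer.eos_token`), the sketch below right-pads a batch of token ID lists to a common length and builds the matching attention masks. The token IDs, the `EOS_ID` value, and the `pad_batch` helper are all made up for the example; the real tokenizer handles this internally.

```python
from typing import List, Tuple

EOS_ID = 2  # assumption: illustrative end-of-sequence id, reused as the pad id


def pad_batch(
    sequences: List[List[int]], pad_id: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad every sequence to the longest length and build attention masks."""
    max_len = max(len(seq) for seq in sequences)
    padded, masks = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        padded.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        masks.append([1] * len(seq) + [0] * n_pad)
    return padded, masks


batch = [[1, 450, 8973], [1, 319]]  # made-up token ids
padded, masks = pad_batch(batch, pad_id=EOS_ID)
print(padded)
print(masks)
```

The attention mask is what lets the model tell padding apart from a genuine end-of-sequence token, even though both share the same ID.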


In [11]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

Downloading (…)okenizer_config.json:   0%|          | 0.00/676 [00:00<?, ?B/s]

Downloading tokenizer.model:   0%|          | 0.00/500k [00:00<?, ?B/s]

Downloading (…)/main/tokenizer.json:   0%|          | 0.00/1.84M [00:00<?, ?B/s]

Downloading (…)cial_tokens_map.json:   0%|          | 0.00/411 [00:00<?, ?B/s]

### Setting Compute Device
Specifies the first CUDA-enabled GPU (`cuda:0`) as the compute device; the tokenized inputs are moved to it before generation.


In [12]:
device = "cuda:0"

### Generating Model Output
Tokenizes the input text and feeds it into the model to generate a response. The generated output is then decoded to produce human-readable text, serving as a quick test of the model's capabilities.


In [13]:
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))




Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Analyze and explain the legal reasoning behind the judgment in the given case.

### Input:
Central Inland Water Transport Corporation Ltd. vs Brojo Nath Ganguly & Anr., 1986 AIR 1571, 1986 SCR (2) 278

### Response:
The case of Central Inland Water Transport Corporation Ltd. vs Brojo Nath Ganguly & Anr., 1986 AIR 1571, 1986 SCR (2) 278, is a landmark case in the field of contract law in India. The case involved a dispute between the Central Inland Water Transport Corporation Ltd. (CIWT) and Brojo Nath Ganguly & Anr. (BNG) over the payment of freight charges for the transportation of goods by water.

The CIWT had contracted with BNG to transport goods from Calcutta to Dhaka, Bangladesh. The contract provided that the freight charges would be paid in Indian rupees, and that the payment would be made within 30 days

# Dataset

### Hugging Face Login
Authenticates the user with Hugging Face, enabling access to private datasets or models stored on the Hugging Face Hub.


In [14]:
from huggingface_hub import login
login()

### Dataset Loading
Loads the specified dataset, which contains articles and instructions related to the Indian Constitution. This dataset is used for training the model to be capable of providing legal reasoning and responses based on the Indian Constitution.


In [15]:
from datasets import load_dataset

dataset_name = 'nisaar/Articles_Constitution_3300_Instruction_Set'
dataset = load_dataset(dataset_name, split="train")

Downloading readme:   0%|          | 0.00/1.54k [00:00<?, ?B/s]

Downloading data files:   0%|          | 0/1 [00:00<?, ?it/s]

Downloading data:   0%|          | 0.00/8.01M [00:00<?, ?B/s]

Extracting data files:   0%|          | 0/1 [00:00<?, ?it/s]

Generating train split: 0 examples [00:00, ? examples/s]

In [16]:
dataset[1]

{'instruction': 'Identify and summarize the key legal issues in the provided case.',
 'input': 'Case Citation: Central Inland Water Transport Corporation Ltd. vs Brojo Nath Ganguly & Anr., 1986 AIR 1571, 1986 SCR (2) 278. The case revolves around the termination of employment of the respondents by the appellant without assigning any reason by invoking Rule 9(i) of the service rules. The respondents challenged the termination orders and the validity of Rule 9(i) in the High Court under Article 226. The Division Bench of the High Court struck down Rule 9(i) as arbitrary and violative of Article 14. The appellant filed appeals in the Supreme Court against the High Court judgment.',
 'output': "The key legal issues in the case Central Inland Water Transport Corporation Ltd. vs Brojo Nath Ganguly & Anr. are as follows: 1. The first issue is whether a government company falls under the definition of 'State' as per Article 12 of the Indian Constitution. This is important as the constitutional

### Dataset Overview
The dataset contains the features 'instruction', 'input', 'output', and 'prompt', and has a total of 3311 rows.


In [17]:
print(dataset)

Dataset({
    features: ['instruction', 'input', 'output', 'prompt'],
    num_rows: 3311
})


### Dataset Preprocessing
Transforms the dataset by combining the 'prompt' and 'output' fields into a single 'text' field. This preprocessing step simplifies the dataset structure for easier handling during model training.
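
The merge can be pictured on plain Python dictionaries. This is only a sketch with a made-up row; the `add_text` helper is hypothetical and stands in for the function passed to `dataset.map`, which likewise keeps the existing columns and adds the new one.

```python
# A made-up row with the two fields that get merged.
rows = [
    {
        "prompt": "### Instruction:\nSummarize the case.\n\n### Response:\n",
        "output": "The court held ...",
    },
]


def add_text(example: dict) -> dict:
    """Return the example with an extra 'text' field: prompt followed by output."""
    return {**example, "text": example["prompt"] + example["output"]}


processed = [add_text(row) for row in rows]
print(processed[0]["text"])
```

Each training example therefore becomes one continuous string that ends with the reference answer, which is the form a causal language model is trained on.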


In [18]:
# Build a single 'text' field by concatenating each example's prompt and output
dataset = dataset.map(lambda example: {'text': example['prompt'] + example['output']})

Map:   0%|          | 0/3311 [00:00<?, ? examples/s]

### Dataset Augmentation
The dataset now includes an additional 'text' feature, created by merging the 'prompt' and 'output' fields. This augmented dataset is geared for streamlined model training.


In [19]:
dataset

Dataset({
    features: ['instruction', 'input', 'output', 'prompt', 'text'],
    num_rows: 3311
})

# Fine-Tuning the Model

### LoRA Configuration

Here, we set up the Low-Rank Adaptation (LoRA) parameters for fine-tuning. LoRA allows us to adapt the model for specific tasks efficiently.

- `lora_alpha`: Scaling factor for the low-rank update (16); the update is scaled by `lora_alpha / lora_r`
- `lora_dropout`: Dropout rate for LoRA layers (0.1)
- `lora_r`: Rank of the low-rank update matrices (64)
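
In LoRA, a frozen weight matrix of shape `d_out x d_in` is augmented with the product of two trainable factors of shapes `d_out x r` and `r x d_in`, and that product is scaled by `lora_alpha / r`. The sketch below works out what this means for the values used here. The width of 4096 is an assumption chosen for illustration, and `lora_trainable_params` is a hypothetical helper, not part of the PEFT API.

```python
def lora_trainable_params(d_out: int, d_in: int, r: int) -> int:
    """Parameters in the two low-rank factors B (d_out x r) and A (r x d_in)."""
    return r * (d_out + d_in)


lora_alpha = 16
lora_r = 64
hidden = 4096  # assumption: a square projection matrix of width 4096

full_params = hidden * hidden
lora_params = lora_trainable_params(hidden, hidden, lora_r)
scaling = lora_alpha / lora_r

print(f"full matrix:   {full_params:,} parameters")
print(f"LoRA factors:  {lora_params:,} parameters")
print(f"trainable share: {lora_params / full_params:.2%}")
print(f"update scaling:  {scaling}")
```

For a matrix of this size the two factors hold about 3% as many parameters as the full matrix, and the learned update is multiplied by 0.25 before being added to the frozen weights.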


In [21]:
from peft import LoraConfig, get_peft_model

lora_alpha = 16
lora_dropout = 0.1
lora_r = 64

### Configure PEFT with LoRA
Here we set up the Parameter-Efficient Fine-Tuning (PEFT) configuration using Low-Rank Adaptation (LoRA). This allows us to fine-tune the model efficiently by adapting only a small number of parameters.


In [22]:
peft_config = LoraConfig(
    lora_alpha=lora_alpha,
    lora_dropout=lora_dropout,
    r=lora_r,
    bias="none",
    task_type="CAUSAL_LM"
)

### Training Configuration
Sets the hyperparameters for training: output directory, batch size, gradient accumulation, optimizer, learning rate, and scheduler. Because `max_steps = 20` is set, it takes precedence over `num_train_epochs`, so this run stops after 20 optimizer steps; `save_steps = 1` writes a checkpoint at every step.


In [23]:
from transformers import TrainingArguments

In [24]:
output_dir = "./results"
per_device_train_batch_size = 1
gradient_accumulation_steps = 2
optim = "paged_adamw_32bit"
save_steps = 1
num_train_epochs = 4
logging_steps = 1
learning_rate = 2e-4
max_grad_norm = 0.3
max_steps = 20
warmup_ratio = 0.03
lr_scheduler_type = "linear"
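
A few quantities follow directly from these values. The sketch below is plain arithmetic under stated assumptions: a single GPU, a warmup length of `ceil(max_steps * warmup_ratio)`, and a linear decay to zero after warmup. The `linear_lr` function is an illustrative approximation of a linear schedule with warmup, not the library's scheduler, and `num_rows` is the dataset size printed earlier in the notebook.

```python
import math

per_device_train_batch_size = 1
gradient_accumulation_steps = 2
max_steps = 20
warmup_ratio = 0.03
learning_rate = 2e-4
num_rows = 3311  # dataset size printed by the notebook

# Assumption: one GPU, so the effective batch is batch size x accumulation steps.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps
examples_seen = effective_batch * max_steps
epoch_fraction = examples_seen / num_rows

# Assumption: warmup length is rounded up to a whole number of steps.
warmup_steps = math.ceil(max_steps * warmup_ratio)


def linear_lr(step: int) -> float:
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    remaining = (max_steps - step) / max(1, max_steps - warmup_steps)
    return learning_rate * max(0.0, remaining)


print(f"effective batch size: {effective_batch}")
print(f"examples seen: {examples_seen} ({epoch_fraction:.2%} of one epoch)")
print(f"warmup steps: {warmup_steps}")
print([round(linear_lr(s), 6) for s in (0, 1, 10, 20)])
```

With these settings the run covers only about 1% of the dataset, which agrees with the `'epoch': 0.01` value in the training output.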

### Initialize Training Arguments
Creates a `TrainingArguments` object to consolidate all the training configurations and hyperparameters. This object will be passed to the trainer for model fine-tuning.


In [25]:
training_arguments = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=per_device_train_batch_size,
    gradient_accumulation_steps=gradient_accumulation_steps,
    optim=optim,
    num_train_epochs=num_train_epochs,
    save_steps=save_steps,
    logging_steps=logging_steps,
    learning_rate=learning_rate,
    fp16=True,
    max_grad_norm=max_grad_norm,
    max_steps=max_steps,
    warmup_ratio=warmup_ratio,
    group_by_length=True,
    lr_scheduler_type=lr_scheduler_type,
)

In [26]:
from trl import SFTTrainer

### Maximum Sequence Length
Sets the maximum sequence length to 2048 tokens. The trainer uses this value as the upper bound on the tokenized length of each training example.


In [27]:
max_seq_length = 2048

### Trainer Initialization
Initializes the supervised fine-tuning trainer (`SFTTrainer`) to manage the fine-tuning process. This trainer takes in the model, dataset, and various configurations to streamline the training.


In [28]:
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    tokenizer=tokenizer,
    args=training_arguments,
)



Map:   0%|          | 0/3311 [00:00<?, ? examples/s]



### Adjusting Normalization Layers
Converts the data type of all normalization layers in the model to 32-bit floating point for better numerical stability.


In [29]:
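# Cast every module whose name contains "norm" to float32.
# nn.Module.to() converts the module in place, so no reassignment is needed.
for name, module in trainer.model.named_modules():
    if "norm" in name:
        module.to(torch.float32)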
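
One reason float32 helps here is that normalization layers reduce over many values, and float16 carries only about 11 bits of significand precision. The sketch below emulates float16 rounding with the standard library's half-precision `struct` format and accumulates a long sum. It is a toy illustration of the precision limit, not a reproduction of what the model's kernels do, and `to_fp16` is a hypothetical helper.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


n_terms = 4096

# Accumulate in emulated float16: every partial sum is rounded to half precision.
acc16 = 0.0
for _ in range(n_terms):
    acc16 = to_fp16(acc16 + 1.0)

# Accumulate in full precision for comparison.
acc_ref = 0.0
for _ in range(n_terms):
    acc_ref += 1.0

print(f"float16 accumulation:   {acc16}")
print(f"reference accumulation: {acc_ref}")
```

Once the running sum reaches 2048, adding 1 no longer changes the float16 value, so the half-precision total stalls at half the true result.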

### Model Training
Invokes the `train()` method on the `SFTTrainer` object to start fine-tuning with the configuration and dataset set up above. Because `wandb` is installed, the first run prompts for a Weights & Biases API key.


In [30]:
trainer.train()

<IPython.core.display.Javascript object>

wandb: Logging into wandb.ai. (Learn how to deploy a W&B server locally: https://wandb.me/wandb-server)
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter, or press ctrl+c to quit:

 ··········

wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc


You're using a LlamaTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.


| Step | Training Loss |
|------|---------------|
| 1 | 1.0806 |
| 2 | 1.0844 |
| 3 | 1.2259 |
| 4 | 1.2442 |
| 5 | 1.033 |
| 6 | 1.1301 |
| 7 | 1.2449 |
| 8 | 1.1173 |
| 9 | 1.0158 |
| 10 | 1.1708 |


TrainOutput(global_step=20, training_loss=1.1421822309494019, metrics={'train_runtime': 542.5547, 'train_samples_per_second': 0.074, 'train_steps_per_second': 0.037, 'total_flos': 1090687813607424.0, 'train_loss': 1.1421822309494019, 'epoch': 0.01})

### Saving the Trained Model
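Saves the fine-tuned model to the 'outputs' directory for future use or deployment. Because the trainer's model is a PEFT model, `save_pretrained` writes the LoRA adapter weights and configuration rather than a full copy of the base model.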


In [31]:
model_to_save = trainer.model.module if hasattr(trainer.model, 'module') else trainer.model  # Take care of distributed/parallel training
model_to_save.save_pretrained("outputs")

### Load LoRA Configuration and Update Model
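Reads the saved LoRA configuration from the 'outputs' directory and wraps the model with it. Note that `get_peft_model` creates adapters from the configuration alone and does not load the trained adapter weights; to restore those, load the adapter with `PeftModel.from_pretrained(base_model, 'outputs')`, where `base_model` stands for a freshly loaded copy of the base model.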


In [33]:
lora_config = LoraConfig.from_pretrained('outputs')
model = get_peft_model(model, lora_config)

### Push Model to Hugging Face Hub
Pushes the fine-tuned model to the Hugging Face Hub. The call is commented out below; uncomment it to publish.

In [None]:
#model.push_to_hub("mahmoud/Llama2_Finetuned_Articles_Constitution_3300_Instruction_Set",create_pr=1)

In [36]:
inputs = tokenizer(text, return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))




Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.

### Instruction:
Analyze and explain the legal reasoning behind the judgment in the given case.

### Input:
Central Inland Water Transport Corporation Ltd. vs Brojo Nath Ganguly & Anr., 1986 AIR 1571, 1986 SCR (2) 278

### Response:
The case of Central Inland Water Transport Corporation Ltd. vs Brojo Nath Ganguly & Anr., 1986 AIR 1571, 1986 SCR (2) 278, is a landmark case in the field of contract law in India. The case involved a dispute between the Central Inland Water Transport Corporation Ltd. (CIWT) and Brojo Nath Ganguly & Anr. (BNG) over the payment of freight charges for the transportation of goods by water.

The CIWT had contracted with BNG to transport goods from Calcutta to Dhaka, Bangladesh. The contract provided that the freight charges would be paid in Indian rupees, and that the payment would be made within 30 days