# Installation and model loading

In [1]:
%%capture
!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git

In [2]:
from google.colab import userdata
from huggingface_hub import login
login(token=userdata.get('HF_TOKEN1'))


In [3]:
from unsloth import FastLanguageModel


max_seq_length = 2048
dtype = None
load_in_4bit = True


model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length = max_seq_length,
    dtype = dtype,
    load_in_4bit = load_in_4bit,
    token = userdata.get('HF_TOKEN1'), 
)



🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
🦥 Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.3.18: Fast Llama patching. Transformers: 4.49.0.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/5.96G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/236 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/53.0k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/483 [00:00<?, ?B/s]

In [4]:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth", 
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)

Unsloth 2025.3.18 patched 32 layers with 32 QKV layers, 32 O layers and 32 MLP layers.
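The adapter size implied by `r=16` and the seven target modules can be checked by hand. The sketch below is a rough calculation, not something taken from the Unsloth source: the hidden size, MLP size and key/value width are assumed Llama-3.1-8B-style values that the notebook itself does not state, and `lora_param_count` is a hypothetical helper. Under those assumptions the total agrees with the trainable-parameter figure that the training log reports further down.

```python
# Assumed Llama-3.1-8B-style dimensions (not stated in the notebook).
HIDDEN = 4096          # model hidden size
INTERMEDIATE = 14336   # MLP intermediate size
KV_DIM = 1024          # 8 key/value heads x 128 head dim (grouped-query attention)
NUM_LAYERS = 32        # matches "patched 32 layers" in the log above
RANK = 16              # r=16 from get_peft_model

# (in_features, out_features) of each targeted projection
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}

def lora_param_count(projections, rank, num_layers):
    """Each adapter adds A (rank x in) and B (out x rank): rank * (in + out) weights."""
    per_layer = sum(rank * (fan_in + fan_out) for fan_in, fan_out in projections.values())
    return per_layer * num_layers

trainable = lora_param_count(PROJECTIONS, RANK, NUM_LAYERS)
print(f"{trainable:,}")  # 41,943,040
```

Because the count scales linearly with `rank`, doubling `r` to 32 would double the number of trainable weights under the same assumptions.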


# Loading & Formatting the Dataset

In [5]:
train_prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a data analysis expert with advanced knowledge in data processing, structuring, and extracting meaningful insights from CSV files.
Please analyze the given CSV file and provide a detailed response.

### Question:
{}

### Response:
<think>
{}
</think>
{}"""

In [6]:
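EOS_TOKEN = tokenizer.eos_token  # appended so the model learns where a response ends

def formatting_prompts_func(examples):
    # Batched formatter: each column arrives as a list of values
    questions = examples["Question"]
    cots = examples["Complex_CoT"]
    outputs = examples["Response"]
    texts = []
    for question, cot, output in zip(questions, cots, outputs):
        # Fill the three template slots, then terminate with EOS
        text = train_prompt_style.format(question, cot, output) + EOS_TOKEN
        texts.append(text)
    return {
        "text": texts,
    }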



In [7]:
import pandas as pd
from datasets import Dataset
from google.colab import files

uploaded = files.upload()

Saving LAW2.csv to LAW2.csv


In [10]:
import pandas as pd
from datasets import Dataset
from google.colab import files

# Upload the CSV file
uploaded = files.upload()

# Get the uploaded file name
csv_file_path = list(uploaded.keys())[0]

# Load the CSV file into a DataFrame
df = pd.read_csv(csv_file_path)

# Print the column names to verify them
print("Column Names:", df.columns)

# Ensure the correct column names are used
page_col = "Page Number" if "Page Number" in df.columns else df.columns[0]  # Use first column if unsure
text_col = "Text" if "Text" in df.columns else df.columns[1]  # Use second column if unsure

# Define the formatting function
def formatting_prompts_func(examples):
    formatted_examples = {
        "Question": [f"What does the law on page {pn} state?" for pn in examples[page_col]],
        "Answer": examples[text_col],
    }
    return formatted_examples

# Convert the DataFrame to a Hugging Face Dataset
dataset = Dataset.from_pandas(df)

# Map the formatting function to the dataset
dataset = dataset.map(formatting_prompts_func, batched=True)

# Display the first few rows of the dataset
print(dataset)




Saving LAW2.csv to LAW2 (2).csv
Column Names: Index(['Question', 'Answer'], dtype='object')


Map:   0%|          | 0/261 [00:00<?, ? examples/s]

Dataset({
    features: ['Question', 'Answer'],
    num_rows: 261
})


In [11]:
# Print column names to debug the issue
print("Column Names:", df.columns)


Column Names: Index(['Question', 'Answer'], dtype='object')


In [12]:
print(dataset["Answer"][0])


From:
Published
Updated:Immigration Rules
Home Ofﬁce
( / g o v e r n m e n t / o r g a n i s a t i o n s / h o m e - o f ﬁ c e )
25 February 2016
12 March 2025 - See all updates
Immigration Rules part 1: leave to enter or
stay in the UK
General provisions regarding entry
clearance, leave to enter or remain in
the United Kingdom (paragraphs 7 to
39E).
Leave to enter the United Kingdom
7.A person who is neither a British citizen nor a
Commonwealth citizen with the right of abode nor a
person who is entitled to enter or remain in the
United Kingdom by virtue of section 3ZA of the
Immigration Act 1971 requires leave to enter the
United Kingdom.
8.Under Sections 3 and 4 of the Immigration Act
1971 an Immigration Ofﬁcer when admitting to the
United Kingdom a person subject to immigration
control under that Act may give leave to enter for a
limited period and, if he does, may impose all or
any of the following conditions:Back to contents15/03/2025, 15:26 Immigration Rules - Immigration Rules 

# Setting Training Arguments

In [13]:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

# Define a function to format the dataset for training
def formatting_func(examples):
    return {
        "input_ids": tokenizer(examples["Question"], truncation=True, padding="max_length", max_length=max_seq_length)["input_ids"],
        "labels": tokenizer(examples["Answer"], truncation=True, padding="max_length", max_length=max_seq_length)["input_ids"],
    }

# Now initialize the trainer with the formatting function
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="Answer",  
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    formatting_func=formatting_func,  
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=10,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        report_to="none",  
    ),
)


Unsloth: Tokenizing ["Answer"] (num_proc=2):   0%|          | 0/261 [00:00<?, ? examples/s]
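The progress line above reports tokenizing the `Answer` field only, which suggests that the questions may not reach the model during training. One possible alternative is to build a single `text` column that joins each question with its answer. The sketch below is a minimal illustration in plain Python: `QA_TEMPLATE`, `build_text_column` and the `"<EOS>"` placeholder are hypothetical names introduced here, and the sample row is a stand-in rather than data from the CSV.

```python
# Hypothetical template with two slots; it mirrors the notebook's prompt layout
# but omits the <think> section because the CSV has no chain-of-thought column.
QA_TEMPLATE = """### Question:
{}

### Response:
{}"""

def build_text_column(examples, eos_token):
    """Batched formatter: takes a dict of column lists, returns a "text" column."""
    texts = [
        QA_TEMPLATE.format(question, answer) + eos_token
        for question, answer in zip(examples["Question"], examples["Answer"])
    ]
    return {"text": texts}

# Tiny stand-in batch; the real one would come from dataset.map(..., batched=True).
batch = {
    "Question": ["What does the law on page 1 state?"],
    "Answer": ["A person who is not a British citizen requires leave to enter."],
}
formatted = build_text_column(batch, eos_token="<EOS>")  # "<EOS>" is a placeholder
print(formatted["text"][0])
```

In the notebook this could plausibly be applied with `dataset.map(lambda b: build_text_column(b, tokenizer.eos_token), batched=True)` and `dataset_text_field="text"`, although that combination was not run here.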

# Train

In [14]:
trainer_stats = trainer.train()

==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 261 | Num Epochs = 2 | Total steps = 60
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 41,943,040/8,000,000,000 (0.52% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
10,2.105
20,1.5019
30,1.2477
40,1.1115
50,0.8192
60,0.7408
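The batch and epoch figures in the training banner follow from the arguments passed to `TrainingArguments`. The short calculation below reproduces them from the logged values; rounding the number of passes up to a whole epoch is an assumption about how the banner arrives at "Num Epochs = 2".

```python
import math

num_examples = 261      # rows in the dataset
per_device_batch = 2    # per_device_train_batch_size
grad_accum = 4          # gradient_accumulation_steps
num_gpus = 1
max_steps = 60

# Examples consumed per optimizer step
effective_batch = per_device_batch * grad_accum * num_gpus
# Examples consumed over the whole run
examples_seen = effective_batch * max_steps
# Passes over the dataset, possibly fractional
passes = examples_seen / num_examples

print(effective_batch, examples_seen, round(passes, 2), math.ceil(passes))
# 8 480 1.84 2
```

With only 261 rows, each example is therefore seen a little under twice during the 60 steps.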


# Testing the model

In [15]:
prompt_style = """Below is an instruction that describes a task, paired with an input that provides further context.
Write a response that appropriately completes the request.
Before answering, think carefully about the question and create a step-by-step chain of thoughts to ensure a logical and accurate response.

### Instruction:
You are a legal expert with advanced knowledge in legal reasoning, document analysis, and structured data interpretation.
Please analyze the following legal text and provide an accurate response.

### Question:
{}

### Response:
<think>{}
"""

In [17]:
from unsloth import FastLanguageModel
import torch

# Example legal question extracted from CSV
question = "What topic does the third page cover?"
# Format the input using the CSV-adapted prompt
formatted_input = prompt_style.format(question, "")

# Enable fast inference with Unsloth
FastLanguageModel.for_inference(model)  # Unsloth enables 2x faster inference!

# Tokenize input and send it to GPU
inputs = tokenizer([formatted_input], return_tensors="pt").to("cuda")

# Generate output from model
outputs = model.generate(
    input_ids=inputs.input_ids,
    attention_mask=inputs.attention_mask,
    max_new_tokens=1200,
    use_cache=True,
)

# Decode and extract the response
response = tokenizer.batch_decode(outputs)
print(response[0].split("### Response:")[1])





<think>
Alright, so I'm trying to figure out what the third page of the document is about. The user mentioned that the document is a legal text, so I should approach this with a legal perspective. Let me break it down step by step.

First, I need to understand the structure of the document. It's divided into three pages. The first page talks about the fundamental principles of EU competition law, which includes the prohibition of abuse of dominance, cartels, and anti-competitive agreements, as well as the relevant EU regulations like Articles 3 and 101 of the Treaty on the Functioning of the European Union (TFEU). It also mentions the powers of competition authorities to impose fines and other penalties.

The second page discusses the concept of dominance in the context of abuse of dominance. It covers market definition, market shares, and how dominance is assessed, including examples of abuse such as price manipulation, refusal to supply, and tying arrangements.

The third page is wh
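The printed response mixes the model's `<think>` reasoning with whatever final answer follows it. A small helper can separate the two. This is a sketch only: `split_reasoning` is a hypothetical function, and it assumes the decoded text uses the `### Response:` marker and `<think>` tags exactly as in the prompt template above.

```python
def split_reasoning(decoded, marker="### Response:"):
    """Return (reasoning, answer) from a decoded generation.

    If the closing </think> tag is missing (for example, generation stopped at
    max_new_tokens), everything after <think> is treated as reasoning.
    """
    # Keep only the text after the first response marker
    _, _, response = decoded.partition(marker)
    # Drop the opening tag that the prompt template inserts
    response = response.replace("<think>", "", 1)
    reasoning, closed, answer = response.partition("</think>")
    if not closed:
        return reasoning.strip(), ""
    return reasoning.strip(), answer.strip()

complete = "### Question:\nQ\n\n### Response:\n<think>step one</think>\nFinal answer."
truncated = "### Question:\nQ\n\n### Response:\n<think>still thinking"

print(split_reasoning(complete))   # ('step one', 'Final answer.')
print(split_reasoning(truncated))  # ('still thinking', '')
```

An empty answer is a quick signal that the generation ran out of tokens before the reasoning block closed.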

# Saving to float16 for vLLM

Unsloth can also save the model directly in `float16`. Select `merged_16bit` for float16 or `merged_4bit` for int4; `lora` saves only the adapters as a fallback. Use `push_to_hub_merged` to upload to your Hugging Face account. Personal access tokens are available at https://huggingface.co/settings/tokens.

In [None]:
# Merge to 16bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_16bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_16bit", token = "")

# Merge to 4bit
if False: model.save_pretrained_merged("model", tokenizer, save_method = "merged_4bit",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "merged_4bit", token = "")

# Just LoRA adapters
if False: model.save_pretrained_merged("model", tokenizer, save_method = "lora",)
if False: model.push_to_hub_merged("hf/model", tokenizer, save_method = "lora", token = "")