# Fine-tuning Llama 3 with Custom CSV Data

This notebook demonstrates how to fine-tune Llama 3 models using custom CSV data. We'll cover:
1. Setting up the environment
2. Loading and formatting our CSV data
3. Creating a properly formatted dataset for Llama 3
4. Fine-tuning with LoRA (Low-Rank Adaptation)
5. Testing the fine-tuned model

This approach is simpler than the standard Unsloth tutorial, focusing specifically on custom CSV data processing.

## 1. Setting Up the Environment

First, let's install the necessary libraries. We're using Unsloth which provides optimized training for Llama models.

In [1]:
!pip install --no-deps bitsandbytes accelerate xformers==0.0.29.post3 peft trl triton cut_cross_entropy unsloth_zoo
!pip install sentencepiece protobuf datasets huggingface_hub hf_transfer
!pip install --no-deps unsloth

Collecting bitsandbytes
  Downloading bitsandbytes-0.45.5-py3-none-manylinux_2_24_x86_64.whl.metadata (5.0 kB)
Collecting xformers==0.0.29.post3
  Downloading xformers-0.0.29.post3-cp311-cp311-manylinux_2_28_x86_64.whl.metadata (1.0 kB)
Collecting trl
  Downloading trl-0.16.1-py3-none-any.whl.metadata (12 kB)
Collecting cut_cross_entropy
  Downloading cut_cross_entropy-25.1.1-py3-none-any.whl.metadata (9.3 kB)
Collecting unsloth_zoo
  Downloading unsloth_zoo-2025.3.17-py3-none-any.whl.metadata (8.0 kB)
Downloading xformers-0.0.29.post3-cp311-cp311-manylinux_2_28_x86_64.whl (43.4 MB)
[2K   [90m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ[0m [32m43.4/43.4 MB[0m [31m19.9 MB/s[0m eta [36m0:00:00[0m
[?25hDownloading bitsandbytes-0.45.5-py3-none-manylinux_2_24_x86_64.whl (76.1 MB)
[2K   [90m‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚îÅ‚î

## 2. Load and Examine the CSV Data

Now let's load our CSV file which contains prompts and their rewritten versions. We'll examine the structure to understand how to format it for Llama 3.

In [2]:
import pandas as pd

# Load the CSV file
df = pd.read_csv('data.csv')

# Display basic information about the dataset
print(f"Dataset has {len(df)} rows")
print("\nFirst few rows:")
display(df.head())

Dataset has 83 rows

First few rows:


Unnamed: 0,prompt,gemini_rewrite
0,"Using WebPilot, create an outline for an artic...","Craft a detailed article outline (for a 2,000-..."
1,"I want you to act as an English translator, sp...","I want you to act as an English translator, sp..."
2,I want you to act as an interviewer. I will be...,Assume the role of an interviewer. I will be t...
3,I want you to act as a javascript console. I w...,Simulate a JavaScript console. Respond to my c...
4,I want you to act as a text based excel. you'l...,I want you to act as a text-based Excel. You'l...


## 3. Load the Llama 3 Model

We'll load the base Llama 3 model. For this tutorial, we're using the smaller 1B parameter model which is faster to fine-tune.

In [3]:
import torch
from unsloth import FastLanguageModel

# Model configuration
max_seq_length = 8192
load_in_4bit = True  # Use 4-bit quantization to reduce memory usage

# Load the model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-1B-Instruct",  # Using the smaller model for speed
    max_seq_length = max_seq_length,
    load_in_4bit = load_in_4bit,
    # token = "hf_...",  # Uncomment if using gated models
)

print("Model loaded successfully!")

ü¶• Unsloth: Will patch your computer to enable 2x faster free finetuning.
ü¶• Unsloth Zoo will now patch everything to make training faster!
==((====))==  Unsloth 2025.3.19: Fast Llama patching. Transformers: 4.50.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!


model.safetensors:   0%|          | 0.00/1.10G [00:00<?, ?B/s]

generation_config.json:   0%|          | 0.00/234 [00:00<?, ?B/s]

tokenizer_config.json:   0%|          | 0.00/54.7k [00:00<?, ?B/s]

tokenizer.json:   0%|          | 0.00/17.2M [00:00<?, ?B/s]

special_tokens_map.json:   0%|          | 0.00/454 [00:00<?, ?B/s]

Model loaded successfully!


## 4. Add LoRA Adapters

We'll use LoRA (Low-Rank Adaptation) to efficiently fine-tune the model. This allows us to only update a small percentage of parameters, making fine-tuning faster and more memory-efficient.

In [4]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,  # Rank of the adaptation matrices
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",  # Uses less VRAM
    random_state = 3407,
)

print("LoRA adapters added to the model.")

Unsloth 2025.3.19 patched 16 layers with 16 QKV layers, 16 O layers and 16 MLP layers.


LoRA adapters added to the model.


## 5. Format Data for Llama 3

Now we'll format our CSV data to match the Llama 3 chat template. Instead of using the built-in template functions, we'll create our own formatter that directly applies the Llama 3 formatting.

In [5]:
from datasets import Dataset

def format_as_llama3(row):
    """
    Format a row from our CSV into the Llama 3 chat format:
    <|begin_of_text|><|start_header_id|>system<|end_header_id|>

    System instruction

    <|eot_id|><|start_header_id|>user<|end_header_id|>

    User prompt

    <|eot_id|><|start_header_id|>assistant<|end_header_id|>

    Assistant response<|eot_id|>
    """
    # System message explaining the task
    system_msg = "Your task is to rewrite AI-generated prompts to make them more human-like."

    # User message is the AI-generated text
    user_msg = row['gemini_rewrite']

    # Assistant message is the human-written text
    assistant_msg = row['prompt']

    # Format in Llama 3 template
    formatted_text = f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_msg}\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{user_msg}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n{assistant_msg}<|eot_id|>"

    return formatted_text

# Apply our formatter to each row in the dataframe
formatted_texts = df.apply(format_as_llama3, axis=1).tolist()

# Create a HuggingFace dataset
dataset = Dataset.from_dict({"text": formatted_texts})

print(f"Dataset created with {len(dataset)} examples")
print("\nExample of formatted text:")
print(dataset[0]['text'])

Dataset created with 83 examples

Example of formatted text:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Your task is to rewrite AI-generated prompts to make them more human-like.

<|eot_id|><|start_header_id|>user<|end_header_id|>

Craft a detailed article outline (for a 2,000-word article) on the keyword 'Best SEO prompts', leveraging the top 10 Google results via WebPilot. Ensure the outline is comprehensive enough to facilitate the creation of the full 2,000-word article.

**Part 1 & Part 2 Outline Structure:**

*   Include every relevant heading, maintaining a high keyword density within the headings.
*   Specify the word count for each section of the outline.
*   Incorporate a FAQs section, drawing from Google's "People also ask" section for the keyword 'Best SEO prompts'.

**Keyword & Related Term Generation:**

*   Compile an extensive list of LSI and NLP keywords associated with 'Best SEO prompts'.
*   List any other words related to the keyword.

**External L

## 6. Configure Training

Now we'll set up the training configuration using TRL's SFTTrainer, which makes it easy to fine-tune language models.

In [6]:
from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported

# Training arguments
training_args = TrainingArguments(
    output_dir="./llama3_finetuned",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    warmup_steps=5,
    num_train_epochs = 2, # Set this for 1 full training run.
    # max_steps=60,
    learning_rate= 2e-4,
    fp16=not is_bfloat16_supported(),
    bf16=is_bfloat16_supported(),
    logging_steps=1,
    optim="adamw_8bit",
    weight_decay=0.01,
    lr_scheduler_type="linear",
    seed=3407,
    report_to="none",  # Set to "wandb" if you want to use Weights & Biases
)

# Initialize the trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer),
    packing=False,  # Set to True for shorter sequences to speed up training
    args=training_args,
)

print("Training configuration complete!")

Unsloth: Tokenizing ["text"] (num_proc=2):   0%|          | 0/83 [00:00<?, ? examples/s]

Training configuration complete!


## 7. Train the Model

Now let's train our model! This will fine-tune the Llama 3 model on our custom data.

In [7]:
# Optional: setup to monitor GPU usage
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

# Start training
print("Starting training...")
trainer_stats = trainer.train()

# Print training stats
print(f"\nTraining complete in {trainer_stats.metrics['train_runtime']} seconds")
print(f"({round(trainer_stats.metrics['train_runtime']/60, 2)} minutes)")

GPU = Tesla T4. Max memory = 14.741 GB.
1.088 GB of memory reserved.
Starting training...


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 83 | Num Epochs = 2 | Total steps = 20
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 11,272,192/1,000,000,000 (1.13% trained)


Unsloth: Will smartly offload gradients to save VRAM!


Step,Training Loss
1,3.2991
2,2.8411
3,2.6767
4,3.0544
5,2.6064
6,3.0491
7,2.558
8,2.4476
9,1.9939
10,2.196



Training complete in 41.2026 seconds
(0.69 minutes)


## 8. Save the Model

Now we'll save our fine-tuned model for future use.

In [8]:
# Save the model locally
output_dir = "llama3_humanizing_finetuned"
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)
print(f"Model saved to {output_dir}")

# Uncomment to save to Hugging Face Hub
# model.push_to_hub("your-username/llama3-humanizing", token="your_hf_token")
# tokenizer.push_to_hub("your-username/llama3-humanizing", token="your_hf_token")

Model saved to llama3_humanizing_finetuned


## 9. Test the Fine-tuned Model

Finally, let's test our model to see how well it performs!

In [9]:
model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "llama3_humanizing_finetuned", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        load_in_4bit = load_in_4bit
    )

FastLanguageModel.for_inference(model) # Enable native 2x faster inference

llama3_chatformat = "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{}\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n{}<|eot_id|>"

inputs = tokenizer(
[
    llama3_chatformat.format(
        "Your task is to rewrite AI-generated prompts to make them more human-like.", # instruction
        "Assume you are a helpful assistant explaining a process. I am learning. Explain the basic steps for fine-tuning a language model like Llama 3, one step at a time. Wait for me to say 'Okay' before you tell me the next step. Start with the very first step.", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

==((====))==  Unsloth 2025.3.19: Fast Llama patching. Transformers: 4.50.3.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.6.0+cu124. CUDA: 7.5. CUDA Toolkit: 12.4. Triton: 3.2.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.29.post3. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
<|begin_of_text|><|begin_of_text|><|start_header_id|>system<|end_header_id|>

Your task is to rewrite AI-generated prompts to make them more human-like.

<|eot_id|><|start_header_id|>user<|end_header_id|>

Assume you are a helpful assistant explaining a process. I am learning. Explain the basic steps for fine-tuning a language model like Llama 3, one step at a time. Wait for me to say 'Okay' before you tell me the next step. Start with the very first step.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

<|eot_id|><|start_header_i

## 10. Conclusion

Congratulations! You've successfully:
1. Loaded and formatted custom CSV data for Llama 3 fine-tuning
2. Applied the correct Llama 3 chat template format
3. Fine-tuned a Llama 3 model with LoRA
4. Saved and tested your fine-tuned model

This notebook provides a simplified approach to fine-tuning Llama 3 models with custom data, focusing specifically on the humanizing prompts task.