<a href="https://colab.research.google.com/github/aviadarn/AiNotebooks/blob/main/full_train_humiazie_llm.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Fine-tuning Llama 3 with Custom CSV Data
This notebook demonstrates how to fine-tune Llama 3 models using custom CSV data. We'll cover:

1. Setting up the environment
2. Loading and examining the data
3. Creating a properly formatted dataset for Llama 3
4. Fine-tuning with LoRA (Low-Rank Adaptation)
5. Testing the fine-tuned model

This approach is simpler than the standard Unsloth tutorial, focusing specifically on custom CSV data processing.

# 1. Setting Up the Environment
First, let's install the necessary libraries. We're using Unsloth which provides optimized training for Llama models.

In [12]:
%%capture
import os, re
if "COLAB_" not in "".join(os.environ.keys()):
    !pip install unsloth
else:
    # Do this only in Colab notebooks! Otherwise use pip install unsloth
    import torch; v = re.match(r"[0-9]{1,}\.[0-9]{1,}", str(torch.__version__)).group(0)
    xformers = "xformers==" + ("0.0.33.post1" if v=="2.9" else "0.0.32.post2" if v=="2.8" else "0.0.29.post3")
    !pip install --no-deps bitsandbytes accelerate {xformers} peft trl triton cut_cross_entropy unsloth_zoo
    !pip install sentencepiece protobuf "datasets==4.3.0" "huggingface_hub>=0.34.0" hf_transfer
    !pip install --no-deps unsloth
!pip install transformers==4.56.2
!pip install --no-deps trl==0.22.2
!pip install pandas

# 2. Downloading a Dataset from Hugging Face

To download a dataset from Hugging Face, we'll use the `datasets` library. You'll need to have a Hugging Face account and generate an access token to download private datasets or datasets that require authentication.

1.  **Get your Hugging Face Token**: Go to [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens) and generate a new token with at least 'read' access.
2.  **Load the dataset**: Use the `load_dataset` function, passing your token.

In [13]:
# First, install the datasets library if you haven't already

# Import the necessary library
from datasets import load_dataset
from huggingface_hub import login
from google.colab import userdata

HUGGING_FACE_TOKEN =userdata.get('HUGGING_FACE_TOKEN')

try:
    print("Attempting to load a dataset from Hugging Face...")
    dataset_name = "dmitva/human_ai_generated_text"

    dataset = load_dataset(dataset_name, token=HUGGING_FACE_TOKEN if HUGGING_FACE_TOKEN != "hf_YOUR_ACTUAL_TOKEN_HERE" else None)

    print(f"Dataset '{dataset_name}' loaded successfully!")
    print(dataset)

    # Display the first few rows of the 'train' split if available
    if 'train' in dataset:
        print("\nFirst 5 rows of the 'train' split:")

        df_train = dataset['train'].to_pandas()
        print(df_train.head())

except Exception as e:
    print(f"Error loading dataset: {e}")
    print("Please ensure you have the correct dataset name and a valid Hugging Face token.")

Attempting to load a dataset from Hugging Face...


README.md: 0.00B [00:00, ?B/s]

model_training_dataset.csv:   0%|          | 0.00/3.93G [00:00<?, ?B/s]

Generating train split:   0%|          | 0/1000000 [00:00<?, ? examples/s]

Dataset 'dmitva/human_ai_generated_text' loaded successfully!
DatasetDict({
    train: Dataset({
        features: ['id', 'human_text', 'ai_text', 'instructions'],
        num_rows: 1000000
    })
})

First 5 rows of the 'train' split:
                                     id  \
0  cc902a20-27c4-4c18-8012-048a328206d1   
1  c4d2fbe3-e966-479d-89c4-62e1729b6255   
2  710f585e-5e98-42b8-81f6-265d7c934645   
3  e4db6c43-7b6b-4385-9b67-04652c71df0c   
4  7a48bcf1-cbb4-4f41-b99a-ea859c56afdf   

                                          human_text  \
0  Also they feel more comfortable at home. Some ...   
1  I can get another job to work on the weekends,...   
2  parents and school should agree on the desicio...   
3  Base in my experiences I'm growing, I try hard...   
4  Many people around the world have different ch...   

                                             ai_text  \
0  \n\nTherefore, when it comes to allowing stude...   
1  It is important to weigh both the potential co...   


In [14]:
df = df_train.sample(n=100, random_state=42)

#3. Load the Llama 3 Model
We'll load the base Llama 3 model. For this tutorial, we're using the smaller 1B parameter model which is faster to fine-tune.

In [15]:
import torch
from unsloth import FastLanguageModel

# Model configuration
max_seq_length = 8192
load_in_4bit = True  # Use 4-bit quantization to reduce memory usage

# Load the model and tokenizer
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name = "unsloth/Llama-3.2-1B-Instruct",  # Using the smaller model for speed
    max_seq_length = max_seq_length,
    load_in_4bit = load_in_4bit,
    # token = "hf_...",  # Uncomment if using gated models
)

print("Model loaded successfully!")

==((====))==  Unsloth 2025.12.5: Fast Llama patching. Transformers: 4.56.2.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.5.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.33.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
Model loaded successfully!


#4. Add LoRA Adapters
We'll use LoRA (Low-Rank Adaptation) to efficiently fine-tune the model. This allows us to only update a small percentage of parameters, making fine-tuning faster and more memory-efficient.

In [16]:
model = FastLanguageModel.get_peft_model(
    model,
    r = 16,  # Rank of the adaptation matrices
    target_modules = ["q_proj", "k_proj", "v_proj", "o_proj",
                      "gate_proj", "up_proj", "down_proj"],
    lora_alpha = 16,
    lora_dropout = 0,
    bias = "none",
    use_gradient_checkpointing = "unsloth",  # Uses less VRAM
    random_state = 3407,
)

print("LoRA adapters added to the model.")

LoRA adapters added to the model.


#5. Format Data for Llama 3
Now we'll format our CSV data to match the Llama 3 chat template. Instead of using the built-in template functions, we'll create our own formatter that directly applies the Llama 3 formatting.

In [17]:
from datasets import Dataset

def format_as_llama3(row):
    """
    Format a row from our CSV into the Llama 3 chat format:
    <|begin_of_text|><|start_header_id|>system<|end_header_id|>

    System instruction

    <|eot_id|><|start_header_id|>user<|end_header_id|>

    User prompt

    <|eot_id|><|start_header_id|>assistant<|end_header_id|>

    Assistant response<|eot_id|>
    """
    # System message explaining the task
    system_msg = "Your task is to rewrite AI-generated prompts to make them more human-like."

    # User message is the AI-generated text
    user_msg = row['ai_text']

    # Assistant message is the human-written text
    assistant_msg = row['human_text']

    # Format in Llama 3 template
    formatted_text = f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{system_msg}\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{user_msg}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n{assistant_msg}<|eot_id|>"

    return formatted_text

# Apply our formatter to each row in the dataframe
formatted_texts = df.apply(format_as_llama3, axis=1).tolist()

# Create a HuggingFace dataset
dataset = Dataset.from_dict({"text": formatted_texts})

print(f"Dataset created with {len(dataset)} examples")
print("\nExample of formatted text:")
print(dataset[0]['text'])

Dataset created with 100 examples

Example of formatted text:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

Your task is to rewrite AI-generated prompts to make them more human-like.

<|eot_id|><|start_header_id|>user<|end_header_id|>



Ultimately, the decision as to whether students should work in schools or their homes depends on many factors and must be made on an individual basis. 

When students work in a school setting, they benefit from frequent social interactions and the varied perspectives of their classmates. This can be particularly beneficial for students who are more independent and self-motivated and need less guidance. Social interactions, academic improvement, and peer support are the main choices that need to be weighed in this decision. They can utilize their differences to collaborate closely on projects, which can in turn lead to better academic performance. Students are able to work in a comfortable and distraction-free environment. 
When consideri

#6. Configure Training
Now we'll set up the training configuration using TRL's SFTTrainer, which makes it easy to fine-tune language models.

In [18]:
from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported

# Training arguments
training_args = TrainingArguments(
    output_dir="./llama3_finetuned",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,
    warmup_steps=5,
    num_train_epochs = 2, # Set this for 1 full training run.
    # max_steps=60,
    learning_rate= 2e-4,
    fp16=not is_bfloat16_supported(),
    bf16=is_bfloat16_supported(),
    logging_steps=1,
    optim="adamw_8bit",
    weight_decay=0.01,
    lr_scheduler_type="linear",
    seed=3407,
    report_to="none",  # Set to "wandb" if you want to use Weights & Biases
)

# Initialize the trainer
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer),
    packing=False,  # Set to True for shorter sequences to speed up training
    args=training_args,
)

print("Training configuration complete!")

Unsloth: Tokenizing ["text"] (num_proc=6):   0%|          | 0/100 [00:00<?, ? examples/s]

Training configuration complete!


#7. Train the Model
Now let's train our model! This will fine-tune the Llama 3 model on our custom data.



In [19]:
# Optional: setup to monitor GPU usage
gpu_stats = torch.cuda.get_device_properties(0)
start_gpu_memory = round(torch.cuda.max_memory_reserved() / 1024 / 1024 / 1024, 3)
max_memory = round(gpu_stats.total_memory / 1024 / 1024 / 1024, 3)
print(f"GPU = {gpu_stats.name}. Max memory = {max_memory} GB.")
print(f"{start_gpu_memory} GB of memory reserved.")

# Start training
print("Starting training...")
trainer_stats = trainer.train()

# Print training stats
print(f"\nTraining complete in {trainer_stats.metrics['train_runtime']} seconds")
print(f"({round(trainer_stats.metrics['train_runtime']/60, 2)} minutes)")

GPU = Tesla T4. Max memory = 14.741 GB.
3.562 GB of memory reserved.
Starting training...


==((====))==  Unsloth - 2x faster free finetuning | Num GPUs used = 1
   \\   /|    Num examples = 100 | Num Epochs = 2 | Total steps = 26
O^O/ \_/ \    Batch size per device = 2 | Gradient accumulation steps = 4
\        /    Data Parallel GPUs = 1 | Total batch size (2 x 4 x 1) = 8
 "-____-"     Trainable parameters = 11,272,192 of 1,247,086,592 (0.90% trained)


Step,Training Loss
1,3.0323
2,3.1075
3,3.0109
4,3.2563
5,3.095
6,3.0228
7,2.8896
8,2.8965
9,2.8641
10,2.8337



Training complete in 75.95 seconds
(1.27 minutes)


#8. Save the Model
Now we'll save our fine-tuned model for future use.

In [20]:
# Save the model locally
output_dir = "llama3_humanizing_finetuned"
model.save_pretrained(output_dir)
tokenizer.save_pretrained(output_dir)
print(f"Model saved to {output_dir}")

Model saved to llama3_humanizing_finetuned


# 9. Test the Fine-tuned Model
Finally, let's test our model to see how well it performs!

In [21]:
model, tokenizer = FastLanguageModel.from_pretrained(
        model_name = "llama3_humanizing_finetuned", # YOUR MODEL YOU USED FOR TRAINING
        max_seq_length = max_seq_length,
        load_in_4bit = load_in_4bit
    )

FastLanguageModel.for_inference(model) # Enable native 2x faster inference

llama3_chatformat = "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n{}\n\n<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n{}<|eot_id|>"

inputs = tokenizer(
[
    llama3_chatformat.format(
        "Your task is to rewrite AI-generated prompts to make them more human-like.", # instruction
        "Assume you are a helpful assistant explaining a process. I am learning. Explain the basic steps for fine-tuning a language model like Llama 3, one step at a time. Wait for me to say 'Okay' before you tell me the next step. Start with the very first step.", # input
        "", # output - leave this blank for generation!
    )
], return_tensors = "pt").to("cuda")

from transformers import TextStreamer
text_streamer = TextStreamer(tokenizer)
_ = model.generate(**inputs, streamer = text_streamer, max_new_tokens = 128)

==((====))==  Unsloth 2025.12.5: Fast Llama patching. Transformers: 4.56.2.
   \\   /|    Tesla T4. Num GPUs = 1. Max memory: 14.741 GB. Platform: Linux.
O^O/ \_/ \    Torch: 2.9.0+cu126. CUDA: 7.5. CUDA Toolkit: 12.6. Triton: 3.5.0
\        /    Bfloat16 = FALSE. FA [Xformers = 0.0.33.post1. FA2 = False]
 "-____-"     Free license: http://github.com/unslothai/unsloth
Unsloth: Fast downloading is enabled - ignore downloading bars which are red colored!
<|begin_of_text|><|begin_of_text|><|start_header_id|>system<|end_header_id|>

Your task is to rewrite AI-generated prompts to make them more human-like.

<|eot_id|><|start_header_id|>user<|end_header_id|>

Assume you are a helpful assistant explaining a process. I am learning. Explain the basic steps for fine-tuning a language model like Llama 3, one step at a time. Wait for me to say 'Okay' before you tell me the next step. Start with the very first step.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

<|eot_id|><|start_header_i