## Model Comparison: Base vs Fine-Tuned vs Customer-Specialized Llama-3.2-1B

This notebook evaluates and compares three versions of the Llama-3.2-1B-Instruct model:

1. **Base Model**: The original Llama-3.2-1B-Instruct model from Meta, providing a general-purpose language understanding and generation baseline.
2. **Fine-Tuned Model**: A version fine-tuned on the FineTome-100k dataset to improve instruction following, response coherence, and task-specific performance.
3. **Customer Support Fine-Tuned Model**: A specialized model trained on the Bitext Customer Support dataset, optimized to handle customer queries professionally and contextually.

The goal of this notebook is to demonstrate the improvements introduced by fine-tuning and to highlight the benefits of domain-specific adaptation.

We'll run a set of structured prompts covering general conversation, instructions, knowledge questions, problem-solving, code-related queries, creative tasks, and customer support scenarios. The outputs of all three models will be displayed side by side for direct comparison.

## 1. Setup and Installation

In [None]:
# Install required packages
install_packages = True

if install_packages:
    !pip install -q -U transformers accelerate peft bitsandbytes


In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import warnings
warnings.filterwarnings('ignore')

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA device: {torch.cuda.get_device_name(0)}")

PyTorch version: 2.9.0+cu126
CUDA available: True
CUDA device: Tesla T4


## 2. Load Models

We'll load:
- Base Llama-3.2-1B-Instruct model
- Fine-tuned model (FineTome-100k) with LoRA adapters from HuggingFace Hub
- Customer support model (Bitext) with LoRA adapters from HuggingFace Hub
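
The cells in this section load the base weights once per model variant. As an alternative, here is a minimal sketch (not what this notebook does) of attaching several LoRA adapters to one shared base with PEFT. `load_shared_base_with_adapters` and the adapter labels are hypothetical names, and the sketch assumes every adapter was trained against the same base checkpoint.

```python
def load_shared_base_with_adapters(base_model_id, adapters, quantization_config=None):
    """Load one base model and attach several named LoRA adapters to it.

    adapters: dict mapping an adapter label (our own choice) to a Hub repo id.
    """
    if not adapters:
        raise ValueError("at least one adapter is required")

    # Imported lazily so that defining this helper does not need the GPU stack.
    from transformers import AutoModelForCausalLM
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        quantization_config=quantization_config,
        device_map="auto",
    )

    names = list(adapters)
    # The first adapter creates the PeftModel wrapper around the base model.
    model = PeftModel.from_pretrained(base, adapters[names[0]], adapter_name=names[0])
    # Any remaining adapters are registered on the same wrapper.
    for name in names[1:]:
        model.load_adapter(adapters[name], adapter_name=name)
    return model
```

With a model built this way, `model.set_adapter("customer_support")` selects an adapter before generation, and `with model.disable_adapter():` temporarily recovers base-model behaviour, so one copy of the quantized weights could serve all three comparisons.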

In [None]:
import os

# Load HuggingFace token securely
# Priority: Colab secrets > Environment variable > .env file
try:
    from google.colab import userdata
    hf_token = userdata.get('HF_TOKEN')
    print("✓ Loaded token from Colab secrets")
except Exception:  # not running in Colab, or the secret is unavailable
    hf_token = os.environ.get('HF_TOKEN')
    if hf_token:
        print("✓ Loaded token from environment variable")
    else:
        # Try loading from .env file
        try:
            env_path = '../.env'
            if os.path.exists(env_path):
                with open(env_path, 'r') as f:
                    for line in f:
                        if line.strip().startswith('HF_TOKEN='):
                            hf_token = line.strip().split('=', 1)[1]
                            print("✓ Loaded token from .env file")
                            break
        except (OSError, UnicodeDecodeError):
            pass  # unreadable .env file; fall through to the warning below

if not hf_token:
    print("⚠️  WARNING: HF_TOKEN not found!")
    print("Please add it to Colab secrets or set HF_TOKEN environment variable")
    print("Get your token from: https://huggingface.co/settings/tokens")

In [None]:
# HuggingFace token (if models are private)
from huggingface_hub import login

# Set to False if already logged in
do_login = True

if do_login:
    # Uses the hf_token loaded in the previous cell
    login(token=hf_token)

In [None]:
# Load base model
print("Loading base model...")
base_model_id = "meta-llama/Llama-3.2-1B-Instruct"

# Configure 4-bit quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
tokenizer.pad_token = tokenizer.eos_token

print("Base model loaded successfully!")




Loading base model...



Base model loaded successfully!


In [None]:
# Load fine-tuned model with LoRA adapters
from peft import PeftModel

print("Loading fine-tuned model with LoRA adapters...")

# Load LoRA adapters from HuggingFace Hub
finetuned_model_repo = "Zedel17/fine_tuned_llama_1b"  # Your HF repo with LoRA adapters

# Configure 4-bit quantization (same as base model)
bnb_config_ft = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Create base model for fine-tuned version
finetuned_base = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config_ft,
    device_map="auto",
)

# Load LoRA adapters on top
finetuned_model = PeftModel.from_pretrained(
    finetuned_base,
    finetuned_model_repo,
)

print("Fine-tuned model loaded successfully!")

Loading fine-tuned model with LoRA adapters...



Fine-tuned model loaded successfully!


In [None]:
# Load Customer Support fine-tuned model
from peft import PeftModel

print("Loading customer support fine-tuned model...")

customer_model_repo = "Zedel17/fine_tuned_llama_1b_Customer"  # HF repo with LoRA adapters

# Base model for customer support (4-bit quantization)
customer_base = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    quantization_config=bnb_config,  # reuse the same config as the base model
    device_map="auto",
)

# Load LoRA adapters
customer_model = PeftModel.from_pretrained(
    customer_base,
    customer_model_repo,
)

print("Customer support fine-tuned model loaded successfully!")

Loading customer support fine-tuned model...



Customer support fine-tuned model loaded successfully!


## 3. Inference Helper Functions

In [None]:
def generate_response(model, tokenizer, prompt, max_new_tokens=256, temperature=0.7):
    """
    Generate response from model using chat template.

    Args:
        model: The model to use for generation
        tokenizer: The tokenizer
        prompt: User prompt string
        max_new_tokens: Maximum tokens to generate
        temperature: Sampling temperature

    Returns:
        Generated response string
    """
    # Format using Llama-3 chat template
    messages = [
        {"role": "user", "content": prompt}
    ]

    # Apply chat template
    formatted_prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    # Tokenize
    inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)

    # Generate
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            temperature=temperature,
            top_p=0.9,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )

    # Decode only the new tokens (exclude input prompt)
    response = tokenizer.decode(outputs[0][inputs['input_ids'].shape[1]:], skip_special_tokens=True)

    return response.strip()

def compare_models(base_model, finetuned_model, tokenizer, prompt, max_new_tokens=256):
    """
    Compare outputs from base and fine-tuned models.

    Args:
        base_model: Base model
        finetuned_model: Fine-tuned model
        tokenizer: Tokenizer
        prompt: User prompt
        max_new_tokens: Max tokens to generate
    """
    print(f"\n{'='*80}")
    print(f"PROMPT: {prompt}")
    print(f"{'='*80}\n")

    print("\n[BASE MODEL RESPONSE]")
    print("-" * 80)
    base_response = generate_response(base_model, tokenizer, prompt, max_new_tokens)
    print(base_response)

    print("\n[FINE-TUNED MODEL RESPONSE]")
    print("-" * 80)
    finetuned_response = generate_response(finetuned_model, tokenizer, prompt, max_new_tokens)
    print(finetuned_response)

    print("\n" + "="*80 + "\n")

    return base_response, finetuned_response

def compare_multiple_models(models, model_names, tokenizer, prompt, max_new_tokens=256):
    """
    Compare outputs from multiple models.

    Args:
        models: list of models
        model_names: list of model names (strings)
        tokenizer: tokenizer
        prompt: user prompt
        max_new_tokens: max tokens to generate
    """
    print(f"\n{'='*80}")
    print(f"PROMPT: {prompt}")
    print(f"{'='*80}\n")

    responses = {}
    for name, model in zip(model_names, models):
        print(f"[{name.upper()} RESPONSE]")
        print("-" * 80)
        resp = generate_response(model, tokenizer, prompt, max_new_tokens)
        print(resp + "\n")
        responses[name] = resp

    print("="*80 + "\n")
    return responses
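
Because `generate_response` samples (`do_sample=True`, `temperature=0.7`), the same model can word its answer differently from run to run, which makes side-by-side comparisons harder to reproduce. Here is a minimal sketch of resetting the random seed before each model's call. `compare_with_shared_seed` and `toy_generate` are hypothetical names, and the toy generator only stands in for a real model so the example runs without a GPU.

```python
import random

def compare_with_shared_seed(generate_fns, prompt, seed=0, seed_fn=random.seed):
    """Call each generator after resetting the RNG to the same seed.

    generate_fns: dict mapping a model label to a callable(prompt) -> str.
    seed_fn: function that resets the relevant RNG state.
    """
    responses = {}
    for name, generate in generate_fns.items():
        seed_fn(seed)  # every model starts from the same random state
        responses[name] = generate(prompt)
    return responses

def toy_generate(prompt):
    # Stand-in for a sampling model: the output depends on the RNG state.
    return f"{prompt} -> {random.randint(0, 999)}"

out = compare_with_shared_seed({"a": toy_generate, "b": toy_generate}, "hi", seed=42)
print(out["a"] == out["b"])  # True: identical generators, identical seed
```

In the notebook, `seed_fn` could plausibly be `transformers.set_seed`, with each entry of `generate_fns` wrapping `generate_response` for one model. Different models will still produce different text; the seed only removes run-to-run sampling noise as a source of variation.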

## 4. Test Prompts and Comparison

We'll test various types of prompts to evaluate model improvements:
- General conversation
- Instruction following
- Knowledge questions
- Problem solving
- Code-related questions
- Creative tasks
- Reasoning
- Customer support scenarios
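
The cells that follow run the comparison one prompt at a time. Here is a sketch of how the same prompts could be kept in a single table and looped over. `PROMPT_SUITE`, `run_suite`, and `stub_generate` are illustrative names, and the stub stands in for a call such as `generate_response(model, tokenizer, prompt)`.

```python
# Prompts copied from the test cells of this notebook, keyed by category.
PROMPT_SUITE = {
    "general_conversation": "Hello! How are you today?",
    "instruction_following": "Write a short poem about artificial intelligence in 4 lines.",
    "knowledge": "Explain the concept of machine learning in simple terms.",
    "problem_solving": "Give me 3 tips for staying productive while working from home.",
    "code": "What is the difference between a list and a tuple in Python?",
}

def run_suite(suite, generators):
    """Return one result row per (category, model) pair."""
    rows = []
    for category, prompt in suite.items():
        for model_name, generate in generators.items():
            rows.append({
                "category": category,
                "model": model_name,
                "prompt": prompt,
                "response": generate(prompt),
            })
    return rows

def stub_generate(prompt):
    # Placeholder so the sketch runs without loading any model.
    return f"[stub reply to: {prompt}]"

rows = run_suite(PROMPT_SUITE, {"base": stub_generate, "finetuned": stub_generate})
print(len(rows))  # 10 rows: 5 categories x 2 models
```

Collecting rows this way keeps every response available for later inspection instead of only printing it.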

### Test 1: General Conversation

In [None]:
prompt1 = "Hello! How are you today?"
base_resp1, ft_resp1 = compare_models(base_model, finetuned_model, tokenizer, prompt1)


PROMPT: Hello! How are you today?


[BASE MODEL RESPONSE]
--------------------------------------------------------------------------------
I'm just a computer program, so I don't have feelings or emotions like humans do, but I'm functioning properly and ready to help with any questions or tasks you may have! How about you? How's your day going so far?

[FINE-TUNED MODEL RESPONSE]
--------------------------------------------------------------------------------
I'm just a language model, so I don't have feelings, but I'm functioning properly and ready to help with any questions or tasks you may have. It's great to start the day! How can I assist you today?




### Test 2: Instruction Following

In [None]:
prompt2 = "Write a short poem about artificial intelligence in 4 lines."
base_resp2, ft_resp2 = compare_models(base_model, finetuned_model, tokenizer, prompt2)


PROMPT: Write a short poem about artificial intelligence in 4 lines.


[BASE MODEL RESPONSE]
--------------------------------------------------------------------------------
Here's a short poem about artificial intelligence:

In silicon halls, it reigns supreme,
A force that's changing humanity's dream.
With logic and reason, it takes its place,
A future where machines find a steady pace.

[FINE-TUNED MODEL RESPONSE]
--------------------------------------------------------------------------------
Here's a short poem about artificial intelligence:

Artificial intelligence, a future so bright,
A world of possibilities, a wondrous sight.
It learns and adapts, with logic so grand,
Bringing efficiency and innovation to our land.




### Test 3: Knowledge Question

In [None]:
prompt3 = "Explain the concept of machine learning in simple terms."
base_resp3, ft_resp3 = compare_models(base_model, finetuned_model, tokenizer, prompt3)


PROMPT: Explain the concept of machine learning in simple terms.


[BASE MODEL RESPONSE]
--------------------------------------------------------------------------------
Machine learning is a way to create intelligent systems that can learn from data without being explicitly programmed. It's like having a super smart assistant that can analyze patterns and make predictions or decisions without needing a human to tell it what to do.

Imagine you're at a restaurant and you're ordering food. You don't need to tell the waiter what you want to eat; you just tell the waiter what you like (e.g., Italian, spicy, vegan). The waiter knows exactly what to make for you because of your previous orders (you're familiar with the restaurant's menu). This is similar to machine learning – a machine (or computer) is trained on data (the restaurant's menu) and it uses that data to make decisions (like what to order for you).

Machine learning works in the same way. A machine is given a lot of data (like 

### Test 4: Problem Solving

In [None]:
prompt4 = "Give me 3 tips for staying productive while working from home."
base_resp4, ft_resp4 = compare_models(base_model, finetuned_model, tokenizer, prompt4)


PROMPT: Give me 3 tips for staying productive while working from home.


[BASE MODEL RESPONSE]
--------------------------------------------------------------------------------
Here are three tips for staying productive while working from home:

1. **Create a dedicated workspace**: Designate a specific area for work and keep it organized and clutter-free. This will help you establish a clear boundary between work and personal life. Make sure it's well-lit, comfortable, and equipped with all the necessary tools and equipment.

2. **Establish a routine and set boundaries with family and friends**: Working from home can be easy to fall into the trap of procrastination or distractions. Create a schedule and stick to it. Set aside specific times for work, breaks, and personal time. Communicate your work hours and boundaries with family and friends to avoid interruptions.

3. **Take breaks and exercise regularly**: Working from home can be isolating, which can lead to burnout. Make sure to t

### Test 5: Code-Related Question

In [None]:
prompt5 = "What is the difference between a list and a tuple in Python?"
base_resp5, ft_resp5 = compare_models(base_model, finetuned_model, tokenizer, prompt5)


PROMPT: What is the difference between a list and a tuple in Python?


[BASE MODEL RESPONSE]
--------------------------------------------------------------------------------
In Python, both lists and tuples are data structures that are used to store collections of items, but they have some key differences:

**Lists vs Tuples:**

1. **Immutability**: Lists are mutable, meaning they can be changed after creation. Tuples are immutable, meaning their contents cannot be modified after creation.
2. **Indexing**: Lists allow for indexing, which means you can access individual elements using their index. Tuples do not support indexing.
3. **Methods**: Lists have methods like `append`, `insert`, `remove`, etc. that allow you to modify the list. Tuples do not have these methods.
4. **Length**: Lists are dynamic, meaning their length can be changed. Tuples are fixed, meaning their length is fixed.
5. **Initialization**: Lists can be initialized with a list literal, while tuples are initialized w

### Test 6: Creative Task

In [None]:
prompt6 = "Suggest a creative name for a coffee shop that specializes in sustainable practices."
base_resp6, ft_resp6 = compare_models(base_model, finetuned_model, tokenizer, prompt6)


PROMPT: Suggest a creative name for a coffee shop that specializes in sustainable practices.


[BASE MODEL RESPONSE]
--------------------------------------------------------------------------------
Here are some creative name suggestions for a coffee shop that specializes in sustainable practices:

1. **Brew & Bloom**: This name suggests growth, harmony, and sustainability, which aligns perfectly with your coffee shop's mission.
2. **Green Cup**: This name is simple and straightforward, conveying the eco-friendly aspect of your coffee shop.
3. **Earthly Grounds**: This name combines the idea of the earth with the coffee shop's focus on sustainable practices.
4. **The Sustainable Bean**: This name highlights the importance of sustainable coffee production and serving.
5. **Latté & Leaf**: This name incorporates a playful, earthy element (leaf) while emphasizing the coffee aspect of your shop.
6. **Roasted with Care**: This name conveys the attention to detail and care that goes into so

### Test 7: Reasoning Task

In [None]:
prompt7 = "If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?"
base_resp7, ft_resp7 = compare_models(base_model, finetuned_model, tokenizer, prompt7)


PROMPT: If it takes 5 machines 5 minutes to make 5 widgets, how long would it take 100 machines to make 100 widgets?


[BASE MODEL RESPONSE]
--------------------------------------------------------------------------------
If it takes 5 machines 5 minutes to make 5 widgets, we can calculate the rate at which the machines work together.

Since the rate is 5 widgets per 5 minutes, we can calculate the rate per minute as follows:

Rate per minute = 5 widgets / 5 minutes = 1 widget per minute

Now, if we have 100 machines working at this rate, we can calculate the time it takes to make 100 widgets as follows:

Time = Total widgets / Rate per minute
= 100 widgets / 1 widget per minute
= 100 minutes

Therefore, it would take 100 machines 100 minutes to make 100 widgets.

[FINE-TUNED MODEL RESPONSE]
--------------------------------------------------------------------------------
To determine the time it would take for 100 machines to make 100 widgets, we can use the concept of work done and t
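
For reference when reading the responses to this prompt, the expected answer is 5 minutes. The base response above takes the combined rate of the 5 machines (1 widget per minute) and applies it unchanged to 100 machines, which gives 100 minutes. A short worked check, with `minutes_needed` as a hypothetical helper name:

```python
from fractions import Fraction

# Given: 5 machines make 5 widgets in 5 minutes.
machines, minutes, widgets = 5, 5, 5

# Rate of a single machine, in widgets per minute: 5 / (5 * 5) = 1/5.
rate_per_machine = Fraction(widgets, machines * minutes)

def minutes_needed(n_machines, n_widgets, rate=rate_per_machine):
    """Time for n_machines working in parallel to produce n_widgets."""
    return n_widgets / (n_machines * rate)

answer = minutes_needed(100, 100)
print(answer)  # 5
```

Each machine needs 5 minutes per widget, so 100 machines working in parallel produce 100 widgets in the same 5 minutes.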

### Customer Support

In [None]:
models = [base_model, finetuned_model, customer_model]
model_names = ["base", "finetuned_finetome", "customer_support"]

prompt = "I'm very disappointed that my issue hasn't been resolved yet. Can you escalate it?"
responses = compare_multiple_models(models, model_names, tokenizer, prompt)


PROMPT: I'm very disappointed that my issue hasn't been resolved yet. Can you escalate it?

[BASE RESPONSE]
--------------------------------------------------------------------------------
I'm so sorry to hear that your issue isn't being resolved yet. I'm here to help and support you in any way I can. While I can't actually escalate your issue to a human representative, I can offer some suggestions to help you further or find a resolution.

Here are a few options:

1. **Try a different platform or customer support channel**: If you're using a different platform or customer support channel, it's possible that the issue is being handled by someone else. Try reaching out to them directly and see if they can assist you.
2. **Contact the company's customer support email or phone number**: If you're unable to reach a customer support representative, you can try contacting the company's customer support email or phone number. This can help you get a human representative's attention to your i

In [None]:
models = [base_model, finetuned_model, customer_model]
model_names = ["base", "finetuned_finetome", "customer_support"]

prompt = "Hi, I recently purchased your software but I'm having trouble activating it."
responses = compare_multiple_models(models, model_names, tokenizer, prompt)


PROMPT: Hi, I recently purchased your software but I'm having trouble activating it.

[BASE RESPONSE]
--------------------------------------------------------------------------------
I'm happy to help you with any issues you're experiencing with the software. Could you please provide more details about the issue you're facing? Specifically, what software is it, and what exactly is not working as expected? The more information you can provide, the better I'll be able to assist you.

Also, just to confirm, are you using a specific version of the software, and are you using a Windows or macOS device?

[FINETUNED_FINETOME RESPONSE]
--------------------------------------------------------------------------------
I'm happy to help you with any issues you're experiencing with the software. Can you please provide me with more details about the issue you're facing? What exactly you're trying to activate, and any error messages or error codes you're seeing? This will help me better understand t

In [None]:
models = [base_model, finetuned_model, customer_model]
model_names = ["base", "finetuned_finetome", "customer_support"]

prompt = "I'm unable to log in to my account. I keep getting an 'invalid password' error."
responses = compare_multiple_models(models, model_names, tokenizer, prompt)


PROMPT: I'm unable to log in to my account. I keep getting an 'invalid password' error.

[BASE RESPONSE]
--------------------------------------------------------------------------------
I'm here to help you troubleshoot the issue. Let's go through some steps to resolve the 'invalid password' error.

**Troubleshooting steps to resolve 'invalid password' error:**

1.  **Check the password length**: Ensure your password is 8 characters long. If it's less than 8 characters, you may be prompted to enter a longer password.

2.  **Verify password complexity**: The password should include uppercase letters, lowercase letters, numbers, and special characters. If your password doesn't meet this requirement, you might need to create a new one.

3.  **Check for password expiration**: If your account is set to expire after a certain number of days or months, ensure that your password is not set to expire.

4.  **Clear cookies and cache**: Clearing cookies and cache can sometimes resolve the issue.

In [None]:
models = [base_model, finetuned_model, customer_model]
model_names = ["base", "finetuned_finetome", "customer_support"]

prompt = "I want to return a pair of shoes that don't fit. What is the procedure?"
responses = compare_multiple_models(models, model_names, tokenizer, prompt)


PROMPT: I want to return a pair of shoes that don't fit. What is the procedure?

[BASE RESPONSE]
--------------------------------------------------------------------------------
If you're looking to return a pair of shoes that don't fit, here's a step-by-step guide to help you through the process:

**Before Returning:**

1. **Check the return policy**: Verify the return policy of the retailer or store where you purchased the shoes. Some stores may have specific requirements, such as a certain time frame or a certain condition for returns.
2. **Check the size and fit**: Before returning the shoes, ensure they are not too tight or too loose. If the shoes are too tight, they may be difficult to return or may not fit properly when returned.

**Returning the Shoes:**

1. **Contact the retailer or store**: Reach out to the retailer or store where you purchased the shoes and explain your situation. They may ask you to contact their customer service department or a specific representative.
2.

In [None]:
models = [base_model, finetuned_model, customer_model]
model_names = ["base", "finetuned_finetome", "customer_support"]

prompt = "Are there any discounts on electronics this weekend?"
responses = compare_multiple_models(models, model_names, tokenizer, prompt)


PROMPT: Are there any discounts on electronics this weekend?

[BASE RESPONSE]
--------------------------------------------------------------------------------
I'm happy to help you find deals on electronics. However, I'm a large language model, I don't have real-time access to current deals and discounts. But I can suggest some options to help you find deals:

1. **Manufacturer's websites**: Check the official websites of your favorite electronics brands for any sale or promotion. They often offer discounts on various products.
2. **Online Retailers**: Websites like Amazon, Best Buy, and Newegg often have deals and discounts on electronics. You can check their websites for any available discounts.
3. **Deal Websites**: Websites like Slickdeals, RetailMeNot, and Coupons.com aggregate deals and discounts on various products. You can search for "electronics deals" or specific products to find what's on sale.
4. **Deal Alerts**: Sign up for deal alert services like RetailMeNot, Coupons.co

In [None]:
models = [base_model, finetuned_model, customer_model]
model_names = ["base", "finetuned_finetome", "customer_support"]

prompt = "I tried to pay with my credit card, but the payment didn't go through."
responses = compare_multiple_models(models, model_names, tokenizer, prompt)


PROMPT: I tried to pay with my credit card, but the payment didn't go through.

[BASE RESPONSE]
--------------------------------------------------------------------------------
If you tried to pay with your credit card and the payment didn't go through, here are some steps you can take to resolve the issue:

1. **Check your card details**: Double-check that your card details are correct, including the card number, expiration date, and security code. Make sure you're using the correct card for the payment.
2. **Check your payment method**: Ensure you're using the correct payment method, such as credit card, PayPal, or another payment method accepted by the merchant.
3. **Verify the payment gateway**: Check if the payment gateway is working correctly. You can try logging out of your account and clearing your cookies to clear any temporary issues.
4. **Check for any transaction history**: Check your transaction history to ensure there are no pending or rejected transactions. If you're us

## 5. Analysis and Conclusions

### Expected Improvements from Fine-Tuning:

1. **Better Instruction Following**: The fine-tuned model should better understand and follow specific instructions, as it was trained on the FineTome-100k dataset which contains high-quality instruction-response pairs.

2. **More Coherent Responses**: Fine-tuning on conversational data should result in more natural, coherent responses that stay on topic.

3. **Improved Format Adherence**: The model should better respect the chat format and provide responses that are appropriately structured.

4. **Enhanced Task Completion**: For specific tasks (writing poems, answering questions, etc.), the fine-tuned model should provide more complete and relevant responses.
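
Some of these expectations can be checked mechanically rather than by eye. Here is a minimal sketch for the Test 2 constraint ("in 4 lines"), assuming that a leading line ending in a colon is preamble rather than part of the poem. `count_poem_lines` is a hypothetical helper, and the sample text is the base model's response from Test 2.

```python
def count_poem_lines(response):
    """Count non-empty lines, skipping a leading preamble line that ends with ':'."""
    lines = [ln.strip() for ln in response.splitlines() if ln.strip()]
    if lines and lines[0].endswith(":"):
        lines = lines[1:]  # drop "Here's a short poem ...:" style preambles
    return len(lines)

base_poem = """Here's a short poem about artificial intelligence:

In silicon halls, it reigns supreme,
A force that's changing humanity's dream.
With logic and reason, it takes its place,
A future where machines find a steady pace."""

print(count_poem_lines(base_poem))  # 4
```

A check like this only covers the format constraint; it says nothing about the quality of the poem.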

### Customer-service prompts:

- **Base model**: responded with generic troubleshooting steps that were coherent but not specifically aligned with customer-support workflows.
- **FineTome fine-tuned model**: showed better structure and prioritised merchant-contact actions, reflecting the patterns present in the training data.
- **Bitext-trained model**: produced a longer list of checks and policies, but also displayed redundancy and occasional irrelevant items (e.g., special-category restrictions), which mirrors the somewhat noisy and heterogeneous nature of the Bitext dataset.

None of the three models fully completed its response. This behaviour is consistent with small-scale models (1B parameters), which often fail to finish enumerations or longer explanations. It is amplified when the training data contains incomplete examples or when the generation budget (here `max_new_tokens=256`) is too small for the response.
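
One way to separate the two causes is to check whether a generation ended with an end-of-sequence token or simply ran out of budget. Here is a minimal sketch operating on plain lists of token ids. `hit_token_budget` is a hypothetical helper, and the ids in the example are made up for illustration.

```python
def hit_token_budget(new_token_ids, eos_token_ids, max_new_tokens):
    """Return True if generation used its whole budget without ending on EOS.

    new_token_ids: ids generated after the prompt (prompt tokens excluded).
    eos_token_ids: collection of ids that count as end-of-sequence.
    """
    ended_with_eos = bool(new_token_ids) and new_token_ids[-1] in eos_token_ids
    return len(new_token_ids) >= max_new_tokens and not ended_with_eos

# Illustrative ids only: 99 plays the role of the EOS token.
cut_off = [11, 12, 13, 14]   # 4 tokens, budget of 4, no EOS at the end
finished = [11, 12, 99]      # stopped on EOS before the budget ran out

print(hit_token_budget(cut_off, {99}, max_new_tokens=4))   # True
print(hit_token_budget(finished, {99}, max_new_tokens=4))  # False
```

In `generate_response`, the generated ids would be `outputs[0][inputs['input_ids'].shape[1]:].tolist()`. A response flagged this way was cut off by the budget, so rerunning it with a larger `max_new_tokens` would show whether the model finishes on its own.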

### Conclusion:

In these qualitative, side-by-side comparisons, the FineTome fine-tuned Llama-3.2-1B model improves on the base model, most visibly in response structure on the customer-service prompts. The LoRA fine-tuning approach adapted the model to the FineTome-100k dataset while keeping training parameter-efficient.