# Isuku Chatbot - Llama Model Training

This notebook fine-tunes a Llama-based chatbot model (TinyLlama-1.1B-Chat with LoRA adapters) for the Isuku waste management system using the provided Q&A dataset.

## Dataset Overview
- **Total Rows**: 900 (12 unique Q&A pairs after deduplication)
- **Languages**: English, Kinyarwanda, French
- **Intents**: waste_sorting, pickup_schedule, payment, education
- **Split**: 70% Training, 20% Validation, 10% Testing (target; the 12 unique pairs split 8/2/2 in practice)

In [11]:
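# Install required packages
# Note: `!pip` can target a different interpreter than the kernel; the next cell installs into the kernel's own environment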
!pip install -q transformers torch accelerate datasets pandas openpyxl peft bitsandbytes

In [12]:
import sys
!{sys.executable} -m pip install scikit-learn transformers torch accelerate datasets pandas openpyxl peft bitsandbytes

Collecting scikit-learn
  Using cached scikit_learn-1.6.1-cp39-cp39-macosx_12_0_arm64.whl.metadata (31 kB)
Collecting joblib>=1.2.0 (from scikit-learn)
  Using cached joblib-1.5.3-py3-none-any.whl.metadata (5.5 kB)
Collecting threadpoolctl>=3.1.0 (from scikit-learn)
  Using cached threadpoolctl-3.6.0-py3-none-any.whl.metadata (13 kB)
Using cached scikit_learn-1.6.1-cp39-cp39-macosx_12_0_arm64.whl (11.1 MB)
Using cached joblib-1.5.3-py3-none-any.whl (309 kB)
Using cached threadpoolctl-3.6.0-py3-none-any.whl (18 kB)
Installing collected packages: threadpoolctl, joblib, scikit-learn
Successfully installed joblib-1.5.3 scikit-learn-1.6.1 threadpoolctl-3.6.0


In [13]:
import sys
print(sys.executable)  # Should show the venv path

/Users/huberttuyishime/Documents/Isuku-app/venv/bin/python
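
Printing the interpreter path confirms which environment the kernel uses, but not that every package is importable from it. A minimal sketch of such a check follows; the helper name `missing_modules` is hypothetical, and the list uses import names (for example `sklearn` rather than `scikit-learn`).

```python
import importlib.util
import sys

# Import names, not pip names: scikit-learn is imported as "sklearn".
REQUIRED_MODULES = [
    "sklearn", "transformers", "torch", "accelerate",
    "datasets", "pandas", "openpyxl", "peft", "bitsandbytes",
]

def missing_modules(names):
    """Return the top-level modules the current interpreter cannot find."""
    # find_spec only locates a top-level module; it does not import it.
    return [name for name in names if importlib.util.find_spec(name) is None]

print(f"Interpreter: {sys.executable}")
print(f"Missing modules: {missing_modules(REQUIRED_MODULES) or 'none'}")
```

If the list is not empty, rerunning the `{sys.executable} -m pip install ...` cell for the missing names should resolve it.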


## 1. Import Libraries and Setup

In [14]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
import torch
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling,
    BitsAndBytesConfig
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from datasets import Dataset
import os
import warnings
warnings.filterwarnings('ignore')

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")
print(f"PyTorch version: {torch.__version__}")



Using device: cpu
PyTorch version: 2.8.0


## 2. Load and Explore Dataset

In [15]:
# Load the dataset
df = pd.read_excel('Dataset/isuku_chatbot_dataset_300_QA.xlsx')

print("Dataset Shape:", df.shape)
print("\nColumn Names:", df.columns.tolist())
print("\nFirst few rows:")
print(df.head())

print("\n" + "="*60)
print("Dataset Statistics")
print("="*60)
print(f"Total rows: {len(df)}")
print(f"Languages: {df['Language'].unique()}")
print(f"Intents: {df['Intent'].unique()}")
print(f"\nLanguage distribution:\n{df['Language'].value_counts()}")
print(f"\nIntent distribution:\n{df['Intent'].value_counts()}")

Dataset Shape: (900, 5)

Column Names: ['ID', 'Intent', 'Language', 'Question', 'Answer']

First few rows:
   ID           Intent     Language                          Question  \
0   1    waste_sorting      English           How do I sort my waste?   
1   2    waste_sorting  Kinyarwanda              Nashyira he imyanda?   
2   3    waste_sorting       French       Comment trier les déchets ?   
3   4  pickup_schedule      English  When will my waste be collected?   
4   5  pickup_schedule  Kinyarwanda          Imyanda ikusanywa ryari?   

                                              Answer  
0  You should separate waste into organic, recycl...  
1  Ugomba gutandukanya imyanda mu byiciro: ibora,...  
2  Vous devez séparer les déchets en déchets orga...  
3  Check your pickup schedule in the app or wait ...  
4  Reba gahunda yo kuyikusanya muri porogaramu cy...  

Dataset Statistics
Total rows: 900
Languages: ['English' 'Kinyarwanda' 'French']
Intents: ['waste_sorting' 'pickup_schedule

## 3. Data Preprocessing and Formatting

In [16]:
# Remove duplicates to get unique Q&A pairs
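# .copy() makes df_unique an independent frame, so adding columns below cannot trigger SettingWithCopyWarning
df_unique = df.drop_duplicates(subset=['Question', 'Answer', 'Language', 'Intent']).copy()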
print(f"Unique Q&A pairs: {len(df_unique)} (from {len(df)} total rows)")

# Format data for chatbot training
# Create a prompt template for instruction-following format
def format_prompt(row):
    """Format Q&A pair into instruction-following format"""
    prompt = f"### Instruction:\n{row['Question']}\n\n### Response:\n{row['Answer']}"
    return prompt

df_unique['formatted_text'] = df_unique.apply(format_prompt, axis=1)

# Display sample formatted text
print("\nSample formatted prompts:")
for i in range(min(3, len(df_unique))):
    print(f"\n--- Sample {i+1} ({df_unique.iloc[i]['Language']}, {df_unique.iloc[i]['Intent']}) ---")
    print(df_unique.iloc[i]['formatted_text'])

Unique Q&A pairs: 12 (from 900 total rows)

Sample formatted prompts:

--- Sample 1 (English, waste_sorting) ---
### Instruction:
How do I sort my waste?

### Response:
You should separate waste into organic, recyclable, and non-recyclable items.

--- Sample 2 (Kinyarwanda, waste_sorting) ---
### Instruction:
Nashyira he imyanda?

### Response:
Ugomba gutandukanya imyanda mu byiciro: ibora, ishobora gukoreshwa ukundi, n’itagira icyo imaze.

--- Sample 3 (French, waste_sorting) ---
### Instruction:
Comment trier les déchets ?

### Response:
Vous devez séparer les déchets en déchets organiques, recyclables et non recyclables.


## 4. Train/Validation/Test Split (70/20/10)

In [17]:
# Create a combined label for stratification
df_unique['stratify_label'] = df_unique['Language'] + '_' + df_unique['Intent']

# Check if we can use stratification (need at least 2 samples per class)
stratify_counts = df_unique['stratify_label'].value_counts()
min_samples = stratify_counts.min()
print(f"Minimum samples per class: {min_samples}")

# First split: 70% train, 30% temp (for val + test)
if min_samples >= 2:
    train_df, temp_df = train_test_split(
        df_unique,
        test_size=0.3,
        random_state=42,
        stratify=df_unique['stratify_label']  # Stratify to maintain distribution
    )
else:
    print("Warning: Some classes have < 2 samples. Using random split without stratification.")
    train_df, temp_df = train_test_split(
        df_unique,
        test_size=0.3,
        random_state=42
    )

# Second split: 20% val, 10% test (from the 30% temp)
# temp_df is 30%, so 20/30 = 0.667 for validation, 10/30 = 0.333 for test
temp_stratify_counts = temp_df['stratify_label'].value_counts()
temp_min_samples = temp_stratify_counts.min()

if temp_min_samples >= 2:
    val_df, test_df = train_test_split(
        temp_df,
        test_size=0.333,  # 10% of total / 30% of total = 0.333
        random_state=42,
        stratify=temp_df['stratify_label']
    )
else:
    print("Warning: Some classes in temp split have < 2 samples. Using random split.")
    val_df, test_df = train_test_split(
        temp_df,
        test_size=0.333,
        random_state=42
    )

print(f"Training set: {len(train_df)} samples ({len(train_df)/len(df_unique)*100:.1f}%)")
print(f"Validation set: {len(val_df)} samples ({len(val_df)/len(df_unique)*100:.1f}%)")
print(f"Test set: {len(test_df)} samples ({len(test_df)/len(df_unique)*100:.1f}%)")

print("\nTraining set distribution:")
print(f"  Languages: {train_df['Language'].value_counts().to_dict()}")
print(f"  Intents: {train_df['Intent'].value_counts().to_dict()}")

print("\nValidation set distribution:")
print(f"  Languages: {val_df['Language'].value_counts().to_dict()}")
print(f"  Intents: {val_df['Intent'].value_counts().to_dict()}")

print("\nTest set distribution:")
print(f"  Languages: {test_df['Language'].value_counts().to_dict()}")
print(f"  Intents: {test_df['Intent'].value_counts().to_dict()}")

Minimum samples per class: 1
Training set: 8 samples (66.7%)
Validation set: 2 samples (16.7%)
Test set: 2 samples (16.7%)

Training set distribution:
  Languages: {'French': 3, 'Kinyarwanda': 3, 'English': 2}
  Intents: {'pickup_schedule': 3, 'waste_sorting': 2, 'payment': 2, 'education': 1}

Validation set distribution:
  Languages: {'Kinyarwanda': 1, 'English': 1}
  Intents: {'education': 1, 'waste_sorting': 1}

Test set distribution:
  Languages: {'English': 1, 'French': 1}
  Intents: {'education': 1, 'payment': 1}
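
The 8/2/2 result differs from the 70/20/10 target because only 12 samples are being split. The sketch below reproduces the counts with plain arithmetic, assuming each held-out share is rounded up to a whole sample; that assumption matches the output above but is not taken from the notebook itself. The function name `two_stage_split_sizes` is hypothetical.

```python
import math

def two_stage_split_sizes(n_samples, first_test_size=0.3, second_test_size=0.333):
    """Return (train, val, test) counts, assuming held-out shares are rounded up."""
    n_temp = math.ceil(first_test_size * n_samples)  # held out for val + test
    n_train = n_samples - n_temp
    n_test = math.ceil(second_test_size * n_temp)    # test share of the held-out part
    n_val = n_temp - n_test
    return n_train, n_val, n_test

for n in (12, 900):
    parts = two_stage_split_sizes(n)
    shares = [round(100 * part / n, 1) for part in parts]
    print(f"n={n}: train/val/test = {parts}, shares (%) = {shares}")
```

With so few samples, one example moves a share by more than 8 percentage points, so the target ratios can only be approximated.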


## 5. Load Llama Model and Tokenizer

**Note**: This example uses a small Llama-architecture model (TinyLlama 1.1B). Depending on your setup, you may need to:
- Provide a Hugging Face authentication token for gated Llama models
- Adjust the model name to fit the available memory and compute
- Use a quantized model (4-bit/8-bit) for memory efficiency; the cell below enables 4-bit loading only when CUDA is available

In [18]:
# Model configuration
# Using a smaller Llama model variant for training
# You can change this to "meta-llama/Llama-2-7b-chat-hf" or other variants
# Note: You may need Hugging Face authentication for Llama models

MODEL_NAME = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # Small model for demonstration
# Alternative options:
# MODEL_NAME = "meta-llama/Llama-2-7b-chat-hf"  # Requires HF token
# MODEL_NAME = "microsoft/phi-2"  # Alternative small model

print(f"Loading model: {MODEL_NAME}")

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

# Configure quantization for memory efficiency (optional)
use_quantization = True  # Set to False if you have enough GPU memory

if use_quantization and torch.cuda.is_available():
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
        bnb_4bit_use_double_quant=True,
    )
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        quantization_config=bnb_config,
        device_map="auto",
        trust_remote_code=True
    )
else:
    model = AutoModelForCausalLM.from_pretrained(
        MODEL_NAME,
        torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
        device_map="auto" if torch.cuda.is_available() else None,
        trust_remote_code=True
    )

# Move model to device if not using device_map
if not torch.cuda.is_available():
    model = model.to(device)

print(f"Model loaded successfully!")
print(f"Model parameters: {sum(p.numel() for p in model.parameters())/1e6:.2f}M")

Loading model: TinyLlama/TinyLlama-1.1B-Chat-v1.0


`torch_dtype` is deprecated! Use `dtype` instead!


Model loaded successfully!
Model parameters: 1100.05M


## 6. Tokenize the Datasets

In [None]:
# Tokenize function
def tokenize_function(examples):
    """Tokenize the formatted text"""
    # Tokenize with truncation and padding
    # Reduced max_length from 512 to 256 to save memory on MPS
    tokenized = tokenizer(
        examples['formatted_text'],
        truncation=True,
        padding='max_length',
        max_length=256,  # Reduced from 512 to save memory
        return_tensors="pt"
    )
    # For causal LM, labels are the same as input_ids
    tokenized['labels'] = tokenized['input_ids'].clone()
    return tokenized

# Convert to HuggingFace datasets
train_dataset = Dataset.from_pandas(train_df[['formatted_text']])
val_dataset = Dataset.from_pandas(val_df[['formatted_text']])
test_dataset = Dataset.from_pandas(test_df[['formatted_text']])

# Tokenize datasets
print("Tokenizing training set...")
train_dataset = train_dataset.map(
    tokenize_function,
    batched=True,
    remove_columns=['formatted_text']
)

print("Tokenizing validation set...")
val_dataset = val_dataset.map(
    tokenize_function,
    batched=True,
    remove_columns=['formatted_text']
)

print("Tokenizing test set...")
test_dataset = test_dataset.map(
    tokenize_function,
    batched=True,
    remove_columns=['formatted_text']
)

print(f"\nTraining samples: {len(train_dataset)}")
print(f"Validation samples: {len(val_dataset)}")
print(f"Test samples: {len(test_dataset)}")

Tokenizing training set...


Map: 100%|██████████| 8/8 [00:00<00:00, 99.30 examples/s]


Tokenizing validation set...


Map: 100%|██████████| 2/2 [00:00<00:00, 260.28 examples/s]


Tokenizing test set...


Map: 100%|██████████| 2/2 [00:00<00:00, 455.38 examples/s]


Training samples: 8
Validation samples: 2
Test samples: 2





## 7. Configure LoRA for Efficient Fine-tuning

Using LoRA (Low-Rank Adaptation) to reduce memory requirements and training time.

In [20]:
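# Prepare model for LoRA training
# Only a model that was actually loaded in 4-bit (the CUDA path above) needs k-bit preparation
if use_quantization and torch.cuda.is_available():
    model = prepare_model_for_kbit_training(model)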

# LoRA configuration
lora_config = LoraConfig(
    r=16,  # Rank
    lora_alpha=32,  # LoRA alpha
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],  # Adjust based on model architecture
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

# Apply LoRA
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

print("\nLoRA configuration applied successfully!")

'NoneType' object has no attribute 'cadam32bit_grad_fp32'
trainable params: 4,505,600 || all params: 1,104,553,984 || trainable%: 0.4079

LoRA configuration applied successfully!
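
The trainable-parameter count can be checked by hand. Each adapted linear layer gains two low-rank matrices, A (r × in_features) and B (out_features × r), so it adds r × (in_features + out_features) parameters. The sketch below assumes TinyLlama-1.1B's published dimensions (hidden size 2048, 22 layers, 32 attention heads, 4 key/value heads); these values are assumptions that do not appear in this notebook.

```python
# Assumed TinyLlama-1.1B dimensions (from its published config, not from this notebook).
HIDDEN_SIZE = 2048
NUM_LAYERS = 22
NUM_ATTENTION_HEADS = 32
NUM_KEY_VALUE_HEADS = 4  # grouped-query attention makes k_proj / v_proj narrower
HEAD_DIM = HIDDEN_SIZE // NUM_ATTENTION_HEADS
KV_SIZE = NUM_KEY_VALUE_HEADS * HEAD_DIM

RANK = 16  # r in LoraConfig

# (in_features, out_features) of each targeted projection
target_shapes = {
    "q_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
    "k_proj": (HIDDEN_SIZE, KV_SIZE),
    "v_proj": (HIDDEN_SIZE, KV_SIZE),
    "o_proj": (HIDDEN_SIZE, HIDDEN_SIZE),
}

def lora_params(in_features, out_features, rank):
    """Parameters added by one adapter: A (rank x in) plus B (out x rank)."""
    return rank * (in_features + out_features)

per_layer = sum(lora_params(i, o, RANK) for i, o in target_shapes.values())
total = per_layer * NUM_LAYERS
print(f"LoRA params per layer: {per_layer:,}")
print(f"LoRA params total:     {total:,}")
print(f"Share of all params:   {100 * total / 1_104_553_984:.4f}%")
```

Under these assumptions the total agrees with the `trainable params: 4,505,600` line printed by `print_trainable_parameters()`.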


## 8. Training Configuration

In [26]:
# Create models directory for saving the trained model
MODEL_SAVE_DIR = "./models/isuku_chatbot_llama"
os.makedirs(MODEL_SAVE_DIR, exist_ok=True)

# Training arguments
output_dir = "./models/isuku_chatbot_llama/checkpoints"
os.makedirs(output_dir, exist_ok=True)

# Configure MPS memory settings for Apple Silicon (if using MPS)
if torch.backends.mps.is_available():
    # A ratio of 0.0 removes PyTorch's upper limit on MPS allocations; it does not cap memory use
    os.environ['PYTORCH_MPS_HIGH_WATERMARK_RATIO'] = '0.0'
    print("MPS detected. Memory limit disabled (may cause system issues if memory is exhausted).")
    print("If you encounter memory issues, consider reducing batch size further or using CPU.")

training_args = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=3,  # Adjust based on your needs
    per_device_train_batch_size=1,  # Reduced from 2 to 1 to save memory
    per_device_eval_batch_size=1,  # Reduced from 2 to 1 to save memory
    gradient_accumulation_steps=8,  # Increased from 4 to 8 to maintain effective batch size = 1 * 8 = 8
    warmup_steps=50,
    learning_rate=2e-4,
    fp16=torch.cuda.is_available(),  # Use mixed precision if GPU available (not for MPS)
    bf16=False,  # Disable bf16 for MPS
    logging_steps=10,
    eval_strategy="epoch",  # Changed from evaluation_strategy to eval_strategy for newer transformers
    save_strategy="epoch",
    load_best_model_at_end=True,
    save_total_limit=3,
    report_to="none",  # Set to "tensorboard" if you want to use TensorBoard
    push_to_hub=False,
    dataloader_pin_memory=False,  # Disable pin memory for MPS to save memory
)

# Data collator
data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,  # We're doing causal LM, not masked LM
)

print("Training arguments configured!")
print(f"Output directory: {output_dir}")
print(f"Training epochs: {training_args.num_train_epochs}")
print(f"Batch size: {training_args.per_device_train_batch_size}")
print(f"Learning rate: {training_args.learning_rate}")

MPS detected. Memory limit disabled (may cause system issues if memory is exhausted).
If you encounter memory issues, consider reducing batch size further or using CPU.
Training arguments configured!
Output directory: ./models/isuku_chatbot_llama/checkpoints
Training epochs: 3
Batch size: 1
Learning rate: 0.0002


## 9. Initialize Trainer and Start Training

In [28]:
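# Force CPU to avoid MPS memory issues.
# This monkeypatch hides MPS from any later call to torch.backends.mps.is_available();
# recent transformers releases also accept TrainingArguments(use_cpu=True) for the same purpose.
torch.backends.mps.is_available = lambda: False
device = torch.device("cpu")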

In [29]:
# Initialize trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    data_collator=data_collator,
)

print("Trainer initialized. Starting training...")
print("="*60)

# Start training
trainer.train()

print("\n" + "="*60)
print("Training completed!")

# Save the final model and tokenizer
print("\nSaving model and tokenizer...")
trainer.save_model(MODEL_SAVE_DIR)
tokenizer.save_pretrained(MODEL_SAVE_DIR)
print(f"Model saved to: {MODEL_SAVE_DIR}")

# Save training metrics
import json
train_metrics = trainer.state.log_history
with open(f"{MODEL_SAVE_DIR}/training_metrics.json", "w") as f:
    json.dump(train_metrics, f, indent=2)
print("Training metrics saved!")

Trainer initialized. Starting training...


| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | No log        | 3.800055        |
| 2     | No log        | 3.797337        |
| 3     | No log        | 3.791894        |



Training completed!

Saving model and tokenizer...
Model saved to: ./models/isuku_chatbot_llama
Training metrics saved!
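
The `No log` entries and the nearly flat validation loss probably follow from how few optimizer updates this run performs. The sketch below derives the update count from the configured values and assumes the default linear warmup schedule, lr(step) = peak_lr × step / warmup_steps; variable names other than the `TrainingArguments` fields are hypothetical.

```python
import math

# Values taken from the training configuration and the split output above.
num_train_samples = 8
per_device_batch_size = 1
gradient_accumulation_steps = 8
num_epochs = 3
warmup_steps = 50
logging_steps = 10
peak_lr = 2e-4

batches_per_epoch = math.ceil(num_train_samples / per_device_batch_size)
updates_per_epoch = max(1, batches_per_epoch // gradient_accumulation_steps)
total_updates = updates_per_epoch * num_epochs

# Assumption: linear warmup, so the rate after `total_updates` steps is still a fraction of the peak.
highest_lr_reached = peak_lr * min(total_updates, warmup_steps) / warmup_steps

print(f"Optimizer updates in total: {total_updates}")
print(f"Training loss is logged:    {total_updates >= logging_steps}")
print(f"Warmup completes:           {total_updates >= warmup_steps}")
print(f"Highest learning rate:      {highest_lr_reached:.2e} (peak {peak_lr:.0e})")
```

With this few samples, settings such as `logging_steps=1` and a much smaller `warmup_steps` would be worth trying, although 8 training examples are unlikely to be enough for meaningful fine-tuning either way.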


## 10. Model Evaluation and Testing

Evaluate the trained model on the test set and generate sample responses.


In [30]:
# Load the saved model for evaluation
from peft import PeftModel

print("Loading saved model for evaluation...")
base_model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    device_map="auto" if torch.cuda.is_available() else None,
    trust_remote_code=True
)

# Load the fine-tuned LoRA weights
trained_model = PeftModel.from_pretrained(base_model, MODEL_SAVE_DIR)
trained_model.eval()

# Load tokenizer
eval_tokenizer = AutoTokenizer.from_pretrained(MODEL_SAVE_DIR)
eval_tokenizer.pad_token = eval_tokenizer.eos_token

print("Model loaded successfully for evaluation!")


Loading saved model for evaluation...
Model loaded successfully for evaluation!


In [31]:
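# Evaluate on test set
# Note: trainer.evaluate() scores the in-memory model held by `trainer`, not the reloaded `trained_model`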
print("Evaluating on test set...")
print("="*60)

test_results = trainer.evaluate(eval_dataset=test_dataset)
print("\nTest Set Results:")
for key, value in test_results.items():
    if isinstance(value, float):
        print(f"  {key}: {value:.4f}")
    else:
        print(f"  {key}: {value}")

# Save test results
with open(f"{MODEL_SAVE_DIR}/test_results.json", "w") as f:
    json.dump(test_results, f, indent=2)
print(f"\nTest results saved to: {MODEL_SAVE_DIR}/test_results.json")


Evaluating on test set...



Test Set Results:
  eval_loss: 3.2555
  eval_runtime: 77.7701
  eval_samples_per_second: 0.0260
  eval_steps_per_second: 0.0260
  epoch: 3.0000

Test results saved to: ./models/isuku_chatbot_llama/test_results.json
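
The reported `eval_loss` is easier to interpret as perplexity. The sketch below assumes `eval_loss` is the mean per-token cross-entropy in nats, which is the usual convention for causal language models, and also cross-checks the reported throughput.

```python
import math

eval_loss = 3.2555      # eval_loss reported for the test set
eval_runtime = 77.7701  # seconds, from the same output
num_test_samples = 2

# Perplexity is the exponential of the mean per-token cross-entropy.
perplexity = math.exp(eval_loss)
samples_per_second = num_test_samples / eval_runtime

print(f"Test perplexity:    {perplexity:.1f}")
print(f"Samples per second: {samples_per_second:.3f}")
```

A perplexity computed from two test samples is a very noisy estimate and should not be read as a reliable quality measure.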


In [32]:
# Function to generate response from the model
def generate_response(model, tokenizer, question, max_length=256, temperature=0.7):
    """Generate a response to a given question (max_length caps the number of newly generated tokens)"""
    # Format the prompt
    prompt = f"### Instruction:\n{question}\n\n### Response:\n"
    
    # Tokenize
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    
    # Move inputs to the model's device (a no-op when both are already on CPU)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}
    
    # Generate
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=max_length,
            temperature=temperature,
            do_sample=True,
            top_p=0.9,
            pad_token_id=tokenizer.eos_token_id,
            eos_token_id=tokenizer.eos_token_id,
        )
    
    # Decode response
    full_response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    
    # Extract only the response part
    if "### Response:" in full_response:
        response = full_response.split("### Response:")[-1].strip()
    else:
        response = full_response[len(prompt):].strip()
    
    return response

print("Response generation function created!")


Response generation function created!
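
`generate_response` keeps everything after the last `### Response:` marker, so text from any extra turns the model invents ends up in the answer. A minimal sketch of a stricter post-processing step follows; `extract_response` is a hypothetical helper that keeps only the first answer and cuts it at the next template marker.

```python
RESPONSE_MARKER = "### Response:"
INSTRUCTION_MARKER = "### Instruction:"

def extract_response(full_text: str) -> str:
    """Return only the first answer that follows the response marker."""
    # Keep the text after the first response marker, if there is one.
    _, found, tail = full_text.partition(RESPONSE_MARKER)
    answer = tail if found else full_text
    # Cut the answer where the model starts a new instruction or response turn.
    for marker in (INSTRUCTION_MARKER, RESPONSE_MARKER):
        answer = answer.split(marker, 1)[0]
    return answer.strip()

decoded = (
    "### Instruction:\nHow do I sort my waste?\n\n"
    "### Response:\nSeparate organic and recyclable items.\n\n"
    "### Instruction:\nAnother question?\n\n### Response:\nAnother answer."
)
print(extract_response(decoded))
```

This does not stop generation early; it only trims the decoded text, so a smaller `max_new_tokens` value may still be needed to keep generation time down.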


## 11. Test Model with Sample Questions

Test the trained model with sample questions from the test set.


In [33]:
# Test with sample questions from test set
print("Testing model with sample questions from test set...")
print("="*60)

# Select a few diverse samples from test set
test_samples = test_df.sample(min(5, len(test_df)), random_state=42)

results = []
for idx, row in test_samples.iterrows():
    question = row['Question']
    expected_answer = row['Answer']
    language = row['Language']
    intent = row['Intent']
    
    print(f"\n{'='*60}")
    print(f"Language: {language} | Intent: {intent}")
    print(f"\nQuestion: {question}")
    print(f"\nExpected Answer: {expected_answer}")
    
    # Generate response
    generated_answer = generate_response(trained_model, eval_tokenizer, question)
    print(f"\nGenerated Answer: {generated_answer}")
    
    results.append({
        'question': question,
        'expected': expected_answer,
        'generated': generated_answer,
        'language': language,
        'intent': intent
    })

print("\n" + "="*60)
print("Sample testing completed!")

# Save test samples and results
test_samples_df = pd.DataFrame(results)
test_samples_df.to_csv(f"{MODEL_SAVE_DIR}/test_samples_results.csv", index=False)
print(f"\nTest samples and results saved to: {MODEL_SAVE_DIR}/test_samples_results.csv")


Testing model with sample questions from test set...

Language: French | Intent: payment

Question: Comment payer le service de collecte ?

Expected Answer: Le paiement se fait via mobile money dans l’application.

Generated Answer: Nous avons répondu à votre demande de service de collecte le jour de votre réception.

Pour plus d'informations, veuillez contacter notre équipe de réception.

Nous espérons voir votre réception dans les plus brefs délais.

Sincerely,

[Your Name]

[Your Title]

[Your Company Name]

[Your Company Address]

[Your City, State ZIP Code]

[Your Phone Number]

[Your Email Address]

[Your Website]

[Date]

Language: English | Intent: education

Question: Why is waste sorting important?

Expected Answer: It helps recycling, reduces pollution, and protects the environment.

Generated Answer: Waste sorting is important because it helps to reduce the amount of waste that ends up in landfills and improves the overall sustainability of our environment. Landfills are no
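
Comparing answers by eye does not scale beyond a few samples. One possible way to score them is a token-overlap F1 between the expected and generated text, sketched below. `token_f1` is a hypothetical helper, not part of the notebook, and the generated string is only an excerpt of the truncated output above.

```python
from collections import Counter

def token_f1(expected: str, generated: str) -> float:
    """Token-overlap F1 between two strings (lowercased, whitespace-split)."""
    expected_tokens = expected.lower().split()
    generated_tokens = generated.lower().split()
    if not expected_tokens or not generated_tokens:
        return 0.0
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(expected_tokens) & Counter(generated_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(generated_tokens)
    recall = overlap / len(expected_tokens)
    return 2 * precision * recall / (precision + recall)

expected = "It helps recycling, reduces pollution, and protects the environment."
generated = "Waste sorting is important because it helps to reduce the amount of waste"
print(f"Token F1: {token_f1(expected, generated):.2f}")
```

Whitespace tokenization is crude, especially across three languages, so the score is best treated as a rough signal for spotting off-topic answers.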

## 12. Interactive Testing

Test the model with your own custom questions.


In [34]:
# Interactive testing - modify the question below to test your model
custom_question = "How do I schedule a waste pickup?"

print("Testing with custom question:")
print(f"Question: {custom_question}")
print("\nGenerating response...")

response = generate_response(trained_model, eval_tokenizer, custom_question)
print(f"\nResponse: {response}")

# You can test more questions by running this cell again with different questions


Testing with custom question:
Question: How do I schedule a waste pickup?

Generating response...

Response: Sure! To schedule a waste pickup, please call the city's waste management hotline at 718-638-5575. The hotline is available 24/7 and can handle your request. Once you have scheduled your pickup, you will receive a notification via email or text message. If you have any questions or concerns, you can contact the hotline at any time.


## 13. Model Summary

Summary of the trained model and saved files.


In [35]:
# Print model summary and saved files
print("="*60)
print("MODEL TRAINING SUMMARY")
print("="*60)
print(f"\nModel Name: {MODEL_NAME}")
print(f"Model Save Directory: {MODEL_SAVE_DIR}")
print(f"\nDataset Statistics:")
print(f"  Total unique Q&A pairs: {len(df_unique)}")
print(f"  Training samples: {len(train_df)}")
print(f"  Validation samples: {len(val_df)}")
print(f"  Test samples: {len(test_df)}")

print(f"\nSaved Files in {MODEL_SAVE_DIR}:")
if os.path.exists(MODEL_SAVE_DIR):
    for file in os.listdir(MODEL_SAVE_DIR):
        file_path = os.path.join(MODEL_SAVE_DIR, file)
        if os.path.isfile(file_path):
            size = os.path.getsize(file_path) / (1024 * 1024)  # Size in MB
            print(f"  - {file} ({size:.2f} MB)")
        else:
            print(f"  - {file}/ (directory)")

print("\n" + "="*60)
print("Training and evaluation completed successfully!")
print("="*60)


MODEL TRAINING SUMMARY

Model Name: TinyLlama/TinyLlama-1.1B-Chat-v1.0
Model Save Directory: ./models/isuku_chatbot_llama

Dataset Statistics:
  Total unique Q&A pairs: 12
  Training samples: 8
  Validation samples: 2
  Test samples: 2

Saved Files in ./models/isuku_chatbot_llama:
  - adapter_model.safetensors (17.21 MB)
  - tokenizer_config.json (0.00 MB)
  - special_tokens_map.json (0.00 MB)
  - test_samples_results.csv (0.00 MB)
  - tokenizer.json (3.45 MB)
  - checkpoints/ (directory)
  - README.md (0.00 MB)
  - training_args.bin (0.01 MB)
  - training_metrics.json (0.00 MB)
  - adapter_config.json (0.00 MB)
  - chat_template.jinja (0.00 MB)
  - test_results.json (0.00 MB)

Training and evaluation completed successfully!
