# 🤖 Simple Jenkins Chatbot Fine-Tuning

**Minimal fine-tuning script for Llama-2 on Jenkins Q&A data**

### Requirements:
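- Google Colab with GPU runtime (T4 or better)
- Google Drive for saving the model
- Hugging Face account with access to the gated `meta-llama/Llama-2-7b-chat-hf` repository
- ~10GB GPU VRAM
- ~1-2 hours training time on a T4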

### What this does:
1. Installs required packages
2. Loads Jenkins Q&A data
3. Fine-tunes Llama-2-7b-chat with QLoRA (4-bit base model plus LoRA adapters)
4. Saves model to Google Drive


## Step 1: Install Dependencies


In [None]:
# Install required packages
%pip install -q accelerate==0.21.0 peft==0.4.0 bitsandbytes==0.40.2 transformers==4.31.0 trl==0.4.7 datasets

print("✓ Packages installed!")
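
Because the install line pins exact versions, it can help to confirm that the runtime actually picked them up before loading the model. The following is a minimal sketch using only the standard library; `PINNED` and `check_versions` are hypothetical names introduced here, and the version numbers are copied from the install command above.

```python
from importlib import metadata

# Pins copied from the %pip install line in this step
PINNED = {
    "accelerate": "0.21.0",
    "peft": "0.4.0",
    "bitsandbytes": "0.40.2",
    "transformers": "4.31.0",
    "trl": "0.4.7",
}

def check_versions(pins):
    """Return {package: (expected, installed)} for every mismatch or missing package."""
    problems = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            problems[name] = (expected, installed)
    return problems

mismatches = check_versions(PINNED)
for name, (expected, installed) in mismatches.items():
    print(f"{name}: expected {expected}, found {installed}")
if not mismatches:
    print("All pinned packages match.")
```

If a mismatch shows up in Colab, restarting the runtime after the install step is usually worth trying, since already-imported modules keep their old version.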


## Step 2: Mount Google Drive (to save model)


In [None]:
from google.colab import drive
drive.mount('/content/drive')

print("✓ Google Drive mounted!")


## Step 3: Clone Repository and Load Data


In [None]:
# Clone the repository with training data
!git clone https://github.com/YOUR_USERNAME/Enhancing-LLM-with-Jenkins-Knowledge.git
%cd Enhancing-LLM-with-Jenkins-Knowledge

print("✓ Repository cloned!")


## Step 4: Prepare Training Data


In [None]:
import pandas as pd
from datasets import Dataset

def format_instruction(question, answer):
    """Format Q&A in Llama-2 instruction format."""
    return f"<s>[INST] {question.strip()} [/INST] {answer.strip()} </s>"

# Load Jenkins datasets
print("Loading datasets...")

# Rows with a missing question or answer are dropped before formatting,
# because format_instruction calls .strip(), which fails on NaN values.

# Dataset 1: Stack Overflow
df1 = pd.read_csv('datasets/QueryResultsUpdated.csv').dropna(subset=['Question Body', 'Answer Body'])
df1['text'] = df1.apply(lambda x: format_instruction(x['Question Body'], x['Answer Body']), axis=1)

# Dataset 2: Jenkins Docs
df2 = pd.read_csv('datasets/Jenkins Docs QA.csv').dropna(subset=['Question', 'Answer'])
df2['text'] = df2.apply(lambda x: format_instruction(x['Question'], x['Answer']), axis=1)

# Dataset 3: Community Questions
df3 = pd.read_csv('datasets/Community Questions Refined.csv').dropna(subset=['questions', 'answers'])
df3['text'] = df3.apply(lambda x: format_instruction(x['questions'], x['answers']), axis=1)

# Combine all datasets
all_data = pd.concat([df1[['text']], df2[['text']], df3[['text']]], ignore_index=True)

# Remove very long examples (keep under 2000 chars)
all_data = all_data[all_data['text'].str.len() < 2000]

# Convert to HuggingFace Dataset (drop the pandas index left over from filtering)
dataset = Dataset.from_pandas(all_data, preserve_index=False)

print(f"\n✓ Loaded {len(dataset)} training examples")
print("\nSample example:")
print(dataset[0]['text'][:300] + "...")
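
To make the formatting and length filter concrete, here is a small standalone sketch that applies the same template to two made-up Q&A pairs. The pairs are hypothetical and not taken from the real datasets; the second one is deliberately too long so that the 2000-character filter removes it.

```python
def format_instruction(question, answer):
    """Same template as the notebook's format_instruction."""
    return f"<s>[INST] {question.strip()} [/INST] {answer.strip()} </s>"

# Hypothetical Q&A pairs, not taken from the real datasets
pairs = [
    ("  What is Jenkins?  ", "An open-source automation server.\n"),
    ("How do I list plugins?", "x" * 3000),  # deliberately too long
]

texts = [format_instruction(q, a) for q, a in pairs]

# Same rule as the notebook: keep only examples under 2000 characters
kept = [t for t in texts if len(t) < 2000]

print(kept[0])
print(f"kept {len(kept)} of {len(texts)} examples")
```

Note that the filter counts characters, while `max_seq_length` in the training step counts tokens, so an example that passes this filter can still be truncated during training.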


## Step 5: Load Model with 4-bit Quantization


In [None]:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Model to fine-tune (gated on the Hugging Face Hub: request access and log in first)
model_name = "meta-llama/Llama-2-7b-chat-hf"

print(f"Loading model: {model_name}")
print("This may take a few minutes...")

# 4-bit quantization config (saves memory)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load model
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

print("✓ Model and tokenizer loaded!")
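
The memory saving from 4-bit loading can be estimated with simple arithmetic. The sketch below is a rough back-of-the-envelope calculation, assuming roughly 6.74 billion parameters for Llama-2-7b; it covers the weights only and ignores quantization constants, layers that stay in higher precision, activations, and optimizer state, so real usage will be higher.

```python
# Approximate parameter count for Llama-2-7b (an assumption for illustration)
N_PARAMS = 6.74e9
GIB = 1024 ** 3

def weight_memory_gib(n_params, bits_per_param):
    """Memory needed for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / GIB

fp16_gib = weight_memory_gib(N_PARAMS, 16)
nf4_gib = weight_memory_gib(N_PARAMS, 4)

print(f"fp16 weights:  {fp16_gib:.1f} GiB")
print(f"4-bit weights: {nf4_gib:.1f} GiB")
```

The 4x reduction in weight memory is what leaves room on a T4 for activations, LoRA gradients, and optimizer state.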


## Step 6: Setup LoRA (Parameter-Efficient Fine-Tuning)


In [None]:
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Prepare model for training
model = prepare_model_for_kbit_training(model)

# LoRA configuration
lora_config = LoraConfig(
    r=16,                      # LoRA rank
    lora_alpha=16,             # LoRA alpha
    lora_dropout=0.05,         # Dropout
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Which layers to adapt
)

# Apply LoRA to model
model = get_peft_model(model, lora_config)

# Show trainable parameters
# Note: bitsandbytes stores 4-bit weights in packed form, so total_params undercounts
# the base model and the percentage printed below is an overestimate.
trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
total_params = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable_params:,} / {total_params:,}")
print(f"Percentage: {100 * trainable_params / total_params:.2f}%")
print("\n✓ LoRA adapters added!")
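
The trainable parameter count printed above can be cross-checked by hand. The following sketch assumes the usual Llama-2-7b shapes (32 decoder layers, hidden size 4096, square attention projections); those shapes are assumptions stated here, not values read from the loaded model.

```python
# Assumed Llama-2-7b shapes: 32 decoder layers, hidden size 4096,
# and square 4096x4096 attention projections.
NUM_LAYERS = 32
HIDDEN = 4096
TARGETS = ["q_proj", "k_proj", "v_proj", "o_proj"]
RANK = 16  # matches r=16 in the LoRA configuration

def lora_params(in_features, out_features, r):
    """LoRA adds A (r x in) and B (out x r) next to each adapted weight."""
    return r * in_features + out_features * r

per_module = lora_params(HIDDEN, HIDDEN, RANK)
total = per_module * len(TARGETS) * NUM_LAYERS

print(f"per adapted module: {per_module:,}")
print(f"total trainable:    {total:,}")
```

If the number printed by the notebook differs from this estimate, the most likely cause is a different `target_modules` list or rank.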


## Step 7: Train the Model


In [None]:
from transformers import TrainingArguments
from trl import SFTTrainer

# Where to save the model
output_dir = "/content/drive/MyDrive/jenkins-llama-model"

# Training configuration
training_args = TrainingArguments(
    output_dir=output_dir,
    num_train_epochs=3,                    # Passes over the data (more can help, but raises overfitting risk)
    per_device_train_batch_size=4,         # Batch size
    gradient_accumulation_steps=1,
    learning_rate=2e-4,                    # Learning rate
    logging_steps=25,                      # Log every N steps
    save_steps=100,                        # Save checkpoint every N steps
    warmup_ratio=0.03,
    lr_scheduler_type="cosine",
    optim="paged_adamw_32bit",
    fp16=True,                             # Use mixed precision
    report_to="none",                      # Disable wandb/tensorboard
)

# Create trainer
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=512,                    # Max sequence length
    tokenizer=tokenizer,
    args=training_args,
    packing=False,
)

print("Starting training...")
print(f"Total examples: {len(dataset)}")
print(f"Epochs: {training_args.num_train_epochs}")
print("This will take ~1-2 hours on a T4 GPU\n")

# Start training
trainer.train()

print("\n✓ Training complete!")
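
To see how the settings above translate into optimizer steps, warmup steps, and checkpoints, the arithmetic can be sketched as follows. This is an approximation of how the Hugging Face `Trainer` derives its step count on a single GPU, not a copy of its code; `schedule` is a hypothetical helper and the dataset size of 2,000 is made up for illustration.

```python
import math

def schedule(num_examples, batch_size, grad_accum, epochs, warmup_ratio):
    """Estimate optimizer steps and warmup steps for a single-GPU run."""
    batches_per_epoch = math.ceil(num_examples / batch_size)
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    total_steps = math.ceil(epochs * updates_per_epoch)
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return total_steps, warmup_steps

# 2,000 examples is a hypothetical dataset size; use len(dataset) in practice
total_steps, warmup_steps = schedule(
    2000, batch_size=4, grad_accum=1, epochs=3, warmup_ratio=0.03
)

print(f"optimizer steps: {total_steps}")
print(f"warmup steps: {warmup_steps}")
print(f"checkpoints at save_steps=100: {total_steps // 100}")
```

Each checkpoint is written to Google Drive, so a larger `save_steps` value reduces Drive usage on long runs.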


## Step 8: Save the Fine-Tuned Model


In [None]:
# Save model and tokenizer
final_model_path = "/content/drive/MyDrive/jenkins-llama-final"

print(f"Saving model to: {final_model_path}")
trainer.model.save_pretrained(final_model_path)
trainer.tokenizer.save_pretrained(final_model_path)

print("\n✓ Model saved successfully!")
print(f"\nYour fine-tuned model is saved at: {final_model_path}")
print("\nNext steps:")
print("1. Test the model")
print("2. Merge LoRA weights with base model")
print("3. Convert to GGUF format for deployment")
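
The "merge LoRA weights" step listed above could look roughly like the sketch below. It assumes the base model is reloaded in fp16 rather than 4-bit, uses `PeftModel.from_pretrained` and `merge_and_unload` from `peft`, and needs enough free GPU memory for the fp16 weights, which in Colab typically means a fresh runtime. `merge_lora_adapter` and the merged output path are hypothetical names introduced for illustration.

```python
def merge_lora_adapter(base_model_name, adapter_path, merged_path):
    """Load the fp16 base model, fold the LoRA adapter into it, and save the result."""
    # Imports are local so that defining this helper does not require a GPU runtime
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    base = AutoModelForCausalLM.from_pretrained(
        base_model_name,
        torch_dtype=torch.float16,
        device_map="auto",
    )
    model = PeftModel.from_pretrained(base, adapter_path)
    merged = model.merge_and_unload()  # returns the base model with LoRA weights folded in

    merged.save_pretrained(merged_path)
    AutoTokenizer.from_pretrained(base_model_name).save_pretrained(merged_path)
    return merged_path

# Example call (the output path is hypothetical; run in a fresh runtime):
# merge_lora_adapter(
#     "meta-llama/Llama-2-7b-chat-hf",
#     "/content/drive/MyDrive/jenkins-llama-final",
#     "/content/drive/MyDrive/jenkins-llama-merged",
# )
```

The merged directory is a standalone model, which is the form that conversion tools such as llama.cpp expect as input.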


## Step 9: Test the Model (Optional)


In [None]:
# Quick test
model.eval()  # switch off dropout (and training-only behavior) for inference

def test_model(question):
    """Test the fine-tuned model with a question."""
    prompt = f"<s>[INST] {question} [/INST]"
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")

    # Gradients are not needed for generation
    with torch.no_grad():
        outputs = model.generate(
            **inputs,
            max_new_tokens=200,
            temperature=0.7,
            do_sample=True,
        )

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    # Extract just the answer part
    answer = response.split("[/INST]")[-1].strip()
    return answer

# Test with some questions
test_questions = [
    "What is Jenkins?",
    "How do I install Jenkins?",
    "What is a Jenkins pipeline?",
]

print("Testing the model:\n")
for question in test_questions:
    print(f"Q: {question}")
    answer = test_model(question)
    print(f"A: {answer}\n")
    print("-" * 80 + "\n")
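
The answer extraction inside `test_model` relies on splitting the decoded text at the `[/INST]` marker. The standalone sketch below shows that step on a made-up decoded string; `extract_answer` is a hypothetical helper and the string is not a real model response.

```python
def extract_answer(decoded):
    """Return the text after the last [/INST] marker, stripped of whitespace."""
    return decoded.split("[/INST]")[-1].strip()

# Hypothetical decoded output, not a real model response
decoded = "[INST] What is Jenkins? [/INST] Jenkins is an automation server. "
print(extract_answer(decoded))

# With no marker present, the whole string comes back (apart from stripping)
print(extract_answer("  no marker here  "))
```

Because the split takes the last segment, a generated answer that itself contains `[/INST]` would be cut short, which is worth keeping in mind when reading test output.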


---

## 🎉 Done!

Your Jenkins-specialized Llama-2 model is now fine-tuned!

### What you have:
- ✅ Fine-tuned LoRA weights saved to Google Drive
- ✅ Model specialized for Jenkins questions
- ✅ Adapter weights only (tens of MB at this LoRA rank, not the full ~13GB fp16 model)

### To use in production:
1. **Merge LoRA weights** with base model to create standalone model
2. **Convert and quantize to GGUF** format for CPU inference (roughly 4GB at 4-bit or 7GB at 8-bit for a 7B model)
3. **Deploy** with Flask backend from this repo

### Resources:
- [Full Documentation](https://github.com/YOUR_USERNAME/Enhancing-LLM-with-Jenkins-Knowledge)
- [Convert to GGUF Guide](https://github.com/ggerganov/llama.cpp)
- [HuggingFace Model Upload](https://huggingface.co/docs/hub/models-uploading)
