<a href="https://colab.research.google.com/github/alixa2003/AI-ML-Internship-Tasks-Month2/blob/main/DHC_Task5_Final_Auto_Tagging_Systems.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

#**Task 5: Auto-Tagging System Using Few-Shot & Zero-Shot Prompting**

##**Importing Libraries**

In [None]:
import pandas as pd
import google.generativeai as genai
import os
import random
import time

##**Data Loading and Preprocessing**

In [2]:
def load_and_preprocess_data(file_path):
    print(f"Loading dataset from: {file_path}")

    # 1. Load the CSV file
    df = pd.read_csv(file_path, encoding='latin1')

    # 2. Combine Subject and Body
    # We combine them to give the LLM the full context of the user's issue.
    df['text'] = df['subject'].fillna('') + " \n " + df['body'].fillna('')

    # 3. Extract Tags
    # Tags are spread across columns 'tag_1' to 'tag_8'. We need to collect them into a single list.
    tag_cols = [f'tag_{i}' for i in range(1, 9)]

    def collect_tags(row):
        # List comprehension to get values that are not Null/NaN
        tags = [str(row[col]) for col in tag_cols if pd.notna(row[col])]
        return list(set(tags)) # Remove duplicates if any

    df['actual_tags'] = df.apply(collect_tags, axis=1)

    # 4. Get the Master List of Allowed Tags
    # We need a unique list of ALL possible tags to tell the LLM what it can choose from.
    all_unique_tags = sorted(list(set([tag for tags in df['actual_tags'] for tag in tags])))

    print("✅ Data Loaded Successfully!")
    print(f"Total Tickets: {len(df)}")
    print(f"Total Unique Tags: {len(all_unique_tags)}")
    print(f"Sample Tags: {', '.join(all_unique_tags[:10])}...")

    return df, all_unique_tags

# --- Execution ---
if __name__ == "__main__":
    filename = '/content/aa_dataset-tickets-multi-lang-5-2-50-version.csv'

    # Run the function
    df_processed, allowed_tags = load_and_preprocess_data(filename)

    # Display first few rows to verify
    print("\n--- Processed Data Sample ---")
    print(df_processed[['text', 'actual_tags']].head(3))

Loading dataset from: /content/aa_dataset-tickets-multi-lang-5-2-50-version.csv
✅ Data Loaded Successfully!
Total Tickets: 28587
Total Unique Tags: 1255
Sample Tags: AI, API, API Integration, AR, AWS, Abrechnungssystem, Access, Access Control, Access Difficulty, Access Issue...

--- Processed Data Sample ---
                                                text  \
0  Wesentlicher Sicherheitsvorfall \n Sehr geehrt...   
1  Account Disruption \n Dear Customer Support Te...   
2  Query About Smart Home System Integration Feat...   

                                       actual_tags  
0      [Outage, Disruption, Security, Data Breach]  
1  [Tech Support, Disruption, Account, Outage, IT]  
2                 [Tech Support, Feature, Product]  


We load the raw CSV data and combine the subject and body columns so the model sees the full context of the user's issue. We also consolidate the scattered tag_1 through tag_8 columns into a single actual_tags list for each ticket. Finally, we extract allowed_tags, the 'universe' of valid categories the model must choose from (1,255 unique tags in this dataset).
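
To make the consolidation step concrete, here is a minimal standard-library sketch that applies the same three steps (combine text, collect tags, build the allowed list) to a toy CSV. The toy rows, the three-column `TAG_COLS`, and the `consolidate` helper are illustrative assumptions and are not part of the notebook; the real file has tag_1 through tag_8.

```python
import csv
import io

# Toy stand-in for the ticket CSV (assumption: made-up rows, only 3 tag columns).
TOY_CSV = """subject,body,tag_1,tag_2,tag_3
VPN Access Issue,Cannot connect since Monday.,VPN,Network,
,Printer offline again,Hardware,,Hardware
"""

TAG_COLS = ["tag_1", "tag_2", "tag_3"]


def consolidate(row):
    """Return (text, tags) for one CSV row given as a dict of strings."""
    # Same separator as the notebook: subject + " \n " + body.
    text = (row["subject"] or "") + " \n " + (row["body"] or "")
    # dict.fromkeys drops duplicates while keeping first-seen order;
    # empty cells are skipped.
    tags = list(dict.fromkeys(row[c] for c in TAG_COLS if row[c]))
    return text, tags


toy_rows = [consolidate(r) for r in csv.DictReader(io.StringIO(TOY_CSV))]
toy_allowed_tags = sorted({tag for _, tags in toy_rows for tag in tags})

for text, tags in toy_rows:
    print(repr(text), tags)
print(toy_allowed_tags)
```

Unlike `list(set(tags))`, the `dict.fromkeys` idiom keeps the tags in a stable order, which can make printed samples easier to compare between runs.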

In [5]:
# Set your Gemini API key here before running (the value is blank in this copy)
os.environ["GEMINI_API_KEY"] = ""
genai.configure(api_key=os.environ["GEMINI_API_KEY"])

print("Checking available models...")
for m in genai.list_models():
    if 'generateContent' in m.supported_generation_methods:
        print(m.name)

Checking available models...
models/gemini-2.5-flash
models/gemini-2.5-pro
models/gemini-2.0-flash-exp
models/gemini-2.0-flash
models/gemini-2.0-flash-001
models/gemini-2.0-flash-exp-image-generation
models/gemini-2.0-flash-lite-001
models/gemini-2.0-flash-lite
models/gemini-2.0-flash-lite-preview-02-05
models/gemini-2.0-flash-lite-preview
models/gemini-exp-1206
models/gemini-2.5-flash-preview-tts
models/gemini-2.5-pro-preview-tts
models/gemma-3-1b-it
models/gemma-3-4b-it
models/gemma-3-12b-it
models/gemma-3-27b-it
models/gemma-3n-e4b-it
models/gemma-3n-e2b-it
models/gemini-flash-latest
models/gemini-flash-lite-latest
models/gemini-pro-latest
models/gemini-2.5-flash-lite
models/gemini-2.5-flash-image-preview
models/gemini-2.5-flash-image
models/gemini-2.5-flash-preview-09-2025
models/gemini-2.5-flash-lite-preview-09-2025
models/gemini-3-pro-preview
models/gemini-3-flash-preview
models/gemini-3-pro-image-preview
models/nano-banana-pro-preview
models/gemini-robotics-er-1.5-preview
models

In [6]:
os.environ["GEMINI_API_KEY"] = ""
genai.configure(api_key=os.environ["GEMINI_API_KEY"])

model = genai.GenerativeModel('gemini-2.5-flash')

def get_zeroshot_prompt(ticket_text, all_tags_list):
    """
    Zero-Shot: Asks the model to tag without seeing any ticket examples.
    """
    return f"""
    Role: You are an automated support ticket tagging system.
    Task: Identify the top 3 most relevant tags for the ticket below.

    Constraints:
    1. Output ONLY a comma-separated list of tags. No explanations.
    2. Select tags ONLY from the 'Allowed Tags' list provided.

    Allowed Tags:
    {", ".join(all_tags_list)}

    Ticket:
    "{ticket_text}"

    Output:
    """

def get_fewshot_prompt(ticket_text, all_tags_list, df_examples):
    """
    Few-Shot: Includes 3 real examples of correctly tagged tickets to guide the model.
    """
    examples_str = ""

    sample_rows = df_examples.sample(3)

    for _, row in sample_rows.iterrows():
        tags_str = ", ".join(row['actual_tags'])

        examples_str += f'Ticket: "{row["text"][:200]}..."\nTags: {tags_str}\n---\n'

    return f"""
    Role: You are an automated support ticket tagging system.
    Task: Identify the top 3 most relevant tags for the ticket below.

    Constraints:
    1. Output ONLY a comma-separated list of tags.
    2. Select tags ONLY from the 'Allowed Tags' list.
    3. Learn from the examples provided below.

    Allowed Tags:
    {", ".join(all_tags_list)}

    Examples:
    {examples_str}

    Target Ticket:
    "{ticket_text}"

    Output:
    """

# --- EXECUTION LOOP ---

def run_comparison(df, tags_list, num_samples=3):
    print(f"--- Running Comparison on {num_samples} Tickets ---\n")


    test_set = df.iloc[10:10+num_samples]

    results = []

    for idx, row in test_set.iterrows():
        ticket_text = row['text']
        actual = row['actual_tags']

        print(f"🎫 Ticket ID {idx}: {ticket_text[:60]}...")
        print(f"   ✅ Ground Truth: {actual}")

        # 1. Zero-Shot
        try:
            p_zero = get_zeroshot_prompt(ticket_text, tags_list)
            resp_zero = model.generate_content(p_zero)
            pred_zero = [t.strip() for t in resp_zero.text.split(',')]
            print(f"   🔹 Zero-Shot:    {pred_zero}")
        except Exception as e:
            pred_zero = ["Error"]
            print(f"   🔹 Zero-Shot:    Error ({e})")

        # 2. Few-Shot (random examples drawn from df, excluding the target ticket)
        try:
            p_few = get_fewshot_prompt(ticket_text, tags_list, df.drop(index=idx))
            resp_few = model.generate_content(p_few)
            pred_few = [t.strip() for t in resp_few.text.split(',')]
            print(f"   🔸 Few-Shot:     {pred_few}")
        except Exception as e:
            pred_few = ["Error"]
            print(f"   🔸 Few-Shot:     Error ({e})")

        # Keep the predictions so they can be inspected or scored later
        results.append({'ticket_id': idx, 'actual': actual,
                        'zero_shot': pred_zero, 'few_shot': pred_few})

        print("-" * 50)
        time.sleep(4)  # Brief pause between tickets (e.g. to respect API rate limits)

    return results

if __name__ == "__main__":

    if 'df_processed' in locals() and 'allowed_tags' in locals():
        run_comparison(df_processed, allowed_tags, num_samples=3)
    else:
        print("Error: Please run the Data Loading step first.")

--- Running Comparison on 3 Tickets ---

🎫 Ticket ID 10: VPN Access Issue 
 Customer Support,\n\nWe are encountering ...
   ✅ Ground Truth: ['Tech Support', 'Disruption', 'Network', 'VPN']
   🔹 Zero-Shot:    ['VPN', 'Access Issue', 'Connectivity']
   🔸 Few-Shot:     ['VPN', 'Access Issue', 'Technical Support']
--------------------------------------------------
🎫 Ticket ID 11: Issue with SaaS Platform Functionality 
 Sehr geehrtes Suppo...
   ✅ Ground Truth: ['Disruption', 'Feature', 'Bug', 'Performance']
   🔹 Zero-Shot:    ['SaaS Platform', 'Performance', 'Disruption']
   🔸 Few-Shot:     ['SaaS Platform', 'Performance', 'Technical Support']
--------------------------------------------------
🎫 Ticket ID 12: Immediate Help Needed: Technical Problem with Cloud SaaS Ser...
   ✅ Ground Truth: ['Crash', 'Tech Support', 'Bug', 'Network', 'Disruption', 'Performance', 'Outage']
   🔹 Zero-Shot:    ['Technical Problem', 'Cloud SaaS', 'Connectivity Issue']
   🔸 Few

We tested two prompting strategies using the Gemini 2.5 Flash model:

* Zero-Shot: We provide the model with the list of allowed tags and the ticket text, relying entirely on its pre-trained knowledge to classify the issue.

* Few-Shot: We dynamically inject 3 real examples of tagged tickets into the prompt. This 'In-Context Learning' teaches the model the specific tagging style and logic of our dataset without updating the model's weights.
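
The comparison loop prints the predictions but does not score them or check them against the allowed list. The sketch below shows one possible way to do both for a single ticket. The helper name `score_prediction` and the hand-picked `allowed_subset` are assumptions made for illustration; the ticket data is copied from the Ticket ID 10 output, and "Connectivity" is treated as outside the schema, as the notebook's closing remarks describe.

```python
def score_prediction(predicted, actual, allowed):
    """Split predicted tags into valid/invented and compute precision and recall."""
    allowed = set(allowed)
    predicted = list(dict.fromkeys(predicted))  # drop repeats, keep order
    valid = [t for t in predicted if t in allowed]
    invented = [t for t in predicted if t not in allowed]
    hits = set(valid) & set(actual)
    # Precision is measured over everything the model returned,
    # so invented tags count against it.
    precision = len(hits) / len(predicted) if predicted else 0.0
    recall = len(hits) / len(actual) if actual else 0.0
    return {
        "valid": valid,
        "invented": invented,
        "precision": precision,
        "recall": recall,
    }


# Hand-picked subset of the allowed tags, for illustration only.
allowed_subset = ["Tech Support", "Disruption", "Network", "VPN", "Access Issue"]

# Ticket ID 10 from the run above (zero-shot prediction).
actual = ["Tech Support", "Disruption", "Network", "VPN"]
zero_shot = ["VPN", "Access Issue", "Connectivity"]

result = score_prediction(zero_shot, actual, allowed_subset)
print(result)
```

Here only "VPN" matches the ground truth, so precision is 1/3 and recall is 1/4. Averaging these values over a larger test set would give a fairer comparison of the two prompting strategies than reading three tickets by eye.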

In [10]:
import pandas as pd
import numpy as np
import torch
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.model_selection import train_test_split
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer
from datasets import Dataset

print("--- Loading Data for Fine-Tuning ---")
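df = pd.read_csv('/content/aa_dataset-tickets-multi-lang-5-2-50-version.csv', encoding='latin1')

# Combine subject and body into a single text field
df['text'] = df['subject'].fillna('') + " " + df['body'].fillna('')

# Collect the non-null tag_1 ... tag_8 values into one list per ticket
tag_cols = [f'tag_{i}' for i in range(1, 9)]
df['tags'] = df[tag_cols].apply(lambda x: [str(t) for t in x if pd.notna(t)], axis=1)

# Drop tickets that have no tags at all
df = df[df['tags'].map(len) > 0]

# Work with a reproducible random sample of 2,000 tickets
df = df.sample(2000, random_state=42)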

mlb = MultiLabelBinarizer()
labels_matrix = mlb.fit_transform(df['tags'])
label_list = mlb.classes_

print(f"Training on {len(df)} tickets.")
print(f"Number of unique tags: {len(label_list)}")

dataset = Dataset.from_dict({
    'text': df['text'].tolist(),
    'labels': [x.astype(float) for x in labels_matrix]
})

# Split Train/Test
dataset = dataset.train_test_split(test_size=0.2)

model_id = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)

def preprocess_function(examples):
    return tokenizer(examples["text"], truncation=True, padding="max_length", max_length=128)

print("Tokenizing data...")
tokenized_datasets = dataset.map(preprocess_function, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=len(label_list),
    problem_type="multi_label_classification"
)

training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    eval_strategy="epoch",
    save_strategy="no",
    fp16=torch.cuda.is_available(),
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    tokenizer=tokenizer,
)

print("\n--- Starting Training (This may take minutes) ---")
trainer.train()

print("\n--- Evaluation ---")
# Predict on Test Set
predictions = trainer.predict(tokenized_datasets["test"])
preds = torch.sigmoid(torch.tensor(predictions.predictions))

pred_tags = []
for p in preds:
    # Keep every tag whose sigmoid probability exceeds the 0.3 threshold
    indices = (p > 0.3).nonzero(as_tuple=True)[0]
    pred_tags.append([label_list[i] for i in indices])

print("\n--- Fine-Tuned Results Sample ---")
for i in range(3):
    print(f"Ticket: {tokenized_datasets['test'][i]['text'][:60]}...")
    print(f"Predicted: {pred_tags[i]}")
    print("-" * 30)

--- Loading Data for Fine-Tuning ---
Training on 2000 tickets.
Number of unique tags: 331
Tokenizing data...


Some weights of XLMRobertaForSequenceClassification were not initialized from the model checkpoint at xlm-roberta-base and are newly initialized: ['classifier.dense.bias', 'classifier.dense.weight', 'classifier.out_proj.bias', 'classifier.out_proj.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.



--- Starting Training (This may take minutes) ---



Epoch  Training Loss  Validation Loss
1      No log         0.112054
2      No log         0.069966
3      0.174000       0.063555



--- Evaluation ---



--- Fine-Tuned Results Sample ---
Ticket: Verschlusssicherung medizinischer Daten-Systeme Brauche Info...
Predicted: ['IT', 'Performance', 'Tech Support']
------------------------------
Ticket: Verschlüsselung medizinischer Daten in PostgreSQL Sehr geehr...
Predicted: ['IT', 'Performance', 'Tech Support']
------------------------------
Ticket: Performance Issue in Analytics System The analytics platform...
Predicted: ['IT', 'Performance', 'Tech Support']
------------------------------


We fine-tune a multilingual Transformer model (xlm-roberta-base) on a 2,000-ticket sample of our dataset. Unlike the LLM approach, which generates tags as free text, this model updates its internal weights to map text patterns directly to our specific tags. We use Multi-Label Binarization to turn each ticket's tag list into a fixed-length 0/1 vector, so a single ticket can carry several tags at once.
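
As a rough illustration of what the binarization and the later thresholding step do, the following pure-Python sketch mirrors the idea behind `MultiLabelBinarizer` and the sigmoid cut-off used in the evaluation cell. The helper names (`fit_classes`, `encode`, `decode`), the toy tickets, and the example logits are assumptions made up for this sketch; only the 0.3 threshold comes from the notebook.

```python
import math


def fit_classes(tag_lists):
    """Sorted list of every distinct tag (one output position per tag)."""
    return sorted({t for tags in tag_lists for t in tags})


def encode(tags, classes):
    """Multi-hot vector: 1.0 where the ticket carries the tag, else 0.0."""
    present = set(tags)
    return [1.0 if c in present else 0.0 for c in classes]


def decode(logits, classes, threshold=0.3):
    """Apply a sigmoid to each logit and keep tags above the threshold."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [c for c, p in zip(classes, probs) if p > threshold]


# Toy tickets (made up for illustration).
tickets = [["Tech Support", "VPN"], ["Bug", "Performance"], ["Tech Support"]]
classes = fit_classes(tickets)
targets = [encode(t, classes) for t in tickets]

# Hypothetical raw model outputs for one ticket, one logit per class.
example_logits = [-3.0, 0.5, 2.0, -1.0]
decoded = decode(example_logits, classes)

print(classes)
print(targets[0])
print(decoded)
```

Because each tag gets its own independent sigmoid, any number of tags (including none) can pass the threshold for a ticket. That is what distinguishes this multi-label setup from a single-label softmax classifier.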

##**Saving The Model**

In [11]:
save_directory = "./saved_ticket_model"
model.save_pretrained(save_directory)
tokenizer.save_pretrained(save_directory)
print(f"Model saved to {save_directory}")

Model saved to ./saved_ticket_model


**Final Insights: LLM Prompting vs. Fine-Tuning**

Semantic Understanding vs. Strict Compliance:

* Gemini (LLM) showed a better grasp of the content in the sampled tickets. It correctly identified that a ticket was about "Connectivity" or "SaaS". However, it often "hallucinated" new tags that were semantically reasonable but didn't exist in our database (e.g., predicting Connectivity instead of Network), despite the instruction to choose only from the allowed list.

* Fine-Tuning demonstrated strict adherence to the schema. It cannot invent a tag, because its output layer has exactly one position per known tag. However, due to class imbalance in the training data, it became conservative and returned the same common tags (IT, Performance, Tech Support) for all three sampled tickets.

The "Hybrid" Solution is Best:

* For a production system, a fine-tuned model is the safer choice because it guarantees valid outputs. To fix the repetitive predictions seen in this experiment, we would need to train on the full dataset (not just 2,000 rows) and use class weights to penalize the model for ignoring rare tags.

* Both approaches processed the mixed English/German dataset without needing a translation step. XLM-RoBERTa is pre-trained on multilingual text for exactly this purpose, while Gemini handles multiple languages natively thanks to its large pre-training corpus.
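
One common way to realize the class-weight idea mentioned above is to give each tag a positive-class weight equal to its ratio of negative to positive examples, which is the convention expected by the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss`. The sketch below only computes those weights in plain Python. The helper name `positive_weights`, the toy tag lists, and the `max_weight` cap are assumptions for illustration, and wiring the result into the `Trainer` loss is left out.

```python
from collections import Counter


def positive_weights(tag_lists, classes, max_weight=50.0):
    """Per-tag weight = (#tickets without the tag) / (#tickets with the tag).

    max_weight is an assumed cap so that extremely rare tags do not
    dominate the loss; tags that never occur receive the cap.
    """
    n = len(tag_lists)
    counts = Counter(t for tags in tag_lists for t in set(tags))
    weights = []
    for c in classes:
        pos = counts.get(c, 0)
        if pos == 0:
            weights.append(max_weight)
        else:
            weights.append(min((n - pos) / pos, max_weight))
    return weights


# Toy training labels (made up): "IT" is frequent, "VPN" is rare.
toy_tag_lists = [["IT", "Tech Support"], ["IT"], ["IT", "VPN"], ["Tech Support"]]
toy_classes = ["IT", "Tech Support", "VPN"]

weights = positive_weights(toy_tag_lists, toy_classes)
for tag, w in zip(toy_classes, weights):
    print(f"{tag}: {w:.3f}")
```

In this toy data the frequent tag "IT" gets a weight below 1 while the rare tag "VPN" gets a weight of 3, so missing a rare tag would cost more than missing a common one. Whether this removes the repetitive predictions seen in the experiment would need to be checked on the full dataset.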