<a href="https://colab.research.google.com/github/agdev/Routing/blob/main/Fine_tuning_Classification_for_Routing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

I want to compare two ways to route incoming chat messages:
1. A fine-tuned model that classifies the input
2. Semantic Router

We will start with the machine-learning approach: the fine-tuned classifier.

# **Installing dependencies**

In [1]:
!pip install datasets --quiet
!pip install transformers --quiet
!pip install transformers[torch] --quiet
!pip install accelerate -U --quiet
!pip install evaluate --quiet
!pip install torch --quiet
# !pip install sentencepiece --quiet
# !pip install rouge_score --quiet
# !pip install rouge --quiet

# **Importing Libraries**

In [2]:
import pandas as pd
import numpy as np
from pprint import pprint
from huggingface_hub import login
from google.colab import userdata
import evaluate
import torch

In [3]:
# Read the Hugging Face token from the Colab secret named 'HuggingFace'
hf_api_key = userdata.get('HuggingFace')
login(token=hf_api_key, add_to_git_credential=True)

Token is valid (permission: write).
Your token has been saved in your configured git credential helpers (store).
Your token has been saved to /root/.cache/huggingface/token
Login successful


# **Dataset**

In [4]:
from datasets import load_dataset

dataset_name = "bitext/Bitext-customer-support-llm-chatbot-training-dataset"
# Load the dataset
base_ds = load_dataset(dataset_name, split="train")
# set columns
pprint(base_ds.features)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Generating train split:   0%|          | 0/26872 [00:00<?, ? examples/s]

{'category': Value(dtype='string', id=None),
 'flags': Value(dtype='string', id=None),
 'instruction': Value(dtype='string', id=None),
 'intent': Value(dtype='string', id=None),
 'response': Value(dtype='string', id=None)}


In [5]:
pprint(base_ds[:10])

{'category': ['ORDER',
              'ORDER',
              'ORDER',
              'ORDER',
              'ORDER',
              'ORDER',
              'ORDER',
              'ORDER',
              'ORDER',
              'ORDER'],
 'flags': ['B', 'BQZ', 'BLQZ', 'BL', 'BCELN', 'BI', 'BCLN', 'BL', 'BL', 'BLQ'],
 'instruction': ['question about cancelling order {{Order Number}}',
                 'i have a question about cancelling oorder {{Order Number}}',
                 'i need help cancelling puchase {{Order Number}}',
                 'I need to cancel purchase {{Order Number}}',
                 'I cannot afford this order, cancel purchase {{Order Number}}',
                 'can you help me cancel order {{Order Number}}?',
                 'I can no longer afford order {{Order Number}}, cancel it',
                 'I am trying to cancel purchase {{Order Number}}',
                 'I have got to cancel purchase {{Order Number}}',
                 'i need help canceling purchase {

# **Exploring the dataset**

In [6]:
base_df = base_ds.to_pandas()
base_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 26872 entries, 0 to 26871
Data columns (total 5 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   flags        26872 non-null  object
 1   instruction  26872 non-null  object
 2   category     26872 non-null  object
 3   intent       26872 non-null  object
 4   response     26872 non-null  object
dtypes: object(5)
memory usage: 1.0+ MB


In [7]:
base_df['category'].value_counts() # -> Pretty heavily concentrated on ACCOUNT, ORDER and REFUND categories.

category,count
ACCOUNT,5986
ORDER,3988
REFUND,2992
INVOICE,1999
CONTACT,1999
PAYMENT,1998
FEEDBACK,1997
DELIVERY,1994
SHIPPING,1970
SUBSCRIPTION,999
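
The imbalance noted above can be quantified with inverse-frequency class weights. The sketch below is not part of the original notebook: it uses only the counts listed in the output above, and the helper name `inverse_frequency_weights` is hypothetical. Such weights could be fed into a weighted loss if the minority classes turned out to be poorly predicted.

```python
# Category counts copied from the value_counts() output above.
category_counts = {
    "ACCOUNT": 5986, "ORDER": 3988, "REFUND": 2992, "INVOICE": 1999,
    "CONTACT": 1999, "PAYMENT": 1998, "FEEDBACK": 1997, "DELIVERY": 1994,
    "SHIPPING": 1970, "SUBSCRIPTION": 999,
}


def inverse_frequency_weights(counts):
    """Hypothetical helper: weight = total / (num_classes * count)."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


class_weights = inverse_frequency_weights(category_counts)

# Print from the most common (lowest weight) to the rarest (highest weight).
for label, weight in sorted(class_weights.items(), key=lambda kv: kv[1]):
    print(f"{label:<12} {weight:.2f}")
```

With this scheme a perfectly balanced dataset would give every class a weight of 1.0, so values well above 1.0 flag under-represented categories.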


# **Converting labels to numeric**

In [22]:
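# Assign each category an integer id, in order of first appearance
numeric_labels, unique_labels = pd.factorize(base_df['category'].unique())
# Category name -> integer id (used as the training label)
label_mapping = {label: int(numeric_label) for label, numeric_label in zip(unique_labels, numeric_labels)}
# Integer id as a string key -> category name
id_to_label = {str(numeric_label): label for label, numeric_label in label_mapping.items()}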

In [23]:
label_mapping

{'ORDER': 0,
 'SHIPPING': 1,
 'CANCEL': 2,
 'INVOICE': 3,
 'PAYMENT': 4,
 'REFUND': 5,
 'FEEDBACK': 6,
 'CONTACT': 7,
 'ACCOUNT': 8,
 'DELIVERY': 9,
 'SUBSCRIPTION': 10}

In [24]:
id_to_label

{'0': 'ORDER',
 '1': 'SHIPPING',
 '2': 'CANCEL',
 '3': 'INVOICE',
 '4': 'PAYMENT',
 '5': 'REFUND',
 '6': 'FEEDBACK',
 '7': 'CONTACT',
 '8': 'ACCOUNT',
 '9': 'DELIVERY',
 '10': 'SUBSCRIPTION'}
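
The mapping above depends on the order in which categories first appear in the data. As an alternative sketch (an assumption, not what this notebook does), the ids can be derived from the sorted label names so that they stay stable across shuffles, with integer keys in the reverse map. The function name `build_label_maps` is hypothetical.

```python
def build_label_maps(labels):
    """Build label->id and id->label maps with ids assigned alphabetically."""
    unique = sorted(set(labels))
    label2id = {label: idx for idx, label in enumerate(unique)}
    id2label = {idx: label for label, idx in label2id.items()}
    return label2id, id2label


# Tiny illustrative sample, not the real dataset.
sample = ["ORDER", "SHIPPING", "ORDER", "ACCOUNT"]
label2id, id2label = build_label_maps(sample)

print(label2id)
print(id2label)
```

Keeping integer keys in `id2label` avoids the `str(...)` conversion at lookup time, because `argmax` returns an integer class id.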

# **Loading Model**

In [25]:
from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

model_name: str = "vineetsharma/customer-support-intent-albert"
# Load the model
model = AutoModelForSequenceClassification.from_pretrained(model_name,
                                                           num_labels=len(id_to_label),
                                                           ignore_mismatched_sizes=True)
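# transformers conventionally keys id2label by integer id, so convert the string keys
model.config.id2label = {int(idx): label for idx, label in id_to_label.items()}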
model.config.label2id = label_mapping

Some weights of AlbertForSequenceClassification were not initialized from the model checkpoint at vineetsharma/customer-support-intent-albert and are newly initialized because the shapes did not match:
- classifier.bias: found shape torch.Size([27]) in the checkpoint and torch.Size([11]) in the model instantiated
- classifier.weight: found shape torch.Size([27, 768]) in the checkpoint and torch.Size([11, 768]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


In [26]:
import torch

# Set the device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Move the model to the device
model.to(device)

def classify_text(text):
    """Return (predicted class id, confidence) for a single input string.

    Relies on the global `tokenizer`, which is loaded in a later cell,
    so run that cell before calling this function.
    """
    # Tokenize the input text
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)

    # Move inputs to the same device as the model
    inputs = {key: value.to(device) for key, value in inputs.items()}

    # Get model predictions without tracking gradients (inference only)
    model.eval()
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    probabilities = torch.nn.functional.softmax(logits, dim=-1)

    # Get the predicted class
    predicted_class = torch.argmax(probabilities, dim=-1).item()

    # Return the class and confidence
    return predicted_class, probabilities[0][predicted_class].item()
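
To make the "confidence" value concrete, here is a dependency-free sketch of the softmax and argmax step that `classify_text` performs on the logits. The logits are made up for illustration. With 11 classes, a head that knows nothing produces a confidence near 1/11, which helps when reading the low confidences in the pre-training test later in this notebook.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_prediction(logits):
    """Return (class id, probability) of the most likely class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return best, probs[best]


num_classes = 11

# Identical logits: every class gets probability 1/11.
uniform_logits = [0.0] * num_classes
_, chance_confidence = top_prediction(uniform_logits)
print(f"chance-level confidence: {chance_confidence:.4f}")

# One dominant logit: the confidence approaches 1.
peaked_logits = [0.0] * num_classes
peaked_logits[5] = 10.0
peaked_class, peaked_confidence = top_prediction(peaked_logits)
print(peaked_class, f"{peaked_confidence:.4f}")
```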

# **Splitting dataset**

In [27]:
# Split the dataset into training and validation sets
train_test_split = base_ds.train_test_split(test_size=0.2)  # 20% for validation
train_dataset = train_test_split['train']
validation_dataset = train_test_split['test']
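
The split above is random and unseeded, so the category proportions in the validation set can drift slightly between runs. The following is a minimal pure-Python sketch of a seeded, stratified split. It illustrates the idea only and is not what the notebook runs; the function name and the toy labels are hypothetical. In `datasets`, `train_test_split` also accepts `seed` and `stratify_by_column`, the latter requiring a `ClassLabel` column.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_size=0.2, seed=42):
    """Split indices so each label keeps roughly the same share in both parts."""
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)
        n_test = max(1, round(len(indices) * test_size))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy imbalanced labels, not the real dataset.
labels = ["ACCOUNT"] * 60 + ["ORDER"] * 30 + ["SUBSCRIPTION"] * 10
train_idx, test_idx = stratified_split(labels)

print(len(train_idx), len(test_idx))
print(sum(labels[i] == "SUBSCRIPTION" for i in test_idx))
```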

# **Loading Tokenizer**

In [28]:
from transformers import AutoTokenizer
# Load the tokenizer for ALBERT
tokenizer = AutoTokenizer.from_pretrained(model_name)

# **Testing Model before training**

In [16]:
# Baseline: classify the first 100 training instructions before fine-tuning
from sklearn.metrics import accuracy_score, f1_score

# Select 100 items from train_dataset
subset = train_dataset.select(range(100))
predictions = []
true_labels = []
# Classify and print results
for item in subset:
    predicted_class, confidence = classify_text(item['instruction'])
    predictions.append(predicted_class)
    true_labels.append(label_mapping[item['category']])
    # id_to_label is keyed by strings, so convert the integer class id first
    label = id_to_label.get(str(predicted_class))
    if label is not None:
        print(f"Text: {item['instruction']}, Predicted Class: {predicted_class} ({label}), Confidence: {confidence}")
    else:
        print(f"Text: {item['instruction']}, Predicted Class: {predicted_class}, Confidence: {confidence} (No label mapping available)")

accuracy = accuracy_score(true_labels, predictions)
f1 = f1_score(true_labels, predictions, average='weighted')

print(f"Accuracy: {accuracy}")
print(f"F1 Score: {f1}")

Text: I need help to set a secondary shipping address up, Predicted Class: 5 (REFUND), Confidence: 0.1297203004360199
Text: I'm waiting for a rebate of {{Currency Symbol}}{{Refund Amount}} was it processed, Predicted Class: 5 (REFUND), Confidence: 0.2035149186849594
Text: how do I see at what time customer assistance available is?, Predicted Class: 0 (ORDER), Confidence: 0.1385951191186905
Text: can i place an order from {{Delivery City}}, Predicted Class: 4 (PAYMENT), Confidence: 0.16120626032352448
Text: need help to  file a consumer claim, Predicted Class: 9 (DELIVERY), Confidence: 0.1755235642194748
Text: I want help seeing when will my item arrive, Predicted Class: 5 (REFUND), Confidence: 0.16627326607704163
Text: seeing bill from {{Person Name}}, Predicted Class: 8 (ACCOUNT), Confidence: 0.17483046650886536
Text: check under which circumstances can I request my money back, Predicted Class: 9 (DELIVERY), Confidence: 0.12998858094215393
Text: I have a problem with the termination o

# **DataCollator**

In [29]:
from transformers import DataCollatorWithPadding
# Data collator for dynamic padding
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
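
Dynamic padding means each batch is padded only to the length of its own longest sequence rather than to a global maximum. The sketch below mimics that behaviour in plain Python. It is a simplified illustration, not the `DataCollatorWithPadding` implementation, and the token ids and the `pad_batch` helper are hypothetical.

```python
def pad_batch(sequences, pad_id=0):
    """Pad every sequence to the longest one in this batch and build masks."""
    max_len = max(len(seq) for seq in sequences)
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]
    attention_mask = [
        [1] * len(seq) + [0] * (max_len - len(seq)) for seq in sequences
    ]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Two made-up token id sequences of different lengths.
batch = pad_batch([[2, 31, 589, 3], [2, 259, 3]])

print(batch["input_ids"])
print(batch["attention_mask"])
```

The attention mask marks real tokens with 1 and padding with 0, so the model ignores the padded positions.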

# **Tokenize the dataset**

In [30]:
# Tokenize the dataset

def preprocess_function(examples):
  inputs = tokenizer(examples['instruction'], truncation=True, padding=True)
  labels = [label_mapping[label] for label in examples['category']]
  # pprint(labels)
  inputs['labels'] = labels

  return inputs

train_procs_ds = train_dataset.map(preprocess_function, batched=True)
val_procs__ds = validation_dataset.map(preprocess_function, batched=True)

Map:   0%|          | 0/21497 [00:00<?, ? examples/s]

Map:   0%|          | 0/5375 [00:00<?, ? examples/s]

In [31]:
train_procs_ds.to_pandas().head()

index,flags,instruction,category,intent,response,input_ids,token_type_ids,attention_mask,labels
0,BLM,I am trying to delete some items from order {{...,ORDER,change_order,Thank you for bringing this to our attention. ...,"[2, 31, 589, 749, 20, 27448, 109, 3755, 37, 38...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...",0
1,BELQ,i cannot set up my shipping address,SHIPPING,set_up_shipping_address,I appreciate that you're having difficulties s...,"[2, 31, 1967, 309, 71, 51, 7405, 3218, 3, 0, 0...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, ...",1
2,BIL,how do I check the current status of order {{O...,ORDER,track_order,Hello! Thank you for reaching out. I understan...,"[2, 184, 107, 31, 2631, 14, 866, 1782, 16, 389...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...",0
3,BLQ,want help checking when will my product arrive,DELIVERY,delivery_period,We understand your eagerness to track your pro...,"[2, 259, 448, 9886, 76, 129, 51, 2374, 6140, 3...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, ...",9
4,BCILMPQ,i lost my invoices from {{Person Name}} could ...,INVOICE,check_invoice,I'm following your concern about misplacing yo...,"[2, 31, 529, 51, 19, 13379, 18, 37, 13, 1, 727...","[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...","[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...",3


In [20]:
metric = evaluate.load("accuracy")

def compute_accuracy(eval_pred):
    logits, labels = eval_pred

    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

Downloading builder script:   0%|          | 0.00/4.20k [00:00<?, ?B/s]
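
Accuracy alone can look good on an imbalanced dataset even when a rare class is never predicted. As a hedged illustration, the sketch below computes per-class and macro F1 in plain Python on made-up predictions. The helper names are hypothetical, and the notebook itself only tracks accuracy during training.

```python
def per_class_f1(predictions, references):
    """Return {label: F1} for every label seen in predictions or references."""
    labels = sorted(set(references) | set(predictions))
    scores = {}
    for label in labels:
        pairs = list(zip(predictions, references))
        tp = sum(p == label and r == label for p, r in pairs)
        fp = sum(p == label and r != label for p, r in pairs)
        fn = sum(p != label and r == label for p, r in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(predictions, references):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(predictions, references)
    return sum(scores.values()) / len(scores)


# Made-up example: class 1 is rare and the model always predicts class 0.
references = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
predictions = [0] * 10

accuracy = sum(p == r for p, r in zip(predictions, references)) / len(references)
print(f"accuracy: {accuracy:.2f}")
print(f"macro F1: {macro_f1(predictions, references):.4f}")
```

Here accuracy is 0.80 while macro F1 is far lower, because the F1 for the rare class is zero.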

In [32]:
# Define training arguments
training_args = TrainingArguments(
    output_dir="./results",
    evaluation_strategy="epoch",  # newer transformers releases rename this to eval_strategy
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
)

# Initialize Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_procs_ds,
    eval_dataset=val_procs__ds,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_accuracy  # Custom metrics function
)

# Train the model
trainer.train()

eval_results = trainer.evaluate()

print(eval_results)



Epoch,Training Loss,Validation Loss,Accuracy
1,0.016,0.008775,0.997953
2,0.0031,0.004994,0.999442
3,0.0021,0.002536,0.999628


{'eval_loss': 0.002536403015255928, 'eval_accuracy': 0.9996279069767442, 'eval_runtime': 10.9661, 'eval_samples_per_second': 490.146, 'eval_steps_per_second': 30.64, 'epoch': 3.0}
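
The reported numbers can be turned into more tangible quantities with a little arithmetic. The sketch below assumes a single device and no gradient accumulation (neither is stated in the notebook). It takes the split sizes from the `Map` progress output and the per-epoch accuracies from the table above.

```python
import math

train_examples = 21497  # training split size from the Map output
eval_examples = 5375    # validation split size from the Map output
batch_size = 16
epochs = 3

# Optimizer steps, assuming one device and no gradient accumulation.
steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs
eval_steps = math.ceil(eval_examples / batch_size)


def misclassified(accuracy, n):
    """Convert an accuracy into an approximate count of wrong predictions."""
    return round((1 - accuracy) * n)


epoch_accuracy = {1: 0.997953, 2: 0.999442, 3: 0.999628}
errors = {epoch: misclassified(acc, eval_examples) for epoch, acc in epoch_accuracy.items()}

print(steps_per_epoch, total_steps, eval_steps)
print(errors)
```

Read this way, the final accuracy corresponds to only a couple of misclassified validation examples out of 5375.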


## **Evaluate the model**

In [33]:
# Evaluate the model
trainer.evaluate()

{'eval_loss': 0.002536403015255928,
 'eval_accuracy': 0.9996279069767442,
 'eval_runtime': 11.3331,
 'eval_samples_per_second': 474.275,
 'eval_steps_per_second': 29.648,
 'epoch': 3.0}

In [34]:

fine_tuned_model_name: str = "customer-support-categ_classification-albert_v2"
model.save_pretrained(fine_tuned_model_name, push_to_hub=True, private=False)

model.safetensors:   0%|          | 0.00/46.8M [00:00<?, ?B/s]

In [35]:
# Save the tokenizer to a directory
# tokenizer.save_pretrained(fine_tuned_model_name)
# Save the tokenizer to hub
tokenizer.push_to_hub(f"AIEnthusiast369/{fine_tuned_model_name}")

README.md:   0%|          | 0.00/5.17k [00:00<?, ?B/s]

CommitInfo(commit_url='https://huggingface.co/AIEnthusiast369/customer-support-categ_classification-albert_v2/commit/3583b62cc6129961e41a6a242e70520a0ab68c27', commit_message='Upload tokenizer', commit_description='', oid='3583b62cc6129961e41a6a242e70520a0ab68c27', pr_url=None, pr_revision=None, pr_num=None)

# **Testing fine tuned model**

In [41]:
def test_model(samples):
    # Classify each sample and print the predicted category with its confidence
    for text in samples:
        predicted_class, confidence = classify_text(text)
        print(f"Text: {text}\nPredicted Class: {predicted_class} ({id_to_label[str(predicted_class)]}), Confidence: {confidence:.4f}\n")

In [42]:
# Test with some example texts
text_samples = [
    "I reqeust immediate refund",
    "I was billed incorrectly",
    "Where do I leave a tip",
    "Not worth the money, would not buy again. I want to cancel order.",
    "I would like to speak with the manager"
]

test_model(text_samples)

Text: I reqeust immediate refund
Predicted Class: 5 (REFUND), Confidence: 0.9999

Text: I was billed incorrectly
Predicted Class: 4 (PAYMENT), Confidence: 0.9433

Text: Where do I leave a tip
Predicted Class: 6 (FEEDBACK), Confidence: 0.9998

Text: Not worth the money, would not buy again. I want to cancel order.
Predicted Class: 0 (ORDER), Confidence: 0.9999

Text: I would like to speak with the manager
Predicted Class: 7 (CONTACT), Confidence: 0.9997
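
Since the goal is routing, the predicted category and confidence can drive a simple dispatch table. The sketch below is an assumption about how that might look: the handler functions, the `ROUTES` table, and the 0.9 threshold are all hypothetical. Messages with an unknown label or low confidence go to a fallback.

```python
def handle_refund(text):
    return f"refund queue <- {text}"


def handle_payment(text):
    return f"billing queue <- {text}"


def handle_fallback(text):
    return f"human agent <- {text}"


# Hypothetical routing table keyed by the category names used in this notebook.
ROUTES = {"REFUND": handle_refund, "PAYMENT": handle_payment}


def route(text, label, confidence, threshold=0.9):
    """Dispatch to the handler for `label`, or fall back when unsure."""
    handler = ROUTES.get(label)
    if handler is None or confidence < threshold:
        return handle_fallback(text)
    return handler(text)


print(route("I reqeust immediate refund", "REFUND", 0.9999))
print(route("I was billed incorrectly", "PAYMENT", 0.9433))
print(route("Where do I leave a tip", "FEEDBACK", 0.9998))
print(route("ambiguous message", "REFUND", 0.42))
```

A high softmax confidence does not guarantee a sensible route, so the threshold is a coarse safeguard and worth validating on real traffic.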

