# Runnable Notebook for Testing the Trained Model

- Model hosted on Kaggle: https://www.kaggle.com/models/shah2001aayush/deberta_classification
- Dataset used to generate the test data: https://www.kaggle.com/datasets/shah2001aayush/dataprocessed
- Use a GPU when generating the final query with the LLM
- **Important: set your Hugging Face token below** to access the Mistral LLM

Name  : **Aayush Shah**


email : 2001aayushshah@gmail.com


Contact : +91 8879090901

## Examples and flow of user query  -> prediction -> task_query_generation

<a href="https://ibb.co/PZtV4SXG"><img src="https://i.ibb.co/0jQzZw4y/Calendar-Prediction-Example.png" alt="Calendar-Prediction-Example" border="0"></a>
<a href="https://ibb.co/99fQHt7w"><img src="https://i.ibb.co/ccp4hLjb/Calendar-Prediction-Example2.png" alt="Calendar-Prediction-Example2" border="0"></a>
<a href="https://ibb.co/scwsWY1"><img src="https://i.ibb.co/X6sFYR5/email-Example-Pred.png" alt="email-Example-Pred" border="0"></a>

### Executing generated Task Query on Gmail Directly to fetch the desired emails

<a href="https://ibb.co/Fb51yXqB"><img src="https://i.ibb.co/wNSqP7rK/gmail-Search-Result.png" alt="gmail-Search-Result" border="0"></a>

# Query Classification with DeBERTa V3 - Testing

### Model Details
* **MODEL:** DeBERTa-v3-base
* **Hardware:** GPU P100 (Kaggle)
* **Framework:** PyTorch, Transformers (Hugging Face library)
* **Transformer Model:** microsoft/deberta-v3-base
* **Tokenizer:** microsoft/deberta-v3-base
* **Primary Metric:** F1 score
* **Inference Metrics:** Precision, F1 score, Recall, Accuracy
* **Logging:** MLflow (Experiment Name: "Query_Classification")
* **Device:** CUDA
* **Post-Prediction Modification:** LLM, Mistral 7B Instruct (`mistralai/Mistral-7B-Instruct-v0.1`)
* **Time Taken:** 3-4 hours

### Why DeBERTa-v3-base?

* **DeBERTa V3 (by Microsoft):** The chosen transformer model is `microsoft/deberta-v3-base`.
* **Improved Contextual Understanding:** It introduces **disentangled attention**, which allows the model to better understand the relationships between words and their positions in a sentence, leading to improved contextual understanding.
* **Strong Classification Performance:** DeBERTa V3 is **ranked among the top** models on many classification benchmarks, indicating its effectiveness for this type of task.
* **Tokenizer: DebertaV2Tokenizer:** The associated tokenizer is `DebertaV2Tokenizer`.
* **Excellent Accuracy:** DeBERTa models often achieve **excellent accuracy**, frequently outperforming RoBERTa on classification tasks.
* **More Recent Than Comparable Models:** DeBERTa V3 was released in 2021, compared with BERT (2018) and RoBERTa (2019).

In [1]:
#installation of required libraries
!pip install -q transformers datasets scikit-learn mlflow sentencepiece

In [2]:
import pandas as pd
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments
import mlflow
import mlflow.pytorch

In [3]:
# Configuration
MODEL_NAME = "microsoft/deberta-v3-base"
MAX_LEN = 32
BATCH_SIZE = 16
EPOCHS = 4
LEARNING_RATE = 2e-5

In [4]:
df = pd.read_csv("/kaggle/input/dataprocessed/data_preprocessed.csv")  # <-- Change path if needed

In [5]:
len(df)

942

In [6]:
#load DeBERTa Tokenizer
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, use_fast=False)


## Use the class below to prepare custom test data

In [7]:

class QueryDataset(Dataset):
    def __init__(self, queries, labels, tokenizer, max_len):
        self.queries = queries
        self.labels = labels
        self.tokenizer = tokenizer
        self.max_len = max_len

    def __len__(self):
        return len(self.queries)

    def __getitem__(self, item):
        query = str(self.queries[item])
        label = int(self.labels[item])

        encoding = self.tokenizer(
            query,
            truncation=True,
            padding='max_length',
            max_length=self.max_len,
            return_tensors='pt'
        )

        return {
            'input_ids': encoding['input_ids'].flatten(),
            'attention_mask': encoding['attention_mask'].flatten(),
            'labels': torch.tensor(label, dtype=torch.long)
        }
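
To make the effect of `truncation=True` and `padding='max_length'` concrete, here is a minimal pure-Python sketch of what happens to one sequence of token ids. The helper `pad_or_truncate` and the pad id `0` are illustrative assumptions, not part of the Transformers API, and the real tokenizer also adds special tokens.

```python
def pad_or_truncate(token_ids, max_len, pad_id=0):
    """Return (input_ids, attention_mask), both exactly max_len long."""
    ids = list(token_ids)[:max_len]            # truncation: drop tokens past max_len
    mask = [1] * len(ids)                      # 1 marks a real token
    n_pad = max_len - len(ids)
    return ids + [pad_id] * n_pad, mask + [0] * n_pad  # 0 marks padding


# Made-up token ids, only to show the shapes
short_ids, short_mask = pad_or_truncate([5, 6, 7], max_len=5)
long_ids, long_mask = pad_or_truncate(range(10, 20), max_len=5)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

With `MAX_LEN = 32`, every item returned by `QueryDataset` therefore has the same length, which is what lets the default collation stack items into a batch.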

In [8]:
train_texts, temp_texts, train_labels, temp_labels = train_test_split(df['Query'], df['label'], test_size=0.3, stratify=df['label'], random_state=42)

train_dataset = QueryDataset(train_texts.values, train_labels.values, tokenizer, MAX_LEN)
test_dataset = QueryDataset(temp_texts.values, temp_labels.values, tokenizer, MAX_LEN)

In [9]:
print(len(train_dataset))
print(len(test_dataset))

659
283
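
The 659/283 split follows from `test_size=0.3` applied to 942 rows. My understanding is that scikit-learn rounds a fractional test size up and leaves the remainder for training; the arithmetic can be checked with the standard library alone:

```python
import math

n_rows = 942          # len(df) reported above
test_fraction = 0.3   # test_size passed to train_test_split

n_test = math.ceil(n_rows * test_fraction)  # fractional test size is rounded up
n_train = n_rows - n_test                   # the rest goes to training

print(n_train, n_test)
```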


In [10]:
def compute_metrics(pred):
    labels = pred.label_ids
    preds = np.argmax(pred.predictions, axis=1)
    precision, recall, f1, _ = precision_recall_fscore_support(labels, preds, average='binary')
    acc = accuracy_score(labels, preds)
    return {"accuracy": acc, "f1": f1, "precision": precision, "recall": recall}
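
With `average='binary'`, precision, recall and F1 are reported for the positive class only (label 1, which is the Calendar class in this notebook). The sketch below recomputes the same quantities by hand; `binary_metrics` and the toy labels are made up for illustration.

```python
def binary_metrics(labels, preds, positive=1):
    """Accuracy plus precision/recall/F1 for the positive class only."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(1 for y, p in pairs if y == p) / len(pairs)
    return {"accuracy": accuracy, "f1": f1, "precision": precision, "recall": recall}


# Toy labels (made up for illustration): 0 = email, 1 = calendar
toy_labels = [1, 1, 1, 0, 0, 0]
toy_preds = [1, 1, 0, 1, 0, 0]
metrics = binary_metrics(toy_labels, toy_preds)
print(metrics)
```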

In [11]:
import torch

In [12]:
# preprocessing function same as the one used while training the model
import re
from nltk.corpus import stopwords, wordnet
from nltk.stem import WordNetLemmatizer
from textblob import TextBlob
import nltk

nltk.download("stopwords")
nltk.download("wordnet")
nltk.download("omw-1.4")

# stop_words = set(stopwords.words('english'))
retain_words = {
    "what", "how", "when", "where", "who", "which", "whom", "whose", "why",
    "can", "should", "would", "could", "do", "did", "does", "will", "may",
    "show", "find", "search", "get", "have"
}

# Base stopwords from NLTK
default_stop_words = set(stopwords.words("english"))

# Final custom stopword list
custom_stop_words = default_stop_words - retain_words

# lemmatizer = WordNetLemmatizer()
def preprocess_query(text):
    if not isinstance(text, str):
        return ""

    # 1. Lowercase
    text = text.lower()

    # 2. Spell correction using TextBlob
    text = str(TextBlob(text).correct())

    # 3. Remove punctuation except '@' and '.'
    text = re.sub(r"[^\w\s@.]", "", text)

    # 4. Remove stop words (the question and command words in retain_words are kept)
    words = text.split()
    filtered_words = [word for word in words if word not in custom_stop_words]

    # 5. Join and strip
    text = " ".join(filtered_words).strip()

    return text

[nltk_data] Downloading package stopwords to /usr/share/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!
[nltk_data] Downloading package wordnet to /usr/share/nltk_data...
[nltk_data]   Package wordnet is already up-to-date!
[nltk_data] Downloading package omw-1.4 to /usr/share/nltk_data...
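
The punctuation and stop-word steps of `preprocess_query` can be tried without NLTK or TextBlob. The sketch below is a simplified stand-in: `DEMO_STOP_WORDS` is a tiny hand-picked set rather than the NLTK list, spell correction is skipped, and `simple_preprocess` is a hypothetical name.

```python
import re

# Stand-in for the NLTK English stop-word list (assumption: tiny subset for the demo)
DEMO_STOP_WORDS = {"me", "the", "my", "a", "an", "of", "to", "is", "from", "what", "show"}
RETAIN_WORDS = {"what", "show", "find", "search", "when", "where"}
demo_custom_stop_words = DEMO_STOP_WORDS - RETAIN_WORDS


def simple_preprocess(text):
    """Lowercase, strip punctuation except '@' and '.', drop stop words."""
    text = text.lower()
    text = re.sub(r"[^\w\s@.]", "", text)   # same pattern as preprocess_query
    words = [w for w in text.split() if w not in demo_custom_stop_words]
    return " ".join(words).strip()


print(simple_preprocess("Show me the email from bob@example.com!"))
```

Keeping '@' and '.' means email addresses survive preprocessing, and keeping words such as "show" or "what" preserves the intent words of a query.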


### Setting up Mistral LLM Model for executable query generation

In [None]:
import pandas as pd
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch
import os


from huggingface_hub import login

# login()  # Uncomment and enter your own Hugging Face token when prompted

# Hardware detection
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Running on: {device.upper()}")

# Use fp16 on GPU, fall back to fp32 on CPU
precision = torch.float16 if device == "cuda" else torch.float32

# Load the Mistral model. Separate names keep the LLM apart from the
# DeBERTa `model` and `tokenizer` used by the classifier cells.
model_id = "mistralai/Mistral-7B-Instruct-v0.1"

llm_tokenizer = AutoTokenizer.from_pretrained(model_id)
llm_model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=precision,
    device_map="auto"
)

generator = pipeline("text-generation", max_new_tokens=256, model=llm_model, tokenizer=llm_tokenizer)

Running on: CUDA


Device set to use cuda:0
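
Since the notebook needs a Hugging Face token to download the Mistral model, one option is to read the token from an environment variable instead of pasting it into a cell. This is only a sketch: the variable name `HF_TOKEN` and the helpers `get_hf_token` and `hf_login_from_env` are assumptions of this example, not something the notebook defines.

```python
import os


def get_hf_token(env_var="HF_TOKEN"):
    """Return the token stored in the given environment variable, or None."""
    token = os.environ.get(env_var, "").strip()
    return token or None


def hf_login_from_env(env_var="HF_TOKEN"):
    """Log in to the Hugging Face Hub if a token is available; return True on login."""
    token = get_hf_token(env_var)
    if token is None:
        return False
    # Imported lazily so the token check works without huggingface_hub installed
    from huggingface_hub import login
    login(token=token)
    return True


print("Token available:", get_hf_token() is not None)
```

Calling `hf_login_from_env()` before `from_pretrained` would then replace the commented-out `login()` call.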


In [14]:
def generate_email_search_query(email_text):
    """
    Generates a Gmail advanced search query from a user request using the Mistral LLM.

    Args:
        email_text (str): The user's natural-language email search request.

    Returns:
        str: The full generated text (prompt plus completion). Pass it to
            extract_after_last_search_query to obtain the query itself.
    """
    #  <----------------------- LLM Interaction --------------------------->
    #  This is the core part where you'd use Mistral.
    #  The prompt should guide Mistral to generate a good search query.
    prompt = f"""
    You are an AI assistant helping a user search their emails.  Generate advanced search queries for Gmail based on the user's request.  Use Gmail's search operators (from:, to:, subject:, has:, etc.) to make the search as precise as possible.  Here are some examples:

Text Query: Find my email from John about the project.
Gmail Advanced Search Query: from:john subject:project

Text Query: Show me the email my boss sent last week.
Gmail Advanced Search Query: from:boss@example.com after:2024/05/08 before:2024/05/15

Text Query: I need the document attached to Mary's email.
Gmail Advanced Search Query: from:mary has:attachment has:document

Text Query: Search for the meeting agenda.
Gmail Advanced Search Query: subject:meeting agenda

Text Query: Find the email where I was copied.
Gmail Advanced Search Query: cc:me@example.com OR bcc:me@example.com

Text Query: Show me the email about the party from Susan before Friday.
Gmail Advanced Search Query: from:susan subject:party before:2024/05/10

Text Query: Find the email with the spreadsheet.
Gmail Advanced Search Query: has:spreadsheet

Text Query: Show me the email from the mailing list.
Gmail Advanced Search Query: list:info@example.org

Text Query: I'm looking for the email with the presentation.
Gmail Advanced Search Query: has:presentation

Text Query: Find the email about the "urgent" report.
Gmail Advanced Search Query: subject:"urgent report"

Text Query: Show me emails from March 6th, 2023 mentioning 'Google'
Gmail Advanced Search Query: after:2023/03/06 Google

Text Query: Show me emails from January 11th, 2021 with word 'DRDO' in it
Gmail Advanced Search Query: after:2021/01/11 DRDO

Based  on the above examples generate  Gmail Advanced Search Query for the below Text Query . Answer 1 short search query and nothing else.
Text Query: {email_text}
Gmail Advanced Search Query:"""

    
    response = generator(prompt,max_new_tokens=500, temperature=0.1)[0]["generated_text"]
    return response
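
The few-shot examples in the prompt hard-code their dates. For cases where the date is already known, the Gmail date operators can also be built directly in code. The sketch below assumes Gmail's `YYYY/MM/DD` format and that `after:` is inclusive while `before:` is exclusive, which is worth verifying against Gmail's documentation; `gmail_day_range` is a hypothetical helper.

```python
from datetime import date, timedelta


def gmail_day_range(day):
    """Build 'after:... before:...' operators covering a single calendar day.

    Assumes `after:` is inclusive and `before:` is exclusive (not verified here).
    """
    next_day = day + timedelta(days=1)
    return f"after:{day:%Y/%m/%d} before:{next_day:%Y/%m/%d}"


query = gmail_day_range(date(2025, 5, 8)) + " juspay"
print(query)
```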

In [15]:

def generate_calendar_search_query(query):
    """
    Generates a calendar search query ("Action: ..., Search: ...") using the Mistral LLM.

    Args:
        query (str): The user's natural-language calendar request.

    Returns:
        str: The full generated text (prompt plus completion). Pass it to
            extract_after_last_search_query to obtain the query itself.
    """
    #  <----------------------- LLM Interaction --------------------------->
    #  This is the core part where you'd use Mistral.
    #  The prompt should guide Mistral to generate a good search query.
    
    prompt = f"""
    You are a helpful assistant designed to generate calendar search queries based on user requests.  Each search query should have two parts:

* **Action:** The action to perform (e.g., "search", "create", "reschedule", "cancel").
* **Search:** The details to use for the search (e.g., "meeting with John", "appointment with Sarah").

Here are some examples of user requests and the corresponding search queries:

Example 1:
User Request: Find my meeting with John.
Search Query: Action: search, Search: meeting with John

Example 2:
User Request: Show me my appointments for tomorrow.
Search Query: Action: search, Search: appointments tomorrow

Example 3:
User Request: When is my next appointment with the doctor?
Search Query: Action: search, Search: next doctor appointment

Example 4:
User Request: Reschedule my meeting with the team to Friday.
Search Query: Action: reschedule, Search: team meeting Friday

Example 5:
User Request: Find the meeting about the project on June 10th.
Search Query: Action: search, Search: meeting about the project June 10th

Example 6:
User Request: Show me all my meetings next week.
Search Query: Action: search, Search: meetings next week

Example 7:
User Request: Cancel my appointment with Sarah.
Search Query: Action: cancel, Search: appointment Sarah

Example 8:
User Request: Find the event scheduled by the "XYZ project group".
Search Query: Action: search, Search: event scheduled by XYZ project group

Example 9:
User Request: Show me all my events in the month of December.
Search Query: Action: search, Search: events December

Example 10:
User Request: Find the event called "conference" that occurs in the next 3 days
Search Query: Action: search, Search: conference next 3 days

Example 11:
User Request: Create a meeting with John tomorrow at 2pm
Search Query: Action: create, Search: meeting with John tomorrow at 2pm

Example 12:
User Request: Update my meeting with John to next Monday
Search Query: Action: update, Search: meeting with John to next Monday

Based  on the above examples generate Search Query for the below Text Query . Answer 1 short search query and nothing else.
Text Query: {query}
Calendar Search Query:"""
    response = generator(prompt,max_new_tokens=500, temperature=0.1)[0]["generated_text"]
    return response
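
Because the calendar task query has a fixed `Action: ..., Search: ...` shape, downstream code can split it into fields. A minimal sketch, assuming the model follows the format of the examples; `parse_calendar_query` is a hypothetical helper, not part of the notebook.

```python
import re


def parse_calendar_query(task_query):
    """Split 'Action: X, Search: Y' into a dict; return None if it does not match."""
    # `.+` stops at the first newline, so any extra generated lines are ignored
    match = re.match(r"\s*Action:\s*(\w+)\s*,\s*Search:\s*(.+)", task_query)
    if match is None:
        return None
    return {"action": match.group(1).lower(), "search": match.group(2).strip()}


parsed = parse_calendar_query("Action: cancel, Search: birthday party schedule tomorrow")
print(parsed)
```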

In [16]:
import re

def extract_after_last_search_query(text):
    pattern = r"Search Query:"
    matches = list(re.finditer(pattern, text))

    if matches:
        last_match = matches[-1]
        return text[last_match.end():].strip()
    else:
        return None
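
By default the text-generation pipeline returns the prompt followed by the completion, which is why the helper keeps only the text after the last `Search Query:` marker. The sketch below repeats that logic on a made-up output and adds a hypothetical extra step, `first_line_only`, that guards against the model continuing past its answer.

```python
import re


def extract_after_last_search_query(text):
    """Same logic as the cell above: text following the last 'Search Query:'."""
    matches = list(re.finditer(r"Search Query:", text))
    return text[matches[-1].end():].strip() if matches else None


def first_line_only(text):
    """Hypothetical extra step: keep just the first non-empty line."""
    if not text:
        return None
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return lines[0] if lines else None


# Made-up pipeline output: prompt examples followed by the model's answer
fake_output = (
    "Text Query: Search for the meeting agenda.\n"
    "Gmail Advanced Search Query: subject:meeting agenda\n"
    "Text Query: Find mail from John\n"
    "Gmail Advanced Search Query: from:john\nThanks!"
)
raw = extract_after_last_search_query(fake_output)
print(repr(raw))
print(first_line_only(raw))
```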

In [25]:
from sklearn.metrics import accuracy_score, precision_recall_fscore_support, confusion_matrix
def get_prediction(text, model, tokenizer, max_len, device='cuda',gen_task_query=False):
    """
    Classifies a user query as 'Email' or 'Calendar' using the trained model.

    Args:
        text (str): The raw user query to classify.
        model (torch.nn.Module): The trained PyTorch model.
        tokenizer (transformers.PreTrainedTokenizer): The tokenizer used for the model.
        max_len (int): The maximum sequence length.
        device (str, optional): The device to use ('cuda' or 'cpu'). Defaults to 'cuda'.
        gen_task_query (bool, optional): If True, also generate an executable
            task query with the Mistral LLM. Defaults to False.

    Returns:
        dict: The predicted class ('Email' or 'Calendar'), the sigmoid score of
              that class, and the generated task query ('' when not requested).
    """
    user_query = text
    text = preprocess_query(text)  # Apply the preprocessing here
    model.eval()  # Set the model to evaluation mode
    encoding = tokenizer(
        text,
        truncation=True,
        padding='max_length',
        max_length=max_len,
        return_tensors='pt'
    )
    encoding = {k: v.to(device) for k, v in encoding.items()}

    with torch.no_grad():  # Disable gradient calculation for inference
        outputs = model(**encoding)
        logits = outputs.logits

    sigmoid = torch.nn.Sigmoid()
    probs = sigmoid(logits.squeeze().cpu()).numpy()  # Get probabilities, move to CPU, convert to numpy

    label = np.argmax(probs, axis=-1)
    task_query = ""
    # gen_task_query = True
    if(gen_task_query == True):
        if label == 1:
            task_query = generate_calendar_search_query(user_query)
            task_query = extract_after_last_search_query(task_query)
        else:
            task_query = generate_email_search_query(user_query)
            task_query = extract_after_last_search_query(task_query)
    
    if label == 1:
        return {
            'prediction': 'Calendar',
            'probability': probs[1],
            'task_query' : task_query
        }
    else:
        return {
            'prediction': 'Email',
            'probability': probs[0],
            'task_query' : task_query
        }

def display_predictions(model, test_dataset, tokenizer, max_len, device='cuda',gen_task_query=False):
    """
    Prints the first 20 predictions for the test set, then overall metrics.

    Args:
        model (torch.nn.Module): The trained PyTorch model.
        test_dataset (torch.utils.data.Dataset): The test dataset.
        tokenizer (transformers.PreTrainedTokenizer): The tokenizer used for the model.
        max_len (int): The maximum sequence length.
        device (str, optional): The device to use ('cuda' or 'cpu'). Defaults to 'cuda'.
        gen_task_query (bool, optional): If True, also generate a task query
            for every test query with the Mistral LLM. Defaults to False.
    """
    model.to(device)  # Ensure model is on the correct device
    all_preds = []
    all_labels = []

    for i in range(len(test_dataset)):
        
        text = test_dataset.queries[i]  # Access the original text from the dataset
        prediction = get_prediction(text, model, tokenizer, max_len, device, gen_task_query)
        ground_truth = test_dataset.labels[i]  # Access the ground truth label.
        all_labels.append(ground_truth)

        if prediction['prediction'] == 'Calendar':
            all_preds.append(1)
        else:
            all_preds.append(0)
        if i < 20:
            print(f"Query: {text}")
            print(f"Predicted Sentiment: {prediction['prediction']}")
            print(f"Probability: {prediction['probability']:.4f}")  # Format probability
            print(f"Ground Truth: {ground_truth}")
            if gen_task_query:
                print(f"Task Query: {prediction['task_query']}")
            print("-" * 20)
    
    # Calculate and print metrics after the loop
    accuracy = accuracy_score(all_labels, all_preds)
    precision, recall, f1, _ = precision_recall_fscore_support(all_labels, all_preds, average='weighted')
    conf_matrix = confusion_matrix(all_labels, all_preds)

    print(f"Overall Accuracy: {accuracy:.4f}")
    print(f"Overall Precision: {precision:.4f}")
    print(f"Overall Recall: {recall:.4f}")
    print(f"Overall F1 Score: {f1:.4f}")
    print("Confusion Matrix:")
    print(conf_matrix)
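
One detail of `get_prediction` worth noting: it applies an element-wise sigmoid to the two logits. The argmax, and so the predicted class, is the same as with softmax, but the two scores need not sum to 1. The sketch below compares both on made-up logits using only the standard library.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def softmax(logits):
    m = max(logits)                                # subtract max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [-1.2, 2.3]                               # made-up [email, calendar] logits
sig = [sigmoid(v) for v in logits]
soft = softmax(logits)

print("sigmoid:", [round(p, 4) for p in sig], "sum =", round(sum(sig), 4))
print("softmax:", [round(p, 4) for p in soft], "sum =", round(sum(soft), 4))
```

If the reported `probability` is meant to be read as a class probability, softmax over the two logits would be the more conventional choice.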

### Loading trained Model to generate prediction for test data

In [26]:
# 1. Load the fine-tuned DeBERTa classifier (use 'cpu' if no GPU is available)
model = AutoModelForSequenceClassification.from_pretrained('/kaggle/input/deberta_classification/pytorch/default/1/').to('cuda')
# 2. Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base", use_fast=False)

# Generating predictions for test_data
- Executable queries are not generated for the test data here. To generate them, call the function below with `gen_task_query=True`:
-   **display_predictions(model, test_dataset, tokenizer, MAX_LEN,'cuda',True)**
- Only the first 20 predictions on test_data are printed for display
- The overall metrics and the confusion matrix for the full test set are printed after them

In [27]:
display_predictions(model, test_dataset, tokenizer, MAX_LEN)

Query: search meeting agenda
Predicted Sentiment: Email
Probability: 0.9353
Ground Truth: 0
--------------------
Query: what appointments next tuesday
Predicted Sentiment: Calendar
Probability: 0.9167
Ground Truth: 1
--------------------
Query: email where tenders email address domain common one like email yakov.
Predicted Sentiment: Email
Probability: 0.9651
Ground Truth: 0
--------------------
Query: show followup responses sent
Predicted Sentiment: Email
Probability: 0.9665
Ground Truth: 0
--------------------
Query: find email companies asking shop
Predicted Sentiment: Email
Probability: 0.9697
Ground Truth: 0
--------------------
Query: hey assistant what coming deadline week
Predicted Sentiment: Calendar
Probability: 0.8534
Ground Truth: 1
--------------------
Query: where conference call clients located friday 2 pm
Predicted Sentiment: Calendar
Probability: 0.9191
Ground Truth: 1
--------------------
Query: what came weekend
Predicted Sentiment: Email
Probability: 0.9110
Ground 

### Below are examples of generating individual predictions with or without the final executable query

- Below, task_query is generated via the Mistral LLM

In [21]:
print(get_prediction("What are my upcoming conferences or offsite events?", model, tokenizer, MAX_LEN,'cuda',True))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


{'prediction': 'Calendar', 'probability': 0.9089517, 'task_query': 'Action: search, Search: upcoming conferences or offsite events'}


In [30]:
print(get_prediction("cancel birthday party schedule tomorrow", model, tokenizer, MAX_LEN,'cuda',True))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


{'prediction': 'Calendar', 'probability': 0.91840917, 'task_query': 'Action: cancel, Search: birthday party schedule tomorrow'}


- Below, task_query is **not** generated because **gen_task_query = False**

In [22]:
print(get_prediction("What are my upcoming conferences or offsite events?", model, tokenizer, MAX_LEN,'cuda',False))

{'prediction': 'Calendar', 'probability': 0.9089517, 'task_query': ''}


- Below, task_query is generated because **gen_task_query = True**

In [21]:
print(get_prediction("Show me emails from May 8th, 2025 mentioning 'juspay'", model, tokenizer, MAX_LEN,'cuda',True))

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


{'prediction': 'Email', 'probability': 0.9643836, 'task_query': 'after:2025/05/08 juspay'}


### I have also created a Streamlit app for interacting with the model

<a href="https://ibb.co/0p6HfHy3"><img src="https://i.ibb.co/ymZH8HFJ/image.png" alt="image" border="0"></a>

### The LLM is not enabled in the Streamlit app for now.

## Below I am executing the generated task_query in Gmail


<a href="https://ibb.co/Fb51yXqB"><img src="https://i.ibb.co/wNSqP7rK/gmail-Search-Result.png" alt="gmail-Search-Result" border="0"></a>
