# Refactored code for
* Setting up and running Ollama in Kaggle
* Downloading THUIAR dataset
* Zero-Shot Prompt
* Use LLM to classify intent from an input 'question' dataset
* To configure your file/folder paths, LLM, dataset, start_index and end_index for each run, please update the config.py file

This notebook will also be used as the base to test any fixes to the LLM intent classification pipeline.
* 2025.05.26: Updated results output file from JSON to Pickle, to store list of dictionaries. 1 dictionary contains the results for each record. Lists of dictionaries can be downloaded from multiple notebooks, then concatenated for analysis
* 2025.05.30: Update prompt and bulletpts_intent.
  * Check if dataset contains 'oos' (out of scope) category
  * If dataset has no 'oos' (out of scope) category, turn 1 category into 'oos'. Use updated categories in bulletpts_intent. Also update prompt instructions on when to classify an example as 'oos'
  * **This force_oos fix is implemented in [notebook 01E](https://www.kaggle.com/code/kaiquanmah/01e-kaggle-ollama-llama3-2-w-force-oos?scriptVersionId=242648764)**
* 2025.05.30: Add pydantic schema with enums
  * From an analysis of errors, the model previously had a 45% average accuracy rate across categories. The model predicted a set of categories outside of what we gave it in 'bulletpts_intent'
  * To fix this, we will try to implement a pydantic schema solution for the model to only predict categories from the allowed list of categories ('bulletpts_intent')
* 2025.05.30: Set Ollama chat temperature to 0
  * Previously, we used the default temperature of 0.8, which might have caused the model to predict categories we did not provide to it ([Reading](https://docs.spring.io/spring-ai/reference/api/chat/ollama-chat.html))
  * **The pydantic schema and temperature fixes are implemented in [notebook 01F](https://www.kaggle.com/code/kaiquanmah/01f-kaggle-ollama-llama3-2-w-pydantic-schema)**
* 2025.06.03:
  1. **Remove 'oos' from `bulletpts_intent` input into prompt**, to be consistent with the team's approach when exploring embedding approaches to classify 'oos' examples. **Keep 'oos' in pydantic enums/Literal (for LLM to output 'oos' as an allowed class value)**
  2. **Remove 0.99 when defining the prompt format - to avoid anchoring LLM on outputting confidence of 0.99**
  3. **Added ability for user to define which classes are 'oos'**
  * **These 3 fixes are in [notebook 01G](https://www.kaggle.com/code/kaiquanmah/01g-kaggle-ollama-llama3-2-oos-update)**

In [1]:
# 1. create dirs if they do not exist
import os
os.makedirs('/kaggle/working/src', exist_ok=True)
os.makedirs('/kaggle/working/prediction', exist_ok=True)

In [2]:
%%writefile /kaggle/working/src/setup_ollama.py
import os
import subprocess
import time
from src.config import Config # absolute import

# 1. Install Ollama (if not already installed)
try:
    # Check if Ollama is already installed
    subprocess.run(["ollama", "--version"], capture_output=True, check=True)
    print("Ollama is already installed.")
except FileNotFoundError:
    print("Installing Ollama...")
    subprocess.run("curl -fsSL https://ollama.com/install.sh  | sh", shell=True, check=True)

# 2. Start Ollama server in the background
print("Starting Ollama server...")
process = subprocess.Popen("ollama serve", shell=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

# Wait for the server to initialize
time.sleep(5)


# 3. Pull the model
model_name = Config.model_name
print(f"Pulling {model_name} model...")
subprocess.run(["ollama", "pull", model_name], check=True)

# 4. Install Python client
subprocess.run(["pip", "install", "ollama"], check=True)

print("Ollama setup complete!")

Writing /kaggle/working/src/setup_ollama.py


In [3]:
%%writefile requirements.txt
pandas
# numpy
pydantic
typing
# enum
huggingface-hub

Writing requirements.txt


In [4]:
%%writefile /kaggle/working/src/__init__.py
# folder for config

Writing /kaggle/working/src/__init__.py


In [5]:
%%writefile /kaggle/working/src/config.py
class Config:
    target_dir = '/kaggle/working/data' # data directory to clone into
    cloned_data_dir = target_dir + '/data'
    prediction_dir = target_dir + '/prediction'
    dataset_name = 'banking' # options: 'banking', 'stackoverflow', 'oos'
    idx2label_target_dir = '/kaggle/working/idx2label'
    idx2label_filename_hf = 'banking77_idx2label.csv' # options: banking77_idx2label.csv, stackoverflow_idx2label.csv, clinc150_oos_idx2label.csv
    list_oos_idx = [5, 6, 8, 17, 27, 33, 36, 38, 39, 45, 48, 49, 51, 53, 61, 62, 64, 65, 70] # gathered from within the team - for reproducible, comparable results with other open intent classification approaches
    model_name = 'qwen3:8b'
    start_index=0
    end_index=None # eg: 10 or None (use end_index=None to process the full dataset)
    log_every_n_examples=1000 # 2
    force_oos = True  # NEW: Add flag to force dataset to contain 'oos' class for the last class value (sorted alphabetically), if 'oos' class does not exist in the original dataset
    

Writing /kaggle/working/src/config.py


In [6]:
%%writefile download_dataset.py
from src.config import Config
import os
import subprocess
target_dir = Config.target_dir # data directory to clone into
cloned_data_dir = Config.cloned_data_dir

# Create target directory if it doesn't exist
os.makedirs(target_dir, exist_ok=True)

# do not clone dataset repo if cloned data folder exists
if os.path.exists(cloned_data_dir):
    print("Dataset has already been downloaded. If this is incorrect, please delete the Adaptive-Decision-Boundary 'data' folder.")
else:
    # Clone the repository
    subprocess.run(["git",
                    "clone",
                    "https://github.com/thuiar/Adaptive-Decision-Boundary.git",
                    target_dir
                   ])

Writing download_dataset.py


In [7]:
%%writefile predict_class.py
from src.config import Config
import pandas as pd
import os
import ollama
import json
import pickle
import time
from pydantic import BaseModel
from typing import Literal
# from enum import Enum
from huggingface_hub import snapshot_download



# Config.target_dir
# Config.cloned_data_dir'
# Config.dataset_name
# Config.model_name
# Config.start_index
# Config.end_index
# Config.log_every_n_examples


#######################
# load data
#######################
def load_data(data_dir):
    """Loads train, dev, and test datasets from a specified directory."""

    main_df = pd.DataFrame()
    for split in ['train', 'dev', 'test']:
        file_path = os.path.join(data_dir, f'{split}.tsv')
        if os.path.exists(file_path):
          try:
            df = pd.read_csv(file_path, sep='\t')
            df['dataset'] = os.path.basename(data_dir)
            df['split'] = split
            main_df = pd.concat([main_df, df], ignore_index=True)
          except pd.errors.ParserError as e:
            print(f"Error parsing {file_path}: {e}")
            # Handle the error appropriately, e.g., skip the file, log the error, etc.
        else:
            print(f"Warning: {split}.tsv not found in {data_dir}")
    return main_df

all_data = pd.DataFrame()

data_dir = os.path.join(Config.cloned_data_dir, Config.dataset_name)
if os.path.exists(data_dir):
  df = load_data(data_dir)
  print(f"Loaded dataset into dataframe: {Config.dataset_name}")
  print(f"Dimensions: {df.shape}")
  print(f"Col names: {df.columns}")
else:
  print(f"Warning: Directory {data_dir} not found.")
#######################



#######################
# unique intents
#######################
sorted_intent = list(sorted(df.label.unique()))
print("="*80)
print(f"Original dataset intents: {sorted_intent}")
print(f"Number of original intents: {len(sorted_intent)}\n")


# 2025.06.03
# New OOS approach - get 25/50/75% of class indexes for each dataset within the team (for reproducibility and comparable results)
# Change their class labels to 'oos'
snapshot_download(repo_id="KaiquanMah/open-intent-query-classification", repo_type="space", allow_patterns="*_idx2label.csv", local_dir=Config.idx2label_target_dir)
idx2label_filepath = Config.idx2label_target_dir + '/dataset_idx2label/' + Config.idx2label_filename_hf
idx2label = pd.read_csv(idx2label_filepath)
idx2label_oos = idx2label[idx2label.index.isin(Config.list_oos_idx)]
idx2label_oos.reset_index(drop=True, inplace=True)
print("="*80)
print("Original intents to convert to OOS class")
print(idx2label_oos)
print(f"Percentage of original intents to convert to OOS class: {len(idx2label_oos)/len(idx2label)}\n")

oos_labels = idx2label_oos['label'].values
list_sorted_intent_aft_conversion = ['oos' if intent.lower() in oos_labels else intent for intent in sorted_intent]
list_sorted_intent_aft_conversion_deduped = sorted(set(list_sorted_intent_aft_conversion))
print("="*80)
print("Unique intents after converting some to OOS class")
print(list_sorted_intent_aft_conversion_deduped)
print(f"Number of unique intents after converting some to OOS class: {len(list_sorted_intent_aft_conversion_deduped)}\n")



# unique intents - from set to bullet points (to use in prompts)
# bulletpts_intent = "\n".join(f"- {category}" for category in set_intent)
# 2025.06.03: do not show 'oos' in the prompt (to avoid leakage of 'oos' class)
bulletpts_intent = "\n".join(f"- {category}" for category in list_sorted_intent_aft_conversion_deduped if category and (category!='oos'))

# 2025.06.04: fix adjustment if 'oos' is already in the original dataset
int_oos_in_orig_dataset = int('oos' in idx2label.label.values)
adjust_if_oos_not_in_orig_dataset = [0 if int_oos_in_orig_dataset == 1 else 1][0]

print("="*80)
print("sanity check")
print(f"Number of original intents: {len(sorted_intent)}")
print(f"Number of original intents + 1 OOS class (if doesnt exist in original dataset): {len(sorted_intent) + adjust_if_oos_not_in_orig_dataset}")
print(f"Number of original intents to convert to OOS class: {len(idx2label_oos)}")
print(f"Percentage of original intents to convert to OOS class: {len(idx2label_oos)/len(idx2label)}")
print(f"Number of unique intents after converting some to OOS class: {len(list_sorted_intent_aft_conversion_deduped)}")
print(f"Number of original intents + 1 OOS class (if doesnt exist in original dataset) - converted classes: {len(sorted_intent) + adjust_if_oos_not_in_orig_dataset - len(idx2label_oos)}")
print(f"Numbers match: {(len(sorted_intent) + adjust_if_oos_not_in_orig_dataset - len(idx2label_oos)) == len(list_sorted_intent_aft_conversion_deduped)}")
print("Prepared unique intents")
#######################




#######################
# Enforce schema on the model (e.g. allowed list of predicted categories)
#######################

class IntentSchema(BaseModel):
    # dynamically unpack list of categories for different dataset(s)
    category: Literal[*list_sorted_intent_aft_conversion_deduped]
    confidence: float
    
#######################




#######################
# Prompt
#######################
# prompt 2 with less information/compute, improve efficiency
def get_prompt(dataset_name, split, question, categories):
    
    prompt = f'''
You are an expert in understanding and identifying what users are asking you.

Your task is to analyze an input query from a user.
Then assign the most appropriate category to the query from a predefined list below:
{categories}

If you are unable to find the most appropriate category, please assign to the 'oos' (i.e. out of scope) category.

===============================

Question: {question}

===============================

Provide your final classification in **valid JSON format** with the following structure:
{{
  "category": "your_chosen_category_name",
  "confidence": confidence_level_rounded_to_the_nearest_2_decimal_places
}}


Ensure the JSON has:
- Opening and closing curly braces
- Double quotes around keys and string values
- Confidence as a number (not a string), with maximum 2 decimal places

Do not include any explanations or extra text.
            '''
    return prompt



#######################


#######################
# Model on 1 Dataset
#######################
# Save a list of dictionaries 
# containing a dictionary for each record's
# - predicted category
# - confidence level and
# - original dataframe values

def predict_intent(model_name, df, categories, start_index=0, end_index=None, log_every_n_examples=100):
    start_time = time.time()
    results = []  # Store processed results
    
    # Slice DataFrame based on start/end indices
    if end_index is None:
        subset_df = df.iloc[start_index:]
    else:
        subset_df = df.iloc[start_index:end_index+1]
    
    total_rows = len(subset_df)
    
    for row in subset_df.itertuples():
        prompt = get_prompt(row.dataset, row.split, row.text, categories)
        if row.Index == 0:
            print("Example of how prompt looks, for the 1st example in this subset of data")
            print(prompt)

            print("Example of how IntentSchema looks")
            print(IntentSchema.model_json_schema())
        
        
        try:
            response = ollama.chat(model=model_name, 
                                   messages=[
                                                {'role': 'user', 'content': prompt}
                                            ],
                                   format = IntentSchema.model_json_schema(),
                                   options = {'temperature': 0},  # Set temperature to 0 for a more deterministic output
                                  )
            msg = response['message']['content']
            parsed = json.loads(msg)

            
            # Safely extract keys with defaults - resolve parsing error
            # maybe LLM did not output a particular key-value pair
            category = parsed.get('category', 'error')
            confidence = parsed.get('confidence', 0.0)
            parsed = {'category': category, 'confidence': confidence}
        except (json.JSONDecodeError, KeyError, Exception) as e:
            parsed = {'category': 'error', 'confidence': 0.0}
        
        # Combine original row data with predictions
        results.append({
            "Index": row.Index,
            "text": row.text,
            "label": row.label,
            "dataset": row.dataset,
            "split": row.split,
            "predicted": parsed['category'],
            "confidence": parsed['confidence']
        })
        
        # Log progress
        if row.Index % log_every_n_examples == 0:
            elapsed_time = time.time() - start_time
            
            avg_time_per_row = elapsed_time / (row.Index - start_index + 1)
            remaining_rows = total_rows - (row.Index - start_index + 1)
            eta = avg_time_per_row * remaining_rows
            
            print(f"Processed original df idx {row.Index} (subset row {row.Index - start_index}) | "
                  f"Elapsed: {elapsed_time:.2f}s | ETA: {eta:.2f}s")
    
    return results  # Return list of dictionaries
    

print(f"Starting intent classification using {Config.model_name}")
subset_results = predict_intent(Config.model_name, 
                                df, 
                                bulletpts_intent, 
                                start_index = Config.start_index, 
                                end_index = Config.end_index,
                                log_every_n_examples = Config.log_every_n_examples)



# update end_index for filename (if None is used for the end of the df)
# Get the last index of the DataFrame
last_index = df.index[-1] 
# Use last index if Config.end_index is None
end_index = Config.end_index if Config.end_index is not None else last_index



# 2025.05.23 changed from JSON to PKL
# because we are saving list of dictionaries
# Save to PKL
# 2025.06.04 explore changing back to JSON
# with open(f'results_{Config.model_name}_{Config.dataset_name}_{Config.start_index}_{end_index}.pkl', 'wb') as f:
#     pickle.dump(subset_results, f)
with open(f'results_{Config.model_name}_{Config.dataset_name}_{Config.start_index}_{end_index}.json', 'w') as f:
    json.dump(subset_results, f, indent=2)

print("Completed intent classification")


#######################


Writing predict_class.py


In [8]:
%%writefile /kaggle/working/main.py
import subprocess
import sys


# 1. Install libraries from requirements.txt
print("Installing dependencies...")
subprocess.run([sys.executable, "-m", "pip", "install", "-r", "/kaggle/working/requirements.txt"], check=True)

# 2. Run setup_ollama.py
print("Starting Ollama setup...")
# subprocess.run(["python3", "/kaggle/working/src/setup_ollama.py"], check=True)
print("Starting Ollama setup...")
subprocess.run(
    ["python3", "-m", "src.setup_ollama"],  # Run as a module
    cwd="/kaggle/working",  # Set working directory to parent of 'src'
    check=True
)

# 3. Run download_dataset.py
print("Downloading dataset...")
subprocess.run(["python3", "/kaggle/working/download_dataset.py"], check=True)

# 4. Run predict_class.py
print("Running prediction script...")
subprocess.run(["python3", "/kaggle/working/predict_class.py"], check=True)

Writing /kaggle/working/main.py


# Model on subset of examples

In [9]:
!python3 /kaggle/working/main.py

Installing dependencies...
Collecting typing (from -r /kaggle/working/requirements.txt (line 4))
  Downloading typing-3.7.4.3.tar.gz (78 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m78.6/78.6 kB[0m [31m3.4 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25l[?25hdone
Building wheels for collected packages: typing
  Building wheel for typing (setup.py) ... [?25l[?25hdone
  Created wheel for typing: filename=typing-3.7.4.3-py3-none-any.whl size=26304 sha256=fdecfd68f6c7e9a660d19362c96f16fc67fd7e8b6665a6d8ef223163aa32e381
  Stored in directory: /root/.cache/pip/wheels/9d/67/2f/53e3ef32ec48d11d7d60245255e2d71e908201d20c880c08ee
Successfully built typing
Installing collected packages: typing
Successfully installed typing-3.7.4.3
Starting Ollama setup...
Starting Ollama setup...
Installing Ollama...
>>> Installing ollama to /usr/local
>>> Downloading Linux amd64 bundle
#####################################################

# Sanity check folders

In [10]:
!cd /kaggle/working/ && ls -la

total 3684
drwxr-xr-x 6 root root    4096 Jun  4 17:57 .
drwxr-xr-x 8 root root    4096 Jun  4 13:40 ..
drwxr-xr-x 7 root root    4096 Jun  4 13:42 data
-rw-r--r-- 1 root root     695 Jun  4 13:41 download_dataset.py
drwxr-xr-x 4 root root    4096 Jun  4 13:42 idx2label
-rw-r--r-- 1 root root     846 Jun  4 13:41 main.py
---------- 1 root root  430833 Jun  4 17:57 __notebook__.ipynb
-rw-r--r-- 1 root root   10337 Jun  4 13:41 predict_class.py
drwxr-xr-x 2 root root    4096 Jun  4 13:41 prediction
-rw-r--r-- 1 root root      54 Jun  4 13:41 requirements.txt
-rw-r--r-- 1 root root 3285935 Jun  4 17:57 results_qwen3:8b_banking_0_13082.json
drwxr-xr-x 3 root root    4096 Jun  4 13:41 src


In [11]:
!cd /kaggle/working/src && ls -la

total 24
drwxr-xr-x 3 root root 4096 Jun  4 13:41 .
drwxr-xr-x 6 root root 4096 Jun  4 17:57 ..
-rw-r--r-- 1 root root 1016 Jun  4 13:41 config.py
-rw-r--r-- 1 root root   20 Jun  4 13:41 __init__.py
drwxr-xr-x 2 root root 4096 Jun  4 13:41 __pycache__
-rw-r--r-- 1 root root  965 Jun  4 13:41 setup_ollama.py


In [12]:
!cd /kaggle/working/data/data && ls -la

total 20
drwxr-xr-x 5 root root 4096 Jun  4 13:42 .
drwxr-xr-x 7 root root 4096 Jun  4 13:42 ..
drwxr-xr-x 2 root root 4096 Jun  4 13:42 banking
drwxr-xr-x 2 root root 4096 Jun  4 13:42 oos
drwxr-xr-x 2 root root 4096 Jun  4 13:42 stackoverflow


# idx2label_oos examples

In [13]:
pip install huggingface-hub

Note: you may need to restart the kernel to use updated packages.


In [14]:
from huggingface_hub import snapshot_download
snapshot_download(repo_id="KaiquanMah/open-intent-query-classification", repo_type="space", allow_patterns="*_idx2label.csv", local_dir='/kaggle/working/idx2label')

Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

'/kaggle/working/idx2label'

In [15]:
import pandas as pd
idx2label = pd.read_csv('/kaggle/working/idx2label/dataset_idx2label/banking77_idx2label.csv')
idx2label

Unnamed: 0,index,label
0,0,activate_my_card
1,1,age_limit
2,2,apple_pay_or_google_pay
3,3,atm_support
4,4,automatic_top_up
...,...,...
72,72,virtual_card_not_working
73,73,visa_or_mastercard
74,74,why_verify_identity
75,75,wrong_amount_of_cash_received


In [16]:
idx2label_oos = idx2label[idx2label.index.isin([5, 6, 8, 17, 27, 33, 36, 38, 39, 45, 48, 49, 51, 53, 61, 62, 64, 65, 70])]
idx2label_oos

Unnamed: 0,index,label
5,5,balance_not_updated_after_bank_transfer
6,6,balance_not_updated_after_cheque_or_cash_deposit
8,8,cancel_transfer
17,17,card_payment_wrong_exchange_rate
27,27,declined_transfer
33,33,exchange_via_app
36,36,fiat_currency_support
38,38,get_physical_card
39,39,getting_spare_card
45,45,pending_card_payment


In [17]:
print(idx2label_oos)

    index                                             label
5       5           balance_not_updated_after_bank_transfer
6       6  balance_not_updated_after_cheque_or_cash_deposit
8       8                                   cancel_transfer
17     17                  card_payment_wrong_exchange_rate
27     27                                 declined_transfer
33     33                                  exchange_via_app
36     36                             fiat_currency_support
38     38                                 get_physical_card
39     39                                getting_spare_card
45     45                              pending_card_payment
48     48                                  pending_transfer
49     49                                       pin_blocked
51     51                             refund_not_showing_up
53     53                            reverted_card_payment?
61     61                                   top_up_reverted
62     62                               

In [18]:
idx2label_oos.shape

(19, 2)

In [19]:
# percentage of OOS classes over ALL classes in the dataset
len(idx2label_oos)/len(idx2label)

0.24675324675324675

# Test changing output file back from PKL to JSON

In [20]:
import subprocess
import sys


# 1. Install libraries from requirements.txt
print("Installing dependencies...")
subprocess.run([sys.executable, "-m", "pip", "install", "-r", "/kaggle/working/requirements.txt"], check=True)

# 2. Run setup_ollama.py
print("Starting Ollama setup...")
# subprocess.run(["python3", "/kaggle/working/src/setup_ollama.py"], check=True)
print("Starting Ollama setup...")
subprocess.run(
    ["python3", "-m", "src.setup_ollama"],  # Run as a module
    cwd="/kaggle/working",  # Set working directory to parent of 'src'
    check=True
)

# 3. Run download_dataset.py
print("Downloading dataset...")
subprocess.run(["python3", "/kaggle/working/download_dataset.py"], check=True)


Installing dependencies...
Starting Ollama setup...
Starting Ollama setup...


[?2026h[?25l[1Gpulling manifest ⠋ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠙ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠹ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠸ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠼ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest [K
pulling a3de86cd1c13: 100% ▕██████████████████▏ 5.2 GB                         [K
pulling ae370d884f10: 100% ▕██████████████████▏ 1.7 KB                         [K
pulling d18a5cc71b84: 100% ▕██████████████████▏  11 KB                         [K
pulling cff3f395ef37: 100% ▕██████████████████▏  120 B                         [K
pulling 05a61d37b084: 100% ▕██████████████████▏  487 B                         [K
verifying sha256 digest [K
writing manifest [K
success [K[?25h[?2026l


Ollama is already installed.
Starting Ollama server...
Pulling qwen3:8b model...
Ollama setup complete!
Downloading dataset...
Dataset has already been downloaded. If this is incorrect, please delete the Adaptive-Decision-Boundary 'data' folder.


CompletedProcess(args=['python3', '/kaggle/working/download_dataset.py'], returncode=0)

In [21]:
from src.config import Config
import pandas as pd
import os
import ollama
import json
import pickle
import time
from pydantic import BaseModel
from typing import Literal
# from enum import Enum
from huggingface_hub import snapshot_download



# Config.target_dir
# Config.cloned_data_dir'
# Config.dataset_name
# Config.model_name
# Config.start_index
# Config.end_index
# Config.log_every_n_examples


#######################
# load data
#######################
def load_data(data_dir):
    """Loads train, dev, and test datasets from a specified directory."""

    main_df = pd.DataFrame()
    for split in ['train', 'dev', 'test']:
        file_path = os.path.join(data_dir, f'{split}.tsv')
        if os.path.exists(file_path):
          try:
            df = pd.read_csv(file_path, sep='\t')
            df['dataset'] = os.path.basename(data_dir)
            df['split'] = split
            main_df = pd.concat([main_df, df], ignore_index=True)
          except pd.errors.ParserError as e:
            print(f"Error parsing {file_path}: {e}")
            # Handle the error appropriately, e.g., skip the file, log the error, etc.
        else:
            print(f"Warning: {split}.tsv not found in {data_dir}")
    return main_df

all_data = pd.DataFrame()

data_dir = os.path.join(Config.cloned_data_dir, Config.dataset_name)
if os.path.exists(data_dir):
  df = load_data(data_dir)
  print(f"Loaded dataset into dataframe: {Config.dataset_name}")
  print(f"Dimensions: {df.shape}")
  print(f"Col names: {df.columns}")
else:
  print(f"Warning: Directory {data_dir} not found.")
#######################



#######################
# unique intents
#######################
sorted_intent = list(sorted(df.label.unique()))
print("="*80)
print(f"Original dataset intents: {sorted_intent}")
print(f"Number of original intents: {len(sorted_intent)}\n")


# 2025.06.03
# New OOS approach - get 25/50/75% of class indexes for each dataset within the team (for reproducibility and comparable results)
# Change their class labels to 'oos'
snapshot_download(repo_id="KaiquanMah/open-intent-query-classification", repo_type="space", allow_patterns="*_idx2label.csv", local_dir=Config.idx2label_target_dir)
idx2label_filepath = Config.idx2label_target_dir + '/dataset_idx2label/' + Config.idx2label_filename_hf
idx2label = pd.read_csv(idx2label_filepath)
idx2label_oos = idx2label[idx2label.index.isin(Config.list_oos_idx)]
idx2label_oos.reset_index(drop=True, inplace=True)
print("="*80)
print("Original intents to convert to OOS class")
print(idx2label_oos)
print(f"Percentage of original intents to convert to OOS class: {len(idx2label_oos)/len(idx2label)}\n")

oos_labels = idx2label_oos['label'].values
list_sorted_intent_aft_conversion = ['oos' if intent.lower() in oos_labels else intent for intent in sorted_intent]
list_sorted_intent_aft_conversion_deduped = sorted(set(list_sorted_intent_aft_conversion))
print("="*80)
print("Unique intents after converting some to OOS class")
print(list_sorted_intent_aft_conversion_deduped)
print(f"Number of unique intents after converting some to OOS class: {len(list_sorted_intent_aft_conversion_deduped)}\n")



# unique intents - from set to bullet points (to use in prompts)
# bulletpts_intent = "\n".join(f"- {category}" for category in set_intent)
# 2025.06.03: do not show 'oos' in the prompt (to avoid leakage of 'oos' class)
bulletpts_intent = "\n".join(f"- {category}" for category in list_sorted_intent_aft_conversion_deduped if category and (category!='oos'))


# 2025.06.04: fix adjustment if 'oos' is already in the original dataset
int_oos_in_orig_dataset = int('oos' in idx2label.label.values)
adjust_if_oos_not_in_orig_dataset = [0 if int_oos_in_orig_dataset == 1 else 1][0]

print("="*80)
print("sanity check")
print(f"Number of original intents: {len(sorted_intent)}")
print(f"Number of original intents + 1 OOS class (if doesnt exist in original dataset): {len(sorted_intent) + adjust_if_oos_not_in_orig_dataset}")
print(f"Number of original intents to convert to OOS class: {len(idx2label_oos)}")
print(f"Percentage of original intents to convert to OOS class: {len(idx2label_oos)/len(idx2label)}")
print(f"Number of unique intents after converting some to OOS class: {len(list_sorted_intent_aft_conversion_deduped)}")
print(f"Number of original intents + 1 OOS class (if doesnt exist in original dataset) - converted classes: {len(sorted_intent) + adjust_if_oos_not_in_orig_dataset - len(idx2label_oos)}")
print(f"Numbers match: {(len(sorted_intent) + adjust_if_oos_not_in_orig_dataset - len(idx2label_oos)) == len(list_sorted_intent_aft_conversion_deduped)}")
print("Prepared unique intents")





#######################




#######################
# Enforce schema on the model (e.g. allowed list of predicted categories)
#######################

class IntentSchema(BaseModel):
    # dynamically unpack list of categories for different dataset(s)
    category: Literal[*list_sorted_intent_aft_conversion_deduped]
    confidence: float
    
#######################




#######################
# Prompt
#######################
# prompt 2 with less information/compute, improve efficiency
def get_prompt(dataset_name, split, question, categories):
    
    prompt = f'''
You are an expert in understanding and identifying what users are asking you.

Your task is to analyze an input query from a user.
Then assign the most appropriate category to the query from a predefined list below:
{categories}

If you are unable to find the most appropriate category, please assign to the 'oos' (i.e. out of scope) category.

===============================

Question: {question}

===============================

Provide your final classification in **valid JSON format** with the following structure:
{{
  "category": "your_chosen_category_name",
  "confidence": confidence_level_rounded_to_the_nearest_2_decimal_places
}}


Ensure the JSON has:
- Opening and closing curly braces
- Double quotes around keys and string values
- Confidence as a number (not a string), with maximum 2 decimal places

Do not include any explanations or extra text.
            '''
    return prompt



Loaded dataset into dataframe: banking
Dimensions: (13083, 4)
Col names: Index(['text', 'label', 'dataset', 'split'], dtype='object')
Original dataset intents: ['Refund_not_showing_up', 'activate_my_card', 'age_limit', 'apple_pay_or_google_pay', 'atm_support', 'automatic_top_up', 'balance_not_updated_after_bank_transfer', 'balance_not_updated_after_cheque_or_cash_deposit', 'beneficiary_not_allowed', 'cancel_transfer', 'card_about_to_expire', 'card_acceptance', 'card_arrival', 'card_delivery_estimate', 'card_linking', 'card_not_working', 'card_payment_fee_charged', 'card_payment_not_recognised', 'card_payment_wrong_exchange_rate', 'card_swallowed', 'cash_withdrawal_charge', 'cash_withdrawal_not_recognised', 'change_pin', 'compromised_card', 'contactless_not_working', 'country_support', 'declined_card_payment', 'declined_cash_withdrawal', 'declined_transfer', 'direct_debit_payment_not_recognised', 'disposable_card_limits', 'edit_personal_details', 'exchange_charge', 'exchange_rate', 'exc

Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Original intents to convert to OOS class
    index                                             label
0       5           balance_not_updated_after_bank_transfer
1       6  balance_not_updated_after_cheque_or_cash_deposit
2       8                                   cancel_transfer
3      17                  card_payment_wrong_exchange_rate
4      27                                 declined_transfer
5      33                                  exchange_via_app
6      36                             fiat_currency_support
7      38                                 get_physical_card
8      39                                getting_spare_card
9      45                              pending_card_payment
10     48                                  pending_transfer
11     49                                       pin_blocked
12     51                             refund_not_showing_up
13     53                            reverted_card_payment?
14     61                                   top_up_reverted

In [22]:
def predict_intent(model_name, df, categories, start_index=0, end_index=None, log_every_n_examples=100):
    start_time = time.time()
    results = []  # Store processed results
    
    # Slice DataFrame based on start/end indices
    if end_index is None:
        subset_df = df.iloc[start_index:]
    else:
        subset_df = df.iloc[start_index:end_index+1]
    
    total_rows = len(subset_df)
    
    for row in subset_df.itertuples():
        prompt = get_prompt(row.dataset, row.split, row.text, categories)
        if row.Index == 0:
            print("Example of how prompt looks, for the 1st example in this subset of data")
            print(prompt)

            print("Example of how IntentSchema looks")
            print(IntentSchema.model_json_schema())
        
        
        try:
            response = ollama.chat(model=model_name, 
                                   messages=[
                                                {'role': 'user', 'content': prompt}
                                            ],
                                   format = IntentSchema.model_json_schema(),
                                   options = {'temperature': 0},  # Set temperature to 0 for a more deterministic output
                                  )
            msg = response['message']['content']
            parsed = json.loads(msg)

            
            # Safely extract keys with defaults - resolve parsing error
            # maybe LLM did not output a particular key-value pair
            category = parsed.get('category', 'error')
            confidence = parsed.get('confidence', 0.0)
            parsed = {'category': category, 'confidence': confidence}
        except (json.JSONDecodeError, KeyError, Exception) as e:
            parsed = {'category': 'error', 'confidence': 0.0}
        
        # Combine original row data with predictions
        results.append({
            "Index": row.Index,
            "text": row.text,
            "label": row.label,
            "dataset": row.dataset,
            "split": row.split,
            "predicted": parsed['category'],
            "confidence": parsed['confidence']
        })
        
        # Log progress
        if row.Index % log_every_n_examples == 0:
            elapsed_time = time.time() - start_time
            
            avg_time_per_row = elapsed_time / (row.Index - start_index + 1)
            remaining_rows = total_rows - (row.Index - start_index + 1)
            eta = avg_time_per_row * remaining_rows
            
            print(f"Processed original df idx {row.Index} (subset row {row.Index - start_index}) | "
                  f"Elapsed: {elapsed_time:.2f}s | ETA: {eta:.2f}s")
    
    return results  # Return list of dictionaries
    

In [23]:

print(f"Starting intent classification using {Config.model_name}")
subset_results = predict_intent(Config.model_name, 
                                df, 
                                bulletpts_intent, 
                                start_index = Config.start_index, 
                                end_index = 4,
                                log_every_n_examples = Config.log_every_n_examples)




Starting intent classification using qwen3:8b
Example of how prompt looks, for the 1st example in this subset of data

You are an expert in understanding and identifying what users are asking you.

Your task is to analyze an input query from a user.
Then assign the most appropriate category to the query from a predefined list below:
- activate_my_card
- age_limit
- apple_pay_or_google_pay
- atm_support
- automatic_top_up
- beneficiary_not_allowed
- card_about_to_expire
- card_acceptance
- card_arrival
- card_delivery_estimate
- card_linking
- card_not_working
- card_payment_fee_charged
- card_payment_not_recognised
- card_swallowed
- cash_withdrawal_charge
- cash_withdrawal_not_recognised
- change_pin
- compromised_card
- contactless_not_working
- country_support
- declined_card_payment
- declined_cash_withdrawal
- direct_debit_payment_not_recognised
- disposable_card_limits
- edit_personal_details
- exchange_charge
- exchange_rate
- extra_charge_on_statement
- failed_transfer
- get_di

In [24]:
# update end_index for filename (if None is used for the end of the df)
# Get the last index of the DataFrame
last_index = df.index[-1] 
# Use last index if Config.end_index is None
end_index = Config.end_index if Config.end_index is not None else last_index


In [25]:
subset_results

[{'Index': 0,
  'text': 'Could you help my figure out the exchange fee?',
  'label': 'exchange_charge',
  'dataset': 'banking',
  'split': 'train',
  'predicted': 'exchange_charge',
  'confidence': 0.95},
 {'Index': 1,
  'text': "I made a cash deposit to my account but i don't see it",
  'label': 'balance_not_updated_after_cheque_or_cash_deposit',
  'dataset': 'banking',
  'split': 'train',
  'predicted': 'top_up_by_cash_or_cheque',
  'confidence': 0.95},
 {'Index': 2,
  'text': "Hello - I'm on the app and trying to purchase crypto. It's not going through. What am I doing wrong?",
  'label': 'beneficiary_not_allowed',
  'dataset': 'banking',
  'split': 'train',
  'predicted': 'oos',
  'confidence': 1.0},
 {'Index': 3,
  'text': 'Why is it saying I have a pending payment?',
  'label': 'pending_card_payment',
  'dataset': 'banking',
  'split': 'train',
  'predicted': 'pending_top_up',
  'confidence': 0.95},
 {'Index': 4,
  'text': 'Is there an extra charge to exchange different currencie

In [26]:
import json
with open(f'results_{Config.model_name}_{Config.dataset_name}_{Config.start_index}_{end_index}_fortesting.json', 'w') as f:
    json.dump(subset_results, f, indent=2)

In [27]:
with open(f'results_{Config.model_name}_{Config.dataset_name}_{Config.start_index}_{end_index}_fortesting.json', 'r') as f:
    res = json.load(f)

In [28]:
res

[{'Index': 0,
  'text': 'Could you help my figure out the exchange fee?',
  'label': 'exchange_charge',
  'dataset': 'banking',
  'split': 'train',
  'predicted': 'exchange_charge',
  'confidence': 0.95},
 {'Index': 1,
  'text': "I made a cash deposit to my account but i don't see it",
  'label': 'balance_not_updated_after_cheque_or_cash_deposit',
  'dataset': 'banking',
  'split': 'train',
  'predicted': 'top_up_by_cash_or_cheque',
  'confidence': 0.95},
 {'Index': 2,
  'text': "Hello - I'm on the app and trying to purchase crypto. It's not going through. What am I doing wrong?",
  'label': 'beneficiary_not_allowed',
  'dataset': 'banking',
  'split': 'train',
  'predicted': 'oos',
  'confidence': 1.0},
 {'Index': 3,
  'text': 'Why is it saying I have a pending payment?',
  'label': 'pending_card_payment',
  'dataset': 'banking',
  'split': 'train',
  'predicted': 'pending_top_up',
  'confidence': 0.95},
 {'Index': 4,
  'text': 'Is there an extra charge to exchange different currencie

# Fix edge case - OOS exists in dataset and distorts count of classes

In [29]:
import subprocess
import sys


# 1. Install libraries from requirements.txt
print("Installing dependencies...")
subprocess.run([sys.executable, "-m", "pip", "install", "-r", "/kaggle/working/requirements.txt"], check=True)

# 2. Run setup_ollama.py
print("Starting Ollama setup...")
# subprocess.run(["python3", "/kaggle/working/src/setup_ollama.py"], check=True)
print("Starting Ollama setup...")
subprocess.run(
    ["python3", "-m", "src.setup_ollama"],  # Run as a module
    cwd="/kaggle/working",  # Set working directory to parent of 'src'
    check=True
)

# 3. Run download_dataset.py
print("Downloading dataset...")
subprocess.run(["python3", "/kaggle/working/download_dataset.py"], check=True)

Installing dependencies...
Starting Ollama setup...
Starting Ollama setup...


[?2026h[?25l[1Gpulling manifest ⠙ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠙ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠹ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest ⠸ [K[?25h[?2026l[?2026h[?25l[1Gpulling manifest [K
pulling a3de86cd1c13: 100% ▕██████████████████▏ 5.2 GB                         [K
pulling ae370d884f10: 100% ▕██████████████████▏ 1.7 KB                         [K
pulling d18a5cc71b84: 100% ▕██████████████████▏  11 KB                         [K
pulling cff3f395ef37: 100% ▕██████████████████▏  120 B                         [K
pulling 05a61d37b084: 100% ▕██████████████████▏  487 B                         [K
verifying sha256 digest [K
writing manifest [K
success [K[?25h[?2026l


Ollama is already installed.
Starting Ollama server...
Pulling qwen3:8b model...
Ollama setup complete!
Downloading dataset...
Dataset has already been downloaded. If this is incorrect, please delete the Adaptive-Decision-Boundary 'data' folder.


CompletedProcess(args=['python3', '/kaggle/working/download_dataset.py'], returncode=0)

In [30]:
from src.config import Config
import pandas as pd
import os
import ollama
import json
import pickle
import time
from pydantic import BaseModel
from typing import Literal
# from enum import Enum
from huggingface_hub import snapshot_download



# Config.target_dir
# Config.cloned_data_dir'
# Config.dataset_name
# Config.model_name
# Config.start_index
# Config.end_index
# Config.log_every_n_examples


#######################
# load data
#######################
def load_data(data_dir):
    """Loads train, dev, and test datasets from a specified directory."""

    main_df = pd.DataFrame()
    for split in ['train', 'dev', 'test']:
        file_path = os.path.join(data_dir, f'{split}.tsv')
        if os.path.exists(file_path):
          try:
            df = pd.read_csv(file_path, sep='\t')
            df['dataset'] = os.path.basename(data_dir)
            df['split'] = split
            main_df = pd.concat([main_df, df], ignore_index=True)
          except pd.errors.ParserError as e:
            print(f"Error parsing {file_path}: {e}")
            # Handle the error appropriately, e.g., skip the file, log the error, etc.
        else:
            print(f"Warning: {split}.tsv not found in {data_dir}")
    return main_df

all_data = pd.DataFrame()

data_dir = os.path.join(Config.cloned_data_dir, Config.dataset_name)
if os.path.exists(data_dir):
  df = load_data(data_dir)
  print(f"Loaded dataset into dataframe: {Config.dataset_name}")
  print(f"Dimensions: {df.shape}")
  print(f"Col names: {df.columns}")
else:
  print(f"Warning: Directory {data_dir} not found.")
#######################



#######################
# unique intents
#######################
sorted_intent = list(sorted(df.label.unique()))
print("="*80)
print(f"Original dataset intents: {sorted_intent}")
print(f"Number of original intents: {len(sorted_intent)}\n")


# 2025.06.03
# New OOS approach - get 25/50/75% of class indexes for each dataset within the team (for reproducibility and comparable results)
# Change their class labels to 'oos'
snapshot_download(repo_id="KaiquanMah/open-intent-query-classification", repo_type="space", allow_patterns="*_idx2label.csv", local_dir=Config.idx2label_target_dir)
idx2label_filepath = Config.idx2label_target_dir + '/dataset_idx2label/' + Config.idx2label_filename_hf
idx2label = pd.read_csv(idx2label_filepath)
idx2label_oos = idx2label[idx2label.index.isin(Config.list_oos_idx)]
idx2label_oos.reset_index(drop=True, inplace=True)
print("="*80)
print("Original intents to convert to OOS class")
print(idx2label_oos)
print(f"Percentage of original intents to convert to OOS class: {len(idx2label_oos)/len(idx2label)}\n")

oos_labels = idx2label_oos['label'].values
list_sorted_intent_aft_conversion = ['oos' if intent.lower() in oos_labels else intent for intent in sorted_intent]
list_sorted_intent_aft_conversion_deduped = sorted(set(list_sorted_intent_aft_conversion))
print("="*80)
print("Unique intents after converting some to OOS class")
print(list_sorted_intent_aft_conversion_deduped)
print(f"Number of unique intents after converting some to OOS class: {len(list_sorted_intent_aft_conversion_deduped)}\n")



# unique intents - from set to bullet points (to use in prompts)
# bulletpts_intent = "\n".join(f"- {category}" for category in set_intent)
# 2025.06.03: do not show 'oos' in the prompt (to avoid leakage of 'oos' class)
bulletpts_intent = "\n".join(f"- {category}" for category in list_sorted_intent_aft_conversion_deduped if category and (category!='oos'))


# 2025.06.04: fix adjustment if 'oos' is already in the original dataset
int_oos_in_orig_dataset = int('oos' in idx2label.label.values)
adjust_if_oos_not_in_orig_dataset = [0 if int_oos_in_orig_dataset == 1 else 1][0]

print("="*80)
print("sanity check")
print(f"Number of original intents: {len(sorted_intent)}")
print(f"Number of original intents + 1 OOS class (if doesnt exist in original dataset): {len(sorted_intent) + adjust_if_oos_not_in_orig_dataset}")
print(f"Number of original intents to convert to OOS class: {len(idx2label_oos)}")
print(f"Percentage of original intents to convert to OOS class: {len(idx2label_oos)/len(idx2label)}")
print(f"Number of unique intents after converting some to OOS class: {len(list_sorted_intent_aft_conversion_deduped)}")
print(f"Number of original intents + 1 OOS class (if doesnt exist in original dataset) - converted classes: {len(sorted_intent) + adjust_if_oos_not_in_orig_dataset - len(idx2label_oos)}")
print(f"Numbers match: {(len(sorted_intent) + adjust_if_oos_not_in_orig_dataset - len(idx2label_oos)) == len(list_sorted_intent_aft_conversion_deduped)}")
print("Prepared unique intents")
#######################





Loaded dataset into dataframe: banking
Dimensions: (13083, 4)
Col names: Index(['text', 'label', 'dataset', 'split'], dtype='object')
Original dataset intents: ['Refund_not_showing_up', 'activate_my_card', 'age_limit', 'apple_pay_or_google_pay', 'atm_support', 'automatic_top_up', 'balance_not_updated_after_bank_transfer', 'balance_not_updated_after_cheque_or_cash_deposit', 'beneficiary_not_allowed', 'cancel_transfer', 'card_about_to_expire', 'card_acceptance', 'card_arrival', 'card_delivery_estimate', 'card_linking', 'card_not_working', 'card_payment_fee_charged', 'card_payment_not_recognised', 'card_payment_wrong_exchange_rate', 'card_swallowed', 'cash_withdrawal_charge', 'cash_withdrawal_not_recognised', 'change_pin', 'compromised_card', 'contactless_not_working', 'country_support', 'declined_card_payment', 'declined_cash_withdrawal', 'declined_transfer', 'direct_debit_payment_not_recognised', 'disposable_card_limits', 'edit_personal_details', 'exchange_charge', 'exchange_rate', 'exc

Fetching 3 files:   0%|          | 0/3 [00:00<?, ?it/s]

Original intents to convert to OOS class
    index                                             label
0       5           balance_not_updated_after_bank_transfer
1       6  balance_not_updated_after_cheque_or_cash_deposit
2       8                                   cancel_transfer
3      17                  card_payment_wrong_exchange_rate
4      27                                 declined_transfer
5      33                                  exchange_via_app
6      36                             fiat_currency_support
7      38                                 get_physical_card
8      39                                getting_spare_card
9      45                              pending_card_payment
10     48                                  pending_transfer
11     49                                       pin_blocked
12     51                             refund_not_showing_up
13     53                            reverted_card_payment?
14     61                                   top_up_reverted

In [31]:
int('oos' in idx2label.label.values)

0

In [32]:
int('oos' not in idx2label.label.values)

1

In [33]:
int_oos_in_orig_dataset = int('oos' in idx2label.label.values)
adjust_if_oos_not_in_orig_dataset = [0 if int_oos_in_orig_dataset == 1 else 1][0]
adjust_if_oos_not_in_orig_dataset

1