# Refactored code for
* Setting up and running Ollama in Kaggle
* Downloading THUIAR dataset
* Zero-Shot Prompt
* Use LLM to classify intent from an input 'question' dataset
* To configure file/folder paths, the LLM, the dataset, and the `start_index`/`end_index` for each run, update `src/config.py`

This notebook will also be used as the base to test any fixes to the LLM intent classification pipeline.
* 2025.05.26: Changed the results output file from JSON to Pickle so that it stores a list of dictionaries, one dictionary per record. The lists produced by multiple notebooks can be downloaded and concatenated for analysis
* 2025.05.30: Update prompt and bulletpts_intent.
  * Check if dataset contains 'oos' (out of scope) category
  * If dataset has no 'oos' (out of scope) category, turn 1 category into 'oos'. Use updated categories in bulletpts_intent. Also update prompt instructions on when to classify an example as 'oos'
  * **This force_oos fix is implemented in [notebook 01E](https://www.kaggle.com/code/kaiquanmah/01e-kaggle-ollama-llama3-2-w-force-oos?scriptVersionId=242648764)**
* 2025.05.30: Add pydantic schema with enums
  * From an analysis of errors, the model previously had a 45% average accuracy rate across categories. The model predicted a set of categories outside of what we gave it in 'bulletpts_intent'
  * To fix this, we will try to implement a pydantic schema solution for the model to only predict categories from the allowed list of categories ('bulletpts_intent')
* 2025.05.30: Set Ollama chat temperature to 0
  * Previously, we used the default temperature of 0.8, which might have caused the model to predict categories we did not provide to it ([Reading](https://docs.spring.io/spring-ai/reference/api/chat/ollama-chat.html))
  * **The pydantic schema and temperature fixes are implemented in [notebook 01F](https://www.kaggle.com/code/kaiquanmah/01f-kaggle-ollama-llama3-2-w-pydantic-schema)**
* 2025.06.03:
  1. **Remove 'oos' from `bulletpts_intent` input into prompt**, to be consistent with the team's approach when exploring embedding approaches to classify 'oos' examples. **Keep 'oos' in pydantic enums/Literal (for LLM to output 'oos' as an allowed class value)**
  2. **Remove 0.99 when defining the prompt format - to avoid anchoring LLM on outputting confidence of 0.99**
  3. **Added ability for user to define which classes are 'oos'**
  * **These 3 fixes are in [notebook 01G](https://www.kaggle.com/code/kaiquanmah/01g-kaggle-ollama-llama3-2-oos-update)**
* 2025.06.10:
  * From an error analysis earlier, **models can get confused between similar intent classes**
  * Therefore **we will analyse similar intent classes/labels -> get their indexes -> put them into 'oos' in [notebook 01H](https://www.kaggle.com/code/kaiquanmah/01h1-openintent-ollama-llama3-2-3b-banking77)**
  * **Moved from the earlier zero-shot prompt to a few-shot prompt (with 5 examples) drawn from known intents**. These 5 examples were **non-oos and previously misclassified**. This fix is in **[notebook 01i](https://www.kaggle.com/code/kaiquanmah/01i1-openintent-ollama-llama3-2-3b-banking77)**
* 2025.06.16:
  * For known intents (ie not in the 'oos' class), give 5 examples each in the few-shot prompt **[notebook 01J](https://www.kaggle.com/code/kaiquanmah/01j1-openintent-ollama-llama3-2-3b-banking77)**
* 2025.06.17:
  * Now we explore how changing the number of known intent classes affects the recall of oos in **[notebook 01K](https://www.kaggle.com/code/kaiquanmah/01k1-openintent-ollama-llama3-2-3b-banking77)**
  * For quick experimentation, we implement (1) fewshot prompt with 1 example per known intent class, (2) changing number of known intent classes in various notebook runs, (3) 100 oos sentences for the model to classify (taking from first class for banking77 and stackoverflow dataset, or the oos class for CLINC150 oos dataset)
    * For (3) - Added 'first_class' variable for each dataset to Config
    * For (3) - Created new fn to filter and keep 100 records from 'first/oos class' to input to the model to classify
* 2025.07.07:
  * Explore free, rate-limited API model (such as Gemini) in **[notebook 01L](https://www.kaggle.com/code/kaiquanmah/01l1-openintent-gemini-banking77-explore)**
  * Added retry for when we exhaust API limits per minute
  * Updated end_index tracking so that it works with both Ollama and Gemini when generating the JSON results file
  * **Explore Qwen model from the Nebius platform**
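
The 2025.06.03 change (leave 'oos' out of the prompt's category list, but keep it as an allowed output value) can be sketched as below. This is a minimal illustration that assumes pydantic v2; the intent names are hypothetical placeholders, not labels taken from the datasets.

```python
from typing import Literal

from pydantic import BaseModel, ValidationError

# Hypothetical intents for illustration; the real list comes from the dataset
known_intents = ["card_arrival", "lost_card"]
allowed_outputs = known_intents + ["oos"]

# The prompt lists only the known intents ...
bulletpts_intent = "\n".join(f"- {category}" for category in known_intents)


# ... while the schema still accepts 'oos' as an output value
class IntentSchema(BaseModel):
    category: Literal[tuple(allowed_outputs)]
    confidence: float


ok = IntentSchema.model_validate_json('{"category": "oos", "confidence": 0.4}')

# A category outside the allowed list fails validation
try:
    IntentSchema.model_validate_json('{"category": "weather", "confidence": 0.9}')
    rejected = False
except ValidationError:
    rejected = True

print(bulletpts_intent)
print(ok.category, rejected)
```

Validation of this kind only checks a reply after it has been produced; whether the model is also constrained while generating depends on the serving API honouring the schema.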

In [1]:
# 1. create dirs if they do not exist
import os
os.makedirs('/kaggle/working/src', exist_ok=True)
os.makedirs('/kaggle/working/prediction', exist_ok=True)

In [2]:
%%writefile /kaggle/working/src/setup_ollama.py
import os
import subprocess
import time
from src.config import Config # absolute import

# 1. Install Ollama (if not already installed)
try:
    # Check if Ollama is already installed
    subprocess.run(["ollama", "--version"], capture_output=True, check=True)
    print("Ollama is already installed.")
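except (FileNotFoundError, subprocess.CalledProcessError):
    # Binary missing, or present but failing its version check: run the install script
    print("Installing Ollama...")
    subprocess.run("curl -fsSL https://ollama.com/install.sh | sh", shell=True, check=True)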

# 2. Start Ollama server in the background
print("Starting Ollama server...")
process = subprocess.Popen("ollama serve", shell=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)

# Wait for the server to initialize
time.sleep(5)


# 3. Pull the model
model_name = Config.model_name
print(f"Pulling {model_name} model...")
subprocess.run(["ollama", "pull", model_name], check=True)

# 4. Install Python client
subprocess.run(["pip", "install", "ollama"], check=True)

print("Ollama setup complete!")

Writing /kaggle/working/src/setup_ollama.py
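
`setup_ollama.py` waits a fixed 5 seconds after `ollama serve`. One alternative is to poll until the server accepts connections. The sketch below assumes the server listens on Ollama's default address `127.0.0.1:11434`; `port_is_open` and `wait_until` are hypothetical helpers, not part of the notebook.

```python
import socket
import time


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until(check, max_wait: float = 30.0, interval: float = 0.5) -> bool:
    """Call check() repeatedly until it returns True or max_wait seconds pass."""
    deadline = time.monotonic() + max_wait
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False


# Usage (assumption: default Ollama address):
# ready = wait_until(lambda: port_is_open("127.0.0.1", 11434))
```

Polling returns as soon as the server is reachable and reports a failed start through its `False` return value, whereas a fixed sleep can be either too short or longer than needed.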


In [3]:
%%writefile requirements.txt
pandas
pydantic
typing
huggingface-hub
# google-genai # only used for gemini model
openai # used for openrouter's gemini model
tenacity # for gemini model retries
# numpy
# enum

Writing requirements.txt


In [4]:
%%writefile /kaggle/working/src/__init__.py
# folder for config

Writing /kaggle/working/src/__init__.py


In [5]:
%%writefile /kaggle/working/src/config.py
class Config:
    #######################################################
    # working directory for files
    #######################################################
    target_dir = '/kaggle/working/data' # data directory to clone into
    cloned_data_dir = target_dir + '/data'
    prediction_dir = target_dir + '/prediction'
    #######################################################
    # dataset and model
    #######################################################
    dataset_name = 'stackoverflow' # UPDATE options: 'banking', 'stackoverflow', 'oos'
    idx2label_target_dir = '/kaggle/working/idx2label'
    idx2label_filename_hf = 'stackoverflow_idx2label.csv' # UPDATE options: banking77_idx2label.csv, stackoverflow_idx2label.csv, clinc150_oos_idx2label.csv
    fewshot_examples_dir = '/kaggle/working/fewshot'
    fewshot_subdir = '/fewshot-5examples-per-nonoos/'
    fewshot_examples_filename = 'stackoverflow_25perc_oos.txt' # UPDATE options: banking_25perc_oos.txt, stackoverflow_25perc_oos.txt, oos_25perc_oos.txt
    list_oos_idx = [0, 3, 10, 12, 14] # UPDATE gathered from within the team - for reproducible, comparable results with other open intent classification approaches
    model_name = 'Qwen3-30B-A3B' # 'gemma-2-9b-it-fast'
    start_index=0 # eg: 0, 10001, 11851
    end_index=None # eg: 10, 10000, 11850 or None (use end_index=None to process the full dataset)
    log_every_n_examples=10 # 2
    force_oos = True  # NEW: Add flag to force dataset to contain 'oos' class for the last class value (sorted alphabetically), if 'oos' class does not exist in the original dataset
    #######################################################
    # evaluate threshold when 'oos' recall drops
    #######################################################
    filter_oos_qns_only = False # True (when you are testing 'oos' recall threshold), False
    n_oos_qns = 100
    first_class_banking = 'activate_my_card' # following idx2label
    first_class_stackoverflow = 'wordpress' # following idx2label
    first_class_oos = 'oos'
    #######################################################

Writing /kaggle/working/src/config.py
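
Because `start_index` and `end_index` are set per run, a large dataset is typically processed over several runs. The sketch below computes the index pairs for those runs, assuming `end_index` is inclusive (as in the `df.iloc[start_index:end_index+1]` slice in `predict_class.py`). `index_ranges` is a hypothetical helper, not part of the notebook.

```python
def index_ranges(n_rows: int, rows_per_run: int) -> list[tuple[int, int]]:
    """Split n_rows into consecutive (start_index, end_index) pairs, end inclusive."""
    if rows_per_run <= 0:
        raise ValueError("rows_per_run must be positive")
    return [(start, min(start + rows_per_run, n_rows) - 1)
            for start in range(0, n_rows, rows_per_run)]


# Each pair can be copied into Config.start_index / Config.end_index for one run
for start_index, end_index in index_ranges(n_rows=25, rows_per_run=10):
    print(start_index, end_index)
```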


In [6]:
%%writefile download_dataset.py
from src.config import Config
import os
import subprocess
target_dir = Config.target_dir # data directory to clone into
cloned_data_dir = Config.cloned_data_dir

# Create target directory if it doesn't exist
os.makedirs(target_dir, exist_ok=True)

# do not clone dataset repo if cloned data folder exists
if os.path.exists(cloned_data_dir):
    print("Dataset has already been downloaded. If this is incorrect, please delete the Adaptive-Decision-Boundary 'data' folder.")
else:
    # Clone the repository
    subprocess.run(["git",
                    "clone",
                    "https://github.com/thuiar/Adaptive-Decision-Boundary.git",
                    target_dir
                   ],
                   check=True)  # fail loudly if the clone does not succeed

Writing download_dataset.py


In [7]:
%%writefile predict_class.py
from src.config import Config
import pandas as pd
import os
# import ollama
import json
import pickle
import time
from pydantic import BaseModel
from typing import Literal
# from enum import Enum
from huggingface_hub import snapshot_download
    
###################
# Gemini API
###################
# from google import genai
# from google.genai.types import ThinkingConfig
# from google.api_core import retry
from openai import OpenAI
from tenacity import retry, stop_after_attempt, wait_fixed
from kaggle_secrets import UserSecretsClient


###################


# Config.target_dir
# Config.cloned_data_dir
# Config.dataset_name
# Config.model_name
# Config.start_index
# Config.end_index
# Config.log_every_n_examples


#######################
# load data
#######################
def load_data(data_dir):
    """Loads train, dev, and test datasets from a specified directory."""

    main_df = pd.DataFrame()
    for split in ['train', 'dev', 'test']:
        file_path = os.path.join(data_dir, f'{split}.tsv')
        if os.path.exists(file_path):
            try:
                df = pd.read_csv(file_path, sep='\t')
                df['dataset'] = os.path.basename(data_dir)
                df['split'] = split
                main_df = pd.concat([main_df, df], ignore_index=True)
            except pd.errors.ParserError as e:
                # Skip a split that cannot be parsed, but report it
                print(f"Error parsing {file_path}: {e}")
        else:
            print(f"Warning: {split}.tsv not found in {data_dir}")
    return main_df


def filter100examples_oos(dataset_name, df):
    # do not restrict the input to 'oos' questions only
    if not Config.filter_oos_qns_only:
        filtered_df = df
    # vs
    # input 'only oos qns to model'
    else:
        if dataset_name == 'banking':
            first_class = Config.first_class_banking
        elif dataset_name == 'stackoverflow':
            first_class = Config.first_class_stackoverflow
        else:
            first_class = Config.first_class_oos
    
        filtered_df = df.copy()
        filtered_df = filtered_df.loc[filtered_df["label"] == first_class]
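        # sample at most n_oos_qns rows, so a class with fewer rows does not raise a ValueError
        filtered_df = filtered_df.sample(n=min(Config.n_oos_qns, len(filtered_df)), random_state=38)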
    return filtered_df


df = pd.DataFrame()

data_dir = os.path.join(Config.cloned_data_dir, Config.dataset_name)
if os.path.exists(data_dir):
  df = load_data(data_dir)
  print(f"Loaded dataset into dataframe: {Config.dataset_name}")
  print(f"Dimensions: {df.shape}")
  print(f"Col names: {df.columns}")
else:
  print(f"Warning: Directory {data_dir} not found.")
#######################



#######################
# unique intents
#######################
sorted_intent = list(sorted(df.label.unique()))
print("="*80)
print(f"Original dataset intents: {sorted_intent}")
print(f"Number of original intents: {len(sorted_intent)}\n")


# 2025.06.03
# New OOS approach - get 25/50/75% of class indexes for each dataset within the team (for reproducibility and comparable results)
# Change their class labels to 'oos'
snapshot_download(repo_id="KaiquanMah/open-intent-query-classification", repo_type="space", allow_patterns="*_idx2label.csv", local_dir=Config.idx2label_target_dir)
idx2label_filepath = Config.idx2label_target_dir + '/dataset_idx2label/' + Config.idx2label_filename_hf
idx2label = pd.read_csv(idx2label_filepath)
idx2label_oos = idx2label[idx2label.index.isin(Config.list_oos_idx)]
idx2label_oos.reset_index(drop=True, inplace=True)

# 2025.06.17 keep track of non-oos labels, to use in IntentSchema
# (list_oos_idx holds row indexes, so filter on the index rather than on the label column)
nonoos_labels = idx2label[~idx2label.index.isin(Config.list_oos_idx)]['label'].values
print("="*80)
print("Original intents to convert to OOS class")
print(idx2label_oos)
print(f"Percentage of original intents to convert to OOS class: {len(idx2label_oos)/len(idx2label)}\n")

oos_labels = idx2label_oos['label'].values
list_sorted_intent_aft_conversion = ['oos' if intent.lower() in oos_labels else intent for intent in sorted_intent]
list_sorted_intent_aft_conversion_deduped = sorted(set(list_sorted_intent_aft_conversion))
print("="*80)
print("Unique intents after converting some to OOS class")
print(list_sorted_intent_aft_conversion_deduped)
print(f"Number of unique intents after converting some to OOS class: {len(list_sorted_intent_aft_conversion_deduped)}\n")



# unique intents - from set to bullet points (to use in prompts)
# bulletpts_intent = "\n".join(f"- {category}" for category in set_intent)
# 2025.06.03: do not show 'oos' in the prompt (to avoid leakage of 'oos' class)
bulletpts_intent = "\n".join(f"- {category}" for category in list_sorted_intent_aft_conversion_deduped if category and (category!='oos'))

# 2025.06.04: fix adjustment if 'oos' is already in the original dataset
int_oos_in_orig_dataset = int('oos' in idx2label.label.values)
adjust_if_oos_not_in_orig_dataset = 0 if int_oos_in_orig_dataset == 1 else 1

print("="*80)
print("sanity check")
print(f"Number of original intents: {len(sorted_intent)}")
print(f"Number of original intents + 1 OOS class (if doesnt exist in original dataset): {len(sorted_intent) + adjust_if_oos_not_in_orig_dataset}")
print(f"Number of original intents to convert to OOS class: {len(idx2label_oos)}")
print(f"Percentage of original intents to convert to OOS class: {len(idx2label_oos)/len(idx2label)}")
print(f"Number of unique intents after converting some to OOS class: {len(list_sorted_intent_aft_conversion_deduped)}")
print(f"Number of original intents + 1 OOS class (if doesnt exist in original dataset) - converted classes: {len(sorted_intent) + adjust_if_oos_not_in_orig_dataset - len(idx2label_oos)}")
print(f"Numbers match: {(len(sorted_intent) + adjust_if_oos_not_in_orig_dataset - len(idx2label_oos)) == len(list_sorted_intent_aft_conversion_deduped)}")
print("Prepared unique intents")
#######################




#######################
# Enforce schema on the model (e.g. allowed list of predicted categories)
#######################

class IntentSchema(BaseModel):
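    # dynamically build the Literal from the list of categories for the chosen dataset
    # (a tuple subscript is equivalent to Literal[*list] and also parses on Python < 3.11)
    category: Literal[tuple(list_sorted_intent_aft_conversion_deduped)]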
    confidence: float
    
#######################




#######################
# filter after preparing intents
#######################
df = filter100examples_oos(Config.dataset_name, df)
print("Filtered dataset")
print(f"Dimensions: {df.shape}")
print(f"Col names: {df.columns}")
#######################



#######################
# Prompt
#######################
# prompt 2 with less information/compute, improve efficiency
# 2025.06.10 prompt 3 with 5 few shot examples only - notebook 01H1, 01i1
# 2025.06.16 prompt 4 with 5 examples per each known intent (ie non-oos intent) - notebook 01J1
snapshot_download(repo_id="KaiquanMah/open-intent-query-classification", repo_type="space", allow_patterns="*.txt", local_dir=Config.fewshot_examples_dir)
with open(Config.fewshot_examples_dir + Config.fewshot_subdir + Config.fewshot_examples_filename, 'r') as file:
    fewshot_examples = file.read()

def get_prompt(dataset_name, split, question, categories, fewshot_examples):
    
    prompt = f'''
You are an expert in understanding and identifying what users are asking you.

Your task is to analyze an input query from a user and assign the most appropriate category from the following list:
{categories}

Only classify as "oos" (out of scope category) if none of the other categories apply.

Below are several examples to guide your classification:

---
{fewshot_examples}
---

===============================

New Question: {question}

===============================

Provide your final classification in **valid JSON format** with the following structure:
{{
  "category": "your_chosen_category_name",
  "confidence": confidence_level_rounded_to_the_nearest_2_decimal_places
}}


Ensure the JSON has:
- Opening and closing curly braces
- Double quotes around keys and string values
- Confidence as a number (not a string), with maximum 2 decimal places

Do not include any explanations or extra text.
            '''
    return prompt



#######################


#######################
# Model on 1 Dataset
#######################
# Save a list of dictionaries 
# containing a dictionary for each record's
# - predicted category
# - confidence level and
# - original dataframe values


# Nebius (OpenAI-compatible endpoint); the API key is read from Kaggle secrets
user_secrets = UserSecretsClient()
NEBIUS_API_KEY = user_secrets.get_secret("NEBIUS_API_KEY")
client = OpenAI(base_url="https://api.studio.nebius.com/v1/",
                api_key = NEBIUS_API_KEY)

@retry(stop=stop_after_attempt(3), wait=wait_fixed(30))
def api_llm(client, prompt):
    try:
        print("CHECKPOINT_3A")
        # gemini_config = {"temperature": 0,
        #                  "response_mime_type": "application/json",
        #                  "response_schema": IntentSchema.model_json_schema(),
        #                  "seed": 38,
        #                  # # added for "gemini-2.5-flash-lite-preview-06-17" model
        #                  # "thinking_config": ThinkingConfig(thinking_budget=-1, 
        #                  #                    include_thoughts=True)
        #                 }
        response = client.beta.chat.completions.parse(model = 'Qwen/'+Config.model_name,
                                                      messages = [{"role": "user",
                                                                  "content": prompt}],
                                                      response_format = IntentSchema,
                                                      seed = 38,
                                                      temperature = 0
                                                      )
        # print(response)
        # msg = response.parsed
        response = response.choices[0].message.content
        print("CHECKPOINT_3B")
        return response
    except Exception as e:  # bind the exception so the diagnostics below can read it
        print(f"CHECKPOINT_4A: Exception Type: {type(e).__name__}")
        print(f"CHECKPOINT_4A: Exception Message: {str(e)}")
        
        # Gemini-specific errors
        if hasattr(e, 'code'):
            print(f"CHECKPOINT_4A: Status Code: {e.code}")
        if hasattr(e, 'details'):
            print(f"CHECKPOINT_4A: Details: {e.details}")
        
        # raise the exception again so retry can work
        raise

    

def predict_intent(model_name, df, categories, start_index=0, end_index=None, log_every_n_examples=100):
    start_time = time.time()
    results = []  # Store processed results
    
    # Slice DataFrame based on start/end indices
    if end_index is None:
        subset_df = df.iloc[start_index:]
    else:
        subset_df = df.iloc[start_index:end_index+1]
    
    total_rows = len(subset_df)
    subset_row_count = 0

    

    
    
    for row in subset_df.itertuples():
        subset_row_count+=1
        prompt = get_prompt(row.dataset, row.split, row.text, categories, fewshot_examples)
        if subset_row_count == 1:
            print("Example of how prompt looks, for the 1st example in this subset of data")
            print(prompt)

            print("Example of how IntentSchema looks")
            print(IntentSchema.model_json_schema())
        
        
        try:
            print("CHECKPOINT_1A")
            
            # response = ollama.chat(model=model_name, 
            #                        messages=[
            #                                     {'role': 'user', 'content': prompt}
            #                                 ],
            #                        format = IntentSchema.model_json_schema(),
            #                        options = {'temperature': 0},  # Set temperature to 0 for a more deterministic output
            #                       )
            # msg = response['message']['content']
            # parsed = json.loads(msg)
            
            response = api_llm(client, prompt)
            print("CHECKPOINT_1B")
            # api_llm returns the message content as a JSON string, so parse it directly
            parsed = json.loads(response)
            # parsed = response.parsed
            print("CHECKPOINT_1C")
                        
            # Safely extract keys with defaults - resolve parsing error
            # maybe LLM did not output a particular key-value pair
            category = parsed.get('category', 'error')
            confidence = parsed.get('confidence', 0.0)
            parsed = {'category': category, 'confidence': confidence}
        except Exception as e:  # also covers json.JSONDecodeError and KeyError
            print(f"CHECKPOINT_2A: Exception Type: {type(e).__name__}")
            print(f"CHECKPOINT_2A: Exception Message: {str(e)}")
            
            # Gemini-specific errors
            if hasattr(e, 'code'):
                print(f"CHECKPOINT_2A: Status Code: {e.code}")
            if hasattr(e, 'details'):
                print(f"CHECKPOINT_2A: Details: {e.details}")
                
            parsed = {'category': 'error', 'confidence': 0.0}
        
        # Combine original row data with predictions
        results.append({
            "Index": row.Index,
            "text": row.text,
            "label": row.label,
            "dataset": row.dataset,
            "split": row.split,
            "predicted": parsed['category'],
            "confidence": parsed['confidence']
        })

        
        # Log progress
        if subset_row_count % log_every_n_examples == 0:
            elapsed_time = time.time() - start_time
            
            avg_time_per_row = elapsed_time / subset_row_count
            remaining_rows = total_rows - subset_row_count
            eta = avg_time_per_row * remaining_rows
            
            print(f"Processed original df idx {row.Index} (subset row {subset_row_count}) | "
                  f"Elapsed: {elapsed_time:.2f}s | ETA: {eta:.2f}s")
    
    return results  # Return list of dictionaries
    

print(f"Starting intent classification using {Config.model_name}")
subset_results = predict_intent(Config.model_name, 
                                df, 
                                bulletpts_intent, 
                                start_index = Config.start_index, 
                                end_index = Config.end_index,
                                log_every_n_examples = Config.log_every_n_examples)



# # previously for Ollama
# # update end_index for filename (if None is used for the end of the df)
# # Get the last index of the DataFrame
# last_index = df.index[-1] 
# # Use last index if Config.end_index is None
# end_index = Config.end_index if Config.end_index is not None else last_index
# 2025.07.07
# now for Ollama AND Gemini
# Gemini - needs to track 'end_index' for API JSON exports (when daily limits are exhausted)
# Ollama - reuse this code
end_index = max((r['Index'] for r in subset_results), default=Config.start_index)  # default covers an empty subset



# 2025.05.23 changed from JSON to PKL
# because we are saving list of dictionaries
# Save to PKL
# 2025.06.04 explore changing back to JSON
# with open(f'results_{Config.model_name}_{Config.dataset_name}_{Config.start_index}_{end_index}.pkl', 'wb') as f:
#     pickle.dump(subset_results, f)
with open(f'results_{Config.model_name}_{Config.dataset_name}_{Config.start_index}_{end_index}.json', 'w') as f:
    json.dump(subset_results, f, indent=2)

print("Completed intent classification")


#######################


Writing predict_class.py
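
The "safely extract keys with defaults" step inside `predict_intent` can be isolated as a small function, which makes the fallback behaviour easy to test without calling a model. The sketch below assumes the model reply arrives as a JSON string; `parse_prediction` is a hypothetical helper, not part of `predict_class.py`.

```python
import json


def parse_prediction(raw):
    """Turn a raw model reply into {'category', 'confidence'}, or an error record."""
    fallback = {"category": "error", "confidence": 0.0}
    try:
        parsed = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        # Not valid JSON, or not a string at all (e.g. None)
        return fallback
    if not isinstance(parsed, dict):
        # Valid JSON but not an object, e.g. a bare list or number
        return fallback
    return {"category": parsed.get("category", "error"),
            "confidence": parsed.get("confidence", 0.0)}


print(parse_prediction('{"category": "oos", "confidence": 0.7}'))
print(parse_prediction("not json"))
```

Keeping the 'error' category as the fallback means failed records stay in the results list and can be counted or re-run later.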


In [8]:
%%writefile /kaggle/working/main.py
import subprocess
import sys
from src.config import Config


# 1. Install libraries from requirements.txt
print("Installing dependencies...")
subprocess.run([sys.executable, "-m", "pip", "install", "-r", "/kaggle/working/requirements.txt"], check=True)


# # 2. Run setup_ollama.py
# if 'gemini' not in Config.model_name:
#     print("Starting Ollama setup...")
#     # subprocess.run(["python3", "/kaggle/working/src/setup_ollama.py"], check=True)
#     print("Starting Ollama setup...")
#     subprocess.run(
#         ["python3", "-m", "src.setup_ollama"],  # Run as a module
#         cwd="/kaggle/working",  # Set working directory to parent of 'src'
#         check=True
#     )
    

# 3. Run download_dataset.py
print("Downloading dataset...")
subprocess.run(["python3", "/kaggle/working/download_dataset.py"], check=True)

# 4. Run predict_class.py
print("Running prediction script...")
subprocess.run(["python3", "/kaggle/working/predict_class.py"], check=True)

Writing /kaggle/working/main.py
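
`main.py` runs its steps in order and relies on `check=True` to stop at the first failing script. The sketch below shows that pattern in isolation, using stand-in commands instead of the real scripts; `run_steps` is a hypothetical helper.

```python
import subprocess
import sys


def run_steps(steps):
    """Run each command in order; check=True raises on the first non-zero exit code."""
    for name, cmd in steps:
        print(f"{name}...")
        subprocess.run(cmd, check=True)


# Stand-in commands for illustration; main.py runs the real scripts instead
steps = [
    ("Step 1", [sys.executable, "-c", "print('ok')"]),
    ("Step 2", [sys.executable, "-c", "import sys; sys.exit(0)"]),
]
run_steps(steps)
```

The sketch launches every step with `sys.executable`, which keeps all steps on the interpreter that `pip` installed the dependencies into. That is a design choice of this example, not something the notebook requires.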


# Model on subset of examples

In [None]:
!python3 /kaggle/working/main.py
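
Each run writes a `results_{model}_{dataset}_{start}_{end}.json` file that holds a list of per-record dictionaries, so the outputs of several runs can be merged by concatenating those lists. The sketch below assumes all result files were downloaded into one folder; `load_all_results` is a hypothetical helper.

```python
import glob
import json
import os


def load_all_results(folder, pattern="results_*.json"):
    """Concatenate the per-record dictionaries from every matching results file."""
    records = []
    # Files are read in filename order, which is not necessarily numeric index order
    for path in sorted(glob.glob(os.path.join(folder, pattern))):
        with open(path) as f:
            records.extend(json.load(f))
    return records


# Usage (assumption: files sit in the current working directory):
# all_records = load_all_results(".")
# all_records.sort(key=lambda r: r["Index"])
```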

# Sanity check folders

In [None]:
!cd /kaggle/working/ && ls -la

In [None]:
!cd /kaggle/working/src && ls -la

In [None]:
!cd /kaggle/working/data/data && ls -la

# idx2label_oos examples

In [None]:
pip install huggingface-hub

In [None]:
from huggingface_hub import snapshot_download
snapshot_download(repo_id="KaiquanMah/open-intent-query-classification", repo_type="space", allow_patterns="*_idx2label.csv", local_dir='/kaggle/working/idx2label')

In [None]:
import pandas as pd
idx2label = pd.read_csv('/kaggle/working/idx2label/dataset_idx2label/banking77_idx2label.csv')
idx2label

In [None]:
idx2label_oos = idx2label[idx2label.index.isin([31,32,33,36])]
idx2label_oos

In [None]:
print(idx2label_oos)

In [None]:
idx2label_oos.shape

In [None]:
# percentage of OOS classes over ALL classes in the dataset
len(idx2label_oos)/len(idx2label)
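
The same selection, together with the count check that `predict_class.py` prints as its sanity check, can be reproduced without pandas. The label table below is a hypothetical stand-in for an idx2label CSV, and the example assumes the original labels contain no 'oos' class.

```python
# Hypothetical label table standing in for an idx2label CSV (row index -> label)
idx2label = ["card_arrival", "lost_card", "top_up", "exchange_rate",
             "pin_blocked", "refund", "transfer", "atm_support"]
list_oos_idx = [1, 3]  # row indexes whose labels become 'oos'

# Select the labels at the chosen row indexes and compute their share of all classes
oos_labels = [label for idx, label in enumerate(idx2label) if idx in list_oos_idx]
oos_share = len(oos_labels) / len(idx2label)

# Relabel and de-duplicate, then check the count:
# original classes + 1 new 'oos' class (none existed before) - converted classes
after_conversion = sorted({"oos" if label in oos_labels else label for label in idx2label})
expected_count = len(idx2label) + 1 - len(oos_labels)

print(oos_labels)
print(oos_share)
print(len(after_conversion) == expected_count)
```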

# Test Batch

In [9]:
import subprocess
import sys
from src.config import Config


# 1. Install libraries from requirements.txt
print("Installing dependencies...")
subprocess.run([sys.executable, "-m", "pip", "install", "-r", "/kaggle/working/requirements.txt"], check=True)


# # 2. Run setup_ollama.py
# if 'gemini' not in Config.model_name:
#     print("Starting Ollama setup...")
#     # subprocess.run(["python3", "/kaggle/working/src/setup_ollama.py"], check=True)
#     print("Starting Ollama setup...")
#     subprocess.run(
#         ["python3", "-m", "src.setup_ollama"],  # Run as a module
#         cwd="/kaggle/working",  # Set working directory to parent of 'src'
#         check=True
#     )
    

# 3. Run download_dataset.py
print("Downloading dataset...")
subprocess.run(["python3", "/kaggle/working/download_dataset.py"], check=True)

Installing dependencies...
Collecting typing (from -r /kaggle/working/requirements.txt (line 3))
  Downloading typing-3.7.4.3.tar.gz (78 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78.6/78.6 kB 3.6 MB/s eta 0:00:00
  Preparing metadata (setup.py): started
  Preparing metadata (setup.py): finished with status 'done'
Building wheels for collected packages: typing
  Building wheel for typing (setup.py): started
  Building wheel for typing (setup.py): finished with status 'done'
  Created wheel for typing: filename=typing-3.7.4.3-py3-none-any.whl size=26304 sha256=e21696cbc9a250ede0c8cdc003e46f3145bd04cd19284c16907b29b03f24d1fb
  Stored in directory: /root/.cache/pip/wheels/9d/67/2f/53e3ef32ec48d11d7d60245255e2d71e908201d20c880c08ee
Successfully built typing
Installing collected packages: typing
Successfully installed typing-3.7.4.3
Downloading dataset...


Cloning into '/kaggle/working/data'...


CompletedProcess(args=['python3', '/kaggle/working/download_dataset.py'], returncode=0)

In [10]:
from src.config import Config
import pandas as pd
import os
# import ollama
import json
import pickle
import time
from pydantic import BaseModel
from typing import Literal
# from enum import Enum
from huggingface_hub import snapshot_download
    
###################
# Gemini API
###################
# from google import genai
# from google.genai.types import ThinkingConfig
# from google.api_core import retry
from openai import OpenAI
from tenacity import retry, stop_after_attempt, wait_fixed
from kaggle_secrets import UserSecretsClient


###################


# Config.target_dir
# Config.cloned_data_dir
# Config.dataset_name
# Config.model_name
# Config.start_index
# Config.end_index
# Config.log_every_n_examples


#######################
# load data
#######################
def load_data(data_dir):
    """Loads train, dev, and test datasets from a specified directory."""

    main_df = pd.DataFrame()
    for split in ['train', 'dev', 'test']:
        file_path = os.path.join(data_dir, f'{split}.tsv')
        if os.path.exists(file_path):
            try:
                df = pd.read_csv(file_path, sep='\t')
                df['dataset'] = os.path.basename(data_dir)
                df['split'] = split
                main_df = pd.concat([main_df, df], ignore_index=True)
            except pd.errors.ParserError as e:
                # Skip a split that cannot be parsed, but report it
                print(f"Error parsing {file_path}: {e}")
        else:
            print(f"Warning: {split}.tsv not found in {data_dir}")
    return main_df


def filter100examples_oos(dataset_name, df):
    # do not restrict the input to 'oos' questions only
    if not Config.filter_oos_qns_only:
        filtered_df = df
    # vs
    # input 'only oos qns to model'
    else:
        if dataset_name == 'banking':
            first_class = Config.first_class_banking
        elif dataset_name == 'stackoverflow':
            first_class = Config.first_class_stackoverflow
        else:
            first_class = Config.first_class_oos
    
        filtered_df = df.copy()
        filtered_df = filtered_df.loc[filtered_df["label"] == first_class]
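        # sample at most n_oos_qns rows, so a class with fewer rows does not raise a ValueError
        filtered_df = filtered_df.sample(n=min(Config.n_oos_qns, len(filtered_df)), random_state=38)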
    return filtered_df


df = pd.DataFrame()

data_dir = os.path.join(Config.cloned_data_dir, Config.dataset_name)
if os.path.exists(data_dir):
  df = load_data(data_dir)
  print(f"Loaded dataset into dataframe: {Config.dataset_name}")
  print(f"Dimensions: {df.shape}")
  print(f"Col names: {df.columns}")
else:
  print(f"Warning: Directory {data_dir} not found.")
#######################



#######################
# unique intents
#######################
sorted_intent = list(sorted(df.label.unique()))
print("="*80)
print(f"Original dataset intents: {sorted_intent}")
print(f"Number of original intents: {len(sorted_intent)}\n")


# 2025.06.03
# New OOS approach - get 25/50/75% of class indexes for each dataset within the team (for reproducibility and comparable results)
# Change their class labels to 'oos'
snapshot_download(repo_id="KaiquanMah/open-intent-query-classification", repo_type="space", allow_patterns="*_idx2label.csv", local_dir=Config.idx2label_target_dir)
idx2label_filepath = Config.idx2label_target_dir + '/dataset_idx2label/' + Config.idx2label_filename_hf
idx2label = pd.read_csv(idx2label_filepath)
idx2label_oos = idx2label[idx2label.index.isin(Config.list_oos_idx)]
idx2label_oos.reset_index(drop=True, inplace=True)

# 2025.06.17 keep track of non-oos labels, to use in IntentSchema
# (list_oos_idx holds row indexes, so filter on the index rather than on the label column)
nonoos_labels = idx2label[~idx2label.index.isin(Config.list_oos_idx)]['label'].values
print("="*80)
print("Original intents to convert to OOS class")
print(idx2label_oos)
print(f"Percentage of original intents to convert to OOS class: {len(idx2label_oos)/len(idx2label)}\n")

oos_labels = idx2label_oos['label'].values
list_sorted_intent_aft_conversion = ['oos' if intent.lower() in oos_labels else intent for intent in sorted_intent]
list_sorted_intent_aft_conversion_deduped = sorted(set(list_sorted_intent_aft_conversion))
print("="*80)
print("Unique intents after converting some to OOS class")
print(list_sorted_intent_aft_conversion_deduped)
print(f"Number of unique intents after converting some to OOS class: {len(list_sorted_intent_aft_conversion_deduped)}\n")



# unique intents - from set to bullet points (to use in prompts)
# bulletpts_intent = "\n".join(f"- {category}" for category in set_intent)
# 2025.06.03: do not show 'oos' in the prompt (to avoid leakage of 'oos' class)
bulletpts_intent = "\n".join(f"- {category}" for category in list_sorted_intent_aft_conversion_deduped if category and (category!='oos'))

# 2025.06.04: add 1 for the new 'oos' class only if 'oos' is not already in the original dataset
int_oos_in_orig_dataset = int('oos' in idx2label.label.values)
adjust_if_oos_not_in_orig_dataset = 1 - int_oos_in_orig_dataset

print("="*80)
print("sanity check")
print(f"Number of original intents: {len(sorted_intent)}")
print(f"Number of original intents + 1 OOS class (if doesnt exist in original dataset): {len(sorted_intent) + adjust_if_oos_not_in_orig_dataset}")
print(f"Number of original intents to convert to OOS class: {len(idx2label_oos)}")
print(f"Percentage of original intents to convert to OOS class: {len(idx2label_oos)/len(idx2label)}")
print(f"Number of unique intents after converting some to OOS class: {len(list_sorted_intent_aft_conversion_deduped)}")
print(f"Number of original intents + 1 OOS class (if doesnt exist in original dataset) - converted classes: {len(sorted_intent) + adjust_if_oos_not_in_orig_dataset - len(idx2label_oos)}")
print(f"Numbers match: {(len(sorted_intent) + adjust_if_oos_not_in_orig_dataset - len(idx2label_oos)) == len(list_sorted_intent_aft_conversion_deduped)}")
print("Prepared unique intents")
#######################




#######################
# Enforce schema on the model (e.g. allowed list of predicted categories)
#######################

class IntentSchema(BaseModel):
    # dynamically unpack list of categories for different dataset(s)
    category: Literal[*list_sorted_intent_aft_conversion_deduped]
    confidence: float
    
#######################




#######################
# filter after preparing intents
#######################
df = filter100examples_oos(Config.dataset_name, df)
print("Filtered dataset")
print(f"Dimensions: {df.shape}")
print(f"Col names: {df.columns}")
#######################



#######################
# Prompt
#######################
# prompt 2 with less information/compute, improve efficiency
# 2025.06.10 prompt 3 with 5 few shot examples only - notebook 01H1, 01I1
# 2025.06.16 prompt 4 with 5 examples per each known intent (ie non-oos intent) - notebook 01J1
snapshot_download(repo_id="KaiquanMah/open-intent-query-classification", repo_type="space", allow_patterns="*.txt", local_dir=Config.fewshot_examples_dir)
with open(Config.fewshot_examples_dir + Config.fewshot_subdir + Config.fewshot_examples_filename, 'r') as file:
    fewshot_examples = file.read()

def get_prompt(dataset_name, split, question, categories, fewshot_examples):
    
    prompt = f'''
You are an expert in understanding and identifying what users are asking you.

Your task is to analyze an input query from a user and assign the most appropriate category from the following list:
{categories}

Only classify as "oos" (out of scope category) if none of the other categories apply.

Below are several examples to guide your classification:

---
{fewshot_examples}
---

===============================

New Question: {question}

===============================

Provide your final classification in **valid JSON format** with the following structure:
{{
  "category": "your_chosen_category_name",
  "confidence": confidence_level_rounded_to_the_nearest_2_decimal_places
}}


Ensure the JSON has:
- Opening and closing curly braces
- Double quotes around keys and string values
- Confidence as a number (not a string), with maximum 2 decimal places

Do not include any explanations or extra text.
            '''
    return prompt



#######################



Loaded dataset into dataframe: stackoverflow
Dimensions: (20000, 4)
Col names: Index(['text', 'label', 'dataset', 'split'], dtype='object')
Original dataset intents: ['ajax', 'apache', 'bash', 'cocoa', 'drupal', 'excel', 'haskell', 'hibernate', 'linq', 'magento', 'matlab', 'oracle', 'osx', 'qt', 'scala', 'sharepoint', 'spring', 'svn', 'visual-studio', 'wordpress']
Number of original intents: 20



Original intents to convert to OOS class
   index      label
0      1  wordpress
1      4     apache
2     11     spring
3     13      scala
4     15       ajax
Percentage of original intents to convert to OOS class: 0.25

Unique intents after converting some to OOS class
['bash', 'cocoa', 'drupal', 'excel', 'haskell', 'hibernate', 'linq', 'magento', 'matlab', 'oos', 'oracle', 'osx', 'qt', 'sharepoint', 'svn', 'visual-studio']
Number of unique intents after converting some to OOS class: 16

sanity check
Number of original intents: 20
Number of original intents + 1 OOS class (if doesnt exist in original dataset): 21
Number of original intents to convert to OOS class: 5
Percentage of original intents to convert to OOS class: 0.25
Number of unique intents after converting some to OOS class: 16
Number of original intents + 1 OOS class (if doesnt exist in original dataset) - converted classes: 16
Numbers match: True
Prepared unique intents
Filtered dataset
Dimensions: (20000, 4)
Col names: 
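The intent preparation above relabels a fixed set of classes as 'oos', hides 'oos' from the prompt's bullet list, and keeps it in the schema's allowed values. The sketch below replays that logic with the standard library only, using the stackoverflow labels printed above. The variable names are illustrative, and `Literal[tuple(...)]` is shown as a possible alternative to star-unpacking that does not depend on Python 3.11 syntax; it is not the notebook's `IntentSchema`.

```python
from typing import Literal, get_args

# Labels copied from the stackoverflow output above.
original_intents = ['ajax', 'apache', 'bash', 'cocoa', 'drupal', 'excel', 'haskell',
                    'hibernate', 'linq', 'magento', 'matlab', 'oracle', 'osx', 'qt',
                    'scala', 'sharepoint', 'spring', 'svn', 'visual-studio', 'wordpress']
oos_labels = {'wordpress', 'apache', 'spring', 'scala', 'ajax'}

# Relabel the selected classes as 'oos', then dedupe and sort.
relabelled = sorted({'oos' if intent.lower() in oos_labels else intent
                     for intent in original_intents})

# Prompt list: known intents only, 'oos' is deliberately left out.
prompt_bullets = "\n".join(f"- {c}" for c in relabelled if c != 'oos')

# Allowed output values: 'oos' stays in, so the model may still predict it.
# Subscripting Literal with a tuple is equivalent to Literal['bash', 'cocoa', ...].
AllowedCategory = Literal[tuple(relabelled)]

# Same arithmetic as the sanity check: 20 original + 1 new 'oos' - 5 converted = 16.
oos_already_present = int('oos' in original_intents)
expected = len(original_intents) + (1 - oos_already_present) - len(oos_labels)
print(len(relabelled), expected)
print('oos' in get_args(AllowedCategory), '- oos' in prompt_bullets)
```

Keeping the two lists separate is what lets the prompt stay silent about 'oos' while the constrained output can still contain it.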


## [New Task] Create Batch JSONL file of requests to Nebius' Qwen model
* https://docs.nebius.com/studio/inference/batch

In [11]:
# Nebius AI Studio, accessed through the OpenAI-compatible client
user_secrets = UserSecretsClient()
NEBIUS_API_KEY = user_secrets.get_secret("NEBIUS_API_KEY")
client = OpenAI(base_url="https://api.studio.nebius.com/v1/",
                api_key = NEBIUS_API_KEY)

In [12]:
model_name = Config.model_name
df = df
categories = bulletpts_intent
start_index = Config.start_index
end_index = Config.end_index
log_every_n_examples = Config.log_every_n_examples



start_time = time.time()
results = []  # Store processed results

# Slice DataFrame based on start/end indices
if end_index is None:
    subset_df = df.iloc[start_index:]
else:
    subset_df = df.iloc[start_index:end_index+1]

total_rows = len(subset_df)
subset_row_count = 0





for row in subset_df.itertuples():
    subset_row_count+=1
    prompt = get_prompt(row.dataset, row.split, row.text, categories, fewshot_examples)
    if subset_row_count == 1:
        print("Example of how prompt looks, for the 1st example in this subset of data")
        # print(prompt)
        break

Example of how prompt looks, for the 1st example in this subset of data


In [17]:
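def create_batch_file(df, categories):
    """Write one chat-completion request per DataFrame row to a JSONL file and return the file path."""
    requests = []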
    for row in df.itertuples():
        # # round 1 - test 10 records
        # if row.Index < 10:
        # round 2 - FULL RUN
        prompt = get_prompt(row.dataset, row.split, row.text, categories, fewshot_examples)
        requests.append({
            "custom_id": row.Index,
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": f"Qwen/{Config.model_name}",
                "messages": [{"role": "user", "content": prompt}],
                "seed": 38,
                "temperature": 0,
                "extra_body": {"guided_json": IntentSchema.model_json_schema()}
            }
        })
    
    # Save to JSONL file
    batch_file_path = f"/kaggle/working/batch_prompts_{Config.dataset_name}_{Config.start_index}_{Config.end_index}.jsonl"
    with open(batch_file_path, 'w') as f:
        for req in requests:
            f.write(json.dumps(req) + '\n')
    
    return batch_file_path

In [13]:
start_index

0

In [14]:
len(subset_df)

20000

In [15]:
len(df)

20000

In [18]:
batch_file_path = create_batch_file(df, categories)
batch_file_path

'/kaggle/working/batch_prompts_stackoverflow_0_None.jsonl'

In [20]:
import json
list_prompts = []

with open(f'/kaggle/working/batch_prompts_{Config.dataset_name}_{Config.start_index}_{Config.end_index}.jsonl', 'r') as f:
    list_prompts = [json.loads(line) for line in f if line.strip()]

list_prompts[0]

{'custom_id': 0,
 'method': 'POST',
 'url': '/v1/chat/completions',
 'body': {'model': 'Qwen/Qwen3-30B-A3B',
  'messages': [{'role': 'user',
  'seed': 38,
  'temperature': 0,
  'extra_body': {'guided_json': {'properties': {'category': {'enum': ['bash',
       'cocoa',
       'drupal',
       'excel',
       'haskell',
       'hibernate',
       'linq',
       'magento',
       'matlab',
       'oos',
       'oracle',
       'osx',
       'qt',
       'sharepoint',
       'svn',
       'visual-studio'],
      'title': 'Category',
      'type': 'string'},
     'confidence': {'title': 'Confidence', 'type': 'number'}},
    'required': ['category', 'confidence'],
    'title': 'IntentSchema',
    'type': 'object'}}}}

In [21]:
len(list_prompts)

20000

In [22]:
# sanity check min, max custom_id / row.Index for batch requests
list_custom_id = []
for req in list_prompts:
    # print(req['custom_id'])  # Example: print custom_id of each request
    list_custom_id.append(req['custom_id'])
print(f"custom_id. min: {min(list_custom_id)}, max: {max(list_custom_id)}")

custom_id. min: 0, max: 19999
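Beyond checking the minimum and maximum `custom_id`, it may help to validate every line before uploading a file of this size. The sketch below is a hypothetical helper (`validate_batch_lines` is not part of the notebook or of any client library). It assumes each request carries the four top-level keys used in `create_batch_file` and that `custom_id` values should be unique.

```python
import json

REQUIRED_KEYS = {"custom_id", "method", "url", "body"}

def validate_batch_lines(lines):
    """Return a list of human-readable problems found in JSONL request lines."""
    problems = []
    seen_ids = set()
    for line_no, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # blank lines are skipped, as in the notebook's reader
        try:
            req = json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append(f"line {line_no}: invalid JSON ({exc.msg})")
            continue
        missing = REQUIRED_KEYS - req.keys()
        if missing:
            problems.append(f"line {line_no}: missing keys {sorted(missing)}")
            continue
        key = str(req["custom_id"])
        if key in seen_ids:
            problems.append(f"line {line_no}: duplicate custom_id {key}")
        seen_ids.add(key)
        if not req["body"].get("messages"):
            problems.append(f"line {line_no}: body has no messages")
    return problems

# Hypothetical sample lines: one valid, one duplicate id, one broken.
good = {"custom_id": 0, "method": "POST", "url": "/v1/chat/completions",
        "body": {"model": "demo", "messages": [{"role": "user", "content": "hi"}]}}
sample_lines = [json.dumps(good), json.dumps(good), '{"custom_id": 2, ']
sample_problems = validate_batch_lines(sample_lines)
print(sample_problems)
```

In the notebook, the same function could be called on the lines of `batch_file_path` before the upload step.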


## Batch API call

In [23]:
# upload JSONL file of API requests
# open the file in a context manager so the handle is closed after the upload
with open(batch_file_path, "rb") as f:
    batch_requests = client.files.create(
        file=f,
        purpose="batch"
    )

In [24]:
batch_requests

FileObject(id='file-e552ddaf-08ad-4e69-89ab-ce8ef2427350', bytes=275300714, created_at=1751950990, filename='batch_prompts_stackoverflow_0_None.jsonl', object='file', purpose='batch', status=None, expires_at=None, status_details=None)

In [25]:
batch_requests.__dict__

{'id': 'file-e552ddaf-08ad-4e69-89ab-ce8ef2427350',
 'bytes': 275300714,
 'created_at': 1751950990,
 'filename': 'batch_prompts_stackoverflow_0_None.jsonl',
 'object': 'file',
 'purpose': 'batch',
 'status': None,
 'expires_at': None,
 'status_details': None,
 '_request_id': None}

In [26]:
# submit batch requests to Nebius Qwen model
client.batches.create(
    input_file_id=batch_requests.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
    metadata={
        # note: this description says Banking77, but the uploaded file holds the stackoverflow prompts
        "description": "Banking77 - Qwen - BatchFull"
    }
)


Batch(id='batch_a7db8d91-9092-45ab-aada-c8027ea20d68', completion_window='24h', created_at=1751951000, endpoint='/v1/chat/completions', input_file_id='file-e552ddaf-08ad-4e69-89ab-ce8ef2427350', object='batch', status='validating', cancelled_at=None, cancelling_at=None, completed_at=None, error_file_id=None, errors=None, expired_at=None, expires_at=None, failed_at=None, finalizing_at=None, in_progress_at=None, metadata={'description': 'Banking77 - Qwen - BatchFull'}, output_file_id=None, request_counts=BatchRequestCounts(completed=None, failed=None, total=None))

In [32]:
# follow up on batch request status
batch_id = 'batch_a7db8d91-9092-45ab-aada-c8027ea20d68'
completed_batch = client.batches.retrieve(batch_id)
completed_batch

Batch(id='batch_a7db8d91-9092-45ab-aada-c8027ea20d68', completion_window='24h', created_at=1751951000, endpoint='/v1/chat/completions', input_file_id='file-e552ddaf-08ad-4e69-89ab-ce8ef2427350', object='batch', status='done', cancelled_at=None, cancelling_at=None, completed_at=1751951344, error_file_id=None, errors=None, expired_at=None, expires_at=None, failed_at=None, finalizing_at=1751951339, in_progress_at=1751951018, metadata={'description': 'Banking77 - Qwen - BatchFull'}, output_file_id='a4e83330-f9be-4b6f-9548-665b12d7a33f', request_counts=BatchRequestCounts(completed=20000, failed=0, total=20000))

In [33]:
completed_batch.__dict__

{'id': 'batch_a7db8d91-9092-45ab-aada-c8027ea20d68',
 'completion_window': '24h',
 'created_at': 1751951000,
 'endpoint': '/v1/chat/completions',
 'input_file_id': 'file-e552ddaf-08ad-4e69-89ab-ce8ef2427350',
 'object': 'batch',
 'status': 'done',
 'cancelled_at': None,
 'cancelling_at': None,
 'completed_at': 1751951344,
 'error_file_id': None,
 'errors': None,
 'expired_at': None,
 'expires_at': None,
 'failed_at': None,
 'finalizing_at': 1751951339,
 'in_progress_at': 1751951018,
 'metadata': {'description': 'Banking77 - Qwen - BatchFull'},
 'output_file_id': 'a4e83330-f9be-4b6f-9548-665b12d7a33f',
 'request_counts': BatchRequestCounts(completed=20000, failed=0, total=20000),
 '_request_id': None}
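The status above was checked by re-running the retrieve cell by hand. A small polling loop could do the same thing automatically. The sketch below is only an illustration: `wait_for_batch` is a hypothetical helper, and the only statuses observed in this notebook are 'validating' and 'done', so the other status names in the code are assumptions. A stub stands in for the API so the example runs without network access.

```python
import time
from types import SimpleNamespace

# 'done' is the final status seen in this notebook; the other names are assumptions.
TERMINAL_STATUSES = {"done", "failed", "cancelled", "expired"}

def wait_for_batch(retrieve, batch_id, poll_seconds=30.0, max_polls=120, sleep=time.sleep):
    """Call retrieve(batch_id) until the batch reaches a terminal status."""
    for _ in range(max_polls):
        batch = retrieve(batch_id)
        if batch.status in TERMINAL_STATUSES:
            return batch
        sleep(poll_seconds)
    raise TimeoutError(f"batch {batch_id} not finished after {max_polls} polls")

# Stub that mimics a batch moving through its states ('in_progress' is assumed).
_statuses = iter(["validating", "in_progress", "done"])

def fake_retrieve(batch_id):
    return SimpleNamespace(id=batch_id, status=next(_statuses),
                           output_file_id="stub-output-file")

finished = wait_for_batch(fake_retrieve, "batch_stub", poll_seconds=0,
                          sleep=lambda seconds: None)
print(finished.status, finished.output_file_id)
```

With the real client, the call would presumably look like `wait_for_batch(client.batches.retrieve, batch_id)`.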

In [34]:
# Retrieve the results with the batch's 'output_file_id'.
# Do not use 'batch_requests.id': that is the uploaded input file of requests.
# batch_result = client.files.content(batch_requests.id)

batch_result = client.files.content(completed_batch.output_file_id)
# print 1st 1k characters
print(batch_result.text[:1000])


{"id": "batch_req_f2e54888-46f1-4b56-895c-172a0aa092d0", "custom_id": "2425", "response": {"id": "chatcmpl-739d19f798b34947a446c81a15ca3c87", "choices": [{"finish_reason": "stop", "index": 0, "logprobs": null, "message": {"content": "{\"category\":\"oos\",\"confidence\":99.99}", "refusal": null, "role": "assistant", "audio": null, "function_call": null, "tool_calls": [], "reasoning_content": null}, "stop_reason": null}], "created": 1751951057, "model": "Qwen/Qwen3-30B-A3B", "object": "chat.completion", "service_tier": null, "system_fingerprint": null, "usage": {"completion_tokens": 15, "prompt_tokens": 3561, "total_tokens": 3576, "completion_tokens_details": null, "prompt_tokens_details": null}, "prompt_logprobs": null}, "error": null}
{"id": "batch_req_b4e8c7f8-5bca-4fea-98bd-06b0f859aef9", "custom_id": "2744", "response": {"id": "chatcmpl-f909003b61384a2fb422046ba7a3eab7", "choices": [{"finish_reason": "stop", "index": 0, "logprobs": null, "message": {"content": "{\"category\": \"qt\

In [35]:
type(batch_result.text)

str

In [36]:
import json

# from 1 string of ALL dictionary lines -> to list of 1 string per dictionary line
lines = [line.strip() for line in batch_result.text.splitlines() if line.strip()]
# from list of string 'dictionary lines' -> to list of dictionary objects
json_objects = [json.loads(line) for line in lines]
# sort LLM 'response' dictionaries based on row.Index (ie custom_id)
sorted_objects = sorted(json_objects, 
                        key = lambda x: int(x["custom_id"]))

# 1st 5 dictionaries
sorted_objects[:5]

[{'id': 'batch_req_57a0e922-7214-49e6-9408-bac7bc92a2df',
  'custom_id': '0',
  'response': {'id': 'chatcmpl-51a5a2554f944c9492c534f02943810b',
   'choices': [{'finish_reason': 'stop',
     'index': 0,
     'logprobs': None,
     'message': {'content': '{"category":"oos","confidence":99.99}',
      'refusal': None,
      'role': 'assistant',
      'audio': None,
      'function_call': None,
      'tool_calls': [],
      'reasoning_content': None},
     'stop_reason': None}],
   'created': 1751951019,
   'model': 'Qwen/Qwen3-30B-A3B',
   'object': 'chat.completion',
   'service_tier': None,
   'system_fingerprint': None,
   'usage': {'completion_tokens': 15,
    'prompt_tokens': 3557,
    'total_tokens': 3572,
    'completion_tokens_details': None,
    'prompt_tokens_details': None},
   'prompt_logprobs': None},
  'error': None},
 {'id': 'batch_req_33fa54e8-4059-4d85-8092-fffa0f99defe',
  'custom_id': '1',
  'response': {'id': 'chatcmpl-ddaea598332b498386a41bb033a8f727',
   'choices': [
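The sort key converts `custom_id` with `int(...)` because the batch output returns the ids as strings, and strings sort character by character. A minimal illustration with made-up ids:

```python
ids = ["2425", "2744", "0", "10", "9"]

# Lexicographic order: '10' comes before '9' because '1' < '9'.
lexicographic = sorted(ids)

# Numeric order: convert each id to int for the comparison only.
numeric = sorted(ids, key=int)

print(lexicographic)
print(numeric)
```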

In [37]:
# Save to JSONL file
with open(f"batch_outputs_{Config.dataset_name}_{Config.start_index}_{Config.end_index}.jsonl", "w") as f:
    for obj in sorted_objects:
        f.write(json.dumps(obj) + "\n")  # Write each JSON object as a single line

In [38]:
import json
batchfull = []

# Load as a list of dictionaries
with open(f"batch_outputs_{Config.dataset_name}_{Config.start_index}_{Config.end_index}.jsonl", "r") as f:
    # batchof10 = [json.loads(line) for line in f if line.strip()]
    batchfull = [json.loads(line) for line in f]

In [39]:
batchfull[:5]

[{'id': 'batch_req_57a0e922-7214-49e6-9408-bac7bc92a2df',
  'custom_id': '0',
  'response': {'id': 'chatcmpl-51a5a2554f944c9492c534f02943810b',
   'choices': [{'finish_reason': 'stop',
     'index': 0,
     'logprobs': None,
     'message': {'content': '{"category":"oos","confidence":99.99}',
      'refusal': None,
      'role': 'assistant',
      'audio': None,
      'function_call': None,
      'tool_calls': [],
      'reasoning_content': None},
     'stop_reason': None}],
   'created': 1751951019,
   'model': 'Qwen/Qwen3-30B-A3B',
   'object': 'chat.completion',
   'service_tier': None,
   'system_fingerprint': None,
   'usage': {'completion_tokens': 15,
    'prompt_tokens': 3557,
    'total_tokens': 3572,
    'completion_tokens_details': None,
    'prompt_tokens_details': None},
   'prompt_logprobs': None},
  'error': None},
 {'id': 'batch_req_33fa54e8-4059-4d85-8092-fffa0f99defe',
  'custom_id': '1',
  'response': {'id': 'chatcmpl-ddaea598332b498386a41bb033a8f727',
   'choices': [

In [40]:
type(batchfull[0])

dict

In [41]:
dict0 = batchfull[0]
dict0

{'id': 'batch_req_57a0e922-7214-49e6-9408-bac7bc92a2df',
 'custom_id': '0',
 'response': {'id': 'chatcmpl-51a5a2554f944c9492c534f02943810b',
  'choices': [{'finish_reason': 'stop',
    'index': 0,
    'logprobs': None,
    'message': {'content': '{"category":"oos","confidence":99.99}',
     'refusal': None,
     'role': 'assistant',
     'audio': None,
     'function_call': None,
     'tool_calls': [],
     'reasoning_content': None},
    'stop_reason': None}],
  'created': 1751951019,
  'model': 'Qwen/Qwen3-30B-A3B',
  'object': 'chat.completion',
  'service_tier': None,
  'system_fingerprint': None,
  'usage': {'completion_tokens': 15,
   'prompt_tokens': 3557,
   'total_tokens': 3572,
   'completion_tokens_details': None,
   'prompt_tokens_details': None},
  'prompt_logprobs': None},
 'error': None}

In [42]:
dict0response = json.loads(dict0["response"]["choices"][0]["message"]["content"])
dict0response

{'category': 'oos', 'confidence': 99.99}
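The same extraction can be applied to every line of the batch output. The sketch below assumes the response layout shown above and falls back to an 'error' category when a line has an error entry or unparsable content, mirroring the defaults used in `predict_intent`. The helper name `extract_prediction` and the sample lines are hypothetical.

```python
import json

def extract_prediction(batch_line):
    """Turn one batch output object into a flat prediction record."""
    record = {"custom_id": int(batch_line["custom_id"]),
              "predicted": "error", "confidence": 0.0}
    if batch_line.get("error") is not None:
        return record
    try:
        content = batch_line["response"]["choices"][0]["message"]["content"]
        parsed = json.loads(content)
    except (KeyError, IndexError, TypeError, json.JSONDecodeError):
        return record
    if isinstance(parsed, dict):
        # Missing keys keep the fallback values.
        record["predicted"] = parsed.get("category", "error")
        record["confidence"] = parsed.get("confidence", 0.0)
    return record

def _line(custom_id, content, error=None):
    # Build a minimal object with the same nesting as the batch output.
    return {"custom_id": custom_id, "error": error,
            "response": {"choices": [{"message": {"content": content}}]}}

sample_output = [
    _line("0", '{"category":"oos","confidence":99.99}'),
    _line("1", '{"category": "qt'),                      # truncated JSON
    _line("2", '{"category":"svn","confidence":0.9}', error={"message": "failed"}),
]
predictions = [extract_prediction(line) for line in sample_output]
print(predictions)
```

The resulting records can then be joined back to the original DataFrame rows on `custom_id`, which holds `row.Index`.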

## Individual API Call Workings

In [None]:

#######################
# Model on 1 Dataset
#######################
# Save a list of dictionaries 
# containing a dictionary for each record's
# - predicted category
# - confidence level and
# - original dataframe values


# Nebius AI Studio, accessed through the OpenAI-compatible client
user_secrets = UserSecretsClient()
NEBIUS_API_KEY = user_secrets.get_secret("NEBIUS_API_KEY")
client = OpenAI(base_url="https://api.studio.nebius.com/v1/",
                api_key = NEBIUS_API_KEY)

@retry(stop=stop_after_attempt(3), wait=wait_fixed(30))
def api_llm(client, prompt):
    try:
        print("CHECKPOINT_3A")
        # gemini_config = {"temperature": 0,
        #                  "response_mime_type": "application/json",
        #                  "response_schema": IntentSchema.model_json_schema(),
        #                  "seed": 38,
        #                  # # added for "gemini-2.5-flash-lite-preview-06-17" model
        #                  # "thinking_config": ThinkingConfig(thinking_budget=-1, 
        #                  #                    include_thoughts=True)
        #                 }
        response = client.beta.chat.completions.parse(model = 'Qwen/'+Config.model_name,
                                                      messages = [{"role": "user",
                                                                  "content": prompt}],
                                                      response_format = IntentSchema,
                                                      seed = 38,
                                                      temperature = 0
                                                      )
        # print(response)
        # msg = response.parsed
        response = response.choices[0].message.content
        print("CHECKPOINT_3B")
        return response
    except Exception as e:
        print(f"CHECKPOINT_4A: Exception Type: {type(e).__name__}")
        print(f"CHECKPOINT_4A: Exception Message: {str(e)}")
        
        # Provider-specific error attributes, if present
        if hasattr(e, 'code'):
            print(f"CHECKPOINT_4A: Status Code: {e.code}")
        if hasattr(e, 'details'):
            print(f"CHECKPOINT_4A: Details: {e.details}")
        
        # raise the exception again so retry can work
        raise
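The decorator on `api_llm` retries a failed call up to 3 times with a fixed 30-second wait, which is why the `except` block re-raises. The sketch below shows the same idea with the standard library only. It is an illustration of fixed-wait retrying, not the retry library's implementation, and `retry_fixed` is a hypothetical name.

```python
import functools
import time

def retry_fixed(attempts=3, wait_seconds=30.0, sleep=time.sleep):
    """Retry the wrapped function, waiting a fixed time between attempts."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception:
                    if attempt == attempts:
                        raise  # out of attempts: surface the last error
                    sleep(wait_seconds)
        return wrapper
    return decorator

calls = []

@retry_fixed(attempts=3, wait_seconds=0, sleep=lambda seconds: None)
def flaky():
    # Fails on the first two calls, succeeds on the third.
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("temporary failure")
    return "ok"

first_result = flaky()
print(first_result, len(calls))
```

If the wrapped function swallowed its exception instead of re-raising, the wrapper would see a normal return and never retry.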

    


In [None]:
from src.config import Config


In [None]:
Config.end_index

In [None]:
model_name = Config.model_name
df = df
categories = bulletpts_intent
start_index = Config.start_index
end_index = Config.end_index
log_every_n_examples = Config.log_every_n_examples



start_time = time.time()
results = []  # Store processed results

# Slice DataFrame based on start/end indices
if end_index is None:
    subset_df = df.iloc[start_index:]
else:
    subset_df = df.iloc[start_index:end_index+1]

total_rows = len(subset_df)
subset_row_count = 0





for row in subset_df.itertuples():
    subset_row_count+=1
    prompt = get_prompt(row.dataset, row.split, row.text, categories, fewshot_examples)
    if subset_row_count == 1:
        print("Example of how prompt looks, for the 1st example in this subset of data")
        # print(prompt)
        break

In [None]:
response = api_llm(client, prompt)

In [None]:
response = client.beta.chat.completions.parse(model = 'Qwen/'+Config.model_name,
                                              messages = [{"role": "user",
                                                          "content": prompt}],
                                              response_format = IntentSchema,
                                              seed = 38,
                                              temperature = 0
                                              )
response

In [None]:
response2 = client.chat.completions.create(model = 'Qwen/'+Config.model_name,
                                              messages = [{"role": "user",
                                                          "content": prompt}],
                                              seed = 38,
                                              temperature = 0,
                                              extra_body = {"guided_json": IntentSchema.model_json_schema()}
                                              )

In [None]:
# response2
# ChatCompletion(id='chatcmpl-f6f492a52fc447d198ee34318e5c1801', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='{"category":"oos","confidence":0.00}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=[], reasoning_content=None), stop_reason=None)], created=1751899983, model='Qwen/Qwen3-30B-A3B', object='chat.completion', service_tier=None, system_fingerprint=None, usage=CompletionUsage(completion_tokens=14, prompt_tokens=14012, total_tokens=14026, completion_tokens_details=None, prompt_tokens_details=None), prompt_logprobs=None)

# response2.choices[0].message.content
# '{"category":"oos","confidence":0.00}'

# json.loads(response2.choices[0].message.content)
# {'category': 'oos', 'confidence': 0.0}


In [None]:
IntentSchema

In [None]:
IntentSchema.model_json_schema()

In [None]:
# response.__dict__
"""
{'id': 'chatcmpl-20151be35483463a9136d12957c6e0ec',
 'choices': [ParsedChoice[IntentSchema](finish_reason='stop', index=0, logprobs=None, message=ParsedChatCompletionMessage[IntentSchema](content='{"category":"oos","confidence":0.00}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None, parsed=IntentSchema(category='oos', confidence=0.0), reasoning_content=None), stop_reason=None)],
 'created': 1751897335,
 'model': 'Qwen/Qwen3-30B-A3B',
 'object': 'chat.completion',
 'service_tier': None,
 'system_fingerprint': None,
 'usage': CompletionUsage(completion_tokens=14, prompt_tokens=14012, total_tokens=14026, completion_tokens_details=None, prompt_tokens_details=None),
 '_request_id': None}
"""

In [None]:
# response.choices
# [ParsedChoice[IntentSchema](finish_reason='stop', index=0, logprobs=None, message=ParsedChatCompletionMessage[IntentSchema](content='{"category":"oos","confidence":0.00}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None, parsed=IntentSchema(category='oos', confidence=0.0), reasoning_content=None), stop_reason=None)]

# response.choices[0]
# ParsedChoice[IntentSchema](finish_reason='stop', index=0, logprobs=None, message=ParsedChatCompletionMessage[IntentSchema](content='{"category":"oos","confidence":0.00}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None, parsed=IntentSchema(category='oos', confidence=0.0), reasoning_content=None), stop_reason=None)

# response.choices[0].__dict__
# """
# {'finish_reason': 'stop',
#  'index': 0,
#  'logprobs': None,
#  'message': ParsedChatCompletionMessage[IntentSchema](content='{"category":"oos","confidence":0.00}', refusal=None, role='assistant', annotations=None, audio=None, function_call=None, tool_calls=None, parsed=IntentSchema(category='oos', confidence=0.0), reasoning_content=None)}
# """

# response.choices[0].message.content
# '{"category":"oos","confidence":0.00}'

# type(response.choices[0].message.content)
# str


In [None]:

def predict_intent(model_name, df, categories, start_index=0, end_index=None, log_every_n_examples=100):
    start_time = time.time()
    results = []  # Store processed results
    
    # Slice DataFrame based on start/end indices
    if end_index is None:
        subset_df = df.iloc[start_index:]
    else:
        subset_df = df.iloc[start_index:end_index+1]
    
    total_rows = len(subset_df)
    subset_row_count = 0

    

    
    
    for row in subset_df.itertuples():
        subset_row_count+=1
        prompt = get_prompt(row.dataset, row.split, row.text, categories, fewshot_examples)
        if subset_row_count == 1:
            print("Example of how prompt looks, for the 1st example in this subset of data")
            print(prompt)

            print("Example of how IntentSchema looks")
            print(IntentSchema.model_json_schema())
        
        
        try:
            print("CHECKPOINT_1A")
            
            # response = ollama.chat(model=model_name, 
            #                        messages=[
            #                                     {'role': 'user', 'content': prompt}
            #                                 ],
            #                        format = IntentSchema.model_json_schema(),
            #                        options = {'temperature': 0},  # Set temperature to 0 for a more deterministic output
            #                       )
            # msg = response['message']['content']
            # parsed = json.loads(msg)
            
            response = api_llm(client, prompt)
            print("CHECKPOINT_1B")
            # api_llm returns the message content as a JSON string
            parsed = json.loads(response)
            # parsed = response.parsed
            print("CHECKPOINT_1C")
                        
            # Safely extract keys with defaults - resolve parsing error
            # maybe LLM did not output a particular key-value pair
            category = parsed.get('category', 'error')
            confidence = parsed.get('confidence', 0.0)
            parsed = {'category': category, 'confidence': confidence}
        except Exception as e:
            print(f"CHECKPOINT_2A: Exception Type: {type(e).__name__}")
            print(f"CHECKPOINT_2A: Exception Message: {str(e)}")
            
            # Provider-specific error attributes, if present
            if hasattr(e, 'code'):
                print(f"CHECKPOINT_2A: Status Code: {e.code}")
            if hasattr(e, 'details'):
                print(f"CHECKPOINT_2A: Details: {e.details}")
                
            parsed = {'category': 'error', 'confidence': 0.0}
        
        # Combine original row data with predictions
        results.append({
            "Index": row.Index,
            "text": row.text,
            "label": row.label,
            "dataset": row.dataset,
            "split": row.split,
            "predicted": parsed['category'],
            "confidence": parsed['confidence']
        })

        
        # Log progress
        if subset_row_count % log_every_n_examples == 0:
            elapsed_time = time.time() - start_time
            
            avg_time_per_row = elapsed_time / subset_row_count
            remaining_rows = total_rows - subset_row_count
            eta = avg_time_per_row * remaining_rows
            
            print(f"Processed original df idx {row.Index} (subset row {subset_row_count}) | "
                  f"Elapsed: {elapsed_time:.2f}s | ETA: {eta:.2f}s")
    
    return results  # Return list of dictionaries
    

print(f"Starting intent classification using {Config.model_name}")
subset_results = predict_intent(Config.model_name, 
                                df, 
                                bulletpts_intent, 
                                start_index = Config.start_index, 
                                end_index = Config.end_index,
                                log_every_n_examples = Config.log_every_n_examples)



# # previously for Ollama
# # update end_index for filename (if None is used for the end of the df)
# # Get the last index of the DataFrame
# last_index = df.index[-1] 
# # Use last index if Config.end_index is None
# end_index = Config.end_index if Config.end_index is not None else last_index
# 2025.07.07
# now for Ollama AND Gemini
# Gemini - needs to track 'end_index' for API JSON exports (when daily limits are exhausted)
# Ollama - reuse this code
end_index = max(r['Index'] for r in subset_results)



# 2025.05.23 changed from JSON to PKL
# because we are saving list of dictionaries
# Save to PKL
# 2025.06.04 explore changing back to JSON
# with open(f'results_{Config.model_name}_{Config.dataset_name}_{Config.start_index}_{end_index}.pkl', 'wb') as f:
#     pickle.dump(subset_results, f)
with open(f'results_{Config.model_name}_{Config.dataset_name}_{Config.start_index}_{end_index}.json', 'w') as f:
    json.dump(subset_results, f, indent=2)

print("Completed intent classification")


#######################
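Because each run saves a list of dictionaries for one index range, results from several runs can be concatenated afterwards. The sketch below assumes every file holds a JSON list of records with an `Index` key, as written above. The helper `merge_result_files` and the demo file names are hypothetical.

```python
import json
import os
import tempfile

def merge_result_files(paths):
    """Concatenate result lists from several JSON files, one record per Index."""
    by_index = {}
    for path in paths:
        with open(path, "r") as f:
            for record in json.load(f):
                by_index[record["Index"]] = record  # later files win on duplicates
    return [by_index[i] for i in sorted(by_index)]

# Demo data written to a temporary folder, standing in for downloaded result files.
with tempfile.TemporaryDirectory() as tmp:
    chunks = {
        "results_demo_0_1.json": [
            {"Index": 0, "label": "svn", "predicted": "svn", "confidence": 0.9},
            {"Index": 1, "label": "qt", "predicted": "error", "confidence": 0.0},
        ],
        "results_demo_2_2.json": [
            {"Index": 2, "label": "bash", "predicted": "bash", "confidence": 0.8},
        ],
    }
    paths = []
    for name, records in chunks.items():
        path = os.path.join(tmp, name)
        with open(path, "w") as f:
            json.dump(records, f, indent=2)
        paths.append(path)
    merged = merge_result_files(paths)

# Count records where parsing or the API call failed.
n_errors = sum(r["predicted"] == "error" for r in merged)
print(len(merged), n_errors)
```

Counting the 'error' records after merging gives a quick view of which index ranges may need to be re-run.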
