# LLM Classification Finetuning Inference

This is the **offline** notebook used for inference on Kaggle. It loads the
fine-tuned model and the frozen package dependencies saved by the training
notebook.

---

Competition: https://www.kaggle.com/competitions/llm-classification-finetuning/overview

## Submission File

For each ID in the test set, you must predict the probability for each target class. The file should contain a header and have the following format:

```csv
id,winner_model_a,winner_model_b,winner_tie
136060,0.33,0.33,0.33
211333,0.33,0.33,0.33
1233961,0.33,0.33,0.33
etc
```

The submission file must be named `submission.csv` and written to the `/kaggle/working/` directory.
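
A quick format check can catch a malformed row, such as a stray comma inside a probability, before submitting. The following is a minimal sketch using only the standard library. The function name `check_submission` and the tolerance `tol` are hypothetical choices for illustration, and the sum-to-one check is an assumption about what a well-formed row looks like, not a documented competition rule.

```python
import csv
import io

EXPECTED_HEADER = ["id", "winner_model_a", "winner_model_b", "winner_tie"]

def check_submission(csv_text, tol=0.02):
    """Return a list of problems found in a submission CSV (empty if none).

    `tol` is a hypothetical tolerance for how far a row's probabilities
    may deviate from summing to 1.
    """
    problems = []
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        return [f"unexpected header: {header}"]

    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(EXPECTED_HEADER):
            problems.append(f"line {line_no}: expected 4 fields, got {len(row)}")
            continue
        try:
            probs = [float(value) for value in row[1:]]
        except ValueError:
            problems.append(f"line {line_no}: non-numeric probability")
            continue
        if any(p < 0.0 or p > 1.0 for p in probs):
            problems.append(f"line {line_no}: probability outside [0, 1]")
        if abs(sum(probs) - 1.0) > tol:
            problems.append(f"line {line_no}: probabilities do not sum to 1")
    return problems

good = "id,winner_model_a,winner_model_b,winner_tie\n136060,0.33,0.33,0.34\n"
bad = "id,winner_model_a,winner_model_b,winner_tie\n136060,0.33,0,33,0.33\n"
print(check_submission(good))
print(check_submission(bad))
```

The second string contains a comma where a decimal point was intended, so the row splits into five fields and is reported.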

## Inputs

When running on Kaggle, the input files are in the
`/kaggle/input/llm-classification-finetuning/` directory:

```
/kaggle/input/llm-classification-finetuning/sample_submission.csv
/kaggle/input/llm-classification-finetuning/train.csv
/kaggle/input/llm-classification-finetuning/test.csv
```


In [1]:
import os

kaggle_run_type = os.environ.get('KAGGLE_KERNEL_RUN_TYPE')
print(f"KAGGLE_KERNEL_RUN_TYPE: {kaggle_run_type}")

ON_KAGGLE = kaggle_run_type is not None

KAGGLE_KERNEL_RUN_TYPE: Batch


In [2]:
if ON_KAGGLE:
    %pip install torch torchvision pandas tabulate transformers evaluate peft wandb \
        --find-links /kaggle/input/k/drklee3/llm-classification-finetuning/frozen_packages \
        --no-index
else:
    %pip install transformers peft

Looking in links: /kaggle/input/k/drklee3/llm-classification-finetuning/frozen_packages
Processing /kaggle/input/k/drklee3/llm-classification-finetuning/frozen_packages/evaluate-0.4.3-py3-none-any.whl
INFO: pip is looking at multiple versions of torch to determine which version is compatible with other requirements. This could take a while.
Processing /kaggle/input/k/drklee3/llm-classification-finetuning/frozen_packages/torch-2.7.1-cp311-cp311-manylinux_2_28_x86_64.whl
Processing /kaggle/input/k/drklee3/llm-classification-finetuning/frozen_packages/sympy-1.14.0-py3-none-any.whl (from torch)
Processing /kaggle/input/k/drklee3/llm-classification-finetuning/frozen_packages/nvidia_cuda_nvrtc_cu12-12.6.77-py3-none-manylinux2014_x86_64.whl (from torch)
Processing /kaggle/input/k/drklee3/llm-classification-finetuning/frozen_packages/nvidia_cuda_runtime_cu12-12.6.77-py3-none-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (from torch)
Processing /kaggle/input/k/drklee3/llm-classification

In [3]:
# No trailing slashes: the paths below are joined with an explicit "/".
BASE_PATH_INPUT = '/kaggle/input/llm-classification-finetuning' if ON_KAGGLE else './data'
BASE_PATH_TRAIN_NB = '/kaggle/input/k/drklee3/llm-classification-finetuning' if ON_KAGGLE else '.'

print(f"Using input path: {BASE_PATH_INPUT}")

def list_files(path):
    """List all files in the given directory and its subdirectories."""
    print(f"Files in: {path}")

    for root, dirs, files in os.walk(path):
        for file in files:
            print(f" - {os.path.join(root, file)}")

list_files(BASE_PATH_INPUT)

Using input path: /kaggle/input/llm-classification-finetuning
Files in: /kaggle/input/llm-classification-finetuning
 - /kaggle/input/llm-classification-finetuning/sample_submission.csv
 - /kaggle/input/llm-classification-finetuning/train.csv
 - /kaggle/input/llm-classification-finetuning/test.csv


## Load model trained in training notebook

In [4]:
# Load the base model and the fine-tuned adapter saved by the training notebook.

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Fall back to CPU when no GPU is available (e.g. local runs).
device         = "cuda" if torch.cuda.is_available() else "cpu"
base_model_dir = BASE_PATH_TRAIN_NB + "/base-model"
peft_dir       = BASE_PATH_TRAIN_NB + "/model-output"

# 1) tokenizer + base
tokenizer  = AutoTokenizer.from_pretrained(base_model_dir, local_files_only=True)
base_model = AutoModelForCausalLM.from_pretrained(base_model_dir, local_files_only=True).to(device)

# 2) attach the adapter
model = PeftModel.from_pretrained(
    base_model,
    peft_dir,
    local_files_only=True
).to(device)

2025-06-20 02:36:45.614074: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:477] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
E0000 00:00:1750387005.846161      19 cuda_dnn.cc:8310] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
E0000 00:00:1750387005.917279      19 cuda_blas.cc:1418] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered


In [5]:
# Try an inference example
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(device)

outputs = model.generate(**inputs, max_new_tokens=50)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)

print(f"Prompt: {prompt}")
print(f"Response: {response}")

Setting `pad_token_id` to `eos_token_id`:0 for open-end generation.


Prompt: What is the capital of France?
Response: What is the capital of France?

The capital of France is Paris.

The capital of France is Paris.

The capital of France is Paris.

The capital of France is Paris.

The capital of France is Paris.

The capital of


In [6]:
import json
import pandas as pd
from datasets import Dataset

def preprocess_function_for_inference(
    examples,
    tokenizer,
    max_length: int = 1024,
):
    """
    Preprocessing for inference (no labels needed).
    Builds the prompt text in the same format as training and returns the
    tokenized batch (input IDs and attention mask) moved to `device`.
    """
    # Build the text inputs in the desired format (same as training)
    inputs = []
    for prompt_json, response_a_json, response_b_json in zip(
        examples["prompt"], examples["response_a"], examples["response_b"]
    ):
        # JSON decode the columns to handle multi-turn conversations
        prompts = json.loads(prompt_json)
        responses_a = json.loads(response_a_json)
        responses_b = json.loads(response_b_json)
        
        # Build conversation with turn-by-turn format (same as training)
        conversation_parts = []
        for i, (prompt_turn, response_a_turn, response_b_turn) in enumerate(zip(prompts, responses_a, responses_b), 1):
            turn_text = f"## Turn {i}\n"
            turn_text += "### Prompt\n"
            turn_text += f"{prompt_turn}\n\n"

            turn_text += "### Response A\n"
            turn_text += f"{response_a_turn}\n\n"

            turn_text += "### Response B\n"
            turn_text += f"{response_b_turn}\n"

            conversation_parts.append(turn_text)
        
        # Join all turns with separator and add final question
        conversation = "\n---\n\n".join(conversation_parts)
        input_text = f"{conversation}\n\nWhich is better?\nAnswer: "
        inputs.append(input_text)

    # Tokenize inputs only (no targets for inference)
    model_inputs = tokenizer(
        inputs, 
        add_special_tokens=True, 
        padding=True, 
        truncation=True,
        max_length=max_length,
        return_tensors="pt",
    )
    
    return model_inputs.to(device)

# Load the test data
test_df = pd.read_csv(f"{BASE_PATH_INPUT}/test.csv")
print(f"Test data shape: {test_df.shape}")
print(f"Test data columns: {test_df.columns.tolist()}")
print(f"First few rows:\n{test_df.head()}")

Test data shape: (3, 4)
Test data columns: ['id', 'prompt', 'response_a', 'response_b']
First few rows:
        id                                             prompt  \
0   136060  ["I have three oranges today, I ate an orange ...   
1   211333  ["You are a mediator in a heated political deb...   
2  1233961  ["How to initialize the classification head wh...   

                                          response_a  \
0                    ["You have two oranges today."]   
1  ["Thank you for sharing the details of the sit...   
2  ["When you want to initialize the classificati...   

                                          response_b  
0  ["You still have three oranges. Eating an oran...  
1  ["Mr Reddy and Ms Blue both have valid points ...  
2  ["To initialize the classification head when p...  
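
To make the prompt layout produced by `preprocess_function_for_inference` easier to inspect, the text-building step can be pulled out into a small function that needs no tokenizer. This is a minimal sketch: `build_prompt` is a hypothetical helper name, and the example strings are made up for illustration rather than taken from the test set.

```python
import json

def build_prompt(prompt_json, response_a_json, response_b_json):
    """Build the turn-by-turn prompt text for one example.

    Each argument is a JSON-encoded list with one entry per turn,
    matching the `prompt`, `response_a`, and `response_b` columns.
    """
    prompts = json.loads(prompt_json)
    responses_a = json.loads(response_a_json)
    responses_b = json.loads(response_b_json)

    parts = []
    for i, (prompt, resp_a, resp_b) in enumerate(
        zip(prompts, responses_a, responses_b), 1
    ):
        parts.append(
            f"## Turn {i}\n"
            f"### Prompt\n{prompt}\n\n"
            f"### Response A\n{resp_a}\n\n"
            f"### Response B\n{resp_b}\n"
        )

    # Turns are separated by a horizontal rule; the question comes last.
    conversation = "\n---\n\n".join(parts)
    return f"{conversation}\n\nWhich is better?\nAnswer: "

# Made-up single-turn example.
example = build_prompt('["Hi"]', '["Hello!"]', '["Hey."]')
print(example)
```

Printing the prompt for a few rows is a cheap way to confirm that the inference format matches what the model saw during training.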


In [7]:
# Convert to Hugging Face Dataset
test_dataset = Dataset.from_pandas(test_df)

In [8]:
import torch

def predict_with_model(model, tokenizer, test_dataset, batch_size=1):
    """
    Generate predictions for test examples using the fine-tuned model.
    """

    # Eval mode for inference
    model.eval()
    predictions = []
    
    print(f"Processing {len(test_dataset)} examples...")
    
    # Process in batches
    for i in range(0, len(test_dataset), batch_size):
        batch_end = min(i + batch_size, len(test_dataset))
        batch = test_dataset.select(range(i, batch_end))
        
        print(f"## Processing batch {i//batch_size + 1}/{(len(test_dataset) + batch_size - 1)//batch_size}")
        
        # Preprocess the batch (the returned tensors are already on `device`)
        inputs = preprocess_function_for_inference(batch, tokenizer)

        with torch.no_grad():
            # Generate predictions
            outputs = model.generate(
                **inputs,
                max_new_tokens=10,  # We only expect "a", "b", or "tie"
                do_sample=False,    # Use greedy decoding for consistent results
                pad_token_id=tokenizer.pad_token_id,
                eos_token_id=tokenizer.eos_token_id,
            )
            
            # Decode the generated tokens (skip the input tokens)
            input_length = inputs['input_ids'].shape[1]
            generated_tokens = outputs[:, input_length:]

            # Decode each response
            for j, generated in enumerate(generated_tokens):
                response = tokenizer.decode(generated, skip_special_tokens=True).strip()
                print(f"### Example {i+j}")
                print(f"Response: {response}")

                # Convert response to probabilities
                if response.lower().startswith('a'):
                    pred = [1.0, 0.0, 0.0]  # winner_model_a
                elif response.lower().startswith('b'):
                    pred = [0.0, 1.0, 0.0]  # winner_model_b
                elif response.lower().startswith('tie'):
                    pred = [0.0, 0.0, 1.0]  # winner_tie
                else:
                    # Default to a near-uniform distribution (sums to 1) if unclear
                    pred = [0.33, 0.33, 0.34]
                    print("[Warning] Unclear response, using uniform distribution")

                predictions.append(pred)
    
    return predictions
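
The response-to-probability mapping in `predict_with_model` can also be written as a standalone function, which makes it testable without loading the model. The sketch below is a variation, not the notebook's exact logic: it matches the first word exactly instead of using `startswith` (so a response such as "answer" is not read as "a"), and it adds a hypothetical smoothing parameter `eps` so that no class receives a probability of exactly 0. Whether smoothing helps depends on the scoring metric; it is only useful when confident wrong answers are penalized heavily, as with log loss.

```python
def response_to_probs(response, eps=0.05):
    """Map a decoded response to [p_a, p_b, p_tie].

    `eps` is a hypothetical smoothing amount: the predicted class gets
    1 - eps and the remaining eps is split evenly between the other two.
    Unrecognized responses fall back to a uniform distribution.
    """
    labels = ["a", "b", "tie"]

    # Take the first whitespace-separated word and drop surrounding punctuation.
    words = response.strip().lower().split()
    first = words[0].strip(".,:;!?\"'") if words else ""

    if first not in labels:
        return [1 / 3, 1 / 3, 1 / 3]

    probs = [eps / 2] * 3
    probs[labels.index(first)] = 1 - eps
    return probs

print(response_to_probs("A"))
print(response_to_probs("tie."))
print(response_to_probs("1.\n\n### Prompt\nI have three"))
```

Setting `eps=0.0` reproduces hard one-hot predictions for recognized responses.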


In [9]:
# Generate predictions for the test set
print("Generating predictions...")
predictions = predict_with_model(model, tokenizer, test_dataset)

print(f"\nGenerated {len(predictions)} predictions")
for i, pred in enumerate(predictions):
    print(f"ID {test_df.iloc[i]['id']}")
    print(f"  Prediction: {pred}")

Generating predictions...
Processing 3 examples...
## Processing batch 1/3




### Example 0
Response: 1.

### Prompt
I have three
## Processing batch 2/3
### Example 1
Response: 1.

### Prompt
You are a
## Processing batch 3/3
### Example 2
Response: learning libraries like PyTorch or TensorFlow to load the model

Generated 3 predictions
ID 136060
  Prediction: [0.33, 0.33, 0.34]
ID 211333
  Prediction: [0.33, 0.33, 0.34]
ID 1233961
  Prediction: [0.33, 0.33, 0.34]


In [10]:
# Create submission dataframe
import pandas as pd

submission_df = pd.DataFrame({
    'id': test_df['id'],
    'winner_model_a': [pred[0] for pred in predictions],
    'winner_model_b': [pred[1] for pred in predictions], 
    'winner_tie': [pred[2] for pred in predictions]
})

print("Submission DataFrame:")
print(submission_df)

# Verify probabilities sum to 1 (approximately)
prob_sums = submission_df[['winner_model_a', 'winner_model_b', 'winner_tie']].sum(axis=1)
print(f"\nProbability sums - Min: {prob_sums.min():.3f}, Max: {prob_sums.max():.3f}")

# Save submission file
output_path = "/kaggle/working/submission.csv" if ON_KAGGLE else "./submission.csv"
submission_df.to_csv(output_path, index=False)
print(f"\nSubmission saved to: {output_path}")

Submission DataFrame:
        id  winner_model_a  winner_model_b  winner_tie
0   136060            0.33            0.33        0.34
1   211333            0.33            0.33        0.34
2  1233961            0.33            0.33        0.34

Probability sums - Min: 1.000, Max: 1.000

Submission saved to: /kaggle/working/submission.csv
