**AI & Machine Learning (KAN-CINTO4003U) - Copenhagen Business School | Spring 2025**

***


# Part III: LLM

Please see the description of the assignment in the README file (section 3) <br>
**Guide notebook**: [guides/llm_guide.ipynb](guides/llm_guide.ipynb)


***

<br>

* Note that you should report results using a classification report. 

* Also, remember to include some reflections on your results: how do they compare with the results from Part I (BoW) and Part II (BERT)? Are there any hyperparameters or prompting techniques that are particularly important?

* You should follow the steps given in the `llm_guide` notebook.

<br>


***

In [1]:
import sys
print(sys.version)


3.10.16 | packaged by conda-forge | (main, Dec  5 2024, 14:07:43) [MSC v.1942 64 bit (AMD64)]


In [2]:
import numpy as np
import pandas as pd
print(np.__version__)
print(pd.__version__)


ValueError: numpy.dtype size changed, may indicate binary incompatibility. Expected 96 from C header, got 88 from PyObject

In [None]:
# imports for the project

import pandas as pd

### 1. Load the data

We can load this data directly from the Hugging Face Hub (see [Hugging Face Datasets](https://huggingface.co/docs/datasets/)) into a pandas DataFrame. Pretty neat!

**Note**: This cell downloads the dataset and keeps it in memory. Re-running the cell downloads the dataset again.

You are welcome to increase the `frac` parameter to load more data.
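
To make the effect of `frac` concrete, here is a minimal sketch on a toy DataFrame (the data and the names `toy` and `sampled` are made up for illustration). It uses pandas' built-in `GroupBy.sample`, which draws the same fraction from every label. This is an alternative to the `groupby(...).apply(...)` pattern used in this notebook, shown only to illustrate per-class sampling.

```python
import pandas as pd

# Hypothetical toy data: 10 rows for each of two labels
toy = pd.DataFrame({
    "text": [f"article {i}" for i in range(20)],
    "label": ["World"] * 10 + ["Sports"] * 10,
})

# Draw 20% of the rows from each label so the class balance is preserved
sampled = toy.groupby("label").sample(frac=0.2, random_state=42)

# Count how many rows were kept per label
counts = sampled["label"].value_counts().to_dict()
print(counts)
```

Doubling `frac` doubles the number of rows kept per label, and with it the number of LLM calls needed later on.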

In [None]:

splits = {'train': 'data/train-00000-of-00001.parquet', 'test': 'data/test-00000-of-00001.parquet'}
# train = pd.read_parquet("hf://datasets/fancyzhx/ag_news/" + splits["train"])
test = pd.read_parquet("hf://datasets/fancyzhx/ag_news/" + splits["test"])

In [None]:
label_map = {
    0: 'World',
    1: 'Sports',
    2: 'Business',
    3: 'Sci/Tech'
}

def preprocess(df: pd.DataFrame, frac = 1e-2, label_map = label_map, seed=42) -> pd.DataFrame:
    return  (
        df
        .assign(label=lambda x: x['label'].map(label_map))
        [lambda df: df['label'].isin(label_map.values())]
        .groupby('label')
        .apply(lambda x: x.sample(frac=frac, random_state=seed))
        .reset_index(drop=True)

    )

# train_df = preprocess(train, frac=0.01)
test_df = preprocess(test, frac=0.1)

# clear up some memory by deleting the original dataframes
# del train
del test

test_df.shape  # train_df.shape

- Downgraded to Python 3.10 in order to support the ibm-watsonx-ai installation

In [None]:
!pip install python-decouple ibm-watsonx-ai

In [None]:
import os
print(os.getcwd())


In [None]:
from decouple import config
from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference

In [None]:
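# Read the credentials from the environment or a local .env file instead of hardcoding them.
# The variable names WX_API_KEY and WX_PROJECT_ID are assumptions; match them to your own .env file.
WX_API_KEY = config("WX_API_KEY")
PROJECT_ID = config("WX_PROJECT_ID")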

credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com",  # Dallas endpoint
    api_key=WX_API_KEY
)

client = APIClient(
    credentials=credentials, 
    project_id=PROJECT_ID
)


In [None]:
from ibm_watsonx_ai.foundation_models.schema import TextGenParameters

# Set text generation parameters for deterministic output
PARAMS = TextGenParameters(
    temperature=0,             # No randomness for consistent classification
    max_new_tokens=10,         # Expecting a short response (the category)
    stop_sequences=[".", "\n"] # Stop generation at a period or newline
)

# Initialize the model inference using the Granite model
model = ModelInference(
    api_client=client,
    model_id="ibm/granite-13b-instruct-v2",
    params=PARAMS
)

In [None]:
splits = {'train': 'data/train-00000-of-00001.parquet', 'test': 'data/test-00000-of-00001.parquet'}
# For LLM classification, we work with the test set
test = pd.read_parquet("hf://datasets/fancyzhx/ag_news/" + splits["test"])

label_map = {
    0: 'World',
    1: 'Sports',
    2: 'Business',
    3: 'Sci/Tech'
}

def preprocess(df: pd.DataFrame, frac=1e-2, label_map=label_map, seed=42) -> pd.DataFrame:
    """
    Preprocess the dataset by mapping numeric labels to categories,
    filtering out any unexpected labels, and sampling a fraction of data.
    """
    return (
        df
        .assign(label=lambda x: x['label'].map(label_map))
        [lambda df: df['label'].isin(label_map.values())]
        .groupby('label')
        .apply(lambda x: x.sample(frac=frac, random_state=seed))
        .reset_index(drop=True)
    )

# Sample 10% of the test set for evaluation
test_df = preprocess(test, frac=0.1)
del test
print("Test set shape:", test_df.shape)

In [None]:
SYSTEM_PROMPT = (
    "Your task is to classify the following news article into one of the following categories:\n\n"
    "Categories:\n{categories}\n\n"
    "Article:\n{text}\n\n"
    "Please provide the correct category from the list above. Answer with only the category name.\n"
    "Category:"
)

# Create a string with all unique categories (one per line)
CATEGORIES = "\n".join(["- " + cat for cat in test_df["label"].unique()])
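
To check what the model actually receives, it can help to render the template once and read it. The sketch below repeats the template under the name `template` so that it runs on its own; the example article and the names `example_categories`, `example_text`, and `example_prompt` are hypothetical.

```python
# Copy of the notebook's prompt template, repeated so the sketch is self-contained
template = (
    "Your task is to classify the following news article into one of the following categories:\n\n"
    "Categories:\n{categories}\n\n"
    "Article:\n{text}\n\n"
    "Please provide the correct category from the list above. Answer with only the category name.\n"
    "Category:"
)

# Hypothetical inputs, for illustration only
example_categories = "\n".join("- " + cat for cat in ["World", "Sports", "Business", "Sci/Tech"])
example_text = "The central bank raised interest rates by a quarter point on Tuesday."

# Fill the placeholders and inspect the final prompt
example_prompt = template.format(categories=example_categories, text=example_text)
print(example_prompt)
```

Because the prompt ends with `Category:`, the model is nudged to continue with just the category name, which is what the short `max_new_tokens` setting expects.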

In [None]:
from tqdm import tqdm

predictions = []

for text in tqdm(test_df["text"], desc="Classifying articles"):
    # Format the prompt with the categories and the current article text
    prompt = SYSTEM_PROMPT.format(categories=CATEGORIES, text=text)
    response = model.generate(prompt)
    prediction = response["results"][0]["generated_text"].strip()
    predictions.append(prediction)
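
The raw generations are not guaranteed to match the label names exactly: the model may change the casing, add a trailing period, or wrap the answer in a short sentence. Any such variant would show up as an extra class in the classification report. The sketch below is one possible way to map raw outputs back to the four labels; `VALID_LABELS`, `normalize_prediction`, and the `"Unknown"` fallback are hypothetical names introduced here, not part of the guide notebook.

```python
VALID_LABELS = ["World", "Sports", "Business", "Sci/Tech"]

def normalize_prediction(raw: str, valid_labels=VALID_LABELS, fallback: str = "Unknown") -> str:
    """Map raw model output to one of the valid labels, or to a fallback value."""
    # Remove surrounding whitespace and trailing periods, then ignore case
    cleaned = raw.strip().strip(".").strip().lower()

    # First try an exact, case-insensitive match
    for label in valid_labels:
        if cleaned == label.lower():
            return label

    # Otherwise accept the first label that appears inside the output
    for label in valid_labels:
        if label.lower() in cleaned:
            return label

    return fallback

# Examples of outputs the function is meant to handle
print(normalize_prediction(" Sports.\n"))
print(normalize_prediction("The category is Business"))
print(normalize_prediction("I don't know"))
```

Applying it as `predictions = [normalize_prediction(p) for p in predictions]` before computing the report keeps the evaluation restricted to the four labels plus one explicit fallback class.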

In [None]:
from sklearn.metrics import classification_report

print("LLM Classification Performance on Test Set:")
print(classification_report(test_df["label"], predictions))