**AI & Machine Learning (KAN-CINTO4003U) - Copenhagen Business School | Spring 2025**

***


# Part III: LLM

Please see the description of the assignment in the README file (section 3) <br>
**Guide notebook**: [guides/llm_guide.ipynb](guides/llm_guide.ipynb)


***

<br>

* Note that you should report results using a classification report. 

* Also, remember to include some reflections on your results: how do they compare with the results from Part I (BoW) and Part II (BERT)? Are there any hyperparameters or prompting techniques that are particularly important?

* You should follow the steps given in the `llm_guide` notebook

<br>


***

In [1]:
import pandas as pd
import os
from decouple import Config, RepositoryEnv
from ibm_watsonx_ai import APIClient
from ibm_watsonx_ai import Credentials
from ibm_watsonx_ai.foundation_models import ModelInference


In [None]:

# Path to the .env file; a relative path is resolved against the current working directory
env_path = os.path.abspath("mas/ma2/.env")
config = Config(RepositoryEnv(env_path))

WX_API_KEY = config("WX_API_KEY")


# Set up credentials
credentials = Credentials(
    url="https://us-south.ml.cloud.ibm.com/",
    api_key=WX_API_KEY
)

# Initialize API client
client = APIClient(
    credentials=credentials,
    project_id="77fa3fd6-5a4e-422f-bca4-3a2392ce6b70"
)

FileNotFoundError: [Errno 2] No such file or directory: '/Users/emilstausbol/Documents/GitHub/ma2/mas/ma2/.env'
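
The `FileNotFoundError` above suggests that the relative path was resolved against a working directory other than the one intended. One possible workaround, sketched here under that assumption, is to walk up the parent directories until a `.env` file turns up. The helper name `find_env_file` is hypothetical and is not part of `decouple`.

```python
from pathlib import Path
from typing import Optional


def find_env_file(start: Path, filename: str = ".env") -> Optional[Path]:
    """Return the first `filename` found in `start` or one of its parents.

    Hypothetical helper for illustration; returns None if nothing is found.
    """
    start = start.resolve()
    for directory in [start, *start.parents]:
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None


# Search upward from wherever the notebook kernel is running.
env_path = find_env_file(Path.cwd())
if env_path is None:
    print("No .env file found; check where the notebook is running from.")
else:
    print(f"Using credentials file: {env_path}")
```

If a file is found, the resulting path could be passed on as `RepositoryEnv(str(env_path))` in the cell above.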

### Testing the connection

In [51]:
model = ModelInference(
    api_client=client,
    model_id="ibm/granite-13b-instruct-v2",
)

In [52]:
prompt = "How do I build a gaming PC?"
generated_response = model.generate(prompt)

generated_response

{'model_id': 'ibm/granite-13b-instruct-v2',
 'created_at': '2025-03-30T07:13:41.683Z',
 'results': [{'generated_text': 'The first step is to decide what parts you want to use. You can start with a motherboard',
   'generated_token_count': 20,
   'input_token_count': 8,
   'stop_reason': 'max_tokens'}],
    'id': 'unspecified_max_new_tokens',
    'additional_properties': {'limit': 0,
     'new_value': 20,
     'parameter': 'parameters.max_new_tokens',
     'value': 0}}]}}

In [53]:
from ibm_watsonx_ai.foundation_models.schema import TextGenParameters

TextGenParameters.show()

+-----------------------+----------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| PARAMETER             | TYPE                                   | EXAMPLE VALUE                                                                                                                             |
| decoding_method       | str, TextGenDecodingMethod, NoneType   | sample                                                                                                                                    |
+-----------------------+----------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------+
| length_penalty        | dict, TextGenLengthPenalty, NoneType   | {'decay_factor': 2.5, 'start_index': 5}                                                                  

### Setting parameters

In [54]:
PARAMS = TextGenParameters(
    temperature=0.8,     # Higher temperature means more randomness
    max_new_tokens=500,  # Maximum number of tokens to generate
    min_new_tokens=200,  # Minimum number of tokens to generate
)

model = ModelInference(
    api_client=client,
    model_id="ibm/granite-13b-instruct-v2",
    params=PARAMS
)

In [55]:
response = model.generate(prompt)
response

{'model_id': 'ibm/granite-13b-instruct-v2',
 'created_at': '2025-03-30T07:13:46.245Z',
 'results': [{'generated_text': 'To build a gaming PC, you will need a motherboard, processor, graphics card, memory, storage, and a power supply. You will also need to install an operating system and any games you want to play. \nBuilding a gaming PC is not difficult, but it does require some experience with computer hardware. If you are not comfortable assembling your own PC, you can purchase a pre-built gaming PC from many retailers. \nOnce you have all of the components, installing them into a case is fairly straightforward. Be sure to read the instructions that come with your motherboard and CPU before beginning assembly. \nIf you have any problems during the build process, there are many online resources available to help you. Once your PC is assembled, you will need to install an operating system (such as Windows 10) and any games you want to play. \nIf you have any questions about building a 

In [56]:
print(response["results"][0]["generated_text"])

To build a gaming PC, you will need a motherboard, processor, graphics card, memory, storage, and a power supply. You will also need to install an operating system and any games you want to play. 
Building a gaming PC is not difficult, but it does require some experience with computer hardware. If you are not comfortable assembling your own PC, you can purchase a pre-built gaming PC from many retailers. 
Once you have all of the components, installing them into a case is fairly straightforward. Be sure to read the instructions that come with your motherboard and CPU before beginning assembly. 
If you have any problems during the build process, there are many online resources available to help you. Once your PC is assembled, you will need to install an operating system (such as Windows 10) and any games you want to play. 
If you have any questions about building a gaming PC, there are many online forums where you can ask for help. 


# Using LLMs for classification

In [57]:
import pandas as pd
from sklearn.metrics import classification_report 
from tqdm import tqdm

### Load the data

We can load this data directly from the Hugging Face Hub (see [Hugging Face Datasets](https://huggingface.co/docs/datasets/)) into a Pandas DataFrame. Pretty neat!

**Note**: This cell will download the dataset and keep it in memory. If you run this cell multiple times, it will download the dataset multiple times.

You are welcome to increase the `frac` parameter to load more data.

### Reflections

We tried replacing the system prompt from the guide with a more specific and detailed one.
Result: the scores became considerably worse, so we kept the prompt given in the guide.

Second, we changed the generation parameters, which increased the scores by 1-2%.

In [58]:

splits = {'train': 'data/train-00000-of-00001.parquet', 'test': 'data/test-00000-of-00001.parquet'}
# train = pd.read_parquet("hf://datasets/fancyzhx/ag_news/" + splits["train"])
test = pd.read_parquet("hf://datasets/fancyzhx/ag_news/" + splits["test"])

In [59]:

label_map = {
    0: 'World',
    1: 'Sports',
    2: 'Business',
    3: 'Sci/Tech'
}

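def preprocess(df: pd.DataFrame, frac: float = 1e-2, label_map: dict[int, str] = label_map, seed: int = 42) -> pd.DataFrame:
    """Map integer labels to category names and sample `frac` of the rows from each label."""
    return (
        df
        .assign(label=lambda x: x['label'].map(label_map))  # integer label -> category name
        [lambda df: df['label'].isin(label_map.values())]  # keep only rows with a known label
        .groupby('label')[["text", "label"]]
        .apply(lambda x: x.sample(frac=frac, random_state=seed))  # same fraction from every class
        .reset_index(drop=True)
    )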

# train_df = preprocess(train, frac=0.01)
test_df = preprocess(test, frac=0.1)

# clear up some memory by deleting the original dataframes
# del train
del test

test_df.shape # , train_df.shape

(760, 2)
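
The shape `(760, 2)` follows from sampling 10% of each class separately. A minimal stdlib sketch of the same per-class sampling idea is given here, assuming the test split holds 1,900 stories per class (which is consistent with the support of 190 per class in the classification report). The function `stratified_sample` and the toy rows are illustrative stand-ins, not the `preprocess` function itself.

```python
import random
from collections import Counter, defaultdict


def stratified_sample(rows, frac, seed=42):
    """Sample `frac` of the (text, label) rows from each label group."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for text, label in rows:
        groups[label].append((text, label))
    sample = []
    for label in sorted(groups):
        k = round(len(groups[label]) * frac)  # rows to draw from this class
        sample.extend(rng.sample(groups[label], k))
    return sample


labels = ["World", "Sports", "Business", "Sci/Tech"]
# Toy data: 1,900 placeholder stories per class (assumed class size).
toy_rows = [(f"story {i}", label) for label in labels for i in range(1900)]

sample = stratified_sample(toy_rows, frac=0.1)
counts = Counter(label for _, label in sample)
print(len(sample), dict(counts))
```

Sampling per class keeps the evaluation set balanced, so macro and weighted averages in the report coincide.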

### Set model parameters

In [60]:
PARAMS = TextGenParameters(
    temperature=0.3,          # Low temperature: little randomness when sampling
    top_k=20,                 # Sample only from the 20 most likely tokens
    top_p=0.95,               # Nucleus sampling: keep tokens covering 95% of the probability mass
    max_new_tokens=50,        # Upper bound on generated tokens; a label needs only a few
    stop_sequences=["\n"]     # Stop generating at the first newline
)

model = ModelInference(
    api_client=client,
    model_id="ibm/granite-13b-instruct-v2",  # We could also try a larger model!
    params=PARAMS
)
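
To build some intuition for what `temperature`, `top_k` and `top_p` do, here is a toy sketch of the general sampling technique on a made-up score table. It is not how watsonx.ai implements decoding; the function `sample_token`, the scores in `toy_logits`, and the order in which the filters are applied are all assumptions for illustration.

```python
import math
import random
from collections import Counter


def sample_token(logits, temperature=1.0, top_k=None, top_p=1.0, rng=None):
    """Sample one token from `logits` (dict: token -> score)."""
    rng = rng or random
    # Temperature: dividing scores by a value below 1 sharpens the distribution.
    scaled = {tok: score / temperature for tok, score in logits.items()}
    max_score = max(scaled.values())
    exp = {tok: math.exp(s - max_score) for tok, s in scaled.items()}
    total = sum(exp.values())
    ranked = sorted(
        ((tok, e / total) for tok, e in exp.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    # Top-k: keep only the k most likely tokens.
    if top_k is not None:
        ranked = ranked[:top_k]
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break
    tokens, weights = zip(*kept)
    return rng.choices(tokens, weights=weights, k=1)[0]


# Made-up scores for the first generated token.
toy_logits = {"Business": 2.0, "Sci/Tech": 1.5, "World": 0.5, "Sports": 0.1}
rng = random.Random(0)
draws = Counter(
    sample_token(toy_logits, temperature=0.3, top_k=20, top_p=0.95, rng=rng)
    for _ in range(200)
)
print(draws)
```

With a low temperature the most likely token dominates, and the top-p cut-off removes the unlikely tail entirely, which is usually what we want for a classification task.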

### Create a system prompt

In [61]:
SYSTEM_PROMPT = """Your task is to classify news stories into one of four categories

CATEGORIES:
{categories}

TEXT:
{text}

Please assign the correct category to the text. Answer with the correct category and nothing else.

Category:
"""

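One prompting technique that could be tried is few-shot prompting, where a handful of labelled examples are placed in the prompt before the text to classify. The sketch below shows one way to build such a prompt. The template `FEW_SHOT_PROMPT`, the helper `build_few_shot_prompt` and the example stories are all hypothetical; in a real experiment the examples would more sensibly be drawn from the training split.

```python
FEW_SHOT_PROMPT = """Your task is to classify news stories into one of the categories below.

CATEGORIES:
{categories}

EXAMPLES:
{examples}

TEXT:
{text}

Category:
"""


def build_few_shot_prompt(text, categories, examples):
    """Fill the template with a category list and (text, label) example pairs."""
    category_block = "\n".join(f"- {category}" for category in categories)
    example_block = "\n\n".join(
        f"Text: {example_text}\nCategory: {label}" for example_text, label in examples
    )
    return FEW_SHOT_PROMPT.format(
        categories=category_block, examples=example_block, text=text
    )


# Invented example stories, for illustration only.
toy_examples = [
    ("The home team won the cup final after extra time.", "Sports"),
    ("The chipmaker unveiled a faster processor for laptops.", "Sci/Tech"),
]
few_shot_prompt = build_few_shot_prompt(
    text="Shares fell sharply after the bank reported weak earnings.",
    categories=["World", "Sports", "Business", "Sci/Tech"],
    examples=toy_examples,
)
print(few_shot_prompt)
```

Whether this helps is an empirical question; longer prompts also cost more input tokens per request.
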
### Generate predictions

In [62]:
CATEGORIES = "- " + "\n- ".join(test_df["label"].unique())  # Create a string with all categories

predictions = []

for text in tqdm(test_df["text"]):

    # format the prompt with the categories and the text
    prompt = SYSTEM_PROMPT.format(categories=CATEGORIES, text=text)
    
    # generate the response from the model
    response = model.generate(prompt)

    # extract the generated text from the response
    prediction = response["results"][0]["generated_text"].strip()

    # append the prediction to the list of predictions
    predictions.append(prediction)

100%|██████████| 760/760 [03:45<00:00,  3.37it/s]
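
The loop above stores the generated text as is, so any variation in casing or extra words would show up as a separate class in the classification report. That does not appear to have happened in this run, but a small clean-up step makes the evaluation more robust. A minimal sketch follows; the helper `normalize_prediction` and the `"Unknown"` fallback label are assumptions for illustration.

```python
VALID_LABELS = ["World", "Sports", "Business", "Sci/Tech"]


def normalize_prediction(raw, valid_labels=VALID_LABELS, fallback="Unknown"):
    """Map raw model output to one of the valid labels, or to `fallback`."""
    cleaned = raw.strip().lstrip("-").strip().lower()
    # First try an exact (case-insensitive) match.
    for label in valid_labels:
        if cleaned == label.lower():
            return label
    # Otherwise accept the first valid label mentioned anywhere in the output.
    for label in valid_labels:
        if label.lower() in cleaned:
            return label
    return fallback


for raw in [" sports\n", "- Sci/Tech", "Category: Business.", "Politics"]:
    print(repr(raw), "->", normalize_prediction(raw))
```

The substring match is deliberately simple: if an output mentions several labels, the first one in `VALID_LABELS` wins, which may not be the intended one.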


### Evaluate performance

In [63]:
print(classification_report(test_df.label, predictions))

              precision    recall  f1-score   support

    Business       0.54      0.92      0.68       190
    Sci/Tech       0.90      0.36      0.52       190
      Sports       0.96      0.91      0.93       190
       World       0.82      0.79      0.80       190

    accuracy                           0.74       760
   macro avg       0.80      0.74      0.73       760
weighted avg       0.80      0.74      0.73       760
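
The report shows high recall but low precision for Business, and low recall for Sci/Tech. This pattern suggests, but does not confirm, that many Sci/Tech stories are being labelled Business. A confusion matrix would show this directly. The sketch below counts (true, predicted) pairs with the standard library; the lists `toy_true` and `toy_pred` are made-up values, not our actual predictions. `sklearn.metrics.confusion_matrix` offers the same functionality.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))


def print_confusion(y_true, y_pred):
    """Print a table with true labels as rows and predicted labels as columns."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = confusion_counts(y_true, y_pred)
    width = max(len(label) for label in labels) + 2
    print(" " * width + "".join(label.rjust(width) for label in labels))
    for true_label in labels:
        row = "".join(str(counts[(true_label, pred)]).rjust(width) for pred in labels)
        print(true_label.rjust(width) + row)


# Made-up labels for illustration only.
toy_true = ["Business", "Sci/Tech", "Sci/Tech", "Sports", "World", "Sci/Tech"]
toy_pred = ["Business", "Business", "Sci/Tech", "Sports", "World", "Business"]

toy_counts = confusion_counts(toy_true, toy_pred)
print_confusion(toy_true, toy_pred)
```

In the notebook this could be called as `print_confusion(list(test_df.label), predictions)`.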

