# Initial Testing with Pretrained LLMs (Gemma, Qwen)

## Install packages

In [1]:
%pip install python-dotenv huggingface_hub pandas transformers hf_xet accelerate
%pip install torch torchvision --index-url https://download.pytorch.org/whl/cu129

Note: you may need to restart the kernel to use updated packages.
Looking in indexes: https://download.pytorch.org/whl/cu129


## Import packages

In [12]:
import os
import re

import pandas as pd
from dotenv import load_dotenv
from huggingface_hub import InferenceClient, login

## Log in to Hugging Face

In [10]:
load_dotenv()
hf_token = os.getenv("HF_TOKEN")
login(token=hf_token)

Note: Environment variable `HF_TOKEN` is set and is the current active token independently from the token you've just configured.
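
`os.getenv` returns `None` when the variable is missing, so a typo in `.env` can go unnoticed until the first API call. A minimal sketch of a guard that fails early instead; the helper name `require_env` is hypothetical and not part of the notebook or of `huggingface_hub`:

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or blank."""
    value = os.getenv(name, "").strip()
    if not value:
        # Hypothetical error message; adjust to however secrets are managed locally.
        raise RuntimeError(f"{name} is not set; add it to .env or export it first.")
    return value
```

In the login cell this could be used as `login(token=require_env("HF_TOKEN"))` after `load_dotenv()`.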


## Load dataset (Kaggle)

In [4]:
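# Load the Kaggle review dataset and keep the non-null review texts.
df = pd.read_csv("./data/reviews.csv")
reviews = df["text"].dropna().tolist()
print(df.head())
print(len(df))       # number of rows in the CSV
print(len(reviews))  # number of rows with a non-null "text" value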

                     business_name    author_name  \
0  Haci'nin Yeri - Yigit Lokantasi    Gulsum Akar   
1  Haci'nin Yeri - Yigit Lokantasi  Oguzhan Cetin   
2  Haci'nin Yeri - Yigit Lokantasi     Yasin Kuyu   
3  Haci'nin Yeri - Yigit Lokantasi     Orhan Kapu   
4  Haci'nin Yeri - Yigit Lokantasi     Ozgur Sati   

                                                text  \
0  We went to Marmaris with my wife for a holiday...   
1  During my holiday in Marmaris we ate here to f...   
2  Prices are very affordable. The menu in the ph...   
3  Turkey's cheapest artisan restaurant and its f...   
4  I don't know what you will look for in terms o...   

                                               photo  rating  \
0         dataset/taste/hacinin_yeri_gulsum_akar.png       5   
1        dataset/menu/hacinin_yeri_oguzhan_cetin.png       4   
2  dataset/outdoor_atmosphere/hacinin_yeri_yasin_...       3   
3  dataset/indoor_atmosphere/hacinin_yeri_orhan_k...       5   
4           dataset/menu
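
`dropna()` removes missing values but keeps whitespace-only and repeated texts, each of which would still cost one inference call. A possible extra cleaning pass is sketched below on plain Python lists. The helper `clean_reviews` is hypothetical, and whether duplicates should be dropped at all is an assumption that depends on how the results will be evaluated:

```python
def clean_reviews(texts):
    """Collapse whitespace, then drop empty, non-string, and duplicate reviews (order preserved)."""
    seen = set()
    cleaned = []
    for text in texts:
        if not isinstance(text, str):
            continue  # skips None and NaN floats left over from missing values
        normalized = " ".join(text.split())
        if not normalized or normalized in seen:
            continue
        seen.add(normalized)
        cleaned.append(normalized)
    return cleaned
```

It could be applied as `reviews = clean_reviews(df["text"].tolist())`.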

## Prompt design

In [11]:
prompts = [f"""
Classify the following Google review into one category.
Categories: [Advertisement, Irrelevant Content, Rant Without Visit, Clean]
Respond with only the category name.

Review: {review}
Answer:
""" for review in reviews]
print(prompts[0])


Classify the following Google review into one category.
Categories: [Advertisement, Irrelevant Content, Rant Without Visit, Clean]
Respond with only the category name.

Review: We went to Marmaris with my wife for a holiday. We chose this restaurant as a place for dinner based on the reviews and because we wanted juicy food. When we first went there was a serious queue. You proceed by taking the food you want in the form of an open buffet. Both vegetable dishes and meat dishes were plentiful. There was also dessert for those who wanted it. After you get what you want you pay at the cashier. They don't go through cards they work in cash. There was a lot of food variety. And the food prices were unbelievably cheap. We paid only 84 TL for all the meals here. It included buttermilk and bread. But unfortunately I can't say it's too clean as a place..
Answer:
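
The prompt repeats the category list as literal text, while any later parsing step needs the same list as data. One way to keep the two in sync is sketched below. The names `CATEGORIES` and `build_prompt` are hypothetical, and the wording is copied from the prompt above without the leading and trailing blank lines:

```python
CATEGORIES = ["Advertisement", "Irrelevant Content", "Rant Without Visit", "Clean"]


def build_prompt(review: str) -> str:
    """Build the zero-shot classification prompt for a single review."""
    category_list = ", ".join(CATEGORIES)
    return (
        "Classify the following Google review into one category.\n"
        f"Categories: [{category_list}]\n"
        "Respond with only the category name.\n\n"
        f"Review: {review.strip()}\n"
        "Answer:"
    )
```

With this helper the list comprehension becomes `prompts = [build_prompt(r) for r in reviews]`.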



## Inference pipeline (Qwen3 on cloud)

### Zero-shot inference

In [6]:
client = InferenceClient()

# Send one chat request per prompt; each call returns a ChatCompletionOutput.
results = [
    client.chat_completion(
        model="Qwen/Qwen3-8B",
        messages=[{"role": "user", "content": prompt}],
    )
    for prompt in prompts
]

cleaned = []
for out in results:
    # The reply text is in choices[0].message.content; re.split needs a str,
    # not the ChatCompletionOutput object itself.
    content = out.choices[0].message.content or ""
    # Qwen3 may emit a <think>...</think> block first; keep only what follows it.
    cleaned.append(re.split(r"</think>", content, maxsplit=1)[-1].strip())
print(cleaned)

TypeError: expected string or bytes-like object, got 'ChatCompletionOutput'
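
`re.split` raises this `TypeError` when its second argument is the response object rather than a string; the reply text is in `out.choices[0].message.content`. Once the text is extracted, it still has to be reduced to one of the four labels. A minimal sketch of that step, assuming the model may wrap its answer in a `<think>` block, change the casing, or add punctuation; `strip_reasoning`, `to_category`, and `CATEGORIES` are hypothetical names:

```python
CATEGORIES = ["Advertisement", "Irrelevant Content", "Rant Without Visit", "Clean"]


def strip_reasoning(text: str) -> str:
    """Return the text after the first closing think tag, or the whole text if there is none."""
    return text.split("</think>", 1)[-1].strip()


def to_category(text: str):
    """Map a raw model reply to one of CATEGORIES, or None if it matches none of them."""
    # Remove surrounding punctuation and markup, then compare case-insensitively.
    answer = strip_reasoning(text).strip(" .\"'*\n").casefold()
    for category in CATEGORIES:
        if answer == category.casefold():
            return category
    return None  # caller decides how to handle off-list answers
```

Returning `None` for off-list replies keeps malformed outputs visible instead of silently counting them as a category.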

### Few-shot inference

In [None]:
client = InferenceClient()

# Few-shot examples as alternating user/assistant turns, one per category.
# NOTE: the "Clean" example is the first review in the dataset, so that review
# should be left out of any evaluation of these results.
few_shot = [
    {"role": "user", "content": "I work for them Barre Vt Location"},
    {"role": "assistant", "content": "Advertisement"},
    {"role": "user", "content": "You can review a lake? How does that work"},
    {"role": "assistant", "content": "Irrelevant Content"},
    {"role": "user", "content": "Didn't go here lol"},
    {"role": "assistant", "content": "Rant Without Visit"},
    {"role": "user", "content": "We went to Marmaris with my wife for a holiday. We chose this restaurant as a place for dinner based on the reviews and because we wanted juicy food. When we first went there was a serious queue. You proceed by taking the food you want in the form of an open buffet. Both vegetable dishes and meat dishes were plentiful. There was also dessert for those who wanted it. After you get what you want you pay at the cashier. They don't go through cards they work in cash. There was a lot of food variety. And the food prices were unbelievably cheap. We paid only 84 TL for all the meals here. It included buttermilk and bread. But unfortunately I can't say it's too clean as a place."},
    {"role": "assistant", "content": "Clean"},
]

results = [
    client.chat_completion(
        model="Qwen/Qwen3-8B",
        # The final user turn is the full instruction prompt for the review to classify.
        messages=few_shot + [{"role": "user", "content": prompt}],
    )
    for prompt in prompts
]

cleaned = []
for out in results:
    content = out.choices[0].message.content or ""
    # Drop the <think>...</think> block, if any, and keep the final answer.
    cleaned.append(re.split(r"</think>", content, maxsplit=1)[-1].strip())
print(cleaned)

['Clean']
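
In the few-shot cell, the example turns are bare reviews while the final turn carries the full instruction prompt, and one example is itself a review from the dataset. A possible alternative layout is sketched below: the instruction moves into a system message, every user turn uses the same `Review:` format, and an example identical to the review being classified is skipped. `SYSTEM_PROMPT`, `FEW_SHOT_EXAMPLES`, and `build_messages` are hypothetical names, and the long "Clean" example from the notebook is omitted here for brevity:

```python
SYSTEM_PROMPT = (
    "Classify the following Google review into one category.\n"
    "Categories: [Advertisement, Irrelevant Content, Rant Without Visit, Clean]\n"
    "Respond with only the category name."
)

# (review text, label) pairs taken from the few-shot cell above.
FEW_SHOT_EXAMPLES = [
    ("I work for them Barre Vt Location", "Advertisement"),
    ("You can review a lake? How does that work", "Irrelevant Content"),
    ("Didn't go here lol", "Rant Without Visit"),
]


def build_messages(review, examples=FEW_SHOT_EXAMPLES):
    """Build a chat message list: system instruction, example turns, then the target review."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for text, label in examples:
        if text.strip() == review.strip():
            continue  # do not show the model the label of the review it is classifying
        messages.append({"role": "user", "content": f"Review: {text}"})
        messages.append({"role": "assistant", "content": label})
    messages.append({"role": "user", "content": f"Review: {review}"})
    return messages
```

The result can be passed as `messages=build_messages(review)` to `client.chat_completion`, iterating over `reviews` rather than `prompts`.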
