# Initial Testing with Pretrained LLMs (Gemma, Qwen)

This notebook records our initial tests with pretrained LLMs. Because inference was slow and expensive, we decided to switch to a BERT-based alternative.

## Install packages

In [None]:
%pip install python-dotenv huggingface_hub pandas

## Import packages

In [None]:
import os
from dotenv import load_dotenv
import pandas as pd
from huggingface_hub import login, InferenceClient
import re

## Log in to Hugging Face

Place your Hugging Face token in a `.env` file under the name `HF_TOKEN`, as described in `README.md`.

In [None]:
load_dotenv()
hf_token = os.getenv("HF_TOKEN")
login(token=hf_token)
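
If `HF_TOKEN` is missing from `.env`, `os.getenv` returns `None` and the problem only surfaces later. A minimal sketch of an explicit check follows; `require_env` is a hypothetical helper, not part of the original notebook or of any library.

```python
import os


def require_env(name, env=None):
    """Return the value of an environment variable, or raise a clear error.

    `require_env` is a hypothetical helper for illustration only.
    `env` defaults to os.environ; a plain dict can be passed for testing.
    """
    env = os.environ if env is None else env
    value = env.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file")
    return value
```

In the cell above, `hf_token = require_env("HF_TOKEN")` could replace the `os.getenv` call after `load_dotenv()` has run.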

## Load dataset (Kaggle)

This dataset comes from Kaggle and is used to test the LLM solution. Download it as described in `README.md` and place it in the `./data` directory.

In [None]:
df = pd.read_csv("./data/reviews.csv")
reviews = df["text"].dropna().tolist()
print(df.head())
print(len(df))
print(len(reviews))

## Prompt design

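We used a simple prompt that asks the model to classify each review into one of four classes: the three policy violations (Advertisement, Irrelevant Content, Rant Without Visit) or Clean.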

In [None]:
prompts = [f"""
Classify the following Google review into one category.
Categories: [Advertisement, Irrelevant Content, Rant Without Visit, Clean]
Respond with only the category name.

Review: {review}
Answer:
""" for review in reviews]
print(prompts[0])
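
The same prompt can be built by a small function so that the category list lives in one place. This is a sketch only: `CATEGORIES` and `build_prompt` are hypothetical names, and the output differs from the cell above in that it strips surrounding whitespace from the review and omits the leading and trailing blank lines.

```python
# Category names copied from the prompt above.
CATEGORIES = ["Advertisement", "Irrelevant Content", "Rant Without Visit", "Clean"]


def build_prompt(review, categories=CATEGORIES):
    """Return the classification prompt for a single review (hypothetical helper)."""
    return (
        "Classify the following Google review into one category.\n"
        f"Categories: [{', '.join(categories)}]\n"
        "Respond with only the category name.\n\n"
        f"Review: {review.strip()}\n"
        "Answer:"
    )


example_prompt = build_prompt("  Great food, friendly staff.  ")
```

With this helper, `prompts = [build_prompt(r) for r in reviews]` would produce one prompt per review.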

## Inference pipeline (Qwen3 on cloud)

Running inference locally was too heavy for our machines, so we ran it in the cloud through the Hugging Face inference client. We chose Qwen3 because Gemma had issues with the inference client and Qwen3 is a strong, state-of-the-art LLM.

### Zero-shot inference

Zero-shot inference with Qwen3 takes a long time, because the cell below sends one request per review, one after another.

In [None]:
client = InferenceClient()
results = [client.chat_completion(
    model="Qwen/Qwen3-8B",
    messages=[
        {"role": "user", "content": prompt}
    ]
) for prompt in prompts]
# Qwen3 may emit its reasoning before a closing </think> tag.
# Keep only the text after that tag; if the tag is absent, keep the whole reply.
cleaned = []
for out in results:
    content = out.choices[0].message.content
    parts = re.split(r"</think>", content, maxsplit=1)
    cleaned.append(parts[-1].strip())
print(cleaned)
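
Even after the reasoning is stripped, a reply may differ from the expected label in case or punctuation. The sketch below maps a raw reply to one of the four categories using a simple substring heuristic. `extract_label` and the fallback label `"Unknown"` are assumptions made for illustration and are not part of the original pipeline.

```python
from collections import Counter

# Category names copied from the prompt.
CATEGORIES = ["Advertisement", "Irrelevant Content", "Rant Without Visit", "Clean"]


def extract_label(raw):
    """Map a raw model reply to a category name (hypothetical helper).

    Heuristic: drop everything up to </think>, then return the first
    category whose name appears in the remaining text, ignoring case.
    Returns "Unknown" (an assumed fallback) when nothing matches.
    """
    answer = raw.split("</think>", 1)[-1].strip().lower()
    for category in CATEGORIES:
        if category.lower() in answer:
            return category
    return "Unknown"


# Made-up replies, used only to exercise the helper.
sample_replies = [
    "<think>Could be an Advertisement?</think>\n\nClean",
    "irrelevant content.",
    "???",
]
labels = [extract_label(reply) for reply in sample_replies]
label_counts = Counter(labels)
```

Applied to the notebook, `Counter(extract_label(c) for c in cleaned)` would give a quick view of the predicted class distribution and of how many replies could not be parsed.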

### Few-shot inference

Few-shot inference with Qwen3 also takes a long time. The cell below places one example review per category, with its label, before the review to classify.

In [None]:
client = InferenceClient()
results = [client.chat_completion(
    model="Qwen/Qwen3-8B",
    messages=[
        {"role": "user", "content": "I work for them Barre Vt Location"},
        {"role": "assistant", "content": "Advertisement"},
        {"role": "user", "content": "You can review a lake? How does that work"},
        {"role": "assistant", "content": "Irrelevant Content"},
        {"role": "user", "content": "Didn't go here lol"},
        {"role": "assistant", "content": "Rant Without Visit"},
        {"role": "user", "content": "We went to Marmaris with my wife for a holiday. We chose this restaurant as a place for dinner based on the reviews and because we wanted juicy food. When we first went there was a serious queue. You proceed by taking the food you want in the form of an open buffet. Both vegetable dishes and meat dishes were plentiful. There was also dessert for those who wanted it. After you get what you want you pay at the cashier. They don't go through cards they work in cash. There was a lot of food variety. And the food prices were unbelievably cheap. We paid only 84 TL for all the meals here. It included buttermilk and bread. But unfortunately I can't say it's too clean as a place."},
        {"role": "assistant", "content": "Clean"},
        {"role": "user", "content": prompt}
    ]
) for prompt in prompts]
# Qwen3 may emit its reasoning before a closing </think> tag.
# Keep only the text after that tag; if the tag is absent, keep the whole reply.
cleaned = []
for out in results:
    content = out.choices[0].message.content
    parts = re.split(r"</think>", content, maxsplit=1)
    cleaned.append(parts[-1].strip())
print(cleaned)
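
Both inference cells send one request per review in sequence, which is one reason they are slow. One possible mitigation is to issue requests from a small thread pool. This is a sketch under assumptions: it presumes the inference provider tolerates a few concurrent requests without rate limiting, and `classify_all` and `fake_classify` are hypothetical names. The stand-in function lets the example run without network access.

```python
from concurrent.futures import ThreadPoolExecutor


def classify_all(prompts, classify_one, max_workers=4):
    """Apply classify_one to every prompt concurrently (hypothetical helper).

    Executor.map returns results in the same order as the inputs,
    so results[i] corresponds to prompts[i].
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(classify_one, prompts))


def fake_classify(prompt):
    """Stand-in for a real request so the sketch runs offline."""
    return prompt.upper()


demo_results = classify_all(["a", "b", "c"], fake_classify)
```

In the notebook, `classify_one` would be a function that wraps the `client.chat_completion(...)` call for a single prompt and returns its response.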