# **Single Multitask Pipeline**

**Pipeline:** cleaned dataset (csv) -> dataframe -> classify reviews -> prediction dataframe -> prediction (csv)

# Set up
- Install the packages
- Import dependencies

In [22]:
%pip install pandas numpy matplotlib seaborn httpx tqdm scikit-learn

Note: you may need to restart the kernel to use updated packages.





In [2]:
import ast
import json
import os  # needed by read_prompt to build the prompt file path

import httpx
import pandas as pd
from tqdm import tqdm

# Multi-task Pipeline
This pipeline processes the reviews **one by one**: for every review, the classify function sends one request to the AI model.

Load the dataset

In [None]:
df = pd.read_csv('../data/cleaned_reviews_noempty.csv')
df.head()

Unnamed: 0,store_name,rating,review,reviewer_name
0,49 SEATS,5,wowowow great vibes and food!! super eccentric...,Hannah Eva
1,49 SEATS,4,We had the classic pasta and fish n chips with...,S dssp
2,49 SEATS,5,Its an amazing restaurant with good vibes,Sanjith
3,49 SEATS,5,great atmosphere,Vivian L
4,49 SEATS,5,Great atmosphere and vibes!,Jayden


Prompt setup

In [None]:
# Read the multitask prompt from a .txt file using a helper function
def read_prompt(filename):
    # Input: filename (str) - path to the .txt file containing the prompt
    # Output: prompt (str) - the prompt text read from the file
    file_path = os.path.join("..", "prompts", filename)
    with open(file_path, 'r', encoding='utf-8') as f:
        return f.read()

return_type_prompt = "After classifying, return a dictionary whose keys are Ad, Irr, Rant, and Val, with value 1 for detected and 0 for not detected."
multitask_prompt = read_prompt('few_shot_prompt.txt')
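
The classify function further down fills this prompt with `str.format`, using the placeholders `clean_text`, `rating`, `store_name`, and `reviewer_name`. The sketch below shows how that substitution behaves. The template text is hypothetical, since the contents of `few_shot_prompt.txt` are not shown here. One point worth checking in the real file: any literal braces (for example in a few-shot answer) need to be doubled, otherwise `format` raises a `KeyError`.

```python
# Hypothetical template; the real few_shot_prompt.txt is not shown in this notebook.
template = (
    "Store: {store_name}\n"
    "Rating: {rating}\n"
    "Reviewer: {reviewer_name}\n"
    "Review: {clean_text}\n"
    # Literal braces are doubled so str.format leaves them in place.
    'Example answer: {{"Ad": 0, "Irr": 0, "Rant": 0, "Val": 1}}'
)

filled = template.format(
    clean_text="great atmosphere",
    rating=5,
    store_name="49 SEATS",
    reviewer_name="Vivian L",
)
print(filled)
```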

Classify function:
- Input: dataframe 
- Output: dataframe with predictions

Set `multitask_model_name` inside the function to the desired model:
- for qwen-3-8b: "qwen/qwen3-8b:free"
- for gemma-3-12b: "google/gemma-3-12b-it:free"
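
One possible way to avoid retyping these identifiers is a small lookup table. This is only a sketch: `MODEL_IDS` and `resolve_model` are hypothetical names that do not appear in the notebook, and the two identifiers are the ones listed above.

```python
# Hypothetical lookup from a short name to the model identifier listed above.
MODEL_IDS = {
    "qwen-3-8b": "qwen/qwen3-8b:free",
    "gemma-3-12b": "google/gemma-3-12b-it:free",
}


def resolve_model(short_name):
    """Return the full model identifier for a short name, or raise ValueError."""
    try:
        return MODEL_IDS[short_name]
    except KeyError:
        raise ValueError(
            f"Unknown model {short_name!r}; choose from {sorted(MODEL_IDS)}"
        ) from None


print(resolve_model("gemma-3-12b"))
```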

In [None]:
def classify_df(df, api_key):
    headers = {
        "Authorization": f"Bearer {api_key}",
        "X-Title": "Multitask Review Classification",
        "Content-Type": "application/json"
    }

    multitask_model_name = "google/gemma-3-12b-it:free" # plug the desired model name here

    reviews = df['review'].tolist()

    ad_preds = []
    irrelevant_preds = []
    rant_preds = []
    val_preds = []

    def classify_review(review, rating, store_name, reviewer_name):
        prompt = multitask_prompt + "\n" + return_type_prompt
        user_content = prompt.format(
            clean_text=review,
            rating=rating,
            store_name=store_name,
            reviewer_name=reviewer_name
        )
        data = {
            "model": multitask_model_name,
            "messages": [
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": user_content}
            ]
        }
        with httpx.Client() as client:
            resp = client.post(
                "https://openrouter.ai/api/v1/chat/completions",
                headers=headers,
                json=data,
                timeout=60
            )
            resp.raise_for_status()
            result = resp.json()
            return result["choices"][0]["message"]["content"]

    for idx in tqdm(range(len(reviews)), desc="Classifying reviews"):
        review = reviews[idx]
        # Look rows up by position (iloc) so slices such as df[10:20] also work
        row = df.iloc[idx]
        pred = classify_review(review, row['rating'], row['store_name'], row['reviewer_name'])
        # Extract the dictionary from the pred string
        start = pred.find('{')
        end = pred.rfind('}') + 1
        pred_dict = ast.literal_eval(pred[start:end])

        ad_preds.append(pred_dict.get('Ad', 'NA'))
        irrelevant_preds.append(pred_dict.get('Irr', 'NA'))
        rant_preds.append(pred_dict.get('Rant', 'NA'))
        val_preds.append(pred_dict.get('Val', 'NA'))

    multitask_df = pd.DataFrame({
        'review': reviews,
        'val_pred': val_preds,
        'ad_pred': ad_preds,
        'irrelevant_pred': irrelevant_preds,
        'rant_pred': rant_preds
    })
    multitask_df.to_csv('multitask_predictions.csv', index=False)
    return multitask_df
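
The classify function above passes the model reply straight to `ast.literal_eval`, so a single malformed reply stops the whole batch. Below is a minimal sketch of a more tolerant parser, assuming the reply contains at most one dictionary. `parse_prediction` and `LABELS` are hypothetical names, not part of the original notebook. The `'NA'` fallback mirrors the default already used in `pred_dict.get(...)`.

```python
import ast
import json

LABELS = ("Ad", "Irr", "Rant", "Val")


def parse_prediction(pred):
    """Return a dict with the four labels, using 'NA' where parsing fails."""
    fallback = {label: "NA" for label in LABELS}
    start, end = pred.find("{"), pred.rfind("}") + 1
    if start == -1 or end <= start:
        return fallback  # no dictionary-like text in the reply
    snippet = pred[start:end]
    # Try strict JSON first, then a Python literal (single quotes, etc.).
    for parser in (json.loads, ast.literal_eval):
        try:
            parsed = parser(snippet)
        except (ValueError, SyntaxError, TypeError):
            continue
        if isinstance(parsed, dict):
            return {label: parsed.get(label, "NA") for label in LABELS}
    return fallback


print(parse_prediction('Sure! {"Ad": 0, "Irr": 0, "Rant": 1, "Val": 0}'))
print(parse_prediction("I cannot classify this review."))
```

Inside the loop, `pred_dict = parse_prediction(pred)` could stand in for the three extraction lines, leaving the rest of the function unchanged.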

# To use:


Set up the export function (from dataframe to csv)

In [21]:
def export_df_to_csv(df, filename):
    df.to_csv(filename, index=False)

Assign the API key variable

In [None]:
api_1 = "your_api"  # placeholder: replace with your OpenRouter API key
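
As an alternative to pasting the key into the notebook, it could be read from an environment variable. This is a sketch only: `load_api_key` is a hypothetical helper, and the variable name `OPENROUTER_API_KEY` is an assumption, not something the notebook defines.

```python
import os


def load_api_key(var_name="OPENROUTER_API_KEY"):
    """Read the API key from an environment variable (the name is an assumption)."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key
```

With the variable set in the shell that launches Jupyter, `api_1 = load_api_key()` would then replace the hard-coded string.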

- Batch 1: rows at index 0 to 9

In [20]:
df_0_9 = df[:10]
df_0_9

Unnamed: 0,store_name,rating,review,reviewer_name
0,49 SEATS,5,wowowow great vibes and food!! super eccentric...,Hannah Eva
1,49 SEATS,4,We had the classic pasta and fish n chips with...,S dssp
2,49 SEATS,5,Its an amazing restaurant with good vibes,Sanjith
3,49 SEATS,5,great atmosphere,Vivian L
4,49 SEATS,5,Great atmosphere and vibes!,Jayden
5,49 SEATS,5,Wonderful food and service!,Andy Lim
6,49 SEATS,5,"The atmosphere and service are excellent, enjo...",郑苡萱
7,49 SEATS,2,"Resturant price, hawker center quality (the bl...",s z
8,49 SEATS,5,Always enjoy my meal there. Take note sometime...,AG Lee
9,49 SEATS,5,"Atmosphere here is great, food wise is also aw...",lee shaoxuan


In [22]:
df_0_9_pred = classify_df(df_0_9, api_1)

Classifying reviews: 100%|██████████| 10/10 [01:35<00:00,  9.59s/it]


In [23]:
df_0_9_pred

Unnamed: 0,review,val_pred,ad_pred,irrelevant_pred,rant_pred
0,wowowow great vibes and food!! super eccentric...,1,0,0,0
1,We had the classic pasta and fish n chips with...,1,0,0,0
2,Its an amazing restaurant with good vibes,1,0,0,0
3,great atmosphere,1,0,0,0
4,Great atmosphere and vibes!,1,0,0,0
5,Wonderful food and service!,1,0,0,0
6,"The atmosphere and service are excellent, enjo...",1,0,0,0
7,"Resturant price, hawker center quality (the bl...",1,0,0,0
8,Always enjoy my meal there. Take note sometime...,1,0,0,0
9,"Atmosphere here is great, food wise is also aw...",0,1,0,0


In [24]:
export_df_to_csv(df_0_9_pred, "df_0_9_pred.csv")
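
The batch steps above (slice ten rows, classify, export) could be generalized to later batches with a small generator. This is a sketch under assumptions: `iter_batches` is a hypothetical helper, the batch size of 10 follows the first batch above, and the toy dataframe stands in for the cleaned reviews so the example runs without an API key.

```python
import pandas as pd


def iter_batches(df, batch_size=10):
    """Yield (start, end, batch), with each batch re-indexed from 0."""
    for start in range(0, len(df), batch_size):
        batch = df.iloc[start:start + batch_size].reset_index(drop=True)
        end = start + len(batch) - 1
        yield start, end, batch


# Toy data standing in for the cleaned reviews.
toy = pd.DataFrame({"review": [f"review {i}" for i in range(25)]})

for start, end, batch in iter_batches(toy, batch_size=10):
    # In the notebook, classify_df(batch, api_1) and export_df_to_csv would go here.
    print(f"df_{start}_{end}_pred.csv", len(batch))
```

Resetting the index in each batch keeps row labels aligned with positions, which matters for any code that looks rows up by label.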