## Policy F: Classify the intent of the reviews

Idea: we want a single quantitative score that answers “how likely is this a genuine review?”

**So how?**

__Zero-shot classifier__
- Model: facebook/bart-large-mnli (or MoritzLaurer/deberta-v3-large-zeroshot-v2)
- Provide the candidate labels as plain English strings:
  - “genuine”, “spam”, “advertising”, “competitor attack”, “irrelevant”

__Model returns a probability for each label__
- The score: S(intent) = P("genuine"), which is already on a [0, 1] scale.
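
To make S(intent) concrete: an NLI-based zero-shot classifier scores each label by how strongly the review entails a hypothesis built from that label, and in single-label mode those per-label entailment scores are normalized with a softmax. The sketch below reproduces only that normalization step. The `softmax` helper and the logit values are illustrative assumptions (made-up numbers), not output from facebook/bart-large-mnli.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical entailment logits, one per candidate label (made-up numbers).
labels = ["genuine", "spam", "advertising", "competitor attack", "irrelevant"]
entailment_logits = [2.1, -0.8, 0.4, -0.5, -1.2]

# Pair each label with its normalized probability.
probs: Dict[str, float] = dict(zip(labels, softmax(entailment_logits)))

# S(intent) = P("genuine")
s_intent = probs["genuine"]

print({k: round(v, 3) for k, v in probs.items()})
print("S(intent) =", round(s_intent, 3))
```

Because the probabilities are forced to sum to 1, S(intent) depends on the whole label set: adding or removing a candidate label shifts P("genuine") even for the same review.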

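__“Irrelevancy”: how do we judge that?__

Idea: does the text actually talk about this place?

Ex: let's say the review is about “Baskin-Robbins ice cream” but the location is “Domino's Pizza”.

Cosine similarity between the embedding of the review text and the embeddings of the business name, description, and category. Taking the max means a review counts as relevant if it matches any one of the three:
- sim_name = cos01(emb_text, emb_name)
- sim_desc = cos01(emb_text, emb_desc)
- sim_cat = cos01(emb_text, emb_cat)
- S(relevancy) = max(sim_name, sim_desc, sim_cat)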
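
`cos01` is not defined in this section. A minimal sketch follows, assuming it means cosine similarity rescaled from [-1, 1] to [0, 1] via (cos + 1) / 2 so that it sits on the same scale as S(intent). The toy 3-d vectors stand in for real sentence embeddings, and the zero-vector handling is also an assumption.

```python
import math
from typing import Sequence


def cos01(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity rescaled from [-1, 1] to [0, 1] (assumed definition)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0.0:
        return 0.0  # assumption: a zero vector counts as "no similarity"
    # Clamp to guard against floating-point drift just outside [-1, 1].
    cos = max(-1.0, min(1.0, dot / norm))
    return (cos + 1.0) / 2.0


# Toy vectors standing in for sentence embeddings (made-up numbers).
emb_text = [0.9, 0.1, 0.0]  # review text
emb_name = [0.0, 1.0, 0.0]  # business name
emb_desc = [0.8, 0.2, 0.1]  # business description
emb_cat = [0.1, 0.1, 0.9]   # business category

sim_name = cos01(emb_text, emb_name)
sim_desc = cos01(emb_text, emb_desc)
sim_cat = cos01(emb_text, emb_cat)

# The review is as relevant as its best match among the three fields.
s_relevancy = max(sim_name, sim_desc, sim_cat)
print("S(relevancy) =", round(s_relevancy, 3))
```

Under this rescaling, orthogonal (unrelated) vectors score 0.5 rather than 0, so any threshold on S(relevancy) would need to account for that.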

In [1]:
from typing import Dict, List, Optional, Tuple
import numpy as np
import pandas as pd
from transformers import pipeline
from sentence_transformers import SentenceTransformer


In [None]:
# Configuration
INTENT_LABELS = ["genuine", "spam", "advertising", "competitor attack", "incentivize", "mistaken identity"]

# Zero-shot classifier (ZSC) model
ZSC_MODEL_NAME = "facebook/bart-large-mnli"

# Cached pipeline; stays None until get_zero_shot_pipeline() first builds it
_ZSC_PIPELINE = None



In [None]:
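# Utility functions

def get_zero_shot_pipeline(model_name: str = ZSC_MODEL_NAME):
    """Return a cached zero-shot classification pipeline for `model_name`."""
    global _ZSC_PIPELINE
    # Build on first use, or rebuild when a different model is requested;
    # without the name check, the first-loaded model would be reused for any model_name.
    cached_name = getattr(getattr(_ZSC_PIPELINE, "model", None), "name_or_path", None)
    if _ZSC_PIPELINE is None or cached_name != model_name:
        _ZSC_PIPELINE = pipeline(
            task="zero-shot-classification",
            model=model_name,
        )
    return _ZSC_PIPELINE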


In [None]:
# Intent classification
def score_intent(
    text: str,
    labels: Optional[List[str]] = None,
    model_name: str = ZSC_MODEL_NAME,
) -> Dict[str, float]:
    """Return {label: probability} plus "S_intent" = P("genuine")."""
    use_labels = labels or INTENT_LABELS

    if not text or not text.strip():
        # Edge case: empty text gets probability 0 for every label and for S_intent.
        base = {lbl: 0.0 for lbl in use_labels}
        base["S_intent"] = 0.0
        return base

    zsc = get_zero_shot_pipeline(model_name)
    res = zsc(
        sequences=text,
        candidate_labels=use_labels,
        multi_label=False,  # one distribution over the labels that sums to ~1
    )

    # The pipeline returns labels sorted by score, so pair each score with its label by name.
    scores = dict(zip(res["labels"], res["scores"]))

    # Ensure every requested label is present, in case the model drops one.
    for lbl in use_labels:
        scores.setdefault(lbl, 0.0)

    # S(intent) = P("genuine"); falls back to 0.0 if "genuine" is not among the labels.
    scores["S_intent"] = float(scores.get("genuine", 0.0))
    return scores


def batch_score_intent(
    texts: List[str],
    labels: Optional[List[str]] = None,
    model_name: str = ZSC_MODEL_NAME,
    batch_size: int = 16,
) -> List[Dict[str, float]]:
    """
    Batched intent scoring for throughput. Returns a list of per-text dicts,
    in the same order as `texts`.
    """
    zsc = get_zero_shot_pipeline(model_name)
    use_labels = labels or INTENT_LABELS
    outputs: List[Dict[str, float]] = []

    for i in range(0, len(texts), batch_size):
        chunk = texts[i:i + batch_size]
        res_list = zsc(sequences=chunk, candidate_labels=use_labels, multi_label=False)
        # A single-item chunk can come back as one dict instead of a list.
        if isinstance(res_list, dict):
            res_list = [res_list]
        for res in res_list:
            # Pair each score with its label by name, then fill any missing labels.
            scores = dict(zip(res["labels"], res["scores"]))
            for lbl in use_labels:
                scores.setdefault(lbl, 0.0)
            # S(intent) = P("genuine")
            scores["S_intent"] = float(scores.get("genuine", 0.0))
            outputs.append(scores)
    return outputs
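
`score_intent` short-circuits empty text to all zeros, while `batch_score_intent` sends every string to the model. One way to get matching behavior in batch mode is to score only the non-blank texts and slot the results back by position. The sketch below shows just that bookkeeping. `score_non_empty`, `fake_scorer`, and `LABELS` are hypothetical names, and `fake_scorer` stands in for the real pipeline call so the snippet runs without downloading a model.

```python
from typing import Callable, Dict, List

LABELS = ["genuine", "spam"]  # hypothetical, shortened label set


def score_non_empty(
    texts: List[str],
    scorer: Callable[[List[str]], List[Dict[str, float]]],
    labels: List[str],
) -> List[Dict[str, float]]:
    """Score non-blank texts with `scorer`; blank texts get all-zero scores."""
    zero = {lbl: 0.0 for lbl in labels}
    zero["S_intent"] = 0.0
    # Start with zeros everywhere, then overwrite the positions we can score.
    outputs = [dict(zero) for _ in texts]
    keep = [i for i, t in enumerate(texts) if t and t.strip()]
    if keep:
        scored = scorer([texts[i] for i in keep])
        for i, scores in zip(keep, scored):
            outputs[i] = scores
    return outputs


def fake_scorer(batch: List[str]) -> List[Dict[str, float]]:
    """Stand-in for the model: returns fixed, made-up scores for each text."""
    return [{"genuine": 0.8, "spam": 0.2, "S_intent": 0.8} for _ in batch]


results = score_non_empty(["Great bakery.", "", "   "], fake_scorer, LABELS)
print(results)
```

In the notebook, the `scorer` argument could be a thin wrapper around `batch_score_intent`, which keeps the blank-text rule in one place.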

In [14]:
df = pd.read_csv('/Users/evan/Documents/Projects/TikTok-TechJam-2025/final_data_sampled.csv')

In [15]:
df.head()


| Unnamed: 0.1 | Unnamed: 0 | rating | text | business_name | business_category | business_description | _id |
|---|---|---|---|---|---|---|---|
| 0 | 848694 | 5 | Excellent beach for family activities great su... | 'Ehukai Beach Park | ['Park', 'Public beach', 'Tourist attraction'] | Popular surfing beach offering massive wintert... | 1.1730942640485394e+20_1605375558437 |
| 1 | 848706 | 5 | My favorite Beach for surfing on Oahu North Sh... | 'Ehukai Beach Park | ['Park', 'Public beach', 'Tourist attraction'] | Popular surfing beach offering massive wintert... | 1.1249899958787118e+20_1570685676722 |
| 2 | 848685 | 5 | Usually a parking spot available and a nice sp... | 'Ehukai Beach Park | ['Park', 'Public beach', 'Tourist attraction'] | Popular surfing beach offering massive wintert... | 1.1677373083828122e+20_1618554513347 |
| 3 | 848711 | 5 | Nice small beach. Great place to watch surfers | 'Ehukai Beach Park | ['Park', 'Public beach', 'Tourist attraction'] | Popular surfing beach offering massive wintert... | 1.0664503467931671e+20_1541146996259 |
| 4 | 848700 | 5 | Awesome spot for surfing! | 'Ehukai Beach Park | ['Park', 'Public beach', 'Tourist attraction'] | Popular surfing beach offering massive wintert... | 1.1425088661032362e+20_1612418675718 |


In [21]:
review = df["text"].iloc[0]
print(f"Review: {review}")


# Intent
intent_scores = score_intent(review)
print("Intent scores:", intent_scores)
print("S(intent) =", intent_scores["S_intent"])


Review: Excellent beach for family activities great sunset


Device set to use mps:0


Intent scores: {'genuine': 0.7652190923690796, 'advertising': 0.14800938963890076, 'competitor attack': 0.05454599857330322, 'spam': 0.032225579023361206, 'S_intent': 0.7652190923690796}
S(intent) = 0.7652190923690796


In [None]:
sample_df = df.sample(n=10, random_state=42).reset_index(drop=True).copy()
reviews = sample_df["text"].fillna("").tolist()

intent_scores = batch_score_intent(reviews)
intent_scores_df = pd.DataFrame(intent_scores)
result = pd.concat([sample_df, intent_scores_df], axis=1)
print(result)

   Unnamed: 0  rating                                               text  \
0      617643       5  Good fast Korean food. You can get an extra si...   
1      799373       4  Located in the small ranching town of Waimea o...   
2      188913       3                 It's all the way on the West Coast   
3      763245       4  Great selection of Tequilas and Jo the bartner...   
4      399817       4  Lots of stunning views.  Long hike left at 8:3...   
5      664326       5                            Good is very delicious!   
6      297606       5  Fun mini golf here. They also have a zipline b...   
7      154787       5  Awesome,  compassionate hospitality good food ...   
8     1047180       5  Georgia Peach and Durian,  a new first time fl...   
9      898727       5   Great bakery. Great selection of tasty products.   

                   business_name  \
0             Sam's Delicatessen   
1          Merriman's Big Island   
2         Ewa Pointe Marketplace   
3              Mi A