## Policy F: Classify the intent of the reviews

Idea - we want a single number that answers “how likely is it that this review is genuine?”

**So How?**

__Zero-shot classifier__
- Model: facebook/bart-large-mnli (or MoritzLaurer/deberta-v3-large-zeroshot-v2)
- Provide the candidate labels as plain English strings: “genuine”, “spam”, “advertising”, “competitor attack”, “irrelevant”

__Model returns a probability for each label__
- The score: S(intent) = P("genuine"), which already lies in [0, 1].
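
To make "a probability for each label" concrete: with `multi_label=False`, the Hugging Face zero-shot pipeline (as I understand it) scores each label as an NLI hypothesis and normalizes the per-label entailment logits with a softmax, so the scores sum to 1. The sketch below reproduces only that normalization step. The logits are invented for illustration and `softmax_over_labels` is a hypothetical helper, not a library function.

```python
import math
from typing import Dict


def softmax_over_labels(entail_logits: Dict[str, float]) -> Dict[str, float]:
    """Turn one entailment logit per label into probabilities that sum to 1."""
    # subtract the max logit for numerical stability
    m = max(entail_logits.values())
    exps = {lbl: math.exp(v - m) for lbl, v in entail_logits.items()}
    total = sum(exps.values())
    return {lbl: e / total for lbl, e in exps.items()}


# ASSUMPTION: made-up logits, not real model output
toy_logits = {
    "genuine": 2.0,
    "spam": 0.5,
    "advertising": 0.0,
    "competitor attack": -0.5,
    "irrelevant": -1.0,
}

probs = softmax_over_labels(toy_logits)
s_intent = probs["genuine"]  # S(intent) = P("genuine")
print(round(sum(probs.values()), 6), s_intent)
```

Because the scores compete with each other, adding or removing candidate labels changes P("genuine") even for the same review text.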

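__“Irrelevancy” - How do we judge that?__

Idea - does the text actually talk about this place?

Ex: let's say the review is about “Baskin-Robbins ice cream” but the location is “Domino's Pizza”.

Cosine similarity between the embedding of the review text and the embeddings of the place's name, description, and category, where cos_sim is the raw cosine in [-1, 1] and cos01 maps it to [0, 1]:
- sim_name = cos01(cos_sim(emb_text, emb_name))
- sim_desc = cos01(cos_sim(emb_text, emb_desc))
- sim_cat = cos01(cos_sim(emb_text, emb_cat))
- S(relevancy) = max(sim_name, sim_desc, sim_cat)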
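
A minimal sketch of the relevancy score, assuming the embeddings are already computed. Toy 3-dimensional vectors stand in for real sentence embeddings so the arithmetic is easy to follow, and `score_relevancy` is a hypothetical helper name, not something defined elsewhere in this notebook.

```python
import math
from typing import Dict, Sequence


def cos_sim(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity in [-1, 1]; 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / denom if denom else 0.0


def cos01(x: float) -> float:
    """Map a cosine similarity from [-1, 1] to [0, 1]."""
    return (x + 1.0) / 2.0


def score_relevancy(
    emb_text: Sequence[float],
    emb_fields: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Return sim_<field> for each place field plus S_relevancy = max of them."""
    sims = {f"sim_{k}": cos01(cos_sim(emb_text, v)) for k, v in emb_fields.items()}
    sims["S_relevancy"] = max(sims.values()) if sims else 0.0
    return sims


# ASSUMPTION: toy vectors, chosen so the three cosines are 1, 0 and -1
emb_text = [1.0, 0.0, 0.0]
fields = {
    "name": [1.0, 0.0, 0.0],   # same direction as the review
    "desc": [0.0, 1.0, 0.0],   # orthogonal
    "cat": [-1.0, 0.0, 0.0],   # opposite direction
}

result = score_relevancy(emb_text, fields)
print(result)
# {'sim_name': 1.0, 'sim_desc': 0.5, 'sim_cat': 0.0, 'S_relevancy': 1.0}
```

With a real model, the vectors would come from encoding the review text and each place field with the same embedder, for example `SentenceTransformer.encode`. Taking the max means one strongly matching field is enough to call the review relevant.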

In [9]:
from typing import Dict, List, Optional, Tuple
import numpy as np
import pandas as pd
from transformers import pipeline
from sentence_transformers import SentenceTransformer

In [5]:
# configuration
INTENT_LABELS = ['genuine', 'spam', 'advertising', 'competitor attack', 'irrelevant']

# zero-shot classifier (ZSC) model
ZSC_MODEL_NAME = "facebook/bart-large-mnli"

# lightweight general-purpose embedder
EMBED_MODEL_NAME = "sentence-transformers/all-MiniLM-L6-v2"

# Model holders for initial state
_ZSC_PIPELINE = None
_EMBED_MODEL = None


In [12]:
# utility functions

def _cos_sim(a: np.ndarray, b: np.ndarray) -> float:
    # cosine similarity in [-1, 1]; returns 0.0 if either vector has zero norm
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 0.0
    return float(np.dot(a, b) / denom)


def _cos01(x: float) -> float:
    # map a cosine similarity from [-1, 1] to [0, 1]
    return (x + 1.0) / 2.0

def get_zero_shot_pipeline(model_name: str = ZSC_MODEL_NAME):
    global _ZSC_PIPELINE
    # build the pipeline lazily on first use, then reuse the cached instance
    # note: model_name only takes effect on the first call
    if _ZSC_PIPELINE is None:
        _ZSC_PIPELINE = pipeline(
            task="zero-shot-classification",
            model=model_name,
        )
    return _ZSC_PIPELINE

def get_embedder(model_name: str = EMBED_MODEL_NAME) -> SentenceTransformer:
    global _EMBED_MODEL
    # same lazy caching as above; model_name only takes effect on the first call
    if _EMBED_MODEL is None:
        _EMBED_MODEL = SentenceTransformer(model_name)
    return _EMBED_MODEL

In [22]:
# intent classification
def score_intent(
    text:str,
    labels: Optional[List[str]] = None,
    model_name: str = ZSC_MODEL_NAME,
) -> Dict[str, float]:
    # returns a dict {label: probability}, plus "S_intent" = P("genuine")

    if not text or not text.strip():
        # edge case: empty text gets 0.0 for every label and for S_intent
        base = {lbl: 0.0 for lbl in (labels or INTENT_LABELS)}
        base["S_intent"] = 0.0
        return base
    
    zsc = get_zero_shot_pipeline(model_name)
    use_labels = labels or INTENT_LABELS
    
    res = zsc(
        sequences=text,
        candidate_labels=use_labels,
        multi_label=False,  # scores are normalized across labels and sum to ~1
    )

    # map each returned label to its score (labels come back sorted by score)
    scores = dict(zip(res["labels"], res["scores"]))

    # ensure every requested label is present, in case the pipeline omits one
    for lbl in use_labels:
        scores.setdefault(lbl, 0.0)
    
    # S(intent) = P("genuine")
    scores["S_intent"] = float(scores.get("genuine", 0.0))
    return scores

In [23]:
review = "The pizza was great and delivery was fast!"
place = {
    "name": "Domino's Pizza",
    "desc": "Pizza delivery and carryout chain offering a wide range of pizzas and sides.",
    "cat": "Pizza restaurant"
}

# Intent
intent_scores = score_intent(review)
print("Intent scores:", intent_scores)
print("S(intent) =", intent_scores["S_intent"])


Intent scores: {'genuine': 0.8091768026351929, 'competitor attack': 0.0965869128704071, 'advertising': 0.06775948405265808, 'spam': 0.0159190371632576, 'irrelevant': 0.010557707399129868, 'S_intent': 0.8091768026351929}
S(intent) = 0.8091768026351929


In [24]:
review = "Text ‘JOIN123’ to this number and get exclusive crypto deals."

# Intent
intent_scores = score_intent(review)
print("Intent scores:", intent_scores)
print("S(intent) =", intent_scores["S_intent"])


Intent scores: {'advertising': 0.44572576880455017, 'genuine': 0.3553007245063782, 'competitor attack': 0.13766080141067505, 'spam': 0.03831101208925247, 'irrelevant': 0.02300165593624115, 'S_intent': 0.3553007245063782}
S(intent) = 0.3553007245063782


In [25]:
review = "Make money online fast!! Visit www.cash-now.biz for instant profit!"

intent_scores = score_intent(review)
print("Intent scores:", intent_scores)
print("S(intent) = ", intent_scores["S_intent"])

Intent scores: {'genuine': 0.6330023407936096, 'competitor attack': 0.21709799766540527, 'advertising': 0.08982175588607788, 'spam': 0.03289635479450226, 'irrelevant': 0.027181584388017654, 'S_intent': 0.6330023407936096}
S(intent) =  0.6330023407936096


#### Findings:

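The classifier only sees the review text, so it has no context about the business. In the last example above, an obvious spam message still scored S(intent) ≈ 0.63. We will need to pass the business name, category, and description to the model as well; otherwise it may label off-topic or promotional reviews as "genuine".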
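
One possible way to pass business context to the classifier is the zero-shot pipeline's `hypothesis_template` argument, which is filled with each candidate label via `str.format`. The sketch below only builds such a template. `build_hypothesis_template` and the wording of the template are assumptions for illustration, and whether this actually improves the scores is untested here.

```python
from typing import Dict


def _escape_braces(s: str) -> str:
    # the template is later filled with str.format, so literal braces must be doubled
    return s.replace("{", "{{").replace("}", "}}")


def build_hypothesis_template(place: Dict[str, str]) -> str:
    """Hypothetical helper: embed the place name and category in the NLI hypothesis."""
    name = _escape_braces(place.get("name", "this business"))
    cat = _escape_braces(place.get("cat", "business"))
    # the trailing {} is the slot the pipeline fills with each candidate label
    return f"This review of {name} ({cat}) is {{}}."


place = {"name": "Domino's Pizza", "cat": "Pizza restaurant"}
template = build_hypothesis_template(place)
print(template.format("genuine"))
# This review of Domino's Pizza (Pizza restaurant) is genuine.

# Possible usage with the pipeline from this notebook (not run here):
# zsc = get_zero_shot_pipeline()
# res = zsc(review, candidate_labels=INTENT_LABELS,
#           hypothesis_template=template, multi_label=False)
```

An alternative is to prepend the place name, category, and description to the review text itself, so the premise carries the context instead of the hypothesis. Both variants would need to be compared on labeled reviews.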