<a href="https://colab.research.google.com/github/Yonah18/ML_Learning/blob/main/Hugging_face.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

In [2]:
import math
from transformers import pipeline

# Toy training data: five labelled feedback sentences per class
satisfied = [
    "service was quick and helpful",
    "happy with customer support",
    "support team was polite",
    "good service experience",
    "problem solved quickly"
]

unsatisfied = [
    "very slow response",
    "issue not resolved",
    "poor customer care",
    "worst support ever",
    "not helpful at all"
]

# Count how often each word occurs across all sentences of one class
def word_count(data):
    d = {}
    for s in data:
        for w in s.split():
            d[w] = d.get(w, 0) + 1
    return d

sat_words = word_count(satisfied)
unsat_words = word_count(unsatisfied)

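# Class priors P(class): fraction of training sentences in each class
total = len(satisfied) + len(unsatisfied)
p_sat = len(satisfied) / total
p_unsat = len(unsatisfied) / total

# Vocabulary shared by both classes; its size V is used for add-one smoothing
vocab = set(sat_words) | set(unsat_words)
V = len(vocab)

# Total word tokens per class (denominators of the word likelihoods)
sat_total = sum(sat_words.values())
unsat_total = sum(unsat_words.values())

# Multinomial Naive Bayes prediction, computed in log space to avoid underflow
def predict_nb(text):
    s_score = math.log(p_sat)
    u_score = math.log(p_unsat)

    for w in text.split():
        # Laplace (add-one) smoothed log P(w | class); unseen words get count 0 + 1
        s_score += math.log((sat_words.get(w, 0) + 1) / (sat_total + V))
        u_score += math.log((unsat_words.get(w, 0) + 1) / (unsat_total + V))

    # Ties fall through to "Unsatisfied"
    return "Satisfied" if s_score > u_score else "Unsatisfied"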

# Test sentences (no gold labels: the two models are compared with each other)
tests = [
    "quick and helpful service",
    "poor customer support",
    "support team was polite",
    "very slow response",
    "not helpful at all",
    "happy with the service"
]

print("Naive Bayes:\n")
for t in tests:
    print(f"{t} -> {predict_nb(t)}")

# Hugging Face model: DistilBERT fine-tuned on SST-2 (binary POSITIVE/NEGATIVE)
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english"
)

# Classify all test sentences once; map POSITIVE/NEGATIVE onto the NB labels
hf_results = classifier(tests)
hf_labels = ["Satisfied" if r["label"] == "POSITIVE" else "Unsatisfied" for r in hf_results]

print("\nHugging Face:\n")
for t, r, label in zip(tests, hf_results, hf_labels):
    print(f"{t} -> {label} ({r['score']:.2f})")

# Comparison: reuse the predictions above instead of re-running the model
print("\nComparison:")
print("Feedback".ljust(30), "NB".ljust(12), "HF")

for t, hf in zip(tests, hf_labels):
    print(t.ljust(30), predict_nb(t).ljust(12), hf)


Naive Bayes:

quick and helpful service -> Satisfied
poor customer support -> Unsatisfied
support team was polite -> Satisfied
very slow response -> Unsatisfied
not helpful at all -> Unsatisfied
happy with the service -> Satisfied



Hugging Face:

quick and helpful service -> Satisfied (1.00)
poor customer support -> Unsatisfied (1.00)
support team was polite -> Satisfied (0.82)
very slow response -> Unsatisfied (1.00)
not helpful at all -> Unsatisfied (1.00)
happy with the service -> Satisfied (1.00)

Comparison:
Feedback                       NB           HF
quick and helpful service      Satisfied    Satisfied
poor customer support          Unsatisfied  Unsatisfied
support team was polite        Satisfied    Satisfied
very slow response             Unsatisfied  Unsatisfied
not helpful at all             Unsatisfied  Unsatisfied
happy with the service         Satisfied    Satisfied
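The notebook prints only final labels, so it is not obvious which words drive each Naive Bayes decision. Below is a minimal sketch, assuming the same toy data and add-one smoothing as `predict_nb` above, that breaks one sentence into per-word log-likelihood differences, log P(w | Satisfied) minus log P(w | Unsatisfied). The helper names `log_likelihood` and `explain` are hypothetical, introduced only for this illustration.

```python
import math
from collections import Counter

# Same toy training data as in the notebook cell above.
satisfied = [
    "service was quick and helpful",
    "happy with customer support",
    "support team was polite",
    "good service experience",
    "problem solved quickly",
]
unsatisfied = [
    "very slow response",
    "issue not resolved",
    "poor customer care",
    "worst support ever",
    "not helpful at all",
]

# Per-class word counts (Counter returns 0 for unseen words).
sat_counts = Counter(w for s in satisfied for w in s.split())
unsat_counts = Counter(w for s in unsatisfied for w in s.split())
V = len(set(sat_counts) | set(unsat_counts))  # shared vocabulary size


def log_likelihood(word, counts):
    """Hypothetical helper: add-one (Laplace) smoothed log P(word | class)."""
    return math.log((counts[word] + 1) / (sum(counts.values()) + V))


def explain(text):
    """Hypothetical helper: per-word score differences plus both class scores."""
    total = len(satisfied) + len(unsatisfied)
    s_score = math.log(len(satisfied) / total)
    u_score = math.log(len(unsatisfied) / total)
    rows = []
    for w in text.split():
        s_ll = log_likelihood(w, sat_counts)
        u_ll = log_likelihood(w, unsat_counts)
        s_score += s_ll
        u_score += u_ll
        # Positive difference pushes toward Satisfied, negative toward Unsatisfied.
        rows.append((w, s_ll - u_ll))
    return rows, s_score, u_score


rows, s_score, u_score = explain("poor customer support")
for word, delta in rows:
    print(f"{word:>10}  {delta:+.3f}")
print("Satisfied" if s_score > u_score else "Unsatisfied")
```

For "poor customer support" the final line should read Unsatisfied, as in the notebook output: "poor" (seen only in unsatisfied feedback) gives the largest negative contribution, while "support" (twice in satisfied, once in unsatisfied) pushes toward Satisfied. The breakdown also exposes a quirk of the smoothing: the Satisfied class has more tokens (19 vs 16) over a shared vocabulary of 28 words, so an unseen word such as "the" scores 1/47 there versus 1/44 under Unsatisfied, nudging every sentence slightly toward Unsatisfied.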
