# AI Toxic Content Filter - PhoBERT Production

A system for detecting sensitive Vietnamese-language content using PhoBERT.

**Features:**
- Detects 6 content categories: Safe, Toxic, Hate, Violence, NSFW, Suicide
- Per-category smart thresholds
- Optimized for production (fast inference)

**Model:** vinai/phobert-base-v2 (fine-tuned)

In [3]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


## 1. Install Libraries

In [4]:
!pip install transformers torch pandas -q

print("✓ Required libraries installed!")

✓ Required libraries installed!


## 2. Import Libraries

In [5]:
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from typing import Dict, Tuple
import warnings

warnings.filterwarnings('ignore')

print("✓ Import successful!")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

✓ Import successful!
PyTorch version: 2.9.0+cu126
CUDA available: True


## 3. Class PhoBERT Moderator (Production-Ready)

In [6]:
class PhoBERTModerator:
    """
    Vietnamese toxic-content filter built on PhoBERT.
    Optimized for production with class-level constants and fast inference.
    """

    # Class-level constants: defined once, shared by every instance
    SMART_THRESHOLDS = {
        'suicide': 0.35,   # EXTREMELY SERIOUS - low threshold
        'violence': 0.40,  # Very serious - threats/terrorism
        'nsfw': 0.50,      # 18+ content
        'toxic': 0.60,     # Toxic
        'hate': 0.70,      # Discrimination - high threshold (avoid flagging ordinary debate)
        'safe': 0.30       # Safe (not consulted by predict)
    }

    LABEL_EMOJI = {
        'safe': '✅',
        'toxic': '⚠️',
        'hate': '🚫',
        'violence': '⚔️',
        'nsfw': '🔞',
        'suicide': '💀'
    }

    CUMULATIVE_THRESHOLD = 0.60  # Threshold on the summed negative scores

    def __init__(self, model_path: str = '/content/drive/MyDrive/phobert_vietnamese_moderation'):
        """
        Initialize the PhoBERT model

        Args:
            model_path: Path to the fine-tuned model
        """
        print("Loading PhoBERT model...")

        self.device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
        self.model = AutoModelForSequenceClassification.from_pretrained(model_path)
        self.tokenizer = AutoTokenizer.from_pretrained(model_path)
        self.model.to(self.device)
        self.model.eval()

        # Label mapping
        self.id2label = self.model.config.id2label
        self.label2id = self.model.config.label2id

        print(f"✓ Model loaded on {self.device}")
        print(f"✓ Labels: {list(self.id2label.values())}")

    def predict(self, text: str) -> Dict:
        """
        Predict the label for a text using the smart thresholds

        Args:
            text: Text to check

        Returns:
            Dict containing the analysis result
        """
        # Tokenize
        encoding = self.tokenizer(
            text,
            add_special_tokens=True,
            max_length=256,
            padding='max_length',
            truncation=True,
            return_tensors='pt'
        )

        input_ids = encoding['input_ids'].to(self.device)
        attention_mask = encoding['attention_mask'].to(self.device)

        # Inference
        with torch.no_grad():
            outputs = self.model(input_ids=input_ids, attention_mask=attention_mask)
            logits = outputs.logits
            probs = torch.nn.functional.softmax(logits, dim=1)[0]

        # Score for each label
        label_scores = {self.id2label[i]: float(probs[i]) for i in range(len(probs))}

        # Sum of the negative scores
        negative_labels = ['suicide', 'nsfw', 'violence', 'toxic', 'hate']
        cumulative_negative = sum(label_scores.get(label, 0.0) for label in negative_labels)

        # Apply the smart-threshold logic
        if cumulative_negative > self.CUMULATIVE_THRESHOLD:
            # High total negative score -> pick the highest-scoring negative label
            negative_scores = {label: label_scores[label] for label in negative_labels
                             if label in label_scores}
            final_label = max(negative_scores.items(), key=lambda x: x[1])[0]
            final_confidence = label_scores[final_label]
        else:
            # Check each label in order of severity; the first one that meets its threshold wins
            final_label = 'safe'
            final_confidence = label_scores.get('safe', 0.0)

            for label in negative_labels:
                if label in label_scores:
                    score = label_scores[label]
                    threshold = self.SMART_THRESHOLDS[label]

                    if score >= threshold:
                        final_label = label
                        final_confidence = score
                        break

        # Assess the risk level
        risk_level = self._assess_risk(final_label, final_confidence, cumulative_negative)

        return {
            'text': text,
            'label': final_label,
            'confidence': final_confidence,
            'cumulative_negative': cumulative_negative,
            'all_scores': label_scores,
            'risk_level': risk_level,
            'is_safe': final_label == 'safe'
        }

    def _assess_risk(self, label: str, confidence: float, cumulative: float) -> str:
        """Assess the risk level"""
        if label == 'safe':
            return 'no_risk'
        elif label in ['suicide', 'violence'] or cumulative > 0.75:
            return 'high_risk'
        elif confidence > 0.80 or cumulative > 0.65:
            return 'medium_risk'
        else:
            return 'low_risk'

    def print_result(self, result: Dict):
        """Pretty-print a prediction result"""
        emoji = self.LABEL_EMOJI.get(result['label'], '❓')

        print("=" * 70)
        print(f"Text: {result['text']}")
        print(f"\n{emoji} LABEL: {result['label'].upper()}")
        print(f"Confidence: {result['confidence']:.2%}")
        print(f"Total negative score: {result['cumulative_negative']:.2%}")
        print(f"Risk level: {result['risk_level'].upper()}")
        print(f"Safe: {'YES ✓' if result['is_safe'] else 'NO ✗'}")

        # Show the per-label score breakdown (scores above 5% only)
        print("\nScore details:")
        sorted_scores = sorted(result['all_scores'].items(), key=lambda x: x[1], reverse=True)
        for label, score in sorted_scores:
            if score > 0.05:
                bar = '█' * int(score * 20)
                print(f"  {label:10s}: {score:.2%} {bar}")
        print("=" * 70)

    def batch_predict(self, texts: list) -> list:
        """Predict for multiple texts (each text is run through predict in turn)"""
        results = []
        for text in texts:
            results.append(self.predict(text))
        return results


print("✓ PhoBERTModerator class is ready!")

✓ PhoBERTModerator class is ready!

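The two-stage rule inside `predict` can be exercised without loading the model. The following is a minimal sketch, assuming the thresholds and label order shown in the class above; the function name `decide_label` and the example score dictionaries are hypothetical and not part of the original notebook.

```python
from typing import Dict, Tuple

# Values copied from SMART_THRESHOLDS / CUMULATIVE_THRESHOLD in the class above.
THRESHOLDS = {'suicide': 0.35, 'violence': 0.40, 'nsfw': 0.50, 'toxic': 0.60, 'hate': 0.70}
NEGATIVE_LABELS = ['suicide', 'nsfw', 'violence', 'toxic', 'hate']
CUMULATIVE_THRESHOLD = 0.60


def decide_label(scores: Dict[str, float]) -> Tuple[str, float]:
    """Hypothetical stand-alone version of the label decision in predict()."""
    negative = {label: scores[label] for label in NEGATIVE_LABELS if label in scores}
    cumulative = sum(negative.values())

    # Stage 1: enough negative mass overall -> take the strongest negative label.
    if cumulative > CUMULATIVE_THRESHOLD:
        label = max(negative, key=negative.get)
        return label, negative[label]

    # Stage 2: otherwise the first label (in list order) that meets its own threshold wins.
    for label in NEGATIVE_LABELS:
        if negative.get(label, 0.0) >= THRESHOLDS[label]:
            return label, negative[label]

    return 'safe', scores.get('safe', 0.0)


# 'safe' has the highest score, but suicide meets its low 0.35 threshold.
print(decide_label({'safe': 0.55, 'suicide': 0.40, 'toxic': 0.05}))
# No single label meets its threshold, but the negative scores sum to more than 0.60.
print(decide_label({'safe': 0.35, 'toxic': 0.33, 'hate': 0.32}))
# hate at 0.30 is below its 0.70 threshold and the sum is low, so the text stays safe.
print(decide_label({'safe': 0.70, 'hate': 0.30}))
```

The first case shows why a plain argmax over the softmax scores would not reproduce this behavior: a label can be selected even when `safe` has the highest score.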

## 4. Initialize the Model

In [7]:
# Initialize the model (only once)
moderator = PhoBERTModerator(model_path='/content/drive/MyDrive/phobert_vietnamese_moderation')

print("\n✓ Model is ready to use!")

Loading PhoBERT model...
✓ Model loaded on cuda
✓ Labels: ['safe', 'toxic', 'hate', 'violence', 'nsfw', 'suicide']

✓ Model is ready to use!


## 5. Demo - Testing with Vietnamese Sentences

In [8]:
# Diverse test cases
test_cases = [
    # SAFE
    "Hôm nay trời đẹp quá! Mình đi chơi với bạn bè rất vui",
    "Ảnh đẹp quá bạn ơi! Chúc bạn một ngày tốt lành",

    # TOXIC
    "Mày ngu vãi lồn, đồ ngốc",
    "Địt mẹ mày, cút đi",

    # HATE
    "Bọn đồng tính phải chết hết đi",
    "Miền Bắc ngu hơn miền Nam",

    # VIOLENCE
    "Tao sẽ giết mày bây giờ",
    "Đánh chết mày cho biết tay",

    # NSFW
    "Link sex của em đây anh ơi",
    "Khỏa thân show hàng nè",

    # SUICIDE
    "Cuộc sống vô nghĩa, tôi muốn tự tử",
    "Không muốn sống nữa, chết đi cho rồi"
]

print("=" * 70)
print("DEMO - CHECKING VIETNAMESE CONTENT")
print("=" * 70)

for i, text in enumerate(test_cases, 1):
    print(f"\n[Test {i}/{len(test_cases)}]")
    result = moderator.predict(text)
    moderator.print_result(result)
    print()

DEMO - CHECKING VIETNAMESE CONTENT

[Test 1/12]
Text: Hôm nay trời đẹp quá! Mình đi chơi với bạn bè rất vui

✅ LABEL: SAFE
Confidence: 99.60%
Total negative score: 0.40%
Risk level: NO_RISK
Safe: YES ✓

Score details:
  safe      : 99.60% ███████████████████


[Test 2/12]
Text: Ảnh đẹp quá bạn ơi! Chúc bạn một ngày tốt lành

✅ LABEL: SAFE
Confidence: 99.61%
Total negative score: 0.39%
Risk level: NO_RISK
Safe: YES ✓

Score details:
  safe      : 99.61% ███████████████████


[Test 3/12]
Text: Mày ngu vãi lồn, đồ ngốc

🚫 LABEL: HATE
Confidence: 91.07%
Total negative score: 99.35%
Risk level: HIGH_RISK
Safe: NO ✗

Score details:
  hate      : 91.07% ██████████████████


[Test 4/12]
Text: Địt mẹ mày, cút đi

🚫 LABEL: HATE
Conf

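In the demo above, test 3 is listed under the `# TOXIC` comment but comes back labelled HATE, so reading the printed output is not enough to see how often the model agrees with the intended category. A small harness can tally that automatically. This is a sketch only: `evaluate`, the `(text, expected_label)` pair format, and the stub predictor are assumptions made for illustration and do not appear in the original notebook.

```python
from typing import Callable, Dict, List, Tuple


def evaluate(predict_label: Callable[[str], str],
             labelled_cases: List[Tuple[str, str]]) -> Dict:
    """Compare predicted labels with expected labels and collect the mismatches."""
    mismatches = []
    for text, expected in labelled_cases:
        predicted = predict_label(text)
        if predicted != expected:
            mismatches.append({'text': text, 'expected': expected, 'predicted': predicted})

    total = len(labelled_cases)
    correct = total - len(mismatches)
    return {
        'total': total,
        'correct': correct,
        'accuracy': correct / total if total else 0.0,
        'mismatches': mismatches,
    }


# Stub predictor so the sketch runs without the model; placeholder texts only.
stub_predictions = {'example A': 'safe', 'example B': 'hate', 'example C': 'suicide'}
cases = [('example A', 'safe'), ('example B', 'toxic'), ('example C', 'suicide')]

report = evaluate(lambda text: stub_predictions[text], cases)
print(report['correct'], 'of', report['total'], 'match')
for miss in report['mismatches']:
    print(miss)
```

With the real model, the predictor would be `lambda text: moderator.predict(text)['label']`, and `cases` would pair each entry of `test_cases` with the category named in its comment.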
## 6. Using It in Production

In [9]:
def check_content(text: str) -> bool:
    """
    Simple helper to check a piece of content

    Args:
        text: Text to check

    Returns:
        True if safe, False if unsafe
    """
    result = moderator.predict(text)
    return result['is_safe']


def moderate_post(caption: str) -> dict:
    """
    Check a post and decide what to do with it

    Args:
        caption: Post content

    Returns:
        Dict containing the decision and the reason
    """
    result = moderator.predict(caption)

    decision = {
        'allow': result['is_safe'],
        'action': '',
        'reason': '',
        'label': result['label'],
        'confidence': result['confidence']
    }

    if result['is_safe']:
        decision['action'] = 'APPROVE'
        decision['reason'] = 'Content is safe'
    else:
        risk = result['risk_level']
        if risk == 'high_risk':
            decision['action'] = 'BLOCK'
            decision['reason'] = f'Detected {result["label"]} - blocked automatically'
        elif risk == 'medium_risk':
            decision['action'] = 'REVIEW'
            decision['reason'] = f'Suspected {result["label"]} - needs review'
        else:
            decision['action'] = 'WARNING'
            decision['reason'] = f'Warning for {result["label"]} - remind the user'

    return decision


# Usage demo
print("=" * 70)
print("PRODUCTION USAGE EXAMPLE")
print("=" * 70)

test_posts = [
    "Hôm nay đi du lịch Đà Lạt, view đẹp lắm!",
    "Bọn mày là lũ ngu ngốc, đáng chết!",
    "Cuộc sống quá khó khăn, tôi muốn tự tử"
]

for i, post in enumerate(test_posts, 1):
    print(f"\n--- Post {i} ---")
    print(f"Caption: {post}")

    decision = moderate_post(post)

    print(f"Decision: {decision['action']}")
    print(f"Reason: {decision['reason']}")
    print(f"Label: {decision['label']} ({decision['confidence']:.2%})")
    print("-" * 70)

PRODUCTION USAGE EXAMPLE

--- Post 1 ---
Caption: Hôm nay đi du lịch Đà Lạt, view đẹp lắm!
Decision: APPROVE
Reason: Content is safe
Label: safe (99.57%)
----------------------------------------------------------------------

--- Post 2 ---
Caption: Bọn mày là lũ ngu ngốc, đáng chết!
Decision: BLOCK
Reason: Detected hate - blocked automatically
Label: hate (90.70%)
----------------------------------------------------------------------

--- Post 3 ---
Caption: Cuộc sống quá khó khăn, tôi muốn tự tử
Decision: BLOCK
Reason: Detected suicide - blocked automatically
Label: suicide (98.72%)
----------------------------------------------------------------------

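Post 2 is blocked even though `hate` is not one of the two labels that `_assess_risk` always treats as high risk. The reason is the cumulative condition: a hate score of 90.70% means the summed negative score is at least 0.907, which exceeds 0.75. The sketch below replays that path outside the class. `assess_risk` mirrors `_assess_risk` from the class above; the `ACTIONS` table and `action_for` are hypothetical names introduced here, and the score triples other than the Post 2 lower bound are made up for illustration.

```python
def assess_risk(label: str, confidence: float, cumulative: float) -> str:
    """Stand-alone copy of PhoBERTModerator._assess_risk."""
    if label == 'safe':
        return 'no_risk'
    elif label in ['suicide', 'violence'] or cumulative > 0.75:
        return 'high_risk'
    elif confidence > 0.80 or cumulative > 0.65:
        return 'medium_risk'
    else:
        return 'low_risk'


# Same risk -> action mapping that moderate_post applies with if/elif branches.
ACTIONS = {
    'no_risk': 'APPROVE',
    'high_risk': 'BLOCK',
    'medium_risk': 'REVIEW',
    'low_risk': 'WARNING',
}


def action_for(label: str, confidence: float, cumulative: float) -> str:
    """Hypothetical helper: map one prediction to a moderation action."""
    return ACTIONS[assess_risk(label, confidence, cumulative)]


print(action_for('hate', 0.907, 0.907))    # cumulative > 0.75 -> BLOCK
print(action_for('suicide', 0.40, 0.45))   # always high risk -> BLOCK
print(action_for('hate', 0.58, 0.70))      # cumulative > 0.65 -> REVIEW
print(action_for('toxic', 0.62, 0.62))     # below every cutoff -> WARNING
print(action_for('safe', 0.99, 0.01))      # APPROVE
```

A lookup table keeps the risk-to-action policy in one place, which may be easier to adjust than the branches in `moderate_post`; whether that trade-off is worthwhile depends on how often the policy changes.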

## 7. Batch Prediction Demo (Processing Multiple Texts)

In [10]:
# Process multiple comments in one call
comments = [
    "Ảnh đẹp quá bạn ơi!",
    "Mày là thằng ngu",
    "Hôm nay vui ghê!",
    "Bọn LGBT phải chết hết",
    "Món này ngon quá!"
]

print("=" * 70)
print("BATCH PREDICTION - PROCESSING MULTIPLE COMMENTS")
print("=" * 70)

results = moderator.batch_predict(comments)

# Summarize the results
safe_count = sum(1 for r in results if r['is_safe'])
unsafe_count = len(results) - safe_count

print(f"\nTotal comments: {len(results)}")
print(f"✓ Safe: {safe_count}")
print(f"✗ Unsafe: {unsafe_count}")

print(f"\nDetails:\n")
for i, result in enumerate(results, 1):
    emoji = moderator.LABEL_EMOJI.get(result['label'], '❓')
    status = "✓ SAFE" if result['is_safe'] else "✗ UNSAFE"

    print(f"{i}. {emoji} {result['text'][:50]}...")
    print(f"   {status} | Label: {result['label'].upper()} ({result['confidence']:.0%})")
    print()

print("=" * 70)

BATCH PREDICTION - PROCESSING MULTIPLE COMMENTS

Total comments: 5
✓ Safe: 3
✗ Unsafe: 2

Details:

1. ✅ Ảnh đẹp quá bạn ơi!...
   ✓ SAFE | Label: SAFE (100%)

2. 🚫 Mày là thằng ngu...
   ✗ UNSAFE | Label: HATE (58%)

3. ✅ Hôm nay vui ghê!...
   ✓ SAFE | Label: SAFE (100%)

4. 🚫 Bọn LGBT phải chết hết...
   ✗ UNSAFE | Label: HATE (83%)

5. ✅ Món này ngon quá!...
   ✓ SAFE | Label: SAFE (100%)

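`batch_predict` as written calls `predict` once per text, and `predict` pads every input to 256 tokens, so the model runs one full-length forward pass per comment. One possible alternative is to tokenize a chunk of texts together and run a single forward pass per chunk. The sketch below assumes a `moderator` object exposing `tokenizer`, `model`, `device`, and `id2label` as in the class above; `chunked`, `batch_scores`, and the default `batch_size=32` are hypothetical and untuned.

```python
from typing import Dict, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def batch_scores(moderator, texts: Sequence[str], batch_size: int = 32) -> List[Dict[str, float]]:
    """Hypothetical helper: one forward pass per chunk instead of one per text."""
    import torch  # imported lazily so chunked() stays usable without torch installed

    all_scores = []
    for batch in chunked(texts, batch_size):
        # padding=True pads only to the longest text in this chunk, not to 256 tokens.
        enc = moderator.tokenizer(batch, padding=True, truncation=True,
                                  max_length=256, return_tensors='pt')
        with torch.no_grad():
            logits = moderator.model(
                input_ids=enc['input_ids'].to(moderator.device),
                attention_mask=enc['attention_mask'].to(moderator.device),
            ).logits
        probs = torch.softmax(logits, dim=1).cpu().tolist()
        for row in probs:
            all_scores.append({moderator.id2label[i]: p for i, p in enumerate(row)})
    return all_scores


print(list(chunked(["a", "b", "c", "d", "e"], 2)))
```

A call such as `batch_scores(moderator, comments)` would return one score dictionary per comment. It returns raw softmax scores only; the threshold and risk logic from `predict` would still need to be applied to each dictionary to obtain labels.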
