# Detection of Hate Speech on Croatian Online Portals Using NLP Methods

**Authors:** Duje Jurić, Teo Matošević, Teo Radolović

**Institution:** University of Zagreb, Faculty of Electrical Engineering and Computing

**Course:** Natural Language Processing (Obrada prirodnog jezika)

---

This notebook provides a comprehensive overview of our Croatian hate speech detection project, demonstrating all major components including data exploration, lexicon analysis, model training, evaluation, and interactive demos.

**Pre-trained Models (HuggingFace):**
- BERTić: [TeoMatosevic/croatian-hate-speech-bertic](https://huggingface.co/TeoMatosevic/croatian-hate-speech-bertic)
- XLM-RoBERTa: [TeoMatosevic/croatian-hate-speech-xlm-roberta](https://huggingface.co/TeoMatosevic/croatian-hate-speech-xlm-roberta)
- Baseline: [TeoMatosevic/croatian-hate-speech-baseline](https://huggingface.co/TeoMatosevic/croatian-hate-speech-baseline)

## 1. Setup and Imports

In [None]:
import sys
import os
import json
import warnings
warnings.filterwarnings('ignore')

# Add project root to path
sys.path.insert(0, '..')
os.chdir('..')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter

# Set style
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('husl')

print("Setup complete!")

## 2. Project Overview

### Problem Statement

The proliferation of hate speech on Croatian online platforms presents a significant challenge for content moderation. This project investigates the effectiveness of NLP methods for automated detection of offensive language in Croatian.

### Research Questions

1. **RQ1:** How effective are transformer-based models compared to traditional ML approaches for Croatian hate speech detection?
2. **RQ2:** Can models pre-trained on related South Slavic languages transfer effectively to Croatian offensive language detection?

### Key Results

| Model | Accuracy | F1-Macro | MCC |
|-------|----------|----------|-----|
| TF-IDF + Logistic Regression | 71.6% | 0.711 | 0.423 |
| TF-IDF + SVM | 71.0% | 0.707 | 0.414 |
| XLM-RoBERTa (fine-tuned) | 74.8% | 0.745 | 0.490 |
| **BERTić (fine-tuned)** | **81.3%** | **0.810** | **0.621** |

## 3. Dataset Exploration

We use the FRENK Croatian hate speech dataset containing 10,971 annotated comments from Croatian news portals.

In [None]:
# Load datasets
train_df = pd.read_json('data/processed/frenk_train.jsonl', lines=True)
dev_df = pd.read_json('data/processed/frenk_dev.jsonl', lines=True)
test_df = pd.read_json('data/processed/frenk_test.jsonl', lines=True)

# Combine for full analysis
all_df = pd.concat([train_df, dev_df, test_df], ignore_index=True)

print(f"Dataset Statistics:")
print(f"{'='*40}")
print(f"Training set:   {len(train_df):,} samples")
print(f"Development set: {len(dev_df):,} samples")
print(f"Test set:       {len(test_df):,} samples")
print(f"{'='*40}")
print(f"Total:          {len(all_df):,} samples")

In [None]:
# Show sample data
print("Sample comments from the dataset:")
print("="*60)
all_df[['text', 'label']].head(10)

In [None]:
# Label distribution
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Pie chart
label_counts = all_df['label'].value_counts()
colors = ['#2ecc71', '#e74c3c']
axes[0].pie(label_counts, labels=['Acceptable (ACC)', 'Offensive (OFF)'], 
            autopct='%1.1f%%', colors=colors, explode=(0.02, 0.02),
            shadow=True, startangle=90)
axes[0].set_title('Label Distribution', fontsize=14, fontweight='bold')

# Bar chart by split
split_data = pd.DataFrame({
    'Split': ['Train', 'Train', 'Dev', 'Dev', 'Test', 'Test'],
    'Label': ['ACC', 'OFF', 'ACC', 'OFF', 'ACC', 'OFF'],
    'Count': [
        len(train_df[train_df['label'] == 'ACC']),
        len(train_df[train_df['label'] == 'OFF']),
        len(dev_df[dev_df['label'] == 'ACC']),
        len(dev_df[dev_df['label'] == 'OFF']),
        len(test_df[test_df['label'] == 'ACC']),
        len(test_df[test_df['label'] == 'OFF'])
    ]
})
sns.barplot(data=split_data, x='Split', y='Count', hue='Label', ax=axes[1], palette=colors)
axes[1].set_title('Label Distribution by Split', fontsize=14, fontweight='bold')
axes[1].set_ylabel('Number of Samples')

plt.tight_layout()
plt.show()

print(f"\nClass distribution: ACC={label_counts.get('ACC', 0):,} ({label_counts.get('ACC', 0)/len(all_df)*100:.1f}%), "
      f"OFF={label_counts.get('OFF', 0):,} ({label_counts.get('OFF', 0)/len(all_df)*100:.1f}%)")

In [None]:
# Text length analysis
all_df['text_length'] = all_df['text'].str.len()
all_df['word_count'] = all_df['text'].str.split().str.len()

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Character length distribution
for label, color in zip(['ACC', 'OFF'], colors):
    subset = all_df[all_df['label'] == label]['text_length']
    axes[0].hist(subset, bins=50, alpha=0.6, label=label, color=color)
axes[0].set_xlabel('Character Length')
axes[0].set_ylabel('Frequency')
axes[0].set_title('Text Length Distribution by Class', fontsize=14, fontweight='bold')
axes[0].legend()

# Word count distribution
for label, color in zip(['ACC', 'OFF'], colors):
    subset = all_df[all_df['label'] == label]['word_count']
    axes[1].hist(subset, bins=50, alpha=0.6, label=label, color=color)
axes[1].set_xlabel('Word Count')
axes[1].set_ylabel('Frequency')
axes[1].set_title('Word Count Distribution by Class', fontsize=14, fontweight='bold')
axes[1].legend()

plt.tight_layout()
plt.show()

print(f"\nText Statistics:")
print(f"  Average length: {all_df['text_length'].mean():.1f} characters")
print(f"  Average words:  {all_df['word_count'].mean():.1f} words")

In [None]:
# Sample comments from each class
print("Sample ACCEPTABLE (ACC) Comments:")
print("="*60)
for i, row in all_df[all_df['label'] == 'ACC'].sample(5).iterrows():
    print(f"  - {row['text'][:100]}..." if len(row['text']) > 100 else f"  - {row['text']}")

print("\nSample OFFENSIVE (OFF) Comments:")
print("="*60)
for i, row in all_df[all_df['label'] == 'OFF'].sample(5).iterrows():
    print(f"  - {row['text'][:100]}..." if len(row['text']) > 100 else f"  - {row['text']}")

## 4. Coded Language Lexicon

We developed a lexicon of 32 "dog whistle" terms - seemingly innocuous words used with hidden hateful meanings in Croatian online discourse.

In [None]:
# Load lexicon
with open('data/lexicon/coded_terms.json', 'r', encoding='utf-8') as f:
    lexicon_data = json.load(f)

# Combine main and user-provided terms
all_terms = lexicon_data.get('coded_terms', []) + lexicon_data.get('user_provided_terms', [])

print(f"Lexicon Statistics:")
print(f"{'='*40}")
print(f"Total coded terms: {len(all_terms)}")
print(f"Main terms: {len(lexicon_data.get('coded_terms', []))}")
print(f"User-provided terms: {len(lexicon_data.get('user_provided_terms', []))}")

In [None]:
# Create DataFrame for visualization
terms_df = pd.DataFrame(all_terms)

# Show sample entries
print("\nSample Coded Terms:")
print("="*80)
display_cols = ['term', 'literal_meaning', 'coded_meaning', 'target_group']
available_cols = [c for c in display_cols if c in terms_df.columns]
terms_df[available_cols].head(10)

In [None]:
# Visualize terms by target group
if 'target_group' in terms_df.columns:
    target_counts = terms_df['target_group'].value_counts()
    
    plt.figure(figsize=(10, 6))
    bars = plt.barh(target_counts.index, target_counts.values, color=sns.color_palette('husl', len(target_counts)))
    plt.xlabel('Number of Terms')
    plt.ylabel('Target Group')
    plt.title('Coded Terms by Target Group', fontsize=14, fontweight='bold')
    
    # Add value labels
    for bar, count in zip(bars, target_counts.values):
        plt.text(bar.get_width() + 0.1, bar.get_y() + bar.get_height()/2, 
                 str(count), va='center', fontweight='bold')
    
    plt.tight_layout()
    plt.show()

In [None]:
# Demonstrate lexicon matching
from src.utils.lexicon import CodedTermLexicon

lexicon = CodedTermLexicon('data/lexicon/coded_terms.json')

test_sentences = [
    "Inženjeri opet prave nered u gradu.",
    "Globalisti kontroliraju sve medije.",
    "Ovo je normalan komentar bez kodiranih riječi.",
    "Ovce će primiti sve što im kažu.",
    "Kulturno obogaćenje nam donosi samo probleme."
]

print("Lexicon Matching Demo:")
print("="*70)
for sentence in test_sentences:
    matches = lexicon.find_matches(sentence)
    print(f"\nText: \"{sentence}\"")
    if matches:
        for match in matches:
            print(f"  Found: '{match['term']}' -> {match['coded_meaning']} (Target: {match['target_group']})")
    else:
        print("  No coded terms found.")

## 5. Baseline Models (TF-IDF + Classical ML)

We implement TF-IDF vectorization with Logistic Regression and SVM classifiers as baselines.

**Pre-trained baseline model:** [TeoMatosevic/croatian-hate-speech-baseline](https://huggingface.co/TeoMatosevic/croatian-hate-speech-baseline)

In [None]:
from src.models.baseline import BaselineClassifier
from pathlib import Path

# Check if model exists
baseline_path = Path('checkpoints/baseline/logistic_regression_model.pkl')

if baseline_path.exists():
    print("Loading pre-trained baseline model...")
    baseline = BaselineClassifier.load(str(baseline_path))
    print("Baseline model loaded successfully!")
else:
    print("Training new baseline model...")
    baseline = BaselineClassifier(classifier_type='logistic_regression')
    baseline.fit(train_df['text'].tolist(), train_df['label'].tolist())
    print("Baseline model trained!")

In [None]:
# Evaluate baseline on test set
baseline_results = baseline.evaluate(test_df['text'].tolist(), test_df['label'].tolist())

print("Baseline Model Performance (Test Set):")
print("="*50)
print(f"Accuracy:    {baseline_results.get('accuracy', 0):.1%}")
print(f"F1-Macro:    {baseline_results.get('f1_macro', 0):.3f}")
print(f"F1-Weighted: {baseline_results.get('f1_weighted', 0):.3f}")
print(f"Precision:   {baseline_results.get('precision_macro', 0):.3f}")
print(f"Recall:      {baseline_results.get('recall_macro', 0):.3f}")

In [None]:
# Feature importance - top words for each class
if hasattr(baseline, 'get_feature_importance'):
    importance = baseline.get_feature_importance(top_n=15)
    
    fig, axes = plt.subplots(1, 2, figsize=(14, 6))
    
    for idx, (label, features) in enumerate(importance.items()):
        words = [f[0] for f in features]
        weights = [f[1] for f in features]
        
        color = '#2ecc71' if label == 'ACC' else '#e74c3c'
        axes[idx].barh(words, weights, color=color)
        axes[idx].set_xlabel('Weight')
        axes[idx].set_title(f'Top Words for {label}', fontsize=14, fontweight='bold')
        axes[idx].invert_yaxis()
    
    plt.tight_layout()
    plt.show()

In [None]:
# Confusion matrix for baseline
from sklearn.metrics import confusion_matrix

baseline_preds = baseline.predict(test_df['text'].tolist())
cm = confusion_matrix(test_df['label'].tolist(), baseline_preds, labels=['ACC', 'OFF'])

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=['ACC', 'OFF'], yticklabels=['ACC', 'OFF'])
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Baseline Model Confusion Matrix', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

## 6. BERTić Transformer Model

We fine-tune BERTić (`classla/bcms-bertic`), a BERT model pre-trained on 8 billion tokens of South Slavic text.

**Our fine-tuned model:** [TeoMatosevic/croatian-hate-speech-bertic](https://huggingface.co/TeoMatosevic/croatian-hate-speech-bertic)

To download:
```python
from huggingface_hub import snapshot_download
snapshot_download("TeoMatosevic/croatian-hate-speech-bertic", local_dir="checkpoints/bertic/best_model")
```

In [None]:
print("BERTić Model Configuration:")
print("="*50)
print("Base model:       classla/bcms-bertic")
print("Architecture:     12 layers, 768 hidden dim, 110M params")
print("")
print("Training Configuration:")
print("  Learning rate:  2e-5")
print("  Batch size:     16")
print("  Epochs:         5")
print("  Max length:     256 tokens")
print("  Optimizer:      AdamW")
print("  Warmup ratio:   0.1")

In [None]:
# Load BERTić if available
from src.models.bertic import BERTicTrainer

bertic_path = Path('checkpoints/bertic/best_model')
bertic = None

if bertic_path.exists():
    print("Loading pre-trained BERTić model...")
    try:
        bertic = BERTicTrainer()
        bertic.load(str(bertic_path))
        print("BERTić model loaded successfully!")
    except Exception as e:
        print(f"Could not load BERTić: {e}")
        print("BERTić results will be shown from saved metrics.")
else:
    print("BERTić checkpoint not found.")
    print("Results shown below are from previous training runs.")

In [None]:
# Evaluate BERTić on test set
if bertic is not None:
    print("Evaluating BERTić on test set (this may take a few minutes on CPU)...")
    bertic_results = bertic.evaluate(test_df['text'].tolist(), test_df['label'].tolist())
    report = bertic_results.get('classification_report', {})
    bertic_results['per_class'] = {
        'ACC': {
            'precision': report.get('ACC', {}).get('precision', 0),
            'recall': report.get('ACC', {}).get('recall', 0),
            'f1': report.get('ACC', {}).get('f1-score', 0),
        },
        'OFF': {
            'precision': report.get('OFF', {}).get('precision', 0),
            'recall': report.get('OFF', {}).get('recall', 0),
            'f1': report.get('OFF', {}).get('f1-score', 0),
        }
    }
else:
    print("BERTić not loaded — using saved metrics.")
    bertic_results = {
        'accuracy': 0.8127, 'f1_macro': 0.810, 'f1_weighted': 0.813, 'mcc': 0.621,
        'per_class': {
            'ACC': {'precision': 0.777, 'recall': 0.803, 'f1': 0.790},
            'OFF': {'precision': 0.842, 'recall': 0.820, 'f1': 0.831}
        }
    }

print("\nBERTić Model Performance (Test Set):")
print("="*50)
print(f"Accuracy:    {bertic_results.get('accuracy', 0):.1%}")
print(f"F1-Macro:    {bertic_results.get('f1_macro', 0):.3f}")
print(f"F1-Weighted: {bertic_results.get('f1_weighted', 0):.3f}")
print(f"MCC:         {bertic_results.get('mcc', 0):.3f}")
print("\nPer-Class Performance:")
for cls in ['ACC', 'OFF']:
    pc = bertic_results['per_class'][cls]
    print(f"  {cls}: P={pc['precision']:.3f}, R={pc['recall']:.3f}, F1={pc['f1']:.3f}")

In [None]:
# BERTić per-class metrics visualization
pc = bertic_results['per_class']
metrics_data = {
    'Class': ['ACC', 'ACC', 'ACC', 'OFF', 'OFF', 'OFF'],
    'Metric': ['Precision', 'Recall', 'F1-Score', 'Precision', 'Recall', 'F1-Score'],
    'Value': [
        pc['ACC']['precision'], pc['ACC']['recall'], pc['ACC']['f1'],
        pc['OFF']['precision'], pc['OFF']['recall'], pc['OFF']['f1'],
    ]
}
metrics_df = pd.DataFrame(metrics_data)

plt.figure(figsize=(10, 6))
sns.barplot(data=metrics_df, x='Class', y='Value', hue='Metric', palette='viridis')
plt.ylim(0.7, 0.9)
plt.title('BERTić Per-Class Performance', fontsize=14, fontweight='bold')
plt.ylabel('Score')
plt.legend(title='Metric')
plt.tight_layout()
plt.show()

## 7. XLM-RoBERTa Transformer Model

We fine-tune XLM-RoBERTa (`xlm-roberta-base`), a multilingual transformer pre-trained on 100 languages including Croatian, with 278M parameters.

**Our fine-tuned model:** [TeoMatosevic/croatian-hate-speech-xlm-roberta](https://huggingface.co/TeoMatosevic/croatian-hate-speech-xlm-roberta)

To download:
```python
from huggingface_hub import snapshot_download
snapshot_download("TeoMatosevic/croatian-hate-speech-xlm-roberta", local_dir="checkpoints/xlm_roberta/best_model")
```

In [None]:
print("XLM-RoBERTa Model Configuration:")
print("="*50)
print("Base model:       xlm-roberta-base")
print("Architecture:     12 layers, 768 hidden dim, 278M params")
print("")
print("Training Configuration:")
print("  Learning rate:  2e-5")
print("  Batch size:     16")
print("  Epochs:         5")
print("  Max length:     256 tokens")
print("  Optimizer:      AdamW")
print("  Loss:           Cross-entropy")
print("  Warmup ratio:   0.1")

In [None]:
# Load XLM-RoBERTa if available
from src.models.xlm_roberta import XLMRobertaTrainer

xlm_roberta_path = Path('checkpoints/xlm_roberta/best_model')
xlm_roberta = None

if xlm_roberta_path.exists():
    print("Loading pre-trained XLM-RoBERTa model...")
    try:
        xlm_roberta = XLMRobertaTrainer()
        xlm_roberta.load(str(xlm_roberta_path))
        print("XLM-RoBERTa model loaded successfully!")
    except Exception as e:
        print(f"Could not load XLM-RoBERTa: {e}")
        print("XLM-RoBERTa results will be shown from saved metrics.")
else:
    print("XLM-RoBERTa checkpoint not found.")
    print("Results shown below are from previous training runs.")

In [None]:
# Evaluate XLM-RoBERTa on test set
if xlm_roberta is not None:
    print("Evaluating XLM-RoBERTa on test set (this may take a few minutes on CPU)...")
    xlm_results = xlm_roberta.evaluate(test_df['text'].tolist(), test_df['label'].tolist())
else:
    print("XLM-RoBERTa not loaded — using saved metrics.")
    xlm_results = {
        'accuracy': 0.748, 'f1_macro': 0.745, 'f1_weighted': 0.748, 'mcc': 0.490,
    }

print("\nXLM-RoBERTa Model Performance (Test Set):")
print("="*50)
print(f"Accuracy:    {xlm_results.get('accuracy', 0):.1%}")
print(f"F1-Macro:    {xlm_results.get('f1_macro', 0):.3f}")
print(f"F1-Weighted: {xlm_results.get('f1_weighted', 0):.3f}")
print(f"MCC:         {xlm_results.get('mcc', 0):.3f}")

## 8. Model Comparison

Comparing all models to highlight the improvement from transformer-based approaches.

In [None]:
# Comparison table
comparison_data = {
    'Model': ['Logistic Regression', 'SVM (Linear)', 'XLM-RoBERTa', 'BERTić'],
    'Accuracy': [0.716, 0.710, 0.748, 0.813],
    'F1-Macro': [0.711, 0.707, 0.745, 0.810],
    'F1-Weighted': [0.714, 0.710, 0.748, 0.813],
    'MCC': [0.423, 0.414, 0.490, 0.621]
}
comparison_df = pd.DataFrame(comparison_data)

print("Model Comparison:")
print("="*70)
comparison_df

In [None]:
# Visual comparison
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

models = comparison_df['Model']
x = np.arange(len(models))
width = 0.35

# F1-Macro comparison
bars1 = axes[0].bar(x, comparison_df['F1-Macro'], width, color=['#3498db', '#3498db', '#e67e22', '#2ecc71'])
axes[0].set_ylabel('F1-Macro Score')
axes[0].set_title('F1-Macro Score by Model', fontsize=14, fontweight='bold')
axes[0].set_xticks(x)
axes[0].set_xticklabels(models, rotation=15, ha='right')
axes[0].set_ylim(0.6, 0.9)
for bar, val in zip(bars1, comparison_df['F1-Macro']):
    axes[0].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.005,
                 f'{val:.3f}', ha='center', va='bottom', fontweight='bold')

# MCC comparison
bars2 = axes[1].bar(x, comparison_df['MCC'], width, color=['#3498db', '#3498db', '#e67e22', '#2ecc71'])
axes[1].set_ylabel('MCC Score')
axes[1].set_title("Matthew's Correlation Coefficient by Model", fontsize=14, fontweight='bold')
axes[1].set_xticks(x)
axes[1].set_xticklabels(models, rotation=15, ha='right')
axes[1].set_ylim(0.3, 0.7)
for bar, val in zip(bars2, comparison_df['MCC']):
    axes[1].text(bar.get_x() + bar.get_width()/2, bar.get_height() + 0.005,
                 f'{val:.3f}', ha='center', va='bottom', fontweight='bold')

plt.tight_layout()
plt.show()

## 9. Interactive Demo

Analyze any text with our models and lexicon.

In [ ]:
def analyze_text(text, baseline_model=None, bertic_model=None, xlm_roberta_model=None, lexicon=None):
    """Analyze text with all available models."""
    print("="*70)
    print(f"INPUT: \"{text}\"")
    print("="*70)
    
    # Lexicon analysis
    if lexicon:
        matches = lexicon.find_matches(text)
        print("\n[LEXICON] Coded Terms:")
        if matches:
            for match in matches:
                print(f"  - '{match['term']}' -> {match['coded_meaning']} (Target: {match['target_group']})")
        else:
            print("  No coded terms detected.")
    
    # Baseline prediction
    if baseline_model:
        pred = baseline_model.predict([text])[0]
        proba = baseline_model.predict_proba([text])
        print(f"\n[BASELINE] Prediction: {pred}")
        if proba is not None:
            print(f"  Confidence: {proba[0].max():.1%}")
    
    # XLM-RoBERTa prediction
    if xlm_roberta_model:
        try:
            pred = xlm_roberta_model.predict([text])[0]
            print(f"\n[XLM-RoBERTa] Prediction: {pred}")
        except Exception as e:
            print(f"\n[XLM-RoBERTa] Not available: {e}")
    
    # BERTić prediction
    if bertic_model:
        try:
            pred = bertic_model.predict([text])[0]
            print(f"\n[BERTić] Prediction: {pred}")
        except Exception as e:
            print(f"\n[BERTić] Not available: {e}")
    
    print()

In [None]:
# Demo with example sentences
demo_sentences = [
    "Hvala na lijepom komentaru, slažem se s vama.",
    "Inženjeri opet prave probleme u našem gradu.",
    "Svi političari su lopovi i treba ih zatvoriti.",
    "Globalisti žele uništiti našu kulturu i tradiciju.",
    "Ovo je odlična vijest za Hrvatsku!",
    "Ovce će vjerovati u sve što im mediji kažu."
]

print("Demo Analysis of Sample Sentences:\n")
for sentence in demo_sentences:
    analyze_text(sentence, baseline_model=baseline, bertic_model=bertic,
                 xlm_roberta_model=xlm_roberta, lexicon=lexicon)

## 10. Error Analysis

Examining cases where the model makes mistakes.

In [None]:
# Find misclassified examples (baseline)
test_texts = test_df['text'].tolist()
test_labels = test_df['label'].tolist()
baseline_predictions = baseline.predict(test_texts)

# Create error analysis dataframe
error_df = test_df.copy()
error_df['predicted'] = baseline_predictions
error_df['correct'] = error_df['label'] == error_df['predicted']

misclassified = error_df[~error_df['correct']]
print(f"Baseline Misclassification Analysis:")
print(f"{'='*50}")
print(f"Total test samples: {len(test_df)}")
print(f"Correctly classified: {len(error_df[error_df['correct']])} ({len(error_df[error_df['correct']])/len(test_df)*100:.1f}%)")
print(f"Misclassified: {len(misclassified)} ({len(misclassified)/len(test_df)*100:.1f}%)")

In [None]:
# Show sample misclassified examples
print("\nSample False Positives (ACC predicted as OFF):")
print("-"*60)
false_positives = misclassified[(misclassified['label'] == 'ACC') & (misclassified['predicted'] == 'OFF')]
for i, row in false_positives.head(3).iterrows():
    print(f"  Text: {row['text'][:80]}...")
    print()

print("\nSample False Negatives (OFF predicted as ACC):")
print("-"*60)
false_negatives = misclassified[(misclassified['label'] == 'OFF') & (misclassified['predicted'] == 'ACC')]
for i, row in false_negatives.head(3).iterrows():
    print(f"  Text: {row['text'][:80]}...")
    print()

## 11. Conclusions

### Key Findings

1. **Transformer Superiority**: BERTić significantly outperforms traditional ML baselines, achieving +13.9% F1 improvement over Logistic Regression (0.810 vs 0.711).

2. **Language-Specific vs Multilingual Pre-training**: BERTić (South Slavic pre-training) outperforms XLM-RoBERTa (multilingual pre-training) by a significant margin (F1: 0.810 vs 0.745, McNemar p < 0.001), demonstrating the value of language-specific models for Croatian.

3. **XLM-RoBERTa as Middle Ground**: XLM-RoBERTa achieves +4.8% F1 improvement over baselines (0.745 vs 0.711), offering moderate gains from multilingual transfer learning.

4. **Statistical Significance**: All model differences are statistically significant (McNemar's test, p < 0.05) except Logistic Regression vs SVM (p = 0.497).

5. **Coded Language Detection**: The lexicon of 32 dog whistle terms provides complementary detection for implicit hate speech.

### Practical Applications

- Semi-automated content moderation on Croatian online platforms
- Reducing manual moderation workload while maintaining accuracy
- Detecting implicit hate speech through lexicon integration

### Future Work

- Multi-label classification for hate speech subtypes
- Cross-platform evaluation (social media)
- Integration of lexicon features with neural models
- Explainable AI for transparent moderation decisions

---

**Repository:** https://github.com/TeoMatosevic/slur-analysis-model

**Pre-trained Models (HuggingFace):**
- BERTić: https://huggingface.co/TeoMatosevic/croatian-hate-speech-bertic
- XLM-RoBERTa: https://huggingface.co/TeoMatosevic/croatian-hate-speech-xlm-roberta
- Baseline: https://huggingface.co/TeoMatosevic/croatian-hate-speech-baseline

In [ ]:
print("\n" + "="*60)
print("PROJECT SHOWCASE COMPLETE")
print("="*60)
print("\nThis notebook demonstrated all major components of the")
print("Croatian Hate Speech Detection project.")
print("\nFor more information, see:")
print("  - docs/paper.md (Academic paper)")
print("  - README.md (Project documentation)")
print("  - src/demo.py (Interactive demo script)")
print("\nPre-trained Models (HuggingFace):")
print("  - BERTić: huggingface.co/TeoMatosevic/croatian-hate-speech-bertic")
print("  - XLM-RoBERTa: huggingface.co/TeoMatosevic/croatian-hate-speech-xlm-roberta")
print("  - Baseline: huggingface.co/TeoMatosevic/croatian-hate-speech-baseline")