# 🎯 Layered Analytics Demo

**Multi-Layer Opinion Analytics for Comments about Timnas Indonesia (the Indonesian National Football Team)**

This notebook demonstrates how to:
1. Train a model for each layer (emotion, aspect, toxicity, stance, intent)
2. Run layered predictions
3. Evaluate and save the models
4. Build a comprehensive output that combines all layers

---

## 📚 Setup and Import Libraries

In [None]:
import sys
import os
sys.path.append('../src')
sys.path.append('..')

import pandas as pd
import numpy as np
import yaml
import json
from pathlib import Path

# Import custom modules
from layered_classifier import (
    EmotionClassifier, AspectClassifier, 
    ToxicityClassifier, StanceClassifier, IntentClassifier
)
from utils.layered_utils import (
    create_layered_output_dataframe,
    save_metrics_json,
    create_comprehensive_report,
    multilabel_to_string,
    string_to_multilabel,
    create_layer_summary
)

# Visualization
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("✅ Libraries imported successfully!")

## ⚙️ Load Configuration

In [None]:
# Load config
with open('../config.yaml', 'r') as f:
    config = yaml.safe_load(f)

# Extract layered analytics config
layer_config = config['layered_analytics']

print("🎯 Layered Analytics Configuration:")
print(f"- Enable Emotion: {layer_config['enable_emotion']}")
print(f"- Enable Aspect: {layer_config['enable_aspect']}")
print(f"- Enable Toxicity: {layer_config['enable_toxicity']}")
print(f"- Enable Stance: {layer_config['enable_stance']}")
print(f"- Enable Intent: {layer_config['enable_intent']}")
print(f"\n- Low Confidence Threshold: {layer_config['low_confidence_threshold']}")
print(f"- Toxicity Threshold: {layer_config['toxicity_threshold']}")
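
The cells in this notebook read a fixed set of keys from `config.yaml`, so a missing key only surfaces as a `KeyError` partway through training. A minimal sketch of an up-front check follows. The key names are taken from the code in this notebook, while the helper name `missing_config_keys` and the example values are assumptions made for illustration.

```python
# Keys this notebook reads from config.yaml (names taken from the cells here).
REQUIRED_KEYS = {
    "layered_analytics": [
        "enable_emotion", "enable_aspect", "enable_toxicity",
        "enable_stance", "enable_intent",
        "low_confidence_threshold", "toxicity_threshold",
        "emotion_labels", "aspect_labels", "stance_labels", "intent_labels",
    ],
    "model_parameters": ["max_features", "ngram_range"],
}


def missing_config_keys(config):
    """Return dotted names of required keys absent from a loaded config dict."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        section_values = config.get(section)
        if not isinstance(section_values, dict):
            missing.append(section)  # the whole section is absent
            continue
        missing.extend(
            f"{section}.{key}" for key in keys if key not in section_values
        )
    return missing


# Hypothetical, deliberately incomplete config used only to exercise the check.
example_config = {
    "layered_analytics": {"enable_emotion": True, "low_confidence_threshold": 0.6},
}
print(missing_config_keys(example_config))
```

Calling `missing_config_keys(config)` right after `yaml.safe_load` and stopping when the list is non-empty would surface every missing key at once.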

## 📊 Load and Prepare Labeled Data

**IMPORTANT**: You need to create labeled data first!

1. Export a subset of comments from `data/processed/`
2. Follow the guide in `data/labelled/LABELING_GUIDE.md`
3. Save the labeling results as `labelled_comments.csv`

Expected format:
```
comment_id,text,sentiment,emotions,aspects,toxicity,stance,intent
1,"Pelatih harus diganti!",negatif,"marah,kecewa","pelatih,strategi",non-toxic,kontra,komplain
```
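
As a quick sanity check on the format above, the following sketch parses the example row with the standard `csv` module and confirms that all expected columns are present. The helper name `check_labelled_csv` is hypothetical; the column names come from the header shown above.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "comment_id", "text", "sentiment", "emotions",
    "aspects", "toxicity", "stance", "intent",
]


def check_labelled_csv(handle):
    """Read a labelled CSV and return its rows; raise if a column is missing."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [column for column in EXPECTED_COLUMNS if column not in header]
    if missing:
        raise ValueError(f"Missing columns: {missing}")
    return list(reader)


# The example row from the format description, held in memory instead of a file.
sample = io.StringIO(
    "comment_id,text,sentiment,emotions,aspects,toxicity,stance,intent\n"
    '1,"Pelatih harus diganti!",negatif,"marah,kecewa","pelatih,strategi",'
    "non-toxic,kontra,komplain\n"
)
rows = check_labelled_csv(sample)
print(rows[0]["emotions"])  # marah,kecewa
```

Multi-label cells must be quoted, as in the example row, so that the commas inside them are not read as column separators.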

In [None]:
# Load labeled data
# NOTE: You must create this file first!
labeled_file = '../data/labelled/labelled_comments.csv'

if not os.path.exists(labeled_file):
    print("⚠️  WARNING: File labelled_comments.csv not found!")
    print("📝 Please create labeled data by following the guide in data/labelled/LABELING_GUIDE.md")
    print("\n💡 For this demo, a dummy example dataset will be created instead...")
    
    # Create dummy data for demonstration
    dummy_data = {
        'comment_id': range(1, 21),
        'text': [
            "Pelatih harus diganti, strateginya kacau!",
            "Kapan Indonesia bisa lolos Piala Dunia?",
            "Mantap performanya, terus semangat!",
            "PSSI harus lebih profesional dalam manajemen",
            "Menurut saya latihan harus lebih intensif",
            "Pemain kurang disiplin di lapangan",
            "Wasit memihak lawan, tidak adil!",
            "Supporter Indonesia luar biasa!",
            "Bagaimana cara daftar jadi pemain timnas?",
            "Kecewa berat dengan hasil ini",
            "Semangat terus untuk timnas kita!",
            "Manajemen PSSI perlu evaluasi menyeluruh",
            "Strategi pelatih sudah bagus, tinggal eksekusi",
            "Pemain muda harus diberi kesempatan lebih",
            "Kenapa tidak panggil pemain dari liga eropa?",
            "Garuda di dadaku!",
            "Jangan menyerah, masih ada kesempatan",
            "Federasi harus mendukung penuh timnas",
            "Harusnya latih tanding dengan tim kuat",
            "Bangga dengan perjuangan kalian!"
        ],
        'sentiment': ['negatif', 'netral', 'positif', 'netral', 'netral', 
                     'negatif', 'negatif', 'positif', 'netral', 'negatif',
                     'positif', 'negatif', 'positif', 'netral', 'netral',
                     'positif', 'positif', 'netral', 'netral', 'positif'],
        'emotions': ['marah,kecewa', 'sedih', 'senang,bangga', '', 'sedih',
                    'kecewa', 'marah', 'senang', '', 'kecewa,sedih',
                    'senang', 'kecewa', 'senang', '', '',
                    'bangga', 'senang', '', '', 'bangga,senang'],
        'aspects': ['pelatih,strategi', '', 'pemain', 'PSSI,manajemen', 'strategi',
                   'pemain', 'wasit', 'fanbase', '', '',
                   '', 'PSSI,manajemen', 'pelatih,strategi', 'pemain', '',
                   '', '', 'federasi', 'strategi', ''],
        # NOTE: only one class here; many classifiers need at least two classes to train
        'toxicity': ['non-toxic'] * 20,
        'stance': ['kontra', 'tidak_jelas', 'pro', 'kontra', 'tidak_jelas',
                  'kontra', 'kontra', 'pro', 'tidak_jelas', 'kontra',
                  'pro', 'kontra', 'pro', 'tidak_jelas', 'tidak_jelas',
                  'pro', 'pro', 'tidak_jelas', 'tidak_jelas', 'pro'],
        'intent': ['komplain', 'pertanyaan', 'ajakan', 'saran', 'saran',
                  'komplain', 'komplain', 'informasi', 'pertanyaan', 'komplain',
                  'ajakan', 'komplain', 'informasi', 'saran', 'pertanyaan',
                  'informasi', 'ajakan', 'saran', 'saran', 'informasi']
    }
    
    df = pd.DataFrame(dummy_data)
    print("✅ Dummy data created for demonstration")
else:
    df = pd.read_csv(labeled_file)
    print(f"✅ Loaded {len(df)} labeled comments")

# Display sample
print("\n📋 Sample Data:")
display(df.head())

print(f"\n📊 Dataset Shape: {df.shape}")
print(f"Columns: {list(df.columns)}")

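## 🔄 Prepare Data for Training

Convert the string labels into the format each classifier expects: lists of labels for the multi-label layers (emotion, aspect) and plain label columns for the single-label layers (toxicity, stance, intent).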

In [None]:
# Extract texts
texts = df['text'].tolist()

# Prepare emotion labels (multi-label); strip whitespace so "marah, kecewa" parses cleanly
emotion_labels = df['emotions'].apply(
    lambda x: [label.strip() for label in x.split(',')] if pd.notna(x) and x else []
).tolist()

# Prepare aspect labels (multi-label), parsed the same way
aspect_labels = df['aspects'].apply(
    lambda x: [label.strip() for label in x.split(',')] if pd.notna(x) and x else []
).tolist()

# Single-label columns
toxicity_labels = df['toxicity'].tolist()
stance_labels = df['stance'].tolist()
intent_labels = df['intent'].tolist()

print("✅ Data prepared for training")
print("\n📊 Label Statistics:")
print(f"- Total texts: {len(texts)}")
print(f"- Unique emotions: {set([e for sublist in emotion_labels for e in sublist])}")
print(f"- Unique aspects: {set([a for sublist in aspect_labels for a in sublist])}")
print(f"- Toxicity distribution: {df['toxicity'].value_counts().to_dict()}")
print(f"- Stance distribution: {df['stance'].value_counts().to_dict()}")
print(f"- Intent distribution: {df['intent'].value_counts().to_dict()}")
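
To make the multi-label conversion in the cell above concrete, here is a small pure-Python sketch of how a comma-separated cell becomes a label list and then a binary indicator row, the representation multi-label classifiers typically train on. The helper names and the four-label vocabulary are assumptions for illustration; how the classes in `layered_classifier` encode labels internally is not shown in this notebook.

```python
def parse_multilabel(cell):
    """Split a comma-separated cell into a clean list of labels."""
    if not isinstance(cell, str) or not cell.strip():
        return []  # empty string or missing value (e.g. NaN) means no labels
    return [label.strip() for label in cell.split(",") if label.strip()]


def to_indicator(labels, vocabulary):
    """Encode a label list as a 0/1 row, one position per vocabulary entry."""
    present = set(labels)
    return [1 if label in present else 0 for label in vocabulary]


vocabulary = ["marah", "kecewa", "sedih", "senang"]  # illustrative subset
labels = parse_multilabel("marah, kecewa")
print(labels)                            # ['marah', 'kecewa']
print(to_indicator(labels, vocabulary))  # [1, 1, 0, 0]
```

A comment with no emotion labels maps to an all-zero row, which is why empty cells are turned into empty lists rather than dropped.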

## 🎭 Train Emotion Classifier (Multi-Label)

In [None]:
if layer_config['enable_emotion']:
    print("🎭 Training Emotion Classifier...")
    
    emotion_clf = EmotionClassifier(
        emotion_labels=layer_config['emotion_labels'],
        max_features=config['model_parameters']['max_features'],
        ngram_range=tuple(config['model_parameters']['ngram_range'])
    )
    
    # Train
    emotion_clf.train(texts, emotion_labels)
    
    # Evaluate (NOTE: on the training data itself, so these metrics are optimistic)
    emotion_metrics = emotion_clf.evaluate(texts, emotion_labels)
    
    print("\n📊 Emotion Classifier Metrics:")
    print(f"- Macro Precision: {emotion_metrics['macro_precision']:.4f}")
    print(f"- Macro Recall: {emotion_metrics['macro_recall']:.4f}")
    print(f"- Macro F1: {emotion_metrics['macro_f1']:.4f}")
    print(f"- Hamming Loss: {emotion_metrics['hamming_loss']:.4f}")
    print(f"- Jaccard Score: {emotion_metrics['jaccard_score']:.4f}")
    
    # Save model
    os.makedirs('../models', exist_ok=True)
    emotion_clf.save(
        '../models/emotion_svm.pkl',
        '../models/emotion_vectorizer.pkl',
        '../models/emotion_mlb.pkl'
    )
    save_metrics_json(emotion_metrics, '../models/metrics_emotion.json')
    
    print("✅ Emotion classifier trained and saved!")
else:
    print("⏭️  Emotion layer disabled in config")
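
The cell above evaluates the classifier on the same texts it was trained on, so the reported metrics say little about performance on unseen comments. A minimal sketch of a reproducible hold-out split using only the standard library is shown below. The function name `holdout_split`, the 20% test fraction, and the seed are assumptions; any equivalent splitting utility would serve the same purpose.

```python
import random


def holdout_split(texts, labels, test_fraction=0.2, seed=42):
    """Shuffle indices reproducibly, then split texts and labels into train/test."""
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [texts[i] for i in train_idx], [texts[i] for i in test_idx],
        [labels[i] for i in train_idx], [labels[i] for i in test_idx],
    )


# Toy stand-ins for the notebook's `texts` and `emotion_labels`.
demo_texts = [f"comment {i}" for i in range(20)]
demo_labels = [["senang"] if i % 2 else [] for i in range(20)]
train_x, test_x, train_y, test_y = holdout_split(demo_texts, demo_labels)
print(len(train_x), len(test_x))  # 16 4
```

With such a split, a classifier would be trained on `train_x`/`train_y` and evaluated on `test_x`/`test_y`. With only 20 dummy comments the test set is tiny, so the numbers remain illustrative either way.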

## 🎯 Train Aspect Classifier (Multi-Label)

In [None]:
if layer_config['enable_aspect']:
    print("🎯 Training Aspect Classifier...")
    
    aspect_clf = AspectClassifier(
        aspect_labels=layer_config['aspect_labels'],
        max_features=config['model_parameters']['max_features'],
        ngram_range=tuple(config['model_parameters']['ngram_range'])
    )
    
    # Train
    aspect_clf.train(texts, aspect_labels)
    
    # Evaluate (NOTE: on the training data itself, so these metrics are optimistic)
    aspect_metrics = aspect_clf.evaluate(texts, aspect_labels)
    
    print("\n📊 Aspect Classifier Metrics:")
    print(f"- Macro Precision: {aspect_metrics['macro_precision']:.4f}")
    print(f"- Macro Recall: {aspect_metrics['macro_recall']:.4f}")
    print(f"- Macro F1: {aspect_metrics['macro_f1']:.4f}")
    print(f"- Hamming Loss: {aspect_metrics['hamming_loss']:.4f}")
    print(f"- Jaccard Score: {aspect_metrics['jaccard_score']:.4f}")
    
    # Save model (create the directory in case the emotion layer was disabled)
    os.makedirs('../models', exist_ok=True)
    aspect_clf.save(
        '../models/aspect_svm.pkl',
        '../models/aspect_vectorizer.pkl',
        '../models/aspect_mlb.pkl'
    )
    save_metrics_json(aspect_metrics, '../models/metrics_aspect.json')
    
    print("✅ Aspect classifier trained and saved!")
else:
    print("⏭️  Aspect layer disabled in config")

## ⚠️ Train Toxicity Classifier

In [None]:
if layer_config['enable_toxicity']:
    print("⚠️  Training Toxicity Classifier...")
    
    toxicity_clf = ToxicityClassifier(
        max_features=config['model_parameters']['max_features'],
        ngram_range=tuple(config['model_parameters']['ngram_range'])
    )
    
    # Train
    toxicity_clf.train(texts, toxicity_labels)
    
    # Evaluate (NOTE: on the training data itself, so these metrics are optimistic)
    toxicity_metrics = toxicity_clf.evaluate(texts, toxicity_labels)
    
    print("\n📊 Toxicity Classifier Metrics:")
    print(f"- Accuracy: {toxicity_metrics['accuracy']:.4f}")
    print(f"- Precision: {toxicity_metrics['precision']:.4f}")
    print(f"- Recall: {toxicity_metrics['recall']:.4f}")
    print(f"- F1-Score: {toxicity_metrics['f1_score']:.4f}")
    
    # Save model (create the directory in case earlier layers were disabled)
    os.makedirs('../models', exist_ok=True)
    toxicity_clf.save(
        '../models/toxicity_svm.pkl',
        '../models/toxicity_vectorizer.pkl',
        '../models/toxicity_encoder.pkl'
    )
    save_metrics_json(toxicity_metrics, '../models/metrics_toxicity.json')
    
    print("✅ Toxicity classifier trained and saved!")
else:
    print("⏭️  Toxicity layer disabled in config")

## 💭 Train Stance Classifier

In [None]:
if layer_config['enable_stance']:
    print("💭 Training Stance Classifier...")
    
    stance_clf = StanceClassifier(
        stance_labels=layer_config['stance_labels'],
        max_features=config['model_parameters']['max_features'],
        ngram_range=tuple(config['model_parameters']['ngram_range'])
    )
    
    # Train
    stance_clf.train(texts, stance_labels)
    
    # Evaluate (NOTE: on the training data itself, so these metrics are optimistic)
    stance_metrics = stance_clf.evaluate(texts, stance_labels)
    
    print("\n📊 Stance Classifier Metrics:")
    print(f"- Accuracy: {stance_metrics['accuracy']:.4f}")
    print(f"- Precision: {stance_metrics['precision']:.4f}")
    print(f"- Recall: {stance_metrics['recall']:.4f}")
    print(f"- F1-Score: {stance_metrics['f1_score']:.4f}")
    
    # Save model (create the directory in case earlier layers were disabled)
    os.makedirs('../models', exist_ok=True)
    stance_clf.save(
        '../models/stance_svm.pkl',
        '../models/stance_vectorizer.pkl',
        '../models/stance_encoder.pkl'
    )
    save_metrics_json(stance_metrics, '../models/metrics_stance.json')
    
    print("✅ Stance classifier trained and saved!")
else:
    print("⏭️  Stance layer disabled in config")

## 📝 Train Intent Classifier

In [None]:
if layer_config['enable_intent']:
    print("📝 Training Intent Classifier...")
    
    intent_clf = IntentClassifier(
        intent_labels=layer_config['intent_labels'],
        max_features=config['model_parameters']['max_features'],
        ngram_range=tuple(config['model_parameters']['ngram_range'])
    )
    
    # Train
    intent_clf.train(texts, intent_labels)
    
    # Evaluate (NOTE: on the training data itself, so these metrics are optimistic)
    intent_metrics = intent_clf.evaluate(texts, intent_labels)
    
    print("\n📊 Intent Classifier Metrics:")
    print(f"- Accuracy: {intent_metrics['accuracy']:.4f}")
    print(f"- Precision: {intent_metrics['precision']:.4f}")
    print(f"- Recall: {intent_metrics['recall']:.4f}")
    print(f"- F1-Score: {intent_metrics['f1_score']:.4f}")
    
    # Save model (create the directory in case earlier layers were disabled)
    os.makedirs('../models', exist_ok=True)
    intent_clf.save(
        '../models/intent_svm.pkl',
        '../models/intent_vectorizer.pkl',
        '../models/intent_encoder.pkl'
    )
    save_metrics_json(intent_metrics, '../models/metrics_intent.json')
    
    print("✅ Intent classifier trained and saved!")
else:
    print("⏭️  Intent layer disabled in config")

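## 🔮 Inference: Predict on New Comments

Use the classifiers trained above to predict on new comments. This cell assumes all five layers are enabled in the config; a disabled layer leaves its classifier variable undefined and the corresponding `predict` call will raise a `NameError`.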

In [None]:
# Sample new comments to predict
new_comments = [
    "Pelatih baru harus segera dicari!",
    "Bangga dengan perjuangan kalian, terus berjuang!",
    "Kapan timnas kita bisa juara?",
    "Pemain muda perlu lebih banyak jam terbang",
    "Kecewa dengan manajemen PSSI yang tidak profesional"
]

print(f"🔮 Predicting on {len(new_comments)} new comments...\n")

# Predict from each layer
emotion_pred, emotion_conf = emotion_clf.predict(new_comments)
aspect_pred, aspect_conf = aspect_clf.predict(new_comments)
toxicity_pred, toxicity_scores = toxicity_clf.predict(new_comments)
stance_pred, stance_conf = stance_clf.predict(new_comments)
intent_pred, intent_conf = intent_clf.predict(new_comments)

# For sentiment, we'll use dummy predictions (in real case, load your sentiment model)
sentiment_pred = ['negatif', 'positif', 'netral', 'netral', 'negatif']
sentiment_conf = np.array([0.85, 0.92, 0.68, 0.73, 0.88])

print("✅ Predictions complete!")

## 📊 Create Comprehensive Output DataFrame

In [None]:
# Create comprehensive output
output_df = create_layered_output_dataframe(
    texts=new_comments,
    sentiment_labels=sentiment_pred,
    sentiment_confidence=sentiment_conf,
    emotion_labels=emotion_pred,
    aspect_labels=aspect_pred,
    toxicity_labels=toxicity_pred,
    toxicity_scores=toxicity_scores,
    stance_labels=stance_pred,
    stance_confidence=stance_conf,
    intent_labels=intent_pred,
    intent_confidence=intent_conf,
    low_confidence_threshold=layer_config['low_confidence_threshold']
)

print("📋 Layered Predictions Output:\n")
display(output_df)

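# Save to CSV (create the output directory if it does not exist yet)
output_path = '../data/processed/layered_predictions_demo.csv'
os.makedirs(os.path.dirname(output_path), exist_ok=True)
output_df.to_csv(output_path, index=False, encoding='utf-8')
print(f"\n✅ Output saved to: {output_path}")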
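
The `low_confidence_threshold` passed to `create_layered_output_dataframe` is meant to mark uncertain predictions. One way such a flag could be derived is sketched below, using the dummy sentiment confidences from the inference cell. The function name `flag_low_confidence` and the threshold of 0.7 are assumptions; the actual logic inside `create_layered_output_dataframe` is not shown in this notebook.

```python
def flag_low_confidence(confidences, threshold):
    """Return True for each prediction whose confidence is below the threshold."""
    return [score < threshold for score in confidences]


# Dummy sentiment confidences from the inference cell; 0.7 is an assumed threshold.
sentiment_conf = [0.85, 0.92, 0.68, 0.73, 0.88]
flags = flag_low_confidence(sentiment_conf, threshold=0.7)
print(flags)  # [False, False, True, False, False]
```

Flagged rows are natural candidates for manual review or for the next round of labeling.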

## 📈 Visualize Layer Distributions

In [None]:
# Plot the label distribution of every layer in a 2x3 grid
fig, axes = plt.subplots(2, 3, figsize=(18, 10))

# Sentiment
output_df['sentiment'].value_counts().plot(kind='bar', ax=axes[0, 0], color='skyblue')
axes[0, 0].set_title('Sentiment Distribution')
axes[0, 0].set_xlabel('')

# Toxicity
output_df['toxicity_label'].value_counts().plot(kind='bar', ax=axes[0, 1], color='coral')
axes[0, 1].set_title('Toxicity Distribution')
axes[0, 1].set_xlabel('')

# Stance
output_df['stance'].value_counts().plot(kind='bar', ax=axes[0, 2], color='lightgreen')
axes[0, 2].set_title('Stance Distribution')
axes[0, 2].set_xlabel('')

# Intent
output_df['intent'].value_counts().plot(kind='bar', ax=axes[1, 0], color='plum')
axes[1, 0].set_title('Intent Distribution')
axes[1, 0].set_xlabel('')

# Emotions (count all)
all_emotions = []
for emotions_str in output_df['emotions']:
    if pd.notna(emotions_str) and emotions_str:
        all_emotions.extend(emotions_str.split(','))
        
if all_emotions:
    pd.Series(all_emotions).value_counts().plot(kind='bar', ax=axes[1, 1], color='gold')
    axes[1, 1].set_title('Emotion Distribution (Multi-label)')
    axes[1, 1].set_xlabel('')

# Aspects (count all)
all_aspects = []
for aspects_str in output_df['aspects']:
    if pd.notna(aspects_str) and aspects_str:
        all_aspects.extend(aspects_str.split(','))

if all_aspects:
    pd.Series(all_aspects).value_counts().plot(kind='bar', ax=axes[1, 2], color='orange')
    axes[1, 2].set_title('Aspect Distribution (Multi-label)')
    axes[1, 2].set_xlabel('')

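plt.tight_layout()
# Create the output directory first; savefig does not create missing directories
os.makedirs('../results/visualizations', exist_ok=True)
plt.savefig('../results/visualizations/layered_distributions.png', dpi=300, bbox_inches='tight')
plt.show()

print("✅ Visualizations created and saved!")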

## 📝 Generate Comprehensive Report

In [None]:
# Placeholder sentiment metrics for the report (replace with your real sentiment model's metrics)
sentiment_metrics_dummy = {
    'accuracy': 0.85,
    'precision': 0.83,
    'recall': 0.84,
    'f1_score': 0.83
}

report = create_comprehensive_report(
    sentiment_metrics=sentiment_metrics_dummy,
    emotion_metrics=emotion_metrics if layer_config['enable_emotion'] else None,
    aspect_metrics=aspect_metrics if layer_config['enable_aspect'] else None,
    toxicity_metrics=toxicity_metrics if layer_config['enable_toxicity'] else None,
    stance_metrics=stance_metrics if layer_config['enable_stance'] else None,
    intent_metrics=intent_metrics if layer_config['enable_intent'] else None
)

print(report)

# Save report
report_path = '../results/reports/layered_analytics_report.txt'
os.makedirs('../results/reports', exist_ok=True)
with open(report_path, 'w', encoding='utf-8') as f:
    f.write(report)

print(f"\n✅ Report saved to: {report_path}")

## 🎉 Summary

### What We've Accomplished:

1. ✅ **Trained 5 Layer Classifiers**
   - Emotion (multi-label)
   - Aspect (multi-label)
   - Toxicity (binary)
   - Stance (3-class)
   - Intent (6-class)

2. ✅ **Evaluated Each Layer**
   - Single-label: Accuracy, Precision, Recall, F1
   - Multi-label: Macro metrics, Hamming loss, Jaccard score

3. ✅ **Saved All Artifacts**
   - Models: `models/*_svm.pkl`
   - Vectorizers: `models/*_vectorizer.pkl`
   - Encoders: `models/*_encoder.pkl` / `*_mlb.pkl`
   - Metrics: `models/metrics_*.json`

4. ✅ **Created Comprehensive Output**
   - CSV with all layer predictions
   - Confidence scores per layer
   - Low confidence flags

5. ✅ **Generated Visualizations**
   - Distribution per layer
   - Saved to `results/visualizations/`

6. ✅ **Generated Report**
   - Comprehensive metrics across all layers
   - Saved to `results/reports/`
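
The multi-label metrics listed in step 2 are less familiar than accuracy, so here is a small worked sketch of Hamming loss and a sample-averaged Jaccard score computed by hand on binary indicator rows. The function names are hypothetical, and treating two empty label sets as a perfect match is a convention chosen here. Which averaging the `jaccard_score` value reported by the classifiers uses is not shown in this notebook.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of label slots predicted incorrectly across all samples."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def samples_jaccard(y_true, y_pred):
    """Mean per-sample intersection-over-union of true and predicted label sets."""
    scores = []
    for true_row, pred_row in zip(y_true, y_pred):
        intersection = sum(1 for t, p in zip(true_row, pred_row) if t and p)
        union = sum(1 for t, p in zip(true_row, pred_row) if t or p)
        # Assumed convention: two empty label sets count as a perfect match.
        scores.append(intersection / union if union else 1.0)
    return sum(scores) / len(scores)


# Two comments, three possible labels each (toy indicator rows).
y_true = [[1, 1, 0], [0, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1]]
print(round(hamming_loss(y_true, y_pred), 4))  # 0.3333
print(samples_jaccard(y_true, y_pred))         # 0.5
```

Lower is better for Hamming loss, higher is better for Jaccard: here two of six label slots are wrong, and each comment shares one label out of a union of two.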

### Next Steps:

1. **Label More Data**: Increase training set size (target 300-500 per class)
2. **Tune Thresholds**: Adjust confidence/toxicity thresholds
3. **Integrate with Main Pipeline**: Add layered analytics to production workflow
4. **Deploy**: Create API endpoint or dashboard
5. **Monitor**: Track performance on new data
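
For the first step, it helps to know which classes are furthest from the labeling target. A minimal sketch that counts examples per class and reports the shortfall is given below. The helper name `classes_below_target` is hypothetical, and the target of 300 is the lower bound suggested above.

```python
from collections import Counter


def classes_below_target(labels, target):
    """Map each class with fewer than `target` examples to its shortfall."""
    counts = Counter(labels)
    return {label: target - n for label, n in counts.items() if n < target}


# Toy intent labels standing in for a real labeled column.
demo_intents = ["komplain", "saran", "komplain", "pertanyaan"]
print(classes_below_target(demo_intents, target=300))
```

Classes with no examples at all do not appear in the result, so the output should be compared against the full label list in the config.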

---

**🚀 Your multi-layer analytics system is ready!**