# Advanced Text Classification with Multi-label Analysis

This notebook demonstrates a text classification system with the following features:
- Multi-label text classification
- Aspect-wise sentiment analysis
- Confidence/probability scoring
- Keyword extraction
- Emotion detection
- Text length and tone analysis
- Batch processing capabilities

## 1. Setup and Imports

In [None]:
# Import required libraries
import sys
import os
sys.path.append('../backend')

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter
import warnings
warnings.filterwarnings('ignore')

# Import our custom modules
from model.text_classifier import TextClassifier
from model.aspect_analyzer import AspectAnalyzer
from model.emotion_detector import EmotionDetector
from utils.text_processor import TextProcessor
from utils.keyword_extractor import KeywordExtractor

print("All modules imported successfully!")

## 2. Initialize Models

In [None]:
# Initialize all models
text_classifier = TextClassifier()
aspect_analyzer = AspectAnalyzer()
emotion_detector = EmotionDetector()
text_processor = TextProcessor()
keyword_extractor = KeywordExtractor()

print("Models initialized successfully!")
print("\nAvailable features:")
print("✓ Multi-label Classification")
print("✓ Aspect-wise Sentiment Analysis")
print("✓ Emotion Detection")
print("✓ Text Analysis (Length, Tone, Complexity)")
print("✓ Keyword Extraction")
print("✓ Confidence Scoring")

## 3. Sample Texts for Analysis

In [None]:
# Sample texts for comprehensive analysis
sample_texts = [
    "The acting was absolutely brilliant and the performances were outstanding, but the story was quite boring and predictable.",
    "This movie is a masterpiece! Fantastic direction, beautiful cinematography, and the music score is simply incredible.",
    "Terrible film with poor acting and a confusing plot. The music was annoying and the direction was amateurish.",
    "The cinematography was stunning and the visual effects were amazing, however the pacing was too slow and the dialogue felt unnatural.",
    "I loved the emotional depth of the characters and the storyline was engaging, but the ending was disappointing.",
    "Mediocre movie with average performances. Nothing special but not terrible either. The music was okay."
]

print(f"Loaded {len(sample_texts)} sample texts for analysis")
for i, text in enumerate(sample_texts, 1):
    print(f"\n{i}. {text[:80]}...")

## 4. Complete Text Analysis Example

In [None]:
# Let's analyze one text with all features
test_text = sample_texts[0]
print(f"Analyzing: '{test_text}'")
print("=" * 80)

# 1. Multi-label Classification
sentiment_result = text_classifier.predict_sentiment(test_text)
topic_result = text_classifier.predict_topics(test_text)

print("\n🎯 MULTI-LABEL CLASSIFICATION:")
print(f"Sentiment: {sentiment_result['label']} (Confidence: {sentiment_result['confidence']:.3f})")
print(f"Topics: {', '.join(topic_result)}")

# 2. Aspect-wise Sentiment Analysis
aspect_result = aspect_analyzer.analyze_aspects(test_text)
print("\n🔍 ASPECT-WISE SENTIMENT ANALYSIS:")
for aspect, sentiment in aspect_result.items():
    print(f"{aspect.capitalize()}: {sentiment}")

# 3. Emotion Detection
emotion_result = emotion_detector.detect_emotion(test_text)
print("\n😊 EMOTION DETECTION:")
print(f"Emotion: {emotion_result['label']} (Confidence: {emotion_result['confidence']:.3f})")
print("Emotion Scores:")
for emotion, score in emotion_result['scores'].items():
    print(f"  {emotion.capitalize()}: {score:.3f}")

# 4. Text Analysis
text_analysis = text_processor.analyze_text(test_text)
print("\n📊 TEXT ANALYSIS:")
print(f"Length: {text_analysis['length']} characters")
print(f"Words: {text_analysis['word_count']}")
print(f"Sentences: {text_analysis['sentence_count']}")
print(f"Tone: {text_analysis['tone']}")
print(f"Formality: {text_analysis['formality']}")
print(f"Sentiment Strength: {text_analysis['sentiment_strength']}")
print(f"Complexity: {text_analysis['complexity']}")

# 5. Keyword Extraction
keywords = keyword_extractor.extract_keywords(test_text)
print("\n🔑 KEYWORDS:")
print(f"Extracted keywords: {', '.join(keywords)}")

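The cell above indexes each result by key, so it depends on return shapes that this notebook never spells out. The sketch below writes those assumptions down and checks a result against them. The key lists are read off the `print` statements above. `missing_keys` and `confidence_in_range` are hypothetical helpers, not part of the backend modules, and the 0 to 1 confidence range is an assumption suggested by the `set_ylim(0, 1)` call in the visualization cell.

```python
from typing import Any, Dict, Iterable, List

# Keys read by the analysis cell above (taken from its print statements).
SENTIMENT_KEYS = ("label", "confidence")
EMOTION_KEYS = ("label", "confidence", "scores")
TEXT_ANALYSIS_KEYS = (
    "length", "word_count", "sentence_count",
    "tone", "formality", "sentiment_strength", "complexity",
)


def missing_keys(result: Dict[str, Any], required: Iterable[str]) -> List[str]:
    """Return the required keys that are absent from a result dict."""
    return [key for key in required if key not in result]


def confidence_in_range(result: Dict[str, Any]) -> bool:
    """Check that 'confidence' is a number in [0, 1] (assumed range)."""
    value = result.get("confidence")
    return isinstance(value, (int, float)) and 0.0 <= value <= 1.0


# Hand-written example result, not real model output.
example = {"label": "Positive", "confidence": 0.87}
print(missing_keys(example, SENTIMENT_KEYS))  # []
print(missing_keys(example, EMOTION_KEYS))    # ['scores']
print(confidence_in_range(example))           # True
```

Running a check like `missing_keys(sentiment_result, SENTIMENT_KEYS)` before the `print` calls would turn a bare `KeyError` into a readable list of what the backend did not return.
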
## 5. Batch Analysis Demo

In [None]:
# Analyze all sample texts
results = []

for i, text in enumerate(sample_texts, 1):
    print(f"\nAnalyzing Text {i}...")
    
    # Perform complete analysis
    sentiment_result = text_classifier.predict_sentiment(text)
    topic_result = text_classifier.predict_topics(text)
    aspect_result = aspect_analyzer.analyze_aspects(text)
    emotion_result = emotion_detector.detect_emotion(text)
    text_analysis = text_processor.analyze_text(text)
    keywords = keyword_extractor.extract_keywords(text)
    
    result = {
        'text_id': i,
        'text': text,
        'sentiment': sentiment_result['label'],
        'sentiment_confidence': sentiment_result['confidence'],
        'topics': ', '.join(topic_result),
        'acting_sentiment': aspect_result.get('acting', 'Neutral'),
        'story_sentiment': aspect_result.get('story', 'Neutral'),
        'music_sentiment': aspect_result.get('music', 'Neutral'),
        'direction_sentiment': aspect_result.get('direction', 'Neutral'),
        'emotion': emotion_result['label'],
        'emotion_confidence': emotion_result['confidence'],
        'text_length': text_analysis['length'],
        'tone': text_analysis['tone'],
        'keywords': ', '.join(keywords[:5])  # Top 5 keywords
    }
    
    results.append(result)

# Convert to DataFrame for better visualization
df = pd.DataFrame(results)
print("\n📈 BATCH ANALYSIS RESULTS:")
print(df.to_string(index=False))

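The loop above stops at the first text that makes any analyzer raise. One possible variation, sketched here under assumptions, records the failure on the row and carries on. `analyze_batch` and `fake_analyze` are hypothetical names introduced for illustration. In the notebook, the per-text function would wrap the six backend calls from the cell above.

```python
from typing import Any, Callable, Dict, Iterable, List


def analyze_batch(
    texts: Iterable[str],
    analyze_one: Callable[[str], Dict[str, Any]],
) -> List[Dict[str, Any]]:
    """Apply analyze_one to each text, recording failures instead of stopping."""
    rows = []
    for text_id, text in enumerate(texts, 1):
        row: Dict[str, Any] = {"text_id": text_id, "text": text, "error": None}
        try:
            row.update(analyze_one(text))
        except Exception as exc:  # keep the batch going; store what went wrong
            row["error"] = f"{type(exc).__name__}: {exc}"
        rows.append(row)
    return rows


def fake_analyze(text: str) -> Dict[str, Any]:
    """Stand-in for the real per-text analysis; rejects blank input."""
    if not text.strip():
        raise ValueError("empty text")
    return {"text_length": len(text)}


rows = analyze_batch(["Great film.", "   "], fake_analyze)
for row in rows:
    print(row["text_id"], row.get("text_length"), row["error"])
# 1 11 None
# 2 None ValueError: empty text
```

The resulting list of dicts can be passed to `pd.DataFrame` exactly as `results` is above. Failed rows then show up with an `error` value and missing analysis columns.
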
## 6. Visualizations

In [None]:
# Create visualizations
fig, axes = plt.subplots(2, 2, figsize=(15, 12))
fig.suptitle('Text Classification Analysis Dashboard', fontsize=16, fontweight='bold')

# 1. Sentiment Distribution
sentiment_counts = df['sentiment'].value_counts()
axes[0, 0].pie(sentiment_counts.values, labels=sentiment_counts.index, autopct='%1.1f%%', 
               colors=['#10b981', '#ef4444', '#6b7280'])
axes[0, 0].set_title('Sentiment Distribution')

# 2. Emotion Distribution
emotion_counts = df['emotion'].value_counts()
axes[0, 1].pie(emotion_counts.values, labels=emotion_counts.index, autopct='%1.1f%%',
               colors=['#fbbf24', '#3b82f6', '#ef4444', '#6b7280'])
axes[0, 1].set_title('Emotion Distribution')

# 3. Confidence Scores
axes[1, 0].bar(df['text_id'], df['sentiment_confidence'], color='#3b82f6', alpha=0.7)
axes[1, 0].set_title('Sentiment Confidence Scores')
axes[1, 0].set_xlabel('Text ID')
axes[1, 0].set_ylabel('Confidence')
axes[1, 0].set_ylim(0, 1)

# 4. Text Length Distribution
axes[1, 1].bar(df['text_id'], df['text_length'], color='#10b981', alpha=0.7)
axes[1, 1].set_title('Text Length Distribution')
axes[1, 1].set_xlabel('Text ID')
axes[1, 1].set_ylabel('Characters')

plt.tight_layout()
plt.show()

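The pie charts above pass a fixed color list, while `value_counts()` orders labels by frequency, so the same color can end up on different labels depending on the data. A minimal sketch of one way to pin colors to labels follows. The label names in `SENTIMENT_COLORS` are assumptions about what the classifier returns, and `colors_for` is a hypothetical helper.

```python
from typing import Dict, Iterable, List

# Assumed label names; adjust to whatever the classifier actually returns.
SENTIMENT_COLORS = {
    "Positive": "#10b981",
    "Negative": "#ef4444",
    "Neutral": "#6b7280",
}


def colors_for(
    labels: Iterable[str],
    palette: Dict[str, str],
    default: str = "#9ca3af",
) -> List[str]:
    """Look up a color per label, falling back to a default for unknown labels."""
    return [palette.get(label, default) for label in labels]


print(colors_for(["Negative", "Positive", "Mixed"], SENTIMENT_COLORS))
# ['#ef4444', '#10b981', '#9ca3af']
```

In the sentiment chart this would replace the literal list with `colors=colors_for(sentiment_counts.index, SENTIMENT_COLORS)`.
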
## 7. Aspect-wise Analysis Summary

In [None]:
# Aspect-wise sentiment analysis summary
aspects = ['acting_sentiment', 'story_sentiment', 'music_sentiment', 'direction_sentiment']
aspect_summary = {}

for aspect in aspects:
    aspect_name = aspect.replace('_sentiment', '').capitalize()
    counts = df[aspect].value_counts()
    aspect_summary[aspect_name] = counts

print("📊 ASPECT-WISE SENTIMENT SUMMARY:")
for aspect, counts in aspect_summary.items():
    print(f"\n{aspect}:")
    for sentiment, count in counts.items():
        percentage = (count / len(df)) * 100
        print(f"  {sentiment}: {count} ({percentage:.1f}%)")

# Create aspect comparison chart
fig, ax = plt.subplots(figsize=(12, 8))

aspect_data = []
for aspect in aspects:
    aspect_name = aspect.replace('_sentiment', '').capitalize()
    counts = df[aspect].value_counts()
    for sentiment, count in counts.items():
        aspect_data.append({
            'Aspect': aspect_name,
            'Sentiment': sentiment,
            'Count': count
        })

aspect_df = pd.DataFrame(aspect_data)
pivot_df = aspect_df.pivot(index='Aspect', columns='Sentiment', values='Count').fillna(0)

pivot_df.plot(kind='bar', ax=ax, width=0.8)
ax.set_title('Aspect-wise Sentiment Analysis', fontweight='bold')
ax.set_xlabel('Aspects')
ax.set_ylabel('Count')
ax.legend(title='Sentiment')
ax.tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

## 8. Export Results to CSV

In [None]:
# Save results to CSV
output_filename = 'text_classification_results.csv'
df.to_csv(output_filename, index=False)
print(f"\n💾 Results saved to '{output_filename}'")
print(f"Total texts analyzed: {len(df)}")
print(f"Columns: {list(df.columns)}")

# Display first few rows
print("\n📋 Sample of results:")
print(df.head(3).to_string(index=False))

## 9. Advanced Analysis Examples

In [None]:
# Example 1: Complex mixed sentiment text
complex_text = "The cinematography was absolutely breathtaking and the visual effects were stunning, but the storyline was incredibly boring and the acting felt forced and unnatural."

print("🎬 COMPLEX MIXED SENTIMENT EXAMPLE:")
print(f"Text: '{complex_text}'")
print("=" * 80)

# Analyze
sentiment = text_classifier.predict_sentiment(complex_text)
# Use a separate name: `aspects` already holds the column list from section 7
complex_aspects = aspect_analyzer.analyze_aspects(complex_text)
emotion = emotion_detector.detect_emotion(complex_text)
keywords = keyword_extractor.extract_keywords(complex_text)

print(f"\nOverall Sentiment: {sentiment['label']} (Confidence: {sentiment['confidence']:.3f})")
print("\nAspect-wise Analysis:")
for aspect, sent in complex_aspects.items():
    print(f"  {aspect.capitalize()}: {sent}")
print(f"\nEmotion: {emotion['label']} (Confidence: {emotion['confidence']:.3f})")
print(f"\nKeywords: {', '.join(keywords)}")

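How `AspectAnalyzer` arrives at per-aspect labels is not shown in this notebook. As a rough illustration of why a mixed review can score differently per aspect, the sketch below splits the text at contrast words and punctuation, scores each clause with a tiny word list, and assigns that score to any aspect mentioned in the clause. The word lists, the aspect names, and `clause_aspects` itself are all assumptions made for this example, not the backend's method.

```python
import re
from typing import Dict

# Hypothetical word lists, kept tiny on purpose.
ASPECT_TERMS = {
    "acting": {"acting", "performance", "performances"},
    "story": {"story", "storyline", "plot"},
    "cinematography": {"cinematography", "visual"},
}
POSITIVE = {"breathtaking", "stunning", "brilliant", "outstanding"}
NEGATIVE = {"boring", "forced", "unnatural", "predictable"}

# Split on contrast words and sentence punctuation.
CLAUSE_BREAK = re.compile(r"\b(?:but|however)\b|[.;!?]")


def clause_aspects(text: str) -> Dict[str, str]:
    """Label each mentioned aspect with the sentiment of its clause."""
    result: Dict[str, str] = {}
    for clause in CLAUSE_BREAK.split(text.lower()):
        words = set(re.findall(r"[a-z']+", clause))
        score = len(words & POSITIVE) - len(words & NEGATIVE)
        label = "Positive" if score > 0 else "Negative" if score < 0 else "Neutral"
        for aspect, terms in ASPECT_TERMS.items():
            if words & terms:
                result[aspect] = label
    return result


example_text = (
    "The cinematography was absolutely breathtaking and the visual effects "
    "were stunning, but the storyline was incredibly boring and the acting "
    "felt forced and unnatural."
)
print(clause_aspects(example_text))
# {'cinematography': 'Positive', 'acting': 'Negative', 'story': 'Negative'}
```

A single overall label has to collapse these opposing clauses into one value, which is the gap aspect-wise analysis is meant to fill.
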
In [None]:
# Example 2: Text with strong emotions
emotional_text = "I am absolutely furious and devastated by this terrible movie! It's disgusting and pathetic! I hate everything about it!"

print("😡 STRONG EMOTION EXAMPLE:")
print(f"Text: '{emotional_text}'")
print("=" * 80)

# Analyze
sentiment = text_classifier.predict_sentiment(emotional_text)
emotion = emotion_detector.detect_emotion(emotional_text)
text_analysis = text_processor.analyze_text(emotional_text)

print(f"\nSentiment: {sentiment['label']} (Confidence: {sentiment['confidence']:.3f})")
print(f"Emotion: {emotion['label']} (Confidence: {emotion['confidence']:.3f})")
print(f"Sentiment Strength: {text_analysis['sentiment_strength']}")
print(f"Tone: {text_analysis['tone']}")
print(f"Formality: {text_analysis['formality']}")

## 10. Summary Statistics

In [None]:
# Summary statistics over the batch results
# (these describe the model outputs; no ground-truth labels are used)
print("📈 SUMMARY STATISTICS:")
print("=" * 50)

# Confidence statistics
avg_sentiment_confidence = df['sentiment_confidence'].mean()
avg_emotion_confidence = df['emotion_confidence'].mean()

print("\n📊 Confidence Scores:")
print(f"  Average Sentiment Confidence: {avg_sentiment_confidence:.3f}")
print(f"  Average Emotion Confidence: {avg_emotion_confidence:.3f}")

# Text statistics
avg_text_length = df['text_length'].mean()
min_text_length = df['text_length'].min()
max_text_length = df['text_length'].max()

print("\n📝 Text Statistics:")
print(f"  Average Text Length: {avg_text_length:.1f} characters")
print(f"  Min Text Length: {min_text_length} characters")
print(f"  Max Text Length: {max_text_length} characters")

# Topic distribution
all_topics = []
for topics in df['topics']:
    # Skip empty entries so a text with no topics does not add a blank label
    all_topics.extend(topic.strip() for topic in topics.split(',') if topic.strip())

topic_counts = Counter(all_topics)
print("\n🏷️ Topic Distribution:")
for topic, count in topic_counts.most_common():
    print(f"  {topic}: {count}")

# Tone distribution
tone_counts = df['tone'].value_counts()
print("\n🎭 Tone Distribution:")
for tone, count in tone_counts.items():
    percentage = (count / len(df)) * 100
    print(f"  {tone}: {count} ({percentage:.1f}%)")

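Averages can hide individual weak predictions. A small sketch of one way to use the confidence columns is to list the texts whose scores fall below a cutoff so they can be reviewed by hand. The 0.6 threshold and the `flag_low_confidence` helper are assumptions chosen for illustration, and the rows below are made up rather than taken from a real run.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple

CONFIDENCE_COLUMNS = ("sentiment_confidence", "emotion_confidence")


def flag_low_confidence(
    rows: Iterable[Dict[str, Any]],
    threshold: float = 0.6,
    columns: Sequence[str] = CONFIDENCE_COLUMNS,
) -> List[Tuple[int, List[str]]]:
    """Return (text_id, low_columns) for rows with any score under threshold."""
    flagged = []
    for row in rows:
        low = [column for column in columns if row[column] < threshold]
        if low:
            flagged.append((row["text_id"], low))
    return flagged


# Made-up rows in the same shape as the batch results.
sample_rows = [
    {"text_id": 1, "sentiment_confidence": 0.91, "emotion_confidence": 0.55},
    {"text_id": 2, "sentiment_confidence": 0.80, "emotion_confidence": 0.75},
    {"text_id": 3, "sentiment_confidence": 0.40, "emotion_confidence": 0.30},
]
print(flag_low_confidence(sample_rows))
# [(1, ['emotion_confidence']), (3, ['sentiment_confidence', 'emotion_confidence'])]
```

With the batch DataFrame, the same call would be `flag_low_confidence(df.to_dict('records'))`.
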
## 11. Conclusion

This notebook walks through a text classification system that combines several NLP analyses:

### ✅ **Features Implemented:**

1. **Multi-label Classification** - Identifies multiple topics simultaneously
2. **Aspect-wise Sentiment Analysis** - Granular sentiment for different aspects
3. **Confidence Scoring** - Confidence scores for sentiment and emotion predictions
4. **Keyword Extraction** - Intelligent keyword identification
5. **Emotion Detection** - Detects Happy, Sad, Angry, Neutral emotions
6. **Text Analysis** - Length, tone, formality, complexity metrics
7. **Batch Processing** - Analyze multiple texts efficiently

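The notebook does not show how `predict_topics` selects its labels. A common approach to multi-label selection, sketched here as an assumption rather than a description of the backend, scores each label independently and keeps every label above a threshold, instead of taking only the single best one. `select_labels`, `top_label`, the 0.5 threshold, and the scores are all hypothetical.

```python
from typing import Dict, List


def select_labels(scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Multi-label: keep every label whose score reaches the threshold."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [label for label, score in ranked if score >= threshold]


def top_label(scores: Dict[str, float]) -> str:
    """Single-label: keep only the highest-scoring label."""
    return max(scores, key=scores.get)


# Made-up per-label scores for one review.
scores = {"acting": 0.92, "story": 0.71, "music": 0.08}
print(select_labels(scores))  # ['acting', 'story']
print(top_label(scores))      # acting
```

The threshold trades coverage against noise: lowering it returns more topics per text, raising it returns fewer.
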
### 🎯 **Key Strengths:**

- **Comprehensive Analysis**: Covers multiple dimensions of text understanding
- **Confidence Scores**: Sentiment and emotion predictions each report a confidence value
- **Granular Insights**: Aspect-wise analysis provides detailed understanding
- **Batch-Ready**: The same analysis loop runs over any list of texts
- **Interpretable**: Per-aspect labels, per-emotion scores, and charts show what drives each result

### 🚀 **Use Cases:**

- Customer feedback analysis
- Social media monitoring
- Product review analysis
- Market research
- Content moderation
- Sentiment tracking

This system provides a solid foundation for advanced text classification and can be extended with additional features as needed.