# Natural Language Understanding: Text Classification & Named Entity Recognition

This notebook demonstrates two natural language understanding (NLU) tasks, text classification and named entity recognition, using Hugging Face Transformers and spaCy.

In [None]:
# Install required packages
!pip install transformers spacy torch scikit-learn matplotlib seaborn
!python -m spacy download en_core_web_sm

In [None]:
# Import required libraries
import warnings
warnings.filterwarnings('ignore')

from transformers import pipeline
import spacy
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter
import pandas as pd

print("✅ All libraries imported successfully!")

## Text Classification

Text classification assigns predefined categories or labels to a piece of text. This notebook covers two classification tasks: sentiment analysis and emotion detection.
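
To make the "confidence" values printed later in this notebook concrete: a classifier of this kind typically produces one raw score (a logit) per label, and a softmax turns those scores into probabilities. The sketch below uses made-up logits and a hypothetical helper named `classify`; it illustrates the idea and is not a description of any specific model's internals.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def classify(logits, labels):
    """Return the most probable label and its probability (hypothetical helper)."""
    probs = softmax(logits)
    best = max(range(len(labels)), key=lambda i: probs[i])
    return {"label": labels[best], "score": probs[best]}

# Illustrative logits only; real values come from a trained model.
example = classify([-1.2, 2.3], ["NEGATIVE", "POSITIVE"])
print(example["label"], round(example["score"], 4))
```

The returned dictionary mirrors the `{'label': ..., 'score': ...}` shape that the classification cells below read from.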

### 1. Sentiment Analysis

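Sentiment analysis determines the emotional tone of text. The model used here is fine-tuned on SST-2 and predicts only POSITIVE or NEGATIVE, so neutral sentences are forced into one of the two classes.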
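
One common workaround when a classifier outputs only two labels is to treat low-confidence predictions as neutral. The following is a minimal sketch under that assumption: `to_three_way` and `NEUTRAL_THRESHOLD` are hypothetical names, the threshold value is arbitrary, and the predictions are hand-written in the `{'label', 'score'}` shape rather than taken from a model.

```python
NEUTRAL_THRESHOLD = 0.75  # hypothetical cut-off; tune it on labelled data

def to_three_way(result, threshold=NEUTRAL_THRESHOLD):
    """Map a binary {'label', 'score'} prediction to POSITIVE / NEGATIVE / NEUTRAL."""
    if result["score"] < threshold:
        return "NEUTRAL"
    return result["label"]

# Hand-written predictions; the scores are illustrative.
predictions = [
    {"label": "POSITIVE", "score": 0.9998},
    {"label": "NEGATIVE", "score": 0.6100},
]
three_way = [to_three_way(p) for p in predictions]
print(three_way)
```

A threshold is a heuristic, not a substitute for a model trained with a neutral class: a confidently wrong prediction on a neutral sentence will still pass through unchanged.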

In [None]:
# Initialize sentiment analysis pipeline
sentiment_classifier = pipeline('sentiment-analysis', model='distilbert-base-uncased-finetuned-sst-2-english')

# Test examples
texts = [
    "The movie was absolutely fantastic! I loved every minute of it.",
    "This product is terrible and completely useless.",
    "The weather is okay today, nothing special.",
    "I'm so excited about the upcoming vacation!",
    "The service at the restaurant was disappointing."
]

print("🎭 Sentiment Analysis Results")
print("=" * 50)

for i, text in enumerate(texts, 1):
    result = sentiment_classifier(text)[0]
    emoji = "😊" if result['label'] == 'POSITIVE' else "😔"
    
    print(f"{i}. Text: {text}")
    print(f"   {emoji} Sentiment: {result['label']} (confidence: {result['score']:.4f})")
    print()

### 2. Emotion Detection

Emotion detection goes beyond sentiment polarity to identify specific emotions. The model used here predicts one of seven labels: anger, disgust, fear, joy, neutral, sadness, or surprise.
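
A single top label can hide a close runner-up. Text-classification pipelines can return a score for every label (in recent Transformers versions by passing `top_k=None` at call time; treat that as an assumption to verify against your installed version). The sketch below ranks such a list. The scores are invented for illustration and `top_emotions` is a hypothetical helper.

```python
def top_emotions(scores, k=2):
    """Return the k highest-scoring {'label', 'score'} entries, best first."""
    return sorted(scores, key=lambda s: s["score"], reverse=True)[:k]

# Invented scores for one sentence, one entry per label.
all_scores = [
    {"label": "anger", "score": 0.05},
    {"label": "fear", "score": 0.48},
    {"label": "sadness", "score": 0.07},
    {"label": "surprise", "score": 0.36},
    {"label": "neutral", "score": 0.04},
]
ranked = top_emotions(all_scores)
print([(s["label"], s["score"]) for s in ranked])
```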

In [None]:
# Initialize emotion detection pipeline
emotion_classifier = pipeline('text-classification', model='j-hartmann/emotion-english-distilroberta-base')

# Test examples for emotion detection
emotion_texts = [
    "I can't believe I won the lottery! This is amazing!",
    "I'm so frustrated with these constant technical issues.",
    "The thunderstorm outside is making me feel anxious.",
    "I feel so lonely since moving to this new city.",
    "That horror movie really scared me last night.",
    "I'm disgusted by the way they treated the animals."
]

print("😊 Emotion Detection Results")
print("=" * 50)

emotion_emoji_map = {
    'joy': '😄', 'anger': '😠', 'fear': '😨', 
    'sadness': '😢', 'surprise': '😲', 'disgust': '🤢'
}

for i, text in enumerate(emotion_texts, 1):
    result = emotion_classifier(text)[0]
    emotion = result['label'].lower()
    emoji = emotion_emoji_map.get(emotion, '😐')
    
    print(f"{i}. Text: {text}")
    print(f"   {emoji} Emotion: {result['label']} (confidence: {result['score']:.4f})")
    print()

## Named Entity Recognition (NER)

Named Entity Recognition identifies and classifies named entities in text into predefined categories such as persons, organizations, locations, dates, etc.
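
An entity is usually represented as a character span plus a label. Both libraries used below expose such offsets (`ent.start_char` and `ent.end_char` in spaCy, `start` and `end` keys in the Transformers output). The sketch below builds the same structure by hand: the annotations are written manually, not predicted, and `find_span` is a hypothetical helper.

```python
text = "Elon Musk founded SpaceX in 2002."

def find_span(text, phrase, label):
    """Locate phrase in text and return a (start, end, label) character span."""
    start = text.find(phrase)
    if start == -1:
        raise ValueError(f"{phrase!r} not found in text")
    return (start, start + len(phrase), label)

# Hand-annotated entities; the label names are illustrative.
spans = [
    find_span(text, "Elon Musk", "PERSON"),
    find_span(text, "SpaceX", "ORG"),
    find_span(text, "2002", "DATE"),
]

for start, end, label in spans:
    # Slicing the original text with the offsets recovers the entity string.
    print(f"{text[start:end]!r:12} {label:7} chars {start}-{end}")
```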

### 1. spaCy NER

spaCy ships pre-trained pipelines with a built-in NER component. This notebook uses the small English pipeline, `en_core_web_sm`.

In [None]:
# Load spaCy model
nlp = spacy.load('en_core_web_sm')

# Test examples for NER
ner_texts = [
    "Apple Inc. is looking at buying U.K. startup for $1 billion on December 15, 2023.",
    "Elon Musk founded SpaceX in 2002 and later became CEO of Tesla in Palo Alto, California.",
    "The meeting with Microsoft will be held at 3 PM EST in New York City next Tuesday.",
    "Dr. Jane Smith from Harvard University published a research paper in Nature journal.",
    "Amazon announced a new AWS data center in Frankfurt, Germany, investing €2.8 billion."
]

print("🏷️  spaCy Named Entity Recognition")
print("=" * 60)

# Entity type descriptions
entity_descriptions = {
    'PERSON': '👤 Person names',
    'ORG': '🏢 Organizations, companies',
    'GPE': '🌍 Countries, cities, states',
    'MONEY': '💰 Monetary values',
    'DATE': '📅 Dates and periods',
    'TIME': '⏰ Times',
    'PRODUCT': '📦 Products, vehicles',
    'EVENT': '🎉 Events',
    'FAC': '🏛️  Buildings, facilities',
    'LOC': '📍 Locations',
    'NORP': '🏛️  Nationalities, religious groups'
}

for i, text in enumerate(ner_texts, 1):
    doc = nlp(text)
    print(f"{i}. Text: {text}")
    print("   Entities found:")
    
    if doc.ents:
        for ent in doc.ents:
            desc = entity_descriptions.get(ent.label_, f"📌 {ent.label_}")
            print(f"      • {ent.text} → {desc}")
    else:
        print("      • No entities found")
    print()

### 2. Transformers NER

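Hugging Face Transformers also provides NER models. The model used here is BERT-large fine-tuned on CoNLL-2003, which labels only four entity types (PER, ORG, LOC, MISC), so unlike the spaCy model above it does not tag dates or monetary amounts.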
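
Because the two libraries use different label names, a side-by-side comparison is easier after mapping both onto one scheme. The sketch below is one way to do that. The `SPACY_TO_CONLL` mapping is an assumption made for illustration (for example, folding `GPE` into `LOC`), the helper names are hypothetical, and the entity lists are hand-written rather than real model output.

```python
# Assumed mapping from spaCy's labels to CoNLL-2003-style labels.
# Labels with no counterpart (DATE, MONEY, ...) are dropped.
SPACY_TO_CONLL = {"PERSON": "PER", "ORG": "ORG", "GPE": "LOC", "LOC": "LOC", "NORP": "MISC"}

def normalize(entities, mapping):
    """Map (text, label) pairs to the shared scheme, skipping unmapped labels."""
    return {(text, mapping[label]) for text, label in entities if label in mapping}

def compare(spacy_entities, transformer_entities):
    """Split entities into those both models found and those unique to each."""
    a = normalize(spacy_entities, SPACY_TO_CONLL)
    b = set(transformer_entities)
    return {"both": a & b, "spacy_only": a - b, "transformers_only": b - a}

# Hand-written example outputs, not real model predictions.
spacy_out = [("Apple Inc.", "ORG"), ("U.K.", "GPE"), ("$1 billion", "MONEY")]
transformers_out = [("Apple Inc", "ORG"), ("U.K.", "LOC")]
report = compare(spacy_out, transformers_out)
print(report)
```

Exact string matching is deliberately strict here: "Apple Inc." and "Apple Inc" count as a disagreement. Comparing character offsets with some tolerance would be a gentler alternative.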

In [None]:
# Initialize Transformers NER pipeline
# aggregation_strategy='simple' merges sub-word tokens into whole entities
ner_pipeline = pipeline('ner', model='dbmdz/bert-large-cased-finetuned-conll03-english', aggregation_strategy='simple')

# Test text for comparison
test_text = "Apple Inc. is looking at buying U.K. startup for $1 billion on December 15, 2023."

print("🤖 Transformers NER Results")
print("=" * 50)
print(f"Text: {test_text}")
print("\nEntities found:")

entities = ner_pipeline(test_text)
for entity in entities:
    confidence_bar = "█" * int(entity['score'] * 10)
    print(f"  • {entity['word']}: {entity['entity_group']} (confidence: {entity['score']:.4f}) {confidence_bar}")

# Compare with spaCy results
print("\n🔄 Comparison with spaCy:")
print("=" * 30)
doc = nlp(test_text)
print("spaCy entities:")
for ent in doc.ents:
    print(f"  • {ent.text}: {ent.label_}")

print("\nTransformers entities:")
for entity in entities:
    print(f"  • {entity['word']}: {entity['entity_group']}")

## Data Visualization

Let's visualize the results of our NLU analysis to better understand the patterns.
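
The charts in the next cell are built from simple label counts. As a minimal sketch of that counting step, the hypothetical helper `label_shares` below converts a list of predicted labels into percentage shares, which is the quantity a pie chart displays. The labels are invented stand-ins for classifier output.

```python
from collections import Counter

def label_shares(labels):
    """Return {label: percentage} for a list of predicted labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}  # nothing to summarize
    return {label: 100 * n / total for label, n in counts.items()}

# Invented labels standing in for classifier output.
shares = label_shares(["POSITIVE", "NEGATIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"])
for label, pct in shares.items():
    print(f"{label}: {pct:.1f}%")
```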

In [None]:
# Analyze sentiment distribution (the pipeline accepts a list and returns one result per text)
sentiment_results = [result['label'] for result in sentiment_classifier(texts)]

# Count sentiment distribution
sentiment_counts = Counter(sentiment_results)

# Create visualizations
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(2, 2, figsize=(15, 12))

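# 1. Sentiment Distribution
# Colors are keyed by label so they stay correct whatever order the labels appear in
sentiment_colors = {'POSITIVE': 'lightgreen', 'NEGATIVE': 'lightcoral'}
ax1.pie(list(sentiment_counts.values()), labels=list(sentiment_counts.keys()), autopct='%1.1f%%',
        colors=[sentiment_colors.get(label, 'lightgray') for label in sentiment_counts])
ax1.set_title('Sentiment Distribution', fontsize=14, fontweight='bold')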

# 2. Emotion Distribution
emotion_results = []
for text in emotion_texts:
    result = emotion_classifier(text)[0]
    emotion_results.append(result['label'])

emotion_counts = Counter(emotion_results)
ax2.bar(list(emotion_counts.keys()), list(emotion_counts.values()), color='skyblue')
ax2.set_title('Emotion Distribution', fontsize=14, fontweight='bold')
ax2.tick_params(axis='x', rotation=45)

# 3. Entity Type Distribution (spaCy)
all_entities = []
for text in ner_texts:
    doc = nlp(text)
    for ent in doc.ents:
        all_entities.append(ent.label_)

entity_counts = Counter(all_entities)
ax3.barh(list(entity_counts.keys()), list(entity_counts.values()), color='lightgreen')
ax3.set_title('Entity Types (spaCy)', fontsize=14, fontweight='bold')

# 4. Confidence Scores Distribution
confidence_scores = []
for text in ner_texts:
    entities = ner_pipeline(text)
    for entity in entities:
        confidence_scores.append(entity['score'])

ax4.hist(confidence_scores, bins=10, color='orange', alpha=0.7, edgecolor='black')
ax4.set_title('NER Confidence Scores', fontsize=14, fontweight='bold')
ax4.set_xlabel('Confidence Score')
ax4.set_ylabel('Frequency')

plt.tight_layout()
plt.show()

# Summary statistics
print("📈 Summary Statistics")
print("=" * 40)
print(f"• Total texts analyzed: {len(texts + emotion_texts + ner_texts)}")
print(f"• Unique entity types found: {len(entity_counts)}")
print(f"• Average NER confidence: {sum(confidence_scores)/len(confidence_scores):.4f}")
print(f"• Most common sentiment: {max(sentiment_counts, key=sentiment_counts.get)}")
print(f"• Most common emotion: {max(emotion_counts, key=emotion_counts.get)}")
print(f"• Most common entity type: {max(entity_counts, key=entity_counts.get)}")

## Comprehensive NLU Analysis

Let's create a unified function that performs all NLU tasks on any given text.
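
The function in the next cell returns a dictionary with the keys `sentiment`, `emotion`, `spacy_entities`, and `transformer_entities`. If you want to store those results, one option is to flatten them into JSON. The sketch below assumes that shape. `to_record` is a hypothetical helper and the sample result is hand-written. The `float()` casts are there because model scores may arrive as NumPy scalars, which `json.dumps` rejects.

```python
import json

def to_record(text, result):
    """Flatten an analysis result into JSON-friendly built-in types."""
    return {
        "text": text,
        "sentiment": result["sentiment"]["label"],
        "sentiment_score": float(result["sentiment"]["score"]),
        "emotion": result["emotion"]["label"],
        "emotion_score": float(result["emotion"]["score"]),
        "spacy_entities": [list(pair) for pair in result["spacy_entities"]],
        "transformer_entities": [
            [word, group, float(score)]  # float() also handles NumPy scalars
            for word, group, score in result["transformer_entities"]
        ],
    }

# Hand-written result in the assumed shape; values are illustrative.
sample_result = {
    "sentiment": {"label": "POSITIVE", "score": 0.99},
    "emotion": {"label": "joy", "score": 0.95},
    "spacy_entities": [("Apple", "ORG")],
    "transformer_entities": [("Apple", "ORG", 0.98)],
}
line = json.dumps(to_record("Apple shipped a great update.", sample_result))
print(line)
```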

In [None]:
def comprehensive_nlu_analysis(text):
    """
    Run sentiment analysis, emotion detection, and NER (spaCy and Transformers)
    on `text`, print a report, and return the raw results as a dict.
    """
    print("🔍 Comprehensive NLU Analysis")
    print("=" * 60)
    print(f"📝 Input Text: {text}")
    print()
    
    # Sentiment Analysis
    sentiment_result = sentiment_classifier(text)[0]
    sentiment_emoji = "😊" if sentiment_result['label'] == 'POSITIVE' else "😔"
    print(f"🎭 Sentiment: {sentiment_emoji} {sentiment_result['label']} (confidence: {sentiment_result['score']:.4f})")
    
    # Emotion Detection
    emotion_result = emotion_classifier(text)[0]
    emotion = emotion_result['label'].lower()
    emotion_emoji = emotion_emoji_map.get(emotion, '😐')
    print(f"😊 Emotion: {emotion_emoji} {emotion_result['label']} (confidence: {emotion_result['score']:.4f})")
    
    # spaCy NER
    doc = nlp(text)
    print("🏷️  Entities (spaCy):")
    if doc.ents:
        for ent in doc.ents:
            desc = entity_descriptions.get(ent.label_, f"📌 {ent.label_}")
            print(f"   • {ent.text} → {desc}")
    else:
        print("   • No entities found")
    
    # Transformers NER
    entities = ner_pipeline(text)
    print("🤖 Entities (Transformers):")
    if entities:
        for entity in entities:
            print(f"   • {entity['word']} → {entity['entity_group']} (confidence: {entity['score']:.4f})")
    else:
        print("   • No entities found")
    
    print("\n" + "="*60 + "\n")
    
    return {
        'sentiment': sentiment_result,
        'emotion': emotion_result,
        'spacy_entities': [(ent.text, ent.label_) for ent in doc.ents],
        'transformer_entities': [(e['word'], e['entity_group'], e['score']) for e in entities]
    }

# Test the comprehensive function
test_samples = [
    "I'm thrilled to announce that our company Apple Inc. will be launching a new product in San Francisco next month!",
    "The disappointing performance by Tesla in the stock market yesterday really worried investors in New York.",
    "Dr. Sarah Johnson from MIT published groundbreaking research about artificial intelligence in Nature journal last week."
]

for sample in test_samples:
    result = comprehensive_nlu_analysis(sample)

## Interactive Experimentation

Try your own text examples below! Modify the `user_text` variable to experiment with different NLU scenarios.
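
When experimenting with arbitrary input, simple text statistics need a little care: sentences can end in `!` or `?` as well as `.`, and an empty string has no words to divide by. The sketch below is a dependency-free heuristic. `count_sentences` and `entity_density` are hypothetical helpers, and the entity count passed in is made up.

```python
import re

def count_sentences(text):
    """Count sentences by splitting after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])

def entity_density(entity_count, text):
    """Entities per whitespace-separated word; 0.0 for empty text."""
    words = text.split()
    return entity_count / len(words) if words else 0.0

sample = "I love the new iPhone! Tim Cook did an amazing job presenting it."
print(count_sentences(sample), f"{entity_density(2, sample):.2%}")
```

The regular expression is only a heuristic: abbreviations such as "Dr. Smith" will be counted as a sentence boundary, so a proper sentence segmenter is preferable for anything beyond quick experiments.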

In [None]:
# 🧪 Experiment with your own text!
# Modify this variable to test different texts
user_text = "I absolutely love the new iPhone released by Apple in Cupertino! Tim Cook did an amazing job presenting it."

# Run comprehensive analysis
print("🧪 Your Custom Analysis")
result = comprehensive_nlu_analysis(user_text)

# Additional analysis: Word count and text statistics
words = user_text.split()
print("📊 Text Statistics:")
print(f"   • Word count: {len(words)}")
print(f"   • Character count: {len(user_text)}")
# Use spaCy's sentence segmentation so that '!' and '?' also end a sentence
print(f"   • Sentence count: {len(list(nlp(user_text).sents))}")

# Entity density
spacy_entity_count = len(result['spacy_entities'])
transformer_entity_count = len(result['transformer_entities'])
print(f"   • Entity density (spaCy): {spacy_entity_count/len(words):.2%}")
print(f"   • Entity density (Transformers): {transformer_entity_count/len(words):.2%}")

print("\n💡 Try modifying the 'user_text' variable above with your own examples!")