# CH08-03: Hands-on Sentiment Analysis

**Course**: iSpan Python NLP Cookbooks v2
**Chapter**: CH08 Hands-on with the Hugging Face Libraries
**Version**: v1.0
**Last Updated**: 2025-10-17

---

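## 📚 Learning Objectives

1. Perform sentiment analysis on a real-world dataset (IMDB movie reviews)
2. Master techniques for fine-tuning a pretrained model
3. Learn model evaluation and error analysis
4. Deploy the model in practical applications
5. Recognize and handle class imbalance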

---

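## 1. Sentiment Analysis Task Overview

### 1.1 Task Definition

**Sentiment Analysis**: determining the sentiment polarity (attitude) expressed in a piece of text.

**Common label schemes**:
- **Binary**: Positive / Negative
- **Three-class**: Positive / Neutral / Negative
- **Multi-class**: 5-star ratings (1 to 5 stars)
- **Fine-grained**: emotion classification (joy, anger, sadness, etc.)

**Application scenarios**:
- 📱 Social media monitoring
- 🛒 E-commerce review analysis
- 🎬 Movie review sentiment analysis
- 📊 Brand reputation management
- 💼 Customer satisfaction surveys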
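
These label schemes are often derived from one another. A common convention, assumed here rather than taken from any particular dataset, collapses 5-star ratings into three classes (1-2 stars Negative, 3 Neutral, 4-5 Positive), or into two classes by dropping the neutral band. A minimal sketch; the helper names are hypothetical:

```python
def stars_to_three_class(stars: int) -> str:
    """Map a 1-5 star rating to Negative / Neutral / Positive.

    Assumed thresholds: 1-2 -> Negative, 3 -> Neutral, 4-5 -> Positive.
    """
    if not 1 <= stars <= 5:
        raise ValueError(f"stars must be in 1..5, got {stars}")
    if stars <= 2:
        return "Negative"
    if stars == 3:
        return "Neutral"
    return "Positive"


def stars_to_binary(stars: int):
    """Map a 1-5 star rating to Negative / Positive.

    Neutral (3-star) ratings return None so they can be filtered out.
    """
    label = stars_to_three_class(stars)
    return None if label == "Neutral" else label


ratings = [1, 2, 3, 4, 5]
print([stars_to_three_class(r) for r in ratings])
# ['Negative', 'Negative', 'Neutral', 'Positive', 'Positive']
print([stars_to_binary(r) for r in ratings])
# ['Negative', 'Negative', None, 'Positive', 'Positive']
```

The IMDB dataset used below follows the same idea on a 10-point scale: reviews scored 4 or lower are negative, 7 or higher positive, and neutral reviews are excluded.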

In [None]:
# Install required packages
# !pip install transformers datasets torch scikit-learn matplotlib seaborn -q

import warnings
warnings.filterwarnings('ignore')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

print("✅ Environment ready")

---

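## 2. Data Preparation and Exploration

### 2.1 Load the Dataset

We use the **IMDB movie review dataset**: 50,000 labeled reviews, split evenly into 25,000 training and 25,000 test reviews.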

In [None]:
from datasets import load_dataset

# Load the IMDB dataset from the Hugging Face Hub
dataset = load_dataset("imdb")

print("Dataset structure:")
print(dataset)

print("\nTraining set size:", len(dataset['train']))
print("Test set size:", len(dataset['test']))

# Inspect the first example
print("\nFirst example:")
print(dataset['train'][0])

### 2.2 Exploratory Data Analysis (EDA)

In [None]:
# Convert to pandas DataFrames
train_df = dataset['train'].to_pandas()
test_df = dataset['test'].to_pandas()

# Add text-length columns (characters and whitespace-separated words)
train_df['text_length'] = train_df['text'].apply(len)
train_df['word_count'] = train_df['text'].apply(lambda x: len(x.split()))

# Class distribution
label_map = {0: 'Negative', 1: 'Positive'}
train_df['label_name'] = train_df['label'].map(label_map)

print("Class distribution:")
print(train_df['label_name'].value_counts())

# Summary statistics
print("\nText length statistics:")
print(train_df[['text_length', 'word_count']].describe())

In [None]:
# Visual analysis
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# 1. Class distribution (sort_index fixes the bar order to Negative, Positive so the colors match)
train_df['label_name'].value_counts().sort_index().plot(kind='bar', ax=axes[0, 0], color=['#ff6b6b', '#4ecdc4'])
axes[0, 0].set_title('Class Distribution', fontsize=14)
axes[0, 0].set_ylabel('Count')
axes[0, 0].tick_params(axis='x', rotation=0)

# 2. Text length distribution (characters)
axes[0, 1].hist(train_df['text_length'], bins=50, color='skyblue', edgecolor='black')
axes[0, 1].set_title('Text Length Distribution', fontsize=14)
axes[0, 1].set_xlabel('Character Count')
axes[0, 1].set_ylabel('Frequency')

# 3. Word count by sentiment (box plot)
train_df.boxplot(column='word_count', by='label_name', ax=axes[1, 0])
axes[1, 0].set_title('Word Count by Sentiment', fontsize=14)
axes[1, 0].set_xlabel('Sentiment')
axes[1, 0].set_ylabel('Word Count')
plt.suptitle('')  # Remove the "grouped by" figure title pandas adds to grouped box plots

# 4. Word count histograms by sentiment
train_df[train_df['label'] == 0]['word_count'].hist(ax=axes[1, 1], bins=50, alpha=0.6, label='Negative', color='#ff6b6b')
train_df[train_df['label'] == 1]['word_count'].hist(ax=axes[1, 1], bins=50, alpha=0.6, label='Positive', color='#4ecdc4')
axes[1, 1].set_title('Word Count Distribution by Sentiment', fontsize=14)
axes[1, 1].set_xlabel('Word Count')
axes[1, 1].set_ylabel('Frequency')
axes[1, 1].legend()

plt.tight_layout()
plt.show()
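
The IMDB training split is balanced (12,500 reviews per class), so no correction is needed for this notebook. On an imbalanced dataset, a common remedy is to weight the loss by inverse class frequency. The sketch below computes "balanced" weights, n_samples / (n_classes × count_c), which is the formula scikit-learn documents for `class_weight='balanced'`; the helper name and the toy labels are made up. Passing such weights to `torch.nn.CrossEntropyLoss(weight=...)` in a custom training loss is one option and is not shown here.

```python
import numpy as np


def balanced_class_weights(labels, n_classes=None):
    """Inverse-frequency class weights: n_samples / (n_classes * count_c).

    Classes with zero samples would cause a division by zero, so they are rejected.
    """
    labels = np.asarray(labels)
    if n_classes is None:
        n_classes = int(labels.max()) + 1
    counts = np.bincount(labels, minlength=n_classes)
    if np.any(counts == 0):
        raise ValueError(f"every class needs at least one sample, got counts {counts.tolist()}")
    return len(labels) / (n_classes * counts)


# Made-up imbalanced labels: 90 negative, 10 positive
labels = np.array([0] * 90 + [1] * 10)
weights = balanced_class_weights(labels)
print(weights.round(3))
```

Here the majority class gets a weight of about 0.556 and the minority class 5.0, so each minority example counts nine times as much in the loss; on a balanced dataset every weight is 1.0.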

### 2.3 Data Preprocessing

In [None]:
# Use random subsets to speed up training.
# Shuffle first: the IMDB splits are sorted by label, so the first N rows would all be one class.
train_dataset = dataset['train'].shuffle(seed=42).select(range(5000))
test_dataset = dataset['test'].shuffle(seed=42).select(range(1000))

print(f"Training set: {len(train_dataset)} samples")
print(f"Test set: {len(test_dataset)} samples")

# Inspect a few samples
print("\nSample data:")
for i in range(3):
    example = train_dataset[i]
    label = 'Positive' if example['label'] == 1 else 'Negative'
    text = example['text'][:100] + '...' if len(example['text']) > 100 else example['text']
    print(f"\n{i+1}. [{label}] {text}")

---

## 3. Using a Pretrained Model

### 3.1 Quick Check with the Pipeline API

In [None]:
from transformers import pipeline

# Load a pretrained sentiment model (DistilBERT fine-tuned on SST-2)
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=-1  # -1 = CPU; use 0 for the first GPU
)

# Test a few samples (first 200 characters of each review)
test_samples = [train_dataset[i]['text'][:200] for i in range(3)]
sample_labels = [train_dataset[i]['label'] for i in range(3)]

results = classifier(test_samples)

print("Pretrained model predictions:\n")
for i, (text, result, true_label) in enumerate(zip(test_samples, results, sample_labels)):
    true_sentiment = 'Positive' if true_label == 1 else 'Negative'
    pred_sentiment = result['label']
    confidence = result['score']

    print(f"{i+1}. Text: {text[:80]}...")
    print(f"   True:      {true_sentiment}")
    print(f"   Predicted: {pred_sentiment} (confidence: {confidence:.2%})\n")
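
For a single-label classifier like this one, the text-classification pipeline turns the model's raw logits into probabilities with a softmax and reports the highest-scoring label, with its probability as `score`. The NumPy sketch below reproduces only that post-processing step; the logits are made-up numbers and the `id2label` dict mirrors the NEGATIVE/POSITIVE labels this SST-2 model returns.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def postprocess(logits: np.ndarray, id2label: dict) -> list:
    """Turn a batch of logits into pipeline-style {'label', 'score'} dicts."""
    probs = softmax(logits)
    top_ids = probs.argmax(axis=-1)
    return [
        {"label": id2label[int(i)], "score": float(p[i])}
        for i, p in zip(top_ids, probs)
    ]


# Hypothetical logits for two texts, columns = [NEGATIVE, POSITIVE]
logits = np.array([[-2.0, 2.0], [1.5, -0.5]])
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
for r in postprocess(logits, id2label):
    print(r)
```

The first row comes out POSITIVE with a score of about 0.982 and the second NEGATIVE with about 0.881: for two classes, the score depends only on the difference between the two logits.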

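### 3.2 Evaluate the Pretrained Model

In [None]:
# Evaluate on the test set
from sklearn.metrics import accuracy_score, classification_report

# Batch prediction on a small sample to keep runtime short
eval_size = 100
# Keep only the first 512 characters (not tokens) of each review: in practice this stays
# under the model's 512-token limit, but the model sees only the opening of each review
eval_texts = [test_dataset[i]['text'][:512] for i in range(eval_size)]
eval_labels = [test_dataset[i]['label'] for i in range(eval_size)]

predictions = classifier(eval_texts, batch_size=16)

# Convert predicted label strings to 0/1
pred_labels = [1 if p['label'] == 'POSITIVE' else 0 for p in predictions]

# Accuracy
accuracy = accuracy_score(eval_labels, pred_labels)
print(f"Pretrained model accuracy: {accuracy:.2%}\n")

# Classification report
print("Classification report:")
print(classification_report(eval_labels, pred_labels, target_names=['Negative', 'Positive']))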
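
`classification_report` prints per-class precision, recall and F1, plus two averages: the *macro* average (plain mean over classes) and the *weighted* average (mean weighted by each class's support, i.e. its number of true samples). The NumPy sketch below spells out those definitions; the helper name and toy labels are made up for illustration.

```python
import numpy as np


def per_class_prf(y_true, y_pred, n_classes=2):
    """Per-class precision, recall, F1 and support as NumPy arrays."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    precision, recall, f1, support = [], [], [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
        support.append(np.sum(y_true == c))
    return np.array(precision), np.array(recall), np.array(f1), np.array(support)


y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
p, r, f1, s = per_class_prf(y_true, y_pred)
macro_f1 = f1.mean()                    # every class counts equally
weighted_f1 = (f1 * s).sum() / s.sum()  # classes weighted by support
print(f"per-class F1: {f1.round(3)}, macro: {macro_f1:.3f}, weighted: {weighted_f1:.3f}")
```

Here macro F1 is 0.625 while weighted F1 is about 0.667, because the larger Negative class (support 4) scores better. On a balanced test set such as IMDB the two averages are usually close.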

---

## 4. Fine-tuning the Model

### 4.1 Prepare the Data

In [None]:
from transformers import AutoTokenizer

# Load the tokenizer
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Tokenization function (applied to batches of examples)
def tokenize_function(examples):
    return tokenizer(
        examples['text'],
        padding='max_length',
        truncation=True,
        max_length=256  # Shorter sequences speed up training; longer reviews are truncated
    )

# Tokenize both datasets
tokenized_train = train_dataset.map(tokenize_function, batched=True)
tokenized_test = test_dataset.map(tokenize_function, batched=True)

# Return PyTorch tensors for the columns the model needs
tokenized_train.set_format(type='torch', columns=['input_ids', 'attention_mask', 'label'])
tokenized_test.set_format(type='torch', columns=['input_ids', 'attention_mask', 'label'])

print("✅ Tokenization complete")
print(f"Training set: {len(tokenized_train)}")
print(f"Test set: {len(tokenized_test)}")
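
With `padding='max_length'` and `truncation=True`, every example becomes exactly `max_length` token IDs: longer sequences are cut off, shorter ones are filled with the padding ID, and `attention_mask` marks real tokens with 1 and padding with 0 so the model ignores the padding. The pure-Python sketch below mimics only that final step on a plain list of IDs; the helper name is hypothetical, the pad ID of 0 is assumed (it is the `[PAD]` ID in BERT-style uncased vocabularies), and the real tokenizer's handling of `[CLS]`/`[SEP]` is deliberately left out.

```python
def pad_and_truncate(ids, max_length, pad_id=0):
    """Truncate or right-pad a list of token IDs to exactly max_length.

    Returns (input_ids, attention_mask): the mask is 1 for real tokens, 0 for padding.
    pad_id=0 is an assumption; special tokens are not handled here.
    """
    ids = list(ids)[:max_length]           # truncation
    n_pad = max_length - len(ids)
    input_ids = ids + [pad_id] * n_pad     # right padding
    attention_mask = [1] * len(ids) + [0] * n_pad
    return input_ids, attention_mask


print(pad_and_truncate([5, 6, 7, 8], max_length=6))
# ([5, 6, 7, 8, 0, 0], [1, 1, 1, 1, 0, 0])
print(pad_and_truncate(list(range(1, 11)), max_length=6))
# ([1, 2, 3, 4, 5, 6], [1, 1, 1, 1, 1, 1])
```

Padding every review to 256 tokens is simple but wastes computation on short reviews; padding each batch only to its longest member (for example with `DataCollatorWithPadding`) is a common alternative.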

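### 4.2 Load the Model and Configure Training

In [None]:
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

# Load the model with a new 2-class classification head.
# id2label / label2id make the saved model (and any pipeline built from it)
# report NEGATIVE / POSITIVE instead of the default LABEL_0 / LABEL_1.
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=2,  # binary classification
    id2label={0: 'NEGATIVE', 1: 'POSITIVE'},
    label2id={'NEGATIVE': 0, 'POSITIVE': 1}
)

# Training arguments
training_args = TrainingArguments(
    output_dir='./results',
    eval_strategy='epoch',  # older transformers releases call this evaluation_strategy
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_dir='./logs',
    logging_steps=50,
    save_strategy='epoch',  # must match eval_strategy when load_best_model_at_end=True
    load_best_model_at_end=True,
    metric_for_best_model='accuracy'  # key returned by compute_metrics; Trainer adds the eval_ prefix
)

print("Training arguments:")
print(f"  Learning rate: {training_args.learning_rate}")
print(f"  Batch size: {training_args.per_device_train_batch_size}")
print(f"  Epochs: {training_args.num_train_epochs}")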
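
It helps to know how many optimizer steps these settings imply. Assuming a single device and no gradient accumulation, 5,000 training examples at a batch size of 16 give ⌈5000 / 16⌉ = 313 steps per epoch (the last partial batch is kept), so 3 epochs run 939 steps. By default `TrainingArguments` uses a linear learning-rate schedule with no warmup, decaying from `learning_rate` to 0 over those steps. A small sketch of that arithmetic; the helper names are hypothetical:

```python
import math


def total_training_steps(n_examples: int, batch_size: int, epochs: int) -> int:
    """Optimizer steps, assuming one device, no gradient accumulation,
    and that the last partial batch is kept (ceil(n / batch_size) steps per epoch)."""
    return math.ceil(n_examples / batch_size) * epochs


def linear_lr(step: int, base_lr: float, total_steps: int) -> float:
    """Learning rate under linear decay from base_lr to 0 with no warmup."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)


total = total_training_steps(5000, 16, 3)
print(total)  # 939
for step in (0, total // 2, total):
    print(step, f"{linear_lr(step, 2e-5, total):.2e}")
```

Halfway through training (step 469) the learning rate has dropped to about 1.00e-05, and it reaches 0 at the final step.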

### 4.3 Define Evaluation Metrics

In [None]:
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def compute_metrics(pred):
    """Compute metrics from an EvalPrediction (logits in pred.predictions, labels in pred.label_ids)."""
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)  # logits -> predicted class IDs

    # Accuracy plus support-weighted precision / recall / F1
    accuracy = accuracy_score(labels, preds)
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, preds, average='weighted'
    )

    return {
        'accuracy': accuracy,
        'precision': precision,
        'recall': recall,
        'f1': f1
    }

print("✅ Metric function defined")

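### 4.4 Train the Model

In [None]:
# Create the Trainer.
# Note: for simplicity the test subset doubles as the evaluation set used to pick the
# best checkpoint; in practice, hold out a separate validation split instead.
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_test,
    compute_metrics=compute_metrics
)

# Start training
print("🚀 Starting training...\n")
train_result = trainer.train()

# Show training results
print("\n✅ Training complete!")
print(f"Training time: {train_result.metrics['train_runtime']:.2f}s")
print(f"Training loss: {train_result.metrics['train_loss']:.4f}")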

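### 4.5 Evaluate the Fine-tuned Model

In [None]:
# Evaluate on the test set (with load_best_model_at_end=True this is the best checkpoint)
eval_results = trainer.evaluate()

print("Test set evaluation results:")
print("="*50)
for metric, value in eval_results.items():
    print(f"{metric:20s}: {value:.4f}")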

---

## 5. Model Evaluation and Error Analysis

### 5.1 Confusion Matrix

In [None]:
from sklearn.metrics import confusion_matrix

# Get predictions on the test set
predictions = trainer.predict(tokenized_test)
pred_labels = predictions.predictions.argmax(-1)
true_labels = predictions.label_ids

# Compute the confusion matrix (rows = true labels, columns = predicted labels)
cm = confusion_matrix(true_labels, pred_labels)

# Plot the confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=['Negative', 'Positive'],
            yticklabels=['Negative', 'Positive'])
plt.title('Confusion Matrix', fontsize=14)
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()

# Per-class recall; for binary labels, cm.ravel() returns tn, fp, fn, tp
tn, fp, fn, tp = cm.ravel()
print(f"\nTrue Negatives:  {tn}")
print(f"False Positives: {fp}")
print(f"False Negatives: {fn}")
print(f"True Positives:  {tp}")
print(f"\nNegative recall (specificity): {tn/(tn+fp):.2%}")
print(f"Positive recall (sensitivity): {tp/(tp+fn):.2%}")
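
`confusion_matrix` puts true labels on rows and predicted labels on columns, which is why a binary matrix flattens to `tn, fp, fn, tp`. The NumPy sketch below builds the same layout by counting (true, predicted) pairs on made-up labels and derives the two per-class recalls; the helper name is hypothetical.

```python
import numpy as np


def binary_confusion_matrix(y_true, y_pred):
    """2x2 confusion matrix: rows = true label (0, 1), columns = predicted label (0, 1)."""
    cm = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm


y_true = [0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 1, 0, 1]
cm = binary_confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()  # row-major order: [0,0], [0,1], [1,0], [1,1]
print(cm)
print(f"Negative recall: {tn / (tn + fp):.2f}, Positive recall: {tp / (tp + fn):.2f}")
```

With these labels the matrix is `[[2, 1], [1, 3]]`, giving a Negative recall of 0.67 and a Positive recall of 0.75.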

### 5.2 Error Case Analysis

In [None]:
# Collect misclassified samples
errors = []
for i, (true, pred) in enumerate(zip(true_labels, pred_labels)):
    if true != pred:
        errors.append({
            'index': i,
            'text': test_dataset[i]['text'],  # tokenized_test keeps the same order as test_dataset
            'true_label': 'Positive' if true == 1 else 'Negative',
            'pred_label': 'Positive' if pred == 1 else 'Negative'
        })

print(f"Number of errors: {len(errors)}")
print(f"Error rate: {len(errors)/len(true_labels):.2%}\n")

# Show the first 5 errors
print("Sample errors:")
print("="*80)
for i, error in enumerate(errors[:5], 1):
    text = error['text'][:150] + '...' if len(error['text']) > 150 else error['text']
    print(f"\n{i}. Text: {text}")
    print(f"   True label:      {error['true_label']}")
    print(f"   Predicted label: {error['pred_label']}")
### 5.3 Confidence Analysis

In [None]:
import torch
import torch.nn.functional as F

# Convert logits to probabilities; confidence = probability of the predicted class
logits = torch.tensor(predictions.predictions)
probs = F.softmax(logits, dim=-1)
confidence = probs.max(dim=-1).values.numpy()

# Confidence distribution (all predictions)
plt.figure(figsize=(12, 5))

plt.subplot(1, 2, 1)
plt.hist(confidence, bins=50, color='skyblue', edgecolor='black')
plt.title('Prediction Confidence Distribution', fontsize=14)
plt.xlabel('Confidence')
plt.ylabel('Count')

# Confidence of correct vs. wrong predictions
correct_mask = pred_labels == true_labels
correct_conf = confidence[correct_mask]
wrong_conf = confidence[~correct_mask]

plt.subplot(1, 2, 2)
plt.hist(correct_conf, bins=30, alpha=0.6, label='Correct', color='green')
plt.hist(wrong_conf, bins=30, alpha=0.6, label='Wrong', color='red')
plt.title('Confidence: Correct vs Wrong Predictions', fontsize=14)
plt.xlabel('Confidence')
plt.ylabel('Count')
plt.legend()

plt.tight_layout()
plt.show()

print(f"Mean confidence (correct predictions): {correct_conf.mean():.4f}")
print(f"Mean confidence (wrong predictions):   {wrong_conf.mean():.4f}")
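
Comparing the mean confidence of correct and wrong predictions is a coarse check of *calibration*: whether a predicted confidence of 0.9 is right about 90% of the time. A common summary is the expected calibration error (ECE): bin predictions by confidence, then average the absolute gap between accuracy and mean confidence over the bins, weighting each bin by its share of the samples. A NumPy sketch, assuming it is called as `expected_calibration_error(confidence, correct_mask)` with the arrays from the cell above; 10 equal-width bins is a conventional but arbitrary choice, and the toy inputs are made up.

```python
import numpy as np


def expected_calibration_error(confidence, correct, n_bins=10):
    """ECE with equal-width confidence bins over (0, 1].

    confidence: predicted-class probabilities; correct: booleans (prediction == label).
    """
    confidence = np.asarray(confidence, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for low, high in zip(edges[:-1], edges[1:]):
        in_bin = (confidence > low) & (confidence <= high)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidence[in_bin].mean())
            ece += in_bin.mean() * gap  # weight = fraction of samples in this bin
    return ece


conf = np.array([0.95, 0.95, 0.65, 0.65])
corr = np.array([True, True, True, False])
print(f"ECE: {expected_calibration_error(conf, corr):.3f}")
```

In the toy example the 0.95 bin is slightly under-confident (accuracy 1.0) and the 0.65 bin over-confident (accuracy 0.5), giving an ECE of 0.100. With a binary softmax every confidence is at least 0.5, so the lower bins simply stay empty.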

---

## 6. Applying and Deploying the Model

### 6.1 Save the Fine-tuned Model

In [None]:
# Save the model, plus the tokenizer (which was not passed to the Trainer)
model_save_path = "./sentiment_model"
trainer.save_model(model_save_path)
tokenizer.save_pretrained(model_save_path)

print(f"✅ Model saved to: {model_save_path}")

### 6.2 Load and Use the Model

In [None]:
from transformers import pipeline

# Load the fine-tuned model and tokenizer from disk
sentiment_analyzer = pipeline(
    "sentiment-analysis",
    model=model_save_path,
    tokenizer=model_save_path
)

# Test on new texts
test_texts = [
    "This movie was absolutely fantastic! I loved every minute of it.",
    "Terrible film, waste of time and money.",
    "It was okay, nothing special but not bad either.",  # neutral: a binary model must still pick one of two labels
    "One of the best movies I've ever seen!"
]

results = sentiment_analyzer(test_texts)

print("Model predictions:\n")
for text, result in zip(test_texts, results):
    print(f"Text: {text}")
    print(f"Prediction: {result['label']} (confidence: {result['score']:.2%})\n")

### 6.3 Batch Processing and an API Wrapper

In [None]:
class SentimentAnalyzer:
    """Thin wrapper around a text-classification pipeline."""

    def __init__(self, model_path):
        self.pipeline = pipeline(
            "sentiment-analysis",
            model=model_path,
            tokenizer=model_path
        )

    def analyze(self, texts, batch_size=16):
        """
        Analyze the sentiment of one or more texts in batches.

        Args:
            texts: a single string or a list of strings
            batch_size: batch size used by the pipeline

        Returns:
            A dict for a single string input, otherwise a list of dicts
        """
        # Wrap a single string in a list, remembering the input type
        single_input = isinstance(texts, str)
        if single_input:
            texts = [texts]

        # Batch prediction; truncation=True keeps long inputs within the model's length limit
        results = self.pipeline(texts, batch_size=batch_size, truncation=True)

        # Format results
        formatted_results = []
        for text, result in zip(texts, results):
            formatted_results.append({
                'text': text,
                'sentiment': result['label'],
                'confidence': result['score'],
                'is_positive': result['label'] == 'POSITIVE'  # requires id2label to map to 'POSITIVE'
            })

        return formatted_results[0] if single_input else formatted_results

# Use the wrapper
analyzer = SentimentAnalyzer(model_save_path)

# Single text
result = analyzer.analyze("I absolutely love this product!")
print("Single-text analysis:")
print(result)

# Batch of texts
batch_results = analyzer.analyze(test_texts, batch_size=4)
print("\nBatch analysis:")
for r in batch_results:
    print(f"{r['sentiment']:8s} ({r['confidence']:.2%}): {r['text'][:50]}...")

---

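## 7. Advanced Techniques

### 7.1 Handling Long Texts

In [None]:
def analyze_long_text(text, max_length=512, overlap=50):
    """
    Split a long text into overlapping token chunks, classify each chunk,
    and aggregate the chunk predictions by majority vote.
    """
    # Tokenize (no special tokens); reserve 2 positions per chunk for [CLS] and [SEP]
    tokens = tokenizer.tokenize(text)
    chunk_size = max_length - 2
    stride = chunk_size - overlap

    # Build overlapping chunks; stop once a chunk reaches the end of the text
    chunks = []
    for start in range(0, len(tokens), stride):
        chunk_tokens = tokens[start:start + chunk_size]
        chunks.append(tokenizer.convert_tokens_to_string(chunk_tokens))
        if start + chunk_size >= len(tokens):
            break

    # Predict each chunk; truncation=True guards against small length changes on re-tokenization
    chunk_results = sentiment_analyzer(chunks, truncation=True)

    # Aggregate: majority vote for the label, mean probability of that label as the score
    positive_count = sum(1 for r in chunk_results if r['label'] == 'POSITIVE')
    final_label = 'POSITIVE' if positive_count > len(chunks) / 2 else 'NEGATIVE'
    pos_probs = [r['score'] if r['label'] == 'POSITIVE' else 1 - r['score'] for r in chunk_results]
    avg_pos = sum(pos_probs) / len(pos_probs)
    final_score = avg_pos if final_label == 'POSITIVE' else 1 - avg_pos

    return {
        'label': final_label,
        'score': final_score,
        'chunks_analyzed': len(chunks)
    }

# Test on a long review (IMDB reviews are often long)
long_text = test_dataset[0]['text']
result = analyze_long_text(long_text)

print("Long-text analysis result:")
print(f"  Text length: {len(long_text)} characters")
print(f"  Chunks: {result['chunks_analyzed']}")
print(f"  Prediction: {result['label']} (confidence: {result['score']:.2%})")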
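
A sliding-window split like this can be checked on token positions alone. The sketch below, a hypothetical pure-Python helper, assumes each chunk holds at most `max_length - 2` tokens (leaving room for `[CLS]` and `[SEP]`), consecutive chunks share `overlap` tokens, and iteration stops once a chunk reaches the end of the text, so no chunk is wholly contained in the previous one.

```python
def chunk_spans(n_tokens, max_length=512, overlap=50, n_special=2):
    """Return (start, end) token spans for overlapping chunks.

    Each chunk has at most max_length - n_special tokens and consecutive chunks
    overlap by `overlap` tokens. Assumes overlap < max_length - n_special.
    """
    chunk_size = max_length - n_special
    stride = chunk_size - overlap
    spans = []
    for start in range(0, n_tokens, stride):
        end = min(start + chunk_size, n_tokens)
        spans.append((start, end))
        if end == n_tokens:
            break  # this chunk already reaches the end
    return spans


print(chunk_spans(1200))
# [(0, 510), (460, 970), (920, 1200)]
print(chunk_spans(300))
# [(0, 300)]
```

A review shorter than one chunk yields a single span, so short texts go through the same code path as a normal prediction.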

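### 7.2 Multilingual Support

In [None]:
# Use a multilingual model. It predicts a 1-5 star rating ('1 star' ... '5 stars'), and its
# model card lists six training languages: English, Dutch, German, French, Spanish and Italian.
multilingual_analyzer = pipeline(
    "sentiment-analysis",
    model="nlptown/bert-base-multilingual-uncased-sentiment"
)

# Test different languages
multilingual_texts = [
    "This is great!",                    # English
    "C'est magnifique!",                 # French
    "Das ist fantastisch!",              # German
    "これは素晴らしい!"                    # Japanese (not one of the six training languages)
]

results = multilingual_analyzer(multilingual_texts)

print("Multilingual sentiment analysis:")
for text, result in zip(multilingual_texts, results):
    print(f"{text:30s} → {result['label']} ({result['score']:.2%})")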
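
Because this model outputs star ratings rather than POSITIVE/NEGATIVE, its labels usually need converting. One option, sketched below under the assumption that labels look like `'1 star'` ... `'5 stars'`, is to parse the leading integer; another is to compute an expected rating from the full score distribution, which the pipeline should return for every label when called on a single string with `top_k=None`. The score dicts in the example are made up and the helper names are hypothetical.

```python
def parse_stars(label: str) -> int:
    """'4 stars' -> 4, '1 star' -> 1 (assumes the label starts with the number)."""
    return int(label.split()[0])


def expected_rating(scores):
    """Probability-weighted mean star rating from a list of {'label', 'score'} dicts."""
    total = sum(s["score"] for s in scores)
    return sum(parse_stars(s["label"]) * s["score"] for s in scores) / total


# Made-up full score distribution for one text
scores = [
    {"label": "1 star", "score": 0.05},
    {"label": "2 stars", "score": 0.05},
    {"label": "3 stars", "score": 0.10},
    {"label": "4 stars", "score": 0.30},
    {"label": "5 stars", "score": 0.50},
]
top = max(scores, key=lambda s: s["score"])
print(f"top label: {parse_stars(top['label'])} stars, expected rating: {expected_rating(scores):.2f}")
```

The expected rating is smoother than the top label: here the top label is 5 stars, but half of the probability mass sits on lower ratings, pulling the expected rating down to about 4.15.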

---

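## 8. Exercises

### Exercise 1: Three-Class Sentiment Analysis

Modify the model to support three classes: Positive / Neutral / Negative.

In [None]:
# TODO: Implement three-class sentiment analysis
# Hints:
# 1. Prepare a three-class dataset
# 2. Set num_labels=3 when loading the model
# 3. Adjust the evaluation metrics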

### Exercise 2: Real-Time Sentiment Monitoring Dashboard

Use Streamlit or Gradio to build an interactive sentiment analysis interface.

In [None]:
# TODO: Build a Gradio interface
# import gradio as gr
# 
# def predict_sentiment(text):
#     result = sentiment_analyzer(text)
#     return result[0]['label'], result[0]['score']
# 
# interface = gr.Interface(
#     fn=predict_sentiment,
#     inputs="text",
#     outputs=["text", "number"]
# )
# interface.launch()

---

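## 9. Summary

### ✅ Key Takeaways

1. **Data preparation**:
   - EDA to inspect the data distribution
   - Tokenization and formatting
   - Checking for class imbalance

2. **Fine-tuning**:
   - Using the Trainer API
   - Configuring training arguments
   - Defining evaluation metrics

3. **Evaluation and analysis**:
   - Confusion matrix
   - Error case analysis
   - Confidence distribution

4. **Practical application**:
   - Saving and loading the model
   - Batch processing
   - API wrapper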

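### 📊 Model Performance Comparison

Approximate reference values; actual results vary with hardware, training-subset size and random seed.

| Model | Accuracy (approx.) | Training time | Inference speed |
|------|--------|---------|----------|
| Pretrained (no fine-tuning) | ~85% | 0s | Fast |
| Fine-tuned | ~92% | ~5 min | Fast |
| Trained from scratch | ~88% | ~30 min | Fast |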

### 📚 Further Reading

- [Hugging Face Fine-tuning Guide](https://huggingface.co/docs/transformers/training)
- [IMDB Dataset](https://huggingface.co/datasets/imdb)
- [Sentiment Analysis Paper](https://arxiv.org/abs/1801.07883)

### 🚀 Next Section

**CH08-04: Named Entity Recognition (NER)**
- Using the CoNLL-2003 dataset
- Token classification tasks
- Entity extraction and annotation

---

**Instructor**: Claude AI