# 🇰🇷 Korean Sentiment Analysis

Simple Korean sentiment analysis in Google Colab.

## How to use:
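1. Prepare an Excel file (.xlsx) with one column that holds the text to analyze
2. Run all cells and upload the file when the upload cell prompts you
3. Set `text_column` to your column name if it is not `text`, then re-run from that cell
4. Download the results file produced by the last cell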

## 🤖 Available Korean Models

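### Primary Model (Current):
- **snunlp/KR-FinBert-SC** - Korean FinBERT fine-tuned for sentiment classification on financial text; expect weaker results on out-of-domain text such as casual reviews

### Alternative Models:
- **beomi/KcELECTRA-base-v2022** - ELECTRA pretrained on Korean online comments; a base checkpoint without a sentiment head, so it needs fine-tuning before its predictions are meaningful
- **klue/roberta-base** - General purpose Korean RoBERTa; also a base checkpoint that needs fine-tuning first

### Model Selection:
Change the `model_name` variable in the model-loading cell below to switch checkpoints. Only a checkpoint that has already been fine-tuned for sentiment classification returns meaningful labels.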
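
Whether a checkpoint already carries sentiment labels can be checked from its configuration before running the analysis. The sketch below uses a hypothetical helper, `describe_labels`, which is not part of the notebook. It takes an `id2label` mapping and flags the generic `LABEL_0`, `LABEL_1` names that Transformers assigns when a checkpoint defines no label names. Generic names are only a heuristic hint that the classification head may be untrained.

```python
def describe_labels(id2label):
    """Return (labels ordered by class id, looks_fine_tuned)."""
    # Keys may arrive as ints or as strings (e.g. when read from raw JSON)
    ordered = [str(id2label[k]) for k in sorted(id2label, key=int)]
    # Generic names like LABEL_0 suggest no task-specific labels were saved
    generic = all(name.upper().startswith("LABEL_") for name in ordered)
    return ordered, not generic

# Illustrative mappings (assumed for the example, not read from a real model)
labels, ok = describe_labels({0: "negative", 1: "neutral", 2: "positive"})
print(labels, ok)

labels_generic, ok_generic = describe_labels({"0": "LABEL_0", "1": "LABEL_1"})
print(labels_generic, ok_generic)
```

In the notebook, the mapping would come from `model.config.id2label` once the model-loading cell has run.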

## 📦 Install Required Packages

In [None]:
# Install required packages
!pip install torch transformers pandas openpyxl matplotlib seaborn wordcloud

## 📚 Import Libraries

In [None]:
import torch
import pandas as pd
import matplotlib.pyplot as plt
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from google.colab import files
from io import BytesIO
import warnings
warnings.filterwarnings('ignore')
print('✅ Libraries loaded successfully')

## 🤖 Load Korean Sentiment Analysis Model

In [None]:
# =============================================================================
# 🚀 Korean Sentiment Analysis Models
# =============================================================================

# Primary model: finance-domain Korean sentiment classifier
model_name = 'snunlp/KR-FinBert-SC'

# Alternative checkpoints (uncomment to use).
# NOTE: these are base models without a fine-tuned sentiment head, so their
# classification layer is randomly initialized until you fine-tune them.
# model_name = 'beomi/KcELECTRA-base-v2022'  # Pretrained on Korean online comments
# model_name = 'klue/roberta-base'           # General purpose Korean model

print(f"🔄 Loading model: {model_name}")
print("\n📋 Model information:")
print("   - Primary: snunlp/KR-FinBert-SC")
print("   - Alternative: beomi/KcELECTRA-base-v2022")
print("   - Fallback: klue/roberta-base")

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Use GPU if available
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
model.eval()

print(f"\n✅ Model loaded successfully on {device}")

## 📁 Upload Excel File

In [None]:
# Upload Excel file
print("📁 Please upload your Excel file (.xlsx)")
uploaded = files.upload()

# Get the first uploaded file
filename = list(uploaded.keys())[0]
print(f"✅ File uploaded: {filename}")

# Read Excel file
df = pd.read_excel(BytesIO(uploaded[filename]))
print(f"📊 Data loaded: {len(df)} rows, {len(df.columns)} columns")
print("\n📋 Columns:", list(df.columns))
print("\n🔍 First few rows:")
df.head()

## 📝 Select Text Column

In [None]:
# Set text column name (modify this to match your column name)
text_column = 'text'  # Change this to your actual column name

# Common column names for Korean text
# text_column = 'comment'    # 댓글
# text_column = 'review'     # 리뷰
# text_column = '댓글'       # Korean column name
# text_column = '리뷰'       # Korean column name

# Check if column exists
if text_column not in df.columns:
    print(f"❌ Column '{text_column}' not found. Available columns: {list(df.columns)}")
    print("\n💡 Please modify the 'text_column' variable above to match your column name")
    print("\n🔍 Common Korean column names: comment, review, 댓글, 리뷰, text")
else:
    print(f"✅ Text column selected: {text_column}")
    print(f"📝 Sample text: {df[text_column].iloc[0]}")
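
Editing `text_column` by hand works, but the candidate names listed in the cell above can also be tried automatically. This is a minimal sketch, assuming a hypothetical helper `pick_text_column` and a case-insensitive match that ignores surrounding spaces; neither is part of the original notebook.

```python
def pick_text_column(columns, candidates=("text", "comment", "review", "댓글", "리뷰")):
    """Return the first candidate found in columns (case-insensitive), else None."""
    # Map normalized name -> original name so the original spelling is returned
    normalized = {str(col).strip().lower(): col for col in columns}
    for name in candidates:
        if name.lower() in normalized:
            return normalized[name.lower()]
    return None

# Made-up column lists for illustration
found_review = pick_text_column(["id", "Review ", "date"])
found_korean = pick_text_column(["id", "댓글"])
found_none = pick_text_column(["id", "score"])
print(found_review, found_korean, found_none)
```

In the notebook this could be used as `text_column = pick_text_column(df.columns) or 'text'`, keeping the existing existence check as a fallback.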

## 🧠 Perform Sentiment Analysis

In [None]:
def analyze_sentiment(text):
    """Return (label, confidence) for one Korean text, or ('error', 0.0) on failure."""
    try:
        # Tokenize; inputs longer than 128 tokens are truncated
        inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        inputs = {k: v.to(device) for k, v in inputs.items()}

        # Forward pass without gradient tracking, then convert logits to probabilities
        with torch.no_grad():
            outputs = model(**inputs)
            probabilities = torch.softmax(outputs.logits, dim=1)

        # Predicted class and its probability
        sentiment_id = torch.argmax(probabilities, dim=1).item()
        confidence = probabilities[0][sentiment_id].item()

        # Read the label from the model config instead of assuming a fixed
        # 3-class order; the id-to-label mapping differs between checkpoints
        sentiment = str(model.config.id2label[sentiment_id]).lower()

        return sentiment, confidence
    except Exception as e:
        # Report the failure instead of hiding it, then mark the row
        print(f"⚠️ Failed to analyze text: {e}")
        return 'error', 0.0

# Analyze each text
print("🔄 Analyzing sentiments...")
results = []

for idx, text in enumerate(df[text_column]):
    if pd.isna(text) or str(text).strip() == '':
        sentiment, confidence = 'neutral', 0.0
    else:
        sentiment, confidence = analyze_sentiment(str(text))
    
    results.append({
        'text': text,
        'sentiment': sentiment,
        'confidence': confidence
    })
    
    # Show progress
    if (idx + 1) % 10 == 0:
        print(f"Progress: {idx + 1}/{len(df)}")

print("✅ Sentiment analysis completed!")
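
The loop above runs the model on one text at a time, which is simple but slow on larger files. A common alternative is to send several texts through the model per forward pass. The sketch below shows only the chunking step, using a hypothetical helper `iter_batches`; the batch size of 32 is an assumption to tune against available memory.

```python
def iter_batches(items, batch_size=32):
    """Yield consecutive slices of items with at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Toy data standing in for the notebook's text column
texts = [f"문장 {i}" for i in range(70)]
batch_sizes = [len(batch) for batch in iter_batches(texts, batch_size=32)]
print(batch_sizes)
```

Each batch is a list of strings, which the tokenizer accepts in a single call when `padding=True` and `truncation=True` are set; the resulting logits then have one row per text.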

## 📊 Create Results DataFrame

In [None]:
# Create results DataFrame
results_df = pd.DataFrame(results)

# Add original data
for col in df.columns:
    if col != text_column:
        results_df[col] = df[col]

# Reorder columns
cols = ['text', 'sentiment', 'confidence'] + [col for col in df.columns if col != text_column]
results_df = results_df[cols]

print("📊 Results DataFrame created")
print(f"\n📈 Sentiment distribution:")
print(results_df['sentiment'].value_counts())
print(f"\n🔍 Sample results:")
results_df.head()

## 🎨 Create Visualizations

In [None]:
# Create charts
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 5))

# Pie chart
sentiment_counts = results_df['sentiment'].value_counts()
ax1.pie(sentiment_counts.values, labels=sentiment_counts.index, autopct='%1.1f%%')
ax1.set_title('Sentiment Distribution', fontweight='bold')

# Bar chart
ax2.bar(sentiment_counts.index, sentiment_counts.values)
ax2.set_title('Sentiment Counts', fontweight='bold')
ax2.set_ylabel('Count')
for i, v in enumerate(sentiment_counts.values):
    ax2.text(i, v + 0.5, str(v), ha='center', va='bottom')

plt.tight_layout()
plt.show()
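
`value_counts()` sorts labels by frequency, so the slice and bar order can change from one file to the next. The following sketch fixes the order before plotting. It assumes the three label names used by the primary model, and `order_counts` is a hypothetical helper; labels outside the preferred order (such as `error`) are appended alphabetically.

```python
def order_counts(counts, preferred=("negative", "neutral", "positive")):
    """Return (labels, values) with preferred labels first, extras sorted after."""
    known = [label for label in preferred if label in counts]
    extras = sorted(label for label in counts if label not in preferred)
    labels = known + extras
    return labels, [counts[label] for label in labels]

# Example counts (made up for illustration)
ordered_labels, ordered_values = order_counts({"positive": 12, "error": 1, "negative": 5})
print(ordered_labels, ordered_values)
```

With the notebook's data, `order_counts(sentiment_counts.to_dict())` would give two lists that can be passed to `ax1.pie` and `ax2.bar` in place of the index and values.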

## 💾 Download Results

In [None]:
# Save results
output_file = f'korean_sentiment_results_{pd.Timestamp.now().strftime("%Y%m%d_%H%M%S")}.xlsx'
results_df.to_excel(output_file, index=False)
files.download(output_file)
print(f"✅ Results saved to: {output_file}")
print("📥 File download started!")

## 🎉 Summary

✅ **Korean sentiment analysis completed successfully!**

### Model Used:
- **snunlp/KR-FinBert-SC** (Primary model, fine-tuned on financial text)

### Alternative Models Available:
- **beomi/KcELECTRA-base-v2022** - Base checkpoint pretrained on Korean online comments; needs fine-tuning for sentiment
- **klue/roberta-base** - General purpose Korean base checkpoint; needs fine-tuning for sentiment

### To Try Different Models:
1. Go to the 'Load Korean Sentiment Analysis Model' cell above
2. Comment out the current `model_name` line
3. Uncomment one of the alternative models
4. Re-run that cell and every cell after it

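### Results:
- **Total texts analyzed**: The row count printed when the data was loaded
- **Sentiment distribution**: The `value_counts()` output and the charts above
- **Average confidence**: Not printed by the cells above; compute it with `results_df['confidence'].mean()`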
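
The three figures listed above can be gathered in one place. This is a minimal sketch that works on a list of records shaped like the notebook's `results` list; `summarize` is a hypothetical helper, and leaving `error` rows out of the average is an assumption about what is useful, not something the notebook does.

```python
from collections import Counter
from statistics import mean

def summarize(results):
    """Summarize a list of {'sentiment': str, 'confidence': float} records."""
    # Leave failed rows out of the confidence average
    scored = [r for r in results if r["sentiment"] != "error"]
    return {
        "total": len(results),
        "distribution": dict(Counter(r["sentiment"] for r in results)),
        "average_confidence": mean(r["confidence"] for r in scored) if scored else 0.0,
    }

# Made-up records in the same shape as the notebook's `results` list
sample = [
    {"text": "좋아요", "sentiment": "positive", "confidence": 0.9},
    {"text": "별로예요", "sentiment": "negative", "confidence": 0.7},
    {"text": "", "sentiment": "error", "confidence": 0.0},
]
summary = summarize(sample)
print(summary)
```

Note that the analysis loop records empty rows as `neutral` with confidence 0.0, so they would lower the average unless they are filtered out as well.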

---

**💡 Tip**: Different fine-tuned models can label the same text differently. Compare candidates on a hand-labeled sample of your own data before relying on one.