# 🚀 Sentiment Analysis Training on Google Colab

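This notebook trains sentiment analysis models on Google Colab using a T4 GPU.

**Steps:**
1. Check GPU availability
2. Mount Google Drive (optional)
3. Clone your GitHub repository
4. Install dependencies
5. Verify data files
6. Train models
7. View results
8. Download results
9. Save results to Google Drive (optional)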

---


## 1️⃣ Check GPU


In [None]:
import torch

print("🔍 Checking GPU availability...")
if torch.cuda.is_available():
    print(f"✅ GPU is available: {torch.cuda.get_device_name(0)}")
    print(f"💾 GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
    print(f"🔢 Number of GPUs: {torch.cuda.device_count()}")
else:
    print("❌ No GPU found. Make sure you've selected GPU runtime:")
    print("   Runtime → Change runtime type → Hardware accelerator → GPU (T4)")
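
The check above relies on PyTorch. As a rough alternative that works before any dependencies are installed, the sketch below queries the `nvidia-smi` command-line tool instead. The helper names `parse_gpu_lines` and `list_gpus` are illustrative, and the parser assumes each output line has the form `name, memory`.

```python
import shutil
import subprocess


def parse_gpu_lines(output: str) -> list[tuple[str, int]]:
    """Parse 'name, memory' CSV lines into (name, memory_in_MiB) pairs."""
    gpus = []
    for line in output.strip().splitlines():
        # Split on the last comma so GPU names containing commas stay intact
        name, memory = (part.strip() for part in line.rsplit(",", 1))
        gpus.append((name, int(memory)))
    return gpus


def list_gpus() -> list[tuple[str, int]]:
    """Return visible NVIDIA GPUs, or an empty list if nvidia-smi is absent or fails."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=False,
    )
    return parse_gpu_lines(result.stdout) if result.returncode == 0 else []


print(list_gpus() or "No NVIDIA GPU visible to this runtime")
```

An empty list usually means the runtime type is still set to CPU.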


## 2️⃣ Mount Google Drive (Optional)

Mount Google Drive to save results persistently.


In [None]:
from google.colab import drive

# Uncomment the line below if you want to save results to Google Drive
# drive.mount('/content/drive')
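
Because mounting is optional, later cells cannot assume the Drive folder exists. One possible way to handle both cases is sketched below: it picks the Drive folder when it is mounted and falls back to local storage otherwise. The function name `resolve_output_root` and the `/content` fallback are assumptions for illustration.

```python
from pathlib import Path


def resolve_output_root(drive_root: str = "/content/drive/MyDrive",
                        fallback: str = "/content") -> Path:
    """Return the Drive folder if it is mounted, else the local fallback."""
    drive = Path(drive_root)
    return drive if drive.is_dir() else Path(fallback)


print(f"Results will be written under: {resolve_output_root()}")
```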


## 3️⃣ Clone GitHub Repository

**⚠️ IMPORTANT:** Replace `YOUR_USERNAME` and `YOUR_REPO_NAME` with your actual GitHub username and repository name.


In [None]:
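import os

# 📝 REPLACE THESE WITH YOUR ACTUAL VALUES
GITHUB_USERNAME = "YOUR_USERNAME"  # e.g., "nguyenvana"
REPO_NAME = "YOUR_REPO_NAME"       # e.g., "sentiment-analysis"

# Start from Colab's default working directory so that re-running this cell
# does not clone the repository inside an earlier clone
%cd /content

# Clone repository
repo_url = f"https://github.com/{GITHUB_USERNAME}/{REPO_NAME}.git"
print(f"📥 Cloning repository from {repo_url}...")

# Remove existing directory if it exists
if os.path.exists(REPO_NAME):
    !rm -rf {REPO_NAME}

!git clone {repo_url}

# Enter the repository and report success only if the clone actually worked
if os.path.isdir(REPO_NAME):
    %cd {REPO_NAME}
    print("✅ Repository cloned successfully!")
    print(f"📂 Current directory: {os.getcwd()}")
else:
    print("❌ Clone failed. Check GITHUB_USERNAME and REPO_NAME, and that the repository is public.")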
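
A common mistake in this step is running the cell with the placeholder values still in place. The sketch below shows one possible guard that could run before the clone. `build_repo_url` is a hypothetical helper, not part of the repository, and the example arguments are the ones from the cell's comments.

```python
def build_repo_url(username: str, repo: str) -> str:
    """Build the HTTPS clone URL, refusing unedited placeholder values."""
    placeholders = {"YOUR_USERNAME", "YOUR_REPO_NAME", ""}
    if username.strip() in placeholders or repo.strip() in placeholders:
        raise ValueError("Set GITHUB_USERNAME and REPO_NAME before cloning.")
    return f"https://github.com/{username.strip()}/{repo.strip()}.git"


print(build_repo_url("nguyenvana", "sentiment-analysis"))
```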


## 4️⃣ Install Dependencies


In [None]:
print("📦 Installing dependencies...")
%pip install -q -r requirements.txt
print("✅ Dependencies installed successfully!")
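
The success message above prints even if `pip` reported errors. As a minimal sketch of a follow-up check, the code below reads package names from requirements text and asks `importlib.metadata` which of them are missing. It handles only simple `name` or `name==version` lines, and the sample requirements are invented for illustration.

```python
import re
from importlib import metadata


def requirement_names(text: str) -> list[str]:
    """Extract package names from requirements text, skipping comments and options."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names: list[str]) -> list[str]:
    """Return the names that importlib.metadata cannot find in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Invented sample content; in the notebook, read requirements.txt instead
sample = "torch>=2.0\n# comment\ntransformers==4.40.0\n-r other.txt\n"
print(requirement_names(sample))
```

In the notebook, `missing_packages(requirement_names(open("requirements.txt").read()))` would list anything that failed to install.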


## 5️⃣ Verify Data Files

Make sure your processed data files are in the repository.


In [None]:
import os

print("🔍 Checking data files...")
data_files = [
    'data/processed/train_processed.csv',
    'data/processed/val_processed.csv',
    'data/processed/test_processed.csv'
]

all_exist = True
for file in data_files:
    if os.path.exists(file):
        print(f"✅ {file} found")
    else:
        print(f"❌ {file} NOT FOUND")
        all_exist = False

if all_exist:
    print("\n✅ All data files are ready!")
else:
    print("\n⚠️ Some data files are missing. Make sure you've pushed them to GitHub.")
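
The check above confirms only that the files exist. A slightly deeper check, sketched below with the standard `csv` module, also reports each file's header and row count so that empty or truncated files are caught before training. The helper name `summarize_csv` is an assumption, and the column names depend on your own preprocessing.

```python
import csv
from pathlib import Path


def summarize_csv(path: str) -> tuple[list[str], int]:
    """Return (header columns, number of data rows) for a CSV file."""
    with Path(path).open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        row_count = sum(1 for _ in reader)
    return header, row_count


for name in ("train", "val", "test"):
    csv_path = f"data/processed/{name}_processed.csv"
    if Path(csv_path).exists():
        columns, rows = summarize_csv(csv_path)
        print(f"{csv_path}: {rows} rows, columns={columns}")
```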


## 6️⃣ Train Models

Option A runs as written. To train a subset instead, comment out the Option A cell and uncomment the command in Option B, C, or D.


### Option A: Train All Models (~45 minutes)


In [None]:
!python train.py --model all --epochs 3 --batch-size 16


### Option B: Train Only PhoBERT (~20 minutes)


In [None]:
# Uncomment to run
# !python train.py --model phobert --epochs 3 --batch-size 16


### Option C: Train Only XLM-RoBERTa (~25 minutes)


In [None]:
# Uncomment to run
# !python train.py --model xlm-roberta --epochs 3 --batch-size 16


### Option D: Train Only Baseline Models (~2 minutes)


In [None]:
# Uncomment to run
# !python train.py --model lr
# !python train.py --model svm
# !python train.py --model nb
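
The cells above pass `--model`, `--epochs`, and `--batch-size` to `train.py`, whose source is not shown in this notebook. The sketch below shows how such a command line could be parsed with `argparse`. The defaults and the `build_parser` name are assumptions, not the repository's actual code.

```python
import argparse

# Model names taken from the commands in Options A to D
MODEL_CHOICES = ["all", "phobert", "xlm-roberta", "lr", "svm", "nb"]


def build_parser() -> argparse.ArgumentParser:
    """Mirror the flags used in the training cells; the defaults are assumptions."""
    parser = argparse.ArgumentParser(description="Train sentiment models")
    parser.add_argument("--model", choices=MODEL_CHOICES, default="all")
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--batch-size", type=int, default=16)
    return parser


args = build_parser().parse_args(["--model", "phobert", "--epochs", "3"])
print(args.model, args.epochs, args.batch_size)
```

Note that `argparse` exposes `--batch-size` as `args.batch_size`, with the hyphen turned into an underscore.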


## 7️⃣ View Results


In [None]:
import pandas as pd
from IPython.display import Image, display

# Load comparison results
print("📊 Model Comparison Results:\n")
df = pd.read_csv('results/comparison.csv')
display(df)

# Display comparison visualization
print("\n📈 Visualization:")
display(Image('results/model_comparison.png'))
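
To pull the top-scoring model out of the comparison table programmatically, something like the sketch below could be used. The layout of `results/comparison.csv` is not documented here, so the metric column is a parameter, and the `f1` name in the usage comment is purely hypothetical.

```python
import csv


def best_row(csv_path: str, metric: str) -> dict[str, str]:
    """Return the row with the highest numeric value in the `metric` column."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = list(csv.DictReader(handle))
    if not rows or metric not in rows[0]:
        raise ValueError(f"No rows or no '{metric}' column in {csv_path}")
    return max(rows, key=lambda row: float(row[metric]))


# Hypothetical usage; replace "f1" with a column that exists in your file:
# print(best_row("results/comparison.csv", "f1"))
```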


## 8️⃣ Download Results

Download trained models and results to your local machine.


In [None]:
from google.colab import files
import shutil

print("📦 Creating archive of results...")

# Create zip of models directory
shutil.make_archive('trained_models', 'zip', 'models')
print("✅ Models archived")

# Create zip of results directory
shutil.make_archive('training_results', 'zip', 'results')
print("✅ Results archived")

# Download files
print("\n📥 Downloading files...")
files.download('trained_models.zip')
files.download('training_results.zip')

print("\n✅ Download complete! Check your Downloads folder.")
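
The archive step assumes that both `models` and `results` exist and contain files, which may not hold if only some models were trained. The sketch below shows one way to zip a folder only when it has content. `archive_if_present` is an illustrative helper name.

```python
import shutil
from pathlib import Path
from typing import Optional


def archive_if_present(source_dir: str, archive_name: str) -> Optional[str]:
    """Zip source_dir into <archive_name>.zip; return None if the folder is missing or empty."""
    source = Path(source_dir)
    if not source.is_dir() or not any(source.iterdir()):
        return None
    # make_archive returns the path of the archive it created
    return shutil.make_archive(archive_name, "zip", root_dir=source_dir)


for folder, name in (("models", "trained_models"), ("results", "training_results")):
    print(folder, "->", archive_if_present(folder, name))
```

Each non-`None` return value is a path that can be passed to `files.download`.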


## 9️⃣ Save to Google Drive (Optional)

If you mounted Google Drive earlier, you can copy results there for persistent storage.


In [None]:
# Uncomment if you mounted Google Drive
# import shutil
# import os

# # Create directory in Drive
# drive_dir = '/content/drive/MyDrive/sentiment_analysis_results'
# os.makedirs(drive_dir, exist_ok=True)

# # Copy results
# shutil.copytree('models', f'{drive_dir}/models', dirs_exist_ok=True)
# shutil.copytree('results', f'{drive_dir}/results', dirs_exist_ok=True)

# print(f"✅ Results saved to Google Drive: {drive_dir}")
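
Copying into the same Drive folder on every run merges new files into old ones. If you would rather keep each run separate, one option is a timestamped subfolder, as in the sketch below. The `copy_run` helper and the `run_YYYYMMDD_HHMMSS` naming scheme are assumptions for illustration.

```python
import shutil
import time
from pathlib import Path


def copy_run(sources: list[str], dest_root: str) -> Path:
    """Copy each existing source folder into a new timestamped folder under dest_root."""
    run_dir = Path(dest_root) / time.strftime("run_%Y%m%d_%H%M%S")
    run_dir.mkdir(parents=True, exist_ok=True)
    for source in sources:
        # Skip folders that were never created, e.g. when training was partial
        if Path(source).is_dir():
            shutil.copytree(source, run_dir / Path(source).name, dirs_exist_ok=True)
    return run_dir


# Example call after mounting Drive:
# copy_run(["models", "results"], "/content/drive/MyDrive/sentiment_analysis_results")
```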


## 🎉 Done!

Your models have been trained successfully. Check the downloaded files for:
- Trained model weights
- Performance metrics (JSON and CSV)
- Confusion matrices
- Comparison visualizations

---

### 📝 Notes:
- Training times are estimates and may vary based on Colab's GPU allocation
- If training is interrupted, you may be able to resume from the last saved checkpoint, provided `train.py` writes checkpoints
- For longer training sessions, consider using Colab Pro for better GPU access
- Always save results to Google Drive or download them before closing the notebook
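
Whether an interrupted run can be resumed depends on how `train.py` stores checkpoints, which this notebook does not show. Assuming checkpoints are written as files into some folder, the sketch below finds the most recently modified one. The `*.pt` pattern and the `models` folder are guesses and should be adjusted to your repository.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(directory: str, pattern: str = "*.pt") -> Optional[Path]:
    """Return the most recently modified file matching pattern, or None."""
    folder = Path(directory)
    if not folder.is_dir():
        return None
    candidates = [path for path in folder.glob(pattern) if path.is_file()]
    return max(candidates, key=lambda path: path.stat().st_mtime, default=None)


print(latest_checkpoint("models"))
```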
