# DESS Training for DimABSA 2026

**Subtask 2**: Dimensional Aspect Sentiment Triplet Extraction

**Model**: DESS (Dual-channel Enhanced Sentiment Span) with DeBERTa-v3-base

---

## Setup Requirements
- **GPU**: T4 or P100 (enable in Settings → Accelerator)
- **Time**: ~4-5 hours for full training
- **Internet**: Required for downloading code
- **Memory**: ~10.7 GB at batch_size=1 (fits within the 15-16 GB of a T4 or P100)

## Step 1: Clone Repository and Setup

In [None]:
# Clone the repository
!git clone https://github.com/VishalRepos/dimabsa-2026.git
%cd dimabsa-2026/DESS/Codebase

# Install dependencies
!pip install -q transformers torch sentencepiece protobuf spacy torch-geometric tensorboardX
!python -m spacy download en_core_web_sm

print("✓ Setup complete!")

## Step 2: Verify GPU and Data

In [None]:
import torch
import json
import os

# Check GPU
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    print("⚠️ WARNING: GPU not available! Enable GPU in Settings → Accelerator → GPU T4")

# Check data
print("\nChecking data...")
data_path = "./data/dimabsa_combined/train_dep_triple_polarity_result.json"
if os.path.exists(data_path):
    with open(data_path) as f:
        data = json.load(f)
    print(f"✓ Training data found: {len(data)} samples")
    print(f"  Max entities: {max(len(d['entities']) for d in data)}")
    print(f"  Avg entities: {sum(len(d['entities']) for d in data) / len(data):.2f}")
else:
    print(f"❌ Data not found at {data_path}")
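
The statistics in the cell above raise a `ValueError` if the JSON file loads as an empty list, because `max()` is called on an empty sequence. A minimal sketch of a guarded version follows. The helper name `entity_stats` and the demo samples are hypothetical; the only thing taken from the notebook is the assumption that each sample is a dict with an `entities` list.

```python
def entity_stats(samples):
    """Return (sample count, max entities, mean entities) for a list of samples.

    Assumes each sample is a dict with an "entities" list, as in the cell above.
    An empty input yields zeros instead of raising.
    """
    counts = [len(sample.get("entities", [])) for sample in samples]
    if not counts:
        return 0, 0, 0.0
    return len(counts), max(counts), sum(counts) / len(counts)


# Made-up samples, only to show the call; they are not taken from the dataset.
demo = [{"entities": ["a", "b", "c"]}, {"entities": ["a"]}, {"entities": []}]
total, max_entities, avg_entities = entity_stats(demo)
print(f"{total} samples, max {max_entities}, avg {avg_entities:.2f}")
```

The same helper could be reused in the summary cell in Step 7, which repeats these calculations.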

## Step 3: Train DESS Model

**Configuration**:
- Dataset: dimabsa_combined (3,727 samples)
- Model: DeBERTa-v3-base
- Batch size: 1 (memory safe)
- Epochs: 10
- Memory: ~10.7 GB

In [None]:
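# Clear GPU memory: collect unreachable Python objects first so their
# CUDA tensors are freed, then release PyTorch's cached blocks
import torch
import gc
gc.collect()
torch.cuda.empty_cache()
print("✓ GPU memory cleared")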

# Train DESS
!python train.py \
  --dataset dimabsa_combined \
  --pretrained_deberta_name microsoft/deberta-v3-base \
  --train_batch_size 1 \
  --eval_batch_size 1 \
  --epochs 10 \
  --lr 5e-5 \
  --max_span_size 10 \
  --neg_entity_count 100 \
  --neg_triple_count 100

## Step 4: Check Training Results

In [None]:
import os
import json

# Check saved models
print("Saved models:")
if os.path.exists("./savemodels"):
    models = [f for f in os.listdir("./savemodels") if f.endswith(".pth")]
    for model in models:
        size = os.path.getsize(f"./savemodels/{model}") / 1e6
        print(f"  {model}: {size:.1f} MB")
else:
    print("  No models found")

# Check logs
print("\nLog files:")
if os.path.exists("./log"):
    for root, dirs, files in os.walk("./log"):
        for file in files:
            if file.endswith(".txt") or file.endswith(".json"):
                print(f"  {os.path.join(root, file)}")
else:
    print("  No logs found")
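
Step 5 expects a specific checkpoint file, so it can help to see which checkpoints exist and which one is newest before running prediction. A minimal sketch, assuming checkpoints are `.pth` files directly inside `./savemodels` as in the cell above; the helper name `list_checkpoints` is hypothetical and is not part of the DESS codebase.

```python
from pathlib import Path


def list_checkpoints(model_dir="./savemodels", suffix=".pth"):
    """Return checkpoint paths in model_dir, newest first by modification time."""
    root = Path(model_dir)
    if not root.is_dir():
        return []
    files = [p for p in root.iterdir() if p.is_file() and p.suffix == suffix]
    return sorted(files, key=lambda p: p.stat().st_mtime, reverse=True)


checkpoints = list_checkpoints()
if checkpoints:
    print(f"Newest checkpoint: {checkpoints[0]}")
else:
    print("No checkpoints found")
```

Modification time only says which file was written last; it does not say which checkpoint scored best on the validation set.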

## Step 5: Generate Predictions (Optional)

In [None]:
# Generate predictions on test set
!python predict.py \
  --dataset dimabsa_combined \
  --pretrained_deberta_name microsoft/deberta-v3-base \
  --model_path ./savemodels/best_model.pth \
  --eval_batch_size 1

## Step 6: Package Results

In [None]:
# Create results directory
!mkdir -p results

# Copy outputs
!cp -r savemodels/*.pth results/ 2>/dev/null || echo "No models to copy"
!cp -r log/* results/ 2>/dev/null || echo "No logs to copy"

# Create zip
!zip -r dess_results.zip results/

print("\n✅ Results packaged!")
print("\n📦 Download: dess_results.zip")
print("\nContents:")
!ls -lh results/
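
The shell commands above silence copy errors, so an empty archive can go unnoticed. As an alternative, the packaging can be sketched in pure Python with the standard `zipfile` module, which makes it easy to list what ended up in the archive. The helper name `zip_directory` is hypothetical; the directory and archive names mirror the cell above.

```python
import zipfile
from pathlib import Path


def zip_directory(src_dir, zip_path):
    """Zip every file under src_dir and return the names stored in the archive."""
    src = Path(src_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src.rglob("*")):
            if path.is_file():
                # Paths are stored relative to the parent directory, so the
                # archive keeps the top-level folder name (e.g. "results/").
                zf.write(path, path.relative_to(src.parent).as_posix())
    # Re-open the archive to report what was actually written.
    with zipfile.ZipFile(zip_path) as zf:
        return zf.namelist()


if Path("results").is_dir():
    archived = zip_directory("results", "dess_results.zip")
    print(f"{len(archived)} files archived")
```

Keep the archive outside the directory being zipped, otherwise the partially written zip file would be picked up by the directory walk.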

## Step 7: Training Summary

In [None]:
import json
import os

print("=" * 70)
print("DESS TRAINING SUMMARY")
print("=" * 70)

# Dataset info
data_path = "./data/dimabsa_combined/train_dep_triple_polarity_result.json"
if os.path.exists(data_path):
    with open(data_path) as f:
        data = json.load(f)
    print("\n📊 Dataset:")
    print(f"  Total samples: {len(data)}")
    print(f"  Max entities: {max(len(d['entities']) for d in data)}")
    print(f"  Avg entities: {sum(len(d['entities']) for d in data) / len(data):.2f}")

# Model info
print("\n🤖 Model:")
print("  Architecture: DESS (Dual-channel Enhanced Sentiment Span)")
print("  Backbone: DeBERTa-v3-base")
print("  Batch size: 1")
print("  Epochs: 10")

# Check if training completed
if os.path.exists("./savemodels"):
    models = [f for f in os.listdir("./savemodels") if f.endswith(".pth")]
    if models:
        print("\n✅ Training Complete!")
        print(f"  Models saved: {len(models)}")
    else:
        print("\n⚠️ No models saved")
else:
    print("\n⚠️ Training may not have completed")

print("\n📥 Download: dess_results.zip (from Output tab)")
print("=" * 70)

---

## Notes

### Memory Usage
- DESS uses **1 forward pass** (vs 6 in Pipeline-DeBERTa)
- Memory: ~10.7 GB with batch_size=1
- Safe for T4 (15GB) and P100 (16GB)
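
The headroom implied by these figures can be worked out directly. The sketch below uses only the numbers quoted in this notebook (10.7 GB peak, 15 GB for the T4, 16 GB for the P100); the function name `memory_headroom_gb` is hypothetical, and the actual peak on a given run may differ.

```python
def memory_headroom_gb(gpu_total_gb, model_peak_gb=10.7):
    """Return the GPU memory (GB) left over after the quoted training peak."""
    return gpu_total_gb - model_peak_gb


# GPU sizes as quoted in this notebook, not measured values.
for name, total_gb in [("T4", 15.0), ("P100", 16.0)]:
    headroom = memory_headroom_gb(total_gb)
    used_fraction = 10.7 / total_gb
    print(f"{name}: {headroom:.1f} GB headroom ({used_fraction:.0%} of memory used)")
```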

### Advantages over Pipeline-DeBERTa
- ✓ Uses full dataset (no filtering needed)
- ✓ More efficient (1 pass vs 6)
- ✓ Dual-channel architecture (syntax + semantics)
- ✓ Span-based extraction

### Training Time
- ~4-5 hours for 10 epochs
- 3,727 samples with batch_size=1
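
These figures imply roughly 0.39 to 0.48 seconds per training step. The sketch below shows the arithmetic, assuming one optimizer step per batch and ignoring any time spent on evaluation or checkpointing (an assumption, since the notebook does not break the total down). The function names are hypothetical.

```python
def total_steps(samples=3727, epochs=10, batch_size=1):
    """Number of training steps, with the last partial batch counted as a step."""
    steps_per_epoch = -(-samples // batch_size)  # ceiling division
    return epochs * steps_per_epoch


def seconds_per_step(total_hours, samples=3727, epochs=10, batch_size=1):
    """Average wall-clock seconds per step for a run lasting total_hours."""
    return total_hours * 3600 / total_steps(samples, epochs, batch_size)


print(f"Total steps: {total_steps()}")
for hours in (4, 5):
    print(f"{hours} h run: {seconds_per_step(hours):.2f} s/step")
```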

### Troubleshooting
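- If OOM: batch size is already at the minimum (1), so try lowering `--max_span_size`, `--neg_entity_count`, or `--neg_triple_count`, which should reduce the number of candidates processed per step
- If slow: expected with batch_size=1
- If errors: check that the GPU is enabled in Settings → Accelerator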
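
If an out-of-memory error does occur, one option is to rerun training with smaller span and negative-sampling settings. The sketch below only builds the command string from the flags already used in Step 3. The helper name `build_train_command` and the reduced values (8, 50, 50) are hypothetical, and whether they are enough to avoid OOM, or how they affect accuracy, is untested here.

```python
import shlex


def build_train_command(max_span_size=10, neg_entity_count=100, neg_triple_count=100):
    """Build the Step 3 training command with adjustable memory-related flags."""
    args = [
        "python", "train.py",
        "--dataset", "dimabsa_combined",
        "--pretrained_deberta_name", "microsoft/deberta-v3-base",
        "--train_batch_size", "1",
        "--eval_batch_size", "1",
        "--epochs", "10",
        "--lr", "5e-5",
        "--max_span_size", str(max_span_size),
        "--neg_entity_count", str(neg_entity_count),
        "--neg_triple_count", str(neg_triple_count),
    ]
    return shlex.join(args)  # requires Python 3.8+


# Hypothetical reduced settings for a retry after an OOM error.
print(build_train_command(max_span_size=8, neg_entity_count=50, neg_triple_count=50))
```

In a notebook cell, the printed command can be pasted after a leading `!` to run it.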