# Pipeline-DeBERTa Training for DimABSA 2026

**Subtask 2**: Dimensional Aspect Sentiment Triplet Extraction

**Model**: Pipeline-based with DeBERTa-v3-base

---

## Setup Requirements
- **GPU**: T4 or P100 (enable in Settings → Accelerator)
- **Time**: ~2-3 hours for both domains
- **Internet**: Required for downloading code and data

## Step 1: Clone Repository and Setup

In [None]:
# Clone the repository
!git clone https://github.com/VishalRepos/dimabsa-2026.git
%cd dimabsa-2026/Pipeline-DeBERTa

# Install dependencies
!pip install -q transformers torch sentencepiece protobuf

print("✓ Setup complete!")

## Step 2: Download Dataset

In [None]:
# Create data directory
!mkdir -p data/track_a/subtask_2/eng

# Download restaurant data
!wget -q https://raw.githubusercontent.com/DimABSA/DimABSA2026/main/task-dataset/track_a/subtask_2/eng/eng_restaurant_train_alltasks.jsonl \
    -O data/track_a/subtask_2/eng/eng_restaurant_train_alltasks.jsonl
!wget -q https://raw.githubusercontent.com/DimABSA/DimABSA2026/main/task-dataset/track_a/subtask_2/eng/eng_restaurant_dev_task2.jsonl \
    -O data/track_a/subtask_2/eng/eng_restaurant_dev_task2.jsonl

# Download laptop data
!wget -q https://raw.githubusercontent.com/DimABSA/DimABSA2026/main/task-dataset/track_a/subtask_2/eng/eng_laptop_train_alltasks.jsonl \
    -O data/track_a/subtask_2/eng/eng_laptop_train_alltasks.jsonl
!wget -q https://raw.githubusercontent.com/DimABSA/DimABSA2026/main/task-dataset/track_a/subtask_2/eng/eng_laptop_dev_task2.jsonl \
    -O data/track_a/subtask_2/eng/eng_laptop_dev_task2.jsonl

print("✓ Dataset downloaded")
!ls -lh data/track_a/subtask_2/eng/
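
Before training, it can help to confirm that each downloaded file is non-empty and parses as JSONL. The sketch below is an optional check and not part of the repository; the helper name `peek_jsonl` is hypothetical. It reports only the record count and the keys of the first record, so it makes no assumption about the dataset schema.

```python
import json
from pathlib import Path


def peek_jsonl(path):
    """Return (record_count, sorted keys of the first record) for a JSONL file."""
    count = 0
    first_keys = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            record = json.loads(line)
            if count == 0:
                first_keys = sorted(record.keys())
            count += 1
    return count, first_keys


# Directory created in the download cell above.
data_dir = Path("data/track_a/subtask_2/eng")
for path in sorted(data_dir.glob("*.jsonl")):
    n, keys = peek_jsonl(path)
    print(f"{path.name}: {n} records, first-record keys: {keys}")
```

A count of zero, or a `json.JSONDecodeError`, most likely means the download failed quietly, since `wget -q` suppresses its own error output.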

## Step 3: Verify GPU

In [None]:
import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    print("⚠️ WARNING: GPU not available! Enable GPU in Settings → Accelerator → GPU T4")

## Step 4: Train Restaurant Domain

**Dataset**: 2,284 training samples

**Time**: ~30-45 minutes

In [None]:
!python run_task2\&3_trainer_multilingual.py \
  --task 2 \
  --domain res \
  --language eng \
  --data_path ./ \
  --train_data data/track_a/subtask_2/eng/eng_restaurant_train_alltasks.jsonl \
  --infer_data data/track_a/subtask_2/eng/eng_restaurant_dev_task2.jsonl \
  --bert_model_type microsoft/deberta-v3-base \
  --mode train \
  --epoch_num 3 \
  --batch_size 8 \
  --learning_rate 1e-3 \
  --tuning_bert_rate 1e-5 \
  --inference_beta 0.9

## Step 5: Check Restaurant Results

In [None]:
import json

# Load predictions
with open("tasks/subtask_2/pred_eng_restaurant.jsonl", 'r') as f:
    predictions = [json.loads(line) for line in f]

# Calculate statistics
total_triplets = sum(len(p['Triplet']) for p in predictions)

print(f"📊 Restaurant Results:")
print(f"  Total predictions: {len(predictions)}")
print(f"  Total triplets: {total_triplets}")
print(f"  Avg triplets/sample: {total_triplets/len(predictions):.2f}")

print(f"\n📝 First 3 predictions:")
for i, pred in enumerate(predictions[:3]):
    print(f"\n{i+1}. ID: {pred['ID']}")
    print(f"   Triplets: {len(pred['Triplet'])}")
    if pred['Triplet']:
        for t in pred['Triplet'][:2]:  # Show first 2 triplets
            print(f"   - Aspect: {t['Aspect']}, Opinion: {t['Opinion']}, VA: {t['VA']}")

## Step 6: Train Laptop Domain

**Dataset**: 4,076 training samples

**Time**: ~60-90 minutes

In [None]:
!python run_task2\&3_trainer_multilingual.py \
  --task 2 \
  --domain lap \
  --language eng \
  --data_path ./ \
  --train_data data/track_a/subtask_2/eng/eng_laptop_train_alltasks.jsonl \
  --infer_data data/track_a/subtask_2/eng/eng_laptop_dev_task2.jsonl \
  --bert_model_type microsoft/deberta-v3-base \
  --mode train \
  --epoch_num 3 \
  --batch_size 8 \
  --learning_rate 1e-3 \
  --tuning_bert_rate 1e-5 \
  --inference_beta 0.9

## Step 7: Check Laptop Results

In [None]:
# Load predictions
with open("tasks/subtask_2/pred_eng_laptop.jsonl", 'r') as f:
    predictions = [json.loads(line) for line in f]

# Calculate statistics
total_triplets = sum(len(p['Triplet']) for p in predictions)

print(f"💻 Laptop Results:")
print(f"  Total predictions: {len(predictions)}")
print(f"  Total triplets: {total_triplets}")
print(f"  Avg triplets/sample: {total_triplets/len(predictions):.2f}")

print(f"\n📝 First 3 predictions:")
for i, pred in enumerate(predictions[:3]):
    print(f"\n{i+1}. ID: {pred['ID']}")
    print(f"   Triplets: {len(pred['Triplet'])}")
    if pred['Triplet']:
        for t in pred['Triplet'][:2]:  # Show first 2 triplets
            print(f"   - Aspect: {t['Aspect']}, Opinion: {t['Opinion']}, VA: {t['VA']}")
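
Steps 5 and 7 run the same summary on two different files. A small helper, sketched below, would remove the duplication. `summarize_predictions` is a hypothetical name, not something the repository provides, and it relies only on the `Triplet` key that the cells above already read.

```python
import json
import os


def summarize_predictions(pred_file):
    """Return (num_predictions, total_triplets, avg_triplets_per_sample)."""
    with open(pred_file, "r", encoding="utf-8") as f:
        predictions = [json.loads(line) for line in f if line.strip()]
    total = sum(len(p["Triplet"]) for p in predictions)
    # Guard against an empty file so the average never divides by zero.
    avg = total / len(predictions) if predictions else 0.0
    return len(predictions), total, avg


for name in ("restaurant", "laptop"):
    path = f"tasks/subtask_2/pred_eng_{name}.jsonl"
    if os.path.exists(path):  # skip domains that have not been trained yet
        n, total, avg = summarize_predictions(path)
        print(f"{name}: {n} predictions, {total} triplets, {avg:.2f} per sample")
```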

## Step 8: Validate Output Format

In [None]:
def validate_predictions(pred_file, domain_name):
    """Validate prediction format"""
    with open(pred_file, 'r') as f:
        predictions = [json.loads(line) for line in f]
    
    errors = []
    for i, pred in enumerate(predictions, start=1):  # 1-based, so "Line i" matches the file line
        # Check required keys
        if 'ID' not in pred or 'Triplet' not in pred:
            errors.append(f"Line {i}: Missing required keys")
            continue
        
        # Check triplet format
        for j, triplet in enumerate(pred['Triplet']):
            if 'Aspect' not in triplet or 'Opinion' not in triplet or 'VA' not in triplet:
                errors.append(f"Line {i}, Triplet {j}: Missing keys")
                continue
            
            # Validate VA format
            va = triplet['VA']
            if '#' not in va:
                errors.append(f"Line {i}, Triplet {j}: VA missing '#'")
            else:
                try:
                    v, a = map(float, va.split('#'))
                    if not (1.0 <= v <= 9.0) or not (1.0 <= a <= 9.0):
                        errors.append(f"Line {i}, Triplet {j}: VA out of range [1,9]")
                except ValueError:  # non-numeric field, or more than one '#'
                    errors.append(f"Line {i}, Triplet {j}: Invalid VA format")
    
    if errors:
        print(f"❌ {domain_name}: Found {len(errors)} errors")
        for err in errors[:5]:
            print(f"  - {err}")
        return False
    else:
        print(f"✅ {domain_name}: All {len(predictions)} predictions valid!")
        return True

# Validate both domains
validate_predictions("tasks/subtask_2/pred_eng_restaurant.jsonl", "Restaurant")
validate_predictions("tasks/subtask_2/pred_eng_laptop.jsonl", "Laptop")
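
The validator above treats `VA` as a string of the form `valence#arousal`, with both values in [1, 9]. For downstream analysis it is often handier to have the two numbers directly. The following is a minimal sketch under that same assumption; `parse_va` is a hypothetical helper, and the sample value is invented for illustration rather than taken from the dataset.

```python
def parse_va(va):
    """Split a 'valence#arousal' string into two floats.

    Raises ValueError if the string does not hold exactly two numeric
    fields, or if either value falls outside [1.0, 9.0].
    """
    parts = va.split("#")
    if len(parts) != 2:
        raise ValueError(f"expected 'V#A', got {va!r}")
    valence, arousal = float(parts[0]), float(parts[1])
    for name, value in (("valence", valence), ("arousal", arousal)):
        if not 1.0 <= value <= 9.0:
            raise ValueError(f"{name} {value} outside [1, 9]")
    return valence, arousal


# Illustrative value only, not taken from the dataset.
print(parse_va("7.50#6.25"))  # (7.5, 6.25)
```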

## Step 9: Package Results for Download

In [None]:
# Create results directory
!mkdir -p results

# Copy all outputs
!cp model/*.pth results/ 2>/dev/null || echo "No models found"
!cp tasks/subtask_2/*.jsonl results/
!cp log/*.log results/ 2>/dev/null || echo "No logs found"

# Create zip file
!zip -r pipeline_deberta_results.zip results/

print("\n✅ Results packaged!")
print("\n📦 Download: pipeline_deberta_results.zip")
print("\nContents:")
!ls -lh results/
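
Because the `cp` commands above fall back to a message when no models or logs exist, the archive can end up smaller than expected. One optional way to check what it actually holds is Python's standard `zipfile` module, as sketched here; `list_zip` is a hypothetical helper name.

```python
import os
import zipfile


def list_zip(path):
    """Return the member names of a zip archive after a CRC integrity check."""
    with zipfile.ZipFile(path) as zf:
        bad = zf.testzip()  # name of the first corrupt member, or None
        if bad is not None:
            raise zipfile.BadZipFile(f"corrupt member: {bad}")
        return zf.namelist()


archive = "pipeline_deberta_results.zip"
if os.path.exists(archive):  # only inspect the archive once Step 9 has run
    for name in list_zip(archive):
        print(name)
```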

## Step 10: Training Summary

In [None]:
import json

print("=" * 70)
print("TRAINING SUMMARY")
print("=" * 70)

# Restaurant stats
with open("tasks/subtask_2/pred_eng_restaurant.jsonl", 'r') as f:
    res_preds = [json.loads(line) for line in f]
    res_triplets = sum(len(p['Triplet']) for p in res_preds)

print(f"\n📊 Restaurant Domain:")
print(f"  Predictions: {len(res_preds)}")
print(f"  Total triplets: {res_triplets}")
print(f"  Avg triplets/sample: {res_triplets/len(res_preds):.2f}")
print(f"  Model: model/task2_res_eng.pth")
print(f"  Output: tasks/subtask_2/pred_eng_restaurant.jsonl")

# Laptop stats
with open("tasks/subtask_2/pred_eng_laptop.jsonl", 'r') as f:
    lap_preds = [json.loads(line) for line in f]
    lap_triplets = sum(len(p['Triplet']) for p in lap_preds)

print(f"\n💻 Laptop Domain:")
print(f"  Predictions: {len(lap_preds)}")
print(f"  Total triplets: {lap_triplets}")
print(f"  Avg triplets/sample: {lap_triplets/len(lap_preds):.2f}")
print(f"  Model: model/task2_lap_eng.pth")
print(f"  Output: tasks/subtask_2/pred_eng_laptop.jsonl")

print(f"\n✅ Training Complete!")
print(f"\n📥 Download: pipeline_deberta_results.zip (from Output tab)")
print("=" * 70)