# Milestone 3 - Notebook 4: Execution Engine

## Objective

- Compile a single, unified DependencyMatcher covering all pattern types
- Apply the patterns with strict anchoring verification
- Generate predictions for the train and test sets

## Output

Predictions with explanations for the train and test sets, saved as JSON under `../data/predictions/`.

In [1]:
import json
import sys
from pathlib import Path
import spacy

sys.path.insert(0, str(Path.cwd().parent / 'src'))

from utils import preprocess_data
from execution_engine import (
    compile_dependency_matcher,
    apply_patterns_with_anchoring,
    analyze_pattern_usage
)

print('Imports successful!')

Successfully imported functions from Milestone 2: /Users/egeaydin/Github/TUW2025WS/Token13-tuw-nlp-ie-2025WS/milestone_2/rule_based
Imports successful!


In [2]:
nlp = spacy.load('en_core_web_lg')
print(f'Loaded: {nlp.meta["name"]}')

Loaded: core_web_lg


## 1. Load Data and Patterns

In [3]:
# Load patterns
with open('../data/patterns_augmented.json', 'r') as f:
    patterns = json.load(f)

print(f'Loaded {len(patterns)} patterns')

# Load and preprocess data
with open('../../data/processed/train/train.json', 'r') as f:
    train_data = json.load(f)

with open('../../data/processed/test/test.json', 'r') as f:
    test_data = json.load(f)

train_processed = preprocess_data(train_data, nlp)
test_processed = preprocess_data(test_data, nlp)

print(f'Train: {len(train_processed)} samples')
print(f'Test: {len(test_processed)} samples')

Loaded 1878 patterns


Train: 8000 samples
Test: 2717 samples


## 2. Compile DependencyMatcher

In [4]:
dep_matcher, pattern_lookup = compile_dependency_matcher(patterns, nlp)

Compiling 1878 patterns into DependencyMatcher...
Key: pattern_1842
Key: pattern_1843
Key: pattern_1844
Key: pattern_1845
Key: pattern_1846
Key: pattern_1847
Key: pattern_1848
Key: pattern_1849
Key: pattern_1850
Key: pattern_1851
Key: pattern_1852
Key: pattern_1853
Key: pattern_1854
Key: pattern_1855
Key: pattern_1856
Key: pattern_1857
Key: pattern_1858
Key: pattern_1859
Key: pattern_1860
Key: pattern_1861
Key: pattern_1862
Key: pattern_1863
Key: pattern_1864
Key: pattern_1865
Key: pattern_1866
Key: pattern_1867
Key: pattern_1868
Key: pattern_1869
Key: pattern_1870
Key: pattern_1871
Key: pattern_1872
Key: pattern_1873
Key: pattern_1874
Key: pattern_1875
Key: pattern_1876
Key: pattern_1877
Successfully compiled 1842 patterns
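
The log above lists 36 keys (`pattern_1842` to `pattern_1877`) and reports that 1842 of the 1878 loaded patterns compiled, but it does not say why the rest were left out. The sketch below shows one way such a compile step could be structured: each pattern is registered under a `pattern_<index>` key, and patterns that the matcher rejects are collected instead of aborting the run. This is not the project's `compile_dependency_matcher`. The `dep_pattern` field name, the `compile_patterns` helper, and the stand-in registrar are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

Pattern = Dict[str, Any]


def compile_patterns(
    patterns: List[Pattern],
    add_pattern: Callable[[str, List[List[Dict[str, Any]]]], None],
) -> Tuple[Dict[str, Pattern], List[str]]:
    """Register each pattern under a unique key and collect the keys that fail.

    `add_pattern` is any callable that takes (key, list_of_patterns) and raises
    if the pattern is malformed. The 'dep_pattern' field is a hypothetical name.
    """
    lookup: Dict[str, Pattern] = {}
    rejected: List[str] = []
    for index, pattern in enumerate(patterns):
        key = f"pattern_{index}"
        try:
            add_pattern(key, [pattern["dep_pattern"]])
        except (KeyError, ValueError):
            # Keep going: one bad pattern should not stop the whole compile step.
            rejected.append(key)
            continue
        lookup[key] = pattern
    return lookup, rejected


# Stand-in registrar so the sketch runs without spaCy: it rejects empty patterns.
registered: Dict[str, List[List[Dict[str, Any]]]] = {}


def fake_add(key: str, pats: List[List[Dict[str, Any]]]) -> None:
    if not pats[0]:
        raise ValueError("empty pattern")
    registered[key] = pats


demo_patterns = [
    {
        "relation": "Cause-Effect",  # illustrative label
        "dep_pattern": [{"RIGHT_ID": "anchor", "RIGHT_ATTRS": {"LEMMA": "cause"}}],
    },
    {"relation": "Other", "dep_pattern": []},  # deliberately malformed
]

lookup, rejected = compile_patterns(demo_patterns, fake_add)
print(len(lookup), rejected)  # prints: 1 ['pattern_1']
```

With spaCy, the bound method `DependencyMatcher(nlp.vocab).add` could be passed as `add_pattern`, since it also takes a key and a list of patterns. Returning the rejected keys alongside the lookup makes the gap between loaded and compiled patterns easy to inspect.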


## 3. Apply to Train Set

In [5]:
train_preds, train_dirs, train_expls, train_stats = apply_patterns_with_anchoring(
    train_processed, dep_matcher, pattern_lookup, nlp
)

Applying patterns to 8000 samples...

Classification complete!
  Matched: 7885 (98.6%)
  Default to Other: 115 (1.4%)
  Failed anchoring: 81439
  Match attempts: 89324 (avg 11.2/sample)
  Unique patterns used: 1145
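
The notebook does not show how anchoring verification works inside `apply_patterns_with_anchoring`. One plausible reading is that a dependency match counts only when the tokens matched by the pattern's entity slots fall inside the annotated entity spans, and that a sample takes the label of the first match that passes this check. The sketch below illustrates that idea in plain Python. The `is_anchored` and `classify` helpers, the `e1`/`e2` slot names, and the relation labels are all assumptions, not the project's code.

```python
from typing import Dict, List, Tuple

Span = Tuple[int, int]  # (start, end) token offsets, end exclusive


def is_anchored(match: Dict[str, int], e1: Span, e2: Span) -> bool:
    """True if both matched entity-slot tokens lie inside the entity spans."""
    in_e1 = e1[0] <= match["e1"] < e1[1]
    in_e2 = e2[0] <= match["e2"] < e2[1]
    return in_e1 and in_e2


def classify(
    candidates: List[Tuple[str, Dict[str, int]]],
    e1: Span,
    e2: Span,
    default: str = "Other",
) -> Tuple[str, int, int]:
    """Return (label, match_attempts, failed_anchoring) for one sample.

    Candidates are tried in order; the first anchored match decides the label.
    If none is anchored, the sample falls back to the default label.
    """
    attempts = 0
    failed = 0
    for label, match in candidates:
        attempts += 1
        if is_anchored(match, e1, e2):
            return label, attempts, failed
        failed += 1
    return default, attempts, failed


# Hypothetical sample: entity 1 is token 2, entity 2 spans tokens 6-7.
e1_span, e2_span = (2, 3), (6, 8)
candidates = [
    ("Cause-Effect", {"e1": 0, "e2": 5}),     # matches the tree, wrong tokens
    ("Component-Whole", {"e1": 2, "e2": 6}),  # anchored on both entities
]

result = classify(candidates, e1_span, e2_span)
print(result)  # prints: ('Component-Whole', 2, 1)

no_match = classify([], e1_span, e2_span)
print(no_match)  # prints: ('Other', 0, 0)
```

Stopping at the first anchored match would be consistent with the log above, where match attempts minus failed anchoring (89324 - 81439) equals the number of matched samples (7885).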


## 4. Apply to Test Set

In [6]:
test_preds, test_dirs, test_expls, test_stats = apply_patterns_with_anchoring(
    test_processed, dep_matcher, pattern_lookup, nlp
)

Applying patterns to 2717 samples...

Classification complete!
  Matched: 2682 (98.7%)
  Default to Other: 35 (1.3%)
  Failed anchoring: 31859
  Match attempts: 34541 (avg 12.7/sample)
  Unique patterns used: 552
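
The figures in the two classification logs can be cross-checked with a few lines of arithmetic. The sketch below assumes that every match attempt either fails anchoring or produces the sample's single accepted match, so that attempts equal matched plus failed. That relationship is inferred from the logged numbers, not taken from the engine's source, and `summarize` is a hypothetical helper.

```python
from typing import Dict, Union

Number = Union[int, float]


def summarize(n_samples: int, matched: int, failed_anchoring: int) -> Dict[str, Number]:
    """Recompute the logged statistics from three raw counts."""
    # Assumption: each attempt is either a failed anchoring or an accepted match.
    attempts = matched + failed_anchoring
    return {
        "matched_pct": round(100 * matched / n_samples, 1),
        "default_pct": round(100 * (n_samples - matched) / n_samples, 1),
        "attempts": attempts,
        "avg_attempts": round(attempts / n_samples, 1),
    }


train_summary = summarize(n_samples=8000, matched=7885, failed_anchoring=81439)
test_summary = summarize(n_samples=2717, matched=2682, failed_anchoring=31859)

print(train_summary)
# prints: {'matched_pct': 98.6, 'default_pct': 1.4, 'attempts': 89324, 'avg_attempts': 11.2}
print(test_summary)
# prints: {'matched_pct': 98.7, 'default_pct': 1.3, 'attempts': 34541, 'avg_attempts': 12.7}
```

Both recomputed summaries agree with the logged values. Note that the matched percentage measures coverage (a pattern fired and anchored), not whether the predicted label is correct.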


## 5. Save Predictions

In [7]:
# Save predictions
Path('../data/predictions').mkdir(parents=True, exist_ok=True)

train_output = [
    {'id': s['id'], 'prediction': p, 'direction': d, 'explanation': e}
    for s, p, d, e in zip(train_processed, train_preds, train_dirs, train_expls)
]

test_output = [
    {'id': s['id'], 'prediction': p, 'direction': d, 'explanation': e}
    for s, p, d, e in zip(test_processed, test_preds, test_dirs, test_expls)
]

with open('../data/predictions/train_predictions.json', 'w') as f:
    json.dump(train_output, f, indent=2)

with open('../data/predictions/test_predictions.json', 'w') as f:
    json.dump(test_output, f, indent=2)

print('Predictions saved!')

Predictions saved!
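
Before moving on to evaluation, it can help to reload a saved file and confirm that every record has the four fields written above and that ids are unique. The sketch below is one way to do that. `check_predictions` is a hypothetical helper, and the demo records (including the direction and explanation strings) are invented for illustration, since the notebook does not show their actual format.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path

REQUIRED_KEYS = {"id", "prediction", "direction", "explanation"}


def check_predictions(path: Path) -> Counter:
    """Validate a predictions file and return the count of each predicted label."""
    with open(path, "r", encoding="utf-8") as f:
        records = json.load(f)

    for record in records:
        missing = REQUIRED_KEYS - record.keys()
        if missing:
            raise ValueError(f"record {record.get('id')} is missing {sorted(missing)}")

    ids = [record["id"] for record in records]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate sample ids in predictions file")

    return Counter(record["prediction"] for record in records)


# Demo on a temporary file with made-up records.
demo_records = [
    {"id": 1, "prediction": "Cause-Effect", "direction": "e1,e2", "explanation": "demo"},
    {"id": 2, "prediction": "Other", "direction": None, "explanation": "no match"},
    {"id": 3, "prediction": "Cause-Effect", "direction": "e2,e1", "explanation": "demo"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo_predictions.json"
    demo_path.write_text(json.dumps(demo_records, indent=2), encoding="utf-8")
    label_counts = check_predictions(demo_path)

print(label_counts.most_common())  # prints: [('Cause-Effect', 2), ('Other', 1)]
```

For the real files, the same function could be called on `../data/predictions/train_predictions.json` and `../data/predictions/test_predictions.json`.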


## Summary

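Of the 1878 loaded patterns, 1842 compiled into the DependencyMatcher. An anchored pattern match was found for 7885 of 8000 train samples (98.6%) and 2682 of 2717 test samples (98.7%); the remaining samples default to Other. These figures describe coverage, not accuracy.

**Next:** Notebook 5 - Evaluation & Analysis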