# 🐛 Debugging Workflow for Go-Eval Project

Use this notebook for iterative development and debugging.

**Workflow:**
1. Edit code on GitHub (or locally and push)
2. Run the "Pull Latest Changes" cell below
3. Run the script you are working on
4. Check the output for errors
5. Repeat

## 🔧 Setup (Run Once Per Session)

In [None]:
# Install dependencies
!pip install -q torch transformers datasets tree-sitter tree-sitter-go scipy scikit-learn matplotlib seaborn tqdm networkx

In [None]:
# Mount Google Drive (for persistence)
from google.colab import drive
drive.mount('/content/drive')

## 📥 Initial Clone (Run Only ONCE)

In [None]:
# Clone to Google Drive for persistence
%cd /content/drive/MyDrive
!git clone https://github.com/amiyilade/go-eval.git
%cd go-eval

print("✅ Repository cloned to Google Drive")
print("📁 Location: /content/drive/MyDrive/go-eval")
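
Running `git clone` a second time fails because the destination directory already exists and is not empty, which is why this cell is marked "Run Only ONCE". A guarded variant might look like the sketch below, which treats an existing `.git` directory as proof that the clone already happened. `clone_if_missing` is a hypothetical helper name.

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/amiyilade/go-eval.git"
REPO_DIR = Path("/content/drive/MyDrive/go-eval")


def clone_if_missing(url, dest):
    """Clone url into dest unless dest already holds a git checkout.

    Returns True if a clone was performed, False if it was skipped.
    """
    dest = Path(dest)
    if (dest / ".git").exists():
        return False
    subprocess.run(["git", "clone", url, str(dest)], check=True)
    return True


# In Colab, after mounting Drive:
# cloned = clone_if_missing(REPO_URL, REPO_DIR)
# print("cloned" if cloned else "already present, skipped")
```

With this guard the cell is safe to re-run at the start of every session.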

## 🔄 Pull Latest Changes (Run Every Time You Make Changes)

In [None]:
# Navigate to repo
%cd /content/drive/MyDrive/go-eval

# Discard uncommitted changes to tracked files (e.g. edits made in Colab)
!git reset --hard HEAD

# Pull latest from GitHub
!git pull origin main

# Show last 3 commits to verify
print("\n📝 Latest commits:")
!git log --oneline -3

print("\n✅ Code updated to latest version")
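
`git reset --hard HEAD` discards uncommitted edits to tracked files without asking. If you would like to see what is about to be lost first, one option is to parse the output of `git status --porcelain`, as sketched below. `parse_porcelain` and `local_changes` are hypothetical helper names, and the sample text is made up for illustration.

```python
import subprocess


def parse_porcelain(output):
    """Turn `git status --porcelain` output into (status, path) pairs.

    Each line is a two-character status code, a space, then the path.
    """
    entries = []
    for line in output.splitlines():
        if len(line) > 3:
            entries.append((line[:2], line[3:]))
    return entries


def local_changes(repo_dir):
    """List uncommitted changes in repo_dir (requires git on PATH)."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    )
    return parse_porcelain(result.stdout)


# Made-up sample: one modified tracked file, one untracked file.
sample = " M scripts/construct_analysis.py\n?? notes.txt\n"
for status, path in parse_porcelain(sample):
    # "??" marks untracked files, which `git reset --hard` leaves in place.
    print(f"{status!r} {path}")
```

In the notebook you could call `local_changes("/content/drive/MyDrive/go-eval")` before the reset and skip it when the list is empty.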

## 🧪 Test Individual Scripts

Run these cells to test specific scripts after making changes.

In [None]:
# Test: Download data (quick test: skips the download if the data already exists)
!python scripts/download_coir_go.py

In [None]:
# Test: Organize data
!python scripts/organise_coir_go_full.py

In [None]:
# Test: Parse ASTs (takes ~45 min on the full dataset)
# To check only that it starts, interrupt the cell (stop button or Runtime > Interrupt execution)
!python scripts/parse_go_asts_full.py

In [None]:
# Test: Extract model features - UniXcoder
!python scripts/extract_model_outputs_full.py --model unixcoder

In [None]:
# Test: Extract model features - CodeBERT
!python scripts/extract_model_outputs_full.py --model codebert

In [None]:
# Test: Attention-AST alignment analysis
!python scripts/analyze_attention_ast_full.py

In [None]:
# Test: Structural probing
!python scripts/structural_probing_full.py

In [None]:
# Test: Tree induction
!python scripts/tree_induction_full.py

In [None]:
# Test: Construct analysis
!python scripts/construct_analysis.py

In [None]:
# Test: Cross-model analysis
!python scripts/cross_model_analysis.py

In [None]:
# Test: Visualizations
!python scripts/visualizations_full.py

## 🔍 Debugging Helpers

In [None]:
# Check what files exist
!ls -lh data/
!ls -lh results/

In [None]:
# Check Python imports work
import torch
import transformers
import tree_sitter
import tree_sitter_go
import scipy
import sklearn
import matplotlib
import seaborn

print("✅ All imports successful")
print(f"PyTorch version: {torch.__version__}")
print(f"Transformers version: {transformers.__version__}")
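
The import check above stops at the first package that fails to import. To get the full list of missing packages in one pass, a sketch using the standard library's `importlib.util.find_spec` could look like this. `missing_modules` is a hypothetical helper name, and the module list is taken from the install cell (import names, not pip names).

```python
from importlib.util import find_spec

# Import names corresponding to the packages in the install cell.
REQUIRED = [
    "torch", "transformers", "datasets", "tree_sitter", "tree_sitter_go",
    "scipy", "sklearn", "matplotlib", "seaborn", "tqdm", "networkx",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Missing:", ", ".join(missing))
else:
    print("All required modules are installed")
```

`find_spec` only locates a module without executing it, so this catches missing installs but not errors raised while a package is being imported.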

In [None]:
# Check GPU availability
import torch

if torch.cuda.is_available():
    print(f"✅ GPU available: {torch.cuda.get_device_name(0)}")
    print(f"   Memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.2f} GB")
else:
    print("❌ No GPU available")
    print("   Go to Runtime > Change runtime type > Select T4 GPU")

In [None]:
# View error logs from specific script
# Replace 'script_name' with the script that's failing
!tail -50 results/logs/script_name.log
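
If you do not remember which script wrote the most recent log, a small helper can pick it for you. The sketch below assumes logs live in `results/logs/` with a `.log` extension, as in the cell above. `latest_log` and `tail_lines` are hypothetical helper names.

```python
from pathlib import Path


def latest_log(log_dir="results/logs"):
    """Return the most recently modified *.log file in log_dir, or None."""
    log_dir = Path(log_dir)
    if not log_dir.is_dir():
        return None
    logs = sorted(log_dir.glob("*.log"), key=lambda p: p.stat().st_mtime)
    return logs[-1] if logs else None


def tail_lines(path, n=50):
    """Return the last n lines of a text file."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return fh.read().splitlines()[-n:]


log = latest_log()
if log is None:
    print("No .log files found in results/logs")
else:
    print(f"--- {log} ---")
    print("\n".join(tail_lines(log)))
```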

## 🧹 Cleanup (If Needed)

In [None]:
# Delete all results (to start fresh)
# WARNING: This deletes all analysis results!
!rm -rf results/
print("🗑️ All results deleted")

In [None]:
# Delete data (to re-download)
# WARNING: This deletes all downloaded data!
!rm -rf data/
print("🗑️ All data deleted")

In [None]:
# Nuclear option: Delete repo and re-clone
# WARNING: This deletes EVERYTHING!
%cd /content/drive/MyDrive
!rm -rf go-eval
!git clone https://github.com/amiyilade/go-eval.git
%cd go-eval
print("🗑️ Repository deleted and re-cloned")

## 📊 Quick Status Check

In [None]:
# Check what's been completed
from pathlib import Path

checks = [
    ("Data downloaded", Path("data/raw/codesearchnet/consolidated.jsonl")),
    ("Data organized", Path("data/code-to-text/full_code_to_text.jsonl")),
    ("ASTs parsed", Path("data/code-to-text/full_code_to_text_with_asts.jsonl")),
    ("UniXcoder features", Path("results/model_outputs/unixcoder/code_to_text_full_unixcoder_features.jsonl")),
    ("CodeBERT features", Path("results/model_outputs/codebert/code_to_text_full_codebert_features.jsonl")),
    ("Alignment results", Path("results/ast_alignment/all_alignment_results.json")),
    ("Probing results", Path("results/structural_probing/all_probing_results.json")),
    ("Induction results", Path("results/tree_induction/all_tree_induction_results.json")),
    ("Construct results", Path("results/construct_analysis/all_construct_analysis.json")),
    ("Cross-model results", Path("results/cross_model_analysis/all_cross_model_analysis.json")),
    ("Visualizations", Path("results/visualizations/fig1_alignment_heatmap.png")),
]

print("Pipeline Status:")
print("=" * 50)
for name, path in checks:
    status = "✅" if path.exists() else "❌"
    print(f"{status} {name}")
    if path.exists():
        size = path.stat().st_size / (1024*1024)  # MB
        print(f"   Size: {size:.2f} MB")
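
The status check can be taken one step further to suggest which script to run next. The sketch below pairs the first three output files with the scripts from the test cells. That pairing is an assumption inferred from the order of the cells, not something the repository states, and `next_script` is a hypothetical helper name.

```python
from pathlib import Path

# Assumed pairing: (script, output file that script is expected to produce).
STAGES = [
    ("scripts/download_coir_go.py", "data/raw/codesearchnet/consolidated.jsonl"),
    ("scripts/organise_coir_go_full.py", "data/code-to-text/full_code_to_text.jsonl"),
    ("scripts/parse_go_asts_full.py", "data/code-to-text/full_code_to_text_with_asts.jsonl"),
]


def next_script(stages, root="."):
    """Return the script for the first stage whose output is missing, or None."""
    for script, output in stages:
        if not (Path(root) / output).exists():
            return script
    return None


todo = next_script(STAGES)
print(f"Next script to run: {todo}" if todo else "All listed stages have output")
```

Extending `STAGES` with the remaining entries from `checks` would cover the full pipeline, once you have confirmed which script writes which file.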