# MusicTrOCR (Luca Model) Demonstration

This notebook demonstrates the MusicTrOCR model for Optical Music Recognition (OMR). The model converts a sheet music image into BeKern notation; the notebook then converts that output to Kern and renders it as a score with Verovio.

## Architecture Overview
- **Vision Encoder**: Pre-trained ConvNeXt-Tiny backbone
- **Decoder**: 6-layer Transformer decoder with cross-attention
- **Output**: BeKern notation (symbolic music representation)
- **Visualization**: Verovio rendering engine for score display
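
To give a feel for the sequence lengths the decoder attends over, the sketch below computes the feature-map size produced by a ConvNeXt-Tiny backbone (a stride-4 stem followed by three stride-2 downsampling layers, ending in 768 channels). That the model uses the full backbone and flattens the 2-D feature map into a token sequence is an assumption; the helper names are hypothetical and are not part of `utils`.

```python
from typing import Tuple


def convnext_tiny_feature_shape(height: int, width: int) -> Tuple[int, int, int]:
    """Return (channels, feat_h, feat_w) after the ConvNeXt-Tiny feature stages.

    ConvNeXt-Tiny uses a 4x4 stride-4 stem followed by three 2x2 stride-2
    downsampling layers (overall stride 32) and ends with 768 channels.
    """
    h, w = height // 4, width // 4      # patchify stem
    for _ in range(3):                  # three downsampling layers
        h, w = h // 2, w // 2
    return 768, h, w


def encoder_sequence_length(height: int, width: int) -> int:
    """Number of encoder tokens if the 2-D map is flattened (assumed layout)."""
    _, h, w = convnext_tiny_feature_shape(height, width)
    return h * w


print(convnext_tiny_feature_shape(128, 1024))  # (768, 4, 32)
print(encoder_sequence_length(128, 1024))      # 128
```

For an input of height 128, the feature map is 4 rows high, so the sequence length grows linearly with the image width.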

---

## 1. Configuration
Set the paths for the demo image and trained model checkpoint.

In [None]:
# Demo configuration
img_name = "demos/demo.png"  # Input sheet music image
ckpt_path = "networks/checkpoints/luca_model/best_model.pth"  # Trained model checkpoint

# Optional: Override paths if needed
# img_name = "path/to/your/demo/image.png"
# ckpt_path = "path/to/your/checkpoint.pth"

print(f"Demo image: {img_name}")
print(f"Model checkpoint: {ckpt_path}")
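
Since later cells fail if either path is wrong, it can help to check both up front. The following is a minimal sketch using only the standard library; `missing_paths` is a hypothetical helper, not something provided by `utils`.

```python
from pathlib import Path


def missing_paths(**named_paths: str) -> dict:
    """Return {name: path} for every given path that does not point to a file."""
    return {name: p for name, p in named_paths.items() if not Path(p).is_file()}


# Check the two paths from the configuration cell.
problems = missing_paths(
    img_name="demos/demo.png",
    ckpt_path="networks/checkpoints/luca_model/best_model.pth",
)
for name, path in problems.items():
    print(f"Missing {name}: {path}")
```

An empty result means both files were found relative to the current working directory.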

## 2. Dependencies & Imports
Load required libraries and utility functions.

In [None]:
import torch
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
import warnings
import os

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

# Import demo utilities
from utils import (
    load_model_and_vocab,
    preprocess_image,
    run_inference,
    decode_bekern_prediction,
    bekern_to_kern,
    render_music_score,
    visualize_results,
    demo_pipeline
)

print("Dependencies loaded successfully!")
print(f"PyTorch version: {torch.__version__}")
print(f"Device available: {'CUDA' if torch.cuda.is_available() else 'CPU'}")

## 3. Model Loading
Load the trained MusicTrOCR model and BeKern vocabulary.

In [None]:
# Load model and vocabulary
print("Loading MusicTrOCR model...")

try:
    model, vocab_dict, id_to_token = load_model_and_vocab(
        ckpt_path=ckpt_path,
        vocab_path="data/FP_GrandStaff_BeKernw2i.npy"
    )
    
    print("\n✅ Model loaded successfully!")
    print(f"Model parameters: {model.count_parameters():,}")
    print(f"Vocabulary size: {len(vocab_dict)}")
    
    # Display some vocabulary examples
    print("\nSample vocabulary tokens:")
    sample_tokens = list(vocab_dict.keys())[:10]
    for token in sample_tokens:
        print(f"  {token} -> {vocab_dict[token]}")
        
except Exception as e:
    print(f"❌ Error loading model: {e}")
    print("Please check that the checkpoint path exists and is valid.")
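
`load_model_and_vocab` returns both a token-to-id mapping (`vocab_dict`) and its inverse (`id_to_token`). How the utility builds the inverse is not shown here; the sketch below illustrates one way to do it, assuming `vocab_dict` is a plain `dict` from token strings to integer ids. The toy tokens are illustrative and are not taken from the real vocabulary file.

```python
def invert_vocab(vocab_dict: dict) -> dict:
    """Build an id -> token mapping from a token -> id mapping.

    Raises ValueError if two tokens share an id, because the inverse
    would silently drop one of them.
    """
    id_to_token = {idx: token for token, idx in vocab_dict.items()}
    if len(id_to_token) != len(vocab_dict):
        raise ValueError("vocabulary contains duplicate ids")
    return id_to_token


# Toy vocabulary (hypothetical tokens).
toy_vocab = {"<pad>": 0, "<bos>": 1, "<eos>": 2, "4c": 3, "4e": 4}
toy_id_to_token = invert_vocab(toy_vocab)
print(toy_id_to_token[3])  # 4c
```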

## 4. Image Preprocessing
Load and preprocess the demo sheet music image for model input.

In [None]:
# Check if demo image exists
if not os.path.exists(img_name):
    print(f"❌ Demo image not found: {img_name}")
    print("Please place a sheet music image at the specified path.")
    print("You can use any sheet music image (PNG, JPG, etc.)")
else:
    # Load and display original image
    print(f"Loading demo image: {img_name}")
    
    # Display original image
    original_img = Image.open(img_name)
    plt.figure(figsize=(12, 6))
    plt.imshow(original_img, cmap='gray')
    plt.title(f"Original Input Image: {img_name}")
    plt.axis('off')
    plt.show()
    
    # Preprocess for model
    try:
        image_tensor = preprocess_image(img_name, target_height=128)
        print("\n✅ Image preprocessed successfully!")
    except Exception as e:
        print(f"❌ Error preprocessing image: {e}")
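
The `target_height=128` argument suggests that the image is rescaled to a fixed height. Whether `preprocess_image` preserves the aspect ratio is an assumption; if it does, the target size could be computed as in this sketch (`scaled_size` is a hypothetical helper).

```python
def scaled_size(width: int, height: int, target_height: int = 128) -> tuple:
    """Return (new_width, new_height) for a resize to target_height that
    keeps the aspect ratio. The width is rounded and kept at least 1."""
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    new_width = max(1, round(width * target_height / height))
    return new_width, target_height


print(scaled_size(2000, 500))   # (512, 128)
print(scaled_size(300, 1000))   # (38, 128)
```

With Pillow, the result can be passed directly to `Image.resize`, which takes a `(width, height)` tuple.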

## 5. Model Inference
Run the MusicTrOCR model to generate BeKern notation predictions.

In [None]:
# Run inference if model and image are loaded
if 'model' in locals() and 'image_tensor' in locals():
    print("Running MusicTrOCR inference...")
    
    try:
        # Generate predictions
        predictions = run_inference(model, image_tensor, max_length=512)
        
        print("\n✅ Inference completed!")
        print(f"Generated sequence shape: {predictions.shape}")
        
        # Display raw token predictions (first 20 tokens).
        # reshape(-1) also works when only a single token was generated.
        raw_tokens = predictions.reshape(-1).tolist()[:20]
        print(f"\nFirst 20 predicted tokens: {raw_tokens}")
        
    except Exception as e:
        print(f"❌ Error during inference: {e}")
else:
    print("❌ Cannot run inference: model or image not loaded properly.")
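
The `max_length` argument bounds an autoregressive generation loop. The internals of `run_inference` are not shown in this notebook, so the following is only a sketch of greedy decoding, one common strategy: `next_token_scores` stands in for a decoder forward pass, and the token ids are made up for the example.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_scores: Callable[[Sequence[int]], Sequence[float]],
    bos_id: int,
    eos_id: int,
    max_length: int = 512,
) -> List[int]:
    """Generate token ids one at a time, always taking the highest-scoring one.

    next_token_scores(prefix) stands in for one decoder forward pass and
    returns one score per vocabulary entry.
    """
    tokens = [bos_id]
    while len(tokens) < max_length:
        scores = next_token_scores(tokens)
        best = max(range(len(scores)), key=scores.__getitem__)
        tokens.append(best)
        if best == eos_id:      # stop as soon as end-of-sequence is produced
            break
    return tokens


# Toy stand-in for the model: emits ids 3, 4, 5 and then end-of-sequence (id 2).
def toy_scores(prefix):
    script = [3, 4, 5, 2]
    target = script[min(len(prefix) - 1, len(script) - 1)]
    return [1.0 if i == target else 0.0 for i in range(6)]


print(greedy_decode(toy_scores, bos_id=1, eos_id=2))  # [1, 3, 4, 5, 2]
```

If the end-of-sequence token never appears, generation stops once `max_length` tokens (including the start token) have been produced.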

## 6. Prediction Decoding
Convert model token predictions to readable BeKern notation.

In [None]:
# Decode predictions if available
if 'predictions' in locals() and 'id_to_token' in locals():
    print("Decoding token predictions to BeKern notation...")
    
    try:
        # Decode tokens to BeKern string
        bekern_str = decode_bekern_prediction(predictions, id_to_token, model)
        
        print("\n✅ Decoding completed!")
        print(f"BeKern string length: {len(bekern_str)} characters")
        
        # Display BeKern notation preview
        print("\n--- BeKern Notation Preview ---")
        preview_length = min(300, len(bekern_str))
        print(bekern_str[:preview_length])
        if len(bekern_str) > preview_length:
            print("... (truncated)")
            
        # Token statistics
        tokens = bekern_str.split()
        print(f"\nToken count: {len(tokens)}")
        print(f"Unique tokens: {len(set(tokens))}")
        
        # Most frequent tokens
        from collections import Counter
        token_counts = Counter(tokens)
        print("\nMost frequent tokens:")
        for token, count in token_counts.most_common(10):
            print(f"  '{token}': {count}")
            
    except Exception as e:
        print(f"❌ Error during decoding: {e}")
else:
    print("❌ Cannot decode: predictions not available.")
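
Decoding amounts to a lookup from ids to tokens plus some cleanup. The sketch below shows the idea on a toy mapping; the special token names (`<pad>`, `<bos>`, `<eos>`) are assumptions, and `decode_bekern_prediction` may handle them differently.

```python
def ids_to_tokens(token_ids, id_to_token, special=("<pad>", "<bos>", "<eos>")):
    """Map ids to tokens, stop at the first <eos>, and drop special tokens."""
    tokens = []
    for idx in token_ids:
        token = id_to_token[idx]
        if token == "<eos>":
            break
        if token not in special:
            tokens.append(token)
    return tokens


# Toy mapping and prediction (hypothetical ids and tokens).
toy_map = {0: "<pad>", 1: "<bos>", 2: "<eos>", 3: "*clefG2", 4: "4c", 5: "4e"}
print(ids_to_tokens([1, 3, 4, 5, 2, 0, 0], toy_map))  # ['*clefG2', '4c', '4e']
```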

## 7. BeKern to Kern Conversion
Convert BeKern format to standard Kern format for music rendering.

In [None]:
# Convert BeKern to Kern format
if 'bekern_str' in locals():
    print("Converting BeKern to Kern format for rendering...")
    
    try:
        # Convert format
        kern_str = bekern_to_kern(bekern_str)
        
        print("\n✅ Format conversion completed!")
        print(f"Kern string length: {len(kern_str)} characters")
        
        # Display Kern format preview
        print("\n--- Kern Format Preview ---")
        kern_lines = kern_str.split('\n')
        for i, line in enumerate(kern_lines[:10], start=1):  # First 10 lines
            print(f"{i:2d}: {line}")
        
        if len(kern_lines) > 10:
            print("... (more lines)")
            
    except Exception as e:
        print(f"❌ Error during format conversion: {e}")
else:
    print("❌ Cannot convert format: BeKern string not available.")
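
The exact rules applied by `bekern_to_kern` are not shown in this notebook. As a rough illustration, the sketch below assumes a token convention in which `<b>` marks a line break, `<t>` a tab between spines, `<s>` a space, and `·` separates the components of a symbol and is dropped in Kern. All of these are assumptions, and `bekern_tokens_to_kern` is a hypothetical helper.

```python
def bekern_tokens_to_kern(tokens):
    """Join BeKern tokens into Kern text (assumed token conventions).

    Assumed: '<b>' = line break, '<t>' = tab between spines, '<s>' = space,
    and '·' separates the components of a symbol and is dropped in Kern.
    """
    layout = {"<b>": "\n", "<t>": "\t", "<s>": " "}
    text = "".join(layout.get(tok, tok) for tok in tokens)
    return text.replace("·", "")


toy_tokens = ["**kern", "<t>", "**kern", "<b>",
              "4", "·", "c", "<t>", "4", "·", "e", "<b>",
              "*-", "<t>", "*-"]
print(bekern_tokens_to_kern(toy_tokens))
```

Each line of the result holds one time step, with the spines separated by tabs, which is the layout Kern renderers expect.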

## 8. Music Score Rendering
Render the Kern notation as a visual music score using Verovio.

In [None]:
# Render music score
if 'kern_str' in locals():
    print("Rendering music score from Kern notation...")
    
    try:
        # Render score to image
        rendered_score = render_music_score(kern_str, "demos/output_score.png")
        
        if rendered_score is not None:
            print("\n✅ Music score rendered successfully!")
            print(f"Rendered image shape: {rendered_score.shape}")
            
            # Display rendered score
            plt.figure(figsize=(14, 8))
            plt.imshow(rendered_score)
            plt.title("Rendered Music Score from Predictions")
            plt.axis('off')
            plt.show()
        else:
            print("⚠️ Music score rendering failed (Verovio may not be available)")
            print("The BeKern and Kern text outputs above show the symbolic representation.")
            
    except Exception as e:
        print(f"❌ Error during rendering: {e}")
else:
    print("❌ Cannot render: Kern notation not available.")
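
For reference, Verovio's Python toolkit can load Humdrum Kern data directly and return an SVG string. The sketch below is independent of `render_music_score` (which returns an image array and writes a PNG) and returns `None` when the `verovio` package is missing or the input is rejected; `render_kern_to_svg` is a hypothetical helper.

```python
def render_kern_to_svg(kern_text):
    """Return the first page of kern_text as an SVG string, or None on failure."""
    try:
        import verovio  # optional dependency
    except ImportError:
        return None
    toolkit = verovio.toolkit()
    if not toolkit.loadData(kern_text):   # returns False if parsing failed
        return None
    return toolkit.renderToSVG(1)         # page numbers start at 1


minimal_kern = "**kern\n*clefG2\n*M4/4\n=1\n4c\n4d\n4e\n4f\n=2\n*-\n"
svg = render_kern_to_svg(minimal_kern)
print("rendered" if svg else "Verovio unavailable or input rejected")
```

Turning the SVG into a raster image for `plt.imshow` requires an additional conversion step that is not covered here.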

## 9. Results Comparison
Display a side-by-side comparison of the input image and the predicted output.

In [None]:
# Display final results
if 'bekern_str' in locals() and os.path.exists(img_name):
    print("Displaying final results...")
    
    # Get rendered score if available
    score_image = rendered_score if 'rendered_score' in locals() else None
    
    # Show comparison
    visualize_results(img_name, score_image, bekern_str)
    
    # Summary statistics
    print("\n=== DEMO SUMMARY ===")
    print(f"Input image: {img_name}")
    print(f"Model checkpoint: {ckpt_path}")
    print(f"BeKern tokens generated: {len(bekern_str.split())}")
    print(f"Kern notation lines: {len(kern_str.split(chr(10))) if 'kern_str' in locals() else 'N/A'}")
    print(f"Score visualization: {'✅ Success' if score_image is not None else '❌ Failed'}")
    
else:
    print("❌ Cannot display results: required data not available.")

## 10. Complete Pipeline (Alternative)
Run the entire demo pipeline in one step using the utility function.

In [None]:
# Alternative: Run complete pipeline at once
# Uncomment and run this cell to execute the entire demo in one step

# results = demo_pipeline(
#     img_path=img_name,
#     ckpt_path=ckpt_path,
#     vocab_path="data/FP_GrandStaff_BeKernw2i.npy"
# )

print("Complete pipeline function available but commented out.")
print("Uncomment the lines above to run the entire demo in one step.")

---

## Conclusion

This demo showcased the MusicTrOCR (Luca Model) capabilities for Optical Music Recognition:

1. **Image Processing**: Converted sheet music image to model input format
2. **Neural Inference**: Used transformer architecture to predict music notation
3. **Sequence Decoding**: Converted model tokens to BeKern symbolic notation
4. **Format Conversion**: Transformed BeKern to standard Kern format
5. **Visual Rendering**: Generated visual music score from symbolic representation

### Model Architecture
- **Vision Encoder**: Pre-trained ConvNeXt-Tiny backbone
- **Text Decoder**: 6-layer Transformer decoder with cross-attention
- **Vocabulary**: BeKern format with ~300 music symbols
- **Generation**: Autoregressive sequence prediction

### Next Steps
- Try different sheet music images
- Experiment with generation parameters such as `max_length` (and temperature, if the inference utility exposes it)
- Compare with other OMR approaches
- Analyze prediction confidence and error patterns
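
One way to start on error analysis is a token-level symbol error rate: the edit distance between predicted and reference tokens, divided by the reference length. This demo does not load a ground-truth transcription, so the reference below is a made-up example and both helper names are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def symbol_error_rate(ref_tokens, hyp_tokens):
    """Edit distance normalised by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


reference = "4c 4d 4e 4f".split()
predicted = "4c 4e 4f".split()
print(symbol_error_rate(reference, predicted))  # 0.25
```

The same function can be applied to `bekern_str.split()` once a reference transcription for the input image is available.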

---