# VisProbe Quick Start Guide

**Find robustness failures in your vision models in minutes.**

This notebook demonstrates how to use VisProbe to test your model's robustness against natural perturbations.

---

## What You'll Learn

1. ✅ How to test a model with a single `quick_check()` call
2. ✅ How to interpret robustness scores and failures
3. ✅ How to export failures for model improvement
4. ✅ How to choose the right preset for your use case

**Time to complete:** 5-10 minutes

---

## 1. Installation

First, make sure VisProbe is installed:

In [None]:
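# Run this cell if you haven't installed VisProbe yet.
# Editable install: "." must point at the VisProbe repository root
# (use ".." if this notebook sits one level below it).
# !pip install -e .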

## 2. Import and Setup

Import the necessary libraries:

In [None]:
import torch
import torchvision.models as models
import torchvision.transforms as T
from torchvision.datasets import CIFAR10

# Import VisProbe
from visprobe import quick_check

print("✓ Imports successful!")

## 3. Load a Model

For this example, we'll use a pretrained ResNet-18. You can replace this with your own model!

In [None]:
# Load a pretrained model
model = models.resnet18(weights='IMAGENET1K_V1')
model.eval()

print("✓ Model loaded: ResNet-18")
print(f"  Parameters: {sum(p.numel() for p in model.parameters()):,}")

## 4. Prepare Test Data

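Load some test images. We'll use CIFAR-10 for this demo because it downloads quickly. Its 10 classes do not correspond to the 1,000 ImageNet classes this ResNet-18 predicts, so treat the results below as a smoke test of the workflow rather than a meaningful robustness measurement.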

> **Note:** In production, use images that match your model's training distribution!

In [None]:
# Prepare transforms
transform = T.Compose([
    T.Resize(224),  # ResNet expects 224x224
    T.ToTensor(),
])

# Load CIFAR-10 dataset
print("Loading CIFAR-10 dataset...")
dataset = CIFAR10(root='./data', train=False, download=True, transform=transform)

# Take a subset for faster testing
num_samples = 50  # Increase this for more thorough testing
test_data = [dataset[i] for i in range(num_samples)]

print(f"✓ Loaded {num_samples} test samples")

## 5. Run Robustness Test 🚀

Now for the magic! Test your model with just one function call:

In [None]:
# Run robustness test
report = quick_check(
    model=model,
    data=test_data,
    preset="lighting",  # Test lighting variations
    budget=500,         # Number of model queries (increase for more precision)
    device="auto"       # Auto-detect GPU/CPU
)

print("\n✅ Testing complete!")

## 6. View Results 📊

The `report.show()` method displays a rich HTML summary in Jupyter:

In [None]:
# Display results (will show rich HTML in Jupyter)
report.show()

## 7. Analyze Results Programmatically

Access the results as Python objects for further analysis:

In [None]:
# Overall robustness score (0-1, higher is better)
print(f"Robustness Score: {report.score:.1%}")

# Number of failures found
print(f"Total Failures: {len(report.failures)}")

# Test metadata
print(f"Runtime: {report.summary['runtime_sec']:.1f}s")
print(f"Model Queries: {report.summary['model_queries']}")

### Interpreting the Score

The robustness score tells you how well your model handles perturbations:

- **> 80%**: ✅ Excellent - Model is highly robust
- **60-80%**: ✅ Good - Reasonable robustness with some weaknesses
- **40-60%**: ⚠️ Moderate - Significant robustness issues
- **< 40%**: ❌ Poor - Model is very fragile

In [None]:
# Interpret the score
score = report.score

if score > 0.8:
    print("✅ EXCELLENT - Your model is highly robust!")
elif score > 0.6:
    print("✅ GOOD - Reasonable robustness with room for improvement.")
elif score > 0.4:
    print("⚠️ MODERATE - Your model has significant robustness issues.")
else:
    print("❌ POOR - Your model is very fragile to perturbations.")

## 8. Inspect Failures

Look at specific failure cases to understand what went wrong:

In [None]:
# Show first 5 failures
if report.failures:
    print(f"Found {len(report.failures)} failure cases:\n")
    
    for i, failure in enumerate(report.failures[:5], 1):
        print(f"{i}. Sample {failure['index']}:")
        print(f"   Original prediction: {failure['original_pred']}")
        print(f"   Perturbed prediction: {failure['perturbed_pred']}")
        print(f"   Perturbation level: {failure['level']:.3f}")
        print()
else:
    print("No failures found! Your model is robust to this preset.")
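
To see patterns rather than individual cases, the failure records can be aggregated. The sketch below is illustrative only: it assumes each failure is a dict with the same keys read in the cell above (`index`, `original_pred`, `perturbed_pred`, `level`), and it assumes a lower `level` means a milder perturbation. The sample records are hypothetical, not VisProbe output.

```python
from collections import Counter

# Hypothetical failure records; the keys mirror those read in the cell above.
sample_failures = [
    {"index": 3, "original_pred": 281, "perturbed_pred": 285, "level": 0.12},
    {"index": 7, "original_pred": 281, "perturbed_pred": 285, "level": 0.30},
    {"index": 9, "original_pred": 404, "perturbed_pred": 895, "level": 0.05},
]

def summarize_failures(failures):
    """Count (original, perturbed) label flips and find the lowest failing level."""
    flips = Counter((f["original_pred"], f["perturbed_pred"]) for f in failures)
    most_fragile = min(failures, key=lambda f: f["level"]) if failures else None
    return flips, most_fragile

flips, most_fragile = summarize_failures(sample_failures)
for (orig, pert), count in flips.most_common():
    print(f"{orig} -> {pert}: {count} failure(s)")
if most_fragile is not None:
    print(f"Lowest failing level: {most_fragile['level']:.3f} "
          f"(sample {most_fragile['index']})")
```

With real results, pass `report.failures` in place of `sample_failures`. A label flip that recurs across many samples usually deserves attention before one-off failures do.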

## 9. Export Failures for Retraining

Export the worst failures to use as hard examples in your training set:

In [None]:
if report.failures:
    # Export up to 10 failures (fewer if the run found fewer)
    n_export = min(10, len(report.failures))
    export_path = report.export_failures(n=n_export)
    print(f"✅ Exported {n_export} failure cases to:")
    print(f"   {export_path}")
    print("\n💡 Use these failures to:")
    print("   1. Understand your model's weak points")
    print("   2. Add similar examples to your training set")
    print("   3. Increase data augmentation in problem areas")
else:
    print("No failures to export!")

## 10. Try Different Presets

VisProbe includes 4 presets for different use cases. Let's try the "standard" preset, which includes compositional perturbations:

In [None]:
# List available presets
from visprobe import presets

print("Available presets:\n")
for name, description in presets.list_presets():
    print(f"  • {name:12s}: {description}")

In [None]:
# Test with "standard" preset (includes compositional perturbations!)
report_standard = quick_check(
    model=model,
    data=test_data[:20],  # Use fewer samples for faster demo
    preset="standard",
    budget=300,
    device="auto"
)

print(f"\nStandard Preset Results:")
print(f"  Score: {report_standard.score:.1%}")
print(f"  Failures: {len(report_standard.failures)}")

### Preset Comparison

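Compare results across presets. The two runs above used different sample counts (50 vs. 20) and budgets (500 vs. 300), so treat the comparison as indicative only: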

In [None]:
# Compare the two presets
print("\n📊 Preset Comparison:\n")
print(f"  Lighting:  {report.score:.1%} ({len(report.failures)} failures)")
print(f"  Standard:  {report_standard.score:.1%} ({len(report_standard.failures)} failures)")

# Which is weaker? (Handle ties explicitly.)
if report.score < report_standard.score:
    print("\n⚠️  Your model is weaker on lighting perturbations")
elif report.score > report_standard.score:
    print("\n⚠️  Your model is weaker on standard perturbations")
else:
    print("\nYour model scores the same on both presets")
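
When more than two presets are involved, it can be easier to collect the scores in a dict and pick out the weakest. The helper below is a hypothetical sketch, not part of VisProbe, and the scores are made-up placeholders for `report.score` values.

```python
def weakest_presets(scores):
    """Return the preset name(s) with the lowest score, handling ties."""
    if not scores:
        return []
    lowest = min(scores.values())
    return sorted(name for name, s in scores.items() if s == lowest)

# Hypothetical scores; in the notebook these would come from report.score.
scores = {"lighting": 0.72, "standard": 0.58}
print("Weakest preset(s):", ", ".join(weakest_presets(scores)))
```

Returning a list rather than a single name keeps ties visible instead of silently picking one preset.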

## 11. Complete Summary

Get a complete summary dict for programmatic use (e.g., CI/CD checks):

In [None]:
# Get summary dictionary
summary = report.summary

print("Full Summary:")
for key, value in summary.items():
    print(f"  {key}: {value}")
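
For CI/CD, one option is to persist the summary as JSON so it can be stored as a build artifact. This is a minimal sketch that assumes the summary is a plain dict. Only the `runtime_sec` and `model_queries` keys appear earlier in this notebook, and the values below are hypothetical.

```python
import json

def summary_to_json(summary):
    """Serialize a summary dict; values JSON cannot encode fall back to str()."""
    return json.dumps(summary, indent=2, sort_keys=True, default=str)

# Hypothetical values standing in for report.summary.
example_summary = {"runtime_sec": 12.3, "model_queries": 500}
print(summary_to_json(example_summary))
```

The `default=str` fallback is a defensive choice in case the summary contains values such as paths or timestamps that `json` cannot encode directly.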

---

## 🎉 Congratulations!

You've learned how to use VisProbe to test your model's robustness!

### What You Accomplished

✅ Tested a model with a single `quick_check()` call  
✅ Viewed results with `report.show()`  
✅ Analyzed failures programmatically  
✅ Exported failures for retraining  
✅ Compared different presets  

---

## 📚 Next Steps

### For Your Own Model

1. **Replace the model:**
   ```python
   model = YourModel()
   model.load_state_dict(torch.load('your_weights.pth', map_location='cpu'))
   model.eval()  # Disable dropout and use running batch-norm statistics
   ```

2. **Use your test data:**
   ```python
   test_data = your_dataset  # Can be DataLoader, list, or tensors
   ```

3. **Set correct normalization:**
   ```python
   report = quick_check(
       model, test_data, preset="standard",
       mean=(0.485, 0.456, 0.406),  # Your training mean
       std=(0.229, 0.224, 0.225)     # Your training std
   )
   ```
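
The `mean` and `std` values shown in step 3 are the widely used ImageNet statistics. If your model was trained on different data, you need that dataset's own per-channel statistics. The sketch below shows the underlying arithmetic in plain Python on nested lists so that it runs without any dependencies. The `channel_stats` helper and the tiny sample images are hypothetical, and in practice you would compute the same quantities with tensor operations over your training set.

```python
import math

def channel_stats(images):
    """Per-channel mean and population std for images shaped [C][H][W].

    `images` is a list of nested lists with pixel values scaled to [0, 1].
    """
    num_channels = len(images[0])
    means, stds = [], []
    for c in range(num_channels):
        # Flatten every pixel of channel c across all images.
        values = [px for img in images for row in img[c] for px in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        means.append(mean)
        stds.append(math.sqrt(var))
    return means, stds

# Two tiny 3-channel, 1x2 "images", purely illustrative.
images = [
    [[[0.0, 1.0]], [[0.5, 0.5]], [[0.2, 0.4]]],
    [[[1.0, 0.0]], [[0.5, 0.5]], [[0.6, 0.8]]],
]
means, stds = channel_stats(images)
print("mean:", [round(m, 3) for m in means])
print("std: ", [round(s, 3) for s in stds])
```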

### Advanced Usage

- See `examples/custom_model_example.py` for a complete template
- See `examples/preset_comparison.py` to compare all presets
- See `README.md` for advanced configuration options

### Production Deployment

Use VisProbe in your CI/CD pipeline:

```python
report = quick_check(model, test_data, preset="standard")
assert report.score > 0.7, f"Model robustness too low: {report.score:.1%}"
```
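
Because `assert` statements are skipped when Python runs with `-O`, a CI job may prefer an explicit exit code. The sketch below is one possible shape for such a gate. `robustness_gate` is a hypothetical helper, not a VisProbe function, and the score is a placeholder for `report.score`.

```python
def robustness_gate(score, threshold=0.7):
    """Return a process exit code: 0 if score exceeds the threshold, else 1."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    passed = score > threshold  # Same strict comparison as the assert above
    status = "PASS" if passed else "FAIL"
    print(f"[{status}] robustness {score:.1%} (threshold {threshold:.1%})")
    return 0 if passed else 1

# Hypothetical score; in CI this would be report.score.
exit_code = robustness_gate(0.65)
```

In a CI script, finish with `sys.exit(exit_code)` so the pipeline step fails when the gate does.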

---

## 🤝 Contributing

Found this helpful? Give us a star on GitHub! ⭐

Have questions or issues? Open an issue on GitHub.

---

## 📖 Resources

- **Main README**: [../README.md](../README.md)
- **Examples**: [../examples/](../examples/)
- **API Reference**: [../COMPREHENSIVE_API_REFERENCE.md](../COMPREHENSIVE_API_REFERENCE.md)
- **Troubleshooting**: [../TROUBLESHOOTING.md](../TROUBLESHOOTING.md)

Happy testing! 🚀