Walkthrough of how spurious_detector.py would treat a buggy model and a regular model:

Setting up both models:

In [None]:
# Initialize detectors for both models
regular_detector = SpuriousDetector(
    model=regular_model,
    results_dir=results_dir,
    concept_instance_name=concept_instance_name,
    validation_data=validation_data
)

buggy_detector = SpuriousDetector(
    model=buggy_model,
    results_dir=results_dir,
    concept_instance_name=concept_instance_name,
    validation_data=validation_data
)

Concept Importance:
- Measures how much the model relies on a concept to make its decisions
- Calculated during concept extraction (CRAFT/ACE) by analyzing the model's internal representations
- Independent of any specific prediction

Correlation with Predictions:
- Measures how strongly a concept's activation co-varies with the model's prediction confidence
- Calculated by correlating per-image concept activations with model outputs
- Depends on the specific dataset and predictions
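
Assuming the correlation is the Pearson coefficient (which is what `np.corrcoef` returns), it can be written as follows for concept $k$. The symbols are introduced here for illustration only and do not come from spurious_detector.py: $a_{ik}$ is the activation of concept $k$ on image $i$, $p_i$ is the model's confidence on image $i$, and bars denote means over the dataset.

$$
r_k = \frac{\sum_i (a_{ik} - \bar{a}_k)(p_i - \bar{p})}{\sqrt{\sum_i (a_{ik} - \bar{a}_k)^2}\,\sqrt{\sum_i (p_i - \bar{p})^2}}
$$

$r_k$ lies in $[-1, 1]$ and is undefined when either the activations or the confidences are constant across the dataset, which is one reason the value depends on the data it is computed on.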

For each concept, the detector performs the following steps:

In [None]:
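import numpy as np

# Example for concept_idx = 0 (e.g., the "snow" concept).
# This is an excerpt: `self` presumably refers to the detector instance.
concept_idx = 0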

# A. Gets pre-computed concept activations
concept_activations = self.activations[:, concept_idx]  # How strongly "snow" appears in each image

# B. Gets concept importance
concept_importance = self.importances[concept_idx]  # How important "snow" is for classification

# C. Gets model predictions
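predictions = []  # Confidence scores for "corgi" class
for image in data_list:
    pred = model(image)  # Get prediction confidence
    predictions.append(float(pred))  # Assumes one scalar confidence per image
predictions = np.asarray(predictions)  # Must have the same length as concept_activations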

# D. Calculates correlation between concept and predictions
correlation = np.corrcoef(concept_activations, predictions)[0, 1]

# E. Calculates final score
score = concept_importance * abs(correlation)
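
Steps A through E can be collected into one vectorized function that scores every concept at once. This is a minimal sketch under stated assumptions, not the code in spurious_detector.py: the function name `spurious_scores`, the array shapes, and the choice to treat an undefined correlation as 0 are all hypothetical.

```python
import numpy as np


def spurious_scores(activations, importances, predictions):
    """Score every concept as importance * |Pearson r(activation, prediction)|.

    Assumed shapes (hypothetical, not taken from spurious_detector.py):
      activations: (n_images, n_concepts)
      importances: (n_concepts,)
      predictions: (n_images,) class confidences
    """
    activations = np.asarray(activations, dtype=float)
    importances = np.asarray(importances, dtype=float)
    predictions = np.asarray(predictions, dtype=float)

    # Center both sides so the dot product below is the covariance numerator.
    act_centered = activations - activations.mean(axis=0)
    pred_centered = predictions - predictions.mean()

    numerator = act_centered.T @ pred_centered
    denominator = np.linalg.norm(act_centered, axis=0) * np.linalg.norm(pred_centered)

    # A constant concept or constant prediction has no defined correlation;
    # this sketch maps that case to 0 instead of propagating NaN.
    correlation = np.divide(
        numerator,
        denominator,
        out=np.zeros_like(numerator),
        where=denominator > 0,
    )
    return importances * np.abs(correlation)


# Toy data: concept 0 tracks the predictions, concept 1 is constant.
activations = np.array([[0.1, 0.5], [0.4, 0.5], [0.8, 0.5], [0.9, 0.5]])
importances = np.array([0.6, 0.4])
predictions = np.array([0.2, 0.5, 0.9, 1.0])
scores = spurious_scores(activations, importances, predictions)
```

Mapping the undefined case to 0 is a design choice of this sketch: `np.corrcoef` would return NaN for a constant column, which then has to be filtered before concepts can be ranked by score.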

Expected Differences:

Regular Model:
- Lower correlation between environmental concepts (like "snow") and predictions
- More balanced importance scores across relevant dog features
- Result: Lower spurious scores

Buggy Model:
- Higher correlation between environmental concepts and predictions
- For example, if the model learned "snow = corgi", snow activations correlate strongly with corgi predictions
- Higher importance scores for spurious features
- Result: Higher spurious scores
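
To make the expected gap concrete, the following sketch simulates both cases with synthetic numbers. Nothing here comes from real models: the "snow" activations, the two confidence formulas, and the importance values 0.2 and 0.7 are invented purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_images = 500

# Synthetic activations for one environmental concept ("snow") and one
# genuine dog feature; the two are drawn independently of each other.
snow = rng.uniform(0.0, 1.0, n_images)
dog_feature = rng.uniform(0.0, 1.0, n_images)
noise = rng.normal(0.0, 0.05, n_images)

# Hypothetical confidences: the regular model follows the dog feature,
# while the buggy model has learned "snow = corgi".
regular_pred = 0.9 * dog_feature + noise
buggy_pred = 0.9 * snow + noise


def snow_score(importance, predictions):
    """importance * |Pearson r| between snow activations and predictions."""
    correlation = np.corrcoef(snow, predictions)[0, 1]
    return importance * abs(correlation)


# Invented importances: the buggy model also relies more heavily on "snow".
regular_score = snow_score(0.2, regular_pred)
buggy_score = snow_score(0.7, buggy_pred)

print(f"regular: {regular_score:.3f}, buggy: {buggy_score:.3f}")
```

In this toy setup the buggy model's score is higher for both reasons listed above: its predictions correlate with snow, and its importance for snow is larger.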