# Explainable Recommendation Model End-to-End Demo

This notebook demonstrates KMR's ExplainableRecommendationModel with interpretability features, including:

- Data generation using KMR utilities
- Model creation and training with recommendation metrics
- Recommendation generation with similarity explanations
- Visualization of recommendations and similarity matrices


In [1]:
import numpy as np
import tensorflow as tf
import keras
from keras.optimizers import Adam

from kmr.models import ExplainableRecommendationModel
from kmr.metrics import AccuracyAtK, PrecisionAtK, RecallAtK
from kmr.losses import ImprovedMarginRankingLoss
from kmr.utils import KMRDataGenerator, KMRPlotter

print("✅ All imports successful!")
print(f"TensorFlow version: {tf.__version__}")
print(f"Keras version: {keras.__version__}")


✅ All imports successful!
TensorFlow version: 2.18.0
Keras version: 3.8.0


## 1. Generate Collaborative Filtering Data

We'll use KMR's data generator to create synthetic user-item interactions for collaborative filtering.


In [2]:
print("📦 Generating collaborative filtering data...")

user_ids, item_ids, ratings, _, _ = KMRDataGenerator.generate_collaborative_filtering_data(
    n_users=1000,
    n_items=500,
    n_interactions=10000,
    random_state=42
)

n_users = len(np.unique(user_ids))
n_items = len(np.unique(item_ids))

print("✅ Generated data:")
print(f"   - Users: {n_users}")
print(f"   - Items: {n_items}")
print(f"   - Interactions: {len(user_ids)}")
print(f"   - Rating range: {ratings.min():.1f} - {ratings.max():.1f}")

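# Binarize ratings for implicit feedback: a rating of 3.0 or higher counts as positive.
# Note: the training cell builds its labels from all observed interactions and does not read this array.
interactions = (ratings >= 3.0).astype(np.float32)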

# Split into train/test
train_size = int(0.8 * len(user_ids))
train_user_ids = user_ids[:train_size]
train_item_ids = item_ids[:train_size]
train_interactions = interactions[:train_size]

test_user_ids = user_ids[train_size:]
test_item_ids = item_ids[train_size:]
test_interactions = interactions[train_size:]


📦 Generating collaborative filtering data...
✅ Generated data:
   - Users: 1000
   - Items: 500
   - Interactions: 10000
   - Rating range: 1.0 - 5.0
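
The split above slices by position, which is fine when the generator returns interactions in no particular order. That ordering is not documented in this notebook, so if it matters, one option is to shuffle row positions with a fixed seed before slicing. The sketch below uses only the standard library; `shuffled_split` is a hypothetical helper for illustration, not part of KMR.

```python
import random


def shuffled_split(n_rows, train_fraction=0.8, seed=42):
    """Return (train_idx, test_idx): disjoint lists of row positions after a seeded shuffle."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)  # local RNG, leaves the global random state untouched
    cut = int(train_fraction * n_rows)
    return indices[:cut], indices[cut:]


# Toy example with 10 rows: 8 go to train, 2 to test.
train_idx, test_idx = shuffled_split(10, train_fraction=0.8, seed=42)
print(len(train_idx), len(test_idx))
```

The returned lists can index NumPy arrays directly, for example `user_ids[train_idx]`, so the three arrays stay aligned as long as the same index lists are applied to each.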


## 2. Build Explainable Recommendation Model


In [3]:
# Create model
model = ExplainableRecommendationModel(
    num_users=n_users,
    num_items=n_items,
    embedding_dim=32,
    top_k=10,
    l2_reg=1e-4,
    feedback_weight=0.5
)

# Create recommendation metrics
acc_at_5 = AccuracyAtK(k=5, name="acc@5")
acc_at_10 = AccuracyAtK(k=10, name="acc@10")
prec_at_5 = PrecisionAtK(k=5, name="prec@5")
prec_at_10 = PrecisionAtK(k=10, name="prec@10")
recall_at_5 = RecallAtK(k=5, name="recall@5")
recall_at_10 = RecallAtK(k=10, name="recall@10")

# Compile model with custom ranking loss and metrics
# Model returns 5-tuple: (scores, rec_indices, rec_scores, similarity_matrix, feedback_adjusted)
# Use list mapping: first element has loss/metrics, others are None
model.compile(
    optimizer=Adam(learning_rate=0.001),
    loss=[
        ImprovedMarginRankingLoss(margin=1.0, max_min_weight=0.6, avg_weight=0.4),  # For scores
        None,  # For rec_indices
        None,  # For rec_scores
        None,  # For similarity_matrix
        None   # For feedback_adjusted
    ],
    metrics=[
        [acc_at_5, acc_at_10, prec_at_5, prec_at_10, recall_at_5, recall_at_10],  # For scores
        None,  # For rec_indices
        None,  # For rec_scores
        None,  # For similarity_matrix
        None   # For feedback_adjusted
    ]
)

print("✅ Model created and compiled!")
print(f"   - Users: {model.num_users}")
print(f"   - Items: {model.num_items}")
print(f"   - Embedding dim: {model.embedding_dim}")
print(f"   - Top-K: {model.top_k}")
print(f"   - Feedback weight: {model.feedback_weight}")
print("   - Metrics: Accuracy@5, Accuracy@10, Precision@5, Precision@10, Recall@5, Recall@10")


2025-11-07 13:11:20.284 | DEBUG    | kmr.layers._base_layer:_log_initialization:73 - Initialized CollaborativeUserItemEmbedding with parameters: {'name': 'collaborative_user_item_embedding', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}, 'num_users': 1000, 'num_items': 500, 'embedding_dim': 32, 'l2_reg': 0.0001}
2025-11-07 13:11:20.285 | DEBUG    | kmr.layers._base_layer:_log_initialization:73 - Initialized CosineSimilarityExplainer with parameters: {'name': 'cosine_similarity_explainer', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}}
2025-11-07 13:11:20.285 | DEBUG    | kmr.layers._base_layer:_log_initialization:73 - Initialized Feedba

✅ Model created and compiled!
   - Users: 1000
   - Items: 500
   - Embedding dim: 32
   - Top-K: 10
   - Feedback weight: 0.5
   - Metrics: Accuracy@5, Accuracy@10, Precision@5, Precision@10, Recall@5, Recall@10
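
To make the six metrics easier to read, here is a minimal plain-Python sketch of top-K metrics for a single user. It uses the textbook definitions of Precision@K and Recall@K, and it assumes Accuracy@K means "at least one positive item appears in the top K" (a hit rate). KMR's exact definitions are not shown in this notebook, so treat the function names and the hit-rate reading as assumptions.

```python
def top_k_indices(scores, k):
    """Indices of the k highest scores, best first (ties broken by lower index)."""
    return sorted(range(len(scores)), key=lambda i: (-scores[i], i))[:k]


def metrics_at_k(scores, labels, k):
    """Hit, precision and recall at k for one user, given per-item scores and 0/1 labels."""
    top = top_k_indices(scores, k)
    hits = sum(labels[i] for i in top)  # positives that made it into the top k
    n_pos = sum(labels)                 # all positives for this user
    return {
        "hit": 1.0 if hits > 0 else 0.0,
        "precision": hits / k,
        "recall": hits / n_pos if n_pos else 0.0,
    }


# Six items, three of them positive; the top 3 by score are items 0, 2 and 4.
scores = [0.9, 0.1, 0.8, 0.3, 0.7, 0.2]
labels = [1, 0, 0, 1, 1, 0]
result = metrics_at_k(scores, labels, k=3)
print(result)
```

Under these definitions a user with 8 positive items can reach at most 8/10 = 0.8 for Precision@10 and 5/8 = 0.625 for Recall@5, which is worth keeping in mind when reading the training log.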


## 3. Train Model
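
The model is compiled with `ImprovedMarginRankingLoss`, whose internals (including the roles of `max_min_weight` and `avg_weight`) are not shown in this notebook. As a rough illustration of the general idea only, the sketch below implements a generic pairwise margin ranking loss in plain Python: every positive item should outscore every negative item by at least `margin`. The function name is hypothetical and this is not KMR's implementation.

```python
def pairwise_margin_ranking_loss(scores, labels, margin=1.0):
    """Mean hinge penalty over all (positive, negative) item pairs for one user."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.0  # no pairs to compare
    # A pair is penalized when the positive score does not beat the negative by `margin`.
    total = sum(max(0.0, margin - (p - n)) for p in pos for n in neg)
    return total / (len(pos) * len(neg))


# The single positive item is scored first in both cases.
well_ranked = pairwise_margin_ranking_loss([2.0, 0.0, 0.5], [1, 0, 0], margin=1.0)
badly_ranked = pairwise_margin_ranking_loss([0.0, 2.0, 0.5], [1, 0, 0], margin=1.0)
print(well_ranked, badly_ranked)
```

The loss is zero once the positive item leads every negative by the margin, and it grows as negatives overtake the positive, which is the behavior the decreasing `loss` column in the training log reflects.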


In [4]:
print("🚀 Training Model")
print("=" * 60)
print("Using model.fit() with built-in ranking loss")
print("=" * 60)
print("The model's train_step() method handles ranking loss internally!")
print("Just prepare data and call model.fit() - no custom training loop needed.\n")

# Prepare data in the format model.fit() expects:
# for each user, the full list of item IDs plus a binary label per item
unique_users = np.unique(train_user_ids)[:50]  # Use a 50-user subset for the demo
# Keep only user IDs that fall inside the embedding range [0, n_users)
unique_users = unique_users[unique_users < n_users]
batch_size = 8

# Create training data: for each user, provide all items and binary labels
train_x_user_ids = []
train_x_item_ids = []
train_y = []

for user_id in unique_users:
    # Treat every item the user interacted with in the training split as a positive
    # (the binarized `train_interactions` array is not consulted here)
    user_item_ids = train_item_ids[train_user_ids == user_id]
    # Keep only item IDs that fall inside the model's item range [0, n_items)
    positive_set = {int(i) for i in user_item_ids if i < n_items}
    
    # Skip users with no valid positive items
    if len(positive_set) == 0:
        continue
    
    # Create label vector: 1 for positive items, 0 for others
    labels = np.zeros(n_items, dtype=np.float32)
    labels[list(positive_set)] = 1.0
    
    # Prepare item IDs: all items for this user
    all_item_ids = np.arange(n_items, dtype=np.int32)
    
    train_x_user_ids.append(user_id)
    train_x_item_ids.append(all_item_ids)
    train_y.append(labels)

train_x_user_ids = np.array(train_x_user_ids, dtype=np.int32)
train_x_item_ids = np.array(train_x_item_ids, dtype=np.int32)  # (n_users, n_items)
train_y = np.array(train_y, dtype=np.float32)

print(f"Prepared training data: {len(train_x_user_ids)} users")
print(f"  - User IDs shape: {train_x_user_ids.shape}")
print(f"  - Item IDs shape: {train_x_item_ids.shape}")
print(f"  - Labels shape: {train_y.shape}")
print(f"  - Positive items per user: {train_y.sum(axis=1).mean():.1f} on average\n")

# Build model by calling it once with sample data
# This ensures all layers are initialized before training
_ = model.predict([
    tf.constant(train_x_user_ids[:1]),
    tf.constant(train_x_item_ids[:1])
], verbose=0)

print("Training with model.fit()...")
print("Note: Metrics may start at 0.0 with random initial embeddings and many items (500).")
print("      This is expected - metrics will improve as the model learns to rank positive items higher.")
print("      With 500 items and ~8 positives per user, it takes time for the model to learn.")
print("      Watch the loss decrease and metrics gradually increase over epochs.\n")
history = model.fit(
    x=[train_x_user_ids, train_x_item_ids],
    y=train_y,
    epochs=30,  # More epochs needed for large item space (500 items)
    batch_size=batch_size,
    verbose=1
)

print("\n✅ Training completed!")
print(f"Final loss: {history.history['loss'][-1]:.4f}")

# Display recommendation metrics
if 'acc@5' in history.history:
    print("\n📊 Recommendation Metrics:")
    print(f"   - Accuracy@5:  {history.history['acc@5'][-1]:.4f}")
    print(f"   - Accuracy@10: {history.history['acc@10'][-1]:.4f}")
    print(f"   - Precision@5:  {history.history['prec@5'][-1]:.4f}")
    print(f"   - Precision@10: {history.history['prec@10'][-1]:.4f}")
    print(f"   - Recall@5:  {history.history['recall@5'][-1]:.4f}")
    print(f"   - Recall@10: {history.history['recall@10'][-1]:.4f}")
    
    print("\n📈 Metric Improvement:")
    if len(history.history['acc@5']) > 1:
        initial_acc = history.history['acc@5'][0]
        final_acc = history.history['acc@5'][-1]
        improvement = final_acc - initial_acc
        print(f"   - Accuracy@5:  {initial_acc:.4f} → {final_acc:.4f} (Δ{improvement:+.4f})")
        
        initial_prec = history.history['prec@5'][0]
        final_prec = history.history['prec@5'][-1]
        improvement_prec = final_prec - initial_prec
        print(f"   - Precision@5: {initial_prec:.4f} → {final_prec:.4f} (Δ{improvement_prec:+.4f})")
    else:
        print("   - Only one epoch was recorded, so there is no change to report.")
        print("   - Train for more epochs to compare first and last values.")

print("\nNote: The model uses margin ranking loss internally.")
print("      Positive items are encouraged to rank higher than negative items.")
print("      The model provides similarity explanations for interpretability.")

🚀 Training Model
Using model.fit() with built-in ranking loss
The model's train_step() method handles ranking loss internally!
Just prepare data and call model.fit() - no custom training loop needed.

Prepared training data: 50 users
  - User IDs shape: (50,)
  - Item IDs shape: (50, 500)
  - Labels shape: (50, 500)
  - Positive items per user: 8.0 on average

Training with model.fit()...
Note: Metrics may start at 0.0 with random initial embeddings and many items (500).
      This is expected - metrics will improve as the model learns to rank positive items higher.
      With 500 items and ~8 positives per user, it takes time for the model to learn.
      Watch the loss decrease and metrics gradually increase over epochs.

Epoch 1/30
7/7 ━━━━━━━━━━━━━━━━━━━━ 1s 3ms/step - acc@10: 0.1710 - acc@5: 0.0673 - loss: 0.5315 - prec@10: 0.0191 - prec@5: 0.0135 - recall@10: 0.0265 - recall@5: 0.0076
Epoch 2/30
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - acc@10: 0.6213 - acc@5: 0.4988 - loss: 0.4033 - prec@10: 0.0800 - prec@5: 0.1124 - recall@10: 0.1026 - recall@5: 0.0776
Epoch 3/30
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - acc@10: 0.9577 - acc@5: 0.8083 - loss: 0.3341 - prec@10: 0.1428 - prec@5: 0.2150 - recall@10: 0.1863 - recall@5: 0.1399
Epoch 4/30
7/7 ━━━━━━━━━━━━━━━━━━━━ 0s 3ms/step - acc@10: 0.9924 - acc@5: 0.9817 - loss: 0.3093 - prec@10: 0.1969 - prec@5: 0.2924 - recall@10: 0.2544 - recall@5: 0.1987
Epoch 5/30
7/7

## 4. Generate Recommendations with Explanations
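
Besides the top-K indices and scores, the model returns a `similarity_matrix`, and the initialization log names a `CosineSimilarityExplainer` layer. The notebook does not show that matrix's shape or how it is computed, so the sketch below only illustrates the underlying idea in plain Python: cosine similarity between a user vector and each item vector can be read as a "how closely does this item match the user" explanation. The vectors and the `explain_top_items` helper are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def explain_top_items(user_vec, item_vecs, top_k=2):
    """Return (item_id, similarity) pairs for the top_k most similar items, best first."""
    sims = [(item_id, cosine_similarity(user_vec, vec)) for item_id, vec in enumerate(item_vecs)]
    return sorted(sims, key=lambda pair: -pair[1])[:top_k]


# Toy 2-dimensional embeddings: item 0 points the same way as the user, item 1 is orthogonal.
user_vec = [1.0, 0.0]
item_vecs = [[2.0, 0.0], [0.0, 3.0], [1.0, 1.0]]
explanations = explain_top_items(user_vec, item_vecs, top_k=2)
print(explanations)
```

Because cosine similarity ignores vector length, item 0 scores a perfect 1.0 even though its embedding is twice as long as the user's.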


In [6]:
# Generate recommendations for multiple users to check diversity
print("🔍 Checking recommendation diversity across users...")
n_sample_users = min(10, len(train_x_user_ids))
sample_user_indices = np.arange(n_sample_users)
sample_user_ids = tf.constant(train_x_user_ids[sample_user_indices])
sample_item_ids = tf.constant(train_x_item_ids[sample_user_indices])

# Get recommendations for all sample users
all_rec_indices = []
all_rec_scores = []

for i in range(n_sample_users):
    # The model returns a 5-tuple: (scores, rec_indices, rec_scores, similarity_matrix, feedback_adjusted)
    scores, rec_indices, rec_scores, similarity_matrix, feedback_adjusted = model.predict([
        tf.constant([train_x_user_ids[sample_user_indices[i]]]),
        tf.constant([train_x_item_ids[sample_user_indices[i]]])
    ], verbose=0)
    
    rec_indices_np = rec_indices[0].numpy() if hasattr(rec_indices[0], 'numpy') else np.array(rec_indices[0])
    rec_scores_np = rec_scores[0].numpy() if hasattr(rec_scores[0], 'numpy') else np.array(rec_scores[0])
    
    all_rec_indices.append(rec_indices_np)
    all_rec_scores.append(rec_scores_np)

all_rec_indices = np.array(all_rec_indices)

# Check diversity
print("\n📊 Recommendation Diversity Analysis:")
print(f"   Checking {n_sample_users} users...")
unique_items_per_user = [len(np.unique(rec)) for rec in all_rec_indices]
shared_items = len(set(all_rec_indices[0]).intersection(*[set(rec) for rec in all_rec_indices[1:]]))
diversity_ratio = 1.0 - (shared_items / model.top_k)
print(f"   Shared items across all users: {shared_items}/{model.top_k}")
print(f"   Diversity ratio: {diversity_ratio:.2%}")
print(f"   Average unique items per user: {np.mean(unique_items_per_user):.1f}")

if shared_items == model.top_k:
    print("\n⚠️  WARNING: All users receive the same recommendations!")
    print(f"   This suggests the model may not be learning user-specific preferences.")
    print(f"   Try: increasing training epochs, adjusting learning rate, or checking data quality.")
else:
    print("\n✅ Recommendations are diverse across users - model is working correctly!")

# Visualize recommendation diversity
print("\n📊 Visualizing recommendation diversity...")
fig_diversity = KMRPlotter.plot_recommendation_diversity(
    all_rec_indices,
    user_ids=train_x_user_ids[sample_user_indices],
    title="Recommendation Diversity Across Users"
)
fig_diversity.show()

# Show detailed example for first user
print(f"\n📋 Detailed example for user {sample_user_indices[0]} (user_id={train_x_user_ids[sample_user_indices[0]]}):")
print(f"   Top-{model.top_k} recommended items: {all_rec_indices[0]}")
print(f"   Recommendation scores: {all_rec_scores[0]}")

# Visualize recommendation scores for first user
print("\n📊 Visualizing recommendation scores for sample user...")
fig_scores = KMRPlotter.plot_recommendation_scores(
    all_rec_scores[0],
    top_k=model.top_k,
    title=f"Recommendation Scores for User {train_x_user_ids[sample_user_indices[0]]}"
)
fig_scores.show()

🔍 Checking recommendation diversity across users...

📊 Recommendation Diversity Analysis:
   Checking 10 users...
   Shared items across all users: 0/10
   Diversity ratio: 100.00%
   Average unique items per user: 10.0

✅ Recommendations are diverse across users - model is working correctly!

📊 Visualizing recommendation diversity...



📋 Detailed example for user 0 (user_id=0):
   Top-10 recommended items: [102  88 495   6 403 483 123 117 444 182]
   Recommendation scores: [0.8864695  0.8753476  0.85279727 0.8007815  0.79104924 0.73414046
 0.7254196  0.64017963 0.62517095 0.5445876 ]

📊 Visualizing recommendation scores for sample user...
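
The diversity check in this cell counts items that appear in every sampled user's list. That count drops to zero as soon as no single item is shared by all users, even if many pairs of users still overlap heavily. A complementary measure, sketched here in plain Python, is the mean pairwise Jaccard overlap between recommendation lists. The helper names are hypothetical and not part of KMR.

```python
from itertools import combinations


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two item collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_pairwise_jaccard(rec_lists):
    """Average Jaccard overlap over all pairs of users; 0.0 means fully disjoint lists."""
    pairs = list(combinations(rec_lists, 2))
    if not pairs:
        return 0.0
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Users 0 and 1 share two of three items; user 2 shares nothing with either.
recs = [[1, 2, 3], [1, 2, 4], [7, 8, 9]]
overlap = mean_pairwise_jaccard(recs)
print(f"{overlap:.4f}")
```

In this toy case no item is common to all three users, so the all-users intersection reports full diversity, while the pairwise measure still shows the overlap between users 0 and 1. On the notebook's data it could be called as `mean_pairwise_jaccard(all_rec_indices.tolist())`.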
