<a href="https://colab.research.google.com/github/dastef1984/russmann/blob/master/mapping_debugging.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

### 1. Discrepancy between Custom and RecBole NDCG Calculation

- **Challenge**: My custom implementation of the NDCG metric produces significantly different results from RecBole’s built-in NDCG calculation, despite using similar formulas for Discounted Cumulative Gain (DCG) and Ideal DCG (IDCG).
- **Goal**: To determine why my custom NDCG calculation does not match RecBole’s NDCG results.
- **What I’ve Tried**:
    - Implemented a custom NDCG calculation using the formula for DCG and IDCG:


In [2]:
import numpy as np


def custom_ndcg_at_k(predictions, ground_truth, k=10):
    def dcg_at_k(recommended_items, relevant_items, k):
        dcg = 0.0
        for i in range(min(k, len(recommended_items))):
            if recommended_items[i] in relevant_items:
                dcg += 1 / np.log2(i + 2)
        return dcg

    def idcg_at_k(relevant_items, k):
        idcg = 0.0
        for i in range(min(k, len(relevant_items))):
            idcg += 1 / np.log2(i + 2)
        return idcg

    total_ndcg = 0.0
    num_users = len(predictions)

    for user_idx in range(num_users):
        recommended_items = predictions[user_idx]
        relevant_items = ground_truth[user_idx]

        # Calculate DCG@K
        dcg = dcg_at_k(recommended_items, relevant_items, k)
        # Calculate IDCG@K
        idcg = idcg_at_k(relevant_items, k)

        ndcg = dcg / idcg if idcg > 0 else 0
        total_ndcg += ndcg

    return total_ndcg / num_users


   - I compared it with RecBole’s NDCG, using a simulation of RecBole’s calculation based on its source code.
   - **Result**: My custom NDCG@10 is about 0.012, while RecBole reports much higher values for the same BPR model (0.1601 on validation, 0.1911 on test). I am unsure where the discrepancy comes from.

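For reference, the quantity both implementations aim at is binary-relevance NDCG@K. With $R_u$ the held-out relevant items of user $u$, $i_{u,r}$ the item recommended to $u$ at 1-based rank $r$, and $U$ the set of evaluated users (notation introduced here for clarity):

$$
\mathrm{DCG@}K(u) = \sum_{r=1}^{K} \frac{\mathbb{1}[\,i_{u,r} \in R_u\,]}{\log_2(r+1)}, \qquad
\mathrm{IDCG@}K(u) = \sum_{r=1}^{\min(|R_u|,\,K)} \frac{1}{\log_2(r+1)}
$$

$$
\mathrm{NDCG@}K = \frac{1}{|U|} \sum_{u \in U} \frac{\mathrm{DCG@}K(u)}{\mathrm{IDCG@}K(u)}
$$

In the code, `np.log2(i + 2)` with a 0-based loop index `i` is the same discount as $\log_2(r+1)$ with a 1-based rank. Because the average runs over users, the formula can only reproduce RecBole’s number if $R_u$ and the ranking $i_{u,\cdot}$ refer to the same user and the same test split.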
- **Questions for Supervisor**:
   - Could the mapping of internal vs. external IDs (from RecBole’s dataset) be affecting my NDCG calculation? I’ve already ensured that both IDs map correctly when converting back and forth, but the results still differ.
   - Should I align my DCG/IDCG calculation more closely with RecBole’s, e.g., by adopting its binary-relevance indexing?

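As far as I can tell from RecBole’s metric source, its top-k metrics do not compare item IDs directly: they work on a boolean hit matrix (one row per user, one column per rank) plus the number of held-out positives per user. The sketch below reimplements that idea in NumPy under this assumption. The names `pos_index` and `pos_len` are borrowed from what I read in RecBole’s code; `ndcg_from_hits` and `ndcg_loop` are hypothetical helpers, not RecBole APIs. On toy data the vectorized and loop versions agree, which suggests the formula itself is not what separates 0.012 from 0.19.

```python
import numpy as np


def ndcg_from_hits(pos_index, pos_len):
    """Binary-relevance NDCG@1..k for each user from a boolean hit matrix.

    pos_index: bool array [n_users, k]; True where the item at that rank is a held-out positive.
    pos_len:   int array [n_users]; number of held-out positives per user.
    Returns a float array [n_users, k] whose column j is NDCG@(j+1).
    """
    k = pos_index.shape[1]
    ranks = np.arange(1, k + 1)
    discounts = 1.0 / np.log2(ranks + 1)                  # 1 / log2(rank + 1), rank is 1-based
    dcg = np.cumsum(np.where(pos_index, discounts, 0.0), axis=1)

    cum_discounts = np.cumsum(discounts)                  # IDCG if every rank were a hit
    cut = np.minimum(ranks[None, :], np.asarray(pos_len)[:, None])  # min(|R_u|, cutoff)
    idcg = np.where(cut > 0, cum_discounts[np.maximum(cut - 1, 0)], 0.0)
    return np.divide(dcg, idcg, out=np.zeros_like(dcg), where=idcg > 0)


def ndcg_loop(recommended, relevant, k):
    """Per-user loop version of the same formula, as in the custom implementation."""
    dcg = sum(1.0 / np.log2(i + 2) for i, item in enumerate(recommended[:k]) if item in relevant)
    idcg = sum(1.0 / np.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / idcg if idcg > 0 else 0.0


# Toy data: two users, top-3 lists, made-up relevant sets
recs = [[3, 1, 7], [5, 6, 2]]
relevant = [{1, 7}, {9}]
pos_index = np.array([[item in rel for item in rec] for rec, rel in zip(recs, relevant)])
pos_len = np.array([len(rel) for rel in relevant])

vectorized = ndcg_from_hits(pos_index, pos_len)[:, -1]   # NDCG@3 per user
looped = np.array([ndcg_loop(rec, rel, 3) for rec, rel in zip(recs, relevant)])
print(vectorized, looped)
```

For the first user, the hits at ranks 2 and 3 give DCG = 1/log2(3) + 1/log2(4) ≈ 1.1309 and IDCG = 1 + 1/log2(3) ≈ 1.6309, so NDCG@3 ≈ 0.6934 in both versions.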

### Preprocessing in RecBole

- **Challenge**: I suspect that RecBole's preprocessing steps might affect how relevance is handled during NDCG calculation.
- **Goal**: Understand RecBole’s preprocessing steps and compare them with my own approach.
- **What I’ve Tried**:
   - Loaded the dataset and inspected the internal features after preprocessing.
   - **Result**: I am unsure whether preprocessing affects how RecBole determines relevance for metrics like NDCG.

- **Questions**:
   - Should I inspect RecBole’s entire preprocessing pipeline to understand how it handles relevance?
   - Would aligning my preprocessing steps with RecBole’s eliminate the discrepancy between my NDCG and RecBole’s?

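One preprocessing step that matters for any custom metric is ID remapping. My understanding (not verified line by line against RecBole’s `Dataset` source) is that RecBole replaces each external token with a contiguous internal integer, assigned in order of first appearance, and reserves internal ID 0 for a `[PAD]` token. The sketch below imitates that behaviour; `remap_tokens` is a hypothetical helper, not RecBole’s function.

```python
import numpy as np


def remap_tokens(tokens):
    """Assign internal ids 1..n to external tokens in order of first appearance; id 0 = '[PAD]'."""
    id2token = ["[PAD]"]          # index 0 reserved for padding
    token2id = {}
    internal_ids = []
    for tok in tokens:
        if tok not in token2id:
            token2id[tok] = len(id2token)
            id2token.append(tok)
        internal_ids.append(token2id[tok])
    return np.array(internal_ids), np.array(id2token), token2id


# External item ids as they might appear in an interaction file (made-up order)
internal, id2token, token2id = remap_tokens(["50", "172", "50", "300"])
print(internal)                  # [1 2 1 3]
print(id2token[internal])        # ['50' '172' '50' '300']
```

If RecBole does remap this way, internal ID 5 is generally not external item "5", so the ground truth and the recommendations must both be internal IDs (or both converted back to tokens) before they are compared.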

### Simulating RecBole’s NDCG Calculation

- **Challenge**: My attempt to simulate RecBole’s NDCG calculation only approximates RecBole’s results.
- **Goal**: Make my simulated NDCG calculation match RecBole’s implementation exactly.
- **What I’ve Tried**:
   - Re-implemented RecBole’s NDCG computation based on its source code.
   - **Result**: The simulated values are close to, but not exactly the same as, RecBole’s output.

- **Questions**:
   - Should I continue refining this, or would it make sense to focus on why my original custom implementation doesn’t match RecBole’s?
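To my understanding of RecBole’s full-sort evaluation, the flat score vector is reshaped to `[n_users, n_items]`, the padding column (ID 0) and each user’s training items are set to `-inf`, and only then is the per-user top-k taken and compared with that user’s test items. The sketch below follows those steps on made-up data; `full_sort_ndcg` is a hypothetical helper for checking a simulation step by step, not RecBole’s code.

```python
import numpy as np


def full_sort_ndcg(flat_scores, n_items, history, test_items, k=10):
    """Mean NDCG@k over users, imitating a full-sort evaluation.

    flat_scores: 1-D array of length n_users * n_items (row-major, one row per user).
    history:     list of sets, training items per user (excluded from ranking).
    test_items:  list of sets, held-out positives per user.
    """
    scores = np.asarray(flat_scores, dtype=float).reshape(-1, n_items).copy()
    scores[:, 0] = -np.inf                          # internal id 0 is padding
    for u, seen in enumerate(history):
        scores[u, list(seen)] = -np.inf             # never rank already-seen training items

    topk = np.argsort(-scores, axis=1, kind="stable")[:, :k]   # highest scores first
    discounts = 1.0 / np.log2(np.arange(2, k + 2))
    ndcgs = []
    for u, positives in enumerate(test_items):
        hits = np.isin(topk[u], list(positives))
        dcg = float(discounts[hits].sum())
        idcg = float(discounts[: min(len(positives), k)].sum())
        ndcgs.append(dcg / idcg if idcg > 0 else 0.0)
    return float(np.mean(ndcgs))


# Two users, items 0..5 (0 = padding); made-up scores, training items and test items
flat = np.array([0.9, 0.8, 0.1, 0.7, 0.2, 0.3,
                 0.5, 0.1, 0.9, 0.2, 0.8, 0.4])
history = [{1}, {2}]
test_items = [{3}, {5}]

masked = full_sort_ndcg(flat, n_items=6, history=history, test_items=test_items, k=3)
unmasked = full_sort_ndcg(flat, n_items=6, history=[set(), set()], test_items=test_items, k=3)
print(f"with history masking: {masked:.4f}, without: {unmasked:.4f}")
```

On this toy data masking raises NDCG@3 from about 0.57 to about 0.82, because the training items no longer occupy the top ranks. Whether masking explains part of the gap in this notebook is not verified here.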


In [4]:
!pip install recbole
!pip install ray
!pip install kmeans-pytorch

Collecting recbole
  Downloading recbole-1.2.0-py3-none-any.whl.metadata (1.4 kB)
Collecting colorlog==4.7.2 (from recbole)
  Downloading colorlog-4.7.2-py2.py3-none-any.whl.metadata (9.9 kB)
Collecting colorama==0.4.4 (from recbole)
  Downloading colorama-0.4.4-py2.py3-none-any.whl.metadata (14 kB)
Collecting thop>=0.1.1.post2207130030 (from recbole)
  Downloading thop-0.1.1.post2209072238-py3-none-any.whl.metadata (2.7 kB)
Collecting texttable>=0.9.0 (from recbole)
  Downloading texttable-1.7.0-py2.py3-none-any.whl.metadata (9.8 kB)
Downloading recbole-1.2.0-py3-none-any.whl (2.1 MB)
Downloading colorama-0.4.4-py2.py3-none-any.whl (16 kB)
Downloading colorlog-4.7.2-py2.py3-none-any.whl (10 kB)
Downloading texttable-1.7.0-py2.py3-none-any.whl (10 kB)
Downloading thop-0.1.1.post2209072238-py3-none-any.whl (15 kB)
Installing collected packages: texttable, colorlog, 

In [5]:
from recbole.quick_start import run_recbole
import os

# Configuration for training BPR model on ml100k
dataset_name = 'ml-100k'
checkpoint_dir = './saved_models/'
os.makedirs(checkpoint_dir, exist_ok=True)

# Configuration dictionary
config_dict = {
    'model': 'BPR',
    'dataset': dataset_name,
    'data_path': './dataset/',
    'epochs': 10,
    'topk': 10,
    'metrics': ['ndcg', 'mrr'],   # Metrics for comparison
    'checkpoint_dir': checkpoint_dir,
    'save_model': True,
    'valid_metric': 'ndcg@10',
}

# Train the BPR model
result = run_recbole(config_dict=config_dict)

# Output the RecBole evaluation results
print(f"RecBole Validation Result: {result['best_valid_result']}")
print(f"RecBole Test Result: {result['test_result']}")

The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  feat[field].fillna(value=0, inplace=True)
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  feat[field].fillna(value=feat[field].mean(), inplace=True)
  scaler = amp.GradScaler(enabled=self.enable_scaler)
Train     0: 100%|██████████████████████████████████████████████████| 40/40 [

RecBole Validation Result: OrderedDict([('ndcg@10', 0.1601), ('mrr@10', 0.2991)])
RecBole Test Result: OrderedDict([('ndcg@10', 0.1911), ('mrr@10', 0.3565)])
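
Because the custom metric has to be computed on exactly the split RecBole evaluates, it may help to spell out the evaluation protocol instead of relying on defaults. The dictionary below is the same configuration with `eval_args` written out. The values are what I believe RecBole’s `overall.yaml` uses by default (random 80/10/10 split per user, full-sort ranking), so treat them as an assumption to check against the installed version rather than a confirmed setting.

```python
# Same BPR configuration as above, with the evaluation protocol made explicit.
# ASSUMPTION: these eval_args values mirror what I believe are RecBole's defaults; verify before relying on them.
explicit_config_dict = {
    'model': 'BPR',
    'dataset': 'ml-100k',
    'data_path': './dataset/',
    'epochs': 10,
    'topk': 10,
    'metrics': ['ndcg', 'mrr'],
    'checkpoint_dir': './saved_models/',
    'save_model': True,
    'valid_metric': 'ndcg@10',
    'eval_args': {
        'split': {'RS': [0.8, 0.1, 0.1]},  # random split into train / valid / test
        'group_by': 'user',                # split each user's interactions separately
        'order': 'RO',                     # random order before splitting
        'mode': 'full',                    # rank every item (full-sort evaluation)
    },
}

# from recbole.quick_start import run_recbole
# result = run_recbole(config_dict=explicit_config_dict)
```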


In [7]:
from recbole.quick_start import load_data_and_model

# Load the trained BPR model and dataset
model_file = './saved_models/BPR.pth'
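# NOTE: load_data_and_model returns (config, model, dataset, train_data, valid_data, test_data),
# so with this unpacking `test_dataloader` is bound to the validation dataloader.
config, model, dataset, _, test_dataloader, _ = load_data_and_model(model_file)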

# Function to get top-k recommendations for users
import torch

def get_topk_recommendations(model, dataloader, topk=10):
    model.eval()
    topk_recommendations = []
    for data in dataloader:
        interaction = data[0]
        scores = model.full_sort_predict(interaction)
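        # NOTE: for BPR, full_sort_predict returns a flat tensor of length batch_size * n_items,
        # so this topk ranks items across the whole batch instead of per user. RecBole's
        # evaluator also masks each user's training items before ranking; this loop does not.
        topk_items = torch.topk(scores, k=topk, dim=-1).indices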
        topk_recommendations.append(topk_items.cpu().numpy())
    return topk_recommendations

bpr_topk_recs = get_topk_recommendations(model, test_dataloader, topk=10)


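Because `BPR.full_sort_predict` returns one flat vector per batch (length `batch_size * n_items`), `torch.topk(..., dim=-1)` in the cell above ranks items across the whole batch. A minimal NumPy sketch of the per-user version, assuming the batch’s internal user IDs and `n_items` (`dataset.item_num`) are available; `topk_per_user` is a hypothetical helper. In torch, the equivalent would be `scores.view(-1, n_items)` followed by `torch.topk(..., dim=1)`.

```python
import numpy as np


def topk_per_user(user_ids, flat_scores, n_items, k=10, history=None):
    """Return {internal_user_id: array of k internal item ids, best first}."""
    scores = np.asarray(flat_scores, dtype=float).reshape(len(user_ids), n_items).copy()
    scores[:, 0] = -np.inf                                # id 0 is the padding item
    if history is not None:                               # optional: mask training items
        for row, uid in enumerate(user_ids):
            scores[row, list(history.get(uid, ()))] = -np.inf
    top = np.argsort(-scores, axis=1, kind="stable")[:, :k]
    return {int(uid): top[row] for row, uid in enumerate(user_ids)}


# Toy batch: 2 users, 5 items (ids 0..4, id 0 = padding); user 12 has already seen item 1
flat = np.array([0.1, 0.9, 0.3, 0.8, 0.2,
                 0.7, 0.2, 0.6, 0.1, 0.5])
recs = topk_per_user(user_ids=[12, 40], flat_scores=flat, n_items=5, k=2,
                     history={12: {1}})
print(recs)  # user 12 -> [3, 2], user 40 -> [2, 4]
```

Keying the result by user ID, rather than appending one array per batch, keeps each recommendation list attached to the user it belongs to.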

In [8]:
def extract_ground_truth(dataset):
    ground_truth = []
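    # NOTE: `dataset` is the full dataset, so inter_feat has one row per interaction (all of them,
    # not only the test split); ground_truth[i] is the item of interaction i, not user i's test items.
    for i in range(len(dataset.inter_feat)):
        ground_truth.append(dataset.inter_feat['item_id'][i])  # No conversion needed, already internal ID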
    return ground_truth

bpr_ground_truth = extract_ground_truth(dataset)
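
`extract_ground_truth` appends one item per row of the full dataset’s `inter_feat`, so `bpr_ground_truth[i]` is the item of the i-th interaction rather than the set of test items of the i-th user. A sketch of per-user grouping, assuming the test split’s user and item ID columns are available as arrays (in RecBole these would presumably come from the test split rather than from the full `dataset`); `group_by_user` is a hypothetical helper:

```python
from collections import defaultdict

import numpy as np


def group_by_user(user_col, item_col):
    """Build {user_id: set(item_ids)} from parallel user/item arrays (one row per interaction)."""
    truth = defaultdict(set)
    for u, i in zip(np.asarray(user_col).tolist(), np.asarray(item_col).tolist()):
        truth[u].add(i)
    return dict(truth)


# Toy test split: user 12 has two held-out items, user 40 has one
test_users = np.array([12, 40, 12])
test_items = np.array([3, 4, 7])
ground_truth = group_by_user(test_users, test_items)
print(ground_truth)  # {12: {3, 7}, 40: {4}}
```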

In [9]:
import numpy as np

# Custom NDCG@10 calculation
def calculate_ndcg_at_k(topk_recs, ground_truth, k=10):
    def dcg_at_k(recommended_items, relevant_items, k):
        dcg = 0.0
        for i in range(min(k, len(recommended_items))):
            if recommended_items[i] in relevant_items:
                dcg += 1 / np.log2(i + 2)
        return dcg

    def idcg_at_k(relevant_items, k):
        idcg = 0.0
        for i in range(min(k, len(relevant_items))):
            idcg += 1 / np.log2(i + 2)
        return idcg

    total_ndcg = 0.0
    num_users = len(topk_recs)

    for user_idx in range(num_users):
        recommended_items = topk_recs[user_idx]
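        # Positional pairing: assumes topk_recs[user_idx] and ground_truth[user_idx] describe the same user
        relevant_items = [ground_truth[user_idx]]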

        # Calculate DCG@K
        dcg = dcg_at_k(recommended_items, relevant_items, k)
        # Calculate IDCG@K
        idcg = idcg_at_k(relevant_items, k)

        if idcg == 0:
            ndcg = 0.0
        else:
            ndcg = dcg / idcg

        total_ndcg += ndcg

    return total_ndcg / num_users

# Compare NDCG@10
custom_ndcg = calculate_ndcg_at_k(bpr_topk_recs, bpr_ground_truth, k=10)
print(f"Custom NDCG@10: {custom_ndcg}")

Custom NDCG@10: 0.01211335827200395
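
`calculate_ndcg_at_k` pairs `topk_recs[i]` with `ground_truth[i]` by position. In this notebook the first list holds one entry per dataloader batch and the second one entry per interaction row, so most pairs are unlikely to belong to the same user. The toy comparison below (made-up data; `ndcg_at_k` and `mean_ndcg_by_user` are hypothetical helpers) contrasts positional pairing with pairing by user ID: the same recommendations score very differently.

```python
import numpy as np


def ndcg_at_k(recommended, relevant, k=10):
    """Binary-relevance NDCG@k for one user."""
    dcg = sum(1.0 / np.log2(r + 2) for r, item in enumerate(recommended[:k]) if item in relevant)
    idcg = sum(1.0 / np.log2(r + 2) for r in range(min(k, len(relevant))))
    return dcg / idcg if idcg > 0 else 0.0


def mean_ndcg_by_user(recs, truth, k=10):
    """Average NDCG over users present in both dicts, matched by user ID."""
    users = sorted(recs.keys() & truth.keys())
    return float(np.mean([ndcg_at_k(list(recs[u]), truth[u], k) for u in users]))


recs = {12: [3, 2], 40: [2, 4]}          # per-user top-2 lists
truth = {12: {3, 7}, 40: {4}}            # per-user held-out items
# Test interactions in file order: (40, 4), (12, 3), (12, 7); only the item column is kept
interaction_rows = [4, 3, 7]

# Positional pairing: recommendation list i vs. interaction row i
positional = float(np.mean([ndcg_at_k(rec, {row}, k=2)
                            for rec, row in zip(recs.values(), interaction_rows)]))
keyed = mean_ndcg_by_user(recs, truth, k=2)
print(f"positional: {positional:.4f}, keyed: {keyed:.4f}")
```

Here positional pairing compares user 12’s list with user 40’s item (and vice versa) and scores 0, while keyed pairing gives about 0.622 for the same recommendations.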


In [11]:
import numpy as np
import torch

# Custom NDCG Calculation Function
def calculate_ndcg_at_k(topk_recs, ground_truth, k=10):
    def dcg_at_k(recommended_items, relevant_items, k):
        dcg = 0.0
        for i in range(min(k, len(recommended_items))):
            if recommended_items[i] in relevant_items:
                dcg += 1 / np.log2(i + 2)
        return dcg

    def idcg_at_k(relevant_items, k):
        idcg = 0.0
        for i in range(min(k, len(relevant_items))):
            idcg += 1 / np.log2(i + 2)
        return idcg

    total_ndcg = 0.0
    num_users = len(topk_recs)

    for user_idx in range(num_users):
        recommended_items = topk_recs[user_idx]
        relevant_items = ground_truth[user_idx]

        # Ensure relevant_items is a list
        if isinstance(relevant_items, int):
            relevant_items = [relevant_items]  # Convert int to a list for comparison

        # Calculate DCG@K for this user
        dcg = dcg_at_k(recommended_items, relevant_items, k)
        # Calculate IDCG@K for this user
        idcg = idcg_at_k(relevant_items, k)

        if idcg == 0:
            ndcg = 0.0
        else:
            ndcg = dcg / idcg

        total_ndcg += ndcg

    return total_ndcg / num_users


# Extract ground truth as lists of relevant items for each user
def extract_ground_truth_from_dataset(dataset, test=True):
    ground_truth = []
    data = dataset.inter_feat if test else dataset.train_data

    for i in range(len(data)):
        item_id = data['item_id'][i]
        ground_truth.append([item_id])  # Ensure ground truth is a list of relevant items for each user

    return ground_truth

# Extract the ground truth interactions from the test dataset for comparison
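# NOTE: `bpr_dataset` is not defined in the cells above (the dataset loaded in In [7] is named `dataset`),
# which is what raises the NameError below.
bpr_ground_truth = extract_ground_truth_from_dataset(bpr_dataset, test=True)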

# Step 6: Compare the custom NDCG with RecBole's NDCG
print("\nCalculating Custom NDCG:")
custom_ndcg = calculate_ndcg_at_k(bpr_topk_recs, bpr_ground_truth, k=10)
print(f"Custom NDCG@10: {custom_ndcg}")

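# NOTE: `recbole_result` is not defined above either; the run_recbole() output from In [5] is stored in `result`.
print(f"\nRecBole NDCG@10: {recbole_result['test_result']['ndcg@10']}")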


NameError: name 'bpr_dataset' is not defined

In [12]:
import numpy as np
import torch
from recbole.config import Config
from recbole.data import create_dataset, data_preparation
from recbole.data.interaction import Interaction
from recbole.model.general_recommender.bpr import BPR

def convert_tokens_to_ids(dataset, field, tokens):
    """Convert external tokens to internal ids."""
    if isinstance(tokens, str):
        return dataset.token2id(field, tokens)
    elif isinstance(tokens, (list, np.ndarray)):
        return np.array([dataset.token2id(field, token) for token in tokens])
    else:
        raise TypeError(f"The type of tokens [{tokens}] is not supported")

def convert_ids_to_tokens(dataset, field, ids):
    """Convert internal ids to external tokens."""
    if isinstance(ids, (list, np.ndarray, torch.Tensor)):
        return dataset.id2token(field, ids)
    else:
        raise TypeError(f"The type of ids [{ids}] is not supported")

# Inline configuration dictionary
config_dict = {
    'model': 'BPR',
    'dataset': 'ml-100k',
    'data_path': './dataset/ml-100k/',
    'epochs': 10,
    'topk': 10,
    'metrics': ['ndcg', 'mrr'],
    'train_batch_size': 512,
    'eval_batch_size': 512,
    'valid_metric': 'ndcg@10',
    'save_model': True,
    'checkpoint_dir': './saved_models/',
}

# Load model, dataset and prepare data without external config file
config = Config(config_dict=config_dict)
dataset = create_dataset(config)
train_data, valid_data, test_data = data_preparation(config, dataset)

# Load model
model = BPR(config, dataset)
checkpoint_path = './saved_models/BPR.pth'  # Make sure to adjust the path
checkpoint = torch.load(checkpoint_path)
model.load_state_dict(checkpoint['state_dict'])
model.eval()

# Example external user and item IDs
user_ids = ['1', '2', '3']  # Use actual user IDs from the dataset
item_id_lists = [['50', '172', '300'], ['10', '20'], ['100', '200']]

# Convert external tokens to internal IDs
user_ids_internal = convert_tokens_to_ids(dataset, 'user_id', user_ids)
item_id_lists_internal = [convert_tokens_to_ids(dataset, 'item_id', item_list) for item_list in item_id_lists]

# Prepare the input for interaction with padding
max_list_size = 50
padded_item_id_lists = np.zeros((len(user_ids), max_list_size), dtype=int)
item_lengths = []

for i, item_list in enumerate(item_id_lists_internal):
    item_lengths.append(len(item_list))
    padded_item_id_lists[i, :len(item_list)] = item_list

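# Create interaction for model input
# (BPR.full_sort_predict only reads 'user_id'; 'item_id_list' and 'item_length' are fields used by sequential models)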
input_inter = Interaction({
    'user_id': torch.tensor(user_ids_internal),
    'item_id_list': torch.tensor(padded_item_id_lists),
    'item_length': torch.tensor(item_lengths),
})

# Generate predictions
with torch.no_grad():
    scores = model.full_sort_predict(input_inter)

# Convert internal IDs back to external item tokens
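scores = scores.numpy()
# NOTE: BPR.full_sort_predict returns a flat 1-D array (3 users x n_items here),
# so scores.shape[1] raises the IndexError shown below.
item_ids_external = np.arange(scores.shape[1])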
item_ids_external = convert_ids_to_tokens(dataset, 'item_id', item_ids_external)

# Display the top-10 predicted items for each user
for i, user_id in enumerate(user_ids):
    top_10_indices = np.argsort(scores[i])[::-1][:10]
    top_10_items = item_ids_external[top_10_indices]
    top_10_scores = scores[i][top_10_indices]
    print(f"User ID: {user_id}")
    print("Top 10 predicted items:")
    for item, score in zip(top_10_items, top_10_scores):
        print(f"Item ID: {item}, Score: {score:.4f}")
    print("-" * 50)


  checkpoint = torch.load(checkpoint_path)


IndexError: tuple index out of range

In [13]:
with torch.no_grad():
    scores = model.full_sort_predict(input_inter)
    print("Shape of scores:", scores.shape)


Shape of scores: torch.Size([5049])
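
The shape fits BPR returning a flattened `[n_users, n_items]` score matrix: three users were passed in, and ml-100k has 1,682 items, i.e. 1,683 internal item IDs once the padding ID 0 is counted. Assuming no items were filtered out during preprocessing, `dataset.item_num` should then be 1683. A quick check of the arithmetic and the reshape it implies (the zero array only stands in for `scores`):

```python
import numpy as np

n_users = 3                      # users '1', '2', '3' passed to full_sort_predict above
n_items = 1682 + 1               # ml-100k items plus the padding id 0
assert n_users * n_items == 5049

flat_scores = np.zeros(5049)     # placeholder with the observed shape
per_user = flat_scores.reshape(n_users, n_items)   # row u = scores of user u over all items
print(per_user.shape)            # (3, 1683)
```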


In [14]:
# For a single user, `scores` is 1D
if len(scores.shape) == 1:
    item_ids_external = np.arange(scores.shape[0])
else:
    item_ids_external = np.arange(scores.shape[1])  # For multiple users


In [15]:
# Get the total number of items in the dataset
num_items_in_dataset = dataset.item_num

# Adjust the top-k recommendations to handle out-of-bounds indices
with torch.no_grad():
    scores = model.full_sort_predict(input_inter)
    print("Shape of scores:", scores.shape)

# Convert internal IDs back to external item tokens
if len(scores.shape) == 1:  # 1-D: BPR flattens scores, so this branch runs even for the 3 users here
    item_ids_external = np.arange(min(scores.shape[0], num_items_in_dataset))
else:  # Multiple users
    item_ids_external = np.arange(min(scores.shape[1], num_items_in_dataset))

# Convert the internal IDs to tokens only for valid IDs
item_ids_external = convert_ids_to_tokens(dataset, 'item_id', item_ids_external)

# Process the top-k items, ensuring we stay within the valid range of item IDs
if len(scores.shape) == 1:  # Single user
    top_10_indices = np.argsort(scores)[:10]  # ascending sort: these are the 10 lowest-scoring positions
    top_10_indices = top_10_indices[top_10_indices < num_items_in_dataset]  # Filter out-of-bounds indices
    top_10_items = item_ids_external[top_10_indices]
    top_10_scores = scores[top_10_indices]

    print("Top 10 predicted items for the single user:")
    for item, score in zip(top_10_items, top_10_scores):
        print(f"Item ID: {item}, Score: {score:.4f}")
else:
    # Loop through each user's predictions if multiple users
    for i in range(scores.shape[0]):
        top_10_indices = np.argsort(scores[i])[:10]  # ascending sort: the 10 lowest-scoring positions
        top_10_indices = top_10_indices[top_10_indices < num_items_in_dataset]  # Filter out-of-bounds indices
        top_10_items = item_ids_external[top_10_indices]
        top_10_scores = scores[i][top_10_indices]

        print(f"Top 10 predicted items for user {i}:")
        for item, score in zip(top_10_items, top_10_scores):
            print(f"Item ID: {item}, Score: {score:.4f}")
        print("-" * 50)


Shape of scores: torch.Size([5049])
Top 10 predicted items for the single user:
Item ID: 1660, Score: -2.3586
Item ID: 1626, Score: -2.3237
Item ID: 1347, Score: -2.3016
Item ID: 1669, Score: -2.2747
Item ID: 1666, Score: -2.2684
Item ID: 1676, Score: -2.2617
Item ID: 1616, Score: -2.2590
Item ID: 1678, Score: -2.2337
Item ID: 1364, Score: -2.2307
Item ID: 1307, Score: -2.2251
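
Two details in the cell above: `np.argsort` sorts in ascending order, so `np.argsort(scores)[:10]` returns the ten lowest-scoring positions (the printed scores increase down the list), and because `scores` still holds all three users, the `< num_items_in_dataset` filter only keeps positions from the first user’s block. A per-user, descending version on made-up data; `id2token` is a hypothetical stand-in for RecBole’s internal-ID-to-token mapping.

```python
import numpy as np

n_items, k = 6, 3
id2token = np.array(["[PAD]", "50", "172", "300", "10", "20"])   # hypothetical mapping
flat_scores = np.array([0.0, 0.4, 0.9, -0.2, 0.7, 0.1,
                        0.0, 0.8, -0.5, 0.3, 0.2, 0.6])

scores = flat_scores.reshape(-1, n_items).copy()    # one row per user
scores[:, 0] = -np.inf                              # exclude the padding id

top_desc = np.argsort(-scores, axis=1)[:, :k]       # highest scores first
top_asc = np.argsort(scores, axis=1)[:, :k]         # what argsort(...)[:k] gives: lowest first

for u in range(scores.shape[0]):
    print(u, id2token[top_desc[u]].tolist(), scores[u, top_desc[u]].round(2).tolist())
```

Note that the ascending version even picks the padding ID first, since its score was set to `-inf`.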


In [16]:
import numpy as np
import torch

# Custom NDCG calculation
def calculate_ndcg_at_k(topk_recs, ground_truth, k=10):
    def dcg_at_k(recommended_items, relevant_items, k):
        dcg = 0.0
        for i in range(min(k, len(recommended_items))):
            # Check if each element is iterable or scalar
            if isinstance(recommended_items[i], (list, np.ndarray)):
                if any(item in relevant_items for item in recommended_items[i]):
                    dcg += 1 / np.log2(i + 2)
            else:
                if recommended_items[i] in relevant_items:
                    dcg += 1 / np.log2(i + 2)
        return dcg

    def idcg_at_k(relevant_items, k):
        idcg = 0.0
        for i in range(min(k, len(relevant_items))):
            idcg += 1 / np.log2(i + 2)
        return idcg

    total_ndcg = 0.0
    num_users = len(topk_recs)

    for user_idx in range(num_users):
        recommended_items = topk_recs[user_idx]
        relevant_items = ground_truth[user_idx]

        # Ensure both recommended_items and relevant_items are lists
        if isinstance(recommended_items, torch.Tensor):
            recommended_items = recommended_items.tolist()
        if isinstance(relevant_items, torch.Tensor):
            relevant_items = relevant_items.tolist()
        if isinstance(recommended_items, int):  # Convert a single integer to a list
            recommended_items = [recommended_items]
        if isinstance(relevant_items, int):
            relevant_items = [relevant_items]

        # Calculate DCG@K for this user
        dcg = dcg_at_k(recommended_items, relevant_items, k)
        # Calculate IDCG@K for this user
        idcg = idcg_at_k(relevant_items, k)

        if idcg == 0:
            ndcg = 0.0
        else:
            ndcg = dcg / idcg

        total_ndcg += ndcg

    return total_ndcg / num_users

# Assuming bpr_topk_recs and bpr_ground_truth have already been extracted
# Step 3: Calculate Custom NDCG
custom_ndcg = calculate_ndcg_at_k(bpr_topk_recs, bpr_ground_truth, k=10)

# Output the custom NDCG@10
print(f"Custom NDCG@10: {custom_ndcg}")


Custom NDCG@10: 0.01211335827200395
