# Recorded Demo of RecNextEval for SIGIR 2026 Presentation

Paper title: RecNextEval: A Reference Implementation for Temporal Next-Batch Recommendation Evaluation

Presenter: Ng Tze Kean

Presentation parameters:

- Dataset: MovieLens100K
- Top K = 10
- Algorithms: ItemKNNIncremental, RecentPopularity, DecayPopularity
- First timestamp split at epoch time 875_156_710
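For orientation, the split timestamp can be read as a calendar date. The snippet below is a small standard-library sketch that is independent of RecNextEval; the variable names are illustrative and not part of the package.

```python
from datetime import datetime, timezone

# First split point from the parameters above (Unix epoch seconds).
split_t = 875_156_710

# Interpret the epoch value as a timezone-aware UTC datetime.
split_dt = datetime.fromtimestamp(split_t, tz=timezone.utc)
print(split_dt.isoformat())  # 1997-09-25T03:05:10+00:00
```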

In [1]:
from recnexteval.datasets import MovieLens100K
from recnexteval.settings import SlidingWindowSetting


k = 10
dataset = MovieLens100K()
data = dataset.load()

setting_window = SlidingWindowSetting(
    training_t=875_156_710,
    window_size=60 * 60 * 24 * 30,  # 30 days, in seconds
    top_K=k
)

setting_window.split(data)

INFO - recnexteval package loaded.
DEBUG - MovieLens100K being initialized with '/Users/ngtzekean/personal/RecNextEval/data/movielens' as the base path.
DEBUG - MovieLens100K is initialized.
INFO - MovieLens100K is loading dataset...
INFO - Loading from cache: /Users/ngtzekean/personal/RecNextEval/data/movielens/ml-100k_u.data.processed.parquet
DEBUG - MovieLens100K applying filters set.
DEBUG - 	interactions before preprocess: 100000
DEBUG - 	items before preprocess: 1682
DEBUG - 	users before preprocess: 943
DEBUG - 	interactions after preprocess: 100000
DEBUG - 	items after preprocess: 1682
DEBUG - 	users after preprocess: 943
INFO - MovieLens100K dataset loaded - Took 0.0451s
DEBUG - Splitting data...
DEBUG - Performing lt(t, 2147483647)
DEBUG - Performing lt(t, 875156710)
DEBUG - Performing ge(t, 875156710)
DEBUG - Performing lt(t, 3022640357)
DEBUG - TimestampSplitter(t=875156710,t_lower=None,t_upper=2147483647) has complete split


  0%|          | 0/6 [00:00<?, ?it/s]

DEBUG - NLastInteractionTimestampSplitter(t=875156710,t_lower=None,t_upper=2592000,n_seq_data=0,include_all_past_data=False) - Updating split point to t=875156710
DEBUG - Performing lt(t, 877748710)
DEBUG - Performing ge(t, 875156710)
DEBUG - NLastInteractionTimestampSplitter(t=875156710,t_lower=None,t_upper=2592000,n_seq_data=0,include_all_past_data=False) has complete split
INFO - Split at time 875156710 resulted in empty unlabelled testing samples.
DEBUG - NLastInteractionTimestampSplitter(t=875156710,t_lower=None,t_upper=2592000,n_seq_data=0,include_all_past_data=False) - Updating split point to t=877748710
DEBUG - Performing lt(t, 880340710)
DEBUG - Performing ge(t, 877748710)
DEBUG - NLastInteractionTimestampSplitter(t=877748710,t_lower=None,t_upper=2592000,n_seq_data=0,include_all_past_data=False) has complete split
INFO - Split at time 877748710 resulted in empty unlabelled testing samples.
DEBUG - NLastInteractionTimestampSplitter(t=877748710,t_lower=None,t_upper=2592000,n_seq

7it [00:00, 197.56it/s]              

INFO - Finished split with window size 2592000 seconds. Number of splits: 7 in total.
INFO - SlidingWindowSetting data split - Took 0.0488s
DEBUG - Checking split attribute and sizes.
DEBUG - Checking split attributes.
DEBUG - Split attributes are set.
DEBUG - Checking size of split sets.

DEBUG - Size of split sets are checked.
INFO - SlidingWindowSetting data split complete.
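
The window boundaries in the log above follow from the first split point and the window size. The sketch below reproduces that arithmetic in plain Python. It assumes half-open windows `[start, end)`, which matches the `ge(t, start)` and `lt(t, end)` filters in the log; the variable names are illustrative, and the window count is taken from the log rather than derived from the data.

```python
# Values taken from the demo: first split point and a 30-day window, in seconds.
first_split_t = 875_156_710
window_size = 60 * 60 * 24 * 30  # 2_592_000
num_windows = 7  # "Number of splits: 7" as reported by the split log

# Each window starts exactly one window_size after the previous one.
window_starts = [first_split_t + i * window_size for i in range(num_windows)]

# Pair each start with its exclusive end to form half-open intervals [start, end).
windows = [(start, start + window_size) for start in window_starts]
for start, end in windows:
    print(start, end)
```

The first two starts, 875156710 and 877748710, are the same split points that appear in the `Updating split point` log lines.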





In [2]:
from recnexteval.algorithms import DecayPopularity, ItemKNNIncremental, RecentPopularity
from recnexteval.evaluators import EvaluatorPipelineBuilder


builder = EvaluatorPipelineBuilder()
builder.add_setting(setting_window)
builder.set_metric_k(k)
# builder.add_metric("RecallK")
builder.add_metric("NDCGK")
builder.add_metric("HitK")
builder.add_algorithm(ItemKNNIncremental, params={"K": k})
builder.add_algorithm(RecentPopularity, params={"K": k})
builder.add_algorithm(DecayPopularity, params={"K": k})
evaluator = builder.build()

INFO - Registered algorithm 'ItemKNNIncremental(K=10,pad_with_popularity=False)' with ID 51476e85-ee3a-5260-918a-1ad1856df1d7
INFO - Registered algorithm 'RecentPopularity(K=10)' with ID 8c4cf57c-4e77-567a-95fe-41ade21a0148
INFO - Registered algorithm 'DecayPopularity(K=10)' with ID b48e753c-438c-5e9c-8e4a-df9b2faddf38
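
As background for the two metrics registered above, here is a minimal sketch of textbook top-K metrics with binary relevance. These are generic definitions written for illustration; the function names are hypothetical, and RecNextEval's own `NDCGK` and `HitK` implementations may differ in detail (for example in how hits are normalised).

```python
import math


def hit_at_k(recommended, relevant, k):
    """Return 1.0 if any of the top-k recommended items is relevant, else 0.0."""
    return 1.0 if any(item in relevant for item in recommended[:k]) else 0.0


def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG@k: DCG of the top-k list divided by the ideal DCG."""
    # A relevant item at 0-based position `rank` contributes 1 / log2(rank + 2).
    dcg = sum(
        1.0 / math.log2(rank + 2)
        for rank, item in enumerate(recommended[:k])
        if item in relevant
    )
    # The ideal ranking places all relevant items (at most k of them) first.
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


recommended = [5, 3, 9, 1]  # ranked item IDs, best first (toy data)
relevant = {3, 7}           # items the user actually interacted with (toy data)
print(hit_at_k(recommended, relevant, k=3))
print(ndcg_at_k(recommended, relevant, k=3))
```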


In [3]:
evaluator.run()

DEBUG - First step, getting training data
DEBUG - (user x item) shape defined is (np.int16(41), np.int16(872)). Shape of dataframe stored in matrix was (3446, 5) before masking
DEBUG - Final (user x item) shape defined is (np.int16(41), np.int16(872))
DEBUG - Fitting ItemKNNIncremental complete - Took 0.0111s
DEBUG - Fitting RecentPopularity complete - Took 0.00142s
DEBUG - Fitting DecayPopularity complete - Took 0.000816s
DEBUG - Algorithms trained with background data...
DEBUG - Metric accumulator instantiated...
DEBUG - Restoring setting to iteration 0
DEBUG - Setting data generators ready...


Evaluating steps:   0%|          | 0/7 [00:00<?, ?it/s]

INFO - Running step 0
INFO - Phase 2: Evaluating the algorithms...
DEBUG - (user x item) shape defined is (np.int16(174), np.int16(1177)). Shape of dataframe stored in matrix was (1463, 5) before masking
DEBUG - Final (user x item) shape defined is (np.int16(174), np.int16(1177))
DEBUG - (user x item) shape defined is (np.int16(174), np.int16(1177)). Shape of dataframe stored in matrix was (1463, 5) before masking
DEBUG - Final (user x item) shape defined is (np.int16(174), np.int16(1177))
DEBUG - Performing items_in comparison
DEBUG - Padding item ID in range(872, 1178) with empty fields
DEBUG - Padding by ItemKNNIncremental completed
DEBUG - Padding user ID in range(41, 175) with random items
DEBUG - Padding by ItemKNNIncremental completed
DEBUG - Shape of prediction matrix: (1463, 1178)
DEBUG - Shape of ground truth matrix: (1463, 1177)
DEBUG - NDCGK compute started - NDCGK_10
DEBUG - Number of users: 1463
DEBUG - Number of ground truth interactions: 1463
DEBUG - NDCGK compute compl

Evaluating steps:  29%|██▊       | 2/7 [00:00<00:00, 17.60it/s]

DEBUG - Not first step, getting previous ground truth data as training data
DEBUG - (user x item) shape defined is (np.int16(391), np.int16(1411)). Shape of dataframe stored in matrix was (21810, 5) before masking
DEBUG - Final (user x item) shape defined is (np.int16(391), np.int16(1411))
DEBUG - Fitting ItemKNNIncremental complete - Took 0.0308s
DEBUG - Fitting RecentPopularity complete - Took 0.00126s
DEBUG - Fitting DecayPopularity complete - Took 0.00135s
INFO - Running step 2
INFO - Phase 2: Evaluating the algorithms...
DEBUG - (user x item) shape defined is (np.int16(497), np.int16(1467)). Shape of dataframe stored in matrix was (1607, 5) before masking
DEBUG - Final (user x item) shape defined is (np.int16(497), np.int16(1467))
DEBUG - (user x item) shape defined is (np.int16(497), np.int16(1467)). Shape of dataframe stored in matrix was (1607, 5) before masking
DEBUG - Final (user x item) shape defined is (np.int16(497), np.int16(1467))
DEBUG - Performing items_in comparison

Evaluating steps:  57%|█████▋    | 4/7 [00:00<00:00, 13.55it/s]

DEBUG - Not first step, getting previous ground truth data as training data
DEBUG - (user x item) shape defined is (np.int16(618), np.int16(1515)). Shape of dataframe stored in matrix was (12069, 5) before masking
DEBUG - Final (user x item) shape defined is (np.int16(618), np.int16(1515))
DEBUG - Fitting ItemKNNIncremental complete - Took 0.0418s
DEBUG - Fitting RecentPopularity complete - Took 0.00112s
DEBUG - Fitting DecayPopularity complete - Took 0.00116s
INFO - Running step 4
INFO - Phase 2: Evaluating the algorithms...
DEBUG - (user x item) shape defined is (np.int16(703), np.int16(1594)). Shape of dataframe stored in matrix was (1453, 5) before masking
DEBUG - Final (user x item) shape defined is (np.int16(703), np.int16(1594))
DEBUG - (user x item) shape defined is (np.int16(703), np.int16(1594)). Shape of dataframe stored in matrix was (1453, 5) before masking
DEBUG - Final (user x item) shape defined is (np.int16(703), np.int16(1594))
DEBUG - Performing items_in comparison

Evaluating steps:  86%|████████▌ | 6/7 [00:00<00:00, 11.86it/s]

DEBUG - Not first step, getting previous ground truth data as training data
DEBUG - (user x item) shape defined is (np.int16(782), np.int16(1626)). Shape of dataframe stored in matrix was (8919, 5) before masking
DEBUG - Final (user x item) shape defined is (np.int16(782), np.int16(1626))
DEBUG - Fitting ItemKNNIncremental complete - Took 0.053s
DEBUG - Fitting RecentPopularity complete - Took 0.000893s
DEBUG - Fitting DecayPopularity complete - Took 0.000983s
INFO - Running step 6
INFO - Phase 2: Evaluating the algorithms...
DEBUG - (user x item) shape defined is (np.int16(943), np.int16(1670)). Shape of dataframe stored in matrix was (2120, 5) before masking
DEBUG - Final (user x item) shape defined is (np.int16(943), np.int16(1670))
DEBUG - (user x item) shape defined is (np.int16(943), np.int16(1670)). Shape of dataframe stored in matrix was (2120, 5) before masking
DEBUG - Final (user x item) shape defined is (np.int16(943), np.int16(1670))
DEBUG - Performing items_in comparison

Evaluating steps: 100%|██████████| 7/7 [00:00<00:00, 11.88it/s]


# Evaluate Metrics

In [4]:
evaluator.metric_results("macro")

algorithm,metric,macro_score,num_window
DecayPopularity(K=10)_b48e753c-438c-5e9c-8e4a-df9b2faddf38,HitK_10,0.527113,7
DecayPopularity(K=10)_b48e753c-438c-5e9c-8e4a-df9b2faddf38,NDCGK_10,0.125114,7
"ItemKNNIncremental(K=10,pad_with_popularity=False)_51476e85-ee3a-5260-918a-1ad1856df1d7",HitK_10,0.366751,7
"ItemKNNIncremental(K=10,pad_with_popularity=False)_51476e85-ee3a-5260-918a-1ad1856df1d7",NDCGK_10,0.166534,7
RecentPopularity(K=10)_8c4cf57c-4e77-567a-95fe-41ade21a0148,HitK_10,0.576868,7
RecentPopularity(K=10)_8c4cf57c-4e77-567a-95fe-41ade21a0148,NDCGK_10,0.157799,7


In [5]:
evaluator.metric_results("micro")

algorithm,metric,micro_score,num_user
DecayPopularity(K=10)_b48e753c-438c-5e9c-8e4a-df9b2faddf38,HitK_10,0.097255,1428
DecayPopularity(K=10)_b48e753c-438c-5e9c-8e4a-df9b2faddf38,NDCGK_10,0.113453,1428
"ItemKNNIncremental(K=10,pad_with_popularity=False)_51476e85-ee3a-5260-918a-1ad1856df1d7",HitK_10,0.172926,1428
"ItemKNNIncremental(K=10,pad_with_popularity=False)_51476e85-ee3a-5260-918a-1ad1856df1d7",NDCGK_10,0.205655,1428
RecentPopularity(K=10)_8c4cf57c-4e77-567a-95fe-41ade21a0148,HitK_10,0.127239,1428
RecentPopularity(K=10)_8c4cf57c-4e77-567a-95fe-41ade21a0148,NDCGK_10,0.150742,1428
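
The macro and micro scores above differ noticeably for the same algorithm, which is expected when windows contain different numbers of users. The sketch below illustrates the general distinction on toy data. It assumes that macro averaging weights every window equally and micro averaging weights every user-level score equally; this is the conventional reading of the two terms and is not confirmed here as RecNextEval's exact aggregation. All names and values are hypothetical.

```python
from statistics import mean

# Hypothetical per-user scores for two evaluation windows (not from the demo).
user_scores_by_window = {
    "window_0": [1.0, 0.0],
    "window_1": [1.0, 1.0, 1.0, 0.0],
}

# Window-level score: average over the users evaluated in that window.
window_scores = {
    window: mean(scores) for window, scores in user_scores_by_window.items()
}

# Macro: every window counts equally, regardless of how many users it holds.
macro_score = mean(list(window_scores.values()))

# Micro: every user-level score counts equally, pooled across all windows.
pooled_scores = [s for scores in user_scores_by_window.values() for s in scores]
micro_score = mean(pooled_scores)

print(macro_score, micro_score)
```

In this toy case the larger window pulls the micro score above the macro score, because its four users outweigh the two users in the smaller window.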


In [6]:
evaluator.metric_results("user")

algorithm,timestamp,metric,user_id,user_score
"ItemKNNIncremental(K=10,pad_with_popularity=False)_51476e85-ee3a-5260-918a-1ad1856df1d7",875156710,NDCGK_10,0,0.725495
"ItemKNNIncremental(K=10,pad_with_popularity=False)_51476e85-ee3a-5260-918a-1ad1856df1d7",875156710,NDCGK_10,1,0.126419
"ItemKNNIncremental(K=10,pad_with_popularity=False)_51476e85-ee3a-5260-918a-1ad1856df1d7",875156710,NDCGK_10,2,0.232245
"ItemKNNIncremental(K=10,pad_with_popularity=False)_51476e85-ee3a-5260-918a-1ad1856df1d7",875156710,NDCGK_10,3,0.455156
"ItemKNNIncremental(K=10,pad_with_popularity=False)_51476e85-ee3a-5260-918a-1ad1856df1d7",875156710,NDCGK_10,4,0.345176
...,...,...,...,...
DecayPopularity(K=10)_b48e753c-438c-5e9c-8e4a-df9b2faddf38,890708710,HitK_10,938,0.2
DecayPopularity(K=10)_b48e753c-438c-5e9c-8e4a-df9b2faddf38,890708710,HitK_10,939,0.3
DecayPopularity(K=10)_b48e753c-438c-5e9c-8e4a-df9b2faddf38,890708710,HitK_10,940,0.3
DecayPopularity(K=10)_b48e753c-438c-5e9c-8e4a-df9b2faddf38,890708710,HitK_10,941,0.2


In [7]:
from recnexteval.evaluators import MetricLevelEnum

In [8]:
# You can also pass the enum member instead of a string for better code-completion support.
evaluator.metric_results(MetricLevelEnum.WINDOW)

algorithm,timestamp,metric,window_score,num_user
"ItemKNNIncremental(K=10,pad_with_popularity=False)_51476e85-ee3a-5260-918a-1ad1856df1d7",875156710,NDCGK_10,0.05493,1463
"ItemKNNIncremental(K=10,pad_with_popularity=False)_51476e85-ee3a-5260-918a-1ad1856df1d7",875156710,HitK_10,0.207547,1463
"ItemKNNIncremental(K=10,pad_with_popularity=False)_51476e85-ee3a-5260-918a-1ad1856df1d7",877748710,NDCGK_10,0.080724,2534
"ItemKNNIncremental(K=10,pad_with_popularity=False)_51476e85-ee3a-5260-918a-1ad1856df1d7",877748710,HitK_10,0.220974,2534
"ItemKNNIncremental(K=10,pad_with_popularity=False)_51476e85-ee3a-5260-918a-1ad1856df1d7",880340710,NDCGK_10,0.190783,1607
"ItemKNNIncremental(K=10,pad_with_popularity=False)_51476e85-ee3a-5260-918a-1ad1856df1d7",880340710,HitK_10,0.393443,1607
"ItemKNNIncremental(K=10,pad_with_popularity=False)_51476e85-ee3a-5260-918a-1ad1856df1d7",882932710,NDCGK_10,0.203331,1889
"ItemKNNIncremental(K=10,pad_with_popularity=False)_51476e85-ee3a-5260-918a-1ad1856df1d7",882932710,HitK_10,0.415888,1889
"ItemKNNIncremental(K=10,pad_with_popularity=False)_51476e85-ee3a-5260-918a-1ad1856df1d7",885524710,NDCGK_10,0.226851,1453
"ItemKNNIncremental(K=10,pad_with_popularity=False)_51476e85-ee3a-5260-918a-1ad1856df1d7",885524710,HitK_10,0.468927,1453
