[BUG] Inconsistent AUC between training and inference on KuaiRand-1K ranking task #334

@cry-daniel

Describe the bug
Hi, thanks for open-sourcing this great project!

We recently tried running the ranking task on the KuaiRand-1K dataset, but observed a significant inconsistency between training-time evaluation results and standalone inference results.

Steps/Code to reproduce bug
Training Command

PYTHONPATH=${PYTHONPATH}:$(realpath ../) \
torchrun --nproc_per_node 1 --master_addr localhost --master_port 6000 \
./training/pretrain_gr_ranking.py \
--gin-config-file ./training/configs/kuairand_1k_ranking.gin

Inference Command

PYTHONPATH=${PYTHONPATH}:$(realpath ../) torchrun --nproc_per_node 1 \
--master_addr localhost --master_port 6000 ./inference/inference_gr_ranking.py \
--gin_config_file ./inference/configs/kuairand_1k_inference_ranking.gin \
 --checkpoint_dir ckpts/iter550/ --mode eval

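Expected behavior
We expected standalone inference on the saved checkpoint to roughly reproduce the training-time evaluation AUCs, but the two sets of results differ.

Training-time Evaluation Results at Iteration 550: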

Metrics.task0.AUC: 0.703728
Metrics.task1.AUC: 0.532176
Metrics.task2.AUC: 0.664358
Metrics.task3.AUC: 0.474177
Metrics.task4.AUC: 0.644220
Metrics.task5.AUC: 0.377520
Metrics.task6.AUC: 0.691218
Metrics.task7.AUC: 0.558680

Inference Results Using the Saved Checkpoint:

Metrics.task0.AUC: 0.350148
Metrics.task1.AUC: 0.637459
Metrics.task2.AUC: 0.331639
Metrics.task3.AUC: 0.659130
Metrics.task4.AUC: 0.430801
Metrics.task5.AUC: 0.649176
Metrics.task6.AUC: 0.329165
Metrics.task7.AUC: 0.493231
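
To make the gap easier to read, the two listings above can be put side by side. This is only a small sketch over the numbers reported in this issue; the helper name `compare_auc` is ours and is not part of recsys-examples.

```python
# AUC values copied verbatim from the two listings above (task0..task7).
train_auc = [0.703728, 0.532176, 0.664358, 0.474177,
             0.644220, 0.377520, 0.691218, 0.558680]
infer_auc = [0.350148, 0.637459, 0.331639, 0.659130,
             0.430801, 0.649176, 0.329165, 0.493231]


def compare_auc(train, infer):
    """Return one row per task with the change and a 'crossed 0.5' flag."""
    rows = []
    for task, (t, i) in enumerate(zip(train, infer)):
        rows.append({
            "task": task,
            "train": t,
            "infer": i,
            "delta": i - t,
            # True when one AUC is above 0.5 and the other is below it.
            "crosses_half": (t - 0.5) * (i - 0.5) < 0,
        })
    return rows


rows = compare_auc(train_auc, infer_auc)
for r in rows:
    print(f"task{r['task']}: train={r['train']:.6f} infer={r['infer']:.6f} "
          f"delta={r['delta']:+.6f} crosses_0.5={r['crosses_half']}")
print("tasks crossing 0.5:", sum(r["crosses_half"] for r in rows))
```

The `crosses_half` flag marks tasks whose AUC moved from one side of 0.5 to the other, which is the pattern we would like to understand.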

Environment details (please complete the following information):

  • Environment location: Docker
  • Method of recsys-examples install: Docker
  • Images: docker pull shijieliu01/recsys-examples:2026.1.9 for training and docker pull shijieliu01/recsys-examples:inference.2026.1.14 for inference.
  • Containers: docker run --gpus all -it --name gr_training shijieliu01/recsys-examples:2026.1.9 for training and docker run --gpus all -it --name gr_inference shijieliu01/recsys-examples:inference.2026.1.14 for inference.
  • Output of print_env.sh, run from the project root:
<details><summary>Click here to see environment details</summary><pre>
     
     ***git***
     Not inside a git repository
     
     ***OS Information***
     DISTRIB_ID=Ubuntu
     DISTRIB_RELEASE=24.04
     DISTRIB_CODENAME=noble
     DISTRIB_DESCRIPTION="Ubuntu 24.04.2 LTS"
     PRETTY_NAME="Ubuntu 24.04.2 LTS"
     NAME="Ubuntu"
     VERSION_ID="24.04"
     VERSION="24.04.2 LTS (Noble Numbat)"
     VERSION_CODENAME=noble
     ID=ubuntu
     ID_LIKE=debian
     HOME_URL="https://www.ubuntu.com/"
     SUPPORT_URL="https://help.ubuntu.com/"
     BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
     PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
     UBUNTU_CODENAME=noble
     LOGO=ubuntu-logo
     Linux ee8f456bcbfa 5.15.0-105-generic #115-Ubuntu SMP Mon Apr 15 09:52:04 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
     
     ***GPU Information***
     Thu Mar 26 11:58:20 2026
     +-----------------------------------------------------------------------------------------+
     | NVIDIA-SMI 580.126.20             Driver Version: 580.126.20     CUDA Version: 13.0     |
     +-----------------------------------------+------------------------+----------------------+
     | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
     | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
     |                                         |                        |               MIG M. |
     |=========================================+========================+======================|
     |   0  NVIDIA A100-SXM4-80GB          On  |   00000000:13:00.0 Off |                    0 |
     | N/A   43C    P0            250W /  400W |   68333MiB /  81920MiB |     98%      Default |
     |                                         |                        |             Disabled |
     +-----------------------------------------+------------------------+----------------------+
     |   1  NVIDIA A100-SXM4-80GB          On  |   00000000:1C:00.0 Off |                    0 |
     | N/A   22C    P0             62W /  400W |       4MiB /  81920MiB |      0%      Default |
     |                                         |                        |             Disabled |
     +-----------------------------------------+------------------------+----------------------+
     
     +-----------------------------------------------------------------------------------------+
     | Processes:                                                                              |
     |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
     |        ID   ID                                                               Usage      |
     |=========================================================================================|
     |  No running processes found                                                             |
     +-----------------------------------------------------------------------------------------+
     
     ***CPU***
     Architecture:                       x86_64
     CPU op-mode(s):                     32-bit, 64-bit
     Address sizes:                      45 bits physical, 48 bits virtual
     Byte Order:                         Little Endian
     CPU(s):                             64
     On-line CPU(s) list:                0-63
     Vendor ID:                          AuthenticAMD
     Model name:                         AMD EPYC 7443 24-Core Processor
     CPU family:                         25
     Model:                              1
     Thread(s) per core:                 1
     Core(s) per socket:                 2
     Socket(s):                          32
     Stepping:                           1
     BogoMIPS:                           5689.31
     Flags:                              fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl tsc_reliable nonstop_tsc cpuid extd_apicid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw topoext invpcid_single ibpb vmmcall fsgsbase bmi1 avx2 smep bmi2 invpcid rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves clzero wbnoinvd arat umip pku ospke vaes vpclmulqdq rdpid overflow_recov succor
     Hypervisor vendor:                  VMware
     Virtualization type:                full
     L1d cache:                          2 MiB (64 instances)
     L1i cache:                          2 MiB (64 instances)
     L2 cache:                           32 MiB (64 instances)
     L3 cache:                           1 GiB (32 instances)
     NUMA node(s):                       1
     NUMA node0 CPU(s):                  0-63
     Vulnerability Gather data sampling: Not affected
     Vulnerability Itlb multihit:        Not affected
     Vulnerability L1tf:                 Not affected
     Vulnerability Mds:                  Not affected
     Vulnerability Meltdown:             Not affected
     Vulnerability Mmio stale data:      Not affected
     Vulnerability Retbleed:             Not affected
     Vulnerability Spec rstack overflow: Mitigation; safe RET
     Vulnerability Spec store bypass:    Vulnerable
     Vulnerability Spectre v1:           Mitigation; usercopy/swapgs barriers and __user pointer sanitization
     Vulnerability Spectre v2:           Mitigation; Retpolines, IBPB conditional, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
     Vulnerability Srbds:                Not affected
     Vulnerability Tsx async abort:      Not affected
     
     ***CMake***
     /usr/local/bin/cmake
     cmake version 3.31.6
     
     CMake suite maintained and supported by Kitware (kitware.com/cmake).
     
     ***g++***
     /usr/bin/g++
     g++ (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
     Copyright (C) 2023 Free Software Foundation, Inc.
     This is free software; see the source for copying conditions.  There is NO
     warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
     
     
     ***nvcc***
     /usr/local/cuda/bin/nvcc
     nvcc: NVIDIA (R) Cuda compiler driver
     Copyright (c) 2005-2025 NVIDIA Corporation
     Built on Tue_May_27_02:21:03_PDT_2025
     Cuda compilation tools, release 12.9, V12.9.86
     Build cuda_12.9.r12.9/compiler.36037853_0
     
     ***Python***
     /usr/bin/python
     Python 3.12.3
     
     ***Environment Variables***
     PATH                            : /usr/local/lib/python3.12/dist-packages/torch_tensorrt/bin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/mpi/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/ucx/bin:/opt/amazon/efa/bin:/opt/tensorrt/bin
     LD_LIBRARY_PATH                 : /usr/local/lib/python3.12/dist-packages/torch/lib:/usr/local/lib/python3.12/dist-packages/torch_tensorrt/lib:/usr/local/cuda/compat/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64
     NUMBAPRO_NVVM                   :
     NUMBAPRO_LIBDEVICE              :
     CONDA_PREFIX                    :
     PYTHON_PATH                     :
     
     conda not found
     ***pip packages***
     /usr/local/bin/pip
     Package                    Version                       Editable project location
     -------------------------- ----------------------------- ---------------------------
     absl-py                    2.3.0
     aiohappyeyeballs           2.6.1
     aiohttp                    3.12.7
     aiosignal                  1.3.2
     annotated-types            0.7.0
     anyio                      4.9.0
     apex                       0.1
     argon2-cffi                25.1.0
     argon2-cffi-bindings       21.2.0
     arrow                      1.3.0
     asciitree                  0.3.3
     asttokens                  3.0.0
     astunparse                 1.6.3
     async-lru                  2.0.5
     attrs                      25.3.0
     audioread                  3.0.1
     babel                      2.17.0
     beautifulsoup4             4.13.4
     black                      25.1.0
     bleach                     6.2.0
     blis                       0.7.11
     cachetools                 6.0.0
     catalogue                  2.0.10
     certifi                    2025.4.26
     cffi                       1.17.1
     cfgv                       3.5.0
     charset-normalizer         3.4.2
     click                      8.2.1
     cloudpathlib               0.21.1
     cloudpickle                3.1.1
     cmake                      3.31.6
     comm                       0.2.2
     confection                 0.1.5
     contourpy                  1.3.2
     cuda-bindings              12.9.0
     cuda-python                12.9.0
     cudf                       25.4.0
     cudf-polars                25.4.0
     cugraph                    25.4.0
     cugraph-service-client     25.4.0
     cugraph-service-server     25.4.0
     cuml                       25.4.0
     cupy-cuda12x               13.3.0
     cuvs                       25.4.0
     cycler                     0.12.1
     cymem                      2.0.11
     Cython                     3.1.1
     dask                       2025.2.0
     dask-cuda                  25.4.0
     dask-cudf                  25.4.0
     debugpy                    1.8.14
     decorator                  5.2.1
     defusedxml                 0.7.1
     dill                       0.4.0
     distlib                    0.4.0
     distributed                2025.2.0
     distributed-ucxx           0.43.0
     distro                     1.9.0
     dm-tree                    0.1.9
     docker                     7.1.0
     docstring_parser           0.17.0
     dynamicemb                 0.0.1+961ac1e
     einops                     0.8.1
     execnet                    2.1.1
     executing                  2.2.0
     expecttest                 0.3.0
     fasteners                  0.19
     fastjsonschema             2.21.1
     fastrlock                  0.8.3
     fbgemm_gpu_nightly         2026.1.8
     filelock                   3.20.2
     flash_attn                 2.7.4.post1
     fonttools                  4.58.1
     fqdn                       1.5.1
     frozenlist                 1.6.0
     fsspec                     2025.5.1
     gast                       0.6.0
     gin-config                 0.5.0
     grpcio                     1.62.1
     h11                        0.16.0
     hstu_attn                  0.1.0+961ac1e.cu12.9
     hstu_cuda_ops              0.0.0
     hstu-hopper                0.1.1+961ac1e.cu12.9
     httpcore                   1.0.9
     httpx                      0.28.1
     hypothesis                 6.130.8
     identify                   2.6.15
     idna                       3.10
     importlib_metadata         8.7.0
     iniconfig                  2.1.0
     intel-openmp               2021.4.0
     iopath                     0.1.10
     ipykernel                  6.29.5
     ipython                    9.3.0
     ipython_pygments_lexers    1.1.1
     isoduration                20.11.0
     isort                      6.0.1
     jedi                       0.19.2
     Jinja2                     3.1.6
     joblib                     1.5.1
     json5                      0.12.0
     jsonpointer                3.0.0
     jsonschema                 4.24.0
     jsonschema-specifications  2025.4.1
     jupyter_client             8.6.3
     jupyter_core               5.8.1
     jupyter-events             0.12.0
     jupyter-lsp                2.2.5
     jupyter_server             2.16.0
     jupyter_server_terminals   0.5.3
     jupyterlab                 4.4.3
     jupyterlab_code_formatter  3.0.2
     jupyterlab_pygments        0.3.0
     jupyterlab_server          2.27.3
     jupyterlab_tensorboard_pro 4.0.0
     jupytext                   1.17.2
     kiwisolver                 1.4.8
     kvikio                     25.4.0
     langcodes                  3.5.0
     language_data              1.3.0
     lazy_loader                0.4
     libcudf                    25.4.0
     libcugraph                 25.4.0
     libcuml                    25.4.0
     libcuvs                    25.4.0
     libkvikio                  25.4.0
     libraft                    25.4.0
     librmm                     25.4.0
     librmm-cu12                25.4.0
     librosa                    0.11.0
     libucxx                    0.43.0
     lightning-thunder          0.2.3.dev0
     lightning-utilities        0.14.3
     lintrunner                 0.12.7
     llvmlite                   0.42.0
     locket                     1.0.0
     looseversion               1.3.0
     marisa-trie                1.2.1
     Markdown                   3.8
     markdown-it-py             3.0.0
     MarkupSafe                 3.0.2
     matplotlib                 3.10.3
     matplotlib-inline          0.1.7
     mdit-py-plugins            0.4.2
     mdurl                      0.1.2
     megatron-core              0.12.1                        /workspace/deps/megatron-lm
     mistune                    3.1.3
     mkl                        2021.1.1
     mkl-devel                  2021.1.1
     mkl-include                2021.1.1
     mock                       5.2.0
     mpmath                     1.3.0
     msgpack                    1.1.0
     multidict                  6.4.4
     murmurhash                 1.0.13
     mypy_extensions            1.1.0
     nbclient                   0.10.2
     nbconvert                  7.16.6
     nbformat                   5.10.4
     nest-asyncio               1.6.0
     networkx                   3.5
     ninja                      1.11.1.4
     nodeenv                    1.10.0
     notebook                   7.4.3
     notebook_shim              0.2.4
     numba                      0.59.1
     numba-cuda                 0.4.0
     numcodecs                  0.13.1
     numpy                      1.26.4
     nvdlfw_inspect             0.1.0
     nvfuser                    0.2.27a0+9bf5aca
     nvidia-cudnn-frontend      1.12.0
     nvidia-cutlass-dsl         4.3.0
     nvidia-dali-cuda120        1.50.0
     nvidia-ml-py               12.575.51
     nvidia-modelopt            0.29.0
     nvidia-modelopt-core       0.29.0
     nvidia-nvcomp-cu12         4.2.0.14
     nvidia-nvimgcodec-cu12     0.5.0.13
     nvidia-nvjpeg-cu12         12.4.0.16
     nvidia-nvjpeg2k-cu12       0.8.1.40
     nvidia-nvtiff-cu12         0.5.0.67
     nvidia-resiliency-ext      0.4.0
     nvtx                       0.2.11
     nx-cugraph                 25.4.0
     onnx                       1.17.0
     opt_einsum                 3.4.0
     optree                     0.16.0
     ordered-set                4.1.0
     orjson                     3.11.5
     overrides                  7.7.0
     packaging                  25.0
     pandas                     2.2.3
     pandocfilters              1.5.1
     parso                      0.8.4
     partd                      1.4.2
     pathspec                   0.12.1
     pexpect                    4.9.0
     pillow                     11.2.1
     pip                        25.1.1
     platformdirs               4.3.8
     pluggy                     1.6.0
     ply                        3.11
     polars                     1.25.2
     polygraphy                 0.49.20
     pooch                      1.8.2
     portalocker                3.2.0
     pre_commit                 4.5.1
     preshed                    3.0.10
     prometheus_client          0.22.1
     prompt_toolkit             3.0.51
     propcache                  0.3.1
     protobuf                   4.24.4
     psutil                     7.0.0
     ptyprocess                 0.7.0
     PuLP                       3.2.1
     pure_eval                  0.2.3
     pyarrow                    19.0.1
     pybind11                   2.13.6
     pybind11_global            2.13.6
     pycocotools                2.0+nv0.8.1
     pycparser                  2.22
     pydantic                   2.11.5
     pydantic_core              2.33.2
     Pygments                   2.19.1
     pylibcudf                  25.4.0
     pylibcugraph               25.4.0
     pylibcugraphops            25.4.0
     pylibraft                  25.4.0
     pylibwholegraph            25.4.0
     pynvjitlink                0.3.0
     pynvml                     12.0.0
     pyparsing                  3.2.3
     pyre-extensions            0.0.32
     pytest                     8.1.1
     pytest-flakefinder         1.1.0
     pytest-rerunfailures       15.1
     pytest-shard               0.1.2
     pytest-xdist               3.7.0
     python-dateutil            2.9.0.post0
     python-hostlist            2.2.1
     python-json-logger         3.3.0
     pytorch-triton             3.3.0+git96316ce52.nvinternal
     pytz                       2023.4
     pyvers                     0.1.0
     PyYAML                     6.0.2
     pyzmq                      26.4.0
     raft-dask                  25.4.0
     rapids-dask-dependency     25.4.0a0
     rapids-logger              0.1.18
     referencing                0.36.2
     regex                      2024.11.6
     requests                   2.32.3
     rfc3339-validator          0.1.4
     rfc3986-validator          0.1.1
     rich                       14.0.0
     rmm                        25.4.0
     rpds-py                    0.25.1
     safetensors                0.5.3
     scikit-build               0.18.1
     scikit-learn               1.6.1
     scipy                      1.15.3
     Send2Trash                 1.8.3
     setuptools                 78.1.1
     setuptools-git-versioning  2.1.0
     shellingham                1.5.4
     six                        1.16.0
     smart-open                 7.1.0
     sniffio                    1.3.1
     sortedcontainers           2.4.0
     soundfile                  0.13.1
     soupsieve                  2.7
     soxr                       0.5.0.post1
     spacy                      3.7.5
     spacy-legacy               3.0.12
     spacy-loggers              1.0.5
     srsly                      2.5.1
     stack-data                 0.6.3
     sympy                      1.14.0
     tabulate                   0.9.0
     tbb                        2021.13.1
     tblib                      3.1.0
     tensorboard                2.16.2
     tensorboard-data-server    0.7.2
     tensordict                 0.10.0
     tensorrt                   10.11.0.33
     terminado                  0.18.1
     thinc                      8.2.5
     threadpoolctl              3.6.0
     thriftpy2                  0.5.2
     tinycss2                   1.4.0
     toolz                      1.0.0
     torch                      2.8.0a0+5228986c39.nv25.6
     torch_tensorrt             2.8.0a0
     torchao                    0.11.0+git
     torchmetrics               1.0.3
     torchprofile               0.0.4
     torchrec                   1.2.0+440b1c6
     torchvision                0.22.0a0+95f10a4e
     torchx                     0.7.0
     tornado                    6.5.1
     tqdm                       4.67.1
     traitlets                  5.14.3
     transformer_engine         2.4.0+3cd6870
     treelite                   4.4.1
     typer                      0.16.0
     types-dataclasses          0.6.6
     types-python-dateutil      2.9.0.20250516
     typing_extensions          4.14.0
     typing-inspect             0.9.0
     typing-inspection          0.4.1
     tzdata                     2025.2
     ucx-py                     0.43.0
     ucxx                       0.43.0
     uri-template               1.3.0
     urllib3                    1.26.20
     virtualenv                 20.36.0
     wasabi                     1.1.3
     wcwidth                    0.2.13
     weasel                     0.4.1
     webcolors                  24.11.1
     webencodings               0.5.1
     websocket-client           1.8.0
     Werkzeug                   3.1.3
     wheel                      0.45.1
     wrapt                      1.17.2
     xdoctest                   1.0.2
     xgboost                    2.1.4
     yarl                       1.20.0
     zarr                       2.18.7
     zict                       3.0.0
     zipp                       3.22.0
     
</pre></details>

Additional context
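The results are not only inconsistent but also deviate widely: seven of the eight tasks land on the opposite side of 0.5 at inference. For example, task0 drops from 0.703728 to 0.350148, while task5 rises from 0.377520 to 0.649176. An AUC well below 0.5 means the scores tend to rank positives below negatives, so in several cases the ranking looks nearly inverted, which seems unexpected.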
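
For context on what "nearly inverted" would mean: with no tied scores, negating the scores or swapping the positive and negative labels turns an AUC of a into 1 - a. The sketch below illustrates that property on made-up toy data with a plain pairwise AUC. It is not the metric implementation used by the training or inference scripts, and we are not claiming that a sign or label flip is the actual cause here.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. O(P * N), meant for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.7, 0.4, 0.5, 0.1]

auc = pairwise_auc(labels, scores)
auc_negated = pairwise_auc(labels, [-s for s in scores])
auc_flipped = pairwise_auc([1 - y for y in labels], scores)

print(auc, auc_negated, auc_flipped)
```

On this toy data the negated-score and flipped-label AUCs are both the complement of the original one.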

Questions
• Has the consistency between training and inference been recently verified for this example?
• Are there any known pitfalls or required steps when running inference (e.g., feature preprocessing, normalization, checkpoint loading, dynamic embedding handling, etc.)?
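
On the checkpoint-loading item in the list above, one generic check is to confirm that every parameter the inference model expects is actually present in the checkpoint, since weights that are silently skipped at load time keep their initial values. The sketch below is plain Python and makes no assumption about how recsys-examples stores checkpoints; the function name `diff_param_names` and all parameter names are made up for illustration.

```python
def diff_param_names(checkpoint_names, model_names):
    """Report names that appear on only one side of a checkpoint load."""
    ckpt, model = set(checkpoint_names), set(model_names)
    return {
        # Expected by the model but absent from the checkpoint.
        "missing_in_checkpoint": sorted(model - ckpt),
        # Present in the checkpoint but unknown to the model.
        "unexpected_in_checkpoint": sorted(ckpt - model),
    }


# Hypothetical parameter names, not taken from the real model.
saved = ["embedding.item.weight", "hstu.layer0.weight", "mlp.task_head.weight"]
expected = ["embedding.item.weight", "hstu.layer0.weight", "mlp.head.weight"]

report = diff_param_names(saved, expected)
print(report)
```

An empty report on both sides would rule out a name mismatch, though it would not say anything about preprocessing or dynamic embedding handling.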

Any guidance would be greatly appreciated. Thanks in advance!

