Labels: CI/CD (CI/CD infrastructure), cuda.core (Everything related to the cuda.core module), test (Improvements or additions to tests), triage (Needs the team's attention)
Description
QA reported test_vmm_allocator_basic_allocation and test_vmm_allocator_grow_allocation failures (see below) under Windows.
Looking at the logs of the latest CI run for main, it turns out our CI does not run those tests on any Windows machine: they are skipped on every win-64 job (see the grep results below).
What would it take to run the tests on at least one Windows machine?
Test failures reproduced interactively (TITAN RTX, WDDM):
(TestVenv) PS C:\Users\rgrossekunst\forked\ctk-next\cuda_core> pytest -ra -s -v .\tests\test_memory.py
========================================================================================= test session starts =========================================================================================
platform win32 -- Python 3.13.9, pytest-9.0.1, pluggy-1.6.0 -- C:\Users\rgrossekunst\forked\ctk-next\TestVenv\Scripts\python.exe
cachedir: .pytest_cache
benchmark: 5.2.3 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
rootdir: C:\Users\rgrossekunst\forked\ctk-next\cuda_core\tests
configfile: pytest.ini
plugins: benchmark-5.2.3
collected 37 items
...
============================================================================================== FAILURES ===============================================================================================
_________________________________________________________________________________ test_vmm_allocator_basic_allocation _________________________________________________________________________________
def test_vmm_allocator_basic_allocation():
"""Test basic VMM allocation functionality.
This test verifies that VirtualMemoryResource can allocate memory
using CUDA VMM APIs with default configuration.
"""
device = Device()
device.set_current()
# Skip if virtual memory management is not supported
if not device.properties.virtual_memory_management_supported:
pytest.skip("Virtual memory management is not supported on this device")
options = VirtualMemoryResourceOptions()
# Create VMM allocator with default config
vmm_mr = VirtualMemoryResource(device, config=options)
# Test basic allocation
> buffer = vmm_mr.allocate(4096)
^^^^^^^^^^^^^^^^^^^^^
tests\test_memory.py:333:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cuda\core\experimental\_memory\_virtual_memory_resource.py:516: in allocate
raise_if_driver_error(res)
cuda\core\experimental\_utils\cuda_utils.pyx:67: in cuda.core.experimental._utils.cuda_utils._check_driver_error
cpdef inline int _check_driver_error(cydriver.CUresult error) except?-1 nogil:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> raise CUDAError(f"{name.decode()}: {expl}")
E cuda.core.experimental._utils.cuda_utils.CUDAError: CUDA_ERROR_INVALID_VALUE: This indicates that one or more of the parameters passed to the API call is not within an acceptable range of values.
cuda\core\experimental\_utils\cuda_utils.pyx:78: CUDAError
_________________________________________________________________________________ test_vmm_allocator_grow_allocation __________________________________________________________________________________
def test_vmm_allocator_grow_allocation():
"""Test VMM allocator's ability to grow existing allocations.
This test verifies that VirtualMemoryResource can grow existing
allocations while preserving the base pointer when possible.
"""
device = Device()
device.set_current()
# Skip if virtual memory management is not supported (we need it for VMM)
if not device.properties.virtual_memory_management_supported:
pytest.skip("Virtual memory management is not supported on this device")
options = VirtualMemoryResourceOptions()
vmm_mr = VirtualMemoryResource(device, config=options)
# Create initial allocation
> buffer = vmm_mr.allocate(2 * 1024 * 1024)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
tests\test_memory.py:435:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
cuda\core\experimental\_memory\_virtual_memory_resource.py:516: in allocate
raise_if_driver_error(res)
cuda\core\experimental\_utils\cuda_utils.pyx:67: in cuda.core.experimental._utils.cuda_utils._check_driver_error
cpdef inline int _check_driver_error(cydriver.CUresult error) except?-1 nogil:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
> raise CUDAError(f"{name.decode()}: {expl}")
E cuda.core.experimental._utils.cuda_utils.CUDAError: CUDA_ERROR_INVALID_VALUE: This indicates that one or more of the parameters passed to the API call is not within an acceptable range of values.
cuda\core\experimental\_utils\cuda_utils.pyx:78: CUDAError
======================================================================================= short test summary info =======================================================================================
SKIPPED [1] tests\test_memory.py:369: This test requires a device that doesn't support GPU Direct RDMA
SKIPPED [1] tests\test_memory.py:607: Driver rejects IPC-enabled mempool creation on this platform
FAILED tests\test_memory.py::test_vmm_allocator_basic_allocation - cuda.core.experimental._utils.cuda_utils.CUDAError: CUDA_ERROR_INVALID_VALUE: This indicates that one or more of the parameters passed to the API call is not within an acceptable range of values.
FAILED tests\test_memory.py::test_vmm_allocator_grow_allocation - cuda.core.experimental._utils.cuda_utils.CUDAError: CUDA_ERROR_INVALID_VALUE: This indicates that one or more of the parameters passed to the API call is not within an acceptable range of values.
=============================================================================== 2 failed, 33 passed, 2 skipped in 1.03s ===============================================================================
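One hypothesis worth checking interactively: CUDA_ERROR_INVALID_VALUE from the VMM path is often a size/alignment problem, since cuMemCreate and cuMemAddressReserve require sizes that are multiples of the granularity reported by cuMemGetAllocationGranularity, and that granularity can differ under Windows WDDM. Whether that is the cause here is an open question, but a rounding helper like this (names are illustrative, not from the cuda.core codebase) makes the check easy on the TITAN RTX box:

```python
def round_up_to_granularity(size: int, granularity: int) -> int:
    """Round a requested allocation size up to the next multiple of the
    VMM allocation granularity. Works for any positive granularity."""
    if size <= 0 or granularity <= 0:
        raise ValueError("size and granularity must be positive")
    return ((size + granularity - 1) // granularity) * granularity


# A commonly reported granularity is 2 MiB; a 4096-byte request (as in
# test_vmm_allocator_basic_allocation) would then round up to 2 MiB.
GRANULARITY = 2 * 1024 * 1024
print(round_up_to_granularity(4096, GRANULARITY))         # 2097152
print(round_up_to_granularity(GRANULARITY, GRANULARITY))  # 2097152
```

Note that the 2 MiB figure is an assumption for illustration; the real value should be queried from the driver on the failing machine.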
rwgk-win11.localdomain:~/logs_19339188192 $ grep test_vmm_allocator_basic_allocation *Test*.txt | grep -v SKIPPED | cut -d: -f1
Test_linux-64___py3.10__12.9.1__local__GPU_v100.txt
Test_linux-64___py3.10__13.0.2__wheels__GPU_l4.txt
Test_linux-64___py3.11__12.9.1__wheels__GPU_rtxpro6000.txt
Test_linux-64___py3.11__13.0.2__local__GPU_l4.txt
Test_linux-64___py3.12__12.9.1__local__GPU_l4.txt
Test_linux-64___py3.12__13.0.2__wheels__GPU_l4.txt
Test_linux-64___py3.13__12.9.1__wheels__GPU_v100.txt
Test_linux-64___py3.13__13.0.2__local__GPU_H100.txt
Test_linux-64___py3.13__13.0.2__local__GPU_rtxpro6000.txt
Test_linux-64___py3.14__13.0.2__local__GPU_l4.txt
Test_linux-64___py3.14t__13.0.2__local__GPU_l4.txt
Test_linux-aarch64___py3.10__12.9.1__local__GPU_a100.txt
Test_linux-aarch64___py3.10__13.0.2__wheels__GPU_a100.txt
Test_linux-aarch64___py3.11__12.9.1__wheels__GPU_a100.txt
Test_linux-aarch64___py3.11__13.0.2__local__GPU_a100.txt
Test_linux-aarch64___py3.12__12.9.1__local__GPU_a100.txt
Test_linux-aarch64___py3.12__13.0.2__wheels__GPU_a100.txt
Test_linux-aarch64___py3.13__12.9.1__wheels__GPU_a100.txt
Test_linux-aarch64___py3.13__13.0.2__local__GPU_a100.txt
Test_linux-aarch64___py3.14__13.0.2__local__GPU_a100.txt
Test_linux-aarch64___py3.14t__13.0.2__local__GPU_a100.txt
rwgk-win11.localdomain:~/logs_19339188192 $ grep test_vmm_allocator_basic_allocation *Test*.txt | grep SKIPPED | cut -d: -f1
Test_win-64___py3.12__12.9.1__local__GPU_t4.txt
Test_win-64___py3.12__12.9.1__wheels__GPU_l4.txt
Test_win-64___py3.13__13.0.2__local__GPU_l4.txt
Test_win-64___py3.13__13.0.2__wheels__GPU_t4.txt
Test_win-64___py3.14__13.0.2__local__GPU_l4.txt
Test_win-64___py3.14__13.0.2__wheels__GPU_t4.txt
Test_win-64___py3.14t__13.0.2__local__GPU_l4.txt
Test_win-64___py3.14t__13.0.2__wheels__GPU_t4.txt
rwgk-win11.localdomain:~/logs_19339188192 $ grep test_vmm_allocator_grow_allocation *Test*.txt | grep -v SKIPPED | cut -d: -f1
Test_linux-64___py3.10__12.9.1__local__GPU_v100.txt
Test_linux-64___py3.10__13.0.2__wheels__GPU_l4.txt
Test_linux-64___py3.11__12.9.1__wheels__GPU_rtxpro6000.txt
Test_linux-64___py3.11__13.0.2__local__GPU_l4.txt
Test_linux-64___py3.12__12.9.1__local__GPU_l4.txt
Test_linux-64___py3.12__13.0.2__wheels__GPU_l4.txt
Test_linux-64___py3.13__12.9.1__wheels__GPU_v100.txt
Test_linux-64___py3.13__13.0.2__local__GPU_H100.txt
Test_linux-64___py3.13__13.0.2__local__GPU_rtxpro6000.txt
Test_linux-64___py3.14__13.0.2__local__GPU_l4.txt
Test_linux-64___py3.14t__13.0.2__local__GPU_l4.txt
Test_linux-aarch64___py3.10__12.9.1__local__GPU_a100.txt
Test_linux-aarch64___py3.10__13.0.2__wheels__GPU_a100.txt
Test_linux-aarch64___py3.11__12.9.1__wheels__GPU_a100.txt
Test_linux-aarch64___py3.11__13.0.2__local__GPU_a100.txt
Test_linux-aarch64___py3.12__12.9.1__local__GPU_a100.txt
Test_linux-aarch64___py3.12__13.0.2__wheels__GPU_a100.txt
Test_linux-aarch64___py3.13__12.9.1__wheels__GPU_a100.txt
Test_linux-aarch64___py3.13__13.0.2__local__GPU_a100.txt
Test_linux-aarch64___py3.14__13.0.2__local__GPU_a100.txt
Test_linux-aarch64___py3.14t__13.0.2__local__GPU_a100.txt
rwgk-win11.localdomain:~/logs_19339188192 $ grep test_vmm_allocator_grow_allocation *Test*.txt | grep SKIPPED | cut -d: -f1
Test_win-64___py3.12__12.9.1__local__GPU_t4.txt
Test_win-64___py3.12__12.9.1__wheels__GPU_l4.txt
Test_win-64___py3.13__13.0.2__local__GPU_l4.txt
Test_win-64___py3.13__13.0.2__wheels__GPU_t4.txt
Test_win-64___py3.14__13.0.2__local__GPU_l4.txt
Test_win-64___py3.14__13.0.2__wheels__GPU_t4.txt
Test_win-64___py3.14t__13.0.2__local__GPU_l4.txt
Test_win-64___py3.14t__13.0.2__wheels__GPU_t4.txt
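The coverage gap above is easier to audit if the log filenames are parsed into their matrix dimensions. A small sketch (the filename pattern and field names are inferred from the listing, so treat them as assumptions):

```python
import re

# Pattern inferred from names like
#   Test_win-64___py3.12__12.9.1__local__GPU_t4.txt
LOG_RE = re.compile(
    r"Test_(?P<platform>[\w-]+)___py(?P<python>[\d.t]+)"
    r"__(?P<ctk>[\d.]+)__(?P<flavor>\w+)__GPU_(?P<gpu>\w+)\.txt"
)

def parse_log_name(name: str) -> dict:
    """Split a CI log filename into its matrix dimensions."""
    m = LOG_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unrecognized log name: {name!r}")
    return m.groupdict()

info = parse_log_name("Test_win-64___py3.12__12.9.1__local__GPU_t4.txt")
print(info)
# {'platform': 'win-64', 'python': '3.12', 'ctk': '12.9.1',
#  'flavor': 'local', 'gpu': 't4'}
```

Grouping the parsed records by platform would confirm at a glance that the VMM tests ran (non-SKIPPED) only on linux-64 and linux-aarch64.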