metatensor tests fail on GPU #155

@orionarcher

One shortcoming of our current CI is that all tests run only on CPU. When I run the tests on a GPU, the metatensor tests fail with the error shown below. This may be an environment configuration issue on my end, but I'd like to check whether these tests have ever been run on GPU. @frostedoyster, @Luthaf, would you mind taking a look? Do you have access to a CUDA device you could run the tests on?

The underlying error appears to be raised inside vesin's neighbor list computation: `RuntimeError: device cuda:0 is not supported in vesin`.

Running tests (pytest): /workspaces/propfoliotorchsim/torch-sim/tests/models/test_metatensor.py::test_mattersim_model_outputs
Running test with arguments: --rootdir /workspaces/propfoliotorchsim/torch-sim --override-ini junit_family=xunit1 --junit-xml=/tmp/tmp-2510298TvuaQeZ2OUXQ.xml ./tests/models/test_metatensor.py::test_mattersim_model_outputs
Current working directory: /workspaces/propfoliotorchsim/torch-sim
Workspace directory: /workspaces/propfoliotorchsim/torch-sim
Run completed, parsing output
./tests/models/test_metatensor.py::test_mattersim_model_outputs Failed: [undefined]RuntimeError: device cuda:0 is not supported in vesin
request = <FixtureRequest for <Function test_mattersim_model_outputs>>
device = device(type='cuda'), dtype = torch.float32

    def test_model_output_validation(
        request: pytest.FixtureRequest,
        device: torch.device,
        dtype: torch.dtype,
    ) -> None:
        """Test that a model implementation follows the ModelInterface contract."""
        # Get the model fixture dynamically
        model: ModelInterface = request.getfixturevalue(model_fixture_name)
    
        from ase.build import bulk
    
        from torch_sim.io import atoms_to_state
    
        assert model.dtype is not None
        assert model.device is not None
        assert model.compute_stress is not None
        assert model.compute_forces is not None
    
        try:
            if not model.compute_stress:
                model.compute_stress = True
            stress_computed = True
        except NotImplementedError:
            stress_computed = False
    
        try:
            if not model.compute_forces:
                model.compute_forces = True
            force_computed = True
        except NotImplementedError:
            force_computed = False
    
        si_atoms = bulk("Si", "diamond", a=5.43, cubic=True)
        fe_atoms = bulk("Fe", "fcc", a=5.26, cubic=True).repeat([3, 1, 1])
    
        sim_state = atoms_to_state([si_atoms, fe_atoms], device, dtype)
    
        og_positions = sim_state.positions.clone()
        og_cell = sim_state.cell.clone()
        og_batch = sim_state.batch.clone()
        og_atomic_numbers = sim_state.atomic_numbers.clone()
    
>       model_output = model.forward(sim_state)

tests/models/conftest.py:152: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
torch_sim/models/metatensor.py:226: in forward
    vesin.torch.metatensor.compute_requested_neighbors(
../propfolio/.venv/lib/python3.12/site-packages/vesin/torch/metatensor/_model.py:81: in compute_requested_neighbors
    neighbors = calculator.compute(system)
../propfolio/.venv/lib/python3.12/site-packages/vesin/torch/metatensor/_neighbors.py:120: in compute
    (P, S, D) = self._nl.compute(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <vesin.torch._neighbors.NeighborList object at 0x7ffbc4a07140>
points = tensor([[0.0000, 0.0000, 0.0000],
        [1.3575, 1.3575, 1.3575],
        [0.0000, 2.7150, 2.7150],
        [1.3575,...50, 0.0000],
        [4.0725, 4.0725, 1.3575]], device='cuda:0', dtype=torch.float64,
       grad_fn=<ToCopyBackward0>)
box = tensor([[5.4300, 0.0000, 0.0000],
        [0.0000, 5.4300, 0.0000],
        [0.0000, 0.0000, 5.4300]], device='cuda:0', dtype=torch.float64,
       grad_fn=<ToCopyBackward0>)
periodic = True, quantities = 'PSD', copy = True

    def compute(
        self,
        points: torch.Tensor,
        box: torch.Tensor,
        periodic: bool,
        quantities: str,
        copy: bool = True,
    ) -> List[torch.Tensor]:
        """
        Compute the neighbor list for the system defined by ``positions``, ``box``, and
        ``periodic``; returning the requested ``quantities``.
    
        ``quantities`` can contain any combination of the following values:
    
        - ``"i"`` to get the index of the first point in the pair
        - ``"j"`` to get the index of the second point in the pair
        - ``"P"`` to get the indexes of the two points in the pair simultaneously
        - ``"S"`` to get the periodic shift of the pair
        - ``"d"`` to get the distance between points in the pair
        - ``"D"`` to get the distance vector between points in the pair
    
        :param points: positions of all points in the system
        :param box: bounding box of the system
        :param periodic: should we use periodic boundary conditions?
        :param quantities: quantities to return, defaults to "ij"
        :param copy: should we copy the returned quantities, defaults to ``True``.
            Setting this to ``False`` might be a bit faster, but the returned tensors
            are view inside this class, and will be invalidated whenever this class is
            garbage collected or used to run a new calculation.
    
        :return: list of :py:class:`torch.Tensor` as indicated by ``quantities``.
        """
    
>       return self._c.compute(
            points=points,
            box=box,
            periodic=periodic,
            quantities=quantities,
            copy=copy,
        )
E       RuntimeError: device cuda:0 is not supported in vesin

../propfolio/.venv/lib/python3.12/site-packages/vesin/torch/_neighbors.py:50: RuntimeError

Total number of tests expected to run: 1
Total number of tests run: 1
Total number of tests passed: 0
Total number of tests failed: 1
Total number of tests failed with errors: 0
Total number of tests skipped: 0
Total number of tests with no result data: 0
Finished running tests!
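
The traceback shows CUDA tensors (`points`, `box`) reaching a vesin routine that rejects the `cuda:0` device. One possible workaround, sketched here purely as an illustration, is to move the inputs to the CPU for that one call and move the results back afterwards. The helper name `call_on_cpu` is hypothetical and is not part of torch-sim or vesin; it assumes only that inputs and outputs expose `.device` and `.to(device)`, as `torch.Tensor` does.

```python
from typing import Any, Callable, Sequence


def call_on_cpu(fn: Callable[..., Sequence[Any]], *tensors: Any) -> list:
    """Run a CPU-only function on tensors that may live on another device.

    Hypothetical helper, not part of torch-sim or vesin. It relies only on
    inputs and outputs exposing `.device` and `.to(device)`.
    """
    if not tensors:
        raise ValueError("expected at least one tensor")
    # Remember where the caller's data lives (assumes all inputs share it).
    original_device = tensors[0].device
    # Move every input to the CPU before calling the CPU-only function.
    cpu_inputs = [t.to("cpu") for t in tensors]
    outputs = fn(*cpu_inputs)
    # Hand the results back on the device the caller started from.
    return [out.to(original_device) for out in outputs]
```

With torch tensors, the failing call could in principle be wrapped as `call_on_cpu(lambda p, b: nl.compute(points=p, box=b, periodic=True, quantities="PSD"), points, box)`, where `nl` stands for the `NeighborList` object from the traceback. This costs a device-to-host round trip on every call, so it is also worth checking whether a different vesin version accepts CUDA tensors directly.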


Labels: bug (Something isn't working), ecosystem (Comp-chem ecosystem related)
