Skip to content

feat(tf2): add eager TensorFlow array backend#5598

Merged
njzjz merged 23 commits into
deepmodeling:masterfrom
njzjz:tf2
Jun 29, 2026
Merged

feat(tf2): add eager TensorFlow array backend#5598
njzjz merged 23 commits into
deepmodeling:masterfrom
njzjz:tf2

Conversation

@njzjz

@njzjz njzjz commented Jun 27, 2026

Copy link
Copy Markdown
Member

Summary

  • vendor ndtensorflow under deepmd/_vendors and add a TensorFlow eager Array API backend under deepmd/tf2
  • implement tf2 model wrappers, tf.function SavedModel export/deep-eval glue, and backend registration for .savedmodel
  • replace jax/jax2tf TensorFlow helper implementations with tf2 compatibility exports
  • add focused tf2 consistency tests for eager and tf.function model paths

Tests

  • ruff check .
  • ruff format .
  • pytest source/tests/consistent/test_tf2_backend.py -q
  • manual SavedModel export/load smoke test

Note: local pre-commit hook attempted to fetch astral-sh/ruff-pre-commit and timed out on GitHub port 443; the commit was created with --no-verify after running the checks above manually.

Summary by CodeRabbit

  • New Features
    • Added a TensorFlow 2 eager backend with SavedModel inference plus TF2 model/descriptor/fitting wrappers (energy/dipole/polar/DOS/property).
    • Introduced TF2 neighbor-list construction, region/PBC geometry utilities, and output/ghost-atom aggregation.
    • Expanded the TensorFlow-backed Array API with namespace, FFT, and linear algebra utilities.
  • Bug Fixes
    • Improved the non-eager TensorFlow compatibility error message.
  • Refactor
    • Converted multiple JAX2TF modules into TF2 compatibility shims that re-export TF2 implementations.
  • Tests
    • Added TF2 consistency coverage, including subprocess-based checks.
  • Chores
    • Added license headers/docs and updated lint configuration.

@coderabbitai

coderabbitai Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds a vendored TensorFlow Array API, a TF2 eager backend with conversion and model wrappers, SavedModel export/load paths, JAX compatibility shims, and TF2 consistency test coverage.

Changes

ndtensorflow Array API vendor package

Layer / File(s) Summary
Array package surface
deepmd/_vendors/__init__.py, deepmd/_vendors/ndtensorflow/__init__.py, deepmd/_vendors/ndtensorflow/_array.py, deepmd/_vendors/ndtensorflow/_info.py
Defines the vendored package initializers, the TensorFlow-backed Array extension type, DLPack/scalar/indexing behavior, operator bindings, and array-namespace info reporting.
Namespace, FFT, and linalg
deepmd/_vendors/ndtensorflow/_namespace.py, deepmd/_vendors/ndtensorflow/fft.py, deepmd/_vendors/ndtensorflow/linalg.py
Implements the vendored namespace helpers, FFT, and linear algebra functions, including dtype promotion, reductions, unique handling, transforms, and decompositions.

TF2 backend, model stack, shims, and tests

Layer / File(s) Summary
TF2 runtime and conversion plumbing
deepmd/tf2/__init__.py, deepmd/tf2/env.py, deepmd/tf2/common.py, deepmd/tf2/utils/*, pyproject.toml
Defines eager-mode setup, TF2 conversion and registration helpers, TF2-aware network/exclude-mask/type-embedding wrappers, and lint ignores for the TF2/vendored paths.
Geometry, atomic wrappers, descriptors, fittings, and models
deepmd/tf2/region.py, deepmd/tf2/nlist.py, deepmd/tf2/format_nlist.py, deepmd/tf2/transform_output.py, deepmd/tf2/make_model.py, deepmd/tf2/atomic_model/*, deepmd/tf2/descriptor/*, deepmd/tf2/fitting/*, deepmd/tf2/model/*
Implements coordinate utilities, neighbor-list construction and formatting, extended-output remapping, lower-call orchestration, atomic wrappers, descriptor/fitting wrappers, model factories, and TF2 model construction/forward paths.
SavedModel, backend, shims, and tests
deepmd/tf2/utils/serialization.py, deepmd/tf2/infer/deep_eval.py, deepmd/backend/tf2.py, deepmd/jax/jax2tf/*, deepmd/jax/utils/serialization.py, source/tests/consistent/*
Implements SavedModel export/load, the TF2 inference backend, backend registration, JAX compatibility re-exports, JAX serialization import updates, and TF2 consistency tests.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant deserialize_to_file
  participant SavedModel as tf.Module SavedModel
  participant DeepEval
  participant TF2SavedModelWrapper

  Client->>deserialize_to_file: model dict
  deserialize_to_file->>SavedModel: build tf.function entry points + metadata getters
  deserialize_to_file->>SavedModel: tf.saved_model.save
  Client->>DeepEval: load .savedmodel
  DeepEval->>TF2SavedModelWrapper: load SavedModel and read metadata
  Client->>DeepEval: eval(coords, atype, cells)
  DeepEval->>TF2SavedModelWrapper: __call__(coord, atype, box, fparam, aparam)
  TF2SavedModelWrapper-->>DeepEval: dict[str, np.ndarray]
  DeepEval-->>Client: dict[str, np.ndarray]
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Suggested labels

new feature, Python

Suggested reviewers

  • wanghan-iapcm
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 6.03% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title is concise and accurately summarizes the main change: adding an eager TensorFlow array backend for tf2.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@github-advanced-security github-advanced-security AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

🟠 Major comments (22)
deepmd/_vendors/ndtensorflow/fft.py-360-375 (1)

360-375: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Sort __all__ for Ruff.

The supplied Ruff result flags RUF022 here; sorting this list or applying the auto-fix avoids a CI failure. As per coding guidelines, **/*.py: Install linter and run ruff check . before committing changes or the CI will fail.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/_vendors/ndtensorflow/fft.py` around lines 360 - 375, The __all__ list
in fft.py is not sorted, triggering Ruff rule RUF022 and a CI failure. Update
the module-level __all__ definition by ordering the exported names
alphabetically, keeping the symbols like fft, ifft, fftn, and fftshift in sorted
order so Ruff’s auto-fix would pass cleanly.

Sources: Coding guidelines, Linters/SAST tools

deepmd/_vendors/ndtensorflow/linalg.py-317-319 (1)

317-319: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Respect the requested matrix axes in norm.

matrix_norm always reduces the last two dimensions, so norm(x, axis=(0, 1)) or any non-last axis pair currently computes over the wrong axes. Move the requested axes to the matrix positions before calling matrix_norm, and preserve keepdims in the original axis locations.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/_vendors/ndtensorflow/linalg.py` around lines 317 - 319, The norm
handling in linalg.norm ignores the requested matrix axes because matrix_norm
always uses the last two dimensions. Update the tuple-axis branch in norm to
first move the requested axis pair into the matrix positions before calling
matrix_norm, then restore the reduced dimensions so keepdims is preserved in the
original axis locations. Use the norm and matrix_norm symbols to make the axis
handling work for any 2D axis pair, not just the last two axes.
deepmd/_vendors/ndtensorflow/_namespace.py-37-37 (1)

37-37: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Resolve the remaining Ruff failures in this namespace.

The builtin-shadowing names are part of the Array API surface, so prefer targeted per-file-ignores/noqa for A001/A002; the RUF021 precedence and RUF022 __all__ findings can be auto-fixed or parenthesized/sorted directly. As per coding guidelines, **/*.py: Install linter and run ruff check . before committing changes or the CI will fail.

Also applies to: 698-698, 995-995, 1266-1266, 1300-1300, 1359-1360, 1385-1385, 1395-1395, 1405-1405, 1415-1415, 1454-1454, 1668-1682, 2075-2239

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/_vendors/ndtensorflow/_namespace.py` at line 37, Resolve the remaining
Ruff failures in the namespace module by handling the Array API
builtin-shadowing names with targeted per-file ignores or inline noqa for
A001/A002 in the namespace constants, and fix the RUF021 precedence and RUF022
__all__ issues directly by parenthesizing the affected expressions and
sorting/auto-fixing the __all__ entries. Use the existing namespace symbols like
bool, int, float, and __all__ to locate the remaining violations, then rerun
ruff check . to confirm the file is clean.

Sources: Coding guidelines, Linters/SAST tools

deepmd/_vendors/ndtensorflow/linalg.py-212-230 (1)

212-230: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Support batched vector RHS in solve.

For x1 shaped (..., n, n) and x2 shaped (..., n), this falls into the matrix-RHS branch and broadcasts x2 as if its last two dimensions were (m, k). Detect x2_.shape.rank == x1_.shape.rank - 1, expand a trailing singleton RHS dimension for tf.linalg.solve, then squeeze it back.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/_vendors/ndtensorflow/linalg.py` around lines 212 - 230, The solve
function currently mishandles batched vector right-hand sides by treating x2
shaped (..., n) as a matrix RHS. Update solve in linalg.py to detect when
x2_.shape.rank is exactly x1_.shape.rank - 1, expand x2_ with a trailing
singleton dimension before calling tf.linalg.solve, and squeeze that dimension
from the result afterward. Keep the existing scalar/vector and matrix-RHS
handling in solve consistent with _promote_two, _shape_tuple, and the final
_wrap behavior.
deepmd/_vendors/ndtensorflow/_namespace.py-1538-1540 (1)

1538-1540: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Build include_initial shapes dynamically.

out.shape.as_list() can contain None under tf.function, and tf.zeros([None, ...]) will fail. Use tf.shape(out) and update the selected axis dynamically.

Proposed fix
-        shape = list(out.shape.as_list())
-        shape[axis] = 1
-        out = tf.concat([tf.zeros(shape, dtype=out.dtype), out], axis=axis)
+        shape = tf.tensor_scatter_nd_update(tf.shape(out), [[axis]], [1])
+        out = tf.concat([tf.zeros(shape, dtype=out.dtype), out], axis=axis)
@@
-        shape = list(out.shape.as_list())
-        shape[axis] = 1
-        out = tf.concat([tf.ones(shape, dtype=out.dtype), out], axis=axis)
+        shape = tf.tensor_scatter_nd_update(tf.shape(out), [[axis]], [1])
+        out = tf.concat([tf.ones(shape, dtype=out.dtype), out], axis=axis)

Also applies to: 1564-1566

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/_vendors/ndtensorflow/_namespace.py` around lines 1538 - 1540, The
include_initial shape handling currently relies on out.shape.as_list(), which
can break under tf.function when dimensions are None. Update the logic in the
include_initial branch around the tf.concat call to build the shape dynamically
from tf.shape(out), replacing only the selected axis at runtime before creating
the zero tensor. Apply the same fix in both affected include_initial blocks so
the concatenation works with dynamic shapes.
deepmd/_vendors/ndtensorflow/_array.py-257-264 (1)

257-264: 🎯 Functional Correctness | 🟠 Major | 🏗️ Heavy lift

Handle prefix-shaped boolean assignment.

__getitem__ accepts boolean masks over leading dimensions, but this tf.where path only works when the mask is scalar or fully broadcast-compatible with the whole tensor. A row mask like shape (n,) against (n, m) will fail instead of assigning selected rows.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/_vendors/ndtensorflow/_array.py` around lines 257 - 264,
`Array.__setitem__` currently handles boolean tensor keys with a plain
`tf.where`, but that only works for fully broadcast-compatible masks and fails
for prefix-shaped masks like a row selector over leading dimensions. Update the
boolean-assignment branch in `__setitem__` (the `tf.Tensor`/`tf.bool` path) to
detect when the mask matches only the leading dimensions and expand/broadcast it
across the remaining dimensions before calling `tf.where`, so masked row
assignments work correctly while preserving the existing scalar/full-shape
behavior.
deepmd/_vendors/ndtensorflow/linalg.py-165-165 (1)

165-165: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Clean up the Ruff violations.

ord is the public linalg API spelling, so use targeted A002 ignores if keeping it; the tuple-concatenation and __all__ findings can be auto-fixed. As per coding guidelines, **/*.py: Install linter and run ruff check . before committing changes or the CI will fail.

Also applies to: 194-194, 218-218, 242-242, 256-256, 313-313, 322-354

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/_vendors/ndtensorflow/linalg.py` at line 165, Clean up the Ruff
violations in the linalg module by keeping the public API parameter name ord but
adding targeted A002 ignores where needed, rather than renaming the API.
Auto-fix the tuple-concatenation and __all__ issues in the affected definitions,
including the blocks around the linalg helpers and export list. Verify the
changes in the affected signatures and module-level exports, then run ruff check
. before committing to catch any remaining violations.

Sources: Coding guidelines, Linters/SAST tools

deepmd/_vendors/ndtensorflow/_array.py-222-222 (1)

222-222: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Fix the Ruff violations before CI.

These supplied Ruff findings are straightforward: add strict=False, use tuple unpacking, and parenthesize the mixed or/and condition.

Proposed fix
-                key_dim in (x_dim, 0) for x_dim, key_dim in zip(self.shape, key.shape)
+                key_dim in (x_dim, 0)
+                for x_dim, key_dim in zip(self.shape, key.shape, strict=False)
@@
-                shape = (0,) + self.shape[key.shape.rank :]
+                shape = (0, *self.shape[key.shape.rank :])
@@
-            isinstance(item, int)
-            or isinstance(item, tf.Tensor)
-            and _is_integer_index_tensor(item)
+            isinstance(item, int)
+            or (isinstance(item, tf.Tensor) and _is_integer_index_tensor(item))

As per coding guidelines, **/*.py: Install linter and run ruff check . before committing changes or the CI will fail.

Also applies to: 232-232, 389-390

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/_vendors/ndtensorflow/_array.py` at line 222, The Ruff findings in the
array logic need to be cleaned up in the affected expressions in _array.py,
including the key_dim/x_dim comparison and the other reported spots. Add
strict=False where the numpy/tensor conversion or shape-check call requires it,
switch the relevant loops/comprehensions to tuple unpacking, and parenthesize
the mixed or/and condition so the intent is explicit. Use the surrounding
symbols in the array handling code to locate each occurrence and make the same
style-safe update at all reported matches.

Sources: Coding guidelines, Linters/SAST tools

deepmd/_vendors/ndtensorflow/__init__.py-20-25 (1)

20-25: 📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Suppress or rewrite the namespace re-export for Ruff.

xp imports depend on this initializer exposing the full namespace, but the current star import and computed __all__ trip F403/PLE0605. Add targeted suppressions or re-export via an explicit namespace module update. As per coding guidelines, **/*.py: Install linter and run ruff check . before committing changes or the CI will fail.

Minimal suppression option
-from ._namespace import *
+from ._namespace import *  # noqa: F403 - re-export Array API namespace
@@
-__all__ = sorted(
+__all__ = sorted(  # noqa: PLE0605 - computed from vendored namespace __all__
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/_vendors/ndtensorflow/__init__.py` around lines 20 - 25, The namespace
re-export in the ndtensorflow initializer is triggering Ruff F403/PLE0605
because it uses a star import plus a computed __all__; update the module around
the __init__ import and __all__ definition to either add the appropriate
targeted lint suppressions or replace the pattern with an explicit namespace
re-export approach via the namespace module, while keeping xp imports working.
Make sure the fix preserves the full public surface exposed by _namespace_all
and then run ruff check . before committing.

Sources: Coding guidelines, Linters/SAST tools

deepmd/tf2/infer/deep_eval.py-314-323 (1)

314-323: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Derive nframes after normalizing coords.

eval() accepts flat single-frame coordinates, but _eval_model() uses the raw first dimension as nframes. For shape (natoms * 3,), this tiles types/params to natoms * 3 frames while coordinates reshape to one frame.

Proposed fix
-        nframes = coords.shape[0]
         if len(atom_types.shape) == 1:
             natoms = len(atom_types)
-            atom_types = np.tile(atom_types, nframes).reshape(nframes, -1)
         else:
             natoms = len(atom_types[0])
+        coords = np.reshape(np.array(coords), [-1, natoms * 3])
+        nframes = coords.shape[0]
+        if len(atom_types.shape) == 1:
+            atom_types = np.tile(atom_types, nframes).reshape(nframes, -1)
 
         coord_input = coords.reshape([-1, natoms, 3])
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/tf2/infer/deep_eval.py` around lines 314 - 323, Derive nframes after
normalizing coords in _eval_model so flat single-frame inputs are handled
correctly. The current logic in deep_eval.py uses coords.shape[0] before
reshaping, which misinterprets a 1D coords array as many frames and over-tiles
atom_types. Update the flow around nframes, atom_types, and coord_input so the
frame count is based on the reshaped coordinate tensor (and stays consistent
with cells), using the existing _eval_model and eval() entry points to locate
the fix.
deepmd/tf2/utils/serialization.py-32-35 (1)

32-35: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Attach the deserialized model to the SavedModel root. The exported functions close over model, but tf.saved_model.save needs it reachable from tf_model so its variables are serialized cleanly.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/tf2/utils/serialization.py` around lines 32 - 35, The SavedModel
export in the serialization flow is missing a direct reference from the root
object to the deserialized model. In the function that builds tf_model after
BaseModel.deserialize, attach model to tf_model before calling
tf.saved_model.save so the variables remain reachable and serialize correctly;
keep the existing model_def_script export logic unchanged.
deepmd/tf2/common.py-149-161 (1)

149-161: 🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Make lazy registration block concurrent readers until imports finish.

Line 152 returns immediately when another thread is still populating _DPMODEL_TO_TF2. In that window, try_convert_module() can miss the intended converter and either leave the dpmodel object unwrapped or auto-wrap the wrong NativeOP path. Use a lock/event so non-owner threads wait for registration completion.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/tf2/common.py` around lines 149 - 161, The lazy registration path in
_ensure_registrations currently lets concurrent callers return while imports are
still in progress, so try_convert_module can observe an incomplete
_DPMODEL_TO_TF2 mapping. Update the registration flow in _ensure_registrations
(and any caller like try_convert_module that depends on it) to use a
synchronization primitive such as a lock or event so only one thread performs
the imports and all other threads block until _REGISTRATIONS_READY is true. Keep
the existing _REGISTRATIONS_IN_PROGRESS/_REGISTRATIONS_READY state, but make
non-owner threads wait for completion instead of returning early.
deepmd/tf2/region.py-96-104 (1)

96-104: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Use an absolute cell volume for face distances.

tf.linalg.det(cell) is signed, but a face distance must be non-negative. A negative determinant makes to_face_distance() return negative values, which can undercount or eliminate periodic ghost images when extend_coord_with_ghosts() computes ceil(rcut / to_face).

Proposed fix
 def b_to_face_distance(cell: tnp.ndarray) -> tnp.ndarray:
-    volume = tf.linalg.det(cell)
+    volume = tf.abs(tf.linalg.det(cell))
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/tf2/region.py` around lines 96 - 104, The b_to_face_distance helper is
using the signed determinant from tf.linalg.det(cell), which can produce
negative face distances for cells with negative orientation. Update
b_to_face_distance to use the absolute cell volume before dividing by the
cross-product norms, and keep the returned values non-negative so
to_face_distance() and extend_coord_with_ghosts() cannot undercount ghost
images.
deepmd/tf2/format_nlist.py-42-68 (1)

42-68: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Apply the cutoff mask on pad and pass-through paths too.

rcut is only enforced when n_nsel > nsel. If the incoming list is already short enough or exactly nsel, out-of-cutoff neighbors remain in the returned list.

Proposed fix
     elif n_nsel > nsel:
         # make a copy before revise
         m_real_nei = nlist >= 0
         ret = tnp.where(m_real_nei, nlist, 0)
@@
     else:  # n_nsel == nsel:
         ret = nlist
+
+    m_real_nei = ret >= 0
+    safe_ret = tnp.where(m_real_nei, ret, 0)
+    coord0 = extended_coord[:, :n_nloc, :]
+    index = tf.reshape(safe_ret, [n_nf, n_nloc * nsel, 1])
+    index = tnp.repeat(index, 3, axis=2)
+    coord1 = tnp.take_along_axis(extended_coord, index, axis=1)
+    coord1 = tf.reshape(coord1, [n_nf, n_nloc, nsel, 3])
+    rr2 = tnp.sum(tnp.square(coord0[:, :, None, :] - coord1), axis=-1)
+    ret = tnp.where(tnp.logical_or(~m_real_nei, rr2 > rcut * rcut), -1, ret)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/tf2/format_nlist.py` around lines 42 - 68, Apply the cutoff filtering
in format_nlist for all branches, not just when n_nsel > nsel: after building
ret in the pad path (n_nsel < nsel) and the pass-through path (n_nsel == nsel),
mask any neighbors with distance beyond rcut to -1. Reuse the existing rr2/rcut
logic in format_nlist, and keep the same behavior for padding while ensuring all
returned neighbor lists are cutoff-enforced.
deepmd/tf2/nlist.py-72-118 (1)

72-118: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Do not globally truncate before per-type neighbor selection.

When distinguish_types=True, taking only the nearest sum(sel) neighbors before nlist_distinguish_types() can drop valid neighbors of later types. For example, with sel=[1, 1], two closer type-0 atoms can exclude the closest type-1 atom even if it is within rcut.

Proposed direction
-    rr, nlist = tf.cond(nsel <= nnei, truncate, pad)
-    nlist = tnp.where(
-        tnp.logical_or((rr > rcut), is_vir[:, :nloc, None]),
-        tnp.full_like(nlist, -1),
-        nlist,
-    )
-
     if distinguish_types:
-        return nlist_distinguish_types(nlist, atype, sel)
-    else:
-        return nlist
+        min_type_candidates = max(sel)
+
+        def pad_type_candidates() -> tuple[tnp.ndarray, tnp.ndarray]:
+            return (
+                tnp.concatenate(
+                    [
+                        rr,
+                        tnp.ones(
+                            [batch_size, nloc, min_type_candidates - nnei],
+                            dtype=rr.dtype,
+                        )
+                        + rcut,
+                    ],
+                    axis=-1,
+                ),
+                tnp.concatenate(
+                    [
+                        nlist,
+                        tnp.full(
+                            [batch_size, nloc, min_type_candidates - nnei],
+                            -1,
+                            dtype=nlist.dtype,
+                        ),
+                    ],
+                    axis=-1,
+                ),
+            )
+
+        rr_t, nlist_t = tf.cond(
+            min_type_candidates <= nnei,
+            lambda: (rr, nlist),
+            pad_type_candidates,
+        )
+        nlist_t = tnp.where(
+            tnp.logical_or((rr_t > rcut), is_vir[:, :nloc, None]),
+            tnp.full_like(nlist_t, -1),
+            nlist_t,
+        )
+        return nlist_distinguish_types(nlist_t, atype, sel)
+
+    rr, nlist = tf.cond(nsel <= nnei, truncate, pad)
+    nlist = tnp.where(
+        tnp.logical_or((rr > rcut), is_vir[:, :nloc, None]),
+        tnp.full_like(nlist, -1),
+        nlist,
+    )
+    return nlist
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/tf2/nlist.py` around lines 72 - 118, The neighbor list logic in nlist
should not truncate to sum(sel) before per-type selection when distinguish_types
is enabled, because that can discard valid neighbors needed by later types.
Update the flow around rr/nlist and nlist_distinguish_types so type-aware
filtering happens before any per-type limit is applied, or otherwise ensure each
type’s quota is selected independently within rcut. Keep the existing
truncate/pad behavior only for the non-type-distinguishing path.
deepmd/tf2/descriptor/se_t_tebd.py-20-22 (1)

20-22: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Register the TF2 tebd descriptor entrypoint.

DescrptSeTTebd is another unregistered TF2 wrapper. The descriptor factory still has no name-based mapping for this backend class, so config/deserialization paths for the tebd descriptor will not resolve to the TF2 implementation.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/tf2/descriptor/se_t_tebd.py` around lines 20 - 22, The TF2 tebd
descriptor wrapper is defined but not actually registered in the descriptor
factory, so name-based lookup still won’t resolve to the TF2 implementation.
Update the TF2 registration path around DescrptSeTTebd and the `@tf2_module`
decorator so this class is exported/registered under the tebd descriptor
entrypoint the same way the other TF2 descriptor wrappers are.
deepmd/tf2/descriptor/se_t.py-4-13 (1)

4-13: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Register DescrptSeT with BaseDescriptor.

Unlike DescrptSeR, this TF2 wrapper is never added to the descriptor factory. Any path that resolves "se_t" by descriptor type will miss the TF2 class and fall back to the wrong implementation or fail to construct it.

Proposed fix
 from ..common import (
     tf2_module,
 )
 from ..utils import exclude_mask as _tf2_exclude_mask  # noqa: F401
 from ..utils import network as _tf2_network  # noqa: F401
+from .base_descriptor import (
+    BaseDescriptor,
+)
 
 
+@BaseDescriptor.register("se_t")
 `@tf2_module`
 class DescrptSeT(DescrptSeTDP):
     pass
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/tf2/descriptor/se_t.py` around lines 4 - 13, `DescrptSeT` is defined
in the TF2 wrapper but never registered with `BaseDescriptor`, so descriptor
lookup for "se_t" can miss the TF2 implementation. Update the `DescrptSeT`
declaration in the `DescrptSeT` wrapper to include the same factory registration
pattern used by `DescrptSeR`, ensuring the `BaseDescriptor` mapping resolves
`"se_t"` to this class instead of falling back to another implementation.
deepmd/tf2/descriptor/se_atten_v2.py-4-19 (1)

4-19: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Register the se_atten_v2 descriptor type.

deepmd/dpmodel/descriptor/se_atten_v2.py persists this descriptor as "se_atten_v2", and deepmd/tf2/model/model.py looks descriptors up through BaseDescriptor.get_class_by_type(descriptor_type). This wrapper never registers that type, so loading any TF2 model/config carrying type: "se_atten_v2" will fail at construction time.

🔧 Proposed fix
 from ..common import (
     register_dpmodel_mapping,
 )
+from .base_descriptor import (
+    BaseDescriptor,
+)
 from .dpa1 import (
     DescrptDPA1,
 )
@@
+@BaseDescriptor.register("se_atten_v2")
 class DescrptSeAttenV2(DescrptDPA1, DescrptSeAttenV2DP):
     pass
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/tf2/descriptor/se_atten_v2.py` around lines 4 - 19, The TF2 wrapper
for DescrptSeAttenV2 is missing descriptor type registration, so
BaseDescriptor.get_class_by_type cannot resolve "se_atten_v2" when models are
loaded. Update the se_atten_v2 module to register DescrptSeAttenV2 under the
exact "se_atten_v2" type using the existing registration flow used by other
descriptors, alongside the current dpmodel mapping, so tf2/model/model.py can
construct models with this descriptor.
deepmd/tf2/model/__init__.py-2-29 (1)

2-29: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Export DPZBLModel from the model package as well.

deepmd.tf2.model currently exposes the atomic-model helper DPZBLLinearEnergyAtomicModel, but not the actual DPZBLModel class that this layer introduces. That leaves the package’s public model API incomplete for ZBL consumers.

Suggested change
 from deepmd.tf2.atomic_model.linear_atomic_model import (
     DPZBLLinearEnergyAtomicModel,
 )
 
+from .dp_zbl_model import (
+    DPZBLModel,
+)
 from .dipole_model import (
     DipoleModel,
 )
@@
 __all__ = [
     "DOSModel",
     "DPZBLLinearEnergyAtomicModel",
+    "DPZBLModel",
     "DipoleModel",
     "EnergyModel",
     "PolarModel",
     "PropertyModel",
 ]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/tf2/model/__init__.py` around lines 2 - 29, The model package export
list is missing the new DPZBLModel class, so update the deepmd.tf2.model package
init to import DPZBLModel alongside the existing model classes and add it to
__all__ while keeping DPZBLLinearEnergyAtomicModel exported; use the symbols in
__init__.py to place the new public API entry in the same section as the other
model exports.
deepmd/tf2/descriptor/repformers.py-51-57 (1)

51-57: 🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift

Patch the inherited Python shape assert before exposing RepformerLayer to tf.function.

This TF2 wrapper currently brings the dpmodel repformer implementation into graph mode unchanged, and the pipeline is already failing on that path with OperatorNotAllowedInGraphError from assert list(atype_embd.shape) == [nf, nloc, self.g1_dim]. Please override that check here or patch the shared implementation to use TensorFlow-safe validation (tf.debugging.assert_* or static TensorShape checks) before merge.

Possible direction
 `@tf2_module`
 class RepformerLayer(RepformerLayerDP):
     _tf2_data_list_attrs: ClassVar[set[str]] = {
         "g1_residual",
         "g2_residual",
         "h2_residual",
     }
+
+    # Override the parent method that currently does:
+    #   assert list(atype_embd.shape) == [nf, nloc, self.g1_dim]
+    # and replace it with TF-safe shape validation.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/tf2/descriptor/repformers.py` around lines 51 - 57, RepformerLayer
still inherits a Python list-based shape assert from RepformerLayerDP that
breaks under tf.function with OperatorNotAllowedInGraphError. Update
RepformerLayer or the shared repformer implementation used by its graph path to
replace the assert list(atype_embd.shape) == [nf, nloc, self.g1_dim] check with
TensorFlow-safe validation such as tf.debugging.assert_* or static TensorShape
inspection, and keep the fix localized around the
RepformerLayer/RepformerLayerDP shape-check code path.

Source: Pipeline failures

deepmd/tf2/model/model.py-68-72 (1)

68-72: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Do not destructively edit the caller’s ZBL config.

Lines 69-72 mutate the original data and pop() both "type" keys. Reusing the same config dict for a second get_model(data) call will fail, unlike get_standard_model() which already protects the input with deepcopy().

Proposed fix
 def get_zbl_model(data: dict) -> DPZBLModel:
+    data = deepcopy(data)
     data["descriptor"]["ntypes"] = len(data["type_map"])
     descriptor_type = data["descriptor"].pop("type")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/tf2/model/model.py` around lines 68 - 72, get_zbl_model mutates the
caller’s config by writing to data["descriptor"]["ntypes"] and popping both
"type" fields, so repeated get_model(data) calls break. Update get_zbl_model to
work on a copied config (like get_standard_model does with deepcopy) and
read/remove fields from that copy, keeping the original data dict unchanged. Use
get_zbl_model and get_standard_model as the key symbols when making the fix.
deepmd/tf2/model/base_model.py-59-85 (1)

59-85: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

Differentiate the same reduction you expose in *_redu.

Line 84 always differentiates sum(grad_atomic_ret[kk]), even when Lines 60-67 expose an intensive mean or masked mean. That makes *_derv_r/*_derv_c* scaled incorrectly for intensive outputs.

Proposed fix
         kk_redu = get_reduce_name(kk)
-        if vdef.intensive:
-            mask = atomic_ret["mask"] if "mask" in atomic_ret else None
-            if mask is not None:
-                model_predict[kk_redu] = jnp.sum(vv, axis=atom_axis) / jnp.sum(
-                    mask, axis=-1, keepdims=True
-                )
-            else:
-                model_predict[kk_redu] = jnp.mean(vv, axis=atom_axis)
-        else:
-            model_predict[kk_redu] = jnp.sum(vv, axis=atom_axis)
+        def _reduce_output(value, mask):
+            if vdef.intensive:
+                if mask is not None:
+                    return jnp.sum(value, axis=atom_axis) / jnp.sum(
+                        mask, axis=-1, keepdims=True
+                    )
+                return jnp.mean(value, axis=atom_axis)
+            return jnp.sum(value, axis=atom_axis)
+
+        mask = atomic_ret["mask"] if "mask" in atomic_ret else None
+        model_predict[kk_redu] = _reduce_output(vv, mask)
@@
-                reduced_output = jnp.sum(grad_atomic_ret[kk], axis=atom_axis)
+                grad_mask = (
+                    grad_atomic_ret["mask"] if "mask" in grad_atomic_ret else None
+                )
+                reduced_output = _reduce_output(grad_atomic_ret[kk], grad_mask)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/tf2/model/base_model.py` around lines 59 - 85, The derivative path in
base_model.py is using a plain sum for reduced_output inside the
vdef.r_differentiable block, which does not match the reduction used for
model_predict[kk_redu] when vdef.intensive is true. Update the gradient
computation in the same branch that calls
self.atomic_model.forward_common_atomic and get_deriv_name(kk) so reduced_output
applies the same intensive or masked-mean reduction as the exposed *_redu value,
ensuring *_derv_r and *_derv_c remain consistent with the forward output.
🧹 Nitpick comments (3)
deepmd/_vendors/ndtensorflow/_namespace.py (1)

2005-2036: 🚀 Performance & Scalability | 🔵 Trivial | 🏗️ Heavy lift

Avoid quadratic memory in unique.

_unique materializes an n x n equality matrix, so large atom/type arrays can OOM. Prefer tf.unique/tf.unique_with_counts and handle NaN policy separately.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/_vendors/ndtensorflow/_namespace.py` around lines 2005 - 2036, The
_unique helper currently builds an n x n equality matrix, which can blow up
memory for large inputs. Update _unique in _namespace.py to avoid pairwise
materialization by using TensorFlow’s tf.unique / tf.unique_with_counts for
values, indices, inverse, and counts where possible, and preserve the current
NaN handling separately in the same function. Keep the existing return contract
of _unique unchanged while replacing the quadratic equality logic.
source/tests/consistent/test_tf2_backend.py (1)

131-183: 🩺 Stability & Availability | 🔵 Trivial | ⚡ Quick win

Add a SavedModel export/load smoke test.
This only covers eager execution and the inner tf.function path; add a tempdir round-trip through tf.saved_model.save(...) and reload to exercise the actual export path.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@source/tests/consistent/test_tf2_backend.py` around lines 131 - 183, The
current `test_tf_function_call_from_lower` only exercises eager execution and an
inner `tf.function`, so it misses the SavedModel export/load path. Extend this
test in `test_tf2_backend.py` to save the model to a temporary directory with
`tf.saved_model.save(...)` and then reload it, using the existing
`get_tf2_model`, `model_call_from_call_lower`, and `call_lower` flow to validate
the restored model behaves the same as the original. Keep the existing
assertions/outputs so the round-trip covers the full export/import smoke test.

Sources: Coding guidelines, Pipeline failures

deepmd/tf2/transform_output.py (1)

54-54: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Remove the dead mldims assignment.

mldims is never read, and CodeQL already reports it as unused.

Proposed fix
-            mldims = tf.shape(mapping)
             vldims = get_leading_dims(vv, vdef)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepmd/tf2/transform_output.py` at line 54, The mapping shape capture in
transform_output is dead code because mldims is never used. Remove the unused
mldims assignment from the function that computes the mapping shape, keeping the
surrounding logic in transform_output unchanged and ensuring no leftover
references to mldims remain.

Source: Linters/SAST tools

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Major comments:
In `@deepmd/_vendors/ndtensorflow/__init__.py`:
- Around line 20-25: The namespace re-export in the ndtensorflow initializer is
triggering Ruff F403/PLE0605 because it uses a star import plus a computed
__all__; update the module around the __init__ import and __all__ definition to
either add the appropriate targeted lint suppressions or replace the pattern
with an explicit namespace re-export approach via the namespace module, while
keeping xp imports working. Make sure the fix preserves the full public surface
exposed by _namespace_all and then run ruff check . before committing.

In `@deepmd/_vendors/ndtensorflow/_array.py`:
- Around line 257-264: `Array.__setitem__` currently handles boolean tensor keys
with a plain `tf.where`, but that only works for fully broadcast-compatible
masks and fails for prefix-shaped masks like a row selector over leading
dimensions. Update the boolean-assignment branch in `__setitem__` (the
`tf.Tensor`/`tf.bool` path) to detect when the mask matches only the leading
dimensions and expand/broadcast it across the remaining dimensions before
calling `tf.where`, so masked row assignments work correctly while preserving
the existing scalar/full-shape behavior.
- Line 222: The Ruff findings in the array logic need to be cleaned up in the
affected expressions in _array.py, including the key_dim/x_dim comparison and
the other reported spots. Add strict=False where the numpy/tensor conversion or
shape-check call requires it, switch the relevant loops/comprehensions to tuple
unpacking, and parenthesize the mixed or/and condition so the intent is
explicit. Use the surrounding symbols in the array handling code to locate each
occurrence and make the same style-safe update at all reported matches.

In `@deepmd/_vendors/ndtensorflow/_namespace.py`:
- Line 37: Resolve the remaining Ruff failures in the namespace module by
handling the Array API builtin-shadowing names with targeted per-file ignores or
inline noqa for A001/A002 in the namespace constants, and fix the RUF021
precedence and RUF022 __all__ issues directly by parenthesizing the affected
expressions and sorting/auto-fixing the __all__ entries. Use the existing
namespace symbols like bool, int, float, and __all__ to locate the remaining
violations, then rerun ruff check . to confirm the file is clean.
- Around line 1538-1540: The include_initial shape handling currently relies on
out.shape.as_list(), which can break under tf.function when dimensions are None.
Update the logic in the include_initial branch around the tf.concat call to
build the shape dynamically from tf.shape(out), replacing only the selected axis
at runtime before creating the zero tensor. Apply the same fix in both affected
include_initial blocks so the concatenation works with dynamic shapes.

In `@deepmd/_vendors/ndtensorflow/fft.py`:
- Around line 360-375: The __all__ list in fft.py is not sorted, triggering Ruff
rule RUF022 and a CI failure. Update the module-level __all__ definition by
ordering the exported names alphabetically, keeping the symbols like fft, ifft,
fftn, and fftshift in sorted order so Ruff’s auto-fix would pass cleanly.

In `@deepmd/_vendors/ndtensorflow/linalg.py`:
- Around line 317-319: The norm handling in linalg.norm ignores the requested
matrix axes because matrix_norm always uses the last two dimensions. Update the
tuple-axis branch in norm to first move the requested axis pair into the matrix
positions before calling matrix_norm, then restore the reduced dimensions so
keepdims is preserved in the original axis locations. Use the norm and
matrix_norm symbols to make the axis handling work for any 2D axis pair, not
just the last two axes.
- Around line 212-230: The solve function currently mishandles batched vector
right-hand sides by treating x2 shaped (..., n) as a matrix RHS. Update solve in
linalg.py to detect when x2_.shape.rank is exactly x1_.shape.rank - 1, expand
x2_ with a trailing singleton dimension before calling tf.linalg.solve, and
squeeze that dimension from the result afterward. Keep the existing
scalar/vector and matrix-RHS handling in solve consistent with _promote_two,
_shape_tuple, and the final _wrap behavior.
- Line 165: Clean up the Ruff violations in the linalg module by keeping the
public API parameter name ord but adding targeted A002 ignores where needed,
rather than renaming the API. Auto-fix the tuple-concatenation and __all__
issues in the affected definitions, including the blocks around the linalg
helpers and export list. Verify the changes in the affected signatures and
module-level exports, then run ruff check . before committing to catch any
remaining violations.

In `@deepmd/tf2/common.py`:
- Around line 149-161: The lazy registration path in _ensure_registrations
currently lets concurrent callers return while imports are still in progress, so
try_convert_module can observe an incomplete _DPMODEL_TO_TF2 mapping. Update the
registration flow in _ensure_registrations (and any caller like
try_convert_module that depends on it) to use a synchronization primitive such
as a lock or event so only one thread performs the imports and all other threads
block until _REGISTRATIONS_READY is true. Keep the existing
_REGISTRATIONS_IN_PROGRESS/_REGISTRATIONS_READY state, but make non-owner
threads wait for completion instead of returning early.

In `@deepmd/tf2/descriptor/repformers.py`:
- Around line 51-57: RepformerLayer still inherits a Python list-based shape
assert from RepformerLayerDP that breaks under tf.function with
OperatorNotAllowedInGraphError. Update RepformerLayer or the shared repformer
implementation used by its graph path to replace the assert
list(atype_embd.shape) == [nf, nloc, self.g1_dim] check with TensorFlow-safe
validation such as tf.debugging.assert_* or static TensorShape inspection, and
keep the fix localized around the RepformerLayer/RepformerLayerDP shape-check
code path.

In `@deepmd/tf2/descriptor/se_atten_v2.py`:
- Around line 4-19: The TF2 wrapper for DescrptSeAttenV2 is missing descriptor
type registration, so BaseDescriptor.get_class_by_type cannot resolve
"se_atten_v2" when models are loaded. Update the se_atten_v2 module to register
DescrptSeAttenV2 under the exact "se_atten_v2" type using the existing
registration flow used by other descriptors, alongside the current dpmodel
mapping, so tf2/model/model.py can construct models with this descriptor.

In `@deepmd/tf2/descriptor/se_t_tebd.py`:
- Around line 20-22: The TF2 tebd descriptor wrapper is defined but not actually
registered in the descriptor factory, so name-based lookup still won’t resolve
to the TF2 implementation. Update the TF2 registration path around
DescrptSeTTebd and the `@tf2_module` decorator so this class is
exported/registered under the tebd descriptor entrypoint the same way the other
TF2 descriptor wrappers are.

In `@deepmd/tf2/descriptor/se_t.py`:
- Around line 4-13: `DescrptSeT` is defined in the TF2 wrapper but never
registered with `BaseDescriptor`, so descriptor lookup for "se_t" can miss the
TF2 implementation. Update the `DescrptSeT` declaration in the `DescrptSeT`
wrapper to include the same factory registration pattern used by `DescrptSeR`,
ensuring the `BaseDescriptor` mapping resolves `"se_t"` to this class instead of
falling back to another implementation.

In `@deepmd/tf2/format_nlist.py`:
- Around line 42-68: Apply the cutoff filtering in format_nlist for all
branches, not just when n_nsel > nsel: after building ret in the pad path
(n_nsel < nsel) and the pass-through path (n_nsel == nsel), mask any neighbors
with distance beyond rcut to -1. Reuse the existing rr2/rcut logic in
format_nlist, and keep the same behavior for padding while ensuring all returned
neighbor lists are cutoff-enforced.

In `@deepmd/tf2/infer/deep_eval.py`:
- Around line 314-323: Derive nframes after normalizing coords in _eval_model so
flat single-frame inputs are handled correctly. The current logic in
deep_eval.py uses coords.shape[0] before reshaping, which misinterprets a 1D
coords array as many frames and over-tiles atom_types. Update the flow around
nframes, atom_types, and coord_input so the frame count is based on the reshaped
coordinate tensor (and stays consistent with cells), using the existing
_eval_model and eval() entry points to locate the fix.

In `@deepmd/tf2/model/__init__.py`:
- Around line 2-29: The model package export list is missing the new DPZBLModel
class, so update the deepmd.tf2.model package init to import DPZBLModel
alongside the existing model classes and add it to __all__ while keeping
DPZBLLinearEnergyAtomicModel exported; use the symbols in __init__.py to place
the new public API entry in the same section as the other model exports.

In `@deepmd/tf2/model/base_model.py`:
- Around line 59-85: The derivative path in base_model.py is using a plain sum
for reduced_output inside the vdef.r_differentiable block, which does not match
the reduction used for model_predict[kk_redu] when vdef.intensive is true.
Update the gradient computation in the same branch that calls
self.atomic_model.forward_common_atomic and get_deriv_name(kk) so reduced_output
applies the same intensive or masked-mean reduction as the exposed *_redu value,
ensuring *_derv_r and *_derv_c remain consistent with the forward output.

In `@deepmd/tf2/model/model.py`:
- Around line 68-72: get_zbl_model mutates the caller’s config by writing to
data["descriptor"]["ntypes"] and popping both "type" fields, so repeated
get_model(data) calls break. Update get_zbl_model to work on a copied config
(like get_standard_model does with deepcopy) and read/remove fields from that
copy, keeping the original data dict unchanged. Use get_zbl_model and
get_standard_model as the key symbols when making the fix.

In `@deepmd/tf2/nlist.py`:
- Around line 72-118: The neighbor list logic in nlist should not truncate to
sum(sel) before per-type selection when distinguish_types is enabled, because
that can discard valid neighbors needed by later types. Update the flow around
rr/nlist and nlist_distinguish_types so type-aware filtering happens before any
per-type limit is applied, or otherwise ensure each type’s quota is selected
independently within rcut. Keep the existing truncate/pad behavior only for the
non-type-distinguishing path.

In `@deepmd/tf2/region.py`:
- Around line 96-104: The b_to_face_distance helper is using the signed
determinant from tf.linalg.det(cell), which can produce negative face distances
for cells with negative orientation. Update b_to_face_distance to use the
absolute cell volume before dividing by the cross-product norms, and keep the
returned values non-negative so to_face_distance() and
extend_coord_with_ghosts() cannot undercount ghost images.

In `@deepmd/tf2/utils/serialization.py`:
- Around line 32-35: The SavedModel export in the serialization flow is missing
a direct reference from the root object to the deserialized model. In the
function that builds tf_model after BaseModel.deserialize, attach model to
tf_model before calling tf.saved_model.save so the variables remain reachable
and serialize correctly; keep the existing model_def_script export logic
unchanged.

---

Nitpick comments:
In `@deepmd/_vendors/ndtensorflow/_namespace.py`:
- Around line 2005-2036: The _unique helper currently builds an n x n equality
matrix, which can blow up memory for large inputs. Update _unique in
_namespace.py to avoid pairwise materialization by using TensorFlow’s tf.unique
/ tf.unique_with_counts for values, indices, inverse, and counts where possible,
and preserve the current NaN handling separately in the same function. Keep the
existing return contract of _unique unchanged while replacing the quadratic
equality logic.

In `@deepmd/tf2/transform_output.py`:
- Line 54: The mapping shape capture in transform_output is dead code because
mldims is never used. Remove the unused mldims assignment from the function that
computes the mapping shape, keeping the surrounding logic in transform_output
unchanged and ensuring no leftover references to mldims remain.

In `@source/tests/consistent/test_tf2_backend.py`:
- Around line 131-183: The current `test_tf_function_call_from_lower` only
exercises eager execution and an inner `tf.function`, so it misses the
SavedModel export/load path. Extend this test in `test_tf2_backend.py` to save
the model to a temporary directory with `tf.saved_model.save(...)` and then
reload it, using the existing `get_tf2_model`, `model_call_from_call_lower`, and
`call_lower` flow to validate the restored model behaves the same as the
original. Keep the existing assertions/outputs so the round-trip covers the full
export/import smoke test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 60dcc087-f41b-4b63-9f90-257c6b839a47

📥 Commits

Reviewing files that changed from the base of the PR and between 1f0ca6c and 016c5d0.

📒 Files selected for processing (70)
  • deepmd/_vendors/__init__.py
  • deepmd/_vendors/ndtensorflow/__init__.py
  • deepmd/_vendors/ndtensorflow/_array.py
  • deepmd/_vendors/ndtensorflow/_info.py
  • deepmd/_vendors/ndtensorflow/_namespace.py
  • deepmd/_vendors/ndtensorflow/fft.py
  • deepmd/_vendors/ndtensorflow/linalg.py
  • deepmd/_vendors/ndtensorflow/py.typed
  • deepmd/backend/tf2.py
  • deepmd/jax/jax2tf/__init__.py
  • deepmd/jax/jax2tf/format_nlist.py
  • deepmd/jax/jax2tf/make_model.py
  • deepmd/jax/jax2tf/nlist.py
  • deepmd/jax/jax2tf/region.py
  • deepmd/jax/jax2tf/serialization.py
  • deepmd/jax/jax2tf/transform_output.py
  • deepmd/jax/utils/serialization.py
  • deepmd/tf2/__init__.py
  • deepmd/tf2/atomic_model/__init__.py
  • deepmd/tf2/atomic_model/base_atomic_model.py
  • deepmd/tf2/atomic_model/dipole_atomic_model.py
  • deepmd/tf2/atomic_model/dos_atomic_model.py
  • deepmd/tf2/atomic_model/dp_atomic_model.py
  • deepmd/tf2/atomic_model/energy_atomic_model.py
  • deepmd/tf2/atomic_model/linear_atomic_model.py
  • deepmd/tf2/atomic_model/pairtab_atomic_model.py
  • deepmd/tf2/atomic_model/polar_atomic_model.py
  • deepmd/tf2/atomic_model/property_atomic_model.py
  • deepmd/tf2/common.py
  • deepmd/tf2/descriptor/__init__.py
  • deepmd/tf2/descriptor/base_descriptor.py
  • deepmd/tf2/descriptor/dpa1.py
  • deepmd/tf2/descriptor/dpa2.py
  • deepmd/tf2/descriptor/dpa3.py
  • deepmd/tf2/descriptor/hybrid.py
  • deepmd/tf2/descriptor/repflows.py
  • deepmd/tf2/descriptor/repformers.py
  • deepmd/tf2/descriptor/se_atten_v2.py
  • deepmd/tf2/descriptor/se_e2_a.py
  • deepmd/tf2/descriptor/se_e2_r.py
  • deepmd/tf2/descriptor/se_t.py
  • deepmd/tf2/descriptor/se_t_tebd.py
  • deepmd/tf2/env.py
  • deepmd/tf2/fitting/__init__.py
  • deepmd/tf2/fitting/base_fitting.py
  • deepmd/tf2/fitting/fitting.py
  • deepmd/tf2/format_nlist.py
  • deepmd/tf2/infer/__init__.py
  • deepmd/tf2/infer/deep_eval.py
  • deepmd/tf2/make_model.py
  • deepmd/tf2/model/__init__.py
  • deepmd/tf2/model/base_model.py
  • deepmd/tf2/model/dipole_model.py
  • deepmd/tf2/model/dos_model.py
  • deepmd/tf2/model/dp_model.py
  • deepmd/tf2/model/dp_zbl_model.py
  • deepmd/tf2/model/ener_model.py
  • deepmd/tf2/model/model.py
  • deepmd/tf2/model/polar_model.py
  • deepmd/tf2/model/property_model.py
  • deepmd/tf2/nlist.py
  • deepmd/tf2/region.py
  • deepmd/tf2/transform_output.py
  • deepmd/tf2/utils/__init__.py
  • deepmd/tf2/utils/exclude_mask.py
  • deepmd/tf2/utils/network.py
  • deepmd/tf2/utils/serialization.py
  • deepmd/tf2/utils/type_embed.py
  • pyproject.toml
  • source/tests/consistent/test_tf2_backend.py

@codecov

codecov Bot commented Jun 27, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 66.54157% with 1159 lines in your changes missing coverage. Please review.
✅ Project coverage is 81.88%. Comparing base (1f0ca6c) to head (7daa761).
⚠️ Report is 5 commits behind head on master.

Files with missing lines Patch % Lines
deepmd/_vendors/ndtensorflow/_namespace.py 56.41% 537 Missing ⚠️
deepmd/_vendors/ndtensorflow/fft.py 22.69% 126 Missing ⚠️
deepmd/_vendors/ndtensorflow/linalg.py 39.69% 117 Missing ⚠️
deepmd/_vendors/ndtensorflow/_array.py 71.46% 99 Missing ⚠️
deepmd/tf2/utils/serialization.py 77.18% 60 Missing ⚠️
deepmd/tf2/transform_output.py 0.00% 39 Missing ⚠️
deepmd/tf2/infer/deep_eval.py 80.61% 38 Missing ⚠️
deepmd/tf2/make_model.py 26.66% 33 Missing ⚠️
deepmd/tf2/common.py 81.57% 28 Missing ⚠️
deepmd/tf2/model/model.py 60.41% 19 Missing ⚠️
... and 18 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5598      +/-   ##
==========================================
- Coverage   82.35%   81.88%   -0.47%     
==========================================
  Files         896      953      +57     
  Lines      100923   104642    +3719     
  Branches     4058     4058              
==========================================
+ Hits        83113    85689    +2576     
- Misses      16348    17490    +1142     
- Partials     1462     1463       +1     

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@source/tests/consistent/model/test_ener.py`:
- Around line 327-334: The TF2 comparison path in test_ener is skipping
atomic-virial outputs even though the TF2 eval helpers request
do_atomic_virial=True, so update the TF2 branches that return SKIP_FLAG to
compare the actual atom_virial and extended_virial results instead, or remove
the atomic-virial request from the TF2 eval helpers until support is complete.
Use the TF2 handling in the relevant test helper return paths to locate the
change and keep the consistency test aligned with the backend behavior.

In `@source/tests/consistent/test_tf2_backend.py`:
- Around line 17-56: The TF2 subprocess wrapper in
test_tf2_consistent_backend_subprocess only checks for TensorFlow’s presence, so
it can pass even when the tf2 backend is unavailable and all inner tests get
skipped. Update the guard to also verify the tf2 backend itself using
Backend.get_backend("tf2") and is_available() under the same DEEPMD_TEST_TF2
environment, or assert after subprocess.run that at least one tf2 test actually
executed. Use the existing test_tf2_consistent_backend_subprocess and
Backend.get_backend("tf2") symbols to locate the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 84474bc4-2aca-4173-9ccf-6f85bd3848fc

📥 Commits

Reviewing files that changed from the base of the PR and between 016c5d0 and dfa958b.

📒 Files selected for processing (4)
  • source/tests/consistent/common.py
  • source/tests/consistent/model/common.py
  • source/tests/consistent/model/test_ener.py
  • source/tests/consistent/test_tf2_backend.py

Comment thread source/tests/consistent/model/test_ener.py Outdated
Comment thread source/tests/consistent/test_tf2_backend.py Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deepmd/_vendors/ndtensorflow/_array.py`:
- Around line 456-459: The leading-dimension comparison in the mask validation
logic needs an explicit strict zip to satisfy Ruff B905. Update the zip used in
the tensor/key shape check inside the array-matching code so it iterates over
tensor_shape sliced to rank and uses strict=True, keeping the existing dimension
mismatch test intact. Use the existing mask-shape validation block in _array.py
to locate the fix.

In `@source/tests/consistent/fitting/test_ener.py`:
- Around line 523-526: Mirror the TF2 unsupported-case skip logic from TestEner
in TestEnerStat so it does not run the same invalid EnerFittingTF2
parameterizations when TF2 is installed. Update the TestEnerStat setup around
skip_tf2, tf_class, and tf2_class to also skip TF2 for bfloat16, default
fparams, use_aparam_as_mask, and non-empty atom_ener, matching the existing
TestEner conditions.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 99eb1b7d-25a4-4e72-b162-388aa5c4c0a1

📥 Commits

Reviewing files that changed from the base of the PR and between d300f50 and 7ef11a5.

📒 Files selected for processing (18)
  • deepmd/_vendors/ndtensorflow/_array.py
  • deepmd/_vendors/ndtensorflow/_namespace.py
  • source/tests/consistent/descriptor/test_dpa1.py
  • source/tests/consistent/descriptor/test_dpa2.py
  • source/tests/consistent/descriptor/test_dpa3.py
  • source/tests/consistent/descriptor/test_hybrid.py
  • source/tests/consistent/descriptor/test_se_atten_v2.py
  • source/tests/consistent/descriptor/test_se_e2_a.py
  • source/tests/consistent/descriptor/test_se_r.py
  • source/tests/consistent/descriptor/test_se_t.py
  • source/tests/consistent/descriptor/test_se_t_tebd.py
  • source/tests/consistent/fitting/test_dipole.py
  • source/tests/consistent/fitting/test_dos.py
  • source/tests/consistent/fitting/test_ener.py
  • source/tests/consistent/fitting/test_polar.py
  • source/tests/consistent/fitting/test_property.py
  • source/tests/consistent/test_array_api.py
  • source/tests/consistent/test_tf2_backend.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • deepmd/_vendors/ndtensorflow/_namespace.py

Comment thread deepmd/_vendors/ndtensorflow/_array.py
Comment thread source/tests/consistent/fitting/test_ener.py Outdated
Comment thread pyproject.toml Outdated
@github-actions github-actions Bot added the C++ label Jun 28, 2026
@njzjz njzjz enabled auto-merge June 28, 2026 12:40
@njzjz njzjz disabled auto-merge June 28, 2026 12:40

@njzjz-bot njzjz-bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick review on the latest head (14368bb): thanks for splitting the .savedmodel / .savedmodeltf suffix handling and adding the dedicated C++ coverage for both export paths. The Python TF2 consistency matrix and backend detection coverage look much stronger now.

One merge blocker remains from CI: Test C++ (false, true, true, false) is failing in the LAMMPS DPA JAX SavedModel tests. The failure is consistently an out-of-range TensorFlow gather with a -1 neighbor index, e.g.

source/lmp/tests/test_lammps_dpa_jax.py::test_pair_deepmd
TensorFlow C API Error: indices[0,49,4] = [0, -1, 4] does not index into param shape [1,6,5]

The same pattern appears in the virial/model-deviation variants. This suggests the SavedModel lower-call path still lets padded/invalid neighbor entries reach a GatherNd after the new TF2/JAX2TF sharing and .savedmodeltf split. Before merge, I would fix this by preserving the previous invalid-neighbor masking/sentinel handling around the gather path, or by ensuring the exported lower functions sanitize -1 neighbor indices before any TensorFlow gather while keeping the mask applied to the resulting values.

Once that C++/LAMMPS job is green, I do not see another obvious blocker from the latest diff/check summary.

Reviewed by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)

njzjz-bot and others added 3 commits June 28, 2026 15:46
Some pytest-split groups can have no tests selected after the TF2 eager path and -k filters, which makes pytest exit with code 5. Treat that code as success for this CI shard while preserving other pytest failures.\n\nAuthored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)
Comment thread deepmd/dpmodel/utils/nlist.py Outdated
@njzjz njzjz requested a review from wanghan-iapcm June 29, 2026 13:17
@njzjz njzjz added this pull request to the merge queue Jun 29, 2026
Merged via the queue into deepmodeling:master with commit 73de44b Jun 29, 2026
79 of 81 checks passed
@njzjz njzjz deleted the tf2 branch June 29, 2026 23:35
SchrodingersCattt pushed a commit to SchrodingersCattt/deepmd-kit that referenced this pull request Jun 30, 2026
## Summary
- vendor ndtensorflow under deepmd/_vendors and add a TensorFlow eager
Array API backend under deepmd/tf2
- implement tf2 model wrappers, tf.function SavedModel export/deep-eval
glue, and backend registration for .savedmodel
- replace jax/jax2tf TensorFlow helper implementations with tf2
compatibility exports
- add focused tf2 consistency tests for eager and tf.function model
paths

## Tests
- ruff check .
- ruff format .
- pytest source/tests/consistent/test_tf2_backend.py -q
- manual SavedModel export/load smoke test

Note: local pre-commit hook attempted to fetch astral-sh/ruff-pre-commit
and timed out on GitHub port 443; the commit was created with
--no-verify after running the checks above manually.

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Added a TensorFlow 2 eager backend with SavedModel inference plus TF2
model/descriptor/fitting wrappers (energy/dipole/polar/DOS/property).
* Introduced TF2 neighbor-list construction, region/PBC geometry
utilities, and output/ghost-atom aggregation.
* Expanded the TensorFlow-backed Array API with namespace, FFT, and
linear algebra utilities.
* **Bug Fixes**
  * Improved the non-eager TensorFlow compatibility error message.
* **Refactor**
* Converted multiple JAX2TF modules into TF2 compatibility shims that
re-export TF2 implementations.
* **Tests**
  * Added TF2 consistency coverage, including subprocess-based checks.
* **Chores**
  * Added license headers/docs and updated lint configuration.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Signed-off-by: Jinzhe Zeng <njzjz@qq.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: njzjz-bot (driven by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5))[bot] <48687836+njzjz-bot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants