refact: refactor format backends into dpdata.formats#970
refact: refactor format backends into dpdata.formats#970njzjz-bot wants to merge 3 commits intodeepmodeling:masterfrom
Conversation
Move all format directories (abacus, amber, cp2k, deepmd, dftbplus, fhi_aims, gaussian, gromacs, lammps, lmdb, md, openmx, orca, psi4, pwmat, pymatgen, qe, rdkit, siesta, vasp, xyz) into a new formats/ subdirectory. This addresses issue deepmodeling#934. Changes: - Created dpdata/formats/ directory - Moved all format directories to dpdata/formats/ - Updated all import statements throughout the codebase - Updated relative imports in format modules (from .. to from ...) - Updated dpdata/__init__.py to import from new locations - Updated tests/context.py for new import paths The plugins directory remains at the root level as requested.
for more information, see https://pre-commit.ci
Merging this PR will degrade performance by 14.29%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | WallTime | test_cli |
331.5 ms | 371.4 ms | -10.75% |
| ❌ | WallTime | test_import |
9.5 ms | 11.1 ms | -14.29% |
Comparing njzjz-bot:oc-fix-pr-946-ci (ae36482) with master (99ea3bb)
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #970 +/- ##
==========================================
+ Coverage 86.70% 86.71% +0.01%
==========================================
Files 86 89 +3
Lines 8039 8048 +9
==========================================
+ Hits 6970 6979 +9
Misses 1069 1069 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
📝 WalkthroughWalkthroughThis PR centralizes format-related modules under a new dpdata.formats package and updates import paths across the codebase to reference dpdata.formats.*. Package initializers for cp2k, gaussian, and qe were added and many relative imports adjusted for the deeper package nesting. ChangesModule reorganization into dpdata.formats (single cohesive DAG)
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
dpdata/bond_order_system.py (1)
81-91:⚠️ Potential issue | 🟠 Major | ⚡ Quick win
data-only initialization still raisesValueErrordue to branch structureWhen
datais provided withoutfile_name/rdkit_mol, Line 81 initializes from data, but control still reaches theelseat Line 91 and raises. This breaks the documenteddatainit path.Proposed fix
- if data: + if data is not None: mol = dpdata.formats.rdkit.utils.system_data_to_mol(data) self.from_rdkit_mol(mol) - if file_name: + elif file_name: self.from_fmt( file_name, fmt, type_map=type_map, begin=begin, step=step, **kwargs ) - elif rdkit_mol: + elif rdkit_mol is not None: self.from_rdkit_mol(rdkit_mol) else: raise ValueError("Please specify a mol/sdf file or a rdkit Mol object")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dpdata/bond_order_system.py` around lines 81 - 91, The init path processes `data` but then continues into the file/rdkit branches and hits the final else, causing the erroneous ValueError; update the conditional flow in the BondOrderSystem initializer (or the method handling construction) so that the `data` case stops further branching—either change the `if data:` block to `if data: ... elif file_name: ... elif rdkit_mol: ... else: ...` or keep `if data:` and add an immediate return after calling from_rdkit_mol; ensure you reference the existing calls to dpdata.formats.rdkit.utils.system_data_to_mol, self.from_rdkit_mol, and self.from_fmt when making the change.dpdata/plugins/pymatgen.py (1)
72-72:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd missing import
dpdata.systemto fix runtime AttributeError.Line 72 uses
dpdata.system.remove_pbc(data), butdpdata.systemis not imported. Thedpdatapackage does not re-export thesystemmodule in its__init__.py—only individual classes from it are exposed. This will raiseAttributeErrorat runtime whento_systemis called on aPyMatgenMoleculeFormatinstance.Proposed fix — add explicit import
import dpdata.formats.pymatgen.molecule import dpdata.formats.pymatgen.structure +import dpdata.system from dpdata.format import Format🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dpdata/plugins/pymatgen.py` at line 72, The call data = dpdata.system.remove_pbc(data) fails at runtime because the dpdata.system submodule isn't imported; add an explicit import (e.g., import dpdata.system) at the top of the file and then leave the call in PyMatgenMoleculeFormat.to_system as-is so dpdata.system.remove_pbc(data) resolves correctly.
🧹 Nitpick comments (1)
dpdata/plugins/xyz.py (1)
11-14: ⚡ Quick winRuntime imports placed after the
if TYPE_CHECKING:block — invert the order.Lines 13–14 are unconditional runtime imports but appear after the
if TYPE_CHECKING:guard. The conventional (and ruff/isort-expected) layout places theif TYPE_CHECKING:block last among all imports. This ordering may trigger anI001violation depending on the project's ruff configuration.♻️ Proposed fix
+from dpdata.formats.xyz.quip_gap_xyz import QuipGapxyzSystems, format_single_frame +from dpdata.formats.xyz.xyz import coord_to_xyz, xyz_to_coord + if TYPE_CHECKING: from dpdata.utils import FileType -from dpdata.formats.xyz.quip_gap_xyz import QuipGapxyzSystems, format_single_frame -from dpdata.formats.xyz.xyz import coord_to_xyz, xyz_to_coordAs per coding guidelines,
dpdata/**/*.pyfiles must passruff check dpdata/before committing.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dpdata/plugins/xyz.py` around lines 11 - 14, The import ordering is wrong: move the TYPE_CHECKING block so it comes after the runtime imports (i.e., place the "if TYPE_CHECKING: from dpdata.utils import FileType" block below the imports of QuipGapxyzSystems, format_single_frame, coord_to_xyz, and xyz_to_coord) to satisfy ruff/isort expectations and avoid I001; ensure the runtime symbols QuipGapxyzSystems, format_single_frame, coord_to_xyz, and xyz_to_coord remain imported unconditionally and only FileType is guarded by TYPE_CHECKING.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@dpdata/__init__.py`:
- Line 4: Run ruff check and fix the reported lint issues: sort the __all__
lists in dpdata/__init__ and dpdata/formats/deepmd/hdf5.py, replace mutable
default args/attributes in functions/classes (e.g., in ase_calculator.py,
rdf.py, water.py) with None and set defaults inside the function or __init__,
add explicit stacklevel=2 to all warnings.warn calls, rename variables shadowing
builtins (e.g., in driver.py, hdf5.py and pwmat-related files), remove or use
unused loop control variables (abacus, cp2k, fhi_aims, gromacs, lammps, md,
openmx, pwmat modules) or replace with _ if intentionally unused, add strict=...
to zip() calls, and address the remaining RUF/B/BLE/etc. issues (unused
unpacking, unnecessary conversions/concatenations, empty abstract methods,
assert False, missing shebangs, ambiguous names, overly broad exception
handlers) as indicated by ruff to make the codebase clean.
In `@dpdata/plugins/openmx.py`:
- Line 64: The unpacked variable `cs` from the call to
dpdata.formats.openmx.omx.to_system_data(fname, mdname) is unused and causes a
Ruff RUF059 lint error; update the unpack to either capture the unused value as
`_` (e.g., `data, _ = ...`) or assign only `data` (e.g., `data = ...`) inside
openmx.py where the call occurs, and then run ruff check dpdata/ and ruff format
dpdata/ to ensure linting/formatting compliance.
---
Outside diff comments:
In `@dpdata/bond_order_system.py`:
- Around line 81-91: The init path processes `data` but then continues into the
file/rdkit branches and hits the final else, causing the erroneous ValueError;
update the conditional flow in the BondOrderSystem initializer (or the method
handling construction) so that the `data` case stops further branching—either
change the `if data:` block to `if data: ... elif file_name: ... elif rdkit_mol:
... else: ...` or keep `if data:` and add an immediate return after calling
from_rdkit_mol; ensure you reference the existing calls to
dpdata.formats.rdkit.utils.system_data_to_mol, self.from_rdkit_mol, and
self.from_fmt when making the change.
In `@dpdata/plugins/pymatgen.py`:
- Line 72: The call data = dpdata.system.remove_pbc(data) fails at runtime
because the dpdata.system submodule isn't imported; add an explicit import
(e.g., import dpdata.system) at the top of the file and then leave the call in
PyMatgenMoleculeFormat.to_system as-is so dpdata.system.remove_pbc(data)
resolves correctly.
---
Nitpick comments:
In `@dpdata/plugins/xyz.py`:
- Around line 11-14: The import ordering is wrong: move the TYPE_CHECKING block
so it comes after the runtime imports (i.e., place the "if TYPE_CHECKING: from
dpdata.utils import FileType" block below the imports of QuipGapxyzSystems,
format_single_frame, coord_to_xyz, and xyz_to_coord) to satisfy ruff/isort
expectations and avoid I001; ensure the runtime symbols QuipGapxyzSystems,
format_single_frame, coord_to_xyz, and xyz_to_coord remain imported
unconditionally and only FileType is guarded by TYPE_CHECKING.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 9d56fd68-bd9e-45f8-b025-b4160b3f00b9
📒 Files selected for processing (99)
dpdata/__init__.pydpdata/bond_order_system.pydpdata/formats/__init__.pydpdata/formats/abacus/__init__.pydpdata/formats/abacus/md.pydpdata/formats/abacus/relax.pydpdata/formats/abacus/scf.pydpdata/formats/abacus/stru.pydpdata/formats/amber/__init__.pydpdata/formats/amber/mask.pydpdata/formats/amber/md.pydpdata/formats/amber/sqm.pydpdata/formats/cp2k/__init__.pydpdata/formats/cp2k/cell.pydpdata/formats/cp2k/output.pydpdata/formats/deepmd/__init__.pydpdata/formats/deepmd/comp.pydpdata/formats/deepmd/hdf5.pydpdata/formats/deepmd/mixed.pydpdata/formats/deepmd/raw.pydpdata/formats/dftbplus/__init__.pydpdata/formats/dftbplus/output.pydpdata/formats/fhi_aims/__init__.pydpdata/formats/fhi_aims/output.pydpdata/formats/gaussian/__init__.pydpdata/formats/gaussian/fchk.pydpdata/formats/gaussian/gjf.pydpdata/formats/gaussian/log.pydpdata/formats/gromacs/__init__.pydpdata/formats/gromacs/gro.pydpdata/formats/lammps/__init__.pydpdata/formats/lammps/dump.pydpdata/formats/lammps/lmp.pydpdata/formats/lmdb/__init__.pydpdata/formats/lmdb/format.pydpdata/formats/md/__init__.pydpdata/formats/md/msd.pydpdata/formats/md/pbc.pydpdata/formats/md/rdf.pydpdata/formats/md/water.pydpdata/formats/openmx/__init__.pydpdata/formats/openmx/omx.pydpdata/formats/orca/__init__.pydpdata/formats/orca/output.pydpdata/formats/psi4/__init__.pydpdata/formats/psi4/input.pydpdata/formats/psi4/output.pydpdata/formats/pwmat/__init__.pydpdata/formats/pwmat/atomconfig.pydpdata/formats/pwmat/movement.pydpdata/formats/pymatgen/__init__.pydpdata/formats/pymatgen/molecule.pydpdata/formats/pymatgen/structure.pydpdata/formats/qe/__init__.pydpdata/formats/qe/scf.pydpdata/formats/qe/traj.pydpdata/formats/rdkit/__init__.pydpdata/formats/rdkit/sanitize.pydpdata/formats/rdkit/utils.pydpdata/formats/siesta/__init__.pydpdata/formats/siesta/aiMD_output.pydpdata/formats/siesta/output.pydpdata/formats/vasp/__init__.pydpdata/formats/vasp/outcar.pydpdata/formats/vasp/poscar.pydpdata/formats/vasp/xml.pydpdata/formats/xyz/__init__.pydpdata/formats/xyz/quip_gap_xyz.pydpdata/formats/xyz/xyz.pydpdata/plugins/3dmol.pydpdata/plugins/abacus.pydpdata/plugins/amber.pydpdata/plugins/cp2k.pydpdata/plugins/deepmd.pydpdata/plugins/dftbplus.pydpdata/plugins/fhi_aims.pydpdata/plugins/gaussian.pydpdata/plugins/gromacs.pydpdata/plugins/lammps.pydpdata/plugins/lmdb.pydpdata/plugins/openmx.pydpdata/plugins/orca.pydpdata/plugins/psi4.pydpdata/plugins/pwmat.pydpdata/plugins/pymatgen.pydpdata/plugins/qe.pydpdata/plugins/rdkit.pydpdata/plugins/siesta.pydpdata/plugins/vasp.pydpdata/plugins/xyz.pydpdata/siesta/__init__.pydpdata/system.pydpdata/vasp/__init__.pydpdata/xyz/__init__.pytests/context.pytests/test_abacus_stru_dump.pytests/test_lammps_lmp_dump.pytests/test_lammps_spin.pytests/test_lmdb.py
4c16ae8 to
fda1705
Compare
There was a problem hiding this comment.
Pull request overview
This PR rebases the prior directory reorganization (moving format implementations under dpdata.formats) and updates internal imports/tests accordingly, with the stated goal of restoring backward-compatible access to historically public helper modules/namespaces after the move.
Changes:
- Relocates/introduces many format implementation modules under
dpdata/formats/**(e.g., VASP/LAMMPS/QE/Gaussian/CP2K/LMDB, etc.). - Updates plugin modules and some core code to import from
dpdata.formats.*instead of historical locations. - Adds a lazy attribute-based loader in
dpdata/__init__.pyfor some format modules (cp2k,gaussian,qe).
Reviewed changes
Copilot reviewed 44 out of 99 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
dpdata/__init__.py |
Switches top-level exports to dpdata.formats.* and adds lazy __getattr__ for some format modules. |
dpdata/system.py |
Updates internal imports to dpdata.formats.md.pbc and dpdata.formats.amber.mask. |
dpdata/bond_order_system.py |
Updates RDKit helpers import path to dpdata.formats.rdkit.*. |
dpdata/plugins/3dmol.py |
Updates XYZ helper import path to dpdata.formats.xyz.xyz. |
dpdata/plugins/abacus.py |
Points ABACUS plugin imports to dpdata.formats.abacus.*. |
dpdata/plugins/amber.py |
Points Amber plugin imports to dpdata.formats.amber.*. |
dpdata/plugins/cp2k.py |
Points CP2K plugin imports to dpdata.formats.cp2k.output. |
dpdata/plugins/deepmd.py |
Points DeePMD plugin imports to dpdata.formats.deepmd.*. |
dpdata/plugins/dftbplus.py |
Points DFTB+ plugin imports to dpdata.formats.dftbplus.output. |
dpdata/plugins/fhi_aims.py |
Points FHI-aims plugin imports to dpdata.formats.fhi_aims.output. |
dpdata/plugins/gaussian.py |
Points Gaussian plugin imports to dpdata.formats.gaussian.* and updates doc references. |
dpdata/plugins/gromacs.py |
Points Gromacs plugin imports to dpdata.formats.gromacs.gro. |
dpdata/plugins/lammps.py |
Points LAMMPS plugin imports to dpdata.formats.lammps.*. |
dpdata/plugins/lmdb.py |
Registers LMDB format via dpdata.formats.lmdb.format.LMDBFormat. |
dpdata/plugins/openmx.py |
Points OpenMX plugin imports to dpdata.formats.openmx.* and dpdata.formats.md.pbc. |
dpdata/plugins/orca.py |
Points ORCA plugin imports to dpdata.formats.orca.output. |
dpdata/plugins/psi4.py |
Points Psi4 plugin imports to dpdata.formats.psi4.*. |
dpdata/plugins/pwmat.py |
Points PWMAT plugin imports to dpdata.formats.pwmat.*. |
dpdata/plugins/pymatgen.py |
Points pymatgen plugin imports to dpdata.formats.pymatgen.*. |
dpdata/plugins/qe.py |
Points QE plugin imports to dpdata.formats.qe.* and dpdata.formats.md.pbc. |
dpdata/plugins/rdkit.py |
Points RDKit plugin imports to dpdata.formats.rdkit.utils. |
dpdata/plugins/siesta.py |
Points SIESTA plugin imports to dpdata.formats.siesta.*. |
dpdata/plugins/vasp.py |
Points VASP plugin imports to dpdata.formats.vasp.*. |
dpdata/plugins/xyz.py |
Points XYZ plugin imports to dpdata.formats.xyz.*. |
dpdata/formats/__init__.py |
Introduces the dpdata.formats package marker. |
dpdata/formats/abacus/md.py |
Adds/relocates ABACUS MD reader under formats. |
dpdata/formats/abacus/relax.py |
Adds/relocates ABACUS relax reader under formats. |
dpdata/formats/abacus/scf.py |
Adjusts imports for ABACUS SCF under formats. |
dpdata/formats/abacus/stru.py |
Adjusts imports for ABACUS STRU under formats. |
dpdata/formats/amber/mask.py |
Adds/relocates Amber mask utilities under formats. |
dpdata/formats/amber/md.py |
Adjusts imports for Amber MD under formats. |
dpdata/formats/amber/sqm.py |
Adds/relocates SQM parsing/input generation under formats. |
dpdata/formats/cp2k/cell.py |
Adds/relocates CP2K cell helper under formats. |
dpdata/formats/cp2k/output.py |
Adjusts imports for CP2K output reader under formats. |
dpdata/formats/deepmd/comp.py |
Adds/relocates deepmd/npy (“comp”) support under formats. |
dpdata/formats/deepmd/hdf5.py |
Adds/relocates deepmd/hdf5 support under formats. |
dpdata/formats/deepmd/mixed.py |
Adds/relocates deepmd mixed-type utilities under formats. |
dpdata/formats/deepmd/raw.py |
Adds/relocates deepmd/raw support under formats. |
dpdata/formats/dftbplus/output.py |
Adds/relocates DFTB+ output reader under formats. |
dpdata/formats/fhi_aims/output.py |
Adds/relocates FHI-aims output reader under formats. |
dpdata/formats/gaussian/fchk.py |
Adjusts relative imports for Gaussian fchk under formats. |
dpdata/formats/gaussian/gjf.py |
Adds/relocates Gaussian input generator/parser under formats. |
dpdata/formats/gaussian/log.py |
Adjusts relative imports for Gaussian log under formats. |
dpdata/formats/gromacs/gro.py |
Adjusts relative imports for Gromacs gro under formats. |
dpdata/formats/lammps/dump.py |
Adds/relocates LAMMPS dump parsing/writing under formats. |
dpdata/formats/lammps/lmp.py |
Adds/relocates LAMMPS data-file parsing/writing under formats. |
dpdata/formats/lmdb/format.py |
Adds/relocates LMDB format implementation under formats. |
dpdata/formats/md/msd.py |
Adds/relocates MSD implementation under formats. |
dpdata/formats/md/pbc.py |
Adds/relocates PBC utilities under formats. |
dpdata/formats/md/rdf.py |
Adds/relocates RDF implementation under formats. |
dpdata/formats/md/water.py |
Adds/relocates water analysis utilities under formats. |
dpdata/formats/openmx/omx.py |
Adjusts relative imports for OpenMX under formats. |
dpdata/formats/orca/output.py |
Adds/relocates ORCA output reader under formats. |
dpdata/formats/psi4/input.py |
Adds/relocates Psi4 input writer under formats. |
dpdata/formats/psi4/output.py |
Adds/relocates Psi4 output reader under formats. |
dpdata/formats/pwmat/atomconfig.py |
Adjusts relative imports for PWMAT atomconfig under formats. |
dpdata/formats/pwmat/movement.py |
Adjusts relative imports for PWMAT movement under formats. |
dpdata/formats/pymatgen/molecule.py |
Adds/relocates pymatgen Molecule conversion under formats. |
dpdata/formats/pymatgen/structure.py |
Adds/relocates pymatgen Structure conversion under formats. |
dpdata/formats/qe/scf.py |
Adds/relocates QE SCF parsing under formats. |
dpdata/formats/qe/traj.py |
Fixes relative imports within QE traj under formats. |
dpdata/formats/rdkit/utils.py |
Adds/relocates RDKit helper utilities under formats. |
dpdata/formats/siesta/aiMD_output.py |
Adds/relocates SIESTA aiMD output reader under formats. |
dpdata/formats/siesta/output.py |
Adds/relocates SIESTA output reader under formats. |
dpdata/formats/vasp/outcar.py |
Adds/relocates VASP OUTCAR parsing under formats. |
dpdata/formats/vasp/poscar.py |
Adds/relocates VASP POSCAR parsing/writing under formats. |
dpdata/formats/vasp/xml.py |
Adds/relocates VASP XML parsing under formats. |
dpdata/formats/xyz/quip_gap_xyz.py |
Adds/relocates QUIP/GAP XYZ support under formats. |
dpdata/formats/xyz/xyz.py |
Adds/relocates basic XYZ conversions under formats. |
tests/context.py |
Updates import smoke-loading to dpdata.formats.*. |
tests/test_abacus_stru_dump.py |
Updates ABACUS test imports to dpdata.formats.abacus.*. |
tests/test_lammps_lmp_dump.py |
Updates LAMMPS test imports to dpdata.formats.lammps.*. |
tests/test_lammps_spin.py |
Updates LAMMPS test imports to dpdata.formats.lammps.*. |
tests/test_lmdb.py |
Updates LMDB test imports to dpdata.formats.lmdb.*. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
92a9266 to
e23e05e
Compare
e23e05e to
2482d02
Compare
2482d02 to
ae36482
Compare
|
conflicts should be resolved. |
This PR fixes the CI failures from #946 after moving implementation modules.
Changes:
dpdata.formats.dpdata.mdinstead ofdpdata.formats.md.dpdata.lammps/dpdata.vaspas top-level exports.dpdata.formats.dpdata.formats.cp2k.cell.cell_to_low_triangledpdata.formats.gaussian.gjf.detect_multiplicitydpdata.formats.qe.traj.convert_celldmdpdata.md.msd.msddpdata.md.water.*Local checks:
cd tests && uv run pytest test_msd.py test_water_ions.py -q→ 3 passeduv run pytest tests/test_cell_to_low_triangle.py tests/test_gaussian_driver.py::TestMakeGaussian::test_detect_multiplicity tests/test_qe_cp_traj.py::TestConverCellDim -q→ 8 passeduv run pyright→ 0 errors / 0 warningsgit grep -n "formats\.md\|from dpdata\.formats\.md\|dpdata\.formats\.md" -- . ':(exclude)tests/data'→ no matchesAuthored by OpenClaw (model: custom-chat-jinzhezeng-group/gpt-5.5)