fix(data): preserve global dipole/polarizability in raw_to_set.sh #5696

Merged
wanghan-iapcm merged 1 commit into deepmodeling:master from wanghan-iapcm:fix-raw-to-set-global-tensor
Jul 1, 2026

@wanghan-iapcm wanghan-iapcm commented Jul 1, 2026

Collaborator

Problem

data/raw/raw_to_set.sh splits global dipole.raw/polarizability.raw into per-set chunks (dipole.raw000, …), and the in-set conversion block expects dipole.raw/polarizability.raw to exist inside set.<pi>/. However, the per-set move block moved every split file except the global dipole.raw$pi/polarizability.raw$pi chunks. As a result, datasets carrying global dipole/polarizability labels silently lost them during conversion: dipole.npy/polarizability.npy were never generated, and the split chunks were left orphaned in the raw directory.

Fix

Add the two missing mv lines, placed to mirror the existing split/convert order.
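The split/move/convert asymmetry described above can be illustrated with a small model. This is only a sketch in Python, not the real shell script: `run_pipeline`, `summarize`, `SPLIT_LABELS`, and the `.converted` suffix (a stand-in for the `.npy` conversion) are hypothetical names introduced here, and the chunk naming simply follows the `dipole.raw000` pattern mentioned in the description.

```python
import tempfile
from pathlib import Path

# Labels that the modelled split and convert phases both handle (assumption:
# mirrors the four tensor labels named in the PR description).
SPLIT_LABELS = ["dipole", "polarizability", "atomic_dipole", "atomic_polarizability"]


def run_pipeline(raw_dir, lines_per_set, move_labels):
    """Toy model of split -> move -> convert; NOT the real raw_to_set.sh."""
    n_sets = 0
    # Phase 1: split every <label>.raw into <label>.raw000, <label>.raw001, ...
    for label in SPLIT_LABELS:
        src = raw_dir / f"{label}.raw"
        if not src.exists():
            continue
        lines = src.read_text().splitlines()
        chunks = [lines[i:i + lines_per_set] for i in range(0, len(lines), lines_per_set)]
        for k, chunk in enumerate(chunks):
            (raw_dir / f"{label}.raw{k:03d}").write_text("\n".join(chunk) + "\n")
        n_sets = max(n_sets, len(chunks))
    for k in range(n_sets):
        pi = f"{k:03d}"
        set_dir = raw_dir / f"set.{pi}"
        set_dir.mkdir(exist_ok=True)
        # Phase 2: move only the chunks whose label appears in move_labels.
        for label in move_labels:
            chunk = raw_dir / f"{label}.raw{pi}"
            if chunk.exists():
                chunk.rename(set_dir / f"{label}.raw")
        # Phase 3: convert whatever <label>.raw actually reached the set directory.
        for label in SPLIT_LABELS:
            raw = set_dir / f"{label}.raw"
            if raw.exists():
                # Stand-in for the .npy conversion: a plain rename.
                raw.rename(set_dir / f"{label}.converted")


def summarize(move_labels):
    """Run the model on four-line inputs; return (files in set.000, orphaned chunks)."""
    with tempfile.TemporaryDirectory() as tmp:
        raw_dir = Path(tmp)
        for label in SPLIT_LABELS:
            (raw_dir / f"{label}.raw").write_text("1\n2\n3\n4\n")
        run_pipeline(raw_dir, 2, move_labels)
        converted = sorted(p.name for p in (raw_dir / "set.000").iterdir())
        orphans = sorted(p.name for p in raw_dir.glob("*.raw[0-9]*"))
        return converted, orphans
```

Calling `summarize(["atomic_dipole", "atomic_polarizability"])` models the move block before the fix: the global chunks never reach `set.000` and stay behind as orphans. Calling `summarize(SPLIT_LABELS)` models the move list after the two missing entries are added.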

Test

The script previously had no test at all, which is why the omission survived. This PR adds source/tests/common/test_raw_to_set.py, a parametrized test that runs the script and asserts every split tensor label (dipole, polarizability, atomic_dipole, atomic_polarizability) is converted to set.<pi>/<label>.npy with round-tripped contents across multiple sets, and that no split chunks are orphaned in the raw dir. Verified failing-then-passing: on the unmodified script the dipole and polarizability cases fail (.npy not generated) while the atomic variants pass; after the fix all four pass.
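The two post-conditions the test asserts (every set has `<label>.npy`, and no split chunk is left in the raw directory) could be checked with helpers along these lines. This is a hedged sketch, not the code in test_raw_to_set.py: `leftover_chunks` and `missing_npy` are hypothetical names, and the chunk pattern (`<label>.raw` followed by digits) is inferred from the `dipole.raw000` example in the description.

```python
import re
from pathlib import Path


def leftover_chunks(raw_dir, label):
    """Return names of split chunks (<label>.raw + digits) still present in raw_dir."""
    # Anchored on both ends so that "dipole" does not match "atomic_dipole.raw000"
    # and the unsplit "<label>.raw" source file is not counted as a chunk.
    pattern = re.compile(rf"^{re.escape(label)}\.raw\d+$")
    return sorted(
        p.name for p in Path(raw_dir).iterdir() if p.is_file() and pattern.match(p.name)
    )


def missing_npy(raw_dir, label):
    """Return the set.* directories under raw_dir that lack <label>.npy."""
    return sorted(
        d.name
        for d in Path(raw_dir).glob("set.*")
        if d.is_dir() and not (d / f"{label}.npy").exists()
    )
```

Anchoring the pattern matters because the global and atomic labels share a suffix; an unanchored search for `dipole.raw` would also hit `atomic_dipole.raw000` and blur which label was orphaned.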

Known limitation

The test requires bash, split, and python on PATH (all present in CI).
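One way to make that dependency explicit is to probe PATH before running the script. The sketch below uses the standard-library `shutil.which`; `REQUIRED_TOOLS` and `missing_tools` are hypothetical names, and whether the actual test performs such a check is not stated in this PR.

```python
import shutil

# Tools the PR description says the test needs on PATH.
REQUIRED_TOOLS = ("bash", "split", "python")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

A test module could pass a non-empty result to `pytest.mark.skipif` so that machines without these tools report a skip with a clear reason instead of an opaque failure.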

Fix #5692

raw_to_set.sh splits global dipole.raw/polarizability.raw into per-set
chunks and the in-set conversion block expects dipole.raw/polarizability.raw
inside set.<pi>/, but the move block never moved those chunks in. Datasets
with global dipole/polarizability labels silently lost them: the .npy files
were never generated and the split chunks were orphaned in the raw dir.

Add the missing moves, mirroring the existing split/convert order, and add
a test covering the move/convert symmetry for every tensor label the script
splits (the script previously had no test at all).

Fix deepmodeling#5692
@dosubot dosubot Bot added the bug label Jul 1, 2026
@github-actions github-actions Bot added the Python label Jul 1, 2026
@wanghan-iapcm wanghan-iapcm requested a review from njzjz July 1, 2026 05:13

coderabbitai Bot commented Jul 1, 2026

Contributor


📝 Walkthrough


Updates raw_to_set.sh to move split global dipole.raw and polarizability.raw chunks into per-set directories, fixing a previously omitted move step. Adds a new pytest test module verifying move/convert symmetry for dipole, polarizability, atomic_dipole, and atomic_polarizability labels.

Changes

Preserve global dipole/polarizability handling in raw_to_set.sh

| Layer / File(s) | Summary |
| --- | --- |
| Move global tensor raw chunks into set directories (data/raw/raw_to_set.sh) | Adds conditional moves of dipole.raw$pi and polarizability.raw$pi into set.$pi/dipole.raw and set.$pi/polarizability.raw, matching existing handling for other optional raw files. |
| Regression test validating move/convert symmetry (source/tests/common/test_raw_to_set.py) | New parameterized test creates minimal raw inputs for dipole, polarizability, atomic_dipole, and atomic_polarizability labels, runs the script, and verifies generated per-set .npy files match original values with no leftover split raw chunks. |

Estimated code review effort: 2 (Simple) | ~10 minutes

Sequence Diagram(s)

sequenceDiagram
  participant Test as test_raw_to_set_preserves_tensor_labels
  participant TmpDir as tmp_path raw files
  participant Script as raw_to_set.sh
  participant SetDirs as set.$pi directories

  Test->>TmpDir: write box.raw, coord.raw, label.raw
  Test->>Script: run raw_to_set.sh with nline_per_set
  Script->>TmpDir: split label.raw into label.raw$pi chunks
  Script->>SetDirs: move label.raw$pi into set.$pi/label.raw
  Script->>SetDirs: convert set.$pi/label.raw to label.npy
  Test->>SetDirs: read label.npy per set
  Test->>Test: concatenate and compare to original values
  Test->>TmpDir: assert no leftover label.raw[0-9]* chunks

Related issues: #5692 (Preserve global dipole and polarizability in raw_to_set.sh)

Suggested labels: bug, test

Suggested reviewers: njzjz

🐰 A dipole once wandered, lost in the split,
Its chunks left behind in a raw-file abyss.
Now moved to its set, no longer forlorn,
A test stands on guard since the fix was reborn. 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly states the primary fix: preserving global dipole/polarizability handling in raw_to_set.sh. |
| Linked Issues check | ✅ Passed | The script now moves the missing dipole and polarizability split files into each set directory, matching the issue's fix. |
| Out of Scope Changes check | ✅ Passed | The changes are limited to the requested script fix and a targeted regression test. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Description Check | ✅ Passed | Check skipped: CodeRabbit’s high-level summary is enabled. |

@wanghan-iapcm wanghan-iapcm enabled auto-merge July 1, 2026 06:09
@wanghan-iapcm wanghan-iapcm requested review from njzjz and removed request for njzjz July 1, 2026 06:11

codecov Bot commented Jul 1, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.83%. Comparing base (0e5c170) to head (f216b9b).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5696      +/-   ##
==========================================
- Coverage   81.97%   81.83%   -0.14%     
==========================================
  Files         959      959              
  Lines      105748   105747       -1     
  Branches     4102     4105       +3     
==========================================
- Hits        86684    86541     -143     
- Misses      17573    17711     +138     
- Partials     1491     1495       +4     
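The percentages in the diff can be reproduced from the row counts, since Hits + Misses + Partials equals Lines in both columns. The sketch below assumes the displayed figure is truncated (not rounded) to two decimals; that assumption is inferred only from the fact that it reproduces the numbers shown here, and `coverage_percent` is a hypothetical helper name.

```python
def coverage_percent(hits, misses, partials):
    """Coverage as hits / (hits + misses + partials), truncated to two decimals."""
    lines = hits + misses + partials
    # Integer arithmetic avoids floating-point surprises before truncation.
    return (hits * 10000 // lines) / 100


base = coverage_percent(86684, 17573, 1491)   # master column
head = coverage_percent(86541, 17711, 1495)   # PR #5696 column
delta = round(head - base, 2)
```

With truncation, `base`, `head`, and `delta` agree with the 81.97%, 81.83%, and -0.14% shown in the report.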


@wanghan-iapcm wanghan-iapcm added this pull request to the merge queue Jul 1, 2026
Merged via the queue into deepmodeling:master with commit 44e5007 Jul 1, 2026
60 checks passed
@wanghan-iapcm wanghan-iapcm deleted the fix-raw-to-set-global-tensor branch July 1, 2026 11:48

Development

Successfully merging this pull request may close these issues.

[Code scan] Preserve global dipole and polarizability in raw_to_set.sh
