Conversation
…elated methods for improved data integrity
Pull request overview
This PR adds duplicate-receiver cleanup to the SrcRec workflow (optionally keeping first occurrence or averaging values) and threads duplicate-handling options through several selection/mutation methods, along with a package version bump.
Changes:
- Added `SrcRec.remove_duplicate_rec_by_src()` to de-duplicate receiver rows per source across `rec_points`, `rec_points_cs`, and `rec_points_cr`.
- Updated `SrcRec.update()` (and several mutating methods) to accept/pass duplicate-handling options during updates.
- Bumped package version to `0.2.12`.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| pytomoatt/src_rec.py | Implements duplicate receiver removal, adds mode support to update(), and propagates update options through several APIs. |
| pytomoatt/_version.py | Version bump for the release. |
```python
def remove_duplicate_rec_by_src(self, mode="first"):
    """
    Remove duplicate receivers for the same source.

    :param mode: "first" to keep only the first occurrence, "mean" to average travel time.
    :type mode: str
    """
```
New behavior (remove_duplicate_rec_by_src, and update(mode=...) changing how duplicates are handled) isn’t covered by tests. The repo already has test/test_src_rec.py; adding a focused unit test with a small synthetic SrcRec where one source has duplicate receiver rows would lock in expected behavior for both mode='first' and mode='mean' (including tt/weight aggregation).
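The suggested test does not need the full I/O machinery: the core aggregation can be pinned down on a tiny synthetic frame. Below is a minimal pandas sketch of the expected behavior for both modes, assuming (as the PR's diff suggests) that duplicates are keyed by source and station and that `tt`/`weight` are the averaged columns; the frame construction and column names are illustrative, not the package's actual test fixtures.

```python
import pandas as pd

# Toy rec_points frame: source 0 lists station "STA1" twice with different tt.
rec = pd.DataFrame({
    "src_index": [0, 0, 0],
    "staname":   ["STA1", "STA1", "STA2"],
    "tt":        [1.0, 3.0, 5.0],
    "weight":    [1.0, 1.0, 1.0],
})
subset = ["src_index", "staname"]

# mode="first": keep only the first occurrence per (source, station) pair.
first = rec.drop_duplicates(subset=subset, keep="first", ignore_index=True)

# mode="mean": average tt/weight over the duplicate rows.
mean = rec.groupby(subset, as_index=False).agg({"tt": "mean", "weight": "mean"})

print(first["tt"].tolist())  # [1.0, 5.0]
print(mean["tt"].tolist())   # [2.0, 5.0]
```

A real test in `test/test_src_rec.py` would build a small `SrcRec` the same way and assert these lists after calling `remove_duplicate_rec_by_src(mode=...)`.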
```python
else:
    self.rec_points.drop_duplicates(
        subset=subset, keep="first", inplace=True, ignore_index=True
    )
self.rec_points.reset_index(drop=True, inplace=True)

after = len(self.rec_points)
```
After dropping/aggregating duplicate receiver rows, the per-source rec_index values are no longer guaranteed to be contiguous 0..n-1 (they’re computed before this in reset_index). Several downstream operations (sorting, dd generation, etc.) treat rec_index as a stable per-source index, so it should be recomputed after duplicates are removed (or run reset_index() again after remove_duplicate_rec_by_src).
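The recomputation the comment asks for is a one-liner with pandas. A hedged sketch, assuming `rec_index` is the per-source row counter the column names in the diff suggest (the frame here is illustrative):

```python
import pandas as pd

rec_points = pd.DataFrame({
    "src_index": [0, 0, 1, 1, 1],
    "rec_index": [0, 2, 0, 1, 3],  # gaps left behind by dropped duplicate rows
    "staname":   ["A", "C", "A", "B", "D"],
})

# Recompute a contiguous 0..n-1 rec_index within each source.
rec_points["rec_index"] = rec_points.groupby("src_index").cumcount()

print(rec_points["rec_index"].tolist())  # [0, 1, 0, 1, 2]
```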
```python
self.update_unique_src_rec()
self.remove_rec_by_new_src()
self.remove_src_by_new_rec()
self.update_num_rec()
self.reset_index()
self.remove_duplicate_rec_by_src(mode=mode)
```
update() calls update_unique_src_rec() before it mutates src_points/rec_points (removals, reindexing, and now duplicate receiver cleanup). As a result, self.sources/self.receivers can become stale after update() completes. Consider moving update_unique_src_rec() to the end of update() (after duplicate removal and any source/receiver filtering) so the unique tables reflect the final state, and align the implementation with the documented procedure order.
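The reordering the comment proposes can be checked with a small stub that records call order; the method names come from the diff above, but the stub itself is hypothetical:

```python
class FakeSrcRec:
    """Minimal stub that records every method call by name."""
    def __init__(self):
        self.calls = []
    def __getattr__(self, name):
        # Any missing attribute becomes a recorder for that method name.
        return lambda *args, **kwargs: self.calls.append(name)

def update(sr, mode="first"):
    # Suggested ordering: mutate/filter first, refresh unique tables last.
    sr.remove_rec_by_new_src()
    sr.remove_src_by_new_rec()
    sr.update_num_rec()
    sr.reset_index()
    sr.remove_duplicate_rec_by_src(mode=mode)
    sr.update_unique_src_rec()  # sources/receivers now reflect the final state

sr = FakeSrcRec()
update(sr)
print(sr.calls[-1])  # update_unique_src_rec
```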
```python
self.remove_duplicate_rec_by_src(mode=mode)
self.remove_src_by_duplicate_event_id()
# sort by src_index
self.src_points.sort_values(by=["src_index"], inplace=True)
```
If remove_src_by_duplicate_event_id() drops any rows from src_points, rec_points/rec_points_cs/rec_points_cr are not filtered afterward, so they can retain src_index references that no longer exist in src_points. Consider re-running the consistency filters (remove_rec_by_new_src / remove_src_by_new_rec) and reindexing after removing duplicate event IDs, or folding that logic into remove_src_by_duplicate_event_id.
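The dangling-reference filter the comment describes amounts to an `isin` pass over the receiver frames. A minimal pandas sketch with illustrative data (column names taken from the diff, frame contents invented):

```python
import pandas as pd

src_points = pd.DataFrame({"src_index": [0, 2]})  # src 1 dropped as duplicate event id
rec_points = pd.DataFrame({
    "src_index": [0, 1, 2],
    "staname":   ["A", "B", "C"],
})

# Drop receiver rows whose src_index no longer exists in src_points,
# then reindex so row labels stay contiguous.
rec_points = rec_points[rec_points["src_index"].isin(src_points["src_index"])]
rec_points = rec_points.reset_index(drop=True)

print(rec_points["staname"].tolist())  # ['A', 'C']
```

In the real class the same filter would also be applied to `rec_points_cs` and `rec_points_cr`.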
@copilot open a new pull request to apply changes based on this feedback
```diff
@@ -624,6 +708,7 @@ def append(self, sr):

         :param sr: Another SrcRec object
         :type sr: SrcRec
+        :param **kwargs: Additional keyword arguments for updating duplicate receivers
         """
```
Using **kwargs here (and in other selection/mutation methods) obscures the public API: currently the only supported update option is mode, and any other key will raise a TypeError in update(). Consider accepting mode explicitly on these methods (or making update() accept **kwargs and validate/normalize keys) to keep the API self-documenting and to provide clearer errors for unsupported options.
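The second option the comment mentions (keeping `**kwargs` but validating keys centrally) can be sketched as below; the helper name and allowed-key set are hypothetical, not the package's API:

```python
ALLOWED_UPDATE_KEYS = {"mode"}

def validate_update_options(**kwargs):
    """Normalize update options so unsupported keys fail with a clear message."""
    unknown = set(kwargs) - ALLOWED_UPDATE_KEYS
    if unknown:
        raise TypeError(f"update() got unexpected options: {sorted(unknown)}")
    return kwargs.get("mode", "first")

print(validate_update_options(mode="mean"))  # mean

try:
    validate_update_options(mod="mean")  # typo in the key is caught explicitly
except TypeError as err:
    print("TypeError:", err)
```

The alternative (accepting `mode` explicitly on each selection/mutation method) is simpler still and self-documenting, at the cost of repeating the parameter across many signatures.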
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This pull request introduces a new method for removing duplicate receivers by source and refactors several selection and update methods in the `SrcRec` class to support this functionality. The changes improve data consistency and give users control over how duplicates are handled (either by keeping the first occurrence or averaging values). Additionally, the update process is now more flexible, allowing parameters to be passed through various selection and modification methods.

Duplicate receiver management and update process:
- Added `remove_duplicate_rec_by_src` to the `SrcRec` class, which removes duplicate receivers for the same source in the `rec_points`, `rec_points_cs`, and `rec_points_cr` DataFrames. Users can choose to keep only the first occurrence or average travel times and weights.
- Refactored the `update` method to call `remove_duplicate_rec_by_src` and accept a `mode` parameter, enabling duplicate receiver handling during updates. [1] [2]

API consistency and extensibility:
- Updated selection and mutation methods (`append`, `erase_duplicate_events`, `select_by_phase`, `select_by_datetime`, `remove_specified_recs`, `select_by_box_region`, `select_by_depth`, `select_by_distance`, `select_by_azi_gap`, `select_by_num_rec`, `select_one_event_in_each_subgrid`, `generate_double_difference`) to accept `**kwargs` and pass them to `update`, allowing flexible duplicate receiver handling throughout the workflow. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17]

Version update:
- Bumped the package version from `0.2.11` to `0.2.12` to reflect these new features and improvements.