Skip to content

Remove duplicate receivers by source and improve data integrity#37

Merged
xumi1993 merged 4 commits intomainfrom
devel
Mar 6, 2026
Merged

Remove duplicate receivers by source and improve data integrity#37
xumi1993 merged 4 commits intomainfrom
devel

Conversation

@xumi1993
Copy link
Member

@xumi1993 xumi1993 commented Mar 5, 2026

This pull request introduces a new method for removing duplicate receivers by source and refactors several selection and update methods in the SrcRec class to support this functionality. The changes improve data consistency and provide users with more control over how duplicates are handled (either by keeping the first occurrence or averaging values). Additionally, the update process is now more flexible, allowing parameters to be passed through various selection and modification methods.

Duplicate receiver management and update process:

  • Added a new method remove_duplicate_rec_by_src to the SrcRec class, which removes duplicate receivers for the same source in rec_points, rec_points_cs, and rec_points_cr DataFrames. Users can choose to keep only the first occurrence or average travel times and weights.
  • Modified the main update method to call remove_duplicate_rec_by_src and accept a mode parameter, enabling duplicate receiver handling during updates. [1] [2]

API consistency and extensibility:

  • Updated selection and modification methods (append, erase_duplicate_events, select_by_phase, select_by_datetime, remove_specified_recs, select_by_box_region, select_by_depth, select_by_distance, select_by_azi_gap, select_by_num_rec, select_one_event_in_each_subgrid, generate_double_difference) to accept **kwargs and pass them to update, allowing flexible duplicate receiver handling throughout the workflow. [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17]

Version update:

  • Bumped the package version from 0.2.11 to 0.2.12 to reflect these new features and improvements.

@codecov
Copy link

codecov bot commented Mar 5, 2026

Codecov Report

❌ Patch coverage is 83.78378% with 12 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
pytomoatt/src_rec.py 83.56% 12 Missing ⚠️

📢 Thoughts on this report? Let us know!

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds duplicate-receiver cleanup to the SrcRec workflow (optionally keeping first occurrence or averaging values) and threads duplicate-handling options through several selection/mutation methods, along with a package version bump.

Changes:

  • Added SrcRec.remove_duplicate_rec_by_src() to de-duplicate receiver rows per source across rec_points, rec_points_cs, and rec_points_cr.
  • Updated SrcRec.update() (and several mutating methods) to accept/pass duplicate-handling options during updates.
  • Bumped package version to 0.2.12.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 7 comments.

File Description
pytomoatt/src_rec.py Implements duplicate receiver removal, adds mode support to update(), and propagates update options through several APIs.
pytomoatt/_version.py Version bump for the release.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +562 to +568
def remove_duplicate_rec_by_src(self, mode="first"):
"""
Remove duplicate receivers for the same source.

:param mode: "first" to keep only the first occurrence, "mean" to average travel time.
:type mode: str
"""
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New behavior (remove_duplicate_rec_by_src, and update(mode=...) changing how duplicates are handled) isn’t covered by tests. The repo already has test/test_src_rec.py; adding a focused unit test with a small synthetic SrcRec where one source has duplicate receiver rows would lock in expected behavior for both mode='first' and mode='mean' (including tt/weight aggregation).

Copilot uses AI. Check for mistakes.
Comment on lines +583 to +589
else:
self.rec_points.drop_duplicates(
subset=subset, keep="first", inplace=True, ignore_index=True
)
self.rec_points.reset_index(drop=True, inplace=True)

after = len(self.rec_points)
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After dropping/aggregating duplicate receiver rows, the per-source rec_index values are no longer guaranteed to be contiguous 0..n-1 (they’re computed before this in reset_index). Several downstream operations (sorting, dd generation, etc.) treat rec_index as a stable per-source index, so it should be recomputed after duplicates are removed (or run reset_index() again after remove_duplicate_rec_by_src).

Copilot uses AI. Check for mistakes.
Comment on lines 883 to +888
self.update_unique_src_rec()
self.remove_rec_by_new_src()
self.remove_src_by_new_rec()
self.update_num_rec()
self.reset_index()
self.remove_duplicate_rec_by_src(mode=mode)
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update() calls update_unique_src_rec() before it mutates src_points/rec_points (removals, reindexing, and now duplicate receiver cleanup). As a result, self.sources/self.receivers can become stale after update() completes. Consider moving update_unique_src_rec() to the end of update() (after duplicate removal and any source/receiver filtering) so the unique tables reflect the final state, and align the implementation with the documented procedure order.

Copilot uses AI. Check for mistakes.
Comment on lines +888 to 891
self.remove_duplicate_rec_by_src(mode=mode)
self.remove_src_by_duplicate_event_id()
# sort by src_index
self.src_points.sort_values(by=["src_index"], inplace=True)
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If remove_src_by_duplicate_event_id() drops any rows from src_points, rec_points/rec_points_cs/rec_points_cr are not filtered afterward, so they can retain src_index references that no longer exist in src_points. Consider re-running the consistency filters (remove_rec_by_new_src / remove_src_by_new_rec) and reindexing after removing duplicate event IDs, or folding that logic into remove_src_by_duplicate_event_id.

Copilot uses AI. Check for mistakes.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@copilot open a new pull request to apply changes based on this feedback

Comment on lines 701 to 712
@@ -624,6 +708,7 @@ def append(self, sr):

:param sr: Another SrcRec object
:type sr: SrcRec
:param **kwargs: Additional keyword arguments for updating duplicate receivers
"""
Copy link

Copilot AI Mar 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using **kwargs here (and in other selection/mutation methods) obscures the public API: currently the only supported update option is mode, and any other key will raise a TypeError in update(). Consider accepting mode explicitly on these methods (or making update() accept **kwargs and validate/normalize keys) to keep the API self-documenting and to provide clearer errors for unsupported options.

Copilot uses AI. Check for mistakes.
xumi1993 and others added 2 commits March 4, 2026 23:34
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link

Copilot AI commented Mar 5, 2026

@xumi1993 I've opened a new pull request, #38, to work on those changes. Once the pull request is ready, I'll request review from you.

@xumi1993 xumi1993 merged commit eb347ed into main Mar 6, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants