
Fix Sweeper truncating entire layer#147

Merged
jacobdadams merged 11 commits into main from del-bug on Mar 23, 2026

Conversation

@jacobdadams
Member

As per #146, sweeper was nuking all the rows in the pre1978 layer, not just the duplicates. This may have been caused by the extremely long SQL `IN` clause in the delete's where statement (https://learn.microsoft.com/en-us/sql/t-sql/language-elements/in-transact-sql?view=sql-server-ver17&redirectedfrom=MSDN#remarks).

This PR fixes #146

This PR chunks the delete OID list into smaller lists to hopefully keep each generated `IN` clause within what the query planner can handle.
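The chunking might look something like the following sketch (the function name and default size are illustrative, not the PR's actual code):

```python
def chunk_oids(oids, chunk_size=500):
    """Yield successive chunks of at most chunk_size OIDs.

    Splitting the OID list keeps each generated `IN (...)` clause small
    enough that SQL Server's query planner can build a plan for it.
    """
    for start in range(0, len(oids), chunk_size):
        yield oids[start : start + chunk_size]
```

For example, `list(chunk_oids([1, 2, 3, 4, 5], 2))` yields `[[1, 2], [3, 4], [5]]`.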

It also changes the arcpy delete behavior. Previously, it created a layer or table view containing only the records to be deleted and then deleted everything in that view.

Now, it creates a layer or table view of the entire layer, runs a selection using the where clause, and then calls delete, which honors that selection. This is how the docs do deletions (https://pro.arcgis.com/en/pro-app/latest/tool-reference/data-management/delete-features.htm).

The change of deletion methods may fix the full truncate issue because now the where clause is being run against the layer created. If the layer creation creates an in-memory copy, the where clause would be applied locally without touching SQL Server. However, I don't know if the layer creation pulls over a full copy or just creates a shallow reference.
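A minimal sketch of how each chunk might be turned into the where clause passed to `SelectLayerByAttribute` before the delete call; the `OBJECTID` field name is an assumption, and this helper is illustrative rather than the PR's actual code:

```python
def build_where_clause(oid_chunk, oid_field="OBJECTID"):
    """Build a where clause selecting one chunk of OIDs.

    The clause is run against the layer/table view via
    SelectLayerByAttribute, and the following DeleteFeatures/DeleteRows
    call then only touches the selected records.
    """
    return f"{oid_field} IN ({', '.join(str(oid) for oid in oid_chunk)})"
```

For example, `build_where_clause([10, 11, 12])` produces `OBJECTID IN (10, 11, 12)`, which is small enough per chunk to avoid the query-plan failure quoted in the SQL Server log below.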

@codecov

codecov bot commented Mar 20, 2026

Codecov Report

❌ Patch coverage is 98.14815% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 36.43%. Comparing base (69ba690) to head (5b4e75c).
⚠️ Report is 30 commits behind head on main.

Files with missing lines Patch % Lines
src/sweeper/sweepers/duplicates.py 98.14% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #147       +/-   ##
===========================================
+ Coverage   22.38%   36.43%   +14.04%     
===========================================
  Files          13       13               
  Lines         746      785       +39     
  Branches      132      139        +7     
===========================================
+ Hits          167      286      +119     
+ Misses        574      492       -82     
- Partials        5        7        +2     


@jacobdadams
Member Author

Right now the chunk_size is hardcoded in DuplicateTest._try_fix(). I'm not sure of the best way to set this with a default value given the optional nature of the config.json config file.
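One common pattern for this is a module-level default that the optional config file can override; a sketch, where the key name and default value are hypothetical and not taken from the PR:

```python
import json
from pathlib import Path

DEFAULT_CHUNK_SIZE = 500  # hypothetical default, not the PR's actual value


def get_chunk_size(config_path):
    """Read chunk_size from an optional config.json.

    Falls back to the module default when the file or the key is absent,
    so a missing config.json stays valid.
    """
    path = Path(config_path)
    if not path.exists():
        return DEFAULT_CHUNK_SIZE
    config = json.loads(path.read_text())
    return config.get("chunk_size", DEFAULT_CHUNK_SIZE)
```

This keeps the hardcoded value in one named constant while still honoring config.json when it exists.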

Contributor

Copilot AI left a comment


Pull request overview

This PR addresses an issue where the duplicates sweeper could delete/truncate more rows than intended (notably in the pre1978 layer) by chunking large OID-based deletes and switching to an ArcPy “select then delete” workflow that should respect layer/table-view selections.

Changes:

  • Chunk duplicate OID deletions into smaller batches and delete via SelectLayerByAttribute + DeleteFeatures/DeleteRows.
  • Add unit tests covering chunking and try_fix() behavior (and some sweep() behaviors).
  • Update VS Code debug launch config and add a repository Copilot instructions document.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 7 comments.

File Description
src/sweeper/sweepers/duplicates.py Implements batched deletion + selection-based delete helpers to avoid overly-large SQL clauses and reduce risk of unintended full deletes.
tests/test_duplicates.py Adds tests for batching and deletion behavior using mocked ArcPy/xxhash.
.vscode/launch.json Updates debug configuration to run the duplicates sweeper with specific args.
.github/copilot-instructions.md Adds project contribution/style guidance for Copilot usage.


Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.



jacobdadams and others added 2 commits March 20, 2026 10:17
Contributor

Copilot AI commented Mar 20, 2026

@jacobdadams I've opened a new pull request, #148, to work on those changes. Once the pull request is ready, I'll request review from you.

…ates.py (#148)

* Initial plan

* fix(tests): resolve ruff F821 errors by importing DuplicateTest at module level

Co-authored-by: jacobdadams <38168030+jacobdadams@users.noreply.github.com>
Agent-Logs-Url: https://github.com/agrc/sweeper/sessions/ba6f55df-5b0c-43fd-a7dd-25596cf24014

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: jacobdadams <38168030+jacobdadams@users.noreply.github.com>
@jacobdadams jacobdadams requested a review from stdavis March 20, 2026 16:44
@jacobdadams
Member Author

Michael confirmed one of the errors showed up in the SQL Server logs:

Message
The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.
Date 3/18/2026 10:10:22 AM

The log doesn't specify the layer, but I think the time lines up with one of our manual tests.

@jacobdadams
Member Author

And I've run this against a local copy of the 1978 data with duplicates and it works as intended. Haven't tested it against anything in SDE yet.

Member

@stdavis stdavis left a comment


Thanks for taking this on! I think that your hypothesis about the cause of the issue seems very plausible especially given the database log from Michael.

Haven't tested it against anything in SDE yet.

I think that it would be good to test this against an SDE database before merging. I'm sure that Michael could help you find a test database to work with.

@jacobdadams
Member Author

I've tested this on a test layer in internal and it found the expected number of duplicates and deleted them appropriately. The layer had the expected number of remaining features afterwards. It did take a little longer (I assume from the overhead of setting and clearing the selection for each batch along with the overhead for multiple delete calls), but I think this is much more robust than before.

@jacobdadams jacobdadams requested a review from stdavis March 23, 2026 22:44
@jacobdadams jacobdadams merged commit 29dbfc1 into main Mar 23, 2026
7 checks passed
@jacobdadams jacobdadams deleted the del-bug branch March 23, 2026 23:21


Development

Successfully merging this pull request may close these issues.

Truncating SGID.Environment.DAQPre1978LeadInHomes

4 participants