Create api for syncing with database #273

Closed
jenny-codes wants to merge 1 commit into jennyshih/index-new-and-updated from jennyshih/remove-deleted-uris

jenny-codes (Contributor) commented Oct 28, 2025

Add an API that removes deleted and stale URIs, along with their associated entities, from the database, and inserts the new and updated entries.

See #210 for more context.
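
As a rough illustration of the sync described above, the sketch below splits URIs into those to delete (deleted or stale) and those to insert (new or updated). It is not the code in this PR: the function name `plan_sync`, the `SyncPlan` type, and the use of a content hash to detect stale entries are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class SyncPlan:
    """URIs to remove from the database and URIs to (re)insert."""
    to_delete: Set[str]
    to_insert: Set[str]


def plan_sync(stored: Dict[str, str], current: Dict[str, str]) -> SyncPlan:
    """Compare stored {uri: hash} against the current {uri: hash}.

    Hypothetical helper: comparing content hashes is an assumption,
    the PR does not say how stale entries are detected.
    """
    # In the database but no longer present in the current collection.
    deleted = stored.keys() - current.keys()
    # Present in both, but the content changed since it was stored.
    stale = {
        uri for uri in stored.keys() & current.keys()
        if stored[uri] != current[uri]
    }
    # Present now, never stored before.
    new = current.keys() - stored.keys()
    # Stale rows are removed and then re-inserted with fresh data.
    return SyncPlan(to_delete=deleted | stale, to_insert=new | stale)
```

Passing an empty `current` mapping puts every stored URI in `to_delete`, which matches the benchmark method of indexing an empty collection after a full index.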

For the deletion we use a batch delete of the form delete from ... where id in ..., which is a bit more complicated than deleting one row at a time but proves to be much faster. Calling this out because we have found that batch mutation queries do not always improve performance in other cases.
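
A minimal sketch of the batch delete pattern, assuming SQLite and a made-up two-table schema (`documents` and `entities`); the table names, column names, `create_schema`, `delete_documents`, and the batch size are illustrative and not taken from this PR. Ids are deleted in chunks because databases generally cap the number of bound parameters per statement.

```python
import sqlite3
from typing import Iterable, Iterator, List


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema used only for this example.
    conn.executescript(
        """
        CREATE TABLE documents (id INTEGER PRIMARY KEY, uri TEXT NOT NULL);
        CREATE TABLE entities (
            id INTEGER PRIMARY KEY,
            document_id INTEGER NOT NULL,
            name TEXT NOT NULL
        );
        """
    )


def chunked(ids: List[int], size: int) -> Iterator[List[int]]:
    # Yield consecutive slices of at most `size` ids.
    for start in range(0, len(ids), size):
        yield ids[start:start + size]


def delete_documents(
    conn: sqlite3.Connection, doc_ids: Iterable[int], batch_size: int = 500
) -> int:
    """Delete documents and their entities in batches; return documents deleted."""
    ids = list(doc_ids)
    deleted = 0
    with conn:  # commit once at the end, roll back on error
        for batch in chunked(ids, batch_size):
            placeholders = ",".join("?" * len(batch))
            # Remove dependent rows first, then the documents themselves.
            conn.execute(
                f"DELETE FROM entities WHERE document_id IN ({placeholders})",
                batch,
            )
            cursor = conn.execute(
                f"DELETE FROM documents WHERE id IN ({placeholders})",
                batch,
            )
            deleted += cursor.rowcount
    return deleted
```

Compared with issuing one statement per id, this runs two statements per batch, which is the kind of saving the benchmark below measures.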

Benchmark

Benchmarked on Shopify core. Method: run the indexer once, then index an empty collection of documents, so that the second run has to delete everything the first run stored.

Before (no deletion, no re-indexing unchanged documents)

Timing breakdown
  Initialization      0.001s (  0.0%)
  Listing             3.331s ( 57.1%)
  Indexing            2.505s ( 42.9%)
  Querying            0.000s (  0.0%)
  Database            0.000s (  0.0%)
  Cleanup             0.000s (  0.0%)
  Total:              5.838s

After (enabling deleting, without batch delete)

The first attempt deleted the URIs one by one, following the pattern used so far, but it proved extremely slow: about 37 minutes to delete the whole Shopify core index data. Even on the medium corpus of 1000 files, the deletion took 9 seconds to complete.

Timing breakdown
  Initialization      0.001s (  0.0%)
  Listing             1.435s (  0.1%)
  Indexing         2259.725s ( 99.9%)
  Querying            0.000s (  0.0%)
  Database            0.000s (  0.0%)
  Cleanup             0.000s (  0.0%)
  Total:           2261.162s

After (enabling deleting, with batch delete)

The second attempt (the current implementation) uses bulk deletion, which is much faster. Deleting all data from Shopify core now takes about 10 seconds (9.992s versus 2259.725s, roughly 226 times faster).

Timing breakdown
  Initialization      0.002s (  0.0%)
  Indexing            9.992s (100.0%)
  Querying            0.000s (  0.0%)
  Database            0.000s (  0.0%)
  Cleanup             0.000s (  0.0%)
  Total:              9.994s


jenny-codes commented Oct 28, 2025

Warning: This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.

This stack of pull requests is managed by Graphite.

@jenny-codes changed the title from "Add bulk delete uri method" to "Remove data in cache no longer in the filesystem" Oct 28, 2025
@jenny-codes jenny-codes requested a review from a team October 28, 2025 15:37
@jenny-codes jenny-codes marked this pull request as ready for review October 28, 2025 15:37
@jenny-codes jenny-codes marked this pull request as draft October 29, 2025 18:35
@jenny-codes jenny-codes changed the base branch from jennyshih/index-new-and-updated to graphite-base/273 October 29, 2025 23:18
@jenny-codes changed the title from "Remove data in cache no longer in the filesystem" to "Create api for syncing with database" Oct 30, 2025
@jenny-codes jenny-codes changed the base branch from graphite-base/273 to jennyshih/index-new-and-updated October 30, 2025 15:09
@jenny-codes jenny-codes marked this pull request as ready for review October 30, 2025 15:09