Bugfix: deleted item re-appears upon next import of URLs #433

Closed
aayio opened this issue Aug 10, 2020 · 13 comments
Labels
size: easy, status: done (work is completed and released, or scheduled to be released in the next version), type: bug report

Comments

@aayio

aayio commented Aug 10, 2020

Thank you in advance for your help,
Sorry if this isn't experienced universally and it's just something I'm not doing right 😕

Describe the bug

Deleted item is re-imported upon the next import of (unrelated) URLs

Steps to reproduce

  1. Delete item from web UI by clicking on item timestamp > Delete
  2. Import new (unrelated) URLs in web UI
  3. New URLs import correctly, but the recently deleted item is also re-imported

Software versions

  • OS: Debian 10
  • ArchiveBox version: Docker c8e3aed
@cdvv7788
Contributor

I was able to reproduce the bug. @mauvity for now, as a workaround, you can select the items you want to delete from the list and click the delete button at the top right.

I will send a PR to fix the issue soon.

@pirate
Member

pirate commented Aug 10, 2020

@cdvv7788 the timestamp > delete version will be fixed automatically once we remove the json main index

don't bother fixing it for now, it would just add a bunch of workaround complexity for a problem that's going away soon anyway.

@cdvv7788
Contributor

Ok. Please leave this open so we don't forget to check back once we merge the index changes.

@cdvv7788 cdvv7788 added status: needs followup Work is stalled awaiting a follow-up from the original issue poster or ArchiveBox maintainers and removed won't fix labels Aug 10, 2020
@cdvv7788
Contributor

cdvv7788 commented Oct 7, 2020

@mauvity can you please check if the current version on master fixes it? We refactored the index internals.

@pirate
Member

pirate commented Oct 7, 2020

There is still a functional difference between the two ways:

  • Delete button = delete the index record and all the archived files
  • timestamp -> delete = delete only the index record without removing any archived files (they become orphans that will be re-imported on the next archivebox init)
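The difference between the two delete paths can be illustrated with a minimal sketch. It is not ArchiveBox's actual code: the index is modeled as a plain dict, each snapshot folder holds a hypothetical `url.txt`, and every function name here is an assumption made for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def delete_index_only(index: dict, timestamp: str) -> None:
    """Drop the index record but leave the snapshot folder on disk."""
    index.pop(timestamp, None)


def delete_with_files(index: dict, archive_dir: Path, timestamp: str) -> None:
    """Drop the index record and remove the snapshot folder as well."""
    index.pop(timestamp, None)
    shutil.rmtree(archive_dir / timestamp, ignore_errors=True)


def find_orphans(index: dict, archive_dir: Path) -> list:
    """Return folder names on disk that have no matching index record."""
    return sorted(
        p.name for p in archive_dir.iterdir()
        if p.is_dir() and p.name not in index
    )


def reimport_orphans(index: dict, archive_dir: Path) -> None:
    """Mimic an init-style pass that re-adds orphaned folders to the index."""
    for name in find_orphans(index, archive_dir):
        index[name] = {"url": (archive_dir / name / "url.txt").read_text()}


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        archive_dir = Path(tmp)
        index = {}
        # Two hypothetical snapshots, keyed by timestamp.
        for ts, url in [("1597000000", "https://example.com/a"),
                        ("1597000001", "https://example.com/b")]:
            (archive_dir / ts).mkdir()
            (archive_dir / ts / "url.txt").write_text(url)
            index[ts] = {"url": url}

        delete_index_only(index, "1597000000")               # folder stays behind
        delete_with_files(index, archive_dir, "1597000001")  # folder is removed

        orphans = find_orphans(index, archive_dir)
        reimport_orphans(index, archive_dir)
        return orphans, sorted(index)
```

In this sketch only the snapshot removed through the index-only path comes back after the re-import pass, which matches the behavior described in the report.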

@cdvv7788
Contributor

cdvv7788 commented Oct 7, 2020

Oh right, the delete functionality has not been touched in the refactor.

@cdvv7788
Contributor

cdvv7788 commented Oct 9, 2020

@pirate what should we do about this? Maybe add a confirmation and change both methods to remove the actual files? If the admin is a way to maintain the index, leaving orphaned folders may be unnecessary.

@pirate
Member

pirate commented Oct 10, 2020

I think removing the delete button from the snapshot admin detail page is enough for now. (Leave the delete button on the list page the way it is now).

@pirate
Member

pirate commented Dec 11, 2020

@cdvv7788 is this fixed in v0.5.0? If not, can we do that?

@pirate pirate added this to the v0.5.0 milestone Dec 11, 2020
@pirate pirate added status: wip Work is in-progress / has already been partially completed and removed status: needs followup Work is stalled awaiting a follow-up from the original issue poster or ArchiveBox maintainers labels Dec 11, 2020
@pirate pirate removed this from the v0.5.0 milestone Feb 1, 2021
@pirate pirate added the status: needs followup Work is stalled awaiting a follow-up from the original issue poster or ArchiveBox maintainers label Feb 1, 2021
@pirate
Member

pirate commented Apr 6, 2021

I'm pretty sure this was already fixed in v0.5.6. Comment back here if you're still seeing the issue and I'll reopen the ticket.

@pirate pirate closed this as completed Apr 6, 2021
@pirate pirate added size: easy status: done Work is completed and released (or scheduled to be released in the next version) and removed status: needs followup Work is stalled awaiting a follow-up from the original issue poster or ArchiveBox maintainers status: wip Work is in-progress / has already been partially completed labels Apr 6, 2021
@235

235 commented Jan 4, 2024

The bug is re-appearing in ArchiveBox v0.7.1. It is quite odd to see a new import full of entries that were deleted earlier.

I've just observed another bug, which could be related: a handful of deleted entries re-appeared at the top of the list with newer dates. These entries weren't indexed yet. I suspect the extractor already had them in the queue and inserted them back as it went through them.

cc: @pirate

@pirate
Member

pirate commented Jan 4, 2024

@235 Can you confirm this is happening when you delete an older completed Snapshot that does not have the same URL present in a later import?

Deleting does not prevent a URL from being re-added in the future, so if you deleted some Snapshots and then re-imported the same URLs later on, they will re-appear (as new Snapshot entries).

Deleting during an import is also totally broken/not advised. This is the downside of making all my import code immutable/idempotent (it overwrites entries entirely on changes instead of mutating them in-place). Because Snapshots are operated on in-memory, it rewrites the DB and disk entries several times from memory as it does work during the import process, and as long as a Snapshot is still in-memory being operated on, it doesn't notice when a user deletes the DB/disk entry out from underneath it.
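A rough sketch of why a delete during an import does not stick, under stated assumptions: the DB is modeled as a dict, the `Snapshot` class, `write_snapshot`, `run_import`, and the `on_step` hook are all hypothetical names, and the real import code is more involved than this.

```python
from dataclasses import dataclass, field


@dataclass
class Snapshot:
    url: str
    results: list = field(default_factory=list)


def write_snapshot(db: dict, snapshot: Snapshot) -> None:
    """Overwrite the stored record entirely from the in-memory object."""
    db[snapshot.url] = {"url": snapshot.url, "results": list(snapshot.results)}


def run_import(db: dict, url: str, extractors: list, on_step=None) -> None:
    """Run each extractor and rewrite the record from memory after every step."""
    snapshot = Snapshot(url)
    write_snapshot(db, snapshot)
    for step, name in enumerate(extractors):
        snapshot.results.append(name)   # work happens on the in-memory copy
        if on_step is not None:
            on_step(step, db)           # hook standing in for a concurrent user action
        write_snapshot(db, snapshot)    # full overwrite, never checks for deletion


def simulate_delete_during_import() -> dict:
    db = {}
    url = "https://example.com/page"

    def user_deletes(step, current_db):
        # The user deletes the record while the first extractor is running.
        if step == 0:
            current_db.pop(url, None)

    run_import(db, url, ["title", "wget"], on_step=user_deletes)
    return db
```

Calling `simulate_delete_during_import()` returns a DB that still contains the URL with both extractor results, because the next full overwrite recreates the deleted record from memory.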

@235

235 commented Jan 17, 2024

As discussed in the other ticket, this was a deletion DURING an import. We can ignore the report here and focus on the other ticket discussion. TY!

4 participants