@ThomasWaldmann
Member

No description provided.

@ThomasWaldmann ThomasWaldmann changed the title from "Soft delete / undelete" to "soft delete / undelete" Nov 2, 2024
@codecov

codecov bot commented Nov 2, 2024

Codecov Report

Attention: Patch coverage is 83.08824% with 23 lines in your changes missing coverage. Please review.

Project coverage is 81.75%. Comparing base (5fc7208) to head (142a739).
Report is 18 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/borg/archive.py | 57.14% | 7 Missing and 2 partials ⚠️ |
| src/borg/archiver/undelete_cmd.py | 84.61% | 5 Missing and 3 partials ⚠️ |
| src/borg/archiver/check_cmd.py | 0.00% | 1 Missing and 1 partial ⚠️ |
| src/borg/archiver/compact_cmd.py | 75.00% | 2 Missing ⚠️ |
| src/borg/manifest.py | 93.33% | 0 Missing and 2 partials ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #8515      +/-   ##
==========================================
+ Coverage   81.47%   81.75%   +0.28%     
==========================================
  Files          73       74       +1     
  Lines       13142    13240      +98     
  Branches     1927     1941      +14     
==========================================
+ Hits        10707    10824     +117     
+ Misses       1770     1753      -17     
+ Partials      665      663       -2     

Contributor

@PhrozenByte PhrozenByte left a comment

Looks very good 👍

I have some comments and suggestions about the docs, e.g. I recommend being very consistent in always calling them "soft-deleted archives".

Important: I neither did a code review, nor did I test any of the changes.

@PhrozenByte
Contributor

PhrozenByte commented Nov 3, 2024

What do you think about also updating the compact epilog to mention all the things users should know?

            Free repository space by deleting unused chunks.

            ``borg compact`` iterates over all chunks of data the repo consists
            of and checks whether they are used by any archive in the archives
            directory. All chunks that aren't referenced by any archive are
            then deleted from the filesystem to free disk space. This most
            notably includes all archives that were previously internally
            marked for deletion ("soft-deleted archives") by the delete and
            prune commands. Running the compact command consequently also
            prevents ``borg undelete`` from restoring any soft-deleted
            archives. ``borg compact`` also removes unneeded junk data from
            previous backups, e.g. data of aborted backups and of files that
            had to be skipped due to I/O errors or other issues that arose
            while creating previous backups.

            You usually don't want to run ``borg compact`` after every write
            operation on the repository, but either regularly (e.g. once a
            month, possibly together with ``borg check``) or when disk space
            needs to be freed.

            **Important:** ``borg compact`` doesn't just remove data from
            intentionally deleted archives and junk data, but might in rare
            cases also delete data from archives that weren't intentionally
            deleted, but "lost" due to data corruption on a filesystem or
            hardware level. Such archives could potentially be restored with
            ``borg check --find-lost-archives [--repair]``, but note that
            finding such lost archives is a very time-consuming task one
            usually doesn't want to undertake unless there are signs of lost
            archives (e.g. when seeing fatal errors when creating backups or
            when archives are missing in ``borg list``).

            Unlike borg 1.x, borg2's compact needs the borg key
            if the repo is encrypted.
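
To make the first paragraph of the proposed epilog concrete, here is a minimal sketch of the "delete every chunk that no remaining archive references" idea. It is a toy model, not Borg's implementation: the `compact` function, the `chunks` dict and the `archives` dict are hypothetical names and layouts chosen purely for illustration.

```python
def compact(chunks, archives):
    """Toy model: drop every chunk that no remaining archive references.

    chunks:   dict mapping chunk id -> stored bytes (hypothetical layout)
    archives: dict mapping archive name -> list of chunk ids it references
    Returns the set of chunk ids that were removed.
    """
    # Mark: collect every chunk id that is still referenced by an archive.
    used = set()
    for chunk_ids in archives.values():
        used.update(chunk_ids)
    # Sweep: everything else is unreferenced and can be freed.
    unused = set(chunks) - used
    for chunk_id in unused:
        del chunks[chunk_id]
    return unused


# "c3" belonged to an archive that is no longer in the archives directory.
chunks = {"c1": b"a", "c2": b"b", "c3": b"c"}
archives = {"monday": ["c1", "c2"]}
removed = compact(chunks, archives)
```

In this model a soft-deleted archive is simply absent from `archives`, which is why its chunks are freed and why undeleting it afterwards cannot work.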

@ThomasWaldmann
Member Author

I thought about it:

I will remove all user docs of "soft-deleted" again from borg and only explain what's happening in borg undelete and/or borg compact command, where it is somehow relevant.

Mentioning it all over the place isn't necessarily helpful, explaining it everywhere makes the docs longer than necessary.

We are only interested in archive metadata objects here, thus for most repo objects
it is enough to read the repoobj's metadata and determine the object's type.

Only if it is the right type of object, we need to read the full object (metadata
and data).
Consider soft-deleted archives/ directory entries, but only create a new
archives/ directory entry if:
- there is no entry for that archive ID
- there is no soft-deleted entry for that archive ID either

Support running with or without --repair.

Without --repair, it can be used to detect such inconsistencies and return with rc != 0.

--repository-only contradicts --find-lost-archives.
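
The rules in these commit notes can be sketched as follows. This is only an illustration under stated assumptions: `find_lost_archives`, the `ARCHIVE` type tag, the dict/set data layout and the return code after a successful repair are all hypothetical and not taken from Borg's code.

```python
ARCHIVE = "archive"  # hypothetical type tag, not Borg's real constant


def find_lost_archives(repo_objects, active_ids, soft_deleted_ids, repair=False):
    """Toy model of the lost-archive scan.

    repo_objects:     dict of object id -> (type_tag, payload)
    active_ids:       set of archive ids with a normal directory entry
    soft_deleted_ids: set of archive ids with a soft-deleted entry
    Returns (lost_ids, return_code).
    """
    lost = []
    for obj_id, (type_tag, _payload) in repo_objects.items():
        # Cheap step: the type tag alone rules out most objects.
        if type_tag != ARCHIVE:
            continue
        # Only archive metadata objects would need a full read here.
        if obj_id in active_ids or obj_id in soft_deleted_ids:
            continue  # an entry (normal or soft-deleted) already exists
        lost.append(obj_id)
    if repair:
        active_ids.update(lost)  # recreate the missing directory entries
        return lost, 0  # assumption: a successful repair reports success
    # Without repair: only report; non-zero rc if anything is inconsistent.
    return lost, (1 if lost else 0)


repo_objects = {
    "a1": (ARCHIVE, b"meta-1"),
    "a2": (ARCHIVE, b"meta-2"),
    "a3": (ARCHIVE, b"meta-3"),
    "f1": ("file-chunk", b"data"),
}
active, soft_deleted = {"a1"}, {"a2"}
lost, rc = find_lost_archives(repo_objects, active, soft_deleted)
```

Note that the soft-deleted archive `a2` is deliberately not reported as lost, so a repair run would not bring it back into the normal archive list.
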
@ThomasWaldmann
Member Author

@PhrozenByte I did it a bit differently, but hopefully giving all the information that is useful to the user (and not just an implementation detail).

Contributor

@PhrozenByte PhrozenByte left a comment

> I will remove all user docs of "soft-deleted" again from borg and only explain what's happening in borg undelete and/or borg compact command, where it is somehow relevant.
>
> Mentioning it all over the place isn't necessarily helpful, explaining it everywhere makes the docs longer than necessary.

I understand that you want to shorten the docs and remove unnecessary detail, not that you object to documenting this distinction at all, right?

I totally agree with shortening the docs: reading over my suggestions again, they indeed were a bit excessive (I have that tendency 😆). However, there's a big issue with trying not to mention the distinction at all: in some places Borg must talk about them, because Borg is specifically dealing with them. Therefore the docs must deal with the distinction, and the current docs try to solve it in two ways: by either calling them "deleted archives", or by not mentioning them at all. Both are IMHO pretty problematic:

  • Borg shouldn't call them "deleted archives" - because they simply weren't deleted yet. This IMHO creates way more confusion, because, as a user, I know the common term "deleted" to mean "gone for good". This was fine before, because deletion actually meant deletion (check --repair was and is data recovery), but with undelete it no longer does.

    Take email software like Thunderbird with its default .mbox storage as a comparison: completely removing an email is called "Delete" in the UI, even though the email isn't actually removed from the .mbox file until the .mbox file is compacted separately. Thunderbird calls that action "Delete" nevertheless, because Thunderbird doesn't allow restoring that email. Alternatively, one can also "Trash" an email, meaning that the email isn't actually deleted, but just moved out of the user's sight. borg delete previously only did the first variant, but with undelete it rather does the latter - and therefore the term must change as well.

    People know what "delete" means, and if it unexpectedly means something else, it creates confusion - not for you and me, but new users, not knowing yet what Borg is doing there. Thus, instead of (i.e. not additionally, just instead of) calling them "deleted archives", better call them "soft-deleted archives": it's more precise, doesn't cause such confusion and gives a nice "heads up" that there must be an easy way to get them back (i.e. undelete).

  • I agree that the docs don't have to explain what "soft-deleted" means all over the place, especially if nothing special is happening. From a user's perspective one expects "archive" (i.e. without an adjective) to mean a normal/not-deleted archive, i.e. the archives one sees with e.g. list. Calling them "normal/not-deleted" implies a special status that isn't there, so I agree that it causes confusion to call them anything but just "archive". However, if Borg (also) considers soft-deleted archives, it should mention this - and only then. Otherwise one just can't properly assess from the docs what Borg will do (both "not understanding" and "understanding wrong").

    For example, in the check docs, "archive" meant both normal and soft-deleted archives in the main section, but only normal archives in the repair section (see line comment there). This sends the user down the wrong track within the main section, and prevents the user from knowing what will happen in the repair section, thus causing confusion. A very short and simple clarification is sufficient to completely fix that.

    If you just don't like the term "soft-deletion", I'd like to emphasize that there are alternatives. For example, many UIs use "trash" (delete) and "untrash" (undelete).

This naturally also means that one must explain what "soft-deleted archive" means at some point - and instead of excessively explaining it all over the place like I did in my previous suggestion, it's sufficient to just use the term "soft-delete" in prune and delete once respectively, together with the already existing mention of undelete. This makes it absolutely clear what "soft-deletion" means and allows users to understand it anywhere else, too. Simple and precise.

Comment on lines +129 to +130
metadata that doesn't match with an archive directory entry, it means that an
entry was lost.
Contributor

@PhrozenByte PhrozenByte Nov 6, 2024

Suggested change
metadata that doesn't match with an archive directory entry, it means that an
entry was lost.
metadata that doesn't match with an archive directory entry (including
soft-deleted archives), it means that an entry was lost.

IMHO it's particularly important to clarify this here, because without it I'd assume that --find-lost-archives only considers the archives shown with list, meaning that --find-lost-archives would also recover soft-deleted archives - which it intentionally doesn't.

As said in the main review comment the "normal/not-deleted archive" clarification from the previous suggestion isn't necessary.


if deleted:
filters_group.add_argument(
"--deleted", dest="deleted", action="store_true", help="consider only deleted archives."
Contributor

Suggested change
"--deleted", dest="deleted", action="store_true", help="consider only deleted archives."
"--deleted", dest="deleted", action="store_true", help="consider only soft-deleted archives."

Reopening, see main review comment.
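
For readers unfamiliar with how such a flag behaves, here is a self-contained sketch of a `store_true` option that switches a listing between normal and soft-deleted entries. Only the `add_argument(...)` call mirrors the snippet under review; the `toy-list` parser, the `select_archives` helper and the `(name, is_deleted)` tuples are hypothetical, and the "only one kind at a time" filtering is an assumption based on the help text.

```python
import argparse

parser = argparse.ArgumentParser(prog="toy-list")
filters_group = parser.add_argument_group("Archive filters")
filters_group.add_argument(
    "--deleted", dest="deleted", action="store_true", help="consider only soft-deleted archives."
)


def select_archives(entries, deleted):
    """Return names whose soft-deleted state matches the requested one.

    entries: list of (name, is_deleted) tuples (hypothetical layout)
    """
    return [name for name, is_deleted in entries if is_deleted == deleted]


entries = [("monday", False), ("tuesday", True)]
args = parser.parse_args(["--deleted"])
selected = select_archives(entries, args.deleted)
```

Without the flag, `args.deleted` defaults to `False`, so in this sketch only the normal archive `monday` would be selected.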

if self.manifest.archives.exists_id(archive_id, deleted=False):
logger.debug(f"We already have an archives directory entry for {name} {archive_id_hex}.")
elif self.manifest.archives.exists_id(archive_id, deleted=True):
logger.debug(f"We already have a deleted archives directory entry for {name} {archive_id_hex}.")
Contributor

Suggested change
logger.debug(f"We already have a deleted archives directory entry for {name} {archive_id_hex}.")
logger.debug(f"We already have a soft-deleted archives directory entry for {name} {archive_id_hex}.")

Reopening, see main review comment.

logger.warning(f"{len(self.reappeared_chunks)} previously missing objects re-appeared!" + run_repair)
set_ec(EXIT_WARNING)

logger.info("Cleaning archives directory from deleted archives...")
Contributor

Suggested change
logger.info("Cleaning archives directory from deleted archives...")
logger.info("Cleaning archives directory from soft-deleted archives...")

See main review comment.

Comment on lines +196 to +197
After compacting it is not possible anymore to use ``borg undelete`` to recover
previously deleted archives.
Contributor

Suggested change
After compacting it is not possible anymore to use ``borg undelete`` to recover
previously deleted archives.
After compacting it is no longer possible to use ``borg undelete`` to recover
soft-deleted archives.

See main review comment.

Comment on lines +185 to +187
- interrupted backups (maybe retry the backup first before running compact!)
- backup of source files that had an I/O error in the middle of their contents
and that were skipped due to this.
Contributor

Suggested change
- interrupted backups (maybe retry the backup first before running compact!)
- backup of source files that had an I/O error in the middle of their contents
and that were skipped due to this.
- interrupted backups (maybe retry the backup first before running compact)
- backup of source files that had an I/O error in the middle of their contents
and that were skipped due to this

Unify punctuation.

- interrupted backups (maybe retry the backup first before running compact!)
- backup of source files that had an I/O error in the middle of their contents
and that were skipped due to this.
- corruption of the repository (e.g. the archives directory having lost entries)
Contributor

Suggested change
- corruption of the repository (e.g. the archives directory having lost entries)
- corruption of the repository (e.g. the archives directory having lost
entries, see notes below)

@@ -64,8 +64,11 @@ def build_parser_delete(self, subparsers, common_parser, mid_common_parser):
"""
This command deletes archives from the repository.
Contributor

Suggested change
This command deletes archives from the repository.
This command soft-deletes archives from the repository.

See main review comment. It's really the only necessary change to delete and prune thanks to the later mention of undelete.

Comment on lines 215 to 216
The prune command prunes a repository by deleting all archives not matching
any of the specified retention options.
Contributor

Suggested change
The prune command prunes a repository by deleting all archives not matching
any of the specified retention options.
The prune command prunes a repository by soft-deleting all archives not
matching any of the specified retention options.

See main review comment. It's really the only necessary change to delete and prune thanks to the later mention of undelete.


Important: Undeleting archives is only possible before compacting.
Once ``borg compact`` has run, all disk space occupied only by the
deleted archives will be freed and undelete is not possible anymore.
Contributor

Suggested change
deleted archives will be freed and undelete is not possible anymore.
soft-deleted archives will be freed and undelete is not possible
anymore.

See main review comment.
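
The ordering constraint in that note (undelete works only until compact has run) can be shown with a small state model. This is a sketch only: `ToyArchivesDirectory` and its methods are hypothetical and say nothing about how Borg stores its archives directory.

```python
class ToyArchivesDirectory:
    """Hypothetical stand-in for a repository's archives directory."""

    def __init__(self, names):
        self.active = set(names)
        self.soft_deleted = set()

    def delete(self, name):
        # Soft-delete: hide the archive, keep everything needed to restore it.
        self.active.remove(name)
        self.soft_deleted.add(name)

    def undelete(self, name):
        if name not in self.soft_deleted:
            raise KeyError(f"{name} is not soft-deleted (already compacted?)")
        self.soft_deleted.remove(name)
        self.active.add(name)

    def compact(self):
        # Point of no return: soft-deleted archives are gone for good.
        self.soft_deleted.clear()


repo = ToyArchivesDirectory(["monday", "tuesday"])
repo.delete("monday")
repo.undelete("monday")  # still possible: compact has not run yet
repo.delete("tuesday")
repo.compact()
try:
    repo.undelete("tuesday")
    recovered = True
except KeyError:
    recovered = False
```

Here `monday` survives because it was undeleted before compacting, while `tuesday` can no longer be recovered afterwards.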

@ThomasWaldmann
Member Author

I suggest you apply all your suggestions in a separate PR. I already mentioned that applying suggestions via the github UI doesn't always work. Then it sometimes also causes black to fail and in the end it is tedious to process here.

@ThomasWaldmann ThomasWaldmann merged commit 2dffc60 into borgbackup:master Nov 6, 2024
@ThomasWaldmann ThomasWaldmann deleted the soft-delete branch November 6, 2024 14:44
PhrozenByte added a commit to PhrozenByte/borg that referenced this pull request Nov 6, 2024
PhrozenByte added a commit to PhrozenByte/borg that referenced this pull request Nov 6, 2024
PhrozenByte added a commit to PhrozenByte/borg that referenced this pull request Nov 6, 2024
PhrozenByte added a commit to PhrozenByte/borg that referenced this pull request Nov 8, 2024