Discuss Goals #1

Open
ThomasWaldmann opened this Issue May 13, 2015 · 18 comments

10 participants
@ThomasWaldmann
Member

ThomasWaldmann commented May 13, 2015

Ideas about potential goals for Borg

Borg is a fork of Attic and it was done to allow some different approaches to development, goals and policy (for details about how and why the fork happened, see jborg/attic#217 and the attic mailing list):

Openness

  • Borg is intended to be an effort by "The Borg Collective"
    (see AUTHORS) and you can be assimilated into it, if you like.
  • Welcome feature requests, discuss their general usefulness.
  • Accept pull requests of good quality and coding style,
    give feedback on PRs that can't be accepted "as is".
  • Discuss things openly, don't work in the dark.

As simple as possible, but not simpler

  • Nobody likes tools that are too complicated, ...
  • ... but nobody likes everything to be fixed and inflexible either.
  • Do the usually right thing by default, but offer other options.
  • Accept the fact that the usually right defaults might be totally unfit
    for some users / use cases.

Compatibility - boon and bane of backup software

  • Don't break it accidentally / without good reason / without warning.
  • Break it if the above does not apply (needs more thought/discussion).
  • As the fork is "new software" from the perspective of a Borg user or
    a Borg packaging distribution, there is no past we need to stay compatible
    with - we have the chance to break compatibility and change everything
    that we think needs changing.
  • Over time, we'll have more users and incompatible changes get harder.
  • Avoid getting into the "compatible forever" trap - we should maybe not
    assure compatibility of development versions nor spanning major releases.
  • When used for long-term archiving, special considerations and care are required.
    E.g. a development snapshot of Borg might not be the right thing for this.
    Also, Borg exists to be able to change things. So if you don't like or can't
    live with changing software, don't use it.
@silvio

silvio commented May 13, 2015

  • Avoid getting into the "compatible forever" trap - we should maybe not assure compatibility of development versions nor spanning major releases.
  • When used for long-term archiving, special care might be required.

I think extracting files from an archive should always be possible, independent of the borgbackup version that created it. At least the latest bb version should be able to extract archives created by older versions, though not necessarily vice versa.
A conversion operation could be the solution (thx, @joolswills)

@joolswills

joolswills commented May 13, 2015

One thing I would like to see, is if/when the repository format changes to offer new features, the ability to convert in place existing repositories, rather than having to start again.

Agree with your points. Thanks for your efforts.

@ThomasWaldmann

Member

ThomasWaldmann commented May 13, 2015

Converters are an option to think about when going from one release to an (incompatible) newer release.

But, someone would have to write that code and it is a burden and slows down development. Also, converting a large amount of historical archives might be a very time and space consuming affair, thus maybe impractical even if you had a converter.

Thus, I am still proposing just breaking compat now and then. Someone who wants conservative long-term archiving might be better off using tar (or attic), not something that is being heavily developed. And the whole point of this fork is to accelerate development. :)

Also, of course nobody wants to write converters that convert between development snapshots and alpha/beta/rc releases.

@joolswills

joolswills commented May 13, 2015

agreed in regards to dev, but if a new feature is added such as a new compression method it would be nice to do things like recompress etc. Also things like decrypting/encrypting repositories after set up would be useful. Just ideas anyway!

@anarcat

Contributor

anarcat commented May 13, 2015

i really like the promise of attic to keep backwards compatibility forever for storage.

who knows when i will need to restore this old backup? do we need to make such changes now anyways?

maybe an alternative here would be to stabilise versions at some point and treat those things as "API changes". so we could have a X.0 release that will be backwards-compatible with X-1.0 and forward-compatible with X+1.0 so that you could migrate your repo by upgrading incrementally?
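The incremental-upgrade idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `can_read` and `upgrade_path` are hypothetical, and the rule "adjacent major versions can read each other's repos" is taken from the proposal, not from any existing attic or borg behaviour.

```python
def can_read(tool_major: int, repo_major: int) -> bool:
    """Hypothetical rule: release X.0 reads repos written by X-1, X and X+1."""
    return abs(tool_major - repo_major) <= 1


def upgrade_path(repo_major: int, target_major: int) -> list[int]:
    """Major versions to install, in order, to migrate a repo one step at a time."""
    if target_major < repo_major:
        raise ValueError("downgrades are not covered by this sketch")
    # Each step is adjacent to the previous one, so can_read() holds throughout.
    return list(range(repo_major + 1, target_major + 1))
```

Under this rule a repo written by version 1 cannot be opened directly by version 3, but it can be migrated by passing through version 2 first.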

@ThomasWaldmann

Member

ThomasWaldmann commented May 13, 2015

@anarcat I thought about it quite a lot. Keeping something compatible forever sounds great, but from a development and maintenance standpoint, it is a pain.

For example, look at flash player or windows - they try to be compatible forever (or at least over a very long time) and it results in an accumulation of a lot of crap code (there was a talk by FX at a CCC conference about it, that went into quite some detail). They basically rewrote that thing multiple times (for reasons), but always kept the old code also. Thus you have now not just the bugs in the latest code, but all remaining bugs in all the old versions, too. Windows still has broken time stdlib functions, because they were broken in the same way on DOS.

I am not saying attic is quality like flash player. :) We are lucky that the code base is quite good, but I found some places where it is too hardcoded or where maybe even a layer is missing and blocking the development of a needed feature. Maybe we will find more over time, that has to be seen.

Your idea with +/- 1 version compatibility is somehow similar to a converter, right?

@ThomasWaldmann

Member

ThomasWaldmann commented May 13, 2015

BTW, I pushed some code to the repo. It is not compatible with attic due to changes of the magic strings (like ATTIC_KEY, ATTICSEG, ATTICIDX). Other than that, it was mostly the content of the rather conservative "merge" branch + s/attic/borg/g. See CHANGES.txt for details.

@anarcat

Contributor

anarcat commented May 13, 2015

so i understand how hard it can be for windows to keep backwards compat with crap like DOS, or keep flash player stable. i think that's a different problem than what we are living here: as you said, attic is fairly well designed and implemented, and there are some tweaks we want to do, unless i misunderstand something deeper here.

a good example is changing the ATTIC strings: why would we do that at all, if not to gratuitously break backwards compatibility?

@aride

aride commented May 13, 2015

@ThomasWaldmann it's true that backward compatibility can eventually turn into an inconvenience. But please keep in mind this is backup software we're talking about. You can't treat it like some UI or some other self-contained software. Its data is designed to last an indeterminate amount of time. Otherwise, it is not a backup. Please name one backup software that doesn't treat backup format with extreme care. All formats I can think of are very long-lived, especially those that have been successful: think cpio, pax, tar... there may be format versions, but those are very few and still handled by recent versions. Proprietary backup solutions may show a bit more variability but still they can read old formats almost without exception.

Breaking compatibility with upstream attic is not a smart move, especially without a very good reason and a data migration path in place. To break it for some silly strings is simply absurd. This issue is called "Discuss Goals", so let's do that. What is the mission of backup software?: is it to have flexibility? is it to fix bugs fast? No, it is to keep data safe. Those things are nice, and we all want them, but they're not the software's mission. If backup software fails to keep data accessible, it fails as backup software, it's simply useless.

Besides, if so many changes are required to the backup format, then it is designed badly. Is the attic backup format design bad? Why? What are its shortcomings specifically?

Now, I can understand making no promises of data integrity or compatibility for development versions. That's just common sense. BUT development should strive from the start to stick to one and only one format (be it attic's or some variation on it). The format should be versioned, just in case there appears a real need to change it in the future, but if it's well designed it should support most features we might want. And if it doesn't, then either it wasn't well designed or we really should think hard whether such a feature is really needed. And at a minimum, read-only backward compatibility is a must.
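To make the "versioned format with read-only backward compatibility" point concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the magic value `DEMOSEG1`, the header layout and the two reader functions are hypothetical and do not describe the real attic or borg on-disk format.

```python
import struct

MAGIC = b"DEMOSEG1"             # hypothetical magic, NOT a real attic/borg value
HEADER = struct.Struct("<8sH")  # 8-byte magic + little-endian 16-bit format version


def write_header(version: int) -> bytes:
    """Build the header that precedes a payload written in the given format version."""
    return HEADER.pack(MAGIC, version)


def _read_v1(payload: bytes) -> str:
    return payload.decode("ascii")   # pretend v1 stored ASCII text


def _read_v2(payload: bytes) -> str:
    return payload.decode("utf-8")   # pretend v2 switched to UTF-8


# Old readers stay registered, so newer code can still read older data.
READERS = {1: _read_v1, 2: _read_v2}


def read_segment(blob: bytes) -> str:
    """Dispatch on the stored version; reject unknown magic or versions."""
    magic, version = HEADER.unpack_from(blob)
    if magic != MAGIC:
        raise ValueError("not a recognised segment")
    if version not in READERS:
        raise ValueError(f"unsupported format version {version}")
    return READERS[version](blob[HEADER.size:])
```

The cost being debated in this thread is visible here: every entry in `READERS` is code that has to be kept working for as long as old data exists.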

Just my opinion, of course.

@barsanuphe

barsanuphe commented May 13, 2015

It's a fork. If changes are necessary, now is the time, especially if those changes make borg more resilient to future changes.
Also, the idea of backup software being able to keep files and format "forever" is a little ambitious (for borg or attic). Recently a msgpack bug was discovered: attic is not standalone, its very format is dependent on third party libraries. If those evolve or are abandoned, you will need to do something about your repositories.
But I agree read-only backward compatibility is a minimum; the ability to update the format (or change compression level/encryption) would definitely come in handy.

@aride

aride commented May 13, 2015

@barsanuphe I didn't say "forever", I said an indeterminate amount of time. Meaning, "long enough". Is the time between borg releases long enough? I don't think so, not by a long shot. Tar files have lasted decades, I don't see a reason not to strive to reach that kind of quality. Users expect that kind of quality from backup software, not having to reencode archives at any upgrade.

You say "if changes are necessary", that's precisely my point. That necessity needs to be spelled out very clearly, for very good reasons, before format changes can be considered. And they haven't. I agree that now is the right time to discuss those needs. The sooner the format is established, the sooner borg development can begin.

@Ernest0x

Contributor

Ernest0x commented May 14, 2015

Converters are an option to think about when going from one release to an (incompatible) newer release.

But, someone would have to write that code and it is a burden and slows down development.

Maybe, but it must be done.

Also, converting a large amount of historical archives might be a very time and space consuming affair, thus maybe impractical even if you had a converter.

Maybe or maybe not. You must provide the conversion functionality and let the user decide. He may choose to prune archives before converting and keep only a small subset of them. Or he may have the time and space and wants Everything.

@maltefiala

maltefiala commented May 14, 2015

Like anarcat and aride I too believe this fork should improve attic in a sensible way without breaking too many things at once. Sure, nobody knows what bugs will be found in the future. However, I really don't see why changes in 3rd party libraries should break compatibility with attic as attic needs to be upgraded as well in such a case.

An example would be the readability of attic's code. Variable names like "t0" and "st" are nice to write but don't make it very readable and break PEP 0008 as they clearly aren't words:

lowercase with words separated by underscores as necessary to improve readability.

So should we change those variable names to something better to benefit readability or should we stay with them to benefit code compatibility? I would vote for the latter at the moment.

@level323

level323 commented May 14, 2015

Concerning attic and backward compatibility with older repo formats: I have an idea. It may completely suck... but here it is FWIW. Flick your patience switch to the ON position, because it involves/requires modularising the code and I need to discuss that first. Bear with me - the description may be long but the concept will (IMO) result in a quite neat, tidy, more functional and more extensible tool.

So, what I'm thinking is that there seems to be a pretty clear boundary line where the code can be modularised, as described below:

  1. The borg_core module. This is the 'engine room'. It is the only module that actually works on and touches backup repos. Its functionality is init, create, extract, check, delete, list, prune, info, change-passphrase - but these are only internal API's to the 'core' and not user facing... other modules wrap around borg_core to provide filesystem abstraction and user facing commands as described further below. However, under this modularised approach this 'core' only communicates file content and metadata via filesystem-agnostic data structures, with the bare minimum knowledge needed to carry out the above functions. The data structure is a list of one or more of what I'll call 'file packages' (FP's). FP's are relatively 'future proof' data structures (e.g. leveraging msgpack/protocol buffers/whatever) that contain the content borg_core needs to get its job done (e.g. file content, file name, perhaps/probably file content checksum) and also provide for arbitrary additional content that can be used for filesystem- and/or OS-specific data (e.g. xattrs, ACL's, whatever) that borg_core just stores and retrieves from the repo but doesn't need to use directly or understand in any specific way.
  2. One or more filesystem-specific (or even OS-specific) interface modules (or perhaps we could call them "filesystem shims"...I dunno). These modules wrap around borg_core to provide filesystem-specific behaviour. For the sake of providing a concrete example, consider a module which I will give the name borg_extfs_shim, as it is designed to make borg work on ext3/4 filesystems:
    • In the case of init it simply passes through to the 'init' method of the 'core' module.
    • In the case of create it handles the filesystem scanning (exclude globs/regexps and special fs-specific stuff). The module reads the files to be backed up and packages them into a list/stream of FP's. It streams this list of FP's to borg_core.create to be pushed into the repo. Options to the borg_extfs_shim.create method can specify how much/little of the metadata (perms, xattrs, ACLs, whatever) gets stored in the metadata portion of each FP and consequently stored in the attic repo via borg_core.
    • In the case of extract, the approximate reverse of create occurs. In the most simple case borg_core spews a list/stream of FP's back to borg_extfs_shim, which unpacks the file content and metadata and writes the described files to the filesystem using its special-sauce knowledge of (in this case) the ext3/4 filesystem. There are more complex cases (e.g. partial extract of only certain files) that I won't go into for the sake of brevity as this post is already very long.
    • In the case of check, it could be as little as passing straight through to borg_core's own check method, but more feature-rich code could also be created to do certain checks on metadata on files in the archive if deemed worthwhile/necessary.
    • In the case of list, this module receives from borg_core.list a stream/list of FP's metadata only (no file content). This module then interprets the metadata and pretty-prints to stdout a detailed list of files and any metadata deemed relevant that is in the specified archive. In other words, it's very much like extract but only crunches metadata, not file content.
    • In the case of delete and prune, info and change-passphrase, these would be passed straight through to their counterpart methods in borg_core
  3. The borg module. This is the user facing module - the 'front end'. But there could be others made if a new use case warranted. By default, it automatically determines which filesystem shim module will be used to interact with borg_core, but there could be a command-line switch to force a specific shim to be used. For example:
    • In the case of borg create myrepo::my-archive ~/Documents, the borg module determines that the filesystem being read is ext4, so engages borg_extfs_shim.create(repo="myrepo", archive="my-archive", source="~/Documents") etc.
    • Hopefully you get the drift.
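The layering described in the list above can be sketched roughly as follows. This is a toy illustration, not attic or borg code: `FilePackage`, `BorgCore` and `ExtfsShim` are hypothetical names modelled on the FP, borg_core and borg_extfs_shim of the proposal, the "repo" is just an in-memory dict, and only the permission bits are carried as opaque metadata.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Iterable, Iterator, List


@dataclass
class FilePackage:
    """Filesystem-agnostic unit passed between shims and the core."""
    name: str
    content: bytes
    extra: Dict[str, bytes] = field(default_factory=dict)  # opaque to the core


class BorgCore:
    """Toy stand-in for borg_core: stores FPs per archive, in memory only."""

    def __init__(self) -> None:
        self._archives: Dict[str, List[FilePackage]] = {}

    def create(self, archive: str, packages: Iterable[FilePackage]) -> None:
        self._archives[archive] = list(packages)

    def extract(self, archive: str) -> Iterator[FilePackage]:
        yield from self._archives[archive]


class ExtfsShim:
    """Toy stand-in for borg_extfs_shim: turns files into FPs and back."""

    def __init__(self, core: BorgCore) -> None:
        self.core = core

    def create(self, archive: str, source: Path) -> None:
        def scan() -> Iterator[FilePackage]:
            for path in sorted(p for p in source.rglob("*") if p.is_file()):
                mode = str(path.stat().st_mode).encode()
                name = path.relative_to(source).as_posix()
                yield FilePackage(name, path.read_bytes(), {"mode": mode})
        self.core.create(archive, scan())

    def extract(self, archive: str, dest: Path) -> None:
        for fp in self.core.extract(archive):
            target = dest / fp.name
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_bytes(fp.content)
            # Only the shim interprets the metadata; the core just stored it.
            target.chmod(int(fp.extra["mode"]) & 0o777)
```

Note that `BorgCore` never looks inside `extra`. That is the property the proposal relies on: new filesystem metadata can be added in a shim without touching the core.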

This design opens up numerous possibilities which both improve modularisation of the code AND could make the issue of repo backward compatibility a much easier goal to achieve.

Concerning modularisation, consider now a module borg_fuse_shim, which only implements borg mount. This is more neat/tidy/modularised, no? Fuse mounting is a great feature, but should not really be a part of the 'core' of borg.

Concerning modularisation once more, consider now a module borg_stdin_shim, which only implements borg create with the specific function of accepting stream on stdin and presenting it as a single FP to borg_core. Nice, neat solution that moves this feature, which is nice but not critical/central, out of the core functionality of borg.

Concerning repo backward compatibility, this new modularised approach brings the goal of repo backward compatibility much closer, for two reasons:

  1. borg_core is now (almost) entirely file metadata-agnostic. This ensures that there will be minimal need, in future, to change the repo data structures concerning the metadata of archived file content for the foreseeable future. Admittedly, however, it has no impact on backward-compatibility of on-disk repo format. Want to support a new feature in a specific filesystem (e.g. NTFS, xfs, reiser, btrfs, NFS, Amazon S3) in future? No problem! Just write a new shim or expand an existing one, leaving borg_core untouched.
  2. A specific shim can be written to output an entire repo as a stream (e.g. to stdout) in a well defined format. That format could be as simple as a serialised (msgpacked) dict where the key is the archive name and the value is the list of FP's (exactly the list/stream that the shims use to pass data back and forth with borg_core). Combined with a method to read an entire repo as a stream (e.g. from stdin) and you now have a mechanism for upgrading from one repo format to the next (and downgrading, for that matter). This might be in the form of the following piped command attic_v1.53 streamout my-old-repo | attic_2.01 streamin my-new-repo. All that is required to achieve this is the necessary disk space and adequate cups of coffee.
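The streamout/streamin idea in the second point can be sketched like this. It is a minimal illustration under assumptions: `stream_out` and `stream_in` are hypothetical names, the repo is modelled as a plain dict of archive name to (file name, content) pairs, and JSON lines with base64 from the standard library are swapped in for the msgpack serialisation suggested above so that the example stays self-contained.

```python
import base64
import json
from typing import IO, Dict, List, Tuple

Package = Tuple[str, bytes]  # (file name, file content); metadata omitted for brevity


def stream_out(repo: Dict[str, List[Package]], out: IO[str]) -> None:
    """Write every file of every archive as one JSON record per line."""
    for archive, packages in repo.items():
        for name, content in packages:
            record = {
                "archive": archive,
                "name": name,
                "content": base64.b64encode(content).decode("ascii"),
            }
            out.write(json.dumps(record) + "\n")


def stream_in(inp: IO[str]) -> Dict[str, List[Package]]:
    """Rebuild the archive -> packages mapping from a stream of records."""
    repo: Dict[str, List[Package]] = {}
    for line in inp:
        record = json.loads(line)
        content = base64.b64decode(record["content"])
        repo.setdefault(record["archive"], []).append((record["name"], content))
    return repo
```

Because the stream format is independent of the on-disk repo layout, an old tool only needs to keep `stream_out` working and a new tool only needs `stream_in`. One limitation of this particular sketch is that archives containing no files produce no records and so are not recreated.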

Sorry for the enormous post. Hopefully the idea doesn't suck. If it does, sorry for giving my readers eyestrain for no good reason.... ;-)

@anarcat

Contributor

anarcat commented May 15, 2015

[i'm hesitant to add more to the wildly ranging conversation here, but it seems that one big issue in the goals of the project here is regarding backwards compatibility, so i'll add something about that. maybe a separate issue should be opened about this to summarize the conversation here and clarify borg's way of dealing with the issue...]

anyways. so i understand where the "fork allows us to change" idea is coming from and i respect that. maybe it's fine to make a break to allow cleaning up bad assumptions in the code. i am worried about:

  1. gratuitous changes: here i am referring specifically to 159315e - this commit changes magic numbers without a good justification. this seems to be contrary to even the goals stated in the summary here (namely "Don't break it accidentally / without good reason / without warning")
  2. eternal upgrade chase: even if we accept some of those changes, at some point, those changes need to stop and stabilise. maybe that's what a 1.0 release looks like. but then that means the software can't actually be used reliably in production until then, at which point it is locked. so a little more thought needs to be put into how to introduce format changes safely and reliably.
  3. future-proofing: backup software should be self-contained (for disaster recovery) and able to deal with really old data. data that can't be read directly should be convertible, as a worst case option, but never lost (this already fails wrt Attic because of the above commit, but i guess that's an acceptable compromise if we consider borg as a new backup software and not a fork (which it isn't))

I really like jborg's example of how old tar archives from 30 years ago can still be read. tar's specification also has the benefit of fitting within three paragraphs on wikipedia - clearly a different implementation, yet i believe it is a standard any backup software should aspire to. notice how tar has dealt with potentially backwards-incompatible changes...

basically, my position is that attic/borg should not break backwards compatibility and should support past formats forever. i haven't seen compelling evidence or changes that warrant such a break at this point, and I would like those proposing such changes to show such an example, otherwise the conversation will likely continue to go nowhere... i believe that any such change can be made in a backwards compatible way; the current format is not so bad that it will explode in the future...

@anarcat

Contributor

anarcat commented May 22, 2015

since so many discussions were about backwards compatibility here, i thought it was relevant to open an issue specifically about this in #26.

@anarcat

Contributor

anarcat commented May 22, 2015

oh, and in PR #25, i actually suggest we document the goals stated in the summary here "as is" (mostly), meaning that i agree with those.

i wonder if a code of conduct or something similar wouldn't be a good idea too... a few ideas:

perguth added a commit that referenced this issue Jul 27, 2015

ThomasWaldmann pushed a commit that referenced this issue Sep 12, 2015

Merge pull request #1 from borgbackup/master
Pull latest upstream master

@anarcat anarcat referenced this issue Nov 9, 2015

Closed

1.0 goals #356

@anarcat anarcat added this to the 1.0 milestone Nov 9, 2015

@ThomasWaldmann ThomasWaldmann removed this from the 1.0 milestone Nov 16, 2015

ThomasWaldmann pushed a commit that referenced this issue Nov 2, 2018

Merge pull request #1 from borgbackup/master
Update form upstream to fork