Commit

Merge remote-tracking branch 'refs/remotes/local/master'
Edrusb committed Mar 28, 2024
2 parents 4d428f0 + b9e46a5 commit 6d811bd
Showing 441 changed files with 2,073 additions and 1,863 deletions.
2 changes: 2 additions & 0 deletions doc/COMMAND_LINE
@@ -21,6 +21,8 @@ a * alteration of operation --alter argument:
e[rase_ea]
f[ixed-date]
file-auth[entication]
first-slice
force-first-slice | ffs
g[lob]
h[oles-recheck]
header
126 changes: 124 additions & 2 deletions doc/FAQ.html
@@ -88,6 +88,7 @@ <h2>Questions:</h2>
<a href="#sftppubkey">I have sftp pubkey authentication working with ssh/sftp, how to have dar using too this public key authentication for sftp?</a><br/>
<a href="#full-from-diff">I have a diff/incremental backup and I want to convert it to a full backup, how to do that?</a><br/>
<a href="#tapes">How to use dar with tapes (like LTO tapes)?</a><br/>
<a href="#Smallfiles">Why does dar not compress small files together for a better compression ratio?</a><br/>
</div>
</div>

@@ -3515,6 +3516,127 @@ <h2>Answers:</h2>
any other command than <code>dar</code>, in particular, yes, you can use it with <code>tar</code>
if you don't want to rely on the additional features and resiliency <code>dar</code> provides.
</p>
</div>
</body>

<a name="Smallfiles"><b>Why does dar not compress small files together for a better compression ratio?</b></a><br/>

<p>
Since around 2010, this question/suggestion/remark/review has haunted the dar-support mailing-list and
the feature requests, resurfacing from time to time: why does <em>dar</em> not compress small files together in a dar
archive for better compression, like <em>tar</em> (its grand and venerable brother) does?
</p>
<p>
First point to note: <em>tar</em> does not compress at all. It is gzip, bzip2, xz or other similar programs that
take what tar outputs as unstructured input and produce an unstructured compressed data stream redirected
into a file.
</p>
<p>
It would be tempting to answer: "You can do the same with dar!", but there are better things to do; read on.
</p>
<p>
But first, let's recall dar's design objectives:
</p>
<ul>
<li>compression is done per file</li>
<li>a given file's data can be accessed directly</li>
</ul>
<p>
Doing so has several advantages:
</p>
<ul>
<li>
In a given backup/archive, you can leave some files uncompressed while compressing others (saving time and space, as compressing
already compressed files usually wastes storage space).
</li>
<li>
You can quickly restore a particular file, even from an archive/backup of several petabytes: there is no need to read
(disk I/O) and decompress (CPU cycles) all the data located before that file in the archive.
</li>
<li>
Your backups are more robust: if data corruption occurs somewhere in one of your backups, even on a single byte,
it will affect only one file, and you will still be able to restore all the other files, even those located
after the corruption. Conversely,
with tar's way of compressing, you would lose all the data following the corruption...
</li>
</ul>
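<p>
The robustness argument above can be illustrated with a minimal Python sketch. It uses <code>zlib</code> as a stand-in
for the compression layer; it reproduces neither dar's archive format nor its code, and the file names, their content
and the helpers <code>flip_byte</code> and <code>readable</code> are hypothetical. It only shows that one damaged byte
affects a single entry when each file is an independent compressed stream, while it invalidates a stream in which
everything was compressed together.
</p>

```python
import zlib

# Hypothetical sample data: 20 small "files" (names and content are made up).
FILES = {
    f"file{i:02d}.txt": (f"record {i}: " + "some repetitive text " * 20).encode()
    for i in range(20)
}


def flip_byte(blob, pos):
    """Return a copy of blob with the byte at index pos inverted (simulated corruption)."""
    damaged = bytearray(blob)
    damaged[pos] ^= 0xFF
    return bytes(damaged)


def readable(blob):
    """Tell whether a zlib stream still decompresses without error."""
    try:
        zlib.decompress(blob)
        return True
    except zlib.error:
        return False


# Per-file approach: each file is its own independent compressed stream.
per_file = {name: zlib.compress(data) for name, data in FILES.items()}
victim = "file07.txt"
per_file[victim] = flip_byte(per_file[victim], len(per_file[victim]) // 2)
survivors = [name for name, blob in per_file.items() if readable(blob)]

# "Solid" approach: concatenate everything, then compress as a single stream.
solid = zlib.compress(b"".join(FILES.values()))
solid_damaged = flip_byte(solid, len(solid) // 2)

print(f"per-file: {len(survivors)} of {len(FILES)} files still readable")
print(f"solid stream readable after corruption: {readable(solid_damaged)}")
```

<p>
Note that the one-shot <code>zlib.decompress</code> call rejects the whole damaged stream; a streaming decompressor
could still salvage the data located before the corruption, but not what follows it.
</p>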
<p>
<em>dar</em> works that way because tar's way did not address some major concerns in the backup
area. Yes, this has the drawback of degrading the compression ratio, but this is a design choice.
</p>
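<p>
To get a feeling for the compression ratio cost mentioned above, here is a small Python sketch, again using
<code>zlib</code> as a stand-in and a made-up workload of 200 tiny, similar files (the content and the variable names
are assumptions for illustration, not measurements made with dar or tar). It compares the total size obtained when
each file is compressed alone with the size obtained when all files are compressed as one stream.
</p>

```python
import zlib

# Hypothetical workload: 200 tiny files with similar content.
small_files = [
    f"user={i};status=active;role=member;quota=1024\n".encode() for i in range(200)
]

raw_size = sum(len(data) for data in small_files)

# Per-file compression: every file pays its own stream overhead and cannot
# reuse the redundancy found in its neighbours.
per_file_size = sum(len(zlib.compress(data)) for data in small_files)

# Compressing everything together lets the compressor exploit cross-file redundancy.
solid_size = len(zlib.compress(b"".join(small_files)))

print(f"raw: {raw_size} bytes")
print(f"compressed per file: {per_file_size} bytes")
print(f"compressed together: {solid_size} bytes")
```

<p>
The gap depends heavily on the data: it is large for many tiny, similar files like these and shrinks as files get
bigger, which is why keeping very small files uncompressed is often an acceptable trade-off.
</p>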
<p>
Now, looking for the best of both approaches, some
have proposed gathering small files and compressing them together. This would not only break all three
advantages described above, but also break another feature, which is the order in which files are stored: dar
does not inspect the same directory twice, neither at backup time nor at restoration time. Doing so avoids
saving the full path of each directory and file (and in two places: the in-line metadata and the catalog).
This also leads to better performance, as it better leverages the disk cache for metadata (directory content). OK,
one could say that today, with SSD and NVMe, this is negligible, but that would ignore that direct RAM access
from cache is still much faster than any NVMe disk access.
</p>
<p>
So, if you can't afford to keep small files uncompressed (see dar's --mincompr, -X and -I options
for example), or if compressing them with dar rather than the way tar does makes such a big difference that it is worth
compressing them together, you have three options:
</p>
<ol>
<li>
<p>
<b>use tar in dar</b>
</p>
<ul>
<li>
make a tar archive of the many small files you have, just a plain tar file, without compression.
Note: you can automate this when entering some particular directory trees of your choice by means of the -<, -> and -=
options, and remove those temporary tar files when dar exits those directories at backup time.
You would also have to exclude the files used to build the tar file you created dynamically (see
-g/-P/-X/-I/-[/-] options).
</li>
<li>
Then let dar perform the backup, compressing those tar files along with the other files, if they satisfy the
--mincompr size or any other filtering of your choice (see -Z and -Y options). Doing so
lets you leverage the parallel compression and reduced execution time brought by dar, something you cannot have with
tar alone.
</li>
<li>
Of course, you also benefit from all of dar's other features (slicing, ciphering, on-the-fly slice hashing, isolated
catalogues, differential/incremental/decremental backups... and even binary delta!)
</li>
</ul>
<p>
Yes, you will lose dar's three advantages described above, but only for the small files you have gathered in a tar-in-dar file,
not for the rest of what is backed up.
</p>
</li>
<li>
<p>
<b>use tar alone</b>
</p>
<p>
If dar does not match your needs and/or if you do not need to leverage any of
dar's three advantages described above, tar is probably a better choice for you.
That's a pity, but no single tool matches all needs...
</p>
</li>
<li>
<p>
<b>describe in detail a new implementation/enhancement</b>
</p>
<p>
The proposal should take into account dar's design objectives (robustness to data
corruption, efficient directory seeking, fast access to any file's data) in one way
or another.
</p>
<p>
But please, do not make an imprecise proposal that assumes it will just "magically" work: I only like magic
when I go to a magic show ;)
</p>
<p>
Please detail both the backup and the restoration processes. Often, pulling out the missing details one
after another results in something unfeasible, or with unexpected complexity and/or much less
gain than expected. Also look at the <a href="Notes.html#archive_structure">Dar Archive Structure</a> to
see how it could fit or, if not, what part should be redesigned and how.
</p>
</li>
</ol>
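<p>
As a rough illustration of the first step of the "use tar in dar" option above, the following Python sketch gathers
the small files of a directory into a plain, uncompressed tar file using the standard <code>tarfile</code> module.
The helper name <code>bundle_small_files</code>, the 4096-byte threshold and the file names are hypothetical; how such
a script would be called from dar (for example from the hooks mentioned above) and how the bundled files would then
be excluded from the backup are deliberately left out.
</p>

```python
import os
import tarfile
import tempfile


def bundle_small_files(directory, tar_path, max_size=4096):
    """Pack every regular file of `directory` smaller than `max_size` bytes
    into an uncompressed tar file and return the bundled file names."""
    bundled = []
    with tarfile.open(tar_path, "w") as tar:  # mode "w": plain tar, no compression
        for name in sorted(os.listdir(directory)):
            path = os.path.join(directory, name)
            if os.path.abspath(path) == os.path.abspath(tar_path):
                continue  # never put the tar file inside itself
            if os.path.isfile(path) and os.path.getsize(path) < max_size:
                tar.add(path, arcname=name)
                bundled.append(name)
    return bundled


# Demonstration on a throw-away directory with made-up files.
with tempfile.TemporaryDirectory() as workdir:
    for i in range(5):
        with open(os.path.join(workdir, f"note{i}.txt"), "w") as f:
            f.write(f"small file number {i}\n")
    with open(os.path.join(workdir, "big.bin"), "wb") as f:
        f.write(b"\0" * 10000)  # above the threshold, left out of the tar file

    tar_path = os.path.join(workdir, "small_files.tar")
    bundled = bundle_small_files(workdir, tar_path)
    with tarfile.open(tar_path) as tar:
        members = tar.getnames()

print("bundled:", bundled)
```

<p>
Leaving the tar file uncompressed matters here: compression is then applied once, by dar, over the whole bundle.
</p>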
</div> </body>
</html>
12 changes: 12 additions & 0 deletions doc/Notes.html
@@ -2818,6 +2818,18 @@ <h4>Cross reference matrix</h4>
<td>1.3.0</td>
<td>1.2.2</td>
</tr>
<tr>
<td>March 23rd, 2024</td>
<th>2.7.14</th>
<td>11.3</td>
<td>06</td>
<td>6.7.2</td>
<td>1.7.0</td>
<td>1.6.0</td>
<td>1.9.0</td>
<td>1.3.0</td>
<td>1.2.2</td>
</tr>

</table>
</div>
18 changes: 15 additions & 3 deletions doc/from_sources.html
@@ -81,6 +81,10 @@ <h2><a name="requirements">Requirements</a></h2>
the <i>make</i> program (tested with
<a href="http://www.gnu.org/software/make/">gnu make</a>)
</li>
<li>
<a href="http://pkgconf.org/">pkg-config</a> to help detect and configure the proper
CFLAGS/CXXFLAGS and LDFLAGS for the optional libraries dar may rely on (see below)
</li>
</ol>

<p>
@@ -90,11 +94,11 @@ <h2><a name="requirements">Requirements</a></h2>

<ul>
<li>
<a href="http://www.gzip.org/zlib/">libz library</a>
<a href="http://zlib.net/">libz library</a>
for gzip compression support
</li>
<li>
<a href="http://sources.redhat.com/bzip2/">libbzip2 library</a> for bzip2
<a href="https://sourceware.org/bzip2/">libbzip2 library</a> for bzip2
compression support
</li>
<li>
@@ -364,6 +368,14 @@ <h3>Dependencies in distro packages</h3>
<th>Distro</th>
<th>Debian/Devuan/Ubuntu</th>
</tr>
<tr>
<th>
pkg-config tool
</th>
<td>
pkg-config
</td>
</tr>
<tr>
<th>
libz library
@@ -473,7 +485,7 @@ <h3>Dependencies in distro packages</h3>
libthreadar library
</th>
<td>
libthreadar has to be installed manually
libthreadar-dev
</td>
</tr>
<tr>
10 changes: 8 additions & 2 deletions man/dar.1
@@ -1,4 +1,4 @@
.TH DAR 1 "September 3rd, 2023"
.TH DAR 1 "March 23rd, 2024"
.UC 8
.SH NAME
dar \- creates, tests, lists, extracts, compares, merges, isolates, repairs dar archives
@@ -803,7 +803,7 @@ means 'Ask for user decision'. This uppercase letter concerns Data overwriting.
a
means 'Ask for user decision'. This lowercase letter is the equivalent for EA and FSA of the 'A' action. It is intended to be used in the same conditional statements described below.
.PP
An action is thus a couple of letters, the first being uppercase (for file's data) the second being lowercase (for file's EA and FSA). When -/ option is not given, the action is equivalent to '-/ Oo', making dar proceed to file, EA and FSA overwriting. This is to stay as close as possible to the former default action where neither -n nor -w where specified. Note that -w option stays untouched, in consequences, in this default condition for -/ option, a confirmation will be asked to the user before dar proceed to any overwriting. The former -n option (still used to handle slice overwriting) can be replaced by its equivalent '-/ Pp' for resolving file overwriting conflict (never overwrite). Here follows some examples of actions, all these are done for any entry found in conflict during archive merging or archive extraction, we will see further how to define conditional actions.
An action is thus a pair of letters, the first being uppercase (for file's data), the second being lowercase (for file's EA and FSA). When -/ option is not given, the action is equivalent to '-/ Pp', making dar preserve file's data, EA and FSA. Before release 2.4.0 (June 2011), only -n and -w options were available to define the overwriting policy, and the default behavior was to warn and wait for confirmation before overwriting, a behavior that can be set with the '-/ Oo' policy. It seems the default behavior changed to '-/ Pp' at that time and nobody noticed the difference or complained about it until 2023, more than 12 years later, so this default behavior will not be reverted. Here follow some examples of actions; all of them apply to any entry found in conflict during archive merging or archive extraction. We will see further how to define conditional actions.
.TP 5
-/ Rr
will lead dar to remove any file from filesystem that ought to be restored(!). Note the action for EA/FSA is useless, the EA and FSA will always be erased as well as data using 'R'. Thus '-/ Rp' would lead to the same result.
@@ -1194,6 +1194,9 @@ Do not restore unix-sockets. By default saved unix sockets are recreated at rest
.TP 20
-ap, --alter=place
Since version 2.7.1 libdar stores the filesystem root path (given to -R option) used when creating a backup; this is known as the 'in-place' path. At restoration time by default, dar uses the provided -R option or, if not specified, uses the current directory as root directory for the restoration operation. Using -ap option leads dar to read the in-place path from the backup and restore the data using this path instead. This option is thus exclusive with -R option and may lead dar to report an error if the archive has not stored any in-place path (older archive format or backup resulting from the merging of two backups having different in-place paths).
.TP 20
-affs, --alter=force-first-slice
This option only applies when restoring a backup with the help of an isolated catalogue. In that context dar still needs to read the archive format from the backup to restore. In sequential read mode, this information is always fetched from the beginning of the archive, but in direct access mode it may be fetched either by reading the end of the last slice (default behavior) or by reading the beginning of the first slice (when -affs is set). The objective is to avoid fetching the last slice when using very large backups. One can define a first slice (see -S option) of, for example, 1 kB, while the other slices (-s option) can be specified arbitrarily large. With the help of an isolated catalogue based on this archive and the first (small) slice, reading some data from this large backup then only needs the few (big) slices where the data is located, no more.
.PP
.B TESTING AND DIFFERENCE SPECIFIC OPTIONS (to use with -t or -d)
.PP
@@ -1207,6 +1210,9 @@ is also available as described just above for restoration options.
No other specific option, but all general options are available except for example -w which is useless, as testing and comparing only read data. -A option is available as described in GENERAL OPTIONS to backup of internal catalogue of the archive (assuming you have a previously isolated catalogue available).
.PP
Doing a difference in sequential read mode is possible but hard linked inodes can only be compared to the filesystem the first time they are met, next hard links to this same inode cannot obtain the corresponding data because skipping backward in sequential read mode is forbidden. In that situation, the hard links are reported as skipped, meaning that data comparison could not be performed.
.TP 20
-affs, --alter=force-first-slice
This option also applies to testing and difference operations (see details above).
.PP
.B LISTING OPTIONS (to use with -l)
.PP
2 changes: 1 addition & 1 deletion misc/Makefile.am
@@ -1,6 +1,6 @@
#######################################################################
# dar - disk archive - a backup/restoration program
# Copyright (C) 2002-2023 Denis Corbin
# Copyright (C) 2002-2024 Denis Corbin
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
2 changes: 1 addition & 1 deletion misc/batch_cygwin
@@ -2,7 +2,7 @@

#######################################################################
# dar - disk archive - a backup/restoration program
# Copyright (C) 2002-2023 Denis Corbin
# Copyright (C) 2002-2024 Denis Corbin
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
2 changes: 1 addition & 1 deletion misc/batch_linux
@@ -2,7 +2,7 @@

#######################################################################
# dar - disk archive - a backup/restoration program
# Copyright (C) 2002-2023 Denis Corbin
# Copyright (C) 2002-2024 Denis Corbin
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
2 changes: 1 addition & 1 deletion misc/batch_linux_ea
@@ -2,7 +2,7 @@

#######################################################################
# dar - disk archive - a backup/restoration program
# Copyright (C) 2002-2023 Denis Corbin
# Copyright (C) 2002-2024 Denis Corbin
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
2 changes: 1 addition & 1 deletion misc/batch_solaris
@@ -2,7 +2,7 @@

#######################################################################
# dar - disk archive - a backup/restoration program
# Copyright (C) 2002-2023 Denis Corbin
# Copyright (C) 2002-2024 Denis Corbin
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
2 changes: 1 addition & 1 deletion misc/init
@@ -2,7 +2,7 @@

#######################################################################
# dar - disk archive - a backup/restoration program
# Copyright (C) 2002-2023 Denis Corbin
# Copyright (C) 2002-2024 Denis Corbin
#
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
2 changes: 1 addition & 1 deletion misc/todos.c
@@ -1,6 +1,6 @@
/*********************************************************************
// dar - disk archive - a backup/restoration program
// Copyright (C) 2002-2023 Denis Corbin
// Copyright (C) 2002-2024 Denis Corbin
//
// This program is free software; you can redistribute it and/or
// modify it under the terms of the GNU General Public License
