Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

1.1.0: strange slowdown when backupping device node files #3130

Closed
shartge opened this issue Oct 10, 2017 · 7 comments
Closed

1.1.0: strange slowdown when backupping device node files #3130

shartge opened this issue Oct 10, 2017 · 7 comments
Assignees
Labels
Milestone

Comments

@shartge
Copy link

shartge commented Oct 10, 2017

I have several Debian build-roots, all created with pbuilder/cowbuilder on one system.

During the backup of those build-roots, borgbackup slows down to a crawl, when it hits devices in /dev/ where no hardware exists in the host, for example in my case the floppy drive devices.

strace shows this:

[...]
     0.000000 getxattr("/var/cache/pbuilder/base.cow/dev/fd0u1680", "system.posix_acl_access", NULL, 0) = -1 ENODATA (No data available)
     0.000370 getxattr("/var/cache/pbuilder/base.cow/dev/fd0u1680", "system.posix_acl_default", NULL, 0) = -1 ENODATA (No data available)
     0.000270 ioctl(1, TIOCGWINSZ, {ws_row=44, ws_col=166, ws_xpixel=1848, ws_ypixel=1030}) = 0
     0.000462 write(2, "15.85 GB O 8.82 GB C 245.00 MB D"..., 167) = 167
     0.000389 lstat("/var/cache/pbuilder/base.cow/dev/fd0u1722", {st_mode=S_IFBLK|0660, st_rdev=makedev(2, 60), ...}) = 0
     0.000218 open("/var/cache/pbuilder/base.cow/dev/fd0u1722", O_RDONLY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC) = -1 ENXIO (No such device or address)
     3.082317 llistxattr("/var/cache/pbuilder/base.cow/dev/fd0u1722", "", 4096) = 0
     0.000133 open("/var/cache/pbuilder/base.cow/dev/fd0u1722", O_RDONLY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC) = -1 ENXIO (No such device or address)
     3.315956 getxattr("/var/cache/pbuilder/base.cow/dev/fd0u1722", "system.posix_acl_access", NULL, 0) = -1 ENODATA (No data available)
     0.000138 getxattr("/var/cache/pbuilder/base.cow/dev/fd0u1722", "system.posix_acl_default", NULL, 0) = -1 ENODATA (No data available)
     0.000216 ioctl(1, TIOCGWINSZ, {ws_row=44, ws_col=166, ws_xpixel=1848, ws_ypixel=1030}) = 0
     0.000313 write(2, "15.85 GB O 8.82 GB C 245.00 MB D"..., 167) = 167
     0.000309 lstat("/var/cache/pbuilder/base.cow/dev/fd0u1743", {st_mode=S_IFBLK|0660, st_rdev=makedev(2, 76), ...}) = 0
     0.000175 open("/var/cache/pbuilder/base.cow/dev/fd0u1743", O_RDONLY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC) = -1 ENXIO (No such device or address)
     3.085720 llistxattr("/var/cache/pbuilder/base.cow/dev/fd0u1743", "", 4096) = 0
     0.000083 open("/var/cache/pbuilder/base.cow/dev/fd0u1743", O_RDONLY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC) = -1 ENXIO (No such device or address)
     3.316940 getxattr("/var/cache/pbuilder/base.cow/dev/fd0u1743", "system.posix_acl_access", NULL, 0) = -1 ENODATA (No data available)
     0.000113 getxattr("/var/cache/pbuilder/base.cow/dev/fd0u1743", "system.posix_acl_default", NULL, 0) = -1 ENODATA (No data available)
     0.000296 ioctl(1, TIOCGWINSZ, {ws_row=44, ws_col=166, ws_xpixel=1848, ws_ypixel=1030}) = 0
     0.000429 write(2, "15.85 GB O 8.82 GB C 245.00 MB D"..., 167) = 167
     0.000288 lstat("/var/cache/pbuilder/base.cow/dev/fd0u1760", {st_mode=S_IFBLK|0660, st_rdev=makedev(2, 96), ...}) = 0
     0.000167 open("/var/cache/pbuilder/base.cow/dev/fd0u1760", O_RDONLY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC) = -1 ENXIO (No such device or address)
     3.075010 llistxattr("/var/cache/pbuilder/base.cow/dev/fd0u1760", "", 4096) = 0
     0.000162 open("/var/cache/pbuilder/base.cow/dev/fd0u1760", O_RDONLY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC) = -1 ENXIO (No such device or address)
     3.063643 getxattr("/var/cache/pbuilder/base.cow/dev/fd0u1760", "system.posix_acl_access", NULL, 0) = -1 ENODATA (No data available)
     0.000138 getxattr("/var/cache/pbuilder/base.cow/dev/fd0u1760", "system.posix_acl_default", NULL, 0) = -1 ENODATA (No data available)
     0.000219 ioctl(1, TIOCGWINSZ, {ws_row=44, ws_col=166, ws_xpixel=1848, ws_ypixel=1030}) = 0
     0.000327 write(2, "15.85 GB O 8.82 GB C 245.00 MB D"..., 167) = 167
     0.000256 lstat("/var/cache/pbuilder/base.cow/dev/fd0u1840", {st_mode=S_IFBLK|0660, st_rdev=makedev(2, 116), ...}) = 0
     0.000165 open("/var/cache/pbuilder/base.cow/dev/fd0u1840", O_RDONLY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC) = -1 ENXIO (No such device or address)
     3.083064 llistxattr("/var/cache/pbuilder/base.cow/dev/fd0u1840", "", 4096) = 0
[...]

Note the 3 second delay after each open(".../dev/fd0..."). (According to strace(1) the timestamp at the start of a line is the time elapsed since the start of the syscall on the line before.)

A normal "living" system wouldn't have any devices without the needed hardware, because /dev is a tmpfs populated by udev. But in the build-roots, borgbackup hits the real filesystem below the tmpfs with tons on unneeded devices nodes and things get strange.

In my case, the backup time regresses from ~45 seconds with 1.0.11 to over 12 minutes with 1.1.0.

@shartge shartge changed the title 1.1.0: strange slowdown when reading block device files 1.1.0: strange slowdown when backupping device node files Oct 10, 2017
@ThomasWaldmann
Copy link
Member

your borg command line?

@shartge
Copy link
Author

shartge commented Oct 10, 2017

Addition: dmesg shows this:

[381749.416085] floppy0: no floppy controllers found
[381752.744009] floppy0: no floppy controllers found
[381752.744045] work still pending
[381756.071910] floppy0: no floppy controllers found
[381759.400019] floppy0: no floppy controllers found
[381762.727859] floppy0: no floppy controllers found
[381762.727893] work still pending
[381766.055816] floppy0: no floppy controllers found
[381769.383866] floppy0: no floppy controllers found
[381772.455806] floppy0: no floppy controllers found
[381775.527889] floppy0: no floppy controllers found
[381775.527931] work still pending
[381778.599865] floppy0: no floppy controllers found
[381778.599899] work still pending
[381781.927835] floppy0: no floppy controllers found
[381785.255788] floppy0: no floppy controllers found
[381788.583688] floppy0: no floppy controllers found
[381791.655760] floppy0: no floppy controllers found
[381794.983756] floppy0: no floppy controllers found
[381798.055711] floppy0: no floppy controllers found
[381801.383548] floppy0: no floppy controllers found

@shartge
Copy link
Author

shartge commented Oct 10, 2017

borg create --stats --info $fullbackup --compression lz4 --chunker-params 19,23,21,4095 --exclude-caches --one-file-system --progress \
        --exclude '/var/log/journal' \
        --exclude '/var/lib/apt/lists' \
        --exclude '/var/cache/apt' \
        --exclude '/var/tmp' \
        "$REPOSITORY::{hostname}-{now:%Y-%m-%d_%H:%M:%S}" \
        / /boot/efi /srv/vm

@ThomasWaldmann ThomasWaldmann added this to the 1.1.1rc1 milestone Oct 10, 2017
@ThomasWaldmann
Copy link
Member

Looks like a bug. As you did not use --read-special, there should be no reason to open device files.

@ThomasWaldmann ThomasWaldmann self-assigned this Oct 10, 2017
@ThomasWaldmann
Copy link
Member

ThomasWaldmann commented Oct 11, 2017

It's the bsdflags support trying to open every file, even device files:

def get_flags(path, st):
    cdef int linux_flags
    try:
        fd = os.open(path, os.O_RDONLY|os.O_NONBLOCK|os.O_NOFOLLOW)
    except OSError:
        return 0
    try:
        if ioctl(fd, FS_IOC_GETFLAGS, &linux_flags) == -1:
            return 0
    finally:
        os.close(fd)
    bsd_flags = 0
    for bsd_flag, linux_flag in BSD_TO_LINUX_FLAGS.items():
        if linux_flags & linux_flag:
            bsd_flags |= bsd_flag
    return bsd_flags

For non-existent devices, there will be an OSError "no such device".

BSD_TO_LINUX_FLAGS = {
    # guess setting NODUMP on a device file isn't really happening
    stat.UF_NODUMP: FS_NODUMP_FL,
    # immutable for not-regular files also doesn't make much sense
    stat.UF_IMMUTABLE: FS_IMMUTABLE_FL,
    # append for not-regular files also doesn't make much sense
    stat.UF_APPEND: FS_APPEND_FL,
    # compressed for not-regular files also doesn't make much sense
    stat.UF_COMPRESSED: FS_COMPR_FL,
}

@enkore So, just skip bsdflags processing for fs items that are not regular files or directories?

ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Oct 11, 2017
opening a device file for a non-existing device can be very slow.

also: lstat -> stat(..., follow_symlinks=False) like everywhere else.
@shartge
Copy link
Author

shartge commented Oct 11, 2017

I can confirm 96691ba fixes #3130 for me, the backup run-time is back to its normal value of about 45s to 60s instead of taking up to 15 minutes.

@ThomasWaldmann
Copy link
Member

Fix in #3137 updated.

ThomasWaldmann added a commit that referenced this issue Oct 13, 2017
bsdflags support: skip blk/chr/lnk file type, fixes #3130
ThomasWaldmann added a commit to ThomasWaldmann/borg that referenced this issue Oct 13, 2017
opening a device file for a non-existing device can be very slow.
symlinks will make the open() call fail as it is using O_NOFOLLOW.

also: lstat -> stat(..., follow_symlinks=False) like everywhere else.
(cherry picked from commit a6ee4e9)
ThomasWaldmann added a commit that referenced this issue Oct 14, 2017
bsdflags support: do not open BLK/CHR/LNK files, fixes #3130
@ghost ghost mentioned this issue Aug 26, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants