Newer glibc versions provide a lchmod() that needs /proc mounted (their bug) #109

Closed
alkisg opened this issue Oct 23, 2020 · 18 comments

@alkisg

alkisg commented Oct 23, 2020

Hi, rsync 3.1.3 in Ubuntu 20.04 worked fine, while 3.2.3 in Ubuntu 20.10 has this problem:

# Emulate /proc not being mounted
$ mount --bind / /mnt
$ chroot /mnt rsync -a /bin/ls .
rsync: [receiver] failed to set permissions on "/.ls.CDExhu": Operation not supported (95)
rsync error: some files/attrs were not transferred (see previous errors) (code 3) at main.c(1330) [sender=3.2.3]

This also happens in Fedora 33 beta with rsync 3.2.2, so I believe it's not distro-specific. Thank you!

@WayneD
Member

WayneD commented Oct 28, 2020

Rsync is telling you that the chmod() or lchmod() call failed (which is used to set permissions). If that is failing, it's an issue with the OS or the C library.

@WayneD WayneD closed this as completed Oct 28, 2020
@alkisg alkisg changed the title rsync 3.2.3 needs /proc mounted Building in new distributions makes rsync need /proc mounted Oct 29, 2020
@alkisg
Author

alkisg commented Oct 29, 2020

I did some more tests. It is not related to the rsync version after all, so I updated this issue title. It's caused by something in the build environment that newer distributions have. Specifically:

  • Building any rsync version in Ubuntu 20.10 causes the problem. If one copies the "buggy" binary to another distribution or version, it continues to have the bug.
  • Building any rsync version in Ubuntu 20.04 avoids the problem. If one copies the "good" binary to another distribution or version, it remains free of the bug.

I'll continue digging, but I'm not too skilled with the configure phase and the build-deps of rsync, so any hints will be appreciated...

Strace says that the issue happens when rsync tries to chmod("/proc/self/fd/0"):

[pid 43165] chmod("/proc/self/fd/0", 0755) = -1 ENOENT (No such file or directory)
rsync: failed to set permissions on "/tmp/ls": Operation not supported (95)
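
The strace output above is consistent with a C library that emulates lchmod() by opening the path without following symlinks and then calling chmod() on the matching /proc/self/fd entry. The following is a minimal sketch of that general technique, not glibc's actual source. The function name chmod_nofollow_via_proc is hypothetical, and the mapping of ENOENT to EOPNOTSUPP is an assumption inferred from the two output lines above.

```c
#define _GNU_SOURCE   /* for O_PATH on glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: change permissions without following a symlink,
 * going through /proc/self/fd. Returns 0 on success, -1 with errno set. */
int chmod_nofollow_via_proc(const char *path, mode_t mode)
{
    assert(path != NULL);

    /* O_PATH | O_NOFOLLOW gives a descriptor for the entry itself,
     * even when that entry is a symlink. */
    int fd = open(path, O_PATH | O_NOFOLLOW | O_CLOEXEC);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    if (S_ISLNK(st.st_mode)) {
        /* Linux cannot change the mode of a symlink itself. */
        close(fd);
        errno = EOPNOTSUPP;
        return -1;
    }

    /* This step is the one that needs /proc to be mounted. */
    char proc_path[64];
    snprintf(proc_path, sizeof proc_path, "/proc/self/fd/%d", fd);
    int rc = chmod(proc_path, mode);
    int saved = errno;
    close(fd);
    if (rc != 0)
        errno = (saved == ENOENT) ? EOPNOTSUPP : saved; /* assumed mapping */
    return rc;
}
```

With /proc mounted, a call on a regular file should behave like chmod(). In the chroot used for the reproduction, the same call would be expected to fail at the chmod() step.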

@alkisg
Author

alkisg commented Oct 29, 2020

This is indeed caused by the lchmod() call in rsync/syscall.c.

Specifically, when compiling rsync in 20.04, I see: "checking for lchmod... no",
while when compiling in Ubuntu 20.10, I see: "checking for lchmod... yes".

From what I read, Linux/POSIX isn't supposed to support lchmod at all, so I'm not sure why configure says "yes" there. And a small test.c program proves that lchmod() internally tries to access /proc/self/fd, even if it's not mounted.

printf("lchmod returned: %d\n", lchmod("/tmp/ls", 0755));

Maybe rsync can just disable HAVE_LCHMOD when it detects that it's being compiled under Linux?
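
For anyone who wants to repeat the check, a slightly fuller version of that one-line test might look like the sketch below. The helper names try_lchmod and report_lchmod are made up for illustration; only lchmod() itself comes from the C library.

```c
#define _DEFAULT_SOURCE   /* ask glibc to declare lchmod() */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: returns 0 on success, otherwise the errno
 * that lchmod() reported. */
int try_lchmod(const char *path, mode_t mode)
{
    assert(path != NULL);
    if (lchmod(path, mode) == 0)
        return 0;
    return errno;
}

/* Hypothetical helper: prints the outcome in a form similar to the
 * one-line test above, plus the error text. */
void report_lchmod(const char *path, mode_t mode)
{
    int err = try_lchmod(path, mode);
    printf("lchmod(\"%s\", %04o) returned: %d (%s)\n",
           path, (unsigned)mode, err ? -1 : 0, err ? strerror(err) : "ok");
}
```

Calling report_lchmod("/tmp/ls", 0755) from main(), once normally and once inside the chroot from the first comment, should show whether the result depends on /proc being mounted.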

@alkisg
Author

alkisg commented Oct 29, 2020

Python did just that and no longer checks for lchmod when building on Linux:

https://bugs.python.org/issue34652
python/cpython@69e9691#diff-49473dca262eeab3b4a43002adb08b4db31020d190caaad1594b47f1d5daa810R3140

if test "$MACHDEP" != linux; then
  AC_CHECK_FUNC(lchmod)
fi

I reported this to Ubuntu, in case they want to resolve it with a distro-specific patch, although that way it will continue to fail in the other Linux distributions:

https://bugs.launchpad.net/ubuntu/+source/rsync/+bug/1902109

@alkisg
Author

alkisg commented Oct 30, 2020

I located an upstream bug report for this in glibc:
https://sourceware.org/bugzilla/show_bug.cgi?id=26401

@WayneD
Member

WayneD commented Oct 30, 2020

Nice investigating. I went ahead and disabled lchmod on Linux in the configure check (for now at least).

@WayneD WayneD changed the title Building in new distributions makes rsync need /proc mounted Newer glic versions provide a lchmod() that needs /proc mounted (their bug) Nov 1, 2020
@arekm

arekm commented Nov 6, 2020

Why not like this?

--- rsync-3.2.3/syscall.c~	2020-07-28 01:36:55.000000000 +0200
+++ rsync-3.2.3/syscall.c	2020-11-06 17:26:04.220502740 +0100
@@ -232,7 +232,8 @@ int do_chmod(const char *path, mode_t mo
 	RETURN_ERROR_IF_RO_OR_LO;
 #ifdef HAVE_LCHMOD
 	code = lchmod(path, mode & CHMOD_BITS);
-#else
+	if (code < 0 && errno == ENOTSUP) {
+#endif
 	if (S_ISLNK(mode)) {
 # if defined HAVE_SETATTRLIST
 		struct attrlist attrList;
@@ -247,6 +248,8 @@ int do_chmod(const char *path, mode_t mo
 # endif
 	} else
 		code = chmod(path, mode & CHMOD_BITS); /* DISCOURAGED FUNCTION */
+#ifdef HAVE_LCHMOD
+	}
 #endif /* !HAVE_LCHMOD */
 	if (code != 0 && (preserve_perms || preserve_executability))
 		return code;

@WayneD
Member

WayneD commented Nov 29, 2020

Good point. I decided to use a state-machine like the set_times() call in util.c. So, it now tries to use lchmod(), but if that returns ENOTSUP, it switches over to using chmod().

@carnil

carnil commented Sep 17, 2021

For cross-reference: commits 85b8dc8 and then 9dd6252 deal with this issue as a workaround.

@samueloph
Contributor

samueloph commented Sep 25, 2021

@WayneD Do you think it's possible that the last patch introduced an issue with rsync + libc6 2.31?
I'm talking about this one 9dd6252

We added this patch on Debian and it's present on rsync >= 3.2.3-7.

Then I noticed that our integration tests started to fail with libc6 2.31 with the error "Function not implemented (38)", and the failures went away once libc6 2.32 was used. It is as if that commit fixed the issue for libc6 2.32 but made it happen with libc6 <= 2.31 instead.

There's also someone saying they had the same issue, though with libc6 2.30:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=995046

My suspicion is that the last commit done to fix this issue introduces a regression on rsync + libc6 <= 2.31. Do you think that makes sense?

@WayneD
Member

WayneD commented Sep 27, 2021

Try commenting out the switch_step++; line in do_chmod() and see if that makes the issue on an older libc go away. It's possible that something returned ENOTSUP, and then rsync on Linux would stop being able to chmod symlinks. If that fixes the issue, I'll change the code so that it always tries lchmod() first. That's probably safer (I was just trying to optimize for the situation where lchmod() always fails).

@samueloph
Copy link
Contributor

Thank you @WayneD,

Unfortunately, I wasn't able to reproduce the issue. I tried a shortcut to avoid setting up everything needed for a full reproduction, but now I'll have to do the full setup, since I can't tell whether the issue was caused by something else or whether I failed because I wasn't testing in the same environment where I saw the previous failures.

In case I manage to reproduce it, I will let you know if the workaround you mentioned was effective.

@samueloph
Contributor

I was now able to reproduce the issue. It doesn't look like an rsync issue per se, but there might be room for improvement.

The issue arises when one compiles rsync 3.2.3 (with patch 9dd6252 applied) against libc6-dev 2.32 and then uses the binary on a system which has libc6 <= 2.31.
Reproduction can be done with:
rsync -va /etc/ld.so.cache /tmp

sending incremental file list
ld.so.cache
rsync: [receiver] failed to set permissions on "/tmp/.ld.so.cache.VBH9uY": Function not implemented (38)

sent 29,946 bytes  received 35 bytes  59,962.00 bytes/sec
total size is 29,838  speedup is 1.00
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]

From a packaging POV, this issue can be solved by making the binaries depend on libc6 2.32 if the libc6-dev used for compilation was at that version. On Debian we have a system which automatically tries to guess the minimum required version based on symbol usage, but this system fails to detect this case because libc6's symbol for lchmod didn't "change" (the alternative being to set the dependency manually).

I can't identify the root cause of this, and I feel like I might still be on the wrong track, since the patch doesn't seem to change behavior at compile time, only at run time. It's possible that I'm still seeing some weirdness from my reproduction method.

But overall it doesn't look like an rsync issue, since one can always bump the dependency of the compiled rsync manually to libc6-2.32. Though there might be something that could be done to allow a "libc6-dev-2.32 compiled rsync" to be run with libc6 <= 2.31.
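
One way to spot this kind of mismatch is to print the glibc version a binary was compiled against next to the one it is running on. The sketch below assumes glibc: gnu_get_libc_version() and the __GLIBC__ / __GLIBC_MINOR__ macros are glibc-specific, and the helper name describe_glibc is hypothetical.

```c
#include <stdio.h>
#if defined(__GLIBC__)
#include <gnu/libc-version.h>   /* gnu_get_libc_version(), glibc only */
#endif

/* Hypothetical helper: writes a one-line description into buf and
 * returns the snprintf() result. */
int describe_glibc(char *buf, size_t len)
{
#if defined(__GLIBC__)
    /* The macros are fixed at compile time; the function asks the
     * libc that is actually loaded at run time. */
    return snprintf(buf, len,
                    "built against glibc %d.%d, running on glibc %s",
                    __GLIBC__, __GLIBC_MINOR__, gnu_get_libc_version());
#else
    return snprintf(buf, len, "not built against glibc");
#endif
}
```

If the two versions differ across the 2.32 boundary, the binary is in the situation described above.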

Note: I've tried a few workarounds, like commenting out switch_step++; or adding EMFILE, ENFILE, and EOPNOTSUPP[0] to the errors the patch checks, last one being a "blind patch" (as I've only tried it due to the mention of those errors in the doc). None of them solved the issue.

[0] https://www.gnu.org/software/gnulib/manual/html_node/lchmod.html

@WayneD
Member

WayneD commented Oct 10, 2021

Error 38 should be ENOSYS, which indicates to me that the lchmod() in the compiled rsync is trying to use a function not present in the older glibc. I'm adding that errno to the fallback code, which now only increments the switch_step if errno == ENOSYS (it allows ENOTSUP to fall through to the compatibility code without incrementing the step, just in case that might be transient in some circumstances).

--- a/syscall.c
+++ b/syscall.c
@@ -238,9 +238,12 @@ int do_chmod(const char *path, mode_t mode)
 	switch (switch_step) {
 #ifdef HAVE_LCHMOD
 #include "case_N.h"
-		if ((code = lchmod(path, mode & CHMOD_BITS)) == 0 || errno != ENOTSUP)
+		if ((code = lchmod(path, mode & CHMOD_BITS)) == 0)
+			break;
+		if (errno == ENOSYS)
+			switch_step++;
+		else if (errno != ENOTSUP)
 			break;
-		switch_step++;
 #endif
 
 #include "case_N.h"

While this will work, it would be better if the package required the newer glibc (since lchmod is safer than chmod).

@samueloph
Contributor

I can confirm the patch above fixes the issue.

> Error 38 should be ENOSYS, which indicates to me that the lchmod() in the compiled rsync is trying to use a function not present in the older glibc.

I believe older glibc probably declared the lchmod symbol, even though it wasn't implemented, and thus our mechanism that checks for symbols usage failed to identify that such old versions wouldn't satisfy the dependency.
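
That would be consistent with glibc's convention for unimplemented functions: a stub is still exported as a symbol, and the headers define a __stub_<name> macro for it, which is what autoconf-style function checks look at. Assuming that convention, a build-time check might look like the sketch below. The function name lchmod_is_stub_at_build_time is hypothetical.

```c
#include <stdlib.h>   /* any libc header; on glibc this pulls in <gnu/stubs.h> */

/* Hypothetical helper: returns 1 if the libc headers used for this
 * build mark lchmod() as a stub, 0 otherwise. It says nothing about
 * the libc that is present at run time. */
int lchmod_is_stub_at_build_time(void)
{
#if defined(__stub_lchmod)
    return 1;
#else
    return 0;
#endif
}
```

Because the answer is fixed at build time, a binary built where this returns 0 can still meet a stub lchmod() when it is run against an older libc.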

> While this will work, it would be better if the package required the newer glibc (since lchmod is safer than chmod).

Right, I can't yet say for sure which way I'd like to go on the packaging side.

What I can say for sure is that people can only hit this scenario if they are trying to do something that is not really supported already, like picking a package that was compiled for Debian release X and using it in release Y (this could also happen with other distros).
The impact is not big, and I don't want you to feel bothered by this issue.

I see that you merged this change, so for now I'll keep the dependency requirement relaxed (automatically set based on symbols). I will try and discuss this issue with Debian's glibc maintainers, mainly to understand the weight of the benefits of lchmod over chmod, and if I should drop support for running rsync with older glibc (at least the versions compiled on newer Debian releases) over it.

I will also keep an eye on the commits, so if you decide to remove the unsafe fallback, I'll add the version requirement.
Please feel free to do what you consider the best approach on your end, I will be able to adapt the packaging in any case.

Thank you a lot!

@liushapku

I had the same issue and now I need to compile my own version of rsync. According to the conversation, I think either 3.2.4dev or 3.2.3-7 should work.

I cloned the repo and could not find any tag related to those two. I then visited https://download.samba.org/pub/rsync/src/ and could not find either of them there.

Could you please tell me how I can get the source code or a binary? I am using Ubuntu 21.10.

Thanks a lot

@WayneD
Member

WayneD commented Dec 26, 2021

There were a number of commits related to the bug, so I recommend the latest source for the most recent version of the fix. Give the CI build for Ubuntu 20.04 a try and see if it works on 21.10. The CI produces artifacts, so if you click on the Actions tab, click on a recent build, and scroll down, there will be a link to the ubuntu-bin zip file. That will require some library packages, such as libxxhash, libacl1, etc. If you need a list, I'd suggest looking at the INSTALL.md file and installing the non-dev versions of the listed packages (though the dev versions should be fine too).

@reikred

reikred commented Oct 24, 2022

Pls. fix typo in title of the page we are currently on: glibc not "glic".

I know, it is just a silly typo, but it is significant for better search matching for this important bug report and solution.

@WayneD WayneD changed the title Newer glic versions provide a lchmod() that needs /proc mounted (their bug) Newer glibc versions provide a lchmod() that needs /proc mounted (their bug) Oct 24, 2022

7 participants