Fixed parsing of ld.so.cache on new glibc. #384

Closed

@cf-natali cf-natali commented May 13, 2021

Since glibc 2.32, ldconfig writes ld.so.cache in the "new" format by
default, instead of the "compat" format, which had been in use since
glibc 2.2 (around 20 years ago). The "new" format is now the default on
e.g. Debian bullseye and other recent Linux distributions.

This change adds support for the "new" format alongside the existing
support for the "compat" format.
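
To illustrate the difference between the formats, here is a minimal sketch of how the three layouts can be told apart from the raw bytes. It is not the code from this PR: `detectFormat`, `hasMagicAt` and `CacheFormat` are hypothetical names, the magic strings are the ones defined in glibc's `dl-cache.h`, and the header and entry sizes (16 and 12 bytes, 8-byte alignment) assume the usual 64-bit layout.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Magic strings from glibc's dl-cache.h. The "new" magic is written here
// together with its "1.1" version suffix.
static const char OLD_MAGIC[] = "ld.so-1.7.0";
static const char NEW_MAGIC[] = "glibc-ld.so.cache1.1";

enum class CacheFormat { Old, Compat, New, Unknown };

// True if `magic` appears in `data` starting exactly at `offset`.
static bool hasMagicAt(
    const std::string& data, uint64_t offset, const char* magic)
{
  const size_t length = std::strlen(magic);
  if (offset > data.size() || data.size() - offset < length) {
    return false;
  }
  return data.compare(static_cast<size_t>(offset), length, magic) == 0;
}

// Hypothetical helper: classify a cache image by its leading bytes.
CacheFormat detectFormat(const std::string& data)
{
  // "new": the file starts directly with the new header.
  if (hasMagicAt(data, 0, NEW_MAGIC)) {
    return CacheFormat::New;
  }

  // Assumed old header layout: 11 magic bytes, 1 padding byte and a
  // 32-bit entry count, followed by 12-byte entries.
  const size_t headerSize = 16;
  const size_t entrySize = 12;
  if (!hasMagicAt(data, 0, OLD_MAGIC) || data.size() < headerSize) {
    return CacheFormat::Unknown;
  }

  uint32_t nlibs = 0;
  std::memcpy(&nlibs, data.data() + 12, sizeof(nlibs));

  // "compat": the new header follows the old entry table, aligned up to
  // 8 bytes (assumption for 64-bit platforms).
  uint64_t offset = headerSize + static_cast<uint64_t>(nlibs) * entrySize;
  offset = (offset + 7) & ~static_cast<uint64_t>(7);

  return hasMagicAt(data, offset, NEW_MAGIC) ? CacheFormat::Compat
                                             : CacheFormat::Old;
}
```

With this classification, a parser that only accepts the `Compat` layout would reject a cache written by `ldconfig -c new`, which matches the "Invalid format" failures in the "Before" run.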

Before:

root@thinkpad:/home/cf/src/mesos/build# ldconfig -c new
root@thinkpad:/home/cf/src/mesos/build# ./bin/mesos-tests.sh --gtest_filter=*Ld*
[...]
[==========] Running 4 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 1 test from LdcacheTest
[ RUN      ] LdcacheTest.Parse
../../src/tests/ldcache_tests.cpp:43: Failure
cache: Invalid format
[  FAILED  ] LdcacheTest.Parse (0 ms)
[----------] 1 test from LdcacheTest (0 ms total)

[----------] 3 tests from Ldd
[ RUN      ] Ldd.BinSh
../../src/tests/ldd_tests.cpp:43: Failure
cache: Invalid format
[  FAILED  ] Ldd.BinSh (0 ms)
[ RUN      ] Ldd.EmptyCache
[       OK ] Ldd.EmptyCache (1 ms)
[ RUN      ] Ldd.MissingFile
../../src/tests/ldd_tests.cpp:77: Failure
cache: Invalid format
[  FAILED  ] Ldd.MissingFile (0 ms)
[----------] 3 tests from Ldd (1 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 2 test cases ran. (8 ms total)
[  PASSED  ] 1 test.
[  FAILED  ] 3 tests, listed below:
[  FAILED  ] LdcacheTest.Parse
[  FAILED  ] Ldd.BinSh
[  FAILED  ] Ldd.MissingFile

 3 FAILED TESTS

After:

root@thinkpad:/home/cf/src/mesos/build# ldconfig -c new
root@thinkpad:/home/cf/src/mesos/build# ./bin/mesos-tests.sh --gtest_filter=*Ld*
[...]
[==========] Running 4 tests from 2 test cases.
[----------] Global test environment set-up.
[----------] 1 test from LdcacheTest
[ RUN      ] LdcacheTest.Parse
[       OK ] LdcacheTest.Parse (529 ms)
[----------] 1 test from LdcacheTest (529 ms total)

[----------] 3 tests from Ldd
[ RUN      ] Ldd.BinSh
[       OK ] Ldd.BinSh (3 ms)
[ RUN      ] Ldd.EmptyCache
[       OK ] Ldd.EmptyCache (0 ms)
[ RUN      ] Ldd.MissingFile
[       OK ] Ldd.MissingFile (0 ms)
[----------] 3 tests from Ldd (3 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 2 test cases ran. (541 ms total)
[  PASSED  ] 4 tests.

Closes #10220.

@asekretenko @andreaspeters

@andreaspeters
Contributor

It looks fine to me. How about you, @asekretenko?

@cf-natali
Contributor Author

FWIW it's breaking other things:

[ RUN      ] HealthCheckTest.ROOT_HealthyTaskWithContainerImage
../../src/tests/health_check_tests.cpp:505: Failure
(testImage).failure(): Failed to create docker test image rootfs: Failed to parse ld.so cache: Invalid format
[  FAILED  ] HealthCheckTest.ROOT_HealthyTaskWithContainerImage (26 ms)

@cf-natali
Contributor Author

@bmahler

@cf-natali
Contributor Author

@qianzhangxa

@qianzhangxa
Contributor

qianzhangxa commented May 29, 2021

FWIW it's breaking other things:

[ RUN      ] HealthCheckTest.ROOT_HealthyTaskWithContainerImage
../../src/tests/health_check_tests.cpp:505: Failure
(testImage).failure(): Failed to create docker test image rootfs: Failed to parse ld.so cache: Invalid format
[  FAILED  ] HealthCheckTest.ROOT_HealthyTaskWithContainerImage (26 ms)

@cf-natali Did you run this test on the same machine as the other tests (like LdcacheTest.Parse)? All of these tests call ldcache::parse() internally, so it does not make sense that one succeeds while the other fails.

On the other hand, there are a couple of places in ldcache::parse() that error out with "Invalid format". Maybe we should use a distinct error message in each place so that we can troubleshoot this kind of issue more easily.
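
As a rough illustration of that suggestion, the sketch below gives each validation step its own message. It is only an assumption of what such checks could look like, not the actual `ldcache::parse()` code: `checkNewHeader` and the message texts are hypothetical, and the 48-byte header and 24-byte entry sizes assume the "new" format layout from glibc's `dl-cache.h`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical helper: returns an empty string if the leading "new" format
// header looks valid, otherwise a message naming the check that failed.
std::string checkNewHeader(const std::string& data)
{
  // Assumed layout: 17 magic bytes, 3 version bytes, two 32-bit counters
  // and 20 reserved bytes, followed by 24-byte entries.
  const size_t headerSize = 48;
  const size_t entrySize = 24;

  if (data.size() < headerSize) {
    return "Invalid format: cache too short for header (" +
           std::to_string(data.size()) + " bytes)";
  }

  if (data.compare(0, 17, "glibc-ld.so.cache") != 0) {
    return "Invalid format: unexpected magic";
  }

  if (data.compare(17, 3, "1.1") != 0) {
    return "Invalid format: unsupported version '" + data.substr(17, 3) + "'";
  }

  uint32_t nlibs = 0;
  std::memcpy(&nlibs, data.data() + 20, sizeof(nlibs));

  // Widen before multiplying so a huge entry count cannot overflow.
  const uint64_t tableEnd =
    headerSize + static_cast<uint64_t>(nlibs) * entrySize;
  if (tableEnd > data.size()) {
    return "Invalid format: entry table extends past end of file";
  }

  return "";
}
```

With messages like these, a failure on a cache written in an unsupported layout would show up as an unexpected magic rather than a generic "Invalid format", which points more directly at the cause.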

@cf-natali
Contributor Author

cf-natali commented May 29, 2021 via email

@qianzhangxa
Contributor

Dunno, the issue seems pretty clear to me, it affects anything parsing ldcache due to the format change. And the attached change fixes it.

NVM, I thought HealthCheckTest.ROOT_HealthyTaskWithContainerImage failed after the code changes in this PR were applied. Sorry for the confusion.

@cf-natali
Contributor Author

cf-natali commented May 29, 2021

Dunno, the issue seems pretty clear to me, it affects anything parsing ldcache due to the format change. And the attached change fixes it.

NVM, I thought HealthCheckTest.ROOT_HealthyTaskWithContainerImage failed after the code changes in this PR were applied. Sorry for the confusion.

Aha no worries it happens!

I think all comments are addressed now, let me know if you need anything else :).

@cf-natali
Contributor Author

Ah and by the way, here's how I tested it:
compat format - the default until glibc 2.31:

root@thinkpad:/home/cf/src/mesos/build# ldconfig -c compat
root@thinkpad:/home/cf/src/mesos/build# ./bin/mesos-tests.sh --gtest_filter=*Ld* 2>&1 | tail -n13

[----------] 3 tests from Ldd
[ RUN      ] Ldd.BinSh
[       OK ] Ldd.BinSh (2 ms)
[ RUN      ] Ldd.EmptyCache
[       OK ] Ldd.EmptyCache (0 ms)
[ RUN      ] Ldd.MissingFile
[       OK ] Ldd.MissingFile (1 ms)
[----------] 3 tests from Ldd (3 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 2 test cases ran. (548 ms total)
[  PASSED  ] 4 tests.

new format - supported by this change:

root@thinkpad:/home/cf/src/mesos/build# ldconfig -c new
root@thinkpad:/home/cf/src/mesos/build# ./bin/mesos-tests.sh --gtest_filter=*Ld* 2>&1 | tail -n13

[----------] 3 tests from Ldd
[ RUN      ] Ldd.BinSh
[       OK ] Ldd.BinSh (3 ms)
[ RUN      ] Ldd.EmptyCache
[       OK ] Ldd.EmptyCache (0 ms)
[ RUN      ] Ldd.MissingFile
[       OK ] Ldd.MissingFile (1 ms)
[----------] 3 tests from Ldd (4 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 2 test cases ran. (550 ms total)
[  PASSED  ] 4 tests.

old format - never supported - as a sanity check:

root@thinkpad:/home/cf/src/mesos/build# ldconfig -c old
root@thinkpad:/home/cf/src/mesos/build# ./bin/mesos-tests.sh --gtest_filter=*Ld* 2>&1 | tail -n13
cache: Invalid format
[  FAILED  ] Ldd.MissingFile (0 ms)
[----------] 3 tests from Ldd (1 ms total)

[----------] Global test environment tear-down
[==========] 4 tests from 2 test cases ran. (7 ms total)
[  PASSED  ] 1 test.
[  FAILED  ] 3 tests, listed below:
[  FAILED  ] LdcacheTest.Parse
[  FAILED  ] Ldd.BinSh
[  FAILED  ] Ldd.MissingFile

 3 FAILED TESTS

@qianzhangxa
Contributor

I think all comments are addressed now, let me know if you need anything else :).

Just posted a minor comment, otherwise I think we are good to merge :)

Since glibc 2.32, ld.so.cache now defaults to the "new" format, instead
of the "compat" format which was in use since glibc 2.2 (around 20 years
ago).
Closes #10220.
@asfgit asfgit closed this in 9598db0 May 31, 2021
@qianzhangxa
Contributor

The commit in this PR has been merged, and I also marked https://issues.apache.org/jira/browse/MESOS-10220 as resolved, thanks @cf-natali for your contribution!

Lqp1 pushed a commit to criteo-forks/mesos that referenced this pull request Feb 23, 2024
This closes apache#384

(cherry picked from commit 9598db0)