Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix listing / tracing usdt probes in shared libraries #1600

Merged
merged 3 commits into from Jan 20, 2021

Conversation

xdrop
Copy link
Contributor

@xdrop xdrop commented Nov 8, 2020

Issue

Listing or tracing a usdt probes contained in a linked shared libs doesn't work.

# bpftrace -e 'usdt:./tests/testlibs/libusdt_tp.so:tracetestlib:lib_probe_1 { printf("hit\n");}' -p 21488
No probes to attach

Cause

During read_probes_from_pid in ustd.cpp we collect usdt probes from bcc and we maintain a cache of paths --> providers --> tracepoints
https://github.com/iovisor/bpftrace/blob/80d2d3c3144b8c74b84e12a72b6b34004229c2cb/src/usdt.cpp#L23-L29

In the scenario above the cache looks something like:

usdt_provider_cache = {
	"/proc/43133/root/bcc-build/bpftrace/build/tests/testlibs/libusdt_tp.so": {
		"testlibprovider": [tracepoint1, ...]
	}
}

Later in USDTHelper::probes_for_pid we get given a pid and try to perform a lookup based on the pid exe path.
https://github.com/iovisor/bpftrace/blob/80d2d3c3144b8c74b84e12a72b6b34004229c2cb/src/usdt.cpp#L78-L85

In this case it tries to look for /proc/43133/root/bcc-build/bpftrace/build-mine/tests/testprogs/usdt_lib in the cache, however only /proc/43133/root/bcc-build/bpftrace/build/tests/testlibs/libusdt_tp.so exists, and so it fails to find the probe.

Fix

The fix is when we iterate the probes given by bcc, we populate another separate map, that maps the paths for each pid found by bcc. So rather than trying to compute the path to look for in the cache using the pid's exe (https://github.com/iovisor/bpftrace/blob/80d2d3c3144b8c74b84e12a72b6b34004229c2cb/src/usdt.cpp#L78-L80) we instead can get the paths for the pid by indexing usdt_pid_to_paths_cache[pid]. This way we don't risk a cache miss due to different paths found by bcc / computed by bpftrace.

Checklist
  • User-visible and non-trivial changes updated in CHANGELOG.md
  • The new behaviour is covered by tests

@xdrop xdrop force-pushed the bugfix/usdt-probes-in-shared-lib branch from eb3b8ca to 738ebcf Compare November 8, 2020 14:13
@fbs fbs changed the title wip usdt in namespace fixes Nov 8, 2020
@xdrop xdrop force-pushed the bugfix/usdt-probes-in-shared-lib branch 12 times, most recently from 7d04e86 to 89799f9 Compare November 10, 2020 18:49
@xdrop xdrop closed this Nov 10, 2020
@xdrop xdrop reopened this Nov 10, 2020
@xdrop xdrop force-pushed the bugfix/usdt-probes-in-shared-lib branch 5 times, most recently from 9ef3cf2 to b8f197a Compare November 11, 2020 18:34
@xdrop xdrop marked this pull request as ready for review November 11, 2020 19:56
@fbs
Copy link
Contributor

fbs commented Nov 18, 2020

@dalehamel can you take a look at this one? :)

@dalehamel
Copy link
Contributor

@xdrop can you explain fully the situations that previously failed but which now pass? Eg, "without this patch, the following script fails". Going off of your tests, it looks like https://github.com/iovisor/bpftrace/pull/1600/files#diff-cc0e770d906adc6e162b517dcb638e6a50c2af8f1b5689421719743502eaeae2R137 is the case you are trying to ensure doesn't regress?

I have seen similar issues in the past but thought they had been fixed, but perhaps we are no longer using the bcc APIs, which would enter the mount namespace correctly. I see that you are introducing a map for caching the path lookups, in the past we have had issues with cache coherency when we've done similar things.

Does this only fail if you try to specify a path that explicitly does have "/proc/PID/..." in it? Ie, if you specify "-p", and a path relative to the mount namespace, it still works correctly, right?

Also, if you have two separate fixes (one for the /proc relative path, path, and one for the shared libraries), they should probably be in two separate PRs, as it will make it easier to review. The library part of this looks great, I'm still struggling to grasp why we need a path cache though for the /proc relative problem though.

Thanks for adding tests for both cases either way!

@xdrop
Copy link
Contributor Author

xdrop commented Nov 18, 2020

@xdrop can you explain fully the situations that previously failed but which now pass? Eg, "without this patch, the following script fails". Going off of your tests, it looks like https://github.com/iovisor/bpftrace/pull/1600/files#diff-cc0e770d906adc6e162b517dcb638e6a50c2af8f1b5689421719743502eaeae2R137 is the case you are trying to ensure doesn't regress?

Sure :)

Without 24c1267, this fails: (Under the condition that pid 83320 is in a different mount namespace, than the one bpftrace is running in.)

root:/tmp# bpftrace -e 'usdt:/usr/local/bin/python3.8:function__entry { printf("hit\n")}' -p 83320
ERROR: failed to write elf: filesystem error: cannot canonicalize: No such file or directory [/proc/83320/root/usr/local/bin/python3.8]

Without 8151e3a, this fails (or tracing any usdt tracepoint in a shared lib):

bpftrace -e 'usdt:/usr/local/lib/libpython3.8.so.1.0:function__entry { printf("hit\n")}' -p 83320

I think this fails regardless of whether we are in a different mount namespace or not because without my changes the library tests fail (which of course don't run in a different namespace). But can confirm tomorrow for sure.

I see that you are introducing a map for caching the path lookups, in the past we have had issues with cache coherency when we've done similar things.

Yeah I couldn't figure out any other way to get the tracepoint paths detected by bcc other than to store them in a map during it's collection of tracepoints in bcc_usdt_foreach(ctx, usdt_probe_each). But happy to accept suggestions.

Does this only fail if you try to specify a path that explicitly does have "/proc/PID/..." in it? Ie, if you specify "-p", and a path relative to the mount namespace, it still works correctly, right?

Nope it didn't work for me, maybe it's a regression since abs_path wasn't there forever? The test that's supposed to cover this scenario looks disabled.
https://github.com/iovisor/bpftrace/blob/53100892ae70d1ca7a456e6d3665270dbb1715cf/tests/runtime/usdt#L225-L229

I might give a go re-enabling it since it has passed at least on my system with my changes (and fails without).

Also, if you have two separate fixes (one for the /proc relative path, path, and one for the shared libraries), they should probably be in two separate PRs, as it will make it easier to review. The library part of this looks great, I'm still struggling to grasp why we need a path cache though for the /proc relative problem though.

We don't need the cache for the /proc issue, the cache is for storing the probes identified by bcc (fixes the shared library issue). I guess that re-emphasizes your point in splitting this into two PRs so that it's clear what fixes what. I will do that tomorrow.

Basically what happens is bcc finds the tracepoint for my pid, and correctly realizes it's in a shared lib (which is say /usr/local/lib/libpython3.8.so.1.0) and so puts it in the path cache as /usr/local/lib/libpython3.8.so.1.0. Then we attempt to do a cache lookup by using the pids exe path instead, thus failing to find it in the cache as it's under a different path (the lib path not the exe path).
https://github.com/iovisor/bpftrace/blob/53100892ae70d1ca7a456e6d3665270dbb1715cf/src/usdt.cpp#L74-L90

@dalehamel
Copy link
Contributor

Nope it didn't work for me, maybe it's a regression since abs_path wasn't there forever? The test that's supposed to cover this scenario looks disabled.

Lol there it is.

I was thinking about https://github.com/iovisor/bcc/pull/2817/files while reading this. There have definitely been a number of changes in this logic in BCC, so it's not surprising that this behavior has broken, but I was surprised because I was quite sure it worked in the past. The bcc that has the fix has long since been released (it is up to 0.17.0 now) so we should separately bump bcc and re-enable the test.

I guess that re-emphasizes your point in splitting this into two PRs so that it's clear what fixes what. I will do that tomorrow.

awesome!

If we split this up, we can also make an issue for re-enabling that regression test. Thanks for the great work here again, the extra regression test will keep this 💪

@xdrop xdrop force-pushed the bugfix/usdt-probes-in-shared-lib branch from b8f197a to 5808b76 Compare November 19, 2020 14:33
@xdrop xdrop changed the title usdt in namespace fixes Fix listing / tracing usdt probes in shared libraries Nov 19, 2020
@xdrop xdrop force-pushed the bugfix/usdt-probes-in-shared-lib branch 2 times, most recently from 382e193 to 88200da Compare November 19, 2020 18:58
@xdrop
Copy link
Contributor Author

xdrop commented Nov 19, 2020

Nope it didn't work for me, maybe it's a regression since abs_path wasn't there forever? The test that's supposed to cover this scenario looks disabled.

Lol there it is.

I was thinking about https://github.com/iovisor/bcc/pull/2817/files while reading this. There have definitely been a number of changes in this logic in BCC, so it's not surprising that this behavior has broken, but I was surprised because I was quite sure it worked in the past. The bcc that has the fix has long since been released (it is up to 0.17.0 now) so we should separately bump bcc and re-enable the test.

Great! I've created an issue for re-enabling the test #1638

If we split this up, we can also make an issue for re-enabling that regression test. Thanks for the great work here again, the extra regression test will keep this 💪

PR is now split, this PR contains the fixes + tests for shared libraries and #1637 contains the /proc mount ns fix. @dalehamel please review when you get a chance :)

usdt_probe_list probes;
for (auto const &usdt_probes : usdt_provider_cache[path])
for (auto const &path : usdt_pid_to_paths_cache[pid])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So we cache pid → path here, and then the existing logic of the path caching is used right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, exactly

@dalehamel
Copy link
Contributor

Thanks for splitting this up, it is easier to review now. Looks like you need to rebase for the changelog.md anyways.

@danobi since you have been into the USDT code more recently than me (and reviewed the other PR he split out of this) can you take a look as well?

@xdrop xdrop force-pushed the bugfix/usdt-probes-in-shared-lib branch from 88200da to b8564b2 Compare December 1, 2020 20:16
@xdrop
Copy link
Contributor Author

xdrop commented Dec 28, 2020

@danobi Have you had a chance to take a look? Right now I have to maintain my own fork to use bpftrace with USDT probes in shared libs, and it's not the most convenient option 😄 Let me know if I can be of any help

@fbs
Copy link
Contributor

fbs commented Jan 5, 2021

I hope to have this reviewed before the end of the week. Sorry for the delay

Copy link
Member

@danobi danobi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, really sorry about long delay.

Side note: all the global state in usdt.cpp really sucks. We should fix it at some point.

@xdrop
Copy link
Contributor Author

xdrop commented Jan 8, 2021

LGTM, really sorry about long delay.

No problem, thank you for reviewing. I will rebase to fix the changelog conflicts.

Side note: all the global state in usdt.cpp really sucks. We should fix it at some point.

Agreed.

@xdrop xdrop force-pushed the bugfix/usdt-probes-in-shared-lib branch from b8564b2 to 6c2b368 Compare January 8, 2021 18:42
@danobi
Copy link
Member

danobi commented Jan 12, 2021

ping @fbs

I'll merge in a few days if fbs is busy.

@fbs
Copy link
Contributor

fbs commented Jan 14, 2021

I'm not that familiar with the whole usdt part but it looks ok :)

The usdt bin paths may not necessarily correspond to the pids exe path
(they may be in a shared lib). This results in cache misses when the
tracepoint is in a shared library object and not in the main binary.

We instead chooses to keep track of the bin paths for each pid (provided
by bcc) and do a cache lookup based on those instead.
Adding runtime tests that ensure we can list and probe tracepoints not
present in the main binary, but rather in linked shared libraries.
@xdrop xdrop force-pushed the bugfix/usdt-probes-in-shared-lib branch from 6c2b368 to f04cece Compare January 14, 2021 19:27
@danobi danobi merged commit 3f0f13b into bpftrace:master Jan 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants