[Bug]: Missing constructor for absl::synchronization_internal::KernelTimeout in shared builds #1630

Open
h-vetinari opened this issue Feb 27, 2024 · 2 comments


@h-vetinari
Contributor

Describe the issue

While building tensorflow against the new abseil 20240116.1, we ran into the following linker error:

```
# Execution platform: @local_execution_config_platform//:platform
/home/conda/feedstock_root/build_artifacts/debug_1708918309173/_build_env/bin/../lib/gcc/x86_64-conda-linux-gnu/12.3.0/../../../../x86_64-conda-linux-gnu/bin/ld: bazel-out/k8-opt/bin/external/local_xla/xla/service/libslow_operation_alarm.pic.a(slow_operation_alarm.pic.o): in function `xla::SlowOperationAlarm::AlarmLoop()':
slow_operation_alarm.cc:(.text._ZN3xla18SlowOperationAlarm9AlarmLoopEv+0x21c): undefined reference to `absl::lts_20240116::synchronization_internal::KernelTimeout::KernelTimeout(absl::lts_20240116::Time)'
```

While it's clear where this symbol lives, we cannot depend on the absl/synchronization:kernel_timeout_internal target directly, because it has private visibility.

However, after realizing that tensorflow never explicitly uses KernelTimeout in its codebase, I looked back at abseil. I think this is a similar situation to #1624, because AFAICT it concerns absl::Mutex and inlining (1cf6469).

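In particular, some absl::Mutex methods are defined inline in the header and construct a synchronization_internal::KernelTimeout. The reference to that constructor therefore ends up in the caller's object file, and in a shared build it must be resolved against the abseil shared library, e.g.: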

```cpp
  bool AwaitWithTimeout(const Condition& cond, absl::Duration timeout) {
    return AwaitCommon(cond, synchronization_internal::KernelTimeout{timeout});
  }
  bool AwaitWithDeadline(const Condition& cond, absl::Time deadline) {
    return AwaitCommon(cond, synchronization_internal::KernelTimeout{deadline});
  }
```

Since that constructor cannot be found at link time (due to visibility), the build fails.
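The following minimal single-file C++ sketch illustrates the mechanism. All names (`lib::internal::Timeout`, `lib::Lock`) are hypothetical stand-ins for `synchronization_internal::KernelTimeout` and `absl::Mutex`. They are not abseil's actual code. The sketch shows how a public member function that is defined inline and constructs an internal type makes every caller depend on that type's out-of-line constructor, even though the caller never names the type.

```cpp
// Hypothetical sketch: lib::internal::Timeout stands in for
// synchronization_internal::KernelTimeout, lib::Lock for absl::Mutex.
#include <cassert>
#include <chrono>

namespace lib {
namespace internal {

// "Private" helper type: the public header only declares its constructor.
class Timeout {
 public:
  explicit Timeout(std::chrono::milliseconds d);  // defined out of line below
  long long ms() const { return ms_; }

 private:
  long long ms_;
};

}  // namespace internal

// "Public" header type. AwaitWithTimeout is defined inline in the class body,
// so each caller's object file gets a direct reference to
// internal::Timeout::Timeout(std::chrono::milliseconds).
class Lock {
 public:
  bool AwaitWithTimeout(std::chrono::milliseconds d) {
    return AwaitCommon(internal::Timeout{d});
  }

 private:
  // Placeholder body; only the call to the Timeout constructor matters here.
  bool AwaitCommon(internal::Timeout t) { return t.ms() >= 0; }
};

// In a real split this definition lives in the library's .cc file. If the
// shared library does not export it, linking any caller of
// Lock::AwaitWithTimeout fails with an undefined reference to it.
namespace internal {
Timeout::Timeout(std::chrono::milliseconds d) : ms_(d.count()) {}
}  // namespace internal

}  // namespace lib
```

A caller such as `lib::Lock{}.AwaitWithTimeout(std::chrono::milliseconds(5))` uses only the public API but still needs `lib::internal::Timeout`'s constructor at link time. That is analogous to `xla::SlowOperationAlarm::AlarmLoop()` needing `KernelTimeout::KernelTimeout(absl::Time)` in the error above.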

Steps to reproduce the problem

Build tensorflow against abseil 20240116.1

What version of Abseil are you using?

20240116.1

What operating system and version are you using?

Linux

What compiler and version are you using?

GCC 12, nvcc 12.0

What build system are you using?

bazel

Additional context

No response

@derekmauro
Member

Does the patch proposed in #1624 (comment) fix the problem? That would be a hint that this is actually the same issue as #1624.

@h-vetinari
Contributor Author

> Does the patch proposed in #1624 (comment) fix the problem? That would be a hint that this is actually the same issue as #1624.

We ran into this problem with the patched abseil. There are three possibilities as far as I can see:

  • the patch is independent of the bug
  • the patch uncovered the bug
  • the patch introduced the bug

I still think it's very closely related, because of the way constructors/destructors are (apparently) missing from the shared library.

However, it is not the same issue, in the sense that the tensorflow builds do set NDEBUG.
