patch setup.py for grpcio extension in TensorFlow 2.13.0 easyconfigs to take into account alternate sysroot #19268

Merged

Conversation

boegel
Member

@boegel boegel commented Nov 20, 2023

(created using eb --new-pr)

Fix for a fatal error when building the grpcio extension in TensorFlow 2.13.0 with an alternate sysroot (the --sysroot EasyBuild configuration option, as we use in EESSI):

  /usr/include/stdio.h:781:10: fatal error: bits/sys_errlist.h: No such file or directory
    781 | #include <bits/sys_errlist.h>
        |          ^~~~~~~~~~~~~~~~~~~~
  compilation terminated.
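
For illustration only, the kind of change involved can be sketched as below. This is not the actual patch from this PR: the `EXAMPLE_SYSROOT` environment variable and the `sysroot_include_dir` helper are hypothetical names, and the sketch only assumes that a hardcoded `/usr/include` path in a `setup.py` needs the sysroot prepended so that headers from the host OS are not picked up.

```python
import os


def sysroot_include_dir(subdir="", sysroot=None):
    """Return <sysroot>/usr/include[/subdir]; fall back to /usr/include without a sysroot."""
    if sysroot is None:
        # EXAMPLE_SYSROOT is a hypothetical variable name, used for illustration only
        sysroot = os.environ.get("EXAMPLE_SYSROOT", "")
    # os.path.join discards earlier components when a later one is absolute,
    # so the path is built from relative components after the root
    parts = [sysroot or "/", "usr", "include"]
    if subdir:
        parts.append(subdir)
    return os.path.join(*parts)
```

With an empty sysroot the helper returns the usual host path, so a regular build would be unaffected.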

@boegel boegel added bug fix EESSI Related to EESSI project labels Nov 20, 2023
@boegel boegel added this to the next release (4.9.0?) milestone Nov 20, 2023
@boegel
Member Author

boegel commented Nov 20, 2023

@boegelbot please test @ generoso
CORE_CNT=16

@boegel boegel changed the title from "patch setup.py for grpcio extension in TensorFlow 2.13.0 easyconfigs to take into alternate sysroot" to "patch setup.py for grpcio extension in TensorFlow 2.13.0 easyconfigs to take into account alternate sysroot" Nov 20, 2023
@boegel boegel force-pushed the 20231120154028_new_pr_TensorFlow2130 branch from 3867c2c to e9b34d5 Compare November 20, 2023 14:44
@boegelbot
Collaborator

@boegel: Request for testing this PR well received on login1

PR test command 'EB_PR=19268 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_19268 --ntasks="16" ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12195

Test results coming soon (I hope)...

- notification for comment with ID 1819198073 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
cnx1 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/c53dffc1e5e97924c16ae08c453b2315 for a full test report.

@boegel
Member Author

boegel commented Nov 20, 2023

@boegelbot please test @ jsc-zen2
CORE_CNT=16

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on jsczen2l1.int.jsc-zen2.easybuild-test.cluster

PR test command 'EB_PR=19268 EB_ARGS= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --mem-per-cpu=4000M --job-name test_PR_19268 --ntasks="16" ~/boegelbot/eb_from_pr_upload_jsc-zen2.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 3763

Test results coming soon (I hope)...

- notification for comment with ID 1819731914 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 1 out of 2 (2 easyconfigs in total)
jsczen2g1.int.jsc-zen2.easybuild-test.cluster - Linux Rocky Linux 8.5, x86_64, AMD EPYC 7742 64-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/boegelbot/91e151c9d2681894e869fd3b66d6ec35 for a full test report.

@boegel
Member Author

boegel commented Nov 21, 2023

Test report by @boegel
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
node3144.skitty.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz (skylake_avx512), Python 3.6.8
See https://gist.github.com/boegel/200e3aa081640fcc4e4b07696e6185ae for a full test report.

@boegel
Member Author

boegel commented Nov 21, 2023

The error on jsc-zen2 is fallout from rebuilding pybind11 in #19270:

In file included from tensorflow/python/lite/toco_python_api_wrapper.cc:19:
/project/def-maintainers/boegelbot/Rocky8/zen2/software/pybind11/2.10.3-GCCcore-12.2.0/include/pybind11/pybind11.h:13:10: fatal error: detail/class.h: No such file or directory
   13 | #include "detail/class.h"
      |          ^~~~~~~~~~~~~~~~
compilation terminated.

@boegel
Member Author

boegel commented Nov 21, 2023

@boegelbot please test @ jsc-zen2
CORE_CNT=16

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on jsczen2l1.int.jsc-zen2.easybuild-test.cluster

PR test command 'EB_PR=19268 EB_ARGS= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --mem-per-cpu=4000M --job-name test_PR_19268 --ntasks="16" ~/boegelbot/eb_from_pr_upload_jsc-zen2.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 3765

Test results coming soon (I hope)...

- notification for comment with ID 1820332848 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
jsczen2g1.int.jsc-zen2.easybuild-test.cluster - Linux Rocky Linux 8.5, x86_64, AMD EPYC 7742 64-Core Processor (zen2), Python 3.6.8
See https://gist.github.com/boegelbot/7fa527a94757a15c12da463b571cfb8f for a full test report.

@boegel
Member Author

boegel commented Nov 22, 2023

@boegelbot please test @ generoso
CORE_CNT=16

@boegelbot
Collaborator

@boegel: Request for testing this PR well received on login1

PR test command 'EB_PR=19268 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_19268 --ntasks="16" ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 12210

Test results coming soon (I hope)...

- notification for comment with ID 1822182066 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 2 out of 2 (2 easyconfigs in total)
cnx1 - Linux Rocky Linux 8.5, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/38bf3631d882df302ecf53cd2365a141 for a full test report.

@verdurin
Member

@boegel running a test build of this now and I saw this:
WARNING: 2 TensorFlow dependencies have not been resolved by EasyBuild. Check the log for details.

Is that expected?

@verdurin
Member

Test report by @verdurin
SUCCESS
Build succeeded for 18 out of 18 (2 easyconfigs in total)
easybuild-c7.novalocal - Linux CentOS Linux 7.9.2009, x86_64, Intel Xeon Processor (Skylake, IBRS), Python 3.6.8
See https://gist.github.com/verdurin/4d116816f75d36ee321742e2bd3a0da8 for a full test report.

@boegel
Member Author

boegel commented Nov 26, 2023

> @boegel running a test build of this now and I saw this: WARNING: 2 TensorFlow dependencies have not been resolved by EasyBuild. Check the log for details.
>
> Is that expected?

@Flamefire Thoughts on this?
There's no way that the changes proposed here introduce that problem...

@Flamefire
Contributor

> @Flamefire Thoughts on this? There's no way that the changes proposed here introduce that problem...

Maybe we should enhance the message to name the dependencies (possibly limited to a max of 3) and/or provide a hint on what to search for in the log.

One of those 2 is protobuf, which isn't compatible (see the comment in the EC), and the other is likely Abseil, which had an issue. IIRC it was something about RE2, grpcio and TF using different incarnations of Abseil (the C++ variant), which made me give up trying to get all of those to work with an EB-installed Abseil-cpp.
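
As a rough sketch of the enhanced message suggested above (naming at most 3 of the unresolved dependencies): the function name `unresolved_deps_warning` and the exact wording are assumptions for illustration, not existing EasyBuild code, and the dependency names in the usage note are placeholders.

```python
def unresolved_deps_warning(software, deps, max_named=3):
    """Build a warning that names up to max_named unresolved dependencies."""
    deps = sorted(deps)
    # name only the first few dependencies to keep the message short
    named = ", ".join(deps[:max_named])
    extra = len(deps) - max_named
    if extra > 0:
        named += " (and %d more)" % extra
    return ("WARNING: %d %s dependencies have not been resolved by EasyBuild: %s. "
            "Check the log for details." % (len(deps), software, named))
```

Calling `unresolved_deps_warning("TensorFlow", ["protobuf", "abseil"])` would then name both dependencies in the warning instead of only reporting a count.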

Member

@verdurin verdurin left a comment

Looks fine.

@verdurin verdurin merged commit bdae6cc into easybuilders:develop Nov 28, 2023
9 checks passed
@boegel boegel deleted the 20231120154028_new_pr_TensorFlow2130 branch November 28, 2023 21:18
@boegel
Member Author

boegel commented Dec 21, 2023

Test report by @boegel
Using easyblocks from PR(s) easybuilders/easybuild-easyblocks#3058
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
node4112.gallade.os - Linux Debian GNU/Linux 11 (bullseye), x86_64, AMD EPYC 7773X 64-Core Processor, Python 3.11.4
See https://gist.github.com/boegel/4b836778002cad16e1233f824e829316 for a full test report.

edit: this was tested in EESSI 2023.06 build environment, to verify the changes in easybuilders/easybuild-easyblocks#3058

Labels
bug fix EESSI Related to EESSI project

5 participants