[TensorFlow] TF2.11 EC2 release #2381
Please update the PR description.
ARG OPEN_MPI_PATH=/opt/amazon/openmpi
ARG EFA_PATH=/opt/amazon/efa
ARG EFA_VERSION=1.17.2
Why 1.17? Could we get confirmation from other stakeholders that rely on the EFA installed within the TF DLC that this version works for them?
We had agreed to keep EFA as is, to match TF2.10. There has been no pushback on this EFA version.
# The 'apt-get install' of nvinfer-runtime-trt-repo-ubuntu1804-5.0.2-ga-cuda10.0
# adds a new list which contains libnvinfer library, so it needs another
# 'apt-get update' to retrieve that list before it can actually install the
# library.
# We don't install libnvinfer-dev since we don't need to build against TensorRT,
# and libnvinfer4 doesn't contain libnvinfer.a static library.
# nvinfer-runtime-trt-repo doesn't have a 1804-cuda10.1 version yet. see:
# https://developer.download.nvidia.cn/compute/machine-learning/repos/ubuntu1804/x86_64/
Is this still applicable after having moved to TF 2.11 and CUDA 11.2?
Is this still needed for TF2.11 if it is for CUDA 10?
Just an avenue for further exploration. Not a blocker to merging this PR.
GitHub Issue #, if available:
Note: If merging this PR should also close the associated Issue, please also add that Issue # to the Linked Issues section on the right.
Description
This PR releases the EC2 TF2.11 images for training. It does not release SM, as that is a separate Currency Release.
PR Checklist
Benchmark Testing Checklist
dlc_developer_config.toml
in my PR branch by setting benchmark_mode = true
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.