[TensorFlow] TF2.11 EC2 release #2381

Merged: 45 commits, Nov 30, 2022
Changes from 40 commits
Commits
45 commits
0ce84d9
Copy 2.10 -> 2.11, Updated BuildSpec for 2.11
ohadkatz Nov 2, 2022
e96cf3a
Removed SageMaker Build Phases from DockerFile
ohadkatz Nov 2, 2022
7b418ff
Updated EC2 Image DockerFiles with Latest Versions from Release Track…
ohadkatz Nov 2, 2022
43ff880
DLC Developer Config Updated
ohadkatz Nov 2, 2022
73e1714
Updated WHL's with RC2
ohadkatz Nov 2, 2022
29d4af4
Merge branch 'master' into tf2.11_ec2_release
ohadkatz Nov 2, 2022
de0da9a
Updated Binary
ohadkatz Nov 2, 2022
13b5ace
Upgraded horovod to latest version
ohadkatz Nov 4, 2022
e9556c7
Updated test safety check report
ohadkatz Nov 4, 2022
bb74501
Updated with RC2 install
ohadkatz Nov 7, 2022
f1405be
Merge branch 'master' into tf2.11_ec2_release
ohadkatz Nov 7, 2022
dd2c7bd
Updated with tf2.11 dep check
ohadkatz Nov 7, 2022
320e328
Reverted safety check commit
ohadkatz Nov 7, 2022
b510051
Merge branch 'master' into tf2.11_ec2_release
ohadkatz Nov 7, 2022
73d7645
Reverted TF Version for testing
ohadkatz Nov 7, 2022
6f05e8f
Debugging horovod test
ohadkatz Nov 8, 2022
529ceff
Added testing allowlist and updated horovod for debugging
ohadkatz Nov 8, 2022
6a8ec9f
Added legacy optimizers to support hvd
ohadkatz Nov 8, 2022
0424fef
Upgraded horovod to latest version
ohadkatz Nov 8, 2022
8e0e024
Merge branch 'master' into tf2.11_ec2_release
ohadkatz Nov 8, 2022
1ecf9d7
Reverted run_data_service_test
ohadkatz Nov 9, 2022
eaa5e7f
Merge branch 'master' into tf2.11_ec2_release
ohadkatz Nov 9, 2022
1504985
Revert horovod for testing
ohadkatz Nov 9, 2022
30952c6
Testing updates to DLAMI mapping
ohadkatz Nov 9, 2022
79fb4bd
Update horovod version
ohadkatz Nov 9, 2022
faf1278
Updated data service test
ohadkatz Nov 9, 2022
07952c4
Reverted AMI Instance types
ohadkatz Nov 10, 2022
951053a
Merge branch 'master' into tf2.11_ec2_release
ohadkatz Nov 10, 2022
46c2cfb
Merge branch 'master' into tf2.11_ec2_release
ohadkatz Nov 10, 2022
7c13fd8
Official Release of 2.11 Binary Addition
ohadkatz Nov 18, 2022
ad67602
Updated protobuf version due to incompatibility
ohadkatz Nov 18, 2022
ef06fab
Merge branch 'master' into tf2.11_ec2_release
ohadkatz Nov 18, 2022
430fdbf
Updated tensorflow testing with new protobuf requirement
ohadkatz Nov 18, 2022
8188ff9
Updated Protobuf Version
ohadkatz Nov 21, 2022
ff4718b
Merge branch 'master' into tf2.11_ec2_release
ohadkatz Nov 21, 2022
ea1855a
Merge branch 'master' into tf2.11_ec2_release
ohadkatz Nov 29, 2022
c1cfc72
Updated EC2 tf tests with latest master
ohadkatz Nov 29, 2022
a0a975d
Updated Tests for Backwards Compatibility
ohadkatz Nov 29, 2022
699771b
Updated logic for tests to update optimizers
ohadkatz Nov 29, 2022
495a227
Added Protobuf Upper Bound Comment
ohadkatz Nov 29, 2022
c5c7f37
Added same comments to GPU, updated developer config to test TF2.10 b…
ohadkatz Nov 29, 2022
f2683e5
TF2.10 Test
ohadkatz Nov 29, 2022
9326d3b
Addressed nit, protobuf restrictions for lower bound, and updated opt…
ohadkatz Nov 29, 2022
698797c
Revert Developer Config
ohadkatz Nov 30, 2022
1876143
Lower Bound set for TensorFlow Training
ohadkatz Nov 30, 2022
10 changes: 5 additions & 5 deletions dlc_developer_config.toml
@@ -27,19 +27,19 @@ benchmark_mode = false

 [build]
 # Frameworks for which you want to disable both builds and tests
-# available frameworks - ["autogluon", "huggingface_tensorflow", "huggingface_pytorch", "huggingface_tensorflow_trcomp", "huggingface_pytorch_trcomp", "pytorch_trcomp", "tensorflow", "mxnet", "pytorch"]
-skip_frameworks = []
+# available frameworks - ["autogluon", "huggingface_tensorflow", "huggingface_pytorch", "huggingface_tensorflow_trcomp", "huggingface_pytorch_trcomp", "tensorflow", "mxnet", "pytorch"]
+skip_frameworks = ["autogluon", "huggingface_tensorflow", "huggingface_pytorch", "huggingface_tensorflow_trcomp", "huggingface_pytorch_trcomp","mxnet", "pytorch"]
 # Set to false in order to remove datetime tag on PR builds
-datetime_tag = true
+datetime_tag = false
 # Note: Need to build the images at least once with datetime_tag = false
 # before disabling new builds, or tests will fail
 do_build = true

 [test]
 ### On by default
 sanity_tests = true
-safety_check_test = false
-ecr_scan_allowlist_feature = false
+safety_check_test = true
+ecr_scan_allowlist_feature = true
 ecs_tests = true
 eks_tests = true
 ec2_tests = true
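
As a rough illustration of what the longer `skip_frameworks` list in this hunk implies, the sketch below subtracts the skipped frameworks from the "available frameworks" comment. The function name `frameworks_to_build` and the validation step are hypothetical and are not taken from the repository's build tooling; the two lists are copied from the config lines above.

```python
# Hypothetical helper, not part of the repository: shows which frameworks
# remain enabled once skip_frameworks is applied.

# Copied from the "available frameworks" comment in dlc_developer_config.toml.
AVAILABLE_FRAMEWORKS = [
    "autogluon", "huggingface_tensorflow", "huggingface_pytorch",
    "huggingface_tensorflow_trcomp", "huggingface_pytorch_trcomp",
    "tensorflow", "mxnet", "pytorch",
]

# Copied from the non-empty skip_frameworks value in the diff.
SKIP_FRAMEWORKS = [
    "autogluon", "huggingface_tensorflow", "huggingface_pytorch",
    "huggingface_tensorflow_trcomp", "huggingface_pytorch_trcomp",
    "mxnet", "pytorch",
]


def frameworks_to_build(available, skipped):
    """Return the available frameworks that are not skipped, in order."""
    unknown = set(skipped) - set(available)
    if unknown:
        # Assumed behaviour for this sketch: reject names not in the list.
        raise ValueError(f"unknown frameworks in skip list: {sorted(unknown)}")
    return [name for name in available if name not in skipped]


print(frameworks_to_build(AVAILABLE_FRAMEWORKS, SKIP_FRAMEWORKS))
# prints ['tensorflow']
```

With that skip list only `tensorflow` is left enabled, which is consistent with a PR that touches only the TensorFlow images.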
70 changes: 35 additions & 35 deletions tensorflow/buildspec.yml
@@ -1,8 +1,8 @@
 account_id: &ACCOUNT_ID <set-$ACCOUNT_ID-in-environment>
 region: &REGION <set-$REGION-in-environment>
 framework: &FRAMEWORK tensorflow
-version: &VERSION 2.10.0
-short_version: &SHORT_VERSION "2.10"
+version: &VERSION 2.11.0
+short_version: &SHORT_VERSION "2.11"
 arch_type: x86

 repository_info:
@@ -92,39 +92,39 @@ images:
     context:
       <<: *TRAINING_CONTEXT

-  BuildTensorflowSageMakerGpuPy39Cu112TrainingDockerImage:
-    <<: *TRAINING_REPOSITORY
-    build: &TENSORFLOW_GPU_TRAINING_PY3 false
-    image_size_baseline: &IMAGE_SIZE_BASELINE 11000
-    device_type: &DEVICE_TYPE gpu
-    python_version: &DOCKER_PYTHON_VERSION py3
-    tag_python_version: &TAG_PYTHON_VERSION py39
-    cuda_version: &CUDA_VERSION cu112
-    os_version: &OS_VERSION ubuntu20.04
-    tag: !join [ *VERSION, "-", *DEVICE_TYPE, "-", *TAG_PYTHON_VERSION, "-", *CUDA_VERSION,
-      "-", *OS_VERSION, "-sagemaker" ]
-    docker_file: !join [ docker/, *SHORT_VERSION, /, *DOCKER_PYTHON_VERSION, /, *CUDA_VERSION,
-      /Dockerfile., *DEVICE_TYPE ]
-    target: sagemaker
-    enable_test_promotion: true
-    context:
-      <<: *TRAINING_CONTEXT
-  BuildTensorflowSageMakerCpuPy39TrainingDockerImage:
-    <<: *TRAINING_REPOSITORY
-    build: &TENSORFLOW_CPU_TRAINING_PY3 false
-    image_size_baseline: &IMAGE_SIZE_BASELINE 7500
-    device_type: &DEVICE_TYPE cpu
-    python_version: &DOCKER_PYTHON_VERSION py3
-    tag_python_version: &TAG_PYTHON_VERSION py39
-    os_version: &OS_VERSION ubuntu20.04
-    tag: !join [ *VERSION, "-", *DEVICE_TYPE, "-", *TAG_PYTHON_VERSION, "-", *OS_VERSION, "-sagemaker"
-      ]
-    docker_file: !join [ docker/, *SHORT_VERSION, /, *DOCKER_PYTHON_VERSION, /Dockerfile.,
-      *DEVICE_TYPE ]
-    target: sagemaker
-    enable_test_promotion: true
-    context:
-      <<: *TRAINING_CONTEXT
+  # BuildTensorflowSageMakerGpuPy39Cu112TrainingDockerImage:
+    # <<: *TRAINING_REPOSITORY
+    # build: &TENSORFLOW_GPU_TRAINING_PY3 false
+    # image_size_baseline: &IMAGE_SIZE_BASELINE 11000
+    # device_type: &DEVICE_TYPE gpu
+    # python_version: &DOCKER_PYTHON_VERSION py3
+    # tag_python_version: &TAG_PYTHON_VERSION py39
+    # cuda_version: &CUDA_VERSION cu112
+    # os_version: &OS_VERSION ubuntu20.04
+    # tag: !join [ *VERSION, "-", *DEVICE_TYPE, "-", *TAG_PYTHON_VERSION, "-", *CUDA_VERSION,
+      # "-", *OS_VERSION, "-sagemaker" ]
+    # docker_file: !join [ docker/, *SHORT_VERSION, /, *DOCKER_PYTHON_VERSION, /, *CUDA_VERSION,
+      # /Dockerfile., *DEVICE_TYPE ]
+    # target: sagemaker
+    # enable_test_promotion: true
+    # context:
+      # <<: *TRAINING_CONTEXT
+  # BuildTensorflowSageMakerCpuPy39TrainingDockerImage:
+    # <<: *TRAINING_REPOSITORY
+    # build: &TENSORFLOW_CPU_TRAINING_PY3 false
+    # image_size_baseline: &IMAGE_SIZE_BASELINE 7500
+    # device_type: &DEVICE_TYPE cpu
+    # python_version: &DOCKER_PYTHON_VERSION py3
+    # tag_python_version: &TAG_PYTHON_VERSION py39
+    # os_version: &OS_VERSION ubuntu20.04
+    # tag: !join [ *VERSION, "-", *DEVICE_TYPE, "-", *TAG_PYTHON_VERSION, "-", *OS_VERSION, "-sagemaker"
+      # ]
+    # docker_file: !join [ docker/, *SHORT_VERSION, /, *DOCKER_PYTHON_VERSION, /Dockerfile.,
+      # *DEVICE_TYPE ]
+    # target: sagemaker
+    # enable_test_promotion: true
+    # context:
+      # <<: *TRAINING_CONTEXT

   # BuildEC2TensorflowCPUInferencePy3DockerImage:
     # <<: *INFERENCE_REPOSITORY
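
The SageMaker image entries in this hunk build their `tag` and `docker_file` values with a custom `!join` YAML tag. Its constructor is not shown in this diff, so the sketch below assumes it simply concatenates its arguments in order. The `join` helper is hypothetical, and the values are copied from the anchors in `tensorflow/buildspec.yml` after the version bump to 2.11.

```python
# Assumption: the buildspec's `!join` tag concatenates its arguments in order.
# The real constructor lives in the repository's build tooling and is not
# shown in this diff; `join` below is a hypothetical stand-in.

def join(parts):
    """Concatenate the resolved YAML values into a single string."""
    return "".join(str(part) for part in parts)


# Anchor values copied from the GPU SageMaker training entry.
VERSION = "2.11.0"
SHORT_VERSION = "2.11"
DEVICE_TYPE = "gpu"
DOCKER_PYTHON_VERSION = "py3"
TAG_PYTHON_VERSION = "py39"
CUDA_VERSION = "cu112"
OS_VERSION = "ubuntu20.04"

tag = join([VERSION, "-", DEVICE_TYPE, "-", TAG_PYTHON_VERSION, "-",
            CUDA_VERSION, "-", OS_VERSION, "-sagemaker"])
docker_file = join(["docker/", SHORT_VERSION, "/", DOCKER_PYTHON_VERSION, "/",
                    CUDA_VERSION, "/Dockerfile.", DEVICE_TYPE])

print(tag)          # prints 2.11.0-gpu-py39-cu112-ubuntu20.04-sagemaker
print(docker_file)  # prints docker/2.11/py3/cu112/Dockerfile.gpu
```

Under that assumption, bumping `VERSION` and `SHORT_VERSION` at the top of the file is enough to change every derived tag and Dockerfile path, which is why the first hunk is only two lines.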