TGI 1.0.3 Release #3285
Conversation
@haixiw is there a plan to merge this PR in the near future? Is there a possibility of accessing the image before the official release? I was checking the image and saw that the PR awslabs/llm-hosting-container#30 has already been approved.
Yeah, we are going to release this in a week or so. Accessing the image before the official release is not allowed by the process, but let me know if it's urgently needed. Thanks!
It would be great to have early access (EAP). I created an AWS support case for the same, but generally speaking earlier access would be helpful. We found a workaround, but we may hit further limitations and be forced to use TGI 1.0.3, in which case I will reach out again.
…3304)
* update torchserve to 0.8.2
* retry failed tests
* add allowlist for torch CVE
* Trigger Build
* trigger x86 tests
* set datetime_tag = true
* fix allowlists and run only sanity test
* fix testTorchdata and only run ec2 test
* revert testrunner.py and run whole ec2 test
* revert graviton buildspec and test config

Co-authored-by: Sally Seok <sallyseo@amazon.com>
Co-authored-by: Tejas Chumbalkar <34728580+tejaschumbalkar@users.noreply.github.com>
… 0.8.2 (#3316)
* update torchserve to 0.8.2 and add allowlists
* revert test config

Co-authored-by: Sally Seok <sallyseo@amazon.com>
Co-authored-by: arjkesh <33526713+arjkesh@users.noreply.github.com>
…for Neuron release 2.13.2 (#3314)
* Add Tensorflow inference images for release 2.13.2
* Fix file not updated and vulns
* Fix ignore_ids json
* Whitelist more scipy and grpcio vulns
* Whitelist grpcio
* Add collectives lib to neuronx image
* Revert develop config

Co-authored-by: Sally Seok <sallyseo@amazon.com>
New PR: #3323
GitHub Issue #, if available:
Note:
If merging this PR should also close the associated Issue, please also add that Issue # to the Linked Issues section on the right.
All PRs are checked weekly for staleness. This PR will be closed if not updated in 30 days.
Description
Modifying the TGI 1.0.2 release images in preparation for the release.
Tests run
NOTE: By default, docker builds are disabled. In order to build your container, please update dlc_developer_config.toml and specify the framework to build in "build_frameworks"
NOTE: If you are creating a PR for a new framework version, please ensure success of the standard, rc, and efa sagemaker remote tests by updating the dlc_developer_config.toml file:
sagemaker_remote_tests = "standard"
sagemaker_remote_tests = "rc"
sagemaker_remote_tests = "efa"
Additionally, please run the sagemaker local tests in at least one revision:
sagemaker_local_tests = true
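Taken together, the notes above correspond to a dlc_developer_config.toml fragment along these lines (a sketch only; the "huggingface_tgi" framework name is an illustrative assumption, and the remote-test value should be cycled through "standard", "rc", and "efa" across revisions):

```toml
# Sketch of the dlc_developer_config.toml settings described above.
# Framework name is an assumption for illustration purposes.
build_frameworks = ["huggingface_tgi"]

# Run one revision each with "standard", "rc", and "efa".
sagemaker_remote_tests = "standard"

# Enable local SageMaker tests in at least one revision.
sagemaker_local_tests = true
```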
Formatting
* I have run black -l 100 on my code (formatting tool: https://black.readthedocs.io/en/stable/getting_started.html)
* DLC image/dockerfile
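As a quick sketch of the formatting step above (assuming black is installed, e.g. via pip install black; the sample file path is illustrative):

```shell
# Create a small sample file and format it with a 100-character line limit.
printf 'x=1\n' > /tmp/black_sample.py
python -m black -l 100 /tmp/black_sample.py 2>/dev/null \
  || echo "black not installed; run: pip install black"
cat /tmp/black_sample.py
```

In a real PR you would run black -l 100 over the files you changed rather than a temp file.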
Additional context
PR Checklist
Pytest Marker Checklist
* I have added the marker @pytest.mark.model("<model-type>") to the new tests which I have added, to specify the Deep Learning model that is used in the test (use "N/A" if the test doesn't use a model)
* I have added the marker @pytest.mark.integration("<feature-being-tested>") to the new tests which I have added, to specify the feature that will be tested
* I have added the marker @pytest.mark.multinode(<integer-num-nodes>) to the new tests which I have added, to specify the number of nodes used on a multi-node test
* I have added the marker @pytest.mark.processor(<"cpu"/"gpu"/"eia"/"neuron">) to the new tests which I have added, if a test is specifically applicable to only one processor type
EIA/NEURON/GRAVITON Testing Checklist
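To illustrate the markers described above, a new test might look like the following sketch; the test name, integration feature, and processor value are hypothetical, not taken from this PR:

```python
import pytest


# Hypothetical test carrying the markers from the checklist above.
# "N/A" indicates no Deep Learning model is used by this test.
@pytest.mark.model("N/A")
@pytest.mark.integration("tgi_serving")
@pytest.mark.processor("gpu")
def test_container_health_check():
    # A real test would launch the container and probe its health endpoint.
    assert True
```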
* I have modified dlc_developer_config.toml in my PR branch by setting ei_mode = true, neuron_mode = true, or graviton_mode = true
Benchmark Testing Checklist
* I have modified dlc_developer_config.toml in my PR branch by setting benchmark_mode = true
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.