Member

ScottTodd commented Jul 9, 2025

Progress on #827 (!!)

Tested at https://github.com/ROCm/TheRock/actions/runs/16185309410, which triggered these:

Some next steps after this, roughly in order:


This depends on some other open PRs, but I have tested parts of it all the way through to dev wheel publishing and then installation and testing on a dev machine: https://github.com/ROCm/TheRock/actions/runs/16181411431/job/45678668536, #827 (comment).

See the other PRs:

Comment on lines 19 to 20
# TODO(scotttodd): add schedule once working
# schedule:
Member

FYI, I dropped running on schedule from the Linux workflow and instead trigger this workflow from the release portable Linux packages workflow.
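The thread does not say how the release workflow starts the PyTorch one. One possible shape, sketched here, assumes the GitHub CLI command `gh workflow run` and a job token that is allowed to dispatch workflows. The job name, step name, target file name, and forwarded input are placeholders, not taken from the repository, and the target workflow would need a `workflow_dispatch` trigger.

```yaml
# Sketch only: a job in the release workflow that dispatches the
# wheel-building workflow instead of relying on a `schedule:` trigger.
jobs:
  trigger_pytorch_wheels:  # hypothetical job name
    runs-on: ubuntu-latest
    permissions:
      actions: write  # lets the job token dispatch workflows
    steps:
      - name: Trigger PyTorch wheel build  # hypothetical step name
        env:
          GH_TOKEN: ${{ github.token }}
        run: |
          # Placeholder target file; it must declare `on: workflow_dispatch`.
          gh workflow run build_linux_pytorch_wheels.yml \
            --repo "${{ github.repository }}" \
            --ref "${{ github.ref_name }}" \
            -f amdgpu_family="${{ inputs.amdgpu_family }}"
```

Chaining like this means the wheel build only starts after the packages it depends on have been published, which a fixed schedule cannot guarantee.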

Member Author

Good catch, I'll make those updates. I had been testing exclusively with workflow_dispatch.

Member Author

Copied over changes from #983 to these files. I'll want to test before marking this ready for review/merge... once the dependent PRs are merged.

Comment on lines 43 to 45
release_type: ${{ inputs.release_type || 'nightly' }}
s3_subdir: ${{ inputs.s3_subdir || 'v2' }}
cloudfront_url: ${{ inputs.cloudfront_url || 'd2awnip2yjpvqn.cloudfront.net/v2' }}
Member

The defaults can be dropped if not planning to run on schedule.
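A minimal sketch of why the fallbacks become redundant, assuming the workflow is only ever started with inputs (for example through `workflow_dispatch`). On a `schedule` event the `inputs` context is empty, which is the case where `inputs.x || 'fallback'` actually changes the result. The input names and values come from the snippet above; the descriptions and the `show_inputs` job are made up for illustration.

```yaml
# Sketch only: trigger-level defaults take the place of the `|| '...'` fallbacks.
on:
  workflow_dispatch:
    inputs:
      release_type:
        description: Release type  # hypothetical wording
        type: string
        default: nightly
      s3_subdir:
        description: S3 subdirectory  # hypothetical wording
        type: string
        default: v2
      cloudfront_url:
        description: CloudFront base URL  # hypothetical wording
        type: string
        default: d2awnip2yjpvqn.cloudfront.net/v2

jobs:
  show_inputs:  # hypothetical job, for illustration only
    runs-on: ubuntu-latest
    steps:
      - name: Print resolved inputs
        run: |
          echo "release_type=${{ inputs.release_type }}"
          echo "s3_subdir=${{ inputs.s3_subdir }}"
          echo "cloudfront_url=${{ inputs.cloudfront_url }}"
```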

Collaborator

stellaraccident left a comment

build_prod_wheels changes look fine to me. Pipeline changes as well, but I am not the best reviewer on those.

Member Author

ScottTodd commented Jul 10, 2025

Dependent PRs are all merged now. I'll trigger a dev release of the full pipeline, work through any remaining workflow issues, then mark ready for review hopefully tomorrow.

edit: https://github.com/ROCm/TheRock/actions/runs/16182932309 is queued up now

Previously I had those in a standalone step.
Member Author

Ready for review now!

The test dev packages build at https://github.com/ROCm/TheRock/actions/runs/16185309410 succeeded and then triggered these follow-on workflow runs, which also succeeded:

I now see wheels on the dev index pages like https://d25kgig7rdsyks.cloudfront.net/v2/gfx110X-dgpu/torch/ for python 3.11, 3.12, and 3.13. I'll sanity check those shortly.

ScottTodd marked this pull request as ready for review July 10, 2025 14:38
ScottTodd requested a review from marbre July 10, 2025 14:38
Member Author

Sanity check on gfx1100 passed, except for this known issue where the smoketests do not terminate after passing: #999

Member

marbre left a comment

LGTM, nice to see this landing!

shell: cmd
run: |
  echo "Building PyTorch wheels for ${{ inputs.amdgpu_family }}"
  python ./external-builds/pytorch/build_prod_wheels.py build --install-rocm --index-url "https://${{ inputs.cloudfront_url }}/${{ inputs.amdgpu_family }}/" --pytorch-dir ${{ env.CHECKOUT_ROOT }}/torch --clean --output-dir ${{ env.PACKAGE_DIST_DIR }} ${{ env.optional_build_prod_arguments }}
Member

Suggested change
- python ./external-builds/pytorch/build_prod_wheels.py build --install-rocm --index-url "https://${{ inputs.cloudfront_url }}/${{ inputs.amdgpu_family }}/" --pytorch-dir ${{ env.CHECKOUT_ROOT }}/torch --clean --output-dir ${{ env.PACKAGE_DIST_DIR }} ${{ env.optional_build_prod_arguments }}
+ python ./external-builds/pytorch/build_prod_wheels.py \
+   build \
+   --install-rocm \
+   --index-url "https://${{ inputs.cloudfront_url }}/${{ inputs.amdgpu_family }}/" \
+   --pytorch-dir ${{ env.CHECKOUT_ROOT }}/torch \
+   --clean \
+   --output-dir ${{ env.PACKAGE_DIST_DIR }} ${{ env.optional_build_prod_arguments }}

This would follow the style in .github/workflows/build_linux_pytorch_wheels.yml and might be easier to read.

Member Author

I wish... the single line is intentional given that the shell is cmd and not bash.

I think I can use ^ instead though...

https://stackoverflow.com/questions/69068/split-long-commands-in-multiple-lines-through-windows-batch-file

Member

Oh, are you setting shell: bash for the entire job above, or not?

Member Author

ScottTodd Jul 10, 2025

I'm switching between cmd and bash throughout the file 🙈. When I build locally I use cmd exclusively but I had some trouble with each shell for different steps on the CI runners. I'd like to go through and choose one consistently in the workflow... after checkpointing and getting the release train rolling.

Member

Yeah, I see... you could have dropped shell: cmd here to make it work, but I see that this makes copying over and testing locally unnecessarily more difficult. Whatever works best for you :)

Member Author

Using bash here would allow the \ characters to split the command across multiple lines, but the build process fails under bash (on my local machine and CI: #827 (comment)). That should be fixable, but it's a battle for another day.

It might be easier to get the rest of the workflow using cmd.

Member Author

I'll try with ^ and add some breadcrumb comments where the shell is load bearing
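A rough sketch of what the `^` variant could look like, reusing the command from the diff above. In cmd, a caret as the very last character of a line continues the command on the next line, so trailing whitespace after `^` would break the continuation. The step name and the wording of the breadcrumb comment are assumptions.

```yaml
# Sketch only: the same build command split across lines for cmd.
- name: Build PyTorch wheels  # hypothetical step name
  # NOTE: cmd is load bearing here. The build currently fails under bash,
  # so line continuations use `^` rather than `\`.
  shell: cmd
  run: |
    echo "Building PyTorch wheels for ${{ inputs.amdgpu_family }}"
    python ./external-builds/pytorch/build_prod_wheels.py ^
      build ^
      --install-rocm ^
      --index-url "https://${{ inputs.cloudfront_url }}/${{ inputs.amdgpu_family }}/" ^
      --pytorch-dir ${{ env.CHECKOUT_ROOT }}/torch ^
      --clean ^
      --output-dir ${{ env.PACKAGE_DIST_DIR }} ${{ env.optional_build_prod_arguments }}
```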


- python_version: ["3.11", "3.12"]
+ python_version: ["3.11", "3.12", "3.13"]

uses: ./.github/workflows/build_windows_pytorch_wheels.yml
Member

I recently revised this workflow further (PR #998), and except for this particular line the workflows are identical. Maybe we could merge them and add an input parameter that selects whether to build for Linux or Windows?
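A minimal sketch of such a merge, assuming a `workflow_dispatch` trigger. Job-level `uses:` does not accept expressions, so the platform input gates two jobs with `if:` instead of switching the path. The `platform` input, the job names, and the `with:` parameters are hypothetical; the two workflow paths and the Python versions are the ones mentioned in this PR.

```yaml
# Sketch only: one release workflow that selects the platform via an input.
on:
  workflow_dispatch:
    inputs:
      platform:  # hypothetical input
        description: Platform to build wheels for
        type: choice
        options:
          - linux
          - windows
      amdgpu_family:
        description: AMD GPU family to build for
        type: string
        required: true

jobs:
  build_linux:  # hypothetical job name
    if: ${{ inputs.platform == 'linux' }}
    strategy:
      matrix:
        python_version: ["3.11", "3.12", "3.13"]
    uses: ./.github/workflows/build_linux_pytorch_wheels.yml
    with:  # assumes the called workflow declares these inputs
      amdgpu_family: ${{ inputs.amdgpu_family }}
      python_version: ${{ matrix.python_version }}

  build_windows:  # hypothetical job name
    if: ${{ inputs.platform == 'windows' }}
    strategy:
      matrix:
        python_version: ["3.11", "3.12", "3.13"]
    uses: ./.github/workflows/build_windows_pytorch_wheels.yml
    with:  # assumes the called workflow declares these inputs
      amdgpu_family: ${{ inputs.amdgpu_family }}
      python_version: ${{ matrix.python_version }}
```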

Member Author

Yeah, merging should be possible now :D

Member

Well, we can go with it as is. This isn't blocking, but I wanted to flag that we can clean this up in a follow-up :)

ScottTodd merged commit 11d7c75 into main Jul 10, 2025
4 of 5 checks passed
github-project-automation bot moved this from TODO to Done in TheRock Triage Jul 10, 2025
ScottTodd deleted the users/scotttodd/windows-nightly-pytorch-5 branch July 10, 2025 17:10
Nem404 commented Jul 10, 2025

Yes, aotriton support would be a good idea too. People who use scottt's and jammm's wheels say it gives a considerable performance boost.
