-
Notifications
You must be signed in to change notification settings - Fork 155
[torch][windows] Build Windows release PyTorch wheels. #994
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| # TODO(scotttodd): add schedule once working | ||
| # schedule: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI, I dropped running on schedule from the Linux workflow and instead trigger this workflow from the release portable Linux packages workflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, I'll make those updates. I had been testing exclusively with workflow_dispatch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copied over changes from #983 to these files. I'll want to test before marking this ready for review/merge... once the dependent PRs are merged.
| release_type: ${{ inputs.release_type || 'nightly' }} | ||
| s3_subdir: ${{ inputs.s3_subdir || 'v2' }} | ||
| cloudfront_url: ${{ inputs.cloudfront_url || 'd2awnip2yjpvqn.cloudfront.net/v2' }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The defaults can be dropped if not planning to run on schedule.
stellaraccident
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
build_prod_wheels changes look fine to me. Pipeline changes as well, but I am not the best reviewer on those.
…dows-nightly-pytorch-5
|
Dependent PRs are all merged now. I'll trigger a dev release of the full pipeline, work through any remaining workflow issues, then mark ready for review hopefully tomorrow. edit: https://github.com/ROCm/TheRock/actions/runs/16182932309 is queued up now |
Previously I had those in a standalone step.
|
Ready for review now! The test dev packages build at https://github.com/ROCm/TheRock/actions/runs/16185309410 succeeded then triggered these follow-on workflows runs, which also succeeded:
I now see wheels on the dev index pages like https://d25kgig7rdsyks.cloudfront.net/v2/gfx110X-dgpu/torch/ for python 3.11, 3.12, and 3.13. I'll sanity check those shortly. |
|
Sanity check on gfx1100 passed, except for this known issue where the smoketests do not terminate after passing: #999 |
marbre
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, nice to see this landing!
| shell: cmd | ||
| run: | | ||
| echo "Building PyTorch wheels for ${{ inputs.amdgpu_family }}" | ||
| python ./external-builds/pytorch/build_prod_wheels.py build --install-rocm --index-url "https://${{ inputs.cloudfront_url }}/${{ inputs.amdgpu_family }}/" --pytorch-dir ${{ env.CHECKOUT_ROOT }}/torch --clean --output-dir ${{ env.PACKAGE_DIST_DIR }} ${{ env.optional_build_prod_arguments }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| python ./external-builds/pytorch/build_prod_wheels.py build --install-rocm --index-url "https://${{ inputs.cloudfront_url }}/${{ inputs.amdgpu_family }}/" --pytorch-dir ${{ env.CHECKOUT_ROOT }}/torch --clean --output-dir ${{ env.PACKAGE_DIST_DIR }} ${{ env.optional_build_prod_arguments }} | |
| python ./external-builds/pytorch/build_prod_wheels.py \ | |
| build \ | |
| --install-rocm \ | |
| --index-url "https://${{ inputs.cloudfront_url }}/${{ inputs.amdgpu_family }}/" \ | |
| --pytorch-dir ${{ env.CHECKOUT_ROOT }}/torch \ | |
| --clean \ | |
| --output-dir ${{ env.PACKAGE_DIST_DIR }} ${{ env.optional_build_prod_arguments }} |
This would follow the style in .github/workflows/build_linux_pytorch_wheels.yml and might be easier to read.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish... the single line is intentional given that the shell is cmd and not bash.
I think I can use ^ instead though...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, above you're setting shell: bash for the entire job or not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm switching between cmd and bash throughout the file 🙈. When I build locally I use cmd exclusively but I had some trouble with each shell for different steps on the CI runners. I'd like to go through and choose one consistently in the workflow... after checkpointing and getting the release train rolling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I see... you could have dropped shell: cmd here to make it work but I see that this makes copying over and testing locally unnecessary more difficult. Whatever works best for you :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using bash here would allow the \ characters to split the command across multiple lines, but the build process fails under bash (on my local machine and CI: #827 (comment)). That should be fixable, but it's a battle for another day.
It might be easier to get the rest of the workflow using cmd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try with ^ and add some breadcrumb comments where the shell is load bearing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to be working with ^: https://github.com/ROCm/TheRock/actions/runs/16201014899/job/45739794416
| python_version: ["3.11", "3.12"] | ||
| python_version: ["3.11", "3.12", "3.13"] | ||
|
|
||
| uses: ./.github/workflows/build_windows_pytorch_wheels.yml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recently revised this workflow further (PR #998) and expect for this particular line the workflows are identically. Could merge them and rather add an input parameter that tells if to build for Linux or Windows maybe?.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, merging should be possible now :D
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well we can go for it as is, this isn't blocking but wanted to flag that we can clean in a follow up :)
|
Yes, aotriton support would be a good idea too, as people say who use scottt and jammm's wheels, it gives a considerable performance boost |
Progress on #827 (!!)
Tested at https://github.com/ROCm/TheRock/actions/runs/16185309410, which triggered these:
Some next steps after this, roughly in order:
This depends on some other open PRs, but I have tested parts of it all the way through to dev wheel publishing and then installation and testing on a dev machine: https://github.com/ROCm/TheRock/actions/runs/16181411431/job/45678668536, #827 (comment).See the other PRs:[windows] Patch TENSILE_LIBRARY_DIR in hipBLASLt. #783[windows] Preload compiler DLLs in ROCmLibrariesTest. #980[torch][windows] Set hip_device_lib_path on Windows. #988[windows] Convert symlinks to hardlinks during tarfile expansion. #990