
add deepspeed #21708

Merged: 15 commits merged into conda-forge:main on Jan 15, 2023

Conversation

@hadim (Member) commented on Jan 11, 2023

Checklist

  • Title of this PR is meaningful: e.g. "Adding my_nifty_package", not "updated meta.yaml".
  • License file is packaged (see here for an example).
  • Source is from official source.
  • Package does not vendor other packages. (If a package uses the source of another package, they should be separate packages or the licenses of all packages need to be packaged).
  • If static libraries are linked in, the license of the static library is packaged.
  • Package does not ship static libraries. If static libraries are needed, follow CFEP-18.
  • Build number is 0.
  • A tarball (url) rather than a repo (e.g. git_url) is used in your recipe (see here for more details).
  • GitHub users listed in the maintainer section have posted a comment confirming they are willing to be listed there.
  • When in trouble, please check our knowledge base documentation before pinging a team.

@hadim (Member Author) commented on Jan 11, 2023

Another attempt to get deepspeed into conda-forge.

@conda-forge-webservices commented:

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipes/deepspeed) and found it was in an excellent condition.

@hadim mentioned this pull request on Jan 11, 2023
@hadim (Member Author) commented on Jan 12, 2023

@jaimergp @hmaarrfk: I have some questions regarding this recipe.

In short, pytorch is only needed at build time, when the deepspeed ops are being built, and that can only happen when CUDA is available.

I tried to play with the selectors so that the CPU build does not depend on a specific pytorch version. The goal is to reduce the number of unneeded builds here.
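For illustration only, a minimal sketch of the selector layout being described, assuming the usual `cuda_compiler_version` key. This is a hypothetical excerpt, not the actual recipe in this PR:

```yaml
# Hypothetical excerpt -- not the recipe from this PR.
requirements:
  host:
    - python
    - pip
    # pytorch is only needed at build time, to compile the CUDA ops,
    # so the CPU variant leaves it out entirely
    - pytorch  # [cuda_compiler_version != "None"]
  run:
    - python
    - pytorch
    # only the GPU builds get pinned to the pytorch version they were built against
    - {{ pin_compatible('pytorch') }}  # [cuda_compiler_version != "None"]
```

With that layout the CPU variant stays a single, pytorch-version-independent build, while each GPU variant is tied to the pytorch it was compiled against.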

Please let me know what you think.

@hmaarrfk (Contributor) commented:
I guess you can build for slightly more CUDA versions now, but the migrator will have a hard time updating you to the latest pytorch if you don't add the skips now.

This looks great!

@hadim (Member Author) commented on Jan 12, 2023

Thanks @hmaarrfk. Regarding the skip for CPU and pytorch at build time, do you think this could cause any issue for conda/mamba when it comes to picking the appropriate build? I am afraid the CPU build will be picked more easily, since it's not tied to a pytorch version, versus the GPU builds that are tied to one (but maybe it's all good there?).

> I guess you can build for slightly more CUDA versions now

I guess I cannot do that in this PR and will wait for it to be merged.

@hmaarrfk (Contributor) commented:
> GPU builds that are tied to it

It may prefer the GPU build because it has fewer dependencies. Use the build-number trick for the most robustness:
https://github.com/conda-forge/pytorch-cpu-feedstock/blob/main/recipe/meta.yaml#L4
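Roughly, the pattern in the linked feedstock (sketched from memory here; the exact offset and build strings should be taken from the real meta.yaml) gives the CUDA variant a higher build number, so the solver's preference is explicit rather than an accident of the dependency graph:

```yaml
# Sketch of the build-number trick, not the exact pytorch-cpu-feedstock contents.
{% set build = 0 %}
# bump the CUDA variant's build number so it is preferred whenever
# both variants satisfy the user's spec
{% if cuda_compiler_version != "None" %}
{% set build = build + 200 %}
{% endif %}

build:
  number: {{ build }}
```

A distinct build string per variant (e.g. a `cuda*`/`cpu*` prefix) usually accompanies this so the two lines of builds stay distinguishable.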

Review threads on recipes/deepspeed/bld.bat and recipes/deepspeed/build.sh (resolved).
hadim and others added 2 commits January 12, 2023 10:43
Co-authored-by: Mark Harfouche <mark.harfouche@gmail.com>
@hmaarrfk (Contributor) commented:
Is there anything else you need from staged-recipes, or can we merge?

@hadim (Member Author) commented on Jan 15, 2023

I am OK to merge and address the ninja dependency issue in the feedstock: #21708 (comment)

Also, what should I do about the CUDA arch list above? Should I hard-code it?
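One way to hard-code it, sketched here purely as an illustration: `TORCH_CUDA_ARCH_LIST` is the variable PyTorch's extension builder reads, but the concrete list of architectures below is an assumption and would need to be checked against what deepspeed and the CUDA toolkit actually support.

```yaml
build:
  number: 0
  script_env:
    # hypothetical hard-coded architecture list for the CUDA ops;
    # it could equally be exported from build.sh instead
    - TORCH_CUDA_ARCH_LIST=6.0;6.1;7.0;7.5;8.0;8.6+PTX  # [cuda_compiler_version != "None"]
```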

@hadim (Member Author) commented on Jan 15, 2023

@hmaarrfk LGTM from my side here.

@hmaarrfk (Contributor) commented:
  csrc/transformer/dropout_kernels.cu(102): error: no operator "*" matches these operands
              operand types are: __half2 * const __half2

  csrc/transformer/dropout_kernels.cu(103): error: no operator "*" matches these operands
              operand types are: __half2 * const __half2

  csrc/transformer/dropout_kernels.cu(216): error: no operator "*" matches these operands
              operand types are: __half2 * const __half2

  csrc/transformer/dropout_kernels.cu(217): error: no operator "*" matches these operands
              operand types are: __half2 * const __half2

  csrc/transformer/dropout_kernels.cu(335): error: no operator "*" matches these operands
              operand types are: __half2 * const __half2

  csrc/transformer/dropout_kernels.cu(336): error: no operator "*" matches these operands
              operand types are: __half2 * const __half2


You might have to raise the minimum version.
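For what it's worth, the `__half2` operator overloads are only defined for sufficiently recent compute capabilities and toolkits, so the usual way out is to raise the minimum CUDA version being built for and/or drop the oldest architectures from the arch list. A hypothetical skip, assuming the standard `cuda_compiler_version` key (the version shown is illustrative, not necessarily the one dropped here):

```yaml
build:
  # hypothetical: skip toolkits too old for the __half2 arithmetic in these kernels
  skip: true  # [cuda_compiler_version == "10.2"]
```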

@hadim (Member Author) commented on Jan 15, 2023

@hmaarrfk Seems OK now.

@hmaarrfk merged commit 2cf45ea into conda-forge:main on Jan 15, 2023
@hadim deleted the deepspeed branch on January 15, 2023
@hadim (Member Author) commented on Jan 15, 2023

Thanks @hmaarrfk!

@hmaarrfk (Contributor) commented:
No problem.

@weiji14 mentioned this pull request on Jan 16, 2023
@h-vetinari (Member) left a comment:
Thanks for packaging this!

I think the following is not up to scratch though and should be fixed soon.

Review comment on recipes/deepspeed/meta.yaml (resolved).