Set TF_NEED_CUDA correctly #367

Merged: 10 commits into conda-forge:main, Jan 21, 2024

Conversation

@xhochy (Member) commented Jan 12, 2024

Closes #365
Closes #364

@conda-forge-webservices

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@@ -299,6 +299,7 @@ def _impl(ctx):
     if (len("${CUDA_HOME}")):
         cxx_builtin_include_directories.append("${CUDA_HOME}/include")
         cxx_builtin_include_directories.append("${CUDA_HOME}/targets/x86_64-linux/include/")
+        cxx_builtin_include_directories.append("${PREFIX}/targets/x86_64-linux/include")

Contributor (review comment):

awesome!
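
For anyone double-checking the new include path: with the conda-forge CUDA 12 packages in the build environment, the CUDA runtime headers should indeed live under that targets/ layout. A hypothetical sanity check, not part of the recipe:

# Hypothetical check (not in the PR): confirm the CUDA runtime header sits under
# the targets/ layout that the newly added include directory points at.
test -f "${PREFIX}/targets/x86_64-linux/include/cuda_runtime.h" && echo "CUDA headers found"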

# This logic should be safe to keep in even when the underlying issue is resolved
if [[ -x ${BUILD_PREFIX}/nvvm/bin/cicc ]]; then
    cp ${BUILD_PREFIX}/nvvm/bin/cicc ${BUILD_PREFIX}/bin/cicc
fi
Contributor (review comment):

You need to add the corresponding "rm" command, right? Otherwise you are modifying the user's BUILD_PREFIX.

I guess it doesn't appear in the conda file, but I find it "good form".
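
If one did want that cleanup, a hypothetical final step in the build script (not something this PR adds) could be:

# Hypothetical cleanup (not in the PR): remove the copied cicc again once the
# build is done, leaving BUILD_PREFIX as it was found.
rm -f "${BUILD_PREFIX}/bin/cicc"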

Contributor (review comment):

Is there a better way to specify the location of CICC?

@xhochy (Member, author):

I treat the BUILD_PREFIX as "trashed", as it will be destroyed after the build anyway. Especially as we are polluting it with a lot of Bazel stuff, I don't think it is worth caring about cleanup (in contrast to $PREFIX!).

@h-vetinari (Member):

Nothing binding, just curiosity:

Can we add a reference to the "underlying issue"? Is this about the CUDA 11.8 & CUDA 12 setup divergence? Do we really want to copy instead of symlink?

Contributor (review comment):

Copy vs. symlink really doesn't matter; copy is more "resilient".

I added the reference by pushing a commit.
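
For completeness, the symlink variant discussed here would be a one-line change (a sketch only; the merged change keeps the copy):

# Sketch of the symlink alternative; the PR itself uses `cp`.
if [[ -x ${BUILD_PREFIX}/nvvm/bin/cicc ]]; then
    ln -sf "${BUILD_PREFIX}/nvvm/bin/cicc" "${BUILD_PREFIX}/bin/cicc"
fi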

@hmaarrfk (Contributor):

I'm wondering if we should take the win for CUDA 12.0 and allow 11.8 to be fixed later. What do you think?

@xhochy (Member, author) commented Jan 13, 2024

> I'm wondering if we should take the win for CUDA 12.0 and allow 11.8 to be fixed later. What do you think?

As I see no clear path to support CUDA 11.8, I would also be for this option. Should we then rebuild the full matrix or would it be OK to only run builds for linux+CUDA12?

@xhochy marked this pull request as ready for review on January 13, 2024, 19:49.
@hmaarrfk (Contributor):

Linux + CUDA 12.0 is OK if the other pinnings didn't change.

We should mark the older CUDA 11.8 and 12.0 builds as broken.

@h-vetinari (Member):

> Should we then rebuild the full matrix or would it be OK to only run builds for linux+CUDA12?

It's definitely preferable to have working CUDA 12 builds rather than no (or broken) CUDA 11.8 builds. Also, many people have moved on already (all the conda info I've seen in recent bug reports had CUDA 12.x). So 👍 for dropping CUDA 11.8. Someone motivated/affected can always come back and try to fix things later.

@h-vetinari (Member) left a review:
The bazel toolchain stuff is a bit magical as always, but the PR looks good. Just some minor questions, but could be merged as is from my POV.

recipe/meta.yaml: review thread resolved (outdated).

@njzjz (Member) commented Jan 14, 2024

Should the build number be bumped?

@xhochy (Member, author) commented Jan 14, 2024

With the fix for #364, I would rebuild the full matrix this week.

@hmaarrfk (Contributor):

Thank you for finding the right tests to add to make this recipe even stronger in the future.

@xhochy (Member, author) commented Jan 20, 2024

@hmaarrfk @h-vetinari Packages are up at uwe.korn-tf-gpu and uwe.korn-tf-experimental.

Logs:

@hmaarrfk (Contributor):

I’ll try to review Sunday. Thanks!

@hmaarrfk (Contributor):

  • Build number bumped.
  • Passes: python -c 'import tensorflow as tf; assert tf.test.is_built_with_cuda()'
  • Passes: python -c 'import tensorflow as tf;graph = tf.function(lambda x:x).get_concrete_function(1.).graph;tf.compat.v1.train.export_meta_graph(graph=graph,graph_def=graph.as_graph_def())'
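
For anyone wanting to repeat those checks locally, a hypothetical setup could look like the following; the channel name is taken from this PR, while the build-string pattern and exact invocation are assumptions, and a machine with a working CUDA driver is required:

# Hypothetical reproduction of the two checks above, using the staging channel
# named in this PR; assumes a GPU machine with a suitable NVIDIA driver.
conda create -n tf-check -c uwe.korn-tf-gpu -c conda-forge "tensorflow=2.15.0=cuda*"
conda activate tf-check
python -c 'import tensorflow as tf; assert tf.test.is_built_with_cuda()'
python -c 'import tensorflow as tf; graph = tf.function(lambda x: x).get_concrete_function(1.).graph; tf.compat.v1.train.export_meta_graph(graph=graph, graph_def=graph.as_graph_def())'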

@hmaarrfk (Contributor):

# Copy the 2.15.0 candidate packages from the two staging channels to the
# conda-forge channel's main label (requires anaconda-client authenticated
# with write access to the conda-forge channel).
LABEL=main
DELEGATE=uwe.korn-tf-gpu
PACKAGE_VERSION=2.15.0
for package in tensorflow-base tensorflow tensorflow-estimator libtensorflow libtensorflow_cc tensorflow-cpu tensorflow-gpu; do
  anaconda copy --from-label ${LABEL} --to-label main --to-owner conda-forge ${DELEGATE}/${package}/${PACKAGE_VERSION}
done

DELEGATE=uwe.korn-tf-experimental
for package in tensorflow-base tensorflow tensorflow-estimator libtensorflow libtensorflow_cc tensorflow-cpu tensorflow-gpu; do
  anaconda copy --from-label ${LABEL} --to-label main --to-owner conda-forge ${DELEGATE}/${package}/${PACKAGE_VERSION}
done

@hmaarrfk merged commit f152ce0 into conda-forge:main on Jan 21, 2024 (1 of 14 checks passed).

@hmaarrfk (Contributor):

thank you!

@hmaarrfk (Contributor):

@xhochy I don't know how you feel about this, but I've been uploading to conda-forge's channel directly when I have the logs on GitHub.

I feel like the original CFEP03 didn't consider that core members would be the ones mostly building things.

Considering it is already core members who review these PRs, I feel like it is fine to upload directly.

You did get a few eyes on this one early on too, so I feel like the review process is thorough enough.

@xhochy (Member, author) commented Jan 22, 2024

I was OK with that last time already, but I simply forgot about it. My brain is too trained on the existing workflow. I can do this next time.
