Set TF_NEED_CUDA correctly #367

Merged: 10 commits into conda-forge:main, Jan 21, 2024

Conversation

@xhochy (Member) commented Jan 12, 2024

Closes #365
Closes #364

@conda-forge-webservices

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@@ -299,6 +299,7 @@ def _impl(ctx):
     if (len("${CUDA_HOME}")):
         cxx_builtin_include_directories.append("${CUDA_HOME}/include")
         cxx_builtin_include_directories.append("${CUDA_HOME}/targets/x86_64-linux/include/")
+        cxx_builtin_include_directories.append("${PREFIX}/targets/x86_64-linux/include")

Contributor (review comment):

awesome!
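
For anyone double-checking the new include path: with the conda-forge CUDA 12 packages in the build environment, the CUDA runtime headers should indeed live under that targets/ layout. A hypothetical sanity check, not part of the recipe:

# Hypothetical check (not in the PR): confirm the CUDA runtime header sits under
# the targets/ layout that the newly added include directory points at.
test -f "${PREFIX}/targets/x86_64-linux/include/cuda_runtime.h" && echo "CUDA headers found"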

# This logic should be safe to keep in even when the underlying issue is resolved
if [[ -x ${BUILD_PREFIX}/nvvm/bin/cicc ]]; then
    cp ${BUILD_PREFIX}/nvvm/bin/cicc ${BUILD_PREFIX}/bin/cicc
fi
Contributor (review comment):

You need to add the corresponding "rm" command, right? Otherwise you are modifying the user's BUILD_PREFIX.

I guess it doesn't appear in the conda file, but I find it "good form".
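
If one did want that cleanup, a hypothetical final step in the build script (not something this PR adds) could be:

# Hypothetical cleanup (not in the PR): remove the copied cicc again once the
# build is done, leaving BUILD_PREFIX as it was found.
rm -f "${BUILD_PREFIX}/bin/cicc"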

Contributor (review comment):

Is there a better way to specify the location of CICC?

@xhochy (Member, author):

I treat the BUILD_PREFIX as "trashed", as it will be destroyed after the build anyway. Especially as we are polluting it with a lot of Bazel stuff, I don't think it is worth caring about cleanup (in contrast to $PREFIX!).

@h-vetinari (Member):

Nothing binding, just curiosity:

Can we add a reference to the "underlying issue"? Is this about the CUDA 11.8 & CUDA 12 setup divergence? Do we really want to copy instead of symlink?

Contributor (review comment):

Copy vs. symlink really doesn't matter; copy is more "resilient".

I added the reference by pushing a commit.
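
For completeness, the symlink variant discussed here would be a one-line change (a sketch only; the merged change keeps the copy):

# Sketch of the symlink alternative; the PR itself uses `cp`.
if [[ -x ${BUILD_PREFIX}/nvvm/bin/cicc ]]; then
    ln -sf "${BUILD_PREFIX}/nvvm/bin/cicc" "${BUILD_PREFIX}/bin/cicc"
fi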

@hmaarrfk (Contributor):

I'm wondering if we should take the win for CUDA 12.0 and allow 11.8 to be fixed later. What do you think?

@xhochy (Member, author) commented Jan 13, 2024

> I'm wondering if we should take the win for CUDA 12.0 and allow 11.8 to be fixed later. What do you think?

As I see no clear path to support CUDA 11.8, I would also be for this option. Should we then rebuild the full matrix or would it be OK to only run builds for linux+CUDA12?

@xhochy marked this pull request as ready for review on January 13, 2024, 19:49.
@hmaarrfk (Contributor):

Linux + CUDA 12.0 is OK if the other pinnings didn't change.

We should mark the older CUDA 11.8 and 12.0 builds as broken.

@h-vetinari (Member):

> Should we then rebuild the full matrix or would it be OK to only run builds for linux+CUDA12?

It's definitely preferable to have working CUDA 12 builds rather than no (or broken) CUDA 11.8 builds. Also, many people have moved on already (all the conda info I've seen in recent bug reports had CUDA 12.x). So 👍 for dropping CUDA 11.8. Someone motivated/affected can always come back and try to fix things later.

@h-vetinari (Member) left a review:
The bazel toolchain stuff is a bit magical as always, but the PR looks good. Just some minor questions, but could be merged as is from my POV.

recipe/meta.yaml: review thread resolved (outdated).

@njzjz (Member) commented Jan 14, 2024

Should the build number be bumped?

@xhochy (Member, author) commented Jan 14, 2024

With the fix for #364, I would rebuild the full matrix this week.

@hmaarrfk (Contributor):

Thank you for finding the right tests to add to make this recipe even stronger in the future.

@xhochy (Member, author) commented Jan 20, 2024

@hmaarrfk @h-vetinari Packages are up at uwe.korn-tf-gpu and uwe.korn-tf-experimental.

Logs:

@hmaarrfk (Contributor):

I’ll try to review Sunday. Thanks!

@hmaarrfk (Contributor):

  • Build number bumped.
  • Passes: python -c 'import tensorflow as tf; assert tf.test.is_built_with_cuda()'
  • Passes: python -c 'import tensorflow as tf;graph = tf.function(lambda x:x).get_concrete_function(1.).graph;tf.compat.v1.train.export_meta_graph(graph=graph,graph_def=graph.as_graph_def())'
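
For anyone wanting to repeat those checks locally, a hypothetical setup could look like the following; the channel name is taken from this PR, while the build-string pattern and exact invocation are assumptions, and a machine with a working CUDA driver is required:

# Hypothetical reproduction of the two checks above, using the staging channel
# named in this PR; assumes a GPU machine with a suitable NVIDIA driver.
conda create -n tf-check -c uwe.korn-tf-gpu -c conda-forge "tensorflow=2.15.0=cuda*"
conda activate tf-check
python -c 'import tensorflow as tf; assert tf.test.is_built_with_cuda()'
python -c 'import tensorflow as tf; graph = tf.function(lambda x: x).get_concrete_function(1.).graph; tf.compat.v1.train.export_meta_graph(graph=graph, graph_def=graph.as_graph_def())'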

@hmaarrfk (Contributor):

# Copy the 2.15.0 candidate packages from the two staging channels to the
# conda-forge channel's main label (requires anaconda-client authenticated
# with write access to the conda-forge channel).
LABEL=main
DELEGATE=uwe.korn-tf-gpu
PACKAGE_VERSION=2.15.0
for package in tensorflow-base tensorflow tensorflow-estimator libtensorflow libtensorflow_cc tensorflow-cpu tensorflow-gpu; do
  anaconda copy --from-label ${LABEL} --to-label main --to-owner conda-forge ${DELEGATE}/${package}/${PACKAGE_VERSION}
done

DELEGATE=uwe.korn-tf-experimental
for package in tensorflow-base tensorflow tensorflow-estimator libtensorflow libtensorflow_cc tensorflow-cpu tensorflow-gpu; do
  anaconda copy --from-label ${LABEL} --to-label main --to-owner conda-forge ${DELEGATE}/${package}/${PACKAGE_VERSION}
done

@hmaarrfk merged commit f152ce0 into conda-forge:main on Jan 21, 2024 (1 of 14 checks passed).

@hmaarrfk (Contributor):

thank you!

@hmaarrfk (Contributor):

@xhochy I don't know how you feel about this, but I've been uploading to conda-forge's channel directly when I have the logs on GitHub.

I feel like the original CFEP03 didn't consider that core members would be the ones mostly building things.

Considering it is already core members who review these PRs, I feel like it is fine to upload directly.

You did get a few eyes on this one early on too, so I feel like the review process is thorough enough.

@xhochy (Member, author) commented Jan 22, 2024

I was OK with that last time already, but I simply forgot about it. My brain is too trained on the existing workflow. I can do this next time.
