tensorflow 2.15.0 #353
Conversation
Hi! This is the friendly automated conda-forge-linting service. I just wanted to let you know that I linted all conda-recipes in your PR.
…nda-forge-pinning 2023.11.15.00.22.50
Fails in the estimator build with:
Bisecting for this error:
Supposedly it is this:
Maybe one of my bisects took a wrong turn?
I got past the problem by carefully reading the Bazel scripts of
CUDA builds fail with the following (I have no idea what it means):
Next one:
Can I ask what your workflow is for this? How do you set up your environment?
Interesting, thanks!
…nda-forge-pinning 2023.11.26.16.08.11
@conda-forge/tensorflow This is ready for review. I will clean up the patches locally and would start building everything on Friday.
Thanks hugely! 👏
Can you tell us what happened with the protobuf situation? It sounds like you're adding some changes related to that, but I don't see the pinned versions changing in the .ci_support files.

Generally LGTM, though the bazel stuff is over my head as usual. It would be lovely if you could write some more context into the commit messages of the patches that are necessary. That's all from my side, except perhaps my recurring nit of generating the patches with --no-signature. ;-)
```diff
-  - url: https://raw.githubusercontent.com/jax-ml/ml_dtypes/v0.2.0/ml_dtypes/include/float8.h
+  # yes, the headers come from a different version than the python package required below.
+  - url: https://raw.githubusercontent.com/jax-ml/ml_dtypes/v0.3.1/ml_dtypes/include/float8.h
     fn: float8.h
-    sha256: 7c3d32809adf01e1568434760bf3c347d0ef21d5fc4c5009815a5dd54635ed25
+    sha256: d2798fad4e64375b566b1df1d7bc440313e4b1024ca08f12cead3eaa4b73ff72
+  - url: https://raw.githubusercontent.com/jax-ml/ml_dtypes/v0.3.1/ml_dtypes/include/int4.h
+    fn: int4.h
+    sha256: b3a9970c3c6b169c41ac2fd4375f668d3fd1b492d48b912d89415fa1522a8f50
```
It's still quite confusing/surprising what's happening here. I guess the hopes back from 2.14 about this being simpler for 2.15 didn't materialize? Not a blocker, but just for understanding.
They did a partial update of the ml_dtypes code as outlined in the comment. The Python part still uses 0.2.0 while the C++ code depends on 0.3.1, which comes with the new int4 type. In master, they have aligned it again: https://github.com/tensorflow/tensorflow/blob/99926785f7c9eaf53d94343916f14f300965ae72/tensorflow/tools/pip_package/setup.py#L93

If we want to get rid of pulling them in manually, we either need to add code to the libtensorflow_cc.tar.gz generation code or add support for pulling in ml_dtypes as a system dependency. Both are sadly more complicated than adding them as additional sources here.
Just for my understanding, is this the sort of thing that a future TensorFlow release will have resolved?
No idea 😢
I have made a PR to staged-recipes to have the headers packaged separately: conda-forge/staged-recipes#24662
From 9e6d16913eedc72aad7ced7f9cdf07374c84fc8f Mon Sep 17 00:00:00 2001
From: "Uwe L. Korn" <uwe.korn@quantco.com>
Date: Sun, 19 Nov 2023 20:50:29 +0000
Subject: [PATCH] Blacklist well-known protos
Can you mention what this does or why it's necessary?
I'm assuming it will not (re)generate certain protos, and that those would be incompatible if we did regenerate?
The problem here was that this would pull in the well-known protos twice. Thus, when importing tensorflow, we got errors about already registered definitions. I found the fix in https://github.com/google/riegeli/blob/c2bcb54934acd28eace78bd4a1bf008347592cc4/third_party/protobuf.patch#L64
I will merge this patch with the one above that adds the toolchain during cleanup.
+ patch_file = [
+     "//third_party/py/ml_dtypes:int4.patch",
+ ],
Ah, patch-ception, wonderful 😅
I should add a comment that we need this patch for nvcc to be able to compile code that imports int4.
From b1c5c65cd5b4db7e06bcdf5f4886e744b324cfb0 Mon Sep 17 00:00:00 2001
From: "Uwe L. Korn" <uwe.korn@quantco.com>
Date: Thu, 23 Nov 2023 09:05:37 +0000
Subject: [PATCH] Remove some usage of absl::str_format in CUDA
Causes linker errors?
No, nvcc doesn't understand the C++ template due to new C++ features. We can only use absl::str_format in code that isn't parsed/ingested by nvcc.
Do you recall which new C++ features were used?
It was a combination of sizeof...(args) and std::enable_if.
Interesting, those look to be C++11 (or in some cases C++14) features. I would have thought GCC 10 (on CUDA 11.2) and GCC 11 (on CUDA 11.8) would be new enough. Maybe there is some edge case that wasn't handled until a later GCC.
Nothing has changed. All the protobuf-related errors were down to the
I always forget that option. If you find a way to set that as a default globally in git, I would appreciate that.
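For reference, `git format-patch` appends a signature trailer (the trailing `-- ` plus git version) by default; besides passing `--no-signature` per invocation, git's `format.signature` config option can be set to the empty string to suppress it globally. A minimal `~/.gitconfig` fragment:

```ini
# ~/.gitconfig — suppress the signature that `git format-patch`
# appends by default, equivalent to always passing --no-signature.
[format]
    signature = ""
```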
I pushed the patches also as a branch to https://github.com/xhochy/tensorflow/tree/2.15.0-conda-forge-patches
Thanks 💎
…nda-forge-pinning 2023.12.08.16.50.53
There were some issues now with the OSX builds, but it seems we're fine and I have started the Linux and OSX builds for all configurations.
Builds are on my uwe.korn-tf-gpu and uwe.korn-tf-experimental channels with the following logs:
@h-vetinari @hmaarrfk Please review/copy ;)
Would the goal be for one of us to do light testing? I'm mostly trying to understand a protocol that we can follow in the future too.
Testing should hopefully be covered by the tests in the feedstock. Otherwise, we should extend them. I think isuruf scanned these logs for whether they used the right OSX SDK. But that was back then when
It's just pretty hard to test hardware acceleration without guaranteed access to the right hardware. I can scan the logs.
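For context on what "tests in the feedstock" means here: conda-forge recipes carry a test section that is run against every built package, typically import smoke tests plus dependency-metadata checks. A minimal sketch of such a section (illustrative only, not the feedstock's actual recipe):

```yaml
test:
  requires:
    - pip
  imports:
    - tensorflow                 # basic import smoke test
  commands:
    - pip check                  # verify dependency metadata is consistent
    - python -c "import tensorflow as tf; print(tf.__version__)"
```

As noted above, this catches packaging breakage but not GPU-specific regressions, since CI machines have no guaranteed access to the right hardware.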
Thank you hugely!
(as a standby observer) - THANK YOU SO MUCH FOR WORKING ON THIS!
Fixes #352