
tensorflow 2.15.0 #353

Merged · 17 commits · Dec 17, 2023

Conversation

@xhochy (Member) commented Nov 15, 2023

Fixes #352

@conda-forge-webservices

Hi! This is the friendly automated conda-forge-linting service.

I just wanted to let you know that I linted all conda-recipes in your PR (recipe) and found it was in an excellent condition.

@xhochy mentioned this pull request Nov 15, 2023
@xhochy (Member, Author) commented Nov 15, 2023

Fails in the estimator build with:

+ bazel build tensorflow_estimator/tools/pip_package:build_pip_package
Starting local Bazel server and connecting to it...
Loading:
Loading:
Loading: 0 packages loaded
Analyzing: target //tensorflow_estimator/tools/pip_package:build_pip_package (1 packages loaded, 0 targets configured)
Analyzing: target //tensorflow_estimator/tools/pip_package:build_pip_package (44 packages loaded, 283 targets configured)
INFO: Analyzed target //tensorflow_estimator/tools/pip_package:build_pip_package (47 packages loaded, 333 targets configured).
INFO: Found 1 target...
[0 / 8] [Prepa] BazelWorkspaceStatusAction stable-status.txt
ERROR: /home/uwe/mambaforge/conda-bld/tensorflow-split_1700044396155/work/tensorflow-estimator/tensorflow_estimator/python/estimator/canned/linear_optimizer/BUILD:55:11: Extracting tensorflow_estimator APIs for //tensorflow_estimator/python/estimator/canned/linear_optimizer:sharded_mutable_dense_hashtable_py to bazel-out/k8-fastbuild/bin/tensorflow_estimator/python/estimator/canned/linear_optimizer/sharded_mutable_dense_hashtable_py_extracted_tensorflow_estimator_api.json. failed: (Aborted): extractor_wrapper failed: error executing command (from target //tensorflow_estimator/python/estimator/canned/linear_optimizer:sharded_mutable_dense_hashtable_py) bazel-out/k8-opt-exec-2B5CBBC6/bin/tensorflow_estimator/python/estimator/api/extractor_wrapper --output ... (remaining 6 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
[libprotobuf ERROR google/protobuf/descriptor_database.cc:642] File already exists in database: google/protobuf/descriptor.proto
[libprotobuf FATAL google/protobuf/descriptor.cc:1986] CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size):
terminate called after throwing an instance of 'google::protobuf::FatalException'
  what():  CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size):
ERROR: /home/uwe/mambaforge/conda-bld/tensorflow-split_1700044396155/work/tensorflow-estimator/tensorflow_estimator/python/estimator/BUILD:633:11: Extracting tensorflow_estimator APIs for //tensorflow_estimator/python/estimator:dnn_linear_combined to bazel-out/k8-fastbuild/bin/tensorflow_estimator/python/estimator/dnn_linear_combined_extracted_tensorflow_estimator_api.json. failed: (Aborted): extractor_wrapper failed: error executing command (from target //tensorflow_estimator/python/estimator:dnn_linear_combined) bazel-out/k8-opt-exec-2B5CBBC6/bin/tensorflow_estimator/python/estimator/api/extractor_wrapper --output ... (remaining 6 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
[libprotobuf ERROR google/protobuf/descriptor_database.cc:642] File already exists in database: google/protobuf/descriptor.proto
[libprotobuf FATAL google/protobuf/descriptor.cc:1986] CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size):
terminate called after throwing an instance of 'google::protobuf::FatalException'
  what():  CHECK failed: GeneratedDatabase()->Add(encoded_file_descriptor, size):
[12 / 75] Extracting tensorflow_estimator APIs for //tensorflow_estimator/python/estimator:export_output to bazel-out/k8-fastbuild/bin/tensorflow_estimator/python/estimator/export_output_extracted_tensorflow_estimator_api.json.; 1s linux-sandbox ... (46 actions, 45 running)
Target //tensorflow_estimator/tools/pip_package:build_pip_package failed to build
Use --verbose_failures to see the command lines of failed build steps.
ERROR: /home/uwe/mambaforge/conda-bld/tensorflow-split_1700044396155/work/tensorflow-estimator/tensorflow_estimator/tools/pip_package/BUILD:18:10 Middleman _middlemen/tensorflow_Uestimator_Stools_Spip_Upackage_Sbuild_Upip_Upackage-runfiles failed: (Aborted): extractor_wrapper failed: error executing command (from target //tensorflow_estimator/python/estimator/canned/linear_optimizer:sharded_mutable_dense_hashtable_py) bazel-out/k8-opt-exec-2B5CBBC6/bin/tensorflow_estimator/python/estimator/api/extractor_wrapper --output ... (remaining 6 arguments skipped)

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
INFO: Elapsed time: 5.216s, Critical Path: 1.16s
INFO: 59 processes: 59 internal.

@xhochy (Member, Author) commented Nov 17, 2023

Bisecting for this error:

Supposedly it is this:

% git bisect bad
7c8a95f2ab9b8996eccf5c33729018a45af467cb is the first bad commit
commit 7c8a95f2ab9b8996eccf5c33729018a45af467cb
Author: Shixin Li <shixinli@google.com>
Date:   Fri Sep 22 13:05:26 2023 -0700

    Enable cross compilation for PJRT GPU compiler:
    1. StreamExecutorGpuCompiler compiles w/o client.
    2. Add StreamExecutorGpuExecutable (the unloaded pjrt executable).
    3. Load StreamExecutorGpuExecutable to PjRtLoadedExecutable through `Load` API.

    PiperOrigin-RevId: 567697879

 third_party/xla/xla/client/local_client.h          |   2 +
 third_party/xla/xla/pjrt/BUILD                     |  16 ++
 third_party/xla/xla/pjrt/gpu/BUILD                 |  95 +++++++++++-
 third_party/xla/xla/pjrt/gpu/se_gpu_pjrt_client.cc |  45 ++++++
 third_party/xla/xla/pjrt/gpu/se_gpu_pjrt_client.h  |   5 +
 .../xla/xla/pjrt/gpu/se_gpu_pjrt_compiler.cc       | 108 +++++++++++++
 .../xla/xla/pjrt/gpu/se_gpu_pjrt_compiler.h        |  15 ++
 .../xla/pjrt/gpu/se_gpu_pjrt_compiler_aot_test.cc  | 167 +++++++++++++++++++++
 .../xla/xla/pjrt/gpu/se_gpu_pjrt_compiler_test.cc  |   1 +
 .../xla/xla/pjrt/pjrt_stream_executor_client.cc    |  39 ++---
 .../xla/xla/pjrt/pjrt_stream_executor_client.h     |   1 +
 .../pjrt/stream_executor_unloaded_executable.cc    |  31 ++++
 .../xla/pjrt/stream_executor_unloaded_executable.h |  78 ++++++++++
 .../pjrt/stream_executor_unloaded_executable.proto |  28 ++++
 third_party/xla/xla/service/gpu/BUILD              |  14 ++
 third_party/xla/xla/service/gpu/gpu_compiler.cc    |  15 --
 third_party/xla/xla/service/gpu/gpu_compiler.h     |  13 +-
 .../xla/xla/service/gpu/gpu_target_config.cc       |  38 +++++
 .../xla/xla/service/gpu/gpu_target_config.h        |  41 +++++
 19 files changed, 705 insertions(+), 47 deletions(-)
 create mode 100644 third_party/xla/xla/pjrt/gpu/se_gpu_pjrt_compiler_aot_test.cc
 create mode 100644 third_party/xla/xla/pjrt/stream_executor_unloaded_executable.cc
 create mode 100644 third_party/xla/xla/pjrt/stream_executor_unloaded_executable.h
 create mode 100644 third_party/xla/xla/pjrt/stream_executor_unloaded_executable.proto
 create mode 100644 third_party/xla/xla/service/gpu/gpu_target_config.cc
 create mode 100644 third_party/xla/xla/service/gpu/gpu_target_config.h

Maybe one of my bisects took a wrong turn?

@xhochy (Member, Author) commented Nov 20, 2023

I got past the problem by carefully reading the Bazel scripts of riegeli. Next stop: CUDA.

@xhochy (Member, Author) commented Nov 21, 2023

CUDA builds fail with the following (I have no idea what it means):

/home/conda/feedstock_root/build_artifacts/tensorflow-split_1700557095434/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include/absl/strings/internal/str_format/bind.h: In constructor 'absl::lts_20230125::str_format_internal::FormatSpecTemplate<Args>::FormatSpecTemplate(const absl::lts_20230125::str_format_internal::ExtendedParsedFormat<absl::lts_20230125::FormatConversionCharSet(C)...>&)':
/home/conda/feedstock_root/build_artifacts/tensorflow-split_1700557095434/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include/absl/strings/internal/str_format/bind.h:171:1: error: parse error in template argument list
  171 |     CheckArity<sizeof...(C), sizeof...(Args)>();
      | ^   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/conda/feedstock_root/build_artifacts/tensorflow-split_1700557095434/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include/absl/strings/internal/str_format/bind.h:171:63: error: expected ';' before ')' token
  171 |     CheckArity<sizeof...(C), sizeof...(Args)>();
      |                                                               ^
/home/conda/feedstock_root/build_artifacts/tensorflow-split_1700557095434/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include/absl/strings/internal/str_format/bind.h:172:147: error: template argument 1 is invalid
  172 |     CheckMatches<C...>(absl::make_index_sequence<sizeof...(C)>{});
      |                                                                                                                                                   ^
/home/conda/feedstock_root/build_artifacts/tensorflow-split_1700557095434/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include/absl/strings/internal/str_format/bind.h:172:151: error: expected primary-expression before '{' token
  172 |     CheckMatches<C...>(absl::make_index_sequence<sizeof...(C)>{});
      |                                                                                                                                                       ^
/home/conda/feedstock_root/build_artifacts/tensorflow-split_1700557095434/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include/absl/strings/internal/str_format/bind.h:172:151: error: expected ';' before '{' token
/home/conda/feedstock_root/build_artifacts/tensorflow-split_1700557095434/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include/absl/strings/internal/str_format/bind.h:172:153: error: expected primary-expression before ')' token
  172 |     CheckMatches<C...>(absl::make_index_sequence<sizeof...(C)>{});
      |                                                                                                                                                         ^
/home/conda/feedstock_root/build_artifacts/tensorflow-split_1700557095434/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include/absl/strings/internal/str_format/arg.h: In instantiation of 'constexpr absl::lts_20230125::FormatConversionCharSet absl::lts_20230125::str_format_internal::ArgumentToConv() [with Arg = long int]':
/home/conda/feedstock_root/build_artifacts/tensorflow-split_1700557095434/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include/absl/strings/str_format.h:268:156:   required by substitution of 'template<class ... Args> using FormatSpec = absl::lts_20230125::str_format_internal::FormatSpecTemplate<absl::lts_20230125::FormatConversionCharSet((ArgumentToConv<Args>)())...> [with Args = {long int, const tensorflow::ResourceBase*}]'
/home/conda/feedstock_root/build_artifacts/tensorflow-split_1700557095434/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include/absl/strings/str_format.h:351:1:   required by substitution of 'template<class ... Args> std::string absl::lts_20230125::StrFormat(absl::lts_20230125::FormatSpec<Args ...>&, const Args& ...) [with Args = {long int, const tensorflow::ResourceBase*}]'
./tensorflow/core/framework/resource_base.h:44:23:   required from here
/home/conda/feedstock_root/build_artifacts/tensorflow-split_1700557095434/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include/absl/strings/internal/str_format/arg.h:403:43: error: no matching function for call to 'ExtractCharSet(ConvResult)'
  403 |   return absl::str_format_internal::ExtractCharSet(ConvResult{});
      |        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~
/home/conda/feedstock_root/build_artifacts/tensorflow-split_1700557095434/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include/absl/strings/internal/str_format/arg.h:196:1: note: candidate: 'template<absl::lts_20230125::FormatConversionCharSet C> constexpr absl::lts_20230125::FormatConversionCharSet absl::lts_20230125::str_format_internal::ExtractCharSet(absl::lts_20230125::FormatConvertResult<(absl::lts_20230125::FormatConversionCharSet)(C)>)'
  196 | constexpr FormatConversionCharSet ExtractCharSet(FormatConvertResult<C>) {
      | ^~~~~~~~~~~~~~
/home/conda/feedstock_root/build_artifacts/tensorflow-split_1700557095434/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include/absl/strings/internal/str_format/arg.h:196:1: note:   template argument deduction/substitution failed:
/home/conda/feedstock_root/build_artifacts/tensorflow-split_1700557095434/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include/absl/strings/internal/str_format/arg.h:403:43: note:   couldn't deduce template parameter 'C'
  403 |   return absl::str_format_internal::ExtractCharSet(ConvResult{});
      |        ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~
/home/conda/feedstock_root/build_artifacts/tensorflow-split_1700557095434/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include/absl/strings/internal/str_format/arg.h:201:1: note: candidate: 'template<absl::lts_20230125::FormatConversionCharSet C> constexpr absl::lts_20230125::FormatConversionCharSet absl::lts_20230125::str_format_internal::ExtractCharSet(absl::lts_20230125::str_format_internal::ArgConvertResult<(absl::lts_20230125::FormatConversionCharSet)(C)>)'
  201 | constexpr FormatConversionCharSet ExtractCharSet(ArgConvertResult<C>) {
      | ^~~~~~~~~~~~~~
/home/conda/feedstock_root/build_artifacts/tensorflow-split_1700557095434/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include/absl/strings/internal/str_format/arg.h:201:1: note:   template argument deduction/substitution failed:
/home/conda/feedstock_root/build_artifacts/tensorflow-split_1700557095434/_h_env_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_placehold_plac/include/absl/strings/internal/str_format/arg.h:403:43: note:   couldn't deduce template parameter 'C'
…

@xhochy (Member, Author) commented Nov 23, 2023

Next one:

tensorflow/core/kernels/cast_op_gpu.cu.cc(32): warning #846-D: this partial specialization would have made the instantiation of class "tensorflow::functor::CastFunctor<tensorflow::functor::GPUDevice, tsl::float8_e4m3fn, Eigen::half>" ambiguous

external/eigen_archive/Eigen/src/Core/MathFunctions.h(429): error: more than one user-defined conversion from "const tsl::uint4" to "tsl::int4" applies:
            function template "ml_dtypes::i4<UnderlyingTy>::operator T() const [with UnderlyingTy=uint8_t]"
bazel-out/k8-opt/bin/external/ml_dtypes/_virtual_includes/int4/ml_dtypes/include/int4.h(52): here
            function template "ml_dtypes::i4<UnderlyingTy>::i4(T) [with UnderlyingTy=int8_t]"
bazel-out/k8-opt/bin/external/ml_dtypes/_virtual_includes/int4/ml_dtypes/include/int4.h(42): here
          detected during:
            instantiation of "NewType Eigen::internal::cast_impl<OldType, NewType, EnableIf>::run(const OldType &) [with OldType=tsl::uint4, NewType=tsl::int4, EnableIf=void]"
(462): here
            instantiation of "NewType Eigen::internal::cast<OldType,NewType>(const OldType &) [with OldType=tsl::uint4, NewType=tsl::int4]"
external/eigen_archive/Eigen/src/Core/functors/UnaryFunctors.h(179): here
            instantiation of "const NewType Eigen::internal::scalar_cast_op<Scalar, NewType>::operator()(const Scalar &) const [with Scalar=tsl::uint4, NewType=tsl::int4]"
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorConversion.h(238): here
            instantiation of "TargetType Eigen::internal::CoeffConv<SrcType, TargetType, IsSameT>::run(const Eigen::TensorEvaluator<ArgType, Device> &, Eigen::Index) [with SrcType=tsl::uint4, TargetType=tsl::int4, IsSameT=false, ArgType=const Eigen::TensorMap<Eigen::Tensor<const tsl::uint4, 1, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>, Device=Eigen::GpuDevice]"
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorConversion.h(395): here
            instantiation of "Eigen::TensorEvaluator<const Eigen::TensorConversionOp<TargetType, ArgType>, Device>::CoeffReturnType Eigen::TensorEvaluator<const Eigen::TensorConversionOp<TargetType, ArgType>, Device>::coeff(Eigen::TensorEvaluator<const Eigen::TensorConversionOp<TargetType, ArgType>, Device>::Index) const [with TargetType=tsl::int4, ArgType=const Eigen::TensorMap<Eigen::Tensor<const tsl::uint4, 1, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>, Device=Eigen::GpuDevice]"

external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h(174): here
            instantiation of "void Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::evalScalar(Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::Index) const [with LeftArgType=Eigen::TensorMap<Eigen::Tensor<tsl::int4, 1, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>, RightArgType=const Eigen::TensorConversionOp<tsl::int4, const Eigen::TensorMap<Eigen::Tensor<const tsl::uint4, 1, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>, Device=Eigen::GpuDevice]"
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h(607): here
            instantiation of "void Eigen::internal::EigenMetaKernelEval<Evaluator, StorageIndex, Vectorizable>::run(Evaluator &, StorageIndex, StorageIndex, StorageIndex) [with Evaluator=Eigen::TensorEvaluator<const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<tsl::int4, 1, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>, const Eigen::TensorConversionOp<tsl::int4, const Eigen::TensorMap<Eigen::Tensor<const tsl::uint4, 1, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>, Eigen::GpuDevice>, StorageIndex=Eigen::DenseIndex, Vectorizable=false]"
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h(644): here
            instantiation of "void Eigen::internal::EigenMetaKernel(Evaluator, StorageIndex) [with Evaluator=Eigen::TensorEvaluator<const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<tsl::int4, 1, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>, const Eigen::TensorConversionOp<tsl::int4, const Eigen::TensorMap<Eigen::Tensor<const tsl::uint4, 1, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>, Eigen::GpuDevice>, StorageIndex=Eigen::DenseIndex]"
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h(665): here
            instantiation of "void Eigen::internal::TensorExecutor<Expression, Eigen::GpuDevice, Vectorizable, Tiling>::run(const Expression &, const Eigen::GpuDevice &) [with Expression=const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<tsl::int4, 1, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>, const Eigen::TensorConversionOp<tsl::int4, const Eigen::TensorMap<Eigen::Tensor<const tsl::uint4, 1, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>, Vectorizable=false, Tiling=Eigen::internal::Off]"
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorDevice.h(39): here
            instantiation of "Eigen::TensorDevice<ExpressionType, DeviceType> &Eigen::TensorDevice<ExpressionType, DeviceType>::operator=(const OtherDerived &) [with ExpressionType=Eigen::TensorMap<Eigen::Tensor<tsl::int4, 1, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>, DeviceType=tensorflow::functor::GPUDevice, OtherDerived=Eigen::TensorConversionOp<tsl::int4, const Eigen::TensorMap<Eigen::Tensor<const tsl::uint4, 1, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>]"
tensorflow/core/kernels/cast_op_gpu.cu.cc(32): here
            instantiation of "void tensorflow::functor::CastFunctor<tensorflow::functor::GPUDevice, OUT_TYPE, IN_TYPE>::operator()(const tensorflow::functor::GPUDevice &, tensorflow::TTypes<OUT_TYPE, 1, Eigen::DenseIndex>::Flat, tensorflow::TTypes<IN_TYPE, 1, Eigen::DenseIndex>::ConstFlat, __nv_bool) [with OUT_TYPE=tsl::int4, IN_TYPE=tsl::uint4]"
tensorflow/core/kernels/cast_op_gpu.cu.cc(177): here

external/eigen_archive/Eigen/src/Core/MathFunctions.h(429): error: more than one user-defined conversion from "const tsl::int4" to "tsl::uint4" applies:
            function template "ml_dtypes::i4<UnderlyingTy>::operator T() const [with UnderlyingTy=int8_t]"
bazel-out/k8-opt/bin/external/ml_dtypes/_virtual_includes/int4/ml_dtypes/include/int4.h(52): here
            function template "ml_dtypes::i4<UnderlyingTy>::i4(T) [with UnderlyingTy=uint8_t]"
bazel-out/k8-opt/bin/external/ml_dtypes/_virtual_includes/int4/ml_dtypes/include/int4.h(42): here
          detected during:
            instantiation of "NewType Eigen::internal::cast_impl<OldType, NewType, EnableIf>::run(const OldType &) [with OldType=tsl::int4, NewType=tsl::uint4, EnableIf=void]"
(462): here
            instantiation of "NewType Eigen::internal::cast<OldType,NewType>(const OldType &) [with OldType=tsl::int4, NewType=tsl::uint4]"
external/eigen_archive/Eigen/src/Core/functors/UnaryFunctors.h(179): here
            instantiation of "const NewType Eigen::internal::scalar_cast_op<Scalar, NewType>::operator()(const Scalar &) const [with Scalar=tsl::int4, NewType=tsl::uint4]"
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorConversion.h(238): here
            instantiation of "TargetType Eigen::internal::CoeffConv<SrcType, TargetType, IsSameT>::run(const Eigen::TensorEvaluator<ArgType, Device> &, Eigen::Index) [with SrcType=tsl::int4, TargetType=tsl::uint4, IsSameT=false, ArgType=const Eigen::TensorMap<Eigen::Tensor<const tsl::int4, 1, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>, Device=Eigen::GpuDevice]"
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorConversion.h(395): here
            instantiation of "Eigen::TensorEvaluator<const Eigen::TensorConversionOp<TargetType, ArgType>, Device>::CoeffReturnType Eigen::TensorEvaluator<const Eigen::TensorConversionOp<TargetType, ArgType>, Device>::coeff(Eigen::TensorEvaluator<const Eigen::TensorConversionOp<TargetType, ArgType>, Device>::Index) const [with TargetType=tsl::uint4, ArgType=const Eigen::TensorMap<Eigen::Tensor<const tsl::int4, 1, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>, Device=Eigen::GpuDevice]"

external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h(174): here
            instantiation of "void Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::evalScalar(Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::Index) const [with LeftArgType=Eigen::TensorMap<Eigen::Tensor<tsl::uint4, 1, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>, RightArgType=const Eigen::TensorConversionOp<tsl::uint4, const Eigen::TensorMap<Eigen::Tensor<const tsl::int4, 1, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>, Device=Eigen::GpuDevice]"
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h(607): here
            instantiation of "void Eigen::internal::EigenMetaKernelEval<Evaluator, StorageIndex, Vectorizable>::run(Evaluator &, StorageIndex, StorageIndex, StorageIndex) [with Evaluator=Eigen::TensorEvaluator<const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<tsl::uint4, 1, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>, const Eigen::TensorConversionOp<tsl::uint4, const Eigen::TensorMap<Eigen::Tensor<const tsl::int4, 1, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>, Eigen::GpuDevice>, StorageIndex=Eigen::DenseIndex, Vectorizable=false]"
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h(644): here
            instantiation of "void Eigen::internal::EigenMetaKernel(Evaluator, StorageIndex) [with Evaluator=Eigen::TensorEvaluator<const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<tsl::uint4, 1, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>, const Eigen::TensorConversionOp<tsl::uint4, const Eigen::TensorMap<Eigen::Tensor<const tsl::int4, 1, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>, Eigen::GpuDevice>, StorageIndex=Eigen::DenseIndex]"
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h(665): here
            instantiation of "void Eigen::internal::TensorExecutor<Expression, Eigen::GpuDevice, Vectorizable, Tiling>::run(const Expression &, const Eigen::GpuDevice &) [with Expression=const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<tsl::uint4, 1, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>, const Eigen::TensorConversionOp<tsl::uint4, const Eigen::TensorMap<Eigen::Tensor<const tsl::int4, 1, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>>, Vectorizable=false, Tiling=Eigen::internal::Off]"
external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorDevice.h(39): here
            instantiation of "Eigen::TensorDevice<ExpressionType, DeviceType> &Eigen::TensorDevice<ExpressionType, DeviceType>::operator=(const OtherDerived &) [with ExpressionType=Eigen::TensorMap<Eigen::Tensor<tsl::uint4, 1, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>, DeviceType=tensorflow::functor::GPUDevice, OtherDerived=Eigen::TensorConversionOp<tsl::uint4, const Eigen::TensorMap<Eigen::Tensor<const tsl::int4, 1, 1, Eigen::DenseIndex>, 16, Eigen::MakePointer>>]"
tensorflow/core/kernels/cast_op_gpu.cu.cc(32): here
            instantiation of "void tensorflow::functor::CastFunctor<tensorflow::functor::GPUDevice, OUT_TYPE, IN_TYPE>::operator()(const tensorflow::functor::GPUDevice &, tensorflow::TTypes<OUT_TYPE, 1, Eigen::DenseIndex>::Flat, tensorflow::TTypes<IN_TYPE, 1, Eigen::DenseIndex>::ConstFlat, __nv_bool) [with OUT_TYPE=tsl::uint4, IN_TYPE=tsl::int4]"
tensorflow/core/kernels/cast_op_gpu.cu.cc(178): here

2 errors detected in the compilation of "tensorflow/core/kernels/cast_op_gpu.cu.cc".
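
In essence, nvcc appears to see two equally good user-defined conversion paths between the two int4 wrapper types. A minimal repro sketch of that shape (a simplified stand-in for ml_dtypes' i4, not the real int4.h):

    #include <cstdint>

    // Simplified i4 wrapper: a templated converting constructor (int4.h(42)
    // in the log above) plus a templated conversion operator (int4.h(52)).
    template <typename UnderlyingTy>
    struct i4 {
      UnderlyingTy v;
      i4() = default;
      template <typename T>
      explicit i4(T t) : v(static_cast<UnderlyingTy>(t)) {}
      template <typename T>
      explicit operator T() const { return static_cast<T>(v); }
    };

    using int4_t = i4<std::int8_t>;
    using uint4_t = i4<std::uint8_t>;

    int main() {
      const uint4_t u{};
      // Eigen::internal::cast_impl boils down to this cast. Two user-defined
      // conversions apply: uint4_t::operator int4_t() const, and the
      // constructor int4_t::i4(uint4_t). nvcc reports the cast as ambiguous,
      // while host gcc/clang resolve it via the constructor.
      int4_t s = static_cast<int4_t>(u);
      (void)s;
      return 0;
    }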

@hmaarrfk (Contributor)

Can I ask what your workflow is for this? How do you set up your environment?

@xhochy (Member, Author) commented Nov 23, 2023

Can I ask what your workflow is for this? How do you set up your environment?

  • Run conda-build
  • Let it fail
  • cd into the work directory
  • source build_env_setup.sh
  • git init . && git add . && git commit -m "Initial commit" --no-verify --no-gpg-sign
  • Iterate with bash $RECIPE_DIR/build.sh

@hmaarrfk (Contributor)

interesting. thanks!

@xhochy (Member, Author) commented Nov 28, 2023

@conda-forge/tensorflow This is ready for review. I will clean up the patches locally and start building everything on Friday.

@h-vetinari (Member) left a comment

Thanks hugely! 👏

Can you tell us what happened with the protobuf situation? It sounds like you're adding some changes related to that, but I don't see the pinned versions changing in the .ci_support files.

Generally LGTM, though the bazel stuff is over my head as usual. It would be lovely if you could write some more context into the commit messages of the patches that are necessary. That's all from my side, except perhaps my recurring nit of generating the patches with --no-signature. ;-)

Comment on lines -31 to +40

-  - url: https://raw.githubusercontent.com/jax-ml/ml_dtypes/v0.2.0/ml_dtypes/include/float8.h
+  # yes, the headers come from a different version than the python package required below.
+  - url: https://raw.githubusercontent.com/jax-ml/ml_dtypes/v0.3.1/ml_dtypes/include/float8.h
     fn: float8.h
-    sha256: 7c3d32809adf01e1568434760bf3c347d0ef21d5fc4c5009815a5dd54635ed25
+    sha256: d2798fad4e64375b566b1df1d7bc440313e4b1024ca08f12cead3eaa4b73ff72
+  - url: https://raw.githubusercontent.com/jax-ml/ml_dtypes/v0.3.1/ml_dtypes/include/int4.h
+    fn: int4.h
+    sha256: b3a9970c3c6b169c41ac2fd4375f668d3fd1b492d48b912d89415fa1522a8f50
Member

It's still quite confusing/surprising what's happening here. I guess the hopes from 2.14 that this would be simpler in 2.15 didn't materialize? Not a blocker, just for understanding.

Member Author

They did a partial update of the ml_dtypes code, as outlined in the comment: the Python part still uses 0.2.0, while the C++ code depends on 0.3.1, which comes with the new int4 type. In master, they have aligned it again: https://github.com/tensorflow/tensorflow/blob/99926785f7c9eaf53d94343916f14f300965ae72/tensorflow/tools/pip_package/setup.py#L93

If we want to get rid of pulling them in manually, we either need to add code to the libtensorflow_cc.tar.gz generation or add support for pulling in ml_dtypes as a system dependency. Both are sadly more complicated than adding them as additional sources here.

Member

Just for my understanding, is this the sort of thing that a future TensorFlow release will have resolved?

Member Author

No idea 😢

Member Author

I have made a PR to staged-recipes to have the headers packaged separately: conda-forge/staged-recipes#24662

From 9e6d16913eedc72aad7ced7f9cdf07374c84fc8f Mon Sep 17 00:00:00 2001
From: "Uwe L. Korn" <uwe.korn@quantco.com>
Date: Sun, 19 Nov 2023 20:50:29 +0000
Subject: [PATCH] Blacklist well-known protos
Member

Can you mention what this does or why it's necessary?

I'm assuming it will not (re)generate certain protos, and that those would be incompatible if we did regenerate?

Member Author

The problem here was that this would pull in the well-known protos twice. Thus, when importing tensorflow, we got errors about already registered definitions. I found the fix in https://github.com/google/riegeli/blob/c2bcb54934acd28eace78bd4a1bf008347592cc4/third_party/protobuf.patch#L64

I will merge this patch during cleanup with the one above that adds the toolchain.
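
For context, the failure mode behind those import errors can be modeled in a few lines (a toy model, not the real protobuf API): every generated *.pb.cc registers its .proto file by name in a process-wide database during static initialization, so linking the well-known protos into the process twice aborts at startup, exactly as in the descriptor_database.cc error above.

    #include <cstdio>
    #include <cstdlib>
    #include <set>
    #include <string>

    // Toy model of protobuf's generated-descriptor registry (hypothetical
    // names; the real mechanism lives in descriptor_database.cc).
    std::set<std::string>& GeneratedDatabase() {
      static std::set<std::string> db;
      return db;
    }

    bool AddGeneratedFile(const std::string& name) {
      if (!GeneratedDatabase().insert(name).second) {
        std::fprintf(stderr, "File already exists in database: %s\n",
                     name.c_str());
        std::abort();  // protobuf raises a CHECK failure / FatalException here
      }
      return true;
    }

    // Two copies of the well-known protos linked into one process both
    // register descriptor.proto at static-initialization time:
    static bool reg1 = AddGeneratedFile("google/protobuf/descriptor.proto");
    static bool reg2 = AddGeneratedFile("google/protobuf/descriptor.proto");

    int main() { return 0; }  // never reached: the second registration aborts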

Comment on lines 41 to 43
+    patch_file = [
+        "//third_party/py/ml_dtypes:int4.patch",
+    ],
Member

Ah, patch-ception, wonderful 😅

Member Author

I should add a comment that we need this patch for nvcc to be able to compile code that imports int4.

From b1c5c65cd5b4db7e06bcdf5f4886e744b324cfb0 Mon Sep 17 00:00:00 2001
From: "Uwe L. Korn" <uwe.korn@quantco.com>
Date: Thu, 23 Nov 2023 09:05:37 +0000
Subject: [PATCH] Remove some usage of absl::str_format in CUDA
Member

Causes linker errors?

Member Author

No, nvcc doesn't understand the C++ template due to new C++ features. We can only use absl::str_format in code that isn't parsed/ingested by nvcc.

Member

Do you recall which new C++ features were used?

Member Author

It was a combination of sizeof...(args) and std::enable_if.
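
A minimal sketch of the offending shape (hypothetical names, modeled on the FormatSpecTemplate lines from the bind.h errors above); this is plain C++14 that host gcc/clang accept, but the nvcc compilation path failed to parse the equivalent lines in absl:

    #include <cstddef>
    #include <type_traits>

    template <int... C>
    struct Spec {
      template <std::size_t N, std::size_t M>
      static void CheckArity() {
        static_assert(N == M, "arity mismatch");
      }

      // enable_if over one parameter pack combined with sizeof... over
      // another, as in absl's FormatSpecTemplate constructor.
      template <typename... Args,
                typename = std::enable_if_t<sizeof...(Args) == sizeof...(C)>>
      void Bind(const Args&...) {
        // This is the line shape rejected at bind.h:171 in the log above.
        CheckArity<sizeof...(C), sizeof...(Args)>();
      }
    };

    int main() {
      Spec<1, 2> s;
      s.Bind(10, 20);  // OK on host compilers: two chars, two arguments
      return 0;
    }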

Member

Interesting, those look to be C++11 (or in some cases C++14) features. I would have thought GCC 10 (on CUDA 11.2) and GCC 11 (on CUDA 11.8) would be new enough. Maybe there is some edge case that wasn't handled until a later GCC.

@xhochy (Member, Author) commented Nov 28, 2023

Can you tell us what happened with the protobuf situation? It sounds like you're adding some changes related to that, but I don't see the pinned versions changing in the .ci_support files.

Nothing has changed. All the protobuf-related errors were down to the protobuf_toolchain and not the version itself. Once this is merged, I will start working on the unpinned build again.

except perhaps my recurring nit of generating the patches with --no-signature. ;-)

I always forget that option. If you find a way to set that as default globally in git, I would appreciate that.

@xhochy (Member, Author) commented Nov 29, 2023

I also pushed the patches as a branch to https://github.com/xhochy/tensorflow/tree/2.15.0-conda-forge-patches

(Resolved review thread on recipe/meta.yaml)
@ngam (Contributor) left a comment

Thanks 💎

@xhochy (Member, Author) commented Dec 15, 2023

There were some issues with the OSX builds, but it seems we're fine now; I have started the Linux and OSX builds for all configurations.

@xhochy (Member, Author) commented Dec 17, 2023

Builds are on my uwe.korn-tf-gpu and uwe.korn-tf-experimental channels with the following logs:

@xhochy (Member, Author) commented Dec 17, 2023

@h-vetinari @hmaarrfk Please review/copy ;)

@hmaarrfk (Contributor)

Would the goal be for one of us to do light testing? I'm mostly trying to understand a protocol that we can follow in the future too.

@xhochy (Member, Author) commented Dec 17, 2023

Testing should hopefully be covered by the tests in the feedstock; otherwise, we should extend those. I think isuruf scanned these logs to check whether they used the right OSX SDK, but that was back when build-locally.py didn't take care of that.

@hmaarrfk (Contributor)

It's just pretty hard to test hardware acceleration without guaranteed access to the right hardware.

I can scan the logs.

@hmaarrfk merged commit 2998b25 into conda-forge:main Dec 17, 2023
1 of 17 checks passed
@hmaarrfk (Contributor)

thank you hugely

@yuvipanda

(as a standby observer) - THANK YOU SO MUCH FOR WORKING ON THIS!
