tensorflow-haskell: Unbreak / update to Tensorflow 2.4 #119411

Closed

mikesperber
Contributor

Motivation for this change

This updates the Haskell TensorFlow bindings to TensorFlow 2.4, which has been the default version for a few months.

This supersedes #111399, which has stalled. As per the discussion there, I'm submitting a fresh pull request.

Things done
  • Tested using sandboxing (nix.useSandbox on NixOS, or option sandbox in nix.conf on non-NixOS linux)
  • Built on platform(s)
    • NixOS
    • macOS
    • other Linux distributions
  • Tested via one or more NixOS test(s) if existing and applicable for the change (look inside nixos/tests)
  • Tested compilation of all pkgs that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review wip"
  • Tested execution of all binary files (usually in ./result/bin/)
  • Determined the impact on package closure size (by running nix path-info -S before and after)
  • Ensured that relevant documentation is up to date
  • Fits CONTRIBUTING.md.

Member

@cdepillabout cdepillabout left a comment

It looks like tensorflow-core-ops, tensorflow-logging, tensorflow, tensorflow-opgen, and tensorflow-ops are all marked as broken in pkgs/development/haskell-modules/configuration-hackage2nix.yaml.

Could you remove them from the broken packages list (assuming that they all compile)?

@mikesperber
Contributor Author

@cdepillabout Ah, sorry - forgot that bit again. Done.

@cdepillabout
Member

cdepillabout commented Apr 15, 2021

Thanks for fixing this.

When compiling tensorflow-ops, I'm seeing a problem with the tests:

$ nix-build -A haskellPackages.tensorflow-ops
these derivations will be built:
  /nix/store/3aws1msvblwnpwig3whn8r5dh4gfias6-tensorflow-ops-0.2.0.1.drv
building '/nix/store/3aws1msvblwnpwig3whn8r5dh4gfias6-tensorflow-ops-0.2.0.1.drv'...
setupCompilerEnvironmentPhase
Build with /nix/store/2ip3lwzqswai3zz33407h4b2q3har28p-ghc-8.10.4.
unpacking sources
unpacking source archive /nix/store/7yn2qvqpvbm460fkdhzvwzja985pfgmb-source
source root is source/tensorflow-ops
patching sources
compileBuildDriverPhase
setupCompileFlags: -package-db=/build/setup-package.conf.d -j4 +RTS -A64M -RTS -threaded -rtsopts
[1 of 1] Compiling Main             ( Setup.hs, /build/Main.o )
Linking Setup ...
configuring
configureFlags: --verbose --prefix=/nix/store/vx502lr8cqhamy7yp4vwp338lnki7qfg-tensorflow-ops-0.2.0.1 --libdir=$prefix/lib/$compiler --libsubdir=$abi/$libname --docdir=/nix/store/0603zibs9q8vc88d964ngpm8yw20c4gx-tensorflow-ops-0.2.0.1-doc/share/doc/tensorflow-ops-0.2.0.1 --with-gcc=gcc --package-db=/build/package.conf.d --ghc-options=-j4 +RTS -A64M -RTS --disable-split-objs --enable-library-profiling --profiling-detail=exported-functions --disable-profiling --enable-shared --disable-coverage --enable-static --disable-executable-dynamic --enable-tests --disable-benchmarks --enable-library-vanilla --disable-library-for-ghci --ghc-option=-split-sections --extra-lib-dirs=/nix/store/hdpihl2yn8cpdqmc9sysbh3fvwsxchky-ncurses-6.2/lib --extra-lib-dirs=/nix/store/vzqia3jcpy0xdqh4nzmw5qmdv6hx27dp-libffi-3.3/lib --extra-lib-dirs=/nix/store/ak9n4w3nsnvn5gxqyi3dhc342yk9ia06-gmp-6.2.1/lib
Using Parsec parser
Configuring tensorflow-ops-0.3.0.0...

...

[1 of 1] Compiling Main             ( tests/QueueTest.hs, dist/build/QueueTest/QueueTest-tmp/Main.o )
Linking dist/build/QueueTest/QueueTest ...
Preprocessing test suite 'BuildTest' for tensorflow-ops-0.3.0.0..
Building test suite 'BuildTest' for tensorflow-ops-0.3.0.0..
[1 of 1] Compiling Main             ( tests/BuildTest.hs, dist/build/BuildTest/BuildTest-tmp/Main.o )
Linking dist/build/BuildTest/BuildTest ...
running tests
Running 14 test suites...
Test suite OpsTest: RUNNING...
Test suite OpsTest: PASS
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-OpsTest.log
Test suite GradientTest: RUNNING...
Test suite GradientTest: PASS
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-GradientTest.log
Test suite MatrixTest: RUNNING...
Test suite MatrixTest: PASS
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-MatrixTest.log
Test suite VariableTest: RUNNING...
Test suite VariableTest: PASS
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-VariableTest.log
Test suite ArrayOpsTest: RUNNING...
Test suite ArrayOpsTest: PASS
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-ArrayOpsTest.log
Test suite NNTest: RUNNING...
Test suite NNTest: PASS
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-NNTest.log
Test suite RegressionTest: RUNNING...
Test suite RegressionTest: PASS
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-RegressionTest.log
Test suite TypesTest: RUNNING...
Test suite TypesTest: PASS
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-TypesTest.log
Test suite MiscTest: RUNNING...
Test suite MiscTest: PASS
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-MiscTest.log
Test suite EmbeddingOpsTest: RUNNING...
Test suite EmbeddingOpsTest: PASS
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-EmbeddingOpsTest.log
Test suite DataFlowOpsTest: RUNNING...
Test suite DataFlowOpsTest: PASS
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-DataFlowOpsTest.log
Test suite TracingTest: RUNNING...
Test suite TracingTest: PASS
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-TracingTest.log
Test suite QueueTest: RUNNING...
2021-04-15 01:46:48.721476: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-04-15 01:46:48.738886: I tensorflow/core/platform/profile_utils/cpu_utils.cc:112] CPU Frequency: 1991865000 Hz
testBasic: [Failed]
ERROR: Malformed TF_STRING tensor; Result: incomplete input
CallStack (from HasCallStack):
  error, called at src/TensorFlow/Types.hs:326:25 in tensorflow-0.3.0.0-6G0yW4KI5NJCJocu4cHCKp:TensorFlow.Types
testPump: [Failed]
ERROR: Malformed TF_STRING tensor; Result: incomplete input
CallStack (from HasCallStack):
  error, called at src/TensorFlow/Types.hs:326:25 in tensorflow-0.3.0.0-6G0yW4KI5NJCJocu4cHCKp:TensorFlow.Types
TensorFlowException TF_CANCELLED "Run call was cancelled"
testAsync: [Failed]
ERROR: Malformed TF_STRING tensor; Result: incomplete input
CallStack (from HasCallStack):
  error, called at src/TensorFlow/Types.hs:326:25 in tensorflow-0.3.0.0-6G0yW4KI5NJCJocu4cHCKp:TensorFlow.Types

         Test Cases  Total      
 Passed  0           0          
 Failed  3           3          
 Total   3           3          
Test suite QueueTest: FAIL
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-QueueTest.log
Test suite BuildTest: RUNNING...
Test suite BuildTest: PASS
Test suite logged to: dist/test/tensorflow-ops-0.3.0.0-BuildTest.log
13 of 14 test suites (13 of 14 test cases) passed.
builder for '/nix/store/3aws1msvblwnpwig3whn8r5dh4gfias6-tensorflow-ops-0.2.0.1.drv' failed with exit code 1
error: build of '/nix/store/3aws1msvblwnpwig3whn8r5dh4gfias6-tensorflow-ops-0.2.0.1.drv' failed

Are you not seeing this error?

@mikesperber
Contributor Author

@cdepillabout Yes, I see it. Sorry about that: I now realize I ran the tests in a slightly different environment. I'll investigate, but it might take me a few days.

Thanks for the feedback!

@cdepillabout
Member

@mikesperber No problem, thanks for looking into this :-)

@mikesperber
Contributor Author

Just logging some debugging work:

  • The problem is in decoding bytestrings; the general queue functionality seems to work.
  • Building from source in Stack, using nightly-2021-04-06 (the same version the Nix branch uses), does not exhibit this problem.

@maralorn maralorn closed this May 7, 2021
@maralorn maralorn deleted the branch NixOS:haskell-updates May 7, 2021 21:55
@maralorn maralorn reopened this May 7, 2021
@sternenseemann sternenseemann deleted the branch NixOS:haskell-updates May 19, 2021 01:53
@mikesperber
Contributor Author

I'm still on this one, it's just slow-going.

@sternenseemann
Member

sternenseemann commented May 19, 2021 via email

@mikesperber
Contributor Author

Just rebased the patch and did a bit more debugging.

The offsets for decoding strings seem to be out of whack. In QueueTest, the correct FFI.TensorData records for the strings look like this:

tensorData: TensorData {tensorDataDimensions = [], tensorDataType = DT_STRING, tensorDataBytes = [0,0,0,0,0,0,0,0,2,72,105]}
tensorData: TensorData {tensorDataDimensions = [], tensorDataType = DT_STRING, tensorDataBytes = [0,0,0,0,0,0,0,0,3,66,97,114]}
tensorData: TensorData {tensorDataDimensions = [], tensorDataType = DT_STRING, tensorDataBytes = [0,0,0,0,0,0,0,0,3,66,97,122]}
tensorData: TensorData {tensorDataDimensions = [], tensorDataType = DT_STRING, tensorDataBytes = [0,0,0,0,0,0,0,0,3,66,97,122]}
tensorData: TensorData {tensorDataDimensions = [], tensorDataType = DT_STRING, tensorDataBytes = [0,0,0,0,0,0,0,0,5,65,115,121,110,99]}
tensorData: TensorData {tensorDataDimensions = [], tensorDataType = DT_STRING, tensorDataBytes = [0,0,0,0,0,0,0,0,5,65,115,121,110,99]}

The broken ones I'm seeing look like this:

tensorData: TensorData {tensorDataDimensions = [], tensorDataType = DT_STRING, tensorDataBytes = [8,72,105,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]}
dataBytes: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
tensorData: TensorData {tensorDataDimensions = [], tensorDataType = DT_STRING, tensorDataBytes = [12,66,97,122,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]}
dataBytes: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
tensorData: TensorData {tensorDataDimensions = [], tensorDataType = DT_STRING, tensorDataBytes = [20,65,115,121,110,99,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0]}
dataBytes: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Feels like there's an offset of 8 going on.
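
The two layouts can be compared with a short, standalone Python sketch (this is not code from the bindings, and the function names are hypothetical). It assumes, based only on the dumps above, that the working layout is an 8-byte little-endian offset followed by a one-byte length and the payload, and that the broken layout starts with a single header byte equal to four times the string length (8 for "Hi", 12 for "Baz", 20 for "Async") followed directly by the payload. The second layout resembles the inline small-string form of TensorFlow's newer TF_TString type, but nothing in this thread confirms that.

```python
import struct


def decode_legacy_scalar_string(data: bytes) -> bytes:
    """Decode the layout seen in the working dumps.

    Assumed layout: 8-byte little-endian offset, then a one-byte length,
    then the payload. The one-byte length is a simplification that only
    holds for strings shorter than 128 bytes.
    """
    if len(data) < 9:
        raise ValueError("incomplete input")
    (offset,) = struct.unpack_from("<Q", data, 0)
    pos = 8 + offset
    if pos >= len(data):
        raise ValueError("incomplete input")
    length = data[pos]
    payload = data[pos + 1 : pos + 1 + length]
    if len(payload) != length:
        raise ValueError("incomplete input")
    return payload


def decode_inline_scalar_string(data: bytes) -> bytes:
    """Decode the layout seen in the broken dumps.

    Assumption inferred from the dumps: the first byte is (length << 2)
    with the low two bits zero, and the payload follows immediately.
    """
    if not data:
        raise ValueError("incomplete input")
    header = data[0]
    if header & 0b11 != 0:
        raise ValueError("not an inline string")
    length = header >> 2
    payload = data[1 : 1 + length]
    if len(payload) != length:
        raise ValueError("incomplete input")
    return payload


if __name__ == "__main__":
    working = bytes([0, 0, 0, 0, 0, 0, 0, 0, 2, 72, 105])
    # Zero padding after the payload, as in the broken dumps.
    broken = bytes([8, 72, 105]) + bytes(21)

    print(decode_legacy_scalar_string(working))  # b'Hi'
    print(decode_inline_scalar_string(broken))   # b'Hi'
    try:
        decode_legacy_scalar_string(broken)
    except ValueError as err:
        print("legacy decoder on new bytes:", err)
```

Under these assumptions, feeding the new bytes to the old-style decoder interprets the header byte and payload as a very large offset, which would be consistent with the "incomplete input" failures in QueueTest.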

@stale

stale bot commented Jan 3, 2022

I marked this as stale due to inactivity.

@stale stale bot added the 2.status: stale https://github.com/NixOS/nixpkgs/blob/master/.github/STALE-BOT.md label Jan 3, 2022
@sheepforce sheepforce mentioned this pull request Jul 28, 2022
@mikesperber
Contributor Author

Superseded by #217812.
