Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Arrow/Flatbuffers incompatibility #6372

Closed
AlexDBlack opened this issue Sep 5, 2018 · 7 comments

Comments

@AlexDBlack
Copy link
Contributor

commented Sep 5, 2018

Arrow 0.9.0/0.10.0 (and possibly earlier) uses an old/incompatible/3rd-party version of flatbuffers, and we can't simply exclude this version and use the newer/official dependency.

The issue has been reported upstream here with more details: https://issues.apache.org/jira/browse/ARROW-3175
However, it may take some time before we get a fix and a new release of Arrow.
Consequently, we need a solution in the short term.

So far the only solution (other than downgrading everything to the old/unofficial arrow version - clearly not acceptable) is to shade both Arrow and the 3rd party flatbuffers dependency for use in DataVec (it's not possible to shade just the flatbuffers dependency alone)

@AlexDBlack AlexDBlack self-assigned this Sep 5, 2018

@AlexDBlack

This comment has been minimized.

Copy link
Contributor Author

commented Sep 9, 2018

So it looks like the API change here is pretty minimal on FlatBufferBuilder:
old version: int createString(String s)
new version: int createString(CharSequence s)

A couple of (fairly hackish) ideas here that might avoid shading the whole library are to modify just the FlatBufferBuilder class (to add createString(String)) or the Arrow Field class (to call createString(CharSequence s) instead). Either via ByteBuddy or excluding the classes from the build while substituting our own.

@agibsonccc

This comment has been minimized.

Copy link

commented Sep 9, 2018

I wouldn't be against this if we want to just avoid a workaround for now. @saudet how hard would custom arrow presets be?

@saudet

This comment has been minimized.

Copy link
Member

commented Sep 9, 2018

A lot harder than ByteBuddy to add one method :)

@AlexDBlack

This comment has been minimized.

Copy link
Contributor Author

commented Sep 10, 2018

OK, so I've spent a few hours messing around with ByteBuddy, and it was a good idea, but I don't think it'll work.
In principle, we can add the required FlatBufferBuilder method, that's not really an issue.
However, I'm seeing three blockers for this approach:

  1. Modifying classes can only be done before the class is first loaded - but we won't control order of classpath loading on the user's classpath. Specifically, if the user (or any of their dependent libraries) happen to load the FlatBufferBuilder class before any/all of our classes can execute the class modification, our modifications will fail
  2. The approach does not seem robust to different ways of constructing the classpath - ByteBuddy can't seem to find the class to modify when running in IntelliJ using "Shorten command line: classpath file"
  3. I can't get the modified class to load (see code below) - it complains about the class already being loaded, yet I'm sure that the class is first loaded internally in bytebuddy as part of the class modification process. I'm not sure why this is occuring (I'm following the docs/examples for adding a method) and can't see a way around this "already loaded" issue.
        ByteBuddyAgent.install();
        TypePool typePool = TypePool.Default.ofClassPath();
        new ByteBuddy()
                .redefine(typePool.describe("com.google.flatbuffers.FlatBufferBuilder").resolve(), ClassFileLocator.ForClassLoader.ofClassPath())
//                .rebase(typePool.describe("com.google.flatbuffers.FlatBufferBuilder").resolve(), ClassFileLocator.ForClassLoader.ofClassPath())
                .defineMethod("createString", int.class, Modifier.PUBLIC)
                .withParameters(String.class)
                .intercept(FixedValue.value(0))
                .make()
                .load(ClassLoader.getSystemClassLoader());  //Class already loaded!?


Plan B: Shade flatbuffers in the following way:

  1. Relocate one file only: FlatBufferBuilder to say org.nd4j.shade.flatbuffers.FlatBufferBuilder
  2. Add our own com.google.flatbuffers.FlatBufferBuilder class that extends org.nd4j.shade.flatbuffers.FlatBufferBuilder, adding the single createString(String) method that we need
  3. Use shaded flatbuffers artifact in our various projects (nd4j-arrow and datavec)
  4. Switch back to standard/unshaded Flatbuffers once Arrow 0.11 is out with the fix we need.
@AlexDBlack

This comment has been minimized.

Copy link
Contributor Author

commented Sep 10, 2018

I've implemented a fix via shading here: #6404

@AlexDBlack

This comment has been minimized.

Copy link
Contributor Author

commented Oct 17, 2018

@AlexDBlack AlexDBlack closed this Oct 17, 2018

@lock

This comment has been minimized.

Copy link

commented Nov 16, 2018

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@lock lock bot locked and limited conversation to collaborators Nov 16, 2018

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
3 participants
You can’t perform that action at this time.