Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

An attempt to create flatc support, and using the Arrow.jl FlatBuffer submodule as the 'official' implementation #61

Open
jonalm opened this issue Dec 19, 2021 · 11 comments

Comments

@jonalm
Copy link
Collaborator

jonalm commented Dec 19, 2021

I'm opening this issue to discuss the prospect of transitioning into using the Arrow.jl FlatBuffer submodule as the 'official' julia FlatBuffer implementation. The main motivation for doing so is two fold. First, the current implementation isn't really 'flat' in the sense that you need to parse the buffer by mapping the data into julia structs, rather than querying the buffer directly, which is a major motivation for using FlatBuffers in the first place. Second, as the Arrow.jl implementation is based on the official go implementation of FlatBuffers, which makes it easy to implement flatc support by modifying existing code. Having flatc support means that buffer access functionality is code generated. This is needed to be granted "official supported language" status, and to avoid having to manually implement julia code to match any given FlatBuffer schema.

I have a WIP fork of the flatc code here, which generates Arrow.jl FlatBuffer compatible julia code for fbs schemas. Some major current limitations is that it doesn't handle union types, and that all the generated code is put in a single module scope (I haven't figured out how to deal with the schema namespace yet).

I also have a WIP fork of FlatBuffer.jl here, containing the Arrow.jl code, and some updated tests from the official Monster.fbs example schema. In particular, the code generation is made here, and the generated code example is here.

@evetion
Copy link

evetion commented Apr 26, 2022

@jonalm I've used your code--both flatc and FlatBuffer.jl--to update Flatgeobuf.jl.

Most things are smooth, but hit issues on definitions that are field: [double]. They generate the following code FlatBuffers.Array{Float64}(x, o) 1, and on parsing I get:

ERROR: ArgumentError: unsafe_wrap: pointer 0x7fb952ec2d0c is not properly aligned to 8 bytes
Stacktrace:
 [1] #unsafe_wrap#89
   @ ./pointer.jl:89 [inlined]
 [2] unsafe_wrap
   @ ./pointer.jl:89 [inlined]
 [3] (FlatBuffers.Array{Float64})(t::FlatGeobuf.Gen.Geometry, off::UInt16)
   @ FlatBuffers ~/.julia/packages/FlatBuffers/3YdKu/src/table.jl:107
 [4] getproperty(x::FlatGeobuf.Gen.Geometry, field::Symbol)
   @ FlatGeobuf.Gen ~/code/FlatGeobuf.jl/src/schema/generated.jl:35
 [5] top-level scope
   @ REPL[72]:1

The new code seems a huge improvement in terms of performance, as I don't need to allocate thousands of (empty) structs anymore. 👍🏻

Footnotes

  1. https://github.com/evetion/FlatGeobuf.jl/blob/feat/new-flatbuffers/src/schema/generated.jl#L561 and https://github.com/evetion/FlatGeobuf.jl/blob/feat/new-flatbuffers/src/schema/generated.jl#L35

@jonalm
Copy link
Collaborator Author

jonalm commented Apr 26, 2022

Thanks for the bug report @evetion! I added a test to the branch, using an array of doubles, which currently fails with the same error message that you describe. May have a chance to look a bit more into it later this week.

@jonalm
Copy link
Collaborator Author

jonalm commented Apr 26, 2022

@jonalm
Copy link
Collaborator Author

jonalm commented Apr 27, 2022

I've been digging a bit. It seems to be an issue with all arrays of native types, I've added tests for reading and writing arrays of int32 floats and doubles, see here https://github.com/jonalm/FlatBuffers.jl/blob/af5a137609912f3ca46ca7fc2773239835b4fa17/test/runtests.jl#L114. Next step would be to investigate if the issue stems from writing or reading the buffer, could perhaps store the modified 'moster' buffer used in tests and parse it in python?

I suspect that it is related to parsing, and that the issue is here somewhere https://github.com/jonalm/FlatBuffers.jl/blob/af5a137609912f3ca46ca7fc2773239835b4fa17/src/table.jl#L103. Note that this code has been copied from the Arrow.jl repo, and that particular function was last modified here apache/arrow-julia#234 , I got that to work almost by accident so I would not be surprised if it contains some buggy behaviour. @quinnj do you have any input here? I believe that this issue would be relevant for Arrow.jl.

@evetion
Copy link

evetion commented Apr 27, 2022

FWIW I'm parsing "official" flat(geo)buffer files, so the issue will at least be in reading/parsing.

@evetion
Copy link

evetion commented Apr 27, 2022

Well, a complete hack like this for L106 makes it work.

# ptr = convert(Ptr{S}, pointer(bytes(t), a + 1))
# data = unsafe_wrap(Base.Array, ptr, vectorlen(t, off))
data = reinterpret(S, bytes(t)[a+1:a+sizeof(S)*vectorlen(t, off)])

In this case, one avoids the bytes(t) with a length of 604 and the offset a+1 of 85, which indeed doesn't align at all to 8 bytes...

@jonalm
Copy link
Collaborator Author

jonalm commented Apr 27, 2022

@evetion does that give correct numerical values for your official buffer files? I've updated the branch with your temporary solution, with tests. The test now run, but the fail in writing/reading the same data. Do you have access to some official buffers that are public? If so I can set up some more tests.

@evetion
Copy link

evetion commented Apr 28, 2022

They do. This temporary solution indicates that the data and the found offset is correct, but that the bytes have been shifted and are now misaligned? Writing them will produce corrupt data as you've found.. edit: bytes are not shifted, just aligned to 4 bytes (and the 85 threw me off, but we do +1 indexing), so unsafe_warp won't work for types > 4 bytes such as doubles. Reinterpret is the way.

Tomorrow I'll check the original file and scan for the correct data to find the actual offset, to see how this compares to the internal buffer.

@evetion
Copy link

evetion commented Apr 28, 2022

@jonalm In your tests, you should replace prependoffset! by prepend! for the vectors. And if you then replace the ... === floats_[1] by ... === floats[1], you're golden.

@jonalm
Copy link
Collaborator Author

jonalm commented Apr 29, 2022

Thanks, @evetion, I'll update this when I get to it. Also I moved this fork to it's own repo https://github.com/jonalm/FlatBuffers2.jl , this will allow us to track issues independently (I expect more), and to compare it with the current FlatBuffer.jl implementation.

@quinnj
Copy link
Member

quinnj commented May 4, 2022

@jonalm, I'm happy to do a complete overhaul of the package code here and make you an admin if that would end up cleaner. We can just hold off on tagging the new breaking major release until you think things are ready.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants