Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Flatbuffers may be interested in accepting this upstream #1

Closed
ahundt opened this issue Nov 13, 2015 · 6 comments
Closed

Flatbuffers may be interested in accepting this upstream #1

ahundt opened this issue Nov 13, 2015 · 6 comments

Comments

@ahundt
Copy link

ahundt commented Nov 13, 2015

Would you consider incorporating this tool directly into flatbuffers itself? It seems several people are interested here: google/flatbuffers#7.

@mikkelfj
Copy link
Contributor

I have given it some thought, but having a C schema compiler was a more important goal when the project started.

Personally I need to put a limit to how much additional work I put into this project as there are other pressing issues, but for anyone who wants to take it on:

There are several ways to go about an integration:

  1. write the codegenerator in C++ for flatc and reuse the runtime parts.

  2. rewrite the codegenerator to use binary schema and interface this with flatc.

  3. add the libflatcc library as is and call directly from flatc cli.

  4. alternative binary schema.

  5. Was avoided initially so projects didn't have to rely on a C++11 compiler and retrofitting it is a lot of work, but possible. There is great risk of the flatcc and flatc code generators quickly becoming incompatible. So it will also be difficult to maintain - not really a good option.

  6. was initially expected to be supported at some point. But I am not sure the binary schema has enough information. For example it doesn't order structs by dependency. That requires more work in code generation.

  7. This is currently the only practical short term solution. Error messages would be different, but the flatc could pre-check the schema before passing it on. That would be very simple to do since all path logic is compatible. Any edge cases in compatibility would have to be sorted as they are discovered, but that is only a good thing. However, I am not sure that Googles flatc project would want to make a major special case for one language backend - and it would blow up the code size significantly. It could work as an optional build option that pulls source from FlatCC github and enables C output in the command line interface.

  8. I am envisioning an alternative binary schema that includes hash tables for fast json parsing and which might also be a replacement for option 2) above. This requires a flatc backend to a new binary schema format that flatcc also uses internally. This would be the best solution and would work for other code generators in other languages as well. However, this is a long term objective and not something to expect any time soon and would require coordination with Googles FlatBuffers project. This would also clean up the flatcc codegenerator which currently works on obscure parser AST graphs.

But even with 4) there is very little benefit over simply passing the schema on to libflatcc - parsing text is after all not such a big deal in itself and I am not sure an alternative schema is worth the effort.

@ghost
Copy link

ghost commented Nov 16, 2015

We'd be happy to have C support directly in the main project. And while I understand the value of an all-C toolchain, for maintainability anything going into the main project would have to use the C++ schema parser.

So 1) is preferable, and 2) would be acceptable, but I don't see the point, since reading from the binary schema is not necessarily easier than just reading the C++ parser state.
3) is a maintenance problem, in which case I'd rather just point people to your project if they want to use C.
4) I is unlikely to happen.

@mikkelfj
Copy link
Contributor

This is understandable.

  1. is simpler not because it might be conceived as easier than the C++ interface, but because both schema frontend parsers can generate it such that the code generator can be shared and thus be maintained. The code generator would still be in C but use whatever frontend parser is attached. For MSVC as frontend target, a bit care is required.

As it is 2) is insufficient or at least suboptimal - but it doesn't have to be total rewrite to enhance the schema:

a) there need to be an order of structs - at least the order supplied in the original schema. The current listing is sorted for lookup by binary search. It can probably be worked around using forward struct declarations, but and additional vector in the schema with all symbols in original schema order would be useful - also for regenerating the text schema.
b) there need to be a list of include files in order of dependency - again - the order of includes in the schema is sufficient - without this either everything gets mashed together into a single file, or the user must manually add include statements for dependent schema.
c) there need to be a scope reference for each symbol. At can just be a dotted prefix in the name - FlatCC does this by default in the binary schema output, but can strip it.
d) I am not quite sure if dependent symbols are needed at code generation level, but if they are present there need to be a way to know if they are dependencies or something part of the current schema being generated.

With this covered, I think the code generator has enough information, just on top of my head.

@ghost
Copy link

ghost commented Nov 20, 2015

a) The binary search order is important for speed in most reflection cases. I think the better approach is if the C generator wants to work from this data, that it does its own sort. Doesn't even have to be a sort, simply iterate thru all struct multiple times, and only output those who don't rely on structs that haven't been generated yet.
b) Yes, the schema is self contained, and this makes sense for its use case. Replicating how the C++ parser handles seems like a very fragile thing to me.
c) / d) This is not data that most reflection uses care for, so.. can be done, but not ideal.

It seems like ensuring the binary schema holds all information the parser provides is more pain than it is worth. A better solution would be, if you want to keep sharing code between the two projects, is to put the bulk of your code generator in a .c file with an interface, then have specialized code call into that from both flatc and flatcc.

@mikkelfj
Copy link
Contributor

a) [edited] yes fix-point iteration over the dataset could work though it does require some book keeping the generator would like to avoid.

I think you suggestion about a common interface is good. Some data structures will be needed and whether they are C arrays of structs, or another flatbuffer schema is not really important as long as all the issues a) .. d) are being addressed somehow.

@mikkelfj
Copy link
Contributor

After considering the interface structure required for an integration, and the many small details required, it does not seem worthwhile for the time being. I am in favor of an extended binary schema format that all parses can output to and all code generators can generate from, but this is an extended effort for everyone involved so I will let it rest for now and close the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants