ARROW-2843: [Format] Removing field layout from Schema.fbs breaks backward compatibility #2263
yufeldman wants to merge 1 commit into apache:master
Conversation
Adding the removed field back and marking it deprecated, to prevent it from being used and to reduce the generated code footprint.
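The approach can be sketched in FlatBuffers IDL. This is an illustrative fragment only, not the actual arrow/format/Schema.fbs (the surrounding field names here are assumptions); what matters is FlatBuffers' `deprecated` attribute, which keeps the field's vtable slot reserved so older binaries stay readable while the generated accessors are dropped:

```
// Illustrative sketch, not the actual arrow/format/Schema.fbs.
table Field {
  name: string;
  nullable: bool;
  // Restored solely for wire compatibility: `deprecated` reserves the
  // vtable slot (so later fields keep their slot indices) but suppresses
  // the generated accessors, reducing the code footprint.
  layout: [VectorLayout] (deprecated);
  children: [Field];
}
```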
jacques-n left a comment
Let's review the flatbuf spec. I believe a simple bytes field is serialization-compatible with a layout, which should allow us to remove VectorLayout from the definition.
I think adding this field back in will make metadata written with 0.8.0 or 0.9.0 unreadable. I can confirm if that would be helpful. I don't think we should break the metadata two more times (adding this would take us to V5, then removing it again would take us to V6).
This suggests that maintaining backwards compatibility would require keeping a deprecated field in the schema forever. We should confirm, though.
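To make the compatibility argument concrete, here is a toy pure-Python model of FlatBuffers' vtable indirection (a deliberate simplification, not the real wire encoding): a deprecated field keeps its slot, so later fields' slot indices stay stable, whereas deleting the field outright shifts them and breaks readers generated from the older schema.

```python
def read_field(vtable, table, slot):
    """Return the value for a field slot, or None if the writer omitted it.

    In FlatBuffers, a vtable entry of 0 (or a slot beyond the vtable's
    length) means "field not present"; readers fall back to a default.
    """
    offset = vtable[slot] if slot < len(vtable) else 0
    return table.get(offset) if offset else None

# Writer built before deprecation: slot 1 ("layout") is populated.
old_vtable = [4, 8, 12]                      # slots: name, layout, type
old_table = {4: "f0", 8: "dense", 12: "int32"}

# Writer built after deprecation: slot 1 stays reserved but unpopulated,
# so the "type" field keeps its slot index of 2.
new_vtable = [4, 0, 12]
new_table = {4: "f0", 12: "int32"}

# Writer built from a schema where "layout" was REMOVED: "type" has
# shifted down to slot 1, so an old reader looking at slot 2 finds nothing.
removed_vtable = [4, 8]
removed_table = {4: "f0", 8: "int32"}

print(read_field(old_vtable, old_table, 2))       # old reader, old data: works
print(read_field(new_vtable, new_table, 2))       # old reader, deprecated field: works
print(read_field(new_vtable, new_table, 1))       # deprecated slot reads as absent
print(read_field(removed_vtable, removed_table, 2))  # removal breaks the old reader
```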
Regarding typing for deprecated fields: yes, it looks like it could serve as a placeholder, per se. There was a question, and an answer.
Here is some test code in Python to write a schema to a file, then read it back, so different codebase revisions can be compared:

I will double check, but it seems that these changes do not impact data written with 0.9.x. I'm concerned about having a lot of cruft lying around in our protocol, particularly when the justification for the cruft is based on a contraindicated use of Arrow that has not reached a stage of maturity where we have committed to maintaining backward compatibility.
Regarding data generated with Arrow 0.8 and 0.9 - very good point. We definitely need to test it.
Sorry, the test I gave is not sufficient, because it's the field metadata that's impacted. I'll post an updated one.
With data written by 0.9.0 or the master branch, after this patch the metadata seems to be dropped on read (but the process doesn't crash, in this example at least).
You mean when you write with 0.9 and read with my patch, right?
Yes, write with 0.9.0, then read with this patch. The generated C++ bindings don't validate by default; it's possible that using a Verifier would catch the table's vtable mismatch.
OK, let me see what to do about versions 0.8 and 0.9 - we can't abandon them.
I'm personally fine with breaking the metadata if it's the right thing to do, because we have warned users not to use Arrow for persistence yet. We're discussing two options right now: add the field back as a deprecated placeholder (keeping cruft in the protocol), or leave it removed (breaking metadata written before the removal).
I agree that neither is ideal. I would prefer to not have the cruft.
Yes, let me get back to you on that. Need to think about if/how we can keep the wolves fed and the sheep not eaten :).
One option to help with migrating persisted data: you can write these schemas to JSON with Arrow 0.7.x, then parse the JSON (which has not changed, save for the omission of the vector layout) with 0.9.x/0.10.x and write out the new binary data.
Thank you for the suggestion. Though I'm not sure why I would need to write it out to JSON first - I could probably write it directly to binary, with the omission of the vector layout, after reading with an earlier version? Am I missing anything?
Ah, I was suggesting it as an alternative to having to fashion a reader for the old protocol and a writer for the new protocol in the same process. You could make a simple CLI for JSON <-> binary conversion (like the integration tests do) and run it against two JARs (one to read, the other to write).
Understood. Thank you for the suggestion. Will see.
I am going to close this PR and JIRA. |
Fix backward compatibility of Schema