Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not check in the generated file from proto #2616

Closed
liukun4515 opened this issue Aug 31, 2022 · 17 comments
Closed

Do not check in the generated file from proto #2616

liukun4515 opened this issue Aug 31, 2022 · 17 comments

Comments

@liukun4515
Copy link
Contributor

Can we add ignore item in the .ignore file to ignore the changes for arrow.flight.protocol.rs and arrow.flight.protocol.sql.rs?

Are you suggesting we don't check in the generated code? I personally agree with this, but I know that others prefer checking it in

What's your opinions about this? @alamb @carols10cents

I just know java/go don't check in the diff about the generated file in file.

Originally posted by @liukun4515 in #2586 (comment)

@liukun4515
Copy link
Contributor Author

After I changed the proto file and build them, there are many changes for the generated file.
generally, we should not check in the generated file.

@liukun4515 liukun4515 self-assigned this Aug 31, 2022
@alamb
Copy link
Contributor

alamb commented Aug 31, 2022

My opinion on this topic is that there is a tradeoff:

If you check in the generated file, then the required build tools (e.g. protobuf compiler) are not needed for most developers, thus sometimes simplifying the build environment. However, re-generating the file when changes are made is harder as now the special build tools are required.

Thus my opinion is that

  • if arrow-flight already require the special build tools (e.g. protoc) then it would be fine to not check in the generated files.
  • However, if arrow-flight does not currently require protoc to be installed, and this change would begin requiring protoc I like that less (though I would personally still be ok)

All the build environments I deal with (Arrow CI, DataFusion CI, IOx CI) all need protoc so it wouldn't affect me or my projects personally.

@liukun4515
Copy link
Contributor Author

My opinion on this topic is that there is a tradeoff:

If you check in the generated file, then the required build tools (e.g. protobuf compiler) are not needed for most developers, thus sometimes simplifying the build environment. However, re-generating the file when changes are made is harder as now the special build tools are required.

Thus my opinion is that

  • if arrow-flight already require the special build tools (e.g. protoc) then it would be fine to not check in the generated files.
  • However, if arrow-flight does not currently require protoc to be installed, and this change would begin requiring protoc I like that less (though I would personally still be ok)

All the build environments I deal with (Arrow CI, DataFusion CI, IOx CI) all need protoc so it wouldn't affect me or my projects personally.

I will check that concerned from you.
If need the compile tool e.g. protoc, we will not delete the generated file.

@avantgardnerio
Copy link
Contributor

I think a fair compromise in this case is adding documentation to brew / snap / choco install protoc, which should be a single line per OS.

@alamb
Copy link
Contributor

alamb commented Aug 31, 2022

Our CI jobs probably also need to install protoc which we can point to as another set of instructions of what to use

@avantgardnerio
Copy link
Contributor

as another set of instructions

☝️ this, 100% scripts as docs never get out of date and the testing is built in :)

It's too bad docker doesn't support snaps because more stuff (including protobuf) is being distributed that way.

@tustvold
Copy link
Contributor

tustvold commented Sep 1, 2022

If you check in the generated file, then the required build tools (e.g. protobuf compiler) are not needed for most developers

Can we verify this, I'm fairly certain this isn't the case. The build.rs will always require these tools, as it will always "generate" the file regardless if it already exists, based on the modification timestamps of the sources. If the source protos are new, e.g. the repo was just checked out, the generation process will be run and protoc will be required.

@alamb
Copy link
Contributor

alamb commented Sep 1, 2022

Yes -- exactly, we should verify this. If we always regenerate the files anyways, having them checked in seems unnecessary. If we are going to check the in, we should change our build system (e.g. remove build.rs somehow) to not require them.

@alamb
Copy link
Contributor

alamb commented Sep 1, 2022

BTW there is a related discussion in tikv/pprof-rs#158 (comment) (which is all fallout from prost-build no longer bundling protoc)

@avantgardnerio
Copy link
Contributor

Since what kicked this off was prost unbundling protoc, how would everyone feel about bundling it ourselves?

We could modify build.rs to look for it in the target folder, check the platform type, and download the appropriate binary if not present already?

@alamb
Copy link
Contributor

alamb commented Sep 2, 2022

Since what kicked this off was prost unbundling protoc, how would everyone feel about bundling it ourselves?

I would personally prefer not to do this. Reasoning:

  1. I selfishly don't want to be maintaining a protoc build system in arrow
  2. Any system that uses prost-build itself and depends on arrow will have to install protoc some other way (so at best we'll simply be using a different protoc for arrow-flight than the rest of that downstream project). My instincts tell me this will cause all sorts of silly build issues

@avantgardnerio
Copy link
Contributor

will have to install protoc some other way

To clarify, I meant build.rs would download and extract protoc - all other builds and developers would not have to be aware of this step, just as it was when prost bundled it. "it just works"

@alamb
Copy link
Contributor

alamb commented Sep 4, 2022

To clarify, I meant build.rs would download and extract protoc - all other builds and developers would not have to be aware of this step, just as it was when prost bundled it. "it just works"

👍 My point was that I think this solution would only be used for arrow-rs.

Thus, if the downstream project used protobuf for some other purpose (e.g. datafusion/ballista using it to serialize plans) then that build system for the downstream project still needs to manage an install protobuf itself and then potentially has two versions of protoc (and thus two potential sources of failure):

  1. The one installed / managed by arrow-rs
  2. The one installed / managed by itself

Maybe this is better for some reason I don't get yet, but it seems like it might be even worse than the current state of affairs

@liukun4515 liukun4515 removed their assignment Sep 5, 2022
@avantgardnerio
Copy link
Contributor

better for some reason I don't get yet

No, you got it. I just have a strong preference for zero-dev-setup when it can be done. I'm happy enough leaving it unbundled. You've seen the datafusion PR, and that exemplifies what I'm proposing for arrow. Perhaps we can see how that works out and let it inform our decision over here.

@alamb
Copy link
Contributor

alamb commented Sep 5, 2022

For anyone following along, apache/datafusion#3333 is the datafusion PR

@tustvold
Copy link
Contributor

We ended up checking the generated code into DataFusion in order to avoid people needing to have protoc installed - apache/datafusion#3950

@tustvold
Copy link
Contributor

tustvold commented Dec 6, 2022

The consensus appears to be that we would prefer to not require downstreams to have protoc installed, and so vendoring the source code is where consensus has landed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants