reproducible builds proposal: make gem define SOURCE_DATE_EPOCH itself #2290

Closed
2 of 5 tasks
anthraxx opened this issue May 14, 2018 · 19 comments · Fixed by #2882

@anthraxx
Contributor

I would like to suggest making gem itself a potential SOURCE_DATE_EPOCH declarer, instead of "only" producing reproducible artifacts when the outside world defines the SOURCE_DATE_EPOCH environment variable.

While the above works perfectly for all distros, since they and their build tools and pipelines define SOURCE_DATE_EPOCH themselves, it would be awesome if the gem tool/script could define SOURCE_DATE_EPOCH itself.
This proposal would allow every gem acquired from the rubygems repository and built purely with gem, rather than with a distro or other packaging tool that defines SOURCE_DATE_EPOCH, to be independently reproduced.

This would require checking SOURCE_DATE_EPOCH in the gem command line tool and, if it is not yet defined, defining it as the current UTC timestamp.

For example, this is how it is done in Arch Linux's makepkg:
https://git.archlinux.org/pacman.git/tree/scripts/makepkg.sh.in#n93

This issue is related to:

  • Network problems
  • Installing a library
  • Publishing a library
  • The command line gem
  • Other

Related to pull requests:
#2289 #2278

Spec:
https://reproducible-builds.org/specs/source-date-epoch/

Buy-in:
https://reproducible-builds.org/docs/buy-in/

I will abide by the code of conduct.

@anthraxx
Contributor Author

I just want to know if this is something that would be wanted/accepted by rubygems, then I can propose a pull request for this as well.

Awaiting feedback
CC @hsbt @segiddins @duckinator @lamby

@lamby

lamby commented May 14, 2018

Just to underline that it should only set SOURCE_DATE_EPOCH if it's not already set (as you can see here: https://git.archlinux.org/pacman.git/tree/scripts/makepkg.sh.in#n91)

@anthraxx
Contributor Author

@lamby thanks for making sure it happens that way, but that's literally what I wrote in the 3rd block 🐱

@lamby

lamby commented May 14, 2018

@anthraxx I saw exactly that, hence my "underline" :)

@bronzdoc
Member

bronzdoc commented Jul 18, 2018

This would require checking SOURCE_DATE_EPOCH in the gem command line tool and, if it is not yet defined, defining it as the current UTC timestamp.

Rubygems already does that, see https://github.com/rubygems/rubygems/blob/master/lib/rubygems/package.rb#L163

@anthraxx
Contributor Author

anthraxx commented Jul 18, 2018 via email

@anthraxx
Contributor Author

anthraxx commented Jul 18, 2018 via email

@hsbt hsbt reopened this Aug 11, 2018
@duckinator
Member

Sorry for the late response, @anthraxx. I only just now saw this issue.

My understanding of the issue is that, every place RubyGems checks for SOURCE_DATE_EPOCH, it does something along the lines of:

time = ENV["SOURCE_DATE_EPOCH"] ? Time.at(ENV["SOURCE_DATE_EPOCH"].to_i) : Time.now

... which means if you don't explicitly define it, then the times won't be consistent throughout the entire process.

This means that you can only get reproducible builds via RubyGems by setting SOURCE_DATE_EPOCH outside of RubyGems.

Whereas if we check it in that way once, and set the environment variable ourselves if it is not already set, then we're defaulting to reproducible builds.

Is that correct, @anthraxx?

@anthraxx
Contributor Author

Yes, the first set of commits I did just ensured that everything works fine if we define SOURCE_DATE_EPOCH outside (which we do in our distro). That was the priority for me, so that I can have reproducible packages.

This is about making every gem package produced through gem and published to rubygems potentially reproducible.

To achieve this, as you summarized correctly, gem needs to define the SOURCE_DATE_EPOCH env var once at an early stage (only if it is not yet defined from the outside), so that one uniform value is used everywhere instead of separate Time.now calls. Doing this through an env var is important so that any libraries or subprocesses invoked during the build that respect SOURCE_DATE_EPOCH see the same value.
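
A minimal Ruby sketch of the set-once behaviour described above. `ensure_source_date_epoch` is a hypothetical helper name and not RubyGems' actual implementation; the environment is passed as a parameter only so the helper can be tried with a plain Hash, and treating an empty value as unset is an assumption.

```ruby
# Hypothetical helper (not RubyGems' real code): define SOURCE_DATE_EPOCH once,
# early, and only when the outside world has not already defined it.
def ensure_source_date_epoch(env = ENV, now: Time.now)
  current = env["SOURCE_DATE_EPOCH"]
  # Assumption: an empty or whitespace-only value counts as "not defined".
  if current.nil? || current.strip.empty?
    # Stored as a string holding seconds since the Unix epoch.
    env["SOURCE_DATE_EPOCH"] = now.to_i.to_s
  end
  # Every later caller derives its timestamp from the same stored value.
  Time.at(env["SOURCE_DATE_EPOCH"].to_i).utc
end

build_time = ensure_source_date_epoch
```

Because the value is written to `ENV` rather than kept in a local variable, child processes started afterwards inherit it, which is the point made about subprocesses above.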

@duckinator
Member

@anthraxx 👍 okay. I'm definitely in favor of adding that.

@simi
Member

simi commented Oct 25, 2018

@anthraxx I did some tests and it will probably be enough to pass build_time in Gem::Package to Gem::Package::TarWriter and use it in the places where it is needed. Feel free to ping me if you need any help with this.

@anthraxx
Contributor Author

anthraxx commented Oct 25, 2018 via email

@MSP-Greg
Contributor

MSP-Greg commented Dec 8, 2018

it would be awesome if the gem tool/script could define SOURCE_DATE_EPOCH itself.

Maybe if git log -1 --format=%at is correct/available, use its value?
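
A rough Ruby sketch of that fallback, for illustration only. The helper names (`parse_epoch`, `git_commit_epoch`, `resolve_epoch`) and the precedence order (external value, then git, then the clock) are assumptions and not something RubyGems is confirmed to do.

```ruby
require "open3"

# Parse the output of `git log -1 --format=%at`; nil unless it is a plain integer.
def parse_epoch(text)
  stripped = text.to_s.strip
  stripped.match?(/\A\d+\z/) ? stripped.to_i : nil
end

# Author timestamp of the latest commit in +dir+, or nil when git is missing,
# the directory is not a repository, or the command fails.
def git_commit_epoch(dir = Dir.pwd)
  stdout, _stderr, status = Open3.capture3("git", "log", "-1", "--format=%at", chdir: dir)
  status.success? ? parse_epoch(stdout) : nil
rescue SystemCallError
  nil
end

# Hypothetical precedence: an externally defined value wins, then git, then the clock.
def resolve_epoch(env = ENV, git: -> { git_commit_epoch }, now: Time.now)
  parse_epoch(env["SOURCE_DATE_EPOCH"]) || git.call || now.to_i
end
```

The `git:` parameter exists only so the lookup can be swapped out when trying the function outside a repository.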

@Foxboron

Foxboron commented Apr 4, 2019

https://snyk.io/blog/malicious-remote-code-execution-backdoor-discovered-in-the-popular-bootstrap-sass-ruby-gem/

Attacks like this could be discovered more easily if this issue was resolved.

@duckinator
Member

If I'm understanding everything correctly (about both reproducible builds and that CVE) then, if we can get both

  • fully-working reproducible builds using SOURCE_DATE_EPOCH (this issue), and
  • RubyGems.org listing the SOURCE_DATE_EPOCH value used (this would be a separate thing)

then you could re-build the gem locally using the same SOURCE_DATE_EPOCH value and the expected source (e.g. from git), and compare the checksum of that to the one at the bottom of the RubyGems.org page (e.g. https://rubygems.org/gems/bootstrap-sass/versions/3.2.0.3 ).

And, if that checksum were to be different, it'd signify the code has been modified between the expected source and released version.

Is that correct?


Assuming that is correct: if the SOURCE_DATE_EPOCH is provided in a machine-accessible way (e.g. via the rubygems.org API or in the .gem file or something), I think we could possibly even partially-automate this process.

E.g., have a tool that takes a gem name ("bootstrap-sass"), gem version ("3.2.0.3"), makes the required network requests to get other information, and see if it all matches up as expected.
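
The final comparison step of such a tool could look roughly like this in Ruby. It is only a sketch: `gem_checksum_matches?` is a hypothetical name, the expected value is assumed to be a SHA-256 hex digest copied from the RubyGems.org page, and fetching it over the network or rebuilding the gem is left out.

```ruby
require "digest"

# Hypothetical check: compare the SHA-256 of a locally rebuilt .gem file with
# the checksum published for the released version.
def gem_checksum_matches?(gem_path, expected_sha256)
  actual = Digest::SHA256.file(gem_path).hexdigest
  # Hex digests are compared case-insensitively, ignoring stray whitespace.
  actual.casecmp?(expected_sha256.strip)
end
```

A mismatch would then indicate that the released file differs from what the expected source produces with the same SOURCE_DATE_EPOCH.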

@Foxboron

Foxboron commented Apr 5, 2019

Is that correct?

Yes. Some testing might be needed to figure out if more env variables should be included. It might also be an idea to make this information available in a buildinfo file, but an API for this is sufficient.

have a tool that takes a gem name ("bootstrap-sass"), gem version ("3.2.0.3"), makes the required network requests to get other information, and see if it all matches up as expected.

I know Rust has been working towards something like this, and on the distribution side we have also been working on tools to recreate packages.

@duckinator
Member

I'm looking into this again tonight. I'm hoping to have at least a rough version of a PR for this done in the next few hours. 🙂

@Foxboron reading your link about buildinfo files, it looks like Arch Linux includes the .BUILDINFO file in the package (so it is signed as part of the package, instead of independently). Do you know of any downsides to this? Also, is there somewhere I can read more about both Rust's work on recreating packages and the tools for recreating packages that you mentioned?

duckinator added a commit to duckinator/rubygems that referenced this issue Aug 17, 2019
Fixes rubygems#2290.

1. `Gem::Specification.date` returns SOURCE_DATE_EPOCH when defined,
2. this commit makes RubyGems set it _persistently_ when not provided.

This combination means that you can build a gem, check the build time,
and use that value to generate a new build -- and then verify they're
the same.
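
Why a fixed timestamp makes two builds comparable can be illustrated with a small Ruby sketch. It is not the code from the commit: a single gzip stream stands in for the gzip-compressed members inside a .gem archive, and `gzip_member` is a hypothetical helper.

```ruby
require "zlib"
require "stringio"
require "digest"

# Compress +payload+ with the gzip header mtime pinned to +epoch+
# (Integer seconds since the Unix epoch) instead of the current time.
def gzip_member(payload, epoch)
  io = StringIO.new(String.new)
  gz = Zlib::GzipWriter.new(io)
  gz.mtime = epoch # must be set before the first write
  gz.write(payload)
  gz.finish # flushes the stream without closing the StringIO
  io.string
end

first  = gzip_member("example contents", 1_566_000_000)
second = gzip_member("example contents", 1_566_000_000)

# With the same input and the same epoch the bytes, and so the digests, match.
puts Digest::SHA256.hexdigest(first) == Digest::SHA256.hexdigest(second)
```

With different epochs the header bytes differ, so the digests no longer match even though the payload is identical.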
@duckinator
Member

I think I've got it working, including tests! #2882

duckinator added a commit to duckinator/rubygems that referenced this issue Aug 30, 2019
Fixes rubygems#2290.

1. `Gem::Specification.date` returns SOURCE_DATE_EPOCH when defined,
2. this commit makes RubyGems set it _persistently_ when not provided.

This combination means that you can build a gem, check the build time,
and use that value to generate a new build -- and then verify they're
the same.
@eli-schwartz

@Foxboron reading your link about buildinfo files, it looks like Arch Linux includes the .BUILDINFO file in the package (so it is signed as part of the package, instead of independently). Do you know of any downsides to this? Also, is there somewhere I can read more about both Rust's work on recreating packages and the tools for recreating packages that you mentioned?

Embedding the buildinfo into our package means that in order to reproduce the package you must treat the full list of build VM packages as "build inputs". You cannot reproduce a package completely without making sure the versions of all dependencies and other system software are identical. On the other hand, versions of packages are things that might be expected to influence output anyway (different versions of gcc will surely emit different ELF binaries!)

The benefit of explicitly including it is that we can guarantee the buildinfo is always available -- it is attached by the same tool that is required to create a valid package, and you do not need to keep track of two different files (one being the release artifact/package, the other being the buildinfo file).

There are pros and cons to both sides. Debian has chosen to store the buildinfo for .deb packages separately, with the rationale that this makes it easier to, say, change the version of sed or gawk and still get the same .deb file. OTOH, just yesterday it turned out that the rubygem "gpgme" no longer builds from source when gawk 5 is installed, due to issues with its bundled libgpg-error (see https://bugs.archlinux.org/task/63654 for details), so even the simple, non-obvious tools can have surprising ramifications...

In the context of distribution packages, Arch Linux is okay with requiring that all known environment modifiers that have been declared significant be part of the input needed to reproduce things. :)

ghost pushed a commit that referenced this issue Sep 14, 2019
2882: Set SOURCE_DATE_EPOCH env var if not provided. r=djberg96 a=duckinator

# Description:

Set SOURCE_DATE_EPOCH env var if not provided.

Fixes #2290.

1. `Gem::Specification.date` returns SOURCE_DATE_EPOCH when defined,
2. this commit makes RubyGems set it _persistently_ when not provided.

This combination means that you can build a gem, check the build time,
and use that value to generate a new build -- and then verify they're
the same.

# Tasks:

- [x] Describe the problem / feature
- [x] Write tests
- [x] Write code to solve the problem
- [ ] Get code review from coworkers / friends

I will abide by the [code of conduct](https://github.com/rubygems/rubygems/blob/master/CODE_OF_CONDUCT.md).


Co-authored-by: Ellen Marie Dash <the@smallest.dog>
@ghost ghost closed this as completed in d830d53 Sep 14, 2019
This issue was closed.