Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hardlinks are lost by git build cacher #827

Open
richardc opened this issue Mar 13, 2018 · 1 comment
Open

Hardlinks are lost by git build cacher #827

richardc opened this issue Mar 13, 2018 · 1 comment

Comments

@richardc
Copy link
Contributor

Description

The build cache does not preserve hardlinks in the install directory. This means things like git end up with the version in embedded/bin/git and a copy (installed by the software definition as a hardlink) as embedded/libexec/git-core/git

Omnibus Version

5.6.10 Reproduced with master@9a3a97bb0c6b

Platform Version

CentOS7

omnibus 9a3a97b

Replication Case

https://gitlab.com/richardc/omnibus-helloworld/tree/git-hardlink-caches

The hardlinks project deliberately creates a hardlink from "#{install_dir}/bin/helloworld"

On first run, this correctly creates a hardlinked file in the output directory, and the resulting packages also feature hardlinks:

# ls -al /opt/hardlinks/bin
total 16
drwxr-xr-x 2 root root 4096 Mar 13 13:06 .
drwxr-xr-x 4 root root 4096 Mar 13 13:06 ..
-rwxr-xr-x 2 root root   29 Mar 13 13:06 helloworld
-rwxr-xr-x 2 root root   29 Mar 13 13:06 helloworld-hardlink

(Link count of 2 in the second column)

On a subsequent build, and a build cache hit, they are present as two identical files, not the expected hardlink.

# bundle exec omnibus build hardlinks
Building hardlinks 0.0.1+20180313131129...
[snip]
    [Software: hardlinks] I | 2018-03-13T13:11:29+00:00 | Restored from cache

# ls -al /opt/hardlinks/bin
total 16
drwxr-xr-x 2 root root 4096 Mar 13 13:11 .
drwxr-xr-x 4 root root 4096 Mar 13 13:11 ..
-rwxr-xr-x 1 root root   29 Mar 13 13:11 helloworld
-rwxr-xr-x 1 root root   29 Mar 13 13:11 helloworld-hardlink
@richardc
Copy link
Contributor Author

I spiked up something to track hardlinks in the commit message of the build cache.

master...richardc:rc-spike-hardlink-recording

It does fix it for my reproduction, but it's a little messy and is possibly a wrong-headed approach

avar added a commit to avar/git that referenced this issue Mar 13, 2018
Add a INSTALL_SYMLINKS option which if enabled, changes the default
hardlink installation method to one where the relevant binaries in
libexec/git-core are symlinked back to ../../bin, instead of being
hardlinked.

This new option also overrides the behavior of the existing
NO_*_HARDLINKS variables which in some cases would produce symlinks
within to libexec/, e.g. "git-add" symlinked to "git" which would be
copy of the "git" found in bin/, now "git-add" in libexec/ is always
going to be symlinked to the "git" found in the bin/ directory.

This option is being added because:

 1) I think it makes what we're doing a lot more obvious. E.g. I'd
    never noticed that the libexec binaries were really just hardlinks
    since e.g. ls(1) won't show that in any obvious way. You need to
    start stat(1)-ing things and look at the inodes to see what's
    going on.

 2) Some tools have very crappy support for hardlinks, e.g. the Git
    shipped with GitLab is much bigger than it should be because
    they're using a chef module that doesn't know about hardlinks, see
    chef/omnibus#827

    I've also ran into other related issues that I think are explained
    by this, e.g. compiling git with debugging and rpm refusing to
    install a ~200MB git package with 2GB left on the FS, I think that
    was because it doesn't consider hardlinks, just the sum of the
    byte size of everything in the package.

As for the implementation, the "../../bin" noted above will vary given
some given some values of "../.." and "bin" depending on the depth of
the gitexecdir relative to the destdir, and the "bindir" target,
e.g. setting "bindir=/tmp/git/binaries gitexecdir=foo/bar/baz" will do
the right thing and produce this result:

    $ file /tmp/git/foo/bar/baz/git-add
    /tmp/git/foo/bar/baz/git-add: symbolic link to ../../../binaries/git

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
avar added a commit to avar/git that referenced this issue Mar 14, 2018
Add a INSTALL_SYMLINKS option which if enabled, changes the default
hardlink installation method to one where the relevant binaries in
libexec/git-core are symlinked back to ../../bin, instead of being
hardlinked.

This new option also overrides the behavior of the existing
NO_*_HARDLINKS variables which in some cases would produce symlinks
within to libexec/, e.g. "git-add" symlinked to "git" which would be
copy of the "git" found in bin/, now "git-add" in libexec/ is always
going to be symlinked to the "git" found in the bin/ directory.

This option is being added because:

 1) I think it makes what we're doing a lot more obvious. E.g. I'd
    never noticed that the libexec binaries were really just hardlinks
    since e.g. ls(1) won't show that in any obvious way. You need to
    start stat(1)-ing things and look at the inodes to see what's
    going on.

 2) Some tools have very crappy support for hardlinks, e.g. the Git
    shipped with GitLab is much bigger than it should be because
    they're using a chef module that doesn't know about hardlinks, see
    chef/omnibus#827

    I've also run into other related issues that I think are explained
    by this, e.g. compiling git with debugging and rpm refusing to
    install a ~200MB git package with 2GB left on the FS, I think that
    was because it doesn't consider hardlinks, just the sum of the
    byte size of everything in the package.

As for the implementation, the "../../bin" noted above will vary given
some values of "../.." and "bin" depending on the depth of the
gitexecdir relative to the destdir, and the "bindir" target,
e.g. setting "bindir=/tmp/git/binaries gitexecdir=foo/bar/baz" will do
the right thing and produce this result:

    $ file /tmp/git/foo/bar/baz/git-add
    /tmp/git/foo/bar/baz/git-add: symbolic link to ../../../binaries/git

FIXME: Needs to actually be rebased on the NO_INSTALL_CP_FALLBACK
change, now it just clobbers it.

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
gitster pushed a commit to git/git that referenced this issue Mar 15, 2018
Add a INSTALL_SYMLINKS option which if enabled, changes the default
hardlink installation method to one where the relevant binaries in
libexec/git-core are symlinked back to ../../bin, instead of being
hardlinked.

This new option also overrides the behavior of the existing
NO_*_HARDLINKS variables which in some cases would produce symlinks
within to libexec/, e.g. "git-add" symlinked to "git" which would be
copy of the "git" found in bin/, now "git-add" in libexec/ is always
going to be symlinked to the "git" found in the bin/ directory.

This option is being added because:

 1) I think it makes what we're doing a lot more obvious. E.g. I'd
    never noticed that the libexec binaries were really just hardlinks
    since e.g. ls(1) won't show that in any obvious way. You need to
    start stat(1)-ing things and look at the inodes to see what's
    going on.

 2) Some tools have very crappy support for hardlinks, e.g. the Git
    shipped with GitLab is much bigger than it should be because
    they're using a chef module that doesn't know about hardlinks, see
    chef/omnibus#827

    I've also ran into other related issues that I think are explained
    by this, e.g. compiling git with debugging and rpm refusing to
    install a ~200MB git package with 2GB left on the FS, I think that
    was because it doesn't consider hardlinks, just the sum of the
    byte size of everything in the package.

As for the implementation, the "../../bin" noted above will vary given
some given some values of "../.." and "bin" depending on the depth of
the gitexecdir relative to the destdir, and the "bindir" target,
e.g. setting "bindir=/tmp/git/binaries gitexecdir=foo/bar/baz" will do
the right thing and produce this result:

    $ file /tmp/git/foo/bar/baz/git-add
    /tmp/git/foo/bar/baz/git-add: symbolic link to ../../../binaries/git

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
dscho pushed a commit to git-for-windows/git that referenced this issue May 29, 2018
Add a INSTALL_SYMLINKS option which if enabled, changes the default
hardlink installation method to one where the relevant binaries in
libexec/git-core are symlinked back to ../../bin, instead of being
hardlinked.

This new option also overrides the behavior of the existing
NO_*_HARDLINKS variables which in some cases would produce symlinks
within to libexec/, e.g. "git-add" symlinked to "git" which would be
copy of the "git" found in bin/, now "git-add" in libexec/ is always
going to be symlinked to the "git" found in the bin/ directory.

This option is being added because:

 1) I think it makes what we're doing a lot more obvious. E.g. I'd
    never noticed that the libexec binaries were really just hardlinks
    since e.g. ls(1) won't show that in any obvious way. You need to
    start stat(1)-ing things and look at the inodes to see what's
    going on.

 2) Some tools have very crappy support for hardlinks, e.g. the Git
    shipped with GitLab is much bigger than it should be because
    they're using a chef module that doesn't know about hardlinks, see
    chef/omnibus#827

    I've also ran into other related issues that I think are explained
    by this, e.g. compiling git with debugging and rpm refusing to
    install a ~200MB git package with 2GB left on the FS, I think that
    was because it doesn't consider hardlinks, just the sum of the
    byte size of everything in the package.

As for the implementation, the "../../bin" noted above will vary given
some given some values of "../.." and "bin" depending on the depth of
the gitexecdir relative to the destdir, and the "bindir" target,
e.g. setting "bindir=/tmp/git/binaries gitexecdir=foo/bar/baz" will do
the right thing and produce this result:

    $ file /tmp/git/foo/bar/baz/git-add
    /tmp/git/foo/bar/baz/git-add: symbolic link to ../../../binaries/git

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
dscho pushed a commit to git-for-windows/git that referenced this issue May 29, 2018
Add a INSTALL_SYMLINKS option which if enabled, changes the default
hardlink installation method to one where the relevant binaries in
libexec/git-core are symlinked back to ../../bin, instead of being
hardlinked.

This new option also overrides the behavior of the existing
NO_*_HARDLINKS variables which in some cases would produce symlinks
within to libexec/, e.g. "git-add" symlinked to "git" which would be
copy of the "git" found in bin/, now "git-add" in libexec/ is always
going to be symlinked to the "git" found in the bin/ directory.

This option is being added because:

 1) I think it makes what we're doing a lot more obvious. E.g. I'd
    never noticed that the libexec binaries were really just hardlinks
    since e.g. ls(1) won't show that in any obvious way. You need to
    start stat(1)-ing things and look at the inodes to see what's
    going on.

 2) Some tools have very crappy support for hardlinks, e.g. the Git
    shipped with GitLab is much bigger than it should be because
    they're using a chef module that doesn't know about hardlinks, see
    chef/omnibus#827

    I've also ran into other related issues that I think are explained
    by this, e.g. compiling git with debugging and rpm refusing to
    install a ~200MB git package with 2GB left on the FS, I think that
    was because it doesn't consider hardlinks, just the sum of the
    byte size of everything in the package.

As for the implementation, the "../../bin" noted above will vary given
some given some values of "../.." and "bin" depending on the depth of
the gitexecdir relative to the destdir, and the "bindir" target,
e.g. setting "bindir=/tmp/git/binaries gitexecdir=foo/bar/baz" will do
the right thing and produce this result:

    $ file /tmp/git/foo/bar/baz/git-add
    /tmp/git/foo/bar/baz/git-add: symbolic link to ../../../binaries/git

Signed-off-by: Ævar Arnfjörð Bjarmason <avarab@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant