[bug] hard links are not preserved during creation of conan_package.tgz #6810

Closed
marsmike opened this issue Apr 6, 2020 · 9 comments


@marsmike

marsmike commented Apr 6, 2020

Dear Conan Community, thank you all for your valuable time and great work: We simply love Conan!

There is an issue when using Conan to package tools and compilers from tarballs (e.g. cross-compilation toolchains) that contain many hard links. To reduce the size on disk, many existing toolchains are packaged as *.tar.bz2 files in which many identical files are hard-linked together.

Finding: The hard links are ignored when the final conan_package.tgz is created. Conan stores a full copy of each file instead.

This results in larger Conan packages and a slower package creation and upload process. Depending on the toolchain, the Conan package size is one to several GB larger.

Environment Details

  • Operating System + version: Ubuntu 18.04 + Mac OS 10.15.4
  • Conan version: 1.24
  • Python version: 3.x ; 3.7.7

Steps to reproduce

Create a tar.gz file with hard links; file_10MB is the original file, and file_10MB_link1..file_10MB_link4 are hard links to it:

dd if=/dev/urandom bs=1024 count=10000 of=file_10MB conv=notrunc
ln file_10MB file_10MB_link1
...
ln file_10MB file_10MB_link4
tar zcvf hard_links_example.tar.gz file*

File structure; all file* entries point to the same inode:

ls -liha
9971364 -rw-r--r--   5 mike  staff   9,8M  6 Apr 20:15 file_10MB
9971364 -rw-r--r--   5 mike  staff   9,8M  6 Apr 20:15 file_10MB_link1
9971364 -rw-r--r--   5 mike  staff   9,8M  6 Apr 20:15 file_10MB_link2
9971364 -rw-r--r--   5 mike  staff   9,8M  6 Apr 20:15 file_10MB_link3
9971364 -rw-r--r--   5 mike  staff   9,8M  6 Apr 20:15 file_10MB_link4
9971361 -rw-r--r--   1 mike  staff   9,8M  6 Apr 20:48 hard_links_example.tar.gz

Tar respects the hard links; the resulting tarball is about 10 MB.
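
For reference, Python's standard tarfile module behaves like the tar command here: when an inode is added a second time under a different name, the entry is stored as a hard-link member instead of a second copy. The sketch below only illustrates that, using a small stand-in file on a POSIX-style filesystem. The function name archive_with_hard_links is made up for this example, and it makes no claim about how Conan builds conan_package.tgz internally.

```python
import os
import tarfile
import tempfile


def archive_with_hard_links(workdir):
    """Create one file plus a hard link, archive both, and return the tar members."""
    src_dir = os.path.join(workdir, "src")
    os.makedirs(src_dir)
    original = os.path.join(src_dir, "file_10MB")
    with open(original, "wb") as handle:
        handle.write(os.urandom(1024 * 100))  # small stand-in for the 10 MB file
    os.link(original, os.path.join(src_dir, "file_10MB_link1"))

    archive = os.path.join(workdir, "hard_links_example.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        # Sorted names make the "first seen" file deterministic.
        for name in sorted(os.listdir(src_dir)):
            tar.add(os.path.join(src_dir, name), arcname=name)

    with tarfile.open(archive, "r:gz") as tar:
        return [(m.name, m.islnk(), m.linkname) for m in tar.getmembers()]


with tempfile.TemporaryDirectory() as tmp:
    for name, is_hard_link, target in archive_with_hard_links(tmp):
        print(name, "hard link ->" if is_hard_link else "regular file", target)
```

The second member carries no file data of its own, which is why the archive stays close to the size of a single copy.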

Now create a Conan package with this conanfile.py:

import os
from conans import ConanFile, tools

class ToolchainConan(ConanFile):
    name = "toolchain-example"
    version = "1.0"
    author = "user"
    # An example tar.gz with a 10 MB file and 4 hard links to it
    url="https://lithium.li/hard_links_example.tar.gz"

    @property
    def default_user(self):
        return "user"

    @property
    def default_channel(self):
        return "channel"

    def build(self):
        tools.download(self.url, "hard_links_example.tar.gz", verify=True)
        tools.unzip("hard_links_example.tar.gz", self.package_folder)
        os.unlink("hard_links_example.tar.gz")

    def package(self):
        self.output.info("tar.gz extracted to package folder to preserve hard links")

Logs

conan create . && conan upload '*' -r *** --all
Exporting package recipe
toolchain-example/1.0: A new conanfile.py version was exported
toolchain-example/1.0: Folder: /Users/mike/.conan/data/toolchain-example/1.0/user/channel/export
toolchain-example/1.0: Exported revision: 3a1b53f51b2a5ab995328574f6ef57ea
Configuration:
[settings]
...
toolchain-example/1.0@user/channel: Forced build from source
Installing package: toolchain-example/1.0@user/channel
Requirements
    toolchain-example/1.0@user/channel from local cache - Cache
Packages
    toolchain-example/1.0@user/channel:5ab84d6acfe1f23c4fae0ab88f26e3a396351ac9 - Build

Installing (downloading, building) binaries...
toolchain-example/1.0@user/channel: Configuring sources in /Users/mike/.conan/data/toolchain-example/1.0/user/channel/source
toolchain-example/1.0@user/channel: Copying sources to build folder
toolchain-example/1.0@user/channel: Building your package in /Users/mike/.conan/data/toolchain-example/1.0/user/channel/build/5ab84d6acfe1f23c4fae0ab88f26e3a396351ac9
toolchain-example/1.0@user/channel: Generator txt created conanbuildinfo.txt
toolchain-example/1.0@user/channel: Calling build()
Downloading hard_links_example.tar.gz completed [10003.41k]

toolchain-example/1.0@user/channel: Package '5ab84d6acfe1f23c4fae0ab88f26e3a396351ac9' built
toolchain-example/1.0@user/channel: Build folder /Users/mike/.conan/data/toolchain-example/1.0/user/channel/build/5ab84d6acfe1f23c4fae0ab88f26e3a396351ac9
toolchain-example/1.0@user/channel: Generated conaninfo.txt
toolchain-example/1.0@user/channel: Generated conanbuildinfo.txt
toolchain-example/1.0@user/channel: Generating the package
toolchain-example/1.0@user/channel: Package folder /Users/mike/.conan/data/toolchain-example/1.0/user/channel/package/5ab84d6acfe1f23c4fae0ab88f26e3a396351ac9
toolchain-example/1.0@user/channel: Calling package()
toolchain-example/1.0@user/channel: tar.gz extracted to package folder to preserve hard links
toolchain-example/1.0@user/channel package(): Packaged 5 files
toolchain-example/1.0@user/channel: Package '5ab84d6acfe1f23c4fae0ab88f26e3a396351ac9' created
toolchain-example/1.0@user/channel: Created package revision 38b0db1137f3f356e544095e33fb9fd0
Are you sure you want to upload 'toolchain-example/1.0@user/channel' to 'kube'? (yes/no): y
Uploading to remote 'kube':
Uploading toolchain-example/1.0@user/channel to remote 'kube'
Uploaded conanfile.py -> toolchain-example/1.0@user/channel [0.72k]
Uploaded conanmanifest.txt -> toolchain-example/1.0@user/channel [0.06k]
Uploaded conan recipe 'toolchain-example/1.0@user/channel' to 'kube': https://***/artifactory/api/conan/conan
Uploading package 1/1: 5ab84d6acfe1f23c4fae0ab88f26e3a396351ac9 to '***'
Compressing conan_package.tgz completed [5 files]
Uploaded conan_package.tgz -> toolchain-example/1.0@user/channel:5ab8 [50015.90k]
Uploaded conaninfo.txt -> toolchain-example/1.0@user/channel:5ab8 [0.15k]
Uploaded conanmanifest.txt -> toolchain-example/1.0@user/channel:5ab8 [0.30k]

After creation everything is fine, because we extract directly into the package folder. During upload, however, conan_package.tgz is created at a much larger size (50 MB instead of 10 MB).
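
The factor of five can be estimated before uploading by comparing the bytes needed if every file name is stored as its own copy against the bytes needed if each inode is stored once. This is a rough sketch, not part of Conan; apparent_and_unique_size is a hypothetical helper name, and the 1000-byte file stands in for the 10 MB one.

```python
import os
import stat
import tempfile


def apparent_and_unique_size(root):
    """Return (bytes if every name is copied, bytes if each inode is stored once)."""
    apparent = 0
    unique = 0
    seen = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            info = os.lstat(os.path.join(dirpath, filename))
            if not stat.S_ISREG(info.st_mode):
                continue  # skip symlinks and special files
            apparent += info.st_size
            key = (info.st_dev, info.st_ino)
            if key not in seen:
                seen.add(key)
                unique += info.st_size
    return apparent, unique


with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "file_10MB")
    with open(original, "wb") as handle:
        handle.write(b"\0" * 1000)
    for index in range(1, 5):
        os.link(original, os.path.join(tmp, "file_10MB_link%d" % index))
    print(apparent_and_unique_size(tmp))
```

Running the same function on a package folder gives a quick indication of how much a hard-link-aware archive could save.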

What we tried so far:

  • Tar in tar: it works, but we want small Conan tool packages that download very quickly in our CI.
  • Symlinks are not affected by this issue and work fully, but would the cross compiler still work after converting all the hard links into symlinks?
  • Use only the recipe and do not upload the package. But how do we then avoid re-packaging and uploading for everyone?

We think Conan should support hard links in the (mid term) future.
Thank you for your help and ideas!

B.r.
Mike

@memsharded
Member

Hi @marsmike

Dear Conan Community, thank you all for your valuable time and great work: We simply love Conan!

Thanks very much for your kind words! We are always happy to learn from users that Conan is being useful there!

Regarding your issue, it is possible. There is some provision in Conan for packaging symlinks, but the truth is that hard links have not been reported to us or considered in the past. We need to check this. It might be difficult to fix, but it would be great to identify the issue and its cause. It is likely that in Conan 2.0 we will completely rethink how Conan packages symlinks and hard links.

Thanks very much for providing the above detailed explanation and code, including the https://lithium.li/hard_links_example.tar.gz link.

Let's at least try to investigate this (labeled as "look into"). @uilianries Could you please try to reproduce the above issue with a unit test in our test suite? Thanks!

@memsharded memsharded added this to the 1.25 milestone Apr 6, 2020
uilianries added a commit to uilianries/conan that referenced this issue Apr 9, 2020
Signed-off-by: Uilian Ries <uilianries@gmail.com>
@uilianries
Member

@memsharded The PR #6827 contains a very similar scenario where I can reproduce such behavior.

@fulara
Contributor

fulara commented Apr 28, 2020

When building git, I think the build relies on hard links being present, and Conan turns them into plain copies.
This blows up the space taken by about 10x.
As a workaround I will try to detect the hard links and change them to soft links. That should possibly work.
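
A minimal sketch of that workaround, assuming a POSIX-style filesystem: group regular files by (device, inode), keep the first path of each group, and replace the remaining names with relative symlinks. The helper name replace_hard_links_with_symlinks and the bin/libexec layout are hypothetical, and whether a given toolchain still works after the conversion is exactly the open question raised earlier in this thread.

```python
import os
import stat
import tempfile
from collections import defaultdict


def replace_hard_links_with_symlinks(root):
    """Keep one path per inode under root; turn the other names into relative symlinks.

    Returns a list of (link_path, kept_path) pairs that were converted.
    """
    by_inode = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            info = os.lstat(path)
            # Only regular files with more than one name are hard-link candidates.
            if stat.S_ISREG(info.st_mode) and info.st_nlink > 1:
                by_inode[(info.st_dev, info.st_ino)].append(path)

    converted = []
    for paths in by_inode.values():
        paths.sort()
        kept, duplicates = paths[0], paths[1:]
        for duplicate in duplicates:
            target = os.path.relpath(kept, os.path.dirname(duplicate))
            os.unlink(duplicate)
            os.symlink(target, duplicate)
            converted.append((duplicate, kept))
    return converted


with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "bin"))
    os.makedirs(os.path.join(tmp, "libexec"))
    original = os.path.join(tmp, "bin", "gcc")
    with open(original, "wb") as handle:
        handle.write(b"payload")
    os.link(original, os.path.join(tmp, "libexec", "gcc"))
    for link_path, _kept in replace_hard_links_with_symlinks(tmp):
        print(os.path.relpath(link_path, tmp), "->", os.readlink(link_path))
```

Relative targets keep the links valid when the package folder is moved or extracted elsewhere. Names hard-linked to files outside root are left untouched, because only one path of that inode is seen during the walk.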

@memsharded
Member

@uilianries reproduced this in #6827; it is indeed an issue.
It is not easy to fix. We need to check what to do in the different places that might break: the manifests, the creation of the .tgz from them, and how to handle failed decompression (for example, hard links in tgz files will crash when untarred on Windows). It seems complex, so sorry, we cannot make it for 1.25. Let's see if we can for 1.26.

@memsharded memsharded modified the milestones: 1.25, 1.26 Apr 30, 2020
@jgsogo jgsogo added this to Needs triage in Symlinks via automation May 25, 2020
@jgsogo
Contributor

jgsogo commented May 25, 2020

It looks like you can have hardlinks in Windows with mklink /h (https://docs.microsoft.com/es-es/windows-server/administration/windows-commands/mklink) (cannot test it as I'm not running Windows). No idea if they behave the same on both OSs or what happens if you create the tgz in one of them and untar in the other...

I agree we need to work on this. Conan needs to support links, but IMHO this is too risky for the last week before freezing 1.26. If we go for it, we will need more time to deliver this feature with all the required testing.

Yes, and depending on what we store in the manifest it can compute different recipe revisions (#6153)...

I would schedule this investigation for the next release, 1.27. At the very least we need to know how things work on the different systems, and maybe we can implement something without breaking anything. There is a lot of research to do before trying to implement anything.

@jgsogo
Contributor

jgsogo commented May 26, 2020

We need to move this issue to v1.27 at least. I've opened this other issue (#7093), we first need to do some research related to links in general and how we want them to behave in Conan, then we can think about all these more concrete issues.

@jgsogo jgsogo modified the milestones: 1.26, 1.27 May 26, 2020
@memsharded
Member

#7178 is introducing a tool to "fix" or prepare packages with symlinks for compression.
This will most likely be the way to go in Conan 2.0, but it seems challenging to fix it at the moment without breaking.

@memsharded memsharded modified the milestones: 1.27, 1.28 Jun 30, 2020
@memsharded memsharded modified the milestones: 1.28, 1.29 Jul 27, 2020
@memsharded memsharded modified the milestones: 1.29, 1.30 Aug 31, 2020
@memsharded memsharded modified the milestones: 1.30, 1.31 Sep 28, 2020
@memsharded memsharded modified the milestones: 1.31, 1.32 Oct 28, 2020
@SSE4
Contributor

SSE4 commented Nov 23, 2020

#7760 shows how to store and preserve hard-links during the archive creation

@memsharded
Member

Solved in #13137 for 2.0 (beta.10 today)

Symlinks automation moved this from Needs triage to Closed Feb 16, 2023