Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feature] Automatic directory management in source unpack support #6479

Closed
1 task done
chris-se opened this issue Feb 6, 2020 · 10 comments · Fixed by #8208
Closed
1 task done

[feature] Automatic directory management in source unpack support #6479

chris-se opened this issue Feb 6, 2020 · 10 comments · Fixed by #8208

Comments

@chris-se
Copy link

chris-se commented Feb 6, 2020

Debian's dpkg-source has a nice feature when unpacking upstream tarballs to ensure that the target directory always has a specific name. It will do the following:

  • Create the target directory that should be created
  • Unpack within that directory
  • If the top-level contents of the archive was just a single directory, move the contents of that directory one level up and remove that directory

The following situations exist then:

  • Archive contains a directory tree in the following form:
    dir/
    dir/CMakeLists.txt
    dir/sourcefile.c
    Calling dpkg-source -x would create a directory of the expected name (in Debian's case package-version) and that directory would contain the CMakeLists.txt file and the sourcefile.c file - regardless of the name dir/ within the archive.

  • Archive contains a directory tree in the following form:
    CMakeLists.txt
    sourcefile.c
    Calling dpkg-source -x would create the same result as before, even though the archive doesn't contain a subdirectory.

I think something like that would be interesting for conan, because I see the following pattern very often in recipies:

  • unpack source archive
  • rename directory to _source_subfolder (or something otherwise deterministic)

Since the rename step requires the user to know the directory name within the source archive, and projects might change that name between releases, this can add quite a bit of complexity to the conan file.

Since this changes the behavior of unzip / get I would suggest adding a keyword argument autodir (default False) that introduces this new behavior.

Example usage could be:

tools.get(url, sha256=hash, destination='_source_subfolder', autodir=True)

This would ensure that the contents of the downloaded archive is available in the directory _source_subfolder according to the described algorithm, regardless of how the archive was organized internally.

I could implement this and provide a pull request if this is something you'd be interested in adding to conan.

@jgsogo
Copy link
Contributor

jgsogo commented Feb 6, 2020

Hi! I've seen this behavior on desktop tools too, and it is useful in general terms, but for an automated tool... I think it is not so annoying to have that extra directory. I would like to hear more feedback from the community that is contributing with recipes to Conan Center to know if it would be useful or they will still rename the directory to _source_subfolder (cc/ @conan-io/community-leaders ).

If it is really useful, as an opt-in, it won't break anything and we can consider including it.

Thanks for the suggestion!

@jgsogo jgsogo self-assigned this Feb 6, 2020
@chris-se
Copy link
Author

chris-se commented Feb 6, 2020

Just fyi, because I feel that I probably wasn't completely clear in my initial posting based on your response: the main issue I see is not the fact that there might be an extra directory (there'll be an extra directory anyway, even with this scheme), but that the name of the extra directory has to be constructed every time from the package version, adding a lot of boiler-plate code - especially if upstream doesn't follow the packagename-version scheme. (Most packages do, but some don't.)

Anyway, thanks for considering this, and as I said, I'd be willing to implement this if you approve.

@jgsogo
Copy link
Contributor

jgsogo commented Feb 7, 2020

I see, it is about providing a destination that is the final directory, so the result of.

tools.get(url='https://.../lib-2.0.tar.gz', destination='_source_subfolder',...)

would be:

.../_souce_subfolder/
├───CMakeLists.txt
├───sourcefile.c

instead of:

.../_souce_subfolder/lib-2.0/
├───CMakeLists.txt
├───sourcefile.c

From the UX perspective, I find it a useful feature, although the tarfile library, which is the one we are using to unzip the files, doesn't have that feature available (docs here):

TarFile.extractall(path=".", members=None, *, numeric_owner=False)

Extract all members from the archive to the current working directory or directory path.

As a Conan developer, I prefer to keep things simple and, whenever it is possible, to keep the same signature as the underlying functions we are calling instead of adding a layer of functionality on top of it. That's my only concern, but I'll tag this as a feature request, and share it with the team, so we can prioritize it. Nevertheless, if you want to contribute to Conan and you create the PR we will consider it.

@jgsogo
Copy link
Contributor

jgsogo commented Feb 7, 2020

👍 @chris-se, waiting for your PR, if you need help writing the feature or the tests, just ping me. Thanks!

@chris-se
Copy link
Author

chris-se commented Feb 7, 2020

Thanks! I'll start working on it.

@jgsogo jgsogo added this to the 1.23 milestone Feb 7, 2020
@SSE4
Copy link
Contributor

SSE4 commented Feb 7, 2020

thanks for the suggestion. for the time being, I was thinking about the same. I have simple implementation (not for conan, for my own scripts):

def extract(filename):
    if filename.endswith(".zip"):
        f = zipfile.ZipFile(filename, 'r')
        names = f.namelist()
    else:
        f = tarfile.open(filename, 'r')
        names = f.getnames()
    dstdir = TMP
    dirname = os.path.commonprefix(names)
    if not dirname:
        dirname = os.path.splitext(filename)[0]
        dstdir = os.path.join(TMP, dirname)
    dirname = os.path.join(TMP, dirname)
    if os.path.isdir(dirname):
        logging.info('extracting "%s" into "%s"... (cached)' % (filename, dirname))
    else:
        logging.info('extracting "%s" into "%s"...' % (filename, dirname))
        f.extractall(path=dstdir)
    return dirname

@Minimonium
Copy link
Contributor

It'd be really great to have that for the CCI. Some archive names are harder to decipher, like for example ones from direct commit hash links - you either pass the hash to the recipe or glob downloaded files, both ways are inconvenient.
It would be awesome to have that!

@memsharded memsharded removed this from the 1.23 milestone Mar 2, 2020
@memsharded
Copy link
Member

As there has not been progress here, and we want to release 1.23, removing the milestone. When we have something to review, we will plan the release to be merged in. Thanks!

@lasote lasote assigned lasote and unassigned chris-se Dec 14, 2020
@lasote lasote added this to the 1.33 milestone Dec 14, 2020
@lasote
Copy link
Contributor

lasote commented Dec 14, 2020

I'm a big fan of this. In Conan-Center there are several recipes that are doing tricks in the source() method like creating a self._source_folder and carrying that subfolder in all the recipe. And the only reason is that the zip subfolder (something like whatever_1.2.3) is renamed to self._source_folder and that is the way to skip the extract subfolder for all the contents.
I'm working on a PR to introduce a new argument for the unzipping that will cause that the directory containing all the files is removed and all the contents moved to the parent.
I detected also this pain while working on a POC of the recipe layouts here: #6261 where we aim to avoid this "subfolders" tricks by declaring in the recipe the relevant folders.

@memsharded
Copy link
Member

#8208 implemented strip_root argument in tools.unzip(), tools.get(), etc, and it will be released in Conan 1.33.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

6 participants