0.27 tarball contains cruft #620

Closed
a17r opened this issue Dec 27, 2018 · 28 comments
@a17r
Contributor

a17r commented Dec 27, 2018

find exiv2-0.27.0-Source -name ".*" | wc -l
1335

Probably none of those should end up in a release tarball, but worse, at least in case of include directory they also end up being installed.
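The check above can also be run against the tarball itself, without unpacking it. This is a minimal sketch, assuming a gzipped tarball; the helper name `count_appledouble` is made up for illustration, and it counts only the `._name` AppleDouble entries rather than every dotfile:

```shell
# Hypothetical helper: count AppleDouble ("._name") entries in a gzipped
# tarball by listing it, without extracting anything.
count_appledouble() {
    # Match a path component that starts with "._", either at the very
    # beginning of the member name or right after a "/".
    # grep -c exits non-zero when the count is 0, hence the "|| true".
    tar -tzf "$1" | grep -c -E '(^|/)\._' || true
}

# Usage (assumed file name):
#   count_appledouble exiv2-0.27.0-Source.tar.gz
```

Legitimate dotfiles such as `.gitignore` are not counted, because the pattern requires the `._` prefix.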

@clanmills clanmills added this to the v0.27.1 milestone Dec 27, 2018
@clanmills
Collaborator

Season's Greetings. Thanks for not sending me this "surprise gift" on Christmas Day.

You are totally correct. The source bundle is created on the Mac and polluted with Mac hidden files. One nasty hidden file for every real file. For example:

./unitTests/test_tiffheader.cpp    # This is a real file
./unitTests/._test_tiffheader.cpp  # This is Mac magic (extended attributes?)

These files are "super hidden" because when I "untar" the bundle on Mac, they are totally invisible. Here's what I see on the Mac (all legitimate hidden files, I believe).

690 rmills@rmillsmbp:~/Downloads/exiv2-0.27.0-Source $ find . -name ".*" -type f
./.gitignore
./.clang-format
./.gitlab-ci.yml
./.travis.yml
691 rmills@rmillsmbp:~/Downloads/exiv2-0.27.0-Source $ 

My instant thought is to clean the bundle on exiv2.org (and exiv2.dyndns.org) and update the published sha256. I will call the bundle exiv2-0.27.0a.tar.gz. No code changes: I will open the bundle, remove the "cruft", re-tar, test, update the published sha256 and post everything.

This is a nasty unpleasant surprise. Going forward, I can easily modify the build script to ensure that the source bundle is created on Linux.

The source bundle is generated from code in cmake/packaging.cmake. I'll experiment with this to see if this can be modified to ensure this never happens again, even if the build is performed on MacOS-X. I will probably raise an issue with Kitware and get their opinion.

# https://libwebsockets.org/git/libwebsockets/commit/minimal-examples?id=3e25edf1ee7ea8127e941fd7b664e0e962cfeb85
set(CPACK_SOURCE_IGNORE_FILES ${CPACK_SOURCE_IGNORE_FILES} "/.git/" "/build/" "\\\\.tgz$" "\\\\.tar\\\\.gz$" "\\\\.zip$" "/test/tmp/" )

@clanmills
Collaborator

I've done some work on this and made several discoveries:

  1. This seems to be an open-issue with CMake/CPack/MacOSX
    https://gitlab.kitware.com/cmake/cmake/issues/16168

  2. I encountered this very issue with v0.26. The solution was to set the environment variable COPYFILE_DISABLE=1 when using tar to create the bundle. See: Build fails if EXIV2_ENABLE_BUILD_PO=TRUE #14 (comment)

  3. I have not found a way to prevent this on CMake/CPack/MacOSX
    I tried setting COPYFILE_DISABLE before calling $ make package_source
    I have been unsuccessful in excluding those files with set(CPACK_SOURCE_IGNORE_FILES...

  4. I found an impractical work-around that requires tar xzf ...source..tar.gz --exclude '._*'
    I believe this will work. How would anybody know to untar using that option?
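For the plain-tar route from point 2, the two ideas in this list can be combined when the bundle is created rather than when it is unpacked. The following is only a sketch, not the project's build script: `make_clean_tarball` is a hypothetical name, and it does not help with CPack's own `package_source` step.

```shell
# Hypothetical helper: pack a source directory into a .tar.gz while keeping
# AppleDouble ("._name") files out of the archive.
#   $1 = source directory, $2 = output archive (an absolute path is safest)
make_clean_tarball() {
    src_dir="$1"
    out="$2"
    # COPYFILE_DISABLE=1 tells the macOS tar not to emit "._" members for
    # extended attributes; other tars simply ignore the variable.
    # --exclude additionally drops any "._" files that already exist on disk.
    COPYFILE_DISABLE=1 tar --exclude='._*' -czf "$out" \
        -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Usage (assumed paths):
#   make_clean_tarball "$HOME/exiv2-0.27.0-Source" /tmp/exiv2-0.27.0-Source.tar.gz
```

Excluding at creation time means nobody who downloads the bundle has to know about a special untar option.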

So, I feel I have two ways to fix this in the build:

  1. Modify Jenkins to build the source bundle on Linux.
    I don't believe I have to modify any code to make that happen.
    I tell Jenkins to always execute that part of the build on a Linux build node.

  2. Modify the build script to untar the source bundle, remove those files, then retar the bundle.
    I dislike this idea. Although the bundles are not signed, that could change in future. Opening the bundle is a recipe for future trouble.

I will do something about this on exiv2.org (and exiv2.dyndns.org). I don't want to issue a "dot release" as this issue only involves packaging. I will probably adopt the "exiv2-0.27.0a" proposal above. At this time, I will manually "retar the file" on Linux. For Exiv2 v0.27.1, I will ensure that building package_source is performed on Linux.
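The manual "retar on Linux" step could look roughly like the sketch below. It is an illustration under assumptions, not the procedure actually used: the function name is invented, the archive is assumed to unpack into non-hidden top-level entries (a single source directory in this case), and `sha256sum` is assumed to be available, as it is on a typical Linux system.

```shell
# Hypothetical helper: unpack a tarball, delete AppleDouble ("._name") files,
# repack under a new name and print the new sha256.
#   $1 = existing archive, $2 = cleaned archive to write
retar_without_appledouble() {
    src="$1"
    dst="$2"
    work=$(mktemp -d) || return 1
    tar -xzf "$src" -C "$work" || return 1
    # Remove only regular files whose name starts with "._".
    find "$work" -name '._*' -type f -exec rm -f {} +
    # Repack the top-level entries; the redirect is resolved in the caller's
    # working directory, before the subshell changes directory.
    ( cd "$work" && tar -czf - -- * ) > "$dst" || return 1
    rm -rf "$work"
    # Print the checksum that would be published next to the download.
    sha256sum "$dst"
}

# Usage (assumed file names):
#   retar_without_appledouble exiv2-0.27.0-Source.tar.gz exiv2-0.27.0a-Source.tar.gz
```

Because the repacked archive has different bytes, its checksum differs from the original even though no source file changed.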

I will do nothing for a few days. Perhaps another idea or suggestion will appear.

@D4N
Member

D4N commented Dec 31, 2018

Creating the tarball on Linux is imho the better solution here. At some point we can even create a build docker container, to get 100% reproducible builds.

@clanmills
Collaborator

I agree with you, Dan. I'll get Jenkins to perform that build step on Linux in future.

I see Andreas Schneider has brought our attention to this: https://build.opensuse.org/request/show/662405

I've commented on that. I'm wondering what to do. I don't want to issue a "dot" release immediately. It doesn't feel worth changing the version to v0.27.1 for a packaging issue with no code changes.

I'm not sure that v0.27 has been tagged (#605) and I believe a patch concerning musl #615 has broken the MinGW build. I've done a lot of work with Gilles and Kamran on cross-compiling MinGW on Linux (#610). And all of this is destined for v0.27.1. However, I don't want to push 0.27.1 out quickly only to discover there are other matters that need attention.

I'm quite stressed by this. We had 3 Release Candidates (and 2 months of review) on Exiv2 v0.27 in the hope of finding, fixing and avoiding this.

@cryptomilk and @a17r I am happy to get your input on how best to proceed.

@cryptomilk
Collaborator

I've patched the tarball to install files without globbing them but by a file list.

@cryptomilk
Collaborator

Important: Do not modify the source tarball and keep the same version number!

If you change something, you need to bump the version, even if it is just a typo fix!

@a17r
Contributor Author

a17r commented Jan 2, 2019

I will call the bundle exiv2-0.27.0a.tar.gz

^ Yes, that is the way to go

@clanmills
Collaborator

Unless somebody says "Stop", at 2019-01-02+20:00UTC I will retar the file as exiv2-0.27.0a.tar.gz and update the sha256sum on this page: http://www.exiv2.org/download.html

That's it. Zero code changes. The code will unzip and build as 0.27.0 - which is exactly what is in the bundle. This is a "pure packaging issue".

I will change the build/release procedure to ensure the source bundle is always built on Linux in future to ensure this never happens again.

@clanmills
Collaborator

I have updated: http://exiv2.dyndns.org/download.html and added the file exiv2-0.27.0a-Source.tar.gz. I have also added a one line log entry on whatsnew.html

The opensuse request is: https://build.opensuse.org/request/show/662411. I updated the request today to let them know to expect this change.

If nobody reports an issue with the new Source tar-ball, I will update exiv2.org at 2019-01-03+10:00UTC and inform opensuse that this change has been made.

@clanmills
Collaborator

I have uploaded the new bundle to exiv2.org (and exiv2.dyndns.org) and notified the folks at openSUSE.
http://www.exiv2.org/builds/exiv2-0.27.0a-Source.tar.gz

I'm going to close this and hope this never recurs.

a17r added a commit to a17r/gentoo that referenced this issue Jan 3, 2019
See also: Exiv2/exiv2#620

Signed-off-by: Andreas Sturmlechner <asturm@gentoo.org>
Package-Manager: Portage-2.3.51, Repoman-2.3.11
anaveragehuman pushed a commit to anaveragehuman/gentoo that referenced this issue Jan 4, 2019
See also: Exiv2/exiv2#620

Signed-off-by: Andreas Sturmlechner <asturm@gentoo.org>
Package-Manager: Portage-2.3.51, Repoman-2.3.11
@imsodin

imsodin commented Feb 1, 2019

Same thing is true for __pycache__ dirs, which are present in the source tarball:

> find tests -type d -name __pycache__
tests/bugfixes/github/__pycache__
tests/bugfixes/__pycache__
tests/bugfixes/redmine/__pycache__
tests/__pycache__
tests/tiff_test/__pycache__

The same directories are not present when checking out 0.27.

I thought this is minor and related enough to not open a new issue - if you still want me to, just say so.

@D4N D4N reopened this Feb 1, 2019
@D4N D4N changed the title from "0.27 tarball contains Apple filesystem cruft" to "0.27 tarball contains cruft" Feb 1, 2019
@D4N
Member

D4N commented Feb 1, 2019

The tarball should honor the .gitignore, then it wouldn't contain these files.
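One way to get a tarball that contains only what is checked in is `git archive`, which packs the files of a commit and never looks at untracked files in the working tree, whether or not they are listed in .gitignore. This is a sketch of that alternative, not what the Exiv2 build does (it uses CPack); the function name and the prefix are illustrative.

```shell
# Hypothetical helper: build a source tarball from the committed tree only.
#   $1 = repository, $2 = top-level directory name inside the archive,
#   $3 = output archive (use an absolute path: git -C changes directory first)
make_git_tarball() {
    repo="$1"
    prefix="$2"
    out="$3"
    # Only files tracked at HEAD are archived; build output, __pycache__
    # directories and other untracked files are left out.
    git -C "$repo" archive --format=tar.gz --prefix="$prefix/" -o "$out" HEAD
}

# Usage (assumed names):
#   make_git_tarball "$HOME/exiv2" exiv2-0.27.1-Source /tmp/exiv2-0.27.1-Source.tar.gz
```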

@imsodin

imsodin commented Feb 1, 2019

Actually __pycache__ isn't (at least wasn't at the time of 0.27) in .gitignore.

I noticed an unrelated potentially extraneous file generated on install: usr/share/exiv2/cmake/exiv2Config-none.cmake. It's definitely not related to this issue (not a source file) and potentially not an issue at all - so I am hesitant to spam the issue tracker with something that might be a support request. Do you use another more suitable means of communication or should I just open github issue(s)?

@D4N
Member

D4N commented Feb 1, 2019 via email

@cryptomilk
Collaborator

Don't you use CPack for creating the source tarball? It has a variable that defines what it should ignore; I guess it could read .gitignore.

See e.g.: https://git.cryptomilk.org/projects/cmocka.git/tree/CPackConfig.cmake

@clanmills
Collaborator

@cryptomilk Yes, in v0.27, we use CPack to generate the tarball.

I've modified Jenkins to generate the package_source on Linux. The build is performed on a fresh clone and, although we build the code, we don't run the test suite. So there should be no pythonic or test artefacts and no MacOS-X "cruft".

I hope to publish Exiv2 v0.27.1 RC1 by the scheduled date of 2019-03-15.

@a17r and @imsodin I would appreciate you inspecting Exiv2 v0.27.1 RC1 when it's available and closing this issue if you agree that the new bundles are clean. I'll update this issue when I've published RC1.

My work on Exiv2 v0.27.1 has been delayed by having to change the ISP/host for exiv2.org. However Exiv2 v0.27.1 is 84% complete (87% complete when this issue is closed), so I hope to achieve this next week. I hope there will be zero code changes between RC1 and GM (only documentation changes).

@D4N
Member

D4N commented Mar 12, 2019 via email

@clanmills
Collaborator

@D4N. I think you've misunderstood my approach. The builds are performed in a two step process.

  1. On every platform, we delete the build tree, clone, build, test and create the package (libraries, headers etc). The bundle includes the output of the build and the test suite.

  2. On Linux (after step 1), we delete the build tree, clone, build and create package_source

Here's such a package_source which I generated this morning. https://clanmills.com/files/exiv2-0.27.1.19-Source-2019:03:12_11:53:12.tar.gz

I can't see any "cruft" (MacOS-X ._ files), nor pythonic artefacts. Could you inspect and confirm this? https://clanmills.com/files/exiv2-0.27.1.19-Source-2019:03:12_11:53:12.tar.gz

I thank you in advance if you find something wrong and I will be very happy to change the build script to fix the bundle.

I tested package_source on Linux, where it builds and passes the test suite. At the moment, my focus is to ensure the matters reported in this issue are resolved. I'll test the source bundles on the supported platforms later this week (MacOS-X, Linux, DOS, Cygwin, MinGW/msys2).
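For anyone inspecting an unpacked bundle, the two kinds of cruft reported in this issue can be searched for in one pass. This is a minimal sketch; `find_cruft` is a made-up name and the pattern list covers only what this thread mentions (AppleDouble files and Python bytecode caches).

```shell
# Hypothetical helper: print every path in an extracted source tree that
# looks like macOS AppleDouble residue or a Python bytecode artefact.
# Prints nothing when the tree is clean.
find_cruft() {
    find "$1" \( -name '._*' -o -name '__pycache__' -o -name '*.pyc' \) -print
}

# Usage (assumed directory name):
#   find_cruft exiv2-0.27.1.19-Source
```

An empty result means none of these patterns were found; it says nothing about other unexpected files, such as empty placeholders.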

@D4N
Member

D4N commented Mar 12, 2019 via email

@clanmills
Collaborator

Ah, yes. I know about that file. It's a sample application which was provided by Brad (the Microsofter who invented basicio.cpp, I believe). It's not in samples, because it requires boost to build it, and we didn't want to add boost to the build. It hasn't been changed for years. It should be in contrib/organize/organize.cpp. The work tree (the work/exiv2/contrib..... ) should be removed.

@clanmills
Collaborator

There are more of those "empty" files. I think we can remove them, and the "empty" trees above:

rmills@rmillsmbp-ubuntu:~/temp$ cd exiv2-0.27.1.19-Source/
rmills@rmillsmbp-ubuntu:~/temp/exiv2-0.27.1.19-Source$ find . -size 0c
./tests/bugfixes/__init__.py
./tests/bugfixes/github/__init__.py
./tests/bugfixes/redmine/__init__.py
./tests/tiff_test/__init__.py
./contrib/organize/work/exiv2/contrib/organize/organize.cpp
rmills@rmillsmbp-ubuntu:~/temp/exiv2-0.27.1.19-Source$

There's weirdness concerning the .tar.gz file. It can be opened on Mac with the command:

$ curl -O https://clanmills.com/files/exiv2-0.27.1.19-Source-2019:03:12_11:53:12.tar.gz
$ tar xzfv exiv2-0.27.1.19-Source-2019:03:12_11:53:12.tar.gz

However, it hangs on Linux/Cygwin/MinGW. How odd. It feels as though tar is trying to read from stdin - so it's just sitting dormant. I can open it with the GUI using xdg-open. It has something to do with the filename and I suspect the - in the filename is causing tar to read stdin. When released, it will be called exiv2-0.27.1.Source.tar.gz and that's OK (as it was in 0.27.0).
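A likely explanation, offered here as an assumption rather than something verified on these machines: GNU tar treats an archive name that contains a colon as `host:file` on a remote machine, and the timestamp in this file name contains colons, so tar may be waiting on a remote connection rather than on stdin. GNU tar has a `--force-local` option for this case. The sketch below avoids the question entirely by feeding the archive through stdin, which should behave the same with GNU tar and bsdtar; the function name is made up.

```shell
# Hypothetical helper: extract a .tar.gz whose file name contains ":".
#   $1 = archive, $2 = destination directory (created if missing)
extract_local_archive() {
    archive="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    # The shell opens the file, so tar never parses the archive name and
    # cannot mistake "name:with:colons" for a remote host.
    tar -xzf - -C "$dest" < "$archive"
}

# Usage (file name from this thread, destination assumed):
#   extract_local_archive exiv2-0.27.1.19-Source-2019:03:12_11:53:12.tar.gz ./out
```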

@D4N
Member

D4N commented Mar 12, 2019 via email

@clanmills
Collaborator

@D4N. Thanks for your feedback on this.

  1. Removing the tree contrib/organize/work/
    Can you submit a PR to 'master' and '0.27-maintenance' to remove that tree?

  2. Thanks for explaining the "empty" __init__.py files
    That's ugly. Is it possible to create those "on the fly" from python? Or are they needed before your code starts to run?

  3. Let's not worry about the strange tar command-line parsing behaviour. This doesn't happen with the names of the release source bundles as they do not contain the date. The buildserver generates those files. However the scripts to publish on exiv2.org know to remove the date.

rmills@rmillsmbp-ubuntu:~/foo$ curl -O http://www.exiv2.org/builds/exiv2-0.27.0a-Source.tar.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 25.9M  100 25.9M    0     0  8388k      0  0:00:03  0:00:03 --:--:-- 8388k
rmills@rmillsmbp-ubuntu:~/foo$ tar xzf exiv2-0.27.0a-Source.tar.gz 
rmills@rmillsmbp-ubuntu:~/foo$ ls -l
total 26552
-rw-r--r--  1 rmills rmills 27168207 Mar 13 16:06 exiv2-0.27.0a-Source.tar.gz
drwxr-xr-x 15 rmills rmills     4096 Jan  2 17:29 exiv2-0.27.0-Source
rmills@rmillsmbp-ubuntu:~/foo$

MacOS-X is Unix and uses a different version of tar from Linux. The "3 negatives" in the filename exiv2-0.27.1.19-Source-2019:03:12_11:53:12.tar.gz appear to confuse tar on Linux into reading stdin.

527 rmills@rmillsmbp:~/foo $ ssh rmills@localhost 'tar --version'
bsdtar 2.8.3 - libarchive 2.8.3
528 rmills@rmillsmbp:~/foo $ ssh rmills@rmillsmbp-ubuntu 'tar --version'
tar (GNU tar) 1.29
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by John Gilmore and Jay Fenlason.
529 rmills@rmillsmbp:~/foo $ 

@boardhead
Collaborator

boardhead commented Apr 5, 2019

I'm a bit late to this party, but for what it's worth, I run the attached script on MacOS to delete the resource files before each Exiftool release. (I've added a .txt extension to be able to attach the script to this comment.)
del_rsrc.txt

@clanmills
Collaborator

Good ideas are never late for the party.

I've saved your script:

692 rmills@rmillsmbp:~/gnu/exiv2/team/contrib $ svn info scripts/del_rsrc 
Path: scripts/del_rsrc
Name: del_rsrc
Working Copy Root Path: /Users/rmills/gnu/exiv2/team
URL: svn://dev.exiv2.org/svn/team/contrib/scripts/del_rsrc
Relative URL: ^/team/contrib/scripts/del_rsrc
Repository Root: svn://dev.exiv2.org/svn
Repository UUID: b7c8b350-86e7-0310-a4b4-de8f6a8f16a3
Revision: 4846
Node Kind: file
Schedule: normal
Last Changed Author: robinwmills
Last Changed Rev: 4846
Last Changed Date: 2019-04-05 22:39:22 +0100 (Fri, 05 Apr 2019)
Text Last Updated: 2019-04-05 22:38:43 +0100 (Fri, 05 Apr 2019)
Checksum: 84035500a1b90335794abab526ad7bf1db5701ad

693 rmills@rmillsmbp:~/gnu/exiv2/team/contrib $ 

Thanks very much.

We're on track to visit Canada in July 2020. We will visit Scotland, France, Finland and Thailand in 2019.

@cryptomilk
Collaborator

If you want to have a clean tree with git you can run 'git clean -dfx'. This will remove everything which is not checked into git! See the manpage for more details ;-)
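Since that command deletes files for good, it can be worth previewing with git's dry-run flag first. A small sketch, with an invented wrapper name and an invented `--apply` switch:

```shell
# Hypothetical wrapper around "git clean -dfx".
#   clean_worktree REPO           -> only lists what would be removed (-n)
#   clean_worktree REPO --apply   -> actually removes untracked and ignored files
clean_worktree() {
    repo="$1"
    if [ "${2:-}" = "--apply" ]; then
        git -C "$repo" clean -dfx
    else
        # -n is git clean's dry-run option: nothing is deleted.
        git -C "$repo" clean -dfxn
    fi
}

# Usage (assumed path):
#   clean_worktree "$HOME/gnu/exiv2"            # preview
#   clean_worktree "$HOME/gnu/exiv2" --apply    # delete
```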

@clanmills
Collaborator

Thanks @cryptomilk

The build script uses $ rm -rf buildserver to remove the build tree, then it does $ git clone --depth 1 url xxxx; cd xxxx ; git fetch --unshallow. Too aggressive? For sure it's effective.

I've found cloning from GitHub.com is not robust and that's when I started using the git two step (--depth 1/--unshallow). I've also used $ git config --global http.postBuffer 524288000 without noticeable improvement. I get different messages about curl and your buddy GnuTLS. https://stackoverflow.com/questions/6842687/the-remote-end-hung-up-unexpectedly-while-git-cloning

So, I manually update a clone on the buildserver from GitHub. The build script gets the code from the local clone and that's solid and reliable. It's also faster because it reduces the volume of data being read from GitHub.com
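The "git two step" described above can be written as a small function. This is a sketch of the idea only; the real build script is not shown in this thread, and the function name is hypothetical.

```shell
# Hypothetical helper: clone in two steps so that the first transfer is small.
#   $1 = repository URL, $2 = destination directory
clone_two_step() {
    url="$1"
    dest="$2"
    # Step 1: fetch only the most recent commit.
    git clone --quiet --depth 1 "$url" "$dest" || return 1
    # Step 2: fetch the remaining history, turning the shallow clone
    # into a complete one.
    git -C "$dest" fetch --quiet --unshallow
}

# Usage (destination assumed):
#   clone_two_step https://github.com/Exiv2/exiv2.git buildserver
```

When the source is a clone on the same machine, git ignores `--depth` for a plain path, so a `file://` URL is needed if the shallow first step is still wanted.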

@clanmills
Collaborator

I'm closing this as I believe the "Tarball Cruft" is fixed. Unless somebody knows otherwise!

nehaljwani pushed a commit to Exiv2/team that referenced this issue Dec 31, 2021