Feature/keep tgz download #7886
Conversation
    mkdir(package_folder)  # Just in case it doesn't exist, because uncompress did nothing
    for file_name, file_path in zipped_files.items():  # copy CONANINFO and CONANMANIFEST
        os.rename(file_path, os.path.join(package_folder, file_name))
Why are we moving these files now?
Because they are being downloaded to the "download" folder, and these files need to be in the final package folder. There were two options:
- Download to the "package folder", then move the .tgz somewhere else (the download folder)
- Download to the "download folder", leave the .tgz there and move the other files to the package folder
I chose the second one, to avoid moving the "heavy" .tgz file.
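The second option described above can be sketched as a small helper. This is a minimal illustration, not the PR's actual code; `move_metadata` is a hypothetical name, and the metadata file names are examples:

```python
import os

def move_metadata(zipped_files, package_folder):
    """Move the small metadata files (e.g. CONANINFO, CONANMANIFEST) from the
    download folder into the final package folder, leaving the heavy
    conan_package.tgz where it was downloaded.

    zipped_files maps file name -> current path in the download folder.
    """
    # The package folder may not exist yet if uncompress did nothing
    os.makedirs(package_folder, exist_ok=True)
    for file_name, file_path in zipped_files.items():
        # os.rename is cheap for the small metadata files; the .tgz stays put
        os.rename(file_path, os.path.join(package_folder, file_name))
```

Note that `os.rename` only works within the same filesystem; a real implementation might need a copy-and-delete fallback if the cache spans mounts.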
    package_checksums = calc_files_checksum(zipped_files)
    download_pkg_folder = layout.download_package(pref)
    # Download files to the pkg_tgz folder, not to the final one
    zipped_files = self._call_remote(remote, "get_package", pref, download_pkg_folder)
Somewhere around these lines we need to set/unset the dirty flag on the downloaded file, because the download can fail halfway. I think the get_package call is only assigning the dirty flag to the package folder.

I see we are no longer removing the folder in the except BaseException
That is the thing: as long as the whole package_folder is protected with a "dirty" flag, everything inside it is dirty. There was no dirty flag for this download before, so I would say it is not necessary. Probably the other dirty flags over individual .tgz files could also be removed, but I didn't want to change more things in this PR.
The removal of the folder also doesn't make sense, and apparently no test covers it. If we have the "dirty" mechanism in place to protect against interrupted processes, then it shouldn't be necessary to try to remove the folder (in some cases it could even be annoying, as it won't let you debug).
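For context, the "dirty" mechanism discussed here (the `is_dirty` / `clean_dirty` calls quoted in the review) is typically a sentinel file placed next to the artifact before a risky operation and removed on success. This is a minimal sketch of that idea; the `.dirty` suffix and helper bodies are assumptions for illustration, not Conan's exact implementation:

```python
import os

DIRTY_SUFFIX = ".dirty"  # assumed sentinel suffix, for illustration only

def set_dirty(path):
    # Create an empty sentinel next to the artifact before starting a
    # download/compression that might be interrupted
    with open(path + DIRTY_SUFFIX, "w"):
        pass

def clean_dirty(path):
    # Remove the sentinel once the operation completed successfully
    os.remove(path + DIRTY_SUFFIX)

def is_dirty(path):
    # If the sentinel survived, a previous operation was interrupted
    # and the artifact cannot be trusted
    return os.path.exists(path + DIRTY_SUFFIX)
```

A crash between `set_dirty` and `clean_dirty` leaves the sentinel behind, so the next run can detect the broken artifact and discard or rebuild it, which is why removing the folder in an `except` block becomes unnecessary.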
Haven't tried it, but this is my concern:
- Download starts, package_folder is marked as dirty
- Download fails, the package_folder mark is still there
- We have a broken file in <layout>/download_folder/pid/xxxxxx.tgz (not dirty)
- Upload the package: these lines in the uploader are checking the dirty flag for the tgz file, not the package_folder, will it upload a broken file?

    download_pkg_folder = layout.download_package(pref)
    package_tgz = os.path.join(download_pkg_folder, PACKAGE_TGZ_NAME)
    if is_dirty(package_tgz):
        self._output.warn("%s: Removing %s, marked as dirty" % (str(pref), PACKAGE_TGZ_NAME))
        os.remove(package_tgz)
        clean_dirty(package_tgz)
    ...
Without running the test I'm not sure if this is something that can happen or not.
The code is doing both:
    def _compress_package_files(self, layout, pref, integrity_check):
        t1 = time.time()
        if layout.package_is_dirty(pref):
            raise ConanException("Package %s is corrupted, aborting upload.\n"
                                 "Remove it with 'conan remove %s -p=%s'"
                                 % (pref, pref.ref, pref.id))
        download_pkg_folder = layout.download_package(pref)
        package_tgz = os.path.join(download_pkg_folder, PACKAGE_TGZ_NAME)
        if is_dirty(package_tgz):
            self._output.warn("%s: Removing %s, marked as dirty" % (str(pref), PACKAGE_TGZ_NAME))
            os.remove(package_tgz)
            clean_dirty(package_tgz)
First, it will raise if the package is dirty (as it doesn't know how to recover from that). If only the .tgz is dirty, it can try to remove it and re-compress.
The test in test_upload_dirty() covers the case where the package is marked as dirty.
I see, we are covered with that raise when the package_folder is dirty.
Reviewed the PR:
- Removed the unrelated parts of the migration
- Fixed the copy (it only affected package-ids)
Please check the comments
@memsharded Is it possible to turn this feature off? Some of our devs don't want to store redundant data on their drives.
Hi @ngerke At the moment it is not possible, partly because this is a bit more than a feature; it also fixes some bugs. Read the rationale in the blog post: https://blog.conan.io/2020/11/05/New-conan-release-1-31.html. Could you please give an order of magnitude of your developers' cache sizes? I'm trying to understand the pain and size of the issue. In any case, the solution is more likely to be based on the ongoing "LRU" work and new features to keep the cache size limited. Have a look at: #7980
Our main product has a pkg size of ~2 GB (worst case), the SDK pkg ~300 MB. We also have other pkgs (e.g. Qt) of ~300 MB. If you store multiple versions of the pkgs, cache sizes of 20-30 GB are not unusual (without the tar.gz pkgs).
Changelog: Feature: Download and keep the conan_package.tgz in the cache, so packages are not affected by different operating systems' compression and decompression, and uploading is much more efficient.
Docs: Omit
The same for other .tgz files will follow in a separate PR: #7938
#tags: slow
#revisions: 1
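The changelog rationale, that recompressing identical contents on different systems produces different archives (breaking checksums and revisions), can be illustrated with the standard gzip format: its header embeds a timestamp, so two compressions of the same bytes need not be byte-identical. A minimal demonstration using only Python's stdlib:

```python
import gzip

data = b"identical package contents"

# Compress the same payload with two different header timestamps,
# mimicking a re-compression on another machine at another time
a = gzip.compress(data, mtime=0)
b = gzip.compress(data, mtime=1600000000)

# The archives differ byte-for-byte, so any checksum over them differs...
assert a != b
# ...even though they decompress to exactly the same contents
assert gzip.decompress(a) == gzip.decompress(b) == data
```

Differing compression levels or zlib builds across operating systems have the same effect, which is why keeping the originally downloaded conan_package.tgz avoids spurious checksum mismatches on upload.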