Skip to content

HTTPS clone URL

Subversion checkout URL

You can clone with HTTPS or Subversion.

Download ZIP

Loading…

Package cache #915

Closed
wants to merge 12 commits into from

7 participants

@chEbba

Reviewed system package cache from #62.

How it works:

  • After package was downloaded it is copied (actualy repacked from targetDir) to local system repository with modified dist url.
  • Local system repository is added to the RepositoryManager as the first repository.
  • If a package is found in the local system reposity it is installed from it (just copied).

Options:

  • system-repository (default false) to enable system repository in manager,
  • package-cache (default false) to store downloaded packages in the system repository

Notes:

  • This PR comes with improved Archive support (for now only with zip and tar), and can be used in downloaders (1 ArchiveDownloader with Extractors).
  • Packages are compressed with zip (if enabled) or with tar (fallback).
  • Works only for dists.

Planned improvements:

  • Additional configuration options.
  • CLI option to disable cache for command.
  • CLI commands for system repository manipulation.

Discussion:

  • General workflow and details.
  • Configuration and factory creation.
@stof stof commented on the diff
src/Composer/Storage/RepositoryStorage.php
((13 lines not shown))
+namespace Composer\Storage;
+
+use Composer\Package\PackageInterface;
+use Composer\Package\Loader\ArrayLoader;
+use Composer\Package\Dumper\ArrayDumper;
+use Composer\Repository\WritableRepositoryInterface;
+
+/**
+ * RepositoryStorage write packages to repository and stores them inside internal storage
+ *
+ * @author Kirill chEbba Chebunin <iam@chebba.org>
+ */
+class RepositoryStorage implements StorageInterface
+{
+ private static $loader;
+ private static $dumper;
@stof
stof added a note

why making them static ?

@chEbba
chEbba added a note

Because they are used in static method convertPackage(). Why is this method static? Don't know :) It was more clear for me to have such util method as static with cached dumper and loader.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
src/Composer/Util/Archive/ZipArchiver.php
((89 lines not shown))
+ $zip->close();
+ }
+
+ /**
+ * Add directory to archive recursively
+ *
+ * @param ZipArchive $zip Archive file
+ * @param string $dir Direory to add
+ * @param string $parent Parent directory
+ *
+ * @throws RuntimeException On any error
+ */
+ private function addDirectoryToArchive(ZipArchive $zip, $dir, $parent = '')
+ {
+ if ($parent) {
+ if (!$zip->addEmptyDir($parent)) {
@stof
stof added a note

you should use a single if for it, using &&

@chEbba
chEbba added a note

agree

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
src/Composer/Util/Archive/ZipArchiver.php
((102 lines not shown))
+ {
+ if ($parent) {
+ if (!$zip->addEmptyDir($parent)) {
+ throw new RuntimeException("Can not add directory '{$dir}' to zip archive");
+ }
+ }
+ /* @var DirectoryIterator $fileInfo */
+ foreach (new DirectoryIterator($dir) as $fileInfo) {
+ if ($fileInfo->isDot()) {
+ continue;
+ }
+
+ if ($fileInfo->isDir()) {
+ $this->addDirectoryToArchive($zip, $fileInfo->getPathname(), $parent . $fileInfo->getFilename() . '/');
+ } else {
+ if (!$zip->addFile($fileInfo->getPathname(), $parent . $fileInfo->getFilename())) {
@stof
stof added a note

this should use elseif instead of nesting the if in the else

@chEbba
chEbba added a note

agree

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
src/Composer/Util/Archive/ZipArchiver.php
((96 lines not shown))
+ * @param string $dir Direory to add
+ * @param string $parent Parent directory
+ *
+ * @throws RuntimeException On any error
+ */
+ private function addDirectoryToArchive(ZipArchive $zip, $dir, $parent = '')
+ {
+ if ($parent) {
+ if (!$zip->addEmptyDir($parent)) {
+ throw new RuntimeException("Can not add directory '{$dir}' to zip archive");
+ }
+ }
+ /* @var DirectoryIterator $fileInfo */
+ foreach (new DirectoryIterator($dir) as $fileInfo) {
+ if ($fileInfo->isDot()) {
+ continue;
@stof
stof added a note

why skipping hidden files if they were part of the original package ? it means you are only caching a partial package

@chEbba
chEbba added a note

But i think it would be better to use SKIP_DOTS flag.

@stof
stof added a note

ah sorry

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
src/Composer/Factory.php
@@ -147,6 +147,16 @@ public function createComposer(IOInterface $io, $localConfig = null)
// initialize repository manager
$rm = $this->createRepositoryManager($io, $config);
+ // add package cache
+ $storage = null;
+ if ($config->get('package-cache')) {
+ $systemRepository = $this->addSystemRepository($rm, $config);
+ $storage = new Storage\RepositoryStorage(
+ $systemRepository,
+ new Storage\ArchiveStorage($config->get('home') . '/repository', new Util\Archive\ZipArchiver())
@stof
stof added a note

you should configure it only when the zip archive exists as it depends on it whereas composer currently does not (it tries to fallback to the system unzip)

@chEbba
chEbba added a note

Good note. I think it should be some "guesser" for archiver type (need to implement all archivers), and fallback to no-cache or some plain format (just a directory?). I think Factory integration is the most unclear thing in this PR :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@stof stof commented on the diff
src/Composer/Downloader/CachedDownloadManager.php
((43 lines not shown))
+ /**
+ * Get storage
+ *
+ * @return StorageInterface
+ */
+ public function getStorage()
+ {
+ return $this->storage;
+ }
+
+ /**
+ * {@inheritDoc}
+ */
+ public function download(PackageInterface $package, $targetDir, $preferSource = null)
+ {
+ parent::download($package, $targetDir, $preferSource);
@stof
stof added a note

shouldn't it try to find it in the cache first ?

@chEbba
chEbba added a note

No, (for current workflow). If package was in the system repository (cache) downloader will already try to download this package from local storage (because dependency resolver should choose it instead of remote)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@palex-fpt

You write to cache at DownloadManager and read from it from RepositoryManager.
It has following flaw: in CachedDownloadManage you assume $targetDir contains correctly installed package after parent::install() or parent::update(). It is not true, as installers does not run at this point yet. You should either cache downloaded content (aka http cache) at Download Manager or installed packages (it is hard at this point as there is no way to tell how package spreads to filesystem after installation) at Repository Manager.
It is possible to add some persistant key-value cache to installers (so installer can cache data per package and reuse it), but I doubt it would make more benefit than transport cache.

@travisbot

This pull request fails (merged 8fb07f1e into f128182).

@travisbot

This pull request fails (merged fc7d1ac1 into f128182).

@travisbot

This pull request passes (merged 8ebd4b74 into f128182).

@chEbba

@palex-fpt Actually it is not a simple cache in one place when you try to replace some havy operation with simple one. It consits of 2 concepts: copy all packages to the dedicated repository (like some proxy), use fast local repository for packages. This concepst can be used separately. You can cache evrething but not enable the system repository. Or you can use system repository with no auto-cache.
About $targetDir: at the downloader level there is nothing about package installation. I assume $targetDir contains raw package (with no idea about installation). At this level i have package metada and raw source in the directory, so i can do with this package whatever i whant. e.g can store my copy of packages in zip (or tar.gz or plain) instead of origin format.

@palex-fpt

Oh, I got the point. Your repack packages to local distribution.
Just be sure to add one level of directory structure to zip archive, as ZipDownloader (and any ArchiveDownloader descendant) do special processing with containers with only directory at top level.

@chEbba

@palex-fpt For now it compresses targetDir content as is (without targetDir itself in the archive).

@palex-fpt palex-fpt referenced this pull request
Closed

pear2 autoload issue #921

@gagarine

What append to this issue? Package cache will be great for continuous integration when you have to rebuilt everything at each commit to run you test...

@chEbba

I will update this PR on weekend with some improvements for defaults and separation of the system repository and cache.

@chEbba

Rebased and improved. Description was updated.
Configuration is separated to 2 options: system-repository & config-cache.
Add tar archiver to support systems without zip.

@Seldaek
Owner

@chEbba maybe I missed something but it seems to me like this is overly complex. We already have a Cache class that could be integrated in the downloaders with very few changes instead of adding a gazillion new lines of code to maintain. In particular, I fail to see the benefits of repackaging and of the split between repository and cache functionalities. What's the use case with only using one of those (repository & cache)? Furthermore, the fact that you cache packages with a higher priority means that for dev packages you will get stale data instead of getting the newest from packagist.

@Seldaek
Owner

See #1282 for what I mean by lighter approach. It works just as well and with a few more lines will have GC of items not used in more than X. It probably could use one more option to completely disable it, but you get the idea. I'm open for discussion but unless this adds great benefits over the other PR, I would rather have less code to maintain.

@Seldaek Seldaek closed this
@chEbba

Sorry for late answer. Yes it is not so simple, like file cache. When i started with this problem i understand that there are 2 ways of solving it in the current architecture:

  • Simple file cache on downloader level, when you cache downloaded files (as your #1282)
  • Package cache on higher level when you save a package not depending on dist format.

Then I remembered about another package manager concept - local repository (which i called system, as we already have one local in Composer). This repository is useful for development, CI, etc. when you don't need to create any remote repositry (ex. maven local repository).
And I got an idea to implement local storage + package cache (because package cache seems more clear with local storage).

So i will save this branch for future, if system storage concept will be interesting for Composer.

@Seldaek
Owner

Ok I see a bit better, but with my file cache the dist format does not matter either, it just stores it so the actual download step can be skipped later.

I agree it might be useful to have a concept of local repo, but then again you can already set a "composer" repository to a local filesystem path I believe (and if not, it wouldn't be a huge change), so I would rather reuse that than adding a lot of new code.

@patcon

@chEbba was the goal to provide a general means to support a feature like git clone --reference-type caches? Would your solution be the only one that supported this?

@Seldaek
Owner

@patcon I don't think this would help much, but that's interesting though, maybe we should do all clones to a central location and then reference the repo like that if it already exists. Only problem would be to do reference counting somehow to avoid using space for repos that aren't used anywhere on disk anymore. If you'd like to create a new issue for this I'd be happy to discuss it further. It would mean much faster clones which would be pretty cool.

@chEbba

@patcon I think this feature can be implemented on CvsDownloader level. We used such feature in one of our deploy system where we had a local storage with git repositories (we didn't use alternates just clone from local one, but i thnik alternates is better solution). It really reduces build time.

If you create a new issue please link it here. I'm interested in this discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Something went wrong with that request. Please try again.