Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Already on GitHub? Sign in to your account

Package cache #915

Closed
wants to merge 12 commits into
from

Conversation

Projects
None yet
7 participants
Contributor

chEbba commented Jul 15, 2012

Reviewed system package cache from #62.

How it works:

  • After package was downloaded it is copied (actualy repacked from targetDir) to local system repository with modified dist url.
  • Local system repository is added to the RepositoryManager as the first repository.
  • If a package is found in the local system reposity it is installed from it (just copied).

Options:

  • system-repository (default false) to enable system repository in manager,
  • package-cache (default false) to store downloaded packages in the system repository

Notes:

  • This PR comes with improved Archive support (for now only with zip and tar), and can be used in downloaders (1 ArchiveDownloader with Extractors).
  • Packages are compressed with zip (if enabled) or with tar (fallback).
  • Works only for dists.

Planned improvements:

  • Additional configuration options.
  • CLI option to disable cache for command.
  • CLI commands for system repository manipulation.

Discussion:

  • General workflow and details.
  • Configuration and factory creation.

@stof stof commented on the diff Jul 15, 2012

src/Composer/Storage/RepositoryStorage.php
+namespace Composer\Storage;
+
+use Composer\Package\PackageInterface;
+use Composer\Package\Loader\ArrayLoader;
+use Composer\Package\Dumper\ArrayDumper;
+use Composer\Repository\WritableRepositoryInterface;
+
+/**
+ * RepositoryStorage write packages to repository and stores them inside internal storage
+ *
+ * @author Kirill chEbba Chebunin <iam@chebba.org>
+ */
+class RepositoryStorage implements StorageInterface
+{
+ private static $loader;
+ private static $dumper;
@stof

stof Jul 15, 2012

Contributor

why making them static ?

@chEbba

chEbba Jul 15, 2012

Contributor

Because they are used in static method convertPackage(). Why is this method static? Don't know :) It was more clear for me to have such util method as static with cached dumper and loader.

@stof stof and 1 other commented on an outdated diff Jul 15, 2012

src/Composer/Util/Archive/ZipArchiver.php
+ $zip->close();
+ }
+
+ /**
+ * Add directory to archive recursively
+ *
+ * @param ZipArchive $zip Archive file
+ * @param string $dir Direory to add
+ * @param string $parent Parent directory
+ *
+ * @throws RuntimeException On any error
+ */
+ private function addDirectoryToArchive(ZipArchive $zip, $dir, $parent = '')
+ {
+ if ($parent) {
+ if (!$zip->addEmptyDir($parent)) {
@stof

stof Jul 15, 2012

Contributor

you should use a single if for it, using &&

@chEbba

chEbba Jul 15, 2012

Contributor

agree

@stof stof and 1 other commented on an outdated diff Jul 15, 2012

src/Composer/Util/Archive/ZipArchiver.php
+ {
+ if ($parent) {
+ if (!$zip->addEmptyDir($parent)) {
+ throw new RuntimeException("Can not add directory '{$dir}' to zip archive");
+ }
+ }
+ /* @var DirectoryIterator $fileInfo */
+ foreach (new DirectoryIterator($dir) as $fileInfo) {
+ if ($fileInfo->isDot()) {
+ continue;
+ }
+
+ if ($fileInfo->isDir()) {
+ $this->addDirectoryToArchive($zip, $fileInfo->getPathname(), $parent . $fileInfo->getFilename() . '/');
+ } else {
+ if (!$zip->addFile($fileInfo->getPathname(), $parent . $fileInfo->getFilename())) {
@stof

stof Jul 15, 2012

Contributor

this should use elseif instead of nesting the if in the else

@chEbba

chEbba Jul 15, 2012

Contributor

agree

@stof stof and 1 other commented on an outdated diff Jul 15, 2012

src/Composer/Util/Archive/ZipArchiver.php
+ * @param string $dir Direory to add
+ * @param string $parent Parent directory
+ *
+ * @throws RuntimeException On any error
+ */
+ private function addDirectoryToArchive(ZipArchive $zip, $dir, $parent = '')
+ {
+ if ($parent) {
+ if (!$zip->addEmptyDir($parent)) {
+ throw new RuntimeException("Can not add directory '{$dir}' to zip archive");
+ }
+ }
+ /* @var DirectoryIterator $fileInfo */
+ foreach (new DirectoryIterator($dir) as $fileInfo) {
+ if ($fileInfo->isDot()) {
+ continue;
@stof

stof Jul 15, 2012

Contributor

why skipping hidden files if they were part of the original package ? it means you are only caching a partial package

@chEbba

chEbba Jul 15, 2012

Contributor

But i think it would be better to use SKIP_DOTS flag.

@stof

stof Jul 15, 2012

Contributor

ah sorry

@stof stof and 1 other commented on an outdated diff Jul 15, 2012

src/Composer/Factory.php
@@ -147,6 +147,16 @@ public function createComposer(IOInterface $io, $localConfig = null)
// initialize repository manager
$rm = $this->createRepositoryManager($io, $config);
+ // add package cache
+ $storage = null;
+ if ($config->get('package-cache')) {
+ $systemRepository = $this->addSystemRepository($rm, $config);
+ $storage = new Storage\RepositoryStorage(
+ $systemRepository,
+ new Storage\ArchiveStorage($config->get('home') . '/repository', new Util\Archive\ZipArchiver())
@stof

stof Jul 15, 2012

Contributor

you should configure it only when the zip archive exists as it depends on it whereas composer currently does not (it tries to fallback to the system unzip)

@chEbba

chEbba Jul 15, 2012

Contributor

Good note. I think it should be some "guesser" for archiver type (need to implement all archivers), and fallback to no-cache or some plain format (just a directory?). I think Factory integration is the most unclear thing in this PR :(

@stof stof commented on the diff Jul 15, 2012

src/Composer/Downloader/CachedDownloadManager.php
+ /**
+ * Get storage
+ *
+ * @return StorageInterface
+ */
+ public function getStorage()
+ {
+ return $this->storage;
+ }
+
+ /**
+ * {@inheritDoc}
+ */
+ public function download(PackageInterface $package, $targetDir, $preferSource = null)
+ {
+ parent::download($package, $targetDir, $preferSource);
@stof

stof Jul 15, 2012

Contributor

shouldn't it try to find it in the cache first ?

@chEbba

chEbba Jul 15, 2012

Contributor

No, (for current workflow). If package was in the system repository (cache) downloader will already try to download this package from local storage (because dependency resolver should choose it instead of remote)

You write to cache at DownloadManager and read from it from RepositoryManager.
It has following flaw: in CachedDownloadManage you assume $targetDir contains correctly installed package after parent::install() or parent::update(). It is not true, as installers does not run at this point yet. You should either cache downloaded content (aka http cache) at Download Manager or installed packages (it is hard at this point as there is no way to tell how package spreads to filesystem after installation) at Repository Manager.
It is possible to add some persistant key-value cache to installers (so installer can cache data per package and reuse it), but I doubt it would make more benefit than transport cache.

This pull request fails (merged 8fb07f1e into f128182).

This pull request fails (merged fc7d1ac1 into f128182).

This pull request passes (merged 8ebd4b74 into f128182).

Contributor

chEbba commented Jul 16, 2012

@palex-fpt Actually it is not a simple cache in one place when you try to replace some havy operation with simple one. It consits of 2 concepts: copy all packages to the dedicated repository (like some proxy), use fast local repository for packages. This concepst can be used separately. You can cache evrething but not enable the system repository. Or you can use system repository with no auto-cache.
About $targetDir: at the downloader level there is nothing about package installation. I assume $targetDir contains raw package (with no idea about installation). At this level i have package metada and raw source in the directory, so i can do with this package whatever i whant. e.g can store my copy of packages in zip (or tar.gz or plain) instead of origin format.

Oh, I got the point. Your repack packages to local distribution.
Just be sure to add one level of directory structure to zip archive, as ZipDownloader (and any ArchiveDownloader descendant) do special processing with containers with only directory at top level.

Contributor

chEbba commented Jul 16, 2012

@palex-fpt For now it compresses targetDir content as is (without targetDir itself in the archive).

@palex-fpt palex-fpt referenced this pull request Jul 18, 2012

Closed

pear2 autoload issue #921

What append to this issue? Package cache will be great for continuous integration when you have to rebuilt everything at each commit to run you test...

Contributor

chEbba commented Oct 26, 2012

I will update this PR on weekend with some improvements for defaults and separation of the system repository and cache.

Contributor

chEbba commented Oct 27, 2012

Rebased and improved. Description was updated.
Configuration is separated to 2 options: system-repository & config-cache.
Add tar archiver to support systems without zip.

Owner

Seldaek commented Nov 1, 2012

@chEbba maybe I missed something but it seems to me like this is overly complex. We already have a Cache class that could be integrated in the downloaders with very few changes instead of adding a gazillion new lines of code to maintain. In particular, I fail to see the benefits of repackaging and of the split between repository and cache functionalities. What's the use case with only using one of those (repository & cache)? Furthermore, the fact that you cache packages with a higher priority means that for dev packages you will get stale data instead of getting the newest from packagist.

Owner

Seldaek commented Nov 1, 2012

See #1282 for what I mean by lighter approach. It works just as well and with a few more lines will have GC of items not used in more than X. It probably could use one more option to completely disable it, but you get the idea. I'm open for discussion but unless this adds great benefits over the other PR, I would rather have less code to maintain.

@Seldaek Seldaek closed this Nov 11, 2012

Contributor

chEbba commented Nov 13, 2012

Sorry for late answer. Yes it is not so simple, like file cache. When i started with this problem i understand that there are 2 ways of solving it in the current architecture:

  • Simple file cache on downloader level, when you cache downloaded files (as your #1282)
  • Package cache on higher level when you save a package not depending on dist format.

Then I remembered about another package manager concept - local repository (which i called system, as we already have one local in Composer). This repository is useful for development, CI, etc. when you don't need to create any remote repositry (ex. maven local repository).
And I got an idea to implement local storage + package cache (because package cache seems more clear with local storage).

So i will save this branch for future, if system storage concept will be interesting for Composer.

Owner

Seldaek commented Nov 13, 2012

Ok I see a bit better, but with my file cache the dist format does not matter either, it just stores it so the actual download step can be skipped later.

I agree it might be useful to have a concept of local repo, but then again you can already set a "composer" repository to a local filesystem path I believe (and if not, it wouldn't be a huge change), so I would rather reuse that than adding a lot of new code.

patcon commented Nov 13, 2012

@chEbba was the goal to provide a general means to support a feature like git clone --reference-type caches? Would your solution be the only one that supported this?

Owner

Seldaek commented Nov 13, 2012

@patcon I don't think this would help much, but that's interesting though, maybe we should do all clones to a central location and then reference the repo like that if it already exists. Only problem would be to do reference counting somehow to avoid using space for repos that aren't used anywhere on disk anymore. If you'd like to create a new issue for this I'd be happy to discuss it further. It would mean much faster clones which would be pretty cool.

Contributor

chEbba commented Nov 13, 2012

@patcon I think this feature can be implemented on CvsDownloader level. We used such feature in one of our deploy system where we had a local storage with git repositories (we didn't use alternates just clone from local one, but i thnik alternates is better solution). It really reduces build time.

If you create a new issue please link it here. I'm interested in this discussion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment