Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for archive downloads without extension #11513

Closed
imme-emosol opened this issue Jun 14, 2023 · 17 comments · Fixed by #11520
Closed

Allow for archive downloads without extension #11513

imme-emosol opened this issue Jun 14, 2023 · 17 comments · Fixed by #11520
Labels

Comments

@imme-emosol
Copy link
Contributor

FileDownloader.php its getFileName seems to extract an extension based on the given url.

Recently I came across an archived package publication to a extensionless file name, e.g. ".../download".

Upon inspecting the curl-headers I saw the response included a Content-Disposition header, which I used to detect the extension, which seems to affect the chosen unarchiving procedure.
Later I thought about that Content-Type might be more widely used/available.

Anyway, perhaps a feature to consider adding into composer.

I temporarily resolved my issue by this replacing getFileName its content with this pile of characters.

With php hilighting (as that seems to get lost in the ```diff):

    protected function getFileName(PackageInterface $package, string $path): string
    {
        $extension = $this->getDistPath($package, PATHINFO_EXTENSION);
        if (!$extension) {
            $context=stream_context_create(['http'=>['method'=>'HEAD',],]);
            $headers = get_headers($package->getDistUrl(),true,$context);
            if ($headers['Content-Disposition']){
                $disposition = explode(';', $headers['Content-Disposition']);
                $disposition = array_map('trim', $disposition);
                foreach ($disposition as $k => $e) {
                    $f=explode('=',$e);
                    unset($disposition[$k]);
                    $disposition[$f[0]]=$f[1]??null;
                }
                if (isset($disposition['filename*'])) {
                    $fileName = str_replace('UTF-8\'\'', '', $disposition['filename*']);
                    $extension = pathinfo($fileName, PATHINFO_EXTENSION);
                }
                elseif (isset($disposition['filename'])) {
                    $fileName = substr($disposition['filename'], 1, -1);
                    $extension = pathinfo($fileName, PATHINFO_EXTENSION);
                }
            }
        }
        return rtrim($this->config->get('vendor-dir') . '/composer/tmp-' . md5($package . spl_object_hash($package)) . '.' . $extension, '.');
    }

The ```diff:

    protected function getFileName(PackageInterface $package, string $path): string
    {
-        return rtrim($this->config->get('vendor-dir') . '/composer/tmp-' . md5($package . spl_object_hash($package)) . '.' . $this->getDistPath($package, PATHINFO_EXTENSION), '.');
+        $extension = $this->getDistPath($package, PATHINFO_EXTENSION);
+        if (!$extension) {
+            $context=stream_context_create(['http'=>['method'=>'HEAD',],]);
+            $headers = get_headers($package->getDistUrl(),true,$context);
+            if ($headers['Content-Disposition']){
+                $disposition = explode(';', $headers['Content-Disposition']);
+                $disposition = array_map('trim', $disposition);
+                foreach ($disposition as $k => $e) {
+                    $f=explode('=',$e);
+                    unset($disposition[$k]);
+                    $disposition[$f[0]]=$f[1]??null;
+                }
+                if (isset($disposition['filename*'])) {
+                    $fileName = str_replace('UTF-8\'\'', '', $disposition['filename*']);
+                    $extension = pathinfo($fileName, PATHINFO_EXTENSION);
+                }
+                elseif (isset($disposition['filename'])) {
+                    $fileName = substr($disposition['filename'], 1, -1);
+                    $extension = pathinfo($fileName, PATHINFO_EXTENSION);
+                }
+            }
+        }
+        return rtrim($this->config->get('vendor-dir') . '/composer/tmp-' . md5($package . spl_object_hash($package)) . '.' . $extension, '.');
    }
@imme-emosol
Copy link
Contributor Author

If Content-Disposition , this might be interesting: https://github.com/cardinalby/phpContentDisposition .

@Seldaek
Copy link
Member

Seldaek commented Jun 22, 2023

Yeah I am not sure.. That seems pretty messy to me to do a request just to get the headers, and then do another request with the download after, also you may not have the proper auth setup at that point etc. Just overall not a workable generic solution.

Just out of curiosity though, what's the problem you were seeing without this? As far as I know the downloader/archive extraction used is defined by the dist type of the package and the filename/extension has no actual impact, only cosmetic.

@imme-emosol
Copy link
Contributor Author

Given a https://acme-registries.example.org/composer/packages.json :

{
    "packages": {
        "acme\/package": {
            "0.0.0-alpha230608": {
                "name": "acme\/package",
                "version": "0.0.0-alpha230608",
                "dist": {
                    "url": "https:\/\/acme-gitlab.example.org\/public-group\/composer-packages\/-\/package_files\/335\/download",
                    "type": "tar"
                }
            },
            "0.0.0-alpha230612": {
                "name": "acme\/package",
                "version": "0.0.0-alpha230612",
                "dist": {
                    "url": "https:\/\/acme-gitlab.example.org\/public-group\/composer-packages\/-\/package_files\/340\/download",
                    "type": "tar"
                }
            },
            "0.0.0-alpha230613": {
                "name": "acme\/package",
                "version": "0.0.0-alpha230613",
                "dist": {
                    "url": "https:\/\/acme-gitlab.example.org\/public-group\/composer-packages\/-\/package_files\/337\/download",
                    "type": "tar"
                }
            }
        }
    }
}

And a global composer config that uses that repository :

{
    "repositories": [
        {
            "type": "composer",
            "url": "https://acme-registries.example.org/composer/"
        }
    ]
}

Then composer install/update could not unpack the downloaded file.

  - Installing acme/package (0.0.0-alpha230613): Extracting archive
    Install of acme/package failed

In TarDownloader.php line 31:
                                                                                                                                                                  
  Cannot create phar '/app/vendor/composer/tmp-40b7db84222436ca03c817e685233d74', file extension (or combination) not recognised or the directory does not exist

@Seldaek
Copy link
Member

Seldaek commented Jun 22, 2023

Ok I guess this is a problem of the PharData class needing an extension to identify the file format. Can you try manually opening one of those files with PharData to see if passing Phar::TAR explicitly to $format or some other value would help fix this? See https://www.php.net/manual/en/phardata.construct.php

@imme-emosol
Copy link
Contributor Author

php -r '$a = new \PharData("/app/vendor/composer/tmp-dac7ef442f58594e6ad1da088dff7cdb",FilesystemIterator::SKIP_DOTS | FilesystemIterator::UNIX_PATHS,null,Phar::TAR);' 

Fatal error: Uncaught UnexpectedValueException: Cannot create phar '/app/vendor/composer/tmp-dac7ef442f58594e6ad1da088dff7cdb', file extension (or combination) not recognised or the directory does not exist in Command line code:1

@imme-emosol
Copy link
Contributor Author

The downloaded file seems to be downloaded into the cache with a .tar extension.

If I use that file instead of the one without extension in (/app/vendor/composer/tmp...), the new \PharData(.. does not throw anything in my face.

@stof
Copy link
Contributor

stof commented Jun 22, 2023

maybe the code could reuse the same logic than for the cache to add the extension

@Seldaek
Copy link
Member

Seldaek commented Jun 22, 2023

@stof sorry but what code do you mean?

@imme-emosol
Copy link
Contributor Author

I saw this message passing by with composer i -vvv

Writing /tmp/cache/files/acme/package/084c61ce0b7c660b97a1866d9ea2ae23103c4c81.tar into cache from /app/vendor/composer/tmp-dac7ef442f58594e6ad1da088dff7cdb

Seems to be from

$this->io->writeError('Writing '.$this->root . $file.' into cache from '.$source);

@Seldaek
Copy link
Member

Seldaek commented Jun 22, 2023

Yeah found the code in question

return $package->getName().'/'.$cacheKey.'.'.$package->getDistType();
- it's kinda chance that the dist type matches the extension phardata expects though. There is no smarts here about content types or whatever.

@Seldaek
Copy link
Member

Seldaek commented Jun 22, 2023

But yes, probably could make sure that if there is no file extension in url we at least append the dist type to the file name.. That'd be an easy fix and might help.

@Seldaek
Copy link
Member

Seldaek commented Jun 22, 2023

Ok please try with composer self-update --snapshot if it solves it for you :)

@imme-emosol
Copy link
Contributor Author

Just remembered the archive is also gzip`ed, but adding that to the PharData::__construct does not work either.

It seems that PharData does not look at the file contents at all.
As apparently the file did no longer exist, but PharData did not throw anything about that.

Putting an empty file into PharData seems to throw another type off message, it seems:

touch f.tar

Fatal error: Uncaught UnexpectedValueException: internal corruption of phar "/app/f.tar" (truncated entry) in Command line code:1

re: if solved: yes, it does solve it for me +1 😃

@imme-emosol
Copy link
Contributor Author

Thanks!

@Seldaek
Copy link
Member

Seldaek commented Jun 22, 2023

Ok cool, yes PharData isn't the smartest cookie it seems, but that's all we got to handle tar (and tbh that's one of the reasons we prefer the zip format even tho github does tgz as well..).

@imme-emosol
Copy link
Contributor Author

There seems to be a php-native untar.
https://packagist.org/packages/splitbrain/php-archive
https://github.com/splitbrain/php-archive/blob/master/src/Tar.php

But I can imagine this is not high on the list, as most systems able to produce .tar are capable of .zip as well.

@Seldaek
Copy link
Member

Seldaek commented Jul 1, 2023

Yeah also this doesn't support tar compression without the zip extension though so it's only so useful..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants