Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: cache file names longer than 260 #4349

Closed

Conversation

StarsbySea
Copy link

Fix: #4327
Normally, this function behaves as before when the cache filename is less than 260 characters long.
The modified program is designed to shorten cached filenames while retaining as much complete URL information as possible.
If the filename is greater than or equal to 260 characters in length, it will first attempt to remove everything after the question mark from the URL, and if it is still greater than 260 in length, only the filename in the URL will be retained.
Tested locally and passed.

@pratikpc
Copy link
Contributor

pratikpc commented Nov 5, 2021

@StarsbySea which manifest currently faces the issue?

@StarsbySea
Copy link
Author

@StarsbySea which manifest currently faces the issue?

like this: https://github.com/42wim/scoop-bucket/blob/master/bucket/msedge-stable.json

@mbl-35
Copy link

mbl-35 commented Feb 2, 2022

I'd prefer this code to prevent the reach of windows MAX_PATH limitation with the temporary file extensions (`.download) added while downloading ..

function cache_path($app, $version, $url) {
    $MAX_PATH=260
    $downloadExt='.download'
    $cachefile = "$cachedir\$app#$version#$($url -replace '[^\w\.\-]+', '_')"
    $cachefile = $cachefile.substring(0, [System.Math]::Min($MAX_PATH, $cachefile.Length) - $downloadExt.Length)
    return $cachefile
}

I had the problem with downloading googlechrome extensions like:
https://clients2.google.com/service/update2/crx?response=redirect&acceptformat=crx2,crx3&prodversion=97.0&x=id%3Dmmhlniccooihdimnnjhamobppdhaolme%26installsource%3Dondemand%26uc#/dl.7z

Local test successful..

An other solution will be to create a SHA1 of the computed string as download filename vs. directly save it, as follow (but not tested):

function cache_path($app, $version, $url) {
    $enc   = [system.Text.Encoding]::UTF8
    $sha1 = New-Object System.Security.Cryptography.SHA1CryptoServiceProvider
    $hash = $sha1.ComputeHash($enc.GetBytes($url))
    $cachefile="$app#$version#$($hash)" -replace '\s',''
    "$cachedir/$cachefile"
}

@rashil2000
Copy link
Member

@StarsbySea please retarget this PR to develop branch

@StarsbySea StarsbySea changed the base branch from master to develop February 2, 2022 14:29
@Cologler
Copy link

Cologler commented Sep 9, 2022

In the Windows API (with some exceptions discussed in the following paragraphs), the maximum length for a path is MAX_PATH, which is defined as 260 characters.

The PR won't work because you are testing the file name, not the file path.

I prefer the 2nd solution mentioned by @mbl-35.

@r15ch13
Copy link
Member

r15ch13 commented Sep 9, 2022

also, should it be only 259 letters because of the NULL termination?

A local path is structured in the following order: drive letter, colon, backslash, name components separated by backslashes, and a terminating null character. For example, the maximum path on drive D is "D:\some 256-character path string" where "" represents the invisible terminating null character for the current system codepage.
https://docs.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation

But using SHA1 for the URL is the better option.

Fix: ScoopInstaller#4327
Normally, this function behaves as before when the cache filename is less than 260 characters long.
The modified program is designed to shorten cached filenames while retaining as much complete URL information as possible.
If the filename is greater than or equal to 260 characters in length, it will first attempt to remove everything after the question mark from the URL, and if it is still greater than 260 in length, only the filename in the URL will be retained.
Tested locally and passed.
@StarsbySea
Copy link
Author

StarsbySea commented May 15, 2023

Following @mbl-35 and @r15ch13's suggestion, some changes have been made, as follows:

The final output filename is composed as follows:

"$cachedir": This is the directory where the cached files are stored.
"$app#": This is the name of the application that the cached file is associated with, followed by the # character.
"$version#": This is the version of the application that the cached file is associated with, followed by the # character.
"$sha1": This is the SHA1 hash of the URL from where the file was downloaded. This ensures that even if the same application and version are being downloaded from different URLs, they will not overwrite each other in the cache.

So, for example, if the application is "app1", the version is "v1.0", and the SHA1 hash of the URL is "abc123", and the cache directory is at "C:\cache", the output will be: "C:\cache\app1#v1.0#abc123".

Signed-off-by: StarsbySea <66008060+StarsbySea@users.noreply.github.com>
Signed-off-by: StarsbySea <66008060+StarsbySea@users.noreply.github.com>
@r15ch13
Copy link
Member

r15ch13 commented May 29, 2023

Looks good 👍🏽

image

@rashil2000
Copy link
Member

@StarsbySea Is there any advantage of using SHA256 instead of SHA1? Since cryptographic security is not a priority here, SHA1 would be better as it is shorter.

@niheaven
Copy link
Member

niheaven commented May 1, 2024

Inherited by #5929

@niheaven niheaven closed this May 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cached files with URL and filename > 260 characters issue and possible solution
7 participants