Installation of DSGE.jl broken on Julia v1.5 for Windows #1943

Closed
JuergenWiemers opened this issue Aug 6, 2020 · 13 comments

@JuergenWiemers

JuergenWiemers commented Aug 6, 2020

As mentioned here, installation on Windows seems to have been broken since Julia v1.1. I can confirm that this is still an issue on Julia v1.5 for Windows. When trying to install DSGE.jl in a clean environment I get:

ERROR: Error when installing package DSGE:
AssertionError: length(dirs) == 1
Stacktrace:
 [1] install_archive(::Array{Pair{String,Bool},1}, ::Base.SHA1, ::String) at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\Operations.jl:562
 [2] macro expansion at D:\buildbot\worker\package_win64\build\usr\share\julia\stdlib\v1.5\Pkg\src\Operations.jl:727 [inlined]
 [3] (::Pkg.Operations.var"#58#61"{Bool,Pkg.Types.Context,Dict{Base.UUID,Array{String,1}},Channel{Any},Channel{Any}})() at .\task.jl:356

The function install_archive() in Operations.jl checks whether the temporary folder, which includes the extracted package, contains exactly one path - excluding a possible spurious file called pax_global_header, which seemingly 7z might create on Windows. Starting on line 559 of Operations.jl:

dirs = readdir(dir)
# 7z on Win might create this spurious file
filter!(x -> x != "pax_global_header", dirs)
@assert length(dirs) == 1
unpacked = joinpath(dir, dirs[1])

However, for DSGE.jl the contents of the temporary folder that is created during the installation of DSGE.jl look like this:

julia> tp = "C:/Users/Juergen/AppData/Local/Temp/84mXPHT7jq6E/" # that's the random temp path
julia> readdir(tp)
13-element Array{String,1}:
 "4e6ee4544e19ad2bafb35de9c647ce2e1d3f74c7.data"
 "4e6ee4544e19ad2bafb35de9c647ce2e1d3f74c7.paxheader"
 "5b8fa286e6a8358c29532fad1b593b644169585e.data"
 "5b8fa286e6a8358c29532fad1b593b644169585e.paxheader"
 "634244007a6e3cbe50fba432cce3285b48c42ee2.data"
 "634244007a6e3cbe50fba432cce3285b48c42ee2.paxheader"
 "FRBNY-DSGE-DSGE.jl-a824dcd"
 "bc56c6b6969d5a0662d910e703089c2786d69168.data"
 "bc56c6b6969d5a0662d910e703089c2786d69168.paxheader"
 "e17f8432420fc31d1ddcca8003fba9b91968c096.data"
 "e17f8432420fc31d1ddcca8003fba9b91968c096.paxheader"
 "eef350acd564c108036f8adf58f321418d003fa1.data"
 "eef350acd564c108036f8adf58f321418d003fa1.paxheader"

Thus, filtering pax_global_header doesn't help because it isn't even there. Instead there are a number of .data and .paxheader files that are not filtered out, so @assert length(dirs) == 1 fails.

A quick fix would simply replace

filter!(x -> x != "pax_global_header", dirs)

with

filter!(x -> x != "pax_global_header" &&
             !endswith(x, ".data") &&
             !endswith(x, ".paxheader"), dirs)

(However, the fix is not really future proof, because installation of a package will break whenever 7z writes other spurious files which don't match the filter patterns above...). I should also mention that I never had this issue with any other package in Julia (and I installed quite a lot...).

Version info:

Julia Version 1.5.0
Commit 96786e22cc (2020-08-01 23:44 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-6700K CPU @ 4.00GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Environment:
  JULIA = "C:\Users\Juergen\AppData\Local\Programs\Julia 1.5.0\bin\"
  JULIA_EDITOR = "C:\Users\Juergen\AppData\Local\Programs\Microsoft VS Code\Code.exe"
  JULIA_NUM_THREADS = 8
@KristofferC
Sponsor Member

Or we remove that assert completely...

@JuergenWiemers
Author

Or that... but then the following line

unpacked = joinpath(dir, dirs[1])

assumes the relevant path to be the first element of dirs, which probably won't be the case, right?

BTW, I just rebuilt the sysimage using my fix above. Installation of DSGE.jl now works for me.
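
To make the concern about dirs[1] concrete, here is a minimal sketch of selecting the unpacked tree by checking which entry is actually a directory, rather than by position or by filtering known file names. The helper name single_unpacked_dir is hypothetical and not part of Pkg; the sketch only addresses which entry gets picked and does not repair any files that were extracted into the wrong place.

```julia
# Hypothetical helper (not part of Pkg): return the one real directory
# among the extracted entries, ignoring any stray files next to it.
function single_unpacked_dir(dir::AbstractString)
    subdirs = filter(name -> isdir(joinpath(dir, name)), readdir(dir))
    length(subdirs) == 1 ||
        error("expected exactly one directory in $dir, found $(length(subdirs))")
    return joinpath(dir, subdirs[1])
end

# Small demonstration with a layout similar to the listing in this issue.
function demo_single_unpacked_dir()
    mktempdir() do tmp
        mkdir(joinpath(tmp, "FRBNY-DSGE-DSGE.jl-a824dcd"))
        touch(joinpath(tmp, "4e6ee4544e19ad2bafb35de9c647ce2e1d3f74c7.data"))
        touch(joinpath(tmp, "4e6ee4544e19ad2bafb35de9c647ce2e1d3f74c7.paxheader"))
        return basename(single_unpacked_dir(tmp))
    end
end
```

Unlike a name-based filter, this does not need to know which spurious file names the extraction tool might produce.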

@KristofferC
Sponsor Member

Yeah! That's true... So we need to be a bit careful with that..

@StefanKarpinski
Sponsor Member

StefanKarpinski commented Aug 6, 2020

And this is why I wrote Tar.jl... which unfortunately doesn't help us with 1.5. Do you have a link to the archive that it's trying to install so that I can take a look at it and see what's going on? Seems like it may be a TAR archive with large files or some other features that 7z doesn't know what to do with. It might be possible to rewrite the archive so that it's less confusing to 7z or to configure the program differently.

@chenwilliam77

chenwilliam77 commented Aug 6, 2020

Do you have an idea where I might find this TAR archive? I'm not sure where to start looking for this file (FYI, I only have access to macOS and Linux, so if it's something specific to Windows, I won't be able to find that link). On the subject of large files, there are some large JLD2 files used for testing (50-60MB). These files are git tracked. If that may be an issue, then I can try reducing the size of those JLD2 files.

@KristofferC
Sponsor Member

The URL is https://api.github.com/repos/JuliaLang/MbedTLS.jl/tarball/2d94286a9c2f52c63a16146bb86fd6cdfbf677c6 but insert the correct organization, package name and tree hash.

@chenwilliam77

chenwilliam77 commented Aug 6, 2020

@StefanKarpinski
Sponsor Member

StefanKarpinski commented Aug 6, 2020

This tarball has some paths that are too long for the standard TAR format to handle, so it's being created with POSIX extended headers. The 7z tool doesn't understand these headers, so it extracts them as if they were files, and they end up in the root of the tarball instead of where they're supposed to be. Note that file sizes aren't the issue here, just the file names. Take a name like FRBNY-DSGE-DSGE.jl-a3f0158/test/reference/output_data/an_schorfheide/ss0/forecast/raw/decompdettrendpseudo_cond=none_para=full_vint=151001__an_schorfheide_ss0_cond=none_para=full_vint=151001.jld2 which, I have to say, is a bit much: the path is 195 bytes long and the file name alone is 109 bytes. The basic TAR format has a hard limit of 100 bytes for the file name and 155 bytes for the directory prefix of the path, so this cannot be put in a TAR archive without extended headers. Since 7z doesn't understand the extended format, there are a few options:

  1. Install a tar program on the client that can handle extended TAR features.
  2. Change the tarball so that it doesn't include files with such long names.
  3. Change the tarball so that the broken extraction is less broken.

The last option could be done by hand and would allow 7z to extract something a bit better—like a truncated version of these files—but it would still not be correct. There's simply no way to get 7z to extract this file tree correctly.
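
As a rough illustration of the 100/155-byte rule described above, the following sketch checks whether a path can be stored in a basic TAR header without extended headers. The function name fits_basic_tar is hypothetical, and the check only models the two length limits mentioned in this comment (a split at a "/" into a prefix of at most 155 bytes and a name of at most 100 bytes); it is not taken from any tar implementation.

```julia
# Hypothetical check: can `path` be stored in a basic TAR header,
# i.e. as name (<= 100 bytes) or as prefix (<= 155 bytes) * "/" * name?
function fits_basic_tar(path::AbstractString)
    ncodeunits(path) <= 100 && return true
    for i in eachindex(path)
        path[i] == '/' || continue
        prefix = path[1:prevind(path, i)]    # everything before this "/"
        name = path[nextind(path, i):end]    # everything after this "/"
        if !isempty(name) && ncodeunits(prefix) <= 155 && ncodeunits(name) <= 100
            return true
        end
    end
    return false
end

# A 109-byte file name can never fit, whatever directory it lives in.
long_file = "dir/" * "x"^109
# A 100-byte file name fits once the directory goes into the prefix.
ok_file = "dir/" * "x"^100
```

Package maintainers could run a check like this over the tracked files to find paths that force extended headers.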

Note, however, that the max total path length on Windows is 260 bytes, and since the total path of one of these files ends up being $HOME/.julia/artifacts/$sha1/$path and sha1 hashes are 40 bytes, that means that unless the user's home directory path is no longer than 260 - 195 - 19 - 40 = 6, which is quite unlikely, then as I understand it on Windows this tarball couldn't be extracted anyway since the file system won't support it.
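
The arithmetic above can be written out as a small sketch. The function name home_budget and the constant INFIX are hypothetical; the numbers (260-byte limit, 195-byte path, 40-byte hash, and the $HOME/.julia/artifacts/$sha1/$path layout) are the ones from this comment.

```julia
# Fixed part between $HOME and the hash; one more "/" follows the hash.
const INFIX = "/.julia/artifacts/"   # 18 bytes

# Bytes left for the home directory path before the total exceeds max_path.
function home_budget(path_len::Integer; max_path::Integer = 260, sha_len::Integer = 40)
    return max_path - path_len - (ncodeunits(INFIX) + 1) - sha_len
end

budget = home_budget(195)   # 260 - 195 - 19 - 40
```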

Bottom line: don't put paths this long in your artifacts or packages.

@JuergenWiemers
Author

JuergenWiemers commented Aug 7, 2020

Ah, ancient Windows restrictions! 😖 However, at least for Windows 10, version 1607 and above, there is another solution: enabling long paths! For whatever reason, long paths are opt-in, see here. Opting in is done by setting the DWORD LongPathsEnabled (creating it if it doesn't already exist) in the registry key HKLM\SYSTEM\CurrentControlSet\Control\FileSystem to 1 and restarting Windows.

After that, I could install DSGE.jl without using my dirty hack above.

I'm not sure why Microsoft chose to make this opt-in, because enabling long paths doesn't seem to have any real downside, see e.g. here:

There is one caveat. This new setting won’t necessarily work with every application out there, but it will work with most. Specifically, any modern applications should be fine, as should all 64-bit applications. Older 32-bit applications need to be manifested in order to work, which really just means that the developer has indicated in the application’s manifest file that the application supports longer paths. Most popular 32-bit apps should experience no problem. Still, you don’t risk anything by trying the setting out. If an application doesn’t work, the only thing that will happen is that it won’t be able to open or save files that are saved in places where the full path exceeds 260 characters.

Still, a more informative error message might be useful. Package maintainers who develop exclusively on Linux might not even be aware of the 260-byte limitation and therefore might not follow Stefan's sound advice:

Bottom line: don't put paths this long in your artifacts or packages.

And a lot of Windows users will simply give up (maybe even on Julia) as soon as they get

AssertionError: length(dirs) == 1

Maybe something like:

@assert length(dirs) == 1 "At least one of the paths in the package is longer than the Windows limit of 260 bytes. 
Enable long paths (if possible) and/or report this issue to the package maintainers."

(Here I implicitly assume that this error will only be triggered in this particular situation, which is probably too optimistic.)

@StefanKarpinski
Sponsor Member

The Windows limit isn't the main issue here though; it's not even getting that far. The main problem is that the 7z extraction tool doesn't know how to handle POSIX extended TAR features, which are necessary to put these files into a tarball. It never reaches the point where Windows can have a problem with path lengths, because 7z extracts the files with long names incorrectly into the root of the folder. I agree that the current error message is not great, but the proposed message would be misleading. The real solution going forward is to use the Pkg protocol to get better formed tarballs and Tar.jl on the client for extraction. Tarballs sent via the Pkg protocol don't have the package name as a leading directory like this, so there's no need for the assertion or for figuring out which item to look for the tree under. Tar.jl knows how to extract tarballs with POSIX extended features, so it will extract these correctly. At that point extraction can hit the Windows path length limit, which should hopefully produce a clear enough error message.
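
For illustration, a minimal round-trip sketch with Tar.jl, assuming a Julia version where Tar is available (it is a standard library from Julia 1.6 on, and a separate package before that). It uses only Tar.create and Tar.extract; the 114-byte file name and the helper name roundtrip_long_name are made up for the demonstration.

```julia
using Tar

# Create a tarball containing one file whose name exceeds the 100-byte
# basic TAR limit, extract it again, and return what came out.
function roundtrip_long_name()
    long_name = "x"^109 * ".jld2"            # 114 bytes, made-up name
    mktempdir() do src
        write(joinpath(src, long_name), "payload")
        tarball = Tar.create(src)            # writes to a temporary file
        out = Tar.extract(tarball)           # extracts to a temporary directory
        names = readdir(out)
        content = read(joinpath(out, names[1]), String)
        rm(tarball)
        rm(out; recursive = true)
        return names, content, long_name
    end
end
```

If the long name survives the round trip, the file was stored and restored under its full name rather than as a stray header file in the root.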

In short, aside from improving the error message here, I don't think there's a whole lot to do for 1.5. The most pragmatic immediate solution is to modify the DSGE package so that it does not include such long file names.

@chenwilliam77

I am working on fixing the problem for DSGE.jl. The problematic files appear to just be git tracked output produced during tests. In principle, the tests should still be able to run if I delete the output files after the tests finish, so these files will no longer need to be git tracked. Thanks for all the assistance!

@StefanKarpinski
Sponsor Member

Note that if someone runs the tests on Windows, they may end up hitting the path limit. Not sure if that's a concern.

@chenwilliam77

Thanks for the heads up; I'll leave a note on the main page about which tests would be problematic to Windows users.
