
.julia stored on network drive is very slow #9944

Closed
riegaz opened this issue Jan 27, 2015 · 30 comments
Labels
domain:packages Package management and loading performance Must go faster system:windows Affects only Windows

Comments

@riegaz

riegaz commented Jan 27, 2015

If the .julia folder is on a network drive (e.g. the company sets the user path to a network drive), Julia does not work well when it comes to package management.

Pkg.update() needs 30 minutes and Pkg.add("") takes up to 20 minutes, so everything slows down extremely or just hangs forever.

I found related topics about this issue several times on the internet. Could we solve this now?

@simonster
Member

Pretty sure there's nothing we can do until #4158/#7584

@mschauer
Contributor

Is this a general problem or only under windows? The current workaround is setting JULIA_PKGDIR to a local path, see #4334 .
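The workaround can be sketched at the shell level; the path below is just an example of a local directory, not a prescribed location (on Windows the variable would be set via the System Properties dialog or `setx` instead):

```shell
# Keep Julia's package directory on a fast local disk instead of the
# network-mounted home directory (example path; any local path works).
export JULIA_PKGDIR="$HOME/.julia-local"
mkdir -p "$JULIA_PKGDIR"
echo "JULIA_PKGDIR=$JULIA_PKGDIR"
```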

@ihnorton ihnorton added the system:windows Affects only Windows label Jan 30, 2015
@vtjnash vtjnash changed the title .julia stored on network drive causes problems .julia stored on network drive is very slow Jan 31, 2015
@tkelman
Contributor

tkelman commented Jan 31, 2015

I wrote this somewhere else, but I'll ask again. Is there any way to limit the number of tasks that get spawned by an @async block? Spawning several dozen copies of git.exe simultaneously is kind of a disaster.
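Not a fix inside Pkg itself, but the general remedy is a bounded worker pool: run at most N subprocesses at a time instead of one per package. A shell-level sketch of the same idea, using `xargs -P` with an arbitrary 4-way limit and placeholder package names:

```shell
# Process a list of items with at most 4 concurrent workers, rather than
# spawning one subprocess per item all at once (xargs -P caps parallelism).
printf '%s\n' pkg-a pkg-b pkg-c pkg-d pkg-e pkg-f |
  xargs -n 1 -P 4 sh -c 'echo "processing $0"'
```

The output order is nondeterministic, but no more than four workers ever run simultaneously.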

@JeffBezanson JeffBezanson added domain:packages Package management and loading performance Must go faster labels Feb 3, 2015
@ihnorton
Member

These simple git config changes might be worth trying: http://stackoverflow.com/a/24045966

There is also an interesting comment here (and several others below that one) which indicates that the slowness might be caused by interaction between msysgit and the mingw syscall implementation for setuid, leading to a blocking call for domain server lookup. So this is possibly not completely caused by the forking-slowness issues we hope to solve with libgit2.

@tkelman
Contributor

tkelman commented Feb 12, 2015

mingw msys syscall

Switching to libgit2 will eliminate the need to carry around msys-1.0.dll.

@PallHaraldsson
Contributor

"Pretty sure there's nothing we can do until #4158/#7584": those are closed and I see two merges. Should this be closed? Is it only fixed in 0.4? Or do we wait until it is released, since 0.3 is not fixed?

Is there a policy on when to close? Does an issue only have to be fixed in master or also backported?

I think a friend had this issue (and/or it was a firewall issue). I do not have Windows myself. Can I start recommending Julia to him again?

@pao
Member

pao commented Aug 28, 2015

I'm not clear if those were the direct cause or merely adjunct. Easiest way to figure it out would be for someone to test it on a current build.

@jakebolewski
Member

No, network filesystems and git do not like each other. The only way to speed this up is to reduce the number of individual file reads/writes we do.

@tkelman
Contributor

tkelman commented Aug 28, 2015

On both those issues:

Closing this in favor of #11196

@wildart
Member

wildart commented Sep 14, 2015

We need to start thinking about using a bare repo for METADATA after #11196.

@tkelman
Contributor

tkelman commented Sep 14, 2015

A bare repo is just the .git contents without a working copy of the files, right? So you wouldn't be able to manually modify files except through libgit2? I can see the justification for maybe wanting a shallow clone of metadata, but not bare.

@wildart
Member

wildart commented Sep 14, 2015

Developers could use a proper repo, but for ordinary users a bare repo could provide a good speedup. FYI, a bare repo is 7 MB and 26 files vs. a full repo at 70 MB and ~16,000 files.

@tkelman
Contributor

tkelman commented Sep 15, 2015

Hm, that is tempting. A shallow clone would be somewhere in between, right (at least in size)? Would libgit2 have functions to convert local checkouts to or from bare repos?
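With plain git, converting back and forth is just cloning in each direction; libgit2 exposes equivalent clone options, though the exact API calls are not shown here. A sketch using a throwaway repo:

```shell
set -e
tmp="$(mktemp -d)" && cd "$tmp"

# Build a small "full" repo to play with.
git init -q full && cd full
echo hello > README
git add README
git -c user.email=a@b -c user.name=t commit -q -m init
cd ..

# Full checkout -> bare repo: just the .git contents, no working tree.
git clone -q --bare full full.git
test ! -e full.git/README && echo "bare: no working copy"

# Bare repo -> full checkout again.
git clone -q full.git restored
test -f restored/README && echo "restored: working copy is back"
```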

@yuyichao
Contributor

Seems that a shallow clone won't have much effect, because the 70M is mainly the checkout. (!!?)

Is it really true that a directory with two small files takes 12K? Or is this just because ext4 is not very efficient at storing lots of small files/directories?

yyc2:~/projects/tmp/METADATA.jl
yuyichao% du -h ZMQ/versions/0.1.6/
12K     ZMQ/versions/0.1.6/
yyc2:~/projects/tmp/METADATA.jl
yuyichao% ll -h ZMQ/versions/0.1.6/
total 16K
drwxr-xr-x  2 yuyichao yuyichao 4.0K Sep 14 21:03 ./
drwxr-xr-x 25 yuyichao yuyichao 4.0K Sep 14 21:03 ../
-rw-r--r--  1 yuyichao yuyichao   15 Sep 14 21:03 requires
-rw-r--r--  1 yuyichao yuyichao   41 Sep 14 21:03 sha1

@wildart
Member

wildart commented Sep 15, 2015

We need to benchmark how fast the branch tree can be traversed and blobs can be read vs. file system operations.

@nalimilan
Member

@wildart You keep saying "bare repo", but what you mean is "shallow", right? How would a bare repo divide the repo size by more than 2? OTOH, I've found this about shallow repos, which says that the gain is quite limited: https://blogs.gnome.org/simos/2009/04/18/git-clones-vs-shallow-git-clones/

If the gain is larger than that, using a shallow repo sounds like a good idea to me. It should be enough for all non-developers, and it would work even for people who would like to submit a PR (since git 1.9). The only missing feature would be looking at the history (which you can easily do on GitHub for occasional needs).

@yuyichao
Contributor

@nalimilan No, I think he really meant a bare repo and not a shallow clone. The reason for the saving seems to be the large number of small files and directories (see my comment above). Not sure if this is ext4 specific.

Last time I checked (admittedly pre-2.0), some git servers get very confused about pulling (yes, pulling) from a shallow clone when there are branches from other remotes. (Might be fixed now, and may not matter for METADATA.jl.)

Edit: This seems to be caused by alignment with the block size (4K), and it seems to be the typical case for most file systems. Quoted from this thread:

If a file contains any data at all (even a single byte), it will occupy one block on the disk (which is typically 4k these days). One block cannot be shared between files. This means that the space of that whole block will not be available for other files, so it is considered "used".

In the example above, the 12k are taken by the directory and the two files in it.

@wildart
Member

wildart commented Sep 15, 2015

It is bare, not shallow. With libgit2 you can read repo content directly from blobs; there is no need to extract anything to disk.
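The command-line analogue of reading a blob without a checkout is `git cat-file`; in a bare repo this works with no working tree at all. A sketch with a toy repo standing in for METADATA (the file layout mimics the ZMQ example above):

```shell
set -e
tmp="$(mktemp -d)" && cd "$tmp"

# A toy repo standing in for METADATA: one file recording a sha1.
git init -q meta && cd meta
mkdir -p ZMQ/versions/0.1.6
echo "deadbeef" > ZMQ/versions/0.1.6/sha1
git add -A
git -c user.email=a@b -c user.name=t commit -q -m add
cd ..

# Bare clone: no files on disk, but every blob is still readable.
git clone -q --bare meta meta.git
git --git-dir=meta.git cat-file -p HEAD:ZMQ/versions/0.1.6/sha1
```

This prints `deadbeef` straight from the object database, without ever checking the file out.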

@nalimilan
Member

Ah, OK. Indeed I just checked with DataFrames.jl, and the bare repo is 7.4 MB while the shallow clone is 15 MB and the full one 18 MB. Unfortunately, a bare repo means we would lose the ability to see the files, let alone make a pull request...

@wildart
Member

wildart commented Sep 15, 2015

METADATA does not have any source code. No worries.

@tkelman
Contributor

tkelman commented Sep 16, 2015

No source code, but manual modification of files there for pull requests is occasionally necessary, beyond what gets done programmatically by Pkg.tag/Pkg.publish. It's a little easier to test the consequences of local modifications to METADATA in-place on a real repo. So if we're considering doing this bare checkout I think we'd need functionality to unpack/repack a real repo back and forth to a bare one.

@wildart
Member

wildart commented Sep 16, 2015

available() call time (all files on an SSD drive):

  • Bare repo: 0.324362 seconds (1.07 M allocations: 44.812 MB, 10.29% gc time)
  • Full repo: 0.486640 seconds (1.29 M allocations: 56.850 MB, 11.98% gc time)

@wildart
Member

wildart commented Sep 16, 2015

@tkelman A bare repo should be used by users who are not going to develop packages. Developers need to clone the full repo to use the Pkg development-related calls (tag, publish, etc.).

@yesimon

yesimon commented Feb 6, 2017

I'm currently encountering this problem. Pkg.available() takes 20+ minutes on an NFS cluster, which makes it extremely difficult to manage/update packages. The bulk of the time seems to be spent reading files and traversing the filesystem in the very large METADATA repository, which is only going to get bigger as time goes by. Is there any way to allow usage of Pkg without traversing all files (I understand that things might be out of date as a result)?

@KristofferC
Sponsor Member

You could try https://github.com/KristofferC/Pkg25.jl

@yesimon

yesimon commented Feb 6, 2017

@KristofferC Thanks - it is much faster! At least now it is plausible to use Pkg if not completely seamless yet. (Still one file per package!)

julia> @time Pkg25.available()
 79.702075 seconds (122.71 k allocations: 7.644 MB)

@pcarbo

pcarbo commented Jan 9, 2018

@KristofferC To use Julia on a compute cluster with NFS, is it now recommended that we use Pkg3.jl, or should we continue to use Pkg25.jl?

@KristofferC
Sponsor Member

Pkg3 is still a work in progress, however, please try it out and report back! It should be significantly faster. Open an issue on the Pkg3 repo if you encounter any problems.

@pcarbo

pcarbo commented Jan 9, 2018

@krislock Thank you for the quick response.

@simonbyrne
Contributor

I think this is now solved by the new Pkg. If it occurs again, please open at https://github.com/JuliaLang/Pkg.jl
