dat share is slow with lots of files (#915)
update: it gets much slower over time. i tried the original …

@mafintosh feross says you know why?

update: even with a small number of entries per directory, it still grinds to a halt after a few hundred MB :/
Anyway you could try this on a Mac as well?
…On Sun, Jan 14, 2018, 22:55 DC wrote:
@feross @mafintosh
update: even with a small number of entries per directory, it still grinds to a halt after a few hundred MB :/
[image: https://user-images.githubusercontent.com/169280/34920587-e9946736-f942-11e7-8b58-e1efa6b2f010.png]
@mafintosh sure -- i just ran it on Mac using the profiler. it looks like the issue might be in append-tree
update: the issue almost certainly involves append-tree

adding n files takes O(n^2) time and O(n^2) space. this looks bad. after further investigation, i found out that deep in append-tree, this function runs in O(n) time, appending a log entry of size O(n) if the folder contains n files. this means that adding n files takes O(n^2) time and writes O(n^2) bytes to the log. cc @mafintosh

(the folder i'm testing with contains ~2500 files totalling 30MB)
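The quadratic blow-up described in the comment above can be illustrated with a toy model. This is not append-tree's actual code; `totalLogBytes` and its parameters are hypothetical. It only shows the arithmetic: if appending the i-th file writes a log entry whose size is proportional to the number of files already indexed, the total bytes written sum to n(n+1)/2, i.e. O(n^2).

```javascript
// Toy model (NOT append-tree's real implementation): assume appending
// file i writes a log entry referencing all i files indexed so far,
// so each entry costs O(i) bytes.
function totalLogBytes(nFiles, bytesPerIndexEntry = 1) {
  let total = 0;
  for (let i = 1; i <= nFiles; i++) {
    total += i * bytesPerIndexEntry; // entry for file i: O(i) bytes
  }
  return total; // = n * (n + 1) / 2, i.e. O(n^2)
}

// Doubling the file count roughly quadruples the total log size:
const ratio = totalLogBytes(5000) / totalLogBytes(2500);
console.log(ratio); // ratio ≈ 4, the signature of quadratic growth
```

Under this model, going from 2,500 files to 300,000 files (120x more) would multiply total log work by roughly 14,000x, which is consistent with the thread's observation that throughput collapses as the archive grows.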
Hi, just bumping this one after 18 months. Is this something that can be fixed, or is it fundamental? I'm just getting started with dat and was super pumped until I hit this. Honestly, I'm still super pumped about dat, but this one makes one intended use infeasible. Here's hoping there is still hope for this issue. Thanks all!
There's a new version of dat coming that improves performance quite a bit.
Here's the new module that enables this: https://github.com/mafintosh/hypertrie
Current timeline is to release the next dat cli with this new update by the end of the year. Until then, you can use hyperdrive-daemon, which has most of the basic functionality. cc @andrewosh
@okdistribute any update on that release? I have similar data with lots of files I'd like to share, and have been waiting for this issue to be closed for a while. Thanks
@Koeng101 @andrewosh is ironing out the last of the bugs in hyperdrive-daemon, IIRC. It should be good to start testing if you're okay with potentially needing to rebuild your archives in the future. 😁
@feross and i are creating a demo of Simple English Wikipedia hosted on dat.

dat share-ing the dump (1 big file) is fast

setup

results

we are calling dat share on one file, total 1.2GB:

dat share-ing the articles (small files) runs very slowly

(notice that stat-ing all 300k files only takes <4 seconds, so that's not the bottleneck...)

we are calling dat share on ~300k files, totalling 3GB:

[... runs at about ~150KB/s, hasn't finished yet ...]

tldr;

dat share throughput is 1000x less with small files. these files are about 10KB on average, and dat share is processing just a couple of them per second
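The reported numbers can be sanity-checked with back-of-envelope arithmetic. The figures below are the approximate values stated in this issue (~300k files, ~10KB average, ~150KB/s observed throughput); they are inputs to an estimate, not measurements from the dat codebase:

```javascript
// Rough estimate using the approximate numbers reported in the issue.
const files = 300_000;        // ~300k small article files
const avgFileKB = 10;         // ~10KB average file size
const throughputKBps = 150;   // observed ~150KB/s during dat share

const totalKB = files * avgFileKB;        // ~3,000,000 KB ≈ 3GB total
const seconds = totalKB / throughputKBps; // time to process everything
const hours = seconds / 3600;

console.log(`${seconds} s ≈ ${hours.toFixed(1)} h`);
```

At that rate, indexing the full 3GB archive would take on the order of 5-6 hours, versus minutes for the single 1.2GB dump file, which matches the "1000x less throughput" observation above.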