Preallocated space for STXXL suits neither small nor big knowledge bases #225
Comments
The error message ends with "No space left on device".
On which machine are you working and do you have enough disk space?
It says that the allocated disk has a size of 500000 MiB = 500 GiB. The scientists collection works with much less, however.
…On 03.04.2019 19:56, Kirill Yankov wrote:
I have tried to follow the quick start instructions and got an error
building scientists index.
After I run in the container:
```
IndexBuilderMain -l -i /index/scientists \
    -n /input/scientists.nt \
    -w /input/scientists.wordsfile.tsv \
    -d /input/scientists.docsfile.tsv
```
I get the following output:
```
IndexBuilderMain, version Apr  3 2019 17:34:00
Set locale LC_CTYPE to: C.UTF-8
Wed Apr  3 17:49:43.837 - DEBUG: Configuring STXXL...
Wed Apr  3 17:49:43.837 - DEBUG: done.
Wed Apr  3 17:49:43.839 - INFO: Parsing next batch in parallel
FOXXLL v1.4.99 (prerelease/Release)
open() error on path=/index/scientists-stxxl.disk flags=16450, retrying without O_DIRECT.
Disk '/index/scientists-stxxl.disk' is allocated, space: 500000 MiB, I/O implementation: syscall queue=0 devid=0
terminate called after throwing an instance of 'foxxll::io_error'
  what(): Error in void foxxll::ufs_file_base::_set_size(foxxll::file::offset_type) : ftruncate() path=/index/scientists-stxxl.disk fd=4 : No space left on device: iostream error
Aborted
```
I have tried to figure out what is wrong in the code, but was unsuccessful, as I am not very proficient in C++ and its libraries.
I work on my machine. There is no 500 GB of free space.
I think it's line 9 in QLever/src/global/Constants.h
I am surprised that it's hard-coded.
Try changing it to a lower value.
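Before lowering that constant, it may help to see how the free space compares to the default. A minimal sketch (my own illustration, not QLever code; checking `/` here, substitute the directory that holds the index):

```python
# Sketch (not QLever code): compare free space on the filesystem that will
# hold the index against STXXL's 500000 MiB preallocation default.
import shutil

PREALLOC_MIB = 500_000                            # the hard-coded default, in MiB
free_mib = shutil.disk_usage("/").free / 2**20    # substitute your index directory

if free_mib < PREALLOC_MIB:
    print(f"only {free_mib:.0f} MiB free: without sparse-file support, "
          "the ftruncate() preallocation will fail with ENOSPC")
```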
…On 03.04.2019 20:17, Kirill Yankov wrote:
I work on my machine. There is no 500 GB of free space.
I am really surprised that it tries to allocate so much space. All the
files from the collection consume less than 350 MB.
Yes, I just found it, and it worked :). But the issue still remains...
Hi Kirill, according to information from @niklas88 I previously got the impression that there is no harm in creating a 500 GB virtual file if you never fill it with actual data, even if your hard disk is too small. Was I mistaken about this? @manonthegithub Which system are you using the Docker/QLever combination on? (I think the OS and the filesystem of the hard disk that was presumably too small are the most important aspects.)
@manonthegithub As @joka921 already pointed out, the file created by STXXL should only be 500 GB conceptually. It is a sparse file that should not actually take up 500 GB on a normal Linux filesystem. What kind of filesystem are you using? I've successfully used this on Ext4 and Btrfs filesystems with much less than 500 GB of free space.
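For readers unfamiliar with sparse files, the difference between apparent and allocated size is easy to demonstrate. A sketch assuming a Linux filesystem with sparse-file support (the 500 GiB figure mirrors the log above; this is my own illustration, not QLever code):

```python
# Create a file with a 500 GiB apparent size via truncate() -- the same
# ftruncate() call seen in the error message -- then compare the apparent
# size with the disk blocks actually allocated.
import os, tempfile

path = os.path.join(tempfile.mkdtemp(), "stxxl.disk")
with open(path, "wb") as f:
    f.truncate(500 * 2**30)                  # apparent size: 500 GiB

st = os.stat(path)
print("apparent: ", st.st_size)              # 536870912000 bytes
print("allocated:", st.st_blocks * 512)      # near zero on Ext4/Btrfs
os.remove(path)
```

On a filesystem (or through a layer) that does not honor sparse files, the same `truncate()` would need the full 500 GiB up front, which matches the ENOSPC failure reported here.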
Yes, you are right, I just removed three zeros to allocate 500 MB.
Probably yes, it didn't work until I changed the constant. I use a MacBook with macOS Mojave. The filesystem is APFS (Apple File System).
Ah, I think I assumed right. We officially only aim to support Linux systems; maybe macOS and APFS handle these preallocation calls differently and actually require this space to be free. But other than that:
This finally worked, even with the UI. Now I can say that it works on Mac :) (except for these virtual files).
At least for the scientists collection :)
Interesting, wouldn't have thought it to be that portable. Just today I tried building with clang 8.0 and that isn't even close to working at the moment due to clang disagreeing about some of the |
I think there were no problems because it builds and runs in Docker, which runs on top of an Ubuntu base image. @niklas88 I am wondering: do you really need to allocate these virtual/sparse files ahead of time? So, to close the issue, we need to decide whether this needs to be corrected; if yes, correct it, and if no, I can add comments to the quickstart docs. WDYT?
Ah, so I misunderstood: it's Docker on Mac, which AFAIK is actually a real Linux on a VMKit VM. We're almost always using Docker, so there's still likely an interference with macOS. Hmm, we could certainly reduce the default size, but I'd still like to understand why it is a problem just now. We've had this preallocation for a long time, and I think that is the recommended way to use STXXL. That said, I think @joka921 was thinking of dropping this dependency, since we only really use it for on-disk sorting during index construction.
@niklas88 By saying that there were no problems I meant that there were no problems except those mentioned in this topic :). Even though it is Linux inside a VM, the drive remains formatted in APFS, so the whole VM and Linux run on top of APFS.
@manonthegithub OK, I've tried setting this to 500 MB, and it looks like this is extremely detrimental to our index building performance (it didn't finish after 70 hours on Wikidata instead of the usual 12-14 hours). So for now I think we should go with a warning about needing support for sparse files on the filesystem where the index is built. That said, @joka921 and I think we should indeed increase the default to 1 TB.
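A back-of-the-envelope model (my own illustration, not QLever's actual code, and with an assumed Wikidata working-set size) of why a tiny working space hurts: external merge sort must make extra full passes over the data when the sort space shrinks, and each pass rereads and rewrites everything.

```python
# Sketch: estimate how many full passes an external merge sort makes over
# N bytes of data given M bytes of working space and a k-way merge.
# Every extra pass rereads and rewrites the whole dataset, which is one
# plausible reason a too-small STXXL disk slows index building drastically.
import math

def sort_passes(data_bytes, work_bytes, fan_in=16):
    runs = math.ceil(data_bytes / work_bytes)          # initial sorted runs
    merge = 0 if runs <= 1 else math.ceil(math.log(runs, fan_in))
    return 1 + merge                                   # run formation + merges

wikidata = 400 * 2**30                  # assumed ballpark working set
print(sort_passes(wikidata, 500 * 2**30))   # generous space: a single pass
print(sort_passes(wikidata, 500 * 2**20))   # 500 MB: several extra passes
```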
Fine, I can do both things.
@manonthegithub No need for the increase; @joka921 already has that in #227.
Fixed with #227