Preallocated Space for STXXL does not suit small NOR big knowledge bases. #225

Closed
manonthegithub opened this issue Apr 3, 2019 · 18 comments

Comments

@manonthegithub
Contributor

I tried to follow the quick start instructions and got an error while building the scientists index.
When I run the following in the container:

 IndexBuilderMain -l -i /index/scientists \
    -n /input/scientists.nt \
    -w /input/scientists.wordsfile.tsv \
    -d /input/scientists.docsfile.tsv

I get the following output:

IndexBuilderMain, version Apr  3 2019 17:34:00

Set locale LC_CTYPE to: C.UTF-8
Wed Apr  3 17:49:43.837 - DEBUG: Configuring STXXL...
Wed Apr  3 17:49:43.837 - DEBUG: done.
Wed Apr  3 17:49:43.839 - INFO:  Parsing next batch in parallel
FOXXLL v1.4.99 (prerelease/Release)
open() error on path=/index/scientists-stxxl.disk flags=16450, retrying without O_DIRECT.
Disk '/index/scientists-stxxl.disk' is allocated, space: 500000 MiB, I/O implementation: syscall queue=0 devid=0
terminate called after throwing an instance of 'foxxll::io_error'
  what():  Error in void foxxll::ufs_file_base::_set_size(foxxll::file::offset_type) : ftruncate() path=/index/scientists-stxxl.disk fd=4 : No space left on device: iostream error
Aborted

I tried to figure out what is wrong in the code, but without success, as I am not very proficient in C++ and its libraries.
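The log above shows `ftruncate()` failing with "No space left on device" while the disk file is grown to its preallocated size. The following is a minimal Python sketch of that step, not QLever's or FOXXLL's actual code: the helper name `preallocate` is hypothetical, and it only illustrates how such a failure can be caught and reported with a clearer message.

```python
import errno
import os
import tempfile


def preallocate(path: str, size_bytes: int) -> int:
    """Grow `path` to `size_bytes` via ftruncate() and return the resulting size.

    Hypothetical helper for illustration only. It turns ENOSPC, the errno
    behind "No space left on device", into a more readable error.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        try:
            os.ftruncate(fd, size_bytes)
        except OSError as err:
            if err.errno == errno.ENOSPC:
                raise RuntimeError(
                    f"cannot reserve {size_bytes} bytes at {path}: "
                    "the filesystem reports no space left"
                ) from err
            raise
        return os.fstat(fd).st_size
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Small demo in a temporary directory; 1 MiB instead of 500000 MiB.
    with tempfile.TemporaryDirectory() as tmp:
        print(preallocate(os.path.join(tmp, "demo-stxxl.disk"), 1024 * 1024))
```

Whether the request succeeds for a very large size depends on the filesystem and on any layer in between (for example a VM disk image), which is the open question in this thread.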

@hannahbast
Member

hannahbast commented Apr 3, 2019 via email

@manonthegithub
Contributor Author

I am working on my own machine, which does not have 500 GB of free space.
I am really surprised that it tries to allocate so much space. All the files in the collection take up less than 350 MB.

@hannahbast
Member

hannahbast commented Apr 3, 2019 via email

@manonthegithub
Contributor Author

manonthegithub commented Apr 3, 2019

Yes, I just found it, and it worked. But the issue still remains...
We should probably make the size configurable instead of hardcoded, or at least add a corresponding note to the quick start docs.
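One way to avoid a hardcoded size is to accept it as a command-line option. The sketch below is in Python and purely illustrative: the flag name `--stxxl-disk-size` and the helper `parse_size` are hypothetical and are not part of IndexBuilderMain. The default mirrors the 500000 MiB reported in the log above.

```python
import argparse
import re

# Binary multipliers for the accepted unit letters.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text: str) -> int:
    """Parse strings such as '500M', '500 MiB' or '1T' into a byte count."""
    match = re.fullmatch(
        r"\s*(\d+)\s*([KMGT]?)(?:i?B)?\s*", text, flags=re.IGNORECASE
    )
    if match is None:
        raise argparse.ArgumentTypeError(f"invalid size: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.upper()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="index builder (sketch)")
    parser.add_argument(
        "--stxxl-disk-size",  # hypothetical flag name
        type=parse_size,
        default=parse_size("500000M"),  # value seen in the log above
        help="size of the preallocated STXXL disk file, e.g. 500M or 1T",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--stxxl-disk-size", "500M"])
    print(args.stxxl_disk_size)
```

With something like this, a small collection could be built with a small disk file while large ones keep a large default.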

@joka921 changed the title from "Error building index using command in quick start" to "Preallocated Space for STXXL does not suit small NOR big knowledge bases." on Apr 4, 2019
@joka921
Member

joka921 commented Apr 4, 2019

Hi Kirill,
Thanks for your feedback. Do I get it right, and you were able to build the scientists with the changed constant settings?
I renamed this issue: 500 GB is of course far too big for the scientists collection and not enough for the full Wikidata (which does not crash, because STXXL reallocates in this case).

However, according to @niklas88 Information I previously got the impression, that it is no harm to create a 500GB virtual file if you never fill it with actual information, even if your hard disk is too small. Was I mistaken about this?

@manonthegithub What system are you using the Docker/QLever combination on? I think the OS and the filesystem of the hard disk that was presumably too small are the most important aspects.

@niklas88
Member

niklas88 commented Apr 4, 2019

@manonthegithub as @joka921 already pointed out, the file created by STXXL should only be 500 GB conceptually. It's a sparse file that should not actually take up 500 GB on a normal Linux filesystem. What kind of filesystem are you using? I've successfully used this on Ext4 and Btrfs filesystems with much less than 500 GB of free space.
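To illustrate the difference between the conceptual size and the space actually used, here is a minimal Python sketch (the function name `sparse_file_report` is hypothetical). It extends an empty file with `ftruncate()` and compares the apparent size with the allocated blocks. It assumes a POSIX system where `st_blocks` is counted in 512-byte units.

```python
import os
import tempfile


def sparse_file_report(directory: str, apparent_bytes: int) -> dict:
    """Create a temporary file of `apparent_bytes` without writing any data.

    Returns the apparent size and the bytes actually allocated on disk.
    On a filesystem with sparse file support, the allocated size should be
    much smaller than the apparent size.
    """
    fd, path = tempfile.mkstemp(dir=directory, suffix=".disk")
    try:
        os.ftruncate(fd, apparent_bytes)  # extend the file, write nothing
        info = os.fstat(fd)
        return {
            "apparent_bytes": info.st_size,
            # Assumption: st_blocks uses 512-byte units (POSIX convention).
            "allocated_bytes": info.st_blocks * 512,
        }
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(sparse_file_report(tmp, 64 * 1024 * 1024))
```

Running this in the directory where the index is built gives a quick hint as to whether a large preallocated file there would really consume disk space.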

@manonthegithub
Contributor Author

> Do I get it right, and you were able to build the scientists with the changed constant settings?

Yes, you are right. I just removed three zeros so that it allocates 500 MB.

> However, according to @niklas88 Information I previously got the impression, that it is no harm to create a 500GB virtual file if you never fill it with actual information, even if your hard disk is too small. Was I mistaken about this?

Probably yes; it didn't work until I changed the constant.

I use a MacBook with macOS Mojave. The filesystem is APFS (Apple File System).

@joka921
Member

joka921 commented Apr 4, 2019

Ah, I think I assumed right. We officially only aim to support Linux systems; maybe macOS and APFS handle these preallocation calls differently and do require this space to be free.

But other than that:
Does QLever work on MacOs if you handle this stuff accordingly? I think nobody has ever tried:)

@manonthegithub
Contributor Author

> Does QLever work on MacOs if you handle this stuff accordingly? I think nobody has ever tried:)

It finally worked, even with the UI. Now I can say that it works on Mac :) (except for these virtual files).

@manonthegithub
Contributor Author

At least for the scientists collection :)

@niklas88
Member

niklas88 commented Apr 4, 2019

Interesting, I wouldn't have thought it to be that portable. Just today I tried building with clang 8.0, and that isn't even close to working at the moment because clang disagrees with some of the constexpr magic. I did test on ARM64 Linux a while back, though.

@manonthegithub
Contributor Author

I think there were no problems because it builds and runs in Docker, on top of an Ubuntu base image.
I didn't try to build it natively on the Mac, but I think that would not be a problem either, though it might take some more time.

@niklas88 I am wondering whether you really need to allocate these virtual/sparse files ahead of time. To close the issue, I think we need to decide whether this needs to be corrected. If yes, it should be fixed; if no, I can add a note to the quick start docs. WDYT?

@niklas88
Member

niklas88 commented Apr 4, 2019

Ah, so I misunderstood: it's Docker on Mac, which afaik is actually a real Linux running in a HyperKit VM. We're actually almost always using Docker, so there's still likely some interference from macOS. Hmm, we could certainly reduce the default size, but I'd still like to understand why it is a problem only now. We've had this preallocation for a long time, and I think it is the recommended way to use STXXL. That said, I think @joka921 was thinking of dropping this dependency, since we only really use it for on-disk sorting during index construction.

@manonthegithub
Contributor Author

@niklas88 When I said there were no problems, I meant no problems other than the one described in this issue :). Even though it is Linux inside a VM, the drive is still formatted with APFS, so the whole VM and Linux run on top of APFS.
If @joka921 plans to drop this dependency, we can probably wait for those changes.

@niklas88
Member

niklas88 commented Apr 8, 2019

@manonthegithub ok, I've tried setting this to 500 MB, and it looks like this is extremely detrimental to our index building performance (it didn't finish after 70 hours on Wikidata instead of the usual 12-14 hours). So for now I think we should go with a warning that the filesystem where the index is built needs to support sparse files. That said, @joka921 and I think we should indeed increase the default to 1 TB.

@manonthegithub
Contributor Author

manonthegithub commented Apr 8, 2019 via email

@niklas88
Member

niklas88 commented Apr 8, 2019

@manonthegithub no need for the increase, @joka921 already has that in #227

@niklas88
Member

Fixed with #227
