Load database from .gz archive #57
This is trivially implemented, but the design is important. How do you propose to detect compressed input? Going by extension is easier but more brittle; using magic numbers for bzip2 and gzip is more robust. @barakmich, your view?
The easiest way is to detect compressed input by file extension, or to add a "--format" parameter: "./cayley http --dbpath=30kmovies.nt.zip --format=zip". The reason for doing this is to save space on the HDD. Sometimes it's hard to find 300 GB of free space, especially on rented hardware (VPS, cloud services).
I'm not keen on adding a flag for this; it allows a greater opportunity for discordance between data and processing. Per extension is easy:
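A minimal sketch of what extension-based detection could look like in Go (Cayley is written in Go); the package and helper names here are illustrative, not Cayley's actual loader API:

```go
package load

import (
	"compress/bzip2"
	"compress/gzip"
	"io"
	"os"
	"path/filepath"
)

// openByExtension returns a reader over the file's contents, wrapping it
// in a decompressor when the file extension indicates gzip or bzip2.
func openByExtension(path string) (io.Reader, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	switch filepath.Ext(path) {
	case ".gz":
		return gzip.NewReader(f) // gzip.NewReader also returns an error
	case ".bz2":
		return bzip2.NewReader(f), nil
	default:
		return f, nil // plain input, read as-is
	}
}
```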
All magic number determination needs is a read of the head of the file to get the file type (gzip[:2] == "\x1f\x8b" and bzip2[:3] == "BZh"). This is not difficult:
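A corresponding sketch of magic-number sniffing, again only illustrative: peek at the leading bytes through a bufio.Reader so nothing is consumed before the decompressor sees the stream.

```go
package load

import (
	"bufio"
	"bytes"
	"compress/bzip2"
	"compress/gzip"
	"io"
)

var (
	gzipMagic  = []byte("\x1f\x8b") // first two bytes of a gzip stream
	bzip2Magic = []byte("BZh")      // first three bytes of a bzip2 stream
)

// sniffReader inspects the head of r and returns a reader that
// transparently decompresses gzip or bzip2 input, or r itself otherwise.
func sniffReader(r io.Reader) (io.Reader, error) {
	br := bufio.NewReader(r)
	head, err := br.Peek(3)
	if err != nil && err != io.EOF {
		return nil, err
	}
	switch {
	case bytes.HasPrefix(head, gzipMagic):
		return gzip.NewReader(br)
	case bytes.HasPrefix(head, bzip2Magic):
		return bzip2.NewReader(br), nil
	default:
		return br, nil
	}
}
```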
Sounds very good. Can you please make a pull request? I really need to be able to search the compressed Freebase database as soon as possible. Thanks.
I'm waiting on Barak's input. I'd prefer the magic number approach, but I'll defer to his view on this.
Currently I see the process of using this database as:
My questions are:
To make Cayley usable I cut the needed data with a few
This was fixed by 984ab6f.
Thanks. I'll test it as soon as a new version of the binaries is released for Mac OS X.
0.3.1 is available (point release, yay) |
I'd like to propose adding support for compressed databases. Use case: I load the Freebase data dump from https://developers.google.com/freebase/data; the file is about 20 GB and I want to be able to use the database without unpacking it.