It does not work at all. #152

Closed
alexey-milovidov opened this issue Jul 3, 2022 · 3 comments · Fixed by #153

Comments

@alexey-milovidov

I'm trying LocustDB on a clean Ubuntu 22.04 VM on AWS:

#!/bin/bash

# https://rustup.rs/
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source $HOME/.cargo/env

sudo apt-get update
sudo apt-get install -y git

git clone https://github.com/cswinter/LocustDB.git
cd LocustDB

sudo apt-get install -y g++ capnproto libclang-14-dev

cargo build --features "enable_rocksdb" --features "enable_lz4" --release

wget --continue 'https://datasets.clickhouse.com/hits_compatible/hits.csv.gz'
gzip -d hits.csv.gz

target/release/repl --load hits.csv --db-path db

# Loaded data in 920s.
# Table `default` (99997496 rows, 15.0GiB)

# SELECT * FROM default LIMIT 1

# And it immediately panicked and hung:

#locustdb> SELECT * FROM default LIMIT 1
#thread '<unnamed>' panicked at 'index out of bounds: the len is 65536 but the index is 65536', src/stringpack.rs:91:15
#note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
#thread '<unnamed>' panicked at 'index out of bounds: the len is 65536 but the index is 65536', src/stringpack.rs:91:15
#thread '<unnamed>' panicked at 'index out of bounds: the len is 65536 but the index is 65536', src/stringpack.rs:91:15
#thread '<unnamed>' panicked at 'index out of bounds: the len is 65536 but the index is 65536', src/stringpack.rs:91:15
#thread '<unnamed>' panicked at 'index out of bounds: the len is 65536 but the index is 65536', src/stringpack.rs:91:15
#thread '<unnamed>' panicked at 'index out of bounds: the len is 65536 but the index is 65536', src/stringpack.rs:91:15
#thread '<unnamed>' panicked at 'index out of bounds: the len is 65536 but the index is 65536', src/stringpack.rs:91:15
#thread '<unnamed>' panicked at 'index out of bounds: the len is 65536 but the index is 65536', src/stringpack.rs:91:15
#thread '<unnamed>' panicked at 'index out of bounds: the len is 65536 but the index is 65536', src/stringpack.rs:91:15
#thread '<unnamed>' panicked at 'index out of bounds: the len is 65536 but the index is 65536', src/stringpack.rs:91:15
#thread '<unnamed>' panicked at 'index out of bounds: the len is 65536 but the index is 65536', src/stringpack.rs:91:15
#thread '<unnamed>' panicked at 'index out of bounds: the len is 65536 but the index is 65536', src/stringpack.rs:91:15
#thread '<unnamed>' panicked at 'index out of bounds: the len is 65536 but the index is 65536', src/stringpack.rs:91:15
#thread '<unnamed>' panicked at 'index out of bounds: the len is 65536 but the index is 65536', src/stringpack.rs:91:15
#thread '<unnamed>' panicked at 'index out of bounds: the len is 65536 but the index is 65536', src/stringpack.rs:91:15
#thread '<unnamed>' panicked at 'index out of bounds: the len is 65536 but the index is 65536', src/stringpack.rs:91:15

It's unclear how to preserve data upon restart.
It's unclear how to define table structure.

Nevertheless, a simple query after loading panics.
Also, it gives strange messages:

# Table `default` (99997496 rows, 15.0GiB) #
2013-07-15: 0.92KiB
-1216690514: 0.14MiB
0: 1.6GiB
4: 61KiB
17: 0.77KiB
9110818468285196899: 0.92MiB
-2461439046089301801: 0.20MiB
�O: 57KiB
2013-07-14 20:38:47: 0.69MiB
�
 : 53KiB
�: 28KiB
-1001831330: 0.12MiB
-296158784638538920: 0.42MiB
5: 33KiB
-8417682003818480435: 0.56MiB
: 13GiB
3793327: 0.10MiB
NH: 0.60KiB
2013-07-15 10:47:34: 0.69MiB
839: 81KiB
-1: 67MiB
1971-01-01 14:16:06: 0.70MiB
1: 0.52KiB

# Table `_meta_tables` (2 rows, 42.0B) #
timestamp: 1.0B
name: 41B

That makes me suspect it is not memory-safe.

@cswinter
Owner

cswinter commented Jul 3, 2022

Ah yes, I think you've found a bug that is triggered when input strings contain null bytes. Looks like it should be relatively straightforward to fix and improve performance as well.
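
To check whether a given input file contains null bytes at all, something like the following could be used. This is a minimal sketch that relies only on standard `tr` and `wc`; the function name `count_nul_bytes` is made up for illustration and is not part of LocustDB.

```shell
#!/bin/bash
# Count NUL (0x00) bytes in a file and print the count on stdout.
# `tr -cd '\000'` deletes every byte except NUL, `wc -c` counts what is left,
# and the final `tr` strips the padding some `wc` implementations add.
count_nul_bytes() {
    tr -cd '\000' < "$1" | wc -c | tr -d '[:space:]'
}

# Usage (file name taken from the report above):
#   count_nul_bytes hits.csv
```

A non-zero count means the file contains null bytes, which is the condition suspected of triggering the panic.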

> It's unclear how to preserve data upon restart.

Running target/release/repl --db-path db again should show all the data previously loaded into db.

> It's unclear how to define table structure.

One of the nice things about LocustDB is that you don't need to specify a schema explicitly; it is inferred automatically. There is some support for forcing columns to be interpreted as a certain type when loading data, see the --schema option.

@cswinter
Owner

cswinter commented Jul 3, 2022

Another thing I just noticed: some of the strange output is because LocustDB assumes that the first row in the CSV is a header with the column names. To get actual column names, you can add a header to the CSV or use the --schema option.
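
Adding a header to a headerless CSV can be sketched as below. The helper names `make_header` and `prepend_header`, the `colN` naming scheme, and the output file name are all assumptions made for illustration; the column count has to be supplied by the caller, since counting commas in the first row is unreliable when fields are quoted.

```shell
#!/bin/bash
# Build a header line "col0,col1,...,col{N-1}" for a given column count.
# The colN names are placeholders chosen for this sketch.
make_header() {
    local n=$1 i header=""
    for ((i = 0; i < n; i++)); do
        header+="col${i},"
    done
    # Drop the trailing comma before printing.
    printf '%s\n' "${header%,}"
}

# Write the header line followed by the original file contents to a new file.
# The input file is left untouched.
prepend_header() {
    local header=$1 input=$2 output=$3
    { printf '%s\n' "$header"; cat "$input"; } > "$output"
}

# Usage (105 is inferred from col0..col104 in the query output in this thread;
# hits_with_header.csv is a hypothetical output name):
#   prepend_header "$(make_header 105)" hits.csv hits_with_header.csv
```

This copies the whole file, so it needs as much free disk space again as the input.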

@cswinter
Owner

cswinter commented Jul 3, 2022

Things seem to be working with the fix in #153:

locustdb> SELECT COUNT(1), col89 FROM default;

Scanned 100.0 million rows in 17.1ms (5.8 billion rows/s)!

col89 | COUNT(1)
------+----------
"0U�" | 26
"5eL" | 1
"NH�" | 99995421
"R.�" | 81
"ZBT" | 57
"cHx" | 43
"iPP" | 1
"vUP" | 1770
"�J8" | 15
"�ht" | 42
"�o"  | 40

locustdb> SELECT * FROM default LIMIT 1;

Scanned 65.5 thousand rows in 117ms (0.56 million rows/s)!

col39 | col21 | col10 | col74 | col3 | col71 | col75 | col79 | col81 | col15 | col45                 | col48 | col78 | col85 | col89 | col103               | col61 | col8 | col1 | col18 | col7       | col28 | col47 | col27 | col56 | col22 | col34 | col23 | col69 | col32 | col29 | col95 | col96 | col46 | col63 | col86 | col88 | col16 | col83 | col11 | col72 | col80 | col14                                        | col19 | col49 | col0                | col50                  | col59 | col24 | col62 | col26 | col35 | col37 | col64                 | col30 | col6  | col76 | col94 | col93 | col67 | col98 | col31 | col42 | col43 | col9                | col91 | col44 | col97 | col65 | col38 | col99 | col55               | col36   | col33 | col77 | col92 | col101 | col5         | col87 | col54 | col2
                                                                                                                                                                                               | col51 | col58 | col4                  | col60 | col17 | col82 | col68 | col100 | col102               | col73 | col25 | col70      | col52 | col66 | col53 | col84 | col104 | col40 | col12 | col41 | col57     | col90 | col20 | col13
------+-------+-------+-------+------+-------+-------+-------+-------+-------+-----------------------+-------+-------+-------+-------+----------------------+-------+------+------+-------+------------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+----------------------------------------------+-------+-------+---------------------+------------------------+-------+-------+-------+-------+-------+-------+-----------------------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-------+---------------------+-------+-------+-------+-------+-------+-------+---------------------+---------+-------+-------+-------+--------+--------------+-------+-------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------+-------+-----------------------+-------+-------+-------+-------+--------+----------------------+-------+-------+------------+-------+-------+-------+-------+--------+-------+-------+-------+-----------+-------+-------+-------------------------------------------------
""    | 554   | 0     | "S0"  | 1    | 10208 | "h1"  | 0     | 6     | 0     | "2013-07-10 00:27:42" | 16561 | 0     | 0     | "NH�" | -3299945852400637761 | 0     | 36   | 1    | 9911  | 2088547703 | 31    | 1     | 0     | ""    | 37    | ""    | 15    | 0     | 0     | "D�"  | ""    | ""    | 4     | "g"   | ""    | null  | 16000 | 111   | 44    | -1    | 2     | "http://smeshariki.ru/page=98&rstr=тержинсы" | 216   | 0     | 7746300919266382380 | "windows-1251;charset" | 0     | 7     | 0     | 0     | null  | -1    | "2013-07-10 00:06:55" | 1     | 46429 | null  | ""    | ""    | 2     | ""    | 1     | 1750  | 653   | 7841794089446734162 | ""    | 135   | ""    | 22    | 0     | ""    | 8744056147474783115 | 2528191 | 0     | null  | ""    | 0      | "2013-07-10" | 0     | 0     | "Тонус 5, объявлений и фотоград - Яндекс.Афиша@Mail.Ru - Мастей в Ростей в Россия) - AUTO.ria.ua Базар автосалоне | новых кинотеатронно блин в хорошем качестве - Пульс цене, стр. 5 мини из 31 - Яндекс.net беседов Сибирск по алфавить" | 1601  | 0     | "2013-07-10 09:05:15" | 0     | 158   | 198   | 1758  | ""     | -1655607031864382640 | 13    | 700   | 1737435482 | 0     | 1     | 0     | 0     | 0      | 0     | 5     | 0     | 125358366 | 0     | 1368  | "http://smeshariki.ru/users/446132.html%3Fhtml"
