Add Frigatebird #906
Open
alexey-milovidov wants to merge 1 commit into
Frigatebird (https://github.com/Frigatebird-db/frigatebird) is an embedded columnar SQL database in Rust. It ingests only via INSERT ... VALUES, so ./load streams hits.parquet through parquet_to_inserts.py (pyarrow) as batched INSERTs into the REPL. Per the README, expect many queries to show up as null: Frigatebird's SQL surface lacks EXTRACT/REGEXP_REPLACE/LENGTH/CASE, and its TEXT decompressor panics on the non-UTF-8 bytes in the hits dataset. Resolves #809
Summary
- `frigatebird/` ClickBench recipe for Frigatebird, an embedded columnar SQL database written in Rust (push-based Volcano execution, morsel parallelism, LZ4 + O_DIRECT storage).
- `./load` streams `hits.parquet` through a small pyarrow script (`parquet_to_inserts.py`) into the Frigatebird REPL as batched `INSERT INTO hits VALUES (...)` statements, because Frigatebird has no `COPY` / Parquet / CSV ingest path.
- `create.sql` collapses all integer widths to `BIGINT` and `DATE` to `TIMESTAMP` (Frigatebird's type system has no narrower forms), and uses the mandatory `ORDER BY (CounterID, EventDate, UserID, EventTime, WatchID)`.
- `./query` measures runtime with the bash built-in `time` since the CLI has no built-in timer.

Notes
- `parquet_to_inserts.py` emits negative integers as quoted strings to work around Frigatebird's `INSERT` planner rejecting `UnaryOp { Minus, Number }` literals; the column-type coercion path parses them back to `i64`.
- Frigatebird's SQL surface lacks `EXTRACT`, `REGEXP_REPLACE`, `LENGTH`/`STRLEN`, `CASE`, etc., so several queries will fail at parse/plan time and land as `null` in the results JSON.
- Frigatebird's TEXT decompressor panics with `failed to decompress page payload: string is not valid utf8` on the non-UTF-8 bytes that the hits dataset's text columns contain. The recipe is wired up so the upstream behaviour on the full dataset is reproducible; expect many or all queries to be `null` until upstream stabilises ingest/scan for non-UTF-8 strings.

Resolves #809
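For reviewers, the negative-integer workaround and the batching described in the notes can be sketched roughly as follows. This is not the code in `parquet_to_inserts.py`: the function names `sql_literal` and `batched_inserts`, the batch size, the multi-row `VALUES` form, the trailing semicolon, and the quote-doubling escape for text are all assumptions made for illustration, and the sketch takes plain Python rows rather than reading Parquet with pyarrow.

```python
from itertools import islice
from typing import Iterable, Iterator, Sequence


def sql_literal(value: object) -> str:
    """Render one Python value as a SQL literal (hypothetical helper)."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        # bool is a subclass of int; rendering it as 0/1 is an assumption.
        return str(int(value))
    if isinstance(value, int):
        # Negative integers are quoted so the statement contains no unary
        # minus; the PR notes say column-type coercion parses them back.
        return f"'{value}'" if value < 0 else str(value)
    # Everything else is treated as text. Doubling single quotes is standard
    # SQL escaping, assumed (not verified) to be accepted by Frigatebird.
    return "'" + str(value).replace("'", "''") + "'"


def batched_inserts(
    rows: Iterable[Sequence[object]],
    table: str = "hits",
    batch_size: int = 1000,  # illustrative value, not taken from the PR
) -> Iterator[str]:
    """Yield one multi-row INSERT statement per batch of rows."""
    it = iter(rows)
    while batch := list(islice(it, batch_size)):
        tuples = ", ".join(
            "(" + ", ".join(sql_literal(v) for v in row) + ")" for row in batch
        )
        yield f"INSERT INTO {table} VALUES {tuples};"
```

Each yielded statement could then be written to the REPL's standard input; the generator keeps only one batch in memory at a time, which matters for a dataset the size of `hits.parquet`.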
Test plan
- `./install && ./benchmark.sh` on a fresh Ubuntu 24.04 VM