Add Hyrise #883
Open
alexey-milovidov wants to merge 2 commits into main
Closes #751

Hyrise is a research in-memory column-oriented database from HPI
(https://github.com/hyrise/hyrise). It implements the PostgreSQL wire
protocol, so the benchmark connects via psql and uses Hyrise's
COPY ... WITH (FORMAT CSV) to load the standard ClickBench CSV dataset.

The system is built from source via Hyrise's install_dependencies.sh and
cmake/ninja; install_dependencies.sh requires Ubuntu 25.04 or newer.

Since Hyrise has no on-disk persistence, the data size is reported as the
total estimated segment size from the meta_segments meta table.

Hyrise has limited SQL coverage (no DATE/DATETIME types, no REGEXP_REPLACE,
no DATE_TRUNC). Queries that use unsupported functions are kept verbatim and
will be reported as null in the result file.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
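For reviewers, the load and data-size steps described above could look roughly like the sketch below. It is not the script from this PR: the function name load_and_measure, the PSQL_CMD variable, and the host, port and psql flags are assumptions made for illustration. Only the COPY statement and the meta_segments.estimated_size_in_bytes column come from the description.

```shell
# Hypothetical sketch; PSQL_CMD and load_and_measure are illustrative names.
# Assumes hyriseServer listens on localhost:5432 (an assumption, adjust as needed).
PSQL_CMD=${PSQL_CMD:-psql -h localhost -p 5432 -t -A -v ON_ERROR_STOP=1}

load_and_measure() {
    local csv_path="$1" start end size

    # Time the COPY with wall-clock seconds; COPY creates the table itself.
    start=$(date +%s)
    $PSQL_CMD -c "COPY hits FROM '$csv_path' WITH (FORMAT CSV);" >/dev/null || return 1
    end=$(date +%s)
    echo "Load time: $((end - start))"

    # No on-disk footprint exists, so sum the estimated in-memory segment sizes.
    size=$($PSQL_CMD -c "SELECT SUM(estimated_size_in_bytes) FROM meta_segments;") || return 1
    echo "Data size: $size"
}
```

Calling load_and_measure /data/hits.csv would then print one Load time: line and one Data size: line.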
Move the build into a multi-stage Dockerfile (ubuntu:25.04 + gcc-15) so the
benchmark works on any Ubuntu host without polluting it with Hyrise's
toolchain. The runtime image only carries hyriseServer, libhyrise_impl.so,
libjemalloc.so, and a small set of shared-library deps (~250 MB).
Build args:
- HYRISE_REF (default master) — pin a Hyrise revision
- NO_LTO (default FALSE) — toggle LTO for faster development builds
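As a rough illustration of how the two build args might be passed, the sketch below assembles a docker build command line and prints it. The helper name build_cmd and the image tag hyrise-clickbench are hypothetical and not taken from this PR; --build-arg and -t are standard docker build options.

```shell
# Hypothetical helper: prints a docker build command for the given args.
# $1 = Hyrise revision (default master), $2 = NO_LTO value (default FALSE).
build_cmd() {
    local ref="${1:-master}" no_lto="${2:-FALSE}"
    # The image tag below is an illustrative choice, not the PR's.
    printf '%s\n' "docker build --build-arg HYRISE_REF=$ref --build-arg NO_LTO=$no_lto -t hyrise-clickbench ."
}
```

For a faster development build, `build_cmd master TRUE | sh` would run the printed command with LTO disabled.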
Loading: drop create.sql and use hits.csv.json next to the data file as the
schema source. CREATE TABLE followed by COPY trips a Hyrise assertion
("set_immutable() should not be called on an empty chunk", chunk.cpp:125)
because COPY tries to seal the empty chunk left by CREATE TABLE; letting
COPY auto-create the table from the CSV meta avoids the issue.
run.sh: detect failed queries via psql's exit code rather than grepping the
output, so errors like "Invalid input error: Could not resolve function
'LENGTH'" are recorded as null. Hyrise lacks LENGTH, REGEXP_REPLACE,
DATE_TRUNC, and OFFSET, so 7 queries (Q28, Q29, Q39-Q43) are reported as
null; the remaining 36 succeed.
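The exit-code check can be sketched as follows. This is a minimal illustration rather than the actual run.sh: time_query and PSQL_CMD are hypothetical names, and the connection settings and psql flags are assumptions. The point is that the Time: line is only trusted when psql exits with status 0.

```shell
# Hypothetical sketch; assumes hyriseServer on localhost:5432.
PSQL_CMD=${PSQL_CMD:-psql -h localhost -p 5432 -t -v ON_ERROR_STOP=1}

time_query() {
    local query="$1" out
    if out=$($PSQL_CMD -c '\timing' -c "$query" 2>&1); then
        # Success: convert the last "Time: <ms> ms" line to seconds.
        printf '%s\n' "$out" | awk '
            /^Time:/ { t = $2 }
            END { if (t == "") print "null"; else printf "%.3f\n", t / 1000 }'
    else
        # Failure: ignore any timing line psql printed and record null.
        echo null
    fi
}
```

A query such as one using LENGTH would therefore yield null even though psql still prints a Time: line for it.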
Tested locally on arm64: docker build produced a working image, hyriseServer
accepts psql connections, COPY loads a 1000-row sample, and run.sh produces
the expected 43 lines of [t1,t2,t3] output with nulls in the right slots.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
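The shape of the output checked in the local test can also be verified mechanically. The sketch below is a hypothetical helper (check_results is not part of this PR) that assumes each result line looks like [t1,t2,t3] with an optional trailing comma, where each slot is either a decimal number or null.

```shell
# Hypothetical validator: succeeds only if the file has exactly the expected
# number of lines and every line is a well-formed [t1,t2,t3] triple.
check_results() {
    local file="$1" expected="${2:-43}" cell pattern total bad
    cell='(null|[0-9]+(\.[0-9]+)?)'
    pattern="^\\[$cell(,$cell){2}\\],?\$"

    # grep -c exits non-zero when the count is 0, hence the "|| true".
    total=$(grep -c '' "$file" || true)
    bad=$(grep -cvE "$pattern" "$file" || true)

    [ "$total" -eq "$expected" ] && [ "$bad" -eq 0 ]
}
```

With the timing lines from run.sh saved to a file (the name result.txt is only an example), `check_results result.txt` returns non-zero if a line is missing or if a stray error or timing line leaked into the output.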
Summary
- Multi-stage Docker build; the runtime image carries only the essentials (hyriseServer + libhyrise_impl.so + libjemalloc.so.2).
- Loading uses COPY hits FROM '/data/hits.csv' WITH (FORMAT CSV); column types come from hits.csv.json placed next to the data file (CREATE TABLE followed by COPY hits an internal Hyrise assertion). Data size is reported from meta_segments.estimated_size_in_bytes since Hyrise has no on-disk persistence.
- Hyrise does not support LENGTH, REGEXP_REPLACE, DATE_TRUNC, or OFFSET. run.sh now keys off psql's exit code so those queries (Q28, Q29, Q39–Q43) are recorded as null instead of stealing the timing line that psql still prints.

Closes #751
Test plan
- docker build produces a working image; hyriseServer starts and accepts psql connections
- SELECT COUNT(*), MIN/MAX(EventDate), AVG(UserID), COUNT(DISTINCT UserID), EXTRACT(MINUTE FROM EventTime) all return expected results
- Failing queries are recorded as null (Q28/29: unsupported functions; Q39–Q43: OFFSET/DATE_TRUNC)
- run.sh output contains a Load time: line, a Data size: line, and 43 lines of [t1,t2,t3]

🤖 Generated with Claude Code