A collection of ClickHouse query parsing and analysis tools built on a rickety, hollowed-out version of ClickHouse.
One of the challenges of building tooling around SQL is that some dialects provide a large set of features beyond standard ANSI SQL. ClickHouse is one of these, providing a rich SQL dialect that is difficult for generic tooling to keep up with.
This project builds more reliable tooling for ClickHouse queries by providing abstractions over the actual ClickHouse parser.
Note: Parsing is currently limited to
SELECTqueries, other query types are on the roadmap.
The primary entrypoint is a Python package, clickshack. Features include:
- A near like-for-like port of the actual ClickHouse SQL parser (currently supports version
26.2) - Converters for integrating with SQLGlot — parse with
clickshack, then reach for SQLGlot's growing library of SQL utilities. - (Experimental) - A minimal Python library for writing custom SQL linting rules.
- Note: I am not entirely sure where this one is headed, but it might make a useful counterpart to something like sqlcheck for ClickHouse.
Install from PyPI with:
pip install clickshackThen parse a query with:
from clickshack import parse_sql
envelope = parse_sql("SELECT id, name FROM users WHERE active = 1")
print(envelope.ir_version) # "1"
print(envelope.query) # SelectNode(...)Integrate with SQLGlot:
from sqlglot.lineage import lineage
from clickshack import parse_sql
from clickshack.ir.adapter import to_sqlglot
envelope = parse_sql("SELECT id, name FROM users WHERE active = 1")
sqlglot_expr = to_sqlglot(envelope.query)
# Trace which source column feeds 'name' in the output
node = lineage("name", sqlglot_expr)
for n in node.walk():
print(n.name)
# name
# users.nameOr run one of the experimental linters with:
from clickshack import parse_sql
from clickshack.ir.models import StarNode
from clickshack.linter.base import LintRunner, LintViolation
class NoSelectStar:
"""Prohibit SELECT *."""
def check(self, node) -> list[LintViolation]:
if isinstance(node, StarNode):
return [LintViolation(rule="no-star", message="SELECT * is not allowed", span=node.span)]
return []
envelope = parse_sql("SELECT * FROM events")
runner = LintRunner(rules=[NoSelectStar()])
violations = runner.run(envelope.query)
for v in violations:
print(f"{v.rule} at offset {v.span.start}: {v.message}")
# no-star at offset 7: SELECT * is not allowedSee the modules under ./clickshack/ for more details (docs are still in the works).
bazel,just,uv- Python 3.11+ (managed automatically by uv)
# Install Python dependencies (creates .venv automatically)
just syncported_clickhouse/ C++ parser (ported ClickHouse SELECT dialect)
ir/ C++ IR serializer (AST → JSON, uses nlohmann/json)
examples/bootstrap/ Smoke binaries incl. ir_dump (SQL stdin → JSON IR stdout)
clickshack/
ir/ Pydantic v2 models + SQLGlot adapter for the JSON IR
linter/ LintRule protocol + LintRunner for IR-based linting
tests/ Python tests (pytest)
testdata/ Parser workload fixtures and manifest
| Command | What it does |
|---|---|
just build |
Build primary binary (hello_clickshack) |
just build-all |
Build all example binaries |
just build-ir |
Build IR serializer and ir_dump binary |
just run-ported-smokes |
Run C++ parser smoke tests |
just sync |
Install / sync Python dependencies |
just test-python |
Run Python tests |
just clean |
Remove Bazel build outputs |
just clean-expunge |
Fully reset Bazel output state |
Parse SQL and emit the JSON IR:
echo "SELECT id FROM users WHERE active = 1" | \
bazel-bin/examples/bootstrap/ir_dump(Build with just build-ir first.)
The clickshack Python package and original C++ code are licensed under the
MIT License.
The C++ parser in ported_clickhouse/ is derived from
ClickHouse and is licensed under
the Apache License 2.0. See NOTICE for
attribution details and a description of modifications.