sqlcli
is a CLI tool written to parse SQL statements and answer questions about them.
Currently, it only really does two things:
- Given a
SELECT
statement, return the tables we read from - Given a
SELECT
statement, return the columns that will be present in the output resultset — i.e., return the SQL Projection.
sqlcli
was built as a glue layer in some parts of ClearTax's data platform. It relies heavily on the sqlparser crate — you can consider this to be a thin layer over the actual hard work that sqlparser
does.
It's written in rust. Probably not very idiomatic rust, since it was written in a hurry. Feedback welcome!
Display help:
$ sqlcli
==>
sqlcli 0.1.0
Parse sql queries and answer questions about them.
USAGE:
sqlcli [FLAGS] <SUBCOMMAND>
FLAGS:
-d, --debug Turn on debug logging
-h, --help Prints help information
-V, --version Prints version information
SUBCOMMANDS:
ast Print the AST as generated by sqlparser-rs.
columns Print output column names of the SQL query.
help Prints this message or the help of the given subcommand(s)
tables Print tables queried from in SQL.
Parse table names from a SQL select statement:
# Use filename argument
$ sqlcli tables some_file.sql
# Or as a filter
$ cat some_file.sql | sqlcli tables
Examples (output after ==>
):
$ sqlcli tables << EOF
select * from video_views where user_id in (
select id from users where plan = 'Ultra'
)
EOF
==>
users
video_views
sqlcli
should be able to handle arbitrarily complex SQL with joins, CTEs, group by & partitions, etc.
$ sqlcli tables << EOF
with last_month_active_users as (
select user_id, last_active_time from users
where last_active_time > current_timestamp - interval '1 month'
),
renewal_due as (
select active.id, billing.is_due
from last_month_active_users active join billing_dimension billing on active.user_id = billing.user_id
)
select
(select count(*) from last_month_active_users) total_active_users,
(select count(*) from renewal_due where is_due = 1) active_and_renewal_due
EOF
==>
billing_dimension
users
Note: sqlcli tables
will always return the unique source tables any given query reads from.
In the above example, the final SELECT
projection reads from an inner common table expression. These temporary tables are excluded from the output.
The columns
subcommand returns the fields that should be present in the resultset.
# Use filename argument
$ sqlcli columns some_file.sql
# Or as a filter
$ cat some_file.sql | sqlcli columns
Examples:
$ sqlcli columns << EOF
SELECT user_id, first_name, last_name, concat(first_name, ' ', last_name) as full_name from users
EOF
==>
NAMED: user_id
NAMED: first_name
NAMED: last_name
NAMED: full_name
It will highlight if the query contains any wildcards or columns whose name cannot be determined (eg: selecting an expression like COUNT(*)
without an alias).
This can be used to lint queries running in your data systems.
$ sqlcli columns << EOF
SELECT date(created_at) dt, count(*) from events
EOF
=>
NAMED: dt
UNNAMED
$ sqlcli columns << EOF
SELECT * from table_1
EOF
=>
WILDCARD: *
$ sqlcli columns << EOF
SELECT a.id, b.*
from table_1 a left outer join table_2 b
on a.id = b.external_id
EOF
=>
NAMED: id
WILDCARD: b.*
Install rust, and run:
# test
$ cargo test
# build binary
$ cargo build --release
sqlcli
:
- only works on
SELECT
statements - only accepts one statement at a time (either via an input file or on
STDIN
) - will exit with return code
65
in case of malformed input - does not actually run the query, so you can only consider this to be a lint tool
All heavy-lifting is done by the excellent sqlparser library.
sqlcli
was built as a side project for internal use at ClearTax. It's been open-sourced, but our primary use cases will remain internal. Pull requests / bug reports are welcome, but we may not be able to respond to everyone.