sqlcli is a CLI tool written to parse SQL statements and answer questions about them.
Currently, it only really does two things:
- Given a
SELECTstatement, return the tables we read from
- Given a
SELECTstatement, return the columns that will be present in the output resultset — i.e., return the SQL Projection.
sqlcli was built as a glue layer in some parts of ClearTax's data platform. It relies heavily on the sqlparser crate — you can consider this to be a thin layer over the actual hard work that
It's written in rust. Probably not very idiomatic rust, since it was written in a hurry. Feedback welcome!
$ sqlcli ==> sqlcli 0.1.0 Parse sql queries and answer questions about them. USAGE: sqlcli [FLAGS] <SUBCOMMAND> FLAGS: -d, --debug Turn on debug logging -h, --help Prints help information -V, --version Prints version information SUBCOMMANDS: ast Print the AST as generated by sqlparser-rs. columns Print output column names of the SQL query. help Prints this message or the help of the given subcommand(s) tables Print tables queried from in SQL.
Parse table names from a SQL select statement:
# Use filename argument $ sqlcli tables some_file.sql # Or as a filter $ cat some_file.sql | sqlcli tables
Examples (output after
$ sqlcli tables << EOF select * from video_views where user_id in ( select id from users where plan = 'Ultra' ) EOF ==> users video_views
sqlcli should be able to handle arbitrarily complex SQL with joins, CTEs, group by & partitions, etc.
$ sqlcli tables << EOF with last_month_active_users as ( select user_id, last_active_time from users where last_active_time > current_timestamp - interval '1 month' ), renewal_due as ( select active.id, billing.is_due from last_month_active_users active join billing_dimension billing on active.user_id = billing.user_id ) select (select count(*) from last_month_active_users) total_active_users, (select count(*) from renewal_due where is_due = 1) active_and_renewal_due EOF ==> billing_dimension users
sqlcli tables will always return the unique source tables any given query reads from.
In the above example, the final
SELECT projection reads from an inner common table expression. These temporary tables are excluded from the output.
columns subcommand returns the fields that should be present in the resultset.
# Use filename argument $ sqlcli columns some_file.sql # Or as a filter $ cat some_file.sql | sqlcli columns
$ sqlcli columns << EOF SELECT user_id, first_name, last_name, concat(first_name, ' ', last_name) as full_name from users EOF ==> NAMED: user_id NAMED: first_name NAMED: last_name NAMED: full_name
It will highlight if the query contains any wildcards or columns whose name cannot be determined (eg: selecting an expression like
COUNT(*) without an alias).
This can be used to lint queries running in your data systems.
$ sqlcli columns << EOF SELECT date(created_at) dt, count(*) from events EOF => NAMED: dt UNNAMED
$ sqlcli columns << EOF SELECT * from table_1 EOF => WILDCARD: *
$ sqlcli columns << EOF SELECT a.id, b.* from table_1 a left outer join table_2 b on a.id = b.external_id EOF => NAMED: id WILDCARD: b.*
Build & test
Install rust, and run:
# test $ cargo test # build binary $ cargo build --release
The Fine Print
- only works on
- only accepts one statement at a time (either via an input file or on
- will exit with return code
65in case of malformed input
- does not actually run the query, so you can only consider this to be a lint tool
All heavy-lifting is done by the excellent sqlparser library.
sqlcli was built as a side project for internal use at ClearTax. It's been open-sourced, but our primary use cases will remain internal. Pull requests / bug reports are welcome, but we may not be able to respond to everyone.