dft provides two interfaces to the DataFusion query execution engine:
- Text User Interface (TUI): An IDE for DataFusion developers and users that provides a local database experience with utilities to analyze / benchmark queries.
- Command Line Interface (CLI): Scriptable engine for executing queries from files.
dft is inspired by datafusion-cli, but has some differences:
dftTUI focuses on more complete and interactive experience for users.dftcontains many built in integrations such as Delta Lake, Iceberg, and MySQL (Coming Soon) that are not available indatafusion-cli.
The objective of dft is to provide users with the experience of having their own local database that allows them to query and join data from disparate data sources all from the terminal.
| SQL & FlightSQL Editor and Results | Query History and Stats |
|---|---|
|
|
| Filterable Logs | DataFusion Session Context Details |
|
|
Some of the current and planned features are:
- Tab management to provide clean and structured organization of DataFusion queries, results, and context
- SQL editor
- Write query results to file (TODO)
- Multiple SQL Editor tabs (TODO)
- Query history
- History and statistics of executed queries
- ExecutionContext information
- Information from ExecutionContext / Catalog / ObjectStore / State / Config
- Logs
- Logs from
dftandDataFusion
- Logs from
- SQL editor
- Custom
ObjectStoreSupport- S3, Azure(TODO), GCP(TODO)
ObjectStoreexplorer. I.e. able to list files inObjectStore
TableProviderFactorydata sources- Deltalake
- Iceberg (TODO)
- Hudi (TODO)
- Preloading DDL from
~/.config/dft/ddl.sql(or a user defined path) for local database available on startup - "Catalog File" support - see #122
- Save table definitions and data
- Save parquet metadata from remote object stores
Currently dft does not display wide result sets well (because the widget library that we use does not support horizontal scrolling - we are working with them to see what we can do about this). As a result, when working with wide data sets its best to be selective with the columns that you would like to view in the ouput.
The dft CLI is a scriptable interface to the tui engine for executing
queries from files or the command line. The CLI is used in a similar manner to
datafusion-cli but with the added benefit of supporting multiple pre-integrated
data sources.
$ dft -f query.sql$ dft -c "SELECT 1+2"Both of the commands above support the --flightsql parameter to run the SQL with your configured FlightSQL client.
The CLI can also run your configured DDL prior to executing the query by adding the --ddl parameter.
Currently, the only supported packaging is on crates.io. If you already have Rust installed it can be installed by running cargo install datafusion-tui. If rust is not installed you can download following the directions here.
Once installed you can run dft to start the application.
dft has several optional (conditionally compiled features) integrations which are controlled by Rust Crate Features
To build with all features, you can run
cargo install --path . --all-featuresMutliple s3 ObjectStores can be registered, following the below model in your configuration file.
[[execution.object_store.s3]]
bucket_name = "my_bucket"
object_store_url = "s3://my_bucket"
aws_endpoint = "https://s3.amazonaws"
aws_access_key_id = "MY_ACCESS_KEY"
aws_secret_access_key = "MY SECRET"
[[execution.object_store.s3]]
bucket_name = "my_bucket"
object_store_url = "ny1://my_bucket"
aws_endpoint = "https://s3.amazonaws"
aws_access_key_id = "MY_ACCESS_KEY"
aws_secret_access_key = "MY SECRET"Then you can run DDL such as
CREATE EXTERNAL TABLE my_table STORED AS PARQUET LOCATION 's3://my_bucket/table';
CREATE EXTERNAL TABLE other_table STORED AS PARQUET LOCATION 'ny1://other_bucket/table';A separate editor for connecting to a FlightSQL server is provided.
The default connection_url is http://localhost:50051 but this can be configured your config as well:
[flight_sql]
connection_url = "http://myhost:myport"Register deltalake tables. For example:
CREATE EXTERNAL TABLE table_name STORED AS DELTATABLE LOCATION 's3://bucket/table'Adds functions from datafusion-function-json for querying JSON strings in DataFusion in dft. For example:
select * from foo where json_get(attributes, 'bar')::string='ham'
(show examples of using operators too)To have the best experience with dft it is highly recommended to define all of your DDL in ~/.datafusion/.datafusionrc so that any tables you wish to query are available at startup. Additionally, now that DataFusion supports CREATE VIEW via sql you can also make a VIEW based on these tables.
For example, your ~/.datafusion/.datafusionrc file could look like the following:
CREATE EXTERNAL TABLE users STORED AS NDJSON LOCATION 's3://bucket/users';
CREATE EXTERNAL TABLE transactions STORED AS PARQUET LOCATION 's3://bucket/transactions';
CREATE EXTERNAL TABLE listings STORED AS PARQUET LOCATION 'file://folder/listings';
CREATE VIEW OR REPLACE users_listings AS SELECT * FROM users LEFT JOIN listings USING (user_id);
This would make the tables users, transactions, listings, and the view users_listings available at startup. Any of these DDL statements could also be run interactively from the SQL editor as well to create the tables.
The interface is split into several tabs and modes so that relevant information can be viewed and controlled in a clean and organized manner. When not writing a SQL query keys can be entered to navigate and control the interface.
Editor for executing SQL with local DataFusion SessionContext.
- Normal mode
- Not editable
q=> quit datafusion-tuie=> start editing SQL Editor in Edit modec=> clear contents of SQL EditorEnter=> execute query- Enter the tab number in brackets after a tabs name to navigate to that tab
- If query results are longer or wider than screen, you can use arrow keys to scroll
- Editable
- Character keys to write queries
- Backspace / tab / enter work same as normal
escto exit Edit mode and go back to Normal mode
- Not editable
- DDL mode
- Not editable
l=> load~/.datafusion/.datafusionrcinto editor (TODO)r=> rerun~/.datafusion/.datafusionrc(TODO)w=> write editor contents to~/.datafusion/.datafusionrc(TODO)
- Editable
- Character keys to write queries
- Backspace / tab / enter work same as normal
escto exit Edit mode and go back to Normal mode
- Not editable
Same interface as SQL tab but sends SQL queries to FlightSQL server.
- Normal mode
q=> quit datafusion-tuie=> start editing SQL Editor in Edit modec=> clear contents of SQL EditorEnter=> execute query- Enter the tab number in brackets after a tabs name to navigate to that tab
- If query results are longer or wider than screen, you can use arrow keys to scroll
- Edit mode
- Character keys to write queries
- Backspace / tab / enter work same as normal
escto exit Edit mode and go back to Normal mode
TODO
- Logging mode (coming from tui_logger)
h=> Toggles target selector widget hidden/visiblef=> Toggle focus on the selected target onlyUP=> Select previous target in target selector widgetDOWN=> Select next target in target selector widgetLEFT=> Reduce SHOWN (!) log messages by one levelRIGHT=> Increase SHOWN (!) log messages by one level-=> Reduce CAPTURED (!) log messages by one level+=> Increase CAPTURED (!) log messages by one levelPAGEUP=> Enter Page Mode and scroll approx. half page up in log history.PAGEDOWN=> Only in page mode: scroll 10 events down in log history.ESCAPE=> Exit page mode and go back to scrolling modeSPACE=> Toggles hiding of targets, which have logfilter set to off
The dft configuration is stored in ~/.config/dft/config.toml. All configuration options are listed below.
The execution config is where you can define query execution properties. You can configure the ObjectStores that you want to use in your queries and path of a DDL file that you want to run on startup. For example, if you have an S3 bucket you want to query you could define it like so:
[[execution.object_store.s3]]
bucket_name = "my_bucket"
object_store_url = "s3://my_bucket"
aws_endpoint = "https://s3.amazonaws"
aws_access_key_id = "MY_ACCESS_KEY"
aws_secret = "MY SECRET"
aws_session_token = "MY_SESSION"
aws_allow_http = falseAnd define a custom DDL path like so (the default is ~/.config/dft/ddl.sql).
[execution]
ddl_path = "/path/to/my/ddl.sql"Multiple ObjectStores can be defined in the config file. In the future datafusion SessionContext and SessionState options can be configured here.
The display config is where you can define the frame rate of the TUI.
[display]
frame_rate = 60The interaction config is where mouse and paste behavior can be defined. This is not currently implemented.
[interaction]
mouse = true
paste = trueThe FlightSQL config is where you can define the connection URL for the FlightSQL server.
[flight_sql]
connection_url = "http://localhost:50051"The editor config is where you can set your preferred editor settings.
Currently only syntax highlighting is supported. It is experimental because currently the regex that is used to determine keywords only works in simple cases.
[editor]
experimental_syntax_highlighting = true


