openquery is an automated descriptive analytics tool for SQL databases.

dsmyda/openquery

openquery

openquery is an automated descriptive analytics tool for SQL databases. It translates natural language questions into SQL, using your database schema to generate viable queries. You can opt in to running those SQL queries automatically for an end-to-end experience.

openquery is designed with information security in mind, supporting encryption at rest, automated PII detection, guards against malicious database statements, and more.
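As an illustration of what a guard against malicious statements might look like, here is a minimal sketch in Python. It is not openquery's implementation: the `is_read_only` function and the keyword list are assumptions made for this example, and the check is deliberately conservative (it can reject harmless queries that contain a semicolon or a blocked keyword inside a string literal).

```python
import re

# Hypothetical guard for illustration only; not openquery's actual checks.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|grant|revoke)\b",
    re.IGNORECASE,
)


def is_read_only(sql: str) -> bool:
    """Return True only for a single statement that starts with SELECT or WITH."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        # Empty input, or several statements chained together.
        return False
    if not re.match(r"(select|with)\b", statement, re.IGNORECASE):
        return False
    # Reject anything that mentions a write or DDL keyword anywhere.
    return FORBIDDEN.search(statement) is None
```

A check like this belongs in front of query execution as one layer among several, alongside database-level permissions.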

Still under development, but here's a rough preview!

[Screenshot: openquery preview, 2023-04-09]

Installation

TODO - add installation for openquery cli binary

brew install openssl

export LDFLAGS="-L/opt/homebrew/opt/openssl@3/lib"
export CPPFLAGS="-I/opt/homebrew/opt/openssl@3/include"

Features

  • OpenAI support
  • Automatic training set creation for fine-tuning models
  • Query parser supporting 20+ dialects, including Postgres, Presto, BigQuery and Snowflake
  • Automated schema introspection and query execution in 5 dialects
  • Encryption at rest
  • PII detection to prevent accidental data leaks (planned)
  • Offline support using a local .sql file (planned)

How to use

TODO

How it works

Concepts

  • Database
  • Synth
    • Structures like tables, views, indexes and foreign keys are synth'd
    • Structures like CHECK constraints, table comments and triggers are not synth'd
  • Language Model

Best Practices

Least Privilege

openquery should be given only the privileges it needs to answer your questions. We recommend that you:

  1. Create a separate database user with defensive RBAC
  2. Use a read-only connection

While openquery ships with many safety checks, you should not rely solely on openquery to catch all edge cases.

Smallest Synth

We recommend that you synth the smallest subset of tables needed to produce complete queries. This reduces the context length, which lowers cost and keeps the prompt within the limits of more models.
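The effect of a smaller synth can be sketched with Python's built-in `sqlite3` module. The `synth` function below is hypothetical and is not openquery's API; it simply collects the CREATE statements for the tables you name (and their indexes), which is roughly the schema text a language model would need to see. The table names are made up for the example.

```python
import sqlite3


def synth(conn: sqlite3.Connection, tables: list[str]) -> str:
    """Collect CREATE statements for the given tables and their indexes only."""
    placeholders = ", ".join("?" for _ in tables)
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        f"WHERE tbl_name IN ({placeholders}) AND sql IS NOT NULL "
        "ORDER BY type DESC, name",
        tables,
    ).fetchall()
    return ";\n".join(row[0] for row in rows)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id TEXT PRIMARY KEY, email TEXT);
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY,
        user_id TEXT REFERENCES users(id),
        total REAL
    );
    CREATE INDEX invoices_user_idx ON invoices (user_id);
    CREATE TABLE audit_log (id INTEGER PRIMARY KEY, payload TEXT);
""")

# Everything versus only what invoice questions need.
full = synth(conn, ["users", "invoices", "audit_log"])
small = synth(conn, ["users", "invoices"])
```

Here `small` omits `audit_log` entirely, so it is shorter than `full` while still containing everything needed to answer questions about users and invoices.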

Don't use PII

In most cases, you can rephrase your query to eliminate PII. Take the following example:

BAD

How many total invoices do we have for john.doe@gmail.com?

GOOD

How many total invoices do we have for user with id ea916801-2987-4f29-aab5-f2b1061dc8f4?

openquery's PII detection (listed as planned under Features) is meant to catch this kind of mistake.
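To make the idea concrete, here is a minimal sketch of a pre-flight check that flags email addresses in a question before it is sent to a language model. It is an assumption-laden illustration, not openquery's detector: the `find_emails` function and the regular expression are invented for this example, and real PII detection has to cover far more than emails.

```python
import re

# Deliberately simple email pattern; real PII detection covers far more.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_emails(question: str) -> list[str]:
    """Return every substring of the question that looks like an email address."""
    return EMAIL.findall(question)


bad = "How many total invoices do we have for john.doe@gmail.com?"
good = (
    "How many total invoices do we have for user with id "
    "ea916801-2987-4f29-aab5-f2b1061dc8f4?"
)

bad_hits = find_emails(bad)    # the email address is flagged
good_hits = find_emails(good)  # nothing to flag
```

A caller could refuse to send the question, or ask the user to rephrase it, whenever the returned list is not empty.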
