Skip to content

SQL Support

Alessandro Luccaroni edited this page Jun 26, 2019 · 9 revisions

HerdDB is born due to the need to port existing JDBC application to a distributed, shared nothing, embeddable database, and so even if it is a key-value database at the low level the most commonly used API is the JDBC/SQL interface.

This is a brief description of SQL/JDBC supported functions If you need to support any of the SQL costructs just ask to the dev team on the mailing-list, file a GitHub issue or a JIRA issue

  • CREATE/DROP TABLE
  • BEGIN/COMMIT/ROLLBACK TRANSACTION
  • ALTER TABLE ADD/DROP/MODIFY COLUMN
  • INSERT INTO TABLE (....) VALUES(....)
  • SELECT .... FROM TABLE WHERE PK=? (lookup using Primary Key is the best data access pattern!!)
  • UPDATE TABLE SET ... WHERE PK=? (lookup using Primary Key is the best data access pattern!!)
  • DELETE FROM TABLE ... WHERE PK=? (lookup using Primary Key is the best data access pattern!!)
  • UPDATE TABLE SET ... WHERE ......
  • SELECT .... FROM TABLE WHERE ......
  • DELETE FROM TABLE WHERE ....
  • EXPLAIN TABLE
  • SHOW CREATE TABLE
  • TRUNCATE TABLE
  • JDBC PreparedStatements (which query plan cache, leveraging JDBC Parameters)
  • SUBQUERIES which deal on other tables on the same tablespace
  • SUM/COUNT/MIN/MAX aggregated functions
  • UPPER/LOWER scalar functions
  • AUTO_INCREMENT for integer/long columns (only on single columns PKs)
  • CURRENT_TIMESTAMP MySQL virtual column

Supported types:

  • strings (backed by java UTF encoding)
  • byte (8bit signed integers)
  • int (32bit signed integers)
  • long (64bit signed integers)
  • float (32bit single precision floating point)
  • double (64bit double precision floating point)
  • java.sql.Timestamp (datetimes with millisecond precision)
  • blobs (byte arrays)

The supported list of datatype names is just the mapping from the various names to the list of the previous list raw datattypes, for instance VARCHAR2 maps to 'string'

Other notes:

  • identifiers are always non case-insensitive
  • JDBC ResultSet are never updatable and forwards-only
  • the same Connection can run multiple statements concurrently and keeps multiple ResultSets/cursors open
  • Transaction isolation is ALWAYS "COMMITTED_READ"
  • PrimaryKey access is the best data access path
  • When a DML statement spans multiple columns the selection is performed in transaction and then the mutation on each row (READ_COMMITTED semantics, not REPEATABLE_READ or SERIALIZABLE)
  • Every column is NULLABLE, except from the columns in the PK
  • NOT NULL constraint for integer, long and strings types
  • There is no constraint on string length (syntax is supported only for supporting existing scripts)
  • Joins are very limited and are alwayes performed using a simple nested-loop algorithm

We will be very pleased to add any requested feature, just file an issue on the GitHub bugtracker