PyEXASOL public interface is similar to PEP-249 DB-API 2.0 specification, but it does not strictly follow it. This page explains the reasons behind this decision.
If you absolutely need DB-API 2.0 compatibility, you may use TurbODBC instead.
PEP-249 was originally created for general purpose OLTP row store databases running on single server: SQLite, MySQL, PostgreSQL, MSSQL, Oracle, etc.
It does not work very well for OLAP columnar databases (like Exasol) running on multiple servers because it was never designed for this purpose. Despite both OLTP DBMS and OLAP DBMS use SQL for communication, the foundation and usage patterns are completely different.
When people use DB-API 2.0 drivers, they tend to skip manuals and automatically apply OLTP usage patterns without even realizing how much they lose in terms of performance and efficiency.
The good example is TurbODBC. Very few know that it is possible to fetch data as NumPy arrays and as Apache Arrow.
Minor intentional incompatibilities with DB-API 2.0 force users to look through manual and to learn about better ways of getting the job done.
- Default
autocommit=off
prevents indexes from being stored permanently on disk forSELECT
statements; - Default
autocommit=off
may hold transaction for a long time (e.g. opened connection in IPython notebook); - Python object creation and destruction overhead is very significant when you process large amounts of data;
- Functions
fetchmany()
andexecutemany()
has significant additional overhead related to JSON serialisation; - Exasol WebSocket protocol provides more information about columns than normally available in
.description
property ofcursor
;
We also wanted to discourage:
- "Drop-in" replacements of other Exasol drivers without reading manual;
- Usage of OLTP-oriented ORM (e.g. SQLAlchemy, Django);
Unlike common OLTP databases, each OLAP database is very unique. It is important to understand implementation details and features of specific database and to build application around those features. Generalisation of any kind and "copy-paste" approach may lead to abysmal performance in trivial cases.
Find cursor()
calls:
cur = C.cursor()
cur.execute('SELECT * FROM table')
data = cur.fetchall()
Replace with:
st = C.execute('SELECT * FROM table`)
data = st.fetchall()
Find .description
columns = list(map(str.lower, next(zip(*cur.description))))
Replace with:
columns = st.column_names()
Find all reads into pandas:
cur.execute('SELECT * FROM table')
pandas.DataFrame(cur.fetchall(), columns=columns)
Replace with:
C.export_to_pandas('SELECT * FROM table')
etc.
In order to make it easier to start using PyEXASOL, simple DB-API 2.0 wrapper is provided. It works for SELECT
statements without placeholders. Please see the example:
# Import "wrapped" version of PyEXASOL module
import pyexasol.db2 as E
C = E.connect(dsn=config.dsn, user=config.user, password=config.password, schema=config.schema)
# Cursor
cur = C.cursor()
cur.execute("SELECT * FROM users ORDER BY user_id LIMIT 5")
# Standard .description and .rowcount attributes
print(cur.description)
print(cur.rowcount)
# Standard fetching
while True:
row = cur.fetchone()
if row is None:
break
print(row)