Python DB API 2.0 client for Impala and Hive (HiveServer2 protocol)
Latest commit 94a8eff Feb 13, 2017 @tdhopper tdhopper committed with wesm Pin Python thrift at <= 0.9.3 (#246)
thrift 0.10.0 package has a breaking change. #235


Python client for HiveServer2 implementations of distributed query engines (e.g., Impala, Hive).

For higher-level Impala functionality, including a Pandas-like interface over distributed data sets, see the Ibis project.

Features


  • HiveServer2 compliant; works with Impala and Hive, including nested data

  • Fully DB API 2.0 (PEP 249)-compliant Python client (similar to sqlite or MySQL clients) supporting Python 2.6+ and Python 3.3+.

  • Works with Kerberos, LDAP, SSL

  • SQLAlchemy connector

  • Converter to pandas DataFrame, allowing easy integration into the Python data stack (including scikit-learn and matplotlib); but see the Ibis project for a richer experience

Dependencies

Required:



  • Python 2.6+ or 3.3+

  • six, bitarray

  • thrift (on Python 2.x) or thriftpy (on Python 3.x)

For Hive and/or Kerberos support:

pip install thrift_sasl
pip install sasl

Optional:


  • pandas for conversion to DataFrame objects; but see the Ibis project instead

  • sqlalchemy for the SQLAlchemy engine

  • pytest for running tests; unittest2 for testing on Python 2.6

Installation


Install the latest release (0.13.1) with pip:

pip install impyla

For the latest (dev) version, install directly from the repo:

pip install git+

or clone the repo:

git clone
cd impyla
python setup.py install

Running the tests

impyla uses the pytest toolchain, and depends on the following environment variables:

export IMPYLA_TEST_PORT=21050
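
A test fixture can read such settings from the environment. The sketch below covers only IMPYLA_TEST_PORT, the variable shown above; the helper name impyla_test_port and the fallback to 21050 are assumptions made for illustration, not impyla's actual test code.

```python
import os


def impyla_test_port(environ=None, default=21050):
    """Return the port from IMPYLA_TEST_PORT as an int (hypothetical helper)."""
    # Accept a mapping so the helper can be exercised without touching os.environ.
    environ = os.environ if environ is None else environ
    # Environment values are strings, so convert before handing to connect().
    return int(environ.get('IMPYLA_TEST_PORT', default))


port = impyla_test_port({'IMPYLA_TEST_PORT': '21050'})
print(port)
```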

To run the maximal set of tests, run

cd path/to/impyla
py.test --connect impyla

Leave out the --connect option to skip tests for DB API compliance.

Usage


Impyla implements the Python DB API v2.0 (PEP 249) database interface (refer to it for API details):

from impala.dbapi import connect
conn = connect(host='', port=21050)
cursor = conn.cursor()
cursor.execute('SELECT * FROM mytable LIMIT 100')
print(cursor.description)  # prints the result set's schema
results = cursor.fetchall()
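
Because the interface is PEP 249, the same connect / cursor / execute / fetch sequence can be tried without a cluster against any other compliant driver. The following is a minimal sketch that uses the standard library's sqlite3 module as a stand-in for impala.dbapi. The helper name rows_as_dicts and the table contents are made up for illustration; only the first item of each cursor.description entry (the column name) is used.

```python
import sqlite3


def rows_as_dicts(cursor):
    """Pair each fetched row with the column names from cursor.description."""
    # PEP 249: description is a sequence of 7-item sequences; item 0 is the name.
    names = [column[0] for column in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]


# sqlite3 stands in for impala.dbapi here; with impyla the connection would
# come from impala.dbapi.connect(...) instead.
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE mytable (id INTEGER, name TEXT)')
cursor.executemany('INSERT INTO mytable VALUES (?, ?)', [(1, 'a'), (2, 'b')])
cursor.execute('SELECT * FROM mytable ORDER BY id LIMIT 100')
records = rows_as_dicts(cursor)
print(records)
conn.close()
```

The helper relies only on cursor.description and fetchall(), so it should carry over to any cursor that follows the DB API.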

The Cursor object also exposes the iterator interface, which is buffered (controlled by cursor.arraysize):

cursor.execute('SELECT * FROM mytable LIMIT 100')
for row in cursor:
    print(row)  # handle each row here
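
Buffered iteration of this kind can be pictured as a loop around fetchmany. This is a sketch of the general PEP 249 pattern, not impyla's actual implementation; iter_rows is a hypothetical helper, and sqlite3 from the standard library stands in for an Impala connection.

```python
import sqlite3


def iter_rows(cursor, batch_size=None):
    """Yield rows one at a time while fetching them from the cursor in batches."""
    size = batch_size or cursor.arraysize
    while True:
        batch = cursor.fetchmany(size)
        if not batch:
            # An empty batch means the result set is exhausted.
            break
        for row in batch:
            yield row


conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE mytable (n INTEGER)')
cursor.executemany('INSERT INTO mytable VALUES (?)', [(i,) for i in range(250)])
cursor.arraysize = 100  # rows requested per fetchmany call
cursor.execute('SELECT n FROM mytable ORDER BY n')
total = sum(1 for _ in iter_rows(cursor))
print(total)
conn.close()
```

A larger arraysize means fewer round trips per result set at the cost of holding more rows in memory at once.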

You can also get the results back as a pandas DataFrame object:

from impala.util import as_pandas
df = as_pandas(cursor)
# carry df through scikit-learn, for example