PyHive is a collection of Python DB-API and SQLAlchemy interfaces for Presto and Hive.
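from pyhive import presto
# Open a DB-API connection to a Presto coordinator and get a cursor
cursor = presto.connect('localhost').cursor()
cursor.execute('SELECT * FROM my_awesome_data LIMIT 10')
print(cursor.fetchone())  # the first row
print(cursor.fetchall())  # all remaining rows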
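fetchall() loads the whole result set into memory. For large queries, a minimal sketch that pages through results with the standard DB-API fetchmany() call may help. iter_rows and batch_size are hypothetical names, and the standard-library sqlite3 module is used only as a stand-in connection so the sketch runs without a Presto server; since fetchmany() is part of DB-API 2.0, a PyHive cursor is assumed to accept the same calls.

```python
import sqlite3


def iter_rows(cursor, batch_size=1000):
    """Yield rows from an already-executed DB-API cursor, batch_size at a time."""
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:  # an empty batch means the result set is exhausted
            return
        for row in batch:
            yield row


# sqlite3 stands in for presto.connect('localhost') so the sketch runs offline
# (sqlite3 uses ? placeholders, unlike PyHive's pyformat style)
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()
cursor.execute('CREATE TABLE my_awesome_data (x INTEGER)')
cursor.executemany('INSERT INTO my_awesome_data VALUES (?)', [(i,) for i in range(5)])
cursor.execute('SELECT x FROM my_awesome_data ORDER BY x')
print([row[0] for row in iter_rows(cursor, batch_size=2)])  # [0, 1, 2, 3, 4]
```

Only batch_size rows are held at a time, so memory use stays bounded regardless of how many rows the query returns.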
First install this package to register it with SQLAlchemy (see setup.py).
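from sqlalchemy import MetaData, Table, func, select
from sqlalchemy.engine import create_engine
engine = create_engine('presto://localhost:8080/hive/default')
# Reflect the table's columns from the server (autoload_with replaces bound MetaData)
logs = Table('my_awesome_data', MetaData(), autoload_with=engine)
# SELECT count(*) FROM my_awesome_data; the select(expr) form needs SQLAlchemy 1.4+
with engine.connect() as conn:
    print(conn.execute(select(func.count()).select_from(logs)).scalar())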
Note: query generation functionality is not exhaustive or fully tested, but there should be no problem with raw SQL.
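The Presto engine URL above has the layout presto://host:port/catalog/schema. A small standard-library sketch that splits such a URL into its parts can make that layout explicit. split_presto_url is a hypothetical helper, not part of PyHive or SQLAlchemy; reading the first path segment as the catalog and the second as the schema follows the example URL (hive catalog, default schema), and the fallbacks to port 8080 and catalog hive are assumptions about the dialect's defaults.

```python
try:
    from urllib.parse import urlsplit  # Python 3
except ImportError:
    from urlparse import urlsplit  # Python 2.7


def split_presto_url(url):
    """Split presto://host:port/catalog/schema into its parts (hypothetical helper)."""
    parts = urlsplit(url)
    if parts.scheme != 'presto':
        raise ValueError('expected a presto:// URL, got %r' % url)
    # Path segments after the host: assumed to be the catalog, then an optional schema
    segments = [s for s in parts.path.split('/') if s]
    if len(segments) > 2:
        raise ValueError('expected at most catalog/schema in the path, got %r' % parts.path)
    return {
        'host': parts.hostname,
        'port': parts.port or 8080,  # assumed default Presto port
        'catalog': segments[0] if segments else 'hive',  # assumed default catalog
        'schema': segments[1] if len(segments) == 2 else None,
    }


print(split_presto_url('presto://localhost:8080/hive/default'))
```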
Hive session configuration can be passed when connecting (this does not apply to Presto):
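from pyhive import hive
from sqlalchemy import create_engine
# DB-API: pass Hive settings through the configuration dict
hive.connect('localhost', configuration={'hive.exec.reducers.max': '123'})
# SQLAlchemy: connect_args forwards the same dict to the DB-API connect()
create_engine(
    'hive://user@host:10000/database',
    connect_args={'configuration': {'hive.exec.reducers.max': '123'}},
)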
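The setting above is written as the string '123' rather than the integer 123. HiveServer2's Thrift session-open request declares its configuration as a string-to-string map, so a small sketch that normalizes a Python dict before passing it on may be convenient. The helper name hive_configuration and the choice to render booleans as lowercase 'true'/'false' are assumptions for illustration, not PyHive behavior.

```python
def hive_configuration(settings):
    """Return a copy of settings with every key and value converted to str.

    Hypothetical helper: booleans become 'true'/'false' (Hive's spelling),
    everything else goes through str().
    """
    normalized = {}
    for key, value in settings.items():
        if isinstance(value, bool):  # str(True) would give 'True', so special-case bools
            value = 'true' if value else 'false'
        normalized[str(key)] = str(value)
    return normalized


config = hive_configuration({'hive.exec.reducers.max': 123, 'hive.exec.parallel': True})
print(config)
# With a running HiveServer2 this could then be passed as in the examples above:
#   hive.connect('localhost', configuration=config)
#   create_engine('hive://user@host:10000/database', connect_args={'configuration': config})
```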
Install using
- pip install pyhive[hive] for the Hive interface and
- pip install pyhive[presto] for the Presto interface.
PyHive works with
- Python 2.7
- For Presto: a Presto installation
- For Hive: HiveServer2 daemon
There's also a third-party Conda package.
Run the following in an environment with Hive/Presto:
./scripts/make_test_tables.sh
virtualenv --no-site-packages env
source env/bin/activate
pip install -e .
pip install -r dev_requirements.txt
py.test
WARNING: This drops/creates tables named one_row, one_row_complex, and many_rows, plus a database called pyhive_test_database.
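Because the setup script drops and recreates these tables, it may be worth checking the target environment first. A minimal sketch, assuming any DB-API cursor (for example one from hive.connect or presto.connect) and that SHOW TABLES returns one table name per row in the first column, covering the current database or schema only. find_conflicting_tables and FakeCursor are hypothetical names; the stub cursor only lets the sketch run without a server.

```python
TEST_TABLES = frozenset(['one_row', 'one_row_complex', 'many_rows'])


def find_conflicting_tables(cursor, test_tables=TEST_TABLES):
    """Return, sorted, the existing table names that the test setup would drop."""
    cursor.execute('SHOW TABLES')
    existing = set(row[0] for row in cursor.fetchall())  # assumes the name is in column 0
    return sorted(existing & set(test_tables))


class FakeCursor(object):
    """Stand-in for a PyHive cursor so the sketch runs offline."""

    def __init__(self, rows):
        self._rows = rows
        self.executed = []

    def execute(self, query):
        self.executed.append(query)

    def fetchall(self):
        return list(self._rows)


cursor = FakeCursor([('many_rows',), ('web_logs',), ('one_row',)])
conflicts = find_conflicting_tables(cursor)
if conflicts:
    print('make_test_tables.sh would drop: ' + ', '.join(conflicts))
```

The check does not cover the pyhive_test_database database, which would need a separate query.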