Python Driver for Apache Drill.
Apache Drill is a schema-free SQL query engine for Hadoop, NoSQL and cloud storage.
- Free software: MIT license
- Documentation: https://pydrill.readthedocs.org
- Python 2/3 compatibility
- Support for all REST API calls, including profiles, options and metrics (see the documentation for the full list)
- Mapping of results to native Python types
- Compatibility with pandas DataFrames
- Drill authentication using PAM
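The REST support listed above sits on top of Drill's HTTP interface. As a rough illustration of what one such call looks like on the wire, the sketch below builds (but does not send) a request for Drill's `POST /query.json` endpoint using only the Python 3 standard library. The helper name `build_query_request` is hypothetical and is not part of pydrill's API; host and port default to the `localhost:8047` used in the sample usage.

```python
import json
import urllib.request


def build_query_request(sql, host="localhost", port=8047):
    """Build (but do not send) a POST request for Drill's /query.json endpoint.

    Hypothetical helper for illustration only; pydrill handles this for you.
    """
    url = "http://%s:%d/query.json" % (host, port)
    # Drill's REST query endpoint takes a JSON body with the query type and SQL text.
    body = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_query_request("SELECT * FROM sys.drillbits")
print(req.get_method(), req.full_url)

# Actually sending the request needs a running Drillbit, e.g.:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["rows"])
```

Building the request separately from sending it keeps the example runnable without a Drill server.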
Install the released version from PyPI (https://pypi.python.org/pypi/pydrill):
$ pip install pydrill
Install the latest development version from git:
$ pip install git+https://github.com/PythonicNinja/pydrill.git
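Sample usage:

from pydrill.client import PyDrill
from pydrill.exceptions import ImproperlyConfigured

# Connect to Drill's REST interface (8047 is Drill's default web port)
drill = PyDrill(host='localhost', port=8047)

if not drill.is_active():
    raise ImproperlyConfigured('Please run Drill first')

# Query a local JSON file through the dfs.root workspace
yelp_reviews = drill.query('''
  SELECT * FROM
  `dfs.root`.`./Users/macbookair/Downloads/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_review.json`
  LIMIT 5
''')

# Each result row is indexed by column name
for result in yelp_reviews:
    print("%s: %s" % (result['type'], result['date']))

# pandas dataframe (requires pandas to be installed)
df = yelp_reviews.to_dataframe()
print(df[df['stars'] > 3])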
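If pandas is not available, the same `stars > 3` filter can be done in plain Python. The sketch below assumes a response shaped like Drill's `query.json` output (`{"columns": [...], "rows": [...]}`) in which values may arrive as strings, depending on the Drill version. The `sample_response` data is hand-written for illustration, not real Yelp output, and `to_python` with its int-then-float rule is a hypothetical helper, not how pydrill maps types internally.

```python
# Hypothetical, hand-written sample shaped like a Drill query.json response.
# Not real Yelp data; values are strings to mimic a plain JSON payload.
sample_response = {
    "columns": ["type", "date", "stars"],
    "rows": [
        {"type": "review", "date": "2011-01-26", "stars": "5"},
        {"type": "review", "date": "2011-07-27", "stars": "2"},
        {"type": "review", "date": "2012-06-14", "stars": "4"},
    ],
}


def to_python(value):
    """Best-effort conversion of a string value to int, then float.

    Hypothetical helper; anything that does not parse is returned unchanged.
    """
    for cast in (int, float):
        try:
            return cast(value)
        except (TypeError, ValueError):
            pass
    return value


# Convert every column of every row, keeping the column order from "columns".
rows = [
    {col: to_python(row.get(col)) for col in sample_response["columns"]}
    for row in sample_response["rows"]
]

# Equivalent of df[df['stars'] > 3], without pandas.
good_reviews = [row for row in rows if row["stars"] > 3]
for row in good_reviews:
    print("%s: %s" % (row["type"], row["date"]))
```

A blanket string-to-number rule like this is crude (it would also turn a numeric-looking ID into an int); with real data, converting only the columns you know to be numeric is safer.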