pydrill


Python Driver for Apache Drill.

Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage

Features

  • Python 2/3 compatibility
  • Support for all REST API calls, including profiles, options, and metrics (see the docs for the full list)
  • Mapping of results to native Python types
  • Compatibility with pandas DataFrames
  • Drill authentication using PAM
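
For context on what the driver wraps, a query against Drill's REST interface can be sketched with the standard library alone. This is a minimal illustration, not pydrill's implementation: it assumes Python 3, a Drill instance on `localhost:8047`, and Drill's `/query.json` endpoint; the helper names `build_query_request` and `run_query` are hypothetical.

```python
import json
import urllib.request


def build_query_request(sql, host="localhost", port=8047):
    """Build (but do not send) a POST request for Drill's /query.json endpoint."""
    # Drill expects a JSON body naming the query type and the SQL text.
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url="http://%s:%d/query.json" % (host, port),
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(sql, host="localhost", port=8047, timeout=10):
    """Send the query and return the decoded JSON body (needs a running Drill)."""
    request = build_query_request(sql, host, port)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Building the request separately from sending it makes the payload easy to inspect without a running server. In practice pydrill handles this, plus result mapping and authentication, for you.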

Installation

Released version from PyPI (https://pypi.python.org/pypi/pydrill):

$ pip install pydrill

Latest version from git:

$ pip install git+https://github.com/PythonicNinja/pydrill.git

Sample usage

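from pydrill.client import PyDrill
from pydrill.exceptions import ImproperlyConfigured

# Point the client at a local Drill instance
drill = PyDrill(host='localhost', port=8047)

# Fail early if the Drill server is not reachable
if not drill.is_active():
    raise ImproperlyConfigured('Please run Drill first')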

yelp_reviews = drill.query('''
  SELECT * FROM
  `dfs.root`.`./Users/macbookair/Downloads/yelp_dataset_challenge_academic_dataset/yelp_academic_dataset_review.json`
  LIMIT 5
''')

# Each result row is accessed by column name
for result in yelp_reviews:
    print("%s: %s" % (result['type'], result['date']))


# Convert the results to a pandas DataFrame (requires pandas)

df = yelp_reviews.to_dataframe()
print(df[df['stars'] > 3])
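
The sample above raises immediately when Drill is not running. If Drill may still be starting up (for example, in a freshly launched container), a small retry loop around `is_active()` can help. The following is a sketch only: `wait_until_active` is a hypothetical helper, not part of pydrill, and the attempt count and delay are arbitrary defaults.

```python
import time


def wait_until_active(is_active, attempts=5, delay=2.0, sleep=time.sleep):
    """Call is_active() up to `attempts` times; return True once it succeeds."""
    for attempt in range(attempts):
        if is_active():
            return True
        # Pause between attempts, but not after the final one.
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

With the client from the sample, this would be used as `wait_until_active(drill.is_active)`, raising `ImproperlyConfigured` only if it returns `False`. Passing the check as a callable keeps the helper independent of pydrill and easy to test.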