Pythonic DSL for Apache Drill.
Apache Drill is a schema-free SQL query engine for Hadoop, NoSQL and cloud storage.
- Free software: MIT license
- Documentation: https://pydrill_dsl.readthedocs.org
- Uses Peewee syntax; examples of selecting data are in the Peewee docs.
- Supports all storage plugins
- Supports the pyodbc and pyDrill drivers
Released version from PyPI (https://pypi.python.org/pypi/pydrill_dsl):
$ pip install pydrill_dsl
Latest version from git:
$ pip install git+https://github.com/PythonicNinja/pydrill_dsl.git
# Assumption: Field and fn are imported from Peewee, whose syntax this DSL uses.
from peewee import Field, fn
from pydrill_dsl.resource import Resource


class Employee(Resource):
    first_name = Field()
    salary = Field()
    position_id = Field()
    department_id = Field()

    class Meta:
        storage_plugin = 'cp'
        path = 'employee.json'
        # By default the pydrill driver is used.
        # Example of using pyodbc instead:
        # database = Drill({'dsn': 'Driver=/opt/mapr/drillodbc/lib/universal/libmaprdrillodbc.dylib;ConnectionType=Direct;Host=127.0.0.1;Port=31010;Catalog=DRILL;AuthenticationType=Basic Authentication;AdvancedProperties=CastAnyToVarchar=true;HandshakeTimeout=5;QueryTimeout=180;TimestampTZDisplayTimezone=utc;ExcludedSchemas=sys,INFORMATION_SCHEMA;NumberOfPrefetchBuffers=5;UID=[USERNAME];PWD=[PASSWORD]'})


# Filter with keyword lookups.
Employee.select().filter(salary__gte=17000)

# Fetch the first page, five rows per page.
Employee.select().paginate(page=1, paginate_by=5)

# Combine expressions with & to build a WHERE clause.
salary_gte_17K = (Employee.salary >= 17000)
salary_lte_25K = (Employee.salary <= 25000)
Employee.select().where(salary_gte_17K & salary_lte_25K)

# Aggregate the minimum and maximum salary into a tuple.
Employee.select(
    fn.Min(Employee.salary).alias('salary_min'),
    fn.Max(Employee.salary).alias('salary_max')
).scalar(as_tuple=True)

# A resource can also be created without defining a class:
employee = Resource(storage_plugin='cp', path='employee.json',
                    fields=('first_name', 'salary', 'position_id', 'department_id'))
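To give a rough idea of what the filter and paginate calls above ask Drill to do, here is a minimal standalone sketch. It is not the code pydrill_dsl or Peewee runs, and the SQL they generate may differ. The helper names `lookup_to_sql`, `page_to_limit_offset` and `build_query` are hypothetical. The sketch assumes that a `field__op` keyword maps to a comparison and that pages are 1-based, as in Peewee's `paginate`.

```python
# Illustrative only: NOT pydrill_dsl's or Peewee's implementation.
# All helper names below are hypothetical.
OPERATORS = {"eq": "=", "gte": ">=", "lte": "<=", "gt": ">", "lt": "<"}


def lookup_to_sql(lookup, value):
    """Turn a keyword such as salary__gte=17000 into 'salary >= 17000'."""
    field, _, op = lookup.partition("__")
    # A bare field name (no "__op" suffix) is treated as an equality test.
    return f"{field} {OPERATORS[op or 'eq']} {value!r}"


def page_to_limit_offset(page, paginate_by):
    """Pages are 1-based, so page 1 starts at offset 0."""
    return paginate_by, (page - 1) * paginate_by


def build_query(table, page=1, paginate_by=20, **lookups):
    """Assemble a SELECT with optional WHERE plus LIMIT/OFFSET."""
    where = " AND ".join(lookup_to_sql(k, v) for k, v in lookups.items())
    limit, offset = page_to_limit_offset(page, paginate_by)
    sql = f"SELECT * FROM {table}"
    if where:
        sql += f" WHERE {where}"
    return f"{sql} LIMIT {limit} OFFSET {offset}"


# storage_plugin='cp' and path='employee.json' correspond to the Drill
# table cp.`employee.json`.
print(build_query("cp.`employee.json`", page=1, paginate_by=5, salary__gte=17000))
# prints: SELECT * FROM cp.`employee.json` WHERE salary >= 17000 LIMIT 5 OFFSET 0
```

The sketch interpolates values directly into the SQL string for readability; real query builders pass them as parameters instead.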