Real-time Query for Hadoop
C++ Java Python Thrift C Shell Other

Python: Bootstrap a virtualenv and add impala-python command

This adds a bootstrap script and an "impala-python" command to
$IMPALA_HOME/bin that automatically runs the bootstrap and redirects to
the virtualenv python. Existing python scripts will later be updated to
use this new "impala-python" command.

The bootstrap script will build a virtualenv to ensure a minimum python
version (2.6) and a well-known set of dependencies. The bootstrap script
itself can be run with python 2.4, but 2.6 must already be installed on
the system. The resulting virtualenv will use 2.6 at a minimum.
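
The version rule above can be illustrated with a minimal sketch. It is not
the bootstrap script from this commit: parse_version and pick_interpreter
are hypothetical helpers, the candidate paths are made up, and the code is
written for a current Python rather than 2.4. It assumes the caller has
already collected each interpreter's "--version" output.

```python
def parse_version(text):
    """Parse 'Python 2.6.9' style output into a tuple such as (2, 6, 9)."""
    number = text.strip().split()[-1]
    # Non-numeric components (for example '18+') are dropped.
    return tuple(int(part) for part in number.split(".") if part.isdigit())


def pick_interpreter(candidates, minimum=(2, 6)):
    """Return the first path whose version is >= minimum, else None.

    candidates is a list of (path, version_output) pairs; both the paths
    and the way they were discovered are assumptions of this sketch.
    """
    for path, version_output in candidates:
        if parse_version(version_output) >= minimum:
            return path
    return None
```

A bootstrap that starts under 2.4 could use a check like this to locate a
2.6 or newer interpreter and hand it to virtualenv, failing early when
pick_interpreter returns None.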

Only dependencies explicitly listed in requirements.txt will be
installed and available (no system packages will ever be used). No
packages will ever be downloaded when setting up the virtualenv. In the
future new dependencies can be added by editing the requirements.txt
file. Installation through requirements.txt is a standard pip feature.
When requirements.txt is updated, the next run of "impala-python" will
rebuild the virtualenv.
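
As an illustration only, the rebuild-on-change and no-download behavior
could be sketched as below. The commit message does not say how a changed
requirements.txt is detected or which pip options are used, so the content
hash, the stamp file name, and every function name here are assumptions.
The pip options "--no-index" and "--find-links" are standard pip features
for installing from a local directory instead of a package index. The
sketch targets a current Python 3, not 2.4 or 2.6.

```python
import hashlib
import os


def requirements_digest(requirements_path):
    """Return a SHA-256 hex digest of the requirements file contents."""
    with open(requirements_path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def needs_rebuild(venv_dir, requirements_path, stamp_name="requirements.sha256"):
    """True when no stamp exists or it differs from the current requirements."""
    stamp_path = os.path.join(venv_dir, stamp_name)
    if not os.path.isfile(stamp_path):
        return True
    with open(stamp_path) as f:
        return f.read().strip() != requirements_digest(requirements_path)


def record_build(venv_dir, requirements_path, stamp_name="requirements.sha256"):
    """Write the stamp file after a successful (re)build of the virtualenv."""
    os.makedirs(venv_dir, exist_ok=True)
    with open(os.path.join(venv_dir, stamp_name), "w") as f:
        f.write(requirements_digest(requirements_path))


def build_install_command(pip_path, requirements_path, packages_dir):
    """Build a pip command that installs only from a local directory.

    --no-index stops pip from contacting a package index, and
    --find-links points it at packages_dir (a hypothetical location).
    """
    return [pip_path, "install", "--no-index",
            "--find-links", packages_dir, "-r", requirements_path]
```

A wrapper in the spirit of "impala-python" would call needs_rebuild first,
run the install command and record_build only when it returns True, and
then hand control to the virtualenv's python.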

Change-Id: I150595d7e09a45d5f2e3c30a845bc8d6a761eeed
Reviewed-on: http://gerrit.cloudera.org:8080/424
Reviewed-by: Casey Ching <casey@cloudera.com>
Tested-by: Internal Jenkins
latest commit 6a3af6747e
caseyching authored; Internal Jenkins committed
be [RM] Init Llama client cache metrics
bin Python: Bootstrap a virtualenv and add impala-python command
cmake_modules Making CMake modules more modular for non-toolchain build
common [RM] Init Llama client cache metrics
ext-data-source Upgrade a few important mvn plugins.
fe IMPALA-1136: Support loading Avro tables without an explicit Avro schema
infra/python Python: Bootstrap a virtualenv and add impala-python command
llvm-ir Move IR cross compile output to a better folder for packaging.
shell IMPALA-2143: Avoid sending auth credentials over insecure connections
ssh_keys Move ssh keys from bin directory to fix packaging build break
testdata IMPALA-1136: Support loading Avro tables without an explicit Avro schema
tests IMPALA-2143: Avoid sending auth credentials over insecure connections
thirdparty Add sentry-1.5.1-cdh5.5.0 to thirdparty.
www Add HdrHistogram and HistogramMetric
.gitignore Add MetricDefs, static definitions of metric metadata generated from …
CMakeLists.txt Optional Impala Toolchain
LICENSE.txt Add text of Apache license
NOTICE.txt Add NOTICE.txt file to Impala repo
README.md Fix link syntax for README.md
buildall.sh Clean stale python object files and cached directories in buildall.

README.md

Welcome to Impala

Lightning-fast, distributed SQL queries for petabytes of data stored in Apache Hadoop clusters.

Impala is a modern, massively-distributed, massively-parallel, C++ query engine that lets you analyze, transform and combine data from a variety of data sources:

  • Best-of-breed performance and scalability.
  • Support for data stored in HDFS, Apache HBase and Amazon S3.
  • Wide analytic SQL support, including window functions and subqueries.
  • On-the-fly code generation using LLVM to generate CPU-efficient code tailored specifically to each individual query.
  • Support for the most commonly used Hadoop file formats, including the Apache Parquet (incubating) project.
  • Apache-licensed, 100% open source.

More about Impala

To learn more about Impala as a business user, or to try Impala live or in a VM, please visit the Impala homepage.

If you are interested in contributing to Impala as a developer, or learning more about Impala's internals and architecture, visit the Impala wiki.
