Real-time Query for Hadoop
Languages: C++, Java, Python, C, Thrift, Shell, Other
Latest commit a94c616 (Feb 29, 2016), abehm committed with Internal Jenkins:
IMPALA-3084: Cache the sequence of table ref and materialized tuple ids during analysis.

The bug: For correct predicate assignment we rely on TableRef.getAllTupleIds()
and TableRef.getMaterializedTupleIds(). The implementation of those functions
used to traverse the chain of table refs and collect the appropriate ids.
However, during plan generation we alter the chain of table refs, in particular,
for dealing with nested collections, so those altered TableRefs do not return the
expected list of ids, leading to wrong decisions in predicate assignment.

The fix: Cache the lists of ids during analysis, so we are free to alter the
chain of TableRefs during plan generation.

Change-Id: I298b8695c9f26644a395ca9f0e86040e3f5f3846
Reviewed-on: http://gerrit.cloudera.org:8080/2415
Reviewed-by: Alex Behm <alex.behm@cloudera.com>
Tested-by: Internal Jenkins
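
The bug and fix described in this commit message can be illustrated with a small model. The following is a minimal sketch in Python, not the actual Java frontend code: the class below is a simplified, hypothetical stand-in for `TableRef`, and the `left` link, the `materialized` flag, `analyze()`, and `get_all_tuple_ids_uncached()` are assumptions made for illustration only. It shows why recomputing ids by walking the chain breaks once the chain is rewired, while ids cached during analysis stay stable.

```python
from typing import List, Optional


class TableRef:
    """Simplified, hypothetical model of a table ref in a chain of table refs."""

    def __init__(self, tuple_id: int, left: Optional["TableRef"] = None,
                 materialized: bool = True) -> None:
        self.tuple_id = tuple_id
        self.left = left                  # previous table ref in the chain, if any
        self.materialized = materialized  # assumption: one flag per table ref
        self._all_tuple_ids: Optional[List[int]] = None
        self._materialized_tuple_ids: Optional[List[int]] = None

    def _walk_chain(self) -> List["TableRef"]:
        """Collect this ref and everything to its left, leftmost first."""
        chain = []
        ref: Optional["TableRef"] = self
        while ref is not None:
            chain.append(ref)
            ref = ref.left
        chain.reverse()
        return chain

    def analyze(self) -> None:
        """Cache both id lists once, while the chain is still intact."""
        chain = self._walk_chain()
        self._all_tuple_ids = [r.tuple_id for r in chain]
        self._materialized_tuple_ids = [r.tuple_id for r in chain if r.materialized]

    def get_all_tuple_ids(self) -> List[int]:
        if self._all_tuple_ids is None:
            raise RuntimeError("analyze() must be called first")
        return list(self._all_tuple_ids)

    def get_materialized_tuple_ids(self) -> List[int]:
        if self._materialized_tuple_ids is None:
            raise RuntimeError("analyze() must be called first")
        return list(self._materialized_tuple_ids)

    def get_all_tuple_ids_uncached(self) -> List[int]:
        """The pre-fix behavior in this model: re-traverse the chain on every call."""
        return [r.tuple_id for r in self._walk_chain()]


# Build a chain a <- b <- c and analyze it.
a = TableRef(0)
b = TableRef(1, left=a, materialized=False)
c = TableRef(2, left=b)
c.analyze()

# Simulate plan generation altering the chain of table refs.
c.left = None

cached_ids = c.get_all_tuple_ids()             # unaffected by the rewiring
uncached_ids = c.get_all_tuple_ids_uncached()  # now misses the refs to the left
```

In this model, the cached lists still describe the chain as it was analyzed, so later decisions that depend on them do not change when the chain is altered.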
Name             Last updated  Last commit message
be               Jul 21, 2016  IMPALA-3854: Fix use-after-free in HdfsTextScanner::Close()
bin              Jul 21, 2016  Adjust ASF push script names to match our repo setup.
cmake_modules    Jun 7, 2016   IMPALA-3223: Removal of non-toolchain builds.
common           Jul 18, 2016  IMPALA-3575: Add retry to backend connection request and rpc timeout
ext-data-source  May 2, 2016   IMPALA-3384: add missing frontend -> ext-data-source dependency.
fe               Jul 23, 2016  IMPALA-3084: Cache the sequence of table ref and materialized tuple i…
infra            Jul 21, 2016  IMPALA-3886: Improve log of pip_download.py
llvm-ir          Feb 10, 2016  Misc. codegen utilties
shell            Jul 13, 2016  IMPALA-1671: Print time and link to coordinator web UI once query is …
ssh_keys         Jan 8, 2014   Move ssh keys from bin directory to fix packaging build break
testdata         Jul 23, 2016  IMPALA-3084: Cache the sequence of table ref and materialized tuple i…
tests            Jul 22, 2016  IMPALA-3864: qgen: reduce likelihood of create_query() exceptions
thirdparty       Jun 17, 2016  Update thirdparty dependencies
www              Jul 21, 2016  IMPALA-3716: Add Memory Tab in query's Details page
.gitignore       May 9, 2016   Add .impala_compiler_opts to .gitignore
CMakeLists.txt   Jun 21, 2016  IMPALA-3223: Supports download of CDH components from S3.
LICENSE.txt      May 8, 2014   Add text of Apache license
LOGS.md          Mar 28, 2016  Consolidate test and cluster logs under a single directory.
NOTICE.txt       Jul 2, 2014   Add NOTICE.txt file to Impala repo
README.md        Mar 23, 2015  Fix link syntax for README.md
buildall.sh      Jun 22, 2016  IMPALA-3762: Download Python requirements before they are needed.

README.md

Welcome to Impala

Lightning-fast, distributed SQL queries for petabytes of data stored in Apache Hadoop clusters.

Impala is a modern, massively distributed, massively parallel C++ query engine that lets you analyze, transform and combine data from a variety of data sources. Its features include:

  • Best-of-breed performance and scalability.
  • Support for data stored in HDFS, Apache HBase and Amazon S3.
  • Wide analytic SQL support, including window functions and subqueries.
  • On-the-fly code generation using LLVM to generate CPU-efficient code tailored specifically to each individual query.
  • Support for the most commonly used Hadoop file formats, including the Apache Parquet (incubating) project.
  • Apache-licensed, 100% open source.

More about Impala

To learn more about Impala as a business user, or to try Impala live or in a VM, please visit the Impala homepage.

If you are interested in contributing to Impala as a developer, or learning more about Impala's internals and architecture, visit the Impala wiki.