DRILL-6381: Add support for index based planning and execution #1512

Merged: 8 commits, Oct 25, 2018

amansinha100
This PR replaces the original PR #1466. Please see that PR for all review comments and resolutions. I created this new PR after addressing the review comments, doing a full rebase, resolving merge conflicts, and squashing a subset of the commits. Pushing these changes to the original PR would have caused some comments to be lost, hence the new one.

rebase and others added 8 commits October 19, 2018 13:58
  1. Secondary Index planning interfaces and abstract classes like DBGroupScan, DbSubScan, IndexDescriptor etc.
  2. Statistics and Cost model interfaces/classes: PluginCost, Statistics, StatisticsPayload, AbstractIndexStatistics
  3. ScanBatch and RecordReader to support repeatable scan
  4. Secondary Index execution related interfaces: RangePartitionSender, RowKeyJoin, PartitionFunction
  5. MD-3979: Query using cast index plan fails with NPE
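
To make the RowKeyJoin idea in item 4 concrete, here is a minimal sketch of a non-covering index lookup: the index side yields rowkeys, and the primary-table side fetches only those rows. All class and method names below are hypothetical and simplified; they are not Drill's actual interfaces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: none of these types are Drill classes.
class RowKeyJoinSketch {

  // One secondary-index entry: the indexed column value plus the primary-table rowkey.
  static final class IndexEntry {
    final int indexedValue;
    final String rowKey;

    IndexEntry(int indexedValue, String rowKey) {
      this.indexedValue = indexedValue;
      this.rowKey = rowKey;
    }
  }

  // Index side: scan the index and keep the rowkeys whose indexed value matches the filter.
  static List<String> indexScan(List<IndexEntry> index, int wanted) {
    List<String> rowKeys = new ArrayList<>();
    for (IndexEntry e : index) {
      if (e.indexedValue == wanted) {
        rowKeys.add(e.rowKey);
      }
    }
    return rowKeys;
  }

  // Primary-table side: fetch only the rows named by the incoming rowkeys
  // (a stand-in for a restricted scan), preserving the rowkey order.
  static List<String> rowKeyJoin(List<String> rowKeys, Map<String, String> primaryTable) {
    List<String> rows = new ArrayList<>();
    for (String key : rowKeys) {
      String row = primaryTable.get(key);
      if (row != null) {
        rows.add(row);
      }
    }
    return rows;
  }
}
```

The point of the sketch is that the primary table is never scanned in full; it is touched only for rowkeys that the index produced.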

Co-authored-by: Aman Sinha <asinha@maprtech.com>
Co-authored-by: chunhui-shi <cshi@maprtech.com>
Co-authored-by: Gautam Parai <gparai@maprtech.com>
Co-authored-by: Padma Penumarthy <ppenumar97@yahoo.com>
Co-authored-by: Hanumath Rao Maduri <hmaduri@maprtech.com>

Conflicts:
	exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java
	exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java
	exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillTable.java
	protocol/src/main/java/org/apache/drill/exec/proto/UserBitShared.java
	protocol/src/main/java/org/apache/drill/exec/proto/beans/CoreOperatorType.java
	protocol/src/main/protobuf/UserBitShared.proto
  1. MD-3960: Update Drill to build with MapR-6.0.1 libraries
  2. MD-3995: Do not pushdown limit 0 past project with CONVERT_FROMJSON
  3. MD-4054: Restricted scan limit is changed to dynamically read rows using the rowcount of the right side instead of a fixed 4096.
  4. MD-3688: Impersonating a view owner doesn't work with security disabled in 6.0
  5. MD-4492: Missing limit pushdown changes in JsonTableGroupScan

Co-authored-by: chunhui-shi <cshi@maprtech.com>
Co-authored-by: Gautam Parai <gparai@maprtech.com>
Co-authored-by: Vlad Rozov <vrozov@mapr.com>

Conflicts:
	contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBFormatPlugin.java
	contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBSubScan.java
	contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/binary/BinaryTableGroupScan.java
	contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/CompareFunctionsProcessor.java
	contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/JsonConditionBuilder.java
	contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/JsonTableGroupScan.java
	contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/MaprDBJsonRecordReader.java
	exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PrelUtil.java
	pom.xml
…Secondary Indexes

  1. Index Planning Rules and Plan generators
    - DbScanToIndexScanRule: Top level physical planning rule that drives index planning for several relational algebra patterns.
    - DbScanSortRemovalRule: Physical planning rule for index planning for Sort-based operations.
    - Plan Generators: Covering, Non-Covering and Intersect physical plan generators.
    - Support planning with functional indexes such as CAST functions.
    - Enhance PlannerSettings with several configuration options for indexes.
  2. Index Selection and Statistics
    - An IndexSelector that supports cost-based index selection of covering and non-covering indexes using statistics and collation properties.
    - Costing of index intersection for comparison with single-index plans.
  3. Planning and execution operators
    - Support RangePartitioning physical operator during query planning and execution.
    - Support RowKeyJoin physical operator during query planning and execution.
    - HashTable and HashJoin changes to support RowKeyJoin and Index Intersection.
    - Enhance Materializer to keep track of subscan association with a particular rowkey join.
  4. Index Planning utilities
    - Utility classes to perform RexNode analysis, including conversion to and from SchemaPath.
    - Utility class to analyze a filter condition and an input collation to determine the output collation.
    - Helper classes to maintain index contexts for the logical and physical planning phases.
    - IndexPlanUtils utility class for various helper methods.
  5. Miscellaneous
    - Separate physical rel for DirectScan.
    - Modify LimitExchangeTranspose rule to handle SingleMergeExchange.
    - MD-3880: Return correct status from RangePartitionRecordBatch setupNewSchema
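
The covering versus non-covering distinction in item 2 can be sketched as follows. This is a toy cost model under stated assumptions, not Drill's IndexSelector: the class names, the two cost constants, and the row estimates are all hypothetical. A covering index holds every column the query references, while a non-covering index pays an extra per-row lookup back into the primary table.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: a toy stand-in for cost-based index selection.
class IndexSelectionSketch {
  // Assumed relative costs; real values would come from a plugin cost model.
  static final double INDEX_ROW_COST = 1.0;
  static final double ROWKEY_LOOKUP_COST = 4.0;

  static final class Candidate {
    final String name;
    final double estimatedRows;       // rows expected to match the filter
    final Set<String> availableCols;  // indexed plus included columns

    Candidate(String name, double estimatedRows, String... cols) {
      this.name = name;
      this.estimatedRows = estimatedRows;
      this.availableCols = new HashSet<>(Arrays.asList(cols));
    }

    // Covering: every column the query needs can be read from the index itself.
    boolean covers(Set<String> queryCols) {
      return availableCols.containsAll(queryCols);
    }

    // Non-covering plans add a primary-table lookup for each matching row.
    double cost(Set<String> queryCols) {
      double cost = estimatedRows * INDEX_ROW_COST;
      if (!covers(queryCols)) {
        cost += estimatedRows * ROWKEY_LOOKUP_COST;
      }
      return cost;
    }
  }

  // Pick the candidate with the lowest estimated cost; returns null for an empty list.
  static Candidate cheapest(List<Candidate> candidates, Set<String> queryCols) {
    Candidate best = null;
    for (Candidate c : candidates) {
      if (best == null || c.cost(queryCols) < best.cost(queryCols)) {
        best = c;
      }
    }
    return best;
  }
}
```

Under these assumed constants, a selective non-covering index can still beat a covering one when it matches far fewer rows, which is why the choice is cost-based rather than a fixed preference for covering indexes.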

Co-authored-by: Aman Sinha <asinha@maprtech.com>
Co-authored-by: chunhui-shi <cshi@maprtech.com>
Co-authored-by: Gautam Parai <gparai@maprtech.com>
Co-authored-by: Padma Penumarthy <ppenumar97@yahoo.com>
Co-authored-by: Hanumath Rao Maduri <hmaduri@maprtech.com>

Conflicts:
	exec/java-exec/src/main/java/org/apache/drill/exec/physical/config/HashJoinPOP.java
	exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/ScanBatch.java
	exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java
	exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTable.java
	exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableTemplate.java
	exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/HashJoinBatch.java
	exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java
	exec/java-exec/src/main/java/org/apache/drill/exec/planner/fragment/Materializer.java
	exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillMergeProjectRule.java
	exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillOptiq.java
	exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillPushProjectIntoScanRule.java
	exec/java-exec/src/main/java/org/apache/drill/exec/planner/logical/DrillScanRel.java
	exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/BroadcastExchangePrel.java
	exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/DrillDistributionTrait.java
	exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/HashJoinPrel.java
	exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/PrelUtil.java
	exec/java-exec/src/main/java/org/apache/drill/exec/server/options/SystemOptionManager.java
	exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetPushDownFilter.java
	exec/java-exec/src/main/resources/drill-module.conf
	logical/src/main/java/org/apache/drill/common/logical/StoragePluginConfig.java

Resolve merge conflicts and compilation issues.
…dary indexes

  1. Implementation of the index descriptor for MapR-DB.
  2. MapR-DB specific costing for covering and non-covering indexes.
  3. Discovery component to discover the indexes available for a MapR-DB table including CAST functional indexes.
  4. Utility functions to build a canonical index descriptor.
  5. Statistics: fetch and initialize statistics from MapR-DB for a query condition. Maintain a query-scoped cache for the statistics. Utility functions to compute selectivity.
  6. Range Partitioning: partitioning function that takes into account the tablet map to find out where a particular rowkey belongs.
  7. Restricted Scan: support restricted (i.e., skip) scans through lookups on the rowkey. Added a group-scan and record reader for this.
  8. MD-3726: Simple Order by queries (without limit) when an index is used are showing regression.
  9. MD-3995: Do not pushdown limit 0 past project with CONVERT_FROMJSON
  10. MD-4259: Account for limit during hashcode computation
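
The range partitioning described in item 6 can be illustrated with a sorted map from tablet start keys to partition ids. This is a minimal sketch, assuming each tablet owns the key range from its start key up to the next tablet's start key; the class and method names are hypothetical, and String keys stand in for the binary rowkeys a real table would use.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: maps a rowkey to the partition whose tablet range contains it.
class RangePartitionSketch {
  // Tablet start key -> partition id, kept sorted by start key.
  private final TreeMap<String, Integer> startKeyToPartition = new TreeMap<>();

  // Register a tablet; the first tablet is expected to start at the empty key.
  void addTablet(String startKey, int partitionId) {
    startKeyToPartition.put(startKey, partitionId);
  }

  // The owning tablet is the one with the greatest start key <= rowKey.
  int partitionFor(String rowKey) {
    Map.Entry<String, Integer> entry = startKeyToPartition.floorEntry(rowKey);
    if (entry == null) {
      throw new IllegalStateException("No tablet covers rowkey: " + rowKey);
    }
    return entry.getValue();
  }
}
```

Routing each rowkey to the partition that holds it lets the subsequent lookup run where the data lives rather than fanning out to every tablet.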

Co-authored-by: Aman Sinha <asinha@maprtech.com>
Co-authored-by: chunhui-shi <cshi@maprtech.com>
Co-authored-by: Gautam Parai <gparai@maprtech.com>
Co-authored-by: Padma Penumarthy <ppenumar97@yahoo.com>
Co-authored-by: Hanumath Rao Maduri <hmaduri@maprtech.com>

Conflicts:
	contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBFormatMatcher.java
	contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/MapRDBPushProjectIntoScan.java
	contrib/format-maprdb/src/main/java/org/apache/drill/exec/store/mapr/db/json/JsonTableGroupScan.java
	exec/java-exec/src/main/java/org/apache/drill/exec/planner/index/rules/DbScanSortRemovalRule.java
	exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/SortPrel.java
	exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/TopNPrel.java

Fix additional compilation issues.
DRILL-6381: Add missing joinControl logic for INTERSECT_DISTINCT.

- Modified HashJoin's probe phase to process INTERSECT_DISTINCT.

- NOTE: For the build phase, the functionality will be the same as for SemiJoin when it is added later.
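
As a rough illustration of what an INTERSECT_DISTINCT probe does, the sketch below emits each probe-side key that also exists on the build side, at most once. It assumes the join keys are rowkeys produced by two index scans; the names are hypothetical, and it does not reflect the actual HashJoinBatch code.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: set-intersection semantics expressed as a hash build plus probe.
class IntersectDistinctSketch {

  static List<String> intersectDistinct(Collection<String> buildSide, Iterable<String> probeSide) {
    // Build phase: a hash set of keys already removes build-side duplicates.
    Set<String> buildKeys = new HashSet<>(buildSide);

    // Probe phase: emit a key only if it matches and has not been emitted before.
    Set<String> emitted = new HashSet<>();
    List<String> output = new ArrayList<>();
    for (String key : probeSide) {
      if (buildKeys.contains(key) && emitted.add(key)) {
        output.add(key);
      }
    }
    return output;
  }
}
```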

DRILL-6381: Address code review comment for intersect_distinct.

DRILL-6381: Rebase on latest master and fix compilation issues.

DRILL-6381: Generate protobuf files for C++ native client.

DRILL-6381: Use shaded Guava classes.  Add more comments and Javadoc.
@amansinha100
Author

Since the original PR has received +1 from committers, I will be merging this one directly.
