
DRILL-6454: Native MapR DB plugin support for Hive MapR-DB json table#1314

Closed
vdiravka wants to merge 2 commits into apache:master from vdiravka:DRILL-6454

Conversation

@vdiravka
Member

@vdiravka vdiravka commented Jun 9, 2018

  • A new StoragePluginOptimizerRule for reading MapR-DB JSON tables with the Drill native reader is added.
    During the planning stage, the rule allows replacing a HiveScan with the Drill native JsonTableGroupScan.
  • A new system/session option is added to enable the new rule.
  • The common part between ConvertHiveMapRDBJsonScanToDrillMapRDBJsonScan and ConvertHiveParquetScanToDrillParquetScan is factored out into HiveUtilities.
  • A dependency on drill-format-mapr is added for hive-core.

Note:

  1. The option name for the parquet native reader is changed.
  2. The changes in MapRDBFormatMatcher.java are just refactoring.
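Conceptually, the new rule is a pattern-match-and-substitute step in the planner. The sketch below is a simplified, dependency-free illustration of that idea; the `Scan`, `HiveScan`, and `JsonTableGroupScan` classes here are placeholder stand-ins, not the real Drill or Calcite types (the actual rule is a StoragePluginOptimizerRule operating on Calcite RelNodes).

```java
import java.util.Optional;

public class NativeReaderRuleSketch {

  interface Scan {}

  // Placeholder for a Hive-provided scan of some table format.
  static class HiveScan implements Scan {
    final String tableFormat;
    HiveScan(String tableFormat) { this.tableFormat = tableFormat; }
  }

  // Placeholder for Drill's native MapR-DB JSON group scan.
  static class JsonTableGroupScan implements Scan {}

  // Rewrite a Hive scan of a MapR-DB JSON table into the native scan,
  // but only when the session option enabling the rule is set.
  static Optional<Scan> tryRewrite(Scan scan, boolean optionEnabled) {
    if (optionEnabled
        && scan instanceof HiveScan
        && "maprdb-json".equals(((HiveScan) scan).tableFormat)) {
      return Optional.of(new JsonTableGroupScan());
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    Scan rewritten = tryRewrite(new HiveScan("maprdb-json"), true).orElse(null);
    System.out.println(rewritten instanceof JsonTableGroupScan); // prints "true"
  }
}
```

When the option is off, or the scan is not a MapR-DB JSON table, the rule simply does not fire and the original HiveScan is kept.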

@vdiravka vdiravka self-assigned this Jun 11, 2018
@vdiravka
Member Author

@gparai @vrozov Could you please review this PR?

@gparai

gparai commented Jun 11, 2018

@vdiravka do you have any write-up/design spec for this feature?

@vdiravka
Member Author

@gparai I have described the approach in the Jira description and the changes in the PR description. There is no need for a design document for this feature, since it is too small.

@priteshm

@gparai can you please complete the review soon?

import org.apache.drill.exec.store.mapr.db.binary.MapRDBBinaryTable;
import org.apache.hadoop.fs.Path;

import java.io.IOException;
Member

avoid import order changes.

Member Author

Done. I just moved com.mapr.fs.MapRFileStatus; it should be near com.mapr.fs.tables.TableProperties.

security.admin.users: "%drill_process_user%",
store.format: "parquet",
store.hive.optimize_scan_with_native_readers: false,
store.hive.parquet.optimize_scan_with_native_readers: false,
Member

Is it necessary to maintain backward compatibility with prior versions?

Member Author

Changing the name is necessary: earlier there was only one native reader, and controlling all native readers with a single property is not possible.

So I have decided to keep the old property name as deprecated for backward compatibility and to leave a comment that it should be removed starting from the 1.15.0 release.

The new property name is also introduced, so either of them can enable the parquet native reader.

}

public Collection<InputSplit> getInputSplits() {
public List<InputSplit> getInputSplits() {
Member

Why Collection->List? Where order comes from?

Member

It is not a question of whether it is known to be a List or an ArrayList (it is probably known that it is an ArrayList). List provides certain behavior (allows duplicates and provides an insertion-order guarantee). Do you need any of those?

Member Author

A small but fair remark. Thanks. I have reverted this to Collection; there is no need to use List.

I think Collection is used because Map#values() returns it in some places.
From the code it looks like the Collection API is enough (at least for now) for logicalInputSplit.
I also found other places with Collection<FileSplit> logicalInputSplit and List<LogicalInputSplit> getInputSplits(); possibly they should be brought to a common style.
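The point about Collection being sufficient can be illustrated with a minimal sketch (class and field names here are hypothetical, not from the Drill code): a getter typed as Collection can expose Map#values() directly, without copying into a List, and still serves callers that only iterate.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

public class SplitsByPath {
  private final Map<String, String> splits = new HashMap<>();

  void add(String path, String split) { splits.put(path, split); }

  // Collection is the weakest type that still supports iteration and size(),
  // and it lets us return Map#values() as-is, without an extra copy.
  Collection<String> getInputSplits() { return splits.values(); }

  public static void main(String[] args) {
    SplitsByPath s = new SplitsByPath();
    s.add("/a", "split-a");
    s.add("/b", "split-b");
    System.out.println(s.getInputSplits().size()); // prints "2"
  }
}
```

Had the getter promised List, the implementation would be forced to build a new List on every call just to satisfy the signature.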

</exclusion>
</exclusions>
</dependency>
<dependency>
Member

Where is the dependency introduced?

Member Author

This is the contrib/format-maprdb module from the default profile, therefore it is used here in the default profile too.
Classes from this dependency are used in ConvertHiveMapRDBJsonScanToDrillMapRDBJsonScan.java.

Member

OK, I see. I was confused by the Drill practice of using the same package name in different modules/jars. This practice is not common and is far from best practice.

Member Author

Could you clarify what package name would be better to use?
Do you mean short names instead of full ones for the common packages?

Member

No, I don't refer to using short names. I refer to using the "org.apache.drill.exec" package name outside of "exec". As it is already widely used in Drill, I don't suggest changing it in this PR.

public static final String HIVE_OPTIMIZE_SCAN_WITH_NATIVE_READERS = "store.hive.optimize_scan_with_native_readers";
public static final OptionValidator HIVE_OPTIMIZE_SCAN_WITH_NATIVE_READERS_VALIDATOR =
new BooleanValidator(HIVE_OPTIMIZE_SCAN_WITH_NATIVE_READERS);
public static final String HIVE_OPTIMIZE_PARQUET_SCAN_WITH_NATIVE_READER = "store.hive.parquet.optimize_scan_with_native_readers";
@gparai gparai Jun 19, 2018

Please keep the naming consistent.

Member Author

The old name is not appropriate once several native readers exist for the Hive plugin.
But I will keep the old property name as deprecated for backward compatibility and will leave a comment that it should be removed starting from the 1.15.0 release.

The new property name is also introduced, so either of them can enable the parquet native reader.
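The described behavior, where both property names enable the same reader, might look roughly like the sketch below. Only the two option-name strings come from the PR; the lookup logic is a hedged illustration, not the actual Drill OptionManager code.

```java
import java.util.HashMap;
import java.util.Map;

public class ParquetReaderOptions {

  @Deprecated // kept for backward compatibility; planned for removal in 1.15.0
  static final String OLD_KEY = "store.hive.optimize_scan_with_native_readers";
  static final String NEW_KEY = "store.hive.parquet.optimize_scan_with_native_readers";

  // Either property enables the parquet native reader, so sessions that
  // still set only the deprecated name keep working.
  static boolean parquetNativeReaderEnabled(Map<String, Boolean> options) {
    return Boolean.TRUE.equals(options.get(OLD_KEY))
        || Boolean.TRUE.equals(options.get(NEW_KEY));
  }

  public static void main(String[] args) {
    Map<String, Boolean> session = new HashMap<>();
    session.put(OLD_KEY, true); // only the deprecated name is set
    System.out.println(parquetNativeReaderEnabled(session)); // prints "true"
  }
}
```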

@gparai

gparai commented Jun 20, 2018

The rest of the changes LGTM.

Member Author

@vdiravka vdiravka left a comment

@gparai @vrozov I have addressed your comments and have made changes.

Member

@arina-ielchiieva arina-ielchiieva left a comment

Overall looks good, a few minor comments though.

}

/**
* This method allows to cehck whether the Hive Table contains
Member

check


// TODO: We need to add a feature that enables storage plugins to add their own options. Currently we have to declare
// in core which is not right. Move this option and above two mongo plugin related options once we have the feature.
@Deprecated // it should be removed starting from next Drill 1.15.0 release
Member

Please file Jira for this and add in comment.

// in core which is not right. Move this option and above two mongo plugin related options once we have the feature.
@Deprecated // it should be removed starting from next Drill 1.15.0 release
public static final String HIVE_OPTIMIZE_SCAN_WITH_NATIVE_READERS = "store.hive.optimize_scan_with_native_readers";
@Deprecated // it should be removed starting from next Drill 1.15.0 release
Member

Please file Jira for this and add in comment.

}

/**
* Rule is matched when all of the following match:
Member

Please use formatting.

Member Author

Done. Thanks for catching this.

// we need to read path of only one to get file path
assert iterator.hasNext();
InputSplit split = iterator.next();
InputSplit split = logicalInputSplit.getInputSplits().get(0);
Member

No, this needs to be reverted. I have left the assert deliberately.

*/
private SchemaPath replaceOverriddenSchemaPath(Map<String, String> parameters, SchemaPath colNameSchemaPath) {
String hiveColumnName = colNameSchemaPath.getRootSegmentPath();
if (hiveColumnName.equals(parameters.get(MAPRDB_COLUMN_ID))) {
Member

ternary operator?

* @return original column name
*/
private String replaceOverriddenColumnId(Map<String, String> parameters, String colName) {
return Objects.equals(colName, parameters.get(MAPRDB_COLUMN_ID)) ? ID_KEY : colName;
Member

Objects.equals will consider two nulls equal; will this case be OK for your changes?
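The null-handling in question is the standard java.util.Objects semantics: Objects.equals(null, null) returns true, so if both colName and the MAPRDB_COLUMN_ID parameter were null, the method above would return ID_KEY. Whether that matters depends on whether a null column name can ever reach this code. A quick demonstration of the stock behavior:

```java
import java.util.Objects;

public class ObjectsEqualsDemo {
  public static void main(String[] args) {
    // Objects.equals is null-safe: two nulls compare equal,
    // and a non-null value never equals null.
    System.out.println(Objects.equals(null, null));   // prints "true"
    System.out.println(Objects.equals("_id", null));  // prints "false"
    System.out.println(Objects.equals("_id", "_id")); // prints "true"
  }
}
```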

Member Author

@vdiravka vdiravka left a comment

@arina Thanks for catching those points.
I have addressed them and filed a Jira ticket for removing the old parquet native reader option name:
https://issues.apache.org/jira/browse/DRILL-6527

}

/**
* Rule is matched when all of the following match:
Member Author

Done. Thanks for catching this.

@arina-ielchiieva
Member

+1, LGTM.

vdiravka added a commit to vdiravka/drill that referenced this pull request Jun 22, 2018
@gparai

gparai commented Jun 22, 2018

@vdiravka thanks for making the changes. +1 LGTM.

vdiravka added a commit to vdiravka/drill that referenced this pull request Jun 22, 2018