HAWQ-1446: Introduce vectorized profile for ORC. #1225
Conversation
Force-pushed from 41c1a6a to 6a063e8
    isVectorizedResolver = ArrayUtils.contains(Class.forName(inputData.getResolver()).getInterfaces(), ReadVectorizedResolver.class);
} catch (ClassNotFoundException e) {
    LOG.error("Unable to load resolver class: " + e.getMessage());
    return false;
No need for this line; the function will return the default false value at the end anyway.
Sure, thanks
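The point above can be sketched as follows. This is an illustrative, simplified version of the resolver check, not the actual PXF source; the class and method names are hypothetical:

```java
// Sketch: the "return false" in the catch block is redundant, because
// execution falls through to the final return of the default value.
// Names here are hypothetical stand-ins for the PXF code under review.
public class VectorizedCheck {
    public static boolean isVectorizedResolver(String resolverClassName) {
        boolean isVectorized = false; // default value
        try {
            Class<?> clazz = Class.forName(resolverClassName);
            for (Class<?> iface : clazz.getInterfaces()) {
                if ("ReadVectorizedResolver".equals(iface.getSimpleName())) {
                    isVectorized = true;
                }
            }
        } catch (ClassNotFoundException e) {
            // just log; no explicit "return false" needed here,
            // the method returns the default at the end
            System.err.println("Unable to load resolver class: " + e.getMessage());
        }
        return isVectorized;
    }
}
```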
@@ -289,7 +289,7 @@ private void fetchMetaData(HiveTablePartition tablePartition, boolean hasComplex
  if (inputData.getProfile() != null) {
      // evaluate optimal profile based on file format if profile was explicitly specified in url
      // if user passed accessor+fragmenter+resolver - use them
-     profile = ProfileFactory.get(fformat, hasComplexTypes);
+     profile = ProfileFactory.get(fformat, hasComplexTypes, inputData.getProfile());
getProfile() is called twice (in the if statement and here); it's better to call it once, then evaluate and reuse the variable.
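The "call once, reuse" pattern can be sketched like this. ProfileLookup and its methods are hypothetical stand-ins, not the actual PXF classes:

```java
public class ProfileLookup {
    // Hypothetical stand-in for InputData.getProfile()
    static String getProfile(String stored) {
        return stored;
    }

    static String choose(String stored, String fallback) {
        // Call getProfile() once and reuse the local variable,
        // instead of calling it both in the condition and in the body.
        String profileName = getProfile(stored);
        if (profileName != null) {
            return profileName;
        }
        return fallback;
    }
}
```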
@@ -136,7 +136,7 @@ public HiveMetadataFetcher(InputData md) {
  private OutputFormat getOutputFormat(String inputFormat, boolean hasComplexTypes) throws Exception {
      OutputFormat outputFormat = null;
      InputFormat<?, ?> fformat = HiveDataFragmenter.makeInputFormat(inputFormat, jobConf);
-     String profile = ProfileFactory.get(fformat, hasComplexTypes);
+     String profile = ProfileFactory.get(fformat, hasComplexTypes, null);
Passing explicit null params should be avoided; if possible, overload the function when more/fewer params are desired.
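The overloading idea can be sketched as follows. ProfileFactorySketch is a hypothetical, simplified stand-in for the real ProfileFactory; the profile-selection logic shown is illustrative only:

```java
public class ProfileFactorySketch {
    // Overload for callers that have no requested profile:
    // they use the two-argument form instead of passing an explicit null.
    static String get(String format, boolean hasComplexTypes) {
        return get(format, hasComplexTypes, null);
    }

    static String get(String format, boolean hasComplexTypes, String requestedProfile) {
        if (requestedProfile != null) {
            return requestedProfile; // honor an explicitly requested profile
        }
        // illustrative fallback logic, not the real selection rules
        return hasComplexTypes ? "Hive" : "Hive" + format;
    }
}
```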
}

/**
 * This method updated reader optionst to include projected columns only.
typo "optionst"
Thanks, fixed
 * One batch is 1024 rows of all projected columns
 *
 */
public class HiveORCBatchAccessor extends Plugin implements ReadAccessor {
Would it be useful if it extended HiveORCAccessor and overrode functions?
Sure, updated
/* process all rows from current batch for given column */
for (int rowIndex = 0; rowIndex < vectorizedBatch.size; rowIndex++) {
    if (vectorizedBatch.cols[columnIndex] != null && !vectorizedBatch.cols[columnIndex].isNull[rowIndex]) {
        writableObject = vectorizedBatch.cols[columnIndex].getWritableObject(rowIndex);
vectorizedBatch.cols[columnIndex] is used 3 times; substitute it with a variable?
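The hoisting suggested above can be sketched like this. The Column type here is a minimal stand-in for Hive's ColumnVector, not the real class:

```java
public class ColumnLoop {
    // Minimal stand-in for a vectorized column (illustrative only)
    static class Column {
        boolean[] isNull;
        int[] values;
        Column(boolean[] isNull, int[] values) {
            this.isNull = isNull;
            this.values = values;
        }
    }

    static int sumColumn(Column[] cols, int columnIndex, int size) {
        int sum = 0;
        // Hoist the repeated cols[columnIndex] lookup into one local variable
        Column col = cols[columnIndex];
        for (int rowIndex = 0; rowIndex < size; rowIndex++) {
            if (col != null && !col.isNull[rowIndex]) {
                sum += col.values[rowIndex];
            }
        }
        return sum;
    }
}
```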
@@ -149,9 +149,10 @@ public static ReadAccessor getFileAccessor(InputData inputData)
          inputData.getAccessor(), inputData);
  }

+ @SuppressWarnings("unchecked")
  public static ReadResolver getFieldsResolver(InputData inputData)
Ouch, can you make Utilities.createAnyInstance templatized (generic) instead?
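A generic version might look like the sketch below, which moves the cast into the utility so call sites no longer need @SuppressWarnings. This is a hypothetical signature, not the actual PXF Utilities class:

```java
public class Utilities {
    // Hypothetical generic createAnyInstance: the caller supplies the
    // expected type, and the (checked) cast happens once, here.
    static <T> T createAnyInstance(Class<T> type, String className) {
        try {
            Class<?> clazz = Class.forName(className);
            return type.cast(clazz.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException("Unable to create instance of " + className, e);
        }
    }
}
```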
public class ReadVectorizedBridge implements Bridge {

    ReadAccessor fileAccessor = null;
Not private members?
if (batch == null) {
    output = outputBuilder.getPartialLine();
    if (output != null) {
        //LOG.warn("A partial record in the end of the fragment");
Remove commented-out lines?
@@ -101,6 +101,17 @@ under the License.
      <outputFormat>org.apache.hawq.pxf.service.io.GPDBWritable</outputFormat>
    </plugins>
  </profile>
+ <profile>
+   <name>HiveVectorizedORC</name>
Seems like "batch" and "vectorized" are used interchangeably; should we use just one term?
Renamed all classes to use "vectorized"
PrimitiveObjectInspector poi = (PrimitiveObjectInspector) oi;
resolvePrimitiveColumn(columnIndex, oi, vectorizedBatch);
} else {
    throw new UnsupportedTypeException("Unable to resolve column index:" + columnIndex + ". Only primitive types are supported.");
Can't we catch this error upfront, by checking whether the schema has any non-primitive type column when we use BatchResolver, instead?
This is the very first place we start to process columns; I don't think we should double-validate them.
OneField field = null;
Writable writableObject = null;
/* process all rows from current batch for given column */
for (int rowIndex = 0; rowIndex < vectorizedBatch.size; rowIndex++) {
Since we know that all the writableObjects in a column are of the same type, why don't we simply extract all the values in the column into a Writable[] and resolve the entire array at once, using just one switch case to determine the field type and resolving each writable value in one go?
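The idea of one type dispatch per column rather than per row can be sketched like this. The types below are simplified stand-ins, not Hive's Writable hierarchy, and resolving to strings is purely illustrative:

```java
public class ColumnResolver {
    enum FieldType { INT, STRING }

    // One switch on the column type, then resolve the whole value array
    // in a single pass; the per-row work no longer re-inspects the type.
    static String[] resolveColumn(FieldType type, Object[] columnValues) {
        String[] out = new String[columnValues.length];
        switch (type) { // evaluated once per column, not once per row
            case INT:
                for (int row = 0; row < columnValues.length; row++) {
                    out[row] = columnValues[row] == null
                            ? "NULL" : String.valueOf((Integer) columnValues[row]);
                }
                break;
            case STRING:
                for (int row = 0; row < columnValues.length; row++) {
                    out[row] = columnValues[row] == null
                            ? "NULL" : (String) columnValues[row];
                }
                break;
        }
        return out;
    }
}
```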
@@ -0,0 +1,126 @@
package org.apache.hawq.pxf.service;
ReadVectorizedBridge looks very similar to ReadBridge except for getNext() function. Please refactor both classes to avoid duplication
Makes sense, extended.
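The shape of the agreed refactor can be sketched as follows. The class and method names are illustrative stand-ins, not the actual PXF bridge classes:

```java
// Sketch: the vectorized bridge extends the base bridge and overrides
// only the logic that differs (getNext), so the shared code lives in
// one place. Names are hypothetical, not the real PXF classes.
public class BridgeSketch {
    static class ReadBridge {
        public String getNext() {
            return "one-row"; // base behavior: one record at a time
        }
        public boolean beginIteration() {
            return true; // shared setup lives in the base class
        }
    }

    static class ReadVectorizedBridge extends ReadBridge {
        @Override
        public String getNext() {
            return "one-batch"; // only the differing logic is overridden
        }
    }
}
```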
Overall looks good. You can address the type cast optimization and the class refactor as an incremental update to the vectorized ORC profile.
Force-pushed from 11e9436 to 53c40b7
@@ -34,31 +34,31 @@
  public void get() throws Exception {

      // For TextInputFormat when table has no complex types, HiveText profile should be used
-     String profileName = ProfileFactory.get(new TextInputFormat(), false);
+     String profileName = ProfileFactory.get(new TextInputFormat(), false, null);
Can revert these changes now that the two-argument function is back, right?
Sure
@@ -137,6 +137,18 @@ public Writable getErrorOutput(Exception ex) throws Exception {
      return outputList;
  }

+ public LinkedList<Writable> makeVectorizedOutput(List<List<OneField>> recordsBatch) throws BadRecordException {
+     outputList.clear();
+     for (List<OneField> record : recordsBatch) {
No null checks necessary for recordsBatch and record?
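The null checks being asked about could look like the sketch below, with List&lt;String&gt; standing in for List&lt;List&lt;OneField&gt;&gt; so the example stays self-contained; the upper-casing is a placeholder for the real serialization:

```java
public class OutputBuilder {
    // Illustrative null-guarded version of makeVectorizedOutput
    static java.util.LinkedList<String> makeVectorizedOutput(java.util.List<String> recordsBatch) {
        java.util.LinkedList<String> outputList = new java.util.LinkedList<>();
        if (recordsBatch == null) { // guard against a null batch
            return outputList;
        }
        for (String record : recordsBatch) {
            if (record == null) {   // skip null records rather than NPE later
                continue;
            }
            outputList.add(record.toUpperCase()); // stand-in for real serialization
        }
        return outputList;
    }
}
```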
Force-pushed from a2d7fa5 to 79c688e
Work still in progress, want to get earlier feedback.