ARROW-1474: [JAVA] Java ValueVector hierarchy Refactor (Implementation Phase 2) #1203

siddharthteotia · 2017-10-16T07:47:49Z

Implementation of all scalar types and complex types with corresponding legacy versions.

cc @jacques-n , @BryanCutler , @icexelloss

BryanCutler

Thanks @siddharthteotia! Does the logger creation need to be in each concrete vector class? What is the reason for having the ArrowReader in each vector class? Does this need to be there at all?

BryanCutler · 2017-10-16T20:59:14Z

java/vector/src/main/java/org/apache/arrow/vector/NullableIntervalDayVector.java

+   }
+
+   private StringBuilder getAsStringBuilderHelper(int index) {
+      final int startIndex = index * 8;


change to index * TYPE_WIDTH

BryanCutler · 2017-10-16T21:00:45Z

java/vector/src/main/java/org/apache/arrow/vector/NullableIntervalDayVector.java

+      final int startIndex = index * 8;
+
+      final int  days = valueBuffer.getInt(startIndex);
+      int millis = valueBuffer.getInt(startIndex + 4);


maybe add a MILLISECOND_OFFSET = 4 and use that here so it gives some context on what the calculation is for

BryanCutler · 2017-10-16T21:03:22Z

java/vector/src/main/java/org/apache/arrow/vector/NullableIntervalDayVector.java

+      final int offsetIndex = index * TYPE_WIDTH;
+      BitVectorHelper.setValidityBitToOne(validityBuffer, index);
+      valueBuffer.setInt(offsetIndex, days);
+      valueBuffer.setInt((offsetIndex + 4), milliseconds);


same here for millisecond offset
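A minimal sketch of the named-offset suggestion, assuming the 8-byte layout shown in the diff (4 bytes of days followed by 4 bytes of milliseconds). `java.nio.ByteBuffer` stands in for `ArrowBuf`, and the class name `IntervalDayLayoutSketch` and its helper methods are hypothetical, not part of the patch:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Illustration only: ByteBuffer replaces ArrowBuf, and all names are hypothetical. */
final class IntervalDayLayoutSketch {
  static final int TYPE_WIDTH = 8;          // bytes per element: 4 for days + 4 for millis
  static final int MILLISECOND_OFFSET = 4;  // millis are stored right after the days field

  static ByteBuffer allocate(int valueCount) {
    return ByteBuffer.allocate(valueCount * TYPE_WIDTH).order(ByteOrder.LITTLE_ENDIAN);
  }

  static void set(ByteBuffer valueBuffer, int index, int days, int milliseconds) {
    final int offsetIndex = index * TYPE_WIDTH;
    valueBuffer.putInt(offsetIndex, days);
    valueBuffer.putInt(offsetIndex + MILLISECOND_OFFSET, milliseconds);
  }

  static int getDays(ByteBuffer valueBuffer, int index) {
    return valueBuffer.getInt(index * TYPE_WIDTH);
  }

  static int getMillis(ByteBuffer valueBuffer, int index) {
    return valueBuffer.getInt(index * TYPE_WIDTH + MILLISECOND_OFFSET);
  }
}
```

With both constants named, the read and write paths state the layout once instead of repeating the literals 8 and 4.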

BryanCutler · 2017-10-16T21:06:51Z

java/vector/src/main/java/org/apache/arrow/vector/NullableSmallIntVector.java

+
+
+   private void setValue(int index, int value) {
+      valueBuffer.setShort(index * TYPE_WIDTH, value);


is some kind of cast needed for value which is an int? Why do we need to have double the set methods to allow setting an int?

@BryanCutler

There is probably no cast needed. I just thought that since this vector type really doesn't have 4 byte ints, we should ideally have a method set(index, short value). In the existing code on master, we don't have this. We just have set (int index, int value)

So set(index, int value) internally just takes 2 bytes from the int and calls setShort on the ArrowBuf. From an API perspective, it looked less intuitive to me since the class description tells user that the vector stores 2 byte values. So I introduced another method and still kept the original one.

What do you think? Should we not have set(int index, short value)?

Can we have just setValue(int index, short value)?

I guess having a setValue(int index, int value) is fine as a convenience method. Although, do we want any overflow checks?

I think we should just have setValue(int index, short value) if the other is just convenience. Otherwise we have to think about things like overflow like @icexelloss pointed out.

my .02: given java's penchant for upcasting, I'm inclined to include both short and int signatures. I know some of these things were also introduced because of various runtime code generation patterns that we've used. @siddharthteotia, any sense whether we rely on this pattern? (int set)

@jacques-n , let me dig into the code and find out what our usage is

Where do we stand on this? TBD or decided?

I think we should keep both the methods. Keeping them is probably no harm but removing them is likely going to affect the downstream use in our run time generated code.
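To make the overflow concern concrete, here is a rough sketch of the two signatures side by side. It assumes, as described above, that the int overload simply keeps the low 2 bytes. `java.nio.ByteBuffer` stands in for `ArrowBuf`, and `SmallIntSetSketch` and `setChecked` are hypothetical names used only for illustration:

```java
import java.nio.ByteBuffer;

/** Illustration only: ByteBuffer replaces ArrowBuf, and all names are hypothetical. */
final class SmallIntSetSketch {
  static final int TYPE_WIDTH = 2;

  /** Exact-width signature: the argument already fits, so nothing can be lost. */
  static void set(ByteBuffer valueBuffer, int index, short value) {
    valueBuffer.putShort(index * TYPE_WIDTH, value);
  }

  /** Convenience signature: the narrowing cast silently keeps only the low 16 bits. */
  static void set(ByteBuffer valueBuffer, int index, int value) {
    valueBuffer.putShort(index * TYPE_WIDTH, (short) value);
  }

  /** Hypothetical checked variant that rejects values outside the 2-byte range. */
  static void setChecked(ByteBuffer valueBuffer, int index, int value) {
    if (value < Short.MIN_VALUE || value > Short.MAX_VALUE) {
      throw new IllegalArgumentException("value out of SMALLINT range: " + value);
    }
    set(valueBuffer, index, (short) value);
  }

  static short get(ByteBuffer valueBuffer, int index) {
    return valueBuffer.getShort(index * TYPE_WIDTH);
  }
}
```

Calling `set(buf, 0, 70000)` resolves to the int overload and stores 4464 (70000 modulo 65536), which is the kind of silent truncation an overflow check would guard against. Whether a per-value check is acceptable on a hot path is a separate question that this sketch does not answer.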

BryanCutler · 2017-10-16T21:08:28Z

java/vector/src/main/java/org/apache/arrow/vector/NullableTimeMicroVector.java

+      BitVectorHelper.setValidityBit(validityBuffer, index, 0);
+   }
+
+   public void set(int index, int isSet, long valueField ) {


could you call valueField just value to be consistent?

BryanCutler · 2017-10-16T21:15:27Z

java/vector/src/main/java/org/apache/arrow/vector/NullableUInt2Vector.java

+      valueBuffer.setChar(index * TYPE_WIDTH, value);
+   }
+
+   private void setValue(int index, char value) {


why do we support setting a value as a char?

My reasoning over here was similar to why I introduced another method for setting "short" value. I think this needs some discussion.

Should this be short instead? This is a 2 byte value.

Same as before: we should try to keep the supported value types to a minimum. I'm not sure that having char really helps much.

BryanCutler · 2017-10-16T21:16:33Z

java/vector/src/main/java/org/apache/arrow/vector/NullableUInt4Vector.java

+ */
+public class NullableUInt4Vector extends BaseNullableFixedWidthVector {
+   private static final org.slf4j.Logger logger =
+           org.slf4j.LoggerFactory.getLogger(NullableIntVector.class);


NullableUInt4Vector.class

BryanCutler · 2017-10-16T21:17:09Z

java/vector/src/main/java/org/apache/arrow/vector/NullableUInt8Vector.java

+ */
+public class NullableUInt8Vector extends BaseNullableFixedWidthVector {
+   private static final org.slf4j.Logger logger =
+           org.slf4j.LoggerFactory.getLogger(NullableBigIntVector.class);


NullableUInt8Vector.class

BryanCutler · 2017-10-16T21:21:10Z

java/vector/src/main/java/org/apache/arrow/vector/NullableBigIntVector.java

@@ -0,0 +1,299 @@
+/*******************************************************************************
+
+ * Licensed to the Apache Software Foundation (ASF) under one


maybe minor, but I usually see a different format for the license:

/** * Licensed to the Apache Software Foundation (ASF) under one * ... */

It would be good to be consistent, but maybe that would be better to address separately after all this

Okay will look into it at the end.

BryanCutler · 2017-10-16T21:21:52Z

java/vector/src/main/java/org/apache/arrow/vector/NullableDateDayVector.java

+ */
+public class NullableDateDayVector extends BaseNullableFixedWidthVector {
+   private static final org.slf4j.Logger logger =
+           org.slf4j.LoggerFactory.getLogger(NullableIntVector.class);


should be NullableDateDayVector.class

siddharthteotia · 2017-10-17T00:53:13Z

@BryanCutler,

I have addressed most of your general comments. Few points on remaining comments

(1) Regarding Logger - I had added this as a TODO in my previous patch. The thing is that we really don't log much. We only log in the base class during realloc and alloc functions where there is a chance of catching memory related exceptions. So I have been contemplating if we really need logging. It's more of an unnecessary heap overhead.

The reason I initialize the specific logger in each subclass is because when the super class methods dump out log messages we can see which exact vector type the messages correspond to.

But again, since we barely log messages, I think we are better off not having any logging at all. You can take a look at BaseNullableFixedWidthVector and see what you think. Can we afford no logging?

60a2ebd#diff-dddca025d8d6792d8776d3c59ce508f7R270

(2) Regarding FieldReader -- I think you are right. When we are working with a vector type, we have enough information available to create the reader on demand as opposed to carrying the FieldReader object inside each vector. Is this what you were suggesting?

However, we may have to see the impact of changes. This is definitely doable but we will have to refactor code in map, list and union vectors where when they read nested scalar vectors, they can no longer make a call to getReader().

siddharthteotia · 2017-10-17T07:59:29Z

All scalar types refactored and new implementation is ready -- builds fine. Corresponding Legacy vectors are also ready.
Testing has issues w.r.t mutator/accessor in LIST, MAP, UNION. Next step is to refactor them.

icexelloss · 2017-10-17T19:33:12Z

java/vector/src/main/java/org/apache/arrow/vector/NullableDecimalVector.java

+      return new TransferImpl((NullableDecimalVector)to);
+   }
+
+   private class TransferImpl implements TransferPair {


These TransferImpl classes look very similar across the different vectors. Can they be refactored into a single class?

icexelloss · 2017-10-17T19:48:04Z

java/vector/src/main/java/org/apache/arrow/vector/NullableUInt2Vector.java

+ * integer values which could be null. A validity buffer (bit vector) is
+ * maintained to track which elements in the vector are null.
+ */
+public class NullableUInt2Vector extends BaseNullableFixedWidthVector {


What's the difference between this and NullableSmallIntVector?

I actually think we should remove all the UInt vectors. This was a task we talked about doing a year ago but just didn't get it done. They aren't complete and aren't tested.

Oh sweet. Just curious what's the initial intention for those?

Yes that is probably a good idea. I have been wondering what is the difference between Int4Vector and UInt4Vector since the latter really doesn't implement unsigned semantics.

+1 for removing them

Well, the problem with removing them is that other implementations may send unsigned integer data. We've already been integration testing this: https://github.com/apache/arrow/blob/master/integration/integration_test.py#L745

I think it's fine to mark the unsigned int vectors in Java "buggy" but they will have to get implemented properly at some point

So I guess we are not removing them ?

Keeping them as is and will open JIRA(s) for correctly implementing and testing UINT1, UINT2, UINT4, UINT8. I can open JIRAs

OK, great, thank you =)
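For context on what "unsigned semantics" could mean on the read path, here is a small sketch using the unsigned helpers that the JDK has provided since Java 8. It is not how the UInt vectors in this patch behave. `java.nio.ByteBuffer` stands in for `ArrowBuf`, and the class and method names are hypothetical:

```java
import java.nio.ByteBuffer;

/** Illustration only: reads stored bits as unsigned values by widening to a larger type. */
final class UnsignedReadSketch {
  /** 2-byte unsigned value widened to int, range 0..65535. */
  static int getUInt2(ByteBuffer buf, int index) {
    return Short.toUnsignedInt(buf.getShort(index * 2));
  }

  /** 4-byte unsigned value widened to long, range 0..4294967295. */
  static long getUInt4(ByteBuffer buf, int index) {
    return Integer.toUnsignedLong(buf.getInt(index * 4));
  }

  /** 8-byte unsigned value has no wider primitive, so it is rendered as a decimal string. */
  static String getUInt8AsString(ByteBuffer buf, int index) {
    return Long.toUnsignedString(buf.getLong(index * 8));
  }
}
```

The stored bits are the same as for the signed vectors of the same width; only the interpretation on read (and range checking on write) differs, which is why a vector that returns the raw signed primitive does not yet implement unsigned semantics.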

icexelloss · 2017-10-17T20:02:25Z

java/vector/src/main/java/org/apache/arrow/vector/file/json/JsonFileReader.java

@@ -260,10 +260,10 @@ private void readVector(Field field, FieldVector vector) throws JsonParseExcepti
          ((ListVector) vector).getMutator().setLastSet(count);
          break;
        case VARBINARY:
-          ((NullableVarBinaryVector) vector).getMutator().setLastSet(count - 1);
+          ((NullableVarBinaryVector)vector).setLastSet(count - 1);


Nit:
Should we add whitespace after closing ) in casting? This is specified in
https://google.github.io/styleguide/javaguide.html#s4.6.2-horizontal-whitespace
item 5

icexelloss · 2017-10-17T20:09:54Z

java/vector/src/main/java/org/apache/arrow/vector/NullableTimeStampMilliVector.java

+ * timestamp values which could be null. A validity buffer (bit vector) is
+ * maintained to track which elements in the vector are null.
+ */
+public class NullableTimeStampMilliVector extends BaseNullableFixedWidthVector {


I suppose we want to consolidate all Timestamp vectors into one class later?

Not sure what that buys. Definitely seems like something we should address afterwards (but before a release--better not to break things twice)

Yeah, we can address this later as long as it's before release.

In C++ we have a single arrow::TimestampArray but applications will generally branch based on the unit metadata and look at the int64 values. I can't really say what's the better option for Java

In general, in Java we want to avoid having extra branches at the cell level. For long I agree all could be the same. So maybe having a TimestampVector that takes a constructor-provided unit type, and then having subclassed vectors that hold the old per-unit hierarchy, could give a good combination. Both would have get and set with int64/long. If you want to avoid the getValue() conditional, you go with the sub-hierarchy. If you want a generic interface for getValue(), use the base TimestampVector, which does an internal conditional on how to interpret the value. Keeps @icexelloss happy but also allows Dremio folk to use the specific interfaces in runtime generated code.

Makes sense. I think this is a Java-specific design issue then related to the JIT -- in C/C++ we have to branch earlier than the cell level into vectorized branch-free code paths so having a single container makes things simpler (since you can dispatch to code that handles "some kind of timestamp").

Is this something we should do as part of the ongoing refactor patch? Earlier @jacques-n mentioned doing it afterwards probably once the new infrastructure is stable? I am fine either way. Just confirming..
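One way to picture the proposed combination is sketched below, under heavy assumptions: a plain `long[]` replaces the value buffer, `java.util.concurrent.TimeUnit` stands in for Arrow's time unit metadata, and every class and method name is hypothetical. The base class dispatches on the unit per call, while the per-unit subclass hard-codes the conversion:

```java
import java.util.concurrent.TimeUnit;

/** Illustration only: not the hierarchy that was eventually implemented in the patch. */
final class TimestampHierarchySketch {
  /** Generic base: the unit is supplied at construction and consulted on every conversion. */
  static class TimestampVectorSketch {
    protected final long[] values;
    private final TimeUnit unit;

    TimestampVectorSketch(TimeUnit unit, int capacity) {
      this.unit = unit;
      this.values = new long[capacity];
    }

    void set(int index, long value) {
      values[index] = value;
    }

    long get(int index) {
      return values[index];
    }

    /** Per-call dispatch on the unit. */
    long getAsMillis(int index) {
      return unit.toMillis(values[index]);
    }
  }

  /** Per-unit subclass: same get/set with long, but the conversion has no unit lookup. */
  static final class TimestampMicroVectorSketch extends TimestampVectorSketch {
    TimestampMicroVectorSketch(int capacity) {
      super(TimeUnit.MICROSECONDS, capacity);
    }

    @Override
    long getAsMillis(int index) {
      return values[index] / 1000L;
    }
  }
}
```

Generated code that knows the unit statically can target the subclass, while generic callers can hold a reference to the base type.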

icexelloss · 2017-10-17T20:11:00Z

Thanks @siddharthteotia. I went through the change and left a few comments.

jacques-n · 2017-10-17T20:16:25Z

@BryanCutler and @siddharthteotia , the reason the vector holds a reference to ArrowReader is that there are several cases where you don't want to constantly be recreating the reader, and at the time this was the easiest place to maintain it. Not sure if this is still the case, but it is something that should be reviewed before removing.

BryanCutler · 2017-10-17T23:00:51Z

@siddharthteotia , regarding the logger, can you get vector-specific logs by moving it to the base class and initializing the logger with the class name as a string, e.g. getLogger(this.getClass().getCanonicalName());?

siddharthteotia · 2017-10-18T00:35:09Z

Thanks for the thorough review @jacques-n , @BryanCutler , @icexelloss , @wesm . I am in the process of addressing comments as we are reaching consensus. Meanwhile, I am trying to prioritize stability of tests.

Right now we have 2 related failures in ComplexWriter, Promotable Writer and lots of failures in TestJsonFile since getFieldInnerVectors is no longer applicable. I am addressing the former ones as of now.

The recent commit has refactored complex types -- LIST, FIXED SIZE LIST, MAP, UNION along with corresponding Legacy types and code changes in the callers to make things work.

jacques-n · 2017-10-18T01:01:38Z

@BryanCutler, for the logger we discussed that internally (@siddharthteotia and I) and should have posted the thoughts here. Basically, the downside is additional heap bloat since each instance has to hold a reference to the logger (as opposed to be a constant). We thought it wasn't worth the cost.
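The trade-off between the two logger placements can be sketched as follows. This is a simplified illustration that uses `java.util.logging.Logger` instead of slf4j so that it is self-contained, and `getClass().getName()` rather than `getCanonicalName()`; all class names are hypothetical:

```java
import java.util.logging.Logger;

/** Illustration only: contrasts a static per-class logger with a per-instance one. */
final class LoggerPlacementSketch {
  /** Option A: one static logger per concrete class, shared by every instance. */
  static class StaticLoggerVector {
    private static final Logger LOGGER =
        Logger.getLogger(StaticLoggerVector.class.getName());

    String loggerName() {
      return LOGGER.getName();
    }
  }

  /**
   * Option B: the base class looks the logger up from the runtime class.
   * Subclasses get type-specific log names for free, but every instance
   * carries one extra reference field.
   */
  abstract static class BaseVector {
    protected final Logger logger = Logger.getLogger(getClass().getName());

    String loggerName() {
      return logger.getName();
    }
  }

  static final class IntVectorSketch extends BaseVector { }
}
```

Option A is what the original patch did, and it is where the copy-paste mistakes flagged earlier (a logger created with the wrong `.class`) came from. Option B removes that class of mistake at the cost of the per-instance reference described above.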

icexelloss · 2017-10-18T16:31:11Z

java/vector/src/main/java/org/apache/arrow/vector/NullableTimeStampMicroTZVector.java

+      }
+   }
+
+   public void setSafe(int index, int isSet, long valueField ) {


valueField -> value?

icexelloss · 2017-10-18T16:31:40Z

java/vector/src/main/java/org/apache/arrow/vector/NullableTimeStampMicroTZVector.java

+      BitVectorHelper.setValidityBit(validityBuffer, index, 0);
+   }
+
+   public void set(int index, int isSet, long valueField ) {


valueField -> value?

icexelloss · 2017-10-18T20:07:26Z

@siddharthteotia I also found the new files are 3-space indented. The current files are 2-space indented. Please fix the indentation?

wesm · 2017-10-18T20:57:25Z

Is this style issue being checked in Travis CI?

icexelloss · 2017-10-18T21:54:07Z

Probably not. I fixed some checkstyle warnings but not all of them:
#930 (comment)

So we cannot fail the build on checkstyle yet.

wesm · 2017-10-19T00:08:50Z

I opened https://issues.apache.org/jira/browse/ARROW-1688. We must keep our code clean

siddharthteotia · 2017-10-19T10:16:12Z

@jacques-n , @BryanCutler , @icexelloss

Addressed issues in JsonFileReader and JsonFileWriter. Some work was needed here because getFieldInnerVectors is not applicable anymore. I have introduced static methods in vectors as helper routines for reading from and writing into JSON.
Removed logger from all vectors.
Addressed failures in tests. Down to one failure now.

Remaining:

I haven't yet removed the inner vectors from List. Only mutator/accessor were removed. Will do the remaining tomorrow.
Checkstyle issues
keeping both (int and actual) signatures for small types (SMALLINT, TINYINT etc)
TimeStamp hierarchy refactoring per the suggestions given earlier.

BryanCutler · 2017-10-19T17:23:16Z

java/vector/src/main/java/org/apache/arrow/vector/file/json/JsonFileReader.java

    }
+    else if (bufferType.equals(OFFSET)) {


style: combine with line above

We probably want to run checkstyle on the new vector classes before merge.

Maybe we ignore style until the change set is finalized and just fix it all at once.

Yes, definitely. Once the patch is ready for merge to java-vector-refactor branch, we can check that. Or once we plan to push both patches from java-vector-refactor to master, then we can do and address the failures at one go.

Yes, let's address that and any other cosmetic changes once we plan to push the two patches from java-vector-refactor branch to master

BryanCutler · 2017-10-19T17:25:09Z

java/vector/src/main/java/org/apache/arrow/vector/file/json/JsonFileWriter.java

+        for (int i = 0; i < bufferValueCount; i++) {
+          if (vectorType.equals(DATA) &&
+                  (vector.getMinorType() == Types.MinorType.VARCHAR ||
+                          vector.getMinorType() == Types.MinorType.VARBINARY)) {


style: indentation seems off in this if statement

BryanCutler · 2017-10-19T17:25:48Z

java/vector/src/main/java/org/apache/arrow/vector/file/json/JsonFileWriter.java

+    if (bufferType.equals(TYPE)) {
+      generator.writeNumber(buffer.getByte(index * NullableTinyIntVector.TYPE_WIDTH));
+    }
+    else if (bufferType.equals(OFFSET)) {


style: combine with line above

BryanCutler · 2017-10-19T17:28:40Z

I looked at JSONfile reader/writer and looks good from what I can tell so far, just some minor style comments

siddharthteotia · 2017-10-19T23:57:09Z

The latest commit addresses some recent review comments and removes the inner vectors from LIST. I guess we are left with Implementing Timestamp vectors as suggested above -- #1203 (comment)
I will get going with it unless there are other specific concerns (modulo indentation etc)

siddharthteotia · 2017-10-20T00:41:15Z

All tests run clean with latest commit.

icexelloss · 2017-10-20T04:14:30Z

Thanks @siddharthteotia. I will find some time to review this tomorrow.

icexelloss · 2017-10-20T18:53:16Z

java/vector/src/main/codegen/templates/LegacyUnionVector.java

+ * limitations under the License.
+ */
+<@pp.dropOutputFile />
+<@pp.changeOutputFile name="/org/apache/arrow/vector/complex/UnionVector.java" />


This is in the new vector class space. @siddharthteotia, what are your thoughts on:

Do we want to continue to codegen union vector in the new vector package?

Are there any API changes comparing new/legacy union vectors?

Do we want to release support for union vector in new vector classes in 0.8?

Oh, I see this is actually generating LegacyUnionVector.java - the doc is incorrect.

To answer myself, it seems:

Yes

(sid can you answer this?)

Yes it's compatible with the C++ union. No otherwise.

siddharthteotia · 2017-10-20T19:01:06Z

@jacques-n , @BryanCutler , @icexelloss

The latest commit addresses the changes suggested w.r.t timestamp vector hierarchy. The concrete timestamp classes now have only holder specific methods.

Adjusted license headers for consistency.

All tests run clean.

icexelloss · 2017-10-20T19:16:47Z

java/vector/src/main/codegen/templates/UnionVector.java

@@ -363,12 +361,12 @@ public void copyValueSafe(int from, int to) {

  @Override
  public Accessor getAccessor() {


Where do we stand on these? Remove before 0.8 release?

See my comment here that indicates why we have dummy getAccessor() and getMutator() interfaces https://github.com/apache/arrow/pull/1203/files/fb5768fd086a979af8cc8d8795935d191f231678#diff-b5610c173675c3c2707593d7968ee29eR259

Sorry the link is not valid any more, can you paste a new link?

icexelloss · 2017-10-20T19:17:52Z

java/vector/src/main/codegen/templates/UnionVector.java

-  public void setValueCount(int valueCount) { }
-
-  public Object getObject(int index) { return null; }
+    public int getNullCount() { return 0; }


Hmm.. Why is this returning 0?

As explained here -- #1203 (comment)

Should this be UnsupportedOperationException?

This is resolved.
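The difference between a silent placeholder and an explicit failure, plus what a real null count over a validity bitmap might look like, can be sketched as below. All names are hypothetical, a `byte[]` stands in for the validity buffer, and least-significant-bit-first ordering within each byte is assumed:

```java
/** Illustration only: not the UnionVector code under review. */
final class NullCountSketch {
  interface CountsNulls {
    int getNullCount();
  }

  /** Silent placeholder: callers cannot tell "no nulls" from "not implemented". */
  static final class SilentPlaceholder implements CountsNulls {
    @Override
    public int getNullCount() {
      return 0;
    }
  }

  /** Loud placeholder: misuse fails immediately instead of yielding a wrong answer. */
  static final class LoudPlaceholder implements CountsNulls {
    @Override
    public int getNullCount() {
      throw new UnsupportedOperationException("getNullCount is not implemented for this vector");
    }
  }

  /** Real count: nulls are the cleared bits among the first valueCount validity bits. */
  static int countNulls(byte[] validity, int valueCount) {
    int setBits = 0;
    for (int i = 0; i < valueCount; i++) {
      // Bit i lives in byte i / 8 at position i % 8 (LSB first, assumed).
      if ((validity[i >> 3] & (1 << (i & 7))) != 0) {
        setBits++;
      }
    }
    return valueCount - setBits;
  }
}
```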

icexelloss · 2017-10-20T20:41:36Z

java/vector/src/main/java/org/apache/arrow/vector/complex/BaseRepeatedValueVector.java

  public final static String DATA_VECTOR_NAME = "$data$";

-  protected final UInt4Vector offsets;
+  public final static byte OFFSET_WIDTH = 4;
+  protected ArrowBuf offsetBuffer;
  protected FieldVector vector;


Maybe rename this to dataVector?

icexelloss · 2017-10-20T20:44:08Z

java/vector/src/main/java/org/apache/arrow/vector/complex/FixedSizeListVector.java

@@ -62,9 +62,6 @@ public static FixedSizeListVector empty(String name, int size, BufferAllocator a



This class still uses BitVector as validity. If we are not changing this in this PR, maybe leaving a TODO?

I am currently doing it.

icexelloss · 2017-10-20T20:46:13Z

java/vector/src/main/java/org/apache/arrow/vector/complex/ListVector.java

+    return true;
+  }
+
+  private void allocateValidityBuffer(final long size) {


Can this be in base class?

#1203 (comment)

Ok this is fine.

icexelloss · 2017-10-20T20:47:26Z

java/vector/src/main/java/org/apache/arrow/vector/complex/ListVector.java

+    /*
+    * transfer the validity.
+    */
+    private void splitAndTransferValidityBuffer(int startIndex, int length, ListVector target) {


Note to myself:

Read this carefully.

icexelloss · 2017-10-20T20:51:18Z

java/vector/src/main/java/org/apache/arrow/vector/complex/NullableMapVector.java

@@ -60,9 +60,6 @@ public static NullableMapVector empty(String name, BufferAllocator allocator) {

  private final List<BufferBacked> innerVectors;

-  private final Accessor accessor;
-  private final Mutator mutator;
-
  // deprecated, use FieldType or static constructor instead
  @Deprecated
  public NullableMapVector(String name, BufferAllocator allocator, CallBack callBack) {


Do we want to keep both MapVector and NullableMapVector?

Let's discuss when we are removing all non-nullable vectors.

track in ARROW-1710

icexelloss · 2017-10-20T20:53:15Z

java/vector/src/main/java/org/apache/arrow/vector/file/json/JsonFileReader.java

@@ -175,7 +142,7 @@ public boolean read(VectorSchemaRoot root) throws IOException {
        {
          for (Field field : root.getSchema().getFields()) {
            FieldVector vector = root.getVector(field.getName());
-            readVector(field, vector);
+            readFromJsonIntoVector(field, vector);


Note to myself: Think about this.

icexelloss · 2017-10-20T20:55:24Z

@siddharthteotia I went through the change and left some comments. I haven't read all your reply yet but I will do it later. (Have to go now)

siddharthteotia · 2017-10-21T23:07:50Z

@jacques-n , @BryanCutler , @icexelloss

All code changes done.
Tests run clean with latest commit (note all commits are now squashed)

I will review the code tonight primarily for adding JavaDocs, comments and any review comments that we haven't got a closure on.

siddharthteotia · 2017-10-21T23:11:13Z

I have added some new tests too but will file follow-up JIRAs to improve our test suite going forward.

icexelloss · 2017-10-23T03:09:15Z

A few high level comments:

There are a few functions that seem to be placeholders for the interface, for instance getNullCount, which only returns 0. I suppose these functions are not going to be removed in 0.8? If that's the case, I think we should make it clear to users that these functions will not work and shouldn't be called. Having bogus implementations could be confusing to users.
APIs were added to the vector classes for the JSON reader/writer. I think those shouldn't be part of the public API. Also, I am not sure why we need to make such changes; maybe @siddharthteotia can help clarify?

icexelloss · 2017-10-23T03:10:59Z

@siddharthteotia For future reference, I usually prefer not to squash commit during PR - this makes it hard to track incremental changes. We can squash commit when merging.

siddharthteotia · 2017-10-23T09:20:11Z

W.r.t introduction of some static interfaces for JsonFileWriter/Reader

The introduction of a couple of static interfaces is not absolutely necessary. They are written for better readability in JsonFileReader's gigantic switch block, where it parses JSON and writes to the vector (and its inner vectors). Since we no longer have inner vectors, we obviously couldn't leverage the same code. The JsonFileReader had to be changed to write specifically to the different buffers (TYPE, VALIDITY, OFFSET, DATA) of a particular vector. It also has to allocate the buffer and set the writer index appropriately before calling loadFieldBuffers. This was needed for every case in the switch block. Once I did this, the code looked pretty messy and ugly, so I moved all the logic into the vectors and exposed it through static interfaces.

On the other hand, in JsonFileWriter we were reading from the vector (and its inner vectors) and writing out JSON data. Again, since there are no inner vectors, all operations had to be transformed to work at the buffer level, writing out the contents of each inner buffer. Also, the old JsonFileWriter code carried a TODO saying that it did not handle every type. The new code handles all types.

If the general preference is to not introduce static interfaces in vector APIs, I am fine with removing them and moving all the logic into the JSON code itself. The javadocs already indicate that external use of these APIs is not recommended.

W.r.t introduction of some new APIs in ValueVector

Note that top level interface is still ValueVector even though hierarchy underneath has changed. So there are still non-nullable vectors extending ValueVector, implementing mutator/accessor interfaces etc.

So I introduced APIs like getNullCount(), getValueCount(), setValueCount(), getObject() for the new nullable vectors. Once we remove non-nullable vectors and expose mutator/accessor functions as direct get/set in ValueVector, we can get rid of these APIs too.

User is free to call such methods on vectors since internally they delegate the call to corresponding mutator/accessor operation for non-nullable vectors and for nullable vectors we already have the new implementation. For legacy vectors, it doesn't really matter since each operation is just a pass-through to new code.

There aren't any placeholder interfaces anywhere. Each vector (nullable or non-nullable) has a concrete implementation of all such interfaces as prescribed by ValueVector. Correctness is not affected anywhere. We should be able to do the simple cleanup once we remove non-nullable vectors. If we are not planning to remove non-nullable vectors then we should just remove mutator/accessor from them and expose all the get/set APIs directly just like we have done for other nullable and complex vectors. That will also allow us to do simple cleanup.

Whatever we decide to do with non-nullable scalar vectors, we should do soon to make the entire java Vector code under ValueVector hierarchy consistent.

Right now the nullable scalars and complex types are consistent -- none of them have inner vectors and none of them support mutator/accessor based access. Either we should do the same thing with non nullable vectors or remove them all together. The latter is preferable.

siddharthteotia · 2017-10-23T09:22:59Z

Looking into Travis CI failures -- build (with tests) runs clean locally

wesm · 2017-10-23T13:19:39Z

For future reference, I usually prefer not to squash commit during PR - this makes it hard to track incremental changes. We can squash commit when merging.

We're really running into a weakness of GitHub code reviews. My understanding is that Dremio uses Gerrit for code reviews (like Kudu, Impala, and lots of Google projects) and so the squashing is a key part of the Gerrit workflow. But it works pretty poorly for GitHub, where having a string of new commits is better (although the GitHub UI is terrible for reviewing incremental diffs)

I would really like to have the option of doing large Arrow code reviews on Gerrit. It can be a bit challenging to do (because Gerrit can fall out of sync) unless you have 100% of your reviews hosted on there, and Gerrit is quite a bit of process for some users. I hope that we find a way to do this in the future.

icexelloss · 2017-10-23T15:09:34Z

@siddharthteotia , thanks for the explanation.

W.r.t introduction of some static interfaces for JsonFileWriter/Reader
I see, so the reason is inner vectors are removed and therefore the json reader/writer doesn't work anymore.

In that case, I am sort of OK with leaving these methods as public static but document clearly those methods are not part of public API and should not be used (IMHO "not recommended" is not strong enough, I would probably say "shouldn't") and refactor that later, before or after 0.8 release.

What do other people think?

W.r.t introduction of some new APIs in ValueVector

So I introduced APIs like getNullCount(), getValueCount(), setValueCount(), getObject() for the new nullable vectors. Once we remove non-nullable vectors and expose mutator/accessor functions as direct get/set in ValueVector, we can get rid of these APIs too.

User is free to call such methods on vectors

@siddharthteotia, let me see if I understand this correctly:

getNullCount(), getValueCount(), setValueCount(), getObject() are a part of the new vector API and we will keep them going forward.

I think I saw a version of the BaseValueVector in which these methods were returning bogus values and got confused. It seems to be correct now. I probably just saw a WIP version.

siddharthteotia · 2017-10-23T21:09:04Z

@jacques-n , @BryanCutler , @icexelloss

The latest commit has good javadocs for each and every new function in all vector types.

How do you feel about merging this patch to java-vector-refactor branch? I believe merging into master will require proper formal sign off.

We are going to kickstart testing these in Dremio after cherry-picking these two patches from java-vector-refactor branch and making the necessary code changes.

cc @wesm

wesm · 2017-10-23T21:46:06Z

I'm on board with merging to the refactor branch whenever you all agree

jacques-n · 2017-10-23T22:26:57Z

I'm onboard with merge. We can still continue to address comments post merge as necessary. Big patches and github don't mix...

BryanCutler · 2017-10-23T22:43:46Z

Thanks @siddharthteotia , I agree to merge to the refactor branch and continue there. This is getting a little too big to focus on any one change.

W.r.t introduction of some static interfaces for JsonFileWriter/Reader

I don't really like having these static set methods in the vector public APIs if they shouldn't be used outside of the JSON reader/writer, but let's continue this discussion after merging this

icexelloss · 2017-10-23T22:51:01Z

I agree we can merge this. I will open JIRAs to track the major unresolved discussions.

Implementation of all scalar types and complex types with corresponding legacy versions. Closes #1203

wesm · 2017-10-24T00:42:23Z

Merged in 612b970, thanks @siddharthteotia and everyone for reviewing!

ARROW-1474:[JAVA] ValueVector hierarchy (Implementation Phase 2)

b83d874

siddharthteotia force-pushed the ARROW-1474 branch from fb5768f to b83d874 Compare October 21, 2017 23:03

add some javadocs

5e310ae

review comments and complete javadocs, code comments

823f75d

siddharthteotia changed the title ~~ARROW-1474:[WIP] Java Vector Refactor (Implementation Phase 2)~~ ARROW-1474: [JAVA] Java ValueVector hierarchy Refactor (Implementation Phase 2) Oct 23, 2017

wesm pushed a commit that referenced this pull request Oct 24, 2017

ARROW-1474:[JAVA] ValueVector hierarchy (Implementation Phase 2)

612b970

Implementation of all scalar types and complex types with corresponding legacy versions. Closes #1203

wesm closed this Oct 24, 2017

siddharthteotia added a commit to siddharthteotia/arrow that referenced this pull request Nov 14, 2017

ARROW-1474:[JAVA] ValueVector hierarchy (Implementation Phase 2)

4509c29

Implementation of all scalar types and complex types with corresponding legacy versions. Closes apache#1203

wesm pushed a commit that referenced this pull request Nov 15, 2017

ARROW-1474:[JAVA] ValueVector hierarchy (Implementation Phase 2)

9ee838a

Implementation of all scalar types and complex types with corresponding legacy versions. Closes #1203



