Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DRILL-7350: Move RowSet related classes from test folder #1843

Merged
merged 1 commit into from Aug 19, 2019

Conversation

@vvysotskyi
Copy link
Member

commented Aug 15, 2019

  • Moved row-set related classes from test folder into main and changed package from org.apache.drill.test.rowSet to org.apache.drill.exec.physical.rowSet where other row-set related classes were placed.
  • Updated package info for both packages.

For details please see DRILL-7350.

*/

int offset();
}

This comment has been minimized.

Copy link
@arina-ielchiieva

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 15, 2019

Author Member

Done.

@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.drill.test.rowSet;
package org.apache.drill.exec.physical.rowSet;

This comment has been minimized.

Copy link
@arina-ielchiieva

arina-ielchiieva Aug 15, 2019

Member

Make sure Printer returns strings instead of sout
Maybe rename it to other name since now it won't print.

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 15, 2019

Author Member

Thanks, replaced PrintStream usage with StringBuilder, renamed class to RowSetStringBuilder and renamed its methods.

@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.drill.test.rowSet;
package org.apache.drill.exec.physical.rowSet;

This comment has been minimized.

Copy link
@arina-ielchiieva

arina-ielchiieva Aug 15, 2019

Member

Please rename method: void setPosn(int index);

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 15, 2019

Author Member

Thanks, renamed to setPosition().

@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.drill.test.rowSet;
package org.apache.drill.exec.physical.rowSet;

This comment has been minimized.

Copy link
@arina-ielchiieva

arina-ielchiieva Aug 15, 2019

Member
  1. Remove protected SchemaChangeCallBack callBack = new SchemaChangeCallBack();
  2. Better error message: public long size() { throw new UnsupportedOperationException("getSize"); }

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 15, 2019

Author Member

Thanks, done.

@@ -15,15 +15,15 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.drill.test.rowSet;
package org.apache.drill.exec.physical.rowSet;

This comment has been minimized.

Copy link
@arina-ielchiieva

arina-ielchiieva Aug 15, 2019

Member

Constructors can be protected.

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 15, 2019

Author Member

Done.


import java.util.HashSet;

This comment has been minimized.

Copy link
@arina-ielchiieva

arina-ielchiieva Aug 15, 2019

Member

public RowSetWriter writer() { return writer(10); }

Create a constant instead with Java-doc.

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 15, 2019

Author Member

Thanks, done.

@@ -132,7 +131,7 @@ public RowSetReader reader() {

@Override
public SingleRowSet toIndirect() {
return new IndirectRowSet(this, Sets.<Integer>newHashSet());
return new IndirectRowSet(this, new HashSet<>());

This comment has been minimized.

Copy link
@arina-ielchiieva

arina-ielchiieva Aug 15, 2019

Member

Should be immutable?

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 15, 2019

Author Member

Agree, replaced with Collections.emptySet().

@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.drill.test.rowSet;
package org.apache.drill.exec.physical.rowSet;

This comment has been minimized.

Copy link
@arina-ielchiieva

arina-ielchiieva Aug 15, 2019

Member

Replace usage of Sets.<Integer>newHashSet()

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 15, 2019

Author Member

Thanks, replaced with Collections.emptySet().

@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.drill.test.rowSet;
package org.apache.drill.exec.physical.rowSet;

This comment has been minimized.

Copy link
@arina-ielchiieva

arina-ielchiieva Aug 15, 2019

Member
  1. Remove public: public interface SingleRowSet extends RowSet {
  2. Please try to fix all java doc errors.

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 15, 2019

Author Member

Thanks, done.

import org.apache.drill.exec.record.BatchSchema;
import org.apache.drill.exec.record.metadata.MetadataUtils;
import org.apache.drill.exec.record.metadata.TupleMetadata;
import org.apache.drill.exec.vector.accessor.convert.ColumnConversionFactory;
import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
import org.apache.drill.test.rowSet.RowSet.SingleRowSet;
import org.apache.drill.exec.physical.rowSet.RowSet.SingleRowSet;

This comment has been minimized.

Copy link
@arina-ielchiieva

arina-ielchiieva Aug 15, 2019

Member
  1. Capacity here can use constant we introduce for 10.
  2. Also can be considered to be moved.

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 15, 2019

Author Member

Thanks, done.

@vvysotskyi
Copy link
Member Author

left a comment

@arina-ielchiieva, thanks for code review, I have addressed CR comments. Can you please review it again?

@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.drill.test.rowSet;
package org.apache.drill.exec.physical.rowSet;

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 15, 2019

Author Member

Thanks, done.

@@ -15,15 +15,15 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.drill.test.rowSet;
package org.apache.drill.exec.physical.rowSet;

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 15, 2019

Author Member

Done.


import java.util.HashSet;

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 15, 2019

Author Member

Thanks, done.

@@ -132,7 +131,7 @@ public RowSetReader reader() {

@Override
public SingleRowSet toIndirect() {
return new IndirectRowSet(this, Sets.<Integer>newHashSet());
return new IndirectRowSet(this, new HashSet<>());

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 15, 2019

Author Member

Agree, replaced with Collections.emptySet().

@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.drill.test.rowSet;
package org.apache.drill.exec.physical.rowSet;

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 15, 2019

Author Member

Thanks, replaced with Collections.emptySet().

@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.drill.test.rowSet;
package org.apache.drill.exec.physical.rowSet;

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 15, 2019

Author Member

Thanks, done.

@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.drill.test.rowSet;
package org.apache.drill.exec.physical.rowSet;

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 15, 2019

Author Member

Thanks, replaced PrintStream usage with StringBuilder, renamed class to RowSetStringBuilder and renamed its methods.

@@ -15,7 +15,7 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.drill.test.rowSet;
package org.apache.drill.exec.physical.rowSet;

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 15, 2019

Author Member

Thanks, renamed to setPosition().

*/

int offset();
}

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 15, 2019

Author Member

Done.

import org.apache.drill.exec.record.BatchSchema;
import org.apache.drill.exec.record.metadata.MetadataUtils;
import org.apache.drill.exec.record.metadata.TupleMetadata;
import org.apache.drill.exec.vector.accessor.convert.ColumnConversionFactory;
import org.apache.drill.shaded.guava.com.google.common.collect.Sets;
import org.apache.drill.test.rowSet.RowSet.SingleRowSet;
import org.apache.drill.exec.physical.rowSet.RowSet.SingleRowSet;

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 15, 2019

Author Member

Thanks, done.

@vvysotskyi vvysotskyi force-pushed the vvysotskyi:DRILL-7350 branch from 712dc08 to 01ea757 Aug 15, 2019

stringBuilder.append(reader.offset());
stringBuilder.append(")");
break;
default:

This comment has been minimized.

Copy link
@arina-ielchiieva

arina-ielchiieva Aug 15, 2019

Member

Do we need default here?

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 15, 2019

Author Member

No, it is not required.

* Helper class to obtain string representation of RowSet.
*/
public class RowSetStringBuilder {
private RowSet rowSet;

This comment has been minimized.

Copy link
@arina-ielchiieva

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 15, 2019

Author Member

Thanks, done.

this.rowSet = rowSet;
}

public static void appendTupleSchema(StringBuilder stringBuilder, TupleMetadata schema) {

This comment has been minimized.

Copy link
@arina-ielchiieva

arina-ielchiieva Aug 15, 2019

Member

Why this is public and static?

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 15, 2019

Author Member

Made private and non-static.

return result.toString();
}

public void appendTo(StringBuilder stringBuilder) {

This comment has been minimized.

Copy link
@arina-ielchiieva

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 15, 2019

Author Member

Thanks, done.

import org.apache.drill.exec.record.metadata.TupleMetadata;

/**
* Helper class to obtain string representation of RowSet.

This comment has been minimized.

Copy link
@arina-ielchiieva

arina-ielchiieva Aug 15, 2019

Member

It would be nice to add an example of the output.

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 15, 2019

Author Member

Agree, added an example.

@paul-rogers

This comment has been minimized.

Copy link
Contributor

commented Aug 15, 2019

@vvysotskyi, thanks for making this change. It will allow certain other tasks to be much easier.

The first observation is that it would be best not to put the RowSet and ResultSetLoader stuff in the same package: they represent two independent systems and our package naming should make that clear.

Perhaps

  1. Rename the existing physical.rowSet to physical.resultSet.
  2. Move the RowSet classes to a new physical.rowSet.

The difference, so we're clear, is that RowSet works on a single batch, while the ResultSet works across a stream of batches. If operators, say, want to use the row set stuff to read batches, then we need a ResultSetReader that automagically handles things like schema changes across batches, etc.

To be very clear:

"row set" -- A collection of rows. Typically called a "batch" in Drill.
"result set" -- A collection of rows for an entire query. Typically consists of multiple (related) batches.

I used the term "row set" only because the classes would otherwise get lost with all the existing "Batch" classes. And, we don't have a good name for the result set. I borrowed this name from SQL.


/**
* Basic implementation of a row set for both the single and multiple
* (hyper) varieties, both the fixed and extendible varieties.
* (hyper) varieties, both the fixed and extendable varieties.

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Aug 15, 2019

Contributor

extensible

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 16, 2019

Author Member

Thanks, done.

@@ -41,13 +38,19 @@ public AbstractRowSet(VectorContainer container, TupleMetadata schema) {
}

@Override
public VectorAccessible vectorAccessible() { return container(); }
public VectorAccessible vectorAccessible() {

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Aug 15, 2019

Contributor

Why change formatting? Drill has long used single-line methods for conciseness. Is this an artifact of some code formatter? I didn't see discussion on the dev list about changing our coding standards...

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 16, 2019

Author Member

One-line methods are prevented in a lot of projects. Here is a couple of the checkstyle rules which forbid such style: https://checkstyle.sourceforge.io/config_blocks.html#LeftCurly and https://checkstyle.sourceforge.io/config_blocks.html#RightCurly. I think at some point, we will update the code to enable these and other rules.

Also, most of IDE can compress displaying such methods automatically.

This comment has been minimized.

Copy link
@arina-ielchiieva

arina-ielchiieva Aug 16, 2019

Member

I think we should enforce such checkstyle. Taking into account that most of IDE can compress such methods automatically plus many other projects follow the same pattern. Personally, I was following such code style as well. @vvysotskyi could you please create Jira?

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 16, 2019

Author Member

I have created https://issues.apache.org/jira/browse/DRILL-7352 for this. I'll populate it with more rules soon.

}

public AbstractSingleRowSet(AbstractSingleRowSet rowSet) {
protected AbstractSingleRowSet(AbstractSingleRowSet rowSet) {

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Aug 15, 2019

Contributor

Why remove nested classes? These are perfectly normal part of Java that allows small, single-purpose classes to be associated with the one and only class that uses them. Again, if we want to change the Drill style guidelines, let's have the discussion on dev list first.

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 16, 2019

Author Member

These classes weren't removed, but they moved to the bottom of the top-level class. I believe we follow such a style in the project.

This comment has been minimized.

Copy link
@arina-ielchiieva

arina-ielchiieva Aug 16, 2019

Member

Agree on moving nested classes to the bottom since it makes code mode readable, first you see all code that belongs to the class and then nested classes.

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Aug 16, 2019

Contributor

I can live with this, though the nested-classes at top is a more common pattern in the Java community. The general rule is that constants and nested classes at top, then fields, then methods. Original Java formatters tended to put fields at the bottom. Some projects mix nested classes with methods: the class is defined adjacent to the method that uses it.

I also recall that Arina once suggested removing an anonymous inner class in favor of a declared class. This is, again, a style issue since anonymous inner classes are a very common style, especially in UI and functional programming.

In general, we need to establish rules for the project, else each person that touches the file will reformat code to their preferred style. We should discuss those rules on the dev list before we start enforcing them.

@@ -66,4 +57,14 @@ public long size() {
protected RowSetReader buildReader(ReaderIndex rowIndex) {
return new RowSetReaderBuilder().buildReader(this, rowIndex);
}

public static class RowSetReaderBuilder extends BaseReaderBuilder {

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Aug 15, 2019

Contributor

See note above.

/**
* Initial row count, used for preliminary memory allocation.
*/
public static final int INITIAL_ROW_COUNT = 10;

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Aug 15, 2019

Contributor

See note above about removal of nested class.

@@ -129,7 +129,7 @@ public RowSetWriter addRow(Object...values) {

@Override
public RowSetWriter addSingleCol(Object value) {
return addRow(new Object[] {value});
return addRow(value);

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Aug 15, 2019

Contributor

Won't work. Must be an array. The idea is that addRow() takes an array of columns. But, if the row has a single column, and that one column is repeated, the Java code is ambiguous. The original (correct) implementation handles this case.

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 16, 2019

Author Member

Actually, it should work as expected. Java determines how to treat incoming arguments at the compilation stage, so if the reference has a non-array type, it will be treated as a single element of vararg even if actual type of the object is an array.
Here is these two different cases:

addRow(new String[]{"a", "a"});

and the same case as in the code:

Object value = new String[]{"a", "a"};
addRow(value);

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Aug 16, 2019

Contributor

While this would seem to be true, it won't work for the case of a map:

addRow(new Object[]{"foo", 10})

Is the above two column values, or a single column that happens to be an object?

There is actually no need to debate this, however. Without this feature, some unit tests will fail. (I know that because I added this method directly in response to those test failures...)

Still, if you can get those tests to pass, even with the map value ambiguity, then my concern will be addressed.

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 16, 2019

Author Member

Tests passed with this change. I'm afraid that we are talking about different cases.
addRow(new Object[]{"foo", 10}) will work as you expected - it will handle incoming array as two column values, but the code in the method which was changed is different a little bit, it is similar to this call:
addRow((Object) new Object[]{"foo", 10}) - in this case, it will be handled as a single column with array value.

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Aug 16, 2019

Contributor

Ah! But I don't want to do the cast; that is the purpose of this method. I want:

  ...addRow(mapValue(10, "foo"), 20); // convenience
  ...addRow(new Object[] {10, "foo"}, 20); // low-level
...
  ...addSingleCol(mapValue(10, "foo"); // convenience
 ... addSingleCol(new Object[] {10, "foo"}); // low-level

Unless we wrap that single column in another array, the second form is the same as:

 ... addRow(new Object[] {10, "foo"});

Which will be interpreted as a set of two columns, not as a single column with a map-value.

I could be wrong, and I'd have to test actual code to be sure. In fact, I'll do that test this weekend.

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 16, 2019

Author Member

No need to cast, it was an example. In the method where the value is used reference type is already Object, so we don't need to do a cast.

Unless we wrap that single column in another array, the second form is the same as:

 ... addRow(new Object[] {10, "foo"});

Actually not, it will be the same as

addRow((Object) new Object[] {10, "foo"});

because method argument in addSingleCol has Object type.

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Aug 18, 2019

Contributor

Thanks for the explanation, makes sense.

* indirect row set.</dd>
* <dt>HyperRowSet</dt>
* <dd>A read-only row set made up of a collection of record batches, indexed via an
* SV4. As with the SV2, the SV4 itself is writable.</dt>

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Aug 15, 2019

Contributor

The SV is not writable within a RowSet. Changing the SV will mess up indexing.

The way to use that is to have a DirectRowSet on the original batch. Create the SV. Then, use the DirectRowSet and the SV to create an indirect row set.

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 16, 2019

Author Member

This part was moved from package-info.java in tests package. Removed this sentence to avoid confusion.

@@ -32,6 +32,22 @@
* batch" or "record batch." (But, in Drill, the term "record batch" also
* usually means an operator on that set of records. Here, a row set is
* just the rows &nash; separate from operations on that data.</dd>
* <dt>SingleRowSet (abstract)</dt>

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Aug 15, 2019

Contributor

See general comment. Please do not combine the RowSet and ResultSetLoader classes into a single package: they present very different models used for different purposes.

Yes, it is confusing because there is no ResultSetReader at present, but there should be to handle schema changes, etc. across batches. RowSet handles only a single row set such as in tests. It was deliberately designed to NOT be ResultSetReader solution.

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 16, 2019

Author Member

Moved resultSet classes into a separate package and updated package-info.java.

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Aug 16, 2019

Contributor

Thanks!

* call these methods; the scan operator does this work.
* <p>
* Each row is delimited by a call to {@link #startValue()} and a call to
* {@link #saveRow()}. <tt>startRow()</tt> performs initialization necessary
* Each row is delimited by a call to {@link org.apache.drill.exec.vector.accessor.writer.WriterEvents#startRow()}

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Aug 15, 2019

Contributor

WriterEvents is meant to be entirely internal; it should not be part of the API. WriterEvents is the internal implementation of the public API described here.

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 16, 2019

Author Member

Thanks for pointing this, reverted this change.

@@ -99,13 +99,6 @@ public DirectRowSet next() {
}
}

public void printAll() {

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Aug 15, 2019

Contributor

Please either leave this, or move it to a utility class. This turns out to be a very handy way to validate or debug a unit test: add a call while working on the test to visualize the results, remove the call when done.

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 16, 2019

Author Member

Reverted removing this method.

@paul-rogers

This comment has been minimized.

Copy link
Contributor

commented Aug 15, 2019

Two general comments. First, not sure why we needed to convert compact single-line methods (a long-time Drill standard) into four-line methods. Perhaps this is the result of an overly-helpful code formatter?

Second, looks like a bunch of nested helper classes were moved elsewhere. Not sure this is helpful as it clutters the code with small files used in exactly one place. The nested classes (a perfectly normal Java practice) makes clear that these are, in fact, helpers.

@vvysotskyi vvysotskyi force-pushed the vvysotskyi:DRILL-7350 branch 2 times, most recently from a3dec7d to 77fbda7 Aug 15, 2019

@vvysotskyi
Copy link
Member Author

left a comment

@paul-rogers, thanks for code review, I have addressed CR comments and moved resultSet classes into a separate package.

Regarding formatting single-line methods, I don't think that it is a long-time Drill standard and it should be preserved, though there is a lot of places where such style still used.

Regarding nested helper classes, they left in the same places but placed right after all the methods to be consistent with other classes.

@@ -41,13 +38,19 @@ public AbstractRowSet(VectorContainer container, TupleMetadata schema) {
}

@Override
public VectorAccessible vectorAccessible() { return container(); }
public VectorAccessible vectorAccessible() {

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 16, 2019

Author Member

One-line methods are prevented in a lot of projects. Here is a couple of the checkstyle rules which forbid such style: https://checkstyle.sourceforge.io/config_blocks.html#LeftCurly and https://checkstyle.sourceforge.io/config_blocks.html#RightCurly. I think at some point, we will update the code to enable these and other rules.

Also, most of IDE can compress displaying such methods automatically.


/**
* Basic implementation of a row set for both the single and multiple
* (hyper) varieties, both the fixed and extendible varieties.
* (hyper) varieties, both the fixed and extendable varieties.

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 16, 2019

Author Member

Thanks, done.

}

public AbstractSingleRowSet(AbstractSingleRowSet rowSet) {
protected AbstractSingleRowSet(AbstractSingleRowSet rowSet) {

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 16, 2019

Author Member

These classes weren't removed, but they moved to the bottom of the top-level class. I believe we follow such a style in the project.

return null;
}

public static class RowSetWriterBuilder extends BaseWriterBuilder {

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 16, 2019

Author Member

These classes are still nested and placed in the same top-level class, but they have moved to the bottom of the top-level class.

* 2: 2017, 12, 19
* </pre>
*/
public class RowSetStringBuilder {

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 16, 2019

Author Member

Good point, thanks, reworked this class to use Writer.

@@ -129,7 +129,7 @@ public RowSetWriter addRow(Object...values) {

@Override
public RowSetWriter addSingleCol(Object value) {
return addRow(new Object[] {value});
return addRow(value);

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 16, 2019

Author Member

Actually, it should work as expected. Java determines how to treat incoming arguments at the compilation stage, so if the reference has a non-array type, it will be treated as a single element of vararg even if actual type of the object is an array.
Here is these two different cases:

addRow(new String[]{"a", "a"});

and the same case as in the code:

Object value = new String[]{"a", "a"};
addRow(value);
@@ -32,6 +32,22 @@
* batch" or "record batch." (But, in Drill, the term "record batch" also
* usually means an operator on that set of records. Here, a row set is
* just the rows &nash; separate from operations on that data.</dd>
* <dt>SingleRowSet (abstract)</dt>

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 16, 2019

Author Member

Moved resultSet classes into a separate package and updated package-info.java.

* indirect row set.</dd>
* <dt>HyperRowSet</dt>
* <dd>A read-only row set made up of a collection of record batches, indexed via an
* SV4. As with the SV2, the SV4 itself is writable.</dt>

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 16, 2019

Author Member

This part was moved from package-info.java in tests package. Removed this sentence to avoid confusion.

* call these methods; the scan operator does this work.
* <p>
* Each row is delimited by a call to {@link #startValue()} and a call to
* {@link #saveRow()}. <tt>startRow()</tt> performs initialization necessary
* Each row is delimited by a call to {@link org.apache.drill.exec.vector.accessor.writer.WriterEvents#startRow()}

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 16, 2019

Author Member

Thanks for pointing this, reverted this change.

@@ -99,13 +99,6 @@ public DirectRowSet next() {
}
}

public void printAll() {

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 16, 2019

Author Member

Reverted removing this method.

@vvysotskyi

This comment has been minimized.

Copy link
Member Author

commented Aug 16, 2019

@paul-rogers, I have reverted debatable formatting changes. Let's postpone this discussion until we decide to add some additional checkstyle rules.

@arina-ielchiieva

This comment has been minimized.

Copy link
Member

commented Aug 16, 2019

Overall, LGTM. Let's wait for @paul-rogers approval.

@vvysotskyi vvysotskyi force-pushed the vvysotskyi:DRILL-7350 branch from 2fcf592 to 62dd37c Aug 16, 2019

@paul-rogers
Copy link
Contributor

left a comment

Thanks for making the changes; looking pretty good. Just a few follow-up comments on the row set printer.

Also, please do suggest on the dev list our additions to the formatting rules. There is a page that explains our rules. Once we agree on the additional rules, let's update that page (and the Eclipse and IntelliJ format rules) to reflect them.

}

public AbstractSingleRowSet(AbstractSingleRowSet rowSet) {
protected AbstractSingleRowSet(AbstractSingleRowSet rowSet) {

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Aug 16, 2019

Contributor

I can live with this, though the nested-classes at top is a more common pattern in the Java community. The general rule is that constants and nested classes at top, then fields, then methods. Original Java formatters tended to put fields at the bottom. Some projects mix nested classes with methods: the class is defined adjacent to the method that uses it.

I also recall that Arina once suggested removing an anonymous inner class in favor of a declared class. This is, again, a style issue since anonymous inner classes are a very common style, especially in UI and functional programming.

In general, we need to establish rules for the project, else each person that touches the file will reformat code to their preferred style. We should discuss those rules on the dev list before we start enforcing them.

* 2: 2017, 12, 19
* </pre>
*/
public class RowSetStringBuilder {

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Aug 16, 2019

Contributor

The class name is still "RowSetStringBuilder" Better to rename it to, say, "RowSetFormatter". The constructor should take both a row set and a writer. We can then provide static methods for the two most common cases: print to stdout (which I use all the time when creating tests) and to string (which you seem to prefer):

public static void print(RowSet rowSet) {
  new RowSetFormatter(rowSet, System.out).write();
}

public static String toString(RowSet rowSet) {
  StringWriter out = new StringWriter();
  new RowSetFormatter(rowSet, out).write();
  return out.toString();
}

The constructor allows other options, such as writing to a file. Sadly, it won't work to write to a log; we'd have to materialize to a string...

@@ -32,6 +32,22 @@
* batch" or "record batch." (But, in Drill, the term "record batch" also
* usually means an operator on that set of records. Here, a row set is
* just the rows &nash; separate from operations on that data.</dd>
* <dt>SingleRowSet (abstract)</dt>

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Aug 16, 2019

Contributor

Thanks!

RowSet result = client.queryBuilder().sql(sql).rowSet();
result.print();
TupleMetadata expectedSchema = new SchemaBuilder()
.add("eventDate", TypeProtos.MinorType.TIMESTAMP, TypeProtos.DataMode.OPTIONAL)

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Aug 16, 2019

Contributor

Simpler: addNullable("eventDate", MinorType.TIMESTAMP)

Import MinorType, use the addNullable method rather than the more verbose form of using the DataMode. That is, express intent, not implementation.

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 16, 2019

Author Member

Thanks, done.

@@ -101,7 +103,7 @@ public DirectRowSet next() {

public void printAll() {
for (DirectRowSet rowSet : this) {
rowSet.print();
new RowSetStringBuilder(rowSet).writeTo(new OutputStreamWriter(System.out));

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Aug 16, 2019

Contributor

This is not a simplification; it exposes too much detail. Better:

RowSetFormatter.print(rowSet);

Of course, if we go this route, we might as well leave the print() method in RowSet and have it call the above code.

Just to be clear, the purpose of print() is to help us developers visualize what's going on in Drill; it is not for production use. This is important because, if you can't visualize a batch, you'll make many mistakes because you can't tell what's what.

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 16, 2019

Author Member

Thanks, replaced it with RowSetFormatter.print(rowSet);

RowSet result = client.queryBuilder().sql(sql).rowSet();
result.print();
TupleMetadata expectedSchema = new SchemaBuilder()

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Aug 16, 2019

Contributor

Another change you might consider is renaming either this, or the other SchemaBuilder. Since this is the one we want to use moving forward, maybe rename the other to BatchSchemaBuilder so that future folks don't have to figure out the correct import when the reference the class.

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 16, 2019

Author Member

I went further, removed usage of another SchemaBuilder and removed the class. We already have BatchSchemaBuilder.

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 17, 2019

Author Member

I have reverted this change since it caused a lot of tests failures. I'll create a Jira to replace usage of another SchemaBuilder with BatchSchemaBuilder.

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Aug 18, 2019

Contributor

Thanks for trying to fix this. Doing the change as another PR sounds good.

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 20, 2019

Author Member

I have created https://issues.apache.org/jira/browse/DRILL-7354 to address this change.


public class QueryRowSetIterator implements Iterator<DirectRowSet>, Iterable<DirectRowSet> {
private final BufferingQueryEventListener listener;
private int recordCount = 0;
private int batchCount = 0;
QueryId queryId = null;
private QueryId queryId = null;

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Aug 16, 2019

Contributor

As long as you are touching this code, please remove the do-nothing initializers:

private int recordCount;
private int batchCount;
private QueryId queryId;

This comment has been minimized.

Copy link
@vvysotskyi

vvysotskyi Aug 16, 2019

Author Member

Thanks, done.

@arina-ielchiieva

This comment has been minimized.

Copy link
Member

commented Aug 16, 2019

@paul-rogers

I also recall that Arina once suggested removing an anonymous inner class in favor of a declared class.

I doubt this is true :) Maybe thrown out of the context though...

@paul-rogers

This comment has been minimized.

Copy link
Contributor

commented Aug 16, 2019

@arina-ielchiieva, could be my misunderstanding, though I do recall making the change... Point is, both forms are perfectly valid Java; it is just a preference. To avoid misunderstandings we might want to add to our formatting rules.

Again, just to be clear, I have no issue with whatever rules we decide on; but let's decide on a common set to avoid ping-pong format changes as each of us touches the code.

As it turns out, Charles started a discussion on the dev list. Perhaps you can offer there the format rules you've found to work, such as int[] foo; instead of int foo[];, classes-at-bottom, no single-line method declarations, preferred import sort order and so on. Might also include preferences for Java over Guava methods for common operations. That way, we can all follow the best practices you've discovered.

@vvysotskyi

This comment has been minimized.

Copy link
Member Author

commented Aug 16, 2019

The latest change caused tests failures. It is appeared because with new BatchSchemaBuilder TupleMetadata was used more frequently and some code does not set zero scale and precision, but PrimitiveColumnMetadata does. I'll let you know when this issue is fixed.

@vvysotskyi vvysotskyi force-pushed the vvysotskyi:DRILL-7350 branch from 1dd89e7 to 4b19cac Aug 17, 2019

@vvysotskyi

This comment has been minimized.

Copy link
Member Author

commented Aug 17, 2019

I have reverted breaking change, so for now, all tests passed.

@paul-rogers
Copy link
Contributor

left a comment

Thanks for the good work; this will be a nice improvement.

+1

RowSet result = client.queryBuilder().sql(sql).rowSet();
result.print();
TupleMetadata expectedSchema = new SchemaBuilder()

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Aug 18, 2019

Contributor

Thanks for trying to fix this. Doing the change as another PR sounds good.

@@ -129,7 +129,7 @@ public RowSetWriter addRow(Object...values) {

@Override
public RowSetWriter addSingleCol(Object value) {
return addRow(new Object[] {value});
return addRow(value);

This comment has been minimized.

Copy link
@paul-rogers

paul-rogers Aug 18, 2019

Contributor

Thanks for the explanation, makes sense.

@vvysotskyi vvysotskyi force-pushed the vvysotskyi:DRILL-7350 branch from 4b19cac to 64ac753 Aug 18, 2019

@vvysotskyi vvysotskyi merged commit 9c62bf1 into apache:master Aug 19, 2019

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.