[Baseline] Apply Baseline to iceberg-data #156 #198

rdsr · 2019-05-28T14:37:11Z

No description provided.

rdsr · 2019-05-28T14:48:25Z

data/src/test/java/org/apache/iceberg/data/RandomGenericData.java

@@ -229,7 +231,8 @@ private static Object generatePrimitive(Type.PrimitiveType primitive, Random ran

      case DATE:
        // this will include negative values (dates before 1970-01-01)
-        return EPOCH_DAY.plusDays(random.nextInt() % ABOUT_380_YEARS_IN_DAYS);


If we don't need negative days here, we can slightly simplify the expression from (random.nextBoolean() ? 1 : -1) * random.nextInt(ABOUT_380_YEARS_IN_DAYS) to random.nextInt(ABOUT_380_YEARS_IN_DAYS).

Would it be easier to simply suppress RandomModInteger? I think this is what was done in RandomAvroData, and that's what I did locally in iceberg-spark.

+1. Let's keep it consistent.
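For context, here is a minimal sketch of the two options discussed in this thread. Error Prone's RandomModInteger check flags random.nextInt() % n because the result can be negative. The sign-flip rewrite makes the intent explicit. The alternative is to keep the original expression and suppress the check by name. EPOCH_DAY and ABOUT_380_YEARS_IN_DAYS are hypothetical stand-ins, and their values are assumptions. The real constants live in RandomGenericData and may differ.

```java
import java.time.LocalDate;
import java.util.Random;

class RandomDates {
  // Hypothetical values for illustration; the real constants are defined in RandomGenericData.
  static final LocalDate EPOCH_DAY = LocalDate.ofEpochDay(0);
  static final int ABOUT_380_YEARS_IN_DAYS = 380 * 365;

  // Option 1: choose the sign explicitly.
  // Offsets fall in (-ABOUT_380_YEARS_IN_DAYS, ABOUT_380_YEARS_IN_DAYS), so dates before 1970-01-01 still appear.
  static int signedOffset(Random random) {
    return (random.nextBoolean() ? 1 : -1) * random.nextInt(ABOUT_380_YEARS_IN_DAYS);
  }

  // Option 2: keep the original expression and suppress the Error Prone check by name.
  // Java's % takes the sign of the dividend, so this is also in (-n, n).
  @SuppressWarnings("RandomModInteger")
  static int modOffset(Random random) {
    return random.nextInt() % ABOUT_380_YEARS_IN_DAYS;
  }

  static LocalDate randomDate(Random random) {
    return EPOCH_DAY.plusDays(signedOffset(random));
  }

  public static void main(String[] args) {
    Random random = new Random(42);
    System.out.println(randomDate(random));
  }
}
```

Both forms produce values in the same open range. The sign-flip form slightly over-weights 0, because 0 is reachable with either sign. That should be harmless for random test data. The suppression keeps the code identical to what RandomAvroData already does.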

aokolnychyi · 2019-06-01T13:12:12Z

data/src/main/java/org/apache/iceberg/data/IcebergGenerics.java

@@ -65,8 +65,8 @@ public ScanBuilder caseInsensitive() {
      return this;
    }

-    public ScanBuilder select(String... columns) {
-      this.columns = ImmutableList.copyOf(columns);
+    public ScanBuilder select(String... selectColumns) {


What about selectedColumns? I believe it is frequently used throughout the project.

aokolnychyi · 2019-06-01T13:14:01Z

data/src/main/java/org/apache/iceberg/data/avro/DataReader.java

@@ -58,8 +58,8 @@ private DataReader(Schema readSchema) {
  }

  @Override
-  public void setSchema(Schema fileSchema) {
-    this.fileSchema = Schema.applyAliases(fileSchema, readSchema);
+  public void setSchema(Schema schema) {


I believe that in previous PRs @mccheah frequently used renames like fileSchema -> newFileSchema to avoid hiding fields in builders. Would it make sense to be consistent with that?

Will update
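Checkstyle's HiddenField check fires when a method parameter has the same name as a field of the enclosing class. The following is a minimal sketch of the two renames suggested above. It uses hypothetical, heavily simplified classes rather than the real IcebergGenerics.ScanBuilder or DataReader, and a plain String stands in for a schema type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class HiddenFieldExamples {
  // Simplified stand-in for a scan builder; names are illustrative only.
  static class ScanBuilder {
    private List<String> columns = Collections.emptyList();

    // Named selectedColumns instead of columns, so the parameter does not hide the field.
    // (The real code uses Guava's ImmutableList.copyOf; a JDK equivalent is used here.)
    ScanBuilder select(String... selectedColumns) {
      this.columns = Collections.unmodifiableList(new ArrayList<>(Arrays.asList(selectedColumns)));
      return this;
    }

    List<String> columns() {
      return columns;
    }
  }

  // Simplified stand-in for a reader whose file schema is set after construction.
  static class SchemaReader {
    private String fileSchema;

    // Follows the fileSchema -> newFileSchema naming convention mentioned in review.
    void setSchema(String newFileSchema) {
      this.fileSchema = newFileSchema;
    }

    String fileSchema() {
      return fileSchema;
    }
  }
}
```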

aokolnychyi · 2019-06-01T13:21:06Z

data/src/main/java/org/apache/iceberg/data/avro/GenericReaders.java

@@ -128,11 +128,13 @@ protected Record reuseOrCreate(Object reuse) {
    }

    @Override
+    @SuppressWarnings("checkstyle:hiddenField")


I know this rule is a bit controversial, but I think it is better to either follow it everywhere or ignore it globally. In this particular case, it seems appropriate to follow it and maybe give the field another name. The difference between the field and the passed object is not really clear.

You are right. I didn't look into it closely enough. I thought Baseline was mistakenly flagging this as a hidden field when none was present.

aokolnychyi · 2019-06-01T13:24:59Z

data/src/main/java/org/apache/iceberg/data/parquet/GenericParquetReaders.java

@@ -278,6 +277,10 @@ private GenericParquetReaders() {
      }
    }

+    MessageType getType() {


I am not sure, but I believe Iceberg doesn't use the get prefix for getters.

aokolnychyi · 2019-06-01T13:31:09Z

data/src/test/java/org/apache/iceberg/data/RandomGenericData.java

@@ -229,7 +231,8 @@ private static Object generatePrimitive(Type.PrimitiveType primitive, Random ran

      case DATE:
        // this will include negative values (dates before 1970-01-01)
-        return EPOCH_DAY.plusDays(random.nextInt() % ABOUT_380_YEARS_IN_DAYS);


Would it be easier to simply suppress RandomModInteger? I think this is what was done in RandomAvroData, and that's what I did locally in iceberg-spark.

rdsr · 2019-06-01T17:23:13Z

Thanks @aokolnychyi for the comments. I've addressed all of them.

aokolnychyi · 2019-06-01T18:09:13Z

Thanks, @rdsr! There are two more places with hiddenField. I am not sure about the best naming there. Otherwise, LGTM.

mccheah

Looks really close! I'm fine with +1ing this and merging after these comments. @rdblue to sign off when ready.

mccheah · 2019-06-08T00:54:03Z

data/src/main/java/org/apache/iceberg/data/GenericRecord.java

@@ -104,7 +100,7 @@ public Object getField(String name) {
  @Override
  public void setField(String name, Object value) {
    Integer pos = nameToPos.get(name);
-    Preconditions.checkArgument(pos != null, "Cannot set unknown field named: " + name);
+    Preconditions.checkArgument(pos != null, "Cannot set unknown field named: %s", name);


We can probably use checkNotNull here?
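The suggestion swaps IllegalArgumentException (from checkArgument) for NullPointerException (from checkNotNull). Guava's checkNotNull also returns its argument, so the lookup and the check can become one statement. A Guava dependency may not be available where this sketch is compiled, so it uses the JDK analogue java.util.Objects.requireNonNull with a message Supplier. That method also returns the reference and builds the message only on failure. SimpleRecord is a hypothetical, heavily simplified stand-in for GenericRecord.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class SimpleRecord {
  private final Map<String, Integer> nameToPos = new HashMap<>();
  private final Object[] values;

  SimpleRecord(String... fieldNames) {
    values = new Object[fieldNames.length];
    for (int i = 0; i < fieldNames.length; i++) {
      nameToPos.put(fieldNames[i], i);
    }
  }

  void setField(String name, Object value) {
    // With Guava this would be:
    //   Integer pos = Preconditions.checkNotNull(nameToPos.get(name), "Cannot set unknown field named: %s", name);
    // requireNonNull likewise returns its argument and only builds the message when the check fails.
    Integer pos = Objects.requireNonNull(nameToPos.get(name), () -> "Cannot set unknown field named: " + name);
    values[pos] = value;
  }

  Object getField(String name) {
    Integer pos = nameToPos.get(name);
    return pos == null ? null : values[pos];
  }
}
```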

mccheah · 2019-06-08T00:55:12Z

data/src/main/java/org/apache/iceberg/data/avro/IcebergDecoder.java

@@ -190,6 +190,7 @@ public D decode(InputStream stream, D reuse) {
   * @return true if the buffer is complete, false otherwise (stream ended)
   * @throws IOException if there is an error while reading
   */
+  @SuppressWarnings("checkstyle:InnerAssignment")


Curious, is there a way we can get around this without suppressing?

I tried it, but there wasn't a good way without changing the familiar idiom of stream reading.

This suppression looks okay to me.
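The "familiar idiom" here is presumably the read loop that assigns and tests the return value of InputStream.read in a single condition, which Checkstyle's InnerAssignment check flags. The following is a minimal sketch comparing that loop with an equivalent one that avoids the inner assignment. The helper names are hypothetical, and neither method is the actual IcebergDecoder code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class ReadLoops {
  // Familiar idiom: assignment inside the loop condition (flagged by InnerAssignment).
  @SuppressWarnings("checkstyle:InnerAssignment")
  static byte[] readAllIdiomatic(InputStream stream) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[4]; // small buffer so the loop runs several times
    int bytesRead;
    while ((bytesRead = stream.read(buffer)) != -1) {
      out.write(buffer, 0, bytesRead);
    }
    return out.toByteArray();
  }

  // Same behavior without an inner assignment: the read call is duplicated before and inside the loop.
  static byte[] readAllWithoutInnerAssignment(InputStream stream) throws IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    byte[] buffer = new byte[4];
    int bytesRead = stream.read(buffer);
    while (bytesRead != -1) {
      out.write(buffer, 0, bytesRead);
      bytesRead = stream.read(buffer);
    }
    return out.toByteArray();
  }
}
```

The second form needs the read call in two places. That duplication is the usual argument for keeping the idiom and suppressing the check locally.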

rdblue · 2019-06-08T21:11:23Z

data/src/main/java/org/apache/iceberg/data/avro/GenericReaders.java

@@ -111,19 +111,19 @@ public OffsetDateTime read(Decoder decoder, Object reuse) throws IOException {
  }

  private static class GenericRecordReader extends ValueReaders.StructReader<Record> {
-    private final StructType struct;
+    private final StructType record;


What about structType instead? In the context of a record reader, record sounds like it refers to a record value rather than a type.

Makes sense!

rdblue · 2019-06-08T21:14:07Z

data/src/test/java/org/apache/iceberg/TestSplitScan.java

+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.


rdblue · 2019-06-08T21:15:46Z

data/src/test/java/org/apache/iceberg/TestSplitScan.java

@@ -60,7 +79,7 @@ public TestSplitScan(String format) {
  }

  @Before
-  public void setup() throws IOException {
+  public void before() throws IOException {


What rule required this change? I usually like to have descriptive names for @Before and @After methods. Is the name required to be before? (Granted, the original name, setup, wasn't very descriptive either.)

It seems that I can name it anything other than setup/teardown. Below is the error:

Task :iceberg-data:checkstyleTest
[ant:checkstyle] [ERROR] /Users/rratti/code/iceberg/data/src/test/java/org/apache/iceberg/TestSplitScan.java:82: Test setup/teardown methods are called before(), beforeClass(), after(), afterClass(), but not setUp, teardown, etc. [RegexpSinglelineJava]
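The following is a rough sketch of the kind of line-based match a RegexpSinglelineJava rule performs. It is consistent with the observation that only setup/teardown-style names are rejected. The pattern is a hypothetical approximation. Baseline's actual configuration is not shown in this thread and may differ.

```java
import java.util.regex.Pattern;

class SetupNameCheck {
  // Hypothetical pattern; NOT Baseline's actual RegexpSinglelineJava configuration.
  private static final Pattern FORBIDDEN =
      Pattern.compile("void\\s+(setUp|setup|tearDown|teardown)\\s*\\(");

  // Returns true if a single source line declares a forbidden setup/teardown method name.
  static boolean violates(String line) {
    return FORBIDDEN.matcher(line).find();
  }
}
```

Under this approximation, a descriptive name such as createTables() passes just like before(). That would leave room for the descriptive @Before names mentioned above.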

rdblue · 2019-06-08T21:20:28Z

I found a couple of minor problems, but overall it looks good. Once the last couple things are fixed, I'll merge it. Thanks for working on this!

rdsr added 2 commits May 28, 2019 00:13

[Baseline] Apply Baseline to iceberg-data apache#156

20cfd1e

[Baseline] Fix random usage

1c79240

rdsr commented May 28, 2019

aokolnychyi reviewed Jun 1, 2019

Address commits

452f09a

Address comments (take 2)

cdd9d88

mccheah suggested changes Jun 8, 2019

Address comments (take 3)

df8ce7d

rdblue reviewed Jun 8, 2019

data/src/main/java/org/apache/iceberg/data/GenericRecord.java (outdated, resolved)

Address comment (take 4)

611b0ec

rdblue reviewed Jun 10, 2019

data/src/main/java/org/apache/iceberg/data/parquet/GenericParquetReaders.java (outdated, resolved)

Address comment (take 5)

f028288

rdblue merged commit 4691809 into apache:master Jun 11, 2019

rdblue mentioned this pull request Oct 9, 2019

[Baseline] Apply Baseline to iceberg-data #156

Closed
