
Conversation

Contributor

@twalthr twalthr commented May 13, 2020

What is the purpose of the change

This fixes the following shortcomings in the new data structures:

  • Some data structures do not provide hashCode/equals for testing.
  • RawValueData cannot be created from bytes.
  • Accessing elements requires dealing with logical types at runtime.
  • Null checks are performed multiple times at runtime, even for types that are declared NOT NULL.

So far these changes are not used. This will change in FLINK-16999.

Brief change log

Adds hashCode/equals and specialized per-type runtime instances for getting and setting values.
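
For illustration, a minimal sketch of the intended usage of such a specialized accessor, based on the ArrayData.createElementGetter method added in this PR (the surrounding setup is made up):

```java
import org.apache.flink.table.data.ArrayData;
import org.apache.flink.table.data.GenericArrayData;
import org.apache.flink.table.types.logical.IntType;

public class ElementGetterExample {
    public static void main(String[] args) {
        // Create the specialized accessor once, outside the per-record loop,
        // instead of switching over logical types on every element access.
        ArrayData.ElementGetter getter =
                ArrayData.createElementGetter(new IntType(false)); // INT NOT NULL

        ArrayData array = new GenericArrayData(new Integer[] {1, 2, 3});
        for (int i = 0; i < array.size(); i++) {
            // For NOT NULL element types, the getter can skip the null check.
            System.out.println(getter.getElementOrNull(array, i));
        }
    }
}
```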

Verifying this change

Tests are following in FLINK-16999.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): yes
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): yes
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? JavaDocs

@twalthr twalthr changed the title Flink 17668 [FLINK-17668][table] Fix shortcomings in new data structures May 13, 2020
@flinkbot
Collaborator

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit 3ce9117 (Wed May 13 13:54:46 UTC 2020)

Warnings:

  • No documentation files were touched! Remember to keep the Flink docs up to date!
  • Invalid pull request title: No valid Jira ID provided

Mention the bot in a comment to re-run the automated checks.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

Details
The bot tracks the review progress through labels, which are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

Collaborator

flinkbot commented May 13, 2020

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run travis re-run the last Travis build
  • @flinkbot run azure re-run the last Azure build

@dawidwys dawidwys left a comment

+1, I would be really interested to see how this affects performance.

case STRUCTURED_TYPE:
case RAW:
// long, double is 8 bytes.
// It store the length and offset of variable-length part when type is string, map, etc.
Contributor

Suggested change
- // It store the length and offset of variable-length part when type is string, map, etc.
+ // It stores the length and offset of the variable-length part for types such as string, map, etc.

GenericArrayData that = (GenericArrayData) o;
return size == that.size &&
isPrimitiveArray == that.isPrimitiveArray &&
Arrays.deepEquals(new Object[] {array}, new Object[]{that.array});
Member

Do we need to use Arrays.deepEquals for GenericRowData.equals as well?

Member

Yes, I think we need that. GenericRowData may contain byte[], which should be compared with Arrays.deepEquals.
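
To illustrate the point, a standalone sketch (not code from this PR): arrays inherit identity-based equals from Object, so only a deep comparison treats equal contents as equal.

```java
import java.util.Arrays;
import java.util.Objects;

public class ByteArrayEqualsDemo {
    public static void main(String[] args) {
        Object a = new byte[] {1, 2, 3};
        Object b = new byte[] {1, 2, 3};

        // Plain equals compares array identity, not contents.
        System.out.println(Objects.equals(a, b)); // false

        // deepEquals descends into arrays and compares element by element.
        System.out.println(Arrays.deepEquals(new Object[] {a}, new Object[] {b})); // true
    }
}
```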

Contributor Author

I revisited all GenericXXX classes for hashCode and equals.

Member

@wuchong wuchong left a comment

Thanks for the great work! The changes look good to me, except for GenericRowData#equals/hashCode mentioned by Benchao.

I'm +1 to using ElementGetter and FieldGetter. But we should still encourage developers to use the getXxx methods, because they avoid boxing and unboxing. At least, connectors in the Flink repository should use getXxx. That's also why we created FLINK-17528.
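
For illustration, a minimal sketch contrasting the two access styles (assuming a row whose first field is INT; the setup is made up):

```java
import org.apache.flink.table.data.GenericRowData;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.types.logical.IntType;

public class AccessStylesExample {
    public static void main(String[] args) {
        RowData row = GenericRowData.of(42);

        // Primitive accessor: no boxing, but the caller must know the type
        // and handle nulls explicitly.
        int primitive = row.isNullAt(0) ? 0 : row.getInt(0);

        // Generic accessor: works for any logical type, at the cost of
        // returning a boxed Object.
        RowData.FieldGetter getter = RowData.createFieldGetter(new IntType(), 0);
        Object boxed = getter.getFieldOrNull(row);

        System.out.println(primitive + " / " + boxed);
    }
}
```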

* @return the element object at the specified position in this array data
* @deprecated Use {@link #createElementGetter(LogicalType)} for a more efficient hot path.
*/
@Deprecated
Member

Could you create an issue to replace the get utility with createElementGetter and remove this method? I think we should provide a clean interface before the release.

Member

wuchong commented May 14, 2020

Btw, I have a simple benchmark to compare the performance of the get utility and the get accessor.

The result shows the get accessor is slightly worse than the get utility:

# Run complete. Total time: 00:03:36

Benchmark                   Mode  Cnt     Score     Error   Units
RowBenchmark.testAccessor  thrpt   20  4656.362 ± 355.096  ops/ms
RowBenchmark.testUtility   thrpt   20  4933.103 ± 655.208  ops/ms

Here is the code; maybe the virtual function call is the reason, or I did something wrong and missed something.
https://github.com/wuchong/my-benchmark/blob/master/src/main/java/myflink/RowBenchmark.java
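
For reference, the general shape of such a JMH comparison (a minimal sketch, not the linked benchmark; it assumes the static RowData.get utility discussed in this thread):

```java
import org.apache.flink.table.data.GenericRowData;
import org.apache.flink.table.data.RowData;
import org.apache.flink.table.types.logical.IntType;
import org.apache.flink.table.types.logical.LogicalType;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

@State(Scope.Thread)
public class RowAccessSketch {

    private final LogicalType type = new IntType();
    private RowData row;
    private RowData.FieldGetter getter;

    @Setup
    public void setup() {
        row = GenericRowData.of(42);
        getter = RowData.createFieldGetter(type, 0);
    }

    @Benchmark
    public Object testAccessor() {
        // Specialized accessor created once, invoked through an interface
        // (the suspected virtual call).
        return getter.getFieldOrNull(row);
    }

    @Benchmark
    public Object testUtility() {
        // Static utility that switches over the logical type on every call.
        return RowData.get(row, 0, type);
    }
}
```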

throw new IllegalArgumentException();
}
if (!elementType.isNullable()) {
return elementGetter;
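
For context, the logic around this excerpt follows roughly this pattern (a paraphrased sketch with only the INT branch shown, not the exact diff): the raw getter is returned directly for NOT NULL types and is otherwise wrapped with a single null check.

```java
import org.apache.flink.table.data.ArrayData;
import org.apache.flink.table.types.logical.LogicalType;

public class ElementGetterSketch {
    static ArrayData.ElementGetter createElementGetter(LogicalType elementType) {
        // In the real code, a switch over the type root picks the getter or throws.
        final ArrayData.ElementGetter elementGetter = (array, pos) -> array.getInt(pos);
        if (!elementType.isNullable()) {
            // Declared NOT NULL: return the raw getter, no per-access null check.
            return elementGetter;
        }
        // Nullable: perform the null check once here, for every access.
        return (array, pos) ->
                array.isNullAt(pos) ? null : elementGetter.getElementOrNull(array, pos);
    }
}
```
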
Contributor

Is the nullable attribute reliable now?

Member

@wuchong wuchong May 14, 2020

Good question. I have to say it's tricky to guarantee the NOT NULL attribute at runtime, especially in streaming. AFAIK, StreamExecExpand will produce null values even if the field is NOT NULL.

Contributor Author

A lot of locations ignore nullability. We should fix this gradually. By default a type is nullable, so the check is performed in most cases. The implementation of the method is correct (given its input parameters); if there is a problem, it is the caller's task to fix it.

Member

@wuchong wuchong May 14, 2020

I still think we should be cautious here. As you said, most cases are nullable, so we don't gain much performance from this. However, it may cause unexpected bugs if it is called in the planner. We can add a TODO on this. Once we fix all the nullability problems, we can update this without breaking compatibility.

> if there is a problem, it is the caller's task to fix it.

The problem is that the planner will also call this method, and I'm sure there are cases where null values exist in NOT NULL fields; then an exception will happen, and that will be a bug.

Contributor

Based on the current codebase, there is no strong guarantee that our runtime will not generate a null value for a NOT NULL field, and if it does, the bug would probably be hard to trace.

How much performance can we gain from this nullability shortcut check?

Contributor Author

So far, only the converters call this method. Let's fix upstream bugs gradually as code is migrated to this method.

Contributor Author

It is not only about performance but also correctness. A caller assumes this behavior when passing a nullable or NOT NULL type. Tracing where a null comes from is harder than dealing with a method that throws a hard exception when it encounters an unexpected null.

Member

The nullability problems are known issues but can't be fixed easily; they may need major rework (e.g. the Expand node). I don't think we should let queries fail.

If we want to keep this logic, then I would suggest using this method only in connectors, because we guarantee the NOT NULL constraint when writing values to a sink (table.exec.sink.not-null-enforcer). But in the planner we shouldn't use this method for now, because the NOT NULL information of the field type cannot be trusted.

Contributor Author

Sure, we can add this information to the issue that removes the RowData.get method. So far the new method is only used in converters.

Contributor Author

twalthr commented May 14, 2020

Thanks for sharing your benchmark results @wuchong. I also performed additional benchmarks. What I didn't like in your branch is the small data size; I don't know if the JIT compiler can kick in there. I tried to perform a more realistic benchmark on the current branch. See:
https://github.com/twalthr/flink/tree/FLINK-16999_BENCHMARKS

The results don't show a big difference:

# Run complete. Total time: 00:09:56
Benchmark                         Mode  Cnt  Score    Error   Units
RowAccessBenchmark.testAccessor  thrpt   50  0.003 ±  0.001  ops/ms
RowAccessBenchmark.testGetter    thrpt   50  0.003 ±  0.001  ops/ms

Comment on lines 88 to 89
return Arrays.deepEquals(map.keySet().toArray(), that.map.keySet().toArray()) &&
Arrays.deepEquals(map.values().toArray(), that.map.values().toArray());
Member

Not sure about this. Arrays.deepEquals compares elements in order; however, map.keySet() doesn't guarantee order. Maybe we can simply use Objects.equals(map, that.map) here, because byte[] shouldn't be used as a map key.

Contributor Author

Good point, we can do the key check differently. But for values we need to include byte[] as well.

Member

@wuchong wuchong May 14, 2020

True. Maybe we need to iterate over the entries and look up the values in the other map by entry key.

Contributor Author

Yes, I will just copy the implementation of HashMap.equals/HashMap.hashCode and replace the equals call.
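
Roughly, that approach looks like the following (a sketch based on AbstractMap.equals with the value comparison made deep; a hypothetical helper, not the exact code that was committed):

```java
import java.util.Arrays;
import java.util.Map;

public class DeepMapEquals {
    // Compares two maps like AbstractMap.equals, but values are compared
    // deeply so that byte[] values with equal contents count as equal.
    static boolean deepEquals(Map<?, ?> a, Map<?, ?> b) {
        if (a.size() != b.size()) {
            return false;
        }
        for (Map.Entry<?, ?> entry : a.entrySet()) {
            // Look up by key, so iteration order of the two maps is irrelevant.
            Object thatValue = b.get(entry.getKey());
            if (thatValue == null && !b.containsKey(entry.getKey())) {
                return false;
            }
            if (!Arrays.deepEquals(
                    new Object[] {entry.getValue()}, new Object[] {thatValue})) {
                return false;
            }
        }
        return true;
    }
}
```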

Contributor Author

Updated.

Member

@wuchong wuchong left a comment

Thanks for updating. LGTM.

@twalthr twalthr closed this in 96a6337 May 14, 2020