@rkhachatryan
Contributor

What is the purpose of the change

Currently, the copy-on-write handling of a chained entry uses object
identity to find the wanted entry in the chain. However, if
the same method is running concurrently, the object in the chain
can be replaced by its copy. The identity condition is then never met,
and the traversal runs past the end of the chain, causing an NPE.

With some tiny changes in timing (i.e. overriding methods of
CopyOnWriteStateMap), StateBackendTestBase.testValueStateRace
fails intermittently when run repeatedly (~4 out of 100 runs).

This change replaces object identity with key+namespace equality
in the condition.

The overhead should not be significant because the same check is
already performed to find the element before copying.
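
To illustrate the failure mode described above, here is a minimal sketch, not the actual CopyOnWriteStateMap code. The `Entry` class, the method names, and the string-typed key and namespace are all hypothetical, and the concurrent replacement is simulated sequentially by swapping an entry for its copy before the lookup runs.

```java
import java.util.Objects;

public class ChainLookupSketch {

    /** Hypothetical, simplified stand-in for a chained hash-map entry. */
    static final class Entry {
        final String key;
        final String namespace;
        Entry next;

        Entry(String key, String namespace, Entry next) {
            this.key = key;
            this.namespace = namespace;
            this.next = next;
        }

        /** Copy-on-write style copy: same key and namespace, same successor. */
        Entry copy() {
            return new Entry(key, namespace, next);
        }
    }

    /** Identity-based search: throws an NPE if 'wanted' is no longer in the chain. */
    static Entry findByIdentity(Entry head, Entry wanted) {
        Entry current = head;
        while (current != wanted) {
            current = current.next; // dereferences null once the chain end is passed
        }
        return current;
    }

    /** Equality-based search: also matches a copy that replaced 'wanted'. */
    static Entry findByKeyAndNamespace(Entry head, Entry wanted) {
        Entry current = head;
        while (current != null
                && !(Objects.equals(current.key, wanted.key)
                        && Objects.equals(current.namespace, wanted.namespace))) {
            current = current.next;
        }
        return current;
    }

    public static void main(String[] args) {
        Entry tail = new Entry("k2", "ns", null);
        Entry stale = new Entry("k1", "ns", tail);
        Entry head = new Entry("k0", "ns", stale);

        // Simulate a concurrent invocation replacing the entry with its copy.
        head.next = stale.copy();

        Entry found = findByKeyAndNamespace(head, stale);
        System.out.println("equality lookup found the copy: " + (found != stale));

        try {
            findByIdentity(head, stale);
        } catch (NullPointerException e) {
            System.out.println("identity lookup ran past the end of the chain");
        }
    }
}
```

In this sketch the equality-based lookup returns the copy that now sits in the chain, while the identity-based lookup never sees the stale reference and dereferences `null` at the chain end.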

Verifying this change

IncrementalHeapStateBackendTest.testValueStateRace in https://github.com/rkhachatryan/flink/tree/flip-151-full

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn/Mesos, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no
  • If yes, how is the feature documented? no

@flinkbot
Collaborator

flinkbot commented Mar 31, 2022

CI report:

Bot commands

The @flinkbot bot supports the following commands:
  • @flinkbot run azure: re-run the last Azure build

@github-actions

This PR is being marked as stale since it has not had any activity in the last 180 days.
If you would like to keep this PR alive, please leave a comment asking for a review.
If the PR has merge conflicts, update it with the latest from the base branch.

If you are having difficulty finding a reviewer, please reach out to the [community](https://flink.apache.org/what-is-flink/community/).

If this PR is no longer valid or desired, please feel free to close it. If no activity occurs in the next 90 days, it will be automatically closed.

@github-actions github-actions bot added the stale label Jan 15, 2025
@github-actions

This PR has been closed since it has not had any activity in 120 days.
If you feel like this was a mistake, or you would like to continue working on it,
please feel free to re-open the PR and ask for a review.
