Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BEAM-12277] Add flink 1.13 build target. #14719

Merged
merged 4 commits into from May 28, 2021
Merged

Conversation

dmvk
Copy link
Member

@dmvk dmvk commented May 4, 2021

https://issues.apache.org/jira/browse/BEAM-12277


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Choose reviewer(s) and mention them in a comment (R: @username).
  • Format the pull request title like [BEAM-XXX] Fixes bug in ApproximateQuantiles, where you replace BEAM-XXX with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

ValidatesRunner compliance status (on master branch)

Lang ULR Dataflow Flink Samza Spark Twister2
Go --- Build Status Build Status --- Build Status ---
Java Build Status Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status
Build Status
Build Status
Build Status
Python --- Build Status
Build Status
Build Status
Build Status
Build Status
--- Build Status ---
XLang Build Status Build Status Build Status --- Build Status ---

Examples testing status on various runners

Lang ULR Dataflow Flink Samza Spark Twister2
Go --- --- --- --- --- --- ---
Java --- Build Status
Build Status
Build Status
--- --- --- --- ---
Python --- --- --- --- --- --- ---
XLang --- --- --- --- --- --- ---

Post-Commit SDK/Transform Integration Tests Status (on master branch)

Go Java Python
Build Status Build Status Build Status
Build Status
Build Status

Pre-Commit Tests Status (on master branch)

--- Java Python Go Website Whitespace Typescript
Non-portable Build Status
Build Status
Build Status
Build Status
Build Status
Build Status Build Status Build Status Build Status
Portable --- Build Status Build Status --- --- ---

See .test-infra/jenkins/README for trigger phrase, status and link of all Jenkins jobs.

GitHub Actions Tests Status (on master branch)

Build python source distribution and wheels
Python tests
Java tests

See CI.md for more information about GitHub Actions CI.

@dmvk
Copy link
Member Author

dmvk commented May 4, 2021

R: @iemejia Would you mind taking a look at this one? I'm not sure if there's anything else needed for portable runner to work 🤔
CC: @ibzib

@codecov
Copy link

codecov bot commented May 4, 2021

Codecov Report

Merging #14719 (881da20) into master (31988c8) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #14719   +/-   ##
=======================================
  Coverage   83.78%   83.78%           
=======================================
  Files         435      435           
  Lines       58423    58423           
=======================================
  Hits        48951    48951           
  Misses       9472     9472           
Impacted Files Coverage Δ
...hon/apache_beam/portability/api/schema_pb2_grpc.py
...xamples/snippets/transforms/elementwise/flatmap.py
...am/examples/snippets/transforms/elementwise/map.py
...hon/apache_beam/runners/worker/worker_pool_main.py
...ild/srcs/sdks/python/apache_beam/utils/__init__.py
.../python/apache_beam/runners/worker/statesampler.py
...rcs/sdks/python/apache_beam/typehints/typehints.py
.../python/apache_beam/testing/test_stream_service.py
...s/snippets/transforms/aggregation/combinevalues.py
...srcs/sdks/python/apache_beam/examples/wordcount.py
... and 860 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 31988c8...881da20. Read the comment docs.

@dmvk
Copy link
Member Author

dmvk commented May 4, 2021

Run Java PreCommit

@iemejia
Copy link
Member

iemejia commented May 4, 2021

Can you please change

static String getFlinkVersion() {
return "1.12"
}
to be 1.13 so we can run the VR tests and other CI tests by default on the new version.
Also can you add 1.13 to the website https://github.com/apache/beam/blob/cc29b5b308b6ed91d787449b8105cc0b292273ac/website/www/site/content/en/documentation/runners/flink.md

@iemejia iemejia self-requested a review May 4, 2021 12:51
@iemejia
Copy link
Member

iemejia commented May 4, 2021

Also add a note on CHANGES.md -> 2.31.0 -> New Features

@iemejia
Copy link
Member

iemejia commented May 4, 2021

and

PUBLISHED_FLINK_VERSIONS = ['1.8', '1.9', '1.10', '1.11', '1.12']
too (extra points if you remove the 1.8 and 1.9 versions that we don't support anymore)

@ibzib
Copy link
Contributor

ibzib commented May 4, 2021

Besides the things Ismaël pointed out, we also need INFRA to create a new docker repo: INFRA-21833

@iemejia
Copy link
Member

iemejia commented May 5, 2021

Docker repo seems to be up now https://hub.docker.com/r/apache/beam_flink1.13_job_server

@iemejia
Copy link
Member

iemejia commented May 7, 2021

retest this please

@iemejia
Copy link
Member

iemejia commented May 7, 2021

Run Flink ValidatesRunner

@iemejia
Copy link
Member

iemejia commented May 7, 2021

Run Java Flink PortableValidatesRunner Batch

@iemejia
Copy link
Member

iemejia commented May 7, 2021

Run Java Flink PortableValidatesRunner Streaming

Copy link
Contributor

@ibzib ibzib left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@iemejia
Copy link
Member

iemejia commented May 8, 2021

Arrghh seems metrics are broken for the Portable runner. Can you maybe what is the deal @ibzib ?
https://ci-beam.apache.org/job/beam_PreCommit_Python_PVR_Flink_Phrase/131/

@ibzib
Copy link
Contributor

ibzib commented May 10, 2021

Arrghh seems metrics are broken for the Portable runner. Can you maybe what is the deal @ibzib ?
https://ci-beam.apache.org/job/beam_PreCommit_Python_PVR_Flink_Phrase/131/

Looks like it's hitting an NPE while unregistering metrics because the taskExecutorManager is null. This seems like a Flink bug to me -- if registers these metrics, they should remain accessible even after the slot manager has been shut down. Getting these metrics should probably return 0 rather than NPE.

May 08, 2021 4:57:46 AM org.apache.flink.runtime.metrics.MetricRegistryImpl unregister
WARNING: Error while unregistering metric: taskSlotsAvailable.
java.lang.NullPointerException
	at org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager.getNumberFreeSlots(DeclarativeSlotManager.java:715)
	at org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager.lambda$registerSlotManagerMetrics$2(DeclarativeSlotManager.java:200)
	at org.apache.beam.runners.flink.metrics.Metrics.toString(Metrics.java:33)
	at org.apache.beam.runners.flink.metrics.FileReporter.notifyOfRemovedMetric(FileReporter.java:72)
	at org.apache.flink.runtime.metrics.MetricRegistryImpl.unregister(MetricRegistryImpl.java:435)
	at org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.close(AbstractMetricGroup.java:333)
	at org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager.close(DeclarativeSlotManager.java:241)
	at org.apache.flink.runtime.resourcemanager.ResourceManager.stopResourceManagerServices(ResourceManager.java:298)
	at org.apache.flink.runtime.resourcemanager.ResourceManager.onStop(ResourceManager.java:275)
	at org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStop(RpcEndpoint.java:214)
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StartedState.terminate(AkkaRpcActor.java:563)
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:186)
	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
	at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
	at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
	at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
	at akka.actor.ActorCell.invoke(ActorCell.scala:561)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
	at akka.dispatch.Mailbox.run(Mailbox.scala:225)
	at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

May 08, 2021 4:57:46 AM org.apache.flink.runtime.metrics.MetricRegistryImpl unregister
WARNING: Error while unregistering metric: taskSlotsTotal.
java.lang.NullPointerException
	at org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager.getNumberRegisteredSlots(DeclarativeSlotManager.java:705)
	at org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager.lambda$registerSlotManagerMetrics$3(DeclarativeSlotManager.java:202)
	at org.apache.beam.runners.flink.metrics.Metrics.toString(Metrics.java:33)
	at org.apache.beam.runners.flink.metrics.FileReporter.notifyOfRemovedMetric(FileReporter.java:72)
	at org.apache.flink.runtime.metrics.MetricRegistryImpl.unregister(MetricRegistryImpl.java:435)
	at org.apache.flink.runtime.metrics.groups.AbstractMetricGroup.close(AbstractMetricGroup.java:333)
	at org.apache.flink.runtime.resourcemanager.slotmanager.DeclarativeSlotManager.close(DeclarativeSlotManager.java:241)
	at org.apache.flink.runtime.resourcemanager.ResourceManager.stopResourceManagerServices(ResourceManager.java:298)
	at org.apache.flink.runtime.resourcemanager.ResourceManager.onStop(ResourceManager.java:275)
	at org.apache.flink.runtime.rpc.RpcEndpoint.internalCallOnStop(RpcEndpoint.java:214)
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor$StartedState.terminate(AkkaRpcActor.java:563)
	at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleControlMessage(AkkaRpcActor.java:186)
	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:26)
	at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:21)
	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:123)
	at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:21)
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:170)
	at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
	at akka.actor.Actor$class.aroundReceive(Actor.scala:517)
	at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:225)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:592)
	at akka.actor.ActorCell.invoke(ActorCell.scala:561)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:258)
	at akka.dispatch.Mailbox.run(Mailbox.scala:225)
	at akka.dispatch.Mailbox.exec(Mailbox.scala:235)
	at akka.dispatch.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at akka.dispatch.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at akka.dispatch.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at akka.dispatch.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

@ibzib
Copy link
Contributor

ibzib commented May 10, 2021

Or perhaps these metrics should be unregistered when the slot manager is shut down, before the taskExecutorManager is nulled out.

@dmvk
Copy link
Member Author

dmvk commented May 11, 2021

I don't think this is a regression, I'm pretty sure we've seen this in prior versions too. Maybe we can pull this into separate issue?

@dmvk
Copy link
Member Author

dmvk commented May 11, 2021

I think this boils down to whether we want to wait for the issue being fixed in Flink or we want a "quick-win" solution 🤔

Maybe we can temporarily work around this in FileReporter, before fixing the issue upstream?

@iemejia
Copy link
Member

iemejia commented May 12, 2021

Run Flink ValidatesRunner

@iemejia
Copy link
Member

iemejia commented May 12, 2021

Run Java Flink PortableValidatesRunner Batch

@iemejia
Copy link
Member

iemejia commented May 12, 2021

Run Java Flink PortableValidatesRunner Streaming

@dmvk
Copy link
Member Author

dmvk commented May 12, 2021

issue with PR on the flink's side - https://issues.apache.org/jira/browse/FLINK-22646

@iemejia
Copy link
Member

iemejia commented May 12, 2021

Run Flink ValidatesRunner

@iemejia
Copy link
Member

iemejia commented May 12, 2021

Run Java Flink PortableValidatesRunner Batch

@iemejia
Copy link
Member

iemejia commented May 12, 2021

Run Java Flink PortableValidatesRunner Streaming

@ibzib
Copy link
Contributor

ibzib commented May 12, 2021

Looks like the metric name formatting changed, I added a commit to update it.

@iemejia
Copy link
Member

iemejia commented May 12, 2021

Run Flink ValidatesRunner

@iemejia
Copy link
Member

iemejia commented May 12, 2021

Run Java Flink PortableValidatesRunner Batch

@iemejia
Copy link
Member

iemejia commented May 12, 2021

Run Java Flink PortableValidatesRunner Streaming

@iemejia
Copy link
Member

iemejia commented May 15, 2021

Run Java PreCommit

@dmvk
Copy link
Member Author

dmvk commented May 18, 2021

@iemejia @ibzib do you think we can still make it into 2.30.x release?

@iemejia
Copy link
Member

iemejia commented May 18, 2021

i think there is not an RC1 yet so maybe we can convince the release manager to get it in, but well in that case we got to solve the still open jacoco issue to get this into master first.

@aaltay
Copy link
Member

aaltay commented May 27, 2021

What is the next step on this PR?

I would not recommend cherry picking this to the current release.

@ibzib
Copy link
Contributor

ibzib commented May 27, 2021

What is the next step on this PR?

I would not recommend cherry picking this to the current release.

As far as I know there's one outstanding issue, the Java precommit failure with Jacoco. We won't cherry pick this. Maybe 2.31.0.

@ibzib
Copy link
Contributor

ibzib commented May 27, 2021

Looks like it was just a typo in the source overrides. I added another commit to fix it.

@dmvk can you rebase this?

@ibzib
Copy link
Contributor

ibzib commented May 28, 2021

Run Java PreCommit

@iemejia
Copy link
Member

iemejia commented May 28, 2021

I rebased it (and push forced it) since the conflict was on my commit. This should be good now!

@iemejia
Copy link
Member

iemejia commented May 28, 2021

Run Java PreCommit

@iemejia
Copy link
Member

iemejia commented May 28, 2021

Run Flink ValidatesRunner

@iemejia
Copy link
Member

iemejia commented May 28, 2021

Run Java Flink PortableValidatesRunner Batch

@iemejia
Copy link
Member

iemejia commented May 28, 2021

Run Java Flink PortableValidatesRunner Streaming

@iemejia
Copy link
Member

iemejia commented May 28, 2021

Run Java Spark PortableValidatesRunner Batch

@iemejia iemejia merged commit 00eb420 into apache:master May 28, 2021
@iemejia
Copy link
Member

iemejia commented May 28, 2021

Failing test is a well know flake. Thanks @ibzib for unblocking this one and of course @dmvk for bringing it in!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants