Skip to content

[FLINK-11787] Update Kubernetes resources: workaround to make TM reachable from JM in Kubernetes#7858

Closed
1u0 wants to merge 1 commit intoapache:release-1.7from
1u0:flink-11127-workaround
Closed

[FLINK-11787] Update Kubernetes resources: workaround to make TM reachable from JM in Kubernetes#7858
1u0 wants to merge 1 commit intoapache:release-1.7from
1u0:flink-11127-workaround

Conversation

@1u0
Copy link
Contributor

@1u0 1u0 commented Feb 28, 2019

What is the purpose of the change

  • This PR adds configuration update (Kubernetes resource definitions) as workaround for [FLINK-11127] for Flink 1.7 release branch. With this change, the TMs should be advertising themselves to JM using their k8s pods' ip addresses instead of (by default) hostnames.

NB: this patch is supposed only for Flink 1.7 release branch. Flink 1.8 and higher have introduced a new configuration option to mitigate this (taskmanager.network.bind-policy).

Brief change log

  • Example Kubernetes resources are updated to use pods ip address as TMs host address

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

  • Manually verified the change by running a cluser with 1 JobManagers and 2 TaskManagers in a minikube.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (yes / no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (yes / no)
  • The serializers: (yes / no / don't know)
  • The runtime per-record code paths (performance sensitive): (yes / no / don't know)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (yes / no / don't know)
  • The S3 file system connector: (yes / no / don't know)

Documentation

  • Does this pull request introduce a new feature? (yes / no)
  • If yes, how is the feature documented? (not applicable / docs / JavaDocs / not documented)

@flinkbot
Copy link
Collaborator

flinkbot commented Feb 28, 2019

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Review Progress

  • ✅ 1. The [description] looks good.
  • ✅ 2. There is [consensus] that the contribution should go into to Flink.
  • ❓ 3. Needs [attention] from.
  • ✅ 4. The change fits into the overall [architecture].
  • ✅ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

Details

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot approve description to approve the 1st aspect (similarly, it also supports the consensus, architecture and quality keywords)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval

@tillrohrmann
Copy link
Contributor

Thanks for opening this PR @1u0. I think we should create a separate issue for this workaround since this PR does not solve FLINK-11127 which has the goal to reverse the connection pattern of the MetricFetcher and the MetricQueryServices.

@1u0
Copy link
Contributor Author

1u0 commented Feb 28, 2019

@tillrohrmann thanks for comment.
Yes, I agree that it doesn't address a proper fix of FLINK-11127, but just a (temporal) workaround.
I was not sure if in such cases the convention is to open an another ticket. I can create a new ticket if so.

Regarding this PR, should I close it and reopen a new one (under the new ticket) OR just update it and re-link (to the new ticket)?

…hable from JM in Kubernetes (for Flink 1.7 only)
@1u0 1u0 force-pushed the flink-11127-workaround branch from 10277b1 to 4a4fd22 Compare February 28, 2019 13:20
@1u0 1u0 changed the title [FLINK-11127] Update Kubernetes resources: workaround to make TM reachable from JM in Kubernetes [FLINK-11787] Update Kubernetes resources: workaround to make TM reachable from JM in Kubernetes Feb 28, 2019
@1u0
Copy link
Contributor Author

1u0 commented Feb 28, 2019

Update: I've changed the PR title and commit message to point to the new [FLINK-11787] ticket.

Copy link
Contributor

@tillrohrmann tillrohrmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix @1u0. LGTM. Merging into release-1.7 branch.

@tillrohrmann
Copy link
Contributor

@flinkbot approve all

@tillrohrmann
Copy link
Contributor

Manually merged.

asfgit pushed a commit that referenced this pull request Feb 28, 2019
…hable from JM in Kubernetes (for Flink 1.7 only)

This closes #7858.
@1u0 1u0 deleted the flink-11127-workaround branch February 28, 2019 15:46
@tarmazakov
Copy link

Hi, thank you for this WA.

I downloaded the flink.tgz from
http://apache-mirror.rbc.ru/pub/apache/flink/flink-1.7.2/flink-1.7.2-bin-scala_2.12.tgz
that corresponding docker image flink:latest (flink:1.7)

You pass 2 args to the docker-entrypoint.sh:

args:
        - taskmanager
        - "-Dtaskmanager.host=$(K8S_POD_IP)"

But in docker-entrypoint.sh you did't pass argument -Dtaskmanager.host.. to the taskmanager.sh:
exec $(drop_privs_cmd) "$FLINK_HOME/bin/taskmanager.sh" start-foreground

Tell me how it will work?

@tillrohrmann
Copy link
Contributor

The script needs to be updated to call exec $(drop_privs_cmd) "$FLINK_HOME/bin/taskmanager.sh" start-foreground "$@". Do you want to open a PR for this feature @tarmazakov.

@tarmazakov
Copy link

If you add "$@", then the taskmanager argument will be passed to taskmanager.sh, which is passed to docker-entrypoint.sh earlier. I think you need to add to the script

docker-entrypoint.sh

...
ARGS=("${@:2}")
...
elif [ "$1" = "taskmanager" ]; then
  ...
  exec $(drop_privs_cmd) "$FLINK_HOME/bin/taskmanager.sh" start-foreground "${ARGS[@]}"
fi

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants