Skip to content

SAMZA-2075: Add execEnvContainerId and execEnvAttemptId for diagnostics for Kubernetes job coordinator#1566

Merged
cameronlee314 merged 2 commits intoapache:masterfrom
cameronlee314:jcattemptid
Dec 7, 2021
Merged

SAMZA-2075: Add execEnvContainerId and execEnvAttemptId for diagnostics for Kubernetes job coordinator#1566
cameronlee314 merged 2 commits intoapache:masterfrom
cameronlee314:jcattemptid

Conversation

@cameronlee314
Copy link
Contributor

@cameronlee314 cameronlee314 commented Dec 2, 2021

Issues: In YARN today, a deployment attempt id which is shared between containers for a single Samza job can be extracted from the exec-env-container-id in the diagnostics information emitted by a job. However, in other execution environments (e.g. Kubernetes), there may not be an exec-env-container-id that can be parsed to get a deployment attempt id. It isn't ideal that this attempt id must be extracted from another field.

Changes:

  1. Add an explicit samza-epoch-id field to MetricsHeader which contains the id for a deployment attempt and emit this field in the diagnostics messages.
  2. Update classes to accept attempt id as an argument or extract it from the SAMZA_EPOCH_ID environment variable.

Testing:

  1. Updated unit tests (including compatibility tests for with and without the extra field)
  2. Deployed a test job in minikube and checked the emitted diagnostics messages for the new attempt id field in the header

API changes (backwards compatible):
JSON representation of MetricsHeader (used in MetricsSnapshot) has an additional optional field called samza-epoch-id, and this field is intended to contain the deployment attempt id which is shared across all containers (job coordinator and workers) of a Samza job. This field is filled in by reading the SAMZA_EPOCH_ID environment variable.

@jia-gao
Copy link
Contributor

jia-gao commented Dec 2, 2021

LGTM, but someone else who has more context on this should take a look and approve

Copy link
Contributor

@rmatharu-zz rmatharu-zz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@cameronlee314 cameronlee314 merged commit 1daca89 into apache:master Dec 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants