SAMZA-2439: Remove LocalityManager and container location information from JobModel #1421

Merged
merged 7 commits into from
Aug 27, 2020

@mynameborat mynameborat commented Aug 21, 2020

Issues
Currently, locality information is part of the job model. The job model is typically immutable and fixed within the lifecycle of an application attempt. Locality information, on the other hand, is dynamic and changes whenever containers move. This difference makes it complicated to program against, model, or define semantics around these models when building features. Furthermore, removing this dependency:

  1. Enables us to move JobModel to public APIs and expose it in JobContext
  2. Enables us to cache and serve serialized JobModel from the AM servlet to reduce AM overhead (memory, open connections, num threads) during container startup, esp. for jobs with a large number of containers (See: SAMZA-2424: AM should cache and serve serialized Job Model to containers #1241)
  3. Removes tech debt: models should be immutable, and should not update themselves.
  4. Removes tech debt: makes current container location a first-class concept for container scheduling/placement, and for tools like dashboard, samza-rest, auto-scaling, diagnostics, etc.

Changes

  1. Separated locality information out of the job model into LocalityModel
  2. Introduced an endpoint in AM to serve locality information
  3. Added Json MixIns for locality models (LocalityModel & ContainerLocality)
  4. Moved JobModel to samza-api and exposed through JobContext
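
To illustrate the intent behind change 1, here is a minimal sketch of keeping locality in its own immutable snapshot, so that a container move produces a new snapshot instead of mutating a model. The class names `LocalitySketch`, `ContainerLocalitySketch` and `LocalityModelSketch`, along with their constructors and accessors, are hypothetical and are not the classes added in this PR; only the four fields are taken from the sample response in the usage instructions.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class LocalitySketch {

    /** Hypothetical per-container record; fields mirror the sample response keys. */
    static final class ContainerLocalitySketch {
        final String id;
        final String host;
        final String jmxUrl;
        final String jmxTunnelingUrl;

        ContainerLocalitySketch(String id, String host, String jmxUrl, String jmxTunnelingUrl) {
            this.id = id;
            this.host = host;
            this.jmxUrl = jmxUrl;
            this.jmxTunnelingUrl = jmxTunnelingUrl;
        }
    }

    /** Hypothetical immutable snapshot of where each container currently runs. */
    static final class LocalityModelSketch {
        private final Map<String, ContainerLocalitySketch> containerLocalities;

        LocalityModelSketch(Map<String, ContainerLocalitySketch> containerLocalities) {
            // Defensive copy: later changes to the caller's map cannot leak into this snapshot.
            this.containerLocalities =
                Collections.unmodifiableMap(new HashMap<>(containerLocalities));
        }

        Map<String, ContainerLocalitySketch> getContainerLocalities() {
            return containerLocalities;
        }

        ContainerLocalitySketch getContainerLocality(String id) {
            return containerLocalities.get(id);
        }
    }

    public static void main(String[] args) {
        Map<String, ContainerLocalitySketch> current = new HashMap<>();
        current.put("0", new ContainerLocalitySketch("0", "host-a", "", ""));
        LocalityModelSketch beforeMove = new LocalityModelSketch(current);

        // A container move is represented by a new snapshot, not by mutating the old one.
        current.put("0", new ContainerLocalitySketch("0", "host-b", "", ""));
        LocalityModelSketch afterMove = new LocalityModelSketch(current);

        System.out.println(beforeMove.getContainerLocality("0").host);
        System.out.println(afterMove.getContainerLocality("0").host);
    }
}
```

With this shape, anything that holds a snapshot sees a consistent view, while the job model itself never has to change when a container moves.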

Tests

  1. Added tests for new servlet
  2. Modified existing tests to reflect the refactor
  3. Deployed the new servlet and verified the locality information is accessible

API Changes:

  1. Introduced new models for locality.
  2. The previous job model endpoint no longer serves locality information; tools that relied on it will need to switch to the new endpoint. Refer to the usage instructions for details.
  3. Expose JobModel via JobContext

Upgrade Instructions: None. Refer to the API changes & the usage instructions below to upgrade your tooling if applicable.

Usage Instructions: The new locality information is served by the AM endpoint under the locality sub-page. Tooling that used the AM endpoint to fetch locality information will need to be updated as follows.

The endpoint supports two types of queries:

  1. Querying locality information for the entire job, by issuing GET <am-endpoint>/locality. A sample response looks like the following:
{
  "container-localities": {
    "0": {
      "id": "0",
      "host": "mynameborat-host",
      "jmx-url": "",
      "jmx-tunneling-url": ""
    }
  }
}
  2. Querying locality information for a specific processor, by specifying the processorId in the request, e.g. GET <am-endpoint>/locality?processorId=x. A sample response looks like the following:
{
  "id": "0",
  "host": "mynameborat-host",
  "jmx-url": "",
  "jmx-tunneling-url": ""
}
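
For tooling that needs to migrate, the two queries above could be issued along these lines. This is a sketch only: the class `LocalityClient`, its method names, and the host and port in `main` are hypothetical placeholders, and it assumes Java 11+ for `java.net.http.HttpClient`. The only parts taken from the description are the `/locality` path and the `processorId` query parameter.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

public class LocalityClient {

    /** Builds the locality URI; a null processorId asks for the whole job. */
    static URI localityUri(String amEndpoint, String processorId) {
        // Tolerate a trailing slash on the AM endpoint.
        String base = amEndpoint.endsWith("/")
            ? amEndpoint.substring(0, amEndpoint.length() - 1)
            : amEndpoint;
        String url = base + "/locality";
        if (processorId != null) {
            url += "?processorId=" + URLEncoder.encode(processorId, StandardCharsets.UTF_8);
        }
        return URI.create(url);
    }

    /** Issues the GET and returns the raw JSON body; parsing is left to the caller. */
    static String fetchLocality(String amEndpoint, String processorId) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request =
            HttpRequest.newBuilder(localityUri(amEndpoint, processorId)).GET().build();
        HttpResponse<String> response =
            client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IllegalStateException("Unexpected status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) {
        // Placeholder AM address; only the URIs are printed, no request is sent here.
        System.out.println(localityUri("http://am-host:12345", null));
        System.out.println(localityUri("http://am-host:12345", "0"));
    }
}
```

Keeping URI construction separate from the HTTP call makes the query shape easy to check without a running AM.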

@prateekm

Looks pretty good at a high level. Left some minor comments, but let's get a second pair of eyes on this for a detailed review. Thanks for this!

@Sanil15 Sanil15 left a comment

Did a first pass, let's resolve those then will take a look at tests closely

@mynameborat

> Did a first pass, let's resolve those then will take a look at tests closely

Thanks for the review. Addressed the feedback as discussed offline around processorId & containerId. Renamed the ContainerLocality to ProcessorLocality for clarity and consistency.

@rmatharu-zz rmatharu-zz left a comment

lg, left some comments

@Sanil15 Sanil15 left a comment

Please fix the tests and add todo and Ship It!

@mynameborat mynameborat merged commit f7f9f3c into apache:master Aug 27, 2020
kw2542 pushed a commit to kw2542/samza that referenced this pull request Aug 27, 2020
… from JobModel (apache#1421)

@mynameborat mynameborat deleted the locality-refactor branch September 2, 2020 06:07
MabelYC pushed a commit to MabelYC/samza that referenced this pull request Sep 14, 2020
… from JobModel (apache#1421)

lakshmi-manasa-g pushed a commit to lakshmi-manasa-g/samza that referenced this pull request Feb 9, 2021
… from JobModel (apache#1421)
