
SAMZA-2574 : improve flexibility of SystemFactory interface#1420

Closed
MabelYC wants to merge 17 commits into apache:master from MabelYC:addAdminLabelSupport

Conversation

Contributor

@MabelYC MabelYC commented Aug 20, 2020

This is second pr to supplement commit: 4d97a8d

Changes: Added an additional label to SystemAdmin creation so users can provide more information to the SystemFactory.
API Changes: Added several new functions that accept more parameters. These won't impact current users.
Upgrade Instructions: None
Usage Instructions: None

.stream()
.collect(Collectors.toMap(Entry::getKey,
systemNameToFactoryEntry -> systemNameToFactoryEntry.getValue()
.getAdmin(systemNameToFactoryEntry.getKey(), this, adminLabel)));
Contributor

can you also change the other getSystemAdmins() to use this new one with the label = ""

* @return SystemAdmin of the system if it exists, otherwise null.
*/
public SystemAdmin getSystemAdmin(String systemName, String adminLabel) {
return getSystemAdmins(adminLabel).get(systemName);
Contributor

I understand this is an existing code, but it seems weird that we create ALL admins EVERY time just to get a single one.
Either we should create a single Admin, or we can cache the results of the previous getAdmins().
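The caching alternative the reviewer mentions could look roughly like the following. This is a hypothetical sketch, not samza-core code: class and method names are illustrative, and a `String` stands in for the real `SystemAdmin` to keep it self-contained.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the reviewer's caching suggestion: memoize the
// per-label admin map so getSystemAdmin(name, label) does not rebuild
// every admin on each call.
public class CachedSystemAdmins {
  private final Map<String, Map<String, String>> adminsByLabel = new ConcurrentHashMap<>();

  // Stand-in for the real per-system factory.getAdmin(systemName, config, adminLabel) calls.
  private Map<String, String> buildAdmins(String adminLabel) {
    return Map.of("kafka", "kafkaAdmin:" + adminLabel,
                  "eventhubs", "eventhubsAdmin:" + adminLabel);
  }

  // Builds the admin map at most once per label, then serves the cached copy.
  public Map<String, String> getSystemAdmins(String adminLabel) {
    return adminsByLabel.computeIfAbsent(adminLabel, this::buildAdmins);
  }

  public String getSystemAdmin(String systemName, String adminLabel) {
    return getSystemAdmins(adminLabel).get(systemName);
  }
}
```

With this shape, repeated `getSystemAdmin` calls for the same label reuse one admin map instead of re-invoking every factory.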

* @param adminLabel a string providing extra info to the admin instance.
* @return map of system name to {@link SystemAdmin}
*/
public Map<String, SystemAdmin> getSystemAdmins(String adminLabel) {
Contributor

can we change the implementation of getSystemAdmins() to {
return getSystemAdmins("");
}
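The delegation suggested above could be sketched as follows. This is a hypothetical, self-contained illustration (a `String` stands in for `SystemAdmin`), not the actual samza-core class.

```java
import java.util.Map;

// Hypothetical sketch of the suggested delegation: the no-arg
// getSystemAdmins() becomes a thin wrapper over the labeled overload with
// an empty label, so existing callers keep their current behavior.
public class SystemAdminsDelegation {
  public Map<String, String> getSystemAdmins() {
    return getSystemAdmins(""); // empty label preserves legacy behavior
  }

  public Map<String, String> getSystemAdmins(String adminLabel) {
    // Stand-in for the per-system SystemFactory.getAdmin calls.
    return Map.of("kafka", "admin[" + adminLabel + "]");
  }
}
```

This keeps a single code path for admin creation while leaving the public no-arg signature untouched.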

Contributor Author

This file is updated. The version you reviewed is the outdated one. Please see: https://github.com/apache/samza/pull/1420/files#diff-2063da999ce81031b6015c31991f60c4R100

@@ -89,11 +98,14 @@ public List<String> getSystemNames() {
* @return map of system name to {@link SystemAdmin}
*/
public Map<String, SystemAdmin> getSystemAdmins() {
Contributor

I wonder if it is better to pass the admin label here in getSystemAdmins()?

Sanil15 and others added 12 commits September 14, 2020 11:51
Changes: The current restart ability for container placements works in the following way:

Tries to fetch resources on a host
Stops the active container if resources are accrued
Tries to start the container on the accrued host
In production, we have seen the following observations at LinkedIn:

Some jobs are configured to use resources for peak load, which leaves no headroom on a host for requesting additional resources
This leads to restart requests failing because resources cannot be obtained on that host
A fix to this is to implement a force-restart utility: in this version we stop the container first and then accrue resources. The upside is that we at least free up resources on the host before issuing the resource request; the downside is that bringing the container back up on the same host becomes best-effort.

API Changes: Added new values for the destinationHost param in the container placement request message

LAST_SEEN: Tries to restart a container on last seen host with RESERVE -> STOP -> MOVE policy

FORCE_RESTART_LAST_SEEN: Tries to restart a container on last seen host with STOP -> RESERVE -> MOVE policy
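The difference between the two policies is just the ordering of the same steps, which can be made explicit with a small sketch. This is illustrative only; the real container placement API's type and constant names may differ.

```java
import java.util.List;

// Hypothetical sketch of the two restart policies described above as
// ordered step lists; names are illustrative, not the samza API.
public class RestartPolicySketch {
  public enum Step { RESERVE, STOP, MOVE }

  // LAST_SEEN: reserve resources first, then stop, then move.
  public static final List<Step> LAST_SEEN =
      List.of(Step.RESERVE, Step.STOP, Step.MOVE);

  // FORCE_RESTART_LAST_SEEN: stop first to free headroom on the host,
  // then reserve, then move.
  public static final List<Step> FORCE_RESTART_LAST_SEEN =
      List.of(Step.STOP, Step.RESERVE, Step.MOVE);
}
```

Putting STOP first is exactly what frees the peak-sized allocation before the resource request is issued, at the cost of the restart being best-effort.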
… from JobModel (apache#1421)

Issues
Currently, locality information is part of the job model. The job model is typically immutable and fixed within the lifecycle of an application attempt, while locality information is dynamic and changes in the event of container movements. This difference makes it complicated to program, model, or define semantics around these models when building features. Furthermore, removing this dependency:

- Enables us to move JobModel to public APIs and expose it in JobContext
- Enables us to cache and serve serialized JobModel from the AM servlet to reduce AM overhead (memory, open connections, num threads) during container startup, esp. for jobs with a large number of containers (See: apache#1241)
- Removes tech debt: models should be immutable, and should not update themselves.
- Removes tech debt: makes current container location a first class concept for container scheduling / placement , and for tools like dashboard, samza-rest, auto-scaling, diagnostics etc.

Changes
- Separated locality information out of the job model into LocalityModel
- Introduced an endpoint in AM to serve locality information
- Added Json MixIns for locality models (LocalityModel & ContainerLocality)
- Moved JobModel to samza-api and exposed through JobContext

API Changes:
- Introduced new models for locality.
- The previous job model endpoint will no longer serve locality information, i.e. tools using it will need to update to the new endpoint.
- Expose JobModel via JobContext
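The shape of the separation described above could be sketched as follows. This is a hypothetical illustration; the class and method names are not the exact samza-api signatures.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: locality lives in its own immutable model keyed by
// container id, fully decoupled from JobModel.
public class LocalitySketch {
  public static final class ContainerLocality {
    private final String containerId;
    private final String host;

    public ContainerLocality(String containerId, String host) {
      this.containerId = containerId;
      this.host = host;
    }

    public String getHost() { return host; }
  }

  public static final class LocalityModel {
    private final Map<String, ContainerLocality> byContainer;

    public LocalityModel(Map<String, ContainerLocality> byContainer) {
      // Defensive copy keeps the model immutable after construction,
      // matching the "models should not update themselves" point above.
      this.byContainer = Collections.unmodifiableMap(new HashMap<>(byContainer));
    }

    public String hostFor(String containerId) {
      ContainerLocality l = byContainer.get(containerId);
      return l == null ? null : l.getHost();
    }
  }
}
```

Because the model is immutable, a container move produces a new LocalityModel rather than mutating the job model in place.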

Changes: Modified the shutdown sequence inside YarnClusterResourceManager in order to bring down the AM when the NM on which it resides dies. This prevents the existence of orphaned AMs.
API Changes: None
Tests: Tested the change by manually bringing down the NM on which the AM resided with a yarn job deploy and LXC
Upgrade Instructions: None
Usage Instructions: None
Issues: In the deployment flow of a Beam job, we have a complicated flow: ClusterBasedJobCoordinator#main -> Beam main class -> JobCoordinatorLaunchUtil -> ClusterBasedJobCoordinator.
Changes:
  1. Move ClusterBasedJobCoordinator#main to ClusterBasedJobCoordinatorRunner#main
  2. Update run-jc.sh to invoke ClusterBasedJobCoordinatorRunner
Tests:
  1. unit tests
  2. Deployed hello samza job successfully with the change following instructions on http://samza.apache.org/startup/hello-samza/latest/
API Changes: None
Upgrade Instructions: None
Usage Instructions: None
Issues: The app.runner.class config is hard coded across multiple places in Samza.

Changes:
1. Move ApplicationRunners.java from samza-api to samza-core module while the package name stays.
2. Add app.runner.class and getAppRunnerClass method to ApplicationConfig.
3. Update occurrence of app.runner.class to ApplicationConfig.APP_RUNNER_CLASS.

API Changes:
Usage of ApplicationRunners needs to depend on the samza-core module instead of samza-api.
Upgrade Instructions:
Depend on samza-core instead of samza-api module.
Usage Instructions:
N/A
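The centralization described in this commit could be sketched as below. This is a hypothetical illustration: the constant and accessor mirror the described `APP_RUNNER_CLASS` / `getAppRunnerClass`, but the default runner class name here is an assumption, and a plain `Map` stands in for Samza's Config type.

```java
import java.util.Map;

// Hypothetical sketch: the literal "app.runner.class" string moves behind
// a single constant with a typed accessor, instead of being hard coded at
// each call site.
public class ApplicationConfigSketch {
  public static final String APP_RUNNER_CLASS = "app.runner.class";

  // Assumed default for illustration; the real default may differ.
  private static final String DEFAULT_RUNNER =
      "org.apache.samza.runtime.RemoteApplicationRunner";

  private final Map<String, String> config;

  public ApplicationConfigSketch(Map<String, String> config) {
    this.config = config;
  }

  public String getAppRunnerClass() {
    return config.getOrDefault(APP_RUNNER_CLASS, DEFAULT_RUNNER);
  }
}
```

Call sites then reference `ApplicationConfigSketch.APP_RUNNER_CLASS` rather than repeating the raw string, so a rename or default change happens in one place.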

Author: Ke Wu <kwu@linkedin.com>

Reviewers: Yi Pan <nickpan47@gmail.com>

Closes apache#1400 from kw2542/SAMZA-2558
…pache#1428)

Issues: app.main.class is only set for Beam apps, which causes a different workflow on the AM when launching a job.
Changes:
1. Introduce DefaultApplicationMain to capture launch workflow for High/Low level jobs so on AM, all jobs are launched in the same way: ClusterBasedJobCoordinatorRunner#main -> app.main.class -> JobCoordinatorLaunchUtil
2. Update ApplicationConfig#getAppMainClass to default to DefaultApplicationMain

Tests:
1. Unit Tests
2. Deployed hello samza job successfully with the change following instructions on http://samza.apache.org/startup/hello-samza/latest/

API Changes: None
Upgrade Instructions: None
Usage Instructions: None
@MabelYC MabelYC closed this Sep 16, 2020
@MabelYC MabelYC deleted the addAdminLabelSupport branch September 16, 2020 01:29
@MabelYC MabelYC restored the addAdminLabelSupport branch September 16, 2020 01:30
@MabelYC MabelYC deleted the addAdminLabelSupport branch September 16, 2020 01:31