SAMZA-2574: improve flexibility of SystemFactory interface #1420
MabelYC wants to merge 17 commits into apache:master from
Conversation
.stream()
.collect(Collectors.toMap(Entry::getKey,
    systemNameToFactoryEntry -> systemNameToFactoryEntry.getValue()
        .getAdmin(systemNameToFactoryEntry.getKey(), this, adminLabel)));
Can you also change the other getSystemAdmins() to use this new one with label = ""?
 * @return SystemAdmin of the system if it exists, otherwise null.
 */
public SystemAdmin getSystemAdmin(String systemName, String adminLabel) {
  return getSystemAdmins(adminLabel).get(systemName);
I understand this is existing code, but it seems weird that we create ALL admins EVERY time just to get a single one.
Either we should create a single admin, or we can cache the results of the previous getSystemAdmins().
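The caching suggested above could be sketched as follows. This is a hypothetical illustration, not the PR's actual implementation: `SystemAdmin` is reduced to a marker interface and `buildSystemAdmins` stands in for the expensive factory loop that instantiates every admin.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the reviewer's caching idea: memoize the admin map per label so
// repeated getSystemAdmin(...) calls do not re-create every SystemAdmin.
public class CachedAdminRegistry {
  interface SystemAdmin {}

  private final Map<String, Map<String, SystemAdmin>> cacheByLabel =
      new ConcurrentHashMap<>();

  // Stand-in for the expensive call that creates ALL admins for a label.
  private Map<String, SystemAdmin> buildSystemAdmins(String adminLabel) {
    return Map.of("kafka", new SystemAdmin() {});
  }

  public Map<String, SystemAdmin> getSystemAdmins(String adminLabel) {
    // computeIfAbsent builds the map once per label, then reuses it.
    return cacheByLabel.computeIfAbsent(adminLabel, this::buildSystemAdmins);
  }

  public SystemAdmin getSystemAdmin(String systemName, String adminLabel) {
    return getSystemAdmins(adminLabel).get(systemName);
  }
}
```

With this shape, a second `getSystemAdmin("kafka", "")` call hits the cache and returns the same instance instead of rebuilding the whole map.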
 * @param adminLabel a string to provide info to the admin instance.
 * @return map of system name to {@link SystemAdmin}
 */
public Map<String, SystemAdmin> getSystemAdmins(String adminLabel) {
Can we change the implementation of getSystemAdmins() to {
  return getSystemAdmins("");
}
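The delegation suggested above keeps a single implementation: the no-arg overload simply forwards an empty label to the labeled variant. A minimal self-contained sketch (with `SystemAdmin` as a stand-in for the real Samza type, and the body trivialized):

```java
import java.util.Collections;
import java.util.Map;

public class SystemAdminsExample {
  interface SystemAdmin {}

  // No-arg overload forwards to the labeled variant with an empty label,
  // so there is exactly one place that builds the admins.
  public Map<String, SystemAdmin> getSystemAdmins() {
    return getSystemAdmins("");
  }

  public Map<String, SystemAdmin> getSystemAdmins(String adminLabel) {
    // Real code would build admins via SystemFactory.getAdmin(...);
    // an empty map keeps this sketch self-contained.
    return Collections.emptyMap();
  }
}
```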
This file is updated. The version you reviewed is the outdated one. Please see: https://github.com/apache/samza/pull/1420/files#diff-2063da999ce81031b6015c31991f60c4R100
@@ -89,11 +98,14 @@ public List<String> getSystemNames() {
 * @return map of system name to {@link SystemAdmin}
 */
public Map<String, SystemAdmin> getSystemAdmins() {
I wonder if it is better to pass the admin label here in getSystemAdmins()?
Changes: The current restart ability for container placements works in the following way:
1. Tries to fetch resources on a host
2. Stops the active container if resources are accrued
3. Tries to start the container on the host accrued

In production, we have seen the following at LinkedIn:
- Some jobs are configured to use resources for the peak, which leaves no headroom on a host for requesting additional resources
- This leads to restart requests failing because resources cannot be acquired on that host

A fix for this is to implement a force-restart utility; in this version we will stop the container first and then accrue resources. The upside is that we at least free up the resources on the host before issuing the resource request; the downside is that it is best-effort to bring that container back up on that host.

API Changes: Added new param values to the destinationHost param for the container placement request message:
- LAST_SEEN: Tries to restart a container on the last seen host with the RESERVE -> STOP -> MOVE policy
- FORCE_RESTART_LAST_SEEN: Tries to restart a container on the last seen host with the STOP -> RESERVE -> MOVE policy
… from JobModel (apache#1421)

Issues: Currently, locality information is part of the job model. The job model is typically immutable and fixed within the lifecycle of an application attempt. The locality information, on the other hand, is dynamic and changes in the event of container movements. Due to this difference, it is complicated to program, model, or define semantics around these models when building features. Furthermore, removing this dependency:
- Enables us to move JobModel to public APIs and expose it in JobContext
- Enables us to cache and serve serialized JobModel from the AM servlet to reduce AM overhead (memory, open connections, num threads) during container startup, esp. for jobs with a large number of containers (See: apache#1241)
- Removes tech debt: models should be immutable and should not update themselves
- Removes tech debt: makes the current container location a first-class concept for container scheduling/placement, and for tools like dashboard, samza-rest, auto-scaling, diagnostics, etc.

Changes:
- Separated locality information out of the job model into LocalityModel
- Introduced an endpoint in the AM to serve locality information
- Added JSON MixIns for locality models (LocalityModel & ContainerLocality)
- Moved JobModel to samza-api and exposed it through JobContext

API Changes:
- Introduced new models for locality
- The previous job model endpoint will no longer serve locality information, i.e. tools using it will need to update to use the new endpoint
- Exposed JobModel via JobContext
) Changes: Modified the shutdown sequence inside YarnClusterResourceManager in order to bring down the AM when the NM on which it resides dies. This prevents the existence of orphaned AMs.
API Changes: None
Tests: Tested the change by manually bringing down the NM on which the AM resided, with a YARN job deploy and LXC
Upgrade Instructions: None
Usage Instructions: None
Issues: In the deployment flow of a Beam job, we have a complicated flow: ClusterBasedJobCoordinator#main -> Beam main class -> JobCoordinatorLaunchUtil -> ClusterBasedJobCoordinator.
Changes:
1. Move ClusterBasedJobCoordinator#main to ClusterBasedJobCoordinatorRunner#main
2. Update run-jc.sh to invoke ClusterBasedJobCoordinatorRunner
Tests:
1. Unit tests
2. Deployed the hello-samza job successfully with the change, following the instructions at http://samza.apache.org/startup/hello-samza/latest/
API Changes: None
Upgrade Instructions: None
Usage Instructions: None
… instead of Object (apache#1427)
Issues: The app.runner.class config is hard-coded across multiple places in Samza.
Changes:
1. Move ApplicationRunners.java from the samza-api to the samza-core module while the package name stays.
2. Add app.runner.class and a getAppRunnerClass method to ApplicationConfig.
3. Update occurrences of app.runner.class to ApplicationConfig.APP_RUNNER_CLASS.
API Changes: Usage of ApplicationRunners needs to depend on the samza-core module instead of samza-api.
Upgrade Instructions: Depend on samza-core instead of the samza-api module.
Usage Instructions: N/A
Author: Ke Wu <kwu@linkedin.com>
Reviewers: Yi Pan <nickpan47@gmail.com>
Closes apache#1400 from kw2542/SAMZA-2558
…pache#1428)
Issues: app.main.class is only set for Beam apps, which causes a different workflow on the AM when launching a job.
Changes:
1. Introduce DefaultApplicationMain to capture the launch workflow for high/low-level jobs, so that on the AM all jobs are launched the same way: ClusterBasedJobCoordinatorRunner#main -> app.main.class -> JobCoordinatorLaunchUtil
2. Update ApplicationConfig#getAppMainClass to default to DefaultApplicationMain
Tests:
1. Unit tests
2. Deployed the hello-samza job successfully with the change, following the instructions at http://samza.apache.org/startup/hello-samza/latest/
API Changes: None
Upgrade Instructions: None
Usage Instructions: None
This is the second PR, supplementing commit 4d97a8d.
Changes: Added an additional label argument to SystemAdmin creation, letting users provide more information to the SystemFactory.
API Changes: Added several new functions that accept the extra parameter. Won't impact current users.
Upgrade Instructions: None
Usage Instructions: None
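One standard way to add such an overload to an interface without impacting current users is a default method that ignores the new argument and delegates to the existing one. The sketch below illustrates that pattern only; `Config`, `SystemAdmin`, and the method names here are stand-ins, not the exact signatures this PR adds to Samza's SystemFactory.

```java
// Hypothetical illustration of a backwards-compatible labeled overload.
public interface SystemFactoryExample {
  interface Config {}
  interface SystemAdmin {}

  // Existing method that all current implementations already provide.
  SystemAdmin getAdmin(String systemName, Config config);

  // New overload: a default method, so existing implementations compile
  // unchanged. Implementations that care about the label can override it.
  default SystemAdmin getAdmin(String systemName, Config config, String adminLabel) {
    return getAdmin(systemName, config);
  }
}
```

Because `getAdmin(String, Config)` is the only abstract method, the interface stays functional and old call sites are untouched, which matches the "won't impact current users" claim above.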