This repository has been archived by the owner on Mar 3, 2023. It is now read-only.

[HERON-3707] ConfigMap Pod Template Support (#3710)
* [Kubernetes] set up basic mount info for Pod ConfigMap.

* [Kubernetes] updated function signature to handle Pod Template ConfigMap name.

* [kubernetes] extracting Pod Template ConfigMap name from <Config>.

* [kubernetes] checking for Pod Template ConfigMap and appropriately adding to Stateful Set.

* [kubernetes] Java Style lint fix.

* [Tests] Kubernetes Controller tests for Pod Template ConfigMap.

* [Tests] Kubernetes Constants and Context tests for Pod Template ConfigMap.

* [Tests] Kubernetes V1Controller test suite stubbed.

* [Tests] Java style lint fixed.

* [Tests] Kubernetes V1Controller Pod Template ConfigMap volume mount.

* [Kubernetes] cleaned up to begin work on <loadPodFromTemplate>.

* [Kubernetes] created <loadPodFromTemplate>.

* [Tests] Began mock test setup of <loadPodFromTemplate>.

* [Kubernetes] style check/linting fix.

* [Kubernetes] refactoring <V1Controller> and <KubernetesController>.

* [Kubernetes] added description to failed to locate exception.

* [Tests] <loadPodFromTemplate> Pod Template checks.

* [Kubernetes] check for no ConfigMaps set.

* [Tests] working on mocking null list of V1ConfigMapList.

* [Kubernetes] refactoring <loadPodFromTemplate> 

Adding checks for null pointers: default-constructed V1 objects leave uninitialised fields set to null. Extracted <getConfigMaps> into its own method to support mocking.

* [Tests] Stubbed <getConfigMaps> and testing <loadPodFromTemplate>.

* [Kubernetes] <loadPodFromTemplate> adjusted to get <V1PodTemplateSpec> from <V1PodTemplate>.

* [Kubernetes] check for empty Pod Template in ConfigMap.

* [Tests] Valid Pod Template test.

* [Tests] Invalid Pod Template.

* [Tests] refactored test data to their respective tests.

* [Kubernetes] refactored <loadPodFromTemplate> for readability.

* [Kubernetes] params for <getConfigMaps> tweaked.

Judging from <release-11.0.0/kubernetes/src/main/java/io/kubernetes/client/openapi/apis/CoreV1Api.java>, parameters marked "optional" can be set to <null>.

* [Kubernetes] <getPodTemplateLocation> extracting ConfigMap and Pod Template names.

* [Tests] <getPodTemplateLocation> for correct and incorrect parsing.

* [Kubernetes] <getPodTemplateLocation> catching empty names.

* [Tests] <getPodTemplateLocation> separated tests for ConfigMap and Pod Template names.

* [Kubernetes] updated <loadPodFromTemplate> to use ConfigMap and Pod Template names.

* [Tests] updated tests for <loadPodFromTemplate> to use ConfigMap and Pod Template names.

* [Kubernetes] added INFO logging to <loadPodFromTemplate> for the deployed Pod Template.

* [Kubernetes] Bug fixes in error messages for <loadPodFromTemplate>.

* [Kubernetes] bug fix and test for missing delimiter in <getPodTemplateLocation>.

* [Kubernetes] <getConfigMaps> namespace access updated.

* [Kubernetes] <configureRBAC> basic logic.

TODO: get API key for K8s.

* [Tests] cleaned up <V1Controller> tests.

* [Kubernetes] <configureRBAC> more detailed error log.

* [Kubernetes] <configureRBAC> role configurations.

* [Kubernetes] refactored <loadPodFromTemplate>.

* [Tests] switched to <ConfigMapBuilder> in <V1ControllerTest>.

* [Kubernetes] switched to <V1RoleBuilder> in <configureRBAC>.

* [Kubernetes] made <loadPodFromTemplate> protected.

Removed illegal reflection access to avoid support issues with newer testing frameworks.

* [Kubernetes] removed <configureRBAC>.

RBAC must be configured using Role/ClusterRole and RoleBinding/ClusterRoleBinding to the Heron <heron-apiserver> ServiceAccount.

* [Kubernetes] <getPodTemplateLocation> error message passed up.

* [Kubernetes] refactored <getPodSpec> to <finalizePodSpec>.

Heron should have the final say on the Pod Spec. This is as much a point of security as an operational one.

* [Kubernetes] Added boot flag to disable Pod Templates.

* [Tests] testing to validate boot flag for disabled Pod Templates.

* [Kubernetes] Wiring in boot flag to disable Pod Templates in <loadPodFromTemplate>.

* [Tests] disabled Pod Templates output validation.

* [Kubernetes] Added class scoped variable <isPodTemplateDisabled>.

* [Kubernetes] <getContainer> modified to utilise supplied executor container.

* [Kubernetes] <getContainer> <V1EnvVar>s.

Environment variables merged with Heron defaults taking precedence.

* [Kubernetes] <getContainer> Limits.

Resource Limits merged with Heron defaults taking precedence.

* [Kubernetes] disabled Pod Templates will return error when attempting to submit.

* [Kubernetes] <API Server> configs.

Updated RBAC API version and added a commented-out flag to disable Pod Templates.

* [Kubernetes] <configureExecutorContainer>

Refactored <getContainer> to <configureExecutorContainer>, permitting additional containers for sidecar purposes.

* [Kubernetes] <configureExecutorContainer>

Switched to <TreeSet> with a custom comparator on the <V1EnvVar> name. <V1EnvVar>'s own equality check compares every field, so it cannot deduplicate by name alone.

* [Kubernetes] <configureExecutorContainer>

Merged executor container ports with ports provided in Pod Template. Heron defaults take precedence.

* [Kubernetes] <mountVolumeIfPresent>

Merge volume mounts with those provided in Pod Template. Heron defaults take precedence.

* [Kubernetes] <V1Controller>

General cleanup of new code and comments.

* [Kubernetes] <mountVolumeIfPresent>

Error check for malformed Pod Template.

* [Kubernetes] <configureContainerPorts>

Refactored <getContainerPorts> and moved port merge with error handling to it.

* [Kubernetes] <configureExecutorContainer>

Removed a redundant <limit> put into the HashMap.

* [Kubernetes] <addVolumesIfPresent>

Merging Pod Template volume configs with Heron defaults. Heron values take precedence.

* [Kubernetes] <configureExecutorContainer>

Allow user values for CPU and MEMORY limits to override those provided by Heron.

* [Scheduler-Core] <LaunchRunner> handling <submit> errors better.

Some Schedulers, such as K8s, throw exceptions instead of returning false when <submit> fails. This leaves the Topology Manager with dangling references. An additional RPC call to the Scheduler is required to completely clear the state.

* [Kubernetes] <V1Controller>

General cleanup in tests and class.

* [Kubernetes] code review changes.

Code review from @nwangtw.
<KubernetesContext.getPodTemplateConfigMapDisabled> switched to <equalsIgnoreCase>.

* [Scheduler-Core] code review changes.

Code review from @nwangtw.
<LaunchRunner> error message assembly improved.
<LaunchRunner> added <FINE> level logging for failure to clear failed topology launch from Scheduler.

* [Tests] <configureContainerPorts>.

* [Kubernetes] <API Server> configs.

Code review from @nwangtw, @nicknezis.
Updated command to disable Pod Templates to <false> by default.

* [Kubernetes] <configureContainerEnvVars>

Logic for merging environment variables extracted to a method for testing.

* [Tests] <configureContainerEnvVars>.

* [Kubernetes] <configureExecutorContainer>

Wired in <configureContainerEnvVars> and removed old code.

* Update for Helm chart

* Updated version to match the other k8s ClusterRoleBindings

* [Kubernetes] <configureContainerResources>

Logic to configure the container's resources extracted to a method to facilitate testing.

* [Kubernetes] <configureExecutorContainer>

Removed old logic and wired <configureContainerResources> into <configureExecutorContainer>.

* [Tests] <testConfigureContainerPorts>.

Added a test for debugging ports.

* [Kubernetes] <addVolumesIfPresent>.

Exposed for testing.

* [Tests] <addVolumesIfPresent>.

Testing on a <hostPath> volume but will generalise across others.

* [Kubernetes] <mountVolumeIfPresent>.

Exposed for testing.

* [Tests] <mountVolumeIfPresent>.

Tested by setting a Volume Mount in the Config and then a custom Volume Mount in the container.

* [Tests] <addVolumesIfPresent>.

Cleaned up tests.

* [Tests] <mountVolumeIfPresent>.

Testing for when no Volume Mounts should be set.

* [Tests] <addVolumesIfPresent>.

Testing for when no Volumes should be set.

* Attempt to fix Travis CI build

* [Tests] <configureContainerEnvVars> <configureContainerPorts>.

Extracted logic to generate executor environment variables, ports, and debugging ports. This prevents inconsistencies between production and test code.

* [Tests] <V1ControllerTests>

General cleanup and simplification of test suite.

* Travis fix take 3

* Travis CI fix

* [Kubernetes] <V1ControllerUtils>

Added nested utility class to improve code maintainability.

<mergeListsDedupe> merges two input lists, keeping every value in the first and deduplicating the second against it.

* [Tests] <mergeListsDedupe>.

Full battery of tests covering null lists, merged lists, and thrown errors.

* [Kubernetes] <V1Controller>.

Switched to using <mergeListsDedupe> to improve code maintainability.
Affects:
<addVolumesIfPresent>
<configureContainerEnvVars>
<configureContainerPorts>
<mountVolumeIfPresent>

* [Kubernetes] <V1Controller> cleaned up unneeded returns when using setter methods.

* [Kubernetes] <V1Controller>.

Merging Pod Specification Tolerations and deduplicating on <V1Toleration::key>.

* [Tests] <configureTolerations>.

Tests for null, empty, and merged Toleration lists.

* [Kubernetes] <configurePodSpec>.

Wired in <configureTolerations>

* [Tests] cleaning up code.

* [Kubernetes] <configurePodSpec>.

Added check for multiple executor container specs in Pod Template. Throws an error if detected.

* [Tests] <V1Controller> general cleanup.

* [Kubernetes] Constants

Updated tolerations to remove deprecated taints.

* [Kubernetes] <V1Controller>

<getConfigMap> retrieving a single named ConfigMap in a specific namespace.
<loadPodFromTemplate> logic updated to handle a single ConfigMap.

* [Tests] <V1Controller>

Fixed and cleaned up tests after switching to <readNamespacedConfigMap>.

* [Kubernetes] <V1Controller>

Error message cleanup.

* [Tests] <V1Controller>

Test description cleanup.

* [Kubernetes] <KubernetesUtils>

Javadoc cleanup.

* [Tests] <KubernetesUtils>

Test description cleanup.

* [Tests] <V1Controller>

<configureContainerResources> Heron values take precedence for limits.

* [Kubernetes] <V1Controller>

<configureContainerResources> Heron values take precedence for limits.

* Add support for reading configmap

* Removed deprecated k8s tolerations

Co-authored-by: Nicholas Nezis <nicholas.nezis@gmail.com>
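The last entries in the log settle how container limits are merged: Heron's values take precedence over any limit of the same name supplied by the Pod Template. A minimal sketch of that rule follows. It assumes limits are modelled as plain `Map<String, String>` quantities rather than the Kubernetes client's `Quantity` types, and `LimitsMergeSketch`/`mergeLimits` are hypothetical names, not Heron's code.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Heron's implementation): limits are plain strings keyed by
 * resource name instead of the Kubernetes client's Quantity objects.
 */
final class LimitsMergeSketch {
  private LimitsMergeSketch() { }

  static Map<String, String> mergeLimits(Map<String, String> templateLimits,
                                         Map<String, String> heronLimits) {
    final Map<String, String> merged = new HashMap<>();
    // Copy the Pod Template's limits first so resources Heron does not manage are kept.
    if (templateLimits != null) {
      merged.putAll(templateLimits);
    }
    // Heron's limits are written last and overwrite any key the template also set.
    if (heronLimits != null) {
      merged.putAll(heronLimits);
    }
    return merged;
  }
}
```

For example, a template limit of `cpu: 4` alongside Heron's computed `cpu: 2` yields `cpu: 2`, while a template-only key such as `nvidia.com/gpu` passes through unchanged.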
surahman and nicknezis committed Nov 2, 2021
1 parent 2190502 commit 837c4f2
Showing 16 changed files with 1,352 additions and 133 deletions.
9 changes: 4 additions & 5 deletions .travis.yml
@@ -18,6 +18,9 @@ addons:
- libcppunit-dev
- pkg-config
- python3-dev
+ - python3-pip
+ - python3-setuptools
+ - python3-wheel
- python3-venv
- wget
- zip
@@ -34,10 +37,6 @@ before_install:
- chmod +x bazel-${BAZEL_VERSION}-installer-linux-x86_64.sh
- ./bazel-${BAZEL_VERSION}-installer-linux-x86_64.sh --user

- install:
- - sudo apt-get install python3-pip python3-setuptools
- - pip3 install travis-wait-improved
-
script:
- which gcc
- gcc --version
@@ -47,4 +46,4 @@ script:
- python -V
- which python3
- python3 -V
- - travis-wait-improved --timeout=180m scripts/travis/ci.sh
+ - scripts/travis/ci.sh
9 changes: 3 additions & 6 deletions deploy/kubernetes/general/apiserver.yaml
@@ -27,7 +27,7 @@ metadata:
namespace: default

---
- apiVersion: rbac.authorization.k8s.io/v1beta1
+ apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: heron-apiserver
@@ -67,11 +67,7 @@ spec:
operator: "Equal"
effect: "NoExecute"
tolerationSeconds: 10
- - key: "node.alpha.kubernetes.io/notReady"
- operator: "Equal"
- effect: "NoExecute"
- tolerationSeconds: 10
- - key: "node.alpha.kubernetes.io/unreachable"
+ - key: "node.kubernetes.io/unreachable"
operator: "Equal"
effect: "NoExecute"
tolerationSeconds: 10
@@ -95,6 +91,7 @@ spec:
-D heron.uploader.dlog.topologies.namespace.uri=distributedlog://zookeeper:2181/heron
-D heron.statefulstorage.classname=org.apache.heron.statefulstorage.dlog.DlogStorage
-D heron.statefulstorage.dlog.namespace.uri=distributedlog://zookeeper:2181/heron
+ -D heron.kubernetes.pod.template.configmap.disabled=false
---
apiVersion: v1
6 changes: 1 addition & 5 deletions deploy/kubernetes/general/tools.yaml
@@ -38,11 +38,7 @@ spec:
operator: "Equal"
effect: "NoExecute"
tolerationSeconds: 10
- - key: "node.alpha.kubernetes.io/notReady"
- operator: "Equal"
- effect: "NoExecute"
- tolerationSeconds: 10
- - key: "node.alpha.kubernetes.io/unreachable"
+ - key: "node.kubernetes.io/unreachable"
operator: "Equal"
effect: "NoExecute"
tolerationSeconds: 10
8 changes: 2 additions & 6 deletions deploy/kubernetes/gke/gcs-apiserver.yaml
@@ -27,7 +27,7 @@ metadata:
namespace: default

---
- apiVersion: rbac.authorization.k8s.io/v1beta1
+ apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: heron-apiserver
@@ -67,11 +67,7 @@ spec:
operator: "Equal"
effect: "NoExecute"
tolerationSeconds: 10
- - key: "node.alpha.kubernetes.io/notReady"
- operator: "Equal"
- effect: "NoExecute"
- tolerationSeconds: 10
- - key: "node.alpha.kubernetes.io/unreachable"
+ - key: "node.kubernetes.io/unreachable"
operator: "Equal"
effect: "NoExecute"
tolerationSeconds: 10
14 changes: 9 additions & 5 deletions deploy/kubernetes/helm/templates/tools.yaml
@@ -56,11 +56,7 @@ spec:
operator: "Equal"
effect: "NoExecute"
tolerationSeconds: 10
- - key: "node.alpha.kubernetes.io/notReady"
- operator: "Equal"
- effect: "NoExecute"
- tolerationSeconds: 10
- - key: "node.alpha.kubernetes.io/unreachable"
+ - key: "node.kubernetes.io/unreachable"
operator: "Equal"
effect: "NoExecute"
tolerationSeconds: 10
@@ -162,6 +158,7 @@ spec:
-D heron.class.repacking.algorithm=org.apache.heron.packing.binpacking.FirstFitDecreasingPacking
{{- end }}
-D heron.kubernetes.resource.request.mode={{ .Values.topologyResourceRequestMode }}
+ -D heron.kubernetes.pod.template.configmap.disabled={{ .Values.disablePodTemplates }}
envFrom:
- configMapRef:
name: {{ .Release.Name }}-tools-config
@@ -265,6 +262,13 @@ rules:
- patch
- update
- watch
+ - apiGroups:
+ - ""
+ resources:
+ - configmaps
+ verbs:
+ - get
+ - list

---
apiVersion: v1
3 changes: 3 additions & 0 deletions deploy/kubernetes/helm/values.yaml.template
@@ -58,6 +58,9 @@ uploader:
# Packing algorithms
packing: RoundRobin # ResourceCompliantRR, FirstFitDecreasing

+ # Support for ConfigMap mounted PodTemplates
+ disablePodTemplates: false
+
# Number of replicas for storage bookies, memory and storage requirements
bookieReplicas: 3
bookieCpuMin: 100m
3 changes: 2 additions & 1 deletion deploy/kubernetes/minikube/apiserver.yaml
@@ -28,7 +28,7 @@ metadata:
namespace: default

---
- apiVersion: rbac.authorization.k8s.io/v1beta1
+ apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: heron-apiserver
@@ -82,6 +82,7 @@ spec:
-D heron.uploader.dlog.topologies.namespace.uri=distributedlog://zookeeper:2181/heronbkdl
-D heron.statefulstorage.classname=org.apache.heron.statefulstorage.dlog.DlogStorage
-D heron.statefulstorage.dlog.namespace.uri=distributedlog://zookeeper:2181/heronbkdl
+ -D heron.kubernetes.pod.template.configmap.disabled=false
---
apiVersion: v1
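Each deployment above passes `-D heron.kubernetes.pod.template.configmap.disabled=...` to the API server, and `getPodTemplateConfigMapDisabled` in this commit treats only a case-insensitive `true` as disabling. The commit log also notes that a disabled API server returns an error on submit. The sketch below illustrates that gate under stated assumptions: `PodTemplateGateSketch`, `isDisabled`, and `checkAllowed` are hypothetical names, and `IllegalStateException` stands in for whatever exception Heron actually raises.

```java
/**
 * Hypothetical sketch of the submit-time gate; not Heron's code. IllegalStateException is a
 * stand-in for the scheduler's own submission exception.
 */
final class PodTemplateGateSketch {
  private PodTemplateGateSketch() { }

  // Same rule as getPodTemplateConfigMapDisabled: only "true" (any case) disables; null does not.
  static boolean isDisabled(String flagValue) {
    return "true".equalsIgnoreCase(flagValue);
  }

  // Reject a topology that names a Pod Template when the API server has them disabled.
  static void checkAllowed(String podTemplateLocation, String disabledFlag) {
    if (podTemplateLocation != null && isDisabled(disabledFlag)) {
      throw new IllegalStateException(
          "Pod Templates are disabled; cannot use " + podTemplateLocation);
    }
  }
}
```

Treating an unset flag as "enabled" matches the `disablePodTemplates: false` default in the Helm values.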
@@ -19,11 +19,15 @@

package org.apache.heron.scheduler;

import java.util.logging.Level;
import java.util.logging.Logger;

import org.apache.heron.api.generated.TopologyAPI;
import org.apache.heron.proto.scheduler.Scheduler;
import org.apache.heron.proto.system.ExecutionEnvironment;
import org.apache.heron.proto.system.PackingPlans;
import org.apache.heron.scheduler.client.ISchedulerClient;
import org.apache.heron.scheduler.client.SchedulerClientFactory;
import org.apache.heron.scheduler.dryrun.SubmitDryRunResponse;
import org.apache.heron.scheduler.utils.LauncherUtils;
import org.apache.heron.scheduler.utils.Runtime;
@@ -169,13 +173,48 @@ public void call() throws LauncherException, PackingException, SubmitDryRunRespo
"Failed to set execution state for topology '%s'", topologyName));
}

- // launch the topology, clear the state if it fails
- if (!launcher.launch(packedPlan)) {
+ // Launch the topology, clear the state if it fails. Some schedulers throw exceptions instead of
+ // returning false. In some cases the scheduler needs to have the topology deleted.
+ try {
+ if (!launcher.launch(packedPlan)) {
+ throw new TopologySubmissionException(null);
+ }
+ } catch (TopologySubmissionException e) {
+ // Compile error message to throw.
+ final StringBuilder errorMessage = new StringBuilder(
+ String.format("Failed to launch topology '%s'", topologyName));
+ if (e.getMessage() != null) {
+ errorMessage.append("\n").append(e.getMessage());
+ }
+
+ try {
+ // Clear state from the Scheduler via RPC.
+ Scheduler.KillTopologyRequest killTopologyRequest = Scheduler.KillTopologyRequest
+ .newBuilder()
+ .setTopologyName(topologyName).build();
+
+ ISchedulerClient schedulerClient = new SchedulerClientFactory(config, runtime)
+ .getSchedulerClient();
+ if (!schedulerClient.killTopology(killTopologyRequest)) {
+ final String logMessage =
+ String.format("Failed to remove topology '%s' from scheduler after failed submit. "
+ + "Please re-try the kill command.", topologyName);
+ errorMessage.append("\n").append(logMessage);
+ LOG.log(Level.SEVERE, logMessage);
+ }
+ // SUPPRESS CHECKSTYLE IllegalCatch
+ } catch (Exception ignored){
+ // The above call to clear the Scheduler may fail. This situation can be ignored.
+ LOG.log(Level.FINE,
+ String.format("Failure clearing failed topology `%s` from Scheduler during `submit`",
+ topologyName));
+ }
+
+ // Clear state from the State Manager.
statemgr.deleteExecutionState(topologyName);
statemgr.deletePackingPlan(topologyName);
statemgr.deleteTopology(topologyName);
- throw new LauncherException(String.format(
- "Failed to launch topology '%s'", topologyName));
+ throw new LauncherException(errorMessage.toString());
}
}
}
@@ -35,6 +35,8 @@ private KubernetesConstants() {
public static final String MEMORY = "memory";
public static final String CPU = "cpu";

+ public static final String EXECUTOR_NAME = "executor";
+
// container env constants
public static final String ENV_HOST = "HOST";
public static final String POD_IP = "status.podIP";
@@ -102,8 +104,7 @@ private KubernetesConstants() {
static final List<String> TOLERATIONS = Collections.unmodifiableList(
Arrays.asList(
"node.kubernetes.io/not-ready",
- "node.alpha.kubernetes.io/notReady",
- "node.alpha.kubernetes.io/unreachable"
+ "node.kubernetes.io/unreachable"
)
);
}
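`EXECUTOR_NAME` identifies the container Heron configures. Per the commit log, other containers in the Pod Template are kept as sidecars, and a template with more than one executor spec is rejected. A minimal sketch of that lookup follows, assuming the template's containers are reduced to a list of names; `ExecutorContainerCheckSketch` and `findExecutorContainer` are hypothetical, and returning `-1` for "no executor present" is an assumption about how a caller would then add Heron's own container.

```java
import java.util.List;

/** Hypothetical sketch: containers are represented only by their names. */
final class ExecutorContainerCheckSketch {
  // Mirrors the value of KubernetesConstants.EXECUTOR_NAME in this commit.
  static final String EXECUTOR_NAME = "executor";

  private ExecutorContainerCheckSketch() { }

  /**
   * Returns the index of the container named EXECUTOR_NAME, or -1 if the template has none.
   * Throws if the template declares more than one executor container.
   */
  static int findExecutorContainer(List<String> containerNames) {
    int found = -1;
    for (int i = 0; i < containerNames.size(); i++) {
      if (EXECUTOR_NAME.equals(containerNames.get(i))) {
        if (found >= 0) {
          throw new IllegalArgumentException(
              "Multiple executor containers found in Pod Template");
        }
        found = i;
      }
    }
    return found;
  }
}
```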
@@ -82,6 +82,12 @@ public enum KubernetesResourceRequestMode {
public static final String KUBERNETES_VOLUME_AWS_EBS_FS_TYPE =
"heron.kubernetes.volume.awsElasticBlockStore.fsType";

+ // pod template configmap
+ public static final String KUBERNETES_POD_TEMPLATE_CONFIGMAP_NAME =
+ "heron.kubernetes.pod.template.configmap.name";
+ public static final String KUBERNETES_POD_TEMPLATE_CONFIGMAP_DISABLED =
+ "heron.kubernetes.pod.template.configmap.disabled";
+
// container mount volume mount keys
public static final String KUBERNETES_CONTAINER_VOLUME_MOUNT_NAME =
"heron.kubernetes.container.volumeMount.name";
@@ -172,6 +178,15 @@ static String getContainerVolumeMountPath(Config config) {
return config.getStringValue(KUBERNETES_CONTAINER_VOLUME_MOUNT_PATH);
}

+ public static String getPodTemplateConfigMapName(Config config) {
+ return config.getStringValue(KUBERNETES_POD_TEMPLATE_CONFIGMAP_NAME);
+ }
+
+ public static boolean getPodTemplateConfigMapDisabled(Config config) {
+ final String disabled = config.getStringValue(KUBERNETES_POD_TEMPLATE_CONFIGMAP_DISABLED);
+ return "true".equalsIgnoreCase(disabled);
+ }
+
public static Map<String, String> getPodLabels(Config config) {
return getConfigItemsByPrefix(config, KUBERNETES_POD_LABEL_PREFIX);
}
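`heron.kubernetes.pod.template.configmap.name` carries both the ConfigMap name and the Pod Template name inside it, which the commit log says `getPodTemplateLocation` splits apart (rejecting a missing delimiter or empty names) before `loadPodFromTemplate` looks the template up. The sketch below assumes a `<ConfigMap name>.<Pod Template key>` value split on the first `.`, and models the ConfigMap's data as a `Map<String, String>`. The delimiter, the split rule, and every name here are assumptions for illustration, not Heron's implementation.

```java
import java.util.Map;

/**
 * Hypothetical sketch, not Heron's code. Assumes the config value looks like
 * "<ConfigMap name>.<Pod Template key>" and is split on the first '.'.
 */
final class PodTemplateLocationSketch {
  private PodTemplateLocationSketch() { }

  /** Returns {configMapName, podTemplateName}; rejects a missing delimiter or empty parts. */
  static String[] parseLocation(String location) {
    if (location == null) {
      throw new IllegalArgumentException("No Pod Template ConfigMap configured");
    }
    final int split = location.indexOf('.');
    if (split < 0) {
      throw new IllegalArgumentException("Missing delimiter in '" + location + "'");
    }
    final String configMapName = location.substring(0, split);
    final String podTemplateName = location.substring(split + 1);
    if (configMapName.isEmpty() || podTemplateName.isEmpty()) {
      throw new IllegalArgumentException(
          "Empty ConfigMap or Pod Template name in '" + location + "'");
    }
    return new String[] {configMapName, podTemplateName};
  }

  /** Fetches the serialized Pod Template from the ConfigMap data, guarding against nulls. */
  static String lookupTemplate(Map<String, String> configMapData, String podTemplateName) {
    if (configMapData == null) {
      throw new IllegalStateException("ConfigMap has no data");
    }
    final String template = configMapData.get(podTemplateName);
    if (template == null || template.isEmpty()) {
      throw new IllegalStateException(
          "Pod Template '" + podTemplateName + "' not found or empty");
    }
    return template;
  }
}
```

Splitting on the first `.` lets the template key itself contain dots (for example `pod-template.yaml`), at the cost of disallowing dots in the ConfigMap name.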
@@ -20,17 +20,22 @@
package org.apache.heron.scheduler.kubernetes;

import java.io.IOException;
import java.util.Comparator;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.logging.Level;
import java.util.logging.Logger;

import org.apache.heron.common.basics.ByteAmount;
import org.apache.heron.common.basics.SysUtils;
import org.apache.heron.scheduler.TopologySubmissionException;
import org.apache.heron.scheduler.utils.Runtime;
import org.apache.heron.spi.common.Config;
import org.apache.heron.spi.common.Context;

import io.kubernetes.client.openapi.ApiException;

import okhttp3.Response;

final class KubernetesUtils {
@@ -84,4 +89,37 @@ static String errorMessageFromResponse(Response response) {
static String Megabytes(ByteAmount amount) {
return String.format("%sMi", Long.toString(amount.asMegabytes()));
}

static class V1ControllerUtils<T> {
private static final Logger LOG = Logger.getLogger(V1Controller.class.getName());

/**
* Merge two lists by keeping all values in the <code>primaryList</code> and de-duplicating values in
* <code>secondaryList</code> using the <code>comparator</code>.
* @param primaryList All the values in this will be retained.
* @param secondaryList The values in this list will be deduplicated against <code>primaryList</code>.
* @param comparator Used to compare keys in the <code>TreeSet</code> to find their insertion position.
* @param description Description of the list merge operation which is used for error messages.
* @return A de-duplicated list of all the values in both input lists using the <code>comparator</code>.
*/
protected List<T> mergeListsDedupe(List<T> primaryList, List<T> secondaryList,
Comparator<T> comparator, String description) {
if (primaryList == null || primaryList.isEmpty()) {
return secondaryList;
}
if (secondaryList == null || secondaryList.isEmpty()) {
return primaryList;
}
try {
Set<T> treeSet = new TreeSet<>(comparator);
treeSet.addAll(primaryList);
treeSet.addAll(secondaryList);
return new LinkedList<>(treeSet);
} catch (NullPointerException e) {
final String message = String.format("Failed to merge lists for %s", description);
LOG.log(Level.FINE, message);
throw new TopologySubmissionException(message);
}
}
}
}
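`mergeListsDedupe` relies on `TreeSet.addAll` leaving in place any element the comparator reports as equal to one already present, so whichever list is inserted first (`primaryList`) wins. The commit orders `V1EnvVar`s by name alone for this purpose. The self-contained sketch below shows the same behaviour with a hypothetical two-field `EnvVar` class standing in for `V1EnvVar`; it illustrates the technique rather than calling Heron's `V1ControllerUtils`.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch of name-keyed list merging; EnvVar stands in for V1EnvVar. */
final class EnvVarMergeSketch {
  private EnvVarMergeSketch() { }

  static final class EnvVar {
    final String name;
    final String value;

    EnvVar(String name, String value) {
      this.name = name;
      this.value = value;
    }
  }

  // Compare on name only, so two variables with the same name count as duplicates.
  static final Comparator<EnvVar> BY_NAME = Comparator.comparing((EnvVar e) -> e.name);

  static List<EnvVar> merge(List<EnvVar> heronDefaults, List<EnvVar> fromTemplate) {
    final TreeSet<EnvVar> merged = new TreeSet<>(BY_NAME);
    merged.addAll(heronDefaults);   // inserted first, so Heron's values win on name clashes
    merged.addAll(fromTemplate);    // add() keeps the existing same-named element
    return new ArrayList<>(merged); // iteration order is sorted by name
  }
}
```

Because a `TreeSet` is used, the merged list comes back sorted by name rather than in the original order; the same holds for the `LinkedList` built from the `TreeSet` in `mergeListsDedupe`.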
