# Workloads

> <font size=+1> A workload is an application running on `Kubernetes` </font>

Workloads can be divided into `Pods` and `Workload Resources`.

Each `workload` runs within __a set of `Pods`__, while a `Pod` __represents a set of containers running together__.

- `Pod`s are scheduled to run on nodes
- If a `node` fails, all of the `Pod`s on that node are lost as well (__`Pod`s are ephemeral and should be easy to (re)create__)
- If a `Pod` fails, it stays in a failed state and is not repaired on its own (more details in the `Pod` section)

As there are a lot of failure points, a cluster admin managing bare `Pod`s would have to:
- Constantly verify whether `Pod`s are running fine
- Manually reschedule `Pod`s to different `Node`s
- Manually restart `Pod`s in case any container failed

> __Manual handling is pretty inefficient, hence `Workload Resources` were created__


`Workload Resources` can specify (amongst other things):

- When a `Pod` should be recreated
- What to do in case of failures
- How many times to retry before giving up

> ## `Workload Resources` should be used to manage `Pod` lifecycles; avoid deploying "bare Pods"!

The most common `Workload Resources` are:

- Deployment
- StatefulSet
- DaemonSet
- Job and CronJob

In this notebook, we will look at Pods and Workload Resources, and at how to implement them.

# Pods


> <font size=+1>A group of one or more containers, with shared context and a specification for how to run the containers</font>

Pods add another layer of abstraction over containers and behave similarly to `docker-compose`, in that __they can connect multiple related containers into one logical grouping__.

Shared context:
- Shared storage
- Shared network resources (e.g. IP)
- Linux namespaces

__Individual applications might be further isolated within the Pod__

> A Pod is the smallest deployable unit in Kubernetes

This means we cannot schedule containers on their own.


## Lifecycle

> `Pod`s remain on their scheduled `Node` until termination (according to the restart `policy` if a failure occurred) or deletion

__`Pod`s are never moved across nodes, they are eventually recreated!__

Some features:
- `Pod`s cannot self-heal (e.g. recreate themselves after a failure). This is done by the appropriate `Workload Resources` or by the cluster admin
- __Failed containers inside a `Pod` can be restarted though__ (by the `kubelet`, according to the restart policy)
- Related resources (e.g. `volume`s) are also deleted after `Pod` termination (unless specified otherwise)

`Pod`s can be in one of the following phases:
- `Pending` - accepted by the `k8s` cluster, but one or more containers are not yet ready to run, for example because:
    - the `Pod` is still waiting to be scheduled to a node
    - a container image is still being downloaded
- `Running` - the `Pod` is bound to a node and all containers have been created; at least one `container` is running (or being (re)started)
- `Succeeded` - all containers in the `Pod` have terminated successfully __and will not restart__
- `Failed` - all containers have terminated, and at least one of them terminated in failure
- `Unknown` - the state of the `Pod` could not be obtained, __typically due to a communication error with the `Node`__

## Container states

> __Kubernetes also watches the state of individual containers within `Pod`s__

Containers can be in one of three states:
- `Waiting` - still performing the operations needed to start, e.g. downloading the image or applying `secrets`; __a reason is also provided for monitoring__
- `Running`
- `Terminated` - either ran to completion or failed; __a reason and an `exit code` are provided for monitoring__


## Single Container Pod

Usually you run Pods with a single container. In this case, we can think of a Pod as a wrapper around the container.

Examples could include:
- FastAPI server receiving requests and saving them to a shared database
- Container receiving images as requests and returning their classification


## Pods with Multiple Containers

> More advanced use case, multiple tightly coupled containers __making a cohesive unit of service__

<p align=center><img src=images/pod.svg width=350></p>

Examples could include:
- Training multiple machine learning models, where:
    - First container accesses shared storage of raw data and transforms it
    - Second container trains neural network on the presented data
    - Third container pushes out the trained model to serving container
    - Fourth container serves the model
- One container serving data to the public (`read only` permissions), while another, internal one, writes data to shared storage

> __Pod containers are scheduled on the same "logical host" (in the cloud, the same VM; on a cluster of servers, the same physical computer) due to tight coupling__

As mentioned in the introduction, you can see the Pods in a cluster with the command `kubectl get pod`. However, this will only show the pods in the `default` namespace. You can observe all the pods in your cluster by adding the flag `-A`.

<i>_If you haven't, run `minikube start`, so you have a cluster in your local machine to start working with_</i>

In [4]:
!kubectl get pod -A

NAMESPACE              NAME                                         READY   STATUS    RESTARTS       AGE
default                hello-minikube-6ddfcc9757-mndds              1/1     Running   2 (106s ago)   16h
kube-system            coredns-78fcd69978-8j8hz                     1/1     Running   2 (106s ago)   19h
kube-system            etcd-minikube                                1/1     Running   2 (106s ago)   19h
kube-system            kube-apiserver-minikube                      1/1     Running   2 (106s ago)   19h
kube-system            kube-controller-manager-minikube             1/1     Running   2 (106s ago)   19h
kube-system            kube-proxy-jb22b                             1/1     Running   2 (106s ago)   16h
kube-system            kube-scheduler-minikube                      1/1     Running   2 (106s ago)   19h
kube-system            storage-provisioner                          1/1     Running   4 (32s ago)    16h
kubernetes-dashboard   dashboard-metrics-scraper-559445

Thus, we already have some Pods, because `minikube` comes with some of them by default. But of course, we will want to deploy our own pods, so let's see how to do so!

## Defining Pod (Pod template)

As you saw in the last lesson, Kubernetes objects can be defined imperatively (using defined steps), or declaratively. In these lessons we will focus on defining the declarative configurations, which are specified using `.yaml` files.

Pod as a basic `kind` can be specified via `.yaml` config file (__although specifying bare `Pod`s is discouraged__)

Specifying "bare `Pod`s" is pretty straightforward:

```
apiVersion: v1
kind: Pod
metadata:
  name: pod1
  labels:
    tier: frontend
spec:
  containers:
  - name: hello1
    image: gcr.io/google-samples/hello-app:2.0
```

In the previous lesson, we saw what the API version is, and how we should look for it in the [API docs](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.22/). 

Inside `spec` we added `containers`. This lists the containers belonging to the Pod, and for each container we specified `image`, which corresponds to the Docker image name. In this case, it is a sample image from Google.

## Try it out

- Create a .yaml file with the configuration above
- Before you deploy your Kubernetes resource, observe the pods in your default namespace (there shouldn't be any if it's your first time)
- Use the corresponding `kubectl` command to deploy the Kubernetes resource using the `.yaml` config file
- Observe the pods in your default namespace
- Notice that the pod is now running in the default namespace. This is because in the `kubectl` command you (probably) haven't specified the namespace for the new pod

In [10]:
# Observe the pods in the default namespace
!kubectl ###Your command here


No resources found in default namespace.


In [13]:
# Spin up the pod corresponding to the single-pod configuration above
!kubectl

pod/pod1 created


In [14]:
# Observe the pods in the default namespace again
!kubectl

NAME   READY   STATUS              RESTARTS   AGE
pod1   0/1     ContainerCreating   0          23s


Good. Remember from the last notebook what happened when we deleted a pod? It didn't take long until a new pod appeared. Let's see what happens if we try the same here.

## Try it out

1. Delete the pod using the correct `kubectl` command
2. Observe the pods in the default namespace again

In [16]:
# Delete the pod
!kubectl

pod "pod1" deleted


In [17]:
!kubectl

No resources found in default namespace.


So, what is the difference here? Why does the pod disappear now, when it didn't last time? Last time, the pod was managed by a resource that had "instructions" to keep it alive. In this case, however, we have a "bare" pod, with no resource to keep it running after it fails or is deleted.

Keeping it "alive" is one of the desired states you can specify in your configuration file, and it can be achieved with a `Deployment` or a `ReplicaSet`. Within the workload resources, you can find many more options.

# Workload Resources

Before diving into specific `Workload Resources` let's get a few concepts straight:

- Each `workload` presented below uses the `.spec.template` field, which specifies how to create a pod
- `template` is essentially the same as a pod config without `kind` and `apiVersion` (the rest is specified in the same way)
- Each `workload` has a `.spec.selector` that specifies which pods are handled by the `workload resource`
- `.spec.selector` matches pods by their `labels` and may "take care of" pods defined outside its config file!
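
The label matching described above can be sketched in a few lines of Python. This is a simplified model of a `matchLabels` selector only, not Kubernetes code; the pod names and labels are hypothetical.

```python
def matches(match_labels: dict, pod_labels: dict) -> bool:
    """True when every key/value pair of the selector is present on the pod."""
    return all(pod_labels.get(key) == value for key, value in match_labels.items())


# Hypothetical pods, described only by their labels.
pods = {
    "web-1": {"app": "nginx", "tier": "frontend"},
    "web-2": {"app": "nginx"},
    "db-1": {"app": "postgres"},
}

selector = {"app": "nginx"}
selected = sorted(name for name, labels in pods.items() if matches(selector, labels))
print(selected)
```

Extra labels on a pod (such as `tier` on `web-1`) do not prevent a match; only the pairs listed in the selector are compared.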

With the above in mind, let's dive in. We will see how to implement:

- Deployment (For which we need to explain ReplicaSet very briefly)
- DaemonSet
- Jobs

In the next lessons you will learn how to implement another type of workload, `StatefulSets`. But you will need to understand Kubernetes storage before diving into `StatefulSets`.

## ReplicaSet

>  Maintains a stable set of replicated pods running at any given time

`ReplicaSet`:
- creates new `POD`s according to the `.spec.replicas` field value
    (from the `.spec.template` config)
- deletes `POD`s if more of them exist than `.spec.replicas` specifies
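
As a rough illustration of this create/delete behaviour, the sketch below compares a desired replica count with the pods that currently exist. It is a simplified model, not the actual controller logic, and the function name `reconcile` is made up for this example.

```python
def reconcile(desired_replicas: int, running_pods: list) -> tuple:
    """Return (pods_to_create, pods_to_delete) needed to reach the desired count."""
    difference = desired_replicas - len(running_pods)
    to_create = max(difference, 0)   # too few pods: create the missing ones
    to_delete = max(-difference, 0)  # too many pods: delete the surplus
    return to_create, to_delete


too_few = reconcile(3, ["pod-a"])
too_many = reconcile(3, ["pod-a", "pod-b", "pod-c", "pod-d"])
print(too_few, too_many)
```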

### Acquiring pods

> `ReplicaSet` is linked to the pods via `metadata.ownerReferences` and acquired via `.spec.selector` match

The above works something like this:
- Each managed pod has `metadata.ownerReferences` __automatically added by `k8s`__
- This field specifies who manages the pod (e.g. another controller)
- __If a `POD`__:
    - has no "owner" (e.g. bare `POD`) __or__ its owner __is not another controller__, __and__
    - its labels match `.spec.selector`
- Then the `POD` is acquired by the `ReplicaSet`

> <font size=+1>The process above works the same for other `workload resources` (or managers)</font>
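
The acquisition rule can be written down as a small Python predicate. This is a sketch under the assumption that pods are plain dictionaries shaped like their YAML; the pod names and the `other-rs` owner are hypothetical.

```python
def has_controller_owner(pod: dict) -> bool:
    """True when any ownerReference on the pod is marked as a controller."""
    owners = pod["metadata"].get("ownerReferences", [])
    return any(owner.get("controller", False) for owner in owners)


def is_acquired(pod: dict, match_labels: dict) -> bool:
    """A pod is acquired only if its labels match AND no controller owns it yet."""
    labels = pod["metadata"].get("labels", {})
    labels_match = all(labels.get(k) == v for k, v in match_labels.items())
    return labels_match and not has_controller_owner(pod)


# Hypothetical pods: one bare, one already owned by another controller.
bare_pod = {"metadata": {"name": "pod1", "labels": {"tier": "frontend"}}}
owned_pod = {
    "metadata": {
        "name": "pod2",
        "labels": {"tier": "frontend"},
        "ownerReferences": [{"kind": "ReplicaSet", "name": "other-rs", "controller": True}],
    }
}

selector = {"tier": "frontend"}
print(is_acquired(bare_pod, selector), is_acquired(owned_pod, selector))
```

This is why a bare pod whose labels happen to match a selector can end up managed by a `ReplicaSet` it was never declared with.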

### When should we use `ReplicaSet`s?

> __In general it is advised to use higher level `Deployment` `workload resource`__

`Deployment`, a higher level concept, __manages `ReplicaSet`s__ and in addition provides __declarative updates to pods__

The only reasons to go for `ReplicaSet`s would be:
- custom update orchestration
- we will never need to update the `config`

The latter is pretty unlikely (and we are not paying a large price for this feature), hence:

> <font size=+1>Use Deployment instead</font>

(for those interested `ReplicaSet` is described in detail [here](https://kubernetes.io/docs/concepts/workloads/controllers/replicaset/))

## Deployment

> Provides declarative updates for `Pods` and `ReplicaSets`

Given the above information, let's see whether we can tell what each field means in the config below:

```
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
```

In the example above we are telling the resource to keep 3 pods running at all times. The `Deployment` finds its pods using `selector.matchLabels`: it manages every pod carrying the label with key `app` and value `nginx`, which is exactly the label that `template` assigns to the pods it creates.

## Try it out

1. Create a .yaml file with the configuration above
2. Run the right `kubectl` command to run the .yaml file
3. Observe how many pods you have
4. Delete one pod
5. Observe how many pods you have

In [1]:
!kubectl

deployment.apps/nginx-deployment created


In [2]:
# Observe how many pods you have
!kubectl

NAME                                READY   STATUS              RESTARTS   AGE
nginx-deployment-66b6c48dd5-7qpvr   0/1     ContainerCreating   0          17s
nginx-deployment-66b6c48dd5-gc7z7   0/1     ContainerCreating   0          17s
nginx-deployment-66b6c48dd5-vzsx2   0/1     ContainerCreating   0          17s


Now that there are some `nginx` containers running, you can run commands inside them manually (later you will use pods to run other applications, but for now, let's use them interactively).

In this case, I ran it in the terminal:

<p align=center><img src=images/nginx.png width=700></p>

After deleting one of the pods, you can see that a replacement pod appears, with a different name:

In [1]:
# Delete one of the pods
!kubectl delete pod nginx-deployment-66b6c48dd5-vzsx2

pod "nginx-deployment-66b6c48dd5-vzsx2" deleted


In [2]:
!kubectl get pod

NAME                                READY   STATUS    RESTARTS        AGE
nginx-deployment-66b6c48dd5-7qpvr   1/1     Running   1 (8m15s ago)   8h
nginx-deployment-66b6c48dd5-fsck5   1/1     Running   0               40s
nginx-deployment-66b6c48dd5-gc7z7   1/1     Running   1 (8m15s ago)   8h



Just to avoid any confusion, delete the deployment resource:

In [3]:
# Delete the deployment resource
!kubectl delete deployment nginx-deployment

deployment.apps "nginx-deployment" deleted


## DaemonSet

> A DaemonSet ensures that a copy of a pod runs on every Node, including Nodes added to the cluster later

We will have one pod per node, so if we remove a node, the number of pods will decrease accordingly.

DaemonSets are generally used for monitoring services. You can use one pod per node to monitor the health of that node, or to collect its logs. Observe that a DaemonSet is quite similar to a Deployment, but it has no replicas.

### Required fields

> Similar to `Deployment`, which means `.spec.template`, `.spec.selector` and __no `.spec.replicas` (as the same `daemon` is run per-node)__

- Acquisition of `POD`s happens on matching `.spec.selector` (like previously)
- __Acquisition of `Node`s happens via `.spec.template.spec.nodeSelector` field__
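
The node side of this can be illustrated with a short Python sketch. It assumes that node eligibility is decided by label equality alone (real clusters also take taints and other scheduling constraints into account); the node names and the `disk` label are hypothetical.

```python
def eligible_nodes(nodes: dict, node_selector: dict) -> list:
    """Names of nodes whose labels contain every key/value pair of node_selector."""
    return sorted(
        name
        for name, labels in nodes.items()
        if all(labels.get(key) == value for key, value in node_selector.items())
    )


# Hypothetical two-node cluster; the "disk" label is made up for illustration.
nodes = {
    "minikube": {"kubernetes.io/os": "linux", "disk": "ssd"},
    "minikube-m02": {"kubernetes.io/os": "linux"},
}

everywhere = eligible_nodes(nodes, {})             # no nodeSelector: one pod per node
ssd_only = eligible_nodes(nodes, {"disk": "ssd"})  # only nodes labelled disk=ssd
print(everywhere, ssd_only)

del nodes["minikube-m02"]                          # removing a node removes its pod
after_removal = eligible_nodes(nodes, {})
print(after_removal)
```

Each name in the returned list stands for one daemon pod, so the pod count follows the number of eligible nodes rather than a `replicas` value.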

### Assigning Pods to Nodes

> In general we don't have to interfere with `k8s` POD deployment to specific `Node`s

There might be a few reasons to do that though:
- Ensuring `POD` ends up on a `Node` which has `SSD` attached
- Co-locate `POD`s from different services in the same zone if they communicate frequently

`k8s` comes with a set of predefined `labels` for `Node`s, full list can be seen [here](https://kubernetes.io/docs/reference/labels-annotations-taints/), but to mention a few:
- region of deployment (in case of cloud): `topology.kubernetes.io/region=us-east-1`
- hostname of a node: `kubernetes.io/hostname=ip-172-20-114-199.ec2.internal`
- operating system our node is running: `kubernetes.io/os=linux`

> __During `kubectl` lesson we will learn how to add our own `label`s to `Node`s__

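Given the above, we can restrict a DaemonSet to specific nodes by adding a `nodeSelector` (a map of label keys to values, e.g. `kubernetes.io/os: linux`) under `.spec.template.spec`. The example below omits it, so its pod runs on every node: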

```
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd-elasticsearch
spec:
  selector:
    matchLabels:
      name: fluentd-elasticsearch
  template:
    metadata:
      labels:
        name: fluentd-elasticsearch
    spec:
      containers:
      - name: fluentd-elasticsearch
        image: quay.io/fluentd_elasticsearch/fluentd:v2.5.2
        resources:
          limits:
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
      terminationGracePeriodSeconds: 30
```

Before moving on, this is a good time to talk about container resources. As you can see in the Kubernetes object, each entry of `template.spec.containers` has a new field named `resources`, which in turn has two more fields: `limits` and `requests`.

- `limits`: These values represent the maximum capacities given to a container. If the process starts using more than 200Mi (mebibytes) of memory, the container is killed and, depending on the restart policy, restarted.
- `requests`: These values represent the minimum capacities we are guaranteeing to the container. In the example above, we are requesting 200Mi of memory and 100 millicores. A _millicore_ is one thousandth of a CPU core, so each core is equivalent to 1000 millicores.
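
To make the units concrete, here is a small Python sketch that converts the two quantities used above into cores and bytes. It handles only the `m` CPU suffix and the binary memory suffixes `Ki`, `Mi` and `Gi`, which is a simplification of the full quantity syntax; the helper names are made up for this example.

```python
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def cpu_to_cores(quantity: str) -> float:
    """Convert a CPU quantity such as '100m' or '2' into a number of cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000  # millicores to cores
    return float(quantity)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '200Mi' into bytes (binary suffixes only)."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # a plain number is already in bytes


print(cpu_to_cores("100m"))      # 0.1
print(memory_to_bytes("200Mi"))  # 209715200
```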

For more information about container resources, go to the following [link](https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/)

As you can observe, most of the arguments in the DaemonSet are the same as in the Deployment resource. Observe that we don't use the `replicas` key, since, as mentioned, there will be a single `DaemonSet` pod per node. For the sake of this demonstration, let's add a node to the existing minikube cluster:

In [None]:
!minikube node add

Now you are ready to spin up the DaemonSet:

## Try it out

1. Observe the pods in ALL your nodes
2. Create a .yaml file with the configuration above
3. Run the appropriate command to deploy the DaemonSet configuration
4. Observe again the pods in ALL your nodes
5. Delete one of your nodes
6. Observe, once more, the pods in your nodes

## Jobs

A `Job` creates one or more pods and keeps retrying their execution until a specified number of them terminate successfully.

Use cases:
- create one `Job` to make sure our task runs successfully
- run the same task `N` times in parallel

An example `Job` config that computes the value of `pi` could be:

```
apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
  backoffLimit: 4
```

### `.spec`ification

Standard fields are necessary, but in addition:

- `.spec.template.spec.restartPolicy` must be either `Never` or `OnFailure` (the `Pod` default, `Always`, is not allowed for a `Job`)
- `.spec.completions` - how many `pod`s have to finish successfully
- `.spec.parallelism` - how many `pod`s of our job may run at the same time

Using `.spec.completions` and `.spec.parallelism` we can construct different levels of parallelism:

- __non-parallel__ - leave both fields at their default of `1`; only one `pod` is started, and __a new one only starts if this one fails!__
- __parallel with fixed completion count__ - specify `.spec.completions=N`; the `Job` is complete once `N` `pod`s have succeeded, with __at most `.spec.parallelism` of them running at a given time__ (the controller recreates failed `pod`s)
- __parallel with work queue__ - leave `.spec.completions` unset and set `.spec.parallelism=N` - `N` pods will run; after the first one succeeds no new pods are started, and __the rest continue execution until termination__ (the `pod`s need to coordinate among themselves or through an external service to decide what each one works on)

`non-parallel` is the default mode as `.spec.completions=1` and `.spec.parallelism=1` are the default values.
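
To see how the two fields interact for a fixed completion count, here is a simplified Python model. It assumes that every pod succeeds on its first try and that all pods take the same time, so they run in neat batches; a real controller simply keeps up to `.spec.parallelism` pods running while completions are still missing. The function name `batches` is made up for this example.

```python
def batches(completions: int, parallelism: int) -> list:
    """Sizes of successive pod batches until `completions` pods have succeeded."""
    sizes = []
    remaining = completions
    while remaining > 0:
        batch = min(parallelism, remaining)  # never start more pods than still needed
        sizes.append(batch)
        remaining -= batch
    return sizes


print(batches(completions=5, parallelism=2))  # [2, 2, 1]
print(batches(completions=1, parallelism=1))  # [1]
```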

One could also specify [Completion Mode](https://kubernetes.io/docs/concepts/workloads/controllers/job/#completion-mode), which controls how completions are tracked: `NonIndexed` (the default), where any `N` successful pods complete the `Job`, or `Indexed`, where each pod gets its own completion index.

- `.spec.backoffLimit` - how many retries are allowed before considering the `Job` failed (default `6`)

In this case:
- If the required number of successful completions (`.spec.completions`) is reached, __the job succeeded__
- Until then, failed `pod`s are recreated, up to `.spec.backoffLimit` times
- Exponential back-off delay is applied, e.g.:
    - First retry after `10s`
    - Second after `20s`
    - Third after `40s`
    - __Capped at `6m` backoff!__
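
The delays listed above follow a doubling pattern with a cap, which can be reproduced with a short Python sketch. The function name and its parameters are made up for this example; only the `10s` start, the doubling and the `6m` cap come from the text.

```python
def backoff_delays(retries: int, base: int = 10, cap: int = 360) -> list:
    """Delay in seconds before each retry: doubles every time, capped at `cap`."""
    return [min(base * 2**attempt, cap) for attempt in range(retries)]


print(backoff_delays(7))  # [10, 20, 40, 80, 160, 320, 360]
```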
    
- `.spec.activeDeadlineSeconds` - how many seconds __the whole job__ may run before termination. Once reached, __all of the `pod`s are terminated__ (takes precedence over `.spec.backoffLimit`)
    
### Cleanup

> A `Job` after finishing __will not be automatically removed from the cluster__

Why is that bad?

> `kube-apiserver` will still keep track of the `Job` and its `pod`s, __putting unneeded pressure on `k8s`__

Why is this the default behaviour?
- One might want to check logs of finished jobs (which are stored within `pod` or external storage)
- Checking status of `Job`(s)

What one can do is set up a so-called __`TTL` (time to live)__:

> __`TTL` specifies after how long the `job` should be removed from the cluster (including all of its `pod`s and other dependencies)__.

This is done via the `.spec.ttlSecondsAfterFinished` field, as shown below:

```
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-with-ttl
spec:
  ttlSecondsAfterFinished: 100
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl",  "-Mbignum=bpi", "-wle", "print bpi(2000)"]
      restartPolicy: Never
```

As mentioned, a `Job` finishes once a given number of its Pods have executed successfully. If you want to repeat that operation, you would need to `apply` the Kubernetes object again to rerun the job. Luckily, there is an easier way to generate `Job`s periodically with the frequency you desire: __`CronJob`__

A `CronJob` creates `Job`s on a repeating schedule. You can specify said schedule in the spec:
```
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
```

## Try it Out

1. Create a .yaml file with the configuration above