# AKS Concepts and Their Relationships

Let's break down the core concepts of **Azure Kubernetes Service (AKS)** and explain how they relate to each other in the Kubernetes ecosystem.

---

### 🔹 Node
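A **node** is a physical or virtual machine that runs your Kubernetes workloads. In AKS, nodes are Azure virtual machines, and:
- Azure manages the control plane (API server, scheduler, etcd, and so on).
- You manage the **worker nodes** (the VMs that run your app containers), which are grouped into **node pools**.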

---

### 🔹 Worker Node
In AKS, a worker node is effectively **the same as a node**. Azure hosts the control plane for you, so every node you see in the cluster is a worker node: the VM where **pods** are deployed and run.
- Each worker node runs critical Kubernetes components: kubelet, container runtime (e.g., containerd), kube-proxy, etc.

---

### 🔹 Pods
A **pod** is the smallest deployable unit in Kubernetes. It:
- Contains one or more containers (usually one).
- Gives its containers a shared network namespace (one IP address and port space) and lets them share storage volumes.
- Is scheduled **on a node**.

**Relation**: Pods run **on worker nodes**. The scheduler places pods on nodes that have available resources.

---

### 🔹 Load Balancer
A **load balancer**:
- Distributes incoming traffic to the appropriate pod(s).
- In AKS, this is the **Azure Load Balancer**, which works at **Layer 4 (TCP/UDP)**.
- AKS configures it when you create a `Service` of type `LoadBalancer`, typically to expose an app to the internet.

**Relation**: The load balancer sends traffic to the **worker nodes**. On each node, kube-proxy forwards that traffic to one of the Service's **pods**.
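By default, Azure Load Balancer distributes traffic using a hash of each flow's five-tuple: source IP, source port, destination IP, destination port and protocol. As a result, every packet of one connection reaches the same backend. The sketch below illustrates only that idea.

It makes some assumptions:
- SHA-256 is a stand-in for Azure's internal hash function.
- `Flow`, `pick_backend`, the node names and the IP addresses are hypothetical.
- The backends are modeled as the cluster's nodes. The kube-proxy hop from node to pod is not shown.

```python
import hashlib
from typing import NamedTuple, Sequence


class Flow(NamedTuple):
    """A TCP/UDP flow identified by its five-tuple."""
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


def pick_backend(flow: Flow, backends: Sequence[str]) -> str:
    """Map a flow to one backend using a stable five-tuple hash.

    Packets of the same flow always land on the same backend, while
    different flows spread across all backends.
    """
    if not backends:
        raise ValueError("no healthy backends")
    key = "|".join(str(part) for part in flow).encode()
    # SHA-256 is only a stand-in; Azure's actual hash function is internal.
    digest = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return backends[digest % len(backends)]


# Hypothetical node names and addresses, for illustration only.
nodes = ["aks-nodepool1-0", "aks-nodepool1-1", "aks-nodepool1-2"]
flow = Flow("203.0.113.7", 51514, "20.0.0.10", 80, "TCP")
print(pick_backend(flow, nodes))
```

Because the hash is stable, a retransmitted packet of the same connection never jumps to a different node. New connections from the same client usually use a new source port, so they can land on any node.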

---

### 🔹 Auto Scaling
In AKS, auto scaling usually refers to one of two mechanisms:
1. **Cluster Autoscaler** – Scales the number of **nodes** up/down based on resource demand.
2. **Horizontal Pod Autoscaler (HPA)** – Scales the number of **pod replicas** based on CPU/memory or custom metrics.

**Relation**:
- If demand increases, the HPA adds more **pods**.
- If those new pods can't be scheduled because no node has enough free resources (they stay `Pending`), the **Cluster Autoscaler** adds **nodes**.
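The Kubernetes documentation gives the HPA's core calculation as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. The replica count is left unchanged when the ratio is within a tolerance of 1.0 (0.1 by default). For CPU, the metric is average utilization as a percentage of the pods' CPU requests.

Below is a minimal sketch of that formula for a single metric. The function name is hypothetical. The sketch deliberately leaves out several behaviors of the real controller:
- stabilization windows
- scaling policies
- handling of not-ready pods

```python
import math


def hpa_desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Replica count the HPA formula yields for a single metric.

    desired = ceil(current_replicas * current_metric / target_metric),
    unchanged when the ratio is within `tolerance` of 1.0, then clamped
    to [min_replicas, max_replicas].
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to the target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 4 pods averaging 90% CPU utilization against a 60% target: scale out.
print(hpa_desired_replicas(4, 90, 60, min_replicas=2, max_replicas=10))  # 6
```

If the two extra pods then have nowhere to run, they remain `Pending`. That pending state is what triggers the Cluster Autoscaler to add a node.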

---

### 🔹 Replica
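A **replica** is one running copy of a pod. A **Deployment**, through the ReplicaSet it manages, ensures that a specified number of replicas is always running.
- Improves availability.
- If one pod dies, a replacement is created automatically.

**Relation**:
- Replicas are identical pods created from the same pod template. The scheduler may place them on the same or different **nodes**; use pod anti-affinity or topology spread constraints to spread them out.
- The count is set with `spec.replicas` in the Deployment YAML (Helm charts often expose it as `replicaCount`) or adjusted automatically by the HPA.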
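Keeping a fixed number of replicas running is a reconcile loop: compare the desired count with the pods that exist, then create or delete the difference. Here is a minimal sketch under simplifying assumptions:
- Pods are plain name strings.
- `reconcile` and the pod names are hypothetical.
- When scaling down, the sketch simply trims the end of the list. The real ReplicaSet controller uses additional criteria to pick which pods to delete.

```python
from itertools import count
from typing import Iterator


def reconcile(desired: int, running: list[str], names: Iterator[str]) -> tuple[list[str], list[str]]:
    """One pass of a simplified replica controller.

    Returns (to_create, to_delete) so that the number of running pods
    converges to `desired`.
    """
    if desired < 0:
        raise ValueError("desired replicas must be >= 0")
    missing = desired - len(running)
    if missing > 0:
        # Too few pods (e.g. one crashed or its node died): create replacements.
        return [next(names) for _ in range(missing)], []
    # Too many pods (or exactly right): delete the surplus, if any.
    return [], running[desired:]


names = (f"web-{i}" for i in count())  # hypothetical pod-name generator
running = ["web-a", "web-b"]            # one of three desired replicas has died
to_create, to_delete = reconcile(3, running, names)
print(to_create, to_delete)  # ['web-0'] []
```

The controller runs this comparison repeatedly. It does not react to a specific "pod died" event, so any drift from the desired count gets corrected on the next pass.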

---

### 🔹 scoringTimeoutMs
> ⚠️ Not a Kubernetes term. It is an **Azure Machine Learning** deployment setting for models served on AKS.

- The **maximum time (in milliseconds)** a scoring (inference/prediction) request is allowed to run.
- If a request takes longer, it fails with a timeout error.

**Relation**:
- Used in **inference services deployed to AKS**.
- Applies to each request processed by the **pod** that runs the model's scoring container.
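Azure Machine Learning enforces this timeout for you. The sketch below only illustrates the behavior, using `asyncio.wait_for` to cancel a scoring call that overruns its budget.

Everything in it is hypothetical:
- `score_with_timeout`, `fast_model` and `slow_model`
- the shape of the returned dict

None of these are part of the Azure ML API.

```python
import asyncio


async def score_with_timeout(score, payload, scoring_timeout_ms: int) -> dict:
    """Run an async scoring function and fail if it exceeds the time budget."""
    try:
        result = await asyncio.wait_for(score(payload), timeout=scoring_timeout_ms / 1000)
        return {"status": "ok", "result": result}
    except asyncio.TimeoutError:
        # wait_for cancels the still-running scoring task before raising.
        return {"status": "timeout", "error": f"scoring exceeded {scoring_timeout_ms} ms"}


async def fast_model(x: float) -> float:
    await asyncio.sleep(0)    # pretend inference finishes immediately
    return x * 2


async def slow_model(x: float) -> float:
    await asyncio.sleep(0.5)  # pretend inference takes 500 ms
    return x * 2


async def main() -> None:
    print(await score_with_timeout(fast_model, 21, scoring_timeout_ms=100))  # {'status': 'ok', 'result': 42}
    print(await score_with_timeout(slow_model, 21, scoring_timeout_ms=100))


asyncio.run(main())
```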

---

### 🔹 maxQueueWaitMs
> ⚠️ Also not a Kubernetes term. It is an Azure Machine Learning setting for ML inference services on AKS.

- Defines the **maximum time a request can wait in queue** before being picked up for processing.
- If a request waits longer than this, it fails instead of being processed.

**Relation**:
- Queuing happens **before** the request reaches the pod (like in a request router or API gateway).
- Helps manage **load** on pods, especially under high traffic.
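A queue-wait limit boils down to one check at dequeue time: how long has this request been waiting? The minimal sketch below works under these assumptions:
- Timestamps are passed in explicitly as milliseconds, which keeps the example deterministic.
- `Request` and `dequeue_next` are hypothetical names, not an Azure ML API.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    request_id: str
    enqueued_at_ms: float  # time the request entered the queue


def dequeue_next(queue: deque, now_ms: float, max_queue_wait_ms: float) -> tuple[Optional[Request], list[str]]:
    """Pop the next request that is still within its queue-wait budget.

    Requests that have waited longer than max_queue_wait_ms are failed
    (their IDs are returned in the second value) instead of reaching a pod.
    """
    expired = []
    while queue:
        req = queue.popleft()
        if now_ms - req.enqueued_at_ms > max_queue_wait_ms:
            expired.append(req.request_id)  # waited too long: fail fast
        else:
            return req, expired
    return None, expired


q = deque([Request("a", 0), Request("b", 600), Request("c", 900)])
nxt, failed = dequeue_next(q, now_ms=1000, max_queue_wait_ms=500)
print(nxt.request_id, failed)  # b ['a']
```

Failing stale requests early keeps busy pods working on requests whose callers are still likely to be waiting for an answer.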

---

### 🔄 Relationships Overview

| Concept                | Connected To...                                  | Description                                          |
|------------------------|--------------------------------------------------|------------------------------------------------------|
| **Node / Worker Node** | Pods, Cluster Autoscaler                         | Runs the workload (pods).                            |
| **Pod**                | Nodes, Load Balancer, Replicas, Autoscalers      | Runs containers; usually managed via Deployments.    |
| **Load Balancer**      | Services, Nodes, Pods                            | Routes external/internal traffic to a Service's pods. |
| **Auto Scaling**       | Pods (HPA), Nodes (Cluster Autoscaler)           | Automatically adjusts pod and node counts.           |
| **Replica**            | Pods, Nodes                                      | Keeps the desired number of pod instances running.   |
| **scoringTimeoutMs**   | Pods serving ML models                           | Maximum run time for one inference request.          |
| **maxQueueWaitMs**     | Request queue (before a pod handles the request) | Maximum time a request can wait in the queue.        |

---

## 🔍 What is `kubelet`?

The **kubelet** is an essential **agent** that runs on every **node** (including all **worker nodes**) in a Kubernetes cluster.

---

#### ✅ Responsibilities of Kubelet

- Watches the API server (part of the **Kubernetes control plane**) for **PodSpecs** (pod definitions) of pods assigned to its node.
- Ensures that the containers described in those PodSpecs are **running and healthy**.
- Communicates with the **container runtime** (containerd on AKS nodes) through the Container Runtime Interface (CRI).
- Reports node and pod status **back to the control plane**.

> 💡 Think of the kubelet as the "node supervisor" that takes orders from the control plane and manages pods locally.

---

#### 🔁 Relation with **Pod**

| Component     | Role                                                                 |
|---------------|----------------------------------------------------------------------|
| **Pod**       | The unit of execution; runs containers.                              |
| **kubelet**   | The agent responsible for starting, monitoring, and reporting the pod’s status on the node. |

##### 🔧 How they interact:

- When the Kubernetes **scheduler assigns a pod** to a node:
  - The kubelet on that node **receives the PodSpec**.
  - It uses the **container runtime** to pull container images and start containers.
  - It **monitors the pod’s lifecycle** and keeps the pod in the desired state.
  - It **reports pod status** (Running, Failed, etc.) back to the control plane via the API server.
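The interaction described above is a per-pod reconcile loop: start what the PodSpec asks for, stop what it no longer lists, then report the result. Here is a minimal sketch under simplifying assumptions:
- `FakeRuntime` is a hypothetical stand-in for containerd, not the real CRI API.
- `sync_pod` is also a hypothetical name.
- Containers are plain `pod/container` strings.
- Pod phases are reduced to `Running` and `Pending`.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRuntime:
    """Hypothetical stand-in for a container runtime such as containerd."""
    running: set[str] = field(default_factory=set)
    unpullable: set[str] = field(default_factory=set)  # containers whose image pull fails

    def start(self, container: str) -> bool:
        if container in self.unpullable:
            return False  # e.g. image pull error: container cannot start
        self.running.add(container)
        return True

    def stop(self, container: str) -> None:
        self.running.discard(container)


def sync_pod(pod: str, spec_containers: list[str], runtime: FakeRuntime) -> str:
    """One simplified kubelet sync pass for a single pod.

    Starts containers the PodSpec lists that are not running, stops this
    pod's containers that are no longer in the spec, and returns the phase
    the kubelet would report to the API server.
    """
    wanted = {f"{pod}/{c}" for c in spec_containers}
    for container in sorted(wanted - runtime.running):
        runtime.start(container)
    owned = {c for c in runtime.running if c.startswith(f"{pod}/")}
    for container in sorted(owned - wanted):
        runtime.stop(container)
    return "Running" if wanted <= runtime.running else "Pending"


rt = FakeRuntime()
print(sync_pod("web", ["app", "sidecar"], rt))  # Running
rt.running.discard("web/app")                    # simulate a crashed container
print(sync_pod("web", ["app", "sidecar"], rt))  # Running (app restarted)
```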

---

#### 🔄 Summary

- **Pods** are created and run on a node.
- **kubelet** ensures those pods stay running and conform to their specs.
- kubelet is the **bridge** between the Kubernetes control plane and the containers running on each node.

---


## Q. 🧠 How Does AKS Determine the Number of Pods on Each Worker Node?

Kubernetes (including AKS) uses the **scheduler** to decide **where** to place a pod. The process is generally the same in AKS as in any standard Kubernetes setup.

---

#### 1. Scheduler Checks for Available Nodes

When a new pod is created:
- The **Kubernetes scheduler** looks for **nodes whose free allocatable resources** (CPU, memory) cover the pod's resource requests.
- It also checks:
  - Taints and tolerations
  - Node affinity rules
  - The node's maximum pod count
  - Pod topology spread constraints
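These checks form the scheduler's filtering step. A scoring step then ranks the nodes that pass. The minimal sketch below covers filtering only and simplifies each check:
- Resources are reduced to free CPU and memory numbers per node.
- Taints and tolerations are compared as exact strings (only the `NoSchedule` effect is modeled).
- Node affinity is reduced to nodeSelector-style exact label matching.
- Topology spread constraints are omitted.
- The class names, node names and `agentpool` label values are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    free_cpu_m: int    # allocatable CPU minus requests already placed (millicores)
    free_mem_mi: int   # allocatable memory minus requests already placed (MiB)
    labels: dict[str, str] = field(default_factory=dict)
    taints: set[str] = field(default_factory=set)  # e.g. "sku=gpu:NoSchedule"


@dataclass
class PodRequest:
    cpu_m: int
    mem_mi: int
    tolerations: set[str] = field(default_factory=set)
    node_selector: dict[str, str] = field(default_factory=dict)


def feasible_nodes(pod: PodRequest, nodes: list[Node]) -> list[str]:
    """Filter step of scheduling: keep nodes the pod is allowed to run on."""
    result = []
    for n in nodes:
        fits = n.free_cpu_m >= pod.cpu_m and n.free_mem_mi >= pod.mem_mi
        tolerated = n.taints <= pod.tolerations  # every taint must be tolerated
        selected = all(n.labels.get(k) == v for k, v in pod.node_selector.items())
        if fits and tolerated and selected:
            result.append(n.name)
    return result


nodes = [
    Node("aks-nodepool1-0", 1500, 2048, {"agentpool": "nodepool1"}),
    Node("aks-nodepool1-1", 300, 4096, {"agentpool": "nodepool1"}),
    Node("aks-gpupool-0", 4000, 8192, {"agentpool": "gpupool"}, {"sku=gpu:NoSchedule"}),
]
pod = PodRequest(cpu_m=500, mem_mi=256)
print(feasible_nodes(pod, nodes))  # ['aks-nodepool1-0']
```

The second node is rejected for lacking CPU, and the GPU node is rejected for its untolerated taint. If no node passes, the pod stays `Pending`.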

---

#### 2. Pod Resource Requests and Limits

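In the pod spec (YAML), you can define requests and limits for each container:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo-app            # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:1.25     # hypothetical image; use your own
      resources:
        requests:
          cpu: "500m"       # half a vCPU, reserved for scheduling
          memory: "256Mi"
        limits:
          cpu: "1"          # throttled above one vCPU
          memory: "512Mi"   # OOM-killed above this
```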

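- `requests`: the CPU/memory reserved for the container on the node; the scheduler treats this as what the pod needs.
- `limits`: the maximum the container can use. CPU use above the limit is throttled, and a container that exceeds its memory limit is OOM-killed.

The scheduler uses requests (not limits) to calculate how many pods fit on a node. A pod fits only if the requests of the pods already on the node, plus the new pod's requests, stay within the node's allocatable CPU and memory.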
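To make the arithmetic concrete, the sketch below computes how many identical pods fit on an otherwise empty node. It takes the smallest of three caps:
- the CPU cap (allocatable CPU divided by the pod's CPU request)
- the memory cap (allocatable memory divided by the pod's memory request)
- the node pool's max-pods setting (`--max-pods` in `az aks`)

Some assumptions apply:
- The node's allocatable figures and max-pods value are hypothetical. Real allocatable resources depend on the VM size and on what AKS reserves for system daemons.
- System and DaemonSet pods already on the node are ignored.
- Only binary memory suffixes (`Ki`, `Mi`, `Gi`, `Ti`) are parsed.

```python
from decimal import Decimal

_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_cpu(quantity: str) -> int:
    """Convert a CPU quantity such as "500m", "1" or "0.5" to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(Decimal(quantity) * 1000)


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity with a binary suffix ("256Mi", "4Gi") to bytes.

    Decimal suffixes such as "M" or "G" are not handled in this sketch.
    """
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(Decimal(quantity[: -len(suffix)]) * factor)
    return int(quantity)  # plain number of bytes


def pods_that_fit(alloc_cpu: str, alloc_mem: str, req_cpu: str, req_mem: str, max_pods: int) -> int:
    """How many identical pods fit on an empty node, judged by requests only."""
    cpu_req, mem_req = parse_cpu(req_cpu), parse_memory(req_mem)
    by_cpu = parse_cpu(alloc_cpu) // cpu_req if cpu_req else max_pods
    by_mem = parse_memory(alloc_mem) // mem_req if mem_req else max_pods
    return min(by_cpu, by_mem, max_pods)


# Hypothetical node: 1900m CPU and 5Gi memory allocatable, max pods 30.
print(pods_that_fit("1900m", "5Gi", "500m", "256Mi", max_pods=30))  # 3
```

In this example CPU is the bottleneck: memory alone would allow 20 pods, but 1900m / 500m allows only 3.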