# AKS Monitoring: Container Insights

Monitoring your AKS clusters is vital for improving performance, resource management, security, and cost-efficiency. Azure Monitor is an essential tool that offers a range of features for gaining real-time insights into your clusters. It lets you keep a close watch on the performance and health of your AKS resources, so you can run your containerized applications with confidence.

Through Azure Monitor, you will gain access to:

- **Metrics Explorer**: This feature lets you visualize numeric metric data, create custom charts, analyze trends, and gain valuable insights into your AKS cluster's operation.

- **Container Insights**: Specifically designed for AKS, Container Insights offers a deeper understanding of your containerized workloads. You can explore metrics, logs, and performance data related to containers, nodes, controllers, and more. This in-depth analysis helps you make data-driven decisions.

- **Alerts**: Setting up custom alerts in Azure Monitor ensures you are proactively notified when specific conditions or thresholds are met. This feature enables you to respond promptly to potential issues, ensuring the continuous and uninterrupted operation of your AKS services.

- **Log Analytics**: With Log Analytics, you can delve into log data generated by your AKS cluster. Create log queries to extract specific information for effective troubleshooting and monitoring. This capability is invaluable in identifying and resolving issues rapidly.

In this lesson, we will focus on the capabilities of Container Insights. 

## Container Insights

Enabling Container Insights is a key step in harnessing the full power of Azure Monitor for monitoring containerized workloads. Container Insights provides you with detailed metrics and insights into the performance and health of your AKS clusters. Let's walk through the process of enabling Container Insights.

> The steps below show how to enable Container Insights if you use your own free tier Azure account. For the specialisation project you won't need to enable managed identity on the cluster or assign any additional permissions to your Service Principal. Instead, you will only need to use the **Configure monitoring** button in the **Insights** tab, as explained below.

Before enabling Container Insights, you first need to enable managed identity on the AKS cluster. You can do this with the following command:

```bash
az aks update -g {resource-group-name} -n {aks-cluster-name} --enable-managed-identity
```

Replace `{resource-group-name}` with the name of the resource group where the AKS cluster has been deployed to, and `{aks-cluster-name}` with the name of the cluster.

The `--enable-managed-identity` flag lets the AKS cluster use a Microsoft Entra ID managed identity, instead of a Service Principal, to authenticate with Azure services. The cluster needs managed identity enabled before Container Insights can collect advanced data and metrics from it.

Next, we need to add some permissions to our Service Principal. Navigate to the **Subscriptions** page and select your own subscription. From there, open the **Access control (IAM)** page and use the **+ Add** -> **Add role assignment** button at the top of the page to add permissions:

<p align=center> <img src=images/AddAssignmeents.png width=650 height=250> </p>

You will need to add the following three roles, one by one:

- **Monitoring Metrics Publisher**: Grants permission to publish monitoring metrics to Azure Monitor. This is important for applications and services that need to push metrics to Azure Monitor.
- **Monitoring Contributor**: Grants broad permissions over monitoring in Azure, including reading and writing monitoring settings, accessing monitoring data, and managing monitoring resources.
- **Log Analytics Contributor**: Grants read and write access to Log Analytics workspaces, including permission to query and analyze the log data stored in them.

Assign each of these roles to your Service Principal application on the **Members** page, using the **+ Select members** button. This opens a page where you can search for your Service Principal by name (in this case `myAppMaya`) and select it.

<p align=center> <img src=images/SelectMember.png width=750 height=350> </p>
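
If you prefer the command line, the same three role assignments can be made with `az role assignment create`. The sketch below assumes you are signed in with `az login` and know your Service Principal's application (client) ID; the `run` helper, the `DRY_RUN` switch, and the `assign_monitoring_roles` function are hypothetical names introduced for illustration. Scoping the roles to the whole subscription mirrors the portal steps above.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() { if [[ "${DRY_RUN:-0}" == 1 ]]; then echo "$*"; else "$@"; fi; }

# Assign the three monitoring roles to a Service Principal at subscription scope.
#   $1: the Service Principal's application (client) ID
#   $2: the subscription ID (e.g. from: az account show --query id -o tsv)
assign_monitoring_roles() {
  local sp_app_id="$1" subscription_id="$2" role
  for role in "Monitoring Metrics Publisher" \
              "Monitoring Contributor" \
              "Log Analytics Contributor"; do
    run az role assignment create \
      --assignee "$sp_app_id" \
      --role "$role" \
      --scope "/subscriptions/${subscription_id}"
  done
}
```

For example, `DRY_RUN=1 assign_monitoring_roles <app-id> "$(az account show --query id -o tsv)"` prints the three commands for review; drop `DRY_RUN=1` to actually apply them.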

Once you have assigned all three roles, you are finally ready to enable Container Insights for your cluster.

> Remember, the steps above are not necessary when working with the Azure credentials generated for your specialisation project. In fact, you will not have sufficient permissions to perform them.

Navigate to your chosen AKS cluster's homepage and open the **Insights** tab within the **Monitoring** section.

<p align=center> <img src=images/Insights.png width=850 height=400> </p>

- Click the **Configure monitoring** button to initiate the process. You will be redirected to a configuration page where you can select the additional monitoring tools to enable for this cluster. Select **Enable container logs**.

<p align=center> <img src=images/ConfigureContainerInsights.png width=625 height=500> </p>

- Note the message shown in the screenshot above, stating that a Log Analytics workspace will be created automatically. When you enable Container Insights for your AKS cluster, Azure generates a Log Analytics workspace with a unique name. This workspace is the central repository for collecting, analyzing, and visualizing logs and performance data from your AKS cluster, which simplifies the setup process.

- Click **Configure** to confirm the setup. Azure will then configure Container Insights for your AKS cluster, which may take a few minutes. You can monitor the progress in the Azure Portal.

- Once the configuration is complete, you can explore the rich data and insights provided by Container Insights for your AKS cluster.
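
In your own subscription, the portal flow above also has a command-line counterpart: enabling the AKS `monitoring` add-on, which is what Container Insights runs on. This is a minimal sketch, assuming you have the permissions set up earlier; for the specialisation project, stick with the **Configure monitoring** button. The `run` helper, the `DRY_RUN` switch, and the `enable_container_insights` function are hypothetical names, and `my-rg`/`my-aks` in the usage line are placeholders for your own resource group and cluster.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() { if [[ "${DRY_RUN:-0}" == 1 ]]; then echo "$*"; else "$@"; fi; }

# Enable the monitoring add-on (Container Insights) on an existing AKS cluster.
# Without --workspace-resource-id, Azure uses a default Log Analytics
# workspace, creating one if it does not exist yet.
enable_container_insights() {
  local resource_group="$1" cluster_name="$2"
  run az aks enable-addons \
    --addons monitoring \
    --resource-group "$resource_group" \
    --name "$cluster_name"
  # Print whether the add-on is now reported as enabled ("true").
  run az aks show \
    --resource-group "$resource_group" \
    --name "$cluster_name" \
    --query addonProfiles.omsagent.enabled \
    --output tsv
}
```

Usage: `enable_container_insights my-rg my-aks`. As with the portal, it can take a few minutes before data shows up in the **Insights** tab.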

> Container Insights begins collecting data from your AKS cluster once activated. However, there may be a delay in data visibility, and it won't offer historical data prior to activation. To start viewing data, you'll need to interact with the cluster, for example by redeploying your applications to it.

### Exploring Container Insights

Once you've enabled Container Insights for your AKS cluster, you will gain access to a set of tabs that provide in-depth insights into your containerized workloads. Let's walk through each tab and the information it contains:

#### The **Cluster** Tab

The **Cluster** tab provides an overview of your entire AKS cluster's performance and health. 

<p align=center> <img src=images/ClusterTab.png width=850 height=500> </p>

Key metrics included here are:

- **CPU Usage**: This critical metric measures the CPU utilization of the entire cluster. It provides an overarching view of your cluster's performance. Monitoring CPU usage helps you ensure that the cluster is operating efficiently and that there is no over-utilization or resource starvation.

- **Memory Usage**: By tracking memory consumption across the cluster, you can check that resources are allocated efficiently. Identifying memory-intensive workloads or potential memory leaks is essential for maintaining the health of your AKS cluster.

- **Node Count**: Keeping tabs on the number of nodes in your cluster is essential for understanding its scaling and resource allocation. Anomalies in node count can indicate scaling issues, resource constraints, or even potential underlying infrastructure problems.

- **Pod Count**: This metric displays the total number of pods actively running within your cluster. Monitoring pod count helps you assess the demand on your cluster's resources. It can indicate workload spikes or irregularities in pod management.
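
To sanity-check the node and pod counts shown in the **Cluster** tab, you can compare them with what `kubectl` reports. The sketch below assumes `kubectl` is already connected to the cluster (for example through `az aks get-credentials`); `count_lines` and `cluster_counts` are hypothetical helper names. Small differences are expected, because the portal charts aggregate over a time range while `kubectl` shows the current state.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Count non-empty lines on stdin; prints 0 (and succeeds) when there are none.
count_lines() { grep -c . || true; }

# Print the current node count and the number of pods in the Running phase.
cluster_counts() {
  local nodes running_pods
  nodes="$(kubectl get nodes --no-headers | count_lines)"
  running_pods="$(kubectl get pods --all-namespaces \
    --field-selector=status.phase=Running --no-headers | count_lines)"
  echo "nodes=${nodes} running_pods=${running_pods}"
}
```

Running `cluster_counts` prints a single summary line in the form `nodes=<n> running_pods=<m>`.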

As the screenshot above shows, these charts offer some level of customization. First, Container Insights lets you select a specific time range for your charts, so you can zoom in on recent data or expand your analysis to cover longer periods. Choosing the time range lets you focus on the data most relevant to your monitoring requirements.

<p align=center> <img src=images/TimeRange.png width=375 height=350> </p>

Within Container Insights, you can also customize chart properties in a highly granular way, fine-tuning your visualizations to emphasize specific performance indicators and to match your AKS cluster monitoring requirements.

Key advanced chart customization options include:

- **Metric Aggregation**: You can select from various aggregation options, such as average, minimum, 50th percentile, 90th percentile, 95th percentile, and maximum. This allows you to delve deeper into the nuances of your cluster's performance by choosing the aggregation method that suits your analysis.

- **Percentage Metrics**: For a more detailed view of your AKS cluster's performance, you can analyze metrics as percentages. This helps you see how resources are distributed and how heavily they are used relative to the capacity available.

- **Status Indicators**: Container Insights allows you to incorporate status indicators for various elements, such as pods and nodes. These indicators provide real-time feedback on the health and status of your cluster's components, helping you identify issues or bottlenecks quickly.
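
To make the aggregation options concrete, the sketch below applies the same kinds of aggregation (average, minimum, 50th, 90th and 95th percentile, maximum) to a list of numeric samples, such as CPU readings. The `aggregate` function and the sample values are hypothetical, and the nearest-rank percentile method is an assumption; Azure Monitor may compute percentiles differently.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Summarize numeric samples (one per argument) with several aggregations.
# Percentiles use the nearest-rank method: the value at rank ceil(p * N / 100)
# in the sorted list.
aggregate() {
  (( $# > 0 )) || { echo "usage: aggregate N [N ...]" >&2; return 1; }
  printf '%s\n' "$@" | sort -n | awk '
    { v[NR] = $1; sum += $1 }
    function pct(p,   rank) {
      # Integer ceiling of p * NR / 100, avoiding floating-point rounding.
      rank = int((p * NR + 99) / 100)
      return v[rank]
    }
    END {
      printf "avg=%.1f min=%s p50=%s p90=%s p95=%s max=%s\n",
             sum / NR, v[1], pct(50), pct(90), pct(95), v[NR]
    }'
}
```

With ten made-up samples, `aggregate 7 3 10 1 5 9 2 8 4 6` prints `avg=5.5 min=1 p50=5 p90=9 p95=10 max=10`. Comparing the average with the higher percentiles is a quick way to spot short spikes that the average smooths out.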

Make sure to take some time to understand the nuances of what Container Insights can offer you. This will enable you to maximize your monitoring solutions.

#### The **Reports** Tab

The **Reports** tab in Container Insights offers pre-built reports designed to provide you with in-depth insights into key AKS metrics and trends. These reports offer a quick and accessible way to gain a thorough understanding of the cluster's performance and resource utilization.

<p align=center> <img src=images/Reports.png width=800 height=500> </p>

- The **Node Monitoring** category includes reports on disk capacity, disk IO, and GPU performance. These reports allow you to closely monitor the health and performance of individual nodes within your AKS cluster.

- The **Resource Monitoring** category includes a diverse set of reports focusing on the performance of your application deployments, in-depth insight into your workloads and pods, the performance of the Kubelet service, as well as reports on storage resource management and optimization to enhance your AKS cluster's performance and resource utilization.

- The **Billing** category offers reports on data usage, helping you track and manage the costs associated with your AKS cluster.

- The **Networking** category offers reports on network configuration and performance, helping you ensure that your networking resources are optimized for your workloads.

- The **Security** category offers reports on the security capabilities of your AKS cluster.

Take some time to familiarize yourself with the information available in these reports. You can open the detailed view of each report by double-clicking its name, which takes you to the specific report page. Here's an example of a workload report:

<p align=center> <img src=images/ExampleReport.png width=900 height=500> </p>

By exploring these reports, you can access detailed insights into your AKS cluster's performance, workloads, and resource utilization. This information is invaluable for optimizing your cluster and ensuring the efficient operation of your containerized applications.

#### The **Nodes** Tab

The **Nodes** tab provides a comprehensive list of nodes within your AKS cluster, each accompanied by detailed performance metrics.

<p align=center> <img src=images/NodesTab.png width=800 height=500> </p>

Here's what you can do within this tab:

- Access a list of all nodes in your AKS cluster, with the ability to sort nodes by a metric of your choice, such as CPU usage, memory working set, or memory RSS. You can also select the type of statistical aggregation, including options like minimum (min), maximum (max), 90th percentile (90th), and more for the selected metric.

- Each node serves as the root of a tree structure that you can expand to reveal the processes running on that specific node. This hierarchical view provides insights into the processes contributing to node performance.

- By clicking on selected processes, such as containers, you can access detailed information, including metadata and overview data. Additionally, you have the option to view live logs and live events associated with the selected process, enabling real-time monitoring and troubleshooting.
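
From a terminal, `kubectl top` gives a quick point-in-time counterpart to the per-node and per-container figures in this tab. This sketch assumes the metrics-server that AKS installs by default is running and that `kubectl` points at your cluster; `run`, `DRY_RUN`, and `top_nodes_and_containers` are hypothetical names. Unlike the portal, these values are not aggregated over a time range.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() { if [[ "${DRY_RUN:-0}" == 1 ]]; then echo "$*"; else "$@"; fi; }

# Show current CPU/memory use per node (highest CPU first), then per container
# for the pods in one namespace (defaults to "default").
top_nodes_and_containers() {
  local namespace="${1:-default}"
  run kubectl top nodes --sort-by=cpu
  run kubectl top pods --namespace "$namespace" --containers
}
```

For example, `top_nodes_and_containers web` reports the containers in a hypothetical `web` namespace.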

Let's look at an example of selecting one of the running containers from the node's processes:

<p align=center> <img src=images/NodeContainer.png width=800 height=525> </p>

First, identify a container name in the list of the node's processes and select it. Then, in the **Overview** tab, you can access the container's metadata, which includes essential details such as the container's status, its namespace, and the image it's running.

Now let's look at the **Live Events** tab. To generate some events, run a `kubectl` command (`kubectl scale deployment <deployment-name> --replicas=<desired-replica-count>`) that scales up the number of replicas in the deployment. This action triggers a series of live events:

<p align=center> <img src=images/LiveEvent.png width=450 height=600> </p>

In the image above, the first three live events indicate that three more pods were added to the deployment. The subsequent message explains the cause of these events: a request to increase the replica count from 2 to 5. The events that follow show the image running on the newly created pods. Together, these live events capture the dynamic nature of the AKS cluster and illustrate how it responds to scaling requests.
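
The scaling experiment above can be scripted so it is easy to repeat. This is a minimal sketch: `my-app` is a hypothetical deployment name, and `run`, `DRY_RUN`, and `scale_and_show_events` are hypothetical helpers. Listing events with `kubectl get events` is only a terminal cross-check of the **Live Events** view, not a description of how the portal collects them.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() { if [[ "${DRY_RUN:-0}" == 1 ]]; then echo "$*"; else "$@"; fi; }

# Scale a deployment, then list the events in its namespace sorted by
# creation time, so the newest scaling events appear at the bottom.
scale_and_show_events() {
  local deployment="$1" replicas="$2" namespace="${3:-default}"
  run kubectl scale deployment "$deployment" \
    --replicas="$replicas" --namespace "$namespace"
  run kubectl get events --namespace "$namespace" \
    --sort-by=.metadata.creationTimestamp
}
```

For example, `scale_and_show_events my-app 5` reproduces the 2-to-5 replica change described above.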

#### The **Controllers** Tab

In the **Controllers** tab, you can explore various resource controllers and associated performance metrics. This tab provides insights into the controllers responsible for managing your containerized workloads within the AKS cluster, such as:

- **Controller Types**: Discover the types of controllers in use within your AKS cluster, each with its own set of performance metrics. These controllers, such as Deployments and ReplicaSets, are essential for orchestrating and maintaining your applications.

- **Performance Metrics**: Gain insights into the performance metrics specific to each controller type. Monitoring these metrics enables you to assess the performance and resource allocation of your applications and controllers.
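
To cross-check the **Controllers** tab from the command line, you can list the Deployments and ReplicaSets in a namespace and see which controller owns each pod. The sketch assumes `kubectl` access to the cluster; `run`, `DRY_RUN`, `list_controllers`, and the `web` namespace used in the example are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() { if [[ "${DRY_RUN:-0}" == 1 ]]; then echo "$*"; else "$@"; fi; }

list_controllers() {
  local namespace="${1:-default}"
  # Deployments and the ReplicaSets they manage, with images and selectors.
  run kubectl get deployments,replicasets --namespace "$namespace" -o wide
  # The controller that owns each pod (first ownerReference only).
  run kubectl get pods --namespace "$namespace" \
    -o 'custom-columns=POD:.metadata.name,OWNER_KIND:.metadata.ownerReferences[0].kind,OWNER:.metadata.ownerReferences[0].name'
}
```

For pods created by a Deployment, the owner column shows a ReplicaSet, which mirrors the Deployment-to-ReplicaSet-to-pod hierarchy in the portal.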

#### The **Containers** Tab

The **Containers** tab offers a straightforward list of containers running within your AKS cluster, presenting each container on a separate row. This layout allows you to quickly identify and access information about individual containers. By clicking on a specific container, you can get detailed information about that container, including metadata and performance data. This level of insight is valuable for understanding how your containers operate within the cluster and for troubleshooting container-related issues.
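
A command-line view similar to the **Containers** tab can be sketched with `kubectl`: one line per pod with its namespace, name, and container names, plus a way to fetch recent logs from a single container. The function names, the `run`/`DRY_RUN` helper, and example values such as `web`, `web-7d9`, and `nginx` are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() { if [[ "${DRY_RUN:-0}" == 1 ]]; then echo "$*"; else "$@"; fi; }

# One tab-separated line per pod: namespace, pod name, container names.
list_containers() {
  run kubectl get pods --all-namespaces \
    -o jsonpath='{range .items[*]}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.containers[*].name}{"\n"}{end}'
}

# Last 50 log lines from one container in one pod.
container_logs() {
  local namespace="$1" pod="$2" container="$3"
  run kubectl logs "$pod" --container "$container" \
    --namespace "$namespace" --tail=50
}
```

For example, `container_logs web web-7d9 nginx` shows the latest output of the `nginx` container in pod `web-7d9`.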

## Hands-On: Showcasing Metrics for Your AKS Cluster

In this hands-on, we will explore and showcase specific metrics that are critical for monitoring the performance and health of your AKS cluster. We will focus on the standard metrics that Azure Monitor collects for AKS clusters and chart them in Metrics Explorer.

### Step 1: Access Metrics Explorer

Begin by opening Metrics Explorer for your AKS cluster. You can find it under **Metrics** in the **Monitoring** section of the cluster's menu.

### Step 2: Create Charts for Tracking Metrics

We will create three different charts to monitor specific node metrics within the AKS cluster. Let's begin with the first chart:

**Chart 1: CPU Usage Percentage**

- To monitor the CPU usage percentage, select the **Container service (managed) standard metrics** as the **Metric Namespace**

- Choose **CPU Usage Percentage** as the metric for tracking CPU usage

- Rename this graph to reflect the metric it includes. You can do this by clicking on the pencil icon at the top of the graph and entering the desired name. For example, here we named the chart **Average CPU Usage (%)**.

- Define the appropriate time range. In this case, we selected the last 4 hours, as this time frame corresponds to when we interacted with the cluster, and therefore we expect to see activity in this time frame.

<p align=center> <img src=images/SetTime.png width=450 height=350> </p>

- To access this chart later, save it to a dashboard. Click **Save to dashboard** and then select **Pin to dashboard**. Create a new dashboard where you will store all the charts related to your AKS cluster.

<p align=center> <img src=images/CreateNewDashboard.png width=300 height=575> </p>

The graph should look similar to this:

<p align=center> <img src=images/AverageCPUUsage.png width=800 height=450> </p>

The actual data values will differ from cluster to cluster.
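
The same data can be pulled from the Azure CLI, which is handy for scripting or quick checks. This sketch assumes that `node_cpu_usage_percentage` is the platform metric name behind the portal's **CPU Usage Percentage** entry; `run`, `DRY_RUN`, and `cpu_usage_last_4h` are hypothetical names. The `--offset 4h` option mirrors the last-4-hours time range chosen above, and `--interval 5m` sets the aggregation granularity.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() { if [[ "${DRY_RUN:-0}" == 1 ]]; then echo "$*"; else "$@"; fi; }

# Average node CPU usage (%) over the last 4 hours, in 5-minute buckets.
#   $1: the cluster's full resource ID
cpu_usage_last_4h() {
  local cluster_id="$1"
  run az monitor metrics list \
    --resource "$cluster_id" \
    --metrics node_cpu_usage_percentage \
    --aggregation Average \
    --interval 5m \
    --offset 4h
}
```

Usage: `cpu_usage_last_4h "$(az aks show -g my-rg -n my-aks --query id -o tsv)"`, where `my-rg` and `my-aks` are placeholders for your resource group and cluster.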

Repeat these steps to create the remaining two charts, each focused on a specific metric. Make sure to add both of them to the same dashboard.

**Chart 2: Used Disk Percentage**

Click on the **New chart** button to create the following chart:

- To monitor the average used disk percentage, select the **Container service (managed) standard metrics** as the **Metric Namespace**

- Choose **Disk Used Percentage** as the metric 

- Rename the graph accordingly, e.g. **Used Disk Percentage**

- Save it to the same dashboard where you store your AKS cluster-related graphs

<p align=center> <img src=images/UsedDiskPercentage.png width=800 height=350> </p>


**Chart 3: Network In/Out Bytes**

Click on the **New chart** button to create the following chart:

- To monitor the network bytes received and sent, select the **Container service (managed) standard metrics** as the **Metric Namespace**

- Choose **Network In Bytes** as the metric 

- Click on the **Add metric** button to add a second metric to this graph. From the same namespace select **Network Out Bytes** as the second metric.

- Rename the graph accordingly, e.g. **Network In/Out Bytes**

- Save it to the same dashboard where you store your AKS cluster-related graphs
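
Charts 2 and 3 can also be queried from the Azure CLI. The metric names `node_disk_usage_percentage`, `node_network_in_bytes`, and `node_network_out_bytes` are assumed to be the platform names behind the portal's display names; `run`, `DRY_RUN`, and `disk_and_network_last_4h` are hypothetical. Passing two names to `--metrics` returns both series in one response, much like adding a second metric to a chart.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the command instead of running it when DRY_RUN=1.
run() { if [[ "${DRY_RUN:-0}" == 1 ]]; then echo "$*"; else "$@"; fi; }

# Last 4 hours of disk and network metrics, in 5-minute buckets.
#   $1: the cluster's full resource ID
disk_and_network_last_4h() {
  local cluster_id="$1"
  # Chart 2: used disk percentage on the nodes.
  run az monitor metrics list --resource "$cluster_id" \
    --metrics node_disk_usage_percentage \
    --aggregation Average --interval 5m --offset 4h
  # Chart 3: inbound and outbound network bytes in a single query.
  run az monitor metrics list --resource "$cluster_id" \
    --metrics node_network_in_bytes node_network_out_bytes \
    --aggregation Average --interval 5m --offset 4h
}
```

Usage: `disk_and_network_last_4h "$(az aks show -g my-rg -n my-aks --query id -o tsv)"`, with `my-rg` and `my-aks` as placeholders for your own names.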


### Step 3: Access Your Custom Dashboard

Anytime you want to check these metrics, access the custom dashboard you've created. You can access this dashboard by navigating to the **Dashboard** page in the Azure Portal.

<p align=center> <img src=images/DashboardTab.png width=850 height=400> </p>

Here from the dropdown menu you can select the name of the dashboard you want to display.

<p align=center> <img src=images/AKSDashboard.png width=800 height=500> </p>

This custom dashboard will display all the saved graphs, providing you with a comprehensive view of your AKS cluster's performance and health.

> You can modify the layout of the dashboard by clicking the **Edit** button and then moving the charts to their desired positions.

You can also change the **UTC Time** setting to display the charts for your desired time frame; by default, the dashboard shows the last 24 hours of data.

## Key Takeaways

- Monitoring AKS clusters is essential for optimizing performance, managing resources, enhancing security, and ensuring cost-effectiveness in containerized applications
- Azure Monitor is an important tool for tracking AKS resources' performance and health, offering real-time insights
- Container Insights, designed for AKS, provides real-time metrics, logs, and performance data for containers, nodes, and controllers, aiding data-driven decisions
- Azure Monitor's Metrics Explorer enables you to visualize data, create custom charts, and analyze trends, helping track performance and resource usage
- Custom dashboards allow you to consolidate and view saved graphs, providing a comprehensive understanding of your AKS cluster's performance for informed decision-making