prom & grafana with terraform #123

Merged
merged 2 commits into from
Mar 14, 2024
33 changes: 33 additions & 0 deletions AKS-Landing-Zone-Accelerator.sln
@@ -0,0 +1,33 @@

Microsoft Visual Studio Solution File, Format Version 12.00
# Visual Studio Version 17
VisualStudioVersion = 17.5.002.0
MinimumVisualStudioVersion = 10.0.40219.1
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Scenarios", "Scenarios", "{9733CD30-401E-4C44-AFB8-1DEA545B5DAE}"
EndProject
Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Testing-Scalability", "Testing-Scalability", "{6493926B-A2F9-49BD-96FE-35AA519863EB}"
EndProject
Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "SimpleApi", "Scenarios\Testing-Scalability\dotnet\SimpleApi.csproj", "{8810C871-4FAC-405F-8BA4-2D7C11B2FBD8}"
EndProject
Global
GlobalSection(SolutionConfigurationPlatforms) = preSolution
Debug|Any CPU = Debug|Any CPU
Release|Any CPU = Release|Any CPU
EndGlobalSection
GlobalSection(ProjectConfigurationPlatforms) = postSolution
{8810C871-4FAC-405F-8BA4-2D7C11B2FBD8}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
{8810C871-4FAC-405F-8BA4-2D7C11B2FBD8}.Debug|Any CPU.Build.0 = Debug|Any CPU
{8810C871-4FAC-405F-8BA4-2D7C11B2FBD8}.Release|Any CPU.ActiveCfg = Release|Any CPU
{8810C871-4FAC-405F-8BA4-2D7C11B2FBD8}.Release|Any CPU.Build.0 = Release|Any CPU
EndGlobalSection
GlobalSection(SolutionProperties) = preSolution
HideSolutionNode = FALSE
EndGlobalSection
GlobalSection(NestedProjects) = preSolution
{6493926B-A2F9-49BD-96FE-35AA519863EB} = {9733CD30-401E-4C44-AFB8-1DEA545B5DAE}
{8810C871-4FAC-405F-8BA4-2D7C11B2FBD8} = {6493926B-A2F9-49BD-96FE-35AA519863EB}
EndGlobalSection
GlobalSection(ExtensibilityGlobals) = postSolution
SolutionGuid = {1624E49C-0FCF-4216-94EE-B47601C28D73}
EndGlobalSection
EndGlobal
128 changes: 18 additions & 110 deletions Scenarios/Prometheus & Grafana/README.md
@@ -1,132 +1,40 @@
# Enable Prometheus metric collection & integration with Azure Managed Grafana
# Private Azure Grafana, Prometheus and Log Analytics with AKS

## Introduction

- This guidance helps in configuring your Azure Kubernetes Service (AKS) cluster to send data to Azure Monitor managed service for Prometheus.
With AKS, you can use `Azure Monitor Workspace for Prometheus` and `Azure Managed Grafana` to collect, query, and visualize metrics from the cluster.
To collect logs, you can use `Azure Log Analytics`.

- This guidance also helps you create an Azure Managed Grafana workspace and link it to the Azure Monitor workspace.
This lab will provide an implementation for monitoring and logging.

## Architecture

## Prerequisites to create Azure Monitor workspace
![](media/architecture.png)

- The cluster must use [managed identity authentication](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/azure-monitor-workspace-overview).
- The following resource providers must be registered in the subscription of the AKS cluster and the Azure Monitor Workspace.
- Microsoft.ContainerService
- Microsoft.Insights
- Microsoft.AlertsManagement
- Register the `AKS-PrometheusAddonPreview` feature flag in the AKS cluster's subscription with the following Azure CLI command: `az feature register --namespace Microsoft.ContainerService --name AKS-PrometheusAddonPreview`.
- Install the aks-preview extension with the command `az extension add --name aks-preview`.
- Azure CLI version 2.41.0 or higher and aks-preview version 0.5.122 or higher are required for this feature. You can check the aks-preview version with the `az version` command.


> Important: Azure Monitor managed service for Prometheus is intended for storing information about the service health of customer machines and applications. It is not intended for storing any data classified as Personally Identifiable Information (PII) or End User Identifiable Information. We strongly recommend that you do not send any sensitive information (usernames, credit card numbers, etc.) into Azure Monitor managed service for Prometheus fields such as metric names, label names, or label values.
For more details, refer to https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/prometheus-metrics-overview

## Enable Prometheus metric collection

> Log in to the Azure CLI

```bash
az login
```

> Update Subscription

```bash
az account set --subscription "subscription-id"
```

> Register Feature

```bash
az feature register --namespace Microsoft.ContainerService --name AKS-PrometheusAddonPreview
```

> Add preview-extension

```bash
az extension add --name aks-preview
```

> Create a new default Azure Monitor workspace. If no Azure Monitor workspace is specified, a default one is created in the resource group DefaultRG-<cluster_region>, named in the format DefaultAzureMonitorWorkspace-<mapped_region>. This Azure Monitor workspace is placed in the region specified in Region mappings.

```bash
az aks update --enable-azuremonitormetrics -n <cluster-name> -g <cluster-resource-group>
```

OR

> Use an existing Azure Monitor workspace. If the Azure Monitor workspace is linked to one or more Grafana workspaces, then the data will be available in Grafana.

```bash
az aks update --enable-azuremonitormetrics -n <cluster-name> -g <cluster-resource-group> --azure-monitor-workspace-resource-id <workspace-name-resource-id>
```

## Add Azure Managed Grafana service

> Prerequisites
- Azure Subscription
- Minimum required role to create an instance: resource group Contributor (Owner role is recommended since it's needed to assign users or groups to built-in Grafana roles)
- Minimum required role to access an instance: Grafana Viewer

> Implementation

1. Create an Azure Managed Grafana workspace

```bash
az grafana create --name <managed-grafana-resource-name> --resource-group <resourcegroupname>
```

**Note:** Azure Managed Grafana workspaces are available only in specific regions. Choose an appropriate region before deployment.

Now let’s check whether you can access your new Managed Grafana instance. Take note of the endpoint URL ending in grafana.azure.com, as displayed in the CLI output. Open a browser and navigate to this URL. If you have the right permissions, you will see the Grafana application homepage.

![Grafana Dashboard](https://user-images.githubusercontent.com/50182145/215081171-da0d9b79-a3ec-4408-9fad-3eadc2e1a0d5.png)

For more information on this, check out the documentation on [Create an Azure Managed Grafana instance using the Azure CLI](https://learn.microsoft.com/en-us/azure/managed-grafana/quickstart-managed-grafana-cli)

**Note:** Azure Managed Grafana does not currently support connecting with personal Microsoft accounts. For additional information, refer to https://learn.microsoft.com/en-us/azure/managed-grafana/quickstart-managed-grafana-cli.

## Connect Grafana and Prometheus managed services

[Azure Managed Grafana](https://learn.microsoft.com/en-us/azure/managed-grafana/overview) provides rich visualization of Prometheus data. It's designed to work seamlessly with Azure Monitor managed service for Prometheus. Connect your Managed Grafana instance to your Azure Monitor workspace by following the instructions in [Connect your Azure Monitor workspace to a Grafana workspace](https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/azure-monitor-workspace-manage?tabs=azure-portal#link-a-grafana-workspace).

> Below are the steps to complete this:

- Open the Azure Monitor workspace menu in the Azure portal
- Select your workspace
- Click "Linked Grafana workspaces"
- Select a Grafana workspace

After setting this up, you can access multiple prebuilt dashboards with Prometheus metrics and customize these dashboards and/or create new ones.

## Deploying Grafana and Monitor Workspace for Prometheus using Terraform

Azure Monitor Workspace for Prometheus is a new service (in preview).
It is not yet supported with ARM template or with Terraform resource.

So, we'll use (`azapi`) terraform provider to create the Monitor Workspace for Prometheus.

And we'll use a `local-exec` to run a command line to configure AKS with Prometheus.

AKS, Grafana and Log Analytics are supported with ARM templates and Terraform.

### Deploying the resources using Terraform
## Deploying the resources using Terraform

To deploy the Terraform configuration files, run the following commands:

```shell
```sh
terraform init

terraform plan -out tfplan

terraform apply tfplan
```

### Cleanup resources
The following resources will be created.

![](images/resources.png)

## Cleanup resources

To delete the created resources, run the following command:

```shell
```sh
terraform destroy
```

## More readings

https://learn.microsoft.com/en-us/azure/azure-monitor/essentials/azure-monitor-workspace-manage?tabs=azure-portal
30 changes: 21 additions & 9 deletions Scenarios/Prometheus & Grafana/Terraform/aks.tf
@@ -1,15 +1,22 @@
# aks cluster
resource "azurerm_kubernetes_cluster" "aks" {
name = var.aks_name
location = azurerm_resource_group.rg_aks_cluster.location
resource_group_name = azurerm_resource_group.rg_aks_cluster.name
name = "aks-cluster"
location = azurerm_resource_group.rg.location
resource_group_name = azurerm_resource_group.rg.name
dns_prefix = "aks"
kubernetes_version = "1.25.5"
kubernetes_version = "1.29.0"

network_profile {
network_plugin = "azure"
network_plugin_mode = "overlay"
ebpf_data_plane = "cilium"
outbound_type = "loadBalancer"
}

default_node_pool {
name = "default"
node_count = "3"
vm_size = "Standard_DS2_v2"
name = "systempool"
node_count = 3
vm_size = "standard_b2als_v2"
vnet_subnet_id = azurerm_subnet.snet-aks.id
}

identity {
@@ -21,9 +28,14 @@ resource "azurerm_kubernetes_cluster" "aks" {
msi_auth_for_monitoring_enabled = true
}

monitor_metrics {
annotations_allowed = null
labels_allowed = null
}

lifecycle {
ignore_changes = [
monitor_metrics
default_node_pool.0.upgrade_settings,
]
}
}
22 changes: 0 additions & 22 deletions Scenarios/Prometheus & Grafana/Terraform/enable_prometheus.tf

This file was deleted.
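
With `enable_prometheus.tf` removed, the `monitor_metrics` block in `aks.tf` turns on the managed Prometheus add-on. The wiring that sends those metrics to the Azure Monitor workspace is not visible in this part of the diff. The following is a minimal sketch of how that link is commonly expressed with `azurerm` resources, not necessarily what this PR does. The resource labels `dce_prometheus`, `dcr_prometheus` and `dcra_prometheus_aks`, and all `name` values, are hypothetical. `azurerm_monitor_workspace.prometheus`, `azurerm_resource_group.rg` and `azurerm_kubernetes_cluster.aks` are the references already used in `grafana.tf` and `aks.tf`.

```hcl
# Hypothetical sketch: route Prometheus metrics from AKS to the Azure Monitor workspace.
# Resource labels and names below are assumptions, not taken from this PR.

resource "azurerm_monitor_data_collection_endpoint" "dce_prometheus" {
  name                = "dce-prometheus"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  kind                = "Linux"
}

resource "azurerm_monitor_data_collection_rule" "dcr_prometheus" {
  name                        = "dcr-prometheus"
  resource_group_name         = azurerm_resource_group.rg.name
  location                    = azurerm_resource_group.rg.location
  data_collection_endpoint_id = azurerm_monitor_data_collection_endpoint.dce_prometheus.id
  kind                        = "Linux"

  # Source: metrics scraped by the managed Prometheus add-on
  data_sources {
    prometheus_forwarder {
      name    = "PrometheusDataSource"
      streams = ["Microsoft-PrometheusMetrics"]
    }
  }

  # Destination: the Azure Monitor workspace referenced in grafana.tf
  destinations {
    monitor_account {
      monitor_account_id = azurerm_monitor_workspace.prometheus.id
      name               = "MonitoringAccount"
    }
  }

  data_flow {
    streams      = ["Microsoft-PrometheusMetrics"]
    destinations = ["MonitoringAccount"]
  }
}

# Attach the rule to the AKS cluster
resource "azurerm_monitor_data_collection_rule_association" "dcra_prometheus_aks" {
  name                    = "dcra-prometheus-aks"
  target_resource_id      = azurerm_kubernetes_cluster.aks.id
  data_collection_rule_id = azurerm_monitor_data_collection_rule.dcr_prometheus.id
}
```

The structure mirrors the Log Analytics rule in `log_analytics-dcr.tf`: one data source, one destination, and a `data_flow` that joins them by stream and destination name.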

24 changes: 9 additions & 15 deletions Scenarios/Prometheus & Grafana/Terraform/grafana.tf
@@ -1,47 +1,41 @@
resource "azurerm_dashboard_grafana" "grafana" {
name = var.grafana_name
resource_group_name = azurerm_resource_group.rg_monitoring.name
location = azurerm_resource_group.rg_monitoring.location
name = "azure-grafana-${var.prefix}"
resource_group_name = azurerm_resource_group.rg.name
location = azurerm_resource_group.rg.location
sku = "Standard"
grafana_major_version = "10"
zone_redundancy_enabled = false
api_key_enabled = true
deterministic_outbound_ip_enabled = true
public_network_access_enabled = true
sku = "Standard"
zone_redundancy_enabled = true

azure_monitor_workspace_integrations {
resource_id = azapi_resource.prometheus.id
resource_id = azurerm_monitor_workspace.prometheus.id
}

identity {
type = "SystemAssigned" # The only possible value is SystemAssigned
type = "SystemAssigned"
}
}

data "azurerm_client_config" "current" {}

# assign current user as Grafana Admin
resource "azurerm_role_assignment" "role_grafana_admin" {
scope = azurerm_dashboard_grafana.grafana.id
role_definition_name = "Grafana Admin"
principal_id = data.azurerm_client_config.current.object_id
}

resource "azurerm_role_assignment" "role_monitoring_data_reader" {
scope = azapi_resource.prometheus.id
scope = azurerm_monitor_workspace.prometheus.id
role_definition_name = "Monitoring Data Reader"
principal_id = azurerm_dashboard_grafana.grafana.identity.0.principal_id
}

data "azurerm_subscription" "current" {}

# https://learn.microsoft.com/en-us/azure/azure-monitor/visualize/grafana-plugin
# (Optional) Grafana to monitor all Azure resources
resource "azurerm_role_assignment" "role_monitoring_reader" {
scope = data.azurerm_subscription.current.id
role_definition_name = "Monitoring Reader"
principal_id = azurerm_dashboard_grafana.grafana.identity.0.principal_id
}

output "garafana_endpoint" {
value = azurerm_dashboard_grafana.grafana.endpoint
}
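
`grafana.tf` now references `azurerm_monitor_workspace.prometheus` in place of the earlier `azapi_resource.prometheus`, but the definition of that resource is not part of the files shown here. A minimal sketch of what such a definition could look like follows. The `name` value is an assumption for illustration, and the resource group reference reuses `azurerm_resource_group.rg` from the other files.

```hcl
# Hypothetical sketch of the Azure Monitor workspace referenced by grafana.tf.
# The name is an assumption; the actual definition is not shown in this diff.
resource "azurerm_monitor_workspace" "prometheus" {
  name                = "monitor-workspace-prometheus"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
}
```

Using the native `azurerm_monitor_workspace` resource removes the need for the `azapi` provider that the previous README described as a workaround.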
15 changes: 15 additions & 0 deletions Scenarios/Prometheus & Grafana/Terraform/log_analytics-dce.tf
@@ -0,0 +1,15 @@
# not required
resource "azurerm_monitor_data_collection_endpoint" "dce-log-analytics" {
name = "dce-log-analytics"
resource_group_name = azurerm_resource_group.rg.name
location = azurerm_resource_group.rg.location
public_network_access_enabled = true
kind = "Linux"
}

# # not required
# resource "azurerm_monitor_data_collection_rule_association" "dcra-dce-log-analytics-aks" {
# name = "configurationAccessEndpoint" # name is required when data_collection_rule_id is specified. And when data_collection_endpoint_id is specified, the name is populated with configurationAccessEndpoint
# target_resource_id = azurerm_kubernetes_cluster.aks.id
# data_collection_endpoint_id = azurerm_monitor_data_collection_endpoint.dce-log-analytics.id
# }
57 changes: 57 additions & 0 deletions Scenarios/Prometheus & Grafana/Terraform/log_analytics-dcr.tf
@@ -0,0 +1,57 @@
resource "azurerm_monitor_data_collection_rule" "dcr-log-analytics" {
name = "dcr-log-analytics"
resource_group_name = azurerm_resource_group.rg.name
location = azurerm_resource_group.rg.location
data_collection_endpoint_id = azurerm_monitor_data_collection_endpoint.dce-log-analytics.id
kind = "Linux"
depends_on = [time_sleep.wait_60_seconds]

destinations {
log_analytics {
name = "log-analytics"
workspace_resource_id = azurerm_log_analytics_workspace.workspace.id
}
}

data_flow {
streams = ["Microsoft-ContainerInsights-Group-Default", "Microsoft-Syslog"]
destinations = ["log-analytics"]
}

data_sources {
syslog {
name = "syslog-data-source"
facility_names = ["*"] # ["auth", "authpriv", "cron", "daemon", "mark", "kern", "local0", "local1", "local2", "local3", "local4", "local5", "local6", "local7", "lpr", "mail", "news", "syslog", "user", "uucp"]
log_levels = ["Debug", "Info", "Notice", "Warning", "Error", "Critical", "Alert", "Emergency", ]
streams = ["Microsoft-Syslog"]
}
extension {
extension_name = "ContainerInsights"
name = "ContainerInsightsExtension"
streams = ["Microsoft-ContainerInsights-Group-Default"]
extension_json = jsonencode(
{
dataCollectionSettings = {
enableContainerLogV2 = true
interval = "1m"
namespaceFilteringMode = "Include" # "Exclude" "Off"
namespaces = ["kube-system", "default"]
}
}
)
}
}
}

resource "azurerm_monitor_data_collection_rule_association" "dcra-dcr-log-analytics-aks" {
name = "dcra-dcr-log-analytics-aks"
target_resource_id = azurerm_kubernetes_cluster.aks.id
data_collection_rule_id = azurerm_monitor_data_collection_rule.dcr-log-analytics.id
}

# DCR creation should be started about 60 seconds after the Log Analytics workspace is created
# This is a workaround, could be fixed in the future
resource "time_sleep" "wait_60_seconds" {
create_duration = "60s"
depends_on = [azurerm_log_analytics_workspace.workspace]
}
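
The data collection rule above depends on two things that are defined outside this file: the `azurerm_log_analytics_workspace.workspace` resource and the `time` provider that supplies `time_sleep`. The sketch below shows one plausible shape for both. The workspace `name`, `sku` and `retention_in_days` values are assumptions, not values taken from this PR, and the provider block omits version constraints on purpose.

```hcl
# Hypothetical sketch: provider requirements for the resources used in this scenario.
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
    }
    # Needed for the time_sleep workaround in log_analytics-dcr.tf
    time = {
      source = "hashicorp/time"
    }
  }
}

# Hypothetical sketch of the Log Analytics workspace used as the DCR destination.
# name, sku and retention_in_days are illustrative assumptions.
resource "azurerm_log_analytics_workspace" "workspace" {
  name                = "log-analytics-workspace"
  resource_group_name = azurerm_resource_group.rg.name
  location            = azurerm_resource_group.rg.location
  sku                 = "PerGB2018"
  retention_in_days   = 30
}
```

Because `time_sleep.wait_60_seconds` declares `depends_on` on the workspace, and the rule declares `depends_on` on the sleep, Terraform creates the workspace first, waits 60 seconds, and only then creates the data collection rule.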