# AKS Cookbook

## 🧪 Flyte on AKS to build and deploy data and machine learning pipelines

![visual](visual.png)

[Flyte](https://docs.flyte.org/en/latest/introduction.html) is an open-source workflow orchestrator that unifies machine learning, data engineering, and data analytics stacks to help you build robust and reliable applications. When using Flyte as a Kubernetes-native workflow automation tool, you can focus on experimentation and providing business value without increasing your scope to infrastructure and resource management. Keep in mind that Flyte isn't officially supported by Microsoft, so use it at your own discretion.  
This lab is based on the official [AKS documentation](https://learn.microsoft.com/en-us/azure/aks/use-flyte).

▶️ Click on the `Run All` button to execute all the subsequent steps in sequence, or run each step individually by executing the cells one at a time.

### TOC

- [0️⃣ Initialize notebook variables](#0)
- [1️⃣ Verify the Azure CLI and connected Azure subscription](#1)
- [2️⃣ Create the resource group and storage account for the Terraform state and prepare the deployment](#2)
- [3️⃣ Initialize Terraform and create the deployment plan](#3)
- [4️⃣ Apply the Terraform plan to deploy the resources](#4)
- [5️⃣ Get deployment outputs](#5)
- [6️⃣ Connect to the AKS cluster](#6)
- [7️⃣ Update the DNS label on the load balancer public IP](#7)
- [8️⃣ Register and run a workflow on the AKS Flyte cluster](#8)
- [🗑️ Clean up resources](#clean)


<a id='0'></a>
### 0️⃣ Initialize notebook variables
Adjust the location parameters according to your preferences and the [product availability by Azure region](https://azure.microsoft.com/en-us/explore/global-infrastructure/products-by-region/?cdn=disable).

In [None]:
import os, time, json, utils

notebook_path = os.path.dirname(globals()['__vsc_ipynb_file__'])

deployment_name = os.path.basename(notebook_path)
resource_group_name = f"lab-{deployment_name}" # change the name to match your naming convention
resource_group_location = "eastus2"
aks_resource_name = "aks-cluster"
aks_node_count = 1
aks_dns_name_prefix = f"aks-{deployment_name}"

aks_dns_label = "flyte22"

flyte_tenant = "flyte"
flyte_environment = "lab"
flyte_project_name = "myproject"
flyte_deploy_repo = "https://github.com/unionai-oss/deploy-flyte/"
aks_admin_role_id = 'b1ff04bb-8a4e-4dc4-8eb5-8693973ce19b' # Azure Kubernetes Service RBAC Cluster Admin: https://learn.microsoft.com/en-us/azure/role-based-access-control/built-in-roles/containers#azure-kubernetes-service-rbac-cluster-admin
tfstate_resource_group_name = f"lab-{deployment_name}-tfstate" # change the name to match your naming convention
tfstate_storage_account_name="aksflytetfstate"
tfstate_storage_account_container_name="tfstate"
utils.print_ok('Notebook initialized')
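
Storage account names must be globally unique and may contain only 3 to 24 lowercase letters and digits, so a changed `tfstate_storage_account_name` is worth checking before any Azure call is made. The following is a minimal sketch of such a check; the helper `is_valid_storage_account_name` is hypothetical and not part of `utils`, and it cannot verify global uniqueness.

```python
import re

# Azure storage account names: 3-24 characters, lowercase letters and digits only.
_STORAGE_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    """Return True if `name` satisfies the length and character rules (hypothetical helper)."""
    return _STORAGE_ACCOUNT_NAME.fullmatch(name) is not None


# The value used in this notebook passes; a name with uppercase letters or hyphens does not.
print(is_valid_storage_account_name("aksflytetfstate"))
print(is_valid_storage_account_name("AKS-flyte-tfstate"))
```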


<a id='1'></a>
### 1️⃣ Verify the Azure CLI and connected Azure subscription
The following commands verify that the Azure CLI is connected to your Azure subscription, register the required resource providers, and install or update the relevant Azure CLI extensions.

In [None]:
output = utils.run("az account show", "Retrieved az account", "Failed to get the current az account")
if output.success and output.json_data:
    current_user = output.json_data.get('user').get('name')
    subscription_id = output.json_data.get('id')
    tenant_id = output.json_data.get('tenantId')

output = utils.run("az ad signed-in-user show", "Retrieved signed-in-user", "Failed to get signed-in-user")
if output.success and output.json_data:
    signed_in_user_id = output.json_data['id']
    utils.print_info(f"Signed-in User Id: {signed_in_user_id}")

output = utils.run("az provider register --namespace Microsoft.ContainerService --wait", "Microsoft.ContainerService registered in your subscription", "Failed to register Microsoft.ContainerService")
output = utils.run("az provider register --namespace Microsoft.KubernetesConfiguration --wait", "Microsoft.KubernetesConfiguration registered in your subscription", "Failed to register Microsoft.KubernetesConfiguration")
output = utils.run("az extension add --name k8s-extension", "az k8s-extension installed", "Failed to install az k8s-extension")
output = utils.run("az extension update --name k8s-extension", "az k8s-extension updated", "Failed to update az k8s-extension")
output = utils.run("az extension add --name aks-preview", "az aks-preview extension installed", "Failed to install az aks-preview extension")
output = utils.run("az extension update --name aks-preview", "az aks-preview extension updated", "Failed to update az aks-preview extension")
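
Later cells rely on `subscription_id` and `tenant_id`, which are only set when `az account show` succeeds. As a sketch of a fail-fast alternative, the snippet below extracts the same fields from a made-up sample payload and raises if any is missing. The function `extract_account_fields` and the sample values are assumptions for illustration only.

```python
import json

# Trimmed, made-up sample of the JSON printed by `az account show`.
sample = """
{
  "id": "00000000-0000-0000-0000-000000000000",
  "tenantId": "11111111-1111-1111-1111-111111111111",
  "user": {"name": "user@example.com", "type": "user"}
}
"""


def extract_account_fields(account: dict) -> dict:
    """Pick the fields used by this notebook, tolerating a missing `user` object."""
    user = account.get("user") or {}
    return {
        "current_user": user.get("name"),
        "subscription_id": account.get("id"),
        "tenant_id": account.get("tenantId"),
    }


fields = extract_account_fields(json.loads(sample))
missing = [key for key, value in fields.items() if not value]
if missing:
    # Stop here rather than failing later with an undefined variable.
    raise ValueError(f"az account show did not return: {', '.join(missing)}")
print(fields["subscription_id"])
```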


<a id='2'></a>
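### 2️⃣ Create the resource group and storage account for the Terraform state and prepare the deployment

This step creates a dedicated resource group and storage account to hold the Terraform state, clones the `deploy-flyte` repository, and generates the `backend.tfvars`, `main.tfvars`, and `locals.tf` files. The lab resources themselves are deployed by Terraform into the resource group named in `resource_group_name`.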

In [None]:
import shutil

utils.create_resource_group(True, tfstate_resource_group_name, resource_group_location)

output = utils.run(f"az storage account create --name {tfstate_storage_account_name} --resource-group {tfstate_resource_group_name} --location {resource_group_location} --sku Standard_RAGRS --kind StorageV2", "Storage account created", "Failed to create storage account")

output = utils.run(f"az storage container create --name {tfstate_storage_account_container_name}  --account-name {tfstate_storage_account_name} --resource-group {tfstate_resource_group_name}", "Storage container created", "Failed to create storage container")

output = utils.run(f"git clone {flyte_deploy_repo} .temp/deploy-flyte", "Cloned deploy-flyte repo", "Failed to clone deploy-flyte repo")

shutil.copyfile(f"{notebook_path}/resource_group.tf", f"{notebook_path}/.temp/deploy-flyte/environments/azure/flyte-core/resource_group.tf")
shutil.copyfile(f"{notebook_path}/variables.tf", f"{notebook_path}/.temp/deploy-flyte/environments/azure/flyte-core/variables.tf")

backend_tfvars_content = f'''
resource_group_name  = "{tfstate_resource_group_name}"
storage_account_name = "{tfstate_storage_account_name}"
container_name       = "{tfstate_storage_account_container_name}" #Storage container to store state
key                  = "flyte-on-azure/terraform.tfstate"
'''
with open(f'{notebook_path}/.temp/deploy-flyte/environments/azure/flyte-core/backend.tfvars', 'w') as backend_tfvars_file:
    backend_tfvars_file.write(backend_tfvars_content)

main_tfvars_content = f'''
resource_group_name = "{resource_group_name}"
azure_region        = "{resource_group_location}"
subscription_id     = "{subscription_id}"
tenant_id           = "{tenant_id}"
'''
with open(f'{notebook_path}/.temp/deploy-flyte/environments/azure/flyte-core/main.tfvars', 'w') as main_tfvars_file:
    main_tfvars_file.write(main_tfvars_content)

locals_tf_content = "locals {" + f'''
  flyte_domain_label = "{aks_dns_label}" #Used to build the DNS name of your deployment
  environment        = "{flyte_environment}"
  tenant             = "{flyte_tenant}"
  # You must replace this email address with your own.
  # Let's Encrypt will use this to contact you about expiring
  # certificates, and issues related to your account.
  email              = "noreply@{flyte_tenant}.org"

# Change this only if you need to add more projects in the default installation name
# Learn more about Flyte projects and domains: https://docs.flyte.org/en/latest/concepts/projects.html - https://docs.flyte.org/en/latest/concepts/domains.html
  flyte_projects     = ["{flyte_project_name}"]
  flyte_domains      = ["development", "staging", "production"]
''' + "}"
with open(f'{notebook_path}/.temp/deploy-flyte/environments/azure/flyte-core/locals.tf', 'w') as locals_tf_file:
    locals_tf_file.write(locals_tf_content)
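
The `.tfvars` files above are built from hand-written f-string templates. A small renderer can produce the same `key = "value"` layout from a dictionary and escape quotes in the values. This is only a sketch: `render_tfvars` is a hypothetical helper, and it covers plain string values only (no lists, maps, or `${...}` sequences).

```python
import json


def render_tfvars(values: dict) -> str:
    """Render plain string values as aligned `key = "value"` lines (hypothetical helper)."""
    width = max(len(key) for key in values)
    # json.dumps adds the surrounding quotes and escapes embedded quotes and backslashes.
    lines = [f"{key.ljust(width)} = {json.dumps(str(value))}" for key, value in values.items()]
    return "\n".join(lines) + "\n"


print(render_tfvars({
    "resource_group_name": "lab-flyte",  # illustrative values
    "azure_region": "eastus2",
}))
```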




<a id='3'></a>
### 3️⃣ Initialize Terraform and create the deployment plan


In [None]:
%cd .temp/deploy-flyte/environments/azure/flyte-core

# Initialize Terraform
! terraform init -upgrade -backend=true -backend-config=backend.tfvars

! terraform plan -var-file=main.tfvars -out=flyte.plan

%cd {notebook_path}
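
The `%cd` magic changes the working directory of the whole kernel, so an interrupted cell can leave later relative paths pointing at the wrong folder. One possible alternative, sketched below, is to run the same commands with `subprocess.run` and its `cwd` argument. The helpers `terraform_commands` and `run_in` are hypothetical; the flags are the ones used in the cell above.

```python
import subprocess
from pathlib import Path


def terraform_commands(backend_config: str = "backend.tfvars",
                       var_file: str = "main.tfvars",
                       plan_file: str = "flyte.plan") -> list:
    """Return the init and plan commands as argument lists (hypothetical helper)."""
    return [
        ["terraform", "init", "-upgrade", "-backend=true", f"-backend-config={backend_config}"],
        ["terraform", "plan", f"-var-file={var_file}", f"-out={plan_file}"],
    ]


def run_in(workdir: Path, commands: list) -> None:
    """Run each command inside `workdir` without changing the kernel's directory."""
    for command in commands:
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, cwd=workdir, check=True)


# Not executed here, because it needs Terraform and the cloned repository:
# run_in(Path(notebook_path) / ".temp/deploy-flyte/environments/azure/flyte-core", terraform_commands())
for command in terraform_commands():
    print(" ".join(command))
```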


<a id='4'></a>
### 4️⃣ Apply the Terraform plan to deploy the resources

In [None]:
%cd .temp/deploy-flyte/environments/azure/flyte-core

! terraform apply flyte.plan

%cd {notebook_path}


<a id='5'></a>
### 5️⃣ Get deployment outputs

In [None]:
%cd .temp/deploy-flyte/environments/azure/flyte-core

# retrieve terraform outputs
output = utils.run("terraform output -json")
if output.success and output.json_data:
    aks_endpoint = output.json_data['cluster_endpoint']['value']
    utils.print_info(f"AKS endpoint: {aks_endpoint}")

%cd {notebook_path}
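
`terraform output -json` wraps every output in an object with `value`, `type`, and `sensitive` keys, which is why the cell reads `['cluster_endpoint']['value']`. The sketch below flattens such a document into a plain name-to-value dictionary. The sample payload and the `output_values` helper are made up for illustration.

```python
import json

# Made-up sample in the shape produced by `terraform output -json`.
sample = """
{
  "cluster_endpoint": {
    "sensitive": false,
    "type": "string",
    "value": "flyte.example.com"
  }
}
"""


def output_values(raw: str) -> dict:
    """Map each Terraform output name to its `value` field (hypothetical helper)."""
    return {name: entry["value"] for name, entry in json.loads(raw).items()}


values = output_values(sample)
print(values["cluster_endpoint"])
```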

<a id='6'></a>
### 6️⃣ Connect to the AKS cluster
Configure kubectl to connect to your Kubernetes cluster using the [az aks get-credentials](https://learn.microsoft.com/en-us/cli/azure/aks?view=azure-cli-latest#az-aks-get-credentials) command. This command downloads credentials and configures the Kubernetes CLI to use them.

In [None]:
aks_resource_name = "flyte-lab-flytetf" # name of the cluster created by the Terraform deployment; overrides the value set in step 0
output = utils.run(f"az aks get-credentials --resource-group {resource_group_name} --name {aks_resource_name} --overwrite-existing",
             f"Credentials for AKS cluster '{aks_resource_name}' configured",
             f"Failed to configure credentials for AKS cluster '{aks_resource_name}'")

output = utils.run(f"az aks show --resource-group {resource_group_name} --name {aks_resource_name} --only-show-errors",
            f"AKS cluster '{aks_resource_name}' retrieved",
            f"Failed to retrieve AKS cluster '{aks_resource_name}'")
if output.success and output.json_data:
    aks_node_resource_group = output.json_data.get('nodeResourceGroup')
    aks_oidc_issuer = output.json_data.get("oidcIssuerProfile").get("issuerUrl")
    print(aks_node_resource_group)

output = utils.run("kubectl get services ingress-nginx-controller -n ingress -o json", "Retrieved lb ip", "Failed to retrieve lb ip")
if output.success and output.json_data:
    lb_ip = output.json_data['status']['loadBalancer']['ingress'][0]['ip']
output = utils.run(f"az network public-ip list --resource-group {aks_node_resource_group} --query \"[?ipAddress=='{lb_ip}']\"", "Retrieved public ip", "Failed to retrieve public ip") 
if output.success and output.json_data:
    ip_id = output.json_data[0]['id']
    print(ip_id)
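
While Azure is still provisioning the load balancer, the service's `status.loadBalancer` object has no `ingress` entry, and indexing `['ingress'][0]['ip']` raises an exception. The following sketch shows a tolerant lookup that returns `None` in that case. The helper `load_balancer_ip` and both sample payloads are assumptions for illustration.

```python
from typing import Optional


def load_balancer_ip(service: dict) -> Optional[str]:
    """Return the first external IP of a Service, or None if none is assigned yet."""
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    for entry in ingress:
        if entry.get("ip"):
            return entry["ip"]
    return None


# Made-up payloads trimmed to the fields of interest.
pending = {"status": {"loadBalancer": {}}}
ready = {"status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.10"}]}}}

print(load_balancer_ip(pending))
print(load_balancer_ip(ready))
```

A caller could retry the `kubectl get services` command for as long as the helper returns `None`.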

In [None]:

flyte_identity_name = "flyte-lab-flyte-user"

output = utils.run(f"az identity show -g {aks_node_resource_group} -n {flyte_identity_name} --only-show-errors", "Identity retrieved", "Failed to retrieve identity")
if output.success and output.json_data:
    aks_resource_principal_id = output.json_data['principalId']
    print(f"AKS Resource Principal Id: {aks_resource_principal_id}")

output = utils.run(f"az role assignment create --assignee {aks_resource_principal_id} --scope /subscriptions/{subscription_id}/resourcegroups/{resource_group_name}  --role Contributor", "Role assigned", "Failed to assign role")


output = utils.run(f"az identity federated-credential create --name kaito-federated-identity --identity-name {flyte_identity_name} -g {aks_node_resource_group} --issuer {aks_oidc_issuer} --subject system:serviceaccount:flytesnacks-development:default --audience api://AzureADTokenExchange", "Federated credential created", "Failed to create federated credential")  
print(output.text)
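
The `--subject` of a federated credential follows the Kubernetes pattern `system:serviceaccount:<namespace>:<service-account>` and has to match the workload's service account exactly. The sketch below builds that string; it assumes Flyte's default namespace mapping of `<project>-<domain>`, and both helper functions are hypothetical.

```python
def flyte_namespace(project: str, domain: str) -> str:
    """Assumption: Flyte's default namespace mapping, "<project>-<domain>"."""
    return f"{project}-{domain}"


def service_account_subject(namespace: str, service_account: str = "default") -> str:
    """Build the subject string expected by a workload identity federated credential."""
    return f"system:serviceaccount:{namespace}:{service_account}"


# Reproduces the subject used in the cell above.
print(service_account_subject(flyte_namespace("flytesnacks", "development")))
```

Because each credential covers a single subject, workflows running in another project or domain (for example `myproject` in `development`) would likely need a credential of their own.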


<a id='7'></a>
### 7️⃣ Update the DNS label on the load balancer public IP


In [None]:

utils.run(f"az network public-ip update --ids {ip_id} --dns-name {aks_dns_label}", "Updated public ip with dns label", "Failed to update public ip with dns label")

config_content = f'''
admin:
  # For GRPC endpoints you might want to use dns:///flyte.myexample.com
  endpoint: dns:///{aks_endpoint}
  insecure: false # false enables the TLS/SSL connection; use true only on a local sandbox deployment
'''
with open("config.yaml", 'w') as config_file:
    config_file.write(config_content)
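
In the Azure public cloud, a DNS label on a public IP resolves as `<label>.<region>.cloudapp.azure.com`. The sketch below derives that name from the notebook's values so that it can be compared with `aks_endpoint`. The helper `public_ip_fqdn` is hypothetical.

```python
def public_ip_fqdn(dns_label: str, region: str) -> str:
    """Fully qualified name for a public IP DNS label (Azure public cloud only)."""
    return f"{dns_label}.{region}.cloudapp.azure.com"


# Values taken from the notebook variables `aks_dns_label` and `resource_group_location`.
print(public_ip_fqdn("flyte22", "eastus2"))
```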



<a id='8'></a>
### 8️⃣ Register and run a workflow on the AKS Flyte cluster


In [None]:
os.environ["FLYTECTL_CONFIG"] = f"{notebook_path}/config.yaml"

output = utils.run("pyflyte run --remote workflows/hello_world.py my_wf", "Started workflow execution", "Failed to start workflow execution")
print(output.text)
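
The command above does not name a project or domain, so `pyflyte run` falls back to its defaults. To target the project configured in step 0 explicitly, the `--project` and `--domain` options can be placed before the script path. The sketch below only assembles the command string; `pyflyte_run_command` is a hypothetical helper.

```python
import shlex
from typing import Optional


def pyflyte_run_command(script: str, workflow: str,
                        project: Optional[str] = None,
                        domain: Optional[str] = None) -> str:
    """Assemble a remote `pyflyte run` command line (hypothetical helper)."""
    args = ["pyflyte", "run", "--remote"]
    if project:
        args += ["--project", project]
    if domain:
        args += ["--domain", domain]
    # Options go before the script path; the workflow name comes last.
    args += [script, workflow]
    return shlex.join(args)


print(pyflyte_run_command("workflows/hello_world.py", "my_wf",
                          project="myproject", domain="development"))
```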


<a id='clean'></a>
### 🗑️ Clean up resources
When you're finished with the lab, you should remove all your deployed resources from Azure to avoid extra charges and keep your Azure subscription uncluttered. Use the [clean-up-resources notebook](clean-up-resources.ipynb) for that.