Managed disk support for Azure terraform infra (#202)
Support adding managed disks to the Azure VMs created by the Terraform
testing infrastructure. By adding multiple managed disks to a VM, we can
get significantly more space for data storage and also increase
performance, since the data is striped across multiple disks.

* Modify the cloud-init module to accept an argument indicating the type
  of deployment (AWS or Azure) so that conditional blocks can be
  included in the cloud-init script.
* The cloud-init module now accepts an optional lvm_mount_point argument.
  If this argument is specified, the cloud-init script assumes that
  managed disks were created; it loads a script onto the VM and runs it
  to wait for the disks to be attached, then groups them into an LVM
  volume that is mounted under the specified mount point.
* The Azure main.tf file accepts a new optional managed_disk_configuration
  argument that contains the LVM mount point, and the number, size, and
  SKU of managed disks to add to each VM. If this argument is specified,
  then the managed disks are created and attached to the VMs, and the
  LVM mount point and expected number of disks are passed along to the
  cloud-init module. Due to the way Terraform supports attaching managed
  disks (they must be attached after the VM is created, although Azure
  itself does not have this restriction), the provisioner script that
  waits for cloud-init to complete had to be moved out of the VM
  resource into a null_resource. This null_resource must then be
  explicitly added as a dependency of any module that requires the
  manager or worker VMs to be created AND have cloud-init finish
  running.
* Fix bug in Azure configuration where the script would fail if the
  create_resource_group variable was set to false (indicating that an
  existing resource group should be used instead of creating a new one).
* Update the maven version to 3.8.5.
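
As a hedged illustration (not part of the commit), the new variable could be
supplied in a tfvars file like this; the field names match the
managed_disk_configuration variable added in the Azure configuration, but the
values here are made up:

```hcl
# Illustrative values only; field names come from the
# managed_disk_configuration variable in the Azure variables.tf.
managed_disk_configuration = {
  mount_point          = "/data"
  disk_count           = 4
  storage_account_type = "Premium_LRS"
  disk_size_gb         = 1024
}
```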
brianloss committed Apr 21, 2022
1 parent 77e4910 commit e9cbb856a0fac3335fff13fc175d7439c0940cca
Showing 9 changed files with 237 additions and 29 deletions.
@@ -165,7 +165,7 @@ The table below lists the variables and their default values that are used in th
| instance\_count | The number of EC2 instances to create | `string` | `"2"` | no |
| instance\_type | The type of EC2 instances to create | `string` | `"m5.2xlarge"` | no |
| local\_sources\_dir | Directory on local machine that contains Maven, ZooKeeper or Hadoop binary distributions or Accumulo source tarball | `string` | `""` | no |
| maven\_version | The version of Maven to download and install | `string` | `"3.8.4"` | no |
| maven\_version | The version of Maven to download and install | `string` | `"3.8.5"` | no |
| optional\_cloudinit\_config | An optional config block for the cloud-init script. If you set this, you should consider setting cloudinit\_merge\_type to handle merging with the default script as you need. | `string` | `null` | no |
| private\_network | Indicates whether or not the user is on a private network and access to hosts should be through the private IP addresses rather than public ones. | `bool` | `false` | no |
| root\_volume\_gb | The size, in GB, of the EC2 instance root volume | `string` | `"300"` | no |
@@ -208,7 +208,8 @@ The table below lists the variables and their default values that are used in th
| hadoop\_version | The version of Hadoop to download and install | `string` | `"3.3.1"` | no |
| local\_sources\_dir | Directory on local machine that contains Maven, ZooKeeper or Hadoop binary distributions or Accumulo source tarball | `string` | `""` | no |
| location | The Azure region where resources are to be created. If an existing resource group is specified, this value is ignored and the resource group's location is used. | `string` | n/a | yes |
| maven\_version | The version of Maven to download and install | `string` | `"3.8.4"` | no |
| managed\_disk\_configuration | Optional managed disk configuration. If supplied, the managed disks on each VM will be combined into an LVM volume mounted at the named mount point. | <pre>object({<br> mount_point = string<br> disk_count = number<br> storage_account_type = string<br> disk_size_gb = number<br> })</pre> | `null` | no |
| maven\_version | The version of Maven to download and install | `string` | `"3.8.5"` | no |
| network\_address\_space | The network address space to use for the virtual network. | `list(string)` | <pre>[<br> "10.0.0.0/16"<br>]</pre> | no |
| optional\_cloudinit\_config | An optional config block for the cloud-init script. If you set this, you should consider setting cloudinit\_merge\_type to handle merging with the default script as you need. | `string` | `null` | no |
| os\_disk\_caching | The type of caching to use for the OS disk. Possible values are None, ReadOnly, and ReadWrite. | `string` | `"ReadOnly"` | no |
@@ -131,6 +131,7 @@ module "cloud_init_config" {
accumulo_branch_name = var.accumulo_branch_name
accumulo_version = var.accumulo_version
authorized_ssh_keys = local.ssh_keys[*]
cluster_type = "aws"

optional_cloudinit_config = var.optional_cloudinit_config
cloudinit_merge_type = var.cloudinit_merge_type
@@ -129,7 +129,7 @@ variable "accumulo_dir" {
}

variable "maven_version" {
default = "3.8.4"
default = "3.8.5"
description = "The version of Maven to download and install"
nullable = false
}
@@ -69,6 +69,15 @@ locals {

ssh_keys = toset(concat(var.authorized_ssh_keys, [for k in var.authorized_ssh_key_files : file(k)]))

# Resource group name and location
# This is pulled either from the resource group that was created (if create_resource_group is true)
# or from the resource group that already exists (if create_resource_group is false). Keeping
# references to the resource group or data object rather than just using var.resource_group_name
# allows for terraform to automatically create the dependency graph and wait for the resource group
# to be created if necessary.
rg_name = var.create_resource_group ? azurerm_resource_group.rg[0].name : data.azurerm_resource_group.existing_rg[0].name
location = var.create_resource_group ? azurerm_resource_group.rg[0].location : data.azurerm_resource_group.existing_rg[0].location

# Save the public/private IP addresses of the VMs to pass to sub-modules.
manager_ip = azurerm_linux_virtual_machine.manager.public_ip_address
worker_ips = azurerm_linux_virtual_machine.workers[*].public_ip_address
@@ -84,6 +93,11 @@ locals {
]
}

data "azurerm_resource_group" "existing_rg" {
count = var.create_resource_group ? 0 : 1
name = var.resource_group_name
}

# Place all resources in a resource group
resource "azurerm_resource_group" "rg" {
count = var.create_resource_group ? 1 : 0
@@ -98,16 +112,16 @@ resource "azurerm_resource_group" "rg" {
# Creates a virtual network for use by this cluster.
resource "azurerm_virtual_network" "accumulo_vnet" {
name = "${var.resource_name_prefix}-vnet"
resource_group_name = azurerm_resource_group.rg[0].name
location = azurerm_resource_group.rg[0].location
resource_group_name = local.rg_name
location = local.location
address_space = var.network_address_space
}

# Create a subnet for this cluster. Give storage a service endpoint
# so that we'll be able to create an NFS share.
resource "azurerm_subnet" "internal" {
name = "${var.resource_name_prefix}-subnet"
resource_group_name = azurerm_resource_group.rg[0].name
resource_group_name = local.rg_name
virtual_network_name = azurerm_virtual_network.accumulo_vnet.name
address_prefixes = var.subnet_address_prefixes
}
@@ -116,8 +130,8 @@ resource "azurerm_subnet" "internal" {
# traffic from the internet and denies everything else.
resource "azurerm_network_security_group" "nsg" {
name = "${var.resource_name_prefix}-nsg"
location = azurerm_resource_group.rg[0].location
resource_group_name = azurerm_resource_group.rg[0].name
location = local.location
resource_group_name = local.rg_name

security_rule {
name = "allow-ssh"
@@ -140,6 +154,8 @@ resource "azurerm_network_security_group" "nsg" {
module "cloud_init_config" {
source = "../modules/cloud-init-config"

lvm_mount_point = var.managed_disk_configuration != null ? var.managed_disk_configuration.mount_point : null
lvm_disk_count = var.managed_disk_configuration != null ? var.managed_disk_configuration.disk_count : null
software_root = var.software_root
zookeeper_dir = var.zookeeper_dir
hadoop_dir = var.hadoop_dir
@@ -151,6 +167,7 @@ module "cloud_init_config" {
accumulo_version = var.accumulo_version
authorized_ssh_keys = local.ssh_keys[*]
os_type = local.os_type
cluster_type = "azure"

optional_cloudinit_config = var.optional_cloudinit_config
cloudinit_merge_type = var.cloudinit_merge_type
@@ -159,16 +176,16 @@ module "cloud_init_config" {
# Create a static public IP address for the manager node.
resource "azurerm_public_ip" "manager" {
name = "${var.resource_name_prefix}-manager-ip"
resource_group_name = azurerm_resource_group.rg[0].name
location = azurerm_resource_group.rg[0].location
resource_group_name = local.rg_name
location = local.location
allocation_method = "Static"
}

# Create a NIC for the manager node.
resource "azurerm_network_interface" "manager" {
name = "${var.resource_name_prefix}-manager-nic"
location = azurerm_resource_group.rg[0].location
resource_group_name = azurerm_resource_group.rg[0].name
location = local.location
resource_group_name = local.rg_name

enable_accelerated_networking = true

@@ -190,17 +207,17 @@ resource "azurerm_network_interface_security_group_association" "manager" {
resource "azurerm_public_ip" "workers" {
count = var.worker_count
name = "${var.resource_name_prefix}-worker${count.index}-ip"
resource_group_name = azurerm_resource_group.rg[0].name
location = azurerm_resource_group.rg[0].location
resource_group_name = local.rg_name
location = local.location
allocation_method = "Static"
}

# Create a NIC for each of the worker nodes.
resource "azurerm_network_interface" "workers" {
count = var.worker_count
name = "${var.resource_name_prefix}-worker${count.index}-nic"
location = azurerm_resource_group.rg[0].location
resource_group_name = azurerm_resource_group.rg[0].name
location = local.location
resource_group_name = local.rg_name

enable_accelerated_networking = true

@@ -223,8 +240,8 @@ resource "azurerm_network_interface_security_group_association" "workers" {
# Add a login user that can SSH to the VM using the first supplied SSH key.
resource "azurerm_linux_virtual_machine" "manager" {
name = "${var.resource_name_prefix}-manager"
resource_group_name = azurerm_resource_group.rg[0].name
location = azurerm_resource_group.rg[0].location
resource_group_name = local.rg_name
location = local.location
size = var.vm_sku
computer_name = "manager"
admin_username = var.admin_username
@@ -256,24 +273,53 @@ resource "azurerm_linux_virtual_machine" "manager" {
sku = var.vm_image.sku
version = var.vm_image.version
}
}

# Create and attach managed disks to the manager VM.
resource "azurerm_managed_disk" "manager_managed_disk" {
count = var.managed_disk_configuration != null ? var.managed_disk_configuration.disk_count : 0
name = format("%s_disk%02d", azurerm_linux_virtual_machine.manager.name, count.index)
resource_group_name = local.rg_name
location = local.location
storage_account_type = var.managed_disk_configuration.storage_account_type
disk_size_gb = var.managed_disk_configuration.disk_size_gb
create_option = "Empty"
}

resource "azurerm_virtual_machine_data_disk_attachment" "manager_managed_disk_attachment" {
count = var.managed_disk_configuration != null ? var.managed_disk_configuration.disk_count : 0
managed_disk_id = azurerm_managed_disk.manager_managed_disk[count.index].id
virtual_machine_id = azurerm_linux_virtual_machine.manager.id
lun = 10 + count.index
caching = "ReadOnly"
}

# Wait for cloud-init to complete on the manager VM.
# This is done here rather than in the VM resource because the cloud-init script
# waits for managed disks to be attached (if used), but the managed disks cannot
# be attached until the VM is created, so we'd have a deadlock.
resource "null_resource" "wait_for_manager_cloud_init" {
provisioner "remote-exec" {
inline = local.ready_script
connection {
type = "ssh"
user = self.admin_username
host = self.public_ip_address
user = azurerm_linux_virtual_machine.manager.admin_username
host = azurerm_linux_virtual_machine.manager.public_ip_address
}
}

depends_on = [
azurerm_virtual_machine_data_disk_attachment.manager_managed_disk_attachment
]
}
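
Per the commit message, any downstream module that needs the manager VM both
created and past cloud-init must declare this dependency explicitly. A minimal
sketch, with a hypothetical module name and source path:

```hcl
# "example_consumer" is a placeholder; real modules in this configuration
# (e.g. config_files, upload_software) declare the same dependency.
module "example_consumer" {
  source = "../modules/example"

  depends_on = [
    null_resource.wait_for_manager_cloud_init
  ]
}
```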

# Create the worker VMs.
# Add a login user that can SSH to the VM using the first supplied SSH key.
resource "azurerm_linux_virtual_machine" "workers" {
count = var.worker_count
name = "${var.resource_name_prefix}-worker${count.index}"
resource_group_name = azurerm_resource_group.rg[0].name
location = azurerm_resource_group.rg[0].location
resource_group_name = local.rg_name
location = local.location
size = var.vm_sku
computer_name = "worker${count.index}"
admin_username = var.admin_username
@@ -305,15 +351,57 @@ resource "azurerm_linux_virtual_machine" "workers" {
sku = var.vm_image.sku
version = var.vm_image.version
}
}

# Create and attach managed disks to the worker VMs.
locals {
worker_disks = var.managed_disk_configuration == null ? [] : flatten([
for vm_num, vm in azurerm_linux_virtual_machine.workers : [
for disk_num in range(var.managed_disk_configuration.disk_count) : {
datadisk_name = format("%s_disk%02d", vm.name, disk_num)
lun = 10 + disk_num
worker_num = vm_num
}
]
])
}
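
For illustration only (names depend on var.resource_name_prefix): with two
workers named cluster-worker0 and cluster-worker1 and disk_count = 2, the
worker_disks local flattens to:

```hcl
# Illustrative evaluated value of local.worker_disks, not code from the commit.
worker_disks = [
  { datadisk_name = "cluster-worker0_disk00", lun = 10, worker_num = 0 },
  { datadisk_name = "cluster-worker0_disk01", lun = 11, worker_num = 0 },
  { datadisk_name = "cluster-worker1_disk00", lun = 10, worker_num = 1 },
  { datadisk_name = "cluster-worker1_disk01", lun = 11, worker_num = 1 },
]
```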

resource "azurerm_managed_disk" "worker_managed_disk" {
count = length(local.worker_disks)
name = local.worker_disks[count.index].datadisk_name
resource_group_name = local.rg_name
location = local.location
storage_account_type = var.managed_disk_configuration.storage_account_type
disk_size_gb = var.managed_disk_configuration.disk_size_gb
create_option = "Empty"
}

resource "azurerm_virtual_machine_data_disk_attachment" "worker_managed_disk_attachment" {
count = length(local.worker_disks)
managed_disk_id = azurerm_managed_disk.worker_managed_disk[count.index].id
virtual_machine_id = azurerm_linux_virtual_machine.workers[local.worker_disks[count.index].worker_num].id
lun = local.worker_disks[count.index].lun
caching = "ReadOnly"
}

# Wait for cloud-init to complete on the worker VMs.
# This is done here rather than in the VM resources because the cloud-init script
# waits for managed disks to be attached (if used), but the managed disks cannot
# be attached until the VMs are created, so we'd have a deadlock.
resource "null_resource" "wait_for_workers_cloud_init" {
count = length(azurerm_linux_virtual_machine.workers)
provisioner "remote-exec" {
inline = local.ready_script
connection {
type = "ssh"
user = self.admin_username
host = self.public_ip_address
user = azurerm_linux_virtual_machine.workers[count.index].admin_username
host = azurerm_linux_virtual_machine.workers[count.index].public_ip_address
}
}

depends_on = [
azurerm_virtual_machine_data_disk_attachment.worker_managed_disk_attachment
]
}

##############################
@@ -351,6 +439,10 @@ module "config_files" {

accumulo_instance_name = var.accumulo_instance_name
accumulo_root_password = var.accumulo_root_password

depends_on = [
null_resource.wait_for_manager_cloud_init
]
}

#
@@ -363,6 +455,10 @@ module "upload_software" {
local_sources_dir = var.local_sources_dir
upload_dir = var.software_root
upload_host = local.manager_ip

depends_on = [
null_resource.wait_for_manager_cloud_init
]
}

#
@@ -379,7 +475,8 @@ module "configure_nodes" {

depends_on = [
module.upload_software,
module.config_files
module.config_files,
null_resource.wait_for_workers_cloud_init
]
}

@@ -126,6 +126,35 @@ variable "os_disk_caching" {
}
}

variable "managed_disk_configuration" {
default = null
type = object({
mount_point = string
disk_count = number
storage_account_type = string
disk_size_gb = number
})
description = "Optional managed disk configuration. If supplied, the managed disks on each VM will be combined into an LVM volume mounted at the named mount point."
nullable = true

validation {
condition = var.managed_disk_configuration == null ? true : var.managed_disk_configuration.mount_point != null
error_message = "The mount point must be specified."
}
validation {
condition = var.managed_disk_configuration == null ? true : var.managed_disk_configuration.disk_count > 0
error_message = "The number of disks must be at least 1."
}
validation {
condition = var.managed_disk_configuration == null ? true : contains(["Standard_LRS", "StandardSSD_LRS", "Premium_LRS"], var.managed_disk_configuration.storage_account_type)
error_message = "The storage account type must be one of 'Standard_LRS', 'StandardSSD_LRS', or 'Premium_LRS'."
}
validation {
condition = var.managed_disk_configuration == null ? true : (var.managed_disk_configuration.disk_size_gb > 0 && var.managed_disk_configuration.disk_size_gb <= 32767)
error_message = "The disk size must be at least 1GB and less than 32768GB."
}
}

variable "software_root" {
default = "/opt/accumulo-testing"
description = "The full directory root where software will be installed"
@@ -178,7 +207,7 @@ variable "accumulo_dir" {
}

variable "maven_version" {
default = "3.8.4"
default = "3.8.5"
description = "The version of Maven to download and install"
nullable = false
}
