# VM Placement Optimization

> **Student:** Jules Decaestecker

## Project Overview

This project focuses on optimizing the placement of Virtual Machines (VMs) onto clusters (or servers) while minimizing the number of clusters used. The problem involves various constraints related to VM requirements, cluster capacities, and additional placement rules.

## Problem Definition

### **Input:**

- A set of VMs, each with specific requirements:
  - Number of vCPUs
  - RAM (GB)
  - Disk space (GB)
- A set of clusters with given capacities:
  - Number of vCPUs
  - RAM (GB)
  - Disk space (GB)

### **Output:**

- Minimized number of clusters used.

## Problem Variants & Constraints

We consider multiple variants of the VM placement problem:

1. **Affinity and Anti-affinity Rules:**

   - Some VMs should be placed together (affinity).
   - Some VMs must not share the same cluster (anti-affinity).
   - Only anti-affinities between specific VM pairs are considered.

2. **Cluster Characteristics:**

   - Clusters can be partially occupied or initially empty.
   - Clusters may have heterogeneous capacities (multiple types of servers).

3. **Splitting of VMs:**
   - VMs may be split across multiple clusters.
4. **VM Families and Criticality Levels:**
   - VMs belong to families, each with a criticality level (1, 2, or 3).
   - Levels 1 & 2 or levels 2 & 3 can be together, but not levels 1 & 3.

### Useful packages

In [364]:
import numpy as np
import random
from scipy.optimize import milp, LinearConstraint

### Instantiate data

Create VMs with random vcpu, memory and disk requirements.



In [365]:
def generate_vm_name():
    adjectives = ["Wobbly", "Fluffy", "Spicy", "Grumpy",
                  "Sneaky", "Jolly", "Zesty", "Quirky", "Bouncy", "Soggy"]
    nouns = ["Penguin", "Toaster", "Banana", "Octopus", "Pancake",
             "Noodle", "Cactus", "Sasquatch", "Pickle", "Muffin"]
    number = random.randint(1, 999)
    return f"{random.choice(adjectives)}{random.choice(nouns)}-{number}"


def generate_vm(n: int, max_vcpu=8, max_memory=24, max_disk_space=32, criticity=False):
    """"
    Generate a list of VMs using random features

    Input:
    n: int - number of VMs to generate
    """
    res = []

    for i in range(n):
        # generate VM name
        vm_name = generate_vm_name()
        # generate vCPU
        vcpu = random.randint(1, max_vcpu)
        # generate Memory (GB)
        memory = random.randint(1, max_memory)
        # generate Disk Space (GB)
        disk_space = random.randint(1, max_disk_space)
        # generate criticity between 1 and 3
        if criticity:
            criticity = random.randint(1, 3)
            res.append([vm_name, vcpu, memory, disk_space, criticity])
        else:
            res.append([vm_name, vcpu, memory, disk_space])
    # type cast to numpy array for better performance
    return np.array(res)

def show_vms(vms: list):
    max_len_name = max([len(vm[0]) for vm in vms]) + 5
    print(f"VM ID\tVM Name{' '*(max_len_name-7)}\tvCPU\tMemory(GB)\tDisk Space(GB){'\tCriticity' if len(vms[0]) == 5 else ''}")
    for i, vm in enumerate(vms):
        print(f"{i}\t{vm[0]+ ' '*(max_len_name - len(vm[0]))}\t{vm[1]}\t{vm[2]}\t\t{vm[3]}\t\t{vm[4] if len(vm) == 5 else ''}")

vms = generate_vm(10)
show_vms(vms)

VM ID	VM Name               	vCPU	Memory(GB)	Disk Space(GB)
0	WobblyPenguin-505     	4	22		24		
1	QuirkyPancake-663     	3	11		4		
2	SoggyMuffin-509       	8	9		11		
3	ZestyPancake-680      	7	10		6		
4	BouncyOctopus-345     	5	22		16		
5	SpicyToaster-948      	3	10		12		
6	SoggyBanana-733       	3	21		26		
7	SpicyNoodle-992       	4	5		22		
8	JollyMuffin-500       	3	11		18		
9	GrumpyBanana-870      	4	18		26		


In [366]:
def generate_cluster_name():
    zones = ["us-west1", "us-west2", "us-west3", "us-east1", "us-east2", "us-east3"]
    names = ["Alpha", "Beta", "Gamma", "Delta", "Epsilon", "Zeta", "Eta", "Theta", "Iota", "Kappa"]
    number = random.randint(1, 999)
    return f"{random.choice(zones)}-{random.choice(names)}-{number}"

def generate_cluster(n: int, max_vcpu=64, max_memory=64, max_disk_space=128, random_=True) -> list:
    res = []

    for i in range(n):
        # generate cluster name
        cluster_name = generate_cluster_name()
        
        # generate vCPU, Memory, Disk Space
        if random_:
            vcpu = random.randint(1, max_vcpu)
            memory = random.randint(1, max_memory)
            disk_space = random.randint(1, max_disk_space)
        else:
            vcpu = max_vcpu
            memory = max_memory
            disk_space = max_disk_space
        
        res.append([cluster_name, vcpu, memory, disk_space])
    return np.array(res)

def show_clusters(clusters: list):
    max_len_name = max([len(cluster[0]) for cluster in clusters]) + 5
    print(f"Cluster ID\tCluster Name{' '*(max_len_name-12)}\tvCPU\tMemory(GB)\tDisk Space(GB)")
    for i, cluster in enumerate(clusters):
        print(f"{i}\t\t{cluster[0]+ ' '*(max_len_name - len(cluster[0]))}\t{cluster[1]}\t{cluster[2]}\t\t{cluster[3]}")

clusters = generate_cluster(10)
show_clusters(clusters)

Cluster ID	Cluster Name             	vCPU	Memory(GB)	Disk Space(GB)
0		us-west1-Epsilon-698     	4	57		117
1		us-west2-Iota-294        	2	52		120
2		us-west3-Eta-76          	52	32		37
3		us-west2-Delta-987       	52	27		8
4		us-west1-Epsilon-543     	12	60		8
5		us-west2-Theta-546       	20	12		74
6		us-west3-Alpha-908       	43	34		37
7		us-west3-Gamma-927       	25	10		104
8		us-east2-Beta-405        	57	54		60
9		us-east2-Theta-519       	26	58		62


## Problem Formulation

### Decision Variables

- $x_{ij} \in \{0, 1\}$: Binary variable indicating whether VM $i$ is placed on cluster $j$.
- $y_i \in \{0, 1\}$: Binary variable indicating whether VM $i$ is used.

### Objective Function

- **Minimize** the number of cluster used: $\sum_{i=1}^{n} y_i$

### Constraints

1. **Cluster Capacity:**

   - The sum of vCPUs, RAM, and disk space used by VMs on cluster $j$ should not exceed the cluster's capacity.

    $$\sum_{i=1}^{n} x_{ij} \cdot \text{vcpu}_i \leq \text{vcpu}_j y_i, \forall j \in \{1, \ldots, m\}$$

    $$\sum_{i=1}^{n} x_{ij} \cdot \text{ram}_i \leq \text{ram}_j y_i, \forall j \in \{1, \ldots, m\}$$

    $$\sum_{i=1}^{n} x_{ij} \cdot \text{disk}_i \leq \text{disk}_j y_i, \forall j \in \{1, \ldots, m\}$$


2. **VM Placement:**

    - Each VM should be placed on exactly one cluster.
    
      $$\sum_{j=1}^{m} x_{ij} = 1, \forall i \in \{1, \ldots, n\}$$

3. **Decision Variables bounds:**

    - $x_{ij} \in \{0, 1\}, \forall i \in \{1, \ldots, n\}, j \in \{1, \ldots, m\}$
    - $y_i \in \{0, 1\}, \forall i \in \{1, \ldots, n\}$

### Implementation in Python

We will use the `milp` and `LinearConstraint` classes from the `scipy.optimize` module to implement the optimization model.

The idea is to formulate the problem as a Mixed-Integer Linear Programming (MILP) model and solve it using the `linprog` function.

$$\text{Minimize} \quad c \cdot x$$
$$\text{Subject to:} \quad lb \leq A \cdot x \leq ub$$

where:
- $x$ is the vector of decision variables : x= $[y_1, \ldots, y_n, x_{1,1}, x_{1,2}, \ldots, x_{1,m}, x_{2,1}, \ldots, x_{n,m}]$
- $c$ is the coefficient vector of the objective function. c=$[1, \ldots, 1, 0, \ldots, 0]$

- $A$ is the matrix of constraints.
- $lb$ and $ub$ are the lower and upper bounds of the constraints.

For instance we will rewrite the constraints to fit the linear programming format:

$$ -\infty \leq  -\text{vcpu}_j y_i + \sum_{i=1}^{n} x_{ij} \cdot \text{vcpu}_i \leq 0, \quad \forall j \in \{1, \ldots, m\}$$


### Define the basic model

In [367]:
def one_constraint_A(clusters, vms, constraint_idx):
    """
    A helper function to generate the constraint matrix for a single constraint (e.g. CPU, Memory, Disk Space)
    
    Input:
    clusters: np.array(str, float, float, float) - a list of clusters with each row representing a cluster and each column representing a resource
    vms: np.array(str, float, float, float) - a list of VMs with each row representing a VM and each column representing a resource
    constraint_idx: int - the index of the constraint to generate the matrix for

    Output:
    A: np.array(float) - the constraint matrix for the given constraint_idx
    """

    cluster_constraints = clusters[:, constraint_idx].astype(float)
    vm_constraints = vms[:, constraint_idx].astype(float)
    n_clusters = len(cluster_constraints)
    n_vms = len(vm_constraints)
    A = np.zeros((n_clusters, n_clusters + n_clusters*n_vms))
    for i in range(n_clusters):
        A[i, i] = -cluster_constraints[i]
        A[i, n_clusters:][[
            i + j * n_clusters for j in range(n_clusters)]] = vm_constraints
    return A


def bin_packing(clusters, vms):
    """ 
    Bin packing problem:
    Given a set of clusters and a set of VMs, assign each VM to a cluster such that
    the total resource usage of each cluster does not exceed the capacity of the cluster.
    
    Input:
    clusters: np.array(str, float, float, float) - a list of clusters with each row representing a cluster and each column representing a resource
    vms: np.array(str, float, float, float) - a list of VMs with each row representing a VM and each column representing a resource

    Output:
    cluster_assignment: np.array(int) - a list of cluster assignments for each VM
    vms_assignment: np.array(int) - a list of VM assignments to each cluster
    """

   
    # Define sizes
    n_cluster = len(clusters)
    n_vm = len(vms)
    n_constraints = vms.shape[1] - 1 # remove the name column
    n_var = n_cluster + n_cluster*n_vm

    # Define the objective function
    c = np.zeros(n_var)
    c[:n_cluster] = 1

    # Define the constraints
    # constraints on the variables
    A = np.zeros((n_var + n_vm, n_var))
    A[:n_var] = np.eye(n_var)

    # constraints on the coefficients of the VMs
    for i in range(n_vm):
        A[n_var + i, n_cluster + i*n_cluster:n_cluster + (i+1)*n_cluster] = np.ones(n_cluster)

    # constraints on the cluster capacities
    for i in range(n_constraints):
        A = np.concatenate((A, one_constraint_A(clusters, vms, i+1)))
    # A shape = (n_var + n_vm + n_constraints*n_cluster, n_var)

    # Define the upper and lower bounds
    ub = np.concatenate((np.ones(n_var + n_vm), np.zeros(n_cluster*n_constraints)))
    lb = np.concatenate((np.zeros(n_var), np.ones(n_vm), [-np.inf]*(n_cluster*n_constraints)))

    # Define the integrality
    integrality = np.array([True]*n_var)

    # Define the constraints as a LinearConstraint object
    constraints = LinearConstraint(A, lb, ub)

    # Solve the problem
    res = milp(c=c, constraints=constraints, integrality=integrality)

    # Number of clusters used
    n_clusters_used = int(sum(res.x[:n_cluster]))
    print(f"Number of clusters used: {n_clusters_used}")
    
    # Return the results (cluster assignment, vm assignment)
    return res.x[:n_cluster], res.x[n_cluster:].reshape(n_vm, n_cluster)

# # change numpy print options
# np.set_printoptions(linewidth=1000)

assign_cluster, assign_vm = bin_packing(clusters, vms)

Number of clusters used: 4


In [368]:
def pretty_assignment_print(clusters, assign_vm, vms):
    """
    Pretty print the assignment of VMs to clusters"
    """
  
    max_len_name = max([len(cluster[0]) for cluster in clusters]) + 5
    print(f"Cluster Name{' '*(max_len_name-12)}\tCPU usage (%)\tMem usage (%)\tDisk usage (%)\tAssigned VMs")
    for i, cluster in enumerate(clusters):
        criticity = len(vms[0]) == 5
        assigned_vms = [vms[j][0] for j in range(len(vms)) if assign_vm[j][i] == 1]
        vCPU = sum([int(vms[j][1]) for j in range(len(vms)) if assign_vm[j][i] == 1])
        mem = sum([int(vms[j][2]) for j in range(len(vms)) if assign_vm[j][i] == 1])
        disk = sum([int(vms[j][3]) for j in range(len(vms)) if assign_vm[j][i] == 1])

        cluter_cpu = int(cluster[1])
        cluster_mem = int(cluster[2])
        cluster_disk = int(cluster[3])
        cpu_usage = vCPU / cluter_cpu 
        mem_usage = mem / cluster_mem 
        disk_usage = disk / cluster_disk 
        assigned_vm_names = ", ".join(assigned_vms)
        if criticity:
            assigned_vm_with_criticity = []
            for j in range(len(vms)):
                if assign_vm[j][i] == 1:
                    assigned_vm_with_criticity.append(f"{vms[j][0]} ({vms[j][4]})")
            assigned_vm_names = ", ".join(assigned_vm_with_criticity)

        print(f"{cluster[0]+ ' '*(max_len_name - len(cluster[0]))}\t{cpu_usage*100:.2f}%\t\t{mem_usage*100:.2f}%\t\t{disk_usage*100:.2f}%\t\t{assigned_vm_names}")


pretty_assignment_print(clusters, assign_vm, vms)

Cluster Name             	CPU usage (%)	Mem usage (%)	Disk usage (%)	Assigned VMs
us-west1-Epsilon-698     	0.00%		0.00%		0.00%		
us-west2-Iota-294        	0.00%		0.00%		0.00%		
us-west3-Eta-76          	15.38%		28.12%		29.73%		SoggyMuffin-509
us-west2-Delta-987       	0.00%		0.00%		0.00%		
us-west1-Epsilon-543     	0.00%		0.00%		0.00%		
us-west2-Theta-546       	0.00%		0.00%		0.00%		
us-west3-Alpha-908       	18.60%		97.06%		91.89%		BouncyOctopus-345, JollyMuffin-500
us-west3-Gamma-927       	0.00%		0.00%		0.00%		
us-east2-Beta-405        	24.56%		90.74%		96.67%		ZestyPancake-680, SoggyBanana-733, GrumpyBanana-870
us-east2-Theta-519       	53.85%		82.76%		100.00%		WobblyPenguin-505, QuirkyPancake-663, SpicyToaster-948, SpicyNoodle-992


### 1. Affinity rules between some set of VMs

some VMs could share a cluster / some others couln't (affinity / anti-affinity)

$$\begin{align*}
        \min \quad & \sum_{i=1}^{u} y_i \\
        \text{s.t.} \quad & \sum_{j=1}^{n} w_j x_{ij} \leq c y_i, \quad \forall i = 1, \dots, u \\
        & \sum_{i=1}^{u} x_{ij} = 1, \quad \forall j = 1, \dots, n \\
        & x_{ij} \in \{0,1\}, \quad y_i \in \{0,1\}, \quad \forall i,j
    \end{align*}$$

 where:
- $y_i = 1$ if bin $i$ is used, 0 otherwise.
- $x_{ij} = 1$ if item $j$ is placed in bin $i$, 0 otherwise.

to add an affinity rule, we can add a constraints that force the VMs to be in the same cluster or in different clusters :

$$\begin{align*}
        x_{ij} = x_{i'j}, \quad \forall i,i' \in \text{affinity\_rule} \quad \forall j \in \{1, \ldots, n\}
    \end{align*}$$

to add non-affinity rule, we can add a constraints that force the VMs to be in different clusters :
$$\begin{align*}
        x_{ij} + x_{i'j} \leq 1, \quad \forall i,i' \in \text{non\_affinity\_rule} \quad \forall j \in \{1, \ldots, n\}
    \end{align*}$$


In [369]:
def affinity_constraint(clusters, vms, vm_index_1, vm_index_2):
    """
    A helper function to generate the constraint matrix for a single affinity constraint"
    
    Input:
    clusters: np.array(str, float, float, float) - a list of clusters with each row representing a cluster and each column representing a resource
    vms: np.array(str, float, float, float) - a list of VMs with each row representing a VM and each column representing a resource
    vm_index_1: int - the index of the first VM
    vm_index_2: int - the index of the second VM

    Output:
    A: np.array(float) - the constraint matrix for the given affinity constraint
    """

    n_clusters = len(clusters)
    n_vms = len(vms)
    A = np.zeros((n_clusters, n_clusters + n_clusters*n_vms))
    for i in range(n_clusters):
        A[i, n_clusters:][[i + vm_index_1 *n_clusters, i + vm_index_2 *n_clusters]] = np.array([1, -1]) # x_ij - x_ik = 0
    return A

def non_affinity_constraint(clusters, vms, vm_index_1, vm_index_2):
    """A helper function to generate the constraint matrix for a single non-affinity constraint

    Input:
    clusters: np.array(str, float, float, float) - a list of clusters with each row representing a cluster and each column representing a resource
    vms: np.array(str, float, float, float) - a list of VMs with each row representing a VM and each column representing a resource
    vm_index_1: int - the index of the first VM
    vm_index_2: int - the index of the second VM

    Output:
    A: np.array(float) - the constraint matrix for the given non-affinity constraint
    """

    n_clusters = len(clusters)
    n_vms = len(vms)
    A = np.zeros((n_clusters, n_clusters + n_clusters*n_vms))
    for i in range(n_clusters):
        A[i, n_clusters:][[i + vm_index_1 *n_clusters, i + vm_index_2 *n_clusters]] = np.array([1, 1]) # x_ij + x_ik <= 1
    return A


def bin_packing_affinity(clusters, vms, affinity_constraints=[], non_affinity_constraints=[]):
    """ 
    Bin packing problem:
    Given a set of clusters and a set of VMs, assign each VM to a cluster such that
    the total resource usage of each cluster does not exceed the capacity of the cluster.
    
    Input:
    clusters: np.array(str, float, float, float) - a list of clusters with each row representing a cluster and each column representing a resource
    vms: np.array(str, float, float, float) - a list of VMs with each row representing a VM and each column representing a resource

    Output:
    cluster_assignment: np.array(int) - a list of cluster assignments for each VM
    vms_assignment: np.array(int) - a list of VM assignments to each cluster
    """

    # Define sizes
    n_cluster = len(clusters)
    n_vm = len(vms)
    n_constraints = vms.shape[1] - 1  # remove the name column
    n_var = n_cluster + n_cluster*n_vm
    n_affinity_constraints = len(affinity_constraints)
    n_non_affinity_constraints = len(non_affinity_constraints)

    # Define the objective function
    c = np.zeros(n_var)
    c[:n_cluster] = 1

    # Define the constraints
    # constraints on the variables
    A = np.zeros((n_var + n_vm, n_var))
    A[:n_var] = np.eye(n_var)

    # constraints on the coefficients of the VMs
    for i in range(n_vm):
        A[n_var + i, n_cluster + i*n_cluster:n_cluster +
            (i+1)*n_cluster] = np.ones(n_cluster)

    # constraints on the cluster capacities
    for i in range(n_constraints):
        A = np.concatenate((A, one_constraint_A(clusters, vms, i+1)))
    # A shape = (n_var + n_vm + n_constraints*n_cluster, n_var)

    # constraints on the affinity constraints
    for vm_index_1, vm_index_2 in affinity_constraints:
        A = np.concatenate((A, affinity_constraint(clusters, vms, vm_index_1, vm_index_2)))
    # A shape = (n_var + n_vm + n_constraints*n_cluster + n_affinity_constraints*n_cluster, n_var)

    # constraints on the non-affinity constraints
    for vm_index_1, vm_index_2 in non_affinity_constraints:
        A = np.concatenate((A, non_affinity_constraint(clusters, vms, vm_index_1, vm_index_2)))
    # A shape = (n_var + n_vm + n_constraints*n_cluster + n_affinity_constraints*n_cluster + n_non_affinity_constraints*n_cluster, n_var)

    # Define the upper and lower bounds
    ub = np.concatenate(
        (np.ones(n_var + n_vm), np.zeros(n_cluster*n_constraints), np.zeros(n_cluster*n_affinity_constraints), np.ones(n_cluster*n_non_affinity_constraints)))
    lb = np.concatenate((np.zeros(n_var), np.ones(
        n_vm), [-np.inf]*(n_cluster*n_constraints), np.zeros(n_cluster*n_affinity_constraints), [-np.inf]*(n_cluster*n_non_affinity_constraints)))

    # Define the integrality
    integrality = np.array([True]*n_var)

    # Define the constraints as a LinearConstraint object
    constraints = LinearConstraint(A, lb, ub)

    # Solve the problem
    res = milp(c=c, constraints=constraints, integrality=integrality)

    # Number of clusters used
    n_clusters_used = int(sum(res.x[:n_cluster]))
    print(f"Number of clusters used: {n_clusters_used}")

    # Return the results (cluster assignment, vm assignment)
    return res.x[:n_cluster], res.x[n_cluster:].reshape(n_vm, n_cluster), res

# # change numpy print options
# np.set_printoptions(linewidth=1000)

affinity_constraints = [(0, 1), (0, 2), (6, 7)]
non_affinity_constraints = [(0, 3), (0, 4), (0,5)]

assign_cluster, assign_vm, res = bin_packing_affinity(clusters, vms, affinity_constraints, non_affinity_constraints) 

Number of clusters used: 4


In [370]:
show_vms(vms)

VM ID	VM Name               	vCPU	Memory(GB)	Disk Space(GB)
0	WobblyPenguin-505     	4	22		24		
1	QuirkyPancake-663     	3	11		4		
2	SoggyMuffin-509       	8	9		11		
3	ZestyPancake-680      	7	10		6		
4	BouncyOctopus-345     	5	22		16		
5	SpicyToaster-948      	3	10		12		
6	SoggyBanana-733       	3	21		26		
7	SpicyNoodle-992       	4	5		22		
8	JollyMuffin-500       	3	11		18		
9	GrumpyBanana-870      	4	18		26		


In [371]:
pretty_assignment_print(clusters, assign_vm, vms)

Cluster Name             	CPU usage (%)	Mem usage (%)	Disk usage (%)	Assigned VMs
us-west1-Epsilon-698     	0.00%		0.00%		0.00%		
us-west2-Iota-294        	0.00%		0.00%		0.00%		
us-west3-Eta-76          	21.15%		87.50%		86.49%		ZestyPancake-680, GrumpyBanana-870
us-west2-Delta-987       	0.00%		0.00%		0.00%		
us-west1-Epsilon-543     	0.00%		0.00%		0.00%		
us-west2-Theta-546       	0.00%		0.00%		0.00%		
us-west3-Alpha-908       	18.60%		97.06%		91.89%		BouncyOctopus-345, JollyMuffin-500
us-west3-Gamma-927       	0.00%		0.00%		0.00%		
us-east2-Beta-405        	17.54%		66.67%		100.00%		SpicyToaster-948, SoggyBanana-733, SpicyNoodle-992
us-east2-Theta-519       	57.69%		72.41%		62.90%		WobblyPenguin-505, QuirkyPancake-663, SoggyMuffin-509


In [380]:
print("We can see that:")
for vm1, vm2 in affinity_constraints:
    print(f"VM {vm1} {vms[vm1][0]} and VM {vm2} {vms[vm2][0]} are assigned to the same cluster")

for vm1, vm2 in non_affinity_constraints:
    print(f"VM {vm1} {vms[vm1][0]} and VM {vm2} {vms[vm2][0]} are assigned to different clusters")

We can see that:
VM 0 WobblyPenguin-505 and VM 1 QuirkyPancake-663 are assigned to the same cluster
VM 0 WobblyPenguin-505 and VM 2 SoggyMuffin-509 are assigned to the same cluster
VM 6 SoggyBanana-733 and VM 7 SpicyNoodle-992 are assigned to the same cluster
VM 0 WobblyPenguin-505 and VM 3 ZestyPancake-680 are assigned to different clusters
VM 0 WobblyPenguin-505 and VM 4 BouncyOctopus-345 are assigned to different clusters
VM 0 WobblyPenguin-505 and VM 5 SpicyToaster-948 are assigned to different clusters


### 2. All servers are partly occupied vs totally empty and all with the same characteristics

We already have the constraint that all servers are partly occupied in the basic model as the cluster capacities are randomly generated. To have all server empty we can now generate cluster with same capacities.

In [372]:
same_clusters = generate_cluster(10, random_=False)
show_clusters(same_clusters)

Cluster ID	Cluster Name             	vCPU	Memory(GB)	Disk Space(GB)
0		us-west3-Eta-809         	64	64		128
1		us-east3-Alpha-751       	64	64		128
2		us-east3-Epsilon-183     	64	64		128
3		us-west2-Delta-486       	64	64		128
4		us-east1-Kappa-940       	64	64		128
5		us-west3-Kappa-596       	64	64		128
6		us-east3-Delta-741       	64	64		128
7		us-west3-Theta-48        	64	64		128
8		us-west2-Alpha-372       	64	64		128
9		us-west3-Kappa-901       	64	64		128


In [373]:
# bin packing basic model
assign_cluster, assign_vm = bin_packing(same_clusters, vms)
pretty_assignment_print(same_clusters, assign_vm, vms)

Number of clusters used: 2
Cluster Name             	CPU usage (%)	Mem usage (%)	Disk usage (%)	Assigned VMs
us-west3-Eta-809         	10.94%		15.62%		4.69%		ZestyPancake-680
us-east3-Alpha-751       	0.00%		0.00%		0.00%		
us-east3-Epsilon-183     	0.00%		0.00%		0.00%		
us-west2-Delta-486       	0.00%		0.00%		0.00%		
us-east1-Kappa-940       	0.00%		0.00%		0.00%		
us-west3-Kappa-596       	0.00%		0.00%		0.00%		
us-east3-Delta-741       	0.00%		0.00%		0.00%		
us-west3-Theta-48        	35.94%		89.06%		50.78%		QuirkyPancake-663, SoggyMuffin-509, BouncyOctopus-345, SpicyToaster-948, SpicyNoodle-992
us-west2-Alpha-372       	0.00%		0.00%		0.00%		
us-west3-Kappa-901       	6.25%		34.38%		18.75%		WobblyPenguin-505


### 3. VMs could be splitted over several servers

VMs could be splitted over several servers using some float variables $x_{ij}$ between 0 and 1. In fact, this variable was already used in the basic model but it was considered as a binary variable. We can now consider it as a float variable. With MILP, this changement is easy to do, we just need to change the type of the variable in the `integrality` parameter.

In [374]:
def splitted_bin_packing(clusters, vms):
    """
    splitted big packing problem:
    Given a set of clusters and a set of VMs, assign a COEFFICIENT for each VM to a cluster such that
    the total resource usage of each cluster does not exceed the capacity of the cluster.
    The objective is to minimize the number of clusters used.

    Input:
    clusters: np.array(str, float, float, float) - a list of clusters with each row representing a cluster and each column representing a resource
    vms: np.array(str, float, float, float) - a list of VMs with each row representing a VM and each column representing a resource

    Output:
    cluster_assignment: np.array(int) - a list of cluster assignments for each VM
    vm_assignment: np.array(float) - a list of coefficients for each VM to each cluster
    """

    # Define sizes
    n_cluster = len(clusters)
    n_vm = len(vms)
    n_constraints = vms.shape[1] - 1 # remove the name column
    n_var = n_cluster + n_cluster*n_vm

    # Define the objective function
    c = np.zeros(n_var)
    c[:n_cluster] = 1

    # Define the constraints
    # constraints on the variables
    A = np.zeros((n_var + n_vm, n_var))
    A[:n_var] = np.eye(n_var)

    # constraints on the coefficients of the VMs
    for i in range(n_vm):
        A[n_var + i, n_cluster + i*n_cluster:n_cluster + (i+1)*n_cluster] = np.ones(n_cluster)

    # constraints on the cluster capacities
    for i in range(n_constraints):
        A = np.concatenate((A, one_constraint_A(clusters, vms, i+1)))
    # A shape = (n_var + n_vm + n_constraints*n_cluster, n_var)

    # Define the upper and lower bounds
    ub = np.concatenate((np.ones(n_var + n_vm), np.zeros(n_cluster*n_constraints)))
    lb = np.concatenate((np.zeros(n_var), np.ones(n_vm), [-np.inf]*(n_cluster*n_constraints)))

    # Define the integrality
    integrality = np.array([True]*n_cluster + [False]*n_cluster*n_vm)

    # Define the constraints as a LinearConstraint object
    constraints = LinearConstraint(A, lb, ub)

    # Solve the problem
    res = milp(c=c, constraints=constraints, integrality=integrality)

    # Number of clusters used
    n_clusters_used = int(sum(res.x[:n_cluster]))
    print(f"Number of clusters used: {n_clusters_used}")

    # Return the results (cluster assignment, vm assignment)
    return res.x[:n_cluster], res.x[n_cluster:].reshape(n_vm, n_cluster)


assign_cluster, assign_vm = splitted_bin_packing(clusters, vms)
pretty_assignment_print(clusters, assign_vm, vms)

Number of clusters used: 4
Cluster Name             	CPU usage (%)	Mem usage (%)	Disk usage (%)	Assigned VMs
us-west1-Epsilon-698     	0.00%		0.00%		0.00%		
us-west2-Iota-294        	0.00%		0.00%		0.00%		
us-west3-Eta-76          	0.00%		0.00%		0.00%		
us-west2-Delta-987       	0.00%		0.00%		0.00%		
us-west1-Epsilon-543     	0.00%		0.00%		0.00%		
us-west2-Theta-546       	0.00%		0.00%		0.00%		
us-west3-Alpha-908       	0.00%		0.00%		0.00%		
us-west3-Gamma-927       	0.00%		0.00%		0.00%		
us-east2-Beta-405        	31.58%		55.56%		58.33%		SoggyMuffin-509, ZestyPancake-680, JollyMuffin-500
us-east2-Theta-519       	38.46%		67.24%		67.74%		QuirkyPancake-663, SpicyToaster-948, GrumpyBanana-870


We can make sure that know VM is splitted over several servers by looking at the coefficient of the variable $x_{ij}$ in the resulting solution. If the coefficient is 1, the VM is not splitted, if it is between 0 and 1, the VM is splitted.

In [375]:
print(assign_vm)

[[ 0.    0.    0.   -0.    0.    0.    0.63  0.    0.    0.37]
 [ 0.    0.   -0.   -0.   -0.    0.    0.    0.    0.    1.  ]
 [ 0.    0.    0.    0.    0.    0.    0.    0.    1.    0.  ]
 [ 0.    0.    0.   -0.    0.    0.    0.    0.    1.    0.  ]
 [ 0.    0.   -0.    0.    0.    0.    0.37  0.    0.42  0.21]
 [ 0.    0.    0.    0.    0.    0.    0.    0.    0.    1.  ]
 [-0.   -0.    0.    0.   -0.    0.    0.    0.    0.7   0.3 ]
 [ 0.    0.    0.    0.    0.    0.82  0.18 -0.    0.    0.  ]
 [ 0.    0.    0.    0.    0.    0.    0.    0.    1.   -0.  ]
 [ 0.    0.    0.    0.    0.    0.   -0.    0.    0.    1.  ]]


### 4. Consider VMs families, each family is given a criticity level between 1 to 3

Now each VM has a criticity level between 1 and 3. We can add a constraint that force the VMs of criticity level 1 and 3 to be in different clusters

In [376]:
vms_critical = generate_vm(10, criticity=True)
show_vms(vms_critical)

VM ID	VM Name                 	vCPU	Memory(GB)	Disk Space(GB)	Criticity
0	WobblyPickle-810        	2	7		19		1
1	GrumpyBanana-105        	6	10		13		3
2	QuirkyBanana-707        	7	13		6		3
3	WobblySasquatch-278     	4	9		18		1
4	JollyToaster-251        	4	19		2		3
5	GrumpyToaster-799       	2	16		30		3
6	SpicyOctopus-341        	5	2		15		3
7	BouncyNoodle-758        	6	17		18		3
8	QuirkyNoodle-444        	7	10		6		1
9	GrumpyBanana-645        	5	4		32		2


In [377]:
def create_non_affinity_constraints_from_critical_vms(vms):
    """
    Create non-affinity constraints from critical vms
    """
    n_vms = len(vms)
    non_affinity_constraints = []
    for i in range(n_vms):
        for j in range(i+1, n_vms):
            if (int(vms[i][4]) == 3 and int(vms[j][4]) == 1) or (int(vms[i][4]) == 1 and int(vms[j][4]) == 3):
                non_affinity_constraints.append((i, j))
    return non_affinity_constraints


def bin_packing_criticity(clusters, vms):
    """ 
    Bin packing problem:
    Given a set of clusters and a set of VMs, assign each VM to a cluster such that
    the total resource usage of each cluster does not exceed the capacity of the cluster.
    
    Input:
    clusters: np.array(str, float, float, float) - a list of clusters with each row representing a cluster and each column representing a resource
    vms: np.array(str, float, float, float) - a list of VMs with each row representing a VM and each column representing a resource

    Output:
    cluster_assignment: np.array(int) - a list of cluster assignments for each VM
    vms_assignment: np.array(int) - a list of VM assignments to each cluster
    """

    # Define sizes
    n_cluster = len(clusters)
    n_vm = len(vms)
    n_constraints = vms.shape[1] - 2  # remove the name column and the criticity column
    n_var = n_cluster + n_cluster*n_vm

    # Define the objective function
    c = np.zeros(n_var)
    c[:n_cluster] = 1

    # Define the constraints
    # constraints on the variables
    A = np.zeros((n_var + n_vm, n_var))
    A[:n_var] = np.eye(n_var)

    # constraints on the coefficients of the VMs
    for i in range(n_vm):
        A[n_var + i, n_cluster + i*n_cluster:n_cluster +
            (i+1)*n_cluster] = np.ones(n_cluster)

    # constraints on the cluster capacities
    for i in range(n_constraints):
        A = np.concatenate((A, one_constraint_A(clusters, vms, i+1)))
    # A shape = (n_var + n_vm + n_constraints*n_cluster, n_var)


    # define non-affinity constraints from critical vms
    non_affinity_constraints = create_non_affinity_constraints_from_critical_vms(vms)

    n_non_affinity_constraints = len(non_affinity_constraints)

    # constraints on the non-affinity constraints
    for vm_index_1, vm_index_2 in non_affinity_constraints:
        A = np.concatenate((A, non_affinity_constraint(
            clusters, vms, vm_index_1, vm_index_2)))
    # A shape = (n_var + n_vm + n_constraints*n_cluster + n_affinity_constraints*n_cluster + n_non_affinity_constraints*n_cluster, n_var)

    # Define the upper and lower bounds
    ub = np.concatenate(
        (np.ones(n_var + n_vm), np.zeros(n_cluster*n_constraints), np.ones(n_cluster*n_non_affinity_constraints)))
    lb = np.concatenate((np.zeros(n_var), np.ones(
        n_vm), [-np.inf]*(n_cluster*n_constraints), [-np.inf]*(n_cluster*n_non_affinity_constraints)))

    # Define the integrality
    integrality = np.array([True]*n_var)

    # Define the constraints as a LinearConstraint object
    constraints = LinearConstraint(A, lb, ub)

    # Solve the problem
    res = milp(c=c, constraints=constraints, integrality=integrality)

    # Number of clusters used
    n_clusters_used = sum([1 for i in res.x[:n_cluster] if i > 0.5])
    print(f"Number of clusters used: {n_clusters_used}")

    # Return the results (cluster assignment, vm assignment)
    return res.x[:n_cluster], res.x[n_cluster:].reshape(n_vm, n_cluster), res

# # change numpy print options
# np.set_printoptions(linewidth=1000)

assign_cluster, assign_vm, res = bin_packing_criticity(clusters, vms_critical)

Number of clusters used: 4


In [378]:
# pretty print the results
pretty_assignment_print(clusters, assign_vm, vms_critical)

Cluster Name             	CPU usage (%)	Mem usage (%)	Disk usage (%)	Assigned VMs
us-west1-Epsilon-698     	0.00%		0.00%		0.00%		
us-west2-Iota-294        	0.00%		0.00%		0.00%		
us-west3-Eta-76          	0.00%		0.00%		0.00%		
us-west2-Delta-987       	0.00%		0.00%		0.00%		
us-west1-Epsilon-543     	0.00%		0.00%		0.00%		
us-west2-Theta-546       	35.00%		91.67%		68.92%		WobblyPickle-810 (1), GrumpyBanana-645 (2)
us-west3-Alpha-908       	27.91%		79.41%		83.78%		GrumpyBanana-105 (3), BouncyNoodle-758 (3)
us-west3-Gamma-927       	0.00%		0.00%		0.00%		
us-east2-Beta-405        	19.30%		35.19%		40.00%		WobblySasquatch-278 (1), QuirkyNoodle-444 (1)
us-east2-Theta-519       	69.23%		86.21%		85.48%		QuirkyBanana-707 (3), JollyToaster-251 (3), GrumpyToaster-799 (3), SpicyOctopus-341 (3)


We can see that the rule to separate the VMs of criticity level 1 and 3 is respected.