In [1]:
#imports
import numpy as np

# Load Balancing With Contraints Example

**Note: Make sure you've already read Part 1, in which we do this problem without constraints. We'll only be explaining the new bits here.**

In this example we're going to show how you could use various approaches to solve a **constrained** load balancing problem. 

For this problem, we're talking execution times on computer processors, with total execution time on certain processors limited to a certain amount. You might see this happen when one processor needs to "reserve" processing cycles for some other job not in our load balancing list. 

We can describe it as:

given a list of $n$ execution times, divide them to be executed on $k$ processors so that the total execution time on each processor is as close to the same as possible, while $y$ constrained processors are under $x$ execution time limit.

There are two kinds of constraints we can implement with our metaheuristic algorithms:
* hard constraints
* soft constraints

We'll start with hard constraints.

## Hard Constraint
A hard constraint is a constraint which rejects any solution that doesn't meet our specifications. We've seen these before.n Pyomo we used hard constraints. We can use hard constraints with some of our metaheuristics methods. We'll use hard constraints with greedy local search and simulated annealing.


### Hard Constraint - Objective Function
For a hard constraint, our objective function remains identical, too. (If you need a refresher on what the move function is doing, please see the Lesson_05_Load_Balancing notebook.)


In [2]:
# original objective function = total squared deviation of times from balanced times
def balance_metric(assign,times,k):
    target = sum(times)/k
    return sum( (sum(times[assign==j])-target)**2 for j in range(k) )


### Hard Constraint - Move Function
Our move function is where we can implement a hard constraint. We'll implement it by first completing the move, then checking to see if the new assignments meet our constraints. If they do not, we'll return the original assignments. If they do, we'll return the new assignments.

To do this, we'll need to pass in two additional parameters:

* conproc - the list of constrained processors
* conmax - the list of max total processing time allowed on each constrained processor

Let's look at the function first.



In [3]:
# define a move function which changes one processor assignment randomly
def reassign_one(assign,k,conproc, conmax):
    # pick one of the jobs and assign it to one of k processors
    n = len(assign)
    # choose a job and a new processor assignment
    which_job = np.random.randint(0,n,1)[0]
    which_proc = np.random.randint(0,k,1)[0]
    #make a copy of the assignments
    new_assign = assign.copy()
    new_assign[which_job] = which_proc
    
    ###################
    # NEW - Evaluate if the new assignments meet our constraint
    over_max = True in [sum(times[new_assign==c]) > conmax[c] for c in conproc]
    # Only return a new assignment if it meets our constraints
    #uncomment this line to see total time on processor
    #print('Total time on each processor (inside function):', [ sum(times[new_assign==j]) for j in range(k)])
    if over_max == False:
        #print('Not over max') #uncomment this line to see if it passed
        return new_assign
    else:
        #print('Over max') #uncomment this line to see if it failed
        return assign


Let's see what that looks like with a simple problem. We'll create some sample data, and run the reassign_one() function, constraining processor 0 to a total processing time of 10. You can uncomment the print statements in the function to see what's happening inside the function, or you can just compare what goes in with what comes out below.


In [4]:
k = 3
times = np.array([2,4,6,2,4,6,2,4,6])
assign=np.array([0,0,0,1,1,1,2,2,2])

# total time on each processor ... should be the same
print('Total time on each processor (going in):', [ sum(times[assign==j]) for j in range(k)])
#reassign one, with processor 0 constrained to 10
new_assign = reassign_one(assign,k, [0], [10])
print('Processor 0 Constrained to 10:', new_assign)
print('Total time on each processor (coming out):', [ sum(times[new_assign==j]) for j in range(k)])

Total time on each processor (going in): [12, 12, 12]
Processor 0 Constrained to 10: [0 0 0 1 1 1 2 2 2]
Total time on each processor (coming out): [12, 12, 12]


### Greedy Local Search - Hard Constraint

To implement the hard constraint in our greedy local search, we'll use our new move function. To use that, we need our two additional parameters, so we'll update our load_balance_local function to take in 2 additional parameters:

* conproc - a list of the processors to constrain
* conmax - a list of the max times on each processor.

We'll also track whether the algorithm ever finds a solution that meets the constraints. It's possible with a hard constraint that we never find a solution that works.


In [5]:
# local search function
def load_balance_local(times, k, max_no_improve,conproc,conmax):
    n = len(times)
    # starts from a random assignment to k processors
    current_x = np.random.randint(low=0,high=k,size=n)
    current_f = balance_metric(current_x, times, k)
    best_x = current_x
    best_f = current_f
    ##########################
    # New - track convergence
    converged = False
    ##########################
    # stop search if no better x is found within max_no_improve iterations
    num_moves_no_improve = 0
    iterations = 0
    while (num_moves_no_improve < max_no_improve):
        num_moves_no_improve += 1
        iterations += 1  # just for tracking
        ##################################
        # NEW - pass the extra parameters to reassign_one
        new_x = reassign_one(current_x,k,conproc,conmax)
        ##################################
        new_f = balance_metric(new_x, times, k)
        if new_f < current_f:
            #################################
            #NEW - track if we ever accept a solution
            converged = True
            #################################      
            num_moves_no_improve = 0
            current_x = new_x
            current_f = new_f
            if current_f < best_f:  
                best_x = current_x  
                best_f = current_f
    return best_x, best_f, iterations, converged

Let's run this with a small number of processors and a small number of job execution times. First let's generate some random data and see what the time on each processor would be if it loads were completely balanced.

In [6]:
# generate random job times
np.random.seed(666) #comment this out to play with new numbers
#we'll start with 20 execution times
n = 30
#we'll start with 2 processors
k = 3
min_time = 20
max_time = 200
times = np.random.randint(low=min_time, high = max_time, size = n)
assign = np.random.randint(low=0,high=k,size=n)
# total time on each processor
print('Total time on each processor, if completely balanced:', sum(times)/k)


Total time on each processor, if completely balanced: 1220.6666666666667


#### Running local search with constraints

Let's start with setting processor 0 to be constrained to a max processing time of 1100. Run this code several times. How often do you get convergence?

In [7]:
#####################
# NEW: adding our 2 additional parameters to the function and one additional return variable
#####################
best_assign, best_f, num_iter, converged = load_balance_local(times,k,5000,[0],[1100]) 
print('The algorithm found a solution that met the criteria:', converged)
print('The best assignment is', best_assign)
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
print('The deviation from balance is', best_f)
print('It took', num_iter, 'iterations.')

The algorithm found a solution that met the criteria: False
The best assignment is [2 0 0 2 0 0 1 1 1 1 1 1 0 0 0 2 1 2 1 0 0 1 2 0 0 2 1 1 2 2]
Total time on each processor: [1353, 1373, 936]
The deviation from balance is 121752.66666666666
It took 5000 iterations.


What if we wanted to constrain 2 of our processors? Easy! We just add to our conproc and conmax lists. This time, let's constrain processor 0 to a max time of 1200 and processor 1 to a max time of 1100. Again, run this code multiple times and see how often the algorithm converges.

In [8]:
best_assign, best_f, num_iter, converged = load_balance_local(times,k,5000,[0,1],[1200,1100]) #adding our 2 additional parameters here
print('The algorithm found a solution that met the criteria:', converged)
print('The best assignment is', best_assign)
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
print('The deviation from balance is', best_f)
print('It took', num_iter, 'iterations.')

The algorithm found a solution that met the criteria: False
The best assignment is [2 1 2 2 2 1 1 2 2 1 2 1 2 2 1 1 2 0 0 0 1 1 1 0 1 2 2 1 1 0]
Total time on each processor: [749, 1391, 1522]
The deviation from balance is 342284.6666666667
It took 5000 iterations.


### Simulated Annealing - By Hand - Hard Constraints

We can take the same hard constraint approach with our hand-coded simulated annealing problem. Once again, we'll add 2 parameters to our custom_simanneal function:

* conproc - a list of the processors to constrain
* conmax - a list of the max times on each processor.

And once again we'll pass back a convergence variable to let us know if we ever found a solution that matched our constraints.

We'll use the same set of jobs from the previous example so you can compare. 

In [9]:

def custom_simanneal(times, k, max_no_improve, temp, alpha, conproc, conmax):
    #get the length of our jobs
    n = len(times)
    # starts from a random assignment to k processors
    current_x = np.random.randint(low=0,high=k,size=n)
    current_f = balance_metric(current_x, times, k)
    best_x = current_x
    best_f = current_f
    
    #this is just for tracking
    iterations = 1
    trajectory = [[iterations,current_f]]
    trajectory_best = [[iterations,best_f]]
    ##########################
    # New - track convergence
    converged = False
    ##########################

    # stop search if no better x is found within max_no_improve iterations
    num_moves_no_improve = 0
    while (num_moves_no_improve < max_no_improve):
        num_moves_no_improve += 1
        iterations += 1  # just for tracking
        ###################################
        #NEW - add the 2 extra parameters
        new_x = reassign_one(current_x,k, conproc, conmax)
        ###################################
        new_f = balance_metric(new_x, times, k)
      
        #determine the change in score
        delta = new_f - current_f
        #determine the probability of accepting this solution
        prob = np.exp(min(delta, 0) / temp)
        
        #determine if we'll accept this solution
        accept = new_f < current_f or np.random.uniform() < prob          
        if accept:   
            current_x = new_x
            current_f = new_f
            if current_f < best_f:  
                #################
                #New - track if we ever got a better solution than the first
                converged = True
                #################
                best_x = current_x  
                best_f = current_f
                num_moves_no_improve = 0
        temp *= alpha
        iterations += 1
        trajectory.append([iterations,current_f])
        trajectory_best.append([iterations,best_f])        
    return best_x, best_f, iterations, trajectory, trajectory_best,converged ####NEW: Return extra variable
    

#######
# New - add the 2 extra parameters
best_x, best_f, iterations, trajectory, trajectory_best, converged = custom_simanneal(times, k, 1000, 500, .99, [0],[1100])

print('The algorithm found a solution that met the criteria:', converged)
print('The best assignment is', best_f)
print('Total time on each processor:', [ sum(times[best_x==j]) for j in range(k)])
print('The deviation from balance is', best_f)

The algorithm found a solution that met the criteria: False
The best assignment is 383660.6666666667
Total time on each processor: [1466, 1481, 715]
The deviation from balance is 383660.6666666667


### The simanneal Package - Hard Constraints
We can also use a hard constraint with the simanneal package. As a reminder, with simanneal, you don't use external functions. You add your code within the package's move and energy functions. To use a hard constraint in simanneal, you'd enforce the constraint in the **move** function, just like we did with our hand-coding.

We again need our two extra variables:
* conproc - a list of the processors to constrain
* conmax - a list of the max times on each processor.

But this time we'll pass them into the initialization function of the simanneal package.


Let's see what that looks like.



In [10]:
#this line just imports the package
from simanneal import Annealer

#this is the line where we decide what we're calling this problem
class loadProblem(Annealer):

    # Here's where we pass extra data if we need it. We need to pass our times (jobs) variable and the number of servers (k)
    ##############################
    #NEW - add 2 extra parameters
    def __init__(self, state, times, k, conproc, conmax):
        ###############################
        #this line makes the times accessible within the other two functions
        self.times = times
        self.k = k
        ###########################
        # New Set up 2 new variables
        self.conproc = conproc
        self.conmax = conmax
        ###########################
        #this is how we initialize - note we're calling super with the same name as above (loadProblem)
        super(loadProblem, self).__init__(state)  # important!

    def move(self):
        """This corresponds to our previous reassign one function"""
        # pick one of the jobs and assign it to one of k processors
        
        #############################
        #NEW - We have to COPY the state
        assign = self.state.copy()
        n = len(assign)
        k = self.k
        # choose a job and a new processor assignment
        which_job = np.random.randint(0,n,1)[0]
        which_proc = np.random.randint(0,k,1)[0]
        assign[which_job] = which_proc
        
        #################################################
        # NEW - hard constraint enforcement
        over_max = True in [sum(self.times[assign==c]) > self.conmax[c] for c in self.conproc]
        # Only update the state if it meets our requirements
        if over_max == False:
            #we only update the state if we've met our constraints
            self.state = assign
        
    
    def energy(self):
        """This corresponds to our balance_metric function"""
        times = self.times
        assign = self.state
        k = self.k
        target = sum(times)/k
        return sum( (sum(times[assign==j])-target)**2 for j in range(k) )


#initialize the class
ld = loadProblem(assign, times, k, [0], [1100])
ld.set_schedule(ld.auto(minutes=.2)) #set approximate time to find results

# since our state is a numpy array, we need deepcopy
ld.copy_strategy = "deepcopy" 
#this is what kicks it off
best_assign, best_score = ld.anneal()



print('The best set is: ', best_assign)
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
print('The best score is:', best_score) 

 Temperature        Energy    Accept   Improve     Elapsed   Remaining
   260.00000      28648.67    56.05%     0.00%     0:00:03     0:00:002 Temperature        Energy    Accept   Improve     Elapsed   Remaining
   260.00000      22858.67    55.42%     0.00%     0:00:13     0:00:003

The best set is:  [0 2 1 2 1 0 2 1 1 0 1 2 0 0 0 2 1 1 0 2 1 2 0 1 2 0 1 2 0 2]
Total time on each processor: [1100, 1280, 1282]
The best score is: 21842.666666666664


**Note:** Simanneal evaluates the problem space before running. If your constraint is set too low, simanneal will print the first pink line, and then just hang. If you're playing with this and it gets stuck, you'll need to restart your kernel and loosen your constraints.

## Soft-Constraints
As we've seen, sometimes with hard constraints you fail to get a viable solution, and we can't even get closer to a viable solution because we're rejecting any option that doesn't meet the constraint. Soft constraints fix that problem. We won't always get a solution that meets the constraint, but the algorithm will at least have a chance to get closer to an optimal solution. The Big M approach is a type of soft constraint.

In the metaheuristic algorithms we're exploring, soft constraints are implemented in the objective function. Instead of rejecting a solution outright, a penalty is incorporated. For a minimization problem, a positive number is added when the constraint isn't met. For a maximization problem, a negative number is added. In our code, we're adding a multiplier to the penalty. The larger the multipler, the "harder" the soft constraint.

Let's look at what this would look like with a hand-solved problem.

We'll keep our original objective function (balanced_metric), but we'll add a new wrapper function (balanced_metric_constrained). This one will take in 2 additional parameters:
* a list of constrained processors
* a list of the max times on each processor

In [11]:

# constrained objective function = total squared deviation of times from balanced times, providing a penalty for constraints
def balance_metric_constrained(assign,times,k,conproc,conmax):
    #sum the unconstrained processor deviation
    dev_uncon = balance_metric(assign,times,k)
    #sum the constrained processors
    penalty_multiplier = 5
    dev_penalty = penalty_multiplier * sum( max(sum(times[assign==c])-conmax[c],0)**2 for c in conproc )

    return dev_uncon + dev_penalty

### Testing the Soft Constraint

We'll test our two functions with some hand-coded assignments. We'll use 9 jobs on 3 processors. First we'll look at them as an unconstrained, perfectly balanced problem.

In [12]:
#testing perfectly balanced unconstrained
k = 3
times = np.array([2,4,6,2,4,6,2,4,6])
assign=np.array([0,0,0,1,1,1,2,2,2])

# total time on each processor ... should be the same
print('Total time on each processor:', [ sum(times[assign==j]) for j in range(k)])
#print the original balance metric
print('Unconstrained Balance Metric:', balance_metric(assign,times,k))

Total time on each processor: [12, 12, 12]
Unconstrained Balance Metric: 0.0


Now we'll add some constraints. Note that neither our times nor assignments are changing. But, we're essentially changing the target for some of our processors. We're going to set processor 0 to a max limit of 10.

In [13]:
# total time on each processor has not changed
print('Total time on each processor (has not changed):', [ sum(times[assign==j]) for j in range(k)])
#Constrain processor 1 to 10
print('Constrained Balance Metric:', balance_metric_constrained(assign,times,k,[0],[10]))

Total time on each processor (has not changed): [12, 12, 12]
Constrained Balance Metric: 20.0


With the constraint in place, what was a completely balanced solution no longer looks so great. What would happen if we switch our assignments around some?

In [14]:
#new assignments
assign=np.array([1,0,0,1,1,1,2,2,2])
print('Total time on each processor (has changed):', [ sum(times[assign==j]) for j in range(k)])

#check the unconstrained balance metric
print('Balance Metric without constraints', balance_metric(assign,times,k))
#check the constrained balance metric
print('Constrained Balance Metric:', balance_metric_constrained(assign,times,k,[0],[10]))

Total time on each processor (has changed): [10, 14, 12]
Balance Metric without constraints 8.0
Constrained Balance Metric: 8.0


Here, we've met our constraint, so our unconstrained and constrained balance metrics match.

### The simanneal Package - Soft Constraint
With simanneal, instead of adding our hard constraint to the move() function, we'd add our soft constraint to the energy() function. 


In [15]:
#this line just imports the package
from simanneal import Annealer

#this is the line where we decide what we're calling this problem
class loadProblem(Annealer):

    # Here's where we pass extra data if we need it. We need to pass our times (jobs) variable and the number of servers (k)
    ##############################
    #NEW - add 2 extra parameters
    def __init__(self, state, times, k, conproc, conmax):
        ###############################
        #this line makes the times accessible within the other two functions
        self.times = times
        self.k = k
        ###########################
        # New Set up 2 new variables
        self.conproc = conproc
        self.conmax = conmax
        ###########################
        #this is how we initialize - note we're calling super with the same name as above (loadProblem)
        super(loadProblem, self).__init__(state)  # important!

    def move(self):
        """This corresponds to our previous reassign one function"""
        # pick one of the jobs and assign it to one of k processors
        ##################################
        #NEW - back to just changing the state directly
        assign = self.state
        n = len(assign)
        k = self.k
        # choose a job and a new processor assignment
        which_job = np.random.randint(0,n,1)[0]
        which_proc = np.random.randint(0,k,1)[0]
        assign[which_job] = which_proc

        
    
    def energy(self):
        """This corresponds to our balance_metric function"""
        times = self.times
        assign = self.state
        k = self.k
        conproc = self.conproc
        conmax = self.conmax
        ############################################
        #NEW - determing the energy and assign a penalty
        target = sum(times)/k
        #sum the unconstrained processor deviation
        dev_uncon = sum( (sum(times[assign==j])-target)**2 for j in range(k) )
        #sum the constrained processors
        penalty_multiplier = 5
        dev_penalty = penalty_multiplier * sum( max(sum(times[assign==c])-conmax[c],0)**2 for c in conproc )

        return dev_uncon + dev_penalty

Let's generate some more test data and run our soft constraint version of simanneal.

In [16]:
# generate random job times
np.random.seed(666) #comment this out to play with new numbers
#we'll start with 20 execution times
n = 30
#we'll start with 2 processors
k = 3
min_time = 20
max_time = 200
times = np.random.randint(low=min_time, high = max_time, size = n)
assign = np.random.randint(low=0,high=k,size=n)
# total time on each processor
print('Total time on each processor, if completely balanced:', sum(times)/k)


Total time on each processor, if completely balanced: 1220.6666666666667


With completely balanced loads, we'd have 1220-ish on each processor. We'll set processor 0's constraint to 1100. Run this code several times. How often do you meet the constraint? How balanced does the workload seem compared to our hard constraint version?

In [17]:
#initialize the class
ld = loadProblem(assign, times, k, [0], [1100])
ld.set_schedule(ld.auto(minutes=.2)) #set approximate time to find results

# since our state is a numpy array, we need deepcopy
ld.copy_strategy = "deepcopy" 
#this is what kicks it off
best_assign, best_score = ld.anneal()



print('The best set is: ', best_assign)
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])
print('The best score is:', best_score) 

 Temperature        Energy    Accept   Improve     Elapsed   Remaining
   450.00000      17054.67    33.25%     0.00%     0:00:03    -1:59:591 Temperature        Energy    Accept   Improve     Elapsed   Remaining
   450.00000      17184.67    34.52%     0.16%     0:00:13     0:00:001

The best set is:  [0 1 0 0 1 1 1 2 1 1 1 2 2 2 2 2 2 1 1 0 0 0 1 2 0 0 2 2 2 0]
Total time on each processor: [1128, 1266, 1268]
The best score is: 16802.666666666668


## Genetic Algorithm with DEAP - Soft Constraints

Again with DEAP we'll do a soft constraint in our energy function. This requires a few small, but important changes. First, the things that stay the same. 
* Our create_individual() function
* Our custom_ga() function

Neither of these change at all and we can just copy/paste the code from the load balance without constraints example.

In [18]:
# No changes to this function
def create_individual(k,n):
    current_x = np.random.randint(low=0,high=k,size=n)
    return current_x.tolist() #this converts our np array back to a list


# no changes here, call this to execute the genetic algorithm
def customGA(in_toolbox,in_tools,in_stats,pop_size, cx_prob, mut_prob, max_gen, max_no_improve):

    pop = in_toolbox.population(n=pop_size)
    logbook = in_tools.Logbook()
    hof = in_tools.HallOfFame(1)

    # Evaluate the entire population
    fitnesses = list(map(in_toolbox.evaluate, pop))
    for ind, fit in zip(pop, fitnesses):
        ind.fitness.values = fit

    hof.update(pop)
    best_val = hof[0].fitness.values
    num_no_improve = 0
    generation = 0

    while num_no_improve < max_no_improve and generation < max_gen:

        # Select the next generation individuals
        selected = in_toolbox.select(pop, len(pop))
        # Clone the selected individuals
        offspring = list(map(in_toolbox.clone, selected))

        # Apply crossover and mutation on the offspring
        for child1, child2 in zip(offspring[::2], offspring[1::2]):
            if random.random() < cx_prob:
                in_toolbox.mate(child1, child2)
                del child1.fitness.values
                del child2.fitness.values

        for mutant in offspring:
            if random.random() < mut_prob:
                in_toolbox.mutate(mutant)
                del mutant.fitness.values

        # Evaluate the individuals with an invalid fitness
        invalid_ind = [ind for ind in offspring if not ind.fitness.valid]
        fitnesses = map(in_toolbox.evaluate, invalid_ind)
        num_evals = 0
        for ind, fit in zip(invalid_ind, fitnesses):
            num_evals += 1
            ind.fitness.values = fit

        # The population is entirely replaced by the offspring
        pop[:] = offspring
        
        # track the best value and reset counter if there is a change
        hof.update(pop)
        curr_best_val = hof[0].fitness.values[0]
        num_no_improve += 1
        if curr_best_val != best_val:
            best_val = curr_best_val
            num_no_improve = 0

        # record stats
        record = in_stats.compile(pop)
        logbook.record(gen=generation, evals=num_evals, **record)

        # increment generation
        generation += 1

    best_x = list(hof[0])

    return best_val, best_x, logbook

Our balance_metric_tuple function does need to be updated. We need to take in the 2 additional parameters (do you have these down yet?):
* conproc - a list of constrained processors
* conmax - a list of the max times on each processor
 

In [19]:
# objective function = total squared deviation of times from balanced times
def balance_metric_tuple(assign,times,k,conproc, conmax):
    #make the list a numpy array
    assign_np = np.array(assign)
    ## call the balance_metric function
    metric = balance_metric_constrained(assign_np, times, k, conproc, conmax)
    return (metric, ) #note that we're returning a tuple

#let's test this function
balance_metric_tuple(assign,times,k, [0], [10])

(3058021.6666666665,)

The only other thing we'll need to change is how we set up our evaluate function. Most of this code is identical to what you've seen before. Note the one changed line

In [20]:
import random
from deap import base
from deap import creator
from deap import tools
from functools import partial

creator.create("FitnessLoad", base.Fitness, weights=(-1.0,))
creator.create("Individual", list, fitness=creator.FitnessLoad)
toolbox = base.Toolbox()
toolbox.register("assignments",create_individual,k,n)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.assignments)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
###############################
#NEW - this line needs additional parameters
###############################
toolbox.register("evaluate", balance_metric_tuple, times=times, k=k, conproc=[0], conmax=[10])
toolbox.register("select", tools.selTournament, tournsize=3)
toolbox.register("mate", tools.cxTwoPoint) 
toolbox.register("mutate", tools.mutUniformInt, low = 0, up = k-1, indpb=0.1)
stats = tools.Statistics(key=lambda ind: ind.fitness.values)
stats.register("avg", np.mean)
stats.register("std", np.std)
stats.register("min", np.min)
stats.register("max", np.max)

# define search parameters
pop_size = 200
crossover_prob = 0.3
mutation_prob = 0.5
max_gen = 2000
max_no_improve = 200

# get solution
best_balance, best_assign, log = customGA(toolbox,tools,stats, pop_size, crossover_prob, mutation_prob,
                                     max_gen, max_no_improve)

print('Genetic Algorithm Best Result', best_balance)
print('Total time on each processor:', [ sum(times[np.array(best_assign)==j]) for j in range(k)])

Genetic Algorithm Best Result 1691209.6666666667
Total time on each processor: [289, 1686, 1687]


## Increasing Problem Size

We used a very small problem to demonstrate each of the methods above. Now it's time to create a much larger problem and see how our algorithms perform. 

We'll also set conproc and conmax to constrain 2 of our 10 processes.

In [21]:
####################################
# Setting up a new bigger problem
####################################
n = 1000
k = 10
#we're going to set some min/max times here for the jobs
min_time = 20
max_time = 200
#randomly generate some jobs
times = np.random.randint(low=min_time, high = max_time, size = n)

# total time on each processor
print('Total time on each processor, if completely balanced:', sum(times)/k)

conproc = [0,1]
conmax=[8000,8000]

Total time on each processor, if completely balanced: 11422.3


### Baseline
Let's see what our baseline deviation from balanced loads is with a size this large. (Note, there's randomness here and some algorithms set their own baseline. But this should give us a general idea.)

In [22]:
#get the baseline
baseline = balance_metric(np.random.randint(low=0,high=k,size=n),times,k)
print('Baseline with random assignments:', baseline)

Baseline with random assignments: 14584898.099999998


### Greedy local search
The only parameter we can fiddle with in our greedy local search is how many iterations we're willing to go with no improvement. Try changing the 5000 number to see if it gets better results

* max_no_improve = 5000

In [23]:
#### Greedy Local Search #####
#####################
#Parameters
max_no_improve = 5000
#####################
best_assign, best_f, num_iter, converge = load_balance_local(times,k,max_no_improve,conproc,conmax)
print('Greedy Local Search best result:', best_f)
print('The algorithm found a solution that met the criteria:', converged)
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])

Greedy Local Search best result: 14953446.099999998
The algorithm found a solution that met the criteria: False
Total time on each processor: [11869, 11878, 10088, 10630, 13558, 12060, 9668, 12811, 9975, 11686]


### Custom Simulated Annealing
For our custom simulated annealing, we can tweak the following parameters:
* max_no_improve = 1000
* temp = 500
* alpha = .99 

Try tweaking these parameters to see if you can get a better result

In [24]:
#### Custom Simulated Annealing ####
#####################
#Parameters
max_no_improve = 1000
temp = 500 
alpha = .99
#####################


best_x, best_f, iterations, trajectory, trajectory_best, converge = custom_simanneal(times, k, max_no_improve, temp, alpha, conproc, conmax)
print('Custom Simulated Annealing best result:', best_f)
print('The algorithm found a solution that met the criteria:', converged)
print('Total time on each processor:', [ sum(times[best_x==j]) for j in range(k)])

Custom Simulated Annealing best result: 10658776.1
The algorithm found a solution that met the criteria: False
Total time on each processor: [9874, 11557, 12010, 10912, 11250, 13744, 11432, 10596, 12300, 10548]


### Simanneal Package
The only parameter you can tweak in the simanneal package is how long you're willing to wait. Try changing that to see if you can get a better result.
* wait_time = .2

In [25]:
#### Simanneal Package ####
#####################
#Parameters
wait_time = .2
#####################

assign = np.random.randint(low=0,high=k,size=n)
ld = loadProblem(assign, times, k, conproc, conmax)
ld.set_schedule(ld.auto(minutes=wait_time)) 
ld.copy_strategy = "deepcopy" 
best_assign, best_score = ld.anneal()
print('Simanneal Package best result', best_score)
print('Total time on each processor:', [ sum(times[best_assign==j]) for j in range(k)])

 Temperature        Energy    Accept   Improve     Elapsed   Remaining
   200.00000   23424942.10     9.70%     0.00%     0:00:27    -1:59:5059 Temperature        Energy    Accept   Improve     Elapsed   Remaining
   200.00000   23424881.10    12.29%     0.29%     0:00:14     0:00:0003

Simanneal Package best result 23424881.1
Total time on each processor: [8685, 8680, 12102, 12109, 12104, 12104, 12093, 12116, 12111, 12119]


### DEAP Genetic Algorithm
DEAP has a lot of parameters to tweak. Try tweaking some of the following to see if you can get a better result.

* pop_size = 200
* crossover_prob = 0.3
* mutation_prob = 0.5
* max_gen = 2000
* max_no_improve = 200

(*Note*: that we need to repeat a lot of code when we're changing the problem space with DEAP. DEAP hard-codes the k and n in our functions when we set it up, so we need to essentially start from scratch. We've included all the necessary code without comments in the cell below.) 

**Warning**: This code will be slow to run.

In [27]:
#### DEAP Genetic Algorithm ####
####################
#Parameters
pop_size = 200
crossover_prob = 0.3
mutation_prob = 0.5
max_gen = 2000
max_no_improve = 200
#####################


###################################
# Leave everything below here alone
###################################

# how we create our individuals
def create_individual(k,n):
    current_x = np.random.randint(low=0,high=k,size=n)
    return current_x.tolist() #this converts our np array back to a list

# objective function = total squared deviation of times from balanced times
def balance_metric_tuple(assign,times,k,conproc, conmax):
    #make the list a numpy array
    assign_np = np.array(assign)
    ## call the balance_metric function
    metric = balance_metric_constrained(assign_np, times, k, conproc, conmax)
    return (metric, ) #note that we're returning a tuple

creator.create("FitnessLoad", base.Fitness, weights=(-1.0,))
creator.create("Individual", list, fitness=creator.FitnessLoad)
toolbox = base.Toolbox()
toolbox.register("assignments",create_individual,k,n)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.assignments)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("evaluate", balance_metric_tuple, times=times, k=k, conproc=conproc,conmax=conmax)
toolbox.register("select", tools.selTournament, tournsize=3)
toolbox.register("mate", tools.cxTwoPoint) 
toolbox.register("mutate", tools.mutUniformInt, low = 0, up = k-1, indpb=0.1)
stats = tools.Statistics(key=lambda ind: ind.fitness.values)
stats.register("avg", np.mean)
stats.register("std", np.std)
stats.register("min", np.min)
stats.register("max", np.max)

# get solution
best_balance, best_assign, log = customGA(toolbox,tools,stats, pop_size, crossover_prob, mutation_prob,
                                     max_gen, max_no_improve)

print('Genetic Algorithm Best Result', best_balance)
print('Total time on each processor:', [ sum(times[np.array(best_assign)==j]) for j in range(k)])

Genetic Algorithm Best Result 23434999.1
Total time on each processor: [8671, 8694, 12168, 12129, 12078, 12098, 12115, 12046, 12104, 12120]
