---

# IT00CJ42: Search and Optimization Algorithms

**Restart the kernel and run all cells** before you turn this problem in, and make sure everything runs as expected.

Make sure you fill in any place that says `YOUR CODE HERE`.

---

## Week 7: Genetic Algorithm
The task for this week is to implement a genetic algorithm for the test selection problem (see the week 2 task). The goal is to design a test selection algorithm that, given a set of tests T, selects a subset of T such that:

* the total execution time does not exceed a given value, representing our maximum testing time
* the total code coverage is maximized
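
Written as an optimization problem, the task reads roughly as follows. The symbols are introduced here only for illustration: $x_i \in \{0, 1\}$ marks whether test $i$ is selected, $t_i$ is its execution time, $C_i$ is the set of code lines it covers, $n$ is the number of tests, and $T_{\max}$ is the maximum testing time. The constraint uses $\le$ because the provided `fitness` function accepts a total time equal to the limit.

$$
\max_{x \in \{0,1\}^{n}} \; \Bigl|\, \bigcup_{i \,:\, x_i = 1} C_i \,\Bigr|
\quad \text{subject to} \quad
\sum_{i=1}^{n} x_i \, t_i \le T_{\max}
$$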

First, examine the Python code in the file ga.py. It already implements the main loop of the genetic algorithm, as well as a Python class that represents our test database and a fitness function. This code is given as a reference. You can use it as is or change it if that helps you.

Your tasks are:

1. Implement the selection operator.  (1p)

2. Implement the crossover operator.  (2p)

3. Implement the mutation operator.  (1p)

4. Solve the problems in the files problem1.txt, problem2.txt and problem3.txt and report the maximum fitness achieved and the number of generations needed to find it.  (1p)

### Imports

In [10]:
import random

import numpy as np
from numpy.random import rand, randint

### Solutions check
We use the function **check** to implement tests for your solution.

In [11]:
def check(expression, message=""):
    if not expression:
        raise AssertionError(message)
    return "Passed"

### Test database
This is already implemented.

In [12]:
class TestDatabase:
    # This class represents our test database
    # It is already implemented
    def __init__(self):
        self.coverage = []
        self.time = []

    def get_time(self, i):
        return self.time[i]

    def get_coverage(self, i):
        return self.coverage[i]

    def get_number_of_tests(self):
        return len(self.time)

    def init_random(self, n_tests, max_time, max_code, p=0.05):
        # initialize a test database randomly
        self.coverage = []
        self.time = []
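        for _ in range(n_tests):
            # numpy's randint excludes the upper bound: times fall in [1, max_time)
            t_time = randint(1, max_time)
            t_coverage = []
            for line in range(1, max_code + 1):
                # each code line is covered with probability p
                if rand() < p:
                    t_coverage.append(line)
            self.time.append(t_time)
            self.coverage.append(t_coverage)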

    def load_from_file(self, fn):
        # load test database from file with name fn
        # file format:
        #  line i represents test i as comma separated value
        #   the first value is the time to execute the test
        #   the other values are the lines covered by the test
        #  Example
        #   5, 1, 5, 7   -> A test that covers lines 1,5,7 and takes 5 seconds to run
        self.coverage = []
        self.time = []
        with open(fn, "rt") as fd:
            for line in fd:
                words = line.split(",")
                self.time.append(float(words[0]))
                self.coverage.append([int(w) for w in words[1:]])

    def write_to_file(self, fn):
        # write the test database to a file with name fn
        with open(fn, "wt") as fd:
            for time, coverage in zip(self.time, self.coverage):
                fd.write(str(time))
                for i in coverage:
                    fd.write(", " + str(i))
                fd.write("\n")
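
To make the file format concrete, here is a minimal sketch that parses one line the same way `load_from_file` does. The helper name `parse_test_line` is hypothetical and is not part of ga.py.

```python
def parse_test_line(line):
    # Hypothetical helper: mirrors the per-line logic of load_from_file.
    words = line.split(",")
    # First value is the execution time, the rest are covered line numbers.
    # float() and int() both tolerate surrounding spaces and the newline.
    return float(words[0]), [int(w) for w in words[1:]]


print(parse_test_line("5, 1, 5, 7\n"))  # (5.0, [1, 5, 7])
```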

### Fitness Function
This is already implemented.

In [13]:
def fitness(x, db, max_time):
    coverage = set()
    total_time = 0
    for i in range(len(x)):
        if x[i] == 1:
            coverage = coverage.union(db.get_coverage(i))
            total_time = total_time + db.get_time(i)
    if total_time <= max_time:
        return len(coverage)
    else:
        return 0
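
The behavior of the fitness function is easiest to see on a tiny example. The sketch below repeats the same logic so that it runs on its own. `ToyDatabase` and its three hard-coded tests are invented for illustration and are not part of the assignment data.

```python
class ToyDatabase:
    """Hypothetical stand-in for TestDatabase with three hard-coded tests."""

    def __init__(self):
        self.time = [4, 3, 5]
        self.coverage = [[1, 2], [2, 3], [4, 5, 6]]

    def get_time(self, i):
        return self.time[i]

    def get_coverage(self, i):
        return self.coverage[i]


def fitness(x, db, max_time):
    # Same logic as the provided fitness function, repeated so the sketch runs alone.
    coverage = set()
    total_time = 0
    for i in range(len(x)):
        if x[i] == 1:
            coverage = coverage.union(db.get_coverage(i))
            total_time = total_time + db.get_time(i)
    return len(coverage) if total_time <= max_time else 0


db = ToyDatabase()
print(fitness([1, 1, 0], db, max_time=8))  # 3: time 7, lines {1, 2, 3}
print(fitness([1, 0, 1], db, max_time=8))  # 0: time 9 exceeds the budget
print(fitness([0, 0, 1], db, max_time=8))  # 3: time 5, lines {4, 5, 6}
```

Note that line 2 is covered by both of the first two tests but is counted once, and that any selection over the time budget scores 0 regardless of its coverage.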

### Genetic algorithm
This is already implemented (can be changed)

In [14]:
def genetic_algorithm(db, maxtime, fitness, n_bits, n_iter, n_pop, r_cross, r_mut):
    # initial population of random bitstring
    population = [randint(0, 2, n_bits).tolist() for _ in range(n_pop)]
    # keep track of best solution
    best, best_eval = population[0], fitness(population[0], db, maxtime)
    # enumerate generations
    for gen in range(n_iter):
        # evaluate all candidates in the population
        scores = [fitness(c, db, maxtime) for c in population]
        # check for new best solution
        for i in range(n_pop):
            if scores[i] > best_eval:
                best, best_eval = population[i], scores[i]
        # select parents
        selected = [selection(population, scores) for _ in range(n_pop)]  # selection

        # create the next generation
        children = list()
        for i in range(0, n_pop, 2):
            # get selected parents in pairs (this assumes n_pop is even)
            p1, p2 = selected[i], selected[i + 1]

            # crossover
            crossover_ = crossover(p1, p2, r_cross)

            for c in crossover_:
                # mutation
                c = mutation(c, r_mut)

                # store for next generation
                children.append(c)
        # replace population
        population = children
    return [best, best_eval]
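
Task 4 also asks for the number of generations needed to reach the best fitness, which the loop above does not record. One possible way to track it is sketched below on a toy problem. All names here (`onemax`, `tournament`, `one_point_crossover`, `bit_flip`, `ga_with_generation`) are made up for this sketch, and the stand-in operators are assumptions rather than the reference solution.

```python
import random


def onemax(bits):
    # Toy fitness used only in this sketch: the number of ones.
    return sum(bits)


def tournament(population, scores, k=3):
    # Pick k distinct random individuals and return the fittest one.
    contenders = random.sample(range(len(population)), k)
    return population[max(contenders, key=lambda i: scores[i])]


def one_point_crossover(p1, p2, r_cross):
    c1, c2 = p1.copy(), p2.copy()
    if random.random() < r_cross:
        point = random.randrange(1, len(p1))
        c1 = p1[:point] + p2[point:]
        c2 = p2[:point] + p1[point:]
    return [c1, c2]


def bit_flip(bits, r_mut):
    # Return a mutated copy instead of changing the list in place.
    return [1 - b if random.random() < r_mut else b for b in bits]


def ga_with_generation(score_fn, n_bits, n_iter, n_pop, r_cross, r_mut):
    population = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(n_pop)]
    best, best_eval, best_gen = population[0], score_fn(population[0]), 0
    for gen in range(n_iter):
        scores = [score_fn(c) for c in population]
        for candidate, score in zip(population, scores):
            if score > best_eval:
                # Remember the generation in which the improvement appeared.
                best, best_eval, best_gen = candidate, score, gen
        selected = [tournament(population, scores) for _ in range(n_pop)]
        children = []
        for i in range(0, n_pop, 2):  # n_pop is assumed to be even
            for child in one_point_crossover(selected[i], selected[i + 1], r_cross):
                children.append(bit_flip(child, r_mut))
        population = children
    return best, best_eval, best_gen


random.seed(0)
best, best_eval, best_gen = ga_with_generation(
    onemax, n_bits=20, n_iter=50, n_pop=20, r_cross=0.9, r_mut=1 / 20
)
print(f"best fitness {best_eval} first reached in generation {best_gen}")
```

The same idea carries over to `genetic_algorithm`: store `gen` alongside `best` whenever `best_eval` improves and return it as a third value. Generations are counted from 0 here.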

### Task 1: Selection Method

In [15]:
from random import randrange

def selection(population, scores):
    # Tournament size can be adjusted. 
    tournament_size = 3
    # Randomly select individuals for the tournament
    tournament_indices = [randrange(len(population)) for _ in range(tournament_size)]
    # Select the index of the individual with the highest fitness in the tournament
    best_index = tournament_indices[0]
    for index in tournament_indices:
        if scores[index] > scores[best_index]:
            best_index = index
    # Return the winning individual from the population
    return population[best_index]
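
The tournament size controls the selection pressure. Assuming a single fittest individual and draws with replacement, as in `selection` above, that individual wins a tournament exactly when it is drawn at least once, which happens with probability $1 - (1 - 1/n)^k$ for population size $n$ and tournament size $k$. The sketch below prints this value for a few sizes. The function name `best_wins_probability` is hypothetical.

```python
def best_wins_probability(n_pop, tournament_size):
    # Probability that at least one of the draws (with replacement) hits the
    # single fittest individual, which then wins the tournament.
    return 1 - (1 - 1 / n_pop) ** tournament_size


for k in (2, 3, 5, 10):
    print(k, round(best_wins_probability(120, k), 4))
```

Larger tournaments favor the current best more strongly, which tends to speed up convergence at the cost of diversity.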


### Task 2: Crossover Method

In [16]:
from random import randrange, random

def crossover(p1, p2, r_cross):
    # Children are copies of parents by default
    c1, c2 = p1.copy(), p2.copy()
    # Check if crossover should occur
    if random() < r_cross:
        # Select a crossover point that is not at the end of the string;
        # randrange excludes its upper bound, so point lies in [1, len(p1) - 2]
        point = randrange(1, len(p1) - 1)
        # Perform the crossover
        c1 = p1[:point] + p2[point:]
        c2 = p2[:point] + p1[point:]
    return [c1, c2]
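
If one-point crossover does not reach the target scores, a different crossover method may be worth trying. As one possible alternative, here is a minimal sketch of uniform crossover, which swaps each bit between the two children independently. The name `uniform_crossover` and the fixed swap probability of 0.5 are assumptions made for this sketch.

```python
import random


def uniform_crossover(p1, p2, r_cross):
    # Children start as copies of the parents.
    c1, c2 = p1.copy(), p2.copy()
    if random.random() < r_cross:
        for i in range(len(p1)):
            # Swap this position between the children with probability 0.5.
            if random.random() < 0.5:
                c1[i], c2[i] = c2[i], c1[i]
    return [c1, c2]


random.seed(1)
a, b = [0] * 8, [1] * 8
print(uniform_crossover(a, b, r_cross=1.0))
```

It has the same signature and return shape as `crossover`, so it could be swapped into the main loop without further changes.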


### Task 3: Mutation Operator

In [17]:
from random import random

def mutation(bitstring, r_mut):
    # Flips bits in place and returns the same list
    for i in range(len(bitstring)):
        # Check if this bit should be mutated
        if random() < r_mut:
            # Flip the bit
            bitstring[i] = 1 - bitstring[i]
    return bitstring


### Task 4: Solve problems 1-3

#### Initialize the problems

In [18]:
p1 = TestDatabase()
p1.load_from_file("../res/data/problem1.txt")

p2 = TestDatabase()
p2.load_from_file("../res/data/problem2.txt")

p3 = TestDatabase()
p3.load_from_file("../res/data/problem3.txt")

max_time = 1000

#### Problem 1
##### Initialize the parameters and solve the problem with the genetic algorithm

In [19]:
n_iter = 150
# bits: one bit per test that may be executed
n_bits = p1.get_number_of_tests()
# define the population size, you can change this
n_pop = 120
# crossover rate, you can change this
r_cross = 0.9
# mutation rate, you can change this
r_mut = 1.0 / float(n_bits)

best_p1, score_p1 = genetic_algorithm(
    p1, max_time, fitness, n_bits, n_iter, n_pop, r_cross, r_mut
)
print("f(%s) = %f" % (best_p1, score_p1))

f([0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 1]) = 866.000000


#### Problem 2
##### Initialize the parameters and solve the problem with the genetic algorithm

In [20]:
n_iter = 130
# bits: one bit per test that may be executed
n_bits = p2.get_number_of_tests()
# define the population size, you can change this
n_pop = 110
# crossover rate, you can change this
r_cross = 0.9
# mutation rate, you can change this
r_mut = 1.0 / float(n_bits)

best_p2, score_p2 = genetic_algorithm(
    p2, max_time, fitness, n_bits, n_iter, n_pop, r_cross, r_mut
)
print("f(%s) = %f" % (best_p2, score_p2))

f([1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 0]) = 767.000000


#### Problem 3
##### Initialize the parameters and solve the problem with the genetic algorithm

In [21]:
n_iter = 200
# bits: one bit per test that may be executed
n_bits = p3.get_number_of_tests()
# define the population size, you can change this
n_pop = 120
# crossover rate, you can change this
r_cross = 0.8
# mutation rate, you can change this
r_mut = 1.0 / float(n_bits)

best_p3, score_p3 = genetic_algorithm(
    p3, max_time, fitness, n_bits, n_iter, n_pop, r_cross, r_mut
)
print("f(%s) = %f" % (best_p3, score_p3))

f([1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 0, 1]) = 804.000000


#### Check the results

In [22]:
check(score_p1 >= 800, "Try to get higher scores, it is possible! (try tweaking the parameters or try a different selection method, crossover method or mutation operator)")

'Passed'

In [23]:
check(score_p2 >= 700, "Try to get higher scores, it is possible! (try tweaking the parameters or try a different selection method, crossover method or mutation operator)")

'Passed'

In [24]:
check(score_p3 >= 700, "Try to get higher scores, it is possible! (try tweaking the parameters or try a different selection method, crossover method or mutation operator)")

'Passed'