# Artificial Intelligence
## UEMH3073 / UECS2053 / UECS2153

# Lab 2: Genetic Algorithm

This notebook is an assignment requiring you to investigate the Travelling Salesman Problem. Guidance is provided so you can understand what needs to be done as you follow through this lab. Convenience classes and functions/methods are provided.

You will encounter #TODO in the code cells explaining the tasks you need to complete. In other words, you will need to write code to accomplish the #TODO tasks so that the genetic algorithm runs correctly. Look for "Replacement starts here" and "Replacement ends here" to find the parts of the code that require your input.
    

The #TODO tasks and their marks distribution are as follows:
 
a. #TODO1 (10 marks) in the Population Initialization function. You will read a set of cities from the filename when creating an initial population. 

b. #TODO2 (10 marks) in the Parent Selection function. You will replace a dummy parent selection function with Tournament Selection. 

c. #TODO3 (10 marks) in the Parent Selection function. You will replace a dummy parent selection function with Proportional Selection.

d. #TODO4 (10 marks) in the Survival Selection function. You will replace the dummy survival selection function with Merge, Sort & Truncate. 
    
e. #TODO5 (10 marks) in the Crossover function. You will replace the dummy crossover function with the Partially Mapped Crossover approach.

f. #TODO6 (10 marks) in the Mutation function. You will replace the dummy mutation function with the Insertion Mutation approach. 

g. #TODO7 (10 marks) in Performance Evaluation. You will present performance evaluation for the different Parent Selection functions. 

Marks are also given for: Report Presentation and Formatting (15%) and Code Quality and Comments (15%). More details about this notebook and assignment are provided in your lab sheet.

## An Overview of the Travelling Salesman Problem

In the travelling salesman problem, a salesperson wishes to find the shortest path that passes through all the cities s/he wants to visit, given the coordinates of a set of cities. The salesperson should visit each of the cities once only, and so:

a. Each path consists of all cities in the set.

b. Each path visits each of the cities once only. So, none of the cities is visited more than once. 
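
The two conditions above can be checked mechanically. The following is a minimal sketch, not part of the provided convenience code: the helper name `is_valid_route` is hypothetical, and cities are represented here as plain integers (an assumption made for brevity) rather than City objects.

```python
def is_valid_route(route, cities):
    """Return True if `route` visits every city in `cities` exactly once.

    Assumes `cities` itself contains no duplicates and its items are hashable.
    """
    # Same length and same set of members means `route` is a permutation of `cities`
    return len(route) == len(cities) and set(route) == set(cities)


cities = [0, 1, 2, 3]
print(is_valid_route([2, 0, 3, 1], cities))  # True
print(is_valid_route([2, 0, 2, 1], cities))  # False: city 2 repeats, city 3 is missing
```

A check like this is handy after crossover and mutation, since those are the steps where an invalid path can appear.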

## Imports

In [1]:
%matplotlib inline

# Please add more imports if you need them 

import random
import time
import csv

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from pprint import pprint as print 

## Convenience Classes

### City

The City class, which represents a city, possesses the properties of the city and has functions/ methods used for calculating the distance between the city and another city. Each path, represented by a chromosome, is formed by a set of cities.   

In [2]:
class City:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        
    def distance(self, city):
        xDis = abs(self.x - city.x)
        yDis = abs(self.y - city.y)
        distance = np.sqrt((xDis ** 2) + (yDis ** 2))
        return distance
    
    def __repr__(self):
        return "(" + str(self.x) + "," + str(self.y) + ")"

### Fitness

The Fitness class, which represents the fitness function, possesses the properties of a path and has functions/methods used for calculating the fitness value of the path, which is based on the distance of the path. 

In [3]:
class Fitness:
    def __init__(self, route):
        self.route = route
        self.distance = None
        self.fitness = None
    
    def routeDistance(self):
        if self.distance == None:
            pathDistance = 0.0
            for i in range(0, len(self.route)):
                fromCity = self.route[i]
                toCity = None
                if i+1 < len(self.route):
                    toCity = self.route[i+1]
                else:
                    toCity = self.route[0]
                pathDistance += fromCity.distance(toCity)
            self.distance = pathDistance
        return self.distance
    
    def routeFitness(self):
        if self.fitness is None:
            # Fitness function: one divided by the distance of the path,
            # so a shorter path has a higher fitness value
            # Note: You must ensure a division by zero does not occur 
            self.fitness = 1 / float(self.routeDistance()) 
        return self.fitness
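
As a worked check of the distance and fitness calculation, the sketch below recomputes both for a 3-4-5 right triangle. It is a standalone illustration under stated assumptions: points are plain `(x, y)` tuples instead of City objects, and the function names `route_distance` and `route_fitness` are hypothetical. It also shows one possible way to guard against division by zero.

```python
import math


def route_distance(points):
    """Length of the closed tour through `points`, returning to the start."""
    total = 0.0
    for i, (x1, y1) in enumerate(points):
        x2, y2 = points[(i + 1) % len(points)]  # wrap around to the first point
        total += math.hypot(x2 - x1, y2 - y1)   # Euclidean distance
    return total


def route_fitness(points):
    """Inverse of the tour length; a zero-length tour is rejected explicitly."""
    distance = route_distance(points)
    if distance == 0:
        raise ValueError("tour length is zero, so 1/distance is undefined")
    return 1.0 / distance


triangle = [(0, 0), (3, 0), (3, 4)]
print(route_distance(triangle))  # 12.0  (3 + 4 + 5)
print(route_fitness(triangle))   # one twelfth
```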


## Population Initialization  

The population initialization function (or method) performs random initialization. This creates an initial population with completely random chromosomes (or solutions). There are three functions related to population initialization. 

The first function is genCityList() which generates a set of cities from a file.  

In [4]:
def genCityList(filename):
    fileCityList = []
    cityList = []
    
    # TODO 1 (10 marks) - Replace the following codes that generate 10 random cities.
    # Your new implementation must read a set of cities from the filename to be used for creating 
    # an initial population.  
    
    # Marking scheme: 
    # 7 to 10 marks:  Correct implementation. 
    # 5 to <7 marks:  Minor errors with slight effects on the fitness value.
    # >0 to <5 marks: Major errors with significant effects on the fitness value. 
    # 0 marks:        No answer is given. 
    
    # -------------- Replacement starts here  ----------------

    # Read the cities from the file
    with open(filename, 'r') as file:  
        next(file)  # Skip the header
        for line in file:
            row = line.strip().split(',')  # Split the line into its comma-separated columns
            # The 1st column is ignored; the 2nd and 3rd columns are the x and y coordinates
            fileCityList.append(City(x=int(row[1]), y=int(row[2])))

    # Randomly select 10 cities from the list
    cityList = random.sample(fileCityList, 10)

    
    # --------------- Replacement ends here -----------------
    
    return cityList
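
The parsing step can be tried without the data file. The sketch below assumes the layout implied by the code above: one header row, then rows of the form `id,x,y`. The header text and the sample rows are made up for illustration, and `parse_cities` is a hypothetical helper that returns `(x, y)` tuples rather than City objects.

```python
import csv
import io


def parse_cities(text):
    """Parse 'id,x,y' rows (after one header row) into (x, y) integer tuples."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    # Skip blank lines; column 0 (the id) is ignored
    return [(int(row[1]), int(row[2])) for row in reader if row]


sample = "id,x,y\n1,57,57\n2,61,21\n3,80,68\n"  # made-up file content
print(parse_cities(sample))  # [(57, 57), (61, 21), (80, 68)]
```

Using the `csv` module instead of `str.split(',')` is a design choice, not a requirement: it also copes with quoted fields if the file ever contains them.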


The second function is createRoute() which generates a random route (chromosome) from a set of City instances.

In [5]:
def createRoute(cityList):
    route = random.sample(cityList, len(cityList))  # Sampling all len(cityList) cities without replacement returns a random shuffle of the list
    return route

The third function is initialPopulation() which calls the second function repeatedly to create an initial population (a list of routes).

In [6]:
def initialPopulation(popSize, cityList):
    population = []
    for i in range(0, popSize):
        population.append(createRoute(cityList))  # Each call to createRoute returns a separately shuffled route
    return population

You can run the above functions using the sample runs below. To do so, simply change the cell type from Markdown to Code.

`Sample run 1` initializes 10 cities in cityList as follows:

In [7]:
cityList = genCityList('cities50.txt') 
print(cityList)

[(57,57),
 (61,21),
 (80,68),
 (57,30),
 (79,36),
 (28,98),
 (95,51),
 (76,88),
 (44,66),
 (78,28)]


`Sample run 2` initializes 10 cities in cityList and creates a population with three routes as follows:

In [9]:
cityList = genCityList('cities50.txt') 
population = initialPopulation(3, cityList)  # 3 different routes, randomly shuffled from the list
print(population)

[[(61,21),
  (34,83),
  (48,32),
  (97,66),
  (35,46),
  (10,31),
  (100,64),
  (39,40),
  (77,16),
  (95,97)],
 [(95,97),
  (39,40),
  (97,66),
  (10,31),
  (48,32),
  (34,83),
  (77,16),
  (35,46),
  (100,64),
  (61,21)],
 [(39,40),
  (10,31),
  (77,16),
  (48,32),
  (35,46),
  (61,21),
  (97,66),
  (34,83),
  (95,97),
  (100,64)]]


## Selection

Parent selection picks chromosomes from the population to form a mating pool, favouring those with high fitness values. Survivor selection keeps the chromosomes with the highest fitness values so that they carry over into the population of the next generation. The population size is len(population), that is, the population holds len(population) chromosomes. 

### Parent Selection

There are three implementations for parent selection. The first parentSelection() performs **random** selection.

In [10]:
def parentSelection(population, poolSize=None):
    if poolSize == None:
        poolSize = len(population)
        
    matingPool = []
    
    for i in range(0, poolSize):
        fitness = Fitness(population[i]).routeFitness()  # Fitness is calculated but deliberately not used
        matingPool.append(random.choice(population))
      
    return matingPool

The second parentSelection() performs **Tournament** Selection.

In [11]:
def parentSelection(population, poolSize=None):
    
    # TODO 2 (10 marks) - Replace the dummy parent selection function below with  
    # Tournament Selection.
      
    # Marking scheme: 
    # 7 to 10 marks:  Correct implementation. 
    # 5 to <7 marks:  Minor errors.
    # >0 to <5 marks: Major errors. 
    # 0 marks:        No answer is given. 
    
    # You will need to compare the performance achieved by Random Selection, 
    # Tournament Selection, and Proportional Selection during performance evaluation 
    # later. So, you will run either Random Selection, Tournament Selection, or 
    # Proportional Selection in a simulation run.
    
    if poolSize == None:
        poolSize = len(population)
        
    matingPool = []
    
    # ---------------- Replacement starts here ----------------
    
    # Tournament Selection (binary tournament: 2 contestants per round)
    for i in range(0, poolSize):    
        tournament = random.sample(population, 2)  # Randomly pick 2 distinct routes from the population
        best = tournament[0]   # Assume the first one is the best
        for j in range(1, 2):  # Compare the remaining contestant with the current best; 2 is the tournament size
            if Fitness(tournament[j]).routeFitness() > Fitness(best).routeFitness():
                best = tournament[j]
        matingPool.append(best)
    
    # ---------------- Replacement ends here ----------------
    
    return matingPool
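
The tournament size controls the selection pressure: the more contestants per round, the more often the fittest routes win. The sketch below generalizes the idea to `k` contestants. It is an illustration only; `tournament_select`, its parameters, and the integer "individuals" are assumptions, with the fitness function passed in as a callable.

```python
import random


def tournament_select(population, fitness, pool_size, k=2, rng=random):
    """Fill a mating pool by running `pool_size` tournaments of `k` contestants."""
    pool = []
    for _ in range(pool_size):
        contestants = rng.sample(population, k)     # k distinct individuals
        pool.append(max(contestants, key=fitness))  # the fittest contestant wins
    return pool


population = [5, 1, 9, 3]  # toy individuals whose fitness is their own value
pool = tournament_select(population, fitness=lambda x: x, pool_size=6, k=2)
print(pool)  # the least fit individual (1) can never win a 2-way tournament
```

With `k` equal to the population size every round is won by the single best individual, and with `k=1` the selection becomes purely random.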

The third parentSelection() performs **Proportional** Selection.

In [42]:
def parentSelection(population, poolSize=None):
    
    # TODO 3 (10 marks) - Replace the dummy parent selection function below with  
    # Proportional Selection.
      
    # Marking scheme: 
    # 7 to 10 marks:  Correct implementation. 
    # 5 to <7 marks:  Minor errors.
    # >0 to <5 marks: Major errors. 
    # 0 marks:        No answer is given. 
    
    # You will need to compare the performance achieved by Random Selection, 
    # Tournament Selection, and Proportional Selection during performance evaluation 
    # later. So, you will run either Random Selection, Tournament Selection, or 
    # Proportional Selection in a simulation run.
    
    if poolSize == None:
        poolSize = len(population)
        
    matingPool = []
    chromosome_fitness = []
    sumfitness = 0.0
    
    # ---------------- Replacement starts here ----------------
    # Compute the fitness of every chromosome in the whole population
    # (not just the first poolSize chromosomes)
    for route in population:
        fitness = Fitness(route).routeFitness()
        chromosome_fitness.append(fitness)
        sumfitness += fitness

    for j in range(0, poolSize):
        # Spin the roulette wheel: pick a point along the total fitness
        choice = random.uniform(0, sumfitness)
        count = 0.0

        # Walk through the population until the running total reaches the point
        for k in range(0, len(population)):
            count += chromosome_fitness[k]
            if count >= choice:
                matingPool.append(population[k])
                break
        else:
            # Guard against floating-point round-off at the upper end
            matingPool.append(population[-1])
    # ---------------- Replacement ends here ----------------
    
    return matingPool
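
Proportional (roulette-wheel) selection can also be expressed with the standard library: `random.choices` draws with replacement, with probabilities proportional to the given weights. The sketch below is an alternative illustration, not the implementation expected for TODO 3; the name `proportional_select` and the toy data are assumptions.

```python
import random


def proportional_select(population, fitness, pool_size, rng=random):
    """Draw `pool_size` individuals with probability proportional to fitness."""
    weights = [fitness(individual) for individual in population]
    # choices() samples with replacement, so fit individuals may appear many times
    return rng.choices(population, weights=weights, k=pool_size)


toy_fitness = {"a": 0.0, "b": 0.0, "c": 1.0}  # only "c" has any weight
print(proportional_select(list(toy_fitness), toy_fitness.get, pool_size=5))
# ['c', 'c', 'c', 'c', 'c']
```

Note that the weights cover the whole population regardless of the pool size: the pool size only decides how many draws are made.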

### Survival Selection

In [56]:
def survivorSelection(population, eliteSize):
    
    # TODO 4 (10 marks) - Replace the dummy survival selection function below with  
    # Merge, Sort & Truncate.
      
    # Marking scheme: 
    # 7 to 10 marks:  Correct implementation. 
    # 5 to <7 marks:  Minor errors.
    # >0 to <5 marks: Major errors. 
    # 0 marks:        No answer is given. 
    
    elites = []
    
    # ---------------- Replacement starts here ----------------

    # Sort the population by fitness score in descending order.
    sortedPopulation = sorted(population, key=lambda x: Fitness(x).routeFitness(), reverse=True)

    # sorted() = a function that returns a new sorted list
    # population = the list to be sorted
    # Fitness(x).routeFitness() = builds a Fitness object for route x, then calculates the fitness score for the route.
    # reverse=True --> sort the list in descending order (the highest fitness score comes first)

    # Select the top eliteSize routes as elites
    elites = sortedPopulation[:eliteSize]

    # [:eliteSize] = slice the first eliteSize elements
    
    # ---------------- Replacement ends here ----------------
    
    return elites
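
The name Merge, Sort & Truncate describes three steps: merge the candidates into one list, sort them by fitness, and keep only the best. The sketch below spells out all three on toy data. It is a hedged illustration: the function `merge_sort_truncate` and its two-list signature are assumptions, whereas survivorSelection() above receives a single population.

```python
def merge_sort_truncate(parents, offspring, fitness, size):
    """Keep the `size` fittest individuals out of parents and offspring."""
    merged = parents + offspring                          # 1. merge
    ranked = sorted(merged, key=fitness, reverse=True)    # 2. sort, fittest first
    return ranked[:size]                                  # 3. truncate


# Toy individuals whose fitness is their own value
print(merge_sort_truncate([3, 1], [4, 2], fitness=lambda x: x, size=2))  # [4, 3]
```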

You can run the above functions using the sample runs below. To do so, simply change the cell type from Markdown to Code. 

`Sample run 1` initializes 10 cities in cityList, creates a population with four routes, and creates a pool of parents as follows:

In [43]:
population = initialPopulation(4, genCityList('cities50.txt'))
print('Initial population') 
print(population)

matingpool = parentSelection(population, 4) 
print('Mating pool')
print(matingpool)

'Initial population'
[[(79,36),
  (28,98),
  (40,64),
  (44,66),
  (52,35),
  (57,57),
  (89,52),
  (84,43),
  (76,63),
  (20,91)],
 [(76,63),
  (52,35),
  (79,36),
  (40,64),
  (57,57),
  (20,91),
  (28,98),
  (89,52),
  (44,66),
  (84,43)],
 [(44,66),
  (40,64),
  (20,91),
  (76,63),
  (52,35),
  (57,57),
  (28,98),
  (89,52),
  (79,36),
  (84,43)],
 [(84,43),
  (20,91),
  (28,98),
  (79,36),
  (89,52),
  (76,63),
  (57,57),
  (52,35),
  (44,66),
  (40,64)]]
'Mating pool'
[[(84,43),
  (20,91),
  (28,98),
  (79,36),
  (89,52),
  (76,63),
  (57,57),
  (52,35),
  (44,66),
  (40,64)],
 [(76,63),
  (52,35),
  (79,36),
  (40,64),
  (57,57),
  (20,91),
  (28,98),
  (89,52),
  (44,66),
  (84,43)],
 [(84,43),
  (20,91),
  (28,98),
  (79,36),
  (89,52),
  (76,63),
  (57,57),
  (52,35),
  (44,66),
  (40,64)],
 [(84,43),
  (20,91),
  (28,98),
  (79,36),
  (89,52),
  (76,63),
  (57,57),
  (52,35),
  (44,66),
  (40,64)]]


`Sample run 2` initializes 10 cities in cityList, creates a population with four routes, and selects one elite chromosome as follows:

In [16]:
population = initialPopulation(4, genCityList('cities50.txt'))
elites = survivorSelection(population, 1)
print('Initial population')
print(population)
print('Selected elites')
print(elites)

'Initial population'
[[(52,35),
  (56,77),
  (39,40),
  (92,75),
  (1,77),
  (40,64),
  (84,43),
  (12,18),
  (78,28),
  (92,57)],
 [(92,75),
  (40,64),
  (39,40),
  (52,35),
  (84,43),
  (12,18),
  (92,57),
  (1,77),
  (78,28),
  (56,77)],
 [(84,43),
  (92,57),
  (39,40),
  (56,77),
  (1,77),
  (78,28),
  (52,35),
  (12,18),
  (40,64),
  (92,75)],
 [(52,35),
  (92,57),
  (78,28),
  (1,77),
  (84,43),
  (12,18),
  (40,64),
  (92,75),
  (39,40),
  (56,77)]]
'Selected elites'
[[(84,43),
  (92,57),
  (39,40),
  (56,77),
  (1,77),
  (78,28),
  (52,35),
  (12,18),
  (40,64),
  (92,75)]]


## Crossover


Crossover takes two parents, exchanges their genetic material, and produces one or more children. In the Travelling Salesman Problem, each travelling path must be valid: it consists of all cities in the set and visits each city exactly once. Simply exchanging parts of two chromosomes tends to produce invalid paths. As an example, Parent 1 is [2 1 0 7 3 5 4 6] and Parent 2 is [6 1 0 5 2 3 4 7]. One-point crossover at the midpoint generates Child 1 [2 1 0 7 2 3 4 7] and Child 2 [6 1 0 5 3 5 4 6]. Both children are invalid paths: Child 1 visits cities 2 and 7 twice and misses 5 and 6, while Child 2 visits 5 and 6 twice and misses 2 and 7.

In [47]:
def crossover(parent1, parent2):
    
    # TODO 5 (10 marks) - Replace the dummy crossover function below with 
    # Partially Mapped Crossover approach.
   
    # Marking scheme: 
    # 7 to 10 marks:  Correct implementation. 
    # 5 to <7 marks:  Minor errors.
    # >0 to <5 marks: Major errors. 
    # 0 marks:        No answer is given. 
    
    # ---------------- Replacement starts here ----------------
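    # NOTE: this is single-point crossover, not Partially Mapped Crossover.
    # Swapping the tails can duplicate some cities and drop others, so the
    # children are not guaranteed to be valid routes.
    x = np.random.randint(0, len(parent1))  # Randomly select a crossover point

    child1 = parent1[:x] + parent2[x:]  # Head of parent 1, tail of parent 2
    child2 = parent2[:x] + parent1[x:]  # Head of parent 2, tail of parent 1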
    # ---------------- Replacement ends here ----------------
    
    return child1, child2
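
For comparison, here is a minimal sketch of Partially Mapped Crossover (PMX), which always yields valid routes. It is written as a standalone illustration: the names `pmx_child` and `pmx` are hypothetical, genes are plain integers, and the cut points are passed explicitly so the result can be checked by hand. The lookup `p2.index(...)` relies on equality, so with City objects it assumes both parents share the same City instances.

```python
import random


def pmx_child(p1, p2, a, b):
    """Child keeps p1[a:b]; every other position is filled from p2 via the mapping."""
    child = [None] * len(p1)
    child[a:b] = p1[a:b]
    segment = set(p1[a:b])
    for i in range(a, b):
        gene = p2[i]
        if gene in segment:
            continue  # already present in the copied segment
        pos = i
        while a <= pos < b:  # follow the mapping until we leave the segment
            pos = p2.index(p1[pos])
        child[pos] = gene
    # Remaining empty positions take p2's gene at the same position
    return [p2[i] if g is None else g for i, g in enumerate(child)]


def pmx(p1, p2, rng=random):
    """Pick two cut points a < b at random and build both children."""
    a, b = sorted(rng.sample(range(len(p1) + 1), 2))
    return pmx_child(p1, p2, a, b), pmx_child(p2, p1, a, b)


p1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
p2 = [9, 3, 7, 8, 2, 6, 5, 1, 4]
print(pmx_child(p1, p2, 3, 7))  # [9, 3, 2, 4, 5, 6, 7, 1, 8]
```

In the printed example the child keeps 4 5 6 7 from the first parent; 8 and 2, which the second parent holds in that segment, are relocated through the mapping, and the rest is copied from the second parent.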

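breedPopulation() pairs consecutive members of the mating pool and applies crossover to each pair, so an even-sized mating pool produces as many children as it has parents. If the mating pool has an odd size, its last member is left unpaired and produces no children.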

In [18]:
def breedPopulation(matingpool):
    children = []
    
    # Choosing parents in their order of presence in the mating pool. Choosing parents
    # in a random manner is possible. 
    
    for i in range(1, len(matingpool), 2):
        child1, child2 = crossover(matingpool[i-1], matingpool[i])
        children.append(child1)
        children.append(child2)
    
    return children

You can run the above functions using the sample run below. To do so, simply change the cell type from Markdown to Code. The sample run initializes 2 chromosomes in the population, and performs crossover between the two parents. 

In [48]:
population = initialPopulation(2, genCityList('cities50.txt'))
parent1, parent2 = population
child1, child2 = crossover(parent1, parent2)
print('Parents')
print(parent1)
print(parent2)
print('Children')
print(child1)
print(child2)

'Parents'
[(80,68),
 (20,91),
 (92,75),
 (26,39),
 (10,31),
 (52,35),
 (98,61),
 (6,11),
 (56,77),
 (57,30)]
[(57,30),
 (98,61),
 (6,11),
 (26,39),
 (92,75),
 (56,77),
 (20,91),
 (52,35),
 (80,68),
 (10,31)]
'Children'
[(57,30),
 (98,61),
 (6,11),
 (26,39),
 (92,75),
 (56,77),
 (20,91),
 (52,35),
 (80,68),
 (10,31)]
[(80,68),
 (20,91),
 (92,75),
 (26,39),
 (10,31),
 (52,35),
 (98,61),
 (6,11),
 (56,77),
 (57,30)]


## Mutation

Mutation makes a small random change to a single chromosome, so that the genetic algorithm can explore new paths and converge to a shorter one. In the Travelling Salesman Problem, a mutated chromosome must still be a valid path. As an example, insertion mutation moves a single gene within the [1 2 3 4 5 6 7 8 9 10] chromosome to generate the [1 2 4 5 6 7 3 8 9 10] mutated chromosome. Step 1: select a gene randomly (here, 3). Step 2: remove it and insert it at a randomly selected location (here, after 7).

In [61]:
def mutate(route, mutationProbability):
    
    # TODO 6 (10 marks) - Replace the dummy mutation function below with Insertion Mutation.
    # The dummy mutation function simply swaps a city with the city before it.  
   
    # Marking scheme: 
    # 7 to 10 marks:  Correct implementation. 
    # 5 to <7 marks:  Minor errors.
    # >0 to <5 marks: Major errors. 
    # 0 marks:        No answer is given. 
     
    mutated_route = route[:]
    for i in range(len(route)):
        if (random.random() < mutationProbability):
            # mutationProbability is the probability of a gene undergoing mutation
            
            # ---------------- Replacement starts here ----------------
            # print(i)
            city = mutated_route.pop(i)
            insert_point = random.randint(0, len(mutated_route))
            mutated_route.insert(insert_point, city)

            # print(mutated_route)
            # ---------------- Replacement ends here ----------------
    return mutated_route
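
The two steps of insertion mutation can be isolated in a small helper. The sketch below is an illustration with hypothetical names (`move_gene`, `insertion_mutation`): the first function takes explicit positions so the example from the text can be reproduced, and the second chooses both positions at random.

```python
import random


def move_gene(route, src, dst):
    """Remove the gene at index `src` and re-insert it at index `dst`."""
    mutated = list(route)    # copy, so the original route is left untouched
    gene = mutated.pop(src)  # Step 1: take the selected gene out
    mutated.insert(dst, gene)  # Step 2: insert it at the chosen location
    return mutated


def insertion_mutation(route, rng=random):
    """Apply one insertion move with randomly chosen positions."""
    src = rng.randrange(len(route))
    dst = rng.randrange(len(route))  # index into the list after removal; len - 1 appends
    return move_gene(route, src, dst)


route = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(move_gene(route, 2, 6))  # [1, 2, 4, 5, 6, 7, 3, 8, 9, 10]
```

Because a gene is only moved, never copied or dropped, the mutated route is always a valid path.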

Mutation runs over the entire population and mutates each chromosome in the population with a small mutationProbability. 

In [27]:
def mutation(population, mutationProbability):
    mutatedPopulation = []
    for i in range(0, len(population)):
        mutatedIndividual = mutate(population[i], mutationProbability)
        mutatedPopulation.append(mutatedIndividual)
    return mutatedPopulation

You can run the above functions using the sample run below. To do so, simply change the cell type from Markdown to Code. The sample run initializes a route comprised of 10 cities in cityList, and then mutates it as follows:

In [62]:
route = genCityList('cities50.txt')
mutated = mutate(route, 1)  # Probability 1: every position triggers an insertion move
print('Original route')
print(route)
print('Mutated route')
print(mutated)

'Original route'
[(92,57),
 (10,31),
 (7,26),
 (35,46),
 (78,60),
 (61,21),
 (44,66),
 (33,60),
 (77,16),
 (39,40)]
'Mutated route'
[(10,31),
 (35,46),
 (92,57),
 (77,16),
 (7,26),
 (44,66),
 (61,21),
 (39,40),
 (33,60),
 (78,60)]


## Running One Generation (or Iteration)

Here, we run one generation of the genetic algorithm. 

In [23]:
def oneGeneration(population, eliteSize, mutationProbability):
    
    # First we preserve the elites
    elites = survivorSelection(population, eliteSize)
    
    # Then we calculate what our mating pool size should be and generate
    # the mating pool
    poolSize = len(population) - eliteSize
    matingpool = parentSelection(population, poolSize)
        
    # Then we perform crossover on the mating pool
    children = breedPopulation(matingpool)
    
    # We combine the elites and children into one population
    new_population = elites + children
    
    # We mutate the population
    mutated_population = mutation(new_population, mutationProbability)
        
    return mutated_population
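
It is worth checking how the population size evolves from one generation to the next. breedPopulation() pairs the members of the mating pool, so an odd-sized pool leaves one member without a partner. The helper below is a hypothetical sketch of that arithmetic only (it does not run the genetic algorithm), using the sizes from the sample runs in this notebook.

```python
def next_population_size(pop_size, elite_size):
    """Population size after one generation, given how the mating pool is paired."""
    pool_size = pop_size - elite_size
    children = 2 * (pool_size // 2)  # each complete pair yields two children
    return elite_size + children


print(next_population_size(5, 1))   # 5:  pool of 4, so 4 children plus 1 elite
print(next_population_size(20, 5))  # 19: pool of 15, so 14 children plus 5 elites
```

Choosing popSize and eliteSize so that their difference is even keeps the population size constant across generations.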

You can run the above functions using the sample run below. To do so, simply change the cell type from Markdown to Code. The sample run initializes a population comprised of 5 chromosomes based on 10 cities in cityList, and then runs one generation (or iteration) of the genetic algorithm as follows:

In [50]:
population = initialPopulation(5, genCityList('cities50.txt'))
eliteSize = 1
mutationProbability = 0.01
new_population = oneGeneration(population, eliteSize, mutationProbability)
print('Initial population')
print(population)
print('New population')
print(new_population)

'Initial population'
[[(77,16),
  (10,31),
  (94,70),
  (29,58),
  (14,74),
  (98,61),
  (40,64),
  (10,34),
  (34,83),
  (56,77)],
 [(56,77),
  (29,58),
  (34,83),
  (40,64),
  (10,31),
  (94,70),
  (14,74),
  (10,34),
  (77,16),
  (98,61)],
 [(98,61),
  (29,58),
  (10,34),
  (77,16),
  (14,74),
  (56,77),
  (40,64),
  (10,31),
  (34,83),
  (94,70)],
 [(56,77),
  (10,31),
  (29,58),
  (34,83),
  (40,64),
  (10,34),
  (98,61),
  (94,70),
  (77,16),
  (14,74)],
 [(98,61),
  (29,58),
  (14,74),
  (10,34),
  (94,70),
  (77,16),
  (34,83),
  (40,64),
  (10,31),
  (56,77)]]
'New population'
[[(56,77),
  (10,31),
  (29,58),
  (34,83),
  (40,64),
  (10,34),
  (98,61),
  (94,70),
  (77,16),
  (14,74)],
 [(56,77),
  (10,31),
  (94,70),
  (29,58),
  (14,74),
  (98,61),
  (40,64),
  (10,34),
  (34,83),
  (56,77)],
 [(77,16),
  (10,31),
  (29,58),
  (34,83),
  (40,64),
  (10,34),
  (98,61),
  (94,70),
  (77,16),
  (14,74)],
 [(56,77),
  (10,31),
  (29,58),
  (34,83),
  (40,64),
  (10,34),
  (98,61

## Running Many Generations (or Iterations) 

In [67]:
filename = 'cities500.txt'
popSize = 20
eliteSize = 5
mutationProbability = 0.01
iteration_limit = 100

cityList = genCityList(filename)

population = initialPopulation(popSize, cityList)
distances = [Fitness(p).routeDistance() for p in population]
min_dist = min(distances)
print("Best distance for initial population: " + str(min_dist))

for i in range(iteration_limit):
    population = oneGeneration(population, eliteSize, mutationProbability)
    distances = [Fitness(p).routeDistance() for p in population]
    index = np.argmin(distances)
    best_route = population[index]
    min_dist = min(distances)
    print("Best distance for population in iteration " + str(i) +
          ": " + str(min_dist))

print("Best path found is " + str(best_route)) 

    # TODO 7 (10 marks) - Performance Evaluation. You will present the performance achieved 
    # by the different parent selection functions. You will compare the 
    # performance achieved by Random Selection, Tournament Selection, and Proportional Selection. 
   
    # Marking scheme: 
    # 7 to 10 marks:  In-depth performance evaluation. Optimal routes are found. 
    # 5 to <7 marks:  Clear understanding of performance evaluation.
    # >0 to <5 marks: Inaccurate or unclear understanding of performance evaluation. 
    # 0 marks:        No answer is given. 
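
# A single run per selection method says little, because every run is random.
# One possible approach, sketched below under stated assumptions, is to repeat
# each method over several seeds and compare the mean and spread of the best
# final distance. The function `compare_strategies` and the `run(seed)`
# callables are hypothetical: each callable is assumed to seed the random
# generators, run the genetic algorithm with one parent selection function,
# and return the best distance it found.

```python
import statistics


def compare_strategies(runners, seeds):
    """Map each strategy name to (mean, standard deviation) of its results."""
    summary = {}
    for name, run in runners.items():
        results = [run(seed) for seed in seeds]
        # stdev needs at least two data points
        spread = statistics.stdev(results) if len(results) > 1 else 0.0
        summary[name] = (statistics.mean(results), spread)
    return summary


# Placeholder runner: NOT a genetic algorithm. It only echoes the seed so that
# the sketch executes; replace it with a function that runs the real GA.
def placeholder_run(seed):
    return float(seed)


print(compare_strategies({"placeholder": placeholder_run}, seeds=[1, 2, 3]))
# {'placeholder': (2.0, 1.0)}
```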
    
  

'Best distance for initial population: 438.01695324261505'
'Best distance for population in iteration 0: 366.4296792159436'
'Best distance for population in iteration 1: 338.32237168065643'
'Best distance for population in iteration 2: 338.32237168065643'
'Best distance for population in iteration 3: 330.67867317078924'
'Best distance for population in iteration 4: 264.83088827818506'
'Best distance for population in iteration 5: 229.02108031559717'
'Best distance for population in iteration 6: 229.02108031559717'
'Best distance for population in iteration 7: 229.02108031559717'
'Best distance for population in iteration 8: 220.55378238651218'
'Best distance for population in iteration 9: 220.55378238651218'
'Best distance for population in iteration 10: 220.55378238651218'
'Best distance for population in iteration 11: 220.55378238651218'
'Best distance for population in iteration 12: 160.0120702420893'
'Best distance for population in iteration 13: 160.0120702420893'
'Best distance f