# Artificial Intelligence
## UECS2053/2153 UEMH3073/3163

# Lab 2: Genetic Algorithm

This notebook is an assignment that asks you to investigate the Travelling Salesman Problem. Guidance is provided as you work through this lab so that you understand what needs to be done for this assignment. Convenience classes and functions/methods are provided.

You will encounter #TODO in the code cells explaining the tasks you need to complete. In other words, you will need to write code that accomplishes the #TODO tasks so that the genetic algorithm functions well and runs correctly. Look for "Replacement starts here" and "Replacement ends here" to find the parts of the code that require your input.
    

The #TODO tasks and their marks distribution are as follows:
 
a. #TODO1 (10 marks) in the Population Initialization function. You will read a set of cities from the given file when creating an initial population. 

b. #TODO2 (10 marks) in the Parent Selection function. You will replace a dummy parent selection function with Tournament Selection. 

c. #TODO3 (10 marks) in the Parent Selection function. You will replace a dummy parent selection function with Proportional Selection.

d. #TODO4 (10 marks) in the Survival Selection function. You will replace the dummy survival selection function with Merge, Sort & Truncate. 
    
e. #TODO5 (10 marks) in the Crossover function. You will replace the dummy crossover function with the Partially Mapped Crossover approach.

f. #TODO6 (10 marks) in the Mutation function. You will replace the dummy mutation function with the Insertion Mutation approach. 

g. #TODO7 (10 marks) in Performance Evaluation. You will present a performance evaluation of the different Parent Selection functions. 

Marks are also given for Report Presentation and Formatting (15%) and Code Quality and Comments (15%). More details about this notebook and the assignment are provided in your lab sheet.

## An Overview of the Travelling Salesman Problem

In the travelling salesman problem, a salesperson is given the coordinates of a set of cities and wishes to find the shortest path that passes through all of them and returns to the starting city. The salesperson must visit each city once only, and so:

a. Each path consists of all cities in the set.

b. Each path visits each city once only, so no city is visited more than once. 
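The two conditions above can be checked mechanically. A minimal sketch, assuming cities are identified by hypothetical integer indices 0..n-1 rather than the City objects used later, and that a path is a closed tour that returns to its starting city (as Fitness.routeDistance() computes):

```python
import math

def is_valid_tour(tour, num_cities):
    """True if tour visits every city index 0..num_cities-1 exactly once."""
    return len(tour) == num_cities and set(tour) == set(range(num_cities))

def tour_length(tour, coords):
    """Length of the closed tour, including the edge from the last city back to the first."""
    total = 0.0
    for i in range(len(tour)):
        x1, y1 = coords[tour[i]]
        x2, y2 = coords[tour[(i + 1) % len(tour)]]  # wrap around to the starting city
        total += math.hypot(x2 - x1, y2 - y1)
    return total

coords = [(0, 0), (3, 0), (3, 4), (0, 4)]  # hypothetical cities: corners of a 3 x 4 rectangle
print(is_valid_tour([0, 1, 2, 3], 4))      # True
print(is_valid_tour([0, 1, 1, 3], 4))      # False: city 1 twice, city 2 never
print(tour_length([0, 1, 2, 3], coords))   # 14.0 (around the edge)
print(tour_length([0, 2, 1, 3], coords))   # 18.0 (uses both diagonals)
```

Each print call passes a single argument, so the snippet also works after `print` has been rebound to `pprint` in the Imports cell.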

## Imports

In [504]:
%matplotlib inline

# Please add more imports if you need them 

import random
import time
import csv

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

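# Note: this shadows the built-in print() with pprint() so that lists of cities and
# routes are pretty-printed. pprint() takes a single object (its second positional
# parameter is the output stream), so a call such as print(a, b) raises an error.
from pprint import pprint as print 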
from math import hypot

## Convenience Classes

### City

The City class represents a city. It stores the city's x and y coordinates and provides a distance() method that computes the Euclidean distance between the city and another city. Each path, represented by a chromosome, is an ordering of a set of cities.   

In [505]:
class City:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        
    def distance(self, city):
        xDis = abs(self.x - city.x)
        yDis = abs(self.y - city.y)
        distance = hypot(xDis, yDis)
        return distance
    
    def __repr__(self):
        return "(" + str(self.x) + "," + str(self.y) + ")"

### Fitness

The Fitness class represents the fitness function. It stores a path (route) and provides methods that calculate the total distance of the path and the path's fitness value, which is derived from that distance. 

In [506]:
class Fitness:
    def __init__(self, route):
        self.route = route
        self.distance = None
        self.fitness = None
    
    def routeDistance(self):
        if self.distance is None:
            pathDistance = 0.0
            for i in range(0, len(self.route)):
                fromCity = self.route[i]
                toCity = None
                if i+1 < len(self.route):
                    toCity = self.route[i+1]
                else:
                    toCity = self.route[0]
                pathDistance += fromCity.distance(toCity)
            self.distance = pathDistance
        return self.distance
    
    def routeFitness(self):
        if self.fitness is None:
            # Fitness function (simple division): one divided by the distance of the path.
            # A zero-length route (e.g. every city at the same point) would cause a
            # division by zero, so the distance is floored at a tiny positive value.
            self.fitness = 1 / max(float(self.routeDistance()), 1e-12)
        return self.fitness
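Because fitness is 1 / distance, a shorter route always scores higher, and ranking routes by fitness (highest first) gives the same order as ranking them by distance (shortest first). A small sketch on plain numbers rather than City objects; `route_fitness` and its `eps` floor are hypothetical, shown as one way to honour the note about avoiding division by zero:

```python
def route_fitness(distance, eps=1e-12):
    """Inverse-distance fitness; eps is an assumed floor that prevents division by zero."""
    return 1.0 / max(distance, eps)

distances = [14.0, 18.0, 10.0]
fitnesses = [route_fitness(d) for d in distances]

# Ranking by fitness (highest first) matches ranking by distance (shortest first)
by_fitness = sorted(range(len(distances)), key=lambda i: fitnesses[i], reverse=True)
by_distance = sorted(range(len(distances)), key=lambda i: distances[i])
print(by_fitness)              # [2, 0, 1]
print(by_distance)             # [2, 0, 1]
print(route_fitness(4.0))      # 0.25
print(route_fitness(0.0) > 0)  # True: a large finite value instead of ZeroDivisionError
```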


## Population Initialization  

The population initialization function (or method) performs random initialization. This creates an initial population with completely random chromosomes (or solutions). There are three functions related to population initialization. 

The first function is genCityList(), which reads a set of cities from a file.  

In [507]:
def genCityList(filename):
    cityList = []
    
    # TODO 1 (10 marks) - Replace the following codes that generate 10 random cities.
    # Your new implementation must read a set of cities from the filename to be used for creating 
    # an initial population.  
    
    # Marking scheme: 
    # 7 to 10 marks:  Correct implementation. 
    # 5 to <7 marks:  Minor errors with slight effects on the fitness value.
    # >0 to <5 marks: Major errors with significant effects on the fitness value. 
    # 0 marks:        No answer is given. 
    
    # Read the data from the file using Pandas, selecting only the x and y columns
    df = pd.read_csv(filename, delim_whitespace=True, header=None, usecols=[1, 2], names=['x', 'y'])
    
    for _, row in df.iterrows():
        city = City(float(row['x']), float(row['y']))
        cityList.append(city)
    
    return cityList
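genCityList() depends on pandas and on each line holding at least three whitespace-separated columns, with x and y in columns 1 and 2. If pandas is unavailable, the same parsing can be sketched with the standard library. The 'id x y' layout is an assumption inferred from `usecols=[1, 2]`, not a documented file format, and `read_city_coords` is a hypothetical helper:

```python
import io

def read_city_coords(lines):
    """Parse lines of the assumed form 'id x y' into a list of (x, y) float pairs.

    The first column (assumed to be a city ID) is ignored, mirroring usecols=[1, 2]
    in genCityList(); blank or short lines are skipped.
    """
    coords = []
    for line in lines:
        fields = line.split()  # split on any run of whitespace
        if len(fields) < 3:
            continue
        coords.append((float(fields[1]), float(fields[2])))
    return coords

# Hypothetical three-city sample in the assumed layout (not the contents of cities10.txt)
sample = io.StringIO("1 0.0 0.0\n2 3.0 0.0\n\n3 3.0 4.0\n")
print(read_city_coords(sample))  # [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
```

With a real file, `with open(filename) as f: cityList = [City(x, y) for x, y in read_city_coords(f)]` would build the same list of City objects, provided the file really follows that layout.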


The second function is createRoute() which generates a random route (chromosome) from a set of City instances.

In [508]:
def createRoute(cityList):
    route = random.sample(cityList, len(cityList))
    return route

The third function is initialPopulation() which calls the second function repeatedly to create an initial population (a list of routes).

In [509]:
def initialPopulation(popSize, cityList):
    population = []
    for i in range(0, popSize):
        population.append(createRoute(cityList))
    return population

You can run the above functions using the sample runs below. To do so, simply change the cell type from Markdown to Code, and remove the code-block backticks.

Sample run 1 initializes 10 cities in cityList as follows:

```
cityList = genCityList('cities10.txt') 
print(cityList)
```

Sample run 2 initializes 10 cities in cityList and creates a population with three routes as follows:

```
cityList = genCityList('cities10.txt') 
population = initialPopulation(3, cityList) 
print(population)
```

## Selection

Parent selection picks chromosomes from a population to form a mating pool, typically favouring those with high fitness values. Survivor selection picks chromosomes with high fitness values to carry over into the population of the next generation. The population size is len(population), so a population contains len(population) chromosomes. 

### Parent Selection

There are three implementations of parent selection. The first, randomParentSelection(), performs random selection.

In [510]:
def randomParentSelection(population, poolSize=None, remove_duplicate=False):
    if poolSize == None:
        poolSize = len(population)
        
    matingPool = []
    population = population.copy()
    
    for i in range(0, poolSize):
        candidate = random.choice(population)
        if remove_duplicate:
            population.remove(candidate)
        matingPool.append(candidate)
      
    return matingPool

The second, tournamentParentSelection(), performs Tournament Selection.

In [511]:
def tournamentParentSelection(population, poolSize=None, remove_duplicate=False, tournament_size=3):
    
    # TODO 2 (10 marks) - Replace the dummy parent selection function below with  
    # Tournament Selection.
      
    # Marking scheme: 
    # 7 to 10 marks:  Correct implementation. 
    # 5 to <7 marks:  Minor errors.
    # >0 to <5 marks: Major errors. 
    # 0 marks:        No answer is given. 
    
    # You will need to compare the performance achieved by Random Selection, 
    # Tournament Selection, and Proportional Selection during performance evaluation 
    # later. So, you will run either Random Selection, Tournament Selection, or 
    # Proportional Selection in a simulation run.
    
    if poolSize == None:
        poolSize = len(population)
    
    matingPool = []
    population = population.copy()

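    for i in range(0, poolSize):
        # Draw tournament_size random contestants (fewer if remove_duplicate=True has
        # shrunk the population below tournament_size) and keep the fittest one
        k = min(tournament_size, len(population))
        tournament = random.sample(population, k)
        best_individual = max(tournament, key=lambda x: Fitness(x).routeFitness())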
        if remove_duplicate:
            population.remove(best_individual)
        matingPool.append(best_individual)

    return matingPool

**Example usage**
```
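cityList = genCityList('cities10.txt')
population = initialPopulation(100000, cityList)
poolSize = 2
tournament_size = 20000

# tournament_size must be passed by keyword: the third positional parameter is remove_duplicate
selected_parents = tournamentParentSelection(population, poolSize, tournament_size=tournament_size)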
for i, parent in enumerate(selected_parents):
    fitness = Fitness(parent).routeFitness()
    print(f"Selected parent {i+1}: Fitness: {fitness}")
```
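The tournament size controls selection pressure: with tournament_size = 1 each tournament has a single contestant, which amounts to random selection, and with tournament_size equal to the population size the fittest route always wins. The sketch below is a simplification of tournamentParentSelection() that works on a plain list of fitness values, returns indices, and accepts an optional random.Random instance for reproducibility; `tournament_select` is a hypothetical name:

```python
import random

def tournament_select(fitnesses, pool_size, tournament_size, rng=random):
    """Return the indices of pool_size winners, each the fittest of a random tournament.

    Contestants within one tournament are distinct, but the same individual may win
    several tournaments (selection with replacement, like remove_duplicate=False).
    """
    k = min(tournament_size, len(fitnesses))  # cannot draw more contestants than exist
    winners = []
    for _ in range(pool_size):
        contestants = rng.sample(range(len(fitnesses)), k)
        winners.append(max(contestants, key=lambda i: fitnesses[i]))
    return winners

fitnesses = [0.1, 0.5, 0.2, 0.9, 0.3]
rng = random.Random(42)
print(tournament_select(fitnesses, 4, tournament_size=2, rng=rng))
# When the tournament covers the whole population, the fittest (index 3) always wins
print(tournament_select(fitnesses, 3, tournament_size=5, rng=rng))  # [3, 3, 3]
```

With tournament_size of 2 or more, the single least-fit individual (index 0 here) can never win, because every tournament pits it against a fitter contestant.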

The third, proportionalParentSelection(), performs Proportional (roulette-wheel) Selection.

In [512]:
def proportionalParentSelection(population, poolSize=None, remove_duplicate=False):
    
    # TODO 3 (10 marks) - Replace the dummy parent selection function below with  
    # Proportional Selection.
      
    # Marking scheme: 
    # 7 to 10 marks:  Correct implementation. 
    # 5 to <7 marks:  Minor errors.
    # >0 to <5 marks: Major errors. 
    # 0 marks:        No answer is given. 
    
    # You will need to compare the performance achieved by Random Selection, 
    # Tournament Selection, and Proportional Selection during performance evaluation 
    # later. So, you will run either Random Selection, Tournament Selection, or 
    # Proportional Selection in a simulation run.
    
    if poolSize == None:
        poolSize = len(population)
        
    matingPool = []
    population = population.copy()

    # Save the fitness of each individual in the population in a list
    fitnessList = [Fitness(ind).routeFitness() for ind in population]

    # Calculate the total fitness of the population
    total_fitness = np.sum(fitnessList)
    
    for i in range(0, poolSize):
        # Generate a random number between 0 and the total fitness
        rand = random.uniform(0, total_fitness)
        current_sum = 0
        
        # Accumulate fitness values until the running total reaches the random number
        for index, individual in enumerate(population):
            current_sum += fitnessList[index]
            if current_sum >= rand:
                break
        # If floating-point rounding leaves current_sum just below rand, the loop ends
        # without breaking and the last individual (the final index) is selected
        
        if remove_duplicate:
            population.pop(index)
            fitnessList.pop(index)
            total_fitness = np.sum(fitnessList)
        matingPool.append(individual)
    
    return matingPool

**Example usage**
```
population = initialPopulation(1000000, cityList)
poolSize = 2

selected_parents = proportionalParentSelection(population, poolSize)
for i, parent in enumerate(selected_parents):
    fitness = Fitness(parent).routeFitness()
    print(f"Selected parent {i+1}: Fitness: {fitness}")
```
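Proportional selection gives each route a slice of the wheel proportional to its fitness. With fitness = 1 / distance, as in the Fitness class, the probabilities can be checked by hand. The sketch below uses plain distances instead of routes and a hypothetical helper name, and draws a with-replacement pool (the remove_duplicate=False case) with the standard library's `random.choices`:

```python
import random

def selection_probabilities(distances):
    """Proportional-selection probability of each route, with fitness = 1 / distance."""
    fitnesses = [1.0 / d for d in distances]
    total = sum(fitnesses)
    return [f / total for f in fitnesses]

probs = selection_probabilities([14.0, 18.0, 10.0])
print([round(p, 3) for p in probs])  # [0.315, 0.245, 0.441]

# Routes of similar length get almost equal slices of the wheel
close = selection_probabilities([1000.0, 1010.0, 1020.0])
print([round(p, 3) for p in close])

# Drawing a mating pool with replacement
rng = random.Random(0)
routes = ["route A", "route B", "route C"]
pool = rng.choices(routes, weights=probs, k=5)
print(pool)
```

This is worth keeping in mind for the performance evaluation: when all route lengths are close, as in the cities734.txt run below where the best distances differ by only a few percent, the wheel is nearly uniform and proportional selection may behave much like random selection.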

### Survival Selection

In [513]:
def survivorSelection(population, eliteSize):
    
    # TODO 4 (10 marks) - Replace the dummy survival selection function below with  
    # Merge, Sort & Truncate.
      
    # Marking scheme: 
    # 7 to 10 marks:  Correct implementation. 
    # 5 to <7 marks:  Minor errors.
    # >0 to <5 marks: Major errors. 
    # 0 marks:        No answer is given. 
    
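    # Sort a copy of the population by fitness (highest first). sorted() leaves the
    # caller's population untouched, unlike list.sort(), which sorts in place and returns None
    ranked = sorted(population, key=lambda x: Fitness(x).routeFitness(), reverse=True)

    # Truncate to the eliteSize fittest routes
    elites = ranked[:eliteSize]

    return elites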

You can run the above functions using the sample runs below. To do so, simply change the cell type from Markdown to Code, and remove the code-block backticks. 

Sample run 1 initializes 10 cities in cityList, creates a population with four routes, and creates a pool of parents as follows:

```
population = initialPopulation(4, genCityList('cities10.txt'))
matingpool = randomParentSelection(population, 4) 
print('Initial population') 
print(population) 
print('Mating pool') 
print(matingpool)
```

Sample run 2 initializes 10 cities in cityList, creates a population with four routes, and selects an elite chromosome as follows:

```
population = initialPopulation(4, genCityList('cities10.txt'))
elites = survivorSelection(population, 1)
print('Initial population')
print(population)
print('Selected elites')
print(elites)
```
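survivorSelection() works on a single population: it sorts it by fitness and truncates it to eliteSize. In the common (μ + λ) reading of Merge, Sort & Truncate, the parents and their offspring are first merged into one pool and only then sorted and truncated. A sketch of that variant, assuming routes are lists of hypothetical integer city indices over a small, hand-checkable coordinate set; `merge_sort_truncate` is a hypothetical name:

```python
import math

coords = [(0, 0), (3, 0), (3, 4), (0, 4)]  # hypothetical cities: corners of a 3 x 4 rectangle

def tour_length(tour):
    """Closed-tour length; tour[i - 1] wraps to the last city when i == 0."""
    return sum(math.hypot(coords[tour[i]][0] - coords[tour[i - 1]][0],
                          coords[tour[i]][1] - coords[tour[i - 1]][1])
               for i in range(len(tour)))

def merge_sort_truncate(parents, offspring, size, distance):
    """Merge parents and offspring, sort by distance (shortest first), keep the best `size`.

    Sorting by ascending distance is equivalent to sorting by fitness = 1 / distance
    in descending order.
    """
    merged = parents + offspring           # Merge
    ranked = sorted(merged, key=distance)  # Sort (stable, so ties keep merge order)
    return ranked[:size]                   # Truncate

parents = [[0, 2, 1, 3], [0, 1, 2, 3]]    # lengths 18.0 and 14.0
offspring = [[0, 1, 3, 2], [1, 0, 3, 2]]  # lengths 16.0 and 14.0
print(merge_sort_truncate(parents, offspring, 2, tour_length))
# [[0, 1, 2, 3], [1, 0, 3, 2]]
```

Merging lets a good child displace a weak parent (and vice versa), whereas truncating only the current population cannot compare the two generations directly.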

## Crossover


Crossover takes two parents, combines their genetic material, and produces one or more children. In the Travelling Salesman Problem, every path must be valid: each path consists of all cities in the set and visits each city once only. Naively exchanging parts of two chromosomes tends to produce invalid paths. For example, with Parent 1 = [2 1 0 7 3 5 4 6] and Parent 2 = [6 1 0 5 2 3 4 7], one-point crossover at the midpoint generates Child 1 = [2 1 0 7 2 3 4 7] and Child 2 = [6 1 0 5 3 5 4 6]. Both children are invalid: Child 1 visits cities 2 and 7 twice and never visits 5 or 6, while Child 2 visits 5 and 6 twice and never visits 2 or 7.     
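The failure of one-point crossover described above can be reproduced directly. A minimal sketch with hypothetical helper names, using the same integer parents as the example:

```python
def one_point_crossover(p1, p2, point):
    """Swap the tails of two parents after index `point`, with no repair step."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def is_permutation(route):
    """True if no city appears twice; for a route as long as the number of cities,
    that means every city is visited exactly once."""
    return len(set(route)) == len(route)

parent1 = [2, 1, 0, 7, 3, 5, 4, 6]
parent2 = [6, 1, 0, 5, 2, 3, 4, 7]
child1, child2 = one_point_crossover(parent1, parent2, len(parent1) // 2)
print(child1)                  # [2, 1, 0, 7, 2, 3, 4, 7]
print(is_permutation(child1))  # False
print(child2)                  # [6, 1, 0, 5, 3, 5, 4, 6]
print(is_permutation(child2))  # False
```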

In [514]:
def crossover(parent1, parent2):

    # TODO 5 (10 marks) - Replace the dummy crossover function below with 
    # Partially Mapped Crossover approach.
   
    # Marking scheme: 
    # 7 to 10 marks:  Correct implementation. 
    # 5 to <7 marks:  Minor errors.
    # >0 to <5 marks: Major errors. 
    # 0 marks:        No answer is given. 
    
    # Define random crossover points
    crossover_points = sorted(random.sample(range(len(parent1)), 2))
    start_point, end_point = crossover_points
    
    # Initialize child chromosomes
    child1 = [-1] * len(parent1)
    child2 = [-1] * len(parent2)
    
    # Copy the crossover segment from parents to children
    child1[start_point:end_point+1] = parent1[start_point:end_point+1]
    child2[start_point:end_point+1] = parent2[start_point:end_point+1]
    
    # Map genes from the second parent to the first child
    for i in range(start_point, end_point+1):
        if parent2[i] not in child1:
            index = parent2.index(parent1[i])
            while child1[index] != -1:
                index = parent2.index(parent1[index])
            child1[index] = parent2[i]
    
    # Map genes from the first parent to the second child
    for i in range(start_point, end_point+1):
        if parent1[i] not in child2:
            index = parent1.index(parent2[i])
            while child2[index] != -1:
                index = parent1.index(parent2[index])
            child2[index] = parent1[i]
    
    # Fill in the remaining genes using the remaining genes of the parents
    for i in range(len(parent1)):
        if child1[i] == -1:
            child1[i] = parent2[i]
        if child2[i] == -1:
            child2[i] = parent1[i]
    
    return child1, child2
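To follow the mapping step of crossover() on a small case, the sketch below applies the same Partially Mapped Crossover procedure to integer permutations with fixed, hypothetical cut points (crossover() draws its cut points at random). It marks empty slots with None and follows the mapping until the position leaves the copied segment, which places each gene in the same position as the -1 check in crossover(). `pmx_child` is a hypothetical name:

```python
def pmx_child(p1, p2, start, end):
    """One PMX child: positions start..end (inclusive) come from p1, the rest from p2."""
    child = [None] * len(p1)
    child[start:end + 1] = p1[start:end + 1]  # copy p1's segment
    for i in range(start, end + 1):
        gene = p2[i]
        if gene in child:
            continue                          # already placed via the copied segment
        pos = i
        # Follow the mapping p1[pos] -> its position in p2 until we leave the segment
        while start <= pos <= end:
            pos = p2.index(p1[pos])
        child[pos] = gene
    for i in range(len(child)):
        if child[i] is None:
            child[i] = p2[i]                  # untouched positions are copied from p2
    return child

parent1 = [1, 2, 3, 4, 5, 6, 7, 8, 9]
parent2 = [9, 3, 7, 8, 2, 6, 5, 1, 4]
print(pmx_child(parent1, parent2, 3, 6))  # [9, 3, 2, 4, 5, 6, 7, 1, 8]
print(pmx_child(parent2, parent1, 3, 6))  # [1, 7, 3, 8, 2, 6, 5, 4, 9]
```

In the first child, gene 2 (from parent2 at position 4) cannot go to position 4, which holds 5 from parent1's segment. Following 5 to its position in parent2 (6) lands inside the segment again, where parent1 holds 7. Following 7 to its position in parent2 (2) leaves the segment, so gene 2 is placed at position 2. Both children are valid permutations.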

breedPopulation() pairs consecutive parents in the mating pool and applies crossover to each pair to produce a new generation of the same size as the mating pool.

In [515]:
def breedPopulation(matingpool):
    children = []
    
    # Choosing parents in their order of presence in the mating pool. Choosing parents
    # in a random manner is possible. 
    
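    for i in range(1, len(matingpool), 2):
        child1, child2 = crossover(matingpool[i-1], matingpool[i])
        children.append(child1)
        children.append(child2)
    
    # With an odd-sized pool the last parent has no partner; pair it with the first
    # parent and keep one child so that len(children) == len(matingpool)
    if len(matingpool) % 2 == 1:
        child1, _ = crossover(matingpool[-1], matingpool[0])
        children.append(child1)
    
    return children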

You can run the above functions using the sample run below. To do so, simply change the cell type from Markdown to Code, and remove the code-block backticks. The sample run initializes 2 chromosomes in the population and performs crossover between the two parents. 

```
population = initialPopulation(2, genCityList('cities10.txt'))
parent1, parent2 = population
child1, child2 = crossover(parent1, parent2)
print('Parents')
print(parent1)
print(parent2)
print('Children')
print(child1)
print(child2)
```

## Mutation

Mutation alters a single chromosome to produce a mutated chromosome, which keeps the population diverse and helps the genetic algorithm reach shorter paths. In the Travelling Salesman Problem, a mutated chromosome must still be a valid path. As an example, insertion mutation can take the chromosome [1 2 3 4 5 6 7 8 9 10], remove gene 3, and reinsert it after gene 7 to give the mutated chromosome [1 2 4 5 6 7 3 8 9 10]. Step 1: select a gene at random. Step 2: insert this gene at a randomly selected position.
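The two steps above can be sketched as a single insertion move on one chromosome. Unlike mutate(), which considers every gene with probability mutationProbability, this sketch performs exactly one move; the helper names are hypothetical:

```python
import random

def insertion_move(route, from_index, to_index):
    """Remove the gene at from_index and re-insert it at to_index; returns a new list."""
    mutated = list(route)
    gene = mutated.pop(from_index)
    mutated.insert(to_index, gene)
    return mutated

def insertion_mutation(route, rng=random):
    """Step 1: pick a gene at random. Step 2: insert it at a random position."""
    from_index = rng.randrange(len(route))
    # to_index indexes the list after removal; len(route) - 1 appends at the end
    to_index = rng.randrange(len(route))
    return insertion_move(route, from_index, to_index)

route = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(insertion_move(route, 2, 6))  # [1, 2, 4, 5, 6, 7, 3, 8, 9, 10]
print(insertion_mutation(route, random.Random(7)))
```

Because the gene is removed before it is reinserted, the result is always a permutation of the original route, so validity is preserved without any repair step.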

In [516]:
def mutate(route, mutationProbability):
    
    # TODO 6 (10 marks) - Replace the dummy mutation function below with Insertion Mutation.
    # The dummy mutation function simply swaps a city with the city before it.  
   
    # Marking scheme: 
    # 7 to 10 marks:  Correct implementation. 
    # 5 to <7 marks:  Minor errors.
    # >0 to <5 marks: Major errors. 
    # 0 marks:        No answer is given. 
     
    mutated_route = route[:]

    for i in range(len(route)):
        if (random.random() < mutationProbability):
            # mutationProbability is the probability of a gene undergoing mutation
            
            # Select a random position to insert the gene
            insert_position = random.randint(0, len(route) - 1)
            
            # Get the gene to be moved
            gene_to_move = mutated_route[i]
            
            # Remove the gene from its original position
            mutated_route.remove(gene_to_move)
            
            # Insert the gene at the new position
            mutated_route.insert(insert_position, gene_to_move)
            
    return mutated_route

mutation() runs over the entire population and applies mutate() to each chromosome, where each gene is moved with a small probability, mutationProbability. 

In [517]:
def mutation(population, mutationProbability):
    mutatedPopulation = []
    for i in range(0, len(population)):
        mutatedIndividual = mutate(population[i], mutationProbability)
        mutatedPopulation.append(mutatedIndividual)
    return mutatedPopulation

You can run the above functions using the sample run below. To do so, simply change the cell type from Markdown to Code, and remove the code-block backticks. The sample run initializes a route of 10 cities from cities10.txt and then mutates it as follows:

```
route = genCityList('cities10.txt')
mutated = mutate(route, 1)  # Probability 1: an insertion move is made at every position
print('Original route')
print(route)
print('Mutated route')
print(mutated)
```

## Running One Generation (or Iteration)

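Here, we run one generation of the genetic algorithm: the elites are preserved, a mating pool is selected using the chosen parentType, children are bred by crossover, and the combined population is mutated. The match statement in oneGeneration() requires Python 3.10 or later. 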

In [518]:
def oneGeneration(population, eliteSize, mutationProbability, parentType='random', remove_duplicate=False, tournament_size=3):
    
    # First we preserve the elites
    elites = survivorSelection(population, eliteSize)
    
    # Then we calculate what our mating pool size should be and generate
    # the mating pool
    poolSize = len(population) - eliteSize
    match parentType:
        case 'tournament':
            matingpool = tournamentParentSelection(population, poolSize, remove_duplicate, tournament_size)
        case 'proportional':
            matingpool = proportionalParentSelection(population, poolSize, remove_duplicate)
        case _:
            matingpool = randomParentSelection(population, poolSize, remove_duplicate)
        
    # Then we perform crossover on the mating pool
    children = breedPopulation(matingpool)
    
    # We combine the elites and children into one population
    new_population = elites + children
    
    # We mutate the population
    mutated_population = mutation(new_population, mutationProbability)
        
    return mutated_population

You can run the above functions using the sample run below. To do so, simply change the cell type from Markdown to Code, and remove the code-block backticks. The sample run initializes a population of 5 chromosomes based on 10 cities in cityList and then runs one generation (or iteration) of the genetic algorithm as follows:

```
population = initialPopulation(5, genCityList('cities10.txt'))
eliteSize = 1
mutationProbability = 0.01
new_population = oneGeneration(population, eliteSize, mutationProbability)
print('Initial population')
print(population)
print('New population')
print(new_population)
```

## Running Many Generations (or Iterations) 

In [519]:
# %%prun -s cumulative -q -l 50 -T profiler.txt
filename = 'cities734.txt'
popSize = 50
eliteSize = 15
mutationProbability = 0.0005
tournament_size = 5
prevent_parent_duplicate = True
iteration_limit = 2000

cityList = genCityList(filename)
iter_min_dist = []
iter_best_route = []

population = initialPopulation(popSize, cityList)
distances = [Fitness(p).routeDistance() for p in population]
index = np.argmin(distances)

min_dist = min(distances)
iter_min_dist.append(min_dist)
iter_best_route.append(population[index])

print("Best distance for initial population: " + str(min_dist))

for i in range(iteration_limit):
    population = oneGeneration(population, eliteSize, mutationProbability, 'proportional', prevent_parent_duplicate, tournament_size)
    distances = [Fitness(p).routeDistance() for p in population]
    index = np.argmin(distances)

    best_route = population[index]
    min_dist = min(distances)
    iter_min_dist.append(min_dist)
    iter_best_route.append(best_route)

    print("Best distance for population in iteration " + str(i) +
          ": " + str(min_dist) + ", best distance so far: " + str(min(iter_min_dist)))

print("Optimal path for last iteration is " + str(best_route)) 
print("Optimal path for all iterations is " + str(iter_best_route[np.argmin(iter_min_dist)]))

# TODO 7 (10 marks) - Performance Evaluation. You will present the performance achieved 
# by the different parent selection functions. You will compare the 
# performance achieved by Random Selection, Tournament Selection, and Proportional Selection. 

# Marking scheme: 
# 7 to 10 marks:  In-depth performance evaluation. Optimal routes are found. 
# 5 to <7 marks:  Clear understanding of performance evaluation.
# >0 to <5 marks: Inaccurate or unclear understanding of performance evaluation. 
# 0 marks:        No answer is given. 


'Best distance for initial population: 1585717.1569489923'
('Best distance for population in iteration 0: 1587287.7591830916, best '
 'distance so far: 1585717.1569489923')
('Best distance for population in iteration 1: 1583806.99842127, best distance '
 'so far: 1583806.99842127')
('Best distance for population in iteration 2: 1572373.134371163, best '
 'distance so far: 1572373.134371163')
('Best distance for population in iteration 3: 1572373.134371163, best '
 'distance so far: 1572373.134371163')
('Best distance for population in iteration 4: 1572373.134371163, best '
 'distance so far: 1572373.134371163')
('Best distance for population in iteration 5: 1553726.3159176137, best '
 'distance so far: 1553726.3159176137')
('Best distance for population in iteration 6: 1553726.3159176137, best '
 'distance so far: 1553726.3159176137')
('Best distance for population in iteration 7: 1558713.4379660196, best '
 'distance so far: 1553726.3159176137')
('Best distance for population in iteration 8: 1560103.2038282717, best '
 'distance so far: 1553726.3159176137')
('Best distance for population in iteration 9: 1558339.1977877712, best '
 'distance so far: 1553726.3159176137')
('Best distance for population in iteration 10: 1558339.1977877712, best '
 'distance so far: 1553726.3159176137')
('Best distance for population in iteration 11: 1558339.1977877712, best '
 'distance so fa

KeyboardInterrupt: 
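TODO 7 asks for a comparison of Random, Tournament, and Proportional Selection. One possible way to organise it is to run the loop above once per parentType ('random', 'tournament', 'proportional'), keep each run's iter_min_dist list, and tabulate the results; for a fairer comparison, each method could start from the same initial population and be repeated with several random seeds. The helpers below are hypothetical and only summarise histories you have already produced; they do not run the genetic algorithm:

```python
def summarize_history(history):
    """Summarize one run's best-distance history (a list shaped like iter_min_dist).

    history[0] is the initial population; history[k] is the best after k generations.
    """
    best = min(history)
    return {
        "initial": history[0],
        "best": best,
        "best_generation": history.index(best),  # first generation reaching the best
        "improvement_pct": 100.0 * (history[0] - best) / history[0],
    }

def compare_methods(histories):
    """histories maps a parent-selection name to its history; returns rows, best first."""
    rows = [dict(method=name, **summarize_history(h)) for name, h in histories.items()]
    return sorted(rows, key=lambda row: row["best"])
```

Since pandas is already imported, `pd.DataFrame(compare_methods({'random': hist_random, 'tournament': hist_tournament, 'proportional': hist_proportional}))` would display the comparison as a table, where the three `hist_*` lists are hypothetical iter_min_dist results from separate runs. Plotting each history with `plt.plot` would show the convergence curves side by side.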