# Hadi Babalou - 810199380

Artificial Intelligence - CA#02: *Genetics* - Spring 2023 \
In this notebook, we will implement a genetic algorithm to solve a portfolio optimization problem.

## Genetic Representation


In this section, we define the genetic representation of the problem. \
Each chromosome represents a portfolio, and its length equals the number of available assets. Each gene is the weight of the corresponding asset in the portfolio. \
Each weight is a floating-point number in the range [0, 1], and the weights of a chromosome sum to 1. \
For example, with 3 assets, a chromosome looks as follows:

`[0.12, 0.45, 0.43]`
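
The two constraints above (each weight in [0, 1], all weights summing to 1) can be checked with a small helper. This is only an illustrative sketch: `is_valid_portfolio` and its tolerance `tol` are hypothetical names and are not part of the notebook's code.

```python
import math


def is_valid_portfolio(weights: list[float], tol: float = 1e-9) -> bool:
    """Return True if every weight lies in [0, 1] and the weights sum to 1.

    Hypothetical helper for illustration; `tol` absorbs floating-point error.
    """
    in_range = all(0.0 <= w <= 1.0 for w in weights)
    return in_range and math.isclose(sum(weights), 1.0, abs_tol=tol)


print(is_valid_portfolio([0.12, 0.45, 0.43]))  # the example chromosome above
print(is_valid_portfolio([0.70, 0.45, 0.43]))  # weights sum to more than 1
```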

In [1]:
import random
import pandas as pd


In [2]:
class Chromosome:
    genes: list[float]  # set per instance in __init__ (no shared mutable default)
    
    def __init__(self, genes: list[float]) -> None:
        self.genes = genes

    def __str__(self) -> str:
        return str(self.genes)
    
    def __repr__(self) -> str:
        return str(self.genes)
    
    def __eq__(self, other: object) -> bool:
        if not isinstance(other, Chromosome):
            return False
        return self.genes == other.genes
    
    def normalize(self) -> None:
        genes_sum: float = sum(self.genes)
        for i in range(len(self.genes)):
            self.genes[i] /= genes_sum

    # count genes above a small epsilon instead of comparing with 0, to tolerate floating-point error
    def get_diversity(self) -> int:
        diversity = 0
        epsilon = 1e-5
        for i in range(len(self.genes)):
            if abs(self.genes[i]) > epsilon:
                diversity += 1
        return diversity
    
    def mutate(self, mutation_probability: float) -> None:
        for i in range(len(self.genes)):
            if random.random() < mutation_probability:
                self.genes[i] = random.random()

    def crossover(self, other: "Chromosome", probability: float) -> tuple["Chromosome", "Chromosome"]:
        if not isinstance(other, Chromosome):
            raise TypeError("other must be Chromosome")
        
        child1: list[float] = []
        child2: list[float] = []
        for i in range(len(self.genes)):
            if random.random() < probability:
                child1.append(self.genes[i])
                child2.append(other.genes[i])
            else:
                child1.append(other.genes[i])
                child2.append(self.genes[i])
            
        return Chromosome(child1), Chromosome(child2)

## Population Initialization


In this section, we initialize the population using the ```generate_population()``` function. \
This function creates ```population_size``` chromosomes, each of length ```stocks_count```. Every gene is first drawn as a random floating-point number in the range [0, 1), and the chromosome is then normalized so that its weights sum to 1. \
The function also sets the ```generation``` attribute to 1.
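
As a rough standalone sketch of this step, the snippet below builds a population as plain lists of weights. The helper names `random_portfolio` and `random_population` are made up for illustration; the notebook itself uses the ```Chromosome``` class and ```generate_population()```.

```python
import random


def random_portfolio(stocks_count: int) -> list[float]:
    """Draw one random weight per stock, then scale the weights to sum to 1."""
    raw = [random.random() for _ in range(stocks_count)]
    total = sum(raw)
    if total == 0:  # practically impossible, but avoids dividing by zero
        return [1.0 / stocks_count] * stocks_count
    return [value / total for value in raw]


def random_population(population_size: int, stocks_count: int) -> list[list[float]]:
    """Build the initial population as a list of normalized weight vectors."""
    return [random_portfolio(stocks_count) for _ in range(population_size)]


population = random_population(population_size=4, stocks_count=3)
for portfolio in population:
    print([round(weight, 3) for weight in portfolio], "sum =", round(sum(portfolio), 6))
```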


## Fitness Function


This section is implemented in the ```get_fitness()``` function. \
This function takes a chromosome as input and returns its fitness, which is calculated using the following formula:
$$ fitness = w_{diversity} \times diversity + \sum_{i=1}^{stocks\_count} (w_{profit} \times ratio_{i} \times profit_{i} - w_{risk} \times ratio_{i} \times risk_{i}) $$
where $w_{diversity}$, $w_{profit}$, and $w_{risk}$ are constants that represent the importance of diversity, profit, and risk in the fitness function, and $ratio_{i}$ is the weight (gene) of asset $i$. \
The $diversity$ is the number of assets with a non-zero ratio in the portfolio. In the code, a ratio counts as non-zero when it is greater than a small epsilon ($10^{-5}$), to tolerate floating-point error.
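
To make the formula concrete, here is a small worked example. It is a sketch only: `portfolio_fitness` is a hypothetical standalone function, and the three assets and their risk and profit values are made up (they are not taken from `sample.csv`).

```python
def portfolio_fitness(ratios, risks, profits, w_diversity, w_profit, w_risk, epsilon=1e-5):
    """Evaluate the fitness formula above for one portfolio."""
    # diversity: number of assets whose ratio is (numerically) non-zero
    diversity = sum(1 for ratio in ratios if abs(ratio) > epsilon)
    weighted_sum = sum(
        w_profit * ratio * profit - w_risk * ratio * risk
        for ratio, risk, profit in zip(ratios, risks, profits)
    )
    return w_diversity * diversity + weighted_sum


# Made-up data for three assets; these are not values from sample.csv.
ratios = [0.5, 0.5, 0.0]
risks = [0.2, 0.4, 0.9]
profits = [2.0, 4.0, 8.0]

value = portfolio_fitness(ratios, risks, profits, w_diversity=0.01, w_profit=7, w_risk=1)
print(round(value, 2))  # 0.01 * 2 + (7.0 - 0.1) + (14.0 - 0.2) = 20.72
```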


## Crossover and Mutation Implementation and Generating Next Population


### Crossover
This section is implemented in the ```crossover()``` method of the ```Chromosome``` class. \
This method takes a second parent chromosome and returns two new child chromosomes (uniform crossover). For each gene position, we generate a random number between 0 and 1. If the number is less than ```CROSSOVER_PROBABILITY```, the first child takes the gene from the first parent and the second child takes it from the second parent. Otherwise, the genes are swapped: the first child takes the gene from the second parent and the second child takes it from the first parent.
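
The per-gene decision can be sketched on plain lists as follows. `uniform_crossover` and `normalized` are hypothetical standalone helpers, not the notebook's methods. Note that a child's weights generally no longer sum to 1 after crossover, which is why the notebook normalizes the children afterwards.

```python
import random


def uniform_crossover(parent_1, parent_2, probability):
    """Per-gene crossover: keep each gene pair in place or swap it."""
    child_1, child_2 = [], []
    for gene_1, gene_2 in zip(parent_1, parent_2):
        if random.random() < probability:
            child_1.append(gene_1)
            child_2.append(gene_2)
        else:
            child_1.append(gene_2)
            child_2.append(gene_1)
    return child_1, child_2


def normalized(genes):
    """Scale the genes so that they sum to 1 (assumes a non-zero sum)."""
    total = sum(genes)
    return [gene / total for gene in genes]


parent_1 = [0.7, 0.2, 0.1]
parent_2 = [0.1, 0.3, 0.6]
child_1, child_2 = uniform_crossover(parent_1, parent_2, probability=0.5)
print(child_1, "sum before normalizing:", round(sum(child_1), 3))
print(normalized(child_1))
```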


### Mutation
This section is implemented in the ```mutate()``` method of the ```Chromosome``` class. \
This method modifies the chromosome in place and returns nothing. For each gene in the chromosome, we generate a random number between 0 and 1. If the number is less than ```MUTATION_PROBABILITY```, we replace the gene with a new random number between 0 and 1. The chromosome is normalized afterwards in ```evolve()```, so that its weights sum to 1 again.
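
A minimal standalone sketch of this step on a plain list is shown below. `mutate_in_place` is a hypothetical name, and the starting genes are made up. The second half shows why normalization is needed after mutation.

```python
import random


def mutate_in_place(genes: list[float], mutation_probability: float) -> None:
    """Replace each gene with a fresh random value with the given probability."""
    for i in range(len(genes)):
        if random.random() < mutation_probability:
            genes[i] = random.random()


genes = [0.2, 0.3, 0.5]
mutate_in_place(genes, mutation_probability=0.5)
print("after mutation:", genes, "sum =", round(sum(genes), 3))

total = sum(genes)
genes = [gene / total for gene in genes]  # restore the sum-to-1 constraint
print("after normalizing:", genes)
```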

### Generating Next Population
This section is implemented in the ```evolve()``` function. \
First, we sort the chromosomes of the current generation by fitness, from worst to best. \
Then we create the mating pool using the ```create_mating_pool()``` function. \
This function uses the ```roulette_wheel_selection()``` function to select chromosomes for the mating pool based on their fitness rank, so better-ranked chromosomes are more likely to be selected. \
Next, we apply crossover and mutation to the chromosomes in the mating pool to create the next generation. \
We also keep the best chromosomes of the current generation (the elite chromosomes) in the next generation. \
Finally, we increment the ```generation``` attribute of the population.
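
Rank-based roulette wheel selection can be sketched as follows, assuming the input is sorted from worst to best so that the item at index `i` receives `i + 1` "tickets" out of `n * (n + 1) / 2`. The function name `rank_roulette_select` and the `rng` parameter are illustrative assumptions, not the notebook's API.

```python
import random


def rank_roulette_select(ranked, rng=random):
    """Pick one item from `ranked` (sorted worst to best) with weight = rank.

    The item at index i owns i + 1 tickets, so the best item is n times
    as likely to be chosen as the worst one.
    """
    n = len(ranked)
    total_tickets = n * (n + 1) // 2
    ticket = rng.randrange(total_tickets)  # integer in [0, total_tickets - 1]
    cumulative = 0
    for index, item in enumerate(ranked):
        cumulative += index + 1
        if ticket < cumulative:
            return item
    raise AssertionError("unreachable: ticket is always below total_tickets")


ranked = ["worst", "middle", "best"]
picks = [rank_roulette_select(ranked) for _ in range(6000)]
for name in ranked:
    print(name, picks.count(name))  # counts should be roughly in a 1 : 2 : 3 ratio
```

The standard library's `random.choices` also accepts a `weights` argument and could replace the explicit loop.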

In [3]:
class StockSelector:
    stocks: list[tuple[str, float, float]]
    stocks_count: int
    population: list[Chromosome]
    population_size: int
    elite_ratio: float
    elite_count: int
    mutation_probability: float
    mutation_rate: float
    crossover_probability: float
    crossover_rate: float
    stocks_diversity_weight: float
    stock_risk_weight: float
    stock_profit_weight: float
    max_generation: int
    generation: int

    def __init__(self, population_size, elite_ratio,
                mutation_probability, mutation_rate, crossover_probability, crossover_rate,
                stocks_diversity_weight, stock_risk_weight, stock_profit_weight,
                max_generation) -> None:
        self.stocks = []
        self.stocks_count = 0
        self.population = []
        self.population_size = population_size
        self.elite_ratio = elite_ratio
        self.elite_count = int(self.population_size * self.elite_ratio)
        # keep the number of non-elite slots even, since children are created in pairs
        if (self.population_size - self.elite_count) % 2 != 0:
            self.elite_count += 1
        self.mutation_probability = mutation_probability
        self.mutation_rate = mutation_rate
        self.crossover_probability = crossover_probability
        self.crossover_rate = crossover_rate
        self.stocks_diversity_weight = stocks_diversity_weight
        self.stock_risk_weight = stock_risk_weight
        self.stock_profit_weight = stock_profit_weight
        self.max_generation = max_generation
        self.generation = 0

    def __str__(self) -> str:
        temp = ""
        temp += "stocks count: " + str(self.stocks_count)
        temp += " population size: " + str(self.population_size)
        temp += " elite ratio: " + str(self.elite_ratio)
        temp += " elite count: " + str(self.elite_count)
        temp += " mutation probability: " + str(self.mutation_probability)
        temp += " mutation rate: " + str(self.mutation_rate)
        temp += " crossover probability: " + str(self.crossover_probability)
        temp += " crossover rate: " + str(self.crossover_rate)
        temp += " stocks diversity weight: " + str(self.stocks_diversity_weight)
        temp += " stock risk weight: " + str(self.stock_risk_weight)
        temp += " stock profit weight: " + str(self.stock_profit_weight)
        temp += " max generation: " + str(self.max_generation)
        temp += " generation: " + str(self.generation)
        return temp

    def load_stocks_from_csv(self, filename: str) -> None:
        with open(filename, 'r') as file:
            next(file)
            for line in file:
                _id, name, risk, profit = line.strip().split(',')  # `_id` avoids shadowing the built-in id()
                self.stocks.append((name, float(risk), float(profit)))
        self.stocks_count = len(self.stocks)

    def generate_population(self) -> None:
        for _ in range(self.population_size):
            genes = []
            for _ in range(self.stocks_count):
                genes.append(random.random())
            new_chromosome = Chromosome(genes)
            new_chromosome.normalize()
            self.population.append(new_chromosome)
        self.generation = 1

    # Note: this fitness can be negative. That is harmless here because selection
    # is based on rank, not on the raw fitness value.
    def get_fitness(self, chromosome: Chromosome) -> float:
        fitness = 0
        for i in range(self.stocks_count):
            fitness += chromosome.genes[i] * self.stocks[i][2] * self.stock_profit_weight
            fitness -= chromosome.genes[i] * self.stocks[i][1] * self.stock_risk_weight
        fitness += chromosome.get_diversity() * self.stocks_diversity_weight
        return fitness
    
    def is_goal(self, chromosome: Chromosome) -> bool:
        risk_sum = 0
        profit_sum = 0
        for i in range(self.stocks_count):
            risk_sum += chromosome.genes[i] * self.stocks[i][1]
            profit_sum += chromosome.genes[i] * self.stocks[i][2]
        diversity = chromosome.get_diversity()
        if profit_sum >= 10 and risk_sum <= 0.6 and diversity >= 30:
            print("profit:", profit_sum)
            print("risk:", risk_sum)
            print("diversity:", diversity)
            return True
        return False

    def create_mating_pool(self, population_ranking: list[tuple[float, Chromosome]]) -> list[Chromosome]:
        mating_pool = []
        for _ in range(self.population_size - self.elite_count):
            mating_pool.append(self.roulette_wheel_selection(population_ranking))

        # now we shuffle the mating pool
        random.shuffle(mating_pool)

        return mating_pool
    
    # this is a roulette wheel selection based on fitness ranking
    def roulette_wheel_selection(self, population_ranking: list[tuple[float, Chromosome]]) -> Chromosome:
        # population_ranking is sorted from worst to best fitness, so the
        # chromosome at index i gets a weight of i + 1:
        # the first (worst) chromosome has the lowest chance of being selected
        # and the last (best) chromosome has the highest chance
        total_sum = self.population_size * (self.population_size + 1) // 2
        random_number = random.randrange(total_sum)  # integer in [0, total_sum - 1]
        iterative_sum = 0
        for i in range(self.population_size):
            iterative_sum += (i + 1)
            # strict comparison: index i owns exactly i + 1 of the total_sum slots
            # (with <= the worst chromosome would get 2 slots and the best one n - 1)
            if random_number < iterative_sum:
                return population_ranking[i][1]
        
    def evolve(self) -> object:
        # first we rank the population based on fitness
        population_ranking = []
        for chromosome in self.population:
            population_ranking.append((self.get_fitness(chromosome), chromosome))
        population_ranking.sort(key=lambda x: x[0])

        # print some info about the population
        print("generation: ", self.generation)
        print("best fitness: ", population_ranking[-1][0])
        print("worst fitness: ", population_ranking[0][0])
        print("average fitness: ", sum([x[0] for x in population_ranking]) / len(population_ranking))
        print("============================================")

        # check if we reach the goal or not
        if self.is_goal(population_ranking[-1][1]):
            return population_ranking[-1][1]

        mating_pool = self.create_mating_pool(population_ranking)
        new_population = []

        # now we add the elite chromosomes to the new population
        for i in range(self.population_size-1, self.population_size-self.elite_count-1, -1):
            new_population.append(population_ranking[i][1])

        # now we create the rest of the population by crossover and mutation
        for i in range(0, len(mating_pool) - 1, 2):
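            # crossover; without it the children are copies of the parents, so that
            # mutating them cannot alter an elite or a repeated mating-pool entry
            child1 = Chromosome(list(mating_pool[i].genes))
            child2 = Chromosome(list(mating_pool[i+1].genes))
            if random.random() < self.crossover_rate:
                child1, child2 = mating_pool[i].crossover(mating_pool[i+1], self.crossover_probability)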

            # mutation
            if random.random() < self.mutation_rate:
                child1.mutate(self.mutation_probability)
            if random.random() < self.mutation_rate:
                child2.mutate(self.mutation_probability)

            child1.normalize()
            child2.normalize()

            new_population.append(child1)
            new_population.append(child2)

        self.population = new_population
        self.generation += 1

        return None
        
    def find_best_chromosome(self) -> object:
        self.generate_population()
        while self.generation <= self.max_generation:
            best_chromosome = self.evolve()
            if best_chromosome is not None:
                print("goal reached!")
                print(best_chromosome)
                return best_chromosome
        # max_generation was reached without meeting the goal
        return None

## Running the Genetic Algorithm

### Parameters
First, we set the parameters of the genetic algorithm:
* ```POPULATION_SIZE``` is the size of the population.
* ```ELITE_RATIO``` is the ratio of elite chromosomes in the population.
* ```MUTATION_PROBABILITY``` is the per-gene threshold used in the ```mutate``` function: each gene of a mutated chromosome is replaced with this probability.
* ```MUTATION_RATE``` is the probability of applying mutation to a child chromosome.
* ```CROSSOVER_PROBABILITY``` is the per-gene threshold used in the ```crossover``` function.
* ```CROSSOVER_RATE``` is the probability of applying crossover to a pair of parents.
* ```STOCKS_DIVERSITY_WEIGHT```, ```STOCK_RISK_WEIGHT```, and ```STOCK_PROFIT_WEIGHT``` are the weights used in the fitness function.
* ```MAX_GENERATION``` is the maximum number of generations per run. If the goal is not reached by then, the current run is discarded and the algorithm starts over with a new random population.
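
Because mutation is gated twice (once per chromosome, once per gene), the chance that one particular gene of one particular child is replaced is the product of the two parameters. The short sketch below works this out for the values used in the next cell; `effective_gene_mutation_probability` is a hypothetical helper name.

```python
def effective_gene_mutation_probability(mutation_rate: float, mutation_probability: float) -> float:
    """Chance that a given gene of a given child is replaced in one generation.

    A child is mutated with probability `mutation_rate`; if it is, each of its
    genes is replaced with probability `mutation_probability`.
    """
    return mutation_rate * mutation_probability


MUTATION_RATE = 0.1
MUTATION_PROBABILITY = 0.1
per_gene = effective_gene_mutation_probability(MUTATION_RATE, MUTATION_PROBABILITY)
print(f"per-gene mutation probability: {per_gene:.4f}")  # 0.0100
```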

In [4]:
POPULATION_SIZE = 100
ELITE_RATIO = 0.1
MUTATION_PROBABILITY = 0.1
MUTATION_RATE = 0.1
CROSSOVER_PROBABILITY = 0.5
CROSSOVER_RATE = 0.5
STOCKS_DIVERSITY_WEIGHT = 0.01
STOCK_RISK_WEIGHT = 1
STOCK_PROFIT_WEIGHT = 7
MAX_GENERATION = 1000

best_chromosome = None
while best_chromosome is None:
    stock_selector = StockSelector(
        POPULATION_SIZE, ELITE_RATIO,
        MUTATION_PROBABILITY, MUTATION_RATE,
        CROSSOVER_PROBABILITY, CROSSOVER_RATE,
        STOCKS_DIVERSITY_WEIGHT, STOCK_RISK_WEIGHT, STOCK_PROFIT_WEIGHT,
        MAX_GENERATION,
    )
    stock_selector.load_stocks_from_csv("sample.csv")
    best_chromosome = stock_selector.find_best_chromosome()


generation:  1
best fitness:  9.597198381529626
worst fitness:  7.901673710168383
average fitness:  8.818394951349873
generation:  2
best fitness:  11.034572046048378
worst fitness:  6.864400001630894
average fitness:  8.987950248988206
generation:  3
best fitness:  11.034572046048382
worst fitness:  4.643917514754422
average fitness:  8.709910442826605
generation:  4
best fitness:  11.034572046048382
worst fitness:  5.874484065055135
average fitness:  9.152414288452118
generation:  5
best fitness:  12.052131006575292
worst fitness:  3.4997659179259806
average fitness:  8.842566728135667
generation:  6
best fitness:  12.562715153554642
worst fitness:  4.229665892459277
average fitness:  9.659892376625994
generation:  7
best fitness:  13.784790489460267
worst fitness:  4.656972980213045
average fitness:  9.77267457765083
generation:  8
best fitness:  15.705160034309943
worst fitness:  4.489849392111508
average fitness:  10.361910503978706
generation:  9
best fitness:  15.95288183579235


Finally, we store the coefficients in the ```result.csv``` file.

In [5]:
csv_input = pd.read_csv("sample.csv")
csv_input['coefficients'] = best_chromosome.genes
csv_input.to_csv("result.csv", index=False)

## Questions


### 1. How can very large or very small population sizes affect the performance of the algorithm?
If the population size is very large, each generation requires many more fitness evaluations, so the algorithm takes longer to run and uses more memory. If the population size is very small, the population does not have enough diversity, so the algorithm tends to converge prematurely and may not find the optimal solution.

### 2. What happens if the population size increases in each generation?
If the population size increases in each generation, every generation costs more time and memory than the previous one, and the algorithm takes longer to converge. The larger population increases diversity, which can help the algorithm find the optimal solution. However, it also keeps chromosomes with lower fitness in the population, which can cause problems in convergence.

### 3. What is the effect of crossover and mutation? Is it possible to use only one of them?
Crossover creates new chromosomes by combining existing chromosomes in the population. Mutation randomly changes individual genes of a chromosome. \
Crossover hopes to reach better chromosomes by combining two good chromosomes, while mutation introduces new gene values and helps the search escape a local extremum. \
It is possible to run the algorithm with only one of them, but the search usually suffers: crossover alone mostly recombines what is already in the population, and mutation alone reduces the algorithm to a random search around the selected chromosomes. \
As a common rule of thumb, the crossover probability is at least 80% and the mutation probability is at most 5%.


### 4. How to accelerate the algorithm?
The algorithm parameters are the most important factor in accelerating the algorithm, and they should be set carefully. \
The fitness function and the crossover and mutation functions can also be optimized to accelerate the algorithm.
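
As one possible fitness optimization (a sketch, not something the notebook does), the per-stock term $w_{profit} \times profit_{i} - w_{risk} \times risk_{i}$ does not depend on the chromosome, so it can be computed once and reused for every fitness evaluation. The function names and data below are made up, and the diversity term is left out for brevity.

```python
import math


def naive_fitness(weights, risks, profits, w_profit, w_risk):
    """Profit/risk part of the fitness, computed term by term."""
    total = 0.0
    for weight, risk, profit in zip(weights, risks, profits):
        total += w_profit * weight * profit
        total -= w_risk * weight * risk
    return total


def precompute_scores(risks, profits, w_profit, w_risk):
    """Per-stock score that is the same for every chromosome."""
    return [w_profit * profit - w_risk * risk for risk, profit in zip(risks, profits)]


def fast_fitness(weights, scores):
    """Same quantity as naive_fitness, as a single weighted sum."""
    return sum(weight * score for weight, score in zip(weights, scores))


# Made-up data for three stocks.
risks = [0.1, 0.5, 0.9]
profits = [1.0, 2.0, 3.0]
weights = [0.2, 0.3, 0.5]

scores = precompute_scores(risks, profits, w_profit=7, w_risk=1)  # computed once
print(math.isclose(naive_fitness(weights, risks, profits, 7, 1), fast_fitness(weights, scores)))
```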

### 5. How to stop the algorithm if it is not converging?
A common problem in genetic algorithms is that they may get stuck at a local maximum instead of reaching the global maximum. Mutation is a good way to mitigate this problem. \
We can also limit the number of generations in order to stop the algorithm if it is not converging. In this case, we may use multi-start (restarting from a new random population) to increase the probability of finding the global maximum.
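
Besides a fixed generation limit, one possible complementary check (an assumption of this sketch, not part of the notebook) is to stop a run when the best fitness has not improved over the last few generations. `has_stagnated`, `patience`, and `min_delta` are hypothetical names.

```python
def has_stagnated(best_fitness_history: list[float], patience: int, min_delta: float = 1e-6) -> bool:
    """Return True if the last `patience` generations brought no real improvement.

    `best_fitness_history` holds the best fitness of each generation, in order.
    """
    if len(best_fitness_history) <= patience:
        return False  # not enough history to judge yet
    recent_best = max(best_fitness_history[-patience:])
    earlier_best = max(best_fitness_history[:-patience])
    return recent_best - earlier_best < min_delta


print(has_stagnated([1.0, 2.0, 3.0, 3.0, 3.0, 3.0], patience=3))  # no gain in the last 3 generations
print(has_stagnated([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], patience=3))  # still improving
```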

### 6. How to stop the algorithm if there exists no solution?
We can limit the number of times the algorithm is restarted (multi-start) when it fails to find a solution within a given number of generations. If every restart fails, we can assume that there is probably no solution.
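
A bounded multi-start loop could look like the sketch below. It assumes a callable `run_once` that performs one full run and returns a solution or `None`, similar in spirit to ```find_best_chromosome()```; `multi_start`, `run_once`, and `max_restarts` are hypothetical names.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def multi_start(run_once: Callable[[], Optional[T]], max_restarts: int) -> Optional[T]:
    """Run the search up to `max_restarts` times and return the first solution.

    Returns None if every run fails, which suggests (but does not prove)
    that no solution exists.
    """
    for _ in range(max_restarts):
        result = run_once()
        if result is not None:
            return result
    return None


# Toy stand-in for a GA run: fails twice, then "finds" a solution.
attempts = {"count": 0}


def toy_run() -> Optional[str]:
    attempts["count"] += 1
    return "solution" if attempts["count"] >= 3 else None


print(multi_start(toy_run, max_restarts=5), "after", attempts["count"], "runs")
```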