# Maximise the Ones

by Maxim Shinskiy 1804336

In [17]:
"""
This algorithm is intended to solve the
'Maximise the Ones' problem
"""
import random

In [18]:
# Define representation: bit string
n_bits = 100  # bit length
n_pop = 1000  # population size
n_gen = 30   # generations
p_xo = 0.8  # crossover rate
p_mut = 0.02  # per-bit mutation rate
n_sel = 2  # tournament size

pop = []

In [19]:
# Create initial population
def create_pop():
    for i in range(n_pop):
        pop.append(''.join(random.choice('01') for j in range(n_bits)))

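# Binary --> Decimal (bit 0 is the least significant; not used by the GA)
def dec(chromosome):
    value = 0  # renamed so it does not shadow the function name
    for i in range(len(chromosome)):
        if chromosome[i] == '1':
            value += 2 ** i
    return value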


# Fitness function
def fitness(chromosome):
    return str(chromosome).count('1')


# Print population
def show_pop():
    f_max = -100000
    f_avg = 0
    f_min = +100000
    for p in pop:
        f = fitness(p)
        if f > f_max:
            f_max = f
        if f < f_min:
            f_min = f
        # print(p + ' ' + str(f))
        f_avg += f
    print("f max:", f_max, " f min:", f_min, " f avg:", int(f_avg/n_pop))
    print('---------------------')


# Do tournament selection: sample n_sel individuals (with replacement) and
# return the index of the fittest one, or of the least fit one if inverse is True
def tournament(inverse):
    index = 0
    f_max = float('-inf')
    f_min = float('inf')
    for counter in range(n_sel):
        index_i = random.randint(0, len(pop) - 1)
        f_i = fitness(pop[index_i])
        if not inverse:
            if f_i > f_max:
                f_max = f_i
                index = index_i
        else:
            if f_i < f_min:
                f_min = f_i
                index = index_i
    return index

In [20]:
def run():
    create_pop()
    
    print('Initial population')
    show_pop()

    # Main loop
    for generation in range(n_gen):
        for individual in range(n_pop):
            if random.random() < p_xo:
                # crossover
                index_individual_1 = tournament(False)
                index_individual_2 = tournament(False)
                cut = random.randint(1, n_bits - 1)  # random cut position
                offspring = pop[index_individual_1][0:cut] + pop[index_individual_2][cut:]

            else:
                # cloning
                offspring = pop[tournament(False)]

            # mutation
            mutation = ''
            for index in range(len(offspring)):
                if random.random() < p_mut:
                    if offspring[index] == '0':
                        mutation += '1'
                    else:
                        mutation += '0'
                else:
                    mutation += offspring[index]

            # steady-state GA: the offspring replaces the loser of an inverse tournament
            pop[tournament(True)] = mutation

        print('Generation ' + str(generation + 1))
        show_pop()
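The `run()` cell above reads and mutates module-level globals, which is why every run below has to reset `pop = []` first (otherwise `create_pop()` would append to the old population). As a hedged sketch, not the author's code, the same steady-state GA can be written as a function that takes its parameters explicitly, uses its own seeded `random.Random`, and returns the per-generation statistics together with the first generation in which the all-ones string appears. The name `run_ga` and its `seed` argument are hypothetical.

```python
import random


def run_ga(n_bits=100, n_pop=1000, n_gen=30, p_xo=0.8, p_mut=0.02,
           n_sel=2, seed=None):
    """Steady-state GA for Maximise the Ones; returns (history, first_solution)."""
    rng = random.Random(seed)  # private RNG so a run can be reproduced
    pop = [''.join(rng.choice('01') for _ in range(n_bits))
           for _ in range(n_pop)]

    def fitness(chromosome):
        return chromosome.count('1')

    def tournament(inverse):
        # Sample n_sel indices with replacement; keep the fittest (or least fit).
        candidates = [rng.randrange(len(pop)) for _ in range(n_sel)]
        pick = min if inverse else max
        return pick(candidates, key=lambda i: fitness(pop[i]))

    history = []           # (f_max, f_min, f_avg) after each generation
    first_solution = None  # first generation containing the all-ones string
    for generation in range(1, n_gen + 1):
        for _ in range(n_pop):
            if rng.random() < p_xo:
                # one-point crossover between two tournament winners
                parent_1 = pop[tournament(False)]
                parent_2 = pop[tournament(False)]
                cut = rng.randint(1, n_bits - 1)
                offspring = parent_1[:cut] + parent_2[cut:]
            else:
                # cloning
                offspring = pop[tournament(False)]
            # flip each bit independently with probability p_mut
            mutated = ''.join(
                ('1' if bit == '0' else '0') if rng.random() < p_mut else bit
                for bit in offspring)
            # the offspring replaces the loser of an inverse tournament
            pop[tournament(True)] = mutated
        scores = [fitness(p) for p in pop]
        history.append((max(scores), min(scores), sum(scores) / n_pop))
        if first_solution is None and max(scores) == n_bits:
            first_solution = generation
    return history, first_solution
```

For instance, `run_ga(n_pop=100, n_gen=200, p_xo=0.6, p_mut=0.02, n_sel=10, seed=1)` mirrors the Run 4 settings; because the RNG is seeded, calling it again reproduces the same history, and no global state has to be reset between runs.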

### Run 1

For:
* population size = 1000
* number of generations = 30
* crossover rate = 0.8
* mutation rate = 0.02
* tournament size = 2

In [21]:
run()

Initial population
f max: 66  f min: 35  f avg: 49
---------------------
Generation 1
f max: 68  f min: 37  f avg: 55
---------------------
Generation 2
f max: 75  f min: 44  f avg: 60
---------------------
Generation 3
f max: 77  f min: 48  f avg: 64
---------------------
Generation 4
f max: 80  f min: 51  f avg: 67
---------------------
Generation 5
f max: 83  f min: 57  f avg: 71
---------------------
Generation 6
f max: 85  f min: 61  f avg: 74
---------------------
Generation 7
f max: 85  f min: 64  f avg: 76
---------------------
Generation 8
f max: 89  f min: 68  f avg: 79
---------------------
Generation 9
f max: 90  f min: 71  f avg: 81
---------------------
Generation 10
f max: 90  f min: 71  f avg: 83
---------------------
Generation 11
f max: 92  f min: 76  f avg: 84
---------------------
Generation 12
f max: 93  f min: 78  f avg: 86
---------------------
Generation 13
f max: 95  f min: 77  f avg: 87
---------------------
Generation 14
f max: 95  f min: 79  f avg: 88
------

The solution was found in generation 23.
From time to time the minimum fitness drops because of mutation.

The individual bit strings are not printed, because the population is large and the summary statistics give a clearer picture.

### Run 2

In [22]:
n_bits = 100  # bit length
n_pop = 100  # population size
n_gen = 100   # generations
p_xo = 0.8  # crossover rate
p_mut = 0.05  # mutation rate
n_sel = 2

pop = []

In [23]:
run()

Initial population
f max: 63  f min: 39  f avg: 50
---------------------
Generation 1
f max: 66  f min: 39  f avg: 55
---------------------
Generation 2
f max: 67  f min: 39  f avg: 58
---------------------
Generation 3
f max: 72  f min: 52  f avg: 62
---------------------
Generation 4
f max: 74  f min: 57  f avg: 65
---------------------
Generation 5
f max: 76  f min: 57  f avg: 67
---------------------
Generation 6
f max: 78  f min: 57  f avg: 68
---------------------
Generation 7
f max: 81  f min: 61  f avg: 70
---------------------
Generation 8
f max: 81  f min: 63  f avg: 72
---------------------
Generation 9
f max: 81  f min: 64  f avg: 73
---------------------
Generation 10
f max: 81  f min: 68  f avg: 75
---------------------
Generation 11
f max: 81  f min: 63  f avg: 75
---------------------
Generation 12
f max: 83  f min: 63  f avg: 76
---------------------
Generation 13
f max: 83  f min: 66  f avg: 77
---------------------
Generation 14
f max: 83  f min: 68  f avg: 77
------

Starting from a typical random population (average fitness 50, the expected number of ones in 100 random bits), the GA did not run into a dead end and, with the help of mutation, kept searching for the optimal solution.

The solution was not found, but with more generations there is a chance that mutation flips a 0 bit in the fittest bit string, which would increase its fitness and allow this improvement to spread across the population.
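The chance that mutation alone improves a given string can be computed exactly, assuming independent per-bit flips as in the mutation loop of `run()`: the number of 0s flipped to 1 and the number of 1s flipped to 0 are then independent binomial counts, and fitness rises only when the first exceeds the second. A minimal sketch with hypothetical helper names `binom_pmf` and `p_improve`:

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def p_improve(ones, n_bits, p_mut):
    """Probability that per-bit mutation leaves the string with more ones."""
    zeros = n_bits - ones
    total = 0.0
    for z in range(zeros + 1):  # z of the 0 bits flip to 1
        pz = binom_pmf(z, zeros, p_mut)
        # fitness rises only if fewer than z of the 1 bits flip to 0
        for o in range(min(z, ones + 1)):
            total += pz * binom_pmf(o, ones, p_mut)
    return total


print(p_improve(83, 100, 0.05))  # Run 2's best string at generation 14
```

The probability shrinks as the string approaches 100 ones, which is one way to see why the last few ones are the slowest to find.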

Compared with the GA running a population of 1000, this run performed worse.

Now let's change the probabilities. The population parameters stay the same, because parameters suitable for finding the solution have already been found.

### Run 3

In [24]:
n_pop = 100  # population size
n_gen = 100   # generations
p_xo = 0.6  # crossover rate
p_mut = 0.1  # mutation rate
n_sel = 6

pop = []

I expect that with a lower crossover rate the population will progress more slowly, because more individuals (40% rather than 20%) will be cloned and passed from generation to generation unchanged (apart from mutation). However, the higher mutation rate will broaden the search, which may produce some individuals with higher fitness but also some with lower fitness, creating greater variance in population fitness. A tournament size of 6 will strongly favour the very best individuals when parents are selected.

Overall I expect this run to be more successful than the previous one.

In [25]:
run()

Initial population
f max: 64  f min: 39  f avg: 51
---------------------
Generation 1
f max: 69  f min: 53  f avg: 59
---------------------
Generation 2
f max: 72  f min: 56  f avg: 63
---------------------
Generation 3
f max: 74  f min: 56  f avg: 66
---------------------
Generation 4
f max: 74  f min: 60  f avg: 68
---------------------
Generation 5
f max: 75  f min: 62  f avg: 69
---------------------
Generation 6
f max: 75  f min: 65  f avg: 70
---------------------
Generation 7
f max: 77  f min: 62  f avg: 71
---------------------
Generation 8
f max: 80  f min: 59  f avg: 72
---------------------
Generation 9
f max: 80  f min: 64  f avg: 73
---------------------
Generation 10
f max: 80  f min: 66  f avg: 74
---------------------
Generation 11
f max: 80  f min: 67  f avg: 75
---------------------
Generation 12
f max: 80  f min: 69  f avg: 75
---------------------
Generation 13
f max: 80  f min: 64  f avg: 75
---------------------
Generation 14
f max: 81  f min: 66  f avg: 76
------

The greater the fitness, the greater the chance that a mutation hits a 1 bit and flips it to 0, making the fitness worse. This may have stalled the GA's progress.

With a mutation rate of 0.1, on average 10 bits in a 100-bit string are flipped. For an 'average' bit string with 86 ones, about 9 of those flips (8.6 on average) turn a 1 into a 0 and only about 1 (1.4 on average) turns a 0 into a 1. That stops the progress.
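The arithmetic above generalises: with per-bit mutation rate `p_mut`, the expected change in the number of ones is `p_mut * (n_bits - 2 * ones)`, so mutation on its own pulls every string towards 50 ones and selection has to outweigh that pull. A short sketch (the helper name `expected_net_change` is hypothetical) compares the two mutation rates used so far:

```python
def expected_net_change(ones, n_bits, p_mut):
    """Expected change in the number of ones after one round of per-bit mutation."""
    zeros = n_bits - ones
    gained = p_mut * zeros  # expected 0 -> 1 flips
    lost = p_mut * ones     # expected 1 -> 0 flips
    return gained - lost


for rate in (0.1, 0.02):
    print(rate, expected_net_change(86, 100, rate))
```

Because the expression is linear in `p_mut`, the downward pull on an 86-ones string is five times weaker at 0.02 than at 0.1.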

The solution was not found.

Next, lower the mutation rate.

### Run 4

In [26]:
n_pop = 100  # population size
n_gen = 200   # generations
p_xo = 0.6  # crossover rate
p_mut = 0.02  # mutation rate
n_sel = 10

pop = []

n_gen is increased to 200 to see further progress, and n_sel to 10 as an experiment.

In [27]:
run()

Initial population
f max: 62  f min: 38  f avg: 49
---------------------
Generation 1
f max: 72  f min: 54  f avg: 63
---------------------
Generation 2
f max: 78  f min: 64  f avg: 71
---------------------
Generation 3
f max: 83  f min: 72  f avg: 77
---------------------
Generation 4
f max: 86  f min: 78  f avg: 82
---------------------
Generation 5
f max: 87  f min: 82  f avg: 84
---------------------
Generation 6
f max: 91  f min: 84  f avg: 86
---------------------
Generation 7
f max: 93  f min: 83  f avg: 89
---------------------
Generation 8
f max: 94  f min: 87  f avg: 91
---------------------
Generation 9
f max: 95  f min: 89  f avg: 92
---------------------
Generation 10
f max: 95  f min: 92  f avg: 93
---------------------
Generation 11
f max: 96  f min: 89  f avg: 94
---------------------
Generation 12
f max: 97  f min: 90  f avg: 94
---------------------
Generation 13
f max: 98  f min: 92  f avg: 95
---------------------
Generation 14
f max: 98  f min: 94  f avg: 96
------

With these parameters the GA performed very well. It found the solution early and spread it across the population, reaching an average fitness of 99 by generation 200.

Compared with the first run (population size 1000), this run produced fewer offspring in total (200 × 100 = 20,000 versus 30 × 1000 = 30,000), so it probably required less computation, even though each of its tournaments evaluates 10 individuals rather than 2.
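Which run is cheaper depends on the cost measure. A rough sketch (the helper `expected_cost` is hypothetical) counts the offspring created and the expected number of `fitness()` calls made inside `tournament()` for one call of `run()`, ignoring the extra calls in `show_pop()`:

```python
def expected_cost(n_pop, n_gen, p_xo, n_sel):
    """Offspring created and expected fitness() calls inside tournament()."""
    offspring = n_pop * n_gen
    # crossover needs two selection tournaments, cloning needs one,
    # and every offspring also triggers one inverse (replacement) tournament
    tournaments = p_xo * 2 + (1 - p_xo) * 1 + 1
    evaluations = offspring * tournaments * n_sel
    return offspring, evaluations


run_1 = expected_cost(n_pop=1000, n_gen=30, p_xo=0.8, n_sel=2)
run_4 = expected_cost(n_pop=100, n_gen=200, p_xo=0.6, n_sel=10)
print(run_1, run_4)
```

By offspring count Run 4 is cheaper, but by `fitness()` calls it is more expensive, because each tournament samples five times as many individuals. Since `fitness()` is a single `str.count`, the per-bit mutation loop may well dominate wall-clock time, so measuring both runs directly would settle the question.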

### Run 5

In [28]:
n_pop = 250  # population size
n_gen = 30   # generations
p_xo = 0.8  # crossover rate
p_mut = 0.02  # mutation rate
n_sel = 10

pop = []

p_xo is raised back to 0.8 and n_pop to 250, while n_sel is kept at 10 and n_gen is reduced to 30.

In [29]:
run()

Initial population
f max: 66  f min: 38  f avg: 50
---------------------
Generation 1
f max: 76  f min: 50  f avg: 65
---------------------
Generation 2
f max: 83  f min: 69  f avg: 75
---------------------
Generation 3
f max: 85  f min: 76  f avg: 80
---------------------
Generation 4
f max: 88  f min: 78  f avg: 83
---------------------
Generation 5
f max: 92  f min: 83  f avg: 86
---------------------
Generation 6
f max: 94  f min: 87  f avg: 89
---------------------
Generation 7
f max: 96  f min: 89  f avg: 91
---------------------
Generation 8
f max: 98  f min: 92  f avg: 93
---------------------
Generation 9
f max: 99  f min: 92  f avg: 95
---------------------
Generation 10
f max: 99  f min: 94  f avg: 96
---------------------
Generation 11
f max: 100  f min: 94  f avg: 97
---------------------
Generation 12
f max: 100  f min: 94  f avg: 98
---------------------
Generation 13
f max: 100  f min: 94  f avg: 98
---------------------
Generation 14
f max: 100  f min: 94  f avg: 99
--

I can now say with some confidence that a larger tournament size helps high fitness spread across the population.

However, in this problem the fitness is straightforward: the more ones, the higher the fitness. In problems such as Knapsack, where a bit string encodes several interacting choices, the fitness that spreads may belong to a local optimum rather than the global maximum, and the GA might quickly reach a point where the whole population holds a solution that is not optimal.
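The faster spread with a larger tournament can be quantified. Because `tournament()` samples `n_sel` indices with replacement and returns the fittest, a string that is strictly the fittest in the population, held by `copies` of the `n_pop` individuals, wins a tournament whenever at least one copy is sampled, i.e. with probability `1 - (1 - copies / n_pop) ** n_sel`. A minimal sketch (the helper name `p_pick_best` is hypothetical):

```python
def p_pick_best(copies, n_pop, n_sel):
    """Chance that one tournament (sampling with replacement) picks the best string.

    Assumes the best string is strictly fitter than every other individual.
    """
    return 1 - (1 - copies / n_pop) ** n_sel


for n_sel in (2, 10):
    print(n_sel, p_pick_best(copies=1, n_pop=250, n_sel=n_sel))
```

With a single copy among 250 individuals, moving from `n_sel = 2` to `n_sel = 10` raises this probability almost fivefold, which is consistent with the much faster takeover seen in Run 5 compared with Run 6.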



### Run 6

In [31]:
n_pop = 250  # population size
n_gen = 100   # generations
p_xo = 0.8  # crossover rate
p_mut = 0.02  # mutation rate
n_sel = 2

pop = []

n_sel is reduced to 2, while n_pop stays at 250 and p_xo at 0.8; n_gen is increased to 100.

In [32]:
run()

Initial population
f max: 62  f min: 30  f avg: 50
---------------------
Generation 1
f max: 68  f min: 43  f avg: 55
---------------------
Generation 2
f max: 71  f min: 45  f avg: 59
---------------------
Generation 3
f max: 72  f min: 50  f avg: 63
---------------------
Generation 4
f max: 74  f min: 54  f avg: 66
---------------------
Generation 5
f max: 81  f min: 60  f avg: 69
---------------------
Generation 6
f max: 81  f min: 63  f avg: 71
---------------------
Generation 7
f max: 81  f min: 64  f avg: 73
---------------------
Generation 8
f max: 83  f min: 68  f avg: 75
---------------------
Generation 9
f max: 85  f min: 69  f avg: 77
---------------------
Generation 10
f max: 86  f min: 72  f avg: 79
---------------------
Generation 11
f max: 88  f min: 72  f avg: 80
---------------------
Generation 12
f max: 88  f min: 73  f avg: 81
---------------------
Generation 13
f max: 89  f min: 75  f avg: 83
---------------------
Generation 14
f max: 90  f min: 77  f avg: 84
------

This time the optimal solution did not spread as quickly: at generation 100 the minimum fitness is still 91, a value that had persisted for quite a long time.


#### Conclusion 

A tournament size that is too large spreads the current best solution very quickly, which risks premature convergence on harder problems, so I'll stick with 3-4 for the next experiments.

The GA is computationally cheap on this problem, which allows large populations, so I'll use a population of 1000 individuals over 30-50 generations.

A crossover rate of 0.6-0.7 slows progress down slightly and gives mutation more room to explore for the solution.

Keep the mutation rate at 0.01-0.02.