
Notebook originally taken from (but modified here):  
https://github.com/daydrill/ga_pycon_2016_apac

Slides taken from:  
https://github.com/pythonkr/pyconapac-2016-files/raw/master/20160814-102-16-SongChisung.pdf




## Brief Introduction

### Genetics and genetic programming

![song01](images/song01.png)

Population, chromosomes and genes:  
![song01](images/song02.png)

Basic genetic programming operations:  
![song01](images/song03.png)

Selection:  
![song01](images/song04.png)

Crossover (gene sharing):  
![song01](images/song05.png)

Mutation (gene modification):  
![song01](images/song06.png)

GA/GP workflow:  
![song01](images/song07.png)


### DEAP Architecture


![song01](images/song08.png)

![song01](images/song09.png)


<hr style="border-color:#ff9900"> 

## Practice 1 : 

<hr style="border-color:#ff9900"> 

This example uses DEAP to create a list of numbers that add up to some value.

## Setting Things Up


Assuming that you alreadyhave DEAP type the following on your shell:

    conda install -c conda-forge folium 



In [72]:
import random
import numpy as np
from deap import algorithms, base, creator, tools

doVerbose = True
listLength = 5
sumUpTo = 100

In [73]:
if doVerbose:
    # manually compile list:
    # Here are 5 numbers when added, we get a hundred.
    list1 = [100, 0, 0, 0, 0]
    list2 = [20, 21, 19, 15, 25]
    (sum(list1), sum(list2))

### Creator
- Meta-factory allowing the run-time creation of classes via both inheritance and composition.
- Attributes, both data and functions, can be dynamically added to existing classes in order to create user-specific new types.
- By using this, the creation of individuals and populations from any data structure ( list, set, dictionary, tree, etc… )

In [76]:
# Creates a new class named "FitnessMin" inheriting from "base.Fitness" with attribute "weights=(-1.0,)"
# The fitness is a measure of quality of a solution.
creator.create("FitnessMin", base.Fitness, weights=(-1.0,)) # -1 -> minimum problem
creator.create("Individual", list, fitness=creator.FitnessMin)



### Toolbox
- Container for the tools (operators) that the user wants to use.
- Manually populated by the user with selected tools.

In [77]:
toolbox = base.Toolbox()

# Attribute generator : get random number between 0~100
toolbox.register("attr_bool", random.randint, 0, sumUpTo)

# Structure initializers
toolbox.register("individual", tools.initRepeat, creator.Individual, toolbox.attr_bool, listLength)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)


## The Evaluation Function

objective function  built for our problem: want to minimise the sum distance from `sumUpTo`

In [78]:
def sum_error(individual):
    error = abs(sumUpTo - sum(individual))
    # note the ',' and end of line, makes a tuple:
    return error,

In [79]:
if doVerbose:
    ind = toolbox.individual()
    print('Demonstrate the attribute, individual and population:')
    print(f'Attribute demo: {toolbox.attr_bool()}') 
    print(f'Individual demo: {ind}')  
    print(f'Individual sum: {sum(ind)}')  # demo test individual sum
    print(type(sum_error(ind))) # error
    print(sum_error(ind)) # error
    print(toolbox.population(n=10)) # demo pop of 10 indv
    

Demonstrate the attribute, individual and population:
Attribute demo: 74
Individual demo: [78, 31, 98, 15, 19]
Individual sum: 241
<class 'tuple'>
(141,)
[[57, 67, 36, 99, 16], [72, 15, 88, 91, 70], [4, 83, 74, 84, 46], [51, 38, 65, 10, 65], [76, 29, 37, 73, 13], [76, 68, 13, 1, 77], [32, 49, 71, 72, 7], [86, 19, 96, 25, 81], [93, 65, 74, 77, 93], [50, 89, 20, 83, 32]]


## The Genetic Operators
- Operators are just like initializers, except that some are already implemented in the [tools](http://deap.readthedocs.io/en/master/api/tools.html#module-deap.tools) module. 
- __Once you’ve chosen the perfect ones, simply register them in the toolbox.__

In [80]:
toolbox.register("evaluate", sum_error)

In [81]:
toolbox.register("mate", tools.cxTwoPoint)
toolbox.register("mutate", tools.mutUniformInt, low=0, up=100, indpb=0.2) # Independent probability  : for each attribute to be mutated.# low~up rondom int
toolbox.register("select", tools.selTournament, tournsize=3)

## Evolving the Population

### Creating the Population

In [82]:
pop = toolbox.population(n=100)
pop[:10] # only print the first few

[[74, 92, 77, 93, 26],
 [43, 19, 4, 6, 4],
 [50, 81, 53, 74, 49],
 [50, 68, 35, 95, 36],
 [57, 66, 43, 15, 69],
 [50, 46, 2, 100, 23],
 [56, 56, 4, 38, 66],
 [62, 44, 9, 19, 45],
 [88, 20, 14, 55, 45],
 [71, 52, 40, 0, 72]]

### The Appeal of Evolution

In [83]:
# Use of a HallOfFame in order to keep track of the best individual to appear in the evolution 
# (it keeps it even in the case it extinguishes)
hof = tools.HallOfFame(1)

In [84]:
stats = tools.Statistics(lambda ind: ind.fitness.values)
stats.register("avg", np.mean)
stats.register("std", np.std)
stats.register("min", np.min)
stats.register("max", np.max)

In [90]:
# algorithms : contains useful implements some basic GA
pop, log = algorithms.eaSimple(pop, toolbox, cxpb=0.5, mutpb=0.2, ngen=40, stats=stats, halloffame=hof, verbose=True)
print(log)
print(pop)

gen	nevals	avg 	std    	min	max
0  	0     	9.98	27.1116	0  	155
1  	66    	8.72	23.2379	0  	119
2  	65    	10.96	24.835 	0  	125
3  	64    	6.69 	22.9158	0  	157
4  	68    	4.54 	14.6516	0  	83 
5  	61    	7.07 	19.1014	0  	84 
6  	59    	7.22 	21.706 	0  	108
7  	63    	5.12 	16.2649	0  	79 
8  	59    	5.21 	18.449 	0  	107
9  	62    	4.2  	15.8865	0  	108
10 	57    	4.51 	13.8978	0  	82 
11 	58    	2.7  	9.81071	0  	65 
12 	64    	6.71 	21.3646	0  	127
13 	61    	3.75 	12.2584	0  	66 
14 	60    	9.8  	27.5024	0  	184
15 	51    	5.56 	18.4766	0  	105
16 	63    	7.61 	22.0031	0  	119
17 	52    	8.14 	24.1764	0  	156
18 	55    	7.1  	27.0498	0  	174
19 	54    	5.39 	20.2711	0  	137
20 	41    	7.46 	23.0423	0  	136
21 	69    	8.84 	26.8651	0  	143
22 	60    	3.63 	12.9658	0  	73 
23 	63    	4.23 	13.9985	0  	78 
24 	57    	6.46 	22.381 	0  	149
25 	47    	4.17 	14.3137	0  	79 
26 	60    	8.13 	25.5067	0  	125
27 	62    	4.01 	14.0964	0  	76 
28 	54    	4.26 	14.0197	0  	74 
29 	65    	5.

In [93]:
[sum(ind) for ind in pop[:10]] # first ten

[100, 100, 100, 158, 100, 100, 100, 100, 100, 100]

In [94]:
tools.selBest(pop, k=5)

[[24, 30, 17, 18, 11],
 [24, 30, 17, 18, 11],
 [24, 30, 17, 18, 11],
 [24, 30, 17, 18, 11],
 [24, 30, 17, 18, 11]]

<hr style="border-color:#ff9900"> 

## Practice 2 : Travelling salesman

Find the Best Tour Route using NSGA-ii

<hr style="border-color:#ff9900"> 


# Preparing Data for GA

- Data from [Seoul data portal](http://data.seoul.go.kr/openinf/sheetview.jsp?infId=OA-12929&tMenu=11)

## Load Data

In [95]:
import pandas as pd
import numpy as np

from math import radians, cos, sin, asin, sqrt 

# Geo data viz
import folium

Read the list of sites

In [97]:
df_spot = pd.read_csv("data/seoul_street.csv")
df_spot = df_spot.drop(['street_ko','address','dong_ko'],axis=1)
df_spot.head()

Unnamed: 0,street_en,dong_en,lng,lat,score
0,fishing-tackle street,Hoehyeon-dong,126.977432,37.55805,80
1,Wangsimni Gopchang Alley,Hwanghak-dong,127.020434,37.56861,68
2,The north side of Ring-Road,Gwanghui-dong,126.995925,37.56135,55
3,Jangchungdan-gil,Pil-dong,127.000575,37.561386,87
4,Chodong-gil,Euljiro-dong,126.997646,37.56641,4


## Visualise the Data

In [98]:
# show the map
m = folium.Map(location=[np.mean(df_spot.lat), np.mean(df_spot.lng)], zoom_start=11, tiles='Stamen Toner')
# and the sites to visit
for index, row in df_spot.iterrows():
    popup_txt = "%s // Score : %s " % (row.street_en, row.score)
    folium.Marker([row.lat, row.lng], popup=popup_txt).add_to(m)
m

# Applying Genetic Algorithm

## Setting Things Up

In [99]:
import random
import numpy as np
from deap import algorithms, base, creator, tools

numSpots = 5

### Creator

In [100]:
creator.create("FitnessMulti", base.Fitness, weights=(-1.0,1.0))
creator.create("Individual", list, fitness=creator.FitnessMulti)



### Toolbox

In [101]:
toolbox = base.Toolbox()

# Attribute generator 
toolbox.register("index", np.random.choice, len(df_spot), numSpots, replace=False) # choose numSpots spots

# Structure initializers
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.index)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)

## Select the sites

In [110]:
def create_tour(individual):
    return [(df_spot.iloc[i].lat, df_spot.iloc[i].lng) for i in individual]

# sample individual -> just index of the data
ind = toolbox.individual()

# convert index to geo data for get distance.
df_spot.iloc[ind]
tour = create_tour(ind)


## The Evaluation Function
- objective function 1. __total distance -> minimun__
- objective function 2. __total score -> maximum__

In [115]:
## Function for get a total distance of tour case

def distance(spot1, spot2):
    # convert decimal degrees to radians 
    lng1, lat1, lng2, lat2 = map(radians, [spot1[0], spot1[1], spot2[0], spot2[1]])
    
    RADIUS = 6371 # FAA approved globe radius in km
    
    dlng = lng2-lng1
    dlat = lat2-lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlng/2)**2
    c = 2 * asin(sqrt(a)) 
    dist = RADIUS * c
    return dist


def total_distance(tour):
    tour_sum = sum(distance(tour[i], tour[i+1]) for i in range(len(tour)-1))
    return tour_sum


def total_score(individual):
    return sum([df_spot.iloc[i]['score'] for i in individual]) 


def eval_func(individual):
    
    # 1 total distance -> minimun
    t_dist = total_distance(create_tour(individual))
    
    # 2 total score -> maximum
    t_score = total_score(individual)
    
    # 3 penalty
    penalty = len(individual) - len(set(individual))
    t_dist += penalty*1000000
    t_score -= penalty*1000000
    
    return t_dist, t_score

In [117]:
if doVerbose:
    print(f'Selected site indexes: {ind}')
    print(f'Selected site coordinates: {tour}')
    # total distance of sample individual
    print(f'Total distance: {total_distance(tour)}')
    print(f'Total score: {total_score(ind)}')
    print(f'Fitness score: {eval_func(ind) }')



Selected site indexes: [49, 37, 61, 4, 36]
Selected site coordinates: [(37.5546022525, 126.95608923370001), (37.6168942335, 126.99739487219999), (37.4829408135, 126.948985568), (37.5664098491, 126.9976456492), (37.485404346799996, 127.1221039568)]
Total distance: 39.292110678527436
Total score: 146
Fitness score: (39.292110678527436, 146)


In [24]:
def plot_tour(ind):
    tour = create_tour(ind)
    m = folium.Map(location=[np.mean(df_spot.lat), np.mean(df_spot.lng)], zoom_start=13)
    path=folium.PolyLine(locations=tour,weight=5)
    m.add_child(path)
    for i,loc in enumerate(ind):
        popup_txt = "%s // Score : %s " % (df_spot.iloc[loc].street_en, df_spot.iloc[loc].score)
        folium.Marker(tour[i], popup=popup_txt).add_to(m)
    return m

In [114]:
# vizualize the tour initially before optimisation
plot_tour(ind)

## The Genetic Operators

In [32]:
toolbox.register("evaluate", eval_func)

In [118]:
toolbox.register("select", tools.selNSGA2)
toolbox.register("mate", tools.cxTwoPoint)
# tools.mutShuffleIndexes : Shuffle the attributes of the input individual and return the mutant.
toolbox.register("mutate", tools.mutShuffleIndexes, indpb=0.8) 

## Evolving the Population

In [119]:
POP_SIZE = 100
MAX_GEN = 100
MUT_PROB = 0.2
CX_PROB = 0.8

### Creating the Population

In [47]:
pop = toolbox.population(n=POP_SIZE)
#pop

### The Appeal of Evolution

In [36]:
stats = tools.Statistics(lambda ind: ind.fitness.values)
stats.register("avg", np.mean, axis=0) 
stats.register("min", np.min, axis=0)
stats.register("max", np.max, axis=0)

In [48]:
%%time 
result, log = algorithms.eaMuPlusLambda(pop, 
                                     toolbox, 
                                     mu=POP_SIZE, # The number of individuals to select for the next generation.
                                     lambda_= POP_SIZE, # The number of children to produce at each generation.
                                     cxpb= CX_PROB,
                                     mutpb= MUT_PROB, 
                                     stats= stats, 
                                     ngen= MAX_GEN,
                                     verbose= True)

gen	nevals	avg                        	min                      	max                        
0  	100   	[ 42.43600958 223.68      ]	[14.42942205 53.        ]	[ 91.39867802 415.        ]
1  	100   	[ 33.83748273 260.8       ]	[12.97347948 53.        ]	[ 82.38931156 415.        ]
2  	100   	[ 29.37150932 291.73      ]	[ 6.43936981 65.        ]	[ 82.38931156 425.        ]
3  	100   	[ 25.96592586 321.57      ]	[ 6.43936981 65.        ]	[ 66.73234909 425.        ]
4  	100   	[ 24.17234973 342.86      ]	[  6.43936981 181.        ]	[ 65.75527953 441.        ]
5  	100   	[ 18.69964312 347.32      ]	[  6.43936981 186.        ]	[ 42.95289312 447.        ]
6  	100   	[ 16.42008031 364.87      ]	[  6.27041882 186.        ]	[ 42.95289312 468.        ]
7  	100   	[ 15.22321792 383.34      ]	[  6.1398439 238.       ]  	[ 34.69632753 468.        ]
8  	100   	[ 13.7453799 396.77     ]  	[  4.64643646 238.        ]	[ 32.17913321 483.        ]
9  	100   	[ 12.52667662 409.11      ]	[  4.64643646 282.   

## Make a decision

In [49]:
fronts = tools.emo.sortLogNondominated(result, len(result))
fronts

[[[7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 25, 26],
  [7, 6, 0, 2

In [53]:
from bokeh.plotting import figure, output_notebook, show
from bokeh.models import HoverTool, ColumnDataSource
from bokeh import palettes
output_notebook()

In [54]:
def viz_front(fronts):
#     TOOLS = "pan,wheel_zoom,box_zoom,reset,resize"
    TOOLS = "pan,wheel_zoom,box_zoom,reset"
    hover = HoverTool(
            tooltips=[
                ("index", "$index"),
                ("(x,y)", "($x, $y)"),
                ("individual", "@ind"),
            ]
        )
    front_colors = []
    p = figure(plot_width=700, plot_height=700, tools=[TOOLS,hover], title="NSGAii Test")

    for i,inds in enumerate(fronts):
        par = [(ind, toolbox.evaluate(ind)) for ind in inds]
        source = ColumnDataSource(
                data=dict(
                    x= [p[1][0] for p in par],
                    y= [p[1][1] for p in par],
                    ind= [p[0] for p in par]
                )
            )
        p.circle('x', 'y', size=10, source=source, alpha=0.7, fill_color=palettes.YlGnBu9[i], legend='Front %s'%(i+1), line_color="#ffffff")
    show(p)

In [55]:
viz_front(fronts)

In [56]:
fronts[0][0] # One of the Obtimal Solution

[7, 6, 0, 25, 26]

In [57]:
df_spot.iloc[fronts[0][0]] # Information of that solution

Unnamed: 0,street_ko,street_en,address,dong_ko,dong_en,lng,lat,score
7,명동거리,Myeong-dong street,서울시 중구 명동 일대,명동,Myeong-dong,126.978819,37.568059,97
6,돌담길,Doldam-gil,서울시 중구 소공동 일대,소공동,Sogong-dong,126.979934,37.56413,34
0,남대문거리,fishing-tackle street,서울시 중구 회현동 일대,회현동,Hoehyeon-dong,126.977432,37.55805,80
25,경리단길,Finance corps street,서울시 용산구 이태원2동 일대,이태원2동,Itaewon2-dong,126.991664,37.539718,99
26,이태원거리,Itaewon street,서울시 용산구 이태원1동 일대,이태원1동,Itaewon2-dong,126.991704,37.538994,99


In [59]:
print(eval_func(fronts[0][99])) # Higher score but Longer distance
plot_tour(fronts[0][99])

(8.725423290508745, 492)


In [60]:
print(eval_func(fronts[0][0]))  # Shorter distance but Lower score
plot_tour(fronts[0][0])

(2.83392693655003, 409)
