### Genetic Algorithm 
- uses energy efficiency dataset https://archive.ics.uci.edu/ml/datasets/energy+efficiency#


We perform energy analysis using 12 different building shapes simulated in Ecotect. The buildings differ with respect to the glazing area, the glazing area distribution, and the orientation, amongst other parameters. We simulate various settings as functions of the afore-mentioned characteristics to obtain 768 building shapes. The dataset comprises 768 samples and 8 features, aiming to predict two real valued responses. It can also be used as a multi-class classification problem if the response is rounded to the nearest integer.


Attribute Information:

The dataset contains eight attributes (or features, denoted by X1...X8) and two responses (or outcomes, denoted by y1 and y2). The aim is to use the eight features to predict each of the two responses. 

Specifically: 
- X1	Relative Compactness 
- X2	Surface Area 
- X3	Wall Area 
- X4	Roof Area 
- X5	Overall Height 
- X6	Orientation 
- X7	Glazing Area 
- X8	Glazing Area Distribution 
- y1	Heating Load 
- y2	Cooling Load



In [24]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import random
from sklearn.model_selection import cross_validate
from sklearn import preprocessing
from sklearn import svm

### Get the dataset
- df.sample(frac=1) shuffles the data 

In [25]:
df = pd.read_csv("energy_efficiency.csv")
df = df.sample(frac=1)
rows = len(df)
cols = len(df.keys())
print(rows, cols)
df.head()

768 10


Unnamed: 0,X1,X2,X3,X4,X5,X6,X7,X8,Y1,Y2
450,0.79,637.0,343.0,147.0,7.0,4,0.25,4,38.33,44.16
458,0.74,686.0,245.0,220.5,3.5,4,0.25,4,12.18,15.03
115,0.79,637.0,343.0,147.0,7.0,5,0.1,2,36.03,42.86
109,0.82,612.5,318.5,147.0,7.0,3,0.1,2,24.23,25.02
248,0.86,588.0,294.0,147.0,7.0,2,0.1,5,27.03,25.82


### X6 and X8 are categorical variables

In [26]:
X = pd.get_dummies(df.iloc[:,0:(cols-2)], columns=['X6', 'X8']).values
y = df.iloc[:, (cols-2):].values
print(X[0])
print(y[0])

[7.90e-01 6.37e+02 3.43e+02 1.47e+02 7.00e+00 2.50e-01 0.00e+00 0.00e+00
 1.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 0.00e+00 1.00e+00 0.00e+00]
[38.33 44.16]


### Scale the data with minmax scaler

In [28]:
mms = preprocessing.MinMaxScaler()
X = mms.fit_transform(X)
print(X[0])

### Set genetic algorithm parameters
- M : number of generations
- N : population size
- Pc: probability of crossover
- Pm: probability of mutation
- l : string size
- k : tournament selection contestants

### **since we are performing a GA for SVM, the parameters to tune are c and gamma**

#### SVM hyperparameters
- kernel: poly, rbf, linear
- poly: degree of polynomial
- c: controls the tradeoff between low error and maximizing the norm of the weight
- gamma: determines strength of training sample with gaussian kernel

In [34]:
Pc = .9 #these do not need to sum to 1
Pm = .1
M = 100
N = 400
l = 24 # first 12 are for c, second 12 are for gamma

In [35]:
# the chromosome
xy = np.random.choice([0,1], size=(l,), p=[.5, .5])
print(xy)

[0 0 0 0 1 0 0 0 0 0 0 1 1 0 1 1 1 1 0 1 0 1 1 1]


### (1) Create a random population of chromosomes (potential solutions)

In [38]:
population = np.empty((0, len(xy)))
for i in range(N):
    random.shuffle(xy)
    population = np.vstack((population, xy))

In [39]:
print(population[0:5])

[[1. 1. 0. 0. 1. 0. 1. 1. 0. 0. 1. 1. 1. 1. 1. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 1. 0. 0. 0. 1. 1. 1. 0. 1. 1. 0. 0. 0. 1. 1. 1. 0. 0. 1.]
 [0. 1. 1. 1. 0. 0. 1. 1. 0. 1. 1. 0. 0. 1. 0. 0. 1. 1. 0. 1. 0. 0. 0. 0.]
 [1. 0. 1. 0. 0. 0. 0. 1. 0. 0. 0. 1. 1. 0. 1. 1. 1. 1. 0. 1. 0. 0. 1. 0.]
 [0. 1. 0. 0. 0. 0. 1. 1. 1. 1. 1. 1. 0. 0. 1. 0. 0. 0. 0. 1. 1. 0. 0. 1.]]


### (2) Calculate precision of the chromosomes
- Range (a,b)
- length, l
- precision (b-a)/(2^l-1)
- encode (literal base 2 encoding)
- decode (sum(bit*2^i)*precision + a

In [42]:
# parameter c
ac = 10
bc = 1000
lc = len(xy)/2
pc = (bc - ac)/((2**lc)-1)

In [43]:
# parameter gamma
ag = .05
bg = .99
lg = len(xy)/2
pg = (bg - ag)/((2**lg)-1)

### Decode the chromosomes
- returns a real value in the specified range for the parameter

In [55]:
def decode(xy, index, precision, lowerBound):
    """Decodes a chromosome into a real value"""
    index = index
    power = 0 #binary powers
    sum = 0
    for i in range(len(xy)//2):
        val = xy[index]*(2**power)
        sum += val
        index -= 1
        power += 1
    return (sum * precision) + lowerBound

In [57]:
cIndex = -1 # start at the end of the chromosome
gIndex = (l//2) + 1 #start in the middle of the chromosome
c1 = decode(xy, cIndex, pc, ac)
g1 = decode(xy, gIndex, pg, ag)
print(c1)
print(g1)

883.2307692307693
0.8311526251526251


### Perform the algorithm

- keep track of children and mutations 

In [None]:
bestC = []
worstC = []
bestG = []
worstC = []

ofg = np.empty((0, len(xy)+2))
ofgf = []

mgm1 = np.empty((0, len(xy)+1))
mgm2 = np.empty((0, len(xy)+1))

mgm11 = np.empty((0, len(xy)+2))
mgm22 = np.empty((0, len(xy)+2))

mgm111 = np.empty((0, len(xy)+2))
mgm222 = np.empty((0, len(xy)+2))


In [72]:
gen = M
pop = N
for i in range(1):
    newPopulation = np.empty((0, len(xy)))
    
    agc1 = np.empty((0, len(xy)+1))
    acg2 = np.empty((0, len(xy)+1))
    
    mgc1 = []
    mgc2 = []
    
    bestgc = np.empty((0, len(xy)+1))
    fbgc = []
    fwgc = []
    
    #print("generation: ", i)
    
    for j in range(1):
        #print("family: ", j)
        
        #get 2 parents
        parents = np.empty((0, len(xy)))
        #get some random samples to pick good parents
        moms = population[np.random.choice(len(population), 3, replace=False)]
        dads = population[np.random.choice(len(population), 3, replace=False)]
         
            
        
        print(moms)
        print(dads)
        
        

[[1. 1. 1. 0. 1. 0. 1. 1. 1. 1. 0. 0. 0. 0. 1. 0. 1. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 1. 0. 0. 0. 1. 0. 0. 1. 1. 1. 1. 1. 0. 1. 0. 0. 1. 0. 1. 0. 1. 0.]
 [1. 0. 0. 1. 1. 0. 1. 0. 0. 0. 1. 1. 1. 1. 0. 0. 1. 1. 0. 1. 0. 0. 0. 0.]]
[[1. 0. 1. 0. 0. 0. 0. 1. 0. 0. 0. 1. 1. 0. 1. 1. 1. 1. 0. 1. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1. 1. 0. 0. 0. 1. 1. 0. 1. 1. 0. 1. 1. 1. 0. 1. 0. 0. 0. 1.]
 [0. 1. 1. 0. 1. 1. 1. 0. 1. 1. 0. 0. 0. 0. 1. 0. 1. 1. 1. 0. 0. 0. 0. 0.]]
[[0. 0. 1. 0. 0. 1. 1. 1. 0. 1. 0. 1. 1. 0. 0. 0. 0. 0. 1. 1. 1. 0. 1. 0.]
 [0. 1. 0. 0. 1. 1. 1. 0. 0. 1. 0. 0. 1. 1. 0. 0. 1. 0. 0. 0. 1. 1. 0. 1.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 1. 1. 1. 0. 1. 1. 0. 1. 0. 1. 1. 0. 1. 1.]]
[[1. 0. 1. 0. 0. 0. 0. 0. 1. 0. 0. 0. 1. 1. 1. 0. 0. 1. 1. 0. 1. 0. 1. 1.]
 [0. 1. 1. 0. 1. 1. 1. 0. 0. 1. 0. 1. 0. 0. 0. 0. 1. 0. 0. 1. 0. 1. 0. 1.]
 [1. 1. 1. 1. 0. 1. 0. 0. 0. 1. 1. 0. 0. 0. 0. 0. 0. 0. 1. 1. 0. 0. 1. 1.]]
