#### Estimating the energy performance of residential buildings using SVM and MLP with genetic algorithm (GA) based hyperparameter tuning

Dataset link: http://archive.ics.uci.edu/ml/datasets/energy+efficiency

Course: https://www.udemy.com/course/machine-learning-optimization-using-genetic-algorithm

This study assesses the heating load and cooling load requirements of buildings (that is, their energy efficiency) as a function of building parameters.
The energy analysis uses 12 different building shapes simulated in Ecotect. The buildings differ in glazing area, glazing area distribution, and orientation, among other parameters. Simulating various settings of these characteristics yields 768 building shapes. The dataset therefore comprises 768 samples and 8 features, and the goal is to predict two real-valued responses. It can also be treated as a multi-class classification problem if each response is rounded to the nearest integer.
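
The classification variant mentioned above can be sketched in a few lines. This is only an illustration: the helper name `to_class_labels` is hypothetical, and the sample values are the first three heating loads printed by `df.head(10)` further down. Note that Python's built-in `round` sends exact halves to the nearest even integer.

```python
def to_class_labels(loads):
    """Round real-valued loads to the nearest integer so they can serve as class labels."""
    return [round(value) for value in loads]

# First three heating loads from the shuffled dataframe shown in this notebook
heating_sample = [12.45, 24.03, 15.98]
print(to_class_labels(heating_sample))  # [12, 24, 16]
```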


#### Dataset description

| Features                  | Targets      |
|---------------------------|--------------|
| Relative Compactness      | Heating Load |
| Surface Area              | Cooling Load |
| Wall Area                 |              |
| Roof Area                 |              |
| Overall Height            |              |
| Glazing Area Distribution |              |
| Glazing Area              |              |
| Orientation               |              |


In [19]:
from google.colab import drive
drive.mount('/content/drive') 
root_path = 'drive/My Drive/ML_DATA/' 

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [0]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import QuantileTransformer
from collections import OrderedDict
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score
import warnings
warnings.filterwarnings('ignore')

In [21]:
df = pd.read_excel(root_path+"ENB2012_data.xlsx", sheet_name = 0)
columns = ["Relative Compactnes", "Surface Area", "Wall Area", "Roof Area", "Overall Height", "Orientation", "Glazing Area", "Glazing Area Distribution" , "Heating Load", "Cooling Load"]

df.columns = columns

df = df.sample(frac=1)

df.info()

df.head(10)

<class 'pandas.core.frame.DataFrame'>
Int64Index: 768 entries, 361 to 741
Data columns (total 10 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Relative Compactnes        768 non-null    float64
 1   Surface Area               768 non-null    float64
 2   Wall Area                  768 non-null    float64
 3   Roof Area                  768 non-null    float64
 4   Overall Height             768 non-null    float64
 5   Orientation                768 non-null    int64  
 6   Glazing Area               768 non-null    float64
 7   Glazing Area Distribution  768 non-null    int64  
 8   Heating Load               768 non-null    float64
 9   Cooling Load               768 non-null    float64
dtypes: float64(8), int64(2)
memory usage: 66.0 KB


Unnamed: 0,Relative Compactnes,Surface Area,Wall Area,Roof Area,Overall Height,Orientation,Glazing Area,Glazing Area Distribution,Heating Load,Cooling Load
361,0.74,686.0,245.0,220.5,3.5,3,0.25,2,12.45,15.1
242,0.98,514.5,294.0,110.25,7.0,4,0.1,5,24.03,25.88
15,0.82,612.5,318.5,147.0,7.0,5,0.0,0,15.98,24.93
501,0.76,661.5,416.5,122.5,7.0,3,0.25,5,36.64,37.01
360,0.74,686.0,245.0,220.5,3.5,2,0.25,2,12.35,14.73
48,0.98,514.5,294.0,110.25,7.0,2,0.1,1,24.58,26.47
371,0.69,735.0,294.0,220.5,3.5,5,0.25,2,12.95,15.99
97,0.98,514.5,294.0,110.25,7.0,3,0.1,2,24.31,25.63
240,0.98,514.5,294.0,110.25,7.0,2,0.1,5,24.35,25.64
320,0.69,735.0,294.0,220.5,3.5,2,0.25,1,12.78,15.21


In [0]:
df.describe()

Unnamed: 0,Relative Compactnes,Surface Area,Wall Area,Roof Area,Overall Height,Orientation,Glazing Area,Glazing Area Distribution,Heating Load,Cooling Load
count,768.0,768.0,768.0,768.0,768.0,768.0,768.0,768.0,768.0,768.0
mean,0.764167,671.708333,318.5,176.604167,5.25,3.5,0.234375,2.8125,22.307195,24.58776
std,0.105777,88.086116,43.626481,45.16595,1.75114,1.118763,0.133221,1.55096,10.090204,9.513306
min,0.62,514.5,245.0,110.25,3.5,2.0,0.0,0.0,6.01,10.9
25%,0.6825,606.375,294.0,140.875,3.5,2.75,0.1,1.75,12.9925,15.62
50%,0.75,673.75,318.5,183.75,5.25,3.5,0.25,3.0,18.95,22.08
75%,0.83,741.125,343.0,220.5,7.0,4.25,0.4,4.0,31.6675,33.1325
max,0.98,808.5,416.5,220.5,7.0,5.0,0.4,5.0,43.1,48.03


In [0]:
df["Heating Load"].value_counts()

15.16    6
13.00    5
12.93    4
32.31    4
14.60    4
        ..
16.94    1
10.78    1
14.21    1
7.18     1
19.50    1
Name: Y1, Length: 587, dtype: int64

In [0]:
df["Cooling Load"].value_counts()

21.33    4
17.20    4
29.79    4
14.28    4
14.27    4
        ..
36.93    1
36.12    1
32.88    1
20.82    1
16.75    1
Name: Y2, Length: 636, dtype: int64

In [0]:
train_df = df.drop(["Heating Load", "Cooling Load"], axis = 1)
heating_load = df["Heating Load"]
cooling_load = df["Cooling Load"]

In [0]:
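# Map every feature to a uniform distribution on [0, 1] (QuantileTransformer's default output)
scaler = QuantileTransformer()

cols = train_df.columns

sc_train = scaler.fit_transform(train_df)

# Rebuild a DataFrame with the original column names (the index is reset to 0..767)
train_df_scaled = pd.DataFrame(sc_train, columns = cols)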

In [0]:
train_df_scaled.tail(30)

Unnamed: 0,Relative Compactnes,Surface Area,Wall Area,Roof Area,Overall Height,Orientation,Glazing Area,Glazing Area Distribution
738,0.875489,0.124511,0.541721,0.166232,1.0,0.375489,1.0,0.344198
739,0.207953,0.792047,0.541721,1.0,0.0,0.625815,0.531291,0.719035
740,0.374837,0.625815,0.124511,1.0,0.0,0.625815,0.531291,0.156454
741,0.542373,0.458279,1.0,0.166232,1.0,1.0,1.0,0.719035
742,0.625815,0.374837,0.750326,0.375489,1.0,0.375489,0.531291,0.531943
743,0.124511,0.875489,0.750326,1.0,0.0,0.625815,0.219035,1.0
744,0.542373,0.458279,1.0,0.166232,1.0,1.0,0.531291,1.0
745,0.292047,0.708605,0.291395,1.0,0.0,1.0,1.0,0.156454
746,0.542373,0.458279,1.0,0.166232,1.0,0.0,1.0,0.156454
747,0.458279,0.542373,0.0,1.0,0.0,0.625815,0.531291,0.156454


In [9]:
clf = SVR()
# clf.set_params()
score = cross_val_score(clf, train_df, heating_load, cv=3)
print(score)

[0.62128043 0.69644926 0.70891281]
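
For a regressor such as `SVR`, `cross_val_score` uses the estimator's default `score` method when no `scoring` argument is given, and that default is the coefficient of determination R². The three numbers above are therefore per-fold R² values. A minimal pure-Python sketch of the metric follows; the helper name `r2` and the toy values are assumptions made for illustration only.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1.0 - ss_res / ss_tot

print(r2([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect prediction
print(r2([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # always predicting the mean
```

A perfect model scores 1, and a model that always predicts the mean of the targets scores 0, which gives a scale for reading the fold scores.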


In [0]:
def decode(chromosome, parameters):
  """Decode a binary chromosome into an OrderedDict of real-valued hyperparameters."""
  train_params = OrderedDict()
  it = 0
  # Read the chromosome from its last bit, so the first parameter uses the final bits
  reversed_chromosome = chromosome[::-1]
  for i in parameters.keys():
    chromos = reversed_chromosome[it:it+parameters[i]["length"]]

    # Binary-to-integer conversion (chromos[0] is the least significant bit)
    ssum = 0
    for j in range(0, parameters[i]["length"]):
      ssum = ssum + chromos[j]*(2**j)

    # Map the integer onto the interval [lower_bound, upper_bound]
    ssum = ssum*parameters[i]["precision"] + parameters[i]["lower_bound"]

    train_params[i] = ssum
    it = it + parameters[i]["length"]

  return train_params

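def svm_r2_score(X, y, param, cv = 3):
    # Mean cross-validated R^2 of an SVR configured with the decoded hyperparameters
    clf = SVR()
    clf.set_params(**param)
    # Score on the data passed in, so the same function serves both targets
    return cross_val_score(clf, X, y, cv=cv).mean()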

def select_parents(population_list):
  tournament_1_index = np.random.randint(len(population_list), size = 3)
  tournament_1_index_scores = [population_list[tournament_1_index[0]]["score"], population_list[tournament_1_index[1]]["score"], population_list[tournament_1_index[2]]["score"]] 
  winner_1 = tournament_1_index[np.argmax(tournament_1_index_scores)]

  tournament_2_index = np.random.randint(len(population_list), size = 3)
  tournament_2_index_scores = [population_list[tournament_2_index[0]]["score"], population_list[tournament_2_index[1]]["score"], population_list[tournament_2_index[2]]["score"]] 
  winner_2 = tournament_2_index[np.argmax(tournament_2_index_scores)]

  return [winner_1, winner_2]


def genetic_algo(X, y, param_dict = {}, p_c=1, p_m = 0.2, ini_pop=100, num_gen = 30, k_fold = 3, len_genotype = 15):
  population = np.random.randint(2, size=(ini_pop, len(param_dict.keys())*len_genotype))
  population_list = []
  parameters = OrderedDict()

  for i in param_dict.keys():
    parameters[i] = {"lower_bound" : param_dict[i][0], "upper_bound" : param_dict[i][1], "length" : len_genotype, "precision" : (param_dict[i][1] - param_dict[i][0])/((2**len_genotype)-1)}
  
  for i in population:
      pop_dict = {}
      pop_dict["gen"] = 0
      pop_dict["chromosome"] = i

      params = decode(i, parameters)
      score = svm_r2_score(X, y, params, k_fold)
      pop_dict["score"] = score

      population_list.append(pop_dict)
      
    

  for gen in range(1, num_gen+1):
    print("procesing gen no: ", gen)
    for pop_size in range(ini_pop//2):

      parent_index = select_parents(population_list)

      parent_1 = population_list[parent_index[0]]["chromosome"]
      parent_2 = population_list[parent_index[1]]["chromosome"]

      child_1 = []
      child_2 = []

      # CrossOver: two-point crossover over the whole chromosome (all parameters)
      chromosome_len = len(parent_1)
      if np.random.rand() < p_c:
        split_index = np.random.randint(chromosome_len, size = 2)
        min_i = min(split_index)
        max_i = max(split_index)

        child_1 = list(parent_1[:min_i]) + list(parent_2[min_i:max_i]) + list(parent_1[max_i:])
        child_2 = list(parent_2[:min_i]) + list(parent_1[min_i:max_i]) + list(parent_2[max_i:])
      else:
        # Copy, so that mutation does not modify the parents in place
        child_1 = list(parent_1)
        child_2 = list(parent_2)

      # Mutation for child 1: flip each bit with probability p_m
      for i in range(chromosome_len):
        if np.random.rand() < p_m:
          child_1[i] = 1 - child_1[i]

      # Mutation for child 2
      for i in range(chromosome_len):
        if np.random.rand() < p_m:
          child_2[i] = 1 - child_2[i]

      child_1_params = decode(child_1, parameters)
      child_2_params = decode(child_2, parameters)   

      child_1_score = svm_r2_score(X, y, child_1_params, k_fold) 
      child_2_score = svm_r2_score(X, y, child_2_params, k_fold) 
    
      population_list = sorted(population_list, key=lambda k: k['score']) 

      if child_1_score > child_2_score:
        if child_1_score > population_list[1]["score"]:
          child_1_dict = {"gen":gen, "chromosome":child_1, "score": child_1_score}
          if child_2_score > population_list[0]["score"]:
            child_2_dict = {"gen":gen, "chromosome":child_2, "score": child_2_score}
            population_list[0] = child_2_dict
            population_list[1] = child_1_dict
          else:
            population_list[1] = child_1_dict  
        else:
          if child_1_score > population_list[0]["score"]:
            child_1_dict = {"gen":gen, "chromosome":child_1, "score": child_1_score}
            population_list[0] = child_1_dict

      elif child_2_score > child_1_score:
        if child_2_score > population_list[1]["score"]:
          child_2_dict = {"gen":gen, "chromosome":child_2, "score": child_2_score}
          if child_1_score > population_list[0]["score"]:
            child_1_dict = {"gen":gen, "chromosome":child_1, "score": child_1_score}
            population_list[0] = child_1_dict
            population_list[1] = child_2_dict
          else:
            population_list[1] = child_2_dict 

        else:
          if child_2_score > population_list[0]["score"]:
            child_2_dict = {"gen":gen, "chromosome":child_2, "score": child_2_score}
            population_list[0] = child_2_dict   

      else:
        if child_2_score > population_list[0]["score"]:
          child_2_dict = {"gen":gen, "chromosome":child_2, "score": child_2_score}
          population_list[0] = child_2_dict 

  population_list = sorted(population_list, key=lambda k: k['score']) 
  print(population_list[-1]) 
  print(decode(population_list[-1]["chromosome"], parameters))
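
The two variation operators used inside the loop above, two-point crossover and bit-flip mutation, can be looked at in isolation. The sketch below uses only the standard library; the function names and the `rng` argument are hypothetical, and drawing the cut points over the whole chromosome is an assumption of this sketch.

```python
import random

def two_point_crossover(parent_1, parent_2, rng=random):
    """Swap the segment between two random cut points of two equal-length bit lists."""
    n = len(parent_1)
    i, j = sorted(rng.randrange(n + 1) for _ in range(2))
    child_1 = parent_1[:i] + parent_2[i:j] + parent_1[j:]
    child_2 = parent_2[:i] + parent_1[i:j] + parent_2[j:]
    return child_1, child_2

def bit_flip_mutation(chromosome, p_m, rng=random):
    """Return a copy in which every bit is flipped independently with probability p_m."""
    return [1 - bit if rng.random() < p_m else bit for bit in chromosome]

rng = random.Random(0)  # fixed seed so the run is repeatable
p1 = [0] * 8
p2 = [1] * 8
c1, c2 = two_point_crossover(p1, p2, rng)
print(c1, c2)
print(bit_flip_mutation(c1, 0.3, rng))
```

Both helpers return new lists, so the parents stay unchanged and can be selected again in later tournaments.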

In [25]:
genetic_algo(X= train_df, y = heating_load, param_dict = {"C":[10, 1000], "gamma":[0.05, 0.99]},p_c=1, p_m = 0.3, ini_pop=100, num_gen = 20, k_fold = 3, len_genotype = 20) 

procesing gen no:  1
procesing gen no:  2
procesing gen no:  3
procesing gen no:  4
procesing gen no:  5
procesing gen no:  6
procesing gen no:  7
procesing gen no:  8
procesing gen no:  9
procesing gen no:  10
procesing gen no:  11
procesing gen no:  12
procesing gen no:  13
procesing gen no:  14
procesing gen no:  15
procesing gen no:  16
procesing gen no:  17
procesing gen no:  18
procesing gen no:  19
procesing gen no:  20
{'gen': 12, 'chromosome': [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1], 'score': 0.994428917594183}
OrderedDict([('C', 958.5749803304485), ('gamma', 0.053864616264930984)])
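
As a check on the decoding step, the best chromosome printed above can be decoded by hand. Each parameter takes 20 bits from the reversed chromosome, the bits are read as an integer, and that integer is mapped to `lower + integer * (upper - lower) / (2**20 - 1)`. The sketch below restates this logic in a standalone form; the name `decode_bits` and its argument layout are hypothetical and not part of the notebook.

```python
def decode_bits(chromosome, bounds, bits_per_param):
    """Decode a bit list; `bounds` is a list of (name, lower, upper) in decoding order."""
    reversed_bits = chromosome[::-1]
    decoded = {}
    for k, (name, lower, upper) in enumerate(bounds):
        gene = reversed_bits[k * bits_per_param:(k + 1) * bits_per_param]
        value = sum(bit * 2 ** j for j, bit in enumerate(gene))  # gene[0] is the least significant bit
        step = (upper - lower) / (2 ** bits_per_param - 1)       # resolution of the encoding
        decoded[name] = value * step + lower
    return decoded

# Best chromosome reported for the heating load run
best = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1, 1,
        1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1]
params = decode_bits(best, [("C", 10, 1000), ("gamma", 0.05, 0.99)], 20)
print(params)
```

The last 20 bits encode the integer 1004699, which gives `C ≈ 958.575`, and the first 20 bits encode 4311, which gives `gamma ≈ 0.05386`, in agreement with the printed result.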


In [26]:
genetic_algo(X= train_df, y = cooling_load, param_dict = {"C":[10, 1000], "gamma":[0.05, 0.99]},p_c=1, p_m = 0.3, ini_pop=100, num_gen = 20, k_fold = 3, len_genotype = 20)  

procesing gen no:  1
procesing gen no:  2
procesing gen no:  3
procesing gen no:  4
procesing gen no:  5
procesing gen no:  6
procesing gen no:  7
procesing gen no:  8
procesing gen no:  9
procesing gen no:  10
procesing gen no:  11
procesing gen no:  12
procesing gen no:  13
procesing gen no:  14
procesing gen no:  15
procesing gen no:  16
procesing gen no:  17
procesing gen no:  18
procesing gen no:  19
procesing gen no:  20
{'gen': 9, 'chromosome': [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1], 'score': 0.9944384555511988}
OrderedDict([('C', 982.3276160503541), ('gamma', 0.05331867534511123)])


In [16]:
clf = MLPRegressor()
# clf.set_params()
score = cross_val_score(clf, train_df, heating_load, cv=3)
print(score.mean())

0.8206273023344589


In [18]:
clf = MLPRegressor()
# clf.set_params()
score = cross_val_score(clf, train_df, cooling_load, cv=3)
print(score.mean())

0.7919940924254135


In [0]:
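def mlp_r2_score(X, y, param, cv = 3):
    # Mean cross-validated R^2 of an MLPRegressor configured with the decoded hyperparameters
    clf = MLPRegressor()

    # Work on a copy so the caller's decoded parameters are left untouched
    param = dict(param)

    # Build the layer tuple: `hidden_layer_num` layers of `hidden_layer_sizes` units each
    # (int() truncates the decoded floats)
    if "hidden_layer_sizes" in param.keys():
      if "hidden_layer_num" in param.keys():
        param["hidden_layer_sizes"] = (int(param["hidden_layer_sizes"]),)*(int(param["hidden_layer_num"]))
        param.pop('hidden_layer_num', None)

    # Note: MLPRegressor only uses `momentum` when solver='sgd'; the default solver is 'adam'
    clf.set_params(**param)

    # Score on the data passed in, so the same function serves both targets
    return cross_val_score(clf, X, y, cv=cv).mean()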


def genetic_algo_mlp(X, y, param_dict = {}, p_c=1, p_m = 0.2, ini_pop=100, num_gen = 30, k_fold = 3, len_genotype = 15):
  population = np.random.randint(2, size=(ini_pop, len(param_dict.keys())*len_genotype))
  population_list = []
  parameters = OrderedDict()

  for i in param_dict.keys():
    parameters[i] = {"lower_bound" : param_dict[i][0], "upper_bound" : param_dict[i][1], "length" : len_genotype, "precision" : (param_dict[i][1] - param_dict[i][0])/((2**len_genotype)-1)}
  
  for i in population:
      pop_dict = {}
      pop_dict["gen"] = 0
      pop_dict["chromosome"] = i

      params = decode(i, parameters)
      score = mlp_r2_score(X, y, params, k_fold)
      pop_dict["score"] = score

      population_list.append(pop_dict)
      
    

  for gen in range(1, num_gen+1):
    print("procesing gen no: ", gen)
    for pop_size in range(ini_pop//2):

      parent_index = select_parents(population_list)

      parent_1 = population_list[parent_index[0]]["chromosome"]
      parent_2 = population_list[parent_index[1]]["chromosome"]

      child_1 = []
      child_2 = []

      # CrossOver: two-point crossover over the whole chromosome (all parameters)
      chromosome_len = len(parent_1)
      if np.random.rand() < p_c:
        split_index = np.random.randint(chromosome_len, size = 2)
        min_i = min(split_index)
        max_i = max(split_index)

        child_1 = list(parent_1[:min_i]) + list(parent_2[min_i:max_i]) + list(parent_1[max_i:])
        child_2 = list(parent_2[:min_i]) + list(parent_1[min_i:max_i]) + list(parent_2[max_i:])
      else:
        # Copy, so that mutation does not modify the parents in place
        child_1 = list(parent_1)
        child_2 = list(parent_2)

      # Mutation for child 1: flip each bit with probability p_m
      for i in range(chromosome_len):
        if np.random.rand() < p_m:
          child_1[i] = 1 - child_1[i]

      # Mutation for child 2
      for i in range(chromosome_len):
        if np.random.rand() < p_m:
          child_2[i] = 1 - child_2[i]

      child_1_params = decode(child_1, parameters)
      child_2_params = decode(child_2, parameters)   

      child_1_score = mlp_r2_score(X, y, child_1_params, k_fold) 
      child_2_score = mlp_r2_score(X, y, child_2_params, k_fold) 
    
      population_list = sorted(population_list, key=lambda k: k['score']) 

      if child_1_score > child_2_score:
        if child_1_score > population_list[1]["score"]:
          child_1_dict = {"gen":gen, "chromosome":child_1, "score": child_1_score}
          if child_2_score > population_list[0]["score"]:
            child_2_dict = {"gen":gen, "chromosome":child_2, "score": child_2_score}
            population_list[0] = child_2_dict
            population_list[1] = child_1_dict
          else:
            population_list[1] = child_1_dict  
        else:
          if child_1_score > population_list[0]["score"]:
            child_1_dict = {"gen":gen, "chromosome":child_1, "score": child_1_score}
            population_list[0] = child_1_dict

      elif child_2_score > child_1_score:
        if child_2_score > population_list[1]["score"]:
          child_2_dict = {"gen":gen, "chromosome":child_2, "score": child_2_score}
          if child_1_score > population_list[0]["score"]:
            child_1_dict = {"gen":gen, "chromosome":child_1, "score": child_1_score}
            population_list[0] = child_1_dict
            population_list[1] = child_2_dict
          else:
            population_list[1] = child_2_dict 

        else:
          if child_2_score > population_list[0]["score"]:
            child_2_dict = {"gen":gen, "chromosome":child_2, "score": child_2_score}
            population_list[0] = child_2_dict   

      else:
        if child_2_score > population_list[0]["score"]:
          child_2_dict = {"gen":gen, "chromosome":child_2, "score": child_2_score}
          population_list[0] = child_2_dict 

  population_list = sorted(population_list, key=lambda k: k['score']) 
  print(population_list[-1]) 
  print(decode(population_list[-1]["chromosome"], parameters))

In [28]:
genetic_algo_mlp(X= train_df, y = heating_load, param_dict = {"hidden_layer_num":[1, 3], "hidden_layer_sizes":[6, 10], "learning_rate_init":[0.001, 0.01], "momentum":[0.5, 0.9]},p_c=1, p_m = 0.3, ini_pop=100, num_gen = 20, k_fold = 3, len_genotype = 10) 

procesing gen no:  1
procesing gen no:  2
procesing gen no:  3
procesing gen no:  4
procesing gen no:  5
procesing gen no:  6
procesing gen no:  7
procesing gen no:  8
procesing gen no:  9
procesing gen no:  10
procesing gen no:  11
procesing gen no:  12
procesing gen no:  13
procesing gen no:  14
procesing gen no:  15
procesing gen no:  16
procesing gen no:  17
procesing gen no:  18
procesing gen no:  19
procesing gen no:  20
{'gen': 11, 'chromosome': [0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1], 'score': 0.8701267911926368}
OrderedDict([('hidden_layer_num', 2.843597262952102), ('hidden_layer_sizes', 9.49169110459433), ('learning_rate_init', 0.009612903225806452), ('momentum', 0.5406647116324536)])
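
The decoded `hidden_layer_num` and `hidden_layer_sizes` are real numbers, and `mlp_r2_score` converts them with `int()`, which truncates rather than rounds. The short sketch below shows what architecture the values printed above correspond to; the helper name `to_hidden_layer_sizes` is hypothetical.

```python
def to_hidden_layer_sizes(size, num):
    """Build the layer tuple the same way mlp_r2_score does: truncate both values with int()."""
    return (int(size),) * int(num)

# Decoded values of the best individual for the heating load run
print(to_hidden_layer_sizes(9.49169110459433, 2.843597262952102))  # (9, 9)
```

Because of the truncation, a decoded `hidden_layer_num` reaches 3 only when it equals the upper bound exactly, so with bounds `[1, 3]` nearly all individuals have one or two hidden layers.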


In [29]:
genetic_algo_mlp(X= train_df, y = cooling_load, param_dict = {"hidden_layer_num":[1, 3], "hidden_layer_sizes":[6, 10], "learning_rate_init":[0.001, 0.01], "momentum":[0.5, 0.9]},p_c=1, p_m = 0.3, ini_pop=100, num_gen = 20, k_fold = 3, len_genotype = 10) 

procesing gen no:  1
procesing gen no:  2
procesing gen no:  3
procesing gen no:  4
procesing gen no:  5
procesing gen no:  6
procesing gen no:  7
procesing gen no:  8
procesing gen no:  9
procesing gen no:  10
procesing gen no:  11
procesing gen no:  12
procesing gen no:  13
procesing gen no:  14
procesing gen no:  15
procesing gen no:  16
procesing gen no:  17
procesing gen no:  18
procesing gen no:  19
procesing gen no:  20
{'gen': 12, 'chromosome': [1, 1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0], 'score': 0.8677256257097256}
OrderedDict([('hidden_layer_num', 2.2512218963831865), ('hidden_layer_sizes', 9.476050830889541), ('learning_rate_init', 0.008988269794721407), ('momentum', 0.8675464320625611)])
