# Surrogate models

A surrogate model is an engineering technique used when an outcome of interest cannot easily be measured directly, so a model of the outcome is used instead. Many problems require experiments and/or simulations to evaluate the quality of a proposed solution. For many real-world problems, however, a single simulation can take many minutes, hours, or even days to complete. As a result, routine tasks such as optimization become infeasible, since they require thousands or even millions of simulation evaluations.

One way of alleviating this burden is to construct approximation models, known as surrogate models, that mimic the behavior of the simulation model as closely as possible while being computationally cheaper to evaluate. Surrogate models are constructed using a data-driven, bottom-up approach. The exact inner workings of the simulation code are not assumed to be known (or even understood); only the input-output behavior matters. A model is built from the simulator's response at a limited number of intelligently chosen data points.
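
As a toy illustration of the idea (not the traffic problem itself), the sketch below treats a simple function as if it were an expensive simulator, samples it at a handful of points, and fits a cheap polynomial surrogate to those samples. The function `expensive_simulation`, the number of samples, and the polynomial degree are all assumptions chosen for illustration.

```python
import numpy as np

def expensive_simulation(x):
    # Hypothetical stand-in for a slow simulator; here it is just sin(x).
    return np.sin(x)

# Evaluate the "simulator" at a limited number of sample points.
x_samples = np.linspace(0.0, np.pi, 12)
y_samples = expensive_simulation(x_samples)

# Fit a degree-5 polynomial to the samples; this is the surrogate.
coeffs = np.polyfit(x_samples, y_samples, deg=5)
surrogate = np.poly1d(coeffs)

# Compare surrogate and simulator on a dense grid over the same interval.
x_check = np.linspace(0.0, np.pi, 200)
max_abs_error = np.max(np.abs(surrogate(x_check) - expensive_simulation(x_check)))
print(f"max abs error on the dense grid: {max_abs_error:.2e}")
```

In the rest of this notebook, SUMO plays the role of the expensive simulator and a neural network plays the role of the surrogate.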

## Dataset: Traffic Light Scheduling Problem

Our surrogate model will replace the use of the SUMO traffic simulator to compute the quality of a traffic light plan. In this case, a traffic light plan consists of 190 phases (from around 45 main intersections in Málaga). We generated around 29000 different plans and obtained their fitness using SUMO. These samples form the dataset used to train and test our surrogate model.

The code below provides a function to load and prepare the dataset for our experiments. It requires the dataset file (`tlsp.csv`) to be in the "Colab Notebooks" directory of your Google Drive. You can find the dataset file in the Campus Virtual; download it and upload it to that directory in your Google Drive.

In [None]:
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from keras.models import Sequential
from keras.layers import Dense

from google.colab import drive
import math
import random

# Mount Google Drive so that the dataset file can be read
drive.mount('/content/drive')
tf.random.set_seed(100)

def load_data():
  # Load the CSV file from your own Google Drive. You need to upload the file to your Drive first;
  # it is available at https://drive.google.com/file/d/1bPoWQjVkbjrEQSrR7qm_f7469oQy1qIC/view?usp=drive_link
  data = pd.read_csv('/content/drive/MyDrive/Colab Notebooks/tlsp.csv')

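  # Feature matrix: every column except the target (note the leading space in ' Fitness')
  X = data.drop([' Fitness'], axis=1)
  # Target vector: the fitness that SUMO computed for each plan
  y = data[' Fitness']
  # Split into training and test sets (by default, 25% of the rows are held out for testing)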
  X_train, X_test, y_train, y_test = train_test_split(X, y)
  
  return X_train, X_test, y_train, y_test

# testing if data is correctly loaded
X_train, X_test, y_train, y_test = load_data()
print(y_train)
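
To check the split logic without access to Drive, the same steps can be run on a small synthetic stand-in. The sketch below assumes the CSV has 190 feature columns plus a ` Fitness` column (with the leading space, as in the code above). The column names `phase_i`, the sample count, the random values, and the `demo_` variable names are made up for illustration and are not the real data.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for tlsp.csv: 200 random plans with 190 phase values each.
rng = np.random.default_rng(0)
n_samples, n_phases = 200, 190
demo_data = pd.DataFrame(
    rng.integers(5, 60, size=(n_samples, n_phases)).astype(float),
    columns=[f"phase_{i}" for i in range(n_phases)],  # invented column names
)
demo_data[" Fitness"] = rng.random(n_samples)  # placeholder fitness values

# Same feature/target separation as in load_data().
demo_X = demo_data.drop([" Fitness"], axis=1)
demo_y = demo_data[" Fitness"]

# With no test_size given, train_test_split holds out 25% of the rows.
# random_state is an addition here so that the split is repeatable.
demo_X_train, demo_X_test, demo_y_train, demo_y_test = train_test_split(
    demo_X, demo_y, random_state=100
)

print(demo_X_train.shape, demo_X_test.shape)  # (150, 190) (50, 190)
```

Passing a fixed `random_state` to `train_test_split` inside `load_data()` would likewise make the real split reproducible between runs; `tf.random.set_seed` alone does not control it.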

## Surrogate model: Artificial Neural Network

Create an artificial neural network (ANN) to serve as the surrogate model. The ANN has the following characteristics:
* It takes 190 floating-point values as input.
* It has two hidden (dense) layers: the first with 285 neurons and the second with 190 neurons. Both layers use the ReLU activation function.
* The output (dense) layer has a single neuron with no activation function.
* For training, it uses `mean_squared_error` as the loss function, `adam` as the optimizer, and `mse` as the metric.

Complete the following code.

In [None]:
# Build the ANN
def get_model():
  # Create a Sequential Model with 3 layers as mentioned previously
  model = None
  return model

# Standard Backpropagation optimization method
def compile_network(model, epochs, X, y):
  # Compile the model with the previous parameters
  # Fit the model with the training data
  pass

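# Get error: evaluate() returns [loss, metric values...]; index 1 is the first metric (`mse`, as specified above)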
def get_error(model, X, y):
  scores = model.evaluate(X,y)
  return scores[1]

# Testing the ANN
# Get the model
# Compile the model (train the model) with 20 epochs and the training data
# Get the error of the model with the testing data
# Print the error
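
One possible way to complete the skeleton above is sketched below. It follows the specification listed earlier (two ReLU hidden layers with 285 and 190 neurons, a single-neuron output with no activation, `mean_squared_error` loss, `adam` optimizer, `mse` metric). To stay self-contained it trains on small random data rather than the real dataset, so the printed error says nothing about the actual problem. The `n_inputs` parameter, the explicit `Input` layer, `verbose=0`, and the synthetic arrays are assumptions added for illustration.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Input

def get_model(n_inputs=190):
    # Two hidden ReLU layers and a linear single-neuron output.
    model = Sequential([
        Input(shape=(n_inputs,)),
        Dense(285, activation="relu"),
        Dense(190, activation="relu"),
        Dense(1),  # no activation: plain linear output for regression
    ])
    return model

def compile_network(model, epochs, X, y):
    # Compile with the loss, optimizer, and metric from the specification,
    # then fit on the given data.
    model.compile(loss="mean_squared_error", optimizer="adam", metrics=["mse"])
    model.fit(X, y, epochs=epochs, verbose=0)

def get_error(model, X, y):
    # evaluate() returns [loss, mse]; index 1 is the mse metric.
    scores = model.evaluate(X, y, verbose=0)
    return scores[1]

# Synthetic stand-in data (NOT the traffic dataset): 64 samples, 190 inputs.
rng = np.random.default_rng(0)
X_demo = rng.random((64, 190)).astype("float32")
y_demo = X_demo.sum(axis=1)

model = get_model()
compile_network(model, 2, X_demo, y_demo)
error = get_error(model, X_demo, y_demo)
print("MSE on the synthetic data:", error)
```

In the notebook itself, the same calls would use `X_train` and `y_train` for fitting with 20 epochs and `X_test` and `y_test` for evaluation, as described in the comments of the skeleton.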
