# Exercise 7B - Latin Hypercube Sampling

In this exercise, we will construct a simple latin hypercube sampling algorithm as a group using pseudo-code. Then we will learn how to implement this for the coursework.

### Colour codes

<span style="color:orange;"> Orange text is for emphasis and definitions </span>

<span style="color:lime;"> Green text is for tasks to be completed by the student </span>

<span style="color:dodgerblue;"> Blue text is for Python coding tricks and references </span>

## Load all the necessary Python packages
All packages should work with a Conda environment if one is installed on your machine. Otherwise, all necessary packages can be installed in a virtual environment (.venv) in VS Code using: Ctrl+Shift+P > Python: Create Environment > Venv > Python 3.12.x > requirements.txt

<span style="color:orange;"> NOTE: we will be using the **scikit-optimize (skopt)** package. You may need to install it using pip. </span>

In [None]:
import json
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
import pandas as pd
import random

from skopt.sampler import Lhs
from skopt.space.space import Categorical, Integer, Real

## 1. Latin Hypercube Algorithm in Pseudo-code

As a group, we will construct a latin hypercube algorithm step-by-step on the board. We will be doing this in Python pseudo-code, meaning that we will omit some of the finer details needed to make the code functional. <span style = "color:lime;">As an **optional** exercise, you can translate the pseudo-code into actual Python in the space below.</span>

In this exercise we will design pseudocode for a set of parameters in the form:

$$
\text{params} =
\begin{cases}
    a = & \begin{cases}
        \text{min} = 0\\
        \text{max} = 1
    \end{cases}\\
    b = & \begin{cases}
        \text{min} = 10\\
        \text{max} = 50
    \end{cases}
\end{cases}
$$




In [None]:
# Parameters to be used
params = {
    "a" : {
        "min" : 0,
        "max" : 1,
    },
    "b" : {
        "min" : 10,
        "max" : 50,
    },
}

# Construct your latin hypercube code here
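
For reference, one possible translation of the basic algorithm into runnable Python is sketched below. It uses only the standard `random` module. The function name `basic_lhs` and its `seed` argument are assumptions made for this illustration; the version built on the board may differ in its details.

```python
import random

def basic_lhs(params, n_samples, seed=None):
    """Return n_samples dictionaries, one value per parameter, using a basic LHS."""
    rng = random.Random(seed)
    columns = {}
    for name, bounds in params.items():
        low, high = bounds["min"], bounds["max"]
        # Width of each of the n_samples equal intervals for this parameter
        width = (high - low) / n_samples
        # Pick one random point inside every interval
        values = [low + (i + rng.random()) * width for i in range(n_samples)]
        # Shuffle so that intervals are paired at random across parameters
        rng.shuffle(values)
        columns[name] = values
    # Sample j takes the j-th shuffled value of every parameter
    return [{name: columns[name][j] for name in params} for j in range(n_samples)]

params = {"a": {"min": 0, "max": 1}, "b": {"min": 10, "max": 50}}
samples = basic_lhs(params, n_samples=10, seed=1)
for row in samples:
    print(row)
```

Each parameter ends up with exactly one value in each of its 10 intervals, which is the defining property of a latin hypercube.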


## 2. Latin Hypercube with Scikit-optimize

Using the scikit-optimize package, generate a sample for a multi-objective problem with integer, real (float), and categorical variables as used in the coursework.

The scikit-optimize (skopt) package needs to be imported along with the *Lhs* class from *skopt.sampler* and the *Categorical*, *Integer*, and *Real* classes from *skopt.space.space* (see the imports cell).

## 2.1 Setting Up a Simple Example

First, generate an instance of the latin hypercube sampling model. This contains all of the instructions needed for generating the samples. The samples are generated at a later stage.

The options for this function are:

**lhs_type**

* classic : a value randomly selected in each interval
* centered : points are centered in each interval - similar to grid sampling

**criterion**

* None : the 'basic' method shown in the lecture
* maximin : maximize the minimal distance between points
* correlation : minimize the correlation
* ratio : minimize the ratio between the maximum distance between points and the minimum distance

Here we will use the *classic* lhs_type and the *None* (basic) criterion to generate 10 samples.
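
To make the difference between the two lhs_type options concrete, the sketch below places points in a single dimension scaled to the range 0 to 1. It is a plain-Python illustration of the idea only, not skopt's implementation, and the helper name `stratified_points` is an assumption made for this example.

```python
import random

def stratified_points(n_samples, lhs_type="classic", seed=None):
    """Place one point in each of n_samples equal intervals of the range 0 to 1."""
    rng = random.Random(seed)
    width = 1.0 / n_samples
    points = []
    for i in range(n_samples):
        if lhs_type == "classic":
            offset = rng.random()  # anywhere inside the interval
        elif lhs_type == "centered":
            offset = 0.5           # always the middle of the interval
        else:
            raise ValueError(f"Unknown lhs_type: {lhs_type}")
        points.append((i + offset) * width)
    return points

classic = stratified_points(10, "classic", seed=0)
centered = stratified_points(10, "centered")

print("classic :", [round(p, 3) for p in classic])
print("centered:", [round(p, 3) for p in centered])
```

The centered points always land on the interval midpoints, while the classic points change with the random seed but never leave their own interval.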

In [None]:
n_samples = 10

# Create the latin hypercube model instance and generate some values
lhs = Lhs(lhs_type = "classic", criterion = None)


## 2.2 Variable Types

### 2.2.1 Variable 'a'

Let's first try generating a latin hypercube sample with the parameter *a* from above.

<span style = "color:lime;">Before you begin, what do you expect the outcomes to be?</span>

In [None]:
# The lhs.generate method expects a list of lists or tuples to be passed to it. In each tuple should be the min and max of that range
lhs_instructions = [
    (params["a"]["min"], params["a"]["max"])
]

#Generate the samples
lhs_values = lhs.generate(lhs_instructions, n_samples)

print (lhs_values)

This might not be what you expected. It was passed two integer values, so it returned only integer values between 0 and 1.

We can address this by *wrapping* the bounds in one of skopt's *Dimension* objects: Integer, Real, or Categorical.


In [None]:
lhs_instructions = [
    Real(params["a"]["min"], params["a"]["max"])
]

#Generate the samples
lhs_values = lhs.generate(lhs_instructions, n_samples)

print (lhs_values)

### 2.2.2 Categoricals

Similarly we can treat variables as categoricals. We will use *b* as an example. <span style = "color:lime;">What do you expect the outcomes to be?</span>

In [None]:
# NOTE: For categoricals, we pass a list of values
lhs_instructions = [
    Categorical(list(params["b"].values()))
]

#Generate the samples
lhs_values = lhs.generate(lhs_instructions, n_samples)

print (lhs_values)

### 2.2.3 Multi-variable

Assuming we want *a* to be a real number and *b* to be an integer, we can generate the full sampling.

In [None]:
lhs_instructions = [
    Real(params["a"]["min"], params["a"]["max"]),
    Integer(params["b"]["min"], params["b"]["max"]),
]

#Generate the samples
lhs_values = lhs.generate(lhs_instructions, n_samples)

# Convert the values into a dataframe
lhs_values = pd.DataFrame(lhs_values, columns = params.keys())

print (lhs_values)


Visualizing the results

In [None]:
fig, ax = plt.subplots()

ax.scatter (lhs_values["a"], lhs_values["b"])

ax.set_xlabel("a")
ax.set_ylabel("b")

# Format the x and y axis ticks
x_ticks = np.arange(0, 1.01, 0.1)
y_ticks = np.arange(10, 51, 5)
ax.set_xticks (x_ticks)
ax.set_yticks (y_ticks)

ax.set_xlim(-0.02, 1.02) # Can leave a small buffer (of 1/5) on either side of the x-axis so dots are displayed entirely
ax.set_ylim(9, 51)

# Add a grid to help us interpolate the values on the plot
ax.grid(linestyle = ":", color = "gray", linewidth = 0.4)

plt.show()

## 3. Latin Hypercube Sampling with EnergyPlus Parameters
In this part of the exercise, we will set up a batch simulation with EnergyPlus based on latin hypercube sampling.


### 3.1 Parameters File
Recall from Exercise 7A on full-factorial design that we had a dictionary of EnergyPlus parameters with a list of discrete values for each parameter. A different set-up is required for latin hypercube sampling because we need to provide instructions on whether variables are integers, floats or categoricals.

I have prepared a parameters file which can be used as a template for this exercise and the coursework. It is a json file located in the */simulationParameters* directory. Storing the parameters in a separate file is useful because it allows you to swap and reuse different input parameters easily.

In [None]:
# The parameters file to be used as part of this simulation
parameters_file = "Exercise 7B.json"

parameters_file_path = Path("simulationParameters", parameters_file)

if not parameters_file_path.exists():
    raise FileNotFoundError (f"Could not find the parameters_file at {parameters_file_path}.")

with open (parameters_file_path) as f:
    parameters = json.load(f)


The json file loads as a Python dictionary which is what we want.

In this example we have <span style = "color:yellow;">14 </span> simulation parameters which are being modified. They are:

In [None]:
print (f"Name                      TYPE        VALUES")
for k,v in parameters.items():
    print (f"{k:<26}{v['type']:<12}{v['values']}")

There are four types of variables that can be chosen:
* constant: A single value that we want fixed for the run
* float: A two-value list denoting the minimum and maximum of the range to be sampled from. The values sampled will be floats (decimals)
* int: A two-value list denoting the minimum and maximum of the range to be sampled from. The values sampled will always be integers, even if floats are given in the list.
* categorical: A list of any length denoting a limited number of values to be sampled from
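
As a rough illustration of this structure, the sketch below parses a small json string with one entry of each type and checks that each entry matches its declared type. The parameter names and values shown are hypothetical and are not taken from the course's parameters file; only the `type` and `values` keys come from the cell above. The helper `check_parameter` is likewise an assumption made for this example.

```python
import json

# Hypothetical entries for illustration only. The parameter names and values
# below are assumptions, not the contents of the course's parameters file.
example_text = """
{
    "buildingLength": {"type": "constant", "values": 10},
    "ach_50": {"type": "float", "values": [1, 10]},
    "numberOfOccupants": {"type": "int", "values": [1, 4]},
    "slabInsulationThickness": {"type": "categorical",
                                "values": [0.001, 0.025, 0.05, 0.075, 0.1]}
}
"""

def check_parameter(name, spec):
    """Raise ValueError if a parameter entry does not match its declared type."""
    kind, values = spec["type"], spec["values"]
    if kind == "constant":
        return  # any single value is accepted
    if kind in ("float", "int"):
        # Ranges need exactly two values, given as [minimum, maximum]
        if not (isinstance(values, list) and len(values) == 2 and values[0] <= values[1]):
            raise ValueError(f"{name}: expected [minimum, maximum], got {values}")
    elif kind == "categorical":
        # Categoricals need a non-empty list of allowed values
        if not (isinstance(values, list) and len(values) >= 1):
            raise ValueError(f"{name}: expected a non-empty list, got {values}")
    else:
        raise ValueError(f"{name}: unknown type '{kind}'")

example_parameters = json.loads(example_text)
for name, spec in example_parameters.items():
    check_parameter(name, spec)
    print(f"{name:<26}{spec['type']:<12}{spec['values']}")
```

A check like this can catch a swapped minimum and maximum or a misspelled type before any simulations are run.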

<span style = "color:orange;"> It is expected that you will modify these values for your coursework. So it is advised that you become familiar with how these work. There is a file called *simulationParameters.md* which explains each parameter. </span>

* In the above example the building has a constant length of 10m, a width of between 8m (integer values only), a height between 3 and 4m. Keeping these as constants makes it really straightforward to change the building's geometry. 
* The model uses continuous values for window parameters and infiltration rates
* The model uses categorical values for insulation in 25 mm (1 inch) increments. <span style = "color:lime;"> Why did I set up the insulation this way? </span> (NOTE: in EnergyPlus, we cannot give an insulation thickness of zero. Instead we give a very small value so that it has minimal impact on the simulation)
* A value of 99 is given for the cooling setpoint. This is a trick to turn the model from one with active cooling to one with natural ventilation. This means the cooling system will turn on at 99&deg;C, i.e. never. If you need to model a system with active cooling, you can set this to an appropriate value between 24 and 28 &deg;C


### 3.2 Generating Latin Hypercube Samples

We need to follow a similar procedure to that of Section 2.2, with some additional steps:
* We have a lot more parameters to handle in the lhs_instructions list
* We need to read in whether we want a constant, integer, float, or categorical sampling for each parameter

A function has been developed to handle these additional complexities. It is located in *src/sampling* and called *latinHypercubeSampling*. It looks similar to what we did before, but with if statements that depend on the type of variable we are working with.
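
To illustrate the idea of branching on the variable type, here is a minimal plain-Python sketch. It is not the code in *src/sampling*: the function name `sketch_lhs_by_type` is hypothetical, it uses the standard `random` module instead of skopt, and the way it maps samples onto integers and categories is only one possible choice, so the course function may assign values differently.

```python
import random

def sketch_lhs_by_type(parameters, n_samples, seed=None):
    """Illustrative LHS that branches on each parameter's declared type."""
    rng = random.Random(seed)
    columns = {}
    for name, spec in parameters.items():
        kind, values = spec["type"], spec["values"]
        if kind == "constant":
            # Constants are not sampled, just repeated for every run
            columns[name] = [values] * n_samples
            continue
        # One random point in each of n_samples equal slices of 0 to 1, shuffled
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(unit)
        if kind == "float":
            low, high = values
            columns[name] = [low + u * (high - low) for u in unit]
        elif kind == "int":
            low, high = int(values[0]), int(values[1])
            n_values = high - low + 1
            # Split 0 to 1 into equal-width bins, one per integer
            columns[name] = [low + min(int(u * n_values), n_values - 1) for u in unit]
        elif kind == "categorical":
            n_values = len(values)
            # Split 0 to 1 into equal-width bins, one per category
            columns[name] = [values[min(int(u * n_values), n_values - 1)] for u in unit]
        else:
            raise ValueError(f"{name}: unknown type '{kind}'")
    return [{name: columns[name][j] for name in parameters} for j in range(n_samples)]

# Hypothetical parameters for illustration only
demo_parameters = {
    "buildingLength": {"type": "constant", "values": 10},
    "ach_50": {"type": "float", "values": [1, 10]},
    "numberOfOccupants": {"type": "int", "values": [1, 4]},
    "slabInsulationThickness": {"type": "categorical",
                                "values": [0.001, 0.025, 0.05, 0.075, 0.1]},
}

demo_samples = sketch_lhs_by_type(demo_parameters, n_samples=100, seed=0)
for row in demo_samples[:5]:
    print(row)
```

Constants are simply repeated, while the other three types all start from the same stratified points and differ only in how those points are converted into parameter values.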

In [None]:
from src.sampling import latinHypercubeSampling

Creating a list of combinations is as simple as passing the parameters dictionary to the latinHypercubeSampling function. Let's create a set of 1000 samples.

In [None]:
n_simulations = 1000

# Create a list of combinations of parameters stored in a dataframe format
combinations = latinHypercubeSampling(parameters, n_simulations, lhs_type = "classic", criterion = None)
print (combinations.head(10))


Now we will save the resulting dataframe

In [None]:
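saveName = "Exercise_7C"
# Save the combinations as a csv
savePath = Path("outputs", "combinations", f"combinations_{saveName}.csv")
savePath.parent.mkdir(parents = True, exist_ok = True) # Create the output folder if it does not exist yet
combinations.to_csv(savePath)
print (f"Combinations file saved to {savePath}.")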

### 3.3 Analysis

It is challenging to visualize the results of a problem with so many dimensions. But we can compare two variables directly. I have chosen a real value (ach_50) and a categorical (slabInsulationThickness) to analyse the latin hypercube distribution.

In [None]:
fig, ax = plt.subplots()

ax.scatter (combinations["ach_50"], combinations["slabInsulationThickness"])

ax.set_xlabel("ach_50")
ax.set_ylabel("slabInsulationThickness")

# Format the x and y axis ticks
x_ticks = np.arange(1, 10.1, 1)
y_ticks = np.arange(0, 0.101, 0.025)
ax.set_xticks (x_ticks)
ax.set_yticks (y_ticks)

ax.set_xlim(0.8, 10.2) # Can leave a small buffer (of 1/5) on either side of the x-axis so dots are displayed entirely
ax.set_ylim(-0.004, 0.104)

# Add a grid to help us interpolate the values on the plot
ax.grid(linestyle = ":", color = "gray", linewidth = 0.4)

plt.show()

Results will vary. The continuous values should be evenly distributed along the x-axis. The categorical values should be isolated to their given discrete values. They should be evenly distributed among the 5 values. Check by getting the value counts of each slabInsulationThickness value:

In [None]:
combinations["slabInsulationThickness"].value_counts()

<span style="color:lime;"> Is this the expected result and why? </span>

## 4. Summary

* You have learned to generate multi-variable latin hypercube samples of integer, float, and categorical values.
* This sampling can be used to generate the combinations file that can be used for the weighted sum method of your coursework.
* Remember that the simulation parameters file used here is for demonstration purposes. It is up to you to review whether the value ranges and variable types are appropriate.