# Dataset Preparation

## <a id="contents"></a>Contents
- [Overview](#overview)
- [37NTF-A](#37ntf-a)
    - [Part 1](#37ntf-a--part-1)
    - [Part 2](#37ntf-a--part-2)
- [37NTF-B](#37ntf-b)
    - [Part 1](#37ntf-b--part-1)
    - [Part 2](#37ntf-b--part-2)
- [37NTF-C](#37ntf-c)
    - [Part 1](#37ntf-c--part-1)
    - [Part 2](#37ntf-c--part-2)

---

## <a id="overview"></a>Overview

**Note:** Please refer to [the paper](../Paper/) for the definitions of variables.

- The datasets are based on the [IEEE 37-Node Test Feeder](https://site.ieee.org/pes-testfeeders/resources/) (37NTF).
- Each phase of a point load in 37NTF is treated as one consumer,
provided that it has a non-zero active power value.
    - For example, the load at node `742` draws active powers of 8 kW (across phases A-B) and 85 kW (across phases B-C).
    These are encoded as consumers `742-1` and `742-2`, with respective $L_{c}$ values of 8 kW and 85 kW.
    - This yields 32 consumers in total.
    - Consumer IDs and their respective power demands are available in `./Ported Load Values.xlsx` under the sheet named `IEEE 37-Node Test Feeder`.
- We assume that the active power ratings of a consumer $c$'s appliances add up to $L_{c}$.
- To assign power ratings to the $N_{c}$ appliances, we map the interval $[0,L_{c}]$ onto $[0,1]$,
and sample $N_{c}-1$ distinct points from a continuous uniform distribution $U [0.05,1)$.
Together with the end points 0 and 1, these points divide $[0,1]$ (and, equivalently, $[0,L_{c}]$) into $N_{c}$ sub-intervals.
    - The length of a sub-interval in $[0,1]$ is the active power rating of an appliance expressed as
    a fraction of $L_{c}$.
    - In principle, the lengths of the equivalent sub-intervals in $[0,L_{c}]$ add up to $L_{c}$.
    - To account for numerical precision, appliance ratings are rounded to 5 decimal places,
    and the deficit between $L_{c}$ and the sum of the rounded ratings is added to the rating of a randomly chosen appliance.
    - All appliance ratings are in kW.
- Each appliance is assigned a priority level $p$ drawn from a discrete uniform distribution $U_{d} [1,P_{c}]$.
- We prepare three datasets that differ in terms of $N_{c}$ and $P_{c}$.
    - 37NTF-A
    - 37NTF-B
    - 37NTF-C
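
The partitioning steps listed above can be sketched without any third-party dependency. This is a minimal illustration, not the notebook code itself: the function name `partition_load`, its parameters, and the fixed seed are assumptions made for the example, and Python's `random` module stands in for `numpy.random`.

```python
import random


def partition_load(L_c, N_c, rng, low=0.05, decimals=5):
    """Split a consumer load L_c (kW) into N_c appliance ratings (kW)."""
    # Sample N_c - 1 cut points uniformly between low and 1;
    # resample the whole set if any value repeats.
    while True:
        cuts = [rng.uniform(low, 1.0) for _ in range(N_c - 1)]
        if len(set(cuts)) == len(cuts):
            break
    points = [0.0] + sorted(cuts) + [1.0]

    # Sub-interval lengths are fractions of L_c; scale to kW and round.
    ratings = [round((b - a) * L_c, decimals) for a, b in zip(points, points[1:])]

    # Add the rounding deficit to one randomly chosen appliance.
    slack = rng.randrange(N_c)
    ratings[slack] += L_c - sum(ratings)
    return ratings


rng = random.Random(0)  # fixed seed, only to make the sketch repeatable
ratings = partition_load(L_c=85.0, N_c=100, rng=rng)
print(len(ratings), abs(sum(ratings) - 85.0) < 1e-9)
```

Because every cut point is at least 0.05, the first sub-interval always covers at least 5% of $L_{c}$, so the first appliance carries a comparatively large rating.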

---

## <a id="37ntf-a"></a>37NTF-A

- all consumers have $N_{c} = 100$ appliances
- all consumers have $P_{c} = 5$ priority levels
- dataset saved in `./37NTF-A.xlsx`
    - Consumer IDs and their respective $L_{c}$, $N_{c}$, and $P_{c}$ can be accessed in the sheet named `Overview`.
    - For a consumer $c$, the ratings of appliances belonging to different priority levels are stored in a sheet named after that consumer's ID.
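
The layout of a consumer sheet can be pictured as one column per priority level. The sketch below groups a handful of made-up ratings in that way; `group_by_priority` is a hypothetical helper, the ratings are invented for illustration, and plain Python containers stand in for the pandas DataFrame used in the notebook.

```python
import random


def group_by_priority(ratings, P_c, rng):
    """Return {"Priority level p": [ratings...]} for p = 1..P_c."""
    columns = {f"Priority level {p}": [] for p in range(1, P_c + 1)}
    for rating in ratings:
        p = rng.randint(1, P_c)  # random.randint includes both end points
        columns[f"Priority level {p}"].append(rating)
    return columns


rng = random.Random(1)  # fixed seed, only to make the sketch repeatable
ratings = [1.5, 0.25, 3.0, 2.25, 1.0]  # made-up appliance ratings in kW
columns = group_by_priority(ratings, P_c=5, rng=rng)
for header, values in columns.items():
    print(header, values)
```

The columns generally have different lengths, and a priority level may end up with no appliance at all. In the workbook, shorter columns are simply followed by blank cells.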

### <a id="37ntf-a--part-1"></a>Part 1

In [1]:
import matplotlib, numpy, openpyxl, pandas

In [2]:
P_c = 5                 # No. of priority levels per consumer
M_c_per_consumer = 100  # No. of appliances per consumer (N_c in the text)

In [3]:
# Raw ported values
IEEE_37NTF_df = pandas.read_excel("./Ported Load Values.xlsx", sheet_name="IEEE 37-Node Test Feeder")

In [4]:
# Remove kVAr info
IEEE_37NTF_df = IEEE_37NTF_df.drop(columns=["kVAr"])

In [5]:
# Get consumer kW
L_c = IEEE_37NTF_df["kW"].to_numpy(dtype=numpy.float32)

In [6]:
# No. of appliances in each consumer
M_c = numpy.zeros(L_c.shape) + M_c_per_consumer

IEEE_37NTF_df["No. of appliances"] = M_c.astype(numpy.int64)

In [7]:
# Add no. of priority levels for each consumer
IEEE_37NTF_df["No. of priority levels"] = numpy.array([P_c for i in range(len(IEEE_37NTF_df["Consumer ID"]))])

In [8]:
IEEE_37NTF_df.head()

| | Consumer ID | kW | No. of appliances | No. of priority levels |
|---|---|---|---|---|
| 0 | 701-1 | 140 | 100 | 5 |
| 1 | 701-2 | 140 | 100 | 5 |
| 2 | 701-3 | 350 | 100 | 5 |
| 3 | 712-3 | 85 | 100 | 5 |
| 4 | 713-3 | 85 | 100 | 5 |


In [9]:
# For record keeping (08 October 2020)
workbook_path = "./37NTF-A.xlsx"
writer = pandas.ExcelWriter(workbook_path, engine="openpyxl")

IEEE_37NTF_df.to_excel(writer, sheet_name="Overview", index=False)
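# close() writes the workbook to disk; a separate save() call is redundant
# (and no longer available in pandas 2.x)
writer.close()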

### <a id="37ntf-a--part-2"></a>Part 2

In [1]:
import matplotlib, numpy, openpyxl, pandas
from matplotlib import pyplot

In [2]:
workbook_path = "./37NTF-A.xlsx"
# mode="a" opens the existing workbook so that new sheets are appended to it
writer = pandas.ExcelWriter(workbook_path, engine="openpyxl", mode="a")

IEEE_37NTF_df = pandas.read_excel("./37NTF-A.xlsx", sheet_name="Overview")
IEEE_37NTF_df.head()

| | Consumer ID | kW | No. of appliances | No. of priority levels |
|---|---|---|---|---|
| 0 | 701-1 | 140 | 100 | 5 |
| 1 | 701-2 | 140 | 100 | 5 |
| 2 | 701-3 | 350 | 100 | 5 |
| 3 | 712-3 | 85 | 100 | 5 |
| 4 | 713-3 | 85 | 100 | 5 |


In [3]:
# Partition consumer load into appliance ratings
for i in IEEE_37NTF_df.index:
    rand_points = numpy.random.uniform(0.05, 1.0, size=(IEEE_37NTF_df["No. of appliances"][i]-1,))
    # Just to make sure that no values are repeated
    while not (rand_points.shape[0] == numpy.unique(rand_points).shape[0]):
        rand_points = numpy.random.uniform(0.05, 1.0, size=(IEEE_37NTF_df["No. of appliances"][i]-1,))
    
    rand_points.sort()
    rand_points = numpy.append(numpy.array([0.0]), rand_points)
    rand_points = numpy.append(rand_points, numpy.array([1.0]))
    
    # Appliance ratings: lengths of the sub-intervals between consecutive points
    d_c = numpy.diff(rand_points)
    d_c = d_c * IEEE_37NTF_df["kW"][i]
    
    # Round appliance ratings to 5 decimal places
    d_c = numpy.around(d_c, decimals=5)
        
    # Just to make sure the sum of all appliance ratings equals the consumer rated load up to numerical precision
    deficit = float(IEEE_37NTF_df["kW"][i]) - numpy.sum(d_c)
    index_slack_appliance = numpy.random.randint(0, IEEE_37NTF_df["No. of appliances"][i])
    d_c[index_slack_appliance] += deficit
    
    # Assign priority levels to the appliances
    priority_levels = numpy.random.randint(1, IEEE_37NTF_df["No. of priority levels"][i]+1, size=d_c.shape)
    
    # Group appliances according to priority levels
    prioritized_d_c = []
    for p in range(1, IEEE_37NTF_df["No. of priority levels"][i]+1):
        priority_level_masks = priority_levels == p
        prioritized_d_c.append(d_c[priority_level_masks])
    
    # Construct dedicated DataFrame for a consumer
    consumer_df = pandas.concat([
        pandas.DataFrame({"Priority level "+str(p) : prioritized_d_c[p-1]}) \
        for p in range(1, IEEE_37NTF_df["No. of priority levels"][i]+1)
    ], axis=1)
    
    # Export consumer DataFrame as a dedicated sheet in spreadsheet
    consumer_df.to_excel(writer, sheet_name=IEEE_37NTF_df["Consumer ID"][i], index=False)

In [4]:
# For record keeping (08 October 2020)

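# close() writes the workbook to disk; a separate save() call is redundant
# (and no longer available in pandas 2.x)
writer.close()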
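
Once a workbook has been written, each consumer sheet can be checked against its row in `Overview`. The sketch below shows one possible check under stated assumptions: `check_consumer` is a hypothetical helper, the sheet is represented as a plain dictionary of column header to non-blank ratings, and the tolerance `tol` is an arbitrary choice. Loading the sheet from the workbook is left out.

```python
def check_consumer(columns, L_c, N_c, P_c, tol=1e-6):
    """Return True if one consumer's sheet is consistent with its Overview row."""
    ratings = [r for col in columns.values() for r in col]
    return (
        len(columns) == P_c                   # one column per priority level
        and len(ratings) == N_c               # one rating per appliance
        and abs(sum(ratings) - L_c) <= tol    # ratings add up to the consumer load
    )


# Made-up sheet for a consumer with L_c = 8 kW, N_c = 3 and P_c = 3
sheet = {
    "Priority level 1": [3.0, 1.5],
    "Priority level 2": [],
    "Priority level 3": [3.5],
}
ok = check_consumer(sheet, L_c=8.0, N_c=3, P_c=3)
print(ok)
```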

---

## <a id="37ntf-b"></a>37NTF-B

- all consumers have $P_{c} = 5$ priority levels
- The consumer with the largest active power demand (say, $\hat{L}_{c}$) has the largest number of appliances, $\hat{N}_{c}=100$. For every other consumer, the number of appliances is given by
$$ N_{c} = \text{floor} \left( \frac{L_{c}}{\hat{L}_{c}} \hat{N}_{c} \right) + U_{d} [10,20]$$
where $\text{floor}$ is the floor function.
- dataset saved in `./37NTF-B.xlsx`
    - Consumer IDs and their respective $L_{c}$, $N_{c}$, and $P_{c}$ can be accessed in the sheet named `Overview`.
    - For a consumer $c$, the ratings of appliances belonging to different priority levels are stored in a sheet named after that consumer's ID.
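
As a worked example of the formula for $N_{c}$, the sketch below evaluates the floor term for two of the demands that appear in the feeder, 140 kW and 85 kW. It assumes that 350 kW (consumer `701-3`) is the largest demand $\hat{L}_{c}$; `base_count` is a hypothetical helper, and the offset is drawn with Python's `random.randint`, which includes both end points.

```python
import math
import random


def base_count(L_c, L_max, N_max=100):
    """Floor term of the N_c formula, evaluated as floor(N_max * L_c / L_max)."""
    return math.floor(N_max * L_c / L_max)


rng = random.Random(0)  # fixed seed, only to make the sketch repeatable
L_max = 350.0           # assumed largest demand (consumer 701-3)
counts = {}
for L_c in (140.0, 85.0):
    offset = rng.randint(10, 20)  # both end points included
    counts[L_c] = base_count(L_c, L_max) + offset
    print(L_c, base_count(L_c, L_max), counts[L_c])
```

The floor term is 40 for a 140 kW consumer and 24 for an 85 kW consumer, so their appliance counts fall in the ranges 50 to 60 and 34 to 44 respectively.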

### <a id="37ntf-b--part-1"></a>Part 1

In [1]:
import matplotlib, numpy, openpyxl, pandas

In [2]:
M_max = 100  # No. of appliances of the consumer with the largest demand
P_c = 5      # No. of priority levels per consumer

In [3]:
# Raw ported values
IEEE_37NTF_df = pandas.read_excel("./Ported Load Values.xlsx", sheet_name="IEEE 37-Node Test Feeder")

In [4]:
# Remove kVAr info
IEEE_37NTF_df = IEEE_37NTF_df.drop(columns=["kVAr"])

In [5]:
# Get consumer kW
L_c = IEEE_37NTF_df["kW"].to_numpy(dtype=numpy.float32)
L_max, index_L_max = L_c.max(), L_c.argmax()

In [6]:
# No. of appliances in each consumer
# Note: numpy.random.randint excludes the upper bound, so the offset is drawn from {10, ..., 19}
M_c = numpy.floor(M_max * L_c / L_max).astype(numpy.int64) + numpy.random.randint(10,20,L_c.shape)
M_c[index_L_max] = M_max

IEEE_37NTF_df["No. of appliances"] = M_c

In [7]:
# Add no. of priority levels for each consumer
IEEE_37NTF_df["No. of priority levels"] = numpy.array([P_c for i in range(len(IEEE_37NTF_df["Consumer ID"]))])

In [8]:
IEEE_37NTF_df.head()

| | Consumer ID | kW | No. of appliances | No. of priority levels |
|---|---|---|---|---|
| 0 | 701-1 | 140 | 53 | 5 |
| 1 | 701-2 | 140 | 58 | 5 |
| 2 | 701-3 | 350 | 100 | 5 |
| 3 | 712-3 | 85 | 42 | 5 |
| 4 | 713-3 | 85 | 39 | 5 |


In [9]:
# For record keeping (08 October 2020)
workbook_path = "./37NTF-B.xlsx"
writer = pandas.ExcelWriter(workbook_path, engine="openpyxl")

IEEE_37NTF_df.to_excel(writer, sheet_name="Overview", index=False)
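# close() writes the workbook to disk; a separate save() call is redundant
# (and no longer available in pandas 2.x)
writer.close()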

### <a id="37ntf-b--part-2"></a>Part 2

In [1]:
import matplotlib, numpy, openpyxl, pandas
from matplotlib import pyplot

In [2]:
workbook_path = "./37NTF-B.xlsx"
# mode="a" opens the existing workbook so that new sheets are appended to it
writer = pandas.ExcelWriter(workbook_path, engine="openpyxl", mode="a")

IEEE_37NTF_df = pandas.read_excel("./37NTF-B.xlsx", sheet_name="Overview")
IEEE_37NTF_df.head()

| | Consumer ID | kW | No. of appliances | No. of priority levels |
|---|---|---|---|---|
| 0 | 701-1 | 140 | 53 | 5 |
| 1 | 701-2 | 140 | 58 | 5 |
| 2 | 701-3 | 350 | 100 | 5 |
| 3 | 712-3 | 85 | 42 | 5 |
| 4 | 713-3 | 85 | 39 | 5 |


In [3]:
# Partition consumer load into appliance ratings
for i in IEEE_37NTF_df.index:
    rand_points = numpy.random.uniform(0.05, 1.0, size=(IEEE_37NTF_df["No. of appliances"][i]-1,))
    # Just to make sure that no values are repeated
    while not (rand_points.shape[0] == numpy.unique(rand_points).shape[0]):
        rand_points = numpy.random.uniform(0.05, 1.0, size=(IEEE_37NTF_df["No. of appliances"][i]-1,))
    
    rand_points.sort()
    rand_points = numpy.append(numpy.array([0.0]), rand_points)
    rand_points = numpy.append(rand_points, numpy.array([1.0]))
    
    # Appliance ratings: lengths of the sub-intervals between consecutive points
    d_c = numpy.diff(rand_points)
    d_c = d_c * IEEE_37NTF_df["kW"][i]
    
    # Round appliance ratings to 5 decimal places
    d_c = numpy.around(d_c, decimals=5)
        
    # Just to make sure the sum of all appliance ratings equals the consumer rated load up to numerical precision
    deficit = float(IEEE_37NTF_df["kW"][i]) - numpy.sum(d_c)
    index_slack_appliance = numpy.random.randint(0, IEEE_37NTF_df["No. of appliances"][i])
    d_c[index_slack_appliance] += deficit
    
    # Assign priority levels to the appliances
    priority_levels = numpy.random.randint(1, IEEE_37NTF_df["No. of priority levels"][i]+1, size=d_c.shape)
    
    # Group appliances according to priority levels
    prioritized_d_c = []
    for p in range(1, IEEE_37NTF_df["No. of priority levels"][i]+1):
        priority_level_masks = priority_levels == p
        prioritized_d_c.append(d_c[priority_level_masks])
    
    # Construct dedicated DataFrame for a consumer
    consumer_df = pandas.concat([
        pandas.DataFrame({"Priority level "+str(p) : prioritized_d_c[p-1]}) \
        for p in range(1, IEEE_37NTF_df["No. of priority levels"][i]+1)
    ], axis=1)
    
    # Export consumer DataFrame as a dedicated sheet in spreadsheet
    consumer_df.to_excel(writer, sheet_name=IEEE_37NTF_df["Consumer ID"][i], index=False)

In [4]:
# For record keeping (08 October 2020)

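# close() writes the workbook to disk; a separate save() call is redundant
# (and no longer available in pandas 2.x)
writer.close()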

---

## <a id="37ntf-c"></a>37NTF-C

- all consumers have $N_{c} = 100$ appliances
- For a consumer $c$, $P_{c}$ is drawn from $U_{d} [1,5]$.
- dataset saved in `./37NTF-C.xlsx`
    - Consumer IDs and their respective $L_{c}$, $N_{c}$, and $P_{c}$ can be accessed in the sheet named `Overview`.
    - For a consumer $c$, the ratings of appliances belonging to different priority levels are stored in a sheet named after that consumer's ID.
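
The two-stage draw used for this dataset can be sketched as follows: first one $P_{c}$ per consumer, then one priority level per appliance. The consumer IDs are the first rows of the `Overview` sheet. The dictionary names and the seed are assumptions made for the example, and Python's `random.randint` (both end points included) stands in for `numpy.random.randint` (upper bound excluded).

```python
import random

rng = random.Random(0)  # fixed seed, only to make the sketch repeatable
consumer_ids = ["701-1", "701-2", "701-3", "712-3", "713-3"]
N_c = 100  # appliances per consumer in this dataset

# Stage 1: one P_c per consumer, drawn from {1, ..., 5}.
P = {cid: rng.randint(1, 5) for cid in consumer_ids}

# Stage 2: each appliance gets a priority level in {1, ..., P_c}.
priorities = {
    cid: [rng.randint(1, P[cid]) for _ in range(N_c)] for cid in consumer_ids
}

for cid in consumer_ids:
    print(cid, P[cid], sorted(set(priorities[cid])))
```

A consumer that draws $P_{c} = 1$ has all of its appliances in a single priority level, so its sheet contains only one column.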

### <a id="37ntf-c--part-1"></a>Part 1

In [1]:
import matplotlib, numpy, openpyxl, pandas

In [2]:
M_c_per_consumer = 100  # No. of appliances per consumer (N_c in the text)

In [3]:
# Raw ported values
IEEE_37NTF_df = pandas.read_excel("./Ported Load Values.xlsx", sheet_name="IEEE 37-Node Test Feeder")

In [4]:
# Remove kVAr info
IEEE_37NTF_df = IEEE_37NTF_df.drop(columns=["kVAr"])

In [5]:
# Get consumer kW
L_c = IEEE_37NTF_df["kW"].to_numpy(dtype=numpy.float32)

In [6]:
# No. of appliances in each consumer
M_c = numpy.zeros(L_c.shape) + M_c_per_consumer

IEEE_37NTF_df["No. of appliances"] = M_c.astype(numpy.int64)

In [7]:
# Add no. of priority levels for each consumer
IEEE_37NTF_df["No. of priority levels"] = numpy.random.randint(1,6,(len(IEEE_37NTF_df["Consumer ID"]),))

In [8]:
IEEE_37NTF_df.head()

| | Consumer ID | kW | No. of appliances | No. of priority levels |
|---|---|---|---|---|
| 0 | 701-1 | 140 | 100 | 2 |
| 1 | 701-2 | 140 | 100 | 2 |
| 2 | 701-3 | 350 | 100 | 5 |
| 3 | 712-3 | 85 | 100 | 2 |
| 4 | 713-3 | 85 | 100 | 1 |


In [9]:
# For record keeping (11 October 2020)
workbook_path = "./37NTF-C.xlsx"
writer = pandas.ExcelWriter(workbook_path, engine="openpyxl")

IEEE_37NTF_df.to_excel(writer, sheet_name="Overview", index=False)
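# close() writes the workbook to disk; a separate save() call is redundant
# (and no longer available in pandas 2.x)
writer.close()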

### <a id="37ntf-c--part-2"></a>Part 2

In [1]:
import matplotlib, numpy, openpyxl, pandas
from matplotlib import pyplot

In [2]:
workbook_path = "./37NTF-C.xlsx"
# mode="a" opens the existing workbook so that new sheets are appended to it
writer = pandas.ExcelWriter(workbook_path, engine="openpyxl", mode="a")

IEEE_37NTF_df = pandas.read_excel("./37NTF-C.xlsx", sheet_name="Overview")
IEEE_37NTF_df.head()

| | Consumer ID | kW | No. of appliances | No. of priority levels |
|---|---|---|---|---|
| 0 | 701-1 | 140 | 100 | 2 |
| 1 | 701-2 | 140 | 100 | 2 |
| 2 | 701-3 | 350 | 100 | 5 |
| 3 | 712-3 | 85 | 100 | 2 |
| 4 | 713-3 | 85 | 100 | 1 |


In [3]:
# Partition consumer load into appliance ratings
for i in IEEE_37NTF_df.index:
    rand_points = numpy.random.uniform(0.05, 1.0, size=(IEEE_37NTF_df["No. of appliances"][i]-1,))
    # Just to make sure that no values are repeated
    while not (rand_points.shape[0] == numpy.unique(rand_points).shape[0]):
        rand_points = numpy.random.uniform(0.05, 1.0, size=(IEEE_37NTF_df["No. of appliances"][i]-1,))
    
    rand_points.sort()
    rand_points = numpy.append(numpy.array([0.0]), rand_points)
    rand_points = numpy.append(rand_points, numpy.array([1.0]))
    
    # Appliance ratings: lengths of the sub-intervals between consecutive points
    d_c = numpy.diff(rand_points)
    d_c = d_c * IEEE_37NTF_df["kW"][i]
    
    # Round appliance ratings to 5 decimal places
    d_c = numpy.around(d_c, decimals=5)
        
    # Just to make sure the sum of all appliance ratings equals the consumer rated load up to numerical precision
    deficit = float(IEEE_37NTF_df["kW"][i]) - numpy.sum(d_c)
    index_slack_appliance = numpy.random.randint(0, IEEE_37NTF_df["No. of appliances"][i])
    d_c[index_slack_appliance] += deficit
    
    # Assign priority levels to the appliances
    priority_levels = numpy.random.randint(1, IEEE_37NTF_df["No. of priority levels"][i]+1, size=d_c.shape)
    
    # Group appliances according to priority levels
    prioritized_d_c = []
    for p in range(1, IEEE_37NTF_df["No. of priority levels"][i]+1):
        priority_level_masks = priority_levels == p
        prioritized_d_c.append(d_c[priority_level_masks])
    
    # Construct dedicated DataFrame for a consumer
    consumer_df = pandas.concat([
        pandas.DataFrame({"Priority level "+str(p) : prioritized_d_c[p-1]}) \
        for p in range(1, IEEE_37NTF_df["No. of priority levels"][i]+1)
    ], axis=1)
    
    # Export consumer DataFrame as a dedicated sheet in spreadsheet
    consumer_df.to_excel(writer, sheet_name=IEEE_37NTF_df["Consumer ID"][i], index=False)

In [4]:
# For record keeping (11 October 2020)

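# close() writes the workbook to disk; a separate save() call is redundant
# (and no longer available in pandas 2.x)
writer.close()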

---