# Dataset Generation Parameters Glossary
**THIS IS A GLOSSARY MEANT TO BE A REFERENCE, THE CODE CELLS ARE NOT MEANT TO BE EXECUTED**

Each sample to be fed to the Neural Network model is comprised of three elements, each contained in its own individual file:

- **Graph topology**: Represents a graph topology, including the nodes and edges that form it as well as the characteristics of each.
- **Routing file**: Shows the recognized paths between each pair of nodes within the graph topology.
- **Traffic matrix (TM)**: Represents traffic flows going through a given network.

Each sample will be identified by a tuple of these three elements. This means we can generate multiple samples from the same graph topology if it's paired with different traffic matrices, for example.

In this notebook, we show how to generate these files and how their properties can be altered to produce different samples. A quick summary of all the constraints that apply when generating the training dataset can be found in the [training_dataset_constraints](training_dataset_constraints.md) markdown file.

In [1]:
import networkx as nx
import random

In [2]:
# Generate, for instance, a complete graph
G = nx.complete_graph(10)

# Assign a bandwidth to each edge of the graph. The value is expressed in bps.
for n0, n1 in G.edges():
    G[n0][n1]["bandwidth"] = 100000

Each node is defined by two characteristics:
- **Scheduling policy**. The order in which packets are served in an output port is based on the state of queues and the configured queue scheduling policy. We consider the following four policies:
    - *First In First Out (FIFO)*: a single queue shared by all packets, independently of their Type of Service (ToS).
    - *Strict Priority (SP)*: one queue for each ToS (total of 3), where packets in higher-priority queues are transmitted first.
    - *Weighted Fair Queueing (WFQ)*: one queue for each ToS (total of 3). Each queue is assigned a weight by the configuration. *The sum of weights must equal 100*. Each time, the policy chooses a queue according to its weight and the data rate of the queue, in order to achieve fairness.
    - *Deficit Round Robin (DRR)*: one queue for each ToS (total of 3). Each queue is assigned a weight by the configuration. *The sum of weights must equal 100*. The policy cycles through the queues, and the amount of time dedicated to each queue is proportional to its weight.
- **Buffer size**: the size of the buffer at the output ports of nodes, where packets are stored before they are processed. When a packet is received and its outgoing queue is full, the packet is dropped. The buffer size is expressed in bits and its value should be between 8000 and 64000 bits.

We must define these two characteristics on all nodes of the topology. The scheduling policy of a node is stored in the attribute ```schedulingPolicy``` as a string. This is shown as follows:


In [3]:
# Let's configure all the nodes with a FIFO policy
for node in G:
    G.nodes[node]["schedulingPolicy"] = "FIFO"

In [None]:
# Let's configure all the nodes with an SP policy
for node in G:
    G.nodes[node]["schedulingPolicy"] = "SP"

For the WFQ and DRR policies, we also need to specify the weight of each queue through the attribute ```schedulingWeights```. It takes a string containing the weights of the queues handling ToS 0, 1, and 2, respectively, separated by commas:

In [8]:
# Let's configure all the nodes with a WFQ policy
for node in G:
    G.nodes[node]["schedulingPolicy"] = "WFQ"
    G.nodes[node]["schedulingWeights"] = "45, 30, 25"

In [None]:
# Let's configure all the nodes with a DRR policy
for node in G:
    G.nodes[node]["schedulingPolicy"] = "DRR"
    G.nodes[node]["schedulingWeights"] = "45, 30, 25"
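
Because the weights are passed as a plain string, a typo can silently break the sum-to-100 rule. The following is a minimal sketch of a sanity check. The helper name `parse_scheduling_weights` is hypothetical and is not part of the dataset tooling:

```python
def parse_scheduling_weights(weights, num_queues=3):
    """Parse a comma-separated weights string and check that it sums to 100.

    Hypothetical helper, for illustration only.
    """
    values = [float(w) for w in weights.split(",")]
    if len(values) != num_queues:
        raise ValueError(f"expected {num_queues} weights, got {len(values)}")
    if abs(sum(values) - 100) > 1e-9:
        raise ValueError(f"weights sum to {sum(values)}, expected 100")
    return values


print(parse_scheduling_weights("45, 30, 25"))  # [45.0, 30.0, 25.0]
```

With a networkx graph, the string to check would be `G.nodes[node]["schedulingWeights"]`.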

To configure the buffer size, we only need to set the attribute ```bufferSizes``` to the size of the queue in bits. As a reminder, its value should be between 8000 and 64000 bits.

In [4]:
# Assign to each node a queue size of 32000 bits
for node in G:
    G.nodes[node]["bufferSizes"] = 32000
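
Before saving the topology, it can help to check the node attributes against the constraints listed above. This is only a sketch: the helper `check_node_attributes` and the constant names are hypothetical, and the function works on a plain attribute dictionary so that it does not depend on networkx:

```python
VALID_POLICIES = {"FIFO", "SP", "WFQ", "DRR"}
MIN_BUFFER_BITS, MAX_BUFFER_BITS = 8000, 64000


def check_node_attributes(attrs):
    """Return a list of problems found in one node's attribute dict (hypothetical helper)."""
    problems = []
    policy = attrs.get("schedulingPolicy")
    if policy not in VALID_POLICIES:
        problems.append(f"unknown schedulingPolicy: {policy!r}")
    # WFQ and DRR also need the per-queue weights
    if policy in {"WFQ", "DRR"} and "schedulingWeights" not in attrs:
        problems.append("schedulingWeights is required for WFQ and DRR")
    size = attrs.get("bufferSizes")
    if size is None or not (MIN_BUFFER_BITS <= size <= MAX_BUFFER_BITS):
        problems.append(f"bufferSizes missing or out of range: {size!r}")
    return problems


print(check_node_attributes({"schedulingPolicy": "FIFO", "bufferSizes": 32000}))  # []
print(check_node_attributes({"schedulingPolicy": "WFQ", "bufferSizes": 4000}))
```

With a networkx graph, each node would be checked with `check_node_attributes(G.nodes[node])`.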

In [5]:
# Finally we save the topology
graph_file = "graph.txt"
nx.write_gml(G,graph_file)

## Routing
The routing is expressed as a text file where each line represents a path as a comma-separated sequence of nodes.
Destination-based and source-destination-based routing can both be used, but the paths must not contain loops.

In [6]:
# For instance, we can use networkx to calculate the shortest path routing for each src-dst pair.
with open("routing.txt","w") as r_fd:
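    # dict() keeps the lookup below working if shortest_path returns an iterator of (node, paths) pairs
    lPaths = dict(nx.shortest_path(G))
    for src in G:
        for dst in G:
            if src == dst:
                continue
            path = ",".join(str(x) for x in lPaths[src][dst])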
            r_fd.write(path+"\n")
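
Shortest paths never contain loops, but hand-written or custom routing files might. The sketch below checks the lines of a routing file for repeated nodes and for hops that do not correspond to a link. The helper `find_routing_problems` is hypothetical, and it assumes links are undirected and that node names are compared as strings:

```python
def find_routing_problems(lines, edges):
    """Return (line_number, message) pairs for invalid routing lines.

    `edges` is a set of (u, v) pairs of node names as strings (assumed format).
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        path = line.strip().split(",")
        # A loop means that some node is visited more than once
        if len(set(path)) != len(path):
            problems.append((number, "loop: a node appears more than once"))
        # Every consecutive pair of nodes must be joined by a link
        for u, v in zip(path, path[1:]):
            if (u, v) not in edges and (v, u) not in edges:
                problems.append((number, f"no link between {u} and {v}"))
    return problems


edges = {("0", "1"), ("1", "2")}
for problem in find_routing_problems(["0,1,2", "0,1,0", "0,2"], edges):
    print(problem)
# (2, 'loop: a node appears more than once')
# (3, 'no link between 0 and 2')
```

For a networkx graph, the edge set could be built as `{(str(u), str(v)) for u, v in G.edges()}`.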
        

## Traffic Matrix
The final step is to generate the traffic matrix (TM). Each line of the TM file describes one traffic flow between two nodes. These lines are formed by a set of parameters separated by commas as follows:

```source, destination, avg_bw, time_distribution, [off_time, on_time,] pkt_dist, pkt_size_1, prob_1, [pkt_size_2, prob_2, [pkt_size_3, prob_3, [pkt_size_4, prob_4, [pkt_size_5, prob_5,]]]] tos```

Here the brackets indicate optional parameters. 

The ```source``` and ```destination``` parameters indicate the source and destination nodes for the given flow. Note that only one flow is allowed per source-destination pair in the input topology.

The ```avg_bw``` parameter indicates the average bandwidth, in bps, to be generated for this flow. Its value must be between 10 and 10000 bps.

The next set of parameters is ```pkt_dist```, ```pkt_size_n``` and ```prob_n```. They indicate the possible packet sizes and their relative frequency within the flow. ```pkt_dist``` specifies the distribution type used to generate the packets. **Note: currently only one distribution is supported, so the value of ```pkt_dist``` should always be ```0```.**

Then, the ```pkt_size_n``` and ```prob_n``` properties are used to indicate a packet size, in bits, and its relative probability with respect to the other sizes. At least one packet size must be declared, but we can define up to 5 different sizes. The packet size should be a value between 256 and 2000 bits while the sum of all the ```prob_n``` values should equal 1.
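
As an illustration of these constraints, the sketch below validates a list of (size, probability) pairs and computes the resulting mean packet size. The helper `check_packet_sizes` and the constant names are hypothetical:

```python
import math

MIN_PKT_BITS, MAX_PKT_BITS = 256, 2000


def check_packet_sizes(sizes_probs):
    """Validate (pkt_size, prob) pairs and return the mean packet size in bits."""
    if not 1 <= len(sizes_probs) <= 5:
        raise ValueError("between 1 and 5 packet sizes are allowed")
    for size, _ in sizes_probs:
        if not MIN_PKT_BITS <= size <= MAX_PKT_BITS:
            raise ValueError(f"packet size {size} is outside [256, 2000] bits")
    total = sum(prob for _, prob in sizes_probs)
    if not math.isclose(total, 1.0):
        raise ValueError(f"probabilities sum to {total}, expected 1")
    return sum(size * prob for size, prob in sizes_probs)


print(check_packet_sizes([(300, 0.5), (1700, 0.5)]))  # 1000.0
```

Dividing ```avg_bw``` by this mean gives a rough idea of the average number of packets per second in the flow, for example about 5 packets per second for 5000 bps and a mean size of 1000 bits.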

The ```time_distribution``` parameter indicates how often packets should be generated over time. We support three time distributions:
- *Poisson* (```time_distribution```=0): packets are generated following a Poisson distribution
- *CBR* (```time_distribution```=1): packets are generated following a Constant Bit Rate model
- *ON-OFF* (```time_distribution```=2): packets are generated following periods of activity and inactivity

We do *NOT* need to define any additional parameters for the Poisson and CBR distributions, as the packets are generated from the chosen packet size distribution and the average bandwidth defined earlier. When using the ON-OFF distribution, we also need to define the length of the activity and inactivity periods (```on_time``` and ```off_time```, respectively). Note that ```off_time``` comes before ```on_time``` in the line.

Finally, ```tos``` indicates the ToS assigned to the packets generated for this flow, with values of 0, 1 or 2.
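
To make the role of the optional fields concrete, here is a sketch of a function that assembles one TM line in the field order given above. The helper `format_flow_line` and its argument names are hypothetical. It only inserts ```off_time``` and ```on_time``` for ON-OFF flows:

```python
def format_flow_line(src, dst, avg_bw, time_dist, sizes_probs, tos,
                     on_time=None, off_time=None):
    """Build one traffic-matrix line (hypothetical helper).

    sizes_probs is a list of (pkt_size, prob) pairs.
    """
    fields = [src, dst, avg_bw, time_dist]
    if time_dist == 2:  # ON-OFF: off_time comes first, then on_time
        if on_time is None or off_time is None:
            raise ValueError("ON-OFF flows need both on_time and off_time")
        fields += [off_time, on_time]
    fields.append(0)  # pkt_dist: only distribution 0 is supported
    for size, prob in sizes_probs:
        fields += [size, prob]
    fields.append(tos)
    return ",".join(str(f) for f in fields)


# Poisson flow with a single packet size
print(format_flow_line(0, 1, 5000, 0, [(1000, 1)], tos=1))
# 0,1,5000,0,0,1000,1,1

# ON-OFF flow with two packet sizes
print(format_flow_line(0, 1, 5000, 2, [(300, 0.5), (1700, 0.5)], tos=0,
                       on_time=5, off_time=10))
# 0,1,5000,2,10,5,0,300,0.5,1700,0.5,0
```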


In [7]:
"""
Example: this code will generate flows between all pairs of distinct nodes in the graph, such that:
- The average bandwidth is randomized between 10 and 10000 bps
- An ON-OFF time distribution is used, with an on_time of 5 s and an off_time of 10 s
- Packets can have two possible sizes, 300 and 1700 bits, both equally probable
- The ToS for all flows is 0 (high priority)
"""
with open("traffic.txt","w") as tm_fd:
    for src in G:
        for dst in G:
            if src == dst:
                continue  # a flow needs two distinct endpoints
            avg_bw = random.randint(10,10000)
            time_dist = 2
            on_time = 5
            off_time = 10
            pkt_size_1 = 300
            prob_1 = 0.5
            pkt_size_2 = 1700
            prob_2 = 0.5
            tos = 0
            traffic_line = "{},{},{},{},{},{},0,{},{},{},{},{}".format(
                src,dst,avg_bw,time_dist,off_time,on_time,pkt_size_1,
                prob_1,pkt_size_2,prob_2,tos)
            tm_fd.write(traffic_line+"\n")
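
Going the other way, a line of the TM file can be split back into named fields, which is handy when inspecting a generated file. The parser below is a sketch under the format described in this section. The function `parse_flow_line` is hypothetical, and it assumes packet sizes are written as integers:

```python
def parse_flow_line(line):
    """Parse one traffic-matrix line into a dict (hypothetical helper)."""
    fields = line.strip().split(",")
    flow = {
        "source": fields[0],
        "destination": fields[1],
        "avg_bw": float(fields[2]),
        "time_distribution": int(fields[3]),
    }
    pos = 4
    if flow["time_distribution"] == 2:  # ON-OFF carries off_time, then on_time
        flow["off_time"] = float(fields[4])
        flow["on_time"] = float(fields[5])
        pos = 6
    flow["pkt_dist"] = int(fields[pos])
    flow["tos"] = int(fields[-1])
    # Everything between pkt_dist and tos is a run of (size, prob) pairs
    pairs = fields[pos + 1:-1]
    flow["packets"] = [(int(s), float(p)) for s, p in zip(pairs[0::2], pairs[1::2])]
    return flow


flow = parse_flow_line("0,1,5000,2,10,5,0,300,0.5,1700,0.5,0")
print(flow["off_time"], flow["on_time"], flow["packets"], flow["tos"])
# 10.0 5.0 [(300, 0.5), (1700, 0.5)] 0
```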
