## Example Solution Walkthrough

The following example will take you through the steps to generate a dataset for your network simulation. We recommend using this as a template for your own implementation and customizing it to suit your specific needs.

### Dataset Generation
To generate the dataset, we first define the graph topology, routing paths, and traffic matrix for each sample. The simulator uses these inputs to calculate the delay, jitter, and drops for each path.

We start with the graph topology: the nodes and edges that make up the graph, plus the scheduling policy and buffer size of each node. We then create a routing file that defines the paths between the nodes in the topology.

Next, we will generate the traffic matrix, which includes information on the source and destination nodes, average bandwidth, time distribution, packet size and frequency, and ToS for each flow.

Once we have defined these parameters, we can run the simulation and collect performance metrics such as delay, jitter, and drops for each path.

If you need more information on the parameters of the dataset, check out the [input_parameters_glossary.ipynb](input_parameters_glossary.ipynb) notebook, which provides a detailed explanation of each parameter.

In [None]:
import networkx as nx
import random
import os

In [None]:
# Define destination for the generated samples
training_dataset_path = "training"
# Paths relative to the data folder
graphs_path = "graphs"
routings_path = "routings"
tm_path = "tm"
# Path to simulator file
simulation_file = os.path.join(training_dataset_path,"simulation.txt")
# Name of the dataset: Allows you to store several datasets in the same path
# Each dataset will be stored at <training_dataset_path>/results/<name>
dataset_name = "dataset-1"

In [None]:
# Create folders
if os.path.isdir(training_dataset_path):
    print("Destination path already exists. Files within the directory may be overwritten.")
# exist_ok=True also creates any subfolder missing from an existing destination
for folder in (graphs_path, routings_path, tm_path):
    os.makedirs(os.path.join(training_dataset_path, folder), exist_ok=True)

In [None]:
'''
Generate a graph topology file. The graphs generated have the following characteristics:
- The network is able to process 3 ToS: 0,1,2
- All nodes have buffer sizes of 32000 bits and WFQ scheduling. ToS 0 is assigned to first queue, and ToS 1 and 2 to second queue.
- All links have bandwidths of 100000 bits per second
'''
def generate_topology(net_size, graph_file):
    G = nx.Graph()

    # Set the number of ToS levels used by the input traffic of the network
    G.graph["levelsToS"] = 3

    nodes = []
    node_degree = []
    for n in range(net_size):
        # Draw the target degree of the node from a weighted distribution
        node_degree.append(random.choices([2,3,4,5,6],weights=[0.34,0.35,0.2,0.1,0.01])[0])

        nodes.append(n)
        G.add_node(n)
        # Assign the scheduling policy to each node
        G.nodes[n]["schedulingPolicy"] = "WFQ"
        # Assign ToS to scheduling queues.
        # In this case we have two queues per port. ToS 0 is assigned to the first queue and ToS 1 and 2 to the second queue
        G.nodes[n]["tosToQoSqueue"] = "0;1,2"
        # Assign weights to each queue
        G.nodes[n]["schedulingWeights"] = "60, 40"
        # Assign the buffer size of all the ports of the node
        G.nodes[n]["bufferSizes"] = 32000

    finish = False
    while True:
        aux_nodes = list(nodes)
        n0 = random.choice(aux_nodes)
        aux_nodes.remove(n0)
        # Remove adjacent nodes (only one link between two nodes)
        for n1 in G[n0]:
            if n1 in aux_nodes:
                aux_nodes.remove(n1)
        if len(aux_nodes) == 0:
            # No more links can be added to this node, so its target node_degree cannot be reached
            nodes.remove(n0)
            if len(nodes) == 1:
                break
            continue
        n1 = random.choice(aux_nodes)
        G.add_edge(n0, n1)
        # Assign the link capacity to the link
        G[n0][n1]["bandwidth"] = 100000
        
        for n in [n0,n1]:
            node_degree[n] -= 1
            if (node_degree[n] == 0):
                nodes.remove(n)
                if (len(nodes) == 1):
                    finish = True
                    break
        if finish:
            break
    if not nx.is_connected(G):
        G = generate_topology(net_size, graph_file)
        return G
    
    nx.write_gml(G,graph_file)
    
    return G
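
The degrees drawn in `generate_topology` are targets rather than guarantees: a node leaves the candidate pool once its counter reaches zero, but the loop can also stop before every node has reached its target. As a rough sanity check, the expected target degree under the weights used above can be computed directly. This is a minimal sketch; `expected_degree` is a hypothetical helper that is not part of the notebook.

```python
# Degrees and weights copied from generate_topology above
degrees = [2, 3, 4, 5, 6]
weights = [0.34, 0.35, 0.2, 0.1, 0.01]

def expected_degree(degrees, weights):
    """Weighted mean of the target node degree (hypothetical helper)."""
    total = sum(weights)
    return sum(d * w for d, w in zip(degrees, weights)) / total

mean_degree = expected_degree(degrees, weights)
print(f"Expected target degree: {mean_degree:.2f}")
```

For the smallest topologies in this walkthrough (6 nodes), a target degree of 6 can never be met, since a node has at most 5 possible neighbours.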

In [None]:
'''
Generate a file with the shortest path routing of the topology G
'''
def generate_routing(G, routing_file):
    with open(routing_file,"w") as r_fd:
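        # dict() keeps this working if the installed NetworkX version returns
        # an iterator of (node, paths) pairs instead of a dictionary
        lPaths = dict(nx.shortest_path(G))
        for src in G:
            for dst in G:
                if src == dst:
                    continue
                path = ','.join(str(x) for x in lPaths[src][dst])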
                r_fd.write(path+"\n")
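
Each line written by `generate_routing` is a comma-separated list of node ids running from source to destination. One way to sanity-check a routing file is to read it back and confirm that every hop is an existing link. The sketch below assumes that line format; `parse_routing` and `invalid_paths` are hypothetical helpers, and the edge list is a toy example rather than a generated topology.

```python
def parse_routing(lines):
    """Map (src, dst) to the list of node ids on the path."""
    routing = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        path = [int(x) for x in line.split(",")]
        routing[(path[0], path[-1])] = path
    return routing

def invalid_paths(routing, edges):
    """Return the (src, dst) pairs whose path uses a non-existent link."""
    links = {frozenset(e) for e in edges}  # links are undirected
    bad = []
    for pair, path in routing.items():
        hops = zip(path, path[1:])
        if any(frozenset(h) not in links for h in hops):
            bad.append(pair)
    return bad

# Toy 3-node line topology: 0 - 1 - 2
edges = [(0, 1), (1, 2)]
lines = ["0,1", "0,1,2", "1,0", "1,2", "2,1,0", "2,1"]
routing = parse_routing(lines)
bad_pairs = invalid_paths(routing, edges)
print(len(routing), bad_pairs)
```

With a real dataset, the lines would come from the routing file and the edges from `G.edges`.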

In [None]:
'''
Generate a traffic matrix file. We consider flows between all nodes in the network, each with the following characteristics:
- The average bandwidth ranges between 10 and max_avg_lbda
- We consider three time distributions: Poisson, CBR, and ON-OFF (for ON-OFF, the average on period is 10 and the average off period is 5)
- We consider two packet size distributions, chosen at random
- ToS is assigned randomly
'''
def generate_tm(G, max_avg_lbda, traffic_file):
    poisson = "0" 
    cbr = "1"
    on_off = "2,10,5" #time_distribution, avg on_time exp, avg off_time exp
    time_dist = [poisson,cbr,on_off]
    
    pkt_dist_1 = "0,300,0.5,1700,0.5" # generic pkt size dist, pkt_size 1, prob 1, pkt_size 2, prob 2
    pkt_dist_2 = "0,500,0.6,1000,0.2,1400,0.2" # generic pkt size dist, pkt_size 1, prob 1,
                                               # pkt_size 2, prob 2, pkt_size 3, prob 3
    pkt_size_dist = [pkt_dist_1, pkt_dist_2]
    tos_lst = [0,1,2]
    
    with open(traffic_file,"w") as tm_fd:
        for src in G:
            for dst in G:
                avg_bw = random.randint(10,max_avg_lbda)
                td = random.choice(time_dist)
                sd = random.choice(pkt_size_dist)
                tos = random.choice(tos_lst)
                
                traffic_line = "{},{},{},{},{},{}".format(
                    src,dst,avg_bw,td,sd,tos)
                tm_fd.write(traffic_line+"\n")
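
Because the time and packet size distributions have a variable number of parameters, the lines in a traffic matrix file do not all have the same number of fields. The sketch below shows one way to split a line back into named fields. It only handles the encodings written by `generate_tm` above (time distributions `0`, `1`, and `2,on,off`, and the generic packet size distribution); the simulator may accept other encodings that this does not cover. `parse_tm_line` is a hypothetical helper.

```python
def parse_tm_line(line):
    """Split one traffic-matrix line into named fields (hypothetical helper)."""
    f = line.strip().split(",")
    flow = {"src": int(f[0]), "dst": int(f[1]), "avg_bw": int(f[2]),
            "tos": int(f[-1])}
    time_type = f[3]
    # The ON-OFF distribution ("2") carries two extra values: avg on/off time
    n_time_params = 2 if time_type == "2" else 0
    flow["time_dist"] = time_type
    flow["time_params"] = [float(x) for x in f[4:4 + n_time_params]]
    pkt = f[4 + n_time_params:-1]
    flow["pkt_dist"] = pkt[0]
    # The remaining values alternate between packet size and probability
    sizes = [int(x) for x in pkt[1::2]]
    probs = [float(x) for x in pkt[2::2]]
    flow["pkt_sizes"] = list(zip(sizes, probs))
    return flow

flow = parse_tm_line("0,3,512,2,10,5,0,300,0.5,1700,0.5,1")
print(flow["time_params"], flow["pkt_sizes"], flow["tos"])
```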

In [None]:
"""
We generate the files using the previously defined functions. This code will produce 100 samples where:
- We generate 5 topologies, and then we generate 20 traffic matrices for each
- The topology sizes range from 6 to 10 nodes
- We consider the maximum average bandwidth per flow as 1000
"""
max_avg_lbda = 1000
with open (simulation_file,"w") as fd:
    for net_size in range (6,11):
        #Generate graph
        graph_file = os.path.join(graphs_path,"graph_{}.txt".format(net_size))
        G = generate_topology(net_size, os.path.join(training_dataset_path,graph_file))
        # Generate routing
        routing_file = os.path.join(routings_path,"routing_{}.txt".format(net_size))
        generate_routing(G, os.path.join(training_dataset_path,routing_file))
        # Generate TM:
        for i in range (20):
            tm_file = os.path.join(tm_path,"tm_{}_{}.txt".format(net_size,i))
            generate_tm(G,max_avg_lbda, os.path.join(training_dataset_path,tm_file))
            sim_line = "{},{},{}\n".format(graph_file,routing_file,tm_file)   
            # If dataset has been generated in windows, convert paths into linux format
            fd.write(sim_line.replace("\\","/"))  
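
Each line of `simulation.txt` lists the graph, routing, and traffic matrix files for one sample, relative to the data folder and written with forward slashes. Before launching the simulator, it can be worth confirming that the file has this shape. A minimal sketch, where `check_simulation_lines` is a hypothetical helper and the sample lines are made up to mirror the naming scheme above:

```python
def check_simulation_lines(lines):
    """Return a list of (line_number, problem) for malformed lines."""
    problems = []
    for i, line in enumerate(lines, start=1):
        parts = line.strip().split(",")
        if len(parts) != 3:
            problems.append((i, "expected 3 comma-separated paths"))
        elif any("\\" in p for p in parts):
            problems.append((i, "backslash in path"))
    return problems

good = ["graphs/graph_6.txt,routings/routing_6.txt,tm/tm_6_0.txt\n"]
bad = ["graphs\\graph_6.txt,routings\\routing_6.txt,tm\\tm_6_0.txt\n",
       "graphs/graph_6.txt,routings/routing_6.txt\n"]
print(check_simulation_lines(good))
print(check_simulation_lines(bad))
```

To check the generated file, pass the open file object instead of a list, since iterating over it yields one line at a time.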

Now that we have created the input files for the simulator, we are ready to run the simulation and collect the performance metrics. To do this, we will use a Docker image that contains all the necessary tools and dependencies.

The Docker image is hosted on Docker Hub, so the first time you run the "docker run" command, the image is downloaded automatically. You only need to make sure that your computer is connected to the internet.

Once the image is downloaded, you can use the "docker run" command to start the simulation and pass in the input files as parameters. The simulator will then use these input files to calculate the delay, jitter, and drops for each path.

Running the simulator inside a Docker container keeps the simulation environment consistent, regardless of the host machine's operating system and installed dependencies.

In [None]:
# First we generate the configuration file
import yaml

conf_file = os.path.join(training_dataset_path,"conf.yml")
conf_parameters = {
    "threads": 6,# Number of threads to use 
    "dataset_name": dataset_name, # Name of the dataset. It is created in <training_dataset_path>/results/<name>
    "samples_per_file": 10, # Number of samples per compressed file
    "rm_prev_results": "n", # If 'y' is selected and the results folder already exists, the folder is removed.
}

with open(conf_file, 'w') as fd:
    yaml.dump(conf_parameters, fd)

In [None]:
from getpass import getpass
def docker_cmd(training_dataset_path):
    raw_cmd = f"docker run --rm --mount type=bind,src={os.path.join(os.getcwd(),training_dataset_path)},dst=/data bnnupc/netsim:v0.1"
    terminal_cmd = raw_cmd
    if os.name != 'nt': # Unix, requires sudo
        print("Superuser privileges are required to run docker. Enter your sudo password when prompted")
        terminal_cmd = f"echo {getpass()} | sudo -S " + raw_cmd
        raw_cmd = "sudo " + raw_cmd
    return raw_cmd, terminal_cmd
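
An alternative to piping the password through `echo` is to pass the command to `subprocess.run` as an argument list, which avoids shell quoting problems when a path or password contains special characters. The sketch below only builds the argument list. It assumes the current user can run Docker without `sudo` (for example, through membership in the `docker` group), and `build_docker_args` is a hypothetical helper that is not part of the notebook.

```python
import os

def build_docker_args(dataset_dir, image="bnnupc/netsim:v0.1"):
    """Build the docker invocation as an argument list (no shell needed)."""
    src = os.path.abspath(dataset_dir)
    return [
        "docker", "run", "--rm",
        "--mount", f"type=bind,src={src},dst=/data",
        image,
    ]

args = build_docker_args("training")
print(" ".join(args))
# To launch the simulation, uncomment the two lines below:
# import subprocess
# subprocess.run(args, check=True)
```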

In [None]:
# Start the docker
raw_cmd, terminal_cmd = docker_cmd(training_dataset_path)
print("The next cell will launch docker from the notebook. Alternatively, run the following command from a terminal:")
print(raw_cmd)

The execution cell may not produce any output until the simulation has finished. In this case, you can check the status of the simulation using the logs feature in Docker Desktop.

To do this, go to the "Containers" section, select the container running the "bnnupc/netsim:v0.1" image, and click on "Logs". This gives you access to the log output, which contains information about the progress of the simulation.

The log file will contain one line per simulated sample, with the first value indicating the simulation line, and then "Ok" if the simulation finishes properly, or an error message if there were any issues. The log file is located at <training_dataset_path>/out.log.

It is recommended to regularly check the log file to ensure that the simulation is progressing as expected. This will help you catch any issues early on and take the necessary steps to resolve them.

You should also check the log file after the simulation has finished to confirm that no errors affected the results.

In [None]:
!{terminal_cmd}
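
Once the run has finished, the log can be scanned for failed samples. The text above describes each log line as the simulation line followed by "Ok" or an error message, but the exact delimiter is not documented here, so the sketch below assumes that a comma or whitespace separates the two parts. `failed_samples` is a hypothetical helper, and the sample lines are made up for illustration.

```python
import re

def failed_samples(log_lines):
    """Return (sample_id, message) for every line not marked "Ok"."""
    failures = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        # Assumption: the first token is the simulation line, the rest is the status
        parts = re.split(r"[,\s]+", line, maxsplit=1)
        status = parts[1].strip() if len(parts) > 1 else ""
        if status != "Ok":
            failures.append((parts[0], status))
    return failures

# Made-up log lines, for illustration only
example = ["0,Ok", "1,Ok", "2,Some error message"]
print(failed_samples(example))
```

To scan the real log, open `<training_dataset_path>/out.log` and pass the file object to `failed_samples`; adjust the split pattern if the actual delimiter differs.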