# FTTx Network visualizations - Part 1

## Sankey as connection flow visualization

As a continuation of my series of visualizations of GPON networks, today's post explores how to use a __Sankey visualization__ to plot the flow of interconnections between an OLT (Optical Line Terminal) and several ODFs (Optical Distribution Frames) in an FTTx installation. Sankey diagrams are a powerful tool for visualizing complex systems and processes, and they are particularly useful for showing the flow of energy, materials or information. By the end of this post, you will have a better understanding of how to use Sankey diagrams to visualize the flow of interconnections in your FTTx installation, and how to use them to identify areas for improvement.


The inspiration and main code for today's viz come from a post by __Mattia Cinelli__ on [How to do a Sankey Plot in Python](https://medium.com/analytics-vidhya/how-to-do-a-sankey-plot-in-python-5298869f5e8e). In that post, the author uses the __Plotly__ library for the visualizations.

For some context, the ODN (Optical Distribution Network) distributes the optical signal from the OLT PON port to the client's ONT (Optical Network Terminal), the device that lets the customer enjoy triple-play services at home. You can see a simple diagram [from Wikipedia](https://commons.wikimedia.org/wiki/File:GPON_topology.png):


![FTTx ODN simple diagram](GPON_topology_wikipedia.png "FTTx ODN simple diagram")


As seen in the image, the Inside Plant (__GPON Network__) and the Outside Plant (__FTTx Network__) have to "come together" somewhere. That interconnection is very important for both networks, and yet in some teams it is "no man's land".

The main goal of this article is to give you an overview of how smooth (or not) the interconnections between your two networks are, using a __Sankey Plot__ as a proxy to visualize it.


## The data needed

To make this visualization, I'll assume that you have the inventory information for each OLT and its PON port connections to an Outside Plant FTTx ODF in every CO (Central Office) of your network. With this, you'll be able to reproduce the visualizations presented here.

If you think about it, this information can be represented as __Edges__ in a __Graph__, with the OLT and the ODF as the __Nodes__ that each edge interconnects. So, to help us create a synthetic dataset, let's use the Python library __NetworkX__, which is very helpful when working with data that can be represented as a graph. You can check it out here: [NetworkX documentation](https://networkx.org/documentation/stable/index.html)
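As a minimal, dependency-free sketch of this graph view (the sample jumper records and the `summarize_graph` helper are hypothetical, not part of the original code), each jumper becomes one edge between an OLT node and an ODF node. Since the same two nodes can be joined by several jumpers, the structure is really a *multigraph* (NetworkX models that case with `nx.MultiGraph`), and counting those parallel edges is exactly what the Sankey plot will later turn into flow widths:

```python
from collections import Counter

# Hypothetical jumper records: (OLT hostname, ODF name, jumper ID).
jumpers = [
    ("OLT1", "ODF1", "J-001"),
    ("OLT1", "ODF1", "J-002"),
    ("OLT1", "ODF2", "J-003"),
    ("OLT2", "ODF2", "J-004"),
]

def summarize_graph(edges):
    """Return the OLT nodes, the ODF nodes and the number of parallel edges per (OLT, ODF) pair."""
    olts = sorted({olt for olt, _, _ in edges})
    odfs = sorted({odf for _, odf, _ in edges})
    # Several jumpers between the same OLT and ODF are parallel edges of a multigraph.
    parallel = Counter((olt, odf) for olt, odf, _ in edges)
    return olts, odfs, parallel

olts, odfs, parallel = summarize_graph(jumpers)
print(olts)                        # ['OLT1', 'OLT2']
print(odfs)                        # ['ODF1', 'ODF2']
print(parallel[("OLT1", "ODF1")])  # 2
```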

```python
# Make all the imports needed
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import random
import os

# For the graph utilities
import networkx as nx
```

Let's create a dataset with the following columns:

* The OLT hostname, which we'll call OLT_HOSTNAME.
* The Outside Plant Access ODF name, which we'll call OSPA_NAME.
* The ID or name of the interconnection optical jumper (between the OLT and the ODF), which we'll call CONECTION_ID.
* The OLT slot and port numbers, which we'll call OLT_SLOT and OLT_PORT respectively.
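When loading your own inventory, it may help to think of one row of this dataset as a typed record. The sketch below is only an illustration: the `Interconnection` type, the `is_valid` helper and its 1-based slot and port ranges are assumptions, not part of the original code:

```python
from typing import NamedTuple

class Interconnection(NamedTuple):
    """One optical jumper between an OLT PON port and an ODF (hypothetical record type)."""
    OLT_HOSTNAME: str
    OSPA_NAME: str
    CONECTION_ID: int
    OLT_SLOT: int
    OLT_PORT: int

def is_valid(conn: Interconnection, slots: int, ports_per_slot: int) -> bool:
    """Check that slot and port numbers fall in an assumed 1-based range."""
    return 1 <= conn.OLT_SLOT <= slots and 1 <= conn.OLT_PORT <= ports_per_slot

row = Interconnection("OLT1", "ODF1", 37, 1, 16)
print(row._asdict())                               # the five columns as a dict
print(is_valid(row, slots=8, ports_per_slot=16))   # True
```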

To do this, let's create a function that generates the __edges__ of our OLT and ODF interconnections:

```python
def create_interconnections(OLT_hostnames, OSPA_names, PORT_QTY, SLOTS):
    """
    Function that creates edges from synthetic information.
    It receives:
        * A list named "OLT_hostnames" that represents the OLT hostnames.
        * A list named "OSPA_names" that represents the names of the ODFs.
        * An int named "PORT_QTY" that represents the quantity of PON ports on each OLT.
        * An int named "SLOTS" that represents the quantity of slots on each OLT.

    It returns a Pandas DataFrame that represents the edges as previously mentioned.
    """
    # Creates a complete graph (please check https://networkx.org/documentation/stable/reference/generators.html
    # for more graph generators). Its 100 integer nodes are not used for the edges below;
    # the named OLT and ODF nodes are added to the same graph to keep track of their free ports.
    G = nx.complete_graph(100)
    for name in OLT_hostnames:
        G.add_node(name, PORT_QTY=PORT_QTY, SLOTS=SLOTS)
    for name in OSPA_names:
        G.add_node(name, PORT_QTY=256)  # each ODF is given 256 ports

    # Creates one edge (jumper) for every OLT-ODF pair while both ends still have free ports.
    edges = []
    for olt in OLT_hostnames:
        for odf in OSPA_names:
            if G.nodes[olt]['PORT_QTY'] > 0 and G.nodes[odf]['PORT_QTY'] > 0:
                CONECTION_ID = random.randint(1, 100)
                OLT_SLOT = random.randint(1, G.nodes[olt]['SLOTS'])
                # The port number is fixed at 16 in this synthetic example.
                edges.append({'OLT_HOSTNAME': olt, 'OSPA_NAME': odf, 'CONECTION_ID': CONECTION_ID,
                              'OLT_SLOT': OLT_SLOT, 'OLT_PORT': 16})
                G.nodes[olt]['PORT_QTY'] -= 1
                G.nodes[odf]['PORT_QTY'] -= 1

    # Creates the Pandas DataFrame of the edges representation.
    edges_df = pd.DataFrame(edges, columns=['OLT_HOSTNAME', 'OSPA_NAME', 'CONECTION_ID', 'OLT_SLOT', 'OLT_PORT'])

    return edges_df
```
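The function above creates one jumper per OLT and ODF pair, so all flows in the final plot end up with the same width. As a hedged alternative (the name `create_port_level_interconnections` and the one-jumper-per-PON-port rule are assumptions, not the original method), this sketch walks every slot and port of each OLT and patches it to a randomly chosen ODF, using a seeded `random.Random` so the dataset is reproducible:

```python
import random

def create_port_level_interconnections(olt_hostnames, odf_names, slots, ports_per_slot, seed=0):
    """Hypothetical generator: one jumper per OLT PON port, each patched to a random ODF."""
    rng = random.Random(seed)  # seeded so the synthetic data is reproducible
    edges = []
    connection_id = 1
    for olt in olt_hostnames:
        for slot in range(1, slots + 1):
            for port in range(1, ports_per_slot + 1):
                edges.append({
                    "OLT_HOSTNAME": olt,
                    "OSPA_NAME": rng.choice(odf_names),
                    "CONECTION_ID": connection_id,  # unique, unlike random.randint(1, 100)
                    "OLT_SLOT": slot,
                    "OLT_PORT": port,
                })
                connection_id += 1
    return edges

rows = create_port_level_interconnections(
    ["OLT1", "OLT2", "OLT3"], ["ODF1", "ODF2", "ODF3", "ODF4", "ODF5"], slots=8, ports_per_slot=16
)
print(len(rows))  # 384 (3 OLTs x 8 slots x 16 ports)
```

Passing `rows` to `pd.DataFrame(rows)` gives a dataframe with the same five columns, and because several ports land on the same ODF, the flow widths will vary.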

Now we can create the dataset using the newly created function.

Let's use three OLTs and five ODFs:

```python
OLT_hostnames = ["OLT1", "OLT2", "OLT3"]
OSPA_NAME_names = ["ODF1", "ODF2", "ODF3", "ODF4", "ODF5"]
PORT_QTY = 16
SLOTS = 8
df = create_interconnections(OLT_hostnames, OSPA_NAME_names, PORT_QTY, SLOTS)
df
```

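| | OLT_HOSTNAME | OSPA_NAME | CONECTION_ID | OLT_SLOT | OLT_PORT |
|---|---|---|---|---|---|
| 0 | OLT1 | ODF1 | 37 | 1 | 16 |
| 1 | OLT1 | ODF2 | 25 | 7 | 16 |
| 2 | OLT1 | ODF3 | 2 | 7 | 16 |
| 3 | OLT1 | ODF4 | 81 | 5 | 16 |
| 4 | OLT1 | ODF5 | 21 | 2 | 16 |
| 5 | OLT2 | ODF1 | 66 | 1 | 16 |
| 6 | OLT2 | ODF2 | 51 | 1 | 16 |
| 7 | OLT2 | ODF3 | 91 | 7 | 16 |
| 8 | OLT2 | ODF4 | 99 | 6 | 16 |
| 9 | OLT2 | ODF5 | 11 | 5 | 16 |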


Your own information should have roughly the same structure (although you might need to filter some columns in your data):

* Several connections from each OLT to each (or some) of the ODFs.
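If your inventory comes as a CSV export, a minimal sketch of that filtering step could look like this. It uses only the standard library and assumes your export already uses (or has been renamed to) the column names of this post; `load_interconnections` is a hypothetical helper. Rows without a jumper ID are skipped, mirroring the `dropna` on `CONECTION_ID` in `sankey_from_df`:

```python
import csv
import io

# Column names used in this post; your export is assumed to use (or be renamed to) these.
REQUIRED_COLUMNS = ["OLT_HOSTNAME", "OSPA_NAME", "CONECTION_ID", "OLT_SLOT", "OLT_PORT"]

def load_interconnections(csv_file):
    """Hypothetical helper: keep only REQUIRED_COLUMNS and skip rows with an empty CONECTION_ID."""
    reader = csv.DictReader(csv_file)
    missing = [col for col in REQUIRED_COLUMNS if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"Inventory is missing columns: {missing}")
    return [
        {col: row[col] for col in REQUIRED_COLUMNS}
        for row in reader
        if (row["CONECTION_ID"] or "").strip()
    ]

# In-memory example with an extra column and one port without a jumper ID.
sample = io.StringIO(
    "OLT_HOSTNAME,OSPA_NAME,CONECTION_ID,OLT_SLOT,OLT_PORT,REMARKS\n"
    "OLT1,ODF1,37,1,16,ok\n"
    "OLT1,ODF2,,7,16,spare\n"
)
inventory = load_interconnections(sample)
print(inventory)
# [{'OLT_HOSTNAME': 'OLT1', 'OSPA_NAME': 'ODF1', 'CONECTION_ID': '37', 'OLT_SLOT': '1', 'OLT_PORT': '16'}]
```

Note that the `csv` module returns every value as a string, so numeric columns may need casting after building the dataframe.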

## Create the visualization

Now for the fun and insightful part: creating the visualization. As stated before, we'll use some code from the post by __Mattia Cinelli__, but we have to modify it. Here is a step-by-step explanation of the visualization; you can review the corresponding code as each step is referenced:

 1. As mentioned, each jumper from one PON port on the OLT to one port on the ODF can be represented as an edge; the edges will naturally follow your data and the way things were connected.
 2. A __Sankey__ plot displays the total number of connections between two nodes as a __flow__, and the width of the flow represents the quantity of connections.
 3. The bulk of connections from one individual OLT to one individual ODF will be represented by a single __flow__.
 4. For this representation, we need to count each individual connection and assign it to its corresponding flow.
 5. The Sankey plot constructor expects each node as an integer index (not the name as a string), along with the value of each flow. So we have to transform the dataframe into new dataframes that contain the data as integers.
 6. For this we create two new dataframes: __"nodes_df"__, which contains the name of each individual node in a column named "Label", and __"links_df"__, which has three columns named "Source", "Target" and "Value" that we feed to the Sankey plot constructor.
 7. Each row in the __"links_df"__ dataframe can be seen as a __Sankey__ plot __"flow"__, with the column "Value" being the quantity of individual connections between one OLT and one ODF.
 8. Finally, we feed both dataframes to the plot constructor via a dictionary called __"data_trace"__. For more information on how to build this plot, please refer to the post mentioned before.
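Steps 4 to 7 can be sketched without pandas to make the integer indexing explicit. This is a simplified illustration, assuming OLT and ODF names never collide; `build_nodes_and_links` is a hypothetical helper, not the code of `sankey_from_df`:

```python
from collections import Counter

def build_nodes_and_links(connections):
    """Turn (OLT, ODF) pairs into Sankey node labels and Source/Target/Value links."""
    # Step 6: one node per OLT, then one per ODF; the list position is the integer node ID.
    labels = list(dict.fromkeys(olt for olt, _ in connections))
    labels += list(dict.fromkeys(odf for _, odf in connections))
    node_id = {label: i for i, label in enumerate(labels)}

    # Steps 4 and 7: count individual connections per (OLT, ODF) pair; each pair is one flow.
    flows = Counter(connections)
    links = [
        {"Source": node_id[olt], "Target": node_id[odf], "Value": count}
        for (olt, odf), count in flows.items()
    ]
    return labels, links

labels, links = build_nodes_and_links(
    [("OLT1", "ODF1"), ("OLT1", "ODF1"), ("OLT2", "ODF1"), ("OLT2", "ODF2")]
)
print(labels)  # ['OLT1', 'OLT2', 'ODF1', 'ODF2']
print(links)   # [{'Source': 0, 'Target': 2, 'Value': 2}, {'Source': 1, 'Target': 2, 'Value': 1}, {'Source': 1, 'Target': 3, 'Value': 1}]
```

The `labels` list corresponds to the `node.label` of the Plotly trace, and the `links` fields to `link.source`, `link.target` and `link.value`.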
 
Below is the complete code of the function:

```python
def sankey_from_df(df, if_save, picture_title):
    """
    Function that creates a Sankey plot from a dataframe of OLT -> ODF interconnections.

    This function is very specific to the given set of information.

    params:

        df -> pd.DataFrame : The dataframe that contains the information. It should contain the following columns:
            * 'OLT_HOSTNAME': The name of the OLT.
            * 'OSPA_NAME': The name of the Outside Plant Access ODF.
            * 'CONECTION_ID': The ID of the interconnection jumper. Rows without it are ignored.

        if_save -> bool : If True, the plot is also saved as a PNG file.

        picture_title -> str : The title of the plot, also used for the PNG file name.

    Outputs:

        The Sankey plot and, optionally, a PNG file.
    """

    # Ignore rows without a jumper ID (dropna returns a copy, so the caller's dataframe is not modified).
    df = df.dropna(subset=['CONECTION_ID'])

    # Creates a node for each individual OLT and ODF; the row index of nodes_df is the node's integer ID.
    nodes_df = pd.DataFrame(df['OLT_HOSTNAME'].unique().tolist() + df['OSPA_NAME'].unique().tolist(), columns=['Label'])
    node_ids = {label: idx for idx, label in nodes_df['Label'].items()}

    # Creates a dataframe in which each row is a flow between one OLT and one ODF,
    # with "Value" being the count of individual connections (cross-connects) in that flow.
    links_df = df.groupby(['OLT_HOSTNAME', 'OSPA_NAME']).size().reset_index(name='Value')
    links_df['Source'] = links_df['OLT_HOSTNAME'].map(node_ids)
    links_df['Target'] = links_df['OSPA_NAME'].map(node_ids)

    # For more reference on the following block of code, please refer to the aforementioned post.
    data_trace = dict(
        type='sankey',
        domain=dict(
            x=[0, 1],
            y=[0, 1]
        ),
        orientation="h",
        valueformat=".0f",
        node=dict(
            pad=10,
            thickness=30,
            line=dict(
                color="black",
                width=1
            ),
            label=nodes_df['Label'].tolist(),
        ),
        link=dict(
            source=links_df['Source'].tolist(),
            target=links_df['Target'].tolist(),
            value=links_df['Value'].tolist(),
        )
    )

    layout = dict(
        title="{} Sankey Plot".format(picture_title),
        height=772,
        font=dict(
            size=13
        ),
    )

    # Creates the actual plot (go and iplot are imported in the next cell, before the function is called).
    fig = go.Figure(dict(data=[data_trace], layout=layout))
    iplot(fig, validate=False)

    # Saves the plot in a PNG file if needed (static image export requires the kaleido package).
    if if_save:
        image_path = "{}-Sankey.png".format(picture_title)
        fig.write_image(image_path)
```

Make the imports necessary for the Sankey plot:

```python
import plotly.graph_objects as go
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)
```

And finally, let's plot the Sankey:

```python
if_save = True
picture_name = 'OC 1'
sankey_from_df(df, if_save, picture_name)
```

## Closing comments

What kinds of issues might you see in your plots?

* The first thing to point out is that the plots you get from your own data will probably look quite different from this one.
* For starters, our synthetic data is a full mesh: the generator loops over every OLT and ODF pair and creates one connection per pair, so every flow here has the same width. This mirrors the __"G = nx.complete_graph(100)"__ constructor we started from, which creates a fully connected graph, meaning every node is connected to every other node (but not to itself).
* With real data, your plots will reflect how smooth (or not) the __"FTTx"__ network deployment was in that specific Central Office.
* More specifically, you might see some ODFs receive most of the flows from most of the OLTs. This could mean that a particular ODF is connected to many Outside Plant cables.
* Another thing you might see is a concentration of flows on a particular node.
* And if a Central Office has received many FTTx deployment projects over a long period of time, the flows could be very chaotic.
* This plot can help you identify crammed spots in your network, or see how many COs have a clean installation.
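To put numbers on the concentration patterns described above, a small sketch could compute, for each ODF, its share of all connections and how many distinct OLTs feed it. The helper name `odf_concentration` and the 40% threshold are arbitrary assumptions for illustration:

```python
from collections import Counter, defaultdict

def odf_concentration(connections, share_threshold=0.4):
    """For each ODF: share of all connections, number of distinct OLTs, and a flag above the threshold."""
    total = len(connections)
    per_odf = Counter(odf for _, odf in connections)
    olts_per_odf = defaultdict(set)
    for olt, odf in connections:
        olts_per_odf[odf].add(olt)
    report = {}
    for odf, count in per_odf.items():
        share = count / total
        report[odf] = {
            "share": round(share, 2),
            "distinct_olts": len(olts_per_odf[odf]),
            "concentrated": share >= share_threshold,  # hypothetical threshold
        }
    return report

connections = [("OLT1", "ODF1"), ("OLT2", "ODF1"), ("OLT3", "ODF1"), ("OLT1", "ODF2"), ("OLT2", "ODF3")]
for odf, stats in odf_concentration(connections).items():
    print(odf, stats)
# ODF1 {'share': 0.6, 'distinct_olts': 3, 'concentrated': True}
# ODF2 {'share': 0.2, 'distinct_olts': 1, 'concentrated': False}
# ODF3 {'share': 0.2, 'distinct_olts': 1, 'concentrated': False}
```

An ODF with a high share and many distinct OLTs is one of the nodes where flows converge in the plot.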

Ultimately, you will gain a deeper, visual understanding of the interconnections between your GPON and FTTx networks, which in turn can help you make better-informed decisions in your future Inside Plant deployment project designs.

I hope this was useful for you, and that you are looking forward to the next visualization!
