# Model Driven Telemetry (MDT) diagnosis experiment

Prerequisite: Model Driven Telemetry data retrieved from a router, timestamp-aligned and merged into a single file (merged.csv). The data is already filtered and contains only numeric counters.

This notebook performs the following steps:
1) Load the data.
2) Pre-process the data.
3) Visualize the data as a 2D projection using t-distributed stochastic neighbor embeddings (t-SNE).
4) Identify clusters using DBSCAN and associated transitions between the clusters, i.e., the system's change-points.
5) Distill the key features/counters that best describe a change-point.
6) Describe the change-point in natural language, explaining what the issue is and what could be done to resolve it.

In [None]:
%load_ext autoreload
%autoreload 2

### Load dataset information

In [None]:
import modules.dataset as ds
from dotenv import load_dotenv

load_dotenv("env")
ds.extract_dataset('./datasets/mdt-demo.tgz', './output')



In [None]:
import modules.mdt.datasets as mdt_ds
datasets = mdt_ds.Datasets(datasets_dir='./output')
datasets.jupyter_select_dataset_device(select_file=False)

## MDT Merged Data
See the mdt_data_process notebook for how the merged CSV is curated.

In [None]:
import pandas as pd
import modules.utils as utils
from io import StringIO

merged_data_fn, _ = datasets.get_input_data_file("merged.csv")

df = pd.read_csv(merged_data_fn)  

# show number of rows and columns - dimensionality
shape = df.shape
print("dataset dimensions: rows={}, columns={}".format(shape[0], shape[1]))
# display a sample of the dataset: the first 10 rows and first 10 columns
# (iloc slices exclude the end index, so 0:10 selects positions 0..9)
utils.displayDataFrame(df.iloc[0:10, 0:10])

## MDT Preprocessed Data
See the mdt_data_process notebook for how the processed-offline CSV is curated.

The network data collected on routers is multivariate and very heterogeneous. Some counters are incremental (e.g., packet counts) and some are percentages (e.g., CPU usage), and their ranges vary widely (e.g., byte counts in the trillions versus booleans that can only be one or zero). An example of incremental data that ranges in the trillions can be found here.

To make information from different sources comparable, preprocessing of the selected dataset includes three consecutive steps, each operating over the entire time series:

* Order 1 difference for non-decreasing timeseries
* Min-max scaling between 0 and 1
* Exponential smoothing (with parameter 0.5)
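
The three steps above can be sketched with pandas as follows. This is a minimal illustration, not the code from the mdt_data_process notebook: the function name `preprocess`, the rule used to detect non-decreasing counters, and the handling of constant columns are all assumptions made for the example.

```python
import pandas as pd


def preprocess(df: pd.DataFrame, alpha: float = 0.5) -> pd.DataFrame:
    """Hypothetical helper: difference cumulative counters, scale, smooth."""
    out = df.astype(float).copy()

    # Step 1: order-1 difference for non-decreasing (cumulative) time series.
    for col in out.columns:
        if out[col].is_monotonic_increasing:
            out[col] = out[col].diff().fillna(0.0)

    # Step 2: min-max scaling to [0, 1]; constant columns are mapped to 0
    # (assumption) by replacing a zero span with 1 before dividing.
    span = (out.max() - out.min()).replace(0.0, 1.0)
    out = (out - out.min()) / span

    # Step 3: exponential smoothing, s_t = alpha * x_t + (1 - alpha) * s_{t-1}.
    return out.ewm(alpha=alpha, adjust=False).mean()


demo = pd.DataFrame({
    "packets": [100, 150, 250, 250],  # cumulative counter
    "cpu": [10, 50, 20, 30],          # percentage
})
result = preprocess(demo)
print(result)
```

After these steps, a cumulative packet counter and a CPU percentage both live in the range 0 to 1, so their changes can be compared directly.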

In [None]:
preprocessed_data_fn, _ = datasets.get_input_data_file("preprocessed_offline.csv")

df = pd.read_csv(preprocessed_data_fn)

# show number of rows and columns - dimensionality
shape = df.shape
print("dataset dimensions: rows={}, columns={}".format(shape[0], shape[1]))
# display a sample of the dataset: the first 10 rows and first 10 columns
# (iloc slices exclude the end index, so 0:10 selects positions 0..9)
utils.displayDataFrame(df.iloc[0:10, 0:10])

### Changepoint Detector

Detect clusters using DBSCAN and the associated transitions of the system between the clusters.
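
As a rough illustration of the idea, the sketch below clusters time-ordered samples with scikit-learn's `DBSCAN` and reports every index where the cluster label changes. It does not reproduce `ChangepointDetector`: the helper `find_transitions`, the `eps` and `min_samples` values, the choice to skip noise points, and the synthetic data are assumptions for the example, and whether the real detector clusters the raw features or a t-SNE projection is not stated here.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def find_transitions(points: np.ndarray, eps: float = 0.5, min_samples: int = 5):
    """Hypothetical helper: cluster time-ordered samples, list label changes."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)

    transitions = []
    previous = None
    for index, label in enumerate(labels):
        if label == -1:
            # DBSCAN marks noise with -1; noise is not treated as a state here.
            continue
        if previous is not None and label != previous:
            transitions.append(index)
        previous = label
    return labels, transitions


# Synthetic system that stays in one state for 20 samples, then moves to another.
state_a = [[0.01 * i, 0.0] for i in range(20)]
state_b = [[5.0 + 0.01 * i, 5.0] for i in range(20)]
points = np.array(state_a + state_b)

labels, transitions = find_transitions(points)
print(transitions)  # index of the first sample in the new state
```

Each returned index is a candidate change-point: the first sample at which the system is observed in a different cluster than before.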

In [None]:
from modules.mdt.data_utils import load_data, ORIGINAL_DATA
from modules.mdt.changepoint_detector import ChangepointDetector

tstp, dataframe = load_data(preprocessed_data_fn, scale=False, data_selection=ORIGINAL_DATA, ft_regex="^(?!.*(time|second)).*")

detector = ChangepointDetector(dataframe, datasets.get_device())


In [None]:
detector.detect()
detector.plot(withEvents=False)

In [None]:
detector.plot(withEvents=True)
detector.select_changepoints()

### Feature Selection

The selection problem, i.e., "which of the many features that change are the most descriptive for the change", is approached by optimizing an information-theoretic metric, i.e., cross-entropy. The goal here is to find the subset of features that describes best what is changing at the given timestamp. The intuition is that cross-entropy gives both the amount of additional information in the subset, and the divergence of the subset distribution from the original one. The added regularization term also allows for the tuning of the verbosity of the output.

More details can be found in T. Feltin, J. A. C. Fuertes, F. Brockners and T. H. Clausen, ["Understanding Semantics in Feature Selection for Fault Diagnosis in Network Telemetry Data"](https://www.researchgate.net/publication/371814291_Understanding_Semantics_in_Feature_Selection_for_Fault_Diagnosis_in_Network_Telemetry_Data), NOMS 2023 - 2023 IEEE/IFIP Network Operations and Management Symposium.
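
To make the role of cross-entropy and the regularization term concrete, here is a deliberately simplified stand-in. It is not the algorithm from the paper or the logic inside `Retriever`: it scores each feature independently by the excess cross-entropy between its histograms before and after the change-point, and keeps a feature only if that score exceeds a penalty. The function names, the histogram binning, and the add-one smoothing are all assumptions for the example.

```python
import numpy as np


def cross_entropy(p: np.ndarray, q: np.ndarray) -> float:
    """H(p, q) = -sum_i p_i * log(q_i), in nats."""
    return float(-np.sum(p * np.log(q)))


def histogram(values: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """Normalized histogram with add-one smoothing so no bin is empty."""
    counts, _ = np.histogram(values, bins=edges)
    counts = counts + 1.0
    return counts / counts.sum()


def select_features(data: dict, changepoint: int, penalty: float, bins: int = 5) -> list:
    """Hypothetical helper: keep features whose divergence exceeds the penalty."""
    scored = []
    for name, series in data.items():
        series = np.asarray(series, dtype=float)
        if series.min() == series.max():
            continue  # a constant feature cannot describe a change
        edges = np.linspace(series.min(), series.max(), bins + 1)
        before = histogram(series[:changepoint], edges)
        after = histogram(series[changepoint:], edges)
        # Excess cross-entropy H(after, before) - H(after, after),
        # i.e. the KL divergence of "after" from "before".
        divergence = cross_entropy(after, before) - cross_entropy(after, after)
        if divergence > penalty:
            scored.append((name, divergence))
    # Most divergent features first.
    return [name for name, _ in sorted(scored, key=lambda item: -item[1])]


data = {
    "cpu": [0.1] * 10 + [0.9] * 10,    # shifts at the change-point
    "temperature": [0.5, 0.6] * 10,    # fluctuates identically before and after
    "uptime_flag": [1.0] * 20,         # never changes
}
print(select_features(data, changepoint=10, penalty=0.1))
print(select_features(data, changepoint=10, penalty=2.0))
```

Raising `penalty` shortens the output, which mirrors how a regularization term trades descriptive power against verbosity.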

In [None]:
from modules.mdt.retriever import Retriever
import modules.utils as utils
from IPython.display import clear_output


tstp, dataframe = load_data(merged_data_fn, scale=False, data_selection=ORIGINAL_DATA, ft_regex="^(?!.*(time|second|minute|hour|pid|port)).*",
                            remove_nan=True, remove_inf=True)

selected_changepoints = detector.get_changepoints()
retriever = Retriever(dataframe)
features = retriever.retrieve(selected_changepoints)

In [None]:
mdt_changepoints = []

for feature, data in features.items():
    mdt_changepoints.append({
        "Event": f"{feature - tstp[0]}",
        "Features": '\n'.join(data),
        "Source": "MDT",
        'Type': "NETWORK_DEVICE"
    })

clear_output()
utils.displayDictionary(mdt_changepoints)

### Changepoint / Feature Diagnoser

Leverage an LLM to turn the selected set of features along with the amplitude of change into a diagnosis and resolution in natural language.
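
One way to picture this step is as prompt construction from a change-point record. The sketch below is hypothetical and independent of the `Diagnose` and `AzureLlm` classes used in this notebook: the function `build_prompt`, the prompt wording, and the sample feature strings are assumptions, and only the dictionary keys (`Event`, `Features`, `Source`, `Type`) come from the change-point records built earlier.

```python
def build_prompt(changepoint: dict) -> str:
    """Hypothetical helper: format one change-point record as an LLM prompt."""
    feature_lines = [line for line in changepoint["Features"].split("\n") if line]
    bullet_list = "\n".join(f"- {line}" for line in feature_lines)
    return (
        f"A {changepoint['Type']} reported a change at event time "
        f"{changepoint['Event']} (source: {changepoint['Source']}).\n"
        "The counters that best describe the change are:\n"
        f"{bullet_list}\n"
        "Explain the most likely issue and suggest how to resolve it."
    )


# Illustrative record; the feature strings are invented for the example.
example = {
    "Event": "120",
    "Features": "cpu-usage: +0.8\ninput-drops: +0.6",
    "Source": "MDT",
    "Type": "NETWORK_DEVICE",
}
prompt = build_prompt(example)
print(prompt)
```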

In [None]:
from modules.diagnose import *
from modules.logger import Logger
from modules.llm.azure_ai import AzureLlm
import logging
import os

logger = Logger(logging.INFO)
llm = AzureLlm(logger,os.getenv('AZURE_OPENAI_API_KEY'))
        
diagnoser = Diagnose(logger, llm)
diagnoser.setOutputInitialDiagnosis("Diagnosis")

diagnoser.run(mdt_changepoints, inject=True)
utils.displayDictionary(mdt_changepoints)