# GraphData and GraphProfiler pipeline demo

DataProfiler can also load and profile graph datasets. As with the other DataProfiler profilers, this functionality is split into two components:
- GraphData
- GraphProfiler

We will demo the use of this graph pipeline.

First, let's import the libraries needed for this example.

In [17]:
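import os
import sys
import pandas as pd
import pprint

# Add the parent directory to the import path so the local dataprofiler package is found
sys.path.insert(0, '..')

import dataprofiler as dp

# Directory containing the test datasets used in this demo
data_path = "../dataprofiler/tests/data"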

We now pass our dataset through the generic DataProfiler pipeline:

In [23]:
data = dp.Data(os.path.join(data_path, "csv/graph_data_csv_identify.csv"))
profile = dp.Profiler(data)

report = profile.report()

pp = pprint.PrettyPrinter(sort_dicts=False, compact=True)
pp.pprint(report)

{'num_nodes': 278,
 'num_edges': 199,
 'categorical_attributes': ['categorical_status', 'node_id_dst', 'node_id_src'],
 'continuous_attributes': ['continuous_weight'],
 'avg_node_degree': 1.4316546762589928,
 'global_max_component_size': 21,
 'continuous_distribution': {'categorical_status': None,
                             'node_id_dst': None,
                             'continuous_weight': {'name': 'gamma',
                                                   'scale': 269.20076739533147,
                                                   'properties': [520400.63250485307,
                                                                  -670.0469933809193,
                                                                  0.0012974550576001786,
                                                                  array([5.20400633e+05,            nan, 1.29745506e-03]),
                                                                  array([5.20400633e+05,            nan, 1.29745506e-03

Notice that the `Data` class automatically detected the input file as graph data: the `GraphData` class can distinguish between tabular and graph CSV data. Once `Data` identifies the input file as graph data, `GraphData` loads the CSV data into a NetworkX Graph.
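
To make the loading step more concrete, here is a minimal standard-library sketch of turning an edge-list CSV into an undirected graph and deriving quantities similar in kind to `num_nodes`, `num_edges`, and `global_max_component_size`. It is not `GraphData`'s implementation (which builds a NetworkX Graph). The toy data and the helpers `load_edge_list` and `largest_component_size` are hypothetical, and treating the graph as undirected is an assumption.

```python
import csv
import io
from collections import defaultdict

# Hypothetical toy edge list; the column names mirror those in the report above.
TOY_CSV = """node_id_src,node_id_dst,continuous_weight
1,2,0.5
2,3,1.5
4,5,2.0
"""


def load_edge_list(text, src_col="node_id_src", dst_col="node_id_dst"):
    """Build an undirected adjacency map from CSV text; other columns are ignored."""
    adjacency = defaultdict(set)
    for row in csv.DictReader(io.StringIO(text)):
        src, dst = row[src_col], row[dst_col]
        adjacency[src].add(dst)
        adjacency[dst].add(src)
    return dict(adjacency)


def largest_component_size(adjacency):
    """Return the node count of the largest connected component (iterative DFS)."""
    seen = set()
    best = 0
    for start in adjacency:
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        best = max(best, size)
    return best


adjacency = load_edge_list(TOY_CSV)
num_nodes = len(adjacency)
# Each undirected edge appears in two neighbor sets (the toy data has no self-loops).
num_edges = sum(len(neighbors) for neighbors in adjacency.values()) // 2
print(num_nodes, num_edges, largest_component_size(adjacency))
```

On the toy data this yields five nodes, three edges, and a largest component of three nodes (`1`, `2`, `3`).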

`Profiler` runs `GraphProfiler` when graph data is input (or when `data_type="graph"` is specified). The `report()` method returns the profile, which we pretty-print above.
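
As a quick sanity check on the report above, the reported `avg_node_degree` is consistent with the usual formula for an undirected graph, where each edge contributes to the degree of both of its endpoints. The sketch below only reproduces the arithmetic from the reported `num_nodes` and `num_edges`. That `GraphProfiler` computes the value this way is an assumption.

```python
import math

# Values copied from the report above.
num_nodes = 278
num_edges = 199
reported_avg_node_degree = 1.4316546762589928

# Assumption: undirected graph, so the degrees sum to twice the edge count.
avg_node_degree = 2 * num_edges / num_nodes

print(avg_node_degree)
print(math.isclose(avg_node_degree, reported_avg_node_degree))
```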

# Conclusion

We have shown the graph pipeline in DataProfiler. It follows the same workflow as the rest of DataProfiler: `Data` loads the input, `Profiler` profiles it, and `report()` returns the resulting profile.