# Graph Feature Extraction

This library consists of several independent modules, each of which can be used on its own for a different task.
This notebook presents the feature extraction module.

In [1]:
from anomalous_vertices_detection.feature_controller import FeatureController
from anomalous_vertices_detection.datasets.academia import load_data

### Data loading
First, we load a graph. In this scenario we use load_data to load the Academia.edu graph.

In [12]:
labels = {"neg": "Real", "pos": "Fake"}

my_graph, dataset_config = load_data(labels_map=labels, package="GraphLab")

Loading graph...
Data loaded.


### Feature Extraction
Now we initialize the FeatureController, an object that manages the entire graph feature extraction process.

In [None]:
features = FeatureController(my_graph)

Our package includes several predefined feature sets (presets).
The extract_features function expects a dictionary that maps the feature type, the name under which each feature is saved, and the name of the function that should be executed.
We are going to use fast_link_features, which contains computationally fast features based on the works of Fire et al. and Chuk...
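To make the shape of such a preset concrete, here is a minimal, self-contained sketch. The nesting used below (directedness flag, then feature type, then save name mapped to function name) is an assumption inferred from the description above and from the `fast_link_features[my_graph.is_directed]` lookup used later; the real preset may be laid out differently. The two functions, common neighbors and the Jaccard coefficient, are classic topological link features of the kind studied by Fire et al., written here against a plain adjacency dictionary rather than the library's graph object. All names in the sketch are hypothetical.

```python
from typing import Callable, Dict, Hashable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def common_neighbors(g: Graph, u, v) -> int:
    """Number of vertices adjacent to both u and v."""
    return len(g.get(u, set()) & g.get(v, set()))


def jaccard_coefficient(g: Graph, u, v) -> float:
    """|N(u) & N(v)| / |N(u) | N(v)|, or 0.0 when both neighborhoods are empty."""
    nu, nv = g.get(u, set()), g.get(v, set())
    union = nu | nv
    return len(nu & nv) / len(union) if union else 0.0


# Hypothetical preset: is_directed -> feature type -> {save name: function name}
LINK_FEATURES = {
    False: {"edge": {"cn": "common_neighbors", "jaccard": "jaccard_coefficient"}},
}

# Registry that resolves a function name from the preset to a callable
FUNCTIONS: Dict[str, Callable] = {
    "common_neighbors": common_neighbors,
    "jaccard_coefficient": jaccard_coefficient,
}


def extract_edge_features(g: Graph, edge: Tuple, preset: dict) -> dict:
    """Run every function named in the preset and store each result under its save name."""
    u, v = edge
    row = {"src": u, "dst": v}
    for save_name, func_name in preset["edge"].items():
        row[save_name] = FUNCTIONS[func_name](g, u, v)
    return row


g = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(extract_edge_features(g, (1, 4), LINK_FEATURES[False]))
# {'src': 1, 'dst': 4, 'cn': 1, 'jaccard': 0.5}
```

Keeping function names as strings in the preset is what lets a preset be stored as plain data and swapped per graph type.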

In [None]:
from anomalous_vertices_detection.configs.predefined_features_sets import fast_link_features

fast_link_features

### Edge Feature Extraction
extract_features_to_file writes the results to a file, so edge features can be extracted without being bounded by available memory.
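A file-backed approach avoids a memory bound because each row can be written as soon as it is computed, so only one edge's features are held in memory at a time. The sketch below illustrates that pattern with the standard csv module; stream_features_to_csv and the degree-based features are hypothetical stand-ins, not the library's implementation.

```python
import csv
import os
import tempfile
from typing import Callable, Dict, Iterable, Tuple


def stream_features_to_csv(edges: Iterable[Tuple], features: Dict[str, Callable], path: str) -> int:
    """Compute features edge by edge and write each row immediately; returns the number of rows."""
    written = 0
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["src", "dst", *features])
        writer.writeheader()
        for u, v in edges:  # edges may be a lazy iterator; no rows are accumulated
            row = {"src": u, "dst": v}
            row.update({name: fn(u, v) for name, fn in features.items()})
            writer.writerow(row)
            written += 1
    return written


adj = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
feats = {
    "src_degree": lambda u, v: len(adj[u]),
    "dst_degree": lambda u, v: len(adj[v]),
}
out = os.path.join(tempfile.mkdtemp(), "edges.csv")
n = stream_features_to_csv(((u, v) for u in adj for v in adj[u]), feats, out)
```

Reading the file back afterwards, as the pd.read_csv cell below does, loads the table only when it is actually needed.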

In [None]:
edges_output_path = "../output/" + dataset_config.name + "_edges.csv"

features.extract_features_to_file(my_graph.edges[:1000], fast_link_features[my_graph.is_directed], edges_output_path)

In [None]:
import pandas as pd
edges_df = pd.read_csv(edges_output_path)
edges_df

### Vertex Feature Extraction
For the vertices we demonstrate features_generator instead of extract_features_to_file; unlike extract_features_to_file, it returns a generator rather than writing the results to a file.
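A generator makes the trade-off explicit: features are computed lazily, one vertex at a time, and the caller decides whether to materialize them (as the list(...) call below does) or consume them incrementally. The sketch assumes a directed graph stored as a hypothetical out-adjacency dictionary and uses in-degree and out-degree as example vertex features; it is an illustration of the pattern, not features_generator itself.

```python
from typing import Dict, Hashable, Iterable, Iterator, Set


def vertex_features(out_adj: Dict[Hashable, Set[Hashable]], vertices: Iterable[Hashable]) -> Iterator[dict]:
    """Yield one feature row per vertex; the body only runs once the caller starts iterating."""
    # Count in-degrees once, on first iteration
    in_deg: Dict[Hashable, int] = {}
    for targets in out_adj.values():
        for t in targets:
            in_deg[t] = in_deg.get(t, 0) + 1
    for v in vertices:
        yield {"id": v, "out_degree": len(out_adj.get(v, ())), "in_degree": in_deg.get(v, 0)}


out_adj = {1: {2, 3}, 2: {3}, 3: set()}
gen = vertex_features(out_adj, [1, 2, 3])
first = next(gen)   # computed on demand
rest = list(gen)    # materialize the remaining rows
```

Because the rows are plain dictionaries, the materialized list can be passed straight to pd.DataFrame, which is what the cells below do with the library's output.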

In [None]:
from anomalous_vertices_detection.configs.predefined_features_sets import fast_vertex_features
fast_vertex_features

In [None]:
vertices_features = list(features.features_generator(fast_vertex_features[my_graph.is_directed], my_graph.vertices[:1000]))

In [None]:
vertices_df = pd.DataFrame(vertices_features)
vertices_df