# Graph Feature Extraction

This library has several independent modules that can be used for different tasks.
In this notebook we are going to present the feature extraction module.

In [1]:
from anomalous_vertices_detection.feature_controller import FeatureController
from anomalous_vertices_detection.datasets.academia import load_data

If you want to use igraph instead of networkx please install it.
If you want to use graphlab instead of networkx please install it.
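Notices like the two above usually come from an import-time check for optional dependencies. The sketch below shows what such a check might look like. It assumes the package only tests whether each optional backend can be located. `check_optional_backends` is a hypothetical name and is not part of anomalous_vertices_detection.

```python
import importlib.util


def check_optional_backends(optional=("igraph", "graphlab"), default="networkx"):
    """Return the optional graph backends that are installed.

    Hypothetical helper: it only asks importlib whether each top-level module
    can be located (nothing is imported) and prints a notice for each missing one.
    """
    available = []
    for name in optional:
        if importlib.util.find_spec(name) is None:
            print(f"If you want to use {name} instead of {default} please install it.")
        else:
            available.append(name)
    return available


backends = check_optional_backends()
```

Because `find_spec` does not execute the module, the probe is cheap and has no side effects. The real import can be deferred until the backend is actually selected.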


### Data loading
First we load a graph. In this scenario we use load_data to load the Academia.edu graph.

In [2]:
labels = {"neg": "Real", "pos": "Fake"}

my_graph, dataset_config = load_data(labels_map=labels)

Loading graph...
Data loaded.


### Feature Extraction
Now we initialize the FeatureController, an object that manages the entire graph feature extraction process.

In [3]:
features = FeatureController(my_graph)

Our package includes several predefined feature sets.
The extract_features function expects a dictionary that maps each feature type to the features' save names and the names of the functions that compute them. As the output below shows, the presets are additionally keyed by whether the graph is directed.
We are going to use fast_link_features, which contains fast-to-compute features based on the works of Fire et al. and Chuk...
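Since the dictionary stores function *names*, the controller can look up each function at run time. The sketch below shows that dispatch pattern. It assumes the names refer to methods of a graph-like object. `ToyGraph` and `extract_link_features` are hypothetical and are not the package's API.

```python
class ToyGraph:
    """Hypothetical stand-in for the package's graph: undirected adjacency sets."""

    def __init__(self, edges):
        self.adj = {}
        for u, v in edges:
            self.adj.setdefault(u, set()).add(v)
            self.adj.setdefault(v, set()).add(u)
        self.is_directed = False

    def get_common_friends(self, u, v):
        return len(self.adj[u] & self.adj[v])

    def get_total_friends(self, u, v):
        return len(self.adj[u] | self.adj[v])


def extract_link_features(graph, u, v, features):
    """Compute every 'link' feature for the pair (u, v).

    `features` maps save name -> method name; each method is looked up
    on `graph` by name at run time with getattr.
    """
    row = {"src": u, "dst": v}
    for save_name, func_name in features["link"].items():
        row[save_name] = getattr(graph, func_name)(u, v)
    return row


# Same nesting as the presets: directedness -> feature type -> save name -> method.
toy_link_features = {
    False: {"link": {"common_friends": "get_common_friends",
                     "total_friends": "get_total_friends"}},
}

g = ToyGraph([(1, 2), (1, 3), (2, 3), (3, 4)])
row = extract_link_features(g, 1, 2, toy_link_features[g.is_directed])
print(row)  # {'src': 1, 'dst': 2, 'common_friends': 1, 'total_friends': 3}
```

With this design, adding a feature only means adding a method and one dictionary entry. The extraction loop itself never changes.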

In [4]:
from anomalous_vertices_detection.configs.predefined_features_sets import fast_link_features

fast_link_features

{False: {'link': {'adamic_adar_index': 'get_adamic_adar_index',
   'common_friends': 'get_common_friends',
   'jaccards_coefficient': 'get_jaccards_coefficient',
   'knn_weight4': 'get_knn_weight4',
   'knn_weight8': 'get_knn_weight8',
   'preferential_attachment_score': 'get_preferential_attachment_score',
   'sum_of_friends': 'get_sum_of_friends',
   'total_friends': 'get_total_friends'}},
 True: {'link': {'bi_common_friends': 'get_bi_common_friends',
   'in_common_friends': 'get_in_common_friends',
   'is_opposite_direction_friends': 'is_opposite_direction_friends',
   'jaccards_coefficient': 'get_jaccards_coefficient',
   'knn_weight1': 'get_knn_weight1',
   'knn_weight2': 'get_knn_weight2',
   'knn_weight3': 'get_knn_weight3',
   'knn_weight4': 'get_knn_weight4',
   'knn_weight5': 'get_knn_weight5',
   'knn_weight6': 'get_knn_weight6',
   'knn_weight7': 'get_knn_weight7',
   'knn_weight8': 'get_knn_weight8',
   'number_of_transitive_friends': 'get_number_of_transitive_friends',
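Several of these features are standard link-prediction measures. The sketch below computes five of them for an undirected graph. It assumes the usual textbook definitions, so the package's own implementations, and its directed variants such as in_common_friends, may differ in detail. The helper name `link_features` is hypothetical.

```python
import math


def link_features(adj, u, v):
    """Textbook link-prediction features for an undirected pair (u, v).

    `adj` maps every vertex to the set of its neighbours.
    """
    nu, nv = adj[u], adj[v]
    common = nu & nv          # shared neighbours
    union = nu | nv           # all neighbours of either endpoint
    return {
        "common_friends": len(common),
        "total_friends": len(union),
        "jaccards_coefficient": len(common) / len(union) if union else 0.0,
        "preferential_attachment_score": len(nu) * len(nv),
        # Adamic-Adar weights each shared neighbour by 1 / log(its degree);
        # the degree > 1 guard avoids dividing by log(1) = 0.
        "adamic_adar_index": sum(
            1 / math.log(len(adj[w])) for w in common if len(adj[w]) > 1
        ),
    }


adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(link_features(adj, 1, 2))
```

The sample edge output further down fits this form in two ways:

- jaccards_coefficient equals out_common_friends divided by total_friends in every row shown. For example, 6 / 27 ≈ 0.222222 in row 3.
- preferential_attachment_score equals out_degree_u × out_degree_v. For example, 27 × 6 = 162 in row 3.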
  

### Edge Feature Extraction
extract_features_to_file saves the results to a file, which allows features to be extracted without being bounded by available memory.
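The memory bound disappears because each row can be written out as soon as it is computed and then discarded. The sketch below shows this idea using the standard csv module and assumes rows arrive as dicts. `write_features_streaming` is a hypothetical helper, not the package's implementation.

```python
import csv
import io


def write_features_streaming(rows, out_file):
    """Write an iterable of feature dicts to CSV, one row at a time.

    Only the current row is held in memory, so the number of rows is
    limited by disk space rather than RAM. Column names come from the
    first row; csv.DictWriter raises ValueError if a later row has keys
    that the first row lacked.
    """
    writer = None
    count = 0
    for row in rows:
        if writer is None:
            writer = csv.DictWriter(out_file, fieldnames=list(row))
            writer.writeheader()
        writer.writerow(row)
        count += 1
    return count


# A generator stands in for per-edge feature computation.
edge_rows = ({"src": u, "dst": u + 1, "total_friends": 2 * u} for u in range(3))
buffer = io.StringIO()   # use open(path, "w", newline="") for a real file
n_written = write_features_streaming(edge_rows, buffer)
print(buffer.getvalue())
```

Pairing a lazy producer with a row-by-row writer keeps peak memory constant regardless of how many edges are processed.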

In [5]:
edges_output_path = "../output/" + dataset_config.name + "_edges.csv"

features.extract_features_to_file(my_graph.edges[:1000], fast_link_features[my_graph.is_directed], edges_output_path)

100%|██████████████████████████████████████████████████████████████████████████████████████████| 1000/1000 [00:00<00:00, 3663.00feature/s]


In [6]:
import pandas as pd
edges_df = pd.read_csv(edges_output_path)
edges_df

| Unnamed: 0 | out_degree_v | out_common_friends | bi_common_friends | preferential_attachment_score | is_opposite_direction_friends | jaccards_coefficient | dst | total_friends | number_of_transitive_friends | in_degree_v | ... | in_common_friends | out_degree_u | knn_weight1 | knn_weight3 | knn_weight2 | knn_weight5 | knn_weight4 | knn_weight7 | knn_weight6 | knn_weight8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.0 | 0 | 0 | 0.0 | False | 0.000000 | 33108 | 111.0 | 0 | 1.0 | ... | 0 | 111.0 | 0.845782 | 1.138675 | 0.801598 | 0.098058 | 1.094491 | 0.138675 | 0.066815 | 0.094491 |
| 1 | 1.0 | 0 | 0 | 111.0 | False | 0.000000 | 33108 | 112.0 | 0 | 2.0 | ... | 0 | 111.0 | 0.716025 | 0.845782 | 0.671841 | 0.080064 | 0.801598 | 0.098058 | 0.054554 | 0.066815 |
| 2 | 1.0 | 1 | 0 | 23.0 | False | 0.043478 | 235660 | 23.0 | 1 | 2.0 | ... | 1 | 23.0 | 0.854700 | 0.984457 | 0.781474 | 0.160128 | 0.911231 | 0.196116 | 0.117851 | 0.144338 |
| 3 | 6.0 | 6 | 6 | 162.0 | False | 0.222222 | 208549 | 27.0 | 8 | 38.0 | ... | 6 | 27.0 | 0.318242 | 0.536078 | 0.349110 | 0.025318 | 0.566947 | 0.059761 | 0.030261 | 0.071429 |
| 4 | 6.0 | 5 | 5 | 114.0 | False | 0.250000 | 49794 | 20.0 | 5 | 38.0 | ... | 5 | 19.0 | 0.345823 | 0.563660 | 0.383735 | 0.029735 | 0.601571 | 0.070186 | 0.035806 | 0.084515 |
| 5 | 6.0 | 6 | 5 | 204.0 | False | 0.176471 | 49792 | 34.0 | 9 | 38.0 | ... | 6 | 34.0 | 0.298803 | 0.516640 | 0.329159 | 0.022206 | 0.546995 | 0.052414 | 0.027067 | 0.063888 |
| 6 | 6.0 | 6 | 4 | 402.0 | False | 0.089552 | 49793 | 67.0 | 15 | 38.0 | ... | 6 | 67.0 | 0.278806 | 0.496643 | 0.281396 | 0.019004 | 0.499232 | 0.044856 | 0.019418 | 0.045835 |
| 7 | 6.0 | 4 | 4 | 84.0 | False | 0.250000 | 260371 | 16.0 | 5 | 38.0 | ... | 4 | 14.0 | 0.368643 | 0.586479 | 0.418327 | 0.033389 | 0.636163 | 0.078811 | 0.041345 | 0.097590 |
| 8 | 6.0 | 5 | 5 | 78.0 | False | 0.357143 | 260372 | 14.0 | 6 | 38.0 | ... | 5 | 13.0 | 0.378346 | 0.596182 | 0.427389 | 0.034943 | 0.645226 | 0.082479 | 0.042796 | 0.101015 |
| 9 | 6.0 | 1 | 1 | 18.0 | False | 0.125000 | 208546 | 8.0 | 1 | 38.0 | ... | 1 | 3.0 | 0.493461 | 0.711298 | 0.660128 | 0.053376 | 0.877964 | 0.125988 | 0.080064 | 0.188982 |


### Vertices Feature Extraction
In the vertex example we demonstrate features_generator, which returns a generator instead of saving the results to a file as extract_features_to_file does.
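A generator computes each row only when the caller asks for it, so rows can be streamed, filtered, or cut short. The sketch below illustrates the idea. `toy_features_generator` and `degree_row` are hypothetical names. The package's features_generator takes a feature dictionary and a vertex list rather than a callable.

```python
from itertools import islice


def toy_features_generator(extract_one, items):
    """Lazily yield one feature dict per item; nothing runs until iterated."""
    for item in items:
        yield extract_one(item)


DEGREES = {"a": 2, "b": 0, "c": 5, "d": 1}
processed = []  # records which vertices have actually been handled


def degree_row(vertex):
    processed.append(vertex)
    return {"src": vertex, "degree": DEGREES[vertex]}


gen = toy_features_generator(degree_row, DEGREES)
print(processed)                  # prints [] since creating the generator computes nothing
first_two = list(islice(gen, 2))  # pull only two rows
print(first_two)
print(processed)                  # prints ['a', 'b']
```

Wrapping the generator in list(...), as the next cells do, consumes it completely. This is convenient for building a DataFrame but gives up the memory advantage.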

In [7]:
from anomalous_vertices_detection.configs.predefined_features_sets import fast_vertex_features
fast_vertex_features

{True: {'vertex_v': {'average_scc': 'get_average_scc',
   'average_scc_plus': 'get_average_scc_plus',
   'average_wcc': 'get_average_wcc',
   'bi_degree': 'get_bi_degree',
   'bi_degree_density': 'get_bi_degree_density',
   'density_neighborhood_subgraph': 'get_density_neighborhood_subgraph',
   'density_neighborhood_subgraph_plus': 'get_density_neighborhood_subgraph_plus',
   'in_degree': 'get_in_degree',
   'in_degree_density': 'get_in_degree_density',
   'label': 'get_label',
   'out_degree': 'get_out_degree',
   'out_degree_density': 'get_out_degree_density',
   'src': 'get_vertex',
   'subgraph_node_link_number': 'get_subgraph_node_link_number',
   'subgraph_node_link_number_plus': 'get_subgraph_node_link_number_plus'}}}
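The degree-based features in this preset can be derived from adjacency sets alone. The sketch below does this for a directed graph. It assumes densities are taken relative to in_degree + out_degree, which matches the sample vertex output further down. For example, row 3 has in_degree 38, out_degree 7 and bi_degree 7, giving densities 38/45 ≈ 0.844444 and 7/45 ≈ 0.155556. The helper `degree_features` is hypothetical. The neighbourhood-subgraph features (average_scc, density_neighborhood_subgraph, and so on) are not covered.

```python
def degree_features(out_adj, v):
    """Degree-based features of vertex v in a directed graph.

    `out_adj` maps each vertex to the set of vertices it links to.
    Densities are relative to in_degree + out_degree (an assumption that
    matches the sample output, e.g. 38 / (38 + 7) = 0.844444).
    """
    succ = out_adj.get(v, set())
    # Linear scan for predecessors: fine for a sketch; a real implementation
    # would keep a reverse adjacency index.
    pred = {u for u, targets in out_adj.items() if v in targets}
    in_deg, out_deg = len(pred), len(succ)
    bi_deg = len(succ & pred)  # neighbours linked in both directions
    total = in_deg + out_deg

    def density(x):
        return x / total if total else 0.0

    return {
        "in_degree": in_deg,
        "out_degree": out_deg,
        "bi_degree": bi_deg,
        "in_degree_density": density(in_deg),
        "out_degree_density": density(out_deg),
        "bi_degree_density": density(bi_deg),
    }


out_adj = {1: {2, 3}, 2: {1}, 3: set(), 4: {1}}
print(degree_features(out_adj, 1))
```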

In [15]:
vertices_features = list(features.features_generator(fast_vertex_features[my_graph.is_directed], my_graph.vertices[:1000]))

In [16]:
vertices_df = pd.DataFrame(vertices_features)
vertices_df

| Unnamed: 0 | average_scc | average_scc_plus | average_wcc | bi_degree | bi_degree_density | density_neighborhood_subgraph | density_neighborhood_subgraph_plus | in_degree | in_degree_density | label | out_degree | out_degree_density | src | subgraph_node_link_number | subgraph_node_link_number_plus | vertex_label |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.000000 | 2.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 2.0 | 1.000000 | Real | 0.0 | 0.000000 | 287144 | 0 | 0 | Real |
| 1 | 2.000000 | 2.000000 | 2.000000 | 1.0 | 0.500000 | 0.000000 | 1.000000 | 1.0 | 0.500000 | Real | 1.0 | 0.500000 | 287145 | 0 | 2 | Real |
| 2 | 2.000000 | 4.000000 | 4.000000 | 2.0 | 0.500000 | 4.000000 | 0.800000 | 2.0 | 0.500000 | Real | 2.0 | 0.500000 | 287146 | 1 | 5 | Real |
| 3 | 45.000000 | 45.000000 | 45.000000 | 7.0 | 0.155556 | 1.363636 | 0.957447 | 38.0 | 0.844444 | Real | 7.0 | 0.155556 | 287147 | 33 | 47 | Real |
| 4 | 9.000000 | 9.000000 | 9.000000 | 1.0 | 0.111111 | 0.000000 | 4.500000 | 8.0 | 0.888889 | Real | 1.0 | 0.111111 | 287140 | 0 | 2 | Real |
| 5 | 3.000000 | 9.000000 | 4.500000 | 3.0 | 0.333333 | 9.000000 | 1.285714 | 6.0 | 0.666667 | Real | 3.0 | 0.333333 | 287141 | 1 | 7 | Real |
| 6 | 0.000000 | 1.000000 | 0.000000 | 0.0 | 0.000000 | 0.000000 | 0.000000 | 1.0 | 1.000000 | Real | 0.0 | 0.000000 | 287142 | 0 | 0 | Real |
| 7 | 4.000000 | 4.000000 | 4.000000 | 1.0 | 0.250000 | 0.000000 | 2.000000 | 3.0 | 0.750000 | Real | 1.0 | 0.250000 | 287143 | 0 | 2 | Real |
| 8 | 4.000000 | 4.000000 | 4.000000 | 1.0 | 0.250000 | 0.000000 | 2.000000 | 3.0 | 0.750000 | Real | 1.0 | 0.250000 | 287148 | 0 | 2 | Real |
| 9 | 2.000000 | 4.000000 | 4.000000 | 2.0 | 0.500000 | 4.000000 | 0.800000 | 2.0 | 0.500000 | Real | 2.0 | 0.500000 | 287149 | 1 | 5 | Real |
