# Parameter tuning
### Node classification (users or properties)
This section uses tuning.py.



In [46]:
import importlib
import pickle

import networkx as nx
import numpy as np
import pandas as pd
from h3 import h3

from tqdm import tqdm

import funciones as fn
import tuning

importlib.reload(tuning)


<module 'tuning' from 'c:\\Users\\Ignacio\\Desktop\\GraphEmbedding\\src\\tuning.py'>

In [27]:

def load_pickle(path):
    # Open inside a context manager so the file handle is closed after loading.
    with open(path, "rb") as f:
        return pickle.load(f)

users = load_pickle("data/users.p")
visits = load_pickle("data/visits.p")
props = load_pickle("data/props.p")

grafos = load_pickle("data/grafos.p")
grafos_test = load_pickle("data/grafos_test.p")

There are two classes, one per task: NodeClassificationTuning and LinkPredictionTuning.

NodeClassificationTuning is initialized with one of the projections. Embeddings are trained with TrainModel, where method is the embedding method, d is the embedding dimension, and **kwargs are the parameters of the method.

method accepts: "line", "node2vec", "gf", "lap", "sdne", "grarep", "gae", "vgae".

TrainModel returns a dictionary whose keys are the nodes and whose values are their embeddings, together with the training time.

In [3]:
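tester = tuning.NodeClassificationTuning(grafos.Users_f)
method = "node2vec"
d = 10  # embedding dimension
# node2vec parameters, forwarded to the method through **kwargs
kwargs = {"path_length": 20, "num_paths": 5, "p": 0.1, "q": 0.01}
emb, time = tester.TrainModel(method, d, savefile=None, **kwargs)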

Loading SelfDefined Dataset 
Start training...
Preprocess transition probs...
Walk iteration:
1 / 5
2 / 5
3 / 5
4 / 5
5 / 5
training Word2Vec model...
Obtaining vectors...
Time used = 7.708512783050537s
Finished training. Time used = 298.63742661476135.


In [7]:
emb_df=pd.DataFrame(emb).T
emb_df.head()

| node | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 1538173 | 0.191963 | 0.0507 | -0.53748 | -0.530098 | 0.369866 | 0.48126 | 0.367271 | -1.705468 | -1.06493 | -0.334499 |
| 1504696 | -0.588609 | 0.262009 | -0.568832 | 0.114191 | 0.610207 | 0.496958 | 0.113908 | -1.562443 | -0.912372 | -0.403653 |
| 1517493 | -0.024593 | 0.34895 | -0.575486 | -0.35517 | 0.54279 | 0.600356 | 0.414754 | -1.686879 | -1.185349 | -0.106043 |
| 1535923 | -0.060828 | -0.202048 | -0.664768 | -0.647827 | 0.485376 | 0.336293 | 0.080112 | -1.537917 | -0.868026 | -0.561341 |
| 1529264 | -0.176427 | 0.101871 | -0.337206 | -0.218122 | 0.533139 | 0.487136 | 0.254239 | -1.583215 | -1.010375 | -0.496935 |


To evaluate an embedding, use TestModel, which takes the embedding and a string that identifies it (use a distinct name for each embedding). It returns a list of Results objects; Results is an EvalNE class that collects the scores obtained during testing, and the list holds one entry per cross-validation fold.

In [13]:

results=tester.TestModel(emb,time=time,method_name="tutorialNC")
results

Loading SelfDefined Dataset 


[<evalne.evaluation.score.NCResults at 0x25eb79d76a0>,
 <evalne.evaluation.score.NCResults at 0x25eb79d74a8>,
 <evalne.evaluation.score.NCResults at 0x25e5bcfa860>,
 <evalne.evaluation.score.NCResults at 0x25e5bcfa780>,
 <evalne.evaluation.score.NCResults at 0x25e5bcfab38>]

Call pretty_print on a single result for a summary, or score on the tester for the average over all results.

In [14]:
results[0].pretty_print()

Method: tutorial_0.8
Parameters: 
dict_items([('dim', 10), ('nw_name', 'GPI'), ('eval_time', 305.77825236320496)])
Test scores: 
f1_micro = 0.9416666666666667
f1_macro = 0.8528103941230744
f1_weighted = 0.9404777578218404
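
The three F1 values above aggregate per-class results in different ways. The following is a minimal pure-Python sketch of the usual definitions (micro, macro, weighted), not EvalNE's own code; the function name `f1_scores` and the toy labels are made up for illustration.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Micro, macro and weighted F1 for single-label classification."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    per_class = {c: f1(tp[c], fp[c], fn[c]) for c in labels}
    support = Counter(y_true)  # number of true samples per class
    return {
        # micro: pool the counts of all classes, then compute one F1
        "f1_micro": f1(sum(tp.values()), sum(fp.values()), sum(fn.values())),
        # macro: unweighted mean of the per-class F1 values
        "f1_macro": sum(per_class.values()) / len(labels),
        # weighted: per-class F1 weighted by class support
        "f1_weighted": sum(per_class[c] * support[c] for c in labels) / len(y_true),
    }


# Toy, imbalanced example (hypothetical labels, not data from this project).
scores = f1_scores(y_true=[0, 0, 0, 0, 1, 1], y_pred=[0, 0, 0, 1, 1, 0])
print(scores)
```

A macro score well below the micro score, as in the output above, usually means the smaller classes are predicted less accurately than the large ones.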



In [57]:
tester.score(method_name="tutorialNC")

0.9481771663319192

To tune the hyperparameters, use TabuSearchParams: starting from a seed, it explores the neighborhood of the seed and stores the local optima in the tabu list. To avoid recomputing the same embedding, it saves the results in a Scoresheet, an EvalNE structure, which is stored in the results folder.
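
The general idea can be sketched as follows. This is a generic tabu search written for illustration, not the code in tuning.py: the neighborhood rule (additive steps for integers, multiplicative steps for floats), the `tabu_size` parameter, the `freeze` helper and the toy objective are all assumptions, and this variant keeps recently visited settings in the tabu list. A plain dict stands in for the Scoresheet as a cache of already evaluated settings.

```python
def freeze(params):
    """Hashable, order-independent key for a parameter dict."""
    return tuple(sorted(params.items()))


def neighbors(params, scale):
    """Candidate moves that change one parameter at a time (assumed rule)."""
    out = []
    for name, step in scale.items():
        value = params[name]
        if isinstance(value, int):
            moves = (value - step, value + step)  # additive step for counts
        else:
            moves = (value / step, value * step)  # multiplicative step for floats
        out.extend({**params, name: m} for m in moves if m > 0)
    return out


def tabu_search(objective, seed, scale, iters=20, tabu_size=4):
    cache = {}  # memo of evaluated settings, in the spirit of the Scoresheet

    def evaluate(params):
        key = freeze(params)
        if key not in cache:
            cache[key] = objective(params)
        return cache[key]

    current = dict(seed)
    best, best_score = current, evaluate(current)
    tabu = [freeze(current)]
    for _ in range(iters):
        allowed = [n for n in neighbors(current, scale) if freeze(n) not in tabu]
        if not allowed:
            break
        # Move to the best allowed neighbour even if it is worse than `current`;
        # the tabu list keeps the search from stepping straight back.
        current = max(allowed, key=evaluate)
        tabu = (tabu + [freeze(current)])[-tabu_size:]
        if evaluate(current) > best_score:
            best, best_score = current, evaluate(current)
    return best, best_score, len(cache)


def toy_score(params):
    # Made-up objective that peaks at path_length=30, num_paths=20.
    return -abs(params["path_length"] - 30) - abs(params["num_paths"] - 20)


best, best_score, n_evaluated = tabu_search(
    toy_score,
    seed={"path_length": 20, "num_paths": 10},
    scale={"path_length": 5, "num_paths": 5},
)
print(best, best_score, n_evaluated)
```

In the real setting each call to the objective trains and tests an embedding, which takes minutes, so caching every evaluated setting matters far more than it does in this toy example.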


In [21]:

seed = {"path_length": 20, "num_paths": 10, "p": 0.1, "q": 0.1}
scale = {"path_length": 5, "num_paths": 5, "p": 10, "q": 10}

best, best_f1, best_time = tester.TabuSearchParams(method=method, dim=d, seed=seed, scale=scale, window=4)

best, best_f1, best_time

({'path_length': 20, 'num_paths': 10, 'p': 0.1, 'q': 0.01},
 0.9432085979443263,
 392.85595893859863)

There is also the helper function tabu_search, which repeats the search above for several dimensions and saves the final result.

In [51]:
tuning.tabu_search(grafos.Users_f,"nc","node2vec",seed=seed,scale=scale,dims=[10, 30, 50, 100, 300, 500],iters=2, window=4)

Loading SelfDefined Dataset 


| dim | name | score | time |
|---|---|---|---|
| 10 | {'path_length': 20, 'num_paths': 10, 'p': 0.1,... | 0.943209 | 392.855959 |
| 30 | {'path_length': 20, 'num_paths': 5, 'p': 1.0, ... | 0.955376 | 233.319358 |
| 50 | {'path_length': 20, 'num_paths': 5, 'p': 0.1, ... | 0.957289 | 246.374698 |
| 100 | {'path_length': 15, 'num_paths': 10, 'p': 0.1,... | 0.956473 | 238.504796 |
| 300 | {'path_length': 20, 'num_paths': 5, 'p': 0.1, ... | 0.957452 | 234.65797 |
| 500 | {'path_length': 15, 'num_paths': 5, 'p': 0.1, ... | 0.956982 | 243.938574 |



### Link prediction (users and properties)

The tuning module handles the link prediction task in the same way; the only extra requirement is a test graph.

In [102]:
tester=tuning.LinkPredictionTuning(grafos.B_f,grafos_test.B_f)
method="gf"
d=10
emb, time =tester.TrainModel(method, d)

Loading SelfDefined Dataset 
Start training...
total iter: 130
epoch 5: cost: 1152143.75; time used = 30.067538261413574s
epoch 10: cost: 1150676.75; time used = 26.675824403762817s
epoch 15: cost: 1147876.625; time used = 24.88351583480835s
epoch 20: cost: 1143396.5; time used = 26.69250512123108s
epoch 25: cost: 1136905.75; time used = 27.604138374328613s
epoch 30: cost: 1128104.125; time used = 25.65366220474243s
epoch 35: cost: 1116736.5; time used = 24.384759187698364s
epoch 40: cost: 1102606.875; time used = 27.297007083892822s
epoch 45: cost: 1085589.125; time used = 26.07245683670044s
epoch 50: cost: 1065631.75; time used = 25.653473377227783s
epoch 55: cost: 1042759.4375; time used = 26.844098567962646s
epoch 60: cost: 1017072.625; time used = 26.80748748779297s
epoch 65: cost: 988745.0625; time used = 27.113673210144043s
epoch 70: cost: 958020.0625; time used = 26.984359741210938s
epoch 75: cost: 925202.0625; time used = 26.71405291557312s
epoch 80: cost: 890645.625; time use

In [23]:
emb_df=pd.DataFrame(emb).T
emb_df.head()

| node | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|---|
| 721908 | 0.473702 | 0.449893 | 0.467521 | 0.399514 | -0.45017 | 0.485544 | 0.506561 | -0.480075 | 0.452618 | -0.087346 |
| -1528706 | 0.5051 | 0.245338 | 0.490223 | 0.47918 | -0.189687 | 0.510618 | 0.540161 | -0.510454 | 0.435768 | -0.336404 |
| 766188 | 0.552585 | 0.475195 | -0.468736 | -0.551207 | -0.55651 | 0.433805 | -0.524431 | -0.423653 | 0.550547 | 0.109962 |
| -1525271 | 0.548274 | -0.534137 | 0.540469 | -0.519164 | -0.52155 | -0.33279 | 0.508949 | 0.54394 | 0.555981 | -0.54981 |
| -1545836 | 0.563226 | -0.20542 | 0.502229 | -0.440989 | -0.534121 | -0.064253 | -0.504323 | 0.484225 | 0.530908 | -0.429431 |


In [31]:

results=tester.TestModel(emb,time=time,method_name="tutorialLP")
results

Loading SelfDefined Dataset 


[<evalne.evaluation.score.Results at 0x25e27a24630>,
 <evalne.evaluation.score.Results at 0x25e4277c048>,
 <evalne.evaluation.score.Results at 0x25e5bcfae80>,
 <evalne.evaluation.score.Results at 0x25ec0ff41d0>]

TestModel returns a list of Results, one for each edge embedding technique: l1, l2, hadamard, average. As before, pretty_print or tester.score can be used to obtain the AUROC score.
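
Each technique turns the two node embeddings of a candidate edge into a single feature vector. A minimal sketch, assuming the four names carry their usual meaning from the node2vec literature (EvalNE's implementation may differ in details); `edge_embedding` and the sample vectors are hypothetical.

```python
def edge_embedding(u, v, method):
    """Combine the embeddings of an edge's two endpoints, element by element."""
    if method == "average":
        return [(a + b) / 2 for a, b in zip(u, v)]
    if method == "hadamard":
        return [a * b for a, b in zip(u, v)]
    if method == "l1":
        return [abs(a - b) for a, b in zip(u, v)]
    if method == "l2":
        return [(a - b) ** 2 for a, b in zip(u, v)]
    raise ValueError(f"unknown edge embedding method: {method}")


# Two made-up 2-dimensional node embeddings.
u, v = [1.0, 2.0], [3.0, -2.0]
features = {m: edge_embedding(u, v, m) for m in ("l1", "l2", "hadamard", "average")}
print(features)
```

A binary classifier is then trained on these edge vectors to separate true edges from non-edges, which is why each technique gets its own Results object.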

In [34]:
results[2].pretty_print()

Method: tutorialLP
Parameters: 
dict_items([('dim', 10), ('edge_embed_method', 'hadamard'), ('train_frac', 0.9153803171055389), ('split_alg', 'spanning_tree'), ('owa', True), ('fe_ratio', 1.0000190759604746), ('nw_name', 'GPI'), ('split_id', 0), ('eval_time', 588.83686876297)])
Test scores: 
tn = 4082
fp = 765
fn = 2940
tp = 1906
auroc = 0.6728507262385838
precision = 0.7135904155746912
recall = 0.3933140734626496
fallout = 0.15782958531050134
miss = 0.41868413557391054
accuracy = 0.6177653977096874
f_score = 0.5071172010110416
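
Most of the scores above follow directly from the four confusion-matrix counts. The sketch below recomputes them as a sanity check; `confusion_metrics` is a hypothetical helper, and the formula used for `miss` is inferred from the fact that it reproduces the reported number, not taken from the EvalNE source. AUROC cannot be recovered from a single confusion matrix, so it is left out.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Derive threshold-based scores from binary confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "precision": precision,
        "recall": recall,
        "fallout": fp / (fp + tn),  # false positive rate
        # Inferred: fn / (fn + tn) matches the reported `miss`;
        # the more common miss rate fn / (fn + tp) does not.
        "miss": fn / (fn + tn),
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "f_score": 2 * precision * recall / (precision + recall),
    }


# Counts copied from the pretty_print output above.
metrics = confusion_metrics(tn=4082, fp=765, fn=2940, tp=1906)
for name, value in metrics.items():
    print(f"{name} = {value:.6f}")
```

The low recall next to a fairly high precision shows that, at the default threshold, the classifier misses most of the true edges even though its positive predictions are mostly correct.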



In [38]:
tester.score("tutorialLP", metric="hadamard")

0.6728507262385838

Apart from the above and the need for a test graph, the implementation is identical for LP and NC.

In [47]:
tuning.tabu_search(grafos.B_f,"lp","gf",G_test=grafos_test.B_f)


Loading SelfDefined Dataset 


| dim | name | l1 | l2 | hadamard | average | time |
|---|---|---|---|---|---|---|
| 10 | {} | 0.645279 | 0.625057 | 0.67053 | 0.514211 | 606.5566 |
| 30 | {} | 0.594457 | 0.571272 | 0.722769 | 0.556928 | 644.451263 |
| 50 | {} | 0.546418 | 0.545177 | 0.752006 | 0.614511 | 627.434258 |
| 100 | {} | 0.561184 | 0.574624 | 0.763409 | 0.596196 | 638.739298 |
| 300 | {} | 0.624842 | 0.63262 | 0.805607 | 0.63651 | 791.528123 |
| 500 | {} | 0.651426 | 0.656475 | 0.822835 | 0.656945 | 909.298197 |
