Improved results by 'augmenting' matrix with fastRP algorithm #1

Knorreman · 2023-09-28T19:04:39Z

Hello!

I tried using the CompanyKG graph together with the fastRP algorithm: https://arxiv.org/pdf/1908.11512.pdf
I implemented the algorithm in Apache Spark: https://github.com/Knorreman/graphxfastRP/tree/master
Forgive me for the incomplete README etc... :)

Here is the result when using the msBERT 512 dim vector as init vector instead of a randomly initialized one.
{"source": "embed torch.Size([1169931, 512])", "sp_auc": 0.848861754181647, "sr_validation_acc": 0.6195652173913043, "sr_test_acc": 0.6532258064516129, "cr_topk_hit_rate": [0.227659109895952, 0.32893550163287005, 0.4052213868003342, 0.47640123034859877, 0.566618724842409, 0.6384498177261335, 0.7838241436925648, 0.850617072985494]}
I could not get any results for SimCSE and ADA2: because of their large size I ran into OOM problems on my PC. The msBERT run took about 8-10h with my Spark code... You can easily implement the fastRP algorithm in numpy/torch and get much better performance, but I wanted to make the algorithm distributable with Spark! :)

I set alpha1 and alpha2 to 1.0 and also gave the starting vector a weight of 1.0 in the linear combination.
As you can see, 'sp_auc' and 'cr_topk_hit_rate' @50 and @100 are better than the results presented in the paper. However, 'sr_test_acc' is not quite as good.
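
For readers who want to see what the linear combination looks like, here is a minimal NumPy sketch on a toy dense graph. It is not the Spark code from the linked repo. The function name `fastrp_with_init`, the dense adjacency matrix, and the per-hop L2 normalisation are assumptions made for illustration; degree (beta) scaling is left out here.

```python
import numpy as np


def fastrp_with_init(adj, init, weights, self_weight=1.0):
    """Propagate initial node embeddings over the graph and combine the hops.

    adj:         (n, n) dense adjacency matrix (toy-sized graphs only).
    init:        (n, d) initial embeddings, e.g. text embeddings per node.
    weights:     one weight per propagation hop (alpha1, alpha2, ...).
    self_weight: weight of the (normalised) starting vectors.
    """
    adj = np.asarray(adj, dtype=float)
    init = np.asarray(init, dtype=float)

    # Row-normalise the adjacency matrix into a random-walk transition matrix.
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0  # isolated nodes keep an all-zero row
    transition = adj / deg

    def l2_normalise(x):
        # Assumption: every hop is L2-normalised row-wise before it is combined.
        norms = np.linalg.norm(x, axis=1, keepdims=True)
        norms[norms == 0] = 1.0
        return x / norms

    current = l2_normalise(init)
    combined = self_weight * current
    for weight in weights:
        # One more hop: average the neighbours' current embeddings.
        current = l2_normalise(transition @ current)
        combined = combined + weight * current
    return combined


if __name__ == "__main__":
    # Path graph 0 - 1 - 2 with 2-dimensional starting vectors.
    adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
    init = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
    print(fastrp_with_init(adj, init, weights=[1.0, 1.0], self_weight=1.0))
```

With `weights=[1.0, 1.0]` and `self_weight=1.0` this corresponds to the setting described above: the starting vector plus two propagation hops, all weighted equally. A real run on the full graph would need a sparse adjacency matrix.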

GraphMAE has similar results with 'cr_topk_hit_rate' but not as good with 'sp_auc'

I didn't tune any hyperparameters for fastRP since I had so much trouble even getting it to work with such a large graph and vector size. So there may be even better results to gain by tuning it further!

I hope you find it interesting! :) And I can share the torch matrix I found if I can figure out a good host to upload it.


cao-lele · 2023-10-04T15:58:18Z

Hi! This is an interesting result from FastRP using the msBERT embedding as the initial vector. If you manage to find a place to host this result (with reproduction procedures and utilities), we would be more than happy to link to your result from our repo.
BRs//Lele

Knorreman · 2023-10-05T21:54:49Z

So I wrote the algorithm in python in this repo: https://github.com/Knorreman/fastRP
And now I can run it with both simcse and ada2 as well!
All runs used a self weight (r0) of 1.0, and beta was set to -0.9 as described in the fastRP paper.
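
As a rough illustration of what beta does, the sketch below scales each node's starting row by (d_i / 2m)^beta, which is my reading of the normalisation in the fastRP paper. The helper name `degree_scale` and the choice to zero out isolated nodes are assumptions, and the linked repo may apply beta differently.

```python
import numpy as np


def degree_scale(adj, matrix, beta=-0.9):
    """Scale row i of `matrix` by (d_i / 2m) ** beta.

    adj:    (n, n) dense adjacency matrix of an undirected graph.
    matrix: (n, d) starting matrix (random projection or text embeddings).
    beta:   normalisation strength; negative values down-weight hub nodes.
    """
    adj = np.asarray(adj, dtype=float)
    matrix = np.asarray(matrix, dtype=float)

    deg = adj.sum(axis=1)
    total = deg.sum()  # equals 2m for an undirected graph

    # Assumption: isolated nodes get a scale of 0 instead of dividing by zero.
    scale = np.zeros_like(deg)
    nonzero = deg > 0
    scale[nonzero] = (deg[nonzero] / total) ** beta
    return matrix * scale[:, None]


if __name__ == "__main__":
    # Star graph: node 0 is the hub, nodes 1 and 2 are leaves.
    adj = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]
    ones = np.ones((3, 2))
    print(degree_scale(adj, ones, beta=-0.9))
```

With a negative beta such as -0.9, the hub row ends up with a smaller scale than the leaf rows, so high-degree nodes contribute less to their neighbours during propagation.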

To run msBERT with [1.0, 1.0] weights, run this command in the repo:
python src/run_fastRP.py --edges_path "/path/to/companykg/edges.pt" --embeddings_path "/path/to/companykg/msbert.pt" --weights 1.0,1.0 --output_path_prefix "/path/to/output/dir/"
Then use the eval script in this (CompanyKG) repo to get the results.

base	weights	sp_auc	sr_test_acc	R@50	R@100
msBERT	[1.0]	84.3%	69.2%	0.274	0.378
msBERT	[1.0, 1.0]	85.4%	67.7%	0.287	0.397
msBERT	[1.0, 1.0, 0.25]	85.7%	67.7%	0.275	0.393
ada2	[1.0]	82.75%	66.7%	0.308	0.430
ada2	[1.0, 1.0]	83.96%	65.9%	0.353	0.421
simcse	[1.0]	77.8%	66.2%	0.188	0.289
simcse	[1.0, 1.0]	79.6%	65.1%	0.253	0.325
pause	[1.0]	75.1%	64.0%	0.040	0.083
pause	[1.0, 1.0]	76.3%	64.1%	0.043	0.068

eval_results_fastRP.zip
These results show that there is interesting information in the node neighbourhood that can be utilized

cao-lele · 2023-10-06T09:45:49Z

Thanks a lot for the additional fastRP results. Good to see competitive results on the SR and SP tasks! I have now referenced your results in the README of our repo. See here: https://github.com/EQTPartners/CompanyKG#external-results

Knorreman · 2023-10-06T11:50:44Z

Thank you! :) I hope it is helpful! Now I will try and incorporate the edge weights somehow...

Knorreman changed the title ~~Improved results by 'augmenting' vector with fastRP algorithm~~ Improved results by 'augmenting' matrix with fastRP algorithm Sep 28, 2023
