Improved results by 'augmenting' matrix with fastRP algorithm #1

Open
Knorreman opened this issue Sep 28, 2023 · 4 comments

@Knorreman

Hello!

I tried using the CompanyKG graph together with the fastRP algorithm (https://arxiv.org/pdf/1908.11512.pdf).
I implemented the algorithm in Apache Spark: https://github.com/Knorreman/graphxfastRP/tree/master
Forgive the incomplete README etc... :)

Here is the result when using the 512-dimensional msBERT vectors as the initial vectors instead of randomly initialized ones.
{"source": "embed torch.Size([1169931, 512])", "sp_auc": 0.848861754181647, "sr_validation_acc": 0.6195652173913043, "sr_test_acc": 0.6532258064516129, "cr_topk_hit_rate": [0.227659109895952, 0.32893550163287005, 0.4052213868003342, 0.47640123034859877, 0.566618724842409, 0.6384498177261335, 0.7838241436925648, 0.850617072985494]}
I could not get any results for SimCSE and ADA2: because of their large size I ran into OOM problems on my PC. The msBERT run took about 8-10 hours with my Spark code... The fastRP algorithm is easy to implement in numpy/torch with much better performance, but I wanted to make it distributable with Spark! :)

I set alpha1 and alpha2 to 1.0 and also gave the starting vector a weight of 1.0 in the linear combination.
As you can see, 'sp_auc' and 'cr_topk_hit_rate' @50 and @100 are better than the results presented in the paper. However, 'sr_test_acc' is not quite as good.

GraphMAE has similar results on 'cr_topk_hit_rate' but is not as good on 'sp_auc'.

I didn't tune any hyperparameters for fastRP since I had so much trouble even getting it to work with such a large graph and vector size. So there may be even better results to gain from further tuning!

I hope you find it interesting! :) I can share the resulting torch matrix if I can figure out a good place to host it.

@Knorreman changed the title from "Improved results by 'augmenting' vector with fastRP algorithm" to "Improved results by 'augmenting' matrix with fastRP algorithm" on Sep 28, 2023
@cao-lele
Collaborator

cao-lele commented Oct 4, 2023

Hi! This is an interesting result from fastRP using the msBERT embeddings as initial vectors. If you manage to find a place to host this result (with reproduction procedures and utilities), we would be more than happy to link to it from our repo.
BRs//Lele

@Knorreman
Author

So I wrote the algorithm in Python in this repo: https://github.com/Knorreman/fastRP
And now I can run it with both simcse and ada2 as well!
All runs used a self weight (r0) of 1.0, and beta was set to -0.9 as described in the fastRP paper.
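For readers who want to see the idea in code, here is a minimal NumPy/SciPy sketch of a fastRP-style propagation that starts from existing embeddings instead of a random projection. It is not the code from the linked repo: the function name `fastrp_augment`, its parameters, the unweighted symmetric adjacency, and the omission of any per-iteration normalization are all assumptions made for illustration.

```python
import numpy as np
import scipy.sparse as sp


def fastrp_augment(edges, features, weights=(1.0, 1.0), self_weight=1.0, beta=-0.9):
    """Hypothetical fastRP-style augmentation of node features.

    edges:       (E, 2) integer array of undirected edges (assumed unweighted)
    features:    (N, D) dense array used as the initial vectors (e.g. msBERT)
    weights:     one weight per propagation step
    self_weight: weight of the unpropagated initial vectors (r0)
    beta:        exponent of the degree-based scaling
    """
    n = features.shape[0]
    # Build a symmetric adjacency matrix; duplicate edges are summed.
    rows = np.concatenate([edges[:, 0], edges[:, 1]])
    cols = np.concatenate([edges[:, 1], edges[:, 0]])
    adj = sp.csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n, n))

    deg = np.asarray(adj.sum(axis=1)).ravel()
    connected = deg > 0

    # Row-normalized transition matrix D^-1 A (isolated nodes keep zero rows).
    inv_deg = np.zeros_like(deg)
    inv_deg[connected] = 1.0 / deg[connected]
    transition = sp.diags(inv_deg) @ adj

    # Degree scaling (d_j / 2m)^beta applied to the initial vectors.
    scale = np.zeros_like(deg)
    scale[connected] = (deg[connected] / deg.sum()) ** beta
    current = sp.diags(scale) @ features

    # Linear combination of the initial vectors and each propagation step.
    out = self_weight * features
    for w in weights:
        current = transition @ current
        out = out + w * current
    return out
```

With `weights=(1.0, 1.0)`, `self_weight=1.0` and `beta=-0.9` this corresponds to the "[1.0, 1.0]" setting under the assumptions above. Each embedding column propagates independently, so the computation can be split over column blocks if memory is tight.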

To run msBERT with [1.0, 1.0] weights, run this command in the repo:
python src/run_fastRP.py --edges_path "/path/to/companykg/edges.pt" --embeddings_path "/path/to/companykg/msbert.pt" --weights 1.0,1.0 --output_path_prefix "/path/to/output/dir/"
Then use the eval script in this repo to get the results.

| base | weights | sp_auc | sr_test_acc | R@50 | R@100 |
|---|---|---|---|---|---|
| msBERT | [1.0] | 84.3% | 69.2% | 0.274 | 0.378 |
| msBERT | [1.0, 1.0] | 85.4% | 67.7% | 0.287 | 0.397 |
| msBERT | [1.0, 1.0, 0.25] | 85.7% | 67.7% | 0.275 | 0.393 |
| ada2 | [1.0] | 82.75% | 66.7% | 0.308 | 0.430 |
| ada2 | [1.0, 1.0] | 83.96% | 65.9% | 0.353 | 0.421 |
| simcse | [1.0] | 77.8% | 66.2% | 0.188 | 0.289 |
| simcse | [1.0, 1.0] | 79.6% | 65.1% | 0.253 | 0.325 |
| pause | [1.0] | 75.1% | 64.0% | 0.040 | 0.083 |
| pause | [1.0, 1.0] | 76.3% | 64.1% | 0.043 | 0.068 |

eval_results_fastRP.zip
These results show that the node neighbourhood contains interesting information that can be utilized.

@cao-lele
Collaborator

cao-lele commented Oct 6, 2023

Thanks a lot for the additional fastRP results. Good to see competitive results on the SR and SP tasks! I have now referenced your results in the README of our repo. See here: https://github.com/EQTPartners/CompanyKG#external-results

@Knorreman
Author

Thank you! :) I hope it is helpful! Now I will try and incorporate the edge weights somehow...
