knn retrieval evaluation #770

Open · wants to merge 8 commits into main

Conversation

GentleZhu (Collaborator):

Issue #, if available:

Description of changes:
This is a draft PR because I found that recall@K is extremely low, and it seems the remapping goes wrong when reading the embeddings.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

GentleZhu added the draft label (only to be used by dev team - skips CI for small changes) on Mar 15, 2024
@@ -0,0 +1,128 @@
import torch as th
Contributor:

Please add a license header to each Python file.
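For reference, a minimal header sketch, assuming the repository uses the Apache-2.0 license; copy the exact wording from an existing GraphStorm source file rather than this sketch:

"""
    Copyright Contributors

    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at

        http://www.apache.org/licenses/LICENSE-2.0

    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
"""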

import torch as th
import time
import graphstorm as gs
from graphstorm.utils import is_distributed
Contributor:

It is better to group the graphstorm-related imports together. For imports, the usual order is:

1. system/builtin libraries such as os, time, etc.
2. pip packages
3. local code

Applied to this file, that would look roughly like the sketch below.
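A sketch of the reordered imports, using only the modules visible in this diff (faiss is assumed to be pip-installed):

import time

import faiss
import torch as th

import graphstorm as gs
from graphstorm.utils import is_distributed, setup_device
from graphstorm.model.utils import load_gsgnn_embeddings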

pred = set(pred)

overlap = len(pred & ground_truth)
#if overlap > 0:
Contributor:

Remove the commented-out code.

from graphstorm.utils import setup_device
from graphstorm.model.utils import load_gsgnn_embeddings

def calculate_recall(pred, ground_truth):
Contributor:

Can you add a description of how you compute recall to the function's docstring?
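A hedged sketch of what that docstring and computation might look like, inferred from the pred/overlap snippet above (the exact return value is an assumption):

def calculate_recall(pred, ground_truth):
    """Compute the recall of retrieved candidates against the ground truth.

    Recall = |pred ∩ ground_truth| / |ground_truth|, i.e. the fraction of
    true neighbors that appear among the retrieved top-K candidates.

    Parameters
    ----------
    pred : iterable
        Retrieved candidate node IDs, e.g. the top-K KNN search results.
    ground_truth : set
        The set of true neighbor node IDs for the query node.

    Returns
    -------
    float
        Recall in [0, 1].
    """
    pred = set(pred)
    overlap = len(pred & ground_truth)
    return overlap / len(ground_truth)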

index_dimension = embs[config.target_ntype].size(1)
# Number of clusters (higher values lead to better recall but slower search)
#nlist = 750
#quantizer = faiss.IndexFlatL2(index_dimension) # Use Flat index for quantization
Contributor:

Remove the commented-out code.
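For reference, the commented-out lines sketch an approximate IVF index as an alternative to the exact IndexFlatIP search used below; a minimal version of that alternative (nlist and the cluster-count trade-off come from the comments, the rest is an assumption):

import faiss

index_dimension = embs[config.target_ntype].size(1)
nlist = 750  # number of IVF clusters; higher values improve recall but slow the search
quantizer = faiss.IndexFlatL2(index_dimension)  # coarse quantizer assigning vectors to clusters
index = faiss.IndexIVFFlat(quantizer, index_dimension, nlist, faiss.METRIC_INNER_PRODUCT)
xb = embs[config.target_ntype].numpy()  # faiss expects float32 numpy arrays
index.train(xb)  # unlike flat indexes, IVF must be trained before adding vectors
index.add(xb)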

index = faiss.IndexFlatIP(index_dimension)
index.add(embs[config.target_ntype])

#print(scores.abs().mean())
Contributor:

Remove this debug print.

len_dataloader = max_num_batch = len(test_dataloader)
tensor = th.tensor([len_dataloader], device=device)
if is_distributed():
th.distributed.all_reduce(tensor, op=th.distributed.ReduceOp.MAX)
Contributor:

Do we need to make it distributed?
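For context, the snippet above aligns the batch count across workers so every rank runs the same number of iterations (avoiding deadlocks in collectives inside the loop). A standalone sketch of the pattern, with illustrative names:

import torch as th
import torch.distributed as dist

def sync_max_num_batches(num_local_batches, device):
    # Ranks may own different numbers of batches; taking the max lets all
    # ranks loop in lockstep. In a non-distributed run this is a no-op.
    tensor = th.tensor([num_local_batches], device=device)
    if dist.is_available() and dist.is_initialized():
        dist.all_reduce(tensor, op=dist.ReduceOp.MAX)
    return int(tensor.item())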

#print(blocks[0].edges(form='uv', etype='also_buy'))
#breakpoint()
# print(dgl.NID)
if 'also_buy' in blocks[0].etypes:
Contributor:

Is this implemented specifically for the Amazon Review dataset?
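If so, one way to avoid hard-coding the relation is to read it from the config; a hedged sketch where eval_etype is a hypothetical config field, not an existing GraphStorm option:

# 'eval_etype' is hypothetical; fall back to the currently hard-coded relation.
target_etype = getattr(config, 'eval_etype', 'also_buy')
if target_etype in blocks[0].etypes:
    src, dst = blocks[0].edges(form='uv', etype=target_etype)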

@@ -1065,6 +1066,32 @@ def save_full_node_embeddings(g, save_embed_path,

save_shuffled_node_embeddings(shuffled_embs, save_embed_path, save_embed_format)

def load_gsgnn_embeddings(emb_path):
Contributor:

Can you check whether load_pytorch_embedding would be useful here?
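For discussion, a minimal sketch of what a loader like this might do. The layout (one subdirectory per node type containing torch-saved .pt shards) is an assumption, not GraphStorm's documented on-disk format:

import os
import torch as th

def load_node_embeddings(emb_path):  # hypothetical helper, not the PR's implementation
    embs = {}
    for ntype in sorted(os.listdir(emb_path)):
        ntype_dir = os.path.join(emb_path, ntype)
        if not os.path.isdir(ntype_dir):
            continue
        # Shards are concatenated in file-name order, which is assumed to
        # match the node ID order after remapping.
        shards = sorted(
            os.path.join(ntype_dir, fname)
            for fname in os.listdir(ntype_dir)
            if fname.endswith('.pt')
        )
        embs[ntype] = th.cat([th.load(shard) for shard in shards], dim=0)
    return embs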

Labels: draft (only to be used by dev team - skips CI for small changes)
Projects: None yet
2 participants