Worker memory usage keeps increasing when running graphsage dist_train.py #43

Closed
fff-2013 opened this issue May 8, 2020 · 2 comments · Fixed by #46

fff-2013 commented May 8, 2020

Problem description

When I run the GraphSAGE dist_train.py example (Cora data), the worker memory usage keeps increasing:

[image: worker memory usage, Cora data]

When I train the model on our own data, which is a larger graph, the memory usage grows faster:

[image: worker memory usage, larger custom graph]

I suspect there is a memory leak. Maybe some objects from previous iterations are not being freed? Any advice or suggestions would be greatly appreciated.
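One way to confirm this kind of growth without relying on screenshots is to log the process's peak resident set size at intervals during training. The following is a minimal sketch, not part of graph-learn: `run_with_memory_log` and `step_fn` are hypothetical names, and `step_fn` stands in for whatever runs one training iteration. It uses the standard-library `resource` module, which is only available on Unix-like systems and reports `ru_maxrss` in kilobytes on Linux.

```python
import resource


def peak_rss():
    """Peak resident set size of this process (kilobytes on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def run_with_memory_log(step_fn, num_steps, log_every=100):
    """Call step_fn() num_steps times, sampling peak RSS every log_every steps.

    step_fn is a hypothetical placeholder for one training iteration.
    Returns a list of (step, peak_rss) pairs.
    """
    samples = []
    for step in range(1, num_steps + 1):
        step_fn()
        if step % log_every == 0:
            samples.append((step, peak_rss()))
    return samples
```

If the sampled values keep climbing long after the first few hundred steps, that points to memory being retained across iterations rather than to normal warm-up allocation.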

Environment information for cora data

docker image: registry.cn-zhangjiakou.aliyuncs.com/pai-image/graph-learn:v0.1-cpu

code path: /workspace/graph-learn/examples/tf/graphsage (in docker container)

config: 2 ps, 2 workers / batch size: 32 / epochs: 40000000

@baoleai baoleai added the bug Something isn't working label May 14, 2020
fff-2013 (Author) commented

After deleting the req/res pointers in edge_sampler.py and neighbor_sampler.py, the memory leak is gone.

diff --git a/graphlearn/python/sampler/edge_sampler.py b/graphlearn/python/sampler/edge_sampler.py
index 4103ca1..79ab314 100644
--- a/graphlearn/python/sampler/edge_sampler.py
+++ b/graphlearn/python/sampler/edge_sampler.py
@@ -83,6 +83,8 @@ class EdgeSampler(object):
                                   src_ids,
                                   dst_ids)
     edges.edge_ids = edge_ids
+    pywrap.del_get_edge_req(req)
+    pywrap.del_get_edge_res(res)
     return edges


diff --git a/graphlearn/python/sampler/neighbor_sampler.py b/graphlearn/python/sampler/neighbor_sampler.py
index 1757f12..b912e50 100644
--- a/graphlearn/python/sampler/neighbor_sampler.py
+++ b/graphlearn/python/sampler/neighbor_sampler.py
@@ -124,6 +124,8 @@ class NeighborSampler(object):
       current_batch_size = nbr_ids_flat.size

       src_ids = nbr_ids
+      pywrap.del_nbr_req(req)
+      pywrap.del_nbr_res(res)
     return layers

   def _make_req(self, index, src_ids):
@@ -200,4 +202,6 @@ class FullNeighborSampler(NeighborSampler):
       current_batch_size = nbr_ids_flat.size

       src_ids = nbr_ids
+      pywrap.del_nbr_req(req)
+      pywrap.del_nbr_res(res)
     return layers

When I was about to report this, I saw #46, which makes more extensive modifications. Awesome!
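The patch above frees each req/res pair explicitly at the end of a hop. A variation on the same idea is to tie the deletion to a context manager, so the handle is released even if an exception is raised in between. The sketch below is only an illustration under stated assumptions: `native_handle`, `FakeNative`, and `sample_hops` are hypothetical names that do not exist in graph-learn, and `FakeNative` merely simulates a native wrapper so the example runs on its own. In real code, the `delete` argument would be a function such as `pywrap.del_nbr_req` from the diff.

```python
import contextlib


@contextlib.contextmanager
def native_handle(create, delete):
    """Create a handle, yield it, and always call delete(handle) on exit."""
    handle = create()
    try:
        yield handle
    finally:
        delete(handle)


class FakeNative:
    """Hypothetical stand-in for a native wrapper; tracks live handles."""

    def __init__(self):
        self.live = set()
        self._next_id = 0

    def new(self):
        self._next_id += 1
        self.live.add(self._next_id)
        return self._next_id

    def delete(self, handle):
        self.live.remove(handle)


def sample_hops(native, num_hops):
    """Simulate one req/res pair per hop, each freed when its hop ends."""
    for _ in range(num_hops):
        with native_handle(native.new, native.delete) as req, \
             native_handle(native.new, native.delete) as res:
            # Both handles are alive only inside this block.
            assert req in native.live and res in native.live
    return len(native.live)
```

With this pattern the number of live handles returns to zero after every hop, which is the property the explicit `del_*` calls in the patch restore.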

jackonan (Collaborator) commented

@fff-2013 Sorry to trouble you and thanks for pointing out the problem. We've fixed it and you can try again.

@Seventeen17 Seventeen17 linked a pull request May 19, 2020 that will close this issue