Worker memory usage keeps increasing when running graphsage dist_train.py #43

fff-2013 · 2020-05-08T09:01:59Z

Problem description

When I run the GraphSAGE dist_train.py example (cora data), the worker memory usage keeps increasing.

When I train the model with our own data, which is a larger graph, the memory usage grows faster.

Is there a memory leak? Maybe some objects from previous iterations are not freed. Any advice or suggestions would be greatly appreciated.
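
One way to confirm this kind of growth is to log the process's peak resident set size at fixed intervals. The sketch below is a minimal illustration using Python's standard `resource` module (Unix only). `run_with_memory_log`, `step_fn`, and `leaky_step` are hypothetical names, not part of graph-learn; `step_fn` stands in for one training iteration.

```python
import resource


def peak_rss():
    """Peak resident set size of this process (kilobytes on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def run_with_memory_log(step_fn, num_steps, log_every=100):
    """Call step_fn() num_steps times and record peak RSS every log_every steps."""
    samples = []
    for step in range(1, num_steps + 1):
        step_fn()
        if step % log_every == 0:
            samples.append((step, peak_rss()))
    return samples


if __name__ == "__main__":
    leaked = []  # simulates objects that are never released

    def leaky_step():
        # Hypothetical stand-in for a training step that leaks 64 KiB
        leaked.append(bytearray(64 * 1024))

    for step, rss in run_with_memory_log(leaky_step, num_steps=1000, log_every=250):
        print(step, rss)
```

A peak RSS that keeps rising long after warm-up, at a rate that scales with batch or graph size, points to per-iteration allocations that are never released. Because the number reported is a peak, it cannot show memory being returned; it only shows whether the high-water mark keeps moving.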

Environment information for cora data

docker image: registry.cn-zhangjiakou.aliyuncs.com/pai-image/graph-learn:v0.1-cpu

code path: /workspace/graph-learn/examples/tf/graphsage (in docker container)

config: 2 ps, 2 workers / batch size: 32 / epoch: 40000000

fff-2013 · 2020-05-18T09:00:41Z

After deleting the req/res pointers in edge_sampler.py and neighbor_sampler.py, the memory leak is gone.

diff --git a/graphlearn/python/sampler/edge_sampler.py b/graphlearn/python/sampler/edge_sampler.py
index 4103ca1..79ab314 100644
--- a/graphlearn/python/sampler/edge_sampler.py
+++ b/graphlearn/python/sampler/edge_sampler.py
@@ -83,6 +83,8 @@ class EdgeSampler(object):
                                   src_ids,
                                   dst_ids)
     edges.edge_ids = edge_ids
+    pywrap.del_get_edge_req(req)
+    pywrap.del_get_edge_res(res)
     return edges


diff --git a/graphlearn/python/sampler/neighbor_sampler.py b/graphlearn/python/sampler/neighbor_sampler.py
index 1757f12..b912e50 100644
--- a/graphlearn/python/sampler/neighbor_sampler.py
+++ b/graphlearn/python/sampler/neighbor_sampler.py
@@ -124,6 +124,8 @@ class NeighborSampler(object):
       current_batch_size = nbr_ids_flat.size

       src_ids = nbr_ids
+      pywrap.del_nbr_req(req)
+      pywrap.del_nbr_res(res)
     return layers

   def _make_req(self, index, src_ids):
@@ -200,4 +202,6 @@ class FullNeighborSampler(NeighborSampler):
       current_batch_size = nbr_ids_flat.size

       src_ids = nbr_ids
+      pywrap.del_nbr_req(req)
+      pywrap.del_nbr_res(res)
     return layers
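
The deletes in the patch above run only if the code between allocation and deletion completes without raising. A possible hardening, sketched here under stated assumptions, is to wrap each native object in a context manager so the delete call runs on every exit path. `native_handle`, `fake_new_req`, `fake_del_req`, and `sample_once` are hypothetical names for illustration; in the samplers the destroy callback would be one of the `pywrap.del_*` functions shown in the diff.

```python
from contextlib import contextmanager


@contextmanager
def native_handle(create, destroy):
    """Create a native object and guarantee that destroy(handle) runs on exit."""
    handle = create()
    try:
        yield handle
    finally:
        destroy(handle)


# Hypothetical stand-ins for a native allocate/free pair.
live_handles = set()


def fake_new_req():
    handle = object()
    live_handles.add(handle)
    return handle


def fake_del_req(handle):
    live_handles.discard(handle)


def sample_once(fail=False):
    """Use a request handle; it is freed whether or not sampling succeeds."""
    with native_handle(fake_new_req, fake_del_req) as req:
        if fail:
            raise RuntimeError("sampling failed")
        return len(live_handles)  # number of handles alive during the call
```

With this pattern the number of live handles returns to zero after each call, including calls that raise, so a failed request does not leave a req/res pair allocated.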

When I came to provide feedback, I saw #46, which has more modifications. Awesome!

jackonan · 2020-05-19T03:00:47Z

@fff-2013 Sorry to trouble you and thanks for pointing out the problem. We've fixed it and you can try again.

baoleai added the bug Something isn't working label May 14, 2020

baoleai assigned Seventeen17 May 14, 2020

Seventeen17 linked a pull request May 19, 2020 that will close this issue

[fix #44] Fix memory leak. #46

Merged

Seventeen17 closed this as completed May 25, 2020
