[Memory Leak?] Memory usage keeps going up when generating superpoints, leading the system to kill the process #10
Comments
Hi Yan Xu,
Thanks for reporting your issue. I have never seen anything that looks like a memory leak in cut-pursuit. Can you monitor memory usage just before and right after the call to cut-pursuit? Also, maybe have a look at how cut-pursuit is called in SSP; there might be an unnecessary copy that explains what you observe.
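One way to perform this kind of check (a minimal sketch; `run_with_memory_report` and `cut_pursuit_call` are hypothetical names, not part of SSP or the cut-pursuit wrappers) is to bracket the call with `psutil` RSS readings and a `tracemalloc` snapshot diff:
```python
# Minimal memory-monitoring sketch. `cut_pursuit_call` is a hypothetical
# stand-in for however SSP actually invokes the cut-pursuit wrapper.
import tracemalloc

import psutil


def rss_mb():
    # Resident set size of the current process, in MB.
    return psutil.Process().memory_info().rss / (1024 * 1024)


def run_with_memory_report(cut_pursuit_call, *args, **kwargs):
    tracemalloc.start()
    snap_before = tracemalloc.take_snapshot()
    rss_before = rss_mb()

    result = cut_pursuit_call(*args, **kwargs)

    rss_after = rss_mb()
    snap_after = tracemalloc.take_snapshot()
    tracemalloc.stop()

    print(f"RSS before: {rss_before:.1f} MB, after: {rss_after:.1f} MB, "
          f"delta: {rss_after - rss_before:+.1f} MB")
    # Top Python-level allocation sites that grew across the call; memory
    # allocated inside the C++ extension only shows up in the RSS delta.
    for stat in snap_after.compare_to(snap_before, 'lineno')[:5]:
        print(stat)
    return result
```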
On 01/03/2024 07:43, Yan Xu wrote:
Thanks so much for open-sourcing such great work!
When I run the superpoint generation process using `parallel_cut_pursuit`, I notice that the memory usage keeps going up as I process multiple files; when it reaches the maximum memory capacity, the Python program gets killed by the system.
This blocks me from training SSP (supervised superpoint, CVPR 2019) for more epochs now that I have transplanted `parallel_cut_pursuit` as the optimization backend, although training speeds up a lot.
Does `parallel_cut_pursuit` really have a memory leak? How can I fix it?
```shell
#--- error training log
...
...
Epoch 2/50 (results_partition/xx/nsp800_f1000):
75%|█████████████████████████████████████████████▉ | 469/622 [28:23<11:04, 4.34s/it]
[1] 735656 killed python supervised_partition/train.py --config
```
Looking forward to your reply, thanks again~
Thank you for your reply. I checked the memory usage right before and after using cut-pursuit, and it turns out that the leakage is not within `parallel_cut_pursuit`. Here are the Python script and the output.
```python
#?--- print memory usage before cutp
process = psutil.Process()
memory_info = process.memory_full_info().rss / (1024 * 1024)
print(f"\ncurrent step memory usage (before cutp) {memory_info} MB")
#--- parallel cutpursuit 2019 - cp_kmpp_d0_dist
if cfg.pcp_type == 'cp_kmpp_d0_dist':
    # from sls_partition.pcutp_2019.python.wrappers.cp_kmpp_d0_dist import cp_kmpp_d0_dist
    from sls_partition.pcutp_2023.python.wrappers.cp_kmpp_d0_dist import cp_kmpp_d0_dist
    pred_in_component, x_c, pred_components, edges, times = cp_kmpp_d0_dist(
        1,
        ver_value,
        source_csr,
        target,
        edge_weights=edge_weights,
        vert_weights=node_size,
        coor_weights=coor_weights,
        min_comp_weight=cfg.cp_cutoff,
        cp_dif_tol=1e-2,
        cp_it_max=cfg.cp_iterations,
        split_damp_ratio=0.7,
        verbose=cfg.cp_verbose,
        max_num_threads=cfg.cp_num_threads,
        balance_parallel_split=True,
        compute_Time=True,
        compute_List=True,
        compute_Graph=True)
    #!--- free RAM
    del cp_kmpp_d0_dist
elif cfg.pcp_type == 'cp_d0_dist':
    #--- parallel cutpursuit 2024 - cp_d0_dist
    #- coor_weights = coor_weights | None
    from sls_partition.pcutp_2024.python.wrappers.cp_d0_dist import cp_d0_dist
    pred_in_component, x_c, pred_components, edges, times = cp_d0_dist(
        ver_value.shape[0],
        ver_value,
        source_csr,
        target,
        edge_weights=edge_weights,
        vert_weights=node_size,
        coor_weights=None,
        min_comp_weight=cfg.cp_cutoff,
        cp_dif_tol=1e-2,
        cp_it_max=cfg.cp_iterations,
        split_damp_ratio=0.7,
        verbose=cfg.cp_verbose,
        max_num_threads=cfg.cp_num_threads,
        balance_parallel_split=True,
        compute_Time=True,
        compute_List=True,
        compute_Graph=True)
    #!--- free RAM
    del cp_d0_dist
else:
    raise NotImplementedError('unknown pcutp type ' + cfg.pcp_type)
#?--- print memory usage after cutp
process = psutil.Process()
memory_info = process.memory_full_info().rss / (1024 * 1024)
print(f"current step memory usage (after cutp) {memory_info} MB")
```
```shell
...
...
current step memory usage (before collate) 1314.43359375 MB
current step memory usage (before train data load) 1330.3515625 MB
current step memory usage (before cutp) 1330.3515625 MB
current step memory usage (after cutp) 1332.97265625 MB
11%|███████▏ | 7/62 [00:04<00:37, 1.48it/s]
current step memory usage (before collate) 1334.26171875 MB
current step memory usage (before train data load) 1334.51953125 MB
current step memory usage (before cutp) 1334.51953125 MB
current step memory usage (after cutp) 1334.40625 MB
13%|████████▎ | 8/62 [00:05<00:34, 1.58it/s]
current step memory usage (before collate) 1335.6953125 MB
current step memory usage (before train data load) 1338.53125 MB
current step memory usage (before cutp) 1338.53125 MB
current step memory usage (after cutp) 1339.1015625 MB
...
...
current step memory usage (before collate) 1594.14453125 MB
current step memory usage (before train data load) 1594.14453125 MB
current step memory usage (before cutp) 1594.14453125 MB
current step memory usage (after cutp) 1594.4921875 MB
55%|██████████████████████████████████▌ | 34/62 [00:21<00:17, 1.58it/s]
current step memory usage (before collate) 1595.78125 MB
current step memory usage (before train data load) 1596.296875 MB
current step memory usage (before cutp) 1596.296875 MB
current step memory usage (after cutp) 1600.8046875 MB
...
...
```
To find what causes the memory leakage, I further converted the SSP backbone to a semantic segmentation network (backbone + MLP and a BCE loss), and then the leakage disappeared, so I think the leakage is caused by the loss-computing part of SSP rather than by the cut-pursuit part (the C++ part).
Further, I tried to `del` every variable used in the loss-computing part, but the leakage stays, so I think maybe the non-end-to-end character means that `pytorch` cannot release memory properly.
Thank you again for your quick reply!
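For illustration, a frequent cause of this kind of steady host-memory growth in PyTorch training loops is holding references to tensors that are still attached to the autograd graph, for example accumulating `loss` instead of `loss.item()` across iterations. A minimal sketch of the leaky pattern and one way to avoid it (the `model`, `criterion`, `optimizer`, and `loader` names are generic placeholders, not the actual SSP code):
```python
import gc


def train_epoch(model, criterion, optimizer, loader):
    running_loss = 0.0
    for batch, target in loader:
        optimizer.zero_grad()
        loss = criterion(model(batch), target)
        loss.backward()
        optimizer.step()

        # Leaky pattern: `running_loss += loss` would keep the autograd graph of
        # every iteration alive; `.item()` (or `.detach()`) releases it instead.
        running_loss += loss.item()

        # Drop references to large intermediates before the next iteration;
        # gc.collect() can help when reference cycles keep them alive.
        del loss
    gc.collect()
    return running_loss / max(len(loader), 1)
```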
Hi Yan Xu, that's what I thought! You can close this issue, and maybe report what you observed in the SPT repo.
Regards