segment fault #3

fengjiaxin · 2020-02-18T09:51:00Z

Hi, excuse me.
I ran into a new issue while training the model:
segmentation fault (core dumped).
Could you update the code? I have no idea how to solve this problem.

One more thing:
I think GLN/gln/mods/mol_gnn/gnn_family/utils.py could be updated by replacing cuda() with to(DEVICE).
Thanks a lot.
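
For illustration, the cuda() to to(DEVICE) change suggested above might look like the sketch below. This is not code from the GLN repo: the DEVICE constant and the to_device helper are hypothetical names, and the sketch assumes DEVICE is chosen once at import time.

```python
import torch

# Hypothetical module-level constant (not taken from the GLN repo):
# use the first visible GPU when CUDA is available, otherwise the CPU.
DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")


def to_device(tensor: torch.Tensor) -> torch.Tensor:
    """Move a tensor to DEVICE instead of calling tensor.cuda() directly."""
    return tensor.to(DEVICE)


x = to_device(torch.zeros(2, 3))
print(x.device)
```

With this pattern the same code path runs on machines without a GPU, which also makes it easier to check whether a crash is CUDA-related.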

Hanjun-Dai · 2020-02-18T09:57:42Z

could you please provide more details for the segfault?

fengjiaxin · 2020-02-18T10:02:58Z

./run_mf.sh: line 60: 9301 Segmentation fault (core dumped) python ../main.py -gm $gm -fp_degree 2 -neg_sample $neg_sample -att_type $att_type -gnn_out $gnn_out -tpl_enc $tpl_enc -subg_enc $subg_enc -latent_dim $msg_dim -bn $bn -gen_method $gen -retro_during_train $retro -neg_num $neg_size -embed_dim $embed_dim -readout_agg_type $graph_agg -act_func $act -act_last True -max_lv $lv -dropbox $dropbox -data_name $data_name -save_dir $save_dir -tpl_name $tpl_name -f_atoms $dropbox/cooked_$data_name/atom_list.txt -iters_per_val 3000 -gpu 1 -topk 50 -beam_size 50 -num_parts 1

There is no other information. I don't think it's an environment issue.

Hanjun-Dai · 2020-02-18T10:05:38Z

Are you able to run the test with the existing model dumps?

Hanjun-Dai · 2020-02-18T10:06:50Z

And did you modify the script?

I use -gpu 0 in the script. Please try with the vanilla code and see if that works.

fengjiaxin · 2020-02-18T10:27:12Z

I got another issue, a GPU CUDA error.
Were the ckpt files saved from a GPU?

fengjiaxin · 2020-02-18T11:28:59Z

I use -gpu 1. Did you save the model on GPU 0? Running the test script gives the following error:

Traceback (most recent call last):
  File "main_test.py", line 139, in <module>
    model = RetroGLN(cmd_args.dropbox, local_args.model_for_test)
  File "/home/fengjiaxin/GLN/gln/test/model_inference.py", line 43, in __init__
    self.gln.load_state_dict(torch.load(model_file))
  File "/home/fengjiaxin/.conda/envs/my-rdkit-env/lib/python3.6/site-packages/torch/serialization.py", line 426, in load
    return _load(f, map_location, pickle_module, **pickle_load_args)
  File "/home/fengjiaxin/.conda/envs/my-rdkit-env/lib/python3.6/site-packages/torch/serialization.py", line 613, in _load
    result = unpickler.load()
  File "/home/fengjiaxin/.conda/envs/my-rdkit-env/lib/python3.6/site-packages/torch/serialization.py", line 576, in persistent_load
    deserialized_objects[root_key] = restore_location(obj, location)
  File "/home/fengjiaxin/.conda/envs/my-rdkit-env/lib/python3.6/site-packages/torch/serialization.py", line 155, in default_restore_location
    result = fn(storage, location)
  File "/home/fengjiaxin/.conda/envs/my-rdkit-env/lib/python3.6/site-packages/torch/serialization.py", line 135, in _cuda_deserialize
    return storage_type(obj.size())
  File "/home/fengjiaxin/.conda/envs/my-rdkit-env/lib/python3.6/site-packages/torch/cuda/__init__.py", line 634, in _lazy_new
    return super(_CudaBase, cls).__new__(cls, *args, **kwargs)
RuntimeError: CUDA error: out of memory

Hanjun-Dai · 2020-02-18T19:12:59Z

Yes, it uses the GPU by default. Please always use -gpu 0 in your script.
If you want to use a different GPU, please set CUDA_VISIBLE_DEVICES instead.
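
As a rough sketch of the CUDA_VISIBLE_DEVICES approach: a process started with CUDA_VISIBLE_DEVICES=1 sees only physical GPU 1, and CUDA exposes it inside that process as device 0, so -gpu 0 keeps working. The run_on_gpu helper below is a hypothetical name, and the child command only echoes the variable; in practice it would be the training or test script.

```python
import os
import subprocess
import sys


def run_on_gpu(physical_gpu: int, argv: list) -> int:
    """Run argv in a child process that can only see one physical GPU."""
    # Copy the current environment and restrict the GPUs visible to the child.
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(physical_gpu))
    return subprocess.run(argv, env=env).returncode


# Stand-in child command that just prints the variable it received.
child = [sys.executable, "-c",
         "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])"]
exit_code = run_on_gpu(1, child)
```

The variable has to be set before the CUDA runtime is initialized, which is why it is passed through the child's environment rather than changed after the program has started using the GPU.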

fengjiaxin · 2020-02-24T07:38:27Z

Hi, I debugged the code. The error occurs at GLN/gln/graph_logic/soft_logic.py line 29, in jagged_forward:
graph_embed = graph_enc(list)
There is no other information.
Could you briefly introduce your code?
I cannot find the cause of the error.
Thanks.

fengjiaxin · 2020-02-24T09:13:50Z

Could you provide a Docker image? I think it would be useful.

Hanjun-Dai · 2020-02-25T00:35:32Z

graph_enc comes from another subpackage in this repo.

Can you first try without a GPU? Please take a look at this:
https://discuss.pytorch.org/t/on-a-cpu-device-how-to-load-checkpoint-saved-on-gpu-device/349

to see how to load a GPU dump onto the CPU.
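
A minimal sketch of loading a checkpoint onto the CPU, assuming the dump is a plain state_dict. The torch.nn.Linear model and the in-memory buffer are stand-ins for the real model and checkpoint file. The key point is the map_location argument of torch.load: without it, tensors are restored to the device they were saved from.

```python
import io

import torch

# Stand-in model and an in-memory "checkpoint file" for illustration only.
model = torch.nn.Linear(4, 2)
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# map_location remaps every stored tensor to the CPU while loading,
# regardless of the device it was saved from.
state = torch.load(buffer, map_location=torch.device("cpu"))

restored = torch.nn.Linear(4, 2)
restored.load_state_dict(state)
```

In model_inference.py the equivalent change would be passing map_location to the existing torch.load(model_file) call, as it appears in the traceback above.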

fengjiaxin · 2020-02-25T01:55:10Z

Hi, I debugged the training file and the test file
and got the same error in both. It is not a CUDA error.
Could you briefly introduce your code? Thanks.

Hanjun-Dai · 2020-02-25T20:56:01Z

If the error happens on that line, you may want to double-check
https://github.com/Hanjun-Dai/GLN/blob/master/gln/mods/mol_gnn/gnn_family/utils.py#L64

Note that different graph NN implementations override this function.
