Hello! Thanks for the detailed comment. I’ll work through the code and tell you what outputs I should get. I won’t have time this weekend though, so I’ll get back to you after that!
File ".../eICU-GNN-LSTM/src/dataloader/convert.py", line 75, in convert_into_mmap
write_file[n : n+arr_len, :] = arr # write into mmap
ValueError: could not broadcast input array from shape (62385,92) into shape (62385,57)
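For context, this ValueError is ordinary NumPy shape checking: the slice of the memmap has a different column count than the array being written into it. A minimal sketch, with toy stand-ins using the shapes from the traceback:

```python
import numpy as np

# Toy stand-ins: the memmap was created with 57 feature columns, but the
# array being written has 92 (shapes taken from the traceback above).
write_file = np.zeros((100, 57))
arr = np.ones((10, 92))

try:
    write_file[0:10, :] = arr  # column counts differ -> ValueError
except ValueError as exc:
    print("broadcast failed:", exc)
```

This usually indicates the memmap was created with a different feature dimension than the arrays later written into it, e.g. a mismatch between the preprocessing configuration and the stored data.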
Thank you for sharing! I will be looking into this soon. Apologies for the incredibly long delay in getting back to you, and I appreciate you sharing the solution here.
Hello
First, I would like to thank you for sharing the code of your awesome project. I am trying to run your code and reproduce your experiments, and I am currently facing some problems.
Here are the errors and my fixes:
[0]
File ".../eICU-GNN-LSTM/graph_construction/create_bert_graph.py", line 19, in make_graph_bert
distances = torch.cdist(batch, bert, p=2.0, compute_mode='use_mm_for_euclid_dist_if_necessary')
RuntimeError: cdist only supports floating-point dtypes, X1 got: Byte
Fix:
Changed the dtype from ByteTensor to FloatTensor in .../eICU-GNN-LSTM/graph_construction/create_graph.py, line 15 (commit 5167eea):
dtype = torch.cuda.sparse.FloatTensor if device.type == 'cuda' else torch.sparse.FloatTensor
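As a sanity check on that fix, a minimal sketch of the cast (the tensors here are made-up stand-ins for the byte-valued diagnosis matrices):

```python
import torch

# torch.cdist only accepts floating-point inputs, so cast uint8 tensors first.
batch = torch.randint(0, 2, (4, 8), dtype=torch.uint8)
bert = torch.randint(0, 2, (6, 8), dtype=torch.uint8)

distances = torch.cdist(batch.float(), bert.float(), p=2.0)
print(distances.shape)  # torch.Size([4, 6])
```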
File "/home/sale/eICU-GNN-LSTM/graph_construction/create_graph.py", line 65, in make_graph_penalise
s_pen = 5 * s - total_combined_diags # the 5 is fairly arbitrary but I don't want to penalise not sharing diagnoses too much
RuntimeError: The size of tensor a (89123) must match the size of tensor b (1000) at non-singleton dimension 1
Fix:
In .../eICU-GNN-LSTM/graph_construction/create_graph.py, line 194 (commit 5167eea), pass debug=False:
u, v, vals, k = make_graph_penalise(all_diagnoses, scores, debug=False, k=args.k)  # debug=False fixes the problem
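The error itself is ordinary elementwise shape checking: the debug path appears to truncate one operand, so the subtraction sees mismatched widths. A minimal sketch with small stand-in shapes (the real mismatch was 89123 vs 1000 columns):

```python
import torch

s = torch.rand(10, 891)                    # stand-in for the full score matrix
total_combined_diags = torch.rand(10, 10)  # stand-in for the truncated debug tensor

try:
    s_pen = 5 * s - total_combined_diags   # dim-1 sizes differ -> RuntimeError
except RuntimeError as exc:
    print("shape mismatch:", exc)
```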
[1]
File "../projects/eICU-GNN-LSTM/src/models/pyg_ns.py", line 241, in inference
edge_attn = torch.cat(edge_attn, dim=0) # [no. of edges, n_heads of that layer]
RuntimeError: There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::_cat. This usually means that this function requires a non-empty list of Tensors. Available functions are [CPU, CUDA, QuantizedCPU, BackendSelect, Named, AutogradOther, AutogradCPU, AutogradCUDA, AutogradXLA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, Tracer, Autocast, Batched, VmapMode].
Fix:
In .../projects/eICU-GNN-LSTM/src/models/pyg_ns.py, around line 241 (commit 5167eea), only concatenate when attention tensors were actually collected:
if i == 1 and get_attn:
    edge_index_w_self_loops = torch.cat(edge_index_w_self_loops, dim=1)  # [2, no. of edges]
if get_attn:
    edge_attn = torch.cat(edge_attn, dim=0)  # [no. of edges, n_heads of that layer]
    all_edge_attn.append(edge_attn)
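The underlying rule is that torch.cat requires a non-empty list of tensors, so the guard simply skips the call when nothing was collected. A minimal sketch (variable names are illustrative, not from the repo):

```python
import torch

collected = []  # e.g. per-layer attention tensors that may stay empty

# torch.cat([]) raises RuntimeError, so guard before concatenating.
merged = torch.cat(collected, dim=0) if collected else None

collected.append(torch.ones(3, 2))
merged = torch.cat(collected, dim=0)
print(merged.shape)  # torch.Size([3, 2])
```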
[2]
File "../eICU-GNN-LSTM/train_ns_lstmgnn.py", line 94, in validation_step
out = out[self.dataset.data.val_mask]
TypeError: only integer tensors of a single element can be converted to an index
Fix:
In .../eICU-GNN-LSTM/train_ns_lstmgnn.py, line 94 (commit 5167eea), index the first element of the returned tuple:
out = out[0][self.dataset.data.val_mask]
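The TypeError arises because the boolean mask is being applied to a tuple rather than a tensor; presumably the model returns more than one output (an assumption on my part), so the tensor has to be unpacked first. A minimal sketch:

```python
import torch

out = (torch.rand(5, 2), torch.rand(5, 2))  # made-up tuple of model outputs
val_mask = torch.tensor([True, False, True, False, True])

# out[val_mask] raises TypeError: a tuple only accepts integer indices.
masked = out[0][val_mask]  # unpack the tensor, then apply the mask
print(masked.shape)  # torch.Size([3, 2])
```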
[3]
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
Fix:
In the same file, .../eICU-GNN-LSTM/train_ns_lstmgnn.py, around line 96 (commit 5167eea), added the following lines to zero out NaN values:
out[out != out] = 0
out_lstm[out_lstm != out_lstm] = 0
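The x[x != x] = 0 idiom works because NaN is the only value that compares unequal to itself; in recent PyTorch versions torch.nan_to_num does the same thing more explicitly. A minimal sketch:

```python
import torch

out = torch.tensor([1.0, float('nan'), 3.0])

# NaN != NaN, so this mask selects exactly the NaN entries.
out[out != out] = 0
print(out)  # tensor([1., 0., 3.])

# Equivalent, more explicit form (PyTorch >= 1.8):
out2 = torch.nan_to_num(torch.tensor([1.0, float('nan'), 3.0]), nan=0.0)
```

Note that zeroing NaNs only masks the symptom; if the network keeps producing NaNs, the loss stays nan.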
I added these because, when I printed those matrices, I found some NaN values.
After this, the code starts training, but with weird training progress (the loss is always nan).
Printing the output matrices shows that they are always NaNs:
acc: 0.9049
prec0: 0.9049
prec1: nan
rec0: 1.0000
rec1: 0.0000
auroc: 0.5000
auprc: 0.5476
minpse: 0.0951
f1macro: 0.4750
Epoch 1: 92%|█████████████████████████████████████████████████████████████████████████████████▎ | 452/489 [00:35<00:02, 12.78it/s, loss=nan, v_num=83]
I tried to trace the source of the error, and the NaNs first appear after the LSTM layer, at this line:
eICU-GNN-LSTM/src/models/lstm.py, line 39 (commit 5167eea)
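To narrow down where NaNs first appear, one generic technique (not code from this repo; names below are my own) is to register a forward hook on every module and flag the first NaN output:

```python
import torch

def report_nan_outputs(model):
    """Print the name of any module whose (tensor) output contains NaNs."""
    def make_hook(name):
        def hook(module, inputs, output):
            # Modules like LSTM return tuples; only plain tensors are checked here.
            if isinstance(output, torch.Tensor) and torch.isnan(output).any():
                print(f"NaNs in output of: {name}")
        return hook
    for name, module in model.named_modules():
        if name:  # skip the root module itself
            module.register_forward_hook(make_hook(name))

# Usage sketch: poison a layer's weights so the hook fires.
model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU())
with torch.no_grad():
    model[0].weight.fill_(float('nan'))
report_nan_outputs(model)
out = model(torch.ones(1, 4))
```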
Please correct me if I'm wrong. Thanks a lot in advance!
Note: I have used the same versions of the packages listed in the requirements.txt file.