always killed by OS #119

Open
parkourcx opened this issue Sep 19, 2019 · 2 comments

Comments

@parkourcx

With batch size 64, the process is killed by the OS after 123 steps.

@parkourcx
Author

I ran the Elmo example with my own data, which is formatted the same way as the example data ("word[tab]tag" on each line).
training file: about 130 MB
training batch_size: tried values from 32 to 512
training epochs: 1
Elmo model: my own trained Elmo
my Elmo options file:
{"lstm": {"use_skip_connections": true, "projection_dim": 512, "cell_clip": 3, "proj_clip": 3, "dim": 4096, "n_layers": 2}, "char_cnn": {"activation": "relu", "filters": [[1, 32], [2, 32], [3, 64], [4, 128], [5, 256], [6, 512], [7, 1024]], "n_highway": 2, "embedding": {"dim": 16}, "n_characters": 262, "max_characters_per_token": 50}}
other training options: left at defaults

OS: Ubuntu 18.04
Keras: 2.2.4
tensorflow-gpu: 1.13.1
GPU: Nvidia 1080 Ti (12 GB memory)
RAM: 128 GB

The situation: after training for some number of steps (depending on batch_size), the program is killed by the system, but I don't see a system or GPU memory leak.
The question: how did that happen? What did I do wrong? Is my batch_size set too high, or is my training data too big? Can someone help?
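
A minimal diagnostic sketch (not from the original report) that could help narrow this down: log the process's resident memory at the end of each batch and see whether it grows steadily until the kernel's OOM killer steps in. It assumes Keras 2.2.4 as above plus the optional `psutil` package, which is not part of the original setup:

```python
# Hypothetical diagnostic sketch: print resident set size (RSS) after every batch
# to see whether host RAM usage grows until the kernel OOM killer terminates the job.
# Assumes Keras 2.2.4 and the optional `psutil` package.
import os

import psutil
from keras.callbacks import Callback


class MemoryLogger(Callback):
    """Log the training process's resident memory at the end of each batch."""

    def __init__(self):
        super(MemoryLogger, self).__init__()
        self.proc = psutil.Process(os.getpid())

    def on_batch_end(self, batch, logs=None):
        rss_gb = self.proc.memory_info().rss / 1024 ** 3
        print("batch %d: RSS %.2f GB" % (batch, rss_gb))


# Usage (hypothetical): model.fit(x, y, batch_size=64, epochs=1, callbacks=[MemoryLogger()])
```

After the process dies, `dmesg | grep -i "killed process"` (or `journalctl -k`) will show whether the kernel's OOM killer terminated it for exhausting host RAM, which is the usual cause of a silent "Killed" with no Python traceback.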

@parkourcx
Author

But I do see a lot of processes going on.
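
A hypothetical follow-up sketch (again assuming the optional `psutil` package): sum the resident memory of the training process and any worker processes it spawned, since many data-loading workers together can exhaust even 128 GB of RAM while no single process looks like it is leaking:

```python
# Hypothetical sketch: report per-process and total resident memory for the
# training process and all of its child processes.
# Assumes the optional `psutil` package (not part of the original setup).
import os

import psutil


def report_process_tree_memory():
    parent = psutil.Process(os.getpid())
    procs = [parent] + parent.children(recursive=True)
    total_rss = 0
    for p in procs:
        rss = p.memory_info().rss
        total_rss += rss
        print("pid %d (%s): %.2f GB" % (p.pid, p.name(), rss / 1024 ** 3))
    print("total RSS across %d processes: %.2f GB" % (len(procs), total_rss / 1024 ** 3))
```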
