always killed by OS #119

Open
parkourcx opened this issue Sep 19, 2019 · 2 comments

Comments

@parkourcx

With batch size 64, the process is killed by the OS after 123 steps.

@parkourcx
Author

I ran the Elmo example with my own data, which is formatted the same way as the example data ("word[tab]tag" on each line).
training file: about 130 MB
training batch_size: tried values from 32 to 512
training epochs: 1
Elmo model: my own trained Elmo
my Elmo options file:
{"lstm": {"use_skip_connections": true, "projection_dim": 512, "cell_clip": 3, "proj_clip": 3, "dim": 4096, "n_layers": 2}, "char_cnn": {"activation": "relu", "filters": [[1, 32], [2, 32], [3, 64], [4, 128], [5, 256], [6, 512], [7, 1024]], "n_highway": 2, "embedding": {"dim": 16}, "n_characters": 262, "max_characters_per_token": 50}}
other training options: left at defaults

OS: Ubuntu 18.04
Keras: 2.2.4
tensorflow-gpu: 1.13.1
GPU: Nvidia 1080 Ti (12 GB memory)
RAM: 128 GB

The situation: after training for some number of steps (depending on batch_size), the program is killed by the system, but I don't see a system or GPU memory leak.
The question: how did that happen? What did I do wrong? Is my batch_size set too high, or is my training data too big? Can someone help?
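
A minimal diagnostic sketch (not from the original report) that could help narrow this down: log the process's resident memory at the end of each batch and see whether it grows steadily until the kernel's OOM killer steps in. It assumes Keras 2.2.4 as above plus the optional `psutil` package, which is not part of the original setup:

```python
# Hypothetical diagnostic sketch: print resident set size (RSS) after every batch
# to see whether host RAM usage grows until the kernel OOM killer terminates the job.
# Assumes Keras 2.2.4 and the optional `psutil` package.
import os

import psutil
from keras.callbacks import Callback


class MemoryLogger(Callback):
    """Log the training process's resident memory at the end of each batch."""

    def __init__(self):
        super(MemoryLogger, self).__init__()
        self.proc = psutil.Process(os.getpid())

    def on_batch_end(self, batch, logs=None):
        rss_gb = self.proc.memory_info().rss / 1024 ** 3
        print("batch %d: RSS %.2f GB" % (batch, rss_gb))


# Usage (hypothetical): model.fit(x, y, batch_size=64, epochs=1, callbacks=[MemoryLogger()])
```

After the process dies, `dmesg | grep -i "killed process"` (or `journalctl -k`) will show whether the kernel's OOM killer terminated it for exhausting host RAM, which is the usual cause of a silent "Killed" with no Python traceback.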

@parkourcx
Author

But I do see a lot of processes going on.
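
A hypothetical follow-up sketch (again assuming the optional `psutil` package): sum the resident memory of the training process and any worker processes it spawned, since many data-loading workers together can exhaust even 128 GB of RAM while no single process looks like it is leaking:

```python
# Hypothetical sketch: report per-process and total resident memory for the
# training process and all of its child processes.
# Assumes the optional `psutil` package (not part of the original setup).
import os

import psutil


def report_process_tree_memory():
    parent = psutil.Process(os.getpid())
    procs = [parent] + parent.children(recursive=True)
    total_rss = 0
    for p in procs:
        rss = p.memory_info().rss
        total_rss += rss
        print("pid %d (%s): %.2f GB" % (p.pid, p.name(), rss / 1024 ** 3))
    print("total RSS across %d processes: %.2f GB" % (len(procs), total_rss / 1024 ** 3))
```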
