Colab TPU : process terminated with signal SIGKILL #1590

Closed
astariul opened this issue Apr 24, 2020 · 5 comments
Labels
bug (Something isn't working) · help wanted (Open to be worked on)

Comments

@astariul
Contributor

astariul commented Apr 24, 2020

🐛 Bug

I'm trying to train BART (with the transformers library) on a Colab TPU. I followed the TPU documentation of PyTorch Lightning, but before training can start, I receive the following error:

Exception: process 0 terminated with signal SIGKILL

To Reproduce

I'm using the official text-summarization example from the transformers library: https://github.com/huggingface/transformers/blob/master/examples/summarization/bart/finetune.py
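For reference, here is a minimal sketch (not the finetune.py script itself) of the Lightning TPU wiring involved; the toy model and random data are placeholders, and `num_tpu_cores=8` is the 0.7.x-era Trainer argument that makes Lightning spawn one process per TPU core via torch_xla's `xmp.spawn`, which is where the SIGKILL surfaces:

```python
# Minimal sketch, assuming pytorch-lightning 0.7.x and a Colab TPU runtime with
# torch_xla installed. The tiny model and random data stand in for the real
# BART/Electra fine-tuning script; only the Trainer/TPU wiring matters here.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class ToyModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def forward(self, x):
        return self.layer(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return {"loss": F.cross_entropy(self(x), y)}

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)

    def train_dataloader(self):
        data = TensorDataset(torch.randn(256, 32), torch.randint(0, 2, (256,)))
        return DataLoader(data, batch_size=8, num_workers=4)  # 4 workers, as in the failing run


# num_tpu_cores=8 spawns 8 processes; process 0 is the one killed with SIGKILL.
trainer = pl.Trainer(num_tpu_cores=8, max_epochs=1)
trainer.fit(ToyModule())
```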

Here is the full stack trace:

INFO:transformers.configuration_utils:loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/google/electra-base-discriminator/config.json from cache at /root/.cache/torch/transformers/9236d197566a7f1be2b2151f5afcc5a8e17f31e1e23c52f3cdf2340019986e78.3de31bca490b759d81268bc95fdc9ab61f970ee46716ae8b25a1f4f1aba766e7
INFO:transformers.configuration_utils:Model config ElectraConfig {
  "_num_labels": 2,
  "architectures": [
    "ElectraForPreTraining"
  ],
  "attention_probs_dropout_prob": 0.1,
  "bad_words_ids": null,
  "bos_token_id": null,
  "decoder_start_token_id": null,
  "do_sample": false,
  "early_stopping": false,
  "embedding_size": 768,
  "eos_token_id": null,
  "finetuning_task": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "is_decoder": false,
  "is_encoder_decoder": false,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1
  },
  "layer_norm_eps": 1e-12,
  "length_penalty": 1.0,
  "max_length": 20,
  "max_position_embeddings": 512,
  "min_length": 0,
  "model_type": "electra",
  "no_repeat_ngram_size": 0,
  "num_attention_heads": 12,
  "num_beams": 1,
  "num_hidden_layers": 12,
  "num_return_sequences": 1,
  "output_attentions": false,
  "output_hidden_states": false,
  "output_past": true,
  "pad_token_id": 0,
  "prefix": null,
  "pruned_heads": {},
  "repetition_penalty": 1.0,
  "task_specific_params": null,
  "temperature": 1.0,
  "top_k": 50,
  "top_p": 1.0,
  "torchscript": false,
  "type_vocab_size": 2,
  "use_bfloat16": false,
  "vocab_size": 30522
}

[The identical ElectraConfig block is loaded and printed a second time; omitted here.]

INFO:transformers.tokenization_utils:loading file https://s3.amazonaws.com/models.huggingface.co/bert/google/electra-base-discriminator/vocab.txt from cache at /root/.cache/torch/transformers/ff085885d4c95651587af553adadd34a26de8a663f2cef709635b48b3bed2bbd.542ce4285a40d23a559526243235df47c5f75c197f04f37d1a0c124c32c9a084
INFO:transformers.modeling_utils:loading weights file https://s3.amazonaws.com/models.huggingface.co/bert/google/electra-base-discriminator/pytorch_model.bin from cache at /root/.cache/torch/transformers/3c8e97e5021532563898ceb491dbfbc068ab4cb9eaa31f555990b9993e3228b4.b7514d01ce5acfe02313470cce3175018852a5e8cbcb8784268ab87dc21daf4c
INFO:transformers.modeling_utils:Weights from pretrained model not used in ElectraModel: ['electra.embeddings_project.weight', 'electra.embeddings_project.bias']
/usr/local/lib/python3.6/dist-packages/pytorch_lightning/utilities/warnings.py:18: RuntimeWarning: You have defined a `val_dataloader()` and have defined a `validation_step()`, you may also want to define `validation_epoch_end()` for accumulating stats.
  warnings.warn(*args, **kwargs)
=> Dataset loaded from cache (./cnndm/ElectraTokenizer_cache_train.pt)
=> Dataset loaded from cache (./cnndm/ElectraTokenizer_cache_val.pt)
INFO:lightning:training on 8 TPU cores
=> Dataset loaded from cache (./cnndm/ElectraTokenizer_cache_test.pt)
INFO:lightning:INIT TPU local core: 0, global rank: 0
INFO:lightning:INIT TPU local core: 2, global rank: 2
INFO:lightning:INIT TPU local core: 3, global rank: 3
INFO:lightning:INIT TPU local core: 1, global rank: 1
INFO:lightning:INIT TPU local core: 4, global rank: 4
INFO:lightning:INIT TPU local core: 5, global rank: 5
INFO:lightning:INIT TPU local core: 6, global rank: 6
INFO:lightning:INIT TPU local core: 7, global rank: 7
INFO:lightning:
    | Name                                              | Type              | Params
------------------------------------------------------------------------------------
0   | model                                             | ElectraModel      | 108 M 
1   | model.embeddings                                  | ElectraEmbeddings | 23 M  
2   | model.embeddings.word_embeddings                  | Embedding         | 23 M  
3   | model.embeddings.position_embeddings              | Embedding         | 393 K 
4   | model.embeddings.token_type_embeddings            | Embedding         | 1 K   
5   | model.embeddings.LayerNorm                        | LayerNorm         | 1 K   
6   | model.embeddings.dropout                          | Dropout           | 0     
7   | model.encoder                                     | BertEncoder       | 85 M  
8   | model.encoder.layer                               | ModuleList        | 85 M  
9   | model.encoder.layer.0                             | BertLayer         | 7 M   
10  | model.encoder.layer.0.attention                   | BertAttention     | 2 M   
11  | model.encoder.layer.0.attention.self              | BertSelfAttention | 1 M   
12  | model.encoder.layer.0.attention.self.query        | Linear            | 590 K 
13  | model.encoder.layer.0.attention.self.key          | Linear            | 590 K 
14  | model.encoder.layer.0.attention.self.value        | Linear            | 590 K 
15  | model.encoder.layer.0.attention.self.dropout      | Dropout           | 0     
16  | model.encoder.layer.0.attention.output            | BertSelfOutput    | 592 K 
17  | model.encoder.layer.0.attention.output.dense      | Linear            | 590 K 
18  | model.encoder.layer.0.attention.output.LayerNorm  | LayerNorm         | 1 K   
19  | model.encoder.layer.0.attention.output.dropout    | Dropout           | 0     
20  | model.encoder.layer.0.intermediate                | BertIntermediate  | 2 M   
21  | model.encoder.layer.0.intermediate.dense          | Linear            | 2 M   
22  | model.encoder.layer.0.output                      | BertOutput        | 2 M   
23  | model.encoder.layer.0.output.dense                | Linear            | 2 M   
24  | model.encoder.layer.0.output.LayerNorm            | LayerNorm         | 1 K   
25  | model.encoder.layer.0.output.dropout              | Dropout           | 0     
26  | model.encoder.layer.1                             | BertLayer         | 7 M   
27  | model.encoder.layer.1.attention                   | BertAttention     | 2 M   
28  | model.encoder.layer.1.attention.self              | BertSelfAttention | 1 M   
29  | model.encoder.layer.1.attention.self.query        | Linear            | 590 K 
30  | model.encoder.layer.1.attention.self.key          | Linear            | 590 K 
31  | model.encoder.layer.1.attention.self.value        | Linear            | 590 K 
32  | model.encoder.layer.1.attention.self.dropout      | Dropout           | 0     
33  | model.encoder.layer.1.attention.output            | BertSelfOutput    | 592 K 
34  | model.encoder.layer.1.attention.output.dense      | Linear            | 590 K 
35  | model.encoder.layer.1.attention.output.LayerNorm  | LayerNorm         | 1 K   
36  | model.encoder.layer.1.attention.output.dropout    | Dropout           | 0     
37  | model.encoder.layer.1.intermediate                | BertIntermediate  | 2 M   
38  | model.encoder.layer.1.intermediate.dense          | Linear            | 2 M   
39  | model.encoder.layer.1.output                      | BertOutput        | 2 M   
40  | model.encoder.layer.1.output.dense                | Linear            | 2 M   
41  | model.encoder.layer.1.output.LayerNorm            | LayerNorm         | 1 K   
42  | model.encoder.layer.1.output.dropout              | Dropout           | 0     
43  | model.encoder.layer.2                             | BertLayer         | 7 M   
44  | model.encoder.layer.2.attention                   | BertAttention     | 2 M   
45  | model.encoder.layer.2.attention.self              | BertSelfAttention | 1 M   
46  | model.encoder.layer.2.attention.self.query        | Linear            | 590 K 
47  | model.encoder.layer.2.attention.self.key          | Linear            | 590 K 
48  | model.encoder.layer.2.attention.self.value        | Linear            | 590 K 
49  | model.encoder.layer.2.attention.self.dropout      | Dropout           | 0     
50  | model.encoder.layer.2.attention.output            | BertSelfOutput    | 592 K 
51  | model.encoder.layer.2.attention.output.dense      | Linear            | 590 K 
52  | model.encoder.layer.2.attention.output.LayerNorm  | LayerNorm         | 1 K   
53  | model.encoder.layer.2.attention.output.dropout    | Dropout           | 0     
54  | model.encoder.layer.2.intermediate                | BertIntermediate  | 2 M   
55  | model.encoder.layer.2.intermediate.dense          | Linear            | 2 M   
56  | model.encoder.layer.2.output                      | BertOutput        | 2 M   
57  | model.encoder.layer.2.output.dense                | Linear            | 2 M   
58  | model.encoder.layer.2.output.LayerNorm            | LayerNorm         | 1 K   
59  | model.encoder.layer.2.output.dropout              | Dropout           | 0     
60  | model.encoder.layer.3                             | BertLayer         | 7 M   
61  | model.encoder.layer.3.attention                   | BertAttention     | 2 M   
62  | model.encoder.layer.3.attention.self              | BertSelfAttention | 1 M   
63  | model.encoder.layer.3.attention.self.query        | Linear            | 590 K 
64  | model.encoder.layer.3.attention.self.key          | Linear            | 590 K 
65  | model.encoder.layer.3.attention.self.value        | Linear            | 590 K 
66  | model.encoder.layer.3.attention.self.dropout      | Dropout           | 0     
67  | model.encoder.layer.3.attention.output            | BertSelfOutput    | 592 K 
68  | model.encoder.layer.3.attention.output.dense      | Linear            | 590 K 
69  | model.encoder.layer.3.attention.output.LayerNorm  | LayerNorm         | 1 K   
70  | model.encoder.layer.3.attention.output.dropout    | Dropout           | 0     
71  | model.encoder.layer.3.intermediate                | BertIntermediate  | 2 M   
72  | model.encoder.layer.3.intermediate.dense          | Linear            | 2 M   
73  | model.encoder.layer.3.output                      | BertOutput        | 2 M   
74  | model.encoder.layer.3.output.dense                | Linear            | 2 M   
75  | model.encoder.layer.3.output.LayerNorm            | LayerNorm         | 1 K   
76  | model.encoder.layer.3.output.dropout              | Dropout           | 0     
77  | model.encoder.layer.4                             | BertLayer         | 7 M   
78  | model.encoder.layer.4.attention                   | BertAttention     | 2 M   
79  | model.encoder.layer.4.attention.self              | BertSelfAttention | 1 M   
80  | model.encoder.layer.4.attention.self.query        | Linear            | 590 K 
81  | model.encoder.layer.4.attention.self.key          | Linear            | 590 K 
82  | model.encoder.layer.4.attention.self.value        | Linear            | 590 K 
83  | model.encoder.layer.4.attention.self.dropout      | Dropout           | 0     
84  | model.encoder.layer.4.attention.output            | BertSelfOutput    | 592 K 
85  | model.encoder.layer.4.attention.output.dense      | Linear            | 590 K 
86  | model.encoder.layer.4.attention.output.LayerNorm  | LayerNorm         | 1 K   
87  | model.encoder.layer.4.attention.output.dropout    | Dropout           | 0     
88  | model.encoder.layer.4.intermediate                | BertIntermediate  | 2 M   
89  | model.encoder.layer.4.intermediate.dense          | Linear            | 2 M   
90  | model.encoder.layer.4.output                      | BertOutput        | 2 M   
91  | model.encoder.layer.4.output.dense                | Linear            | 2 M   
92  | model.encoder.layer.4.output.LayerNorm            | LayerNorm         | 1 K   
93  | model.encoder.layer.4.output.dropout              | Dropout           | 0     
94  | model.encoder.layer.5                             | BertLayer         | 7 M   
95  | model.encoder.layer.5.attention                   | BertAttention     | 2 M   
96  | model.encoder.layer.5.attention.self              | BertSelfAttention | 1 M   
97  | model.encoder.layer.5.attention.self.query        | Linear            | 590 K 
98  | model.encoder.layer.5.attention.self.key          | Linear            | 590 K 
99  | model.encoder.layer.5.attention.self.value        | Linear            | 590 K 
100 | model.encoder.layer.5.attention.self.dropout      | Dropout           | 0     
101 | model.encoder.layer.5.attention.output            | BertSelfOutput    | 592 K 
102 | model.encoder.layer.5.attention.output.dense      | Linear            | 590 K 
103 | model.encoder.layer.5.attention.output.LayerNorm  | LayerNorm         | 1 K   
104 | model.encoder.layer.5.attention.output.dropout    | Dropout           | 0     
105 | model.encoder.layer.5.intermediate                | BertIntermediate  | 2 M   
106 | model.encoder.layer.5.intermediate.dense          | Linear            | 2 M   
107 | model.encoder.layer.5.output                      | BertOutput        | 2 M   
108 | model.encoder.layer.5.output.dense                | Linear            | 2 M   
109 | model.encoder.layer.5.output.LayerNorm            | LayerNorm         | 1 K   
110 | model.encoder.layer.5.output.dropout              | Dropout           | 0     
111 | model.encoder.layer.6                             | BertLayer         | 7 M   
112 | model.encoder.layer.6.attention                   | BertAttention     | 2 M   
113 | model.encoder.layer.6.attention.self              | BertSelfAttention | 1 M   
114 | model.encoder.layer.6.attention.self.query        | Linear            | 590 K 
115 | model.encoder.layer.6.attention.self.key          | Linear            | 590 K 
116 | model.encoder.layer.6.attention.self.value        | Linear            | 590 K 
117 | model.encoder.layer.6.attention.self.dropout      | Dropout           | 0     
118 | model.encoder.layer.6.attention.output            | BertSelfOutput    | 592 K 
119 | model.encoder.layer.6.attention.output.dense      | Linear            | 590 K 
120 | model.encoder.layer.6.attention.output.LayerNorm  | LayerNorm         | 1 K   
121 | model.encoder.layer.6.attention.output.dropout    | Dropout           | 0     
122 | model.encoder.layer.6.intermediate                | BertIntermediate  | 2 M   
123 | model.encoder.layer.6.intermediate.dense          | Linear            | 2 M   
124 | model.encoder.layer.6.output                      | BertOutput        | 2 M   
125 | model.encoder.layer.6.output.dense                | Linear            | 2 M   
126 | model.encoder.layer.6.output.LayerNorm            | LayerNorm         | 1 K   
127 | model.encoder.layer.6.output.dropout              | Dropout           | 0     
128 | model.encoder.layer.7                             | BertLayer         | 7 M   
129 | model.encoder.layer.7.attention                   | BertAttention     | 2 M   
130 | model.encoder.layer.7.attention.self              | BertSelfAttention | 1 M   
131 | model.encoder.layer.7.attention.self.query        | Linear            | 590 K 
132 | model.encoder.layer.7.attention.self.key          | Linear            | 590 K 
133 | model.encoder.layer.7.attention.self.value        | Linear            | 590 K 
134 | model.encoder.layer.7.attention.self.dropout      | Dropout           | 0     
135 | model.encoder.layer.7.attention.output            | BertSelfOutput    | 592 K 
136 | model.encoder.layer.7.attention.output.dense      | Linear            | 590 K 
137 | model.encoder.layer.7.attention.output.LayerNorm  | LayerNorm         | 1 K   
138 | model.encoder.layer.7.attention.output.dropout    | Dropout           | 0     
139 | model.encoder.layer.7.intermediate                | BertIntermediate  | 2 M   
140 | model.encoder.layer.7.intermediate.dense          | Linear            | 2 M   
141 | model.encoder.layer.7.output                      | BertOutput        | 2 M   
142 | model.encoder.layer.7.output.dense                | Linear            | 2 M   
143 | model.encoder.layer.7.output.LayerNorm            | LayerNorm         | 1 K   
144 | model.encoder.layer.7.output.dropout              | Dropout           | 0     
145 | model.encoder.layer.8                             | BertLayer         | 7 M   
146 | model.encoder.layer.8.attention                   | BertAttention     | 2 M   
147 | model.encoder.layer.8.attention.self              | BertSelfAttention | 1 M   
148 | model.encoder.layer.8.attention.self.query        | Linear            | 590 K 
149 | model.encoder.layer.8.attention.self.key          | Linear            | 590 K 
150 | model.encoder.layer.8.attention.self.value        | Linear            | 590 K 
151 | model.encoder.layer.8.attention.self.dropout      | Dropout           | 0     
152 | model.encoder.layer.8.attention.output            | BertSelfOutput    | 592 K 
153 | model.encoder.layer.8.attention.output.dense      | Linear            | 590 K 
154 | model.encoder.layer.8.attention.output.LayerNorm  | LayerNorm         | 1 K   
155 | model.encoder.layer.8.attention.output.dropout    | Dropout           | 0     
156 | model.encoder.layer.8.intermediate                | BertIntermediate  | 2 M   
157 | model.encoder.layer.8.intermediate.dense          | Linear            | 2 M   
158 | model.encoder.layer.8.output                      | BertOutput        | 2 M   
159 | model.encoder.layer.8.output.dense                | Linear            | 2 M   
160 | model.encoder.layer.8.output.LayerNorm            | LayerNorm         | 1 K   
161 | model.encoder.layer.8.output.dropout              | Dropout           | 0     
162 | model.encoder.layer.9                             | BertLayer         | 7 M   
163 | model.encoder.layer.9.attention                   | BertAttention     | 2 M   
164 | model.encoder.layer.9.attention.self              | BertSelfAttention | 1 M   
165 | model.encoder.layer.9.attention.self.query        | Linear            | 590 K 
166 | model.encoder.layer.9.attention.self.key          | Linear            | 590 K 
167 | model.encoder.layer.9.attention.self.value        | Linear            | 590 K 
168 | model.encoder.layer.9.attention.self.dropout      | Dropout           | 0     
169 | model.encoder.layer.9.attention.output            | BertSelfOutput    | 592 K 
170 | model.encoder.layer.9.attention.output.dense      | Linear            | 590 K 
171 | model.encoder.layer.9.attention.output.LayerNorm  | LayerNorm         | 1 K   
172 | model.encoder.layer.9.attention.output.dropout    | Dropout           | 0     
173 | model.encoder.layer.9.intermediate                | BertIntermediate  | 2 M   
174 | model.encoder.layer.9.intermediate.dense          | Linear            | 2 M   
175 | model.encoder.layer.9.output                      | BertOutput        | 2 M   
176 | model.encoder.layer.9.output.dense                | Linear            | 2 M   
177 | model.encoder.layer.9.output.LayerNorm            | LayerNorm         | 1 K   
178 | model.encoder.layer.9.output.dropout              | Dropout           | 0     
179 | model.encoder.layer.10                            | BertLayer         | 7 M   
180 | model.encoder.layer.10.attention                  | BertAttention     | 2 M   
181 | model.encoder.layer.10.attention.self             | BertSelfAttention | 1 M   
182 | model.encoder.layer.10.attention.self.query       | Linear            | 590 K 
183 | model.encoder.layer.10.attention.self.key         | Linear            | 590 K 
184 | model.encoder.layer.10.attention.self.value       | Linear            | 590 K 
185 | model.encoder.layer.10.attention.self.dropout     | Dropout           | 0     
186 | model.encoder.layer.10.attention.output           | BertSelfOutput    | 592 K 
187 | model.encoder.layer.10.attention.output.dense     | Linear            | 590 K 
188 | model.encoder.layer.10.attention.output.LayerNorm | LayerNorm         | 1 K   
189 | model.encoder.layer.10.attention.output.dropout   | Dropout           | 0     
190 | model.encoder.layer.10.intermediate               | BertIntermediate  | 2 M   
191 | model.encoder.layer.10.intermediate.dense         | Linear            | 2 M   
192 | model.encoder.layer.10.output                     | BertOutput        | 2 M   
193 | model.encoder.layer.10.output.dense               | Linear            | 2 M   
194 | model.encoder.layer.10.output.LayerNorm           | LayerNorm         | 1 K   
195 | model.encoder.layer.10.output.dropout             | Dropout           | 0     
196 | model.encoder.layer.11                            | BertLayer         | 7 M   
197 | model.encoder.layer.11.attention                  | BertAttention     | 2 M   
198 | model.encoder.layer.11.attention.self             | BertSelfAttention | 1 M   
199 | model.encoder.layer.11.attention.self.query       | Linear            | 590 K 
200 | model.encoder.layer.11.attention.self.key         | Linear            | 590 K 
201 | model.encoder.layer.11.attention.self.value       | Linear            | 590 K 
202 | model.encoder.layer.11.attention.self.dropout     | Dropout           | 0     
203 | model.encoder.layer.11.attention.output           | BertSelfOutput    | 592 K 
204 | model.encoder.layer.11.attention.output.dense     | Linear            | 590 K 
205 | model.encoder.layer.11.attention.output.LayerNorm | LayerNorm         | 1 K   
206 | model.encoder.layer.11.attention.output.dropout   | Dropout           | 0     
207 | model.encoder.layer.11.intermediate               | BertIntermediate  | 2 M   
208 | model.encoder.layer.11.intermediate.dense         | Linear            | 2 M   
209 | model.encoder.layer.11.output                     | BertOutput        | 2 M   
210 | model.encoder.layer.11.output.dense               | Linear            | 2 M   
211 | model.encoder.layer.11.output.LayerNorm           | LayerNorm         | 1 K   
212 | model.encoder.layer.11.output.dropout             | Dropout           | 0     
213 | dropout                                           | Dropout           | 0     
214 | classifier                                        | Linear            | 769   
Validation sanity check: 0%
0/5 [00:00<?, ?it/s]
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
<ipython-input-22-ab45ae238b6a> in <module>()
     41 args = parser.parse_args(cmd_args)
     42 
---> 43 main(args)

5 frames
/usr/local/lib/python3.6/dist-packages/torch/multiprocessing/spawn.py in join(self, timeout)
    106                 raise Exception(
    107                     "process %d terminated with signal %s" %
--> 108                     (error_index, name)
    109                 )
    110             else:

Exception: process 0 terminated with signal SIGKILL

Environment

* CUDA:
	- GPU:
	- available:         False
	- version:           None
* Packages:
	- numpy:             1.18.2
	- pyTorch_debug:     False
	- pyTorch_version:   1.6.0a0+b889e0d
	- pytorch-lightning: 0.7.3
	- tensorboard:       2.2.1
	- tqdm:              4.38.0
* System:
	- OS:                Linux
	- architecture:
		- 64bit
		- 
	- processor:         x86_64
	- python:            3.6.9
	- version:           #1 SMP Wed Feb 19 05:26:34 PST 2020

Additional context

I saw issue #996, but I don't think that's the problem here, because my RAM does not appear to be full:

[Screenshot: Colab resource monitor showing RAM usage well below the limit]

astariul added the bug (Something isn't working) and help wanted (Open to be worked on) labels on Apr 24, 2020
@williamFalcon
Contributor

You need the memory-crash note at the top of the Colab, no?
@srush @dlibenzi

@dlibenzi

That message is almost certainly the Linux kernel OOM killer terminating the process with the largest memory footprint.
You can take a look at this thread, where I posted some notebooks that try to work around the limited RAM of Colab/Kaggle notebooks:

pytorch/xla#1870

Unfortunately, that approach is not our usual way of doing things, nor what Lightning adopts at the moment.
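One way to check the OOM-killer hypothesis from inside the notebook is sketched below; it assumes a Colab-like container where the kernel log is readable and psutil is installed, neither of which is guaranteed:

```python
# Diagnostic sketch: print current host-RAM usage and grep the kernel log for
# OOM-killer activity. Tools and permissions are assumptions about Colab.
import subprocess
import psutil

vm = psutil.virtual_memory()
print("RAM used: {:.1f} GB / {:.1f} GB".format(vm.used / 1e9, vm.total / 1e9))

try:
    out = subprocess.run(
        ["dmesg"], stdout=subprocess.PIPE, stderr=subprocess.PIPE,
        universal_newlines=True, check=True,
    ).stdout
    hits = [l for l in out.splitlines() if "oom" in l.lower() or "Killed process" in l]
    print("\n".join(hits) if hits else "No OOM-killer messages found")
except (OSError, subprocess.CalledProcessError) as err:
    print("Could not read the kernel log:", err)
```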

@astariul
Contributor Author

I could solve my specific problem by setting num_workers=1 in my DataLoader (instead of 4).
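For illustration, a minimal sketch of this workaround follows; the placeholder dataset just makes the snippet runnable, and the relevant change is `num_workers=1`, since each of the 8 TPU processes forks its own loader workers, so 8 × 4 workers can exhaust Colab's host RAM:

```python
# Sketch of the workaround, with a placeholder dataset standing in for the
# cached CNN/DM data. The only relevant change is num_workers=1 (down from 4).
import torch
from torch.utils.data import DataLoader, TensorDataset

train_dataset = TensorDataset(torch.randint(0, 30522, (64, 128)))
train_loader = DataLoader(
    train_dataset,
    batch_size=8,
    shuffle=True,
    num_workers=1,  # fewer worker processes per TPU core -> lower host-RAM footprint
)
```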

@Mykrass

Mykrass commented Apr 29, 2020

What is different between Colab and Kaggle? I had this problem only with Colab...

@talhaanwarch

> I could solve my specific problem by setting num_workers=1 in my DataLoader (instead of 4).

Did not work for me.
