
CTM training fails. #21

Closed
cidrugHug8 opened this issue Jul 7, 2021 · 5 comments

@cidrugHug8

  • OCTIS version: 1.8.0
  • Python version: 3.8.10
  • Operating System: Ubuntu 20.04.2

Description

CTM training fails.

What I Did

from octis.dataset.dataset import Dataset
from octis.models.CTM import CTM
from octis.models.model import save_model_output

dataset = Dataset()
dataset.load_custom_dataset_from_folder(DATASET_PATH)
model = CTM(num_topics=TOPIC_SIZE)
model_output = model.train_model(dataset)
save_model_output(model_output, MODEL_OUTPUT_PATH)
save_model_output(model, MODEL_PATH)

The following error message was displayed:

Batches:  84%|████████████████████████████████████████████████████████████████████████████████████████▌                 | 21790/26093 [59:43<11:47,  6.08it/s]
Traceback (most recent call last):
  File "train.py", line 62, in <module>
    model = ProdLDA(num_topics=TOPIC_SIZE)
  File "/usr/local/lib/python3.8/dist-packages/octis/models/CTM.py", line 95, in train_model
    x_train, x_test, x_valid, input_size = self.preprocess(
  File "/usr/local/lib/python3.8/dist-packages/octis/models/CTM.py", line 175, in preprocess
    b_train = CTM.load_bert_data(bert_train_path, train, bert_model)
  File "/usr/local/lib/python3.8/dist-packages/octis/models/CTM.py", line 208, in load_bert_data
    bert_ouput = bert_embeddings_from_list(texts, bert_model)
  File "/usr/local/lib/python3.8/dist-packages/octis/models/contextualized_topic_models/utils/data_preparation.py", line 35, in bert_embeddings_from_list
    return np.array(model.encode(texts, show_progress_bar=True, batch_size=batch_size))
  File "/usr/local/lib/python3.8/dist-packages/sentence_transformers/SentenceTransformer.py", line 160, in encode
    out_features = self.forward(features)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/sentence_transformers/models/Transformer.py", line 51, in forward
    output_states = self.auto_model(**trans_features, return_dict=False)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 991, in forward
    encoder_outputs = self.encoder(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 582, in forward
    layer_outputs = layer_module(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 470, in forward
    self_attention_outputs = self.attention(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 401, in forward
    self_outputs = self.self(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 305, in forward
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)`
@cidrugHug8
Author

Thanks for the advice.
I'll check my data set.

@cidrugHug8
Author

corpus file: a .tsv file (tab-separated) that contains up to three columns, i.e. the document, the partition, and the label associated with the document (optional).

Is it acceptable to leave the data in the third column empty?

@cidrugHug8
Author

Python 3.8.10 (default, Jun  2 2021, 10:49:15) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from octis.dataset.dataset import Dataset
>>> dataset = Dataset()
>>> dataset.fetch_dataset("20NewsGroup")
>>> dataset.save('/home/root/20newsgroup')
>>> del dataset
>>> dataset = Dataset()
>>> dataset.load_custom_dataset_from_folder('/home/root/20newsgroup')
>>> from octis.dataset.dataset import Dataset
>>> from octis.models.CTM import CTM
>>> model = CTM(num_topics=25)
>>> model_output = model.train_model(dataset)
Batches:  77%|█████████████████████████████████████████████████████████████████████████████████████████████████████████▎                              | 89/115 [00:14<00:04,  5.95it/s]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.8/dist-packages/octis/models/CTM.py", line 95, in train_model
    x_train, x_test, x_valid, input_size = self.preprocess(
  File "/usr/local/lib/python3.8/dist-packages/octis/models/CTM.py", line 175, in preprocess
    b_train = CTM.load_bert_data(bert_train_path, train, bert_model)
  File "/usr/local/lib/python3.8/dist-packages/octis/models/CTM.py", line 208, in load_bert_data
    bert_ouput = bert_embeddings_from_list(texts, bert_model)
  File "/usr/local/lib/python3.8/dist-packages/octis/models/contextualized_topic_models/utils/data_preparation.py", line 35, in bert_embeddings_from_list
    return np.array(model.encode(texts, show_progress_bar=True, batch_size=batch_size))
  File "/usr/local/lib/python3.8/dist-packages/sentence_transformers/SentenceTransformer.py", line 160, in encode
    out_features = self.forward(features)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/sentence_transformers/models/Transformer.py", line 51, in forward
    output_states = self.auto_model(**trans_features, return_dict=False)
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 991, in forward
    encoder_outputs = self.encoder(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 582, in forward
    layer_outputs = layer_module(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 470, in forward
    self_attention_outputs = self.attention(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 401, in forward
    self_outputs = self.self(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/transformers/models/bert/modeling_bert.py", line 305, in forward
    attention_scores = torch.matmul(query_layer, key_layer.transpose(-1, -2))
RuntimeError: CUDA error: CUBLAS_STATUS_EXECUTION_FAILED when calling `cublasSgemmStridedBatched( handle, opa, opb, m, n, k, &alpha, a, lda, stridea, b, ldb, strideb, &beta, c, ldc, stridec, num_batches)`

Is the code wrong?

@silviatti
Collaborator

Yes, the third column can be missing.
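For example, a two-column corpus.tsv along these lines is valid (illustrative content; columns are tab-separated, and I'm assuming the usual train/val/test partition labels):

first preprocessed document	train
second preprocessed document	train
third preprocessed document	val
fourth preprocessed document	test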

I don't understand why you fetch the dataset, then save it, delete the variable, and reload it from disk. You could just call fetch_dataset and run the model. Do you do something in between? In any case, the code should work (I tried it on Colab).
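That is, something like this should be enough:

dataset = Dataset()
dataset.fetch_dataset("20NewsGroup")
model = CTM(num_topics=25)
model_output = model.train_model(dataset)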

I wonder if this is related to your GPU and CUDA version. Can you try to run the code on the CPU and see if it works?
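One way to do that, as a minimal sketch (assuming no device option is passed to CTM), is to hide the GPU from PyTorch via an environment variable set before torch is first imported:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = ""  # must run before torch is imported anywhere

from octis.dataset.dataset import Dataset
from octis.models.CTM import CTM

dataset = Dataset()
dataset.fetch_dataset("20NewsGroup")
model = CTM(num_topics=25)
model_output = model.train_model(dataset)  # with no GPU visible, embeddings and training run on CPU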

Thanks,

Silvia

@cidrugHug8
Author

I wanted to confirm that the dataset was correct, so I saved it once.

Can you try to run the code on the CPU and see if it works?

It worked well. Hmm.

I wonder if this is related to your GPU and CUDA version.

Your guess seems to be right. My environment is as follows:

GPU: Pascal TITAN X
Driver Version: 470.42.01
CUDA Version: 11.4

My CUDA version may be too high. Anyway, I'll try it with the CPU. Thank you!
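
For reference, a generic one-liner to check which CUDA version the installed PyTorch wheel was built against (which can differ from the 11.4 the driver reports):

python3 -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"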
