
"TypeError: not a sequence" when running "Minimal Start for Multilabel Classification" #42

Closed
GillesJ opened this issue Nov 13, 2019 · 3 comments

@GillesJ (Contributor) commented Nov 13, 2019

Describe the bug
I wanted to test this package for multilabel so I tried the example code for "Minimal Start for Multilabel Classification".

To Reproduce

  1. Copy the example code from README.md (reproduced here) and paste it into a file named multilabelmve.py:
from simpletransformers.classification import MultiLabelClassificationModel
import pandas as pd


# Train and Evaluation data needs to be in a Pandas Dataframe containing at least two columns, a 'text' and a 'labels' column. The `labels` column should contain multi-hot encoded lists.
train_data = [['Example sentence 1 for multilabel classification.', [1, 1, 1, 1, 0, 1]]] + [['This is another example sentence. ', [0, 1, 1, 0, 0, 0]]]
train_df = pd.DataFrame(train_data, columns=['text', 'labels'])
train_df = pd.DataFrame(train_data)

eval_data = [['Example eval sentence for multilabel classification.', [1, 1, 1, 1, 0, 1]], ['Another example eval sentence.', **0**], ['Example eval senntence belonging to class 2', [0, 1, 1, 0, 0, 0]]]
eval_df = pd.DataFrame(eval_data)

# Create a MultiLabelClassificationModel
model = MultiLabelClassificationModel('roberta', 'roberta-base', num_labels=6, args={'reprocess_input_data': True, 'overwrite_output_dir': True, 'num_train_epochs': 5})
print(train_df.head())

# Train the model
model.train_model(train_df)

# Evaluate the model
result, model_outputs, wrong_predictions = model.eval_model(eval_df)
print(result)
print(model_outputs)

predictions, raw_outputs = model.predict(['This thing is entirely different from the other thing. '])
print(predictions)
print(raw_outputs)
  2. From the active simpletransformers conda environment, run: python multilabelmve.py
  3. The model trains fine but fails during evaluation at line 21, model.eval_model(eval_df).
    Error trace:
Traceback (most recent call last):
  File "multilabelmve.py", line 21, in <module>
    result, model_outputs, wrong_predictions = model.eval_model(eval_df)
  File "/home/gilles/repos/simpletransformers/simpletransformers/classification/multi_label_classification_model.py", line 103, in eval_model
    return super().eval_model(eval_df, output_dir=output_dir, multi_label=multi_label, verbose=verbose, **kwargs)
  File "/home/gilles/repos/simpletransformers/simpletransformers/classification/classification_model.py", line 307, in eval_model
    result, model_outputs, wrong_preds = self.evaluate(eval_df, output_dir, multi_label=multi_label, **kwargs)
  File "/home/gilles/repos/simpletransformers/simpletransformers/classification/multi_label_classification_model.py", line 106, in evaluate
    return super().evaluate(eval_df, output_dir, multi_label=multi_label, prefix=prefix, **kwargs)
  File "/home/gilles/repos/simpletransformers/simpletransformers/classification/classification_model.py", line 337, in evaluate
    eval_dataset = self.load_and_cache_examples(eval_examples, evaluate=True)
  File "/home/gilles/repos/simpletransformers/simpletransformers/classification/multi_label_classification_model.py", line 109, in load_and_cache_examples
    return super().load_and_cache_examples(examples, evaluate=evaluate, no_cache=no_cache, multi_label=multi_label)
  File "/home/gilles/repos/simpletransformers/simpletransformers/classification/classification_model.py", line 446, in load_and_cache_examples
    all_label_ids = torch.tensor([f.label_id for f in features], dtype=torch.long)
TypeError: not a sequence

Expected behavior
Evaluation in the minimal example for multilabel classification works.
I figured out that the value of
[f.label_id for f in features] is [[1, 1, 1, 1, 0, 1], 0, [0, 1, 1, 0, 0, 0]], which is probably not correct, because the second entry is not a multi-hot encoded list but a plain int 0.
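
To illustrate, here is a minimal sketch (independent of simpletransformers, assuming only torch is installed) of why that mixed list fails: torch.tensor needs every row to be a sequence of the same length, so the bare int among the multi-hot lists raises an error (the exact wording can vary across PyTorch versions):

import torch

# Labels as produced by the broken eval_data: the second entry is a bare int,
# not a multi-hot list, so the rows do not form a uniform sequence.
label_ids = [[1, 1, 1, 1, 0, 1], 0, [0, 1, 1, 0, 0, 0]]

try:
    torch.tensor(label_ids, dtype=torch.long)
except (TypeError, ValueError) as e:
    print(e)  # here: "not a sequence"; newer PyTorch releases may word this differently

# With the bare 0 replaced by a six-element multi-hot list, the tensor builds cleanly.
fixed_label_ids = [[1, 1, 1, 1, 0, 1], [0, 0, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0]]
print(torch.tensor(fixed_label_ids, dtype=torch.long).shape)  # torch.Size([3, 6])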

Desktop (please complete the following information):

  • Ubuntu 18.04
  • All requirements except Apex installed following README.md
@ThilinaRajapakse (Owner)

Thanks for this. Looks like there was a typo in the minimal example.

eval_data = [['Example eval sentence for multilabel classification.', [1, 1, 1, 1, 0, 1]], ['Another example eval sentence.', **0**], ['Example eval senntence belonging to class 2', [0, 1, 1, 0, 0, 0]]]

The labels on the second example are messed up. Should be good now!
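
Presumably the corrected line in the README now gives every eval example a six-element multi-hot list, along these lines (the labels chosen for the second sentence are only an assumption for illustration):

eval_data = [['Example eval sentence for multilabel classification.', [1, 1, 1, 1, 0, 1]], ['Another example eval sentence.', [0, 0, 1, 1, 0, 0]], ['Example eval senntence belonging to class 2', [0, 1, 1, 0, 0, 0]]]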

@GillesJ (Contributor, Author) commented Nov 13, 2019

I am making a PR to fix the README. While I am at it, should I also add all the examples as separate files in an examples/ directory at the project root? e.g. ./examples/multiclass.py, ./examples/multilabel.py, etc.

I think it would help discoverability of the project, since people could quickly test the code.

@ThilinaRajapakse (Owner)

Yeah, that would be great, thanks!
