
"TypeError: not a sequence" when running "Minimal Start for Multilabel Classification" #42

Closed
GillesJ opened this issue Nov 13, 2019 · 3 comments

@GillesJ (Contributor) commented Nov 13, 2019

Describe the bug
I wanted to test this package for multilabel so I tried the example code for "Minimal Start for Multilabel Classification".

To Reproduce

  1. Copy the example code from README.md (reproduced here) and paste it into a file named multilabelmve.py:
from simpletransformers.classification import MultiLabelClassificationModel
import pandas as pd


# Train and Evaluation data needs to be in a Pandas Dataframe containing at least two columns, a 'text' and a 'labels' column. The `labels` column should contain multi-hot encoded lists.
train_data = [['Example sentence 1 for multilabel classification.', [1, 1, 1, 1, 0, 1]]] + [['This is another example sentence. ', [0, 1, 1, 0, 0, 0]]]
train_df = pd.DataFrame(train_data, columns=['text', 'labels'])
train_df = pd.DataFrame(train_data)

eval_data = [['Example eval sentence for multilabel classification.', [1, 1, 1, 1, 0, 1]], ['Another example eval sentence.', **0**], ['Example eval senntence belonging to class 2', [0, 1, 1, 0, 0, 0]]]
eval_df = pd.DataFrame(eval_data)

# Create a MultiLabelClassificationModel
model = MultiLabelClassificationModel('roberta', 'roberta-base', num_labels=6, args={'reprocess_input_data': True, 'overwrite_output_dir': True, 'num_train_epochs': 5})
print(train_df.head())

# Train the model
model.train_model(train_df)

# Evaluate the model
result, model_outputs, wrong_predictions = model.eval_model(eval_df)
print(result)
print(model_outputs)

predictions, raw_outputs = model.predict(['This thing is entirely different from the other thing. '])
print(predictions)
print(raw_outputs)
  2. From the active simpletransformers conda environment, run: python multilabelmve.py
  3. The model trains fine but fails during evaluation at line 21, model.eval_model(eval_df).
    Error trace:
Traceback (most recent call last):
  File "multilabelmve.py", line 21, in <module>
    result, model_outputs, wrong_predictions = model.eval_model(eval_df)
  File "/home/gilles/repos/simpletransformers/simpletransformers/classification/multi_label_classification_model.py", line 103, in eval_model
    return super().eval_model(eval_df, output_dir=output_dir, multi_label=multi_label, verbose=verbose, **kwargs)
  File "/home/gilles/repos/simpletransformers/simpletransformers/classification/classification_model.py", line 307, in eval_model
    result, model_outputs, wrong_preds = self.evaluate(eval_df, output_dir, multi_label=multi_label, **kwargs)
  File "/home/gilles/repos/simpletransformers/simpletransformers/classification/multi_label_classification_model.py", line 106, in evaluate
    return super().evaluate(eval_df, output_dir, multi_label=multi_label, prefix=prefix, **kwargs)
  File "/home/gilles/repos/simpletransformers/simpletransformers/classification/classification_model.py", line 337, in evaluate
    eval_dataset = self.load_and_cache_examples(eval_examples, evaluate=True)
  File "/home/gilles/repos/simpletransformers/simpletransformers/classification/multi_label_classification_model.py", line 109, in load_and_cache_examples
    return super().load_and_cache_examples(examples, evaluate=evaluate, no_cache=no_cache, multi_label=multi_label)
  File "/home/gilles/repos/simpletransformers/simpletransformers/classification/classification_model.py", line 446, in load_and_cache_examples
    all_label_ids = torch.tensor([f.label_id for f in features], dtype=torch.long)
TypeError: not a sequence

Expected behavior
Evaluation in the minimal example for multilabel classification works.
I figured out that the value of
[f.label_id for f in features] is [[1, 1, 1, 1, 0, 1], 0, [0, 1, 1, 0, 0, 0]], which is probably not correct, because the second entry is not a multi-hot encoded list but a plain int 0.
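
To illustrate, here is a minimal sketch (independent of simpletransformers, assuming only torch is installed) of why that mixed list fails: torch.tensor needs every row to be a sequence of the same length, so the bare int among the multi-hot lists raises an error (the exact wording can vary across PyTorch versions):

import torch

# Labels as produced by the broken eval_data: the second entry is a bare int,
# not a multi-hot list, so the rows do not form a uniform sequence.
label_ids = [[1, 1, 1, 1, 0, 1], 0, [0, 1, 1, 0, 0, 0]]

try:
    torch.tensor(label_ids, dtype=torch.long)
except (TypeError, ValueError) as e:
    print(e)  # here: "not a sequence"; newer PyTorch releases may word this differently

# With the bare 0 replaced by a six-element multi-hot list, the tensor builds cleanly.
fixed_label_ids = [[1, 1, 1, 1, 0, 1], [0, 0, 0, 0, 0, 0], [0, 1, 1, 0, 0, 0]]
print(torch.tensor(fixed_label_ids, dtype=torch.long).shape)  # torch.Size([3, 6])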

Desktop (please complete the following information):

  • Ubuntu 18.04
  • All requirements except Apex installed following README.md
@ThilinaRajapakse (Owner)

Thanks for this. Looks like there was a typo in the minimal example.

eval_data = [['Example eval sentence for multilabel classification.', [1, 1, 1, 1, 0, 1]], ['Another example eval sentence.', **0**], ['Example eval senntence belonging to class 2', [0, 1, 1, 0, 0, 0]]]

The labels on the second example are messed up. Should be good now!
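
Presumably the corrected line in the README now gives every eval example a six-element multi-hot list, along these lines (the labels chosen for the second sentence are only an assumption for illustration):

eval_data = [['Example eval sentence for multilabel classification.', [1, 1, 1, 1, 0, 1]], ['Another example eval sentence.', [0, 0, 1, 1, 0, 0]], ['Example eval senntence belonging to class 2', [0, 1, 1, 0, 0, 0]]]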

@GillesJ (Contributor, Author) commented Nov 13, 2019

I am making a PR to fix the README. While I am at it, should I also add all the examples as separate files in an examples/ directory at the project root? e.g. ./examples/multiclass.py, ./examples/multilabel.py, etc.

I think it would help discoverability of the project, since people could quickly test the code.

@ThilinaRajapakse (Owner)

Yeah, that would be great, thanks!
