TypeError: classification_report() got an unexpected keyword argument 'target_names' #93

Closed

guggio opened this issue Sep 18, 2019 · 7 comments
Labels: bug (Something isn't working)

guggio (Contributor) commented Sep 18, 2019

Hi guys,

I really love your pre-trained models and how easy they are to implement with FARM. I was classifying German text following the tutorial and was able to adapt it to my own dataset. However, for the BIO classification of my texts, I keep getting the same error message during training, despite closely following your tutorial code.

Thanks in advance,
Sebastian

Error message

TypeError                                 Traceback (most recent call last)
<ipython-input-21-17cc446f9014> in <module>()
----> 1 model = trainer.train(model)
1 frames
/usr/local/lib/python3.6/dist-packages/farm/eval.py in eval(self, model)
    121                 else:
    122                     result["report"] = report_fn(
--> 123                         label_all[head_num], preds_all[head_num], digits=4, target_names=head.label_list)
    124 
    125             all_results.append(result)
TypeError: classification_report() got an unexpected keyword argument 'target_names'

Here is my code, which worked fine until the training step:

# imports following the FARM tutorial (assumed; not shown in the original report)
from farm.data_handler.data_silo import DataSilo
from farm.data_handler.processor import NERProcessor
from farm.modeling.adaptive_model import AdaptiveModel
from farm.modeling.language_model import Bert
from farm.modeling.optimization import initialize_optimizer
from farm.modeling.prediction_head import TokenClassificationHead
from farm.modeling.tokenization import BertTokenizer
from farm.train import Trainer
from farm.utils import initialize_device_settings

tokenizer = BertTokenizer.from_pretrained(
    pretrained_model_name_or_path="bert-base-german-cased",
    do_lower_case=False)

ner_processor = NERProcessor(tokenizer=tokenizer, 
                             max_seq_len=512, 
                             data_dir="",
                             train_filename='train.txt',
                             dev_filename=None,
                             dev_split=0.1,
                             labels=["[PAD]","X","B","I","O"]
                             )
ner_labels = ["[PAD]","X", "B", "I", "O"]
ner_processor.add_task("ner", "seq_f1", ner_labels)

LAYER_DIMS = [768, 5]
ner_prediction_head = TokenClassificationHead(layer_dims=LAYER_DIMS)

BATCH_SIZE = 8
EMBEDS_DROPOUT_PROB = 0.1
LEARNING_RATE = 2e-5
WARMUP_PROPORTION = 0.1
N_EPOCHS = 1
N_GPU = 1

data_silo = DataSilo(
    processor=ner_processor,
    batch_size=BATCH_SIZE)

# device and language_model were defined earlier in the notebook,
# presumably along these lines (following the FARM tutorial):
device, n_gpu = initialize_device_settings(use_cuda=True)
language_model = Bert.load("bert-base-german-cased")

model = AdaptiveModel(
    language_model=language_model,
    prediction_heads=[ner_prediction_head],
    embeds_dropout_prob=EMBEDS_DROPOUT_PROB,
    lm_output_types=["per_token"],
    device=device)

optimizer, warmup_linear = initialize_optimizer(
    model=model,
    learning_rate=LEARNING_RATE,
    warmup_proportion=WARMUP_PROPORTION,
    n_batches=len(data_silo.loaders["train"]),
    n_epochs=N_EPOCHS)

trainer = Trainer(
    optimizer=optimizer,
    data_silo=data_silo,
    epochs=N_EPOCHS,
    n_gpu=N_GPU,
    warmup_linear=warmup_linear,
    device=device,
)
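
The training call from the traceback above is what then raises the TypeError:

model = trainer.train(model)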

System:

  • OS: Google Colab
  • GPU/CPU: GPU
  • FARM version: farm 0.2.0
guggio added the bug label Sep 18, 2019
guggio (Contributor, Author) commented Sep 19, 2019

I guess the problem lies in the eval.py file. Since I am doing "per_token" classification, it uses seqeval.metrics.classification_report, which does not accept target_names as an argument.

from seqeval.metrics import classification_report as token_classification_report
from sklearn.metrics import classification_report

...

if self.classification_report:
    if head.ph_output_type == "per_token":
        report_fn = token_classification_report
    elif head.ph_output_type == "per_sequence":
        report_fn = classification_report
    elif head.ph_output_type == "per_token_squad":
        report_fn = lambda *args, **kwargs: "not Implemented"
    elif head.ph_output_type == "per_sequence_continuous":
        report_fn = r2_score
    else:
        raise NotImplementedError

    # CHANGE PARAMETERS, not all report_fn accept digits
    if head.ph_output_type == "per_sequence_continuous":
        result["report"] = report_fn(
            label_all[head_num], preds_all[head_num]
        )
    else:
        result["report"] = report_fn(
            label_all[head_num], preds_all[head_num], digits=4, target_names=head.label_list)
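
One way to guard this call (a sketch, not necessarily the fix that was merged into FARM) is to pass target_names only to the sklearn report, since seqeval's classification_report derives the entity labels from the BIO tags themselves:

# sketch of a possible guard; report_fn / head / label_all / preds_all as above
if head.ph_output_type == "per_sequence_continuous":
    result["report"] = report_fn(label_all[head_num], preds_all[head_num])
elif head.ph_output_type == "per_sequence":
    result["report"] = report_fn(
        label_all[head_num], preds_all[head_num],
        digits=4, target_names=head.label_list)
else:
    result["report"] = report_fn(
        label_all[head_num], preds_all[head_num], digits=4)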

Timoeller (Contributor) commented

Hey @guggio, thanks for reporting the issue and for tracking down the problematic code.

We already fixed this bug and are merging quite a few changes into master right now. Please update your FARM installation with a git pull if you installed via "pip install --editable .", and let me know whether that resolves your issue. Thanks!

Timoeller self-assigned this Sep 19, 2019
guggio (Contributor, Author) commented Sep 19, 2019

Hi @Timoeller, thanks a lot for the fix! I can run my BIO classification model now. However, there is an additional issue I faced:
My NERProcessor settings from above do not work anymore, since setting a dev_split != 0.0 throws the following error message:

/usr/local/lib/python3.6/dist-packages/farm/data_handler/data_silo.py in _get_dataset(self, filename)
     65                     dicts = random.shuffle(dicts)
     66 
---> 67         dict_batches_to_process = int(len(dicts) / self.multiprocessing_chunk_size)
     68         num_cpus = min(mp.cpu_count(), self.max_processes,  dict_batches_to_process) or 1
     69 
TypeError: object of type 'NoneType' has no len()
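
Incidentally, the dicts = random.shuffle(dicts) line visible in the traceback would itself set dicts to None whenever that branch runs, because random.shuffle shuffles in place and returns None. A minimal illustration:

import random

dicts = [{"text": "ein Satz"}, {"text": "noch einer"}]
dicts = random.shuffle(dicts)  # bug: shuffle() mutates in place and returns None
print(dicts)                   # None

dicts = [{"text": "ein Satz"}, {"text": "noch einer"}]
random.shuffle(dicts)          # correct: shuffle without reassigning
print(len(dicts))              # 2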

Timoeller (Contributor) commented

We changed the way the dev set is split off from the train set; there might be some issues there that didn't come up in our test pipeline. I will look into that next.

There also seems to be a small mistake in the classification report. I am currently working on that, too.

Timoeller (Contributor) commented Sep 19, 2019

Ok, we fixed the bug with the classification report.

The issue you reported with dev_split != 0.0 seems to come from an error in _file_to_dicts() in the processor.py file: the dicts returned from that function are None.
_file_to_dicts() calls a function read_ner_file() that has quite a few limitations on what it can read. Please have a look here to see examples of what the input format should look like.
Otherwise you could adjust _file_to_dicts() and create your own data processor.

Maybe you can post the input format of your NER data, so we can help you out.
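
For reference, read_ner_file() expects CoNLL-style data, one token and its tag per line with a blank line between sentences, roughly like this (a sketch using the B/I/O labels from above; exact details depend on the FARM version):

Sebastian B
Meier I
wohnt O
in O
Berlin B
. O

Der O
Text O
ist O
kurz O
. O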

guggio (Contributor, Author) commented Sep 19, 2019

Thanks a lot for the bugfix and your help :) I had to write two text files based on my dataframe (probably not the most efficient solution).
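
A conversion along those lines might look like this (a sketch with hypothetical column names sentence_id, token, and label; not the actual code from this thread):

import pandas as pd

# toy dataframe with one token per row; column names are assumptions
df = pd.DataFrame({
    "sentence_id": [0, 0, 0, 1, 1],
    "token": ["Sebastian", "wohnt", "hier", "Er", "arbeitet"],
    "label": ["B", "O", "O", "O", "O"],
})

def write_conll(frame, path):
    # write one "token label" pair per line, blank line between sentences
    with open(path, "w", encoding="utf-8") as f:
        for _, sent in frame.groupby("sentence_id"):
            for _, row in sent.iterrows():
                f.write(f"{row['token']} {row['label']}\n")
            f.write("\n")

write_conll(df, "train.txt")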

I was able to run the model on my data and it seems like I am achieving very promising results.

Thanks again for your great work!

guggio closed this as completed Sep 19, 2019
Timoeller (Contributor) commented

Nice! If you have good results to share, we are more than happy to celebrate with you :)
