TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]] #1429
Hi @DrRaja, can you help us reproduce the error?
Here's my code:

```python
import flash

# Create the DataModule

# Build the task

# Create the trainer and finetune the model
```

The first epoch runs until 73%, and then I get an error. This is the stack trace:

`TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]`

I've checked my dataset; it doesn't have any missing values or NULLs.
Thanks for reporting the error. I found this online; did you see it? I think that, maybe due to the
@uakarsh, thank you for the response. Yes, I came across that link while searching for a solution, but I didn't understand how or where I can set the token, since I'm not importing the tokenizer directly. Can you please help me with that?
Sure. If possible, can you share the dataset so that I can run it on Colab and try to debug it? Regards,
@uakarsh, thank you, I was able to get the code running. I was getting that error because of very short (2-3 character) strings in the text. Thank you very much for your help.
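The tokenizer error in this thread is commonly reported when an input is not a plain string (for example `None` or a float `NaN`, which is how pandas stores missing values), and here the culprit turned out to be very short strings. A quick scan of the text column before building the DataModule catches both cases. This is a minimal sketch: `find_bad_texts` and the `min_len=4` cutoff are hypothetical choices for illustration, not part of Flash.

```python
from typing import Any, List, Sequence


def find_bad_texts(texts: Sequence[Any], min_len: int = 4) -> List[int]:
    """Return indices of entries that are not strings or are shorter than min_len.

    A float NaN, None, or a number fails the isinstance check; whitespace is
    stripped before measuring length, so blank strings are flagged too.
    min_len=4 is an arbitrary, hypothetical cutoff for "very short" strings.
    """
    bad = []
    for i, text in enumerate(texts):
        if not isinstance(text, str) or len(text.strip()) < min_len:
            bad.append(i)
    return bad


texts = ["a perfectly normal comment", float("nan"), None, "ok", "   ", 42]
print(find_bad_texts(texts))  # [1, 2, 3, 4, 5]
```

With a pandas DataFrame you could pass the text column as a list, then drop or fix the flagged rows before creating the DataModule.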
@DrRaja Good to hear that you have debugged the issue. You may also want to check that the labels are correct. It could also be that, if the confidence is not above a certain threshold, an empty list is returned, or that there is a high class imbalance. We could use something like oversampling or undersampling for that, since I have also faced issues with imbalanced classes. How about adding a feature like that to Flash, @krshrimali?
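To make the imbalance suggestion concrete, here is a minimal sketch of plain random oversampling for a multi-label dataset stored as texts plus multi-hot label rows. It is not a Flash feature; `label_counts`, `naive_oversample`, and the `target` parameter are hypothetical names for illustration. It is deliberately naive: a duplicated row also carries its other labels, so frequent labels can grow as well.

```python
import random
from typing import List, Sequence, Tuple


def label_counts(labels: Sequence[Sequence[int]]) -> List[int]:
    """Number of positive examples for each label in a multi-hot label matrix."""
    n_labels = len(labels[0])
    return [sum(row[j] for row in labels) for j in range(n_labels)]


def naive_oversample(
    texts: Sequence[str], labels: Sequence[Sequence[int]], target: int, seed: int = 0
) -> Tuple[List[str], List[List[int]]]:
    """Duplicate random rows containing each rare label until it has `target` positives."""
    rng = random.Random(seed)
    texts = list(texts)
    labels = [list(row) for row in labels]
    counts = label_counts(labels)
    for j in range(len(counts)):
        # Current rows (including earlier copies) that are positive for label j.
        pool = [i for i, row in enumerate(labels) if row[j] == 1]
        while counts[j] < target and pool:
            i = rng.choice(pool)
            texts.append(texts[i])
            labels.append(list(labels[i]))
            # The copied row may carry several labels, so update every count.
            counts = [c + v for c, v in zip(counts, labels[i])]
    return texts, labels


texts = ["a", "b", "c", "d"]
labels = [[1, 0], [1, 0], [1, 0], [0, 1]]
new_texts, new_labels = naive_oversample(texts, labels, target=3)
print(label_counts(new_labels))  # [3, 3]
print(len(new_texts))  # 6
```

If you try something like this, apply it to the training split only, so duplicated rows do not leak into validation.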
Thanks for your responses, guys; I really appreciate it. Is it possible to change the confidence thresholds in the prediction functions?
When you run this notebook, you get output like this:
So, if I am not wrong, I think there is some issue with the threshold. Maybe @krshrimali can guide us on where to look for it. I scrolled through the code of Flash and the Lightning Trainer, but I tend to get a bit lost.
Hey @uakarsh @DrRaja, I think what you are seeing there is just the raw logits output from the model. To see probabilities, for example, you would need to use `trainer.predict(model, datamodule=datamodule, output="probabilities")`. To get classes or labels output with a custom threshold, you can pass the
Hope that helps 😃 Closing this, but please feel free to reopen if you have any other questions.
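To illustrate the point about raw logits, here is a plain-Python sketch (not Flash's API) of what a multi-label output step does conceptually: apply a sigmoid to each logit independently, then keep every label whose probability reaches a threshold. `logits_to_labels`, the label names, and the logit values are hypothetical. It also shows why an empty list can come back when no probability clears the threshold.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logits_to_labels(
    logits: Sequence[float], label_names: Sequence[str], threshold: float = 0.5
) -> List[str]:
    """Multi-label decision: each label is kept independently if sigmoid(logit) >= threshold."""
    probs = [sigmoid(z) for z in logits]
    return [name for name, p in zip(label_names, probs) if p >= threshold]


logits = [2.0, -1.0, 0.1]  # raw model outputs, one per label
names = ["toxic", "obscene", "insult"]
print(logits_to_labels(logits, names))       # ['toxic', 'insult']
print(logits_to_labels(logits, names, 0.7))  # ['toxic']
print(logits_to_labels(logits, names, 0.9))  # []
```

Raising the threshold trades recall for precision; with many labels (more than 30 in this issue), a per-label threshold may fit better than a single global one.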
Hi,
While trying to finetune a bert-base model for multi-label text classification, I keep encountering this error: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]. I looked around and saw people suggesting to check whether there are missing values or None values in the dataset. I've checked my dataset, and it has been properly preprocessed to remove any NaN and missing values. I even compared my dataset with the toy example's dataset of toxic comments, and the only difference I could see was the number of categories (in my case, more than 30).
Can anyone please help me on this one?
Thank you