Pickling error while retraining [BUG] #111
Comments
@davebulaval, Fasttext without attention. Python 3.8.10, Poutyne 1.8
That is what I thought; we have had a lot of difficulties with fasttext on Windows.
@davebulaval, I tried it on BPEmb and ran into this error -
Yes, see your other issue (#112). To me, it looks like some data points are empty. I will try to investigate the Windows and fasttext error next week.
@davebulaval, you're right, there were some missing data points. Sorry to trouble you. I'll close both issues.
@ChargedMonk no problem. I'm working right now on a fix that improves the error handling to give users more insight into the problem.
@davebulaval, I've run into a couple of issues now -
Update: this (1st issue) happens when the model early stops in the middle of the 1st epoch.
The first issue is expected if not even one epoch has been completed.
The language is English, and this time there are no empty addresses: Fasttext trained on the data, but BPEmb ran into this issue (on both Windows & Linux).
Will take a look. Can you share your dataset and code to make debugging easier?
I'm not allowed to share the dataset but I'll share with you some examples and the code.
https://drive.google.com/drive/folders/1g1YMAcYJgG9yQKEgqArk4iSdqlaVU31r?usp=sharing
Looking again at the part where it breaks, I am 100% sure that it is an "empty" address. This part of the code sends the address to the bpemb package to construct a byte-pair embedding. Then, we construct a list of the decomposition lengths. My hypothesis is that you have at least one address that is just a
Yeah, maybe it was a bug on my part, but I don't know why fasttext trained on it while bpemb didn't. Anyways, I recreated the dataset (carefully this time) and it seems to be working. Sorry for all the trouble. Great project btw.
It is normal since fasttext and BPEmb don't parse word tokens the same way. BPEmb iterates over the tokens and uses byte-pair encoding. Thus, an address of only whitespace ( In the next version, this type of error will be handled better.
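For anyone hitting the same error, here is a minimal sketch of a pre-training check for empty or whitespace-only addresses, which is the cause identified in this thread. It is plain Python and does not call the library. The `(address, tags)` tuple layout, the function name `split_empty_addresses`, and the sample tags are assumptions made for illustration, so adapt them to how your dataset is actually stored.

```python
from typing import List, Sequence, Tuple

# Assumed layout (hypothetical): one training example is (address, tags).
Example = Tuple[str, Sequence[str]]


def split_empty_addresses(
    data: Sequence[Example],
) -> Tuple[List[Example], List[int]]:
    """Return (examples to keep, indices of examples with no tokens)."""
    kept: List[Example] = []
    dropped: List[int] = []
    for index, (address, tags) in enumerate(data):
        # str.split() with no argument splits on any run of whitespace,
        # so "" and "   " both yield an empty token list.
        if address.split():
            kept.append((address, tags))
        else:
            dropped.append(index)
    return kept, dropped


if __name__ == "__main__":
    # Made-up sample data; the tag names are illustrative only.
    sample = [
        ("350 rue des Lilas", ["StreetNumber", "StreetName", "StreetName", "StreetName"]),
        ("   ", []),
        ("", []),
    ]
    clean, bad_indices = split_empty_addresses(sample)
    print(f"kept {len(clean)} example(s); empty addresses at indices {bad_indices}")
```

Running the check before training and inspecting the reported indices makes it easier to see which rows of the source data need fixing, rather than only dropping them silently.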
Describe the bug
To Reproduce
I'm trying to train on custom tags on my own data like this -
Desktop (please complete the following information):