One word comments don't pass the filter #1
Comments
Off the top of my head, we could:
I think option 1 would be the simplest approach and might work well enough, although right now we don't save posts whose language goes undetected, so it'll be a bit hard to test. I'll change it so those posts are saved with an "unknown" language, which gives us something to backtest against in a few days (commit 1f50067).
Writing here since I've reached my limit on Twitter DMs today 💁‍♀️
I'm actually looking into option 3 right now, I've managed to make Facebook's
Actually, we don't need to default to Hebrew - when the model doesn't know, it defaults to
Hebrew detection results, out of all the posts that contain any Hebrew characters: I tried some model combinations, but they didn't improve the percentages significantly. This looks like an easy decision: go ahead with FastText Compressed. If anyone's interested in reviewing, I've put all the model results into SQLite for easy querying.
I think this should mostly be resolved with FastText; we'll revisit if needed.
I saw this happening with one-word comments (perfectly valid Hebrew words) that weren't showing up in the "with comments" feed. This might be happening with one-word posts as well; if it matters, I can check.
I'm opening an issue as a way to communicate about this and to remember it when I have some free time to open a PR to fix it.
If anyone happens to stumble across this issue: would the best fix be to just hardcode an exception that lets through one-word posts containing Hebrew letters? Or is there a smarter solution?
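To make the hardcoded idea concrete, here is a rough Python sketch of what such an exception could look like. It is not taken from this repo: the function names, the `"he"` language code, and the `max_words` parameter are all made up for illustration, and the check only looks at the basic Hebrew letters (U+05D0 to U+05EA).

```python
import re
from typing import Optional

# Hebrew letters alef through tav (U+05D0 to U+05EA).
# Niqqud (vowel points) and Hebrew punctuation are deliberately not matched.
HEBREW_LETTER = re.compile(r"[\u05D0-\u05EA]")


def passes_short_hebrew_rule(text: str, max_words: int = 1) -> bool:
    """Hypothetical helper: True for very short texts containing a Hebrew letter."""
    words = text.split()
    if not words or len(words) > max_words:
        return False
    return bool(HEBREW_LETTER.search(text))


def should_include(text: str, detected_lang: Optional[str]) -> bool:
    """Hypothetical filter: trust the detector first, then fall back to the rule.

    `detected_lang` stands in for whatever the language detector returned;
    "he" as the Hebrew code is an assumption about that detector's output.
    """
    if detected_lang == "he":
        return True
    return passes_short_hebrew_rule(text)
```

With this sketch, `should_include("שלום", None)` lets the one-word comment through, while a one-word English comment is still rejected. One known weakness is that other languages written in Hebrew script, such as Yiddish, would also pass the one-word rule.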