Tag posts with Language #440
I'm not saying no to this, I'm still open to it, but here are my concerns:
I don't think that's good enough. Keep in mind that most people in the world speak more than one language, and there are a lot of regions where more than one language is spoken. As a concrete example, I live in the Basque Country, where Basque and Spanish are both official languages (and both are spoken by a lot of people). Making separate Lemmy communities makes little sense in that context, and splits the userbase for no good reason.
Sure, but we will also federate with those instances, and then we need a mechanism to hide those posts from users who don't speak the language.
Then we can also add a language to comments. Or even auto-detect the language (Mastodon does that afaik).
As above, we could use some automated tool to assign a language to every post (considering the languages the user has in their profile).
Agreed, we will probably need it on all levels at some point. But we can start with a more basic implementation.
By the way, we're running peertube.social with multiple languages and it's working fine. We have a couple of mods who speak different languages, and when nobody on the team speaks a language, I've had good experiences asking people on Mastodon for moderation advice.
As a non-native English speaker who mostly consumes English content, I would benefit from some kind of option to at least filter content by language. For instance, right now I see a lot of Spanish posts on the Lemmy DEV instance, and there is no way for me to understand any of them. Some mechanism to filter those posts out would be helpful. Tagging a post with a language (or multiple languages), rather than relying on communities, has another benefit: it gives us more control over translation and provides better metadata about which language a post is in. This wouldn't prevent anyone from starting communities that focus on a particular language.
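The filtering idea above could look roughly like the sketch below. Everything in it is an assumption for illustration: the `Post` struct, the `filter_by_language` function, and the choice to keep untagged posts visible are hypothetical and are not taken from the Lemmy codebase.

```rust
use std::collections::HashSet;

/// Hypothetical post record (not Lemmy's actual type).
/// `language` holds a language code such as "en" or "es";
/// `None` means the post was never tagged.
#[derive(Debug, Clone, PartialEq)]
pub struct Post {
    pub title: String,
    pub language: Option<String>,
}

/// Return the posts written in a language the user reads.
/// Assumption of this sketch: untagged posts stay visible, and an
/// empty preference set means "show everything".
pub fn filter_by_language(posts: &[Post], user_languages: &HashSet<String>) -> Vec<Post> {
    if user_languages.is_empty() {
        return posts.to_vec();
    }
    posts
        .iter()
        .filter(|post| match &post.language {
            // Tagged post: keep it only if the user listed that language.
            Some(code) => user_languages.contains(code),
            // Untagged post: keep it rather than hide it silently.
            None => true,
        })
        .cloned()
        .collect()
}
```

Keeping untagged posts visible is one possible choice; hiding them would be stricter but would also hide every post made before tagging existed.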
One thing that would make me happy here is if the language detection of content was automatic, maybe using something like this: https://github.com/wooorm/franc But that also scares me, because the language detection should be on the back-end, since any future clients wouldn't necessarily be written in JavaScript. Even aside from that, it means that every major content table now needs a bridge table (since a single post / comment can have multiple languages), like:
Language detection should definitely be on the backend; anything else doesn't make much sense. I found this Rust library which looks good: https://github.com/greyblake/whatlang-rs We can also give it a blacklist/whitelist of languages based on the user profile, which should make detection more accurate. I don't think we need to support multiple languages in a single post/comment, at least not for now.
I'd be okay with implementing this, at least on the principle of things only being tagged with one language. Some things I could see this needing:
Here is a library which might be helpful (in addition to manual language selection).
I updated the first comment with details on how to implement this. In fact it's not that complicated; it mainly needs some changes in the database code and in the frontend. I wouldn't use automated language detection for the initial version, because in most cases we can already guess the language from the parent post/comment language (or remember the last selection for posts).
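A minimal sketch of that guess, assuming a hypothetical helper named `default_language` and plain language codes as strings (neither comes from the Lemmy codebase): a reply defaults to its parent's language, and otherwise the user's last selection is used.

```rust
/// Pick the language to preselect when a user writes a post or comment.
/// Hypothetical helper for illustration only.
///
/// Order of preference:
/// 1. the language of the parent post/comment, when replying;
/// 2. the language the user selected last time;
/// 3. nothing, so the user has to choose manually.
pub fn default_language(
    parent_language: Option<&str>,
    last_selected: Option<&str>,
) -> Option<String> {
    parent_language.or(last_selected).map(str::to_owned)
}
```

For example, `default_language(Some("eu"), Some("es"))` yields `Some("eu")`, while a top-level post with no parent falls back to the remembered selection.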
Similar to the way PeerTube does it, post creation could have a language dropdown. That would allow users to filter by language and only see the posts they can understand. It would also make it easier to have multiple languages in the same server/community.
Reddit doesn't support this at all, so users of other languages have to go to their own subreddits, and all the big ones are completely dominated by English. Honestly, that is what you would expect from an American company when it comes to language support, so I think we can do much better.
Edit: relevant section in the ActivityPub spec