Multi-language posts #267

Open
lifenautjoe opened this issue Apr 2, 2019 · 10 comments

Comments

@lifenautjoe
Member

@lifenautjoe lifenautjoe commented Apr 2, 2019

From Ronald on Slack

Perhaps it’s a good idea to be able to set preferred languages before we go public. If the trending timeline is full of posts written in Chinese, that is going to be a problem.

A possible solution: during onboarding, let the person select preferred languages, preselecting the current device language.

When a person is posting, we can try to detect the language and show this somewhere at all times.

The person can then tap this to override it if wrong. We can show the preferred languages list first.

After these two things are set, we can filter the timelines by language(s).
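
For illustration, the filtering step could look roughly like the sketch below. The `Post` class, its `language` field and `filter_timeline` are hypothetical names, not Openbook code, and keeping posts whose language is unknown is an assumption rather than something decided in this thread.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass
class Post:
    # Hypothetical post record; `language` is a language code such as "en",
    # or None when no language could be determined.
    uuid: str
    text: str
    language: Optional[str] = None


def filter_timeline(posts: Iterable[Post], preferred_languages: Set[str]) -> List[Post]:
    """Keep posts written in a preferred language.

    Assumptions: an empty preference set means "no filtering", and posts
    with an unknown language are kept rather than hidden.
    """
    if not preferred_languages:
        return list(posts)
    return [
        post
        for post in posts
        if post.language is None or post.language in preferred_languages
    ]
```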

EDIT: See bottom for latest suggestion.

@lifenautjoe lifenautjoe added this to To do in Openbook World via automation Apr 2, 2019
@lifenautjoe

Member Author

@lifenautjoe lifenautjoe commented Apr 2, 2019

Another option is having a translate button.

We can look into open-source, pretrained translation models and perhaps start from there?

http://opennmt.net/Models/

@ronaldvdmeer

@ronaldvdmeer ronaldvdmeer commented Apr 4, 2019

Someone in the comments on OB mentioned: https://www.deepl.com/pro.html#pricing

@lifenautjoe

Member Author

@lifenautjoe lifenautjoe commented May 4, 2019

We can detect the language of a post's content locally, at posting time, with https://github.com/Mimino666/langdetect
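
A minimal sketch of what that detection could look like. `detect` and `DetectorFactory.seed` come from the langdetect package; the wrapper name `detect_post_language`, the injectable `detector` parameter and the choice to return `None` on failure are assumptions made for illustration.

```python
from typing import Callable, Optional


def detect_post_language(
    text: str, detector: Optional[Callable[[str], str]] = None
) -> Optional[str]:
    """Return a language code for `text`, or None when detection fails."""
    if detector is None:
        # Imported lazily so the wrapper can also be exercised with a stub.
        from langdetect import DetectorFactory, detect

        # langdetect is non-deterministic by default; a fixed seed makes
        # repeated runs on the same text agree.
        DetectorFactory.seed = 0
        detector = detect
    if not text or not text.strip():
        return None
    try:
        return detector(text)
    except Exception:
        # langdetect raises LangDetectException when the text has no
        # usable features (for example only digits or emoji).
        return None
```

Returning `None` instead of raising leaves room to store "unknown" as the post attribute and simply not offer a translation for that post.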

@lifenautjoe

Member Author

@lifenautjoe lifenautjoe commented May 5, 2019

So... We're bumping this up in priority and we'll pick it up right after reporting flows are done.

How it looks so far:

  1. Detect language locally on server with the langdetect library and store it as a post attribute.
  2. When someone retrieves the post, check if the post language matches the device language. *1
  3. If it does, do nothing; if it does not, show a Translate button.
  4. When Translate is pressed, call a /postUuid/translate/ API with the desired language.
  5. The server calls an external translation API and returns the result. *2

*1 Although device language might work for the first iterations, this should become something like a preferred language that can be bootstrapped from the device language.

*2 There are 2 options so far: deepl.com and the AWS translation API.
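
Steps 2 and 3 of the list above could be sketched as follows. The function names are hypothetical, and two things are assumptions not stated in the thread: locale codes are compared by base language only (so `nl-NL` matches `nl`), and a post with no detected language never gets a button.

```python
from typing import Iterable, Optional


def base_language(code: str) -> str:
    """Reduce a locale such as 'nl-NL' or 'pt_BR' to its base language code."""
    return code.replace("_", "-").split("-")[0].lower()


def should_show_translate_button(
    post_language: Optional[str],
    device_language: str,
    preferred_languages: Iterable[str] = (),
) -> bool:
    """Decide whether the client should offer a Translate button."""
    if not post_language:
        # Assumption: nothing to translate from if detection failed.
        return False
    # Languages the reader is assumed to understand: the device language
    # plus any explicitly preferred languages (footnote *1).
    readable = {base_language(device_language)}
    readable.update(base_language(code) for code in preferred_languages)
    return base_language(post_language) not in readable
```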

DeepL looks like a great option, being based in Germany and claiming to have strong privacy principles, but it is another third party. Using Amazon Translate would keep it all within the AWS ecosystem, but they do say they "may" use the contents to improve their translation models.

Personally, I'd rather go with Deepl.

Thoughts welcomed as usual.

@lifenautjoe lifenautjoe added this to To do in Openbook Beta via automation May 5, 2019
@lifenautjoe lifenautjoe removed this from To do in Openbook World May 5, 2019
@schmitzel76

@schmitzel76 schmitzel76 commented May 5, 2019

With regard to point 3, there should also be an option to never show a translate link for a certain language. My device is set to Dutch, but I don't want the translate button to appear for English posts. Google added a similar option after the translate function in Chrome generated a lot of backlash from multilingual people.

Language detection is not flawless: it will sometimes get the language wrong, or may not support a language at all. How should those cases be handled? Should the poster be able to override it if needed?

@Komposten

Member

@Komposten Komposten commented May 5, 2019

The downside with deepl (and maybe AWS) is that they only support a limited selection of languages (so far). Of course, a majority of the userbase will be covered with just English, German, French and Spanish, but the remaining few percent will have a lesser experience.

Bing and Google aren't really options, though, given privacy concerns.

@oliverzet

@oliverzet oliverzet commented May 5, 2019

The quality of DeepL results is great but I agree that the limited range of available languages might become a problem.
Another thing is cost. I don't know about AWS, but DeepL charges 4.99 € / month for developers plus 0.01 € per 500 characters.

@lifenautjoe

Member Author

@lifenautjoe lifenautjoe commented May 5, 2019

Thanks for the info @oliverzet !

At this time, Amazon Translate supports translation between the following 21 languages: Arabic, Chinese (Simplified), Chinese (Traditional), Czech, Danish, Dutch, English, Finnish, French, German, Hebrew, Indonesian, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Swedish, and Turkish. Between these languages, the service supports 417 translation combinations

And for pricing:

[image: Amazon Translate pricing]

Not sure how expensive it might turn out to be, but it definitely supports more languages.

@schmitzel76 Definitely, we'll add an option for "Never translate LANGUAGE posts".
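
A rough sketch of how such a per-user option could be modelled. `TranslationSettings`, `never_translate` and `offer_translation` are hypothetical names used only for illustration; none of them come from the Openbook codebase.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class TranslationSettings:
    # Hypothetical per-user setting: language codes for which the user
    # chose "Never translate LANGUAGE posts".
    never_translate: Set[str] = field(default_factory=set)

    def offer_translation(self, post_language: Optional[str], device_language: str) -> bool:
        """Return True when a translate option should be offered for a post."""
        if post_language is None or post_language == device_language:
            return False
        return post_language not in self.never_translate
```

With the example from the comment above, a Dutch device with `never_translate={"en"}` would get no translate option on English posts but would still get one on German posts.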

Not sure how we should deal with wrong translations 🤔.

As for DeepL vs AWS, we can design it to be replaceable, so the question is just which one to try first.
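
One way to keep the provider replaceable is a small backend interface that the translate endpoint depends on, sketched below. All class and method names are hypothetical, `EchoBackend` is only a stand-in for a real DeepL or Amazon Translate client, and the in-memory cache is an added assumption (it would avoid paying twice for the same post and target language).

```python
from abc import ABC, abstractmethod
from typing import Dict, Tuple


class TranslationBackend(ABC):
    """Interface a provider-specific client would implement."""

    @abstractmethod
    def translate(self, text: str, source_language: str, target_language: str) -> str:
        ...


class EchoBackend(TranslationBackend):
    """Stand-in backend; a real one would call an external translation API."""

    def __init__(self) -> None:
        self.calls = 0  # counts how often the (fake) external API was hit

    def translate(self, text: str, source_language: str, target_language: str) -> str:
        self.calls += 1
        return f"[{source_language}->{target_language}] {text}"


class PostTranslator:
    """Translates posts through whichever backend it was given."""

    def __init__(self, backend: TranslationBackend) -> None:
        self._backend = backend
        self._cache: Dict[Tuple[str, str], str] = {}

    def translate_post(
        self, post_uuid: str, text: str, source_language: str, target_language: str
    ) -> str:
        key = (post_uuid, target_language)
        if key not in self._cache:
            self._cache[key] = self._backend.translate(
                text, source_language, target_language
            )
        return self._cache[key]
```

Swapping providers would then mean passing a different `TranslationBackend` to `PostTranslator`, with the endpoint code left unchanged.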

Also, this will most likely only be available for public posts.

@duichwer

@duichwer duichwer commented May 6, 2019

I'm not sure this applies directly to this issue, but it should be possible to change the language attribute. Especially with very mixed posts containing several foreign words, it can happen that the wrong language is stored. Even MS Word regularly gets this wrong, in my experience.
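
A small sketch of how a correctable language attribute could be stored: keep the detected value and an optional author correction side by side, and let the correction win. `PostLanguage` and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PostLanguage:
    detected: Optional[str] = None         # set by the server at posting time
    author_override: Optional[str] = None  # set only when the author corrects it

    @property
    def effective(self) -> Optional[str]:
        """Language to use for filtering and translation; the override wins."""
        return self.author_override or self.detected
```

Keeping the detected value alongside the override also makes it possible to see later how often detection was corrected.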

@oliverzet

@oliverzet oliverzet commented May 8, 2019

@lifenautjoe Well, AWS seems to be less expensive and supports way more languages. The translation itself will probably be better with DeepL. On the other hand, it's usually enough to get the gist. So it looks like Amazon is the better choice. I don't know how this might affect privacy, though.

Projects
Openbook Beta
  
To do
6 participants