Adhearsion should support internationalising prompts (recorded and TTS) #361

Closed
benlangfeld opened this issue Aug 31, 2013 · 20 comments

Comments

@benlangfeld
Member

This is the place to discuss how this will work. Pinging @bklang and @JustinAiken

@benlangfeld
Member Author

@JustinAiken made a start on this at adhearsion/voicemail#11.

@JustinAiken
Member

Okay, couple of points to get this discussion started...

  1. Devil's advocate - SHOULD Adhearsion worry about built-in internationalization? Adhearsion never handles content, only directs the softswitch which actions to perform; is there anything wrong with just having apps that need it do play I18n.t 'welcome'?
  2. If we do place internationalization into the core, I think it should not differentiate between audio files and TTS text. That way if somebody's startup has enough funding to hire Patrick Stewart to record their intro message in English, but not in all the languages they support, the key in en.yml could contain file:///var/sounds/welcome.wav, while the key in the other languages could just contain some text like "thanks for calling".
  3. API: Here are a few examples of how an i18ned API could look.
    1. play "i18n://greetings.welcome"
      When seeing this, punchblock would replace the URL with the content of whatever is in the greetings -> welcome section of the current language. This does involve using a non-standard URL in the Adhearsion app, but it wouldn't get anywhere close to the softswitch. I don't like this one very much.
    2. play_i18n "welcome"
      This one is more immediately recognizable/readable; not sure if I like this one either though, because then you only have one method. What if you want an i18n key for play_numeric or something?
    3. play "welcome", i18n: true
      I think this one makes the most sense, since it avoids the pitfalls that popped up in the first two ideas.
    4. Just have a global "use i18n" option, and if set, treat EVERY argument as an i18n key.
  4. Interpolation
    We also need a good way to allow for interpolation, e.g. "You said %{resp}. That's obviously wrong!"

Perhaps any keys passed in the option hash could be checked against variables to be interpolated? So you'd do something like:
play 'woodchucks.response', i18n: true, resp: 4 which would speak "You said 4. That's obviously wrong!"

Not sure if I like that idea though, because then users would have to double check the list of option keys already in use to make sure they don't use a reserved word... or maybe we could have a single key with an interpolation hash?
play 'woodchucks.response', i18n: true, interpolations: {resp: 4}

Anyways, my random thoughts... curious as to what ideas you guys have had.
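To make point 2 and the interpolation-hash idea concrete, here is a minimal sketch in plain Ruby. The `TRANSLATIONS` table, the `lookup_prompt` helper and the French text are hypothetical stand-ins for whatever backend ends up being used; none of this is existing Adhearsion API.

```ruby
# Hypothetical translation table: the English welcome is a recording,
# while the French one is plain text to be rendered by TTS.
TRANSLATIONS = {
  en: {
    "greetings.welcome"   => "file:///var/sounds/welcome.wav",
    "woodchucks.response" => "You said %{resp}. That's obviously wrong!"
  },
  fr: {
    "greetings.welcome" => "Merci de votre appel"
  }
}.freeze

# Look up a key for a locale and fill in %{name} placeholders.
# Interpolation values live in their own hash, so they can never
# collide with other option keys. Templates are only passed through
# format when there is something to interpolate, so a URL containing
# a literal % is left untouched.
def lookup_prompt(key, locale:, interpolations: {})
  template = TRANSLATIONS.fetch(locale).fetch(key)
  interpolations.empty? ? template : format(template, interpolations)
end

puts lookup_prompt("greetings.welcome", locale: :en)
puts lookup_prompt("greetings.welcome", locale: :fr)
puts lookup_prompt("woodchucks.response", locale: :en, interpolations: { resp: 4 })
```

The caller never needs to know whether a key resolves to a recording or to text; that decision is deferred to whatever consumes the looked-up value.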

@lpradovera
Member

So far, using an option such as "i18n: true" has been the most flexible way to pass options to any media control method.
It enables us to effortlessly propagate it to the actual Output invocation regardless of the code needing #play, #menu, or any other method.

Luca Pradovera
luca.pradovera@gmail.com


@bklang
Member

bklang commented Sep 2, 2013

Re "i18n: true", this feels messy and unnecessary to me.

Can we just treat symbols as i18n keys if i18n is globally enabled? Or maybe if we simply detect that i18n translations are available?

Examples:

play :woodchucks, interpolations: {resp: 4}
play "You said %{resp} woodchucks. That's obviously wrong.", interpolations: {resp: 4}

If you need namespacing (like woodchucks.response above), it's still possible, though slightly messy:

play :"woodchucks.response", interpolations: {resp: 4}

@JustinAiken makes an interesting point about being able to interchange audio files with TTS strings. Perhaps if the i18n lookup returns a URL we can treat it like a recording?

PS. I'm not sold on the word "interpolations" but can't think of anything better right this second
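A minimal sketch of the "symbols are keys, strings are literal prompts" rule described above, in plain Ruby. `PROMPTS_BY_KEY` and `resolve_prompt` are hypothetical names used only for illustration; they are not part of Adhearsion.

```ruby
# Hypothetical table of translated prompts, keyed by symbol.
PROMPTS_BY_KEY = {
  :woodchucks            => "You said %{resp} woodchucks. That's obviously wrong.",
  :"woodchucks.response" => "You said %{resp}. That's obviously wrong!"
}.freeze

# Symbols are treated as translation keys; strings are used as-is.
# Either way, %{name} placeholders are filled from the interpolations hash.
def resolve_prompt(prompt, interpolations: {})
  template = prompt.is_a?(Symbol) ? PROMPTS_BY_KEY.fetch(prompt) : prompt
  interpolations.empty? ? template : format(template, interpolations)
end

puts resolve_prompt(:woodchucks, interpolations: { resp: 4 })
puts resolve_prompt(:"woodchucks.response", interpolations: { resp: 4 })
puts resolve_prompt("Goodbye")
```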

@benlangfeld
Member Author

So, I'm perfectly happy with this sort of syntax:

play t("You said %{resp}. That's obviously wrong!", resp: 4)

I don't think we need to integrate translation key lookup into core CallController methods, but we might make CallController#t an alias for I18n.t to avoid requiring the module name. This is much the approach that Rails takes and I think it's entirely acceptable.

As for audio vs TTS, I figure we should approach this the same way the auto-detection in CallController#play works, by looking for a URL. The result of every lookup will be an SSML document, and detecting a URL will simply result in the appropriate <audio/> tags.
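The URL auto-detection idea can be sketched roughly as follows. This is plain Ruby using only the standard `uri` library; `audio_url?`, `to_ssml` and the list of accepted schemes are assumptions made for the sketch, and the real detection inside CallController#play may well differ.

```ruby
require "uri"

XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

# Escape the characters that are unsafe in XML text and attribute values.
def xml_escape(text)
  text.gsub(/[&<>"]/, XML_ESCAPES)
end

# A looked-up value counts as a recording when it parses as a URL with
# one of the schemes below (an assumed list). Ordinary sentences contain
# spaces, fail to parse, and are therefore treated as TTS text.
def audio_url?(value)
  %w[file http https].include?(URI.parse(value).scheme)
rescue URI::InvalidURIError
  false
end

# Wrap the value in an SSML document: <audio/> for URLs, text otherwise.
def to_ssml(value, lang: "en-GB")
  body = audio_url?(value) ? %(<audio src="#{xml_escape(value)}"/>) : xml_escape(value)
  %(<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="#{lang}">#{body}</speak>)
end

puts to_ssml("file:///var/sounds/welcome.wav")
puts to_ssml("thanks for calling")
```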

If we were to do any kind of integration, I favour the latter approach with a slight twist, combining the two keys:

play 'woodchucks.response', i18n: {resp: 4}

So that's lookup dealt with, here are my real concerns about i18n as applied to voice applications:

  1. Pronunciation data in translation keys (SSML tags) should be handled correctly (detecting an SSML document as a string as well as the current approach of looking for a RubySpeech object).
  2. In a web app, language is set per request, either manually or via URL/header/session data. In our case, I guess this should be set per Call, and we need an API for this.
  3. Different logical languages will require passing different language headers on Output components, and these might not exactly match up to language keys. This means passing through the call's logical language to every output component executed on it.
  4. Different renderers may be required for different languages. Perhaps Nuance should be used for 'en-GB', but Lumenvox for 'pt-BR'. Do we make it Adhearsion's responsibility to select this, or push the responsibility down to the Rayo server (perhaps its config) to use as defaults (while still allowing Ahn to override the renderer with appropriate errors in case the Renderer cannot handle the requested language)?
  5. What should our behaviour be in cases of missing keys? The norm is to fallback to en, but what if the key is missing entirely? Do we playback an error recording, not playback anything (and log a warning) or raise an exception (and thus perhaps end the call)?
  6. In many cases, audio recordings are numerous and are maintained scoped to a directory like recordings/en-GB/greeting.wav and recordings/pt-BR/greeting.wav. Should we require manually specifying keys for all of these, or should we have some default convention which makes it implicit?
  7. How do we deal with localising dates, currency, measurement units, timezones and phone numbers?
  8. Do we include recording distribution in the scope for i18n?
  9. Anything that Adhearsion does to enable localisation has to be well documented and scaffolded in generated applications and examples such as the Simon Game.

Let's do one round of feedback on these points and then decide on a direction in which to start a spike of this.

@JustinAiken
Member

play t("You said %{resp}. That's obviously wrong!", resp: 4)

I really like this, I think we should go with that. Makes it easy to use in any method (#play, #menu, etc), and also makes it easy to chain i18n'd prompts with non-i18n'd prompts if needed.

play "file://universal_wrong_sound.wav", t("You said %{resp}. That's obviously wrong!", resp: 4), other_play_option: :other_play_value

we might make CallController#t an alias for I18n.t to avoid requiring the module name.

👍

If we were to do any kind of integration, I favour the latter approach with a slight twist, combining the two keys

The twist is nice, better than interpolations: {..., but I think just aliasing #t is better

1 Pronunciation data in translation keys (SSML tags) should be handled correctly (detecting an SSML document as a string as well as the current approach of looking for a RubySpeech object).

I think this would go outside of i18n and depend on the language - one could pass non-English strings directly to #play without going through translation, pronunciation markup should happen on any non-English text string probably...

2 In a web app, language is set per request, either manually or via URL/header/session data. In our case, I guess this should be set per Call, and we need an API for this.
3 Different logical languages will require passing different language headers on Output components, and these might not exactly match up to language keys. This means passing through the call's logical language to every output component executed on it.

Perhaps a #set_language or #set_region method that sets both the current i18n region and the language to use for SSML markup would be a nice way to go.

5 What should our behaviour be in cases of missing keys? The norm is to fallback to en, but what if the key is missing entirely? Do we playback an error recording, not playback anything (and log a warning) or raise an exception (and thus perhaps end the call)?

I think exceptions/ending calls should be avoided at all costs... fall back to en (or a customizable fallback), and if the key is still not found just play nothing and log it at the fatal level... that's what I'd prefer if I were using this in the apps I worked on.
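A rough sketch of that behaviour in plain Ruby: try the call's locale, then the fallback, and if the key is missing everywhere report it and return nil so nothing is played. `FALLBACK_TABLE`, `translate_with_fallback` and the `on_missing` callback are hypothetical; a real implementation would presumably hand the message to the application's logger.

```ruby
# Hypothetical translations: pt has no entries yet.
FALLBACK_TABLE = {
  en: { "greeting" => "Thanks for calling" },
  pt: {}
}.freeze

# Try the requested locale first, then the fallback locale. When the key
# is missing in both, report it through on_missing and return nil so the
# caller can skip playback instead of raising and ending the call.
def translate_with_fallback(key, locale:, fallback: :en, on_missing: ->(msg) { warn msg })
  [locale, fallback].uniq.each do |candidate|
    value = FALLBACK_TABLE.fetch(candidate, {})[key]
    return value if value
  end
  on_missing.call("missing translation #{key.inspect} for #{locale} (fallback: #{fallback})")
  nil
end

puts translate_with_fallback("greeting", locale: :pt)
prompt = translate_with_fallback("farewell", locale: :pt)
puts "nothing to play" if prompt.nil?
```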

6 In many cases, audio recordings are numerous and are maintained scoped to a directory like recordings/en-GB/greeting.wav and recordings/pt-BR/greeting.wav. Should we require manually specifying keys for all of these, or should we have some default convention which makes it implicit?

Hmm, good question... perhaps we could have a rake task that generates the necessary .yml files based on which languages they want?

8 Do we include recording distribution in the scope for i18n?

If we allowed specifying record location (either absolute, or relative to /var/punchblock/record) or filename I suppose if somebody really wanted they could do something like:

record "#{t("record_dir")}/#{call.language}-#{SecureRandom.uuid}", options

But since all #record does now is beep or not, and then dump a random file in the same place, don't think we need to worry about anything here.

9 Anything that Adhearsion does to enable localisation has to be well documented and scaffolded in generated applications and examples such as the Simon Game.

Yes. And perhaps if we're clever, we could use i18n to allow the Simon Game to be run using either TTS or record files... two birds, one stone!

I have no useful thoughts on points 4 and 7 at this time.

@bklang
Member

bklang commented Sep 5, 2013

So, I'm perfectly happy with this sort of syntax:

play t("You said %{resp}. That's obviously wrong!", resp: 4)

I don't think we need to integrate translation key lookup into core CallController methods, but we might make CallController#t an alias for I18n.t to avoid requiring the module name. This is much the approach that Rails takes and I think it's entirely acceptable.

👍 These two suggestions work for me.

As for audio vs TTS, I figure we should approach this the same way the auto-detection in CallController#play works, by looking for a URL. The result of every lookup will be an SSML document, and detecting a URL will simply result in the appropriate <audio/> tags.

If we were to do any kind of integration, I favour the latter approach with a slight twist, combining the two keys:

play 'woodchucks.response', i18n: {resp: 4}

:i18n as the key for interpolation replacements seems logical to me.

So that's lookup dealt with, here are my real concerns about i18n as applied to voice applications:

  1. Pronunciation data in translation keys (SSML tags) should be handled correctly (detecting an SSML document as a string as well as the current approach of looking for a RubySpeech object).

I think you mean handling of translation where the translated string is itself an SSML doc? Yes, I agree that is a requirement.

  2. In a web app, language is set per request, either manually or via URL/header/session data. In our case, I guess this should be set per Call, and we need an API for this.

Agreed. Asterisk does this with channel variables, so each call can be in a different language and the setting can be applied for the lifetime of the call. We also probably want to allow for an application-wide default via configuration.
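A per-call language with an application-wide default could look roughly like this. `SketchCall` and `DEFAULT_LOCALE` are hypothetical stand-ins for illustration; Adhearsion's real Call class and configuration system are not used here.

```ruby
# Hypothetical stand-in for an application-wide configuration value.
DEFAULT_LOCALE = :en

# Hypothetical stand-in for a call object. The locale is stored on the
# call, so it applies for the lifetime of that call only.
class SketchCall
  attr_writer :locale

  # Falls back to the application-wide default until a locale is set.
  def locale
    @locale || DEFAULT_LOCALE
  end
end

first_call  = SketchCall.new
second_call = SketchCall.new
second_call.locale = :"pt-BR"

puts first_call.locale
puts second_call.locale
```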

  3. Different logical languages will require passing different language headers on Output components, and these might not exactly match up to language keys. This means passing through the call's logical language to every output component executed on it.

Why would they not match up? Can't we use ISO language codes for everything? Perhaps I misunderstand your concern.

  4. Different renderers may be required for different languages. Perhaps Nuance should be used for 'en-GB', but Lumenvox for 'pt-BR'. Do we make it Adhearsion's responsibility to select this, or push the responsibility down to the Rayo server (perhaps its config) to use as defaults (while still allowing Ahn to override the renderer with appropriate errors in case the Renderer cannot handle the requested language)?

This is an interesting question. I think that the service provider is a backend configuration choice and not something the application (or the application developer) should have to worry about. Thus the backend should handle selecting the appropriate renderer based on language code if that is a requirement. I believe this is how Tropo/Voxeo PRISM does it today.

  5. What should our behaviour be in cases of missing keys? The norm is to fallback to en, but what if the key is missing entirely? Do we playback an error recording, not playback anything (and log a warning) or raise an exception (and thus perhaps end the call)?

If we log anything it should definitely be an error, not a warning. We could attempt to speak the translation key directly as well. Also, we could trigger the exception handler without actually throwing an exception. The rationale there is to prevent stopping the call while still hitting the exception catcher (like Airbrake) to increase visibility of the error.

  6. In many cases, audio recordings are numerous and are maintained scoped to a directory like recordings/en-GB/greeting.wav and recordings/pt-BR/greeting.wav. Should we require manually specifying keys for all of these, or should we have some default convention which makes it implicit?

I favor some kind of default convention.
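One possible default convention, sketched in plain Ruby: derive the recording path from the locale and the key, and only fall back to TTS text when no such file exists. `recording_path`, `prompt_for` and the `recordings` root are hypothetical; the thread does not settle on a layout beyond the recordings/en-GB/greeting.wav example.

```ruby
# Conventional location of a recording: <root>/<locale>/<key>.wav
def recording_path(key, locale:, root: "recordings")
  File.join(root, locale.to_s, "#{key}.wav")
end

# Prefer the conventional recording when it exists on disk; otherwise
# return the TTS text so the prompt is still rendered somehow.
def prompt_for(key, locale:, tts_text:, root: "recordings")
  path = recording_path(key, locale: locale, root: root)
  File.exist?(path) ? "file://#{File.expand_path(path)}" : tts_text
end

puts recording_path("greeting", locale: "en-GB")
puts recording_path("greeting", locale: "pt-BR")
puts prompt_for("greeting", locale: "pt-BR", tts_text: "Obrigado por ligar", root: "/no/such/dir")
```

With a convention like this, no translation key needs to be written by hand for each recording; dropping a file into the right directory is enough.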

  7. How do we deal with localising dates, currency, measurement units, timezones and phone numbers?

I don't know, but certainly this problem has been solved somewhere before? Where can we look for inspiration?

  8. Do we include recording distribution in the scope for i18n?

Recording distribution? Not sure what you're asking.

  9. Anything that Adhearsion does to enable localisation has to be well documented and scaffolded in generated applications and examples such as the Simon Game.

Agreed. We still need Examples on the website too, but I digress :)

Let's do one round of feedback on these points and then decide on a direction in which to start a spike of this.

@lpradovera
Member

+1 for the Rails-style approach. It is clean and widely used. Would you start off supporting a YAML backend and eventually add more?

@benlangfeld
Member Author

My point on recording distribution (which has been misunderstood above) was one we've discussed previously. If an application has recordings, should we provide a way of bundling them in the deployment, and perhaps providing an HTTP interface to serve them up to the engine? Is this even a concern we want to conflate with i18n or should we treat it entirely separately?

@bklang
Member

bklang commented Sep 5, 2013

My point on recording distribution (which has been misunderstood above) was one we've discussed previously. If an application has recordings, should we provide a way of bundling them in the deployment, and perhaps providing an HTTP interface to serve them up to the engine? Is this even a concern we want to conflate with i18n or should we treat it entirely separately?

Let's not conflate the HTTP interface to serve recordings with the i18n need to be able to reference them. Historically we've always bundled recordings with the application as an asset, much like you would bundle images into a web app. I think that should continue, and perhaps we need a separate issue open to discuss how best to share those recordings with the telephony engine, especially in cases where they are not on the same server.

But for this issue's purposes, let's assume they are on the same box and file:/// URIs are acceptable.

@JustinAiken
Member

Ooh, recording distribution... haha, that makes more sense than what I thought it did.

@ik5

ik5 commented Sep 5, 2013

With languages there are more issues, for example rules.
There are bidirectional languages (Arabic, Hebrew) that you should either know how to take care of, or use a tool that knows.
BiDi means using both right-to-left and left-to-right characters in the same string/paragraph.

Furthermore, it's not simple to do fail-over for languages when using telephony.

I would also suggest supporting a multi-language "menu" (IVR) command, making it much simpler to program things like:

For Spanish press one, for French press two, etc.
Something that helps you set the language from within the IVR itself.

@bhavinjavia

Hi guys, adding my thoughts as one of your Adhearsion users who will need this feature soon.

play t("You said %{resp}. That's obviously wrong!", resp: 4)

I don't think we need to integrate translation key lookup into core CallController methods, but we might make CallController#t an alias for I18n.t to avoid requiring the module name. This is much the approach that Rails takes and I think it's entirely acceptable.

+1 to the Rails approach. It would be least surprising for people coming from a Rails background.

How do we deal with localising dates, currency, measurement units, timezones and phone numbers?
I don't know, but certainly this problem has been solved somewhere before? Where can we look for inspiration?

  1. Apart from I18n.t, there is an I18n.l to localize date and time
  2. rails-i18n handles a lot of these concerns and could be a good source of inspiration
  3. Phone number localisation is altogether a different beast. There are a few good solutions around, like phony, which do a reasonably good job. Something along the lines of phony_rails could be considered for Adhearsion if you want to offer tighter integration.
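To illustrate what locale-aware date formatting amounts to, here is a plain Ruby sketch using only the standard `date` library. The `DATE_FORMATS` table and `localize_date` helper are hypothetical stand-ins; with the i18n gem the formats would live in the locale files and I18n.l would apply them.

```ruby
require "date"

# Hypothetical per-locale date formats.
DATE_FORMATS = {
  "en-US" => "%m/%d/%Y",
  "en-GB" => "%d/%m/%Y",
  "de"    => "%d.%m.%Y"
}.freeze

# Format a date according to the conventions of the given locale.
def localize_date(date, locale:)
  date.strftime(DATE_FORMATS.fetch(locale))
end

date = Date.new(2013, 9, 6)
puts localize_date(date, locale: "en-US") # 09/06/2013
puts localize_date(date, locale: "en-GB") # 06/09/2013
puts localize_date(date, locale: "de")    # 06.09.2013
```

For a voice application the formatted value would still have to be turned into something a TTS engine reads correctly, which this sketch does not attempt.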

I also would suggest to support multi-language "menu" (IVR) command, and be able to program a much simpler solution for things like:
For Spanish press one, for French press two etc.

+1 to this. In fact, I think this would probably be a 'must have' for internationalized Ahn apps, and it can be a plugin instead of a core feature to let people opt in if they need it.

@lpradovera
Member

Once a locale can be set and is used for translations, that kind of menu can be easily built in application logic.
No need for a core feature!


@JustinAiken
Member

Language selection IVR should definitely not be a core feature, but a plugin would be great. Would be a very small plugin, and would make it so adhearsion applications could do something like:

def run
  answer
  if call.from =~ /some_region_check/
    call.set_language :english
  else
    invoke LanguageIVRController, languages: ["English", "French"...]
  end
end
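The selection logic such a plugin would need is small. Here is a plain Ruby sketch of it, independent of Adhearsion; `LANGUAGE_MENU`, `language_menu_prompt` and `locale_for_digit` are hypothetical names, and collecting the DTMF digit from the caller is left out entirely.

```ruby
# Hypothetical digit-to-language table for the selection menu.
LANGUAGE_MENU = {
  "1" => { locale: :es, name: "Spanish" },
  "2" => { locale: :fr, name: "French" }
}.freeze

# Build the menu prompt from the table. A real IVR would more likely
# speak each option in its own language.
def language_menu_prompt
  LANGUAGE_MENU.map { |digit, lang| "For #{lang[:name]} press #{digit}" }.join(". ")
end

# Map the digit the caller pressed to a locale, keeping the default
# when the digit is not on the menu.
def locale_for_digit(digit, default: :en)
  LANGUAGE_MENU.fetch(digit, {}).fetch(:locale, default)
end

puts language_menu_prompt
puts locale_for_digit("2")
puts locale_for_digit("9")
```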

@bklang
Member

bklang commented Feb 6, 2014

Pinging on this issue. It sounds like we made a lot of progress toward decisions on it. @benlangfeld this is a candidate for Adhearsion 2.6?

@benlangfeld
Member Author

It is on the cards for 2.6, yes. It's also a potential candidate for a GSoC project should we be successful this year.

@bklang
Member

bklang commented Feb 22, 2014

A start on this has been made in https://github.com/adhearsion/adhearsion-i18n. Feedback welcome. The plugin is an experiment, and will likely be merged into core if it works well enough.

@benlangfeld
Member Author

We're going to merge adhearsion-i18n into core, including documentation on the website, for the 2.6.0 release.

@benlangfeld
Member Author

Closing in favour of #557

benlangfeld added a commit that referenced this issue Jun 16, 2015
gfaza pushed a commit to gfaza/adhearsion that referenced this issue Dec 3, 2018