Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement #128: runtime-editable config for default parameters and module settings #185

Open
wants to merge 21 commits into
base: master
Choose a base branch
from

Conversation

ChanceNCounter
Copy link
Contributor

@ChanceNCounter ChanceNCounter commented Apr 2, 2021

Adds lingua_franca.config, a dict of dicts.

My current intention is the following (though it will be accessed using getters and setters, rather than indices, for everyone's sanity):

lingua_franca.config['global'] provides packagewide defaults, both for options that haven't been localized and for options that can't be localized

lingua_franca.config['en']['universal'] serves a similar purpose for all English dialects

lingua_franca.config['en']['en-us'] (or ['en']['us']) provides localized settings for en-US (naturally.)

As of the creation of this PR, only the dict itself exists, and must be manually reinstantiated in the interpreter to test

Config can now be tested as follows:

>>> lingua_franca.load_languages(['en-us', 'en-au'])
>>> lingua_franca.config.get('short_scale') # default loc, en-us
True
>>> lingua_franca.config.get('short_scale', lang='en-au')
False
>>> lingua_franca.config.set('short_scale', False) # default loc, en-us
>>> lingua_franca.config.get('short_scale', lang='en-us')
False
>>> lingua_franca.config.set('short_scale', True)
>>> lingua_franca.config.get('short_scale', lang='en-us')
True
>>> lingua_franca.config.get('short_scale', lang='en-au')
False
>>> 

Module-wide settings are also being migrated:

>>> import lingua_franca
>>> lingua_franca.config.get('load_langs_on_demand')
False

Note, because I know someone will ask: the 'universal' English setting is currently short_scale because it culls that info from the default localization for a given language, and English currently defaults to en-US rather than en-AU or en-GB.

This is a technically-working implementation of #128, which creates
an object, exposed at runtime as lingua_franca.config, which descends
from dict. More robust get/set methods to follow.

The implementation is extensible to support full use of localization and
'full lang codes'.

Either that or hooking it up to the functions will be next.
@devs-mycroft devs-mycroft added the CLA: Yes Contributor License Agreement exists (see https://github.com/MycroftAI/contributors) label Apr 2, 2021
@ChanceNCounter
Copy link
Contributor Author

The dawn of true localization: tests that fail because it ain't quite doing full lang codes.

Step 3 of this PR: en-au

change initial implementation from

@localized_function(config_vars=['short_scale', 'ordinals']

to

@localized_function
def extract_number(..., short_scale=ConfigVar, ordinals=ConfigVar)

also, .gitignore my editor settings because it is driving me out of my mind
DOUBLE CHECK CORRECTNESS BEFORE MERGING

i got a peaceful, easy feeling...
i know you won't let me down...
@JarbasAl
Copy link
Collaborator

JarbasAl commented Apr 3, 2021

it's looking good so far!

also contains a partially-functional fix for problem in nice_number when
the param `denominators` is not iterable. this was intended to help
integrate that param with config. if it isn't working by the time this
feature branch is finished, strip it out as OOS.
@ChanceNCounter
Copy link
Contributor Author

BROKEN: (note to self, fix in next step)

the "active" localization is always set to the language's default localization, regardless of what language you pass to load_language()

in other words, at the moment:

>>> import lingua_franca
>>> lingua_franca.load_language('en-au')
>>> lingua_franca.get_default_loc()
'en-us'

@ChanceNCounter ChanceNCounter changed the title Initial, working impl. of #128 (not hooked up) Implement #128: runtime-editable config for default parameters and module settings Apr 5, 2021
@ChanceNCounter ChanceNCounter marked this pull request as ready for review April 5, 2021 22:09
@ChanceNCounter
Copy link
Contributor Author

Ready for review, but probably not for merge. I have yet to run coverage, for instance, although config's guts should be covered by accident, and direct access is inherent to the existing lang tests.

Copy link
Contributor

@krisgesling krisgesling left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few initial comments, but there's a lot to digest here and it's made me realise how confusing even a small number of different concepts are in LF.

lang = language eg "en"
full_lang_code = language code including the localized variant code eg "en-us"
loc = localized language which is represented by the full_lang_code?

I wonder if "language_variant" might be a better descriptor than locale, localized_lang, etc?

get_active_langs, _set_active_langs, get_primary_lang_code, \
get_full_lang_code, resolve_resource_file, load_language, \
load_languages, unload_language, unload_languages, get_supported_langs
### DO NOT CHANGE THIS IMPORT ORDER ###
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we detail why?

If it's a circular import issue is there a way we can restructure to avoid this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's exactly what it is, that's for the duration of the feature branch until I figure out exactly where the line is.

It may be possible to reduce the scope of internal's config imports and avoid the circle.

Comment on lines +2 to +4
from .internal import get_active_langs, get_supported_locs, \
get_full_lang_code, get_supported_langs, get_default_loc, \
get_primary_lang_code
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we move to using full words in variable names?
lang I think is understandable but could still be changed, however loc I think really starts getting ambiguous. Does it mean locale, location, localization - is there a difference between these things?
Could it be referring to something else? 🚂

I think this also reduces the cognitive load when reading through code - particularly for people unfamiliar with it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm all for standardizing. loc is the only one I didn't inherit.

get_full_lang_code() is an existing function which returns "the" full code corresponding to a primary lang code. This can't be removed wholesale, but it doesn't serve the whole purpose, and it doesn't serve "nondefault" locales in any way.

get_default_loc() returns the current "default" full code for a given primary code, which is configurable. On your rig, for instance, get_full_lang_code('en') will always return en-us, but I imagine get_default_loc('en') will return en-au.

I was thinking 'locale', but I didn't give it too much thought. I am by no means married to the names.

Comment on lines +78 to +80
If a function argument's default value should be read from config, it should be set
to the *data type* `ConfigVar` (not an instance.) For example, here is `parse.extract_number()`'s
top-level signature:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the benefit of introducing this specific datatype compared to using something generic like None?
It makes it more explicit in the signature that it's a config variable as opposed to "not used". Though we could also document that in the docstrings.

I'm not taking a position on what's best, just curious on the thought process behind it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unique, so the wrapper can rely on it to mean what it means. No fancy magic to determine when a value needs to be read from config, just, if param is ConfigVar. The old business with "what if it's positional" takes more code =P


break
try:
return setting_available_in[-1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we take the last one? I haven't reasoned this through completely but I assume the order would roughly be:

[
global, 
default_lang.universal, 
default_lang.specific, 
other_lang.universal, 
other_lang.specific,
default_locale.universal,
default_locale.specific,
other_locale.universal,
other_locale.specific,
place # which I'm unclear what this is
]

Is that right?

If so it seems like we would want the default lang setting over an other locale setting. But I could be getting the whole thing mixed up.

Given that I need to reason about it we should document what is intended in the docstring and write some tests for it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is definitely one of the things that will need the most discussion, but you're about right re: priority. The list is just in the opposite order of what you're thinking =P

'place' might not be a great variable name in this context, but it's referring to the possible indices of the desired setting, nothing to do with LF's job. Could be in this dict or this dict or that dict...

possible_places becomes a list of dicts to check for the config value. setting_available_in is the result, and it's ordered from global to local.

If the setting is found at all, setting_available_in[-1] will be, in the following order:

  1. The locale passed in the lang param, or the default locale for the lang param, or the current default locale when no lang param is passed
  2. Language-wide settings, which start out identical to their "default" locale. For instance, en's default locale (from get_full_lang_code() is en-us, so English's default language-wide settings are loaded from en-us's config.json
  3. "Global," LF-wide settings

If the value isn't present in any of those places, it's unavailable in that language.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perusing stales, rephrased. At runtime, LF config is a dict of dicts. It's got settings for the whole module and settings for each locale. Each language defers to a particular locale as its "default" locale. This is configurable. Gez's default English locale will be en-AU, Chance's will be en-US, Aditya might change it up on the fly!

So. We want to get(setting).

  1. Given a list of possible_places (dicts) where the requested setting might be stored, look for it
  2. Let the list of dicts in which you found the value be setting_available_in
  3. setting_available_in should, thanks to the algorithm that built it, be ordered as follows:
    [global_value, fallback_lang_value, fallback_loc_value, specified_lang_value, specified_loc_value] (where "fallback" is the default language, if different from the one specified)
  4. Return the last element of setting_available_in, representing the "most local" value that conforms with the input query

This way, a "more local" config dict will always override its less-local counterpart, assuming the requested key is present in the more local dict.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sake of populating each field, let's say I am running with

  • Default language English, default locale en-US
  • Current user language and locale en/en-AU
  • Spanish loaded, default locale es-ES
  • Alternate locale es-US also loaded

If I ask for an es-US setting that's fully localized, I expect setting_available_in to offer values in the order:

[en-US, en-AU, es-ES, es-US]

representing, per the previous comment,

[fallback_lang_value, fallback_loc_value, specified_lang_value, specified_loc_value]

but, if any of the config dicts are missing the setting I asked for, they'll be absent from this list and LF will move on with its life, returning instead the next value "up" the chain.

At the end, this bit would return [en-US, en-AU, es-ES, es-US][-1], getting es-US and, per the initial function call, causing the getter to return the config value from es-US.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA: Yes Contributor License Agreement exists (see https://github.com/MycroftAI/contributors)
Projects
Roadmap
v0.5.0
Development

Successfully merging this pull request may close these issues.

Move runtime config (default language, localization settings, etc) to a stateful object
4 participants