Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Expressions #11

Merged
merged 4 commits into from Mar 1, 2017
Merged

Expressions #11

merged 4 commits into from Mar 1, 2017

Conversation

nicpottier
Copy link
Contributor

First cut at description Excellent, expression language used in RapidPro and our submission as a starting point for an expression language for FLOIP.

@rowanseymour @ericnewcomer

First cut at description Excellent, expression language used in RapidPro and our submission as a starting point for an expression language for FLOIP.

@rowanseymour @ericnewcomer
expressions.md Outdated
"age": 30,
"tel": "+12065551212",
"birthday": "22-04-1986",
"__string__": "Marshawn Lynch"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently this needs to be __default__ or *

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ya, was thinking for long term it may be more flexible / interesting to allow a "default" representation based on type required, therefore the use of __string__ here. One could imagine a scenario where a dict can advertise a default value for a date for example, and the casting of that parameter would use it via ___datetime___.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure the additional complexity buys us much there over a single key for a default value - which can have different types. I can't envisage a situation where we need to explicitly have different defaults for different types.

I like * as a default key. It's not so Pythony but it also can't collide with actual keys.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think there's a strong argument for __string__ colliding with actual keys though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No added complexity, we are just changing __default__ to __string__, this buys us room to grow in the future if we decide default types make sense. Given that some languages support this I think not putting ourselves in a box makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I think your groups examples demonstrates why we might need different concepts here, one for a stringified version of the value and one for a default value.

Because you really don't want a list of strings for groups for any kind of group test, but rather a list of UUIDs. So that would be the __default__ value, while the __string__ value would be a concatenated list of the groups.

So what do you think about having those two? __default__ representing whatever type is the most logical "default" for a more complex object, while __string__ representing the string representation used when evaluating down to a template (or concatenating to a string).

Will require some thought as to when to use which but I think that gives us the best of both worlds.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea of __string__ being a way for a item in the context to control how it is formatted for template inclusion, but I wonder how useful that would be given that we need to consistently stringify values of all types, regardless of whether they come from the context or not. For example, all of following should render consistently with thousands separators:

@flow.number           --> "1,234"
@(flow.number + 1)     --> "1,235"
@(1234)                --> "1,234"

The easiest way to ensure that, is to have standard ways of stringifying each type. If lists are a type that we support in expressions then I think any time an expression evaluates to a list, it's rendered as a comma-separated list. So again, not sure we need __string__ unless a particular list in the context needs to do that differently. For example, if we had a SPLIT function, then both of the following should render consistently as CSV lists:

@contact.groups     --> "1234-abcd, 2345-bcde, 3456-cdef"
@(SPLIT("a b"))     --> "a, b"

And as mentioned in previous comment, if we did have a way for items in the context to control how they are stringified, I don't think it should be limited to default values. For example if x.y and x.y.z are context paths and both can be non-string values, why should only x.y be able to provide a __string__ function? So that's why I'm proposing that any value x in the context can be represented by an object in the form {"__value__": x}, so that any value can include __string__ and potentially other functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I'm understanding you correctly you are saying:

  1. We define how "primitive" values (int, decimal, array, dictionary) are stringified, which can include arrays etc.. Totally agree on that front, want to give that first shot?
  2. Any complex value can optionally provide both a __value__ and __string__ key. If __string__ is provided and we are evaluating in a string context, then it is used, otherwise if __value__ is provided then that is used. (possibly using the rules above to turn into a string if evaluated to a string context). I support that as well.

Does that represent your thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Can give this a stab. I think our spec will need to define a set of "evaluation context variables" as some of these will want to be configurable. Right now we only support two hardcoded date formats (DD-MM-YYYY and MM-DD-YYYY) but it seems reasonable for a flow runner to use any date format for output. Also seems reasonable to support different number formats as different countries use spaces, commas and period in different ways.
  2. I'm actually saying any non-dict value can be substituted by a special dict that can provide __value__ and __string__ keys. The __default__ of a dict can also do this, e.g.
{
    "__default__": 2,
    "foo": "bar",
    "list": ["1a2b", "2bc3"]
}

can be rewritten with __string__ overrides for everything:

{
    "__default__": {"__value__": 2, "__string__": "2.0"},
    "foo": {"__value__": "bar", "__string__": "BAR!"},
    "list": {"__value__": ["1a2b", "2bc3"], "__string__": "1a2b|2bc3"},
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From today's chat:

We agreed that __string__ and __value__ likely make sense, both are optional, IFF there are sane rules around stringifying both complex and primitive values.

expressions.md Outdated
Excellent will attempt to cast strings to the following types when used in functions:
* Strings
* Decimal values
* Datetimes: (ISO 8601) or `dd-mm-yyyy HH:MM(:SS)` or `mm-dd-yyyy HH:MM(:SS)`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dates can be many different formats and we also support booleans

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added boolean.. can you help with alternate date syntaxes. (those you think are kosher not the insane variety currently allowed)

expressions.md Outdated
* Strings
* Decimal values
* Datetimes: (ISO 8601) or `dd-mm-yyyy HH:MM(:SS)` or `mm-dd-yyyy HH:MM(:SS)`
* Boolean: True or False (case insensitive)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh integer is also currently a type - distinct from decimal in that it must be a whole number, e.g. WORD_SLICE(name, 4)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Come to think of it, we probably need to be even more specific here. IE, are decimals floating point or decimals, how it math done, to how many significant digits? When do integers under and overflow?

I would propose having ints be 32 bit signed. And floating point to be something precise, say with 14 digits of precision (10.4?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm now talking myself round to using decimal for all numbers for simplicity's sake, and just use rounding whenever an operation requires a whole number - which is essentially what we already do to convert decimals to integers - it's just integer wouldn't be a type that can be passed around.

For decimals we've got Decimal in Python, BigDecimal in Java, and some ports of BigDecimal in Javascript. All of those support any amount of floating point precision (by 10.4 you mean fixed?), and all support the IEEE standards DECIMAL64, DECIMAL128 etc. I'd propose DECIMAL128 (precision=34, rounding mode=HALF_EVEN).

expressions.md Outdated
Excellent will attempt to cast strings to the following types when used in functions:
* Strings
* Decimal values
* Datetimes: (ISO 8601) or `dd-mm-yyyy HH:MM(:SS)` or `mm-dd-yyyy HH:MM(:SS)`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Challenging to list all the data formats we accept because we accept everything from "12th February 2016" to "Fev 12er '16" to "12.2.16", i.e. anything a human might enter in no particular format.

What if we put the date parsing stuff in a separate library (with Java, Python targets) and said anything parseable by this library?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the expression library now supported a much smaller set of options? We need to list these out in a spec so having a library isn't good enough. Perhaps we should reduce them down to a set we think encompasses 90% of use cases. Anything we can be inspired by?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The date parsing works by extracting numerical and text values from the input text and seeing what meaningful date/time values can be constructed from them. It's less likely to falsely match non-date input, but still accepts an infinite number of formats. We can though probably describe (at least approximately) its input set with regexes.

I'm not entirely convinced we need to list every possible date format supported because the date formatting is intentionally designed to support arbitrarily formatted dates entered by humans.

It's not unlike Date.parse() in Javascript https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Date/parse

It is not recommended to use Date.parse as until ES5, parsing of strings was entirely implementation dependent. There are still many differences in how different hosts parse date strings, therefore date strings should be manually parsed (a library can help if many different formats are to be accommodated).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hrmm.. I think we need to do better than Javascript here even if that means starting with a smaller set of supported formats. The point of this entire exercise is that the same flow has identical outputs regardless of the implementing flow engine, so we need to be specific and strict on things like this.

Maybe we start with ISO 8601, then also add in numerical representations according to the month / day order config?

10.31
10.31.2016
10-31-2016
10 31 2016
10 31

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking a step back - the fact that we can parse input from a human like hi my dob is 24th feb '81. thanks! into a valid date, is a really useful feature, that I hope we're not planning on throwing away.

The fact that this parsing cannot be described by a set of typical date format strings does not mean it can't be described. If we need a spec, I think we can describe it algorithmically, or possibly even as regexes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of this exercise will definitely be to have a higher level components. (even more high level than just a date widget)

People should be using the "has a date" test right now, so there should be a loop there in theory. Either way, this is one case where we have lots of data and can just make an informed decision based on usage we've seen. So we should just run some jobs and figure out rough %s for different types of date inputs we've seen users use.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly I've always felt it a bit goofy how fast and loose we are with dates, and being able to explicitly describe this behavior with a more discrete set of parsing rules to allow for deterministic behavior across flow engines seems like a good opportunity to fix that. To me it's not a burden to push this down to flow authors enforcing them to request their users to supply a given format, validate that format and manage that flow -- though that doesn't preclude there being a complex block type for making that easier as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok am coming round to stricter date parsing. I took a quick look and couldn't find a lot of flows prompting for dates. It would remove some oddness in date arithmetic where we try to parse strings as dates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tricky to describe date formats in a way which is not language specific - Python and Java date format strings don't look anything alike. Regular format strings also don't lend themselves to flexible whitespace and punctuation. Thoughts on the following as a relatively restrictive method of parsing human entered dates?

Evaluation context specifies one of the following formats:

  1. D sep M sep Y (e.g. 31-1-16, 31/01/2016, 31 . 1 . 16)
  2. M sep D sep Y (e.g. 1-31-16, 01/31/2016, 1 . 31 . 16)
  3. Y sep M sep D (e.g. 16-1-31, 2016/01/31, 16 . 1 . 31)
  • Where sep is one of \/-.
  • Where D is a number between 1 and 31, possibly zero-padded
  • Where M is a number between 1 and 12, possibly zero-padded
  • Where Y is a number between 1 and 9999, which must be zero-padded to 4 digits.

Rules:

  • Strip whitespace from start and end of input.
  • Whitespace between components is ignored
  • Logically invalid dates (e.g. 31-2-2017) are not parsed.
  • Two character years are not allowed and 0017 is interpreted as 17CE rather than 2017.

Formats 1 & 2 can be matched by the regex \d\d?\s*[/\\\-\.]\d\d?\s*[/\\\-\.]\d\d\d\d and format 3 by the regex \d\d\d\d\s*[/\\\-\.]\d\d?\s*[/\\\-\.]\d\d?.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some notes from today's chat:

  • we should also support two digit years (what century determined by config option)
  • we should support month/date only. (always current year, fancier things possible via parsing expressions)

Copy link
Contributor

@pld pld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add the EXP function from excel to the list of math functions? We have users who do linear regression classification in their XLSForm expressions, using EXP is a lot better than having to use POW( 2.7182818284590452353602874713527, x)

expressions.md Outdated

## Templating

For templating, RapidPro uses the `@` character to denote either a single variable substitution or the beginning of a
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a => an

@rowanseymour
Copy link
Contributor

@pld that seems like something we can add quite easily

@markboots
Copy link
Contributor

I'm really happy with this, certainly as a starting point. I vote to approve.

What I'd love to explore is some work on the overlap points:

  • The connection between expressions and Blocks that can make decisions: ie: what types of Conditional Blocks do we have, and how do they use expressions?
  • How do we embed expressions in Block definitions, and where are all the places Expressions can be used?
  • What are the required keys within the Context? (We noted that each level of the specification would add different required elements to the Context)

@markboots markboots self-requested a review January 26, 2017 05:33
@nditada nditada self-requested a review February 2, 2017 16:13
@markboots markboots merged commit d6c3b0d into master Mar 1, 2017
@markboots markboots deleted the expressions branch March 1, 2017 02:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants