Expressions #11

nicpottier · 2016-12-23T17:31:26Z

First cut at description Excellent, expression language used in RapidPro and our submission as a starting point for an expression language for FLOIP.

@rowanseymour @ericnewcomer

@rowanseymour

First cut at description Excellent, expression language used in RapidPro and our submission as a starting point for an expression language for FLOIP. @rowanseymour @ericnewcomer

rowanseymour · 2016-12-24T17:03:14Z

expressions.md

+     "age": 30,
+     "tel": "+12065551212",
+     "birthday": "22-04-1986",
+     "__string__": "Marshawn Lynch"


Currently this needs to be __default__ or *

Ya, was thinking for long term it may be more flexible / interesting to allow a "default" representation based on type required, therefore the use of __string__ here. One could imagine a scenario where a dict can advertise a default value for a date for example, and the casting of that parameter would use it via ___datetime___.

Not sure the additional complexity buys us much there over a single key for a default value - which can have different types. I can't envisage a situation where we need to explicitly have different defaults for different types.

I like * as a default key. It's not so Pythony but it also can't collide with actual keys.

Don't think there's a strong argument for __string__ colliding with actual keys though.

No added complexity, we are just changing __default__ to __string__, this buys us room to grow in the future if we decide default types make sense. Given that some languages support this I think not putting ourselves in a box makes sense.

So I think your groups examples demonstrates why we might need different concepts here, one for a stringified version of the value and one for a default value.

Because you really don't want a list of strings for groups for any kind of group test, but rather a list of UUIDs. So that would be the __default__ value, while the __string__ value would be a concatenated list of the groups.

So what do you think about having those two? __default__ representing whatever type is the most logical "default" for a more complex object, while __string__ representing the string representation used when evaluating down to a template (or concatenating to a string).

Will require some thought as to when to use which but I think that gives us the best of both worlds.

I like the idea of __string__ being a way for a item in the context to control how it is formatted for template inclusion, but I wonder how useful that would be given that we need to consistently stringify values of all types, regardless of whether they come from the context or not. For example, all of following should render consistently with thousands separators:

@flow.number --> "1,234" @(flow.number + 1) --> "1,235" @(1234) --> "1,234"

The easiest way to ensure that, is to have standard ways of stringifying each type. If lists are a type that we support in expressions then I think any time an expression evaluates to a list, it's rendered as a comma-separated list. So again, not sure we need __string__ unless a particular list in the context needs to do that differently. For example, if we had a SPLIT function, then both of the following should render consistently as CSV lists:

@contact.groups --> "1234-abcd, 2345-bcde, 3456-cdef" @(SPLIT("a b")) --> "a, b"

And as mentioned in previous comment, if we did have a way for items in the context to control how they are stringified, I don't think it should be limited to default values. For example if x.y and x.y.z are context paths and both can be non-string values, why should only x.y be able to provide a __string__ function? So that's why I'm proposing that any value x in the context can be represented by an object in the form {"__value__": x}, so that any value can include __string__ and potentially other functions.

So if I'm understanding you correctly you are saying:

We define how "primitive" values (int, decimal, array, dictionary) are stringified, which can include arrays etc.. Totally agree on that front, want to give that first shot?

Any complex value can optionally provide both a __value__ and __string__ key. If __string__ is provided and we are evaluating in a string context, then it is used, otherwise if __value__ is provided then that is used. (possibly using the rules above to turn into a string if evaluated to a string context). I support that as well.

Does that represent your thoughts?

Can give this a stab. I think our spec will need to define a set of "evaluation context variables" as some of these will want to be configurable. Right now we only support two hardcoded date formats (DD-MM-YYYY and MM-DD-YYYY) but it seems reasonable for a flow runner to use any date format for output. Also seems reasonable to support different number formats as different countries use spaces, commas and period in different ways.

I'm actually saying any non-dict value can be substituted by a special dict that can provide __value__ and __string__ keys. The __default__ of a dict can also do this, e.g.

{ "__default__": 2, "foo": "bar", "list": ["1a2b", "2bc3"] }

can be rewritten with __string__ overrides for everything:

{ "__default__": {"__value__": 2, "__string__": "2.0"}, "foo": {"__value__": "bar", "__string__": "BAR!"}, "list": {"__value__": ["1a2b", "2bc3"], "__string__": "1a2b|2bc3"}, }

From today's chat:

We agreed that __string__ and __value__ likely make sense, both are optional, IFF there are sane rules around stringifying both complex and primitive values.

rowanseymour · 2016-12-24T17:05:25Z

expressions.md

+Excellent will attempt to cast strings to the following types when used in functions:
+ * Strings
+ * Decimal values
+ * Datetimes: (ISO 8601) or `dd-mm-yyyy HH:MM(:SS)` or `mm-dd-yyyy HH:MM(:SS)` 


Dates can be many different formats and we also support booleans

Added boolean.. can you help with alternate date syntaxes. (those you think are kosher not the insane variety currently allowed)

rowanseymour · 2016-12-27T10:27:40Z

expressions.md

+ * Strings
+ * Decimal values
+ * Datetimes: (ISO 8601) or `dd-mm-yyyy HH:MM(:SS)` or `mm-dd-yyyy HH:MM(:SS)`
+ * Boolean: True or False (case insensitive)


Oh integer is also currently a type - distinct from decimal in that it must be a whole number, e.g. WORD_SLICE(name, 4)

Come to think of it, we probably need to be even more specific here. IE, are decimals floating point or decimals, how it math done, to how many significant digits? When do integers under and overflow?

I would propose having ints be 32 bit signed. And floating point to be something precise, say with 14 digits of precision (10.4?)

I'm now talking myself round to using decimal for all numbers for simplicity's sake, and just use rounding whenever an operation requires a whole number - which is essentially what we already do to convert decimals to integers - it's just integer wouldn't be a type that can be passed around.

For decimals we've got Decimal in Python, BigDecimal in Java, and some ports of BigDecimal in Javascript. All of those support any amount of floating point precision (by 10.4 you mean fixed?), and all support the IEEE standards DECIMAL64, DECIMAL128 etc. I'd propose DECIMAL128 (precision=34, rounding mode=HALF_EVEN).

rowanseymour · 2016-12-27T10:37:43Z

expressions.md

+Excellent will attempt to cast strings to the following types when used in functions:
+ * Strings
+ * Decimal values
+ * Datetimes: (ISO 8601) or `dd-mm-yyyy HH:MM(:SS)` or `mm-dd-yyyy HH:MM(:SS)`


Challenging to list all the data formats we accept because we accept everything from "12th February 2016" to "Fev 12er '16" to "12.2.16", i.e. anything a human might enter in no particular format.

What if we put the date parsing stuff in a separate library (with Java, Python targets) and said anything parseable by this library?

I thought the expression library now supported a much smaller set of options? We need to list these out in a spec so having a library isn't good enough. Perhaps we should reduce them down to a set we think encompasses 90% of use cases. Anything we can be inspired by?

The date parsing works by extracting numerical and text values from the input text and seeing what meaningful date/time values can be constructed from them. It's less likely to falsely match non-date input, but still accepts an infinite number of formats. We can though probably describe (at least approximately) its input set with regexes.

I'm not entirely convinced we need to list every possible date format supported because the date formatting is intentionally designed to support arbitrarily formatted dates entered by humans.

It's not unlike Date.parse() in Javascript https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Date/parse

It is not recommended to use Date.parse as until ES5, parsing of strings was entirely implementation dependent. There are still many differences in how different hosts parse date strings, therefore date strings should be manually parsed (a library can help if many different formats are to be accommodated).

Hrmm.. I think we need to do better than Javascript here even if that means starting with a smaller set of supported formats. The point of this entire exercise is that the same flow has identical outputs regardless of the implementing flow engine, so we need to be specific and strict on things like this.

Maybe we start with ISO 8601, then also add in numerical representations according to the month / day order config?

10.31 10.31.2016 10-31-2016 10 31 2016 10 31

Taking a step back - the fact that we can parse input from a human like hi my dob is 24th feb '81. thanks! into a valid date, is a really useful feature, that I hope we're not planning on throwing away.

The fact that this parsing cannot be described by a set of typical date format strings does not mean it can't be described. If we need a spec, I think we can describe it algorithmically, or possibly even as regexes.

Part of this exercise will definitely be to have a higher level components. (even more high level than just a date widget)

People should be using the "has a date" test right now, so there should be a loop there in theory. Either way, this is one case where we have lots of data and can just make an informed decision based on usage we've seen. So we should just run some jobs and figure out rough %s for different types of date inputs we've seen users use.

Honestly I've always felt it a bit goofy how fast and loose we are with dates, and being able to explicitly describe this behavior with a more discrete set of parsing rules to allow for deterministic behavior across flow engines seems like a good opportunity to fix that. To me it's not a burden to push this down to flow authors enforcing them to request their users to supply a given format, validate that format and manage that flow -- though that doesn't preclude there being a complex block type for making that easier as well.

Ok am coming round to stricter date parsing. I took a quick look and couldn't find a lot of flows prompting for dates. It would remove some oddness in date arithmetic where we try to parse strings as dates.

Tricky to describe date formats in a way which is not language specific - Python and Java date format strings don't look anything alike. Regular format strings also don't lend themselves to flexible whitespace and punctuation. Thoughts on the following as a relatively restrictive method of parsing human entered dates?

Evaluation context specifies one of the following formats:

D sep M sep Y (e.g. 31-1-16, 31/01/2016, 31 . 1 . 16)

M sep D sep Y (e.g. 1-31-16, 01/31/2016, 1 . 31 . 16)

Y sep M sep D (e.g. 16-1-31, 2016/01/31, 16 . 1 . 31)

Where sep is one of \/-.

Where D is a number between 1 and 31, possibly zero-padded

Where M is a number between 1 and 12, possibly zero-padded

Where Y is a number between 1 and 9999, which must be zero-padded to 4 digits.

Rules:

Strip whitespace from start and end of input.

Whitespace between components is ignored

Logically invalid dates (e.g. 31-2-2017) are not parsed.

Two character years are not allowed and 0017 is interpreted as 17CE rather than 2017.

Formats 1 & 2 can be matched by the regex \d\d?\s*[/\\\-\.]\d\d?\s*[/\\\-\.]\d\d\d\d and format 3 by the regex \d\d\d\d\s*[/\\\-\.]\d\d?\s*[/\\\-\.]\d\d?.

Some notes from today's chat:

we should also support two digit years (what century determined by config option)

we should support month/date only. (always current year, fancier things possible via parsing expressions)

pld

Can we add the EXP function from excel to the list of math functions? We have users who do linear regression classification in their XLSForm expressions, using EXP is a lot better than having to use POW( 2.7182818284590452353602874713527, x)

pld · 2017-01-05T17:39:07Z

expressions.md

+
+## Templating
+
+For templating, RapidPro uses the `@` character to denote either a single variable substitution or the beginning of a


grammar!

rowanseymour · 2017-01-06T09:23:58Z

@pld that seems like something we can add quite easily

markboots · 2017-01-16T04:54:36Z

I'm really happy with this, certainly as a starting point. I vote to approve.

What I'd love to explore is some work on the overlap points:

The connection between expressions and Blocks that can make decisions: ie: what types of Conditional Blocks do we have, and how do they use expressions?
How do we embed expressions in Block definitions, and where are all the places Expressions can be used?
What are the required keys within the Context? (We noted that each level of the specification would add different required elements to the Context)

Expressions

4929d39

First cut at description Excellent, expression language used in RapidPro and our submission as a starting point for an expression language for FLOIP. @rowanseymour @ericnewcomer

rowanseymour suggested changes Dec 24, 2016

View reviewed changes

Update expressions.md

e206f92

rowanseymour reviewed Dec 27, 2016

View reviewed changes

pld reviewed Jan 5, 2017

View reviewed changes

Update expressions.md

8a5f4b5

grammar!

rowanseymour mentioned this pull request Jan 6, 2017

Add EXP function rapidpro/expressions#17

Closed

markboots self-requested a review January 26, 2017 05:33

markboots approved these changes Jan 26, 2017

View reviewed changes

Update expressions.md

ab72a5c

nditada self-requested a review February 2, 2017 16:13

markboots merged commit d6c3b0d into master Mar 1, 2017

markboots deleted the expressions branch March 1, 2017 02:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Expressions #11

Expressions #11

nicpottier commented Dec 23, 2016

rowanseymour Dec 24, 2016

nicpottier Dec 26, 2016

rowanseymour Dec 27, 2016

ericnewcomer Dec 27, 2016 •

edited

nicpottier Dec 27, 2016

nicpottier Jan 3, 2017

rowanseymour Jan 4, 2017

nicpottier Jan 5, 2017

rowanseymour Jan 6, 2017

nicpottier Jan 10, 2017

rowanseymour Dec 24, 2016

nicpottier Dec 26, 2016

rowanseymour Dec 27, 2016

nicpottier Dec 27, 2016 •

edited

rowanseymour Jan 6, 2017

rowanseymour Dec 27, 2016

nicpottier Dec 27, 2016

rowanseymour Dec 27, 2016 •

edited

nicpottier Dec 27, 2016

rowanseymour Dec 27, 2016

nicpottier Dec 29, 2016

ericnewcomer Dec 29, 2016

rowanseymour Dec 29, 2016

rowanseymour Jan 6, 2017 •

edited

nicpottier Jan 10, 2017

pld left a comment

pld Jan 5, 2017

rowanseymour commented Jan 6, 2017

markboots commented Jan 16, 2017


		## Templating

		For templating, RapidPro uses the `@` character to denote either a single variable substitution or the beginning of a

Expressions #11

Expressions #11

Conversation

nicpottier commented Dec 23, 2016

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ericnewcomer Dec 27, 2016 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

nicpottier Dec 27, 2016 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rowanseymour Dec 27, 2016 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rowanseymour Jan 6, 2017 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

pld left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rowanseymour commented Jan 6, 2017

markboots commented Jan 16, 2017

ericnewcomer Dec 27, 2016 •

edited

nicpottier Dec 27, 2016 •

edited

rowanseymour Dec 27, 2016 •

edited

rowanseymour Jan 6, 2017 •

edited