Expressions #11

Merged
merged 4 commits into from Mar 1, 2017
338 changes: 338 additions & 0 deletions expressions.md
@@ -0,0 +1,338 @@
# Expressions

While a flow paradigm is useful for representing control logic in a way that non-technical users can easily understand,
there is also a need to refer to variables collected in other parts of a flow and to transform them in basic ways.

Requirements:
* Given a dictionary (the context) and a string expression, evaluates to a dictionary value and an error which can be null
* Must allow easy variable substitutions within strings as this is the most common activity. ex: `Hi @contact.name`
* Must guarantee termination: we cannot let users build expressions which do not exit
* Must not allow looping. Loops should be represented in the flows themselves; we don't want programs
in expressions, the logic should be in the flow. The same goes for declaration of variables etc.
* Must allow for the vast majority of transformations users require, though we realize not all will be possible within
an expression alone

## Excellent - Excel Inspired Expressions

Excellent is an expression language which consists of the functions provided by Excel with a few additions. This provides
a number of advantages:
* No halting problem: an expression is a single function call (possibly nested), so evaluation always terminates
* Many users are already familiar with the provided functions, as they have used them in Excel
* More complex expressions can be 'tried out' within Excel
* Excel's set of functions has evolved over time to capture most use cases while remaining lightweight
* Variables from the context are referenced using standard dot-notation

Function and variable names are not case-sensitive so UPPER is equivalent to upper:
* `contact.name` -> `Marshawn Lynch`
* `FIRST_WORD(contact.name)` -> `Marshawn`
* `first_word(CONTACT.NAME)` -> `Marshawn`
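
The case-insensitive dot-notation lookup described above could be sketched as follows. This is only an illustration: the `resolve` helper and its error handling are assumptions, not part of the proposed spec.

```python
def resolve(context, path):
    """Look up a dotted path in a nested dict, ignoring case at every level."""
    value = context
    for part in path.split("."):
        if not isinstance(value, dict):
            raise KeyError(path)
        # Lowercase the keys so that CONTACT.NAME matches contact.name
        lowered = {key.lower(): item for key, item in value.items()}
        if part.lower() not in lowered:
            raise KeyError(path)
        value = lowered[part.lower()]
    return value


context = {"contact": {"name": "Marshawn Lynch"}}
print(resolve(context, "contact.name"))
print(resolve(context, "CONTACT.NAME"))
```

Both calls resolve to the same value because every path segment is compared in lowercase.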

Expressions can also include arithmetic with the add (+), subtract (-), multiply (*), divide (/) and exponent (^) operators:
* `1 + (2 - 3) * 4 / 5 ^ 6` -> `0.999744`
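
As a check of the operator precedence in the example above, the same arithmetic can be reproduced in Python, where `**` plays the role of `^` and binds tighter than `*` and `/`. Using `Decimal` here is an assumption about how decimal values might be handled, not something the spec states.

```python
from decimal import Decimal

# 5 ^ 6 is evaluated first, then * and /, then +
power = Decimal(5) ** 6                                # 15625
product = (Decimal(2) - Decimal(3)) * Decimal(4)       # -4
result = Decimal(1) + product / power
print(result)
```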

You can also join strings with the concatenate operator (&):
* `contact.first_name & " " & contact.last_name` -> `Marshawn Lynch`

### Types

Excellent will attempt to cast strings to the following types when used in functions:
* Strings
* Decimal values
* Datetimes: (ISO 8601) or `dd-mm-yyyy HH:MM(:SS)` or `mm-dd-yyyy HH:MM(:SS)`
Contributor

Dates can be many different formats and we also support booleans

Contributor Author

Added boolean.. can you help with alternate date syntaxes. (those you think are kosher not the insane variety currently allowed)


### Logical comparisons

A logical comparison is an expression which evaluates to TRUE or FALSE. These may use the equals (=), not-equals (<>),
greater-than (>), greater-than-or-equal (>=), less-than (<) and less-than-or-equal (<=) operators:
* `contact.age > 18` -> `TRUE`

Note that when comparing text values, the equals (=) and not-equals (<>) operators are case-insensitive.
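
A minimal sketch of case-insensitive text equality, assuming that Unicode case folding is an acceptable definition of "case-insensitive" (the spec does not say which rule is used); `text_equals` is a hypothetical helper name.

```python
def text_equals(left, right):
    """Compare two text values while ignoring case."""
    # casefold() is a more aggressive lower() intended for caseless matching
    return left.casefold() == right.casefold()


print(text_equals("Seattle", "SEATTLE"))
print(text_equals("Seattle", "Oakland"))
```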

## Templating

For templating, RapidPro uses the `@` character to denote either a single variable substitution or the beginning of an
Contributor

a => an

Excellent block. `@` was chosen because most users know how to type it, regardless of keyboard. It does have
the disadvantage of being used in email addresses and Twitter handles, but these are rarely ambiguous and escaping can
be done easily via doubling of the character (`@@`).

* Functions are called by using the block syntax: `10 plus 4 is @(SUM(10, 4))`
* Within a block, `@` is not required to refer to a variable in the context: `Hello @(contact.name)`
* A template can contain more than one substitution or block: `Hello @contact.name, you were born in @(YEAR(contact.birthday))`

Examples below use the following context:

```json
{
"contact": {
"name": "Marshawn Lynch",
"jersey": 24,
"age": 30,
"tel": "+12065551212",
"birthday": "22-04-1986",
"__string__": "Marshawn Lynch"
Contributor

Currently this needs to be __default__ or *

Contributor Author

Ya, was thinking for long term it may be more flexible / interesting to allow a "default" representation based on type required, therefore the use of __string__ here. One could imagine a scenario where a dict can advertise a default value for a date for example, and the casting of that parameter would use it via ___datetime___.

Contributor

Not sure the additional complexity buys us much there over a single key for a default value - which can have different types. I can't envisage a situation where we need to explicitly have different defaults for different types.

I like * as a default key. It's not so Pythony but it also can't collide with actual keys.

Don't think there's a strong argument for __string__ colliding with actual keys though.

Contributor Author

No added complexity, we are just changing __default__ to __string__, this buys us room to grow in the future if we decide default types make sense. Given that some languages support this I think not putting ourselves in a box makes sense.

Contributor Author

So I think your groups example demonstrates why we might need different concepts here: one for a stringified version of the value and one for a default value.

Because you really don't want a list of strings for groups for any kind of group test, but rather a list of UUIDs. So that would be the __default__ value, while the __string__ value would be a concatenated list of the groups.

So what do you think about having those two? __default__ representing whatever type is the most logical "default" for a more complex object, while __string__ representing the string representation used when evaluating down to a template (or concatenating to a string).

Will require some thought as to when to use which but I think that gives us the best of both worlds.

Contributor

I like the idea of __string__ being a way for an item in the context to control how it is formatted for template inclusion, but I wonder how useful that would be given that we need to consistently stringify values of all types, regardless of whether they come from the context or not. For example, all of the following should render consistently with thousands separators:

@flow.number           --> "1,234"
@(flow.number + 1)     --> "1,235"
@(1234)                --> "1,234"

The easiest way to ensure that, is to have standard ways of stringifying each type. If lists are a type that we support in expressions then I think any time an expression evaluates to a list, it's rendered as a comma-separated list. So again, not sure we need __string__ unless a particular list in the context needs to do that differently. For example, if we had a SPLIT function, then both of the following should render consistently as CSV lists:

@contact.groups     --> "1234-abcd, 2345-bcde, 3456-cdef"
@(SPLIT("a b"))     --> "a, b"

And as mentioned in previous comment, if we did have a way for items in the context to control how they are stringified, I don't think it should be limited to default values. For example if x.y and x.y.z are context paths and both can be non-string values, why should only x.y be able to provide a __string__ function? So that's why I'm proposing that any value x in the context can be represented by an object in the form {"__value__": x}, so that any value can include __string__ and potentially other functions.

Contributor Author

So if I'm understanding you correctly you are saying:

  1. We define how "primitive" values (int, decimal, array, dictionary) are stringified, which can include arrays etc.. Totally agree on that front, want to give that first shot?
  2. Any complex value can optionally provide both a __value__ and __string__ key. If __string__ is provided and we are evaluating in a string context, then it is used, otherwise if __value__ is provided then that is used. (possibly using the rules above to turn into a string if evaluated to a string context). I support that as well.

Does that represent your thoughts?

Contributor

  1. Can give this a stab. I think our spec will need to define a set of "evaluation context variables" as some of these will want to be configurable. Right now we only support two hardcoded date formats (DD-MM-YYYY and MM-DD-YYYY) but it seems reasonable for a flow runner to use any date format for output. Also seems reasonable to support different number formats as different countries use spaces, commas and period in different ways.
  2. I'm actually saying any non-dict value can be substituted by a special dict that can provide __value__ and __string__ keys. The __default__ of a dict can also do this, e.g.
{
    "__default__": 2,
    "foo": "bar",
    "list": ["1a2b", "2bc3"]
}

can be rewritten with __string__ overrides for everything:

{
    "__default__": {"__value__": 2, "__string__": "2.0"},
    "foo": {"__value__": "bar", "__string__": "BAR!"},
    "list": {"__value__": ["1a2b", "2bc3"], "__string__": "1a2b|2bc3"}
}

Contributor Author

From today's chat:

We agreed that __string__ and __value__ likely make sense, both are optional, IFF there are sane rules around stringifying both complex and primitive values.

},
"channel": {
"name": "Twilio 1423",
"address": "1423"
}
}
```

| Template | Evaluated | Notes |
| ---------------------------- | ----------------------------------------------- | ----------------------------------------- |
| Hi @contact.name | Hi Marshawn Lynch | |
| Hi @contact | Hi Marshawn Lynch | if a dictionary is referred to which has a `__string__` element, that is used in substitution |
| Hi @channel | Hi { "name": "Twilio 1423", "address": "1423" } | without a `__string__` element, JSON representation is substituted |
| You can contact us at foo@bar.com | You can contact us at foo@bar.com | bar is not in context, so passed through |
| You can contact us at foo@contact.com | You can contact us at foo@contact.com | contact.com not in context, so passed through |
| You can contact us at foo@@contact.tel | You can contact us at foo@contact.tel | `@@` escapes to `@` |
| Next year you will be @(contact.age+1) | Next year you will be 31 | simple math possible within expression |
| Your first name is @(WORD(contact.name, 1)) | Your first name is Marshawn | Specific to Excellent, 1 based indexing |
| You are now @(YEAR(NOW()) - YEAR(contact.birthday)) | You are now 30 | Conversion to date, then subtraction |
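
The substitution rules in the table could be sketched roughly as below. This is a simplified illustration under several assumptions: the `render` function and `TOKEN` pattern are hypothetical, lookups are case-sensitive for brevity, and `@(...)` blocks are left untouched because evaluating them requires a full expression parser.

```python
import json
import re

# Either an escaped "@@" or "@" followed by a dotted variable path
TOKEN = re.compile(r"@@|@([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)")


def render(template, context):
    def replace(match):
        if match.group(0) == "@@":
            return "@"  # escaped at sign
        value = context
        for part in match.group(1).split("."):
            if not isinstance(value, dict) or part not in value:
                return match.group(0)  # not in context: pass through unchanged
            value = value[part]
        if isinstance(value, dict):
            # Prefer the __string__ element, otherwise fall back to JSON
            if "__string__" in value:
                return str(value["__string__"])
            return json.dumps(value)
        return str(value)

    return TOKEN.sub(replace, template)


context = {
    "contact": {"name": "Marshawn Lynch", "tel": "+12065551212",
                "__string__": "Marshawn Lynch"},
    "channel": {"name": "Twilio 1423", "address": "1423"},
}
print(render("Hi @contact.name", context))
print(render("You can contact us at foo@@contact.tel", context))
```

Email addresses such as `foo@bar.com` survive because `bar` is not a key in the context, so the matched text is returned unchanged.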


# Function Reference

Function arguments in square brackets ([ ... ]) are optional.

### Date and time functions

#### DATE(year, month, day)
Defines a new date value

```This is a date @DATE(2012, 12, 25)```

#### DATEVALUE(text)
Converts date stored in text to an actual date, using your organization's date format setting

```You joined on @DATEVALUE(contact.joined_date)```

#### DAY(date)
Returns only the day of the month of a date (1 to 31)

```The current day is @DAY(contact.joined_date)```

#### EDATE(date, months)
Moves a date by the given number of months

```Next month's meeting will be on @EDATE(date.today, 1)```
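
A possible sketch of the month arithmetic behind `EDATE`. Clamping the day to the end of a shorter month mirrors Excel's behaviour and is an assumption here, since the spec above does not say how such dates are handled; `edate` is a hypothetical helper.

```python
import calendar
from datetime import date


def edate(start, months):
    """Move a date by a number of months, which may be negative."""
    # Work with a zero-based month index so the year carry is a plain divmod
    index = start.year * 12 + (start.month - 1) + months
    year, month = divmod(index, 12)
    month += 1
    # Clamp the day for shorter months, e.g. 31 January plus one month
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


print(edate(date(2016, 12, 25), 1))
print(edate(date(2017, 1, 31), 1))
```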

#### HOUR(datetime)
Returns only the hour of a datetime (0 to 23)

```The current hour is @HOUR(NOW())```

#### MINUTE(datetime)
Returns only the minute of a datetime (0 to 59)

```The current minute is @MINUTE(NOW())```

#### MONTH(date)
Returns only the month of a date (1 to 12)

```The current month is @MONTH(NOW())```

#### NOW()
Returns the current date and time

```It is currently @NOW()```

#### SECOND(datetime)
Returns only the second of a datetime (0 to 59)

```The current second is @SECOND(NOW())```

#### TIME(hours, minutes, seconds)
Defines a time value which can be used for time arithmetic

```2 hours and 30 minutes from now is @(date.now + TIME(2, 30, 0))```

#### TIMEVALUE(text)
Converts time stored in text to an actual time

```Your appointment is at @(date.today + TIMEVALUE("2:30"))```

#### TODAY()
Returns the current date

```Today's date is @TODAY()```

#### WEEKDAY(date)
Returns the day of the week of a date (1 for Sunday to 7 for Saturday)

```Today is day no. @WEEKDAY(TODAY()) in the week```
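
The Sunday-based numbering can be derived from Python's ISO weekday (1 for Monday to 7 for Sunday) with a small shift. The `weekday` function below is a hypothetical illustration, not the actual implementation.

```python
from datetime import date


def weekday(d):
    """Return 1 for Sunday through 7 for Saturday."""
    # isoweekday() gives Monday=1 .. Sunday=7; the modulo maps Sunday to 0
    return d.isoweekday() % 7 + 1


print(weekday(date(2017, 1, 1)))  # 1 January 2017 was a Sunday
print(weekday(date(2017, 3, 1)))  # 1 March 2017 was a Wednesday
```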

#### YEAR(date)
Returns only the year of a date

```The current year is @YEAR(NOW())```

### Logical functions

#### AND(arg1, arg2, ...)
Returns TRUE if and only if all its arguments evaluate to TRUE

```@AND(contact.gender = "F", contact.age >= 18)```

#### IF(arg1, arg2, ...)
Returns one value if the condition evaluates to TRUE, and another value if it evaluates to FALSE

```Dear @IF(contact.gender = "M", "Sir", "Madam")```

#### OR(arg1, arg2, ...)
Returns TRUE if any argument is TRUE

```@OR(contact.state = "GA", contact.state = "WA", contact.state = "IN")```

### Math functions

#### ABS(number)
Returns the absolute value of a number

```The absolute value of -1 is @ABS(-1)```

#### MAX(arg1, arg2, ...)
Returns the maximum value of all arguments

```Please complete at most @MAX(flow.questions, 10) questions```

#### MIN(arg1, arg2, ...)
Returns the minimum value of all arguments

```Please complete at least @MIN(flow.questions, 10) questions```

#### POWER(number, power)
Returns the result of a number raised to a power - equivalent to the ^ operator

```2 to the power of 3 is @POWER(2, 3)```

#### SUM(arg1, arg2, ...)
Returns the sum of all arguments, equivalent to the + operator

```You have @SUM(contact.reports, contact.forms) reports and forms```

### Text functions

#### CHAR(number)
Returns the character specified by a number

```As easy as @CHAR(65), @CHAR(66), @CHAR(67)```

#### CLEAN(text)
Removes all non-printable characters from a text string

```You entered @CLEAN(step.value)```

#### CODE(text)
Returns a numeric code for the first character in a text string

```The numeric code of A is @CODE("A")```

#### CONCATENATE(args)
Joins text strings into one text string

```Your name is @CONCATENATE(contact.first_name, " ", contact.last_name)```

#### FIXED(number, [decimals], [no_commas])
Formats the given number in decimal format using a period and commas

```You have @FIXED(contact.balance, 2) in your account```
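
A rough sketch of `FIXED` using Python's format mini-language. The default of two decimals follows Excel and is an assumption here, as is the lack of support for negative `decimals`; `fixed` is a hypothetical helper.

```python
from decimal import Decimal


def fixed(number, decimals=2, no_commas=False):
    """Format a number with a period and optional thousands separators."""
    # "," in the format spec inserts thousands separators
    spec = f".{decimals}f" if no_commas else f",.{decimals}f"
    return format(Decimal(str(number)), spec)


print(fixed(1234.567))
print(fixed(1234.567, 1, True))
```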

#### LEFT(text, num_chars)
Returns the first characters in a text string

```You entered PIN @LEFT(step.value, 4)```

#### LEN(text)
Returns the number of characters in a text string

```You entered @LEN(step.value) characters```

#### LOWER(text)
Converts a text string to lowercase

```Welcome @LOWER(contact)```

#### PROPER(text)
Capitalizes the first letter of every word in a text string

```Your name is @PROPER(contact)```

#### REPT(text, number_times)
Repeats text a given number of times

```Stars! @REPT("*", 10)```

#### RIGHT(text, num_chars)
Returns the last characters in a text string

```Your input ended with ...@RIGHT(step.value, 3)```

#### SUBSTITUTE(text, old_text, new_text, [instance_num])
Substitutes new_text for old_text in a text string. If instance_num is given, then only that instance will be substituted

```@SUBSTITUTE(step.value, "can't", "can")```
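
The effect of the optional `instance_num` argument can be sketched as follows. Treating occurrences as non-overlapping and leaving the text unchanged when there are fewer occurrences than requested are both assumptions; `substitute` is a hypothetical helper.

```python
def substitute(text, old_text, new_text, instance_num=None):
    """Replace every occurrence, or only the 1-based instance_num-th one."""
    if instance_num is None:
        return text.replace(old_text, new_text)
    if instance_num < 1:
        raise ValueError("instance_num must be 1 or greater")
    position = 0
    for _ in range(instance_num):
        start = text.find(old_text, position)
        if start == -1:
            return text  # fewer occurrences than requested
        position = start + len(old_text)
    return text[:start] + new_text + text[position:]


print(substitute("I can't, you can't", "can't", "can"))
print(substitute("I can't, you can't", "can't", "can", 2))
```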

#### UNICHAR(number)
Returns the unicode character specified by a number

```As easy as @UNICHAR(65), @UNICHAR(66), @UNICHAR(67)```

#### UNICODE(text)
Returns a numeric code for the first character in a text string

```The numeric code of A is @UNICODE("A")```

#### UPPER(text)
Converts a text string to uppercase

```WELCOME @UPPER(contact)!!```

### Excellent Specific Functions
These functions are not found in Excel but have been provided for the sake of convenience.

#### FIRST\_WORD(text)
Returns the first word in the given text - equivalent to WORD(text, 1)

```The first word you entered was @FIRST_WORD(step.value)```

#### PERCENT(number)
Formats a number as a percentage

```You've completed @PERCENT(contact.reports_done / 10) reports```

#### READ\_DIGITS(text)
Formats digits in text for reading in TTS

```Your number is @READ_DIGITS(contact.tel_e164)```

#### REMOVE\_FIRST\_WORD(text)
Removes the first word from the given text. The remaining text will be unchanged

```You entered @REMOVE_FIRST_WORD(step.value)```

#### WORD(text, number, [by_spaces])
Extracts the nth word from the given text string. If number is negative, then it is counted backwards from the end of the text. If by_spaces is specified and is TRUE then the function splits the text into words only by spaces. Otherwise the text is split by punctuation characters as well

```@WORD("hello cow-boy", 2)``` will return "cow"

```@WORD("hello cow-boy", 2, TRUE)``` will return "cow-boy"

```@WORD("hello cow-boy", -1)``` will return "boy"
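
The three examples above are consistent with the following sketch. The splitting rule (runs of word characters when `by_spaces` is not set) and the empty result for out-of-range positions are assumptions; `split_words` and `word` are hypothetical helpers.

```python
import re


def split_words(text, by_spaces=False):
    """Split on whitespace only, or on punctuation as well."""
    return text.split() if by_spaces else re.findall(r"\w+", text)


def word(text, number, by_spaces=False):
    """Return the 1-based nth word; negative numbers count from the end."""
    if number == 0:
        raise ValueError("word positions are 1-based")
    words = split_words(text, by_spaces)
    index = number - 1 if number > 0 else number
    if -len(words) <= index < len(words):
        return words[index]
    return ""  # position is outside the text


print(word("hello cow-boy", 2))
print(word("hello cow-boy", 2, True))
print(word("hello cow-boy", -1))
```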

#### WORD\_COUNT(text, [by_spaces])
Returns the number of words in the given text string. If by_spaces is specified and is TRUE then the function splits the text into words only by spaces. Otherwise the text is split by punctuation characters as well

```You entered @WORD_COUNT(step.value) words```

#### WORD\_SLICE(text, start, [stop], [by_spaces])
Extracts a substring of the words beginning at start, and up to but not including stop. If stop is omitted then the substring will be all words from start until the end of the text. If stop is a negative number, then it is counted backwards from the end of the text. If by_spaces is specified and is TRUE then the function splits the text into words only by spaces. Otherwise the text is split by punctuation characters as well

```@WORD_SLICE("RapidPro expressions are fun", 2, 4)``` will return 2nd and 3rd words "expressions are"

```@WORD_SLICE("RapidPro expressions are fun", 2)``` will return "expressions are fun"

```@WORD_SLICE("RapidPro expressions are fun", 1, -2)``` will return "RapidPro expressions"

```@WORD_SLICE("RapidPro expressions are fun", -1)``` will return "fun"
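
The four examples above can be reproduced by mapping the 1-based positions onto a Python list slice. This sketch rejoins the selected words with single spaces, which is a simplification, and the splitting rule is an assumption; `split_words` and `word_slice` are hypothetical helpers.

```python
import re


def split_words(text, by_spaces=False):
    """Split on whitespace only, or on punctuation as well."""
    return text.split() if by_spaces else re.findall(r"\w+", text)


def word_slice(text, start, stop=None, by_spaces=False):
    """Return words from start up to, but not including, stop."""
    if start == 0:
        raise ValueError("word positions are 1-based")

    def to_index(position):
        # Positive positions are 1-based; negative ones already count from the end
        return position - 1 if position > 0 else position

    words = split_words(text, by_spaces)
    end = None if stop is None else to_index(stop)
    return " ".join(words[to_index(start):end])


text = "RapidPro expressions are fun"
print(word_slice(text, 2, 4))
print(word_slice(text, 1, -2))
```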