Skip to content

Row queries

David Megginson edited this page May 4, 2020 · 6 revisions

The HXL Proxy's filters uses row queries to select (or skip) certain rows in a dataset. They are most common in the Select rows filter, but can also appear to control where other filters apply. Here is a sample query:

org+name=unicef

This query selects every row where the value "unicef" (case-insensitive) appears in a column that matches the tag pattern "org+name" (note that the tag pattern omits the initial '#' — it is optional in queries and most other places that tag patterns appear.)

In place of a literal value on the right, you can also use a row formula. For example, this row query will select every row where the value of the #targeted column is less than half the value of the #affected column:

#targeted < {{ #affected / 2 }}

Operators

The following operators are available:

Operator Description Example
= The value appears in a column matching the tag pattern. org+name=unicef
!= The value does not appear in a column matching the tag pattern. status+code!=completed
< The content of the column is numerically or lexically less than the value provided. targeted<300
<= The content of the column is numerically or lexically less than or equal to the value provided. population+f<=20000
> The content of the column is numerically or lexically great than the value provided. population+f>;20000
>= The content of the column is numerically or lexically great than or equal to the value provided. contact+name>=m
~ The content of the column matches the regular expression provided. activity.name~vaccination
!~ The content of the column does not match the regular expression provided. status!~^(planned|completed)$
is The content matches a condition. Currently available conditions are "empty", "not empty", "number", "not number", "date", "not date", "min", "not min", "max", and "not max". #date+year is max

Notes

  • Matching is always case-insensitive: org=UNICEF and org=unicef will match the same rows.
  • The leading # is optional in the tag pattern: org=red cross and #org=red cross will match the same rows.
  • Strings are always complete matches; regular expressions can match just part of the content (use "^" and "$" to constrain to a full match).
  • Leading and trailing whitespace is always removed, and remaining whitespace, normalised, before matching.
  • If both values can be converted to numbers, row queries will use numeric comparison instead of string comparison
  • Row queries will use the #date hashtag to trigger special handling for dates
  • Note that "is (not) min" and "is (not) max" force the whole dataset to be cached at that point in the processing chain, so could slow down processing for very large datasets.
Clone this wiki locally