Skip to content
Anthony Truskinger edited this page Oct 12, 2023 · 23 revisions

Filtering

Basic filtering (for /resource)

  • For resources with text, QSP: filter_partial_match - partial match, case insensitive, on all string fields.
  • There is also generic equality filtering, QSP: filter_{any_field_on_resource}. This can be any data type, however for strings it is a case sensitive, entire contents match.
  • Returns a 'filtering details object`

Advanced filtering (for /resource)

  • Specialised route: POST : /resource/filter
  • Inspired by elastic search filters
  • Posts & returns a 'filter details object`
    • any field accepted
    • some associations accepted depending on the model
      • This follows the doted notation as follows: plural_model_name.model_parameter: Combination or Comparison or Subset
    • available filter operations:
      • Combinations: and, or, not (when no combinator is given, and is assumed)
        • NEW you can now supply an array to a combination object. This allows multiple conditions for the same field to exist in one query (previously the duplicate key in a hash would mean only the last condition was kept)
          • an array is accepted as a root for a filter (in which case it is assumed to contained within an and combinator
          • combinator arrays can only contain other filter hashes
          • combinator arrays flatten with the next level, thus:
              {
                "filter": {
                  "or": [
                    {
                      "id": { "gt": 1 },
                      "name": { "eq": "goku" }
                    },
                    {
                      "power_level": { "gt": 9000 }
                    }
                  ]
                }
              }
            is equivalent to:
              {
                "filter": {
                  "or": 
                    {
                      "id": { "gt": 1 },
                      "name": { "eq": "goku" },
                      "power_level": { "gt": 9000 }
                    }
                }
              }
            and roughly translates to id > 1 OR name = 'goku' or power_level > 9000.
      • Comparisons:
        • eq, equal
          • for json/jsonb columns provide a object to compare
        • not_eq, not_equal
          • for json/jsonb columns provide a object to compare
        • lt, less_than
        • not_lt, not_less_than
        • gt,greater_than
        • not_gt, not_greater_than
        • lteq, less_than_or_equal
        • not_lteq, not_less_than_or_equal
        • gteq, greater_than_or_equal
        • not_gteq, not_greater_than_or_equal
      • Subsets:
        • Ranges: range, in_range, not_range, not_in_range
          • interval (Uses regex /(\[|\()(.*),(.*)(\)|\])/ which allows for inclusive or exclusive bounds)
          • from and to
        • Arrays: in, not_in
        • Case insensitive contents match: contains, contain, not_contains, not_contain, does_not_contain
          • for json/jsonb columns provide a json like string fragment to search for
        • Starts with: starts_with, start_with, not_starts_with,not_start_with, does_not_start_with
        • Ends with: ends_with,end_with, not_ends_with, not_end_with, does_not_end_with
        • Regular expressions: regex, regex_match, matches, not_regex, not_regex_match. does_not_match, not_match
    • Other available options as in API:-Spec:
      • paging
      • projection
      • sorting
    • The values that can be provided to each filter operation are not precisely defined, but should be a string, number, null/nil, or true/false. Anything else may produce unexpected results.

Filter request object

Here is a summary of a filter request object payload:

// all keys are optional
{
  // the filter object that restricts what results are returned
  "filter": {},
  "paging": {},
  "projection": {},
  "sorting": {},
  // proposed: no API supports this yet
  "options": {}
}

Advanced filtering (with a Query String Parameter)

It won't always be possible to send a JSON payload with POST to a filter endpoint. For example, when we want a browser to handle a result or a response natively that is generated from a link or a form, only form multipart data (for POSTs) or query string parameters (for GET/POST) are supported.

If you're in this situation you can send a filter body, as per the above specification, encoded in the filter_encoded query string parameter.

  • query string parameter name: filter_encoded
  • format: base64url encoded JSON payload of a standard filter object
  • precedence: the payload from the filter_encoded parameter has the lowest priority. Parameters from other query string parameters or the body of a filter request will override the keys from the filter payload.

Example:

GET /audio_recordings?filter_encoded=eyJmaWx0ZXIiOnsicmVnaW9ucy5pZCI6eyJlcSI6MTF9fSwic29ydGluZyI6eyJvcmRlcl9ieSI6InJlY29yZGVkX2RhdGUiLCJkaXJlY3Rpb24iOiJkZXNjIn0sInBhZ2luZyI6eyJpdGVtcyI6MjV9LCJwcm9qZWN0aW9uIjp7ImluY2x1ZGUiOlsiaWQiLCJyZWNvcmRlZF9kYXRlIiwic2l0ZXMubmFtZSIsInNpdGVfaWQiLCJjYW5vbmljYWxfZmlsZV9uYW1lIl19fQ%3D%3D

Take care: we're limited to sending 1024 characters of query strings. If the parameter is cut off the payload will be invalid.

Because of the way rails merges all parameter sources together, the filter_encoded parameter could be passed in either:

  • the body of a POST request (e.g. to /filter) in a JSON object (e.g. {"filter_encoded":"xxxxxxxxx"})
  • the body of a POST request as multipart fordata (e.g. filter_encoded=xxxxxxxxx)

Expressions (experimental)

A very limited number of types support expressions inserted where the value of a filter expression normally would be placed.

These expressions serve to transform a type into a useful derivative; they're not meant for direct equality comparison, mutation, or any other purpose.

Expressions are an array of functions applied to a field. Each function takes one parameter passed to it from the output of the previous pipeline expression. So far no functions accept parameters.

Where a value is normally accepted, replace it with an object that has the key expressions.

{
  "expressions": [ <expression>... ],
  "value": <value>
}

Supported expressions

  • Date types
    • time_of_day - extract the time fraction from a date
    • local_tz - convert a date to local date. Requires a date that can infer it's timezone. Must come before time_of_day.
      • Warning: each date in a timezone in a query could have a different timezone offset. DST is an example where different records each day would be selected with this filter.
    • local_offset - convert a date to local date with a fixed UTC offset. Requires a date that can infer it's timezone. Must come before time_of_day.
      • The "base" UTC offset of a timezone - that is the offset without accounting for DST - will be used.
      • Why is this useful? It ignores changes in offset (like in the case of DST) and returns dates in one offset for sub-selections that match the physical realities of time. In other words, dawn chorus doesn't suddenly start at a different real time because DST kicks in.

Examples

Filtering by time of day

When posting to /audio_recordings, find any recording that has audio overlapping the time of day range 3 AM to 5 AM, ignoring any changes in DST.

{
  "filter": {
    "or": {
      "recorded_end_date": {
        "gteq": { "expressions": [ "local_offset", "time_of_day"], "value": "03:00" }
      },
      "recorded_date": {
        "lteq": { "expressions": [ "local_offset", "time_of_day"], "value":  "05:00" }
      }
    }
  }
}

The overlapping inclusion behavior is enabled through the gteq and lteq comparisons. This can visualized with the following diagram. Given two recordings:

  02:00    04:00    06:00 
    |--------|--------|      (two recordings)
         <------->           (filter range)
  • The first file is selected because it's end time of day (04:00) is greater than or equal (gteq) to the lower filter limit (03:00)
  • The second file is selected because it's start time of day (04:00) is less than or equal (lteq) to the upper filter limit (05:00)

Request

POST /audio_recordings/filter?filter_notes=hello&filter_channels=28&filter_partial_match=testing_testing
{
   "filter":{
      "and":{
         "site_id":{
            "less_than":123456,
            "greater_than":9876,
            "in":[
               1,
               2,
               3
            ],
            "range":{
               "from":100,
               "to":200
            }
         },
         "status":{
            "greater_than_or_equal":4567,
            "contains":"contain text",
            "starts_with":"starts with text",
            "ends_with":"ends with text",
            "range":{
               "interval":"[123, 128]"
            }
         },
         "audio_events.creator_id": {
           "greater_than": 10
         },
         "or":{
            "duration_seconds":{
               "not_eq":40
            },
            "not":{
               "channels":{
                  "less_than_or_equal":9999
               }
            }
         }
      },
      "or":{
         "recorded_date":{
            "contains":"Hello"
         },
         "media_type":{
            "ends_with":"world"
         },
         "duration_seconds":{
            "eq":60,
            "lteq":70,
            "equal":50,
            "gteq":80
         },
         "channels":{
            "eq":1,
            "less_than_or_equal":8888
         }
      },
      "not":{
         "duration_seconds":{
            "not_eq":140
         }
      }
   },
   "projection":{
      "include":[
         "recorded_date",
         "site_id",
         "duration_seconds",
         "media_type"
      ]
   },
   "sort":{
      "order_by":"duration_seconds",
      "direction":"desc"
   },
   "paging":{
      "page":1,
      "items":10
   }
}

Resulting SQL

SELECT
"audio_recordings"."recorded_date", "audio_recordings"."site_id",
"audio_recordings"."duration_seconds", "audio_recordings"."media_type"
FROM "audio_recordings"
WHERE ("audio_recordings"."deleted_at" IS NULL)
AND ("audio_recordings"."site_id" < 123456
AND "audio_recordings"."site_id" > 9876
AND "audio_recordings"."site_id" IN (1,2,3)
AND "audio_recordings"."site_id" >= 100
AND "audio_recordings"."site_id" < 200
AND "audio_recordings"."status" >= 4567
AND "audio_recordings"."status" ILIKE '%containtext%'
AND "audio_recordings"."status" ILIKE 'startswithtext%'
AND "audio_recordings"."status" ILIKE '%endswithtext'
AND "audio_recordings"."status" >= '123'
AND "audio_recordings"."status" <= '128'
AND ("audio_recordings"."duration_seconds" != 40
OR NOT ("audio_recordings"."channels"<=9999)))
AND (((((((("audio_recordings"."recorded_date" ILIKE '%Hello%'
OR "audio_recordings"."media_type" ILIKE '%world')
OR "audio_recordings"."duration_seconds" = 60)
OR "audio_recordings"."duration_seconds" <= 70)
OR "audio_recordings"."duration_seconds" = 50)
OR "audio_recordings"."duration_seconds" >= 80)
OR "audio_recordings"."channels" = 1)
OR "audio_recordings"."channels" <= 8888))
AND ( NOT ("audio_recordings"."duration_seconds" != 140))
AND (("audio_recordings"."media_type" ILIKE '%testing_testing%'
OR "audio_recordings"."status" ILIKE '%testing_testing%'))
AND ("audio_recordings"."status" = 'hello'
AND "audio_recordings"."channels" = 28)
ORDER BY "audio_recordings"."duration_seconds" DESC
LIMIT 10 OFFSET 0