Skip to content

Tilda JSON Syntax: View Pivots

Laurent Hasson edited this page Dec 21, 2021 · 71 revisions

<-- View Syntax

Single Pivots

Pivoting data is one of the most powerful data preparation processes. A lot of data is presented in row-based format which becomes a lot easier to use when transformed into column-based data. Much survey of form-based data is available in row-based formats for example.

Conceptually, the simplest way to think about Pivoting is as rotating your data 90 degrees so that rows become columns. In the following example, we look at a fictitious satisfaction survey consisting of 4 questions (Q1 through Q3) with values from 1 to 10 (from least to most satisfied) and one question, Q4, with free form text for comments. The following table is a very common representation of that data, where formId is a foreign key to another table containing form-level details such as for example, the data the form was entered, the user who entered the form etc...

refnum fillDate userRefnum
111 2019-09-03 543567
222 2019-09-03 634568
formRefnum field value
111 Q1 10
111 Q2 8
111 Q3 8
111 Q4 Blah...
222 Q1 9
222 Q2 7
222 Q3 7
222 Q4 Bleh...

A very common task, for online applications as well as analytics scenarios, is to transform that data by pivoting it in the following way:

formRefnum formFillDate Q1 Q2 Q3 Q4
111 2019-09-03 10 8 8 Blah...
222 2019-09-03 9 7 7 Bleh...

This works obviously when you know in advance the columns you want to create. Defining a pivot looks like this:

 { "name": "Form_PivotView"
  ,"description": "A pivoted view of forms"
  ,"columns":[
      { "sameas": "Form.refnum"     , "name":"formRefnum"     }
     ,{ "sameas": "Form.fillDate"   , "name":"formFillDate"   }
     ,{ "sameas": "FormAnswer.value", "aggregate":"MAX"       }
    ]
  ,"pivots":[
      { "on": "FormAnswer.field" 
       ,"aggregates":[
           { "name":"value", "prefix":"", "suffix":"" }
         ]
       ,"values":[ { "value":"Q1", "expression":"Tilda.toInt(?,null)", "type":"INTEGER", "description":"Q1"}
                  ,{ "value":"Q2", "expression":"Tilda.toInt(?,null)", "type":"INTEGER", "description":"Q2"}
                  ,{ "value":"Q3", "expression":"Tilda.toInt(?,null)", "type":"INTEGER", "description":"Q3"}
                  ,{ "value":"Q4"                                                      , "description":"Q4"}
                ]
      }
    ]
 }

To create a pivoted view, you must:

  • Specify a meaningful name and description for the pivot view. The string "PivotView" in any case format you wish, must end the view name.
  • List first all the columns that will form your anchor (here, "formRefnum" and "formFillDate").
  • Add the aggregates.
    • As per the View Columns Syntax, aggregates must be defined last and Tilda will assume a group by for the first non-aggregate columns defined (here, the first 4 forming our anchor).
    • In this example, we only have one aggregate, but multiple ones could be defined as we'll see when we expand this example further below.
    • A trick for single name/value pair pivoting is to simply use MAX or MIN as an aggregate since we know there is only one answer to a given question. Technically, FIRST or LAST (the other 2 aggregates that yield a single value of the same type as the source column) could also be used but may not perform as well. As is always the case with performance, measure what you get.

Then comes an array of one or more Pivot definitions. The main fields are:

  • on: The column that will act as the pivot. That column shouldn't be specified in the main list of columns.
  • aggregates: an array of the aggregate column names (defined in the main view) being used.
    • name: the name of the aggregate column
    • prefix: a prefix to be used for generated column names
    • suffix: a suffix to be used for generated column names
  • values: an array of value definitions for the pivoted columns
    • value: the value of the on column to pivot for.
    • name: optionally, a base-line name for the generated column. If unspecified, the value will be used.
    • description: a text description of the pivoted value
    • expression: optionally, you can also define Short Expressions for the pivoted value.
  • interleave: true|false, false by default, specifying the order in which the columns are generated for the aggregates. If true, the columns will be generated aggregate-first. If false, the default, they will be generated value-first.

🎈 NOTE: Pivoted column names are generated by taking the individual pivot value names cross-producted with the list of aggregates specified with prefix/suffix if provided.

  • Let's assume that we define aggregates=[{name="abc",prefix="a_"}, {name="xyz", prefix="x_"}] respectively and the values "Q1" through "Q4", the system will generate columns "a_Q1", ... , "a_Q4", "x_Q1", .., "x_Q4" if interleave is false (default), or columns "a_Q1", "x_Q1", ... , "a_Q4", "x_Q4" if interleave is true.

🎈 NOTE: If more than one aggregate is specified, at least one of prefix or suffix must be specified to differentiate the generated columns for each aggregate.

🎈 NOTE: Pivoting can specify type transformations to make the final data easier to use in many scenarios, including analytics. In most cases, a flat table of question/answers would use a generic VARCHAR/STRING data type to hold answer values. But when pivoting, you may know that answer X is a date, answer Y is text, and answer Z is an integer.

Multiple Pivots

As you noticed in the pivoting definition JSON elements, we indicated that you could list multiple aggregates. Let's assume a richer forms data table where we have an extra 2 columns to keep track of whether the answer was the correct one, and the time it took to answer the question:

formRefnum field value correct timeMillis
111 QA 5 1 4500
111 QB 2 1 6000
111 QC 3 0 9500
111 QD 3 1 3000
222 QA 5 1 4000
222 QB 4 0 3000
222 QC 3 1 4000
222 QD 3 1 3500

We want a pivot like this:

formRefnum formFillDate countCorrect timeMillisTotal timeMillisAvg QA_correct QA_timeMillis QB_correct QB_timeMillis
111 2019-09-03 3 23000 5750 1 4500 1 6000
222 2019-09-03 3 14500 3625 1 4000 0 3000

The view definition would look like this:

,{ "name": "Form2_PivotView"
  ,"description": "A pivoted view of forms"
  ,"columns":[
      { "sameas": "Form.refnum"  , "name":"formRefnum"     }
     ,{ "sameas": "Form.fillDate", "name":"formFillDate"   }
     ,{ "sameas": "TestAnswer.correct"   , "name":"countCorrect"        , "aggregate":"COUNT", "filter":"correct = true"   }
     ,{ "sameas": "TestAnswer.timeMillis", "name":"timeMillisTotal", "aggregate":"SUM"  }
     ,{ "sameas": "TestAnswer.timeMillis", "name":"timeMillisAvg"  , "aggregate":"AVG"  }
     ,{ "sameas": "TestAnswer.timeMillis", "name":"timeMillisMax"  , "aggregate":"MAX", "joinOnly":true  }
     ,{ "sameas": "TestAnswer.correct"                             , "aggregate":"MIN", "joinOnly":true  }
    ]
  ,"pivots":[
      { "on": "TestAnswer.questionId" 
       ,"aggregates":[
           { "name":"correct"      , "suffix":"_correct"    }
          ,{ "name":"timeMillisMax", "suffix":"_timeMillis" }
         ]
       ,"values":[ { "value":"QA", "description":"Question A" }
                  ,{ "value":"QB", "description":"Question B" }
                ]
       ,"interleave":true
      }
    ]
 }

So what did we do here? We are generating a view with a single row per form where we want

  • The number of correct answers for each form.
  • The total, avg of 'timeMillis' for each form.
  • Two columns for the 'timeMills' and 'correct' values for each question listed (i.e., A, B...).

What are the tricks?

  • The order of the columns matter.
    • You must first list your anchor columns: 'formRefnum', 'formFillDate'.
    • Then your plain aggregate columns: 'countCorrect', 'timeMillisTotal' and 'timeMillisAvg'.
    • Then your pivoted aggregate columns: 'timeMillisMax' and 'correct'
  • Note that we named our aggregates in a way that makes them sortable easily. That may be of importance to you if you use external tools that typically show columns in a view or a table alphabetically.
  • Note the use of 'filter' for the "correct" and "incorrect" counts
  • We are pivoting on the last two aggregates:
    • "timeMillisMax" makes sense to have in the final view.
    • The MIN aggregate on "correct" however only makes sense for the pivoting but not in the final view: what is the meaning of having a Min(correct) at the form level (or a MAX for that matter)? It's kind of useless semantically. So we mark that column as "joinOnly" which will not output it in the final view irrespective of whether we actually had any joins (which in this case, we do not).
  • In the resulting view, it'll be nice to see QX_correct, QX_timeMillis and so on rather than QX_correct, QY_correct and so on, so we specified "interleave" to true.
  • Since we are interleaving, it makes more sense in our case to use a suffix definition rather than a prefix. In general, suffixes are a better choice for interleave=true and prefixes are a better choice for interleave=false.
  • Adding the pivoting for QC and QD should be obvious by adding extra lines in the values section:
       ,"values":[ { "value":"QA", "description":"Question A" }
                  ,{ "value":"QB", "description":"Question B" }
                  ,{ "value":"QC", "description":"Question C" }
                  ,{ "value":"QD", "description":"Question D" }
                ]

Several new features have been added to Tilda V2

Aggregate composition

When doing pivots, it's possible to want two nested levels of aggregates. For example, consider daily weather data with multiple readings per day per zip codes. You want a pivoted view that collects daily averages per zip-code over the past 30 days. The first level would define daily averages, while the second level would create an array of the average values for the last 30 days for each measure, i.e., temperature, aqi, rainfall...

In V2, we can now specify an aggregate for pivots either in the source column and/or in the pivot aggregate definition. This provides for a more natural expression of the aggregation logic needed for pivoting as well as the ability to mix different types of aggregation at each level so as to achieve maybe a better representation of all the data points for the past 30 days in an array. For example:

 { "name":"Weather_PivotView"
  ,"description":"An aggregate of weather data for the past 30 days",
   "columns":[
      { "sameas":"weather.zipCode" }
     ,{ "sameas":"weather.taken"  , "expression":"?::DATE", "type":"DATE" }
     ,{ "sameas":"weather.value"  , "aggregate":"AVG"  }
    ]
  ,"subWhere":"weather.taken >= now() - interval '30 days'"
  ,"pivots":[
      { "on": "weather.name"
       ,"aggregates":[
            { "name":"taken", "suffix":"_days", "aggregate":"ARRAY", "orderBy":["taken"] }
           ,{ "name":"value", "suffix":"_values", "aggregate":"ARRAY", "orderBy":["taken"] }
         ]
       ,"interleave":true
       ,"values":[ { "value":"TEMP"    , "description":"Temperature"        }
                  ,{ "value":"AQI"     , "description":"Air Quality Index." }
                  ,{ "value":"RAINFALL", "description":"Rain Fall"          }
                 ]
      }
    ]
 }

This will generate data as:

zipCode TEMP_days TEMP_values AQI_days AQI_values ...
11111 [2021-12-15,2021-12-16,...] [42, 40,...] [.....] [.........] ...
11112 [2021-12-15,2021-12-16,...] [44, 38,...] [.....] [.........] ...
11113 [2021-12-15,2021-12-16,...] [41, 36,...] [.....] [.........] ...

This is nice to send as part of a service for example for some straightforward UI display.

What are the tricks?

  • 'taken' is used for pivoting for doesn't define an aggregate. It then gets rolled up in an ARRAY as part of the second level definition.
  • 'value' is averaged per day in the first level, and then aggregated in an array in the second level definition.
  • in order for the arrays to match element for element, we have to make sure the order for the aggregate is specified, in this case, 'taken'.

Globals

When you have a column you want to use in your logic, but not see in the final view, you can use the "viewOnly" flag. However, what happens if you need that column in the second level of the view, in a 'expression' of the pivots, or an 'orderBy'? The column must come through, and if it is also a pivot, it will be defined in the second level. This is a catch-22. With "globals":true|false defined for the Pivot, you can control that behavior.

For example, let's assume you are calculating simple counts in your pivot, it makes sense to have a "count_all" value as well:

zipCode day count_all TEMP_count AQI_count ...
11111 2021-12-15 10 5 2 ...
11111 2021-12-16 10 5 2 ...
11113 2021-12-15 9 8 1 ...

But if you didn't want this perhaps because you didn't need the information, or it didn't make sense, you can use "globals":false to eliminate it. For example, assume you want averages for each measure, a "global" average across all measures makes little sense.

Clone this wiki locally