Skip to content

First Tutorial Part 6: Multi Pivot And Analytics Views

Laurent Hasson edited this page Sep 16, 2019 · 2 revisions
Previous Main Next
<-- Part 5 Main Part 7 -->

Multi-Pivot View

We now want to exploit that new table and build a view to do the pivoting like we did before, but with some extra goodies:

,{ "name": "Test_XYZ_PivotView"
  ,"description": "A pivoted view of XYZ forms"
  ,"columns":[
      { "sameas": "Form.refnum"  , "name":"formRefnum"     }
     ,{ "sameas": "Form.fillDate", "name":"formFillDate"   }
     ,{ "sameas": "User.refnum"  , "name":"formUserRefnum" }
     ,{ "sameas": "User.email"   , "name":"formUserEmail"  }
     ,{ "sameas": "TestAnswer.correct"   , "name":"countCorrect"        , "aggregate":"COUNT", "filter":"correct = true"   }
     ,{ "sameas": "TestAnswer.correct"   , "name":"countIncorrect"      , "aggregate":"COUNT", "filter":"correct = false"  }
     ,{ "sameas": "TestAnswer.timeMillis", "name":"timeMillisTotal", "aggregate":"SUM"  }
     ,{ "sameas": "TestAnswer.timeMillis", "name":"timeMillisAvg"  , "aggregate":"AVG"  }
     ,{ "sameas": "TestAnswer.timeMillis", "name":"timeMillisMin"  , "aggregate":"MIN"  }
     ,{ "sameas": "TestAnswer.timeMillis", "name":"timeMillisMax"  , "aggregate":"MAX"  }
     ,{ "sameas": "TestAnswer.correct"                             , "aggregate":"MIN", "joinOnly":true  }
    ]
  ,"subWhereX":{
      "clause":["    Form.type = 'XYZ'" 
               ,"and Form.deleted is null"
               ,"and TestAnswer.deleted is null"
               ]
     ,"description":["Only look at XYZ forms and answers that have not been deleted."
                    ]
    }
  ,"pivots":[
      { "on": "TestAnswer.questionId" 
       ,"aggregates":[
           { "name":"correct"      , "suffix":"_correct"    }
          ,{ "name":"timeMillisMax", "suffix":"_timeMillis" }
         ]
       ,"values":[ { "value":"QX", "description":"Question X" }
                  ,{ "value":"QY", "description":"Question Y" }
                  ,{ "value":"QZ", "description":"Question Z" }
                ]
       ,"interleave":true
      }
    ]
 }

So what did we do here? We are generating a view with a single row per form where we want

  • The number of correct answers for each form.
  • the number of incorrect answers for each form.
  • The total, avg, min and max of 'timeMillis' for each form.
  • Two columns for the 'timeMills' and 'correct' values for each question listed (i.e., X, Y and Z).

What are the tricks?

  • The order of the columns matter.
    • You must first list your anchor columns: 'formRefnum', 'formFillDate', 'formUserRefnum' and 'formUserEmail'.
    • Then your plain aggregate columns: 'countCorrect', 'countIncorrect', 'timeMillisTotal', 'timeMillisAvg' and 'timeMillisMin'.
    • Then your pivotted aggregate columns: 'timeMillisMax' and 'correct'
  • Note that we named our aggregates in a way that makes them sortable easily. That may be of importance to you if you use external tools that typically show columns in a view or a table alphabetically.
  • Note the use of 'filter' for the "correct" and "incorrect" counts
  • We are pivoting on the last two aggregates:
    • "timeMillisMax" makes sense to have in the final view.
    • The MIN aggregate on "correct" however only makes sense for the pivotting but not in the final view: what is the meaning of having a Min(correct) at the form level (or a MAX for that matter)? It's kind of useless semantically. So we mark that column as "joinOnly" which will not output it in the final view irrespective of whether we actually had any joins (which in this case, we do not).
  • In the resulting view, it'll be nices to see QX_correct, QX_timeMillis and so on rather than QX_correct, QY_correct and so on, so we specified "interleave" to true.
  • Since we are interleaving, it makes more sense in our case to use a suffix definition rather than a prefix. In general, suffixes are a better choice for interleave=true and prefixes are a better choice for interleave=false.

The results from that view would look like:

formRefnum formFillDate formUserRefnum formUserEmail countCorrect countIncorrect timeMillisTotal timeMillisAvg timeMillisMin timeMillisMax QX_correct QX_timeMillis QY_correct QY_timeMillis QZ_correct QZ_timeMillis
111 2019-09-03 543567 a@some.com 3 0 30000 10000 5000 15000 1 5000 1 10000 1 15000
222 2019-09-03 634568 b@some.com 2 1 21000 7000 3000 10000 0 3000 1 10000 1 8000

The SQL generated under the covers is more complex than previously only because we are doing more complex things now. Still, it should be somewhat evident:

create or replace view TILDATEST.Test_XYZ_PivotView as 
with T as (
-- 'A pivoted view of XYZ forms'
select TILDATEST.Form."refnum" as "formRefnum" -- The primary key for this record
     , trim(TILDATEST.Form."fillDateTZ") as "formFillDateTZ" -- Generated helper column to hold the time zone ID for 'fillDate'.
     , TILDATEST.Form."fillDate" as "formFillDate" -- The date the form was filled
     , TILDATEST.User."refnum" as "formUserRefnum" -- The primary key for this record
     , TILDATEST.User."email" as "formUserEmail" -- The user's email
     , TILDATEST.TestAnswer."questionId" as "questionId" -- Question id
     , count(TILDATEST.TestAnswer."correct") filter(where correct = true) as "countCorrect" -- Whether the answer is correct or not
     , count(TILDATEST.TestAnswer."correct") filter(where correct = false) as "countIncorrect" -- Whether the answer is correct or not
     , sum(TILDATEST.TestAnswer."timeMillis") as "timeMillisTotal" -- Time in milliseconds for the time spent answering the question
     , avg(TILDATEST.TestAnswer."timeMillis") as "timeMillisAvg" -- Time in milliseconds for the time spent answering the question
     , min(TILDATEST.TestAnswer."timeMillis") as "timeMillisMin" -- Time in milliseconds for the time spent answering the question
     , max(TILDATEST.TestAnswer."timeMillis") as "timeMillisMax" -- Time in milliseconds for the time spent answering the question
  from TILDATEST.Form
     inner join TILDATEST.User on TILDATEST.Form."userRefnum" = TILDATEST.User."refnum"
     left  join TILDATEST.TestAnswer on TILDATEST.TestAnswer."formRefnum" = TILDATEST.Form."refnum"
 where (    TILDATEST.Form."type" = 'XYZ'
        and TILDATEST.Form."deleted" is null
        and TILDATEST.TestAnswer."deleted" is null
       )
   and ( TILDATEST.TestAnswer."questionId" in ('QX', 'QY', 'QZ')
       )
 group by 1, 2, 3, 4, 5, 6
) select "formRefnum" 
       , "formFillDateTZ" 
       , "formFillDate" 
       , "formUserRefnum" 
       , "formUserEmail" 
     , SUM("countCorrect") as "countCorrect"
     , SUM("countIncorrect") as "countIncorrect"
     , SUM("timeMillisTotal") as "timeMillisTotal"
     , AVG("timeMillisAvg") as "timeMillisAvg"
     , MIN("timeMillisMin") as "timeMillisMin"
     , MAX("timeMillisMax") as "timeMillisMax"
     , MIN("correct") as "correct"
     , MIN("correct") filter (where "questionId"= 'QX')  as "QX_correct"
     , MAX("timeMillisMax") filter (where "questionId"= 'QX')  as "QX_timeMillis"
     , MIN("correct") filter (where "questionId"= 'QY')  as "QY_correct"
     , MAX("timeMillisMax") filter (where "questionId"= 'QY')  as "QY_timeMillis"
     , MIN("correct") filter (where "questionId"= 'QZ')  as "QZ_correct"
     , MAX("timeMillisMax") filter (where "questionId"= 'QZ')  as "QZ_timeMillis"
from T
     group by 1, 2, 3, 4, 5
;


COMMENT ON VIEW TILDATEST.Test_XYZ_PivotView IS E'-- DDL META DATA VERSION 2019-01-09\ncreate or replace view TILDATEST.Test_XYZ_PivotView as \nwith T as ...';

COMMENT ON COLUMN TILDATEST.Test_XYZ_PivotView."formRefnum" IS E'The primary key for this record';
COMMENT ON COLUMN TILDATEST.Test_XYZ_PivotView."formFillDateTZ" IS E'Generated helper column to hold the time zone ID for ''fillDate''.';
COMMENT ON COLUMN TILDATEST.Test_XYZ_PivotView."formFillDate" IS E'The date the form was filled';
COMMENT ON COLUMN TILDATEST.Test_XYZ_PivotView."formUserRefnum" IS E'The primary key for this record';
COMMENT ON COLUMN TILDATEST.Test_XYZ_PivotView."formUserEmail" IS E'The user''s email';
COMMENT ON COLUMN TILDATEST.Test_XYZ_PivotView."countCorrect" IS E'Whether the answer is correct or not';
COMMENT ON COLUMN TILDATEST.Test_XYZ_PivotView."countIncorrect" IS E'Whether the answer is correct or not';
COMMENT ON COLUMN TILDATEST.Test_XYZ_PivotView."timeMillisTotal" IS E'Time in milliseconds for the time spent answering the question';
COMMENT ON COLUMN TILDATEST.Test_XYZ_PivotView."timeMillisAvg" IS E'Time in milliseconds for the time spent answering the question';
COMMENT ON COLUMN TILDATEST.Test_XYZ_PivotView."timeMillisMin" IS E'Time in milliseconds for the time spent answering the question';
COMMENT ON COLUMN TILDATEST.Test_XYZ_PivotView."QX_correct" IS E'Question X (pivot of MIN(TILDATEST.TestAnswer.correct) on TILDATEST.TestAnswer.questionId=''QX'')';
COMMENT ON COLUMN TILDATEST.Test_XYZ_PivotView."QY_correct" IS E'Question Y (pivot of MIN(TILDATEST.TestAnswer.correct) on TILDATEST.TestAnswer.questionId=''QY'')';
COMMENT ON COLUMN TILDATEST.Test_XYZ_PivotView."QZ_correct" IS E'Question Z (pivot of MIN(TILDATEST.TestAnswer.correct) on TILDATEST.TestAnswer.questionId=''QZ'')';
COMMENT ON COLUMN TILDATEST.Test_XYZ_PivotView."QX_timeMillis" IS E'Question X (pivot of MAX(TILDATEST.TestAnswer.timeMillis) on TILDATEST.TestAnswer.questionId=''QX'')';
COMMENT ON COLUMN TILDATEST.Test_XYZ_PivotView."QY_timeMillis" IS E'Question Y (pivot of MAX(TILDATEST.TestAnswer.timeMillis) on TILDATEST.TestAnswer.questionId=''QY'')';
COMMENT ON COLUMN TILDATEST.Test_XYZ_PivotView."QZ_timeMillis" IS E'Question Z (pivot of MAX(TILDATEST.TestAnswer.timeMillis) on TILDATEST.TestAnswer.questionId=''QZ'')';

Structurally, the view is identical to what we had for the simpler example. The main differences are due to the fact that we are using more varied aggregates:

  • Understand that in the first layer of the view (the CTE), the aggregates only execute over a single value since we have a single row per question in the data source.
  • The framework is smart enough to now for example that if we did a COUNT in the CTE, we should to a SUM in the final view.

Aggregate Analytics View

Another interesting view we'd want to create over this data is to aggregate information over all tests:

 { "name": "Test_View"
  ,"description": "A view of aggregate stats over all forms"
  ,"columns":[
      { "sameas": "Form.type"    }
     ,{ "sameas": "Form.refnum"  , "name":"testCount"   , "aggregate":"COUNT", "distinct":true }
     ,{ "sameas": "Form.fillDate", "name":"testFirst"   , "aggregate":"MIN"   }
     ,{ "sameas": "Form.fillDate", "name":"testLast"    , "aggregate":"LAST"  }
     ,{ "sameas": "User.refnum"  , "name":"userCount"   , "aggregate":"COUNT", "distinct":true }
     ,{ "sameas": "TestAnswer.correct"   , "name":"answerCount"          , "aggregate":"COUNT"   }
     ,{ "sameas": "TestAnswer.correct"   , "name":"answerCountCorrect"   , "aggregate":"COUNT", "filter":"correct = true"   }
     ,{ "sameas": "TestAnswer.correct"   , "name":"answerCountIncorrect" , "aggregate":"COUNT", "filter":"correct = false"  }
     ,{ "sameas": "TestAnswer.timeMillis", "name":"timeMillisTotal", "aggregate":"SUM"  }
     ,{ "sameas": "TestAnswer.timeMillis", "name":"timeMillisAvg"  , "aggregate":"AVG"  }
     ,{ "sameas": "TestAnswer.timeMillis", "name":"timeMillisMin"  , "aggregate":"MIN"  }
     ,{ "sameas": "TestAnswer.timeMillis", "name":"timeMillisMax"  , "aggregate":"MAX"  }
    ]
  ,"subWhereX":{
      "clause":["    Form.deleted is null"
               ,"and TestAnswer.deleted is null"
               ]
     ,"description":["Look at all forms and answers that have not been deleted."
                    ]
    }
 }      

To note:

  • We use "distinct" for the counts of forms and users since the data this view encompasses has a grain down to a single question/answer row.
  • The COUNT on "correct" to count the total number of answers may seem counter intuitive, but COUNT(someCol) is the same as count(*) if that column has no null values. We could as easily have done a COUNT on FormAnswer.refnum which would have been more intuitive, but not serve as a good example for a common trick.
  ,{ "sameas": "TestAnswer.refnum"    , "name":"answerCount"          , "aggregate":"COUNT"   }

which would generate the following SQL:

create or replace view TILDATEST.Test_View as 
-- 'A view of aggregate stats over all forms'
select TILDATEST.Form."type" as "type" -- Form template type
     , count(distinct TILDATEST.Form."refnum") as "testCount" -- The primary key for this record
     , min(TILDATEST.Form."fillDate") as "testFirst" -- The date the form was filled
     , last(TILDATEST.Form."fillDate") as "testLast" -- The date the form was filled
     , count(distinct TILDATEST.User."refnum") as "userCount" -- The primary key for this record
     , count(TILDATEST.TestAnswer."correct") as "answerCount" -- Whether the answer is correct or not
     , count(TILDATEST.TestAnswer."correct") filter(where correct = true) as "answerCountCorrect" -- Whether the answer is correct or not
     , count(TILDATEST.TestAnswer."correct") filter(where correct = false) as "answerCountIncorrect" -- Whether the answer is correct or not
     , sum(TILDATEST.TestAnswer."timeMillis") as "timeMillisTotal" -- Time in milliseconds for the time spent answering the question
     , avg(TILDATEST.TestAnswer."timeMillis") as "timeMillisAvg" -- Time in milliseconds for the time spent answering the question
     , min(TILDATEST.TestAnswer."timeMillis") as "timeMillisMin" -- Time in milliseconds for the time spent answering the question
     , max(TILDATEST.TestAnswer."timeMillis") as "timeMillisMax" -- Time in milliseconds for the time spent answering the question
  from TILDATEST.Form
     inner join TILDATEST.User on TILDATEST.Form."userRefnum" = TILDATEST.User."refnum"
     left  join TILDATEST.TestAnswer on TILDATEST.TestAnswer."formRefnum" = TILDATEST.Form."refnum"
 where (    TILDATEST.Form."deleted" is null
        and TILDATEST.TestAnswer."deleted" is null
       )
     group by 1
;


COMMENT ON VIEW TILDATEST.Test_View IS E'-- DDL META DATA VERSION 2019-01-09\ncreate or replace view TILDATEST.Test_View as...';

COMMENT ON COLUMN TILDATEST.Test_View."type" IS E'Form template type';
COMMENT ON COLUMN TILDATEST.Test_View."testCount" IS E'The primary key for this record';
COMMENT ON COLUMN TILDATEST.Test_View."testFirst" IS E'The date the form was filled';
COMMENT ON COLUMN TILDATEST.Test_View."testLast" IS E'The date the form was filled';
COMMENT ON COLUMN TILDATEST.Test_View."userCount" IS E'The primary key for this record';
COMMENT ON COLUMN TILDATEST.Test_View."answerCount" IS E'Whether the answer is correct or not';
COMMENT ON COLUMN TILDATEST.Test_View."answerCountCorrect" IS E'Whether the answer is correct or not';
COMMENT ON COLUMN TILDATEST.Test_View."answerCountIncorrect" IS E'Whether the answer is correct or not';
COMMENT ON COLUMN TILDATEST.Test_View."timeMillisTotal" IS E'Time in milliseconds for the time spent answering the question';
COMMENT ON COLUMN TILDATEST.Test_View."timeMillisAvg" IS E'Time in milliseconds for the time spent answering the question';
COMMENT ON COLUMN TILDATEST.Test_View."timeMillisMin" IS E'Time in milliseconds for the time spent answering the question';
COMMENT ON COLUMN TILDATEST.Test_View."timeMillisMax" IS E'Time in milliseconds for the time spent answering the question';

That example is much simpler than the previous Pivot example but highlights again the simplicity of defining what you want and letting Tilda do the rest.

As an exercise to the reader, you can do a lot of creative work here in terms of analytics, for example:

  • What are the easiest and most difficult questions in terms of the number of correct answers overall?
  • What are the questions people spend the most or least amount of time on?
  • What are the most and least popular tests in terms of the number of times they were taken?
  • Who are the users who took the most tests?
  • Who are the users who took tests more than once?
  • Which tests are taken most often by the same users?

If you can aggregate it and/or pivot it, you can find all of that out.

🎈 NOTE: By the way, that extra Tilda JSON markup for TestAnswer, Test_XYZ_PivotView and Test_View added an extra 80 lines. That generated an extra 130 lines of SQL and comments. Of course, we are focusing on the SQL/Schema management aspects of Tilda and under the covers, many more assets such as Java, JavaScript and documentation is generated as we'll see in a follow up tutorial.

🎈 NOTE: You may think that 80 Vs 130 is not such a big deal, but at scale it counts. More complex models add even more advantages in terms of maintainability. Most importantly, you have a living breathing model, and as you iterate through your development work, adding new columns, new pivots, new aggregates becomes trivial, and Tilda handles everything else, including automated migration and documentation. Migration in particular can be nasty with views since managing dependencies can become quite complex and creating migration scripts impractical at scale.

Previous Main Next
<-- Part 5 Main Part 7 -->
Clone this wiki locally