View key clean-up #67

jamesrkg · 2015-09-23T20:04:28Z

There are some problems with the current view key notation that need to be cleaned up.

method colon-delimiting

The method part of the view key needs to be colon-delimitable so that it can describe the effect of different methods acting on x and y. Where only one method is named and both x and y are present, the same method should be assumed to act on both.

The general rule should be that method_a:method_b|x:y means the intersection of method_a(x) by method_b(y). A more concrete example: frequency:mean|x:y means the intersection of frequency(x) by mean(y).

By extension, though, this renders what is currently frequency|x:y incorrect as the key for a column base row, because it should mean the intersection of frequency(x) by frequency(y), or in plain speak, where the row and column bases intersect (i.e. the number of cases in both x and y).

As a consequence, the correct key for a column base row should be simply frequency|x and for a row base column frequency|y. Incidentally this is perfectly in keeping with the fundamental meaning of frequency| as basic counts, since the mention of either x or y is an implied collapse of all their values, respectively.
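To make the colon rule concrete, here is a minimal sketch of how the method part could be expanded into an (x method, y method) pair, including the lazy case where one name covers both axes. The function name expand_methods is hypothetical and not taken from the project's code.

```python
from typing import Tuple


def expand_methods(method_part: str) -> Tuple[str, str]:
    """Return (x_method, y_method) for a method part such as 'f' or 'f:mean'.

    Sketch of the proposed rule: 'method_a:method_b' applies method_a to x
    and method_b to y; a single name is assumed to apply to both axes.
    """
    parts = method_part.split(":")
    if len(parts) == 1:
        # Lazy form: only one method named, so it works on both x and y.
        return parts[0], parts[0]
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError(f"expected at most one colon in {method_part!r}")


print(expand_methods("frequency"))       # ('frequency', 'frequency')
print(expand_methods("frequency:mean"))  # ('frequency', 'mean')
```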

More examples (assuming x and y each have 3 possible values):

x|frequency||||counts                      
x|frequency||y||c%                          
x|frequency||x||r%                  
x|frequency|x|||cbase                      # same as x|frequency|x[(1,2,3)]|||net1-3
x|frequency|y|||rbase                      # same as x|frequency|y[(1,2,3)]|||net1-3
x|frequency|x[(1,2)]|||xnet1-2
x|frequency|y[(1,2)]|||ynet1-2
x|frequency|x:y|||cbase*rbase              # same as x|frequency|x[(1,2,3)]:y[(1,2,3)]|||cbase*rbase

... and so on.
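Every key above has six '|'-delimited parts, some of them empty. A minimal sketch of splitting a key into those parts follows; the field labels (prefix, method, relation, rel_to, weight, name) are hypothetical names chosen for illustration, not identifiers from the project.

```python
from typing import NamedTuple


class ViewKey(NamedTuple):
    # Hypothetical field labels for the six '|'-delimited parts; they are not
    # taken from the project's code.
    prefix: str    # 'x' in every example in this issue
    method: str    # e.g. 'frequency'
    relation: str  # e.g. 'x:y'
    rel_to: str    # e.g. 'y' for column percentages
    weight: str    # empty when unweighted
    name: str      # short label, e.g. 'c%'


def split_view_key(key: str) -> ViewKey:
    """Split a view key into its six parts, keeping empty parts as ''."""
    parts = key.split("|")
    if len(parts) != 6:
        raise ValueError(f"expected 6 '|'-delimited parts, got {len(parts)}: {key!r}")
    return ViewKey(*parts)


key = split_view_key("x|frequency||y||c%")
print(key.method, repr(key.relation), key.rel_to, key.name)  # frequency '' y c%
```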

Another important change that should be made is to use the conventional curly brace for set notation, so logic descriptors should be written as x[{1,2}]: instead of x[(1,2)]:. Currently the curly brace is used for answer count, but the two uses should be swapped. In this way one answer from codes 1 or 2 would be written as x[{1,2}(1)]:.
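A minimal sketch of reading the proposed descriptor, with the set of codes in curly braces and the answer count in round brackets, could use a regular expression. parse_logic and LOGIC_RE are hypothetical names, and arithmetic forms such as x[{1,2}-{3}] are deliberately left out.

```python
import re

# Proposed logic descriptor: axis, a {set} of codes, then an optional (n)
# answer count, all inside square brackets, e.g. x[{1,2}] or x[{1,2}(1)].
LOGIC_RE = re.compile(
    r"^(?P<axis>[xy]\d*)\[\{(?P<codes>\d+(?:,\d+)*)\}(?:\((?P<count>\d+)\))?\]$"
)


def parse_logic(descriptor):
    """Return (axis, codes, answer_count) for a single-set logic descriptor.

    answer_count is None when no (n) suffix is given. Arithmetic forms such
    as x[{1,2}-{3}] are not handled by this sketch.
    """
    match = LOGIC_RE.match(descriptor)
    if match is None:
        raise ValueError(f"not a simple logic descriptor: {descriptor!r}")
    codes = {int(code) for code in match.group("codes").split(",")}
    count = match.group("count")
    return match.group("axis"), codes, int(count) if count else None


print(parse_logic("x[{1,2}]"))     # ('x', {1, 2}, None)
print(parse_logic("x[{1,2}(1)]"))  # ('x', {1, 2}, 1)
```

The old round-bracket form x[(1,2)] is rejected, which is one way to catch keys that were not migrated.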

Because the method part of the view key must be delimitable, it may be prudent to put in place some truncation rules that method names must adhere to. For example, instead of frequency, simply f may suffice, especially given how common it is. For other methods, a 6-character limit per sub-method/method part (to allow for needed abbreviations like stddev, stderr and so on) would help condense the overall key length and improve readability.
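A rule like the 6-character limit is easy to check mechanically. The sketch below, with the hypothetical name overlong_method_names, splits the method part on ':' (x vs y), ',' (blocks) and '.' (sub-methods) and reports every piece that exceeds the limit.

```python
MAX_PART_LEN = 6  # proposed limit per method/sub-method part


def overlong_method_names(method_part):
    """Return the method/sub-method names that exceed the proposed limit.

    The method part is split on ':' (x vs y), ',' (blocks) and '.'
    (sub-methods), so 'd.stddev' is checked as 'd' and 'stddev'.
    """
    too_long = []
    for axis_part in method_part.split(":"):
        for method in axis_part.split(","):
            for sub in method.split("."):
                if len(sub) > MAX_PART_LEN:
                    too_long.append(sub)
    return too_long


print(overlong_method_names("d.mean,d.stddev,d.stderr:f"))  # []
print(overlong_method_names("frequency:d.mean"))           # ['frequency']
```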

To avoid ambiguity, what is currently the relation part of the view key must always include a colon.

|:| means no conditions placed on either x or y
|x:| collapsed x, no conditions placed on y
|:y| collapsed y, no conditions placed on x
|x:y| collapsed x and y

The new convention means you should never see something like |y:x| because the left-hand side will always describe x and the right-hand side will always describe y.
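The two rules (exactly one colon, x conditions on the left and y conditions on the right) can be enforced with a small validator. This is a sketch under the assumption that commas inside brackets, as in x[{1,2}], belong to the set and not to a block separator; split_top_level and split_relation are hypothetical names.

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators inside [], {} or ()."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch in "[{(":
            depth += 1
        elif ch in "]})":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def split_relation(relation):
    """Return (x_conditions, y_conditions) for the proposed relation part.

    Exactly one colon is required; everything left of it must describe x
    and everything right of it must describe y, so 'y:x' is rejected.
    """
    if relation.count(":") != 1:
        raise ValueError(f"relation part needs exactly one colon: {relation!r}")
    x_side, y_side = relation.split(":")
    for side, axis in ((x_side, "x"), (y_side, "y")):
        for item in split_top_level(side):
            if item and not item.startswith(axis):
                raise ValueError(f"{item!r} is on the {axis} side of the colon")
    return x_side, y_side


print(split_relation(":"))   # ('', '')
print(split_relation("x:"))  # ('x', '')
print(split_relation(":y"))  # ('', 'y')
```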

In accordance with all of these proposed changes, the above view keys would become:

x|f|:|||counts                      
x|f|:|y||c%                          
x|f|:|x||r%                  
x|f|x:|||cbase                      # same as x|f|x[{1,2,3}]:|||net1-3
x|f|:y|||rbase                      # same as x|f|:y[{1,2,3}]|||net1-3
x|f|x:y|||cbase*rbase               # same as x|f|x[{1,2,3}]:y[{1,2,3}]|||cbase*rbase

However, all of these examples use the same method on x and y, which will often not be the case. Where a different method is used on each, both methods must be named and must be colon-delimited.

In conjunction with the need for descriptive stats to be named using sub-methods, this leads to:

x|d.mean:f|x:|||cmean                # column mean
x|f:d.mean|:y|||rmean                # row mean

Including the change for set notation, block nets also need to appear in discrete x/y blocks delimited with a comma, meaning they will change from |x[(1,2),(3,4),(5,6)]:y to |x[{1,2}],x[{3,4}],x[{5,6}]:. This both removes the ambiguity with complex logic and provides a comma-delimited relationship between the multiple methods and x/y.

Given the likely eventuality of other block methods, the conventions should be similarly lazy: f|x[{1,2}],x[{3,4}],x[{5,6}]: is effectively shorthand for f,f,f:f|x[{1,2}],x[{3,4}],x[{5,6}]:.

This is more relevant when imagining the needs of a block of descriptive stats, in which case d.mean,d.stddev,d.stderr:f|x: is more meaningful. In any case, parts that are not mentioned explicitly imply uniform application, so as to prevent the need for something like d.mean,d.stddev,d.stderr:f|x,x,x:.
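The lazy convention amounts to broadcasting: a lone method or condition is repeated to match the size of the block on the other side. A minimal sketch, using the hypothetical names explicit_form and _broadcast, that rewrites a lazy method/relation pair into its fully explicit form:

```python
def split_top_level(text, sep=","):
    """Split text on sep, ignoring separators inside [], {} or ()."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch in "[{(":
            depth += 1
        elif ch in "]})":
            depth -= 1
        if ch == sep and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def _broadcast(methods, conditions):
    """Stretch a lone method or condition to the size of the other block."""
    if len(methods) == len(conditions):
        return methods, conditions
    if len(methods) == 1:
        return methods * len(conditions), conditions
    if len(conditions) == 1:
        return methods, conditions * len(methods)
    raise ValueError(f"block sizes differ: {methods} vs {conditions}")


def explicit_form(method_part, relation):
    """Rewrite a lazy 'method|relation' pair into its fully explicit form."""
    x_methods, _, y_methods = method_part.partition(":")
    y_methods = y_methods or x_methods  # one method named: same on x and y
    x_conds, _, y_conds = relation.partition(":")
    xm, xc = _broadcast(x_methods.split(","), split_top_level(x_conds))
    ym, yc = _broadcast(y_methods.split(","), split_top_level(y_conds))
    return f"{','.join(xm)}:{','.join(ym)}|{','.join(xc)}:{','.join(yc)}"


print(explicit_form("f", "x[{1,2}],x[{3,4}],x[{5,6}]:"))
# f,f,f:f|x[{1,2}],x[{3,4}],x[{5,6}]:
print(explicit_form("d.mean,d.stddev,d.stderr:f", "x:"))
# d.mean,d.stddev,d.stderr:f|x,x,x:
```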

effective base

Effective base view keys should indicate a sub-method of frequency and must name a weight-part. What is currently x|frequency|x:y|||ebase should become x|f.eff:f|x:||weight|ecbase. Similarly, an effective row base would be x|f:f.eff|:y||weight|erbase.
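The requirement that effective-base keys name a weight part can be checked directly from the key. This sketch (hypothetical name effective_key_ok) treats any method or sub-method written as something.eff as an effective-base method; "weight" in the examples is taken to stand for the name of the weight variable.

```python
import re


def effective_key_ok(key):
    """Return False if a key uses an '.eff' sub-method but names no weight."""
    parts = key.split("|")
    if len(parts) != 6:
        raise ValueError(f"expected 6 '|'-delimited parts: {key!r}")
    method_part, weight_part = parts[1], parts[4]
    methods = re.split(r"[:,]", method_part)
    uses_eff = any("eff" in m.split(".")[1:] for m in methods)
    return bool(weight_part) or not uses_eff


print(effective_key_ok("x|f.eff:f|x:||weight|ecbase"))  # True
print(effective_key_ok("x|f.eff:f|x:|||ecbase"))        # False
```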


jamesrkg · 2015-09-23T20:28:17Z

The more I think about this the more I wonder if the meaning of relationship as we've understood it until now is defunct, because the colon becomes the link between x/y and their respective methods, especially given examples like x|f|:y|||rbase, in which x doesn't need to be mentioned at all but the relationship between x and y is still described.

One option would be instead of:

We could move to:

This is because the third part of the view key actually describes the conditions placed on x and/or y as they are fed into their respective methods.

jamesrkg · 2015-09-25T21:02:45Z

Examples of frequency-only keys:

############################ Counts
x|f|:|||counts              

   1  2  3  4  5
1  1  7  3  2  7
2  4  3  5  4  6
3  6  2  4  6  3

############################ Column base
x|f|x:|||cbase  

       1   2   3   4   5
cbase  11  12  12  12  16

############################ Column base percentages
x|f|:|y||counts             

    a   b   c   d   e
x   9   58  25  17  44
y   36  25  42  33  38
z   55  17  33  50  19

############################ Row base
x|f|:y|||rbase  

   rbase
1  20
2  22
3  21

############################ Row base percentages
x|f|:|x||counts             

    a   b   c   d   e
x   5   35  15  10  35
y   18  14  23  18  27
z   29  10  19  29  14

############################ Intersection base
x|f|x:y|||base  

       rbase
cbase  63

############################ Intersection base percentage
x|f|:|xy||counts                

    a   b   c   d   e
x   2   11  5   3   11
y   6   5   8   6   10
z   10  3   6   10  5
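The bases and percentages above can all be derived from the counts table. The sketch below reproduces them, assuming x is single-coded so that summing a column gives its base (and likewise for rows), and rounding halves up to whole percentages; the comments map each result to the key it corresponds to.

```python
import math

# Counts table from the first example: rows are x-codes 1-3,
# columns are y-codes 1-5.
COUNTS = [
    [1, 7, 3, 2, 7],
    [4, 3, 5, 4, 6],
    [6, 2, 4, 6, 3],
]


def pct(part, whole):
    """Whole-number percentage, rounding halves up (37.5 -> 38)."""
    return math.floor(100 * part / whole + 0.5)


cbase = [sum(col) for col in zip(*COUNTS)]  # x|f|x:|||cbase
rbase = [sum(row) for row in COUNTS]        # x|f|:y|||rbase
base = sum(rbase)                           # x|f|x:y|||base

col_pct = [[pct(v, cbase[j]) for j, v in enumerate(row)] for row in COUNTS]  # x|f|:|y||
row_pct = [[pct(v, rbase[i]) for v in row] for i, row in enumerate(COUNTS)]  # x|f|:|x||
base_pct = [[pct(v, base) for v in row] for row in COUNTS]                    # x|f|:|xy||

print(cbase, rbase, base)  # [11, 12, 12, 12, 16] [20, 22, 21] 63
print(col_pct[0])          # [9, 58, 25, 17, 44]
```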

############################ Unfiltered column base percentages
x|f|:|y@||counts                

    a   b   c   d   e
x   9   58  25  17  44
y   36  25  42  33  38
z   55  17  33  50  19

############################ Unfiltered y base percentages
x|f|:|@y||counts                

    a   b   c   d   e
x   9   58  25  17  44
y   36  25  42  33  38
z   55  17  33  50  19

############################ Unfiltered row base percentages
x|f|:|x@||counts                

    a   b   c   d   e
x   5   35  15  10  35
y   18  14  23  18  27
z   29  10  19  29  14

############################ Unfiltered x base percentages
x|f|:|@x||counts                

    a   b   c   d   e
x   5   35  15  10  35
y   18  14  23  18  27
z   29  10  19  29  14

############################ Unfiltered total N percentages
x|f|:|N||counts             

    a   b   c   d   e
x   1   7   3   2   7
y   4   3   5   4   6
z   6   2   4   6   3

############################ Column logic
x|f|x[{1,2}]:|||clogic

         1   2   3   4   5
clogic   5   10  8   6   13

############################ Column count logic
x|f|x[{1,2}(1)]:|||cclogic

          1   2   3   4   5
cclogic   3   4   2   3   4

############################ Column arithmetic logic
x|f.math:f|x[{1,2}-{3}]:|||calogic

          1    2    3    4    5
calogic   -1   8    4    0    10

############################ Row logic
x|f|:y[{3,4}]|||rlogic  

   rlogic
1  5
2  9
3  10

############################ Row count logic
x|f|:y[{3,4}(1)]|||rclogic

   rclogic
1  3
2  5
3  4

############################ Row arithmetic logic
x|f:f.math|:y[{3,4}-{5}]|||ralogic  

   ralogic
1  -2
2  3
3  7

############################ Intersection logic
x|f|x[{1,2}]:y[{3,4}]|||base    

         rlogic
clogic   14

############################ Block logic rows
x|f|x[{1,2}],x[{2,3}]:|||clogic 

         1   2   3   4   5
clogic1  5   10  8   6   13
clogic2  10  5   9   10  9

############################ Block logic columns
x|f|:y[{3,4}],y[{4,5}]|||rlogic 

   rlogic1 rlogic2
1  5       9
2  9       10
3  10      9

############################ Intersection block logic
x|f|x[{1,2}],x[{2,3}]:y[{3,4}],y[{4,5}]|||base  

         rlogic1 rlogic2
clogic1  14      19
clogic2  19      19
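The simple, arithmetic and block logic results can be cross-checked against the counts table. This sketch assumes single-coded x and y, so a net is just the sum of the matching rows, columns or cells; count logic such as x[{1,2}(1)] needs respondent-level data and is not covered. The helper names are hypothetical.

```python
# Counts table from the first example: rows are x-codes, columns are y-codes.
COUNTS = [
    [1, 7, 3, 2, 7],
    [4, 3, 5, 4, 6],
    [6, 2, 4, 6, 3],
]
X_CODES = [1, 2, 3]
Y_CODES = [1, 2, 3, 4, 5]


def x_net(codes):
    """Column logic x[{codes}]: add up the rows whose x-code is in codes."""
    rows = [COUNTS[X_CODES.index(c)] for c in codes]
    return [sum(col) for col in zip(*rows)]


def y_net(codes):
    """Row logic y[{codes}]: add up the columns whose y-code is in codes."""
    cols = [Y_CODES.index(c) for c in codes]
    return [sum(row[j] for j in cols) for row in COUNTS]


def cell_net(x_codes, y_codes):
    """Intersection logic x[{...}]:y[{...}]: sum of the overlapping cells."""
    return sum(y_net(y_codes)[X_CODES.index(c)] for c in x_codes)


def minus(a, b):
    """Element-wise difference, as in x[{1,2}-{3}]."""
    return [p - q for p, q in zip(a, b)]


print(x_net([1, 2]))                     # [5, 10, 8, 6, 13]
print(minus(x_net([1, 2]), x_net([3])))  # [-1, 8, 4, 0, 10]
print(cell_net([1, 2], [3, 4]))          # 14
```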

############################ Effective column base
x|f.eff:f|x:||weight|ecbase 

       1   2   3   4   5
ecbase 11  12  12  12  16

############################ Effective row base
x|f:f.eff|:y||weight|erbase

   erbase
1  5
2  9
3  10

############################ Effective intersection base
x|f.eff|x:y||weight|base    

       erbase
ecbase 63

These examples include something we're not planning to support for a while yet:

############################ Unfiltered column base percentages
x|f|:|y@||counts                

############################ Unfiltered y base percentages
x|f|:|@y||counts                

############################ Unfiltered row base percentages
x|f|:|x@||counts                

############################ Unfiltered x base percentages
x|f|:|@x||counts        

############################ Unfiltered total N percentages
x|f|:|N||counts

In these cases:

y@ means column percentages based on the frequency of y-values independent of x, and vice-versa for x@
@y means percentages based on the base of y independent of x, and vice-versa for @x
N means percentages based on the total sample size (N) of the source data
These conventions could potentially support percentages based on any arbitrary variable base by using something like @q5, but that's certainly not required in the foreseeable future!
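Although these tokens are not planned for a while, the rules above are regular enough to classify mechanically. A minimal sketch, with the hypothetical name parse_rel_to and illustrative kind labels; an arbitrary-variable base such as @q5 is deliberately rejected.

```python
def parse_rel_to(token):
    """Classify a relative-to token as (axis, kind).

    The kind labels are illustrative names, not part of the notation.
    """
    if token == "":
        return None, "counts"
    if token == "N":
        return None, "total N"
    if token in ("x", "y", "xy"):
        return token, "table base"
    if token.endswith("@") and token[:-1] in ("x", "y"):
        return token[:-1], "unfiltered frequency"
    if token.startswith("@") and token[1:] in ("x", "y"):
        return token[1:], "unfiltered base"
    raise ValueError(f"unrecognised relative-to token: {token!r}")


print(parse_rel_to("y@"))  # ('y', 'unfiltered frequency')
print(parse_rel_to("@x"))  # ('x', 'unfiltered base')
```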

jamesrkg · 2015-09-26T19:59:49Z

Nested notation

x and y are themselves lazy notation for x0 and y0, which become explicitly required when the axes are nested.

The following examples show notation describing column logic on the 2nd x-level, filtered by those who answered each of the values in the column on the 1st x-level.

Nested notation also requires the presence of >-delimiters to identify each nested level. The use of > will be identical to how it appears in the x or y keys of the link.

As with the absence of x/y in non-nested view keys, an "unattended" > indicates that no special conditions were placed on the preceding level, as in the following example:

x|f|>x1[{1,2}]:|||cnlogic

x0    x1         1   2   3   4   5
1     clogic     2   4   9   8   3
2     clogic     3   2   1   2   4
3     clogic     6   7   3   4   7
4     clogic     3   3   5   1   5

As with the presence of x/y in non-nested view keys, an "attended" > indicates that a full-collapse or partial-conditioning has been applied, as in the following example:

x|f|x0>x1[{1,2}]:|||cnlogic

x0        x1         1   2   3   4   5
cbase     clogic     2   4   9   8   3

Other than the explicit x0/x1 notation and the addition of >, all the same rules apply, so the row base for this relationship would be:

x|f|>x1[{1,2}]:y|||cnlogic

x0    x1         rbase
1     clogic     26
2     clogic     12
3     clogic     27
4     clogic     17

Nested notation also applies to relative notation. Let's assume the y-axis is also nested and we want percentages based on the 1st y-level rather than the 2nd. In this case the percentages for the first two columns are based on y0=1, and those for the third and fourth columns on y0=2.

x|f|>x1[{1,2}]:|y0||cnlogic

      y0         1     1     2     2
      y1         1     2     1     2
x0    x1         
1     clogic     34    42    74    45
2     clogic     22    75    63    23
3     clogic     58    87    22    36
4     clogic     63    63    15    17

In any case, omitting explicit level notation will always be interpreted as referring to the last level. So if the y-axis had 2 nested levels, relative to y should be interpreted as relative to y1. The same is true for the conditional part of the key notation, where in the above example :y should be interpreted as :y1 (if y was nested).
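The level rules can be sketched as two small helpers: one that makes a bare axis token explicit by mapping it to the last level (which also gives x -> x0 when there is only one level), and one that splits a side of the relation part on '>' so that an empty entry marks an unattended level. Both names are hypothetical.

```python
import re


def resolve_level(token, n_levels):
    """Make an axis token explicit: a bare 'y' means the last of n_levels.

    With a single (non-nested) level this gives x -> x0 and y -> y0.
    """
    match = re.fullmatch(r"([xy])(\d*)", token)
    if match is None:
        raise ValueError(f"not an axis token: {token!r}")
    axis, level = match.groups()
    return token if level else f"{axis}{n_levels - 1}"


def split_nested(side):
    """Split one side of the relation part on '>' into per-level conditions.

    An empty string marks an "unattended" level with no special conditions.
    """
    return side.split(">")


print(resolve_level("y", 2))       # y1
print(resolve_level("x", 1))       # x0
print(split_nested(">x1[{1,2}]"))  # ['', 'x1[{1,2}]']
```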

jamesrkg · 2015-12-16T21:32:57Z

This will be resolved by #290.

jamesrkg added bug enhancement labels Sep 23, 2015

jamesrkg self-assigned this Sep 23, 2015

jamesrkg added this to the RG-11 milestone Sep 23, 2015

This was referenced Sep 29, 2015

f.sum view method #81

Closed

Applying custom base conditions to frequency views #83

Closed

jamesrkg modified the milestones: RG-12, RG-11, RG-14 Oct 9, 2015

jamesrkg assigned alextanski and unassigned jamesrkg Oct 20, 2015

jamesrkg modified the milestones: RG-17, RG-14, RG-20 Nov 6, 2015

jamesrkg modified the milestones: RG-22, RG-20 Dec 11, 2015

jamesrkg mentioned this issue Dec 16, 2015

I203 i206 grid summary intro #290

Merged

AlasdairE mentioned this issue Dec 22, 2015

ExcelPainter needs to work with updated view notations #297

Closed

jamesrkg modified the milestones: RG-25, RG-22 Dec 28, 2015

jamesrkg closed this as completed Jan 29, 2016
