All computational results are collected in a so-called qp.Stack object, which acts as a container for a large number of aggregations in the form of qp.Links.
A qp.Link is defined by four attributes that make it unique and determine how it is stored in a qp.Stack. These four attributes are data_key, filter, x (downbreak) and y (crossbreak), which are positioned in a qp.Stack similar to a tree diagram:
- Each Stack can have various data_keys.
- Each data_key can have various filters.
- Each filter can have various xs.
- Each x can have various ys.
Consequently, qp.Stack[dk][filter][x][y] is one qp.Link that can be added using add_link(self, data_keys=None, filters=['no_filter'], x=None, y=None, ...).
qp.Links store different qp.Views (frequencies, statistics, etc. - all kinds of computations) that are applied on the same four data attributes.
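
The four-level addressing described above can be pictured with ordinary nested dictionaries. The following is only a minimal sketch of the idea, not Quantipy code: add_link_sketch is a hypothetical helper, and the innermost dictionary merely stands in for a qp.Link that would hold views.

```python
# Illustrative only: plain dictionaries, not Quantipy objects.
def add_link_sketch(tree, data_key, filter_name, x, y):
    """Create the nested path data_key -> filter -> x -> y and return the leaf."""
    leaf = (tree.setdefault(data_key, {})
                .setdefault(filter_name, {})
                .setdefault(x, {})
                .setdefault(y, {}))
    return leaf

tree = {}
add_link_sketch(tree, 'Example Data (A)', 'men only', 'q2', 'locality')
add_link_sketch(tree, 'Example Data (A)', 'men only', 'q2', 'ethnicity')
add_link_sketch(tree, 'Example Data (A)', 'no_filter', 'q2b', '@')

# The leaf plays the role of a Link: one place per (data_key, filter, x, y)
# combination where computed views could be stored.
link_like = tree['Example Data (A)']['men only']['q2']['locality']
link_like['frequency'] = 'a view would go here'

print(sorted(tree['Example Data (A)'].keys()))
```

Because every level is keyed by one of the four attributes, two additions that share a data_key, filter and x end up under the same branch and differ only in the last key.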
A qp.Stack can hold a large number of aggregations, so it is impractical to add Links one by one with repeated Stack.add_link() calls. It is much easier to create a "construction plan" using a qp.Batch and apply the settings saved in DataSet._meta['sets']['batches'] to populate a qp.Stack instance. In the following, let's assume dataset contains the definitions of two qp.Batches. A qp.Stack can then be created by running:
stack = dataset.populate(batches='all')
For the Batch definitions from here <../batch/00_overview>, you will get the following construction plans:
>>> batch1 = dataset.get_batch('batch1')
>>> batch1.add_y_on_y('y_keys')
>>> print batch1.x_y_map
OrderedDict([('q1', ['@', 'gender', 'q1', 'locality', 'ethnicity']),
             ('q2', ['locality', 'ethnicity']),
             ('q6', ['@']),
             ('@', ['q6']),
             (u'q6_1', ['@', 'gender', 'q1']),
             (u'q6_2', ['@', 'gender', 'q1']),
             (u'q6_3', ['@', 'gender', 'q1'])])
>>> print batch1.x_filter_map
OrderedDict([('q1', {'(men only)+(q1)': (<function _intersection at 0x0000000019AE06D8>, [{'gender': 1}, {'age': [20, 21, 22, 23, 24, 25]}])}),
             ('q2', {'men only': {'gender': 1}}),
             ('q6', {'men only': {'gender': 1}}),
             ('q6_1', {'men only': {'gender': 1}}),
             ('q6_2', {'men only': {'gender': 1}}),
             ('q6_3', {'men only': {'gender': 1}})])
>>> batch2 = dataset.get_batch('batch2')
>>> print batch2.x_y_map
OrderedDict([('q2b', ['@', 'gender'])])
>>> print batch2.x_filter_map
OrderedDict([('q2b', 'no_filter')])
As both Batches refer to the same data file, the same data_key (in this case the name of dataset) defines all Links.
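
One way to read an x_y_map together with an x_filter_map is to expand them into the (data_key, filter, x, y) keys they imply. The sketch below does this for the q2 entry of batch1 and for batch2, using values copied from the output above. It is an illustration under assumptions, not the way populate() is implemented: filter_names and expand_links are hypothetical helpers, and special entries such as '@' as an x or the y_on_y setting are deliberately left out.

```python
from collections import OrderedDict

def filter_names(entry):
    """Return the filter name(s) of one x_filter_map value (hypothetical helper)."""
    if isinstance(entry, dict):
        return list(entry.keys())
    return [entry]  # e.g. the plain string 'no_filter'

def expand_links(data_key, x_y_map, x_filter_map):
    """List the (data_key, filter, x, y) keys implied by the two maps."""
    keys = []
    for x, ys in x_y_map.items():
        for name in filter_names(x_filter_map[x]):
            for y in ys:
                keys.append((data_key, name, x, y))
    return keys

# Subset of batch1 (only the q2 entry), copied from the output above.
batch1_x_y = OrderedDict([('q2', ['locality', 'ethnicity'])])
batch1_x_filter = OrderedDict([('q2', {'men only': {'gender': 1}})])

# batch2 as printed above.
batch2_x_y = OrderedDict([('q2b', ['@', 'gender'])])
batch2_x_filter = OrderedDict([('q2b', 'no_filter')])

data_key = 'Example Data (A)'  # both batches share the same data_key
link_keys = (expand_links(data_key, batch1_x_y, batch1_x_filter)
             + expand_links(data_key, batch2_x_y, batch2_x_filter))

for key in link_keys:
    print(key)
```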
After populating, the Stack content can be viewed using .describe():
>>> stack.describe()
                data           filter       x          y  view  #
0   Example Data (A)         men only      q1         q1   NaN  1
1   Example Data (A)         men only      q1          @   NaN  1
2   Example Data (A)         men only      q1     gender   NaN  1
3   Example Data (A)         men only       @         q6   NaN  1
4   Example Data (A)         men only      q2  ethnicity   NaN  1
5   Example Data (A)         men only      q2   locality   NaN  1
6   Example Data (A)         men only    q6_1         q1   NaN  1
7   Example Data (A)         men only    q6_1          @   NaN  1
8   Example Data (A)         men only    q6_1     gender   NaN  1
9   Example Data (A)         men only    q6_2         q1   NaN  1
10  Example Data (A)         men only    q6_2          @   NaN  1
11  Example Data (A)         men only    q6_2     gender   NaN  1
12  Example Data (A)         men only    q6_3         q1   NaN  1
13  Example Data (A)         men only    q6_3          @   NaN  1
14  Example Data (A)         men only    q6_3     gender   NaN  1
15  Example Data (A)         men only  gender         q1   NaN  1
16  Example Data (A)         men only  gender          @   NaN  1
17  Example Data (A)         men only  gender     gender   NaN  1
18  Example Data (A)         men only      q6          @   NaN  1
19  Example Data (A)  (men only)+(q1)      q1         q1   NaN  1
20  Example Data (A)  (men only)+(q1)      q1          @   NaN  1
21  Example Data (A)  (men only)+(q1)      q1   locality   NaN  1
22  Example Data (A)  (men only)+(q1)      q1  ethnicity   NaN  1
23  Example Data (A)  (men only)+(q1)      q1     gender   NaN  1
24  Example Data (A)        no_filter     q2b          @   NaN  1
25  Example Data (A)        no_filter     q2b     gender   NaN  1
You can find all combinations defined in the x_y_map in the Stack structure, but Links like Stack['Example Data (A)']['men only']['gender']['gender'] are also included. These special cases arise from the y_on_y setting. Sometimes it is helpful to group a describe-dataframe and create a cross-tabulation of the four Link attributes to get a better overview, e.g. to see how many Links are included for each x-filter combination:
>>> stack.describe('x', 'filter')
filter  (men only)+(q1)  men only  no_filter
x
@                   NaN       1.0        NaN
gender              NaN       3.0        NaN
q1                  5.0       3.0        NaN
q2                  NaN       2.0        NaN
q2b                 NaN       NaN        2.0
q6                  NaN       1.0        NaN
q6_1                NaN       3.0        NaN
q6_2                NaN       3.0        NaN
q6_3                NaN       3.0        NaN
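
The numbers in such a cross-tabulation are simply counts of Links per cell. As a rough cross-check, the same counting can be done by hand with the standard library. The rows below are (filter, x, y) triples copied from the describe() output for x values q1 and q2b; the variable names are made up for this sketch and nothing here calls Quantipy.

```python
from collections import Counter

# (filter, x, y) triples copied from the describe() output above.
rows = [
    ('men only', 'q1', 'q1'),
    ('men only', 'q1', '@'),
    ('men only', 'q1', 'gender'),
    ('(men only)+(q1)', 'q1', 'q1'),
    ('(men only)+(q1)', 'q1', '@'),
    ('(men only)+(q1)', 'q1', 'locality'),
    ('(men only)+(q1)', 'q1', 'ethnicity'),
    ('(men only)+(q1)', 'q1', 'gender'),
    ('no_filter', 'q2b', '@'),
    ('no_filter', 'q2b', 'gender'),
]

# Count how many Links fall into each (x, filter) cell.
links_per_cell = Counter((x, filter_name) for filter_name, x, y in rows)

for (x, filter_name), count in sorted(links_per_cell.items()):
    print(x, filter_name, count)
```

Cells that show NaN in the cross-tabulation correspond to (x, filter) pairs that never occur in the rows, for which the Counter reports zero.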