Skip to content

Expand lists filter

David Megginson edited this page Jun 15, 2020 · 1 revision
Deduplicate rows filter form

The expand lists filter expands in-cell lists by duplicating data rows for each list value. When expanding lists from multiple columns, it can either correlate the values or generate their Cartesian product (see examples).

Options

Label: An optional note to yourself about why you're using the filter.

Expand lists in these columns: Provide a list of tag patterns for the columns with lists to expand. If left will, the filter will expand lists in all columns with the +list attribute.

List separator string: By default, list items inside a cell are separated by "|". You can choose a different separator such as "," or ";" with this option.

Correlate list values in multiple columns: If checked, the filter will correlate list values (first value from column A with first value from column B, etc); otherwise, it will generate a row for every possible combination (Cartesian product). This option has no effect if you're expanding lists in only a single column.

Row query: If provided, only rows matching the row query will have lists expanded.

Examples

Without correlation

Without correlation (the default), the filter will generate the Cartesian product of all the lists specified. You will generally want to do this when the lists are not related (independent values).

Input:

#adm1 #sector +list #org +list
Coast Province Health | Education Org A | Org B
Plains Province Nutrition Org A | Org C

Output:

#adm1 #sector #org
Coast Province Health Org A
Coast Province Health Org B
Coast Province Education Org A
Coast Province Education Org B
Plains Province Nutrition Org A
Plains Province Nutrition Org C

With correlation

With correlation (the --correlate command-line option, or the correlate=True Python parameter), the filter will match list values from multiple columns adding blank values if one list is longer than another. You generally want to do this when the lists are related (e.g. names and codes for the same things).

Input:

#adm1+name+list #adm1+code+list #sector
Coast Province, Plains Province 001, 002 Health
Coast Province, Mountains Province 001 Education

Output:

#adm1+name #adm1+code #sector
Coast Province 001 Health
Plains Province 002 Health
Coast Province 001 Education
Mountains Province Education

Note that the #adm1+code is blank in the last row, since there weren't enough list values in the original dataset.

Clone this wiki locally