-
Notifications
You must be signed in to change notification settings - Fork 4
Expand lists filter
The expand lists filter expands in-cell lists by duplicating data rows for each list value. When expanding lists from multiple columns, it can either correlate the values or generate their Cartesian product (see examples).
Label: An optional note to yourself about why you're using the filter.
Expand lists in these columns: Provide a list of tag patterns for the columns with lists to expand. If left will, the filter will expand lists in all columns with the +list attribute.
List separator string: By default, list items inside a cell are separated by "|". You can choose a different separator such as "," or ";" with this option.
Correlate list values in multiple columns: If checked, the filter will correlate list values (first value from column A with first value from column B, etc); otherwise, it will generate a row for every possible combination (Cartesian product). This option has no effect if you're expanding lists in only a single column.
Row query: If provided, only rows matching the row query will have lists expanded.
Without correlation (the default), the filter will generate the Cartesian product of all the lists specified. You will generally want to do this when the lists are not related (independent values).
Input:
#adm1 | #sector +list | #org +list |
---|---|---|
Coast Province | Health | Education | Org A | Org B |
Plains Province | Nutrition | Org A | Org C |
Output:
#adm1 | #sector | #org |
---|---|---|
Coast Province | Health | Org A |
Coast Province | Health | Org B |
Coast Province | Education | Org A |
Coast Province | Education | Org B |
Plains Province | Nutrition | Org A |
Plains Province | Nutrition | Org C |
With correlation (the --correlate
command-line option, or the correlate=True
Python parameter), the filter will match list values from multiple columns adding blank values if one list is longer than another. You generally want to do this when the lists are related (e.g. names and codes for the same things).
Input:
#adm1+name+list | #adm1+code+list | #sector |
---|---|---|
Coast Province, Plains Province | 001, 002 | Health |
Coast Province, Mountains Province | 001 | Education |
Output:
#adm1+name | #adm1+code | #sector |
---|---|---|
Coast Province | 001 | Health |
Plains Province | 002 | Health |
Coast Province | 001 | Education |
Mountains Province | Education |
Note that the #adm1+code is blank in the last row, since there weren't enough list values in the original dataset.
Learn more about the HXL standard at http://hxlstandard.org