New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spread array of values generated by an expression to form a record #4823
Comments
@wetneb I am in agreement with the proposal. Could you update the beginning of the proposal (or milestone) to signify that this would be a pre-4.0 feature enhancement? I understand from your notes it will not be targeting 4.0, where in 4.0 we might do further data model changes and affected code to support storing arrays directly in cells. Note that we do also have Transpose which could be utilized after the |
I can see all three options being useful here:
I wonder if a control like we have for errors would be useful here - in the transformation dialogue the user is offered options as to what to do with errors (keep original, store error, store blank) - could we offer a set of options for arrays in a similar way so that the user chooses how to deal with them? My biggest frustration with arrays (and the reason I originally looked at storing them in cells) is that currently it is too easy (IMO) for the user to overlook the need to convert the array to a string (or other storable representation) and end up storing nothing in the cell instead - any solution that avoids this has my support! One argument I can see in favour of preferring the |
Indeed, I did not think about the recursive case, that is going to be pretty fiddly to implement. |
For me getting an option for I agree with your comment above @wetneb
I have to admit that from a user perspective:
If we take the approach of adding user options here I'm wondering if it would make sense to move the settings that control storage options to a separate tab in the transformation dialogue (so move the 'on error' options and add proposed new 'on array' options in this new tab). Potentially also the "Retransform X times" option could live here as well? If we are happy for scope to creep we could also consider:
|
When we use an expression which returns an array in the
Transform
orAdd column based on this column
dialogs, the resulting cells are currently left blank because we cannot store an array in a single cell.This is of course not ideal and quite confusing because there is no explanation or warning about this (#1475, #1088).
Proposed solution
This should behave like the
Add column from reconciled values
operation: spread multiple values in consecutive cells to form a record.For instance, if I start with a table such as:
If I add a column
letters
generated by the expressionvalue.parseJson().letters
, I want to get the following result:This solution could be implemented in 3.x, without being an important breaking change (since expressions which return arrays are currently useless, we can assume that it is unlikely anyone relies on the current behaviour in a workflow).
Alternatives considered
Just keep the logic as it is but improve error reporting
We could simply make it clearer at the preview stage that those values will not be stored. @thadguidry proposed the following UI:
@ostephens has noted that it would be better not to render those as errors (because the expression does not return an error itself) but rather to signal this in a different way (for instance by improving the way we distinguish between data types of returned values in this preview area).
Spread the values as columns
In issue #36, it is proposed that values are spread on columns instead. With the example above, you would get:
I find that less helpful because storing an array like this makes it hard to reuse the results afterwards.
The lists can have variable lengths, so the columns created by the operation depend on the data.
And this behaviour can already be obtained by successively adding columns using the expression
value.parseJson().letters[0]
,value.parseJson().letters[1]
and so on (or rather in the reverse order to get the exact result above).Store arrays as strings in cells
@thadguidry also proposed to convert these values to strings before storing them, for instance by serializing them to JSON with
jsonize()
.That is indeed a sensible workaround that users can resort to at the moment, but it does not feel so convenient to reuse afterwards (although it is not as bad as when values are spread as columns in my opinion).
Make it possible for cells to store arrays natively
In PR #1586 @ostephens proposed to make it possible to store arrays in cells natively.
That is definitely an interesting approach, which could potentially even remove the need for the records mode entirely: just store lists of values in the cells instead of spreading them on consecutive rows. Hence it seems to be a much bigger project to me. What I find daunting about it is that:
Object
to be stored as cell value, which means that we cannot rely on the compiler to make sure we are handling this new possibility in all places of the code where we should add this handling. This is also a problem for extension developers, who will not become aware of this unless they read our release notes.That being said I would be happy to consider this change in 4.0, where we already change to a very different data model - such a change could be introduced there. That does not make the design effort less daunting of course (it is still basically the same problem as redesigning the records mode).
Give the user a choice between various of those options
The dialogs where those expressions are previewed could be adapted to give the user a choice in how they want their arrays to be stored. Those expression preview dialogs are already quite full, but if we think there can be multiple sensible things to do there, it is natural to give the user a choice. But it is not really clear to me that it is worth it, because:
Split columns
operation instead (by adapting it to accept an expression returning an array), as suggested by @dfhuynh and others.jsonize()
at the end of the expression. Arguably this is not as discoverable as an explicit option in the UI to do it for you, but I would expect people who want this to be aware ofparseJson()
at least (otherwise how do they intend to reuse the results?) so it is also quite likely that they know ofjsonize()
Additional context
Related issues I am aware of: #36, #1475, #1088.
This is important in the context of the Structured Data on Commons integration project because the CommonsExtension offers GREL functions which return arrays.
The text was updated successfully, but these errors were encountered: