-
-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend cross() function to take either a cell or a value #1204
Comments
Pushed the change f03be76, it covers:
|
@felixlohmeier Thanks for the summary. Do you have links the the original sources of the quotes? Was it one of the mailing lists or another issue? @jackyq2015 The referenced commit appears to be to master, not a branch. Does it resolve this issue? Who reviewed it? (Apparently Thad has reviewed it post-facto based on his typo comment) |
"Thad" being @thadguidry, of course (so he gets notified) |
Nice, that was quick! If you have time to write one or two unit tests that would be fantastic too! |
@jackyq2015 Yeah, thanks for the quick solution! I am afk until July, 16. Shall I close the issue now, so you can get your reward at bountysource? |
@felixlohmeier That would be nice Felix. Jacky and I chatted about a few other things on Sunday, like perhaps adding 1 or 2 more unit tests for the cross() work...other than that, its currently shippable and I'm using the trunk build now. Any bugs or issues we can probably resolve down the line, so I'd say go ahead and close the issue so that Jacky can withdraw his funds :) He did a great job on it, and now as a bonus, because of this work, we now have a way to easily generate summed up column values now as part of a cross() function, a much requested feature that Excel does in Pivot Tables but we never did https://groups.google.com/forum/#!topic/openrefine/CfNurLR5CII :) I'll post about how to do that later on the wiki... we could probably even add that function as a column operation later on perhaps if folks want it easier. |
@wetneb Will add few test case for non-literal value. |
Again, thank you very much @jackyq2015 for the great job! I wonder if this is a breaking change that needs to be addressed in upcoming release notes. The original proposal by @tfmorris in 2012 was to...
The new cross() function takes a string only, which is fine by me but breaks existing user transformation history and the VIB-BITS OpenRefine extension. Is there a chance to make the new cross() function backward compatible (to take either a cell or a value)? I added a note in the wiki to reflect this different behavior between the stable release (2.7) and the development version. |
@felixlohmeier can you please elaborate the symptom of " breaks existing user transformation history"? Definitely we can address it. Either make it back compatible or fix the history issue you refer to. |
@jackyq2015 So, if an existing transformation script assumes that the cross function expects a cell, and the new version of that function does not accept cells anymore, this transformation script that worked for OpenRefine 2.7 will not work for a later version. (That is how I understand the problem) |
I think of OpenRefine projects that use the old cross function that expects Additionally, there are many tutorials out there (stackoverflow, google forum, ...) that suggest to use the cross function with Is it possible to extend the cross function one step further to allow both forms of user input (cell and string)?
|
I have another issue... maybe there is still a problem in the caching logic. I would like to use the new cross() function together with split and join:
This works for "mary,john" but not for "anne,mary" In this example, the value "anne" is not part of the recipient column. If I add a row for "anne" then "anne,mary" works too. I think this behavior is unexpected. It seems that the new cross() function still expects that the string input value is present as a single cell. |
I will review the document related to the cross function to make it consistent and can have the code also support cell if necessary. For the transformation, I can still redo and undo between cross function. could you please give an example? For the logic of cache, the key of the cache is based on:
|
I think that would be great to make it backwards compatible.
Thank you @jackyq2015, now I understand the caching logic. The reason why I raised this issue was the idea that combining the split, cross and join in one expression could reduce total run time. Row operations (split/join multi-valued-cells, delete rows) are very expensive and often amount to more than 75% of total run time in my projects. Do you have an idea how to support custom operations (e.g. split) together with the cross function? I am willing to give another bounty if this helps. |
Will find some time this weekend for the compatibility issue. For the cache logic I think the good practice is that always do the cross on normalized data(source and destination) rather than row data to prevent the unpredictable result. Also in this way you can leverage the cache efficiently. |
As I feared, this change "breaks" the VibBits extension, which allows to make a cell.cross() visually. Too bad, it is very convenient. |
@ettorerizza Could you please test from the head? |
Thank you @jackyq2015, it works. We should update the integrated help too: line 85 in Cross.java
|
@felixlohmeier corrected the description |
Hi @jackyq2015 and ALL, may I ask again if there is a way to support multiple-value-cell-input in the cross function? I am willing to give another bounty. I had a look at the InterProjectModel again and I wonder if it could be possible to add an optional split function before computing the cache? From a user perspective, I would like to call the cross function with a separator as 4th argument, so that multiple values (separated by a custom separator, e.g. ␟) in an input cell get "crossed". Multiple output values should be joined with the same separator. Example (see screenshots above):
The reason why I raised this issue was the idea that combining split, cross and join in one (cached) function could reduce total run time. Row operations (split/join multi-valued-cells, delete rows) are very expensive and often amount to more than 75% of total run time in my projects. |
Implemented as optional 4th parameter to cross() function: https://github.com/claussni/OpenRefine/tree/cross-func-split @felixlohmeier can you test this, please? Here is the expression I used: |
result for anne should be "17 Morning Crescent"; i am going to open a new issue with example data |
Implements support for comma separated multiple-value keys for joining another project using the cross() function. See #1204 (comment)
The cross function now accepts a 4th parameter defining a regular expression separator for splitting multi-value field values when joining projects. See #1204 (comment)
On Saturday, 13 October 2012 01:12:23 UTC+2, Tom Morris wrote:
On Sunday, 14 October 2012 23:13:57 UTC+2, David Huynh wrote:
Current state:
The text was updated successfully, but these errors were encountered: