Enhance cross() with empty "project name" calling parameter to represent the current project and empty column name to represent the index #2504

Closed
antoine2711 opened this issue Mar 31, 2020 · 3 comments · Fixed by #2518

Comments

@antoine2711
Member

antoine2711 commented Mar 31, 2020

Is your feature request related to a problem or area of OpenRefine? Please describe.
The cross() function can get data from an external project, but also from the current project (a kind of self-join). To minimize the text to write (and read), it would help if the project name parameter could be set to an empty string. This improves readability and reusability: the SAME transformation formula could be used in 2 different projects that share some data structure (or in the same project after it has been renamed).

Also, to be able to do a cross on the index of the row, I would allow an empty value for the column name. In that case, the value (first parameter of cross()) should be of type number.

Describe the solution you'd like
If cross() is called with an empty project name, the current project should be used. Ex.: cross(1, "", "ID")[0].
If cross() is called with an empty column name, the cross should be done not against a column but against the internal index of the row, as a numerical match.
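
The two rules above can be modelled in a few lines of Python. This is only a sketch of the proposed semantics, not OpenRefine's implementation: the `PROJECTS` dictionary, the `cross` helper, the `current_project` argument and the sample data are all hypothetical, and row indices are assumed to be zero-based.

```python
# Hypothetical in-memory stand-in for OpenRefine projects: name -> list of rows.
PROJECTS = {
    "People": [
        {"ID": 1, "name": "Ada"},
        {"ID": 2, "name": "Grace"},
    ],
    "Orders": [
        {"ID": 2, "item": "book"},
    ],
}


def cross(value, project_name="", column_name="", current_project="People"):
    """Model of the proposed lookup rules (not OpenRefine's actual code).

    - empty project_name -> use the current project
    - empty column_name  -> match `value` against the row index (assumed 0-based)
    """
    rows = PROJECTS[project_name or current_project]
    if column_name == "":
        # Index match: the value must be a number, as the proposal requires.
        if isinstance(value, bool) or not isinstance(value, int):
            raise TypeError("an index lookup needs an integer value")
        return [rows[value]] if 0 <= value < len(rows) else []
    # Column match: return every row whose cell equals the value.
    return [row for row in rows if row.get(column_name) == value]


# Self-join on a column of the current project.
print(cross(1, "", "ID")[0]["name"])        # Ada
# Lookup by row index in another project.
print(cross(0, "Orders", "")[0]["item"])    # book
```

In this model, an index that is out of range yields an empty list, mirroring how a column match with no hits returns no rows.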

Describe alternatives you've considered
Writing the full column name, like now. Creating a column in the related project with the index as the value of each row.

@antoine2711 antoine2711 added cross/join About the cross function, which makes it possible to retrieve data from other projects by joining grel The default expression language, GREL, could be improved in many ways! Status: Pending Review Indicates that the issue or pull request is awaiting review by project maintainers or collaborators labels Mar 31, 2020
@wetneb wetneb added the Type: Feature Request Identifies requests for new features or enhancements. These involve proposing new improvements. label Mar 31, 2020
@antoine2711 antoine2711 changed the title Enabling the cross function to have an empty project name calling parameter to represent the current project Enhance cross() with empty "project name" calling parameter to represent the current project and empty column name to represent the index Apr 3, 2020
@antoine2711
Member Author

antoine2711 commented Apr 6, 2020

Hi @wetneb, @thadguidry, @ostephens, @tfmorris, @magdmartin, dear lead developers, mentors, and/or long time contributors, I respectfully ask your attention regarding this issue.

The major enhancement of the cross() function 2 weeks ago made cross() 1) available not only when creating a new column but also when transforming an existing column, 2) usable from any column in the table when creating a new column, not only from the index column, and 3) able to cross on any datatype, not just strings. Building on that, here are new propositions that could, I believe, have a great impact on the usefulness of the cross() function.

@afkbrb, once again, coded in a short period of time a PR that seems to cover the new angles of this issue without problems. It has not been tested much, so that still has to be done. But before that, since no one has commented on this issue, I think it is now time to get the opinion of the community on this proposition, so as to move it forward, change it, or just drop it.

In short, this issue will provide 2 enhancements to the cross() function.

  1. The 2nd parameter, projectName, can now be empty (""), in order to reference the current project. With this, you can copy/paste a calculation from one project to another and, given a common field structure, the function works without any changes. It also makes it easier to use a self-cross in a project, and a saved calculation will continue to work even if you change your project name.
  2. The 3rd parameter, columnName, can be set to an empty string ("") to access the internal index value of the referenced project. So, without building any reference column in an external project (or in the current project itself), you can still access its row data with the cross() function.

I think both changes would be great enhancements. But, of course, it's up to the community to choose. In any case, we seem to be in a position to deliver these changes pretty soon, if the go-ahead is given.

Regards, Antoine

@antoine2711
Member Author

Oh, I forgot, this is worth mentioning (from @afkbrb in PR #2518):

One interesting thing is that we already have an embedded "rowIndex" variable for GREL, so we can use cross(rowIndex + offset) to access a row relative to the current row.

So cross(rowIndex + 1)[0] gets you the next row and cross(rowIndex - 1)[0] gets you the previous row… a bit like the old R[-1]C[0] notation… ;-)
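
One detail worth keeping in mind is what happens at the first and last rows, where there is no previous or next row to return. The Python sketch below models the relative lookup under stated assumptions: `ROWS`, `cross_by_index` and `neighbours` are hypothetical names, indices are assumed to be zero-based, and an out-of-range index is assumed to give an empty result. How OpenRefine itself treats that case should be checked against the PR.

```python
# Hypothetical rows of the current project, in display order.
ROWS = [{"city": "Lyon"}, {"city": "Oslo"}, {"city": "Rome"}]


def cross_by_index(index):
    """Model of cross(index) on the current project: zero or one matching row."""
    return [ROWS[index]] if 0 <= index < len(ROWS) else []


def neighbours(row_index):
    """Return (previous, next) rows, using None where no such row exists."""
    prev_rows = cross_by_index(row_index - 1)
    next_rows = cross_by_index(row_index + 1)
    return (prev_rows[0] if prev_rows else None,
            next_rows[0] if next_rows else None)


print(neighbours(1))  # ({'city': 'Lyon'}, {'city': 'Rome'})
print(neighbours(0))  # (None, {'city': 'Oslo'})
```

Checking that the result is non-empty before taking `[0]` is what keeps the edge rows from failing in this model.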

A.

@antoine2711 antoine2711 added this to the 3.4 milestone Apr 18, 2020
@antoine2711 antoine2711 removed the Status: Pending Review Indicates that the issue or pull request is awaiting review by project maintainers or collaborators label Apr 18, 2020
@antoine2711 antoine2711 removed this from the 3.4 milestone Apr 18, 2020
@wetneb wetneb added this to the 3.5 milestone May 30, 2020
@wetneb wetneb mentioned this issue Apr 24, 2021
wetneb added a commit to wetneb/OpenRefine that referenced this issue Apr 24, 2021
wetneb added a commit that referenced this issue May 8, 2021
* Take snapshot of docs for version 3.4

* Versioning for docs of the cross function, for #2504

* Document 'Store archive file' option (#1963)

* Remove unsupported preference from 3.4 docs (#2624)

* Mention that forEach works on JSON objects (#3149)

* Remove wholeText from 3.4 docs (#3180)

* Document -H, /H CLI options (#3288)

* Migrate Wikibase documentation from Wikidata (#1640)

* Miscellaneous, copy-editing
@gitonthescene
Contributor

Oh, I forgot, this is worth mentioning (from @afkbrb in PR #2518):

One interesting thing is that we already have an embedded "rowIndex" variable for GREL, so we can use cross(rowIndex + offset) to access a row relative to the current row.

So cross(rowIndex + 1)[0] gets you the next row and cross(rowIndex - 1)[0] gets you the previous row… a bit like the old R[-1]C[0] notation… ;-)

A.

This is terrific! It’s probably worth updating some of the recipes when this is available.
