Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 2 new GREL functions for replacing strings matching a search list #2606

Closed
thadguidry opened this issue Apr 30, 2020 · 2 comments · Fixed by #4752
Closed

Add 2 new GREL functions for replacing strings matching a search list #2606

thadguidry opened this issue Apr 30, 2020 · 2 comments · Fixed by #4752
Assignees
Labels
grel The default expression language, GREL, could be improved in many ways! Type: Feature Request Identifies requests for new features or enhancements. These involve proposing new improvements.
Milestone

Comments

@thadguidry
Copy link
Member

thadguidry commented Apr 30, 2020

Is your feature request related to a problem or area of OpenRefine? Please describe.
Currently we do not have a nice single replace() function that takes a List of Strings as an argument to search for in a String and replace any found match with a replacement character.
We have to resort to use of cross() or forEach() with inArray() to perform the lookup/inspection of a List or Array and then perform our String replacement.

Mailing list thread comment:
https://groups.google.com/d/msg/openrefine/XIfKTL6sv0w/BjUedqn0AgAJ

GIVEN: 2 columns where column A has the string we want to replace when a list of substrings are found.
AND: column B has a list (or can be split into an Array) of those substrings to search for
AND: and when matching,
THEN: replace ANY matching substring in our column A with a single replacement value like "" empty string or whatever the user desires.

The following GREL performs this:

forEach(
  forEach(
    value.split(","),
    i,
    i.trim()
  ),
  i,
  if(
    inArray(cells.B.value.split("|"),i),
    "",
    i
    )
)
.join(",")

however, it is very verbose.

Describe the solution you'd like
Instead we can probably simplify this large expression to something like this that takes as an argument (a list of strings to search for) and those that match get replaced with a replacement char.

value.trim().replaceEach(cells.B.value.split("|"), "")

Describe alternatives you've considered
headaches and more coffee and continue with GREL snakes.

Additional context
Apache StringUtils seems to have already 2 nice functions to help replaceEach() and replaceEachRepeatedly() :
http://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#replaceEach-java.lang.String-java.lang.String:A-java.lang.String:A-

... as well as many "index" methods.

Example Problem OpenRefine Project with provided "GREL snake" current solution:
vidal_santos.openrefine.tar.gz

@thadguidry thadguidry added Type: Feature Request Identifies requests for new features or enhancements. These involve proposing new improvements. grel The default expression language, GREL, could be improved in many ways! labels Apr 30, 2020
@thadguidry
Copy link
Member Author

thadguidry commented Apr 30, 2020

I also noticed that Apache StringUtils.replaceEach() does expect the lengths of the arrays to be the same size, otherwise it throws IllegalArgumentException. Its replacementList argument would need to be constructed, or optionally overridden, sure, as @tfmorris hinted at in the mailing list thread.

Note that if we followed the Apache pattern the third argument to replaceEach is an array, not a string, and for this case would require constructing a variable length array of empty strings to match the length of the search array.

Tom

For our own GREL function of replaceEach() it might be good to have an argument option to maintain that pseudo-index mapping that Apache StringUtils.replaceEach replacementList arg does now, as well as have another option that removes the need for that pseudo-index mapping and just generates that replacementList for you, based on a replace char string, or overrides slightly the way the method works to fit our purposes more.

its implied "index mapping" shown:

StringUtils.replaceEach("abcde", new String[]{"ab", "d"}, new String[]{"d", "t"})  = "dcte"

but our proposed GREL replaceEach function could also have an option that makes it work without a pseudo-index mapping replacementList, but instead a single String only, overriding it to also work that way, for example:

StringUtils.replaceEach("abcde", new String[]{"ab", "d"}, new String{""})  = "ce"

Something to discuss, for sure.

@elroykanye
Copy link
Member

Let me check this out

@elroykanye elroykanye self-assigned this Apr 7, 2022
wetneb pushed a commit that referenced this issue Jun 2, 2022
@wetneb wetneb added this to the 3.6 milestone Jun 2, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
grel The default expression language, GREL, could be improved in many ways! Type: Feature Request Identifies requests for new features or enhancements. These involve proposing new improvements.
Projects
None yet
3 participants