Some expressions, such as substr(), grepl(), str_detect() and others, are not supported when filtering a dataset (after open_dataset()). Specifically, the code below:
These expressions can be very helpful, if not necessary, for filtering and collecting a very large dataset. Is there anything that can be done to implement this feature?
A first step to enable this would be to have a "compute kernel" for substrings (from looking at the overview at https://github.com/apache/arrow/blob/master/docs/source/cpp/compute.rst, I don't think we currently have functionality to create such substrings).
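To make the idea of a substring kernel concrete, here is a minimal pure-Python sketch of the element-wise semantics such a kernel might have, assuming it mirrors R's substr(): 1-based, inclusive start/stop, with nulls propagating. The name `substr_kernel` is hypothetical; it is not an Arrow API, and a real Arrow kernel would operate on columnar arrays in C++.

```python
from typing import List, Optional


def substr_kernel(values: List[Optional[str]], start: int, stop: int) -> List[Optional[str]]:
    """Hypothetical element-wise substring, modelled on R's substr().

    start/stop are 1-based and inclusive; None (null) in gives None out.
    """
    first = max(start, 1)  # R's substr() treats start < 1 as 1
    out: List[Optional[str]] = []
    for v in values:
        if v is None:
            out.append(None)  # null propagates
        elif stop < first:
            out.append("")  # empty range, as R returns ""
        else:
            # Python slices are 0-based with an exclusive end
            out.append(v[first - 1:stop])
    return out


print(substr_kernel(["abcdef", "xa", None], 1, 1))  # ['a', 'x', None]
```

With such a kernel, a filter like `substr(x, 1, 1) == "a"` reduces to comparing the kernel's output with a literal.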
A related compute kernel is match_substring, with which you could check (using your example) that "a" is present in the string. However, that doesn't guarantee anything about the position of the substring within the string (although with a regular expression pattern you could achieve this in some cases, for example by anchoring the pattern to the start of the string).
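The difference between a containment test and a position-aware test can be sketched in pure Python. Here `contains` only models the semantics of a match_substring-style check (the pattern occurs anywhere), and `matches_regex` shows how a `^` anchor pins the match to the start of the string. Both helper names are hypothetical and do not reflect Arrow's actual kernel API.

```python
import re
from typing import List, Optional


def contains(values: List[Optional[str]], pattern: str) -> List[Optional[bool]]:
    """True if the literal pattern occurs anywhere in the string; nulls propagate."""
    return [None if v is None else pattern in v for v in values]


def matches_regex(values: List[Optional[str]], regex: str) -> List[Optional[bool]]:
    """True if the regex matches anywhere; a leading '^' anchors it to the start."""
    compiled = re.compile(regex)
    return [None if v is None else compiled.search(v) is not None for v in values]


data = ["abc", "bac", None]
print(contains(data, "a"))        # [True, True, None]
print(matches_regex(data, "^a"))  # [True, False, None]
```

For non-null strings, the anchored pattern `^a` selects the same rows as `substr(x, 1, 1) == "a"`, whereas a plain containment test for "a" also keeps "bac".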
Then, a second step would be to "express" such a compute kernel in an Expression that can be used to filter the dataset (although this might not be needed for the dplyr syntax? It could perhaps also be done with an actual compute filter kernel? cc @nealrichardson).
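The "second step" can be pictured as building an expression tree that a dataset filter evaluates batch by batch. The sketch below is a toy model under stated assumptions: `Field`, `Literal`, `Call`, the `KERNELS` registry and `filter_batch` are all hypothetical names, not Arrow's C++ or R API, and a "batch" is just a dict of Python lists. Rows whose predicate is null are dropped, as dplyr's filter() does with NA.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional


@dataclass
class Field:
    name: str  # reference to a column


@dataclass
class Literal:
    value: Any  # scalar constant


@dataclass
class Call:
    function: str  # key into KERNELS
    args: List[Any]  # Field, Literal or nested Call nodes


def _substr(xs: List[Optional[str]], start: int, stop: int) -> List[Optional[str]]:
    first = max(start, 1)  # R-style 1-based, inclusive bounds
    return [None if x is None else (x[first - 1:stop] if stop >= first else "") for x in xs]


def _equal(xs: List[Any], y: Any) -> List[Optional[bool]]:
    return [None if x is None else x == y for x in xs]  # scalar y is broadcast


# Hypothetical registry mapping function names to element-wise kernels.
KERNELS: Dict[str, Callable[..., List[Any]]] = {"substr": _substr, "equal": _equal}


def evaluate(expr: Any, batch: Dict[str, List[Any]]) -> Any:
    if isinstance(expr, Field):
        return batch[expr.name]
    if isinstance(expr, Literal):
        return expr.value  # scalars stay scalars; kernels broadcast them
    if isinstance(expr, Call):
        args = [evaluate(a, batch) for a in expr.args]
        return KERNELS[expr.function](*args)
    raise TypeError(f"unknown expression node: {expr!r}")


def filter_batch(batch: Dict[str, List[Any]], predicate: Any) -> Dict[str, List[Any]]:
    mask = evaluate(predicate, batch)
    keep = [i for i, m in enumerate(mask) if m is True]  # False and None are dropped
    return {name: [col[i] for i in keep] for name, col in batch.items()}


batch = {"x": ["apple", "banana", None, "avocado"], "n": [1, 2, 3, 4]}
# Roughly: filter(ds, substr(x, 1, 1) == "a")
pred = Call("equal", [Call("substr", [Field("x"), Literal(1), Literal(1)]), Literal("a")])
print(filter_batch(batch, pred))  # {'x': ['apple', 'avocado'], 'n': [1, 4]}
```

Keeping kernels in a name-keyed registry means a dplyr front end only has to translate `substr(...)` into a `Call` node once the corresponding kernel exists.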
Neal Richardson / @nealrichardson:
Marking this as resolved; many string operations will be included in the 4.0 release, and any remaining ones have their own JIRAs.
Reporter: Pal
Note: This issue was originally created as ARROW-10305. Please see the migration documentation for further details.