[BUG] Refactor line terminator handling code #7090

Closed
NVnavkumar opened this issue Nov 17, 2022 · 2 comments · Fixed by #7211

Labels: bug, cudf_dependency

@NVnavkumar

Describe the bug
Once rapidsai/cudf#11979 is resolved using the fix described here, the regular expression transpiler code will need to be updated for the new handling of $, \z, and \Z.

NVnavkumar added the bug and "? - Needs Triage" labels on Nov 17, 2022
NVnavkumar added the cudf_dependency label on Nov 17, 2022
sameerz removed the "? - Needs Triage" label on Nov 22, 2022
@NVnavkumar

rapidsai/cudf#12181 is now merged

andygrove commented Nov 29, 2022

I installed spark-rapids-jni with the cuDF changes, and I see these failures in the Scala tests in 23.02:

- string anchors - find *** FAILED ***
  javaPattern[7]=test\z, cudfPattern=test$, input='test\n', cpu=false, gpu=true (RegularExpressionTranspilerSuite.scala:840)
- line anchors - replace *** FAILED ***
  javaPattern[4]=test\z, cudfPattern=test$, input='test\n', cpu=test\n, gpu=_RE\PLACE_\n (RegularExpressionTranspilerSuite.scala:867)
- string anchors - replace *** FAILED ***
  javaPattern[1]=test\z, cudfPattern=test$, input='test\n', cpu=test\n, gpu=_RE\PLACE_\n (RegularExpressionTranspilerSuite.scala:867)
- line anchor $ - find *** FAILED ***
  javaPattern[0]=a$, cudfPattern=a(?:[\n\r\u0085\u2028\u2029]|\r\n)?$, input='a\u0085\n', cpu=false, gpu=true (RegularExpressionTranspilerSuite.scala:840)
- string anchor \Z - find *** FAILED ***
  javaPattern[0]=a\Z, cudfPattern=a(?:[\n\r\u0085\u2028\u2029]|\r\n)?$, input='a\u0085\n', cpu=false, gpu=true (RegularExpressionTranspilerSuite.scala:840)

Also, this integration test fails:

regexp_test.py::test_re_replace_anchors
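
For reference, the `cpu=` values in the failures above are consistent with how plain `java.util.regex` treats `$`, `\z` and `\Z` when no flags are set. The following is only a minimal sketch: the class name `AnchorSemantics`, its helper methods, and the replacement string `_REPLACE_` are illustrative and are not taken from `RegularExpressionTranspilerSuite.scala`.

```java
import java.util.regex.Pattern;

// Illustrative helper (hypothetical name) that shows Java's default anchor semantics.
final class AnchorSemantics {

    // True if the regex matches anywhere in the input, with no flags set.
    static boolean find(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    // Replaces every match of the regex in the input.
    static String replace(String regex, String input, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // \z matches only at the very end of the input, so the trailing \n blocks it.
        System.out.println(find("test\\z", "test\n"));   // false
        // $ and \Z also match just before a single final line terminator.
        System.out.println(find("test$", "test\n"));     // true
        System.out.println(find("test\\Z", "test\n"));   // true
        // Two terminators follow 'a', so neither $ nor \Z matches right after it.
        System.out.println(find("a$", "a\u0085\n"));     // false
        System.out.println(find("a\\Z", "a\u0085\n"));   // false
        // With no match for \z, the input is returned unchanged.
        System.out.println(replace("test\\z", "test\n", "_REPLACE_").equals("test\n")); // true
        // The transpiled pattern from the log, run through Java's engine as a stand-in:
        // the optional group consumes \u0085 and $ then matches before the final \n.
        String transpiled = "a(?:[\\n\\r\\u0085\\u2028\\u2029]|\\r\\n)?$";
        System.out.println(find(transpiled, "a\u0085\n")); // true
    }
}
```

The last call runs the transpiled pattern through Java's engine, not cuDF's, so it only suggests one possible explanation: if the updated cuDF `$` also matches before a trailing `\n` (as `cudfPattern=test$` with `gpu=true` on `'test\n'` implies), the optional terminator group in the transpiled pattern would accept one terminator too many.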
