Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] test_regexp_choice failed #10641

Open
jlowe opened this issue Mar 27, 2024 · 2 comments
Open

[BUG] test_regexp_choice failed #10641

jlowe opened this issue Mar 27, 2024 · 2 comments
Assignees
Labels
bug Something isn't working

Comments

@jlowe
Copy link
Member

jlowe commented Mar 27, 2024

Nightly integration test run had a failure in test_regexp_choice.

[2024-03-27T19:43:44.838Z] FAILED ../../src/main/python/regexp_test.py::test_regexp_choice[DATAGEN_SEED=1711561563, TZ=UTC, INJECT_OOM] - AssertionError: GPU and CPU string values are different at [609, 'regexp_extract(a, (abc1a$|ab2ab$), 1)']
Details
[2024-03-27T19:43:44.835Z] =================================== FAILURES ===================================

[2024-03-27T19:43:44.835Z] ______________________________ test_regexp_choice ______________________________

[2024-03-27T19:43:44.835Z] [gw0] linux -- Python 3.9.19 /opt/conda/bin/python

[2024-03-27T19:43:44.835Z] 

[2024-03-27T19:43:44.835Z]     def test_regexp_choice():

[2024-03-27T19:43:44.835Z]         gen = mk_str_gen('[abcd]{1,3}[0-9]{1,3}[abcd]{1,3}[ \n\t\r]{0,2}')

[2024-03-27T19:43:44.835Z] >       assert_gpu_and_cpu_are_equal_collect(

[2024-03-27T19:43:44.835Z]                 lambda spark: unary_op_df(spark, gen).selectExpr(

[2024-03-27T19:43:44.835Z]                     'rlike(a, "[abcd]|[123]")',

[2024-03-27T19:43:44.835Z]                     'rlike(a, "[^\n\r]|abcd")',

[2024-03-27T19:43:44.835Z]                     'rlike(a, "abd1a$|^ab2a")',

[2024-03-27T19:43:44.835Z]                     'rlike(a, "[a-c]*|[\n]")',

[2024-03-27T19:43:44.835Z]                     'rlike(a, "[a-c]+|[\n]")',

[2024-03-27T19:43:44.835Z]                     'regexp_extract(a, "(abc1a$|^ab2ab|a3abc)", 1)',

[2024-03-27T19:43:44.835Z]                     'regexp_extract(a, "(abc1a$|ab2ab$)", 1)',

[2024-03-27T19:43:44.835Z]                     'regexp_extract(a, "(ab+|^ab)", 1)',

[2024-03-27T19:43:44.835Z]                     'regexp_extract(a, "(ab*|^ab)", 1)',

[2024-03-27T19:43:44.835Z]                     'regexp_replace(a, "[abcd]$|^abc", "@")',

[2024-03-27T19:43:44.835Z]                     'regexp_replace(a, "[ab]$|[cd]$", "@")',

[2024-03-27T19:43:44.835Z]                     'regexp_replace(a, "[ab]+|^cd1", "@")'

[2024-03-27T19:43:44.835Z]                 ),

[2024-03-27T19:43:44.835Z]             conf=_regexp_conf)

[2024-03-27T19:43:44.835Z] 

[2024-03-27T19:43:44.835Z] ../../src/main/python/regexp_test.py:568: 

[2024-03-27T19:43:44.835Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

[2024-03-27T19:43:44.835Z] ../../src/main/python/asserts.py:595: in assert_gpu_and_cpu_are_equal_collect

[2024-03-27T19:43:44.835Z]     _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first, result_canonicalize_func_before_compare=result_canonicalize_func_before_compare)

[2024-03-27T19:43:44.835Z] ../../src/main/python/asserts.py:517: in _assert_gpu_and_cpu_are_equal

[2024-03-27T19:43:44.835Z]     assert_equal(from_cpu, from_gpu)

[2024-03-27T19:43:44.835Z] ../../src/main/python/asserts.py:107: in assert_equal

[2024-03-27T19:43:44.835Z]     _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])

[2024-03-27T19:43:44.835Z] ../../src/main/python/asserts.py:43: in _assert_equal

[2024-03-27T19:43:44.835Z]     _assert_equal(cpu[index], gpu[index], float_check, path + [index])

[2024-03-27T19:43:44.835Z] ../../src/main/python/asserts.py:36: in _assert_equal

[2024-03-27T19:43:44.835Z]     _assert_equal(cpu[field], gpu[field], float_check, path + [field])

[2024-03-27T19:43:44.835Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

[2024-03-27T19:43:44.835Z] 

[2024-03-27T19:43:44.835Z] cpu = 'ab2ab', gpu = 'ab2ab\r'

[2024-03-27T19:43:44.835Z] float_check = <function get_float_check.<locals>.<lambda> at 0x7f86b47c1040>

[2024-03-27T19:43:44.835Z] path = [609, 'regexp_extract(a, (abc1a$|ab2ab$), 1)']

[2024-03-27T19:43:44.835Z] 

[2024-03-27T19:43:44.835Z]     def _assert_equal(cpu, gpu, float_check, path):

[2024-03-27T19:43:44.835Z]         t = type(cpu)

[2024-03-27T19:43:44.835Z]         if (t is Row):

[2024-03-27T19:43:44.835Z]             assert len(cpu) == len(gpu), "CPU and GPU row have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))

[2024-03-27T19:43:44.835Z]             if hasattr(cpu, "__fields__") and hasattr(gpu, "__fields__"):

[2024-03-27T19:43:44.835Z]                 assert cpu.__fields__ == gpu.__fields__, "CPU and GPU row have different fields at {} CPU: {} GPU: {}".format(path, cpu.__fields__, gpu.__fields__)

[2024-03-27T19:43:44.836Z]                 for field in cpu.__fields__:

[2024-03-27T19:43:44.836Z]                     _assert_equal(cpu[field], gpu[field], float_check, path + [field])

[2024-03-27T19:43:44.836Z]             else:

[2024-03-27T19:43:44.836Z]                 for index in range(len(cpu)):

[2024-03-27T19:43:44.836Z]                     _assert_equal(cpu[index], gpu[index], float_check, path + [index])

[2024-03-27T19:43:44.836Z]         elif (t is list):

[2024-03-27T19:43:44.836Z]             assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))

[2024-03-27T19:43:44.836Z]             for index in range(len(cpu)):

[2024-03-27T19:43:44.836Z]                 _assert_equal(cpu[index], gpu[index], float_check, path + [index])

[2024-03-27T19:43:44.836Z]         elif (t is tuple):

[2024-03-27T19:43:44.836Z]             assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))

[2024-03-27T19:43:44.836Z]             for index in range(len(cpu)):

[2024-03-27T19:43:44.836Z]                 _assert_equal(cpu[index], gpu[index], float_check, path + [index])

[2024-03-27T19:43:44.836Z]         elif (t is pytypes.GeneratorType):

[2024-03-27T19:43:44.836Z]             index = 0

[2024-03-27T19:43:44.836Z]             # generator has no zip :( so we have to do this the hard way

[2024-03-27T19:43:44.836Z]             done = False

[2024-03-27T19:43:44.836Z]             while not done:

[2024-03-27T19:43:44.836Z]                 sub_cpu = None

[2024-03-27T19:43:44.836Z]                 sub_gpu = None

[2024-03-27T19:43:44.836Z]                 try:

[2024-03-27T19:43:44.836Z]                     sub_cpu = next(cpu)

[2024-03-27T19:43:44.836Z]                 except StopIteration:

[2024-03-27T19:43:44.836Z]                     done = True

[2024-03-27T19:43:44.836Z]     

[2024-03-27T19:43:44.836Z]                 try:

[2024-03-27T19:43:44.836Z]                     sub_gpu = next(gpu)

[2024-03-27T19:43:44.836Z]                 except StopIteration:

[2024-03-27T19:43:44.836Z]                     done = True

[2024-03-27T19:43:44.836Z]     

[2024-03-27T19:43:44.836Z]                 if done:

[2024-03-27T19:43:44.836Z]                     assert sub_cpu == sub_gpu and sub_cpu == None, "CPU and GPU generators have different lengths at {}".format(path)

[2024-03-27T19:43:44.836Z]                 else:

[2024-03-27T19:43:44.836Z]                     _assert_equal(sub_cpu, sub_gpu, float_check, path + [index])

[2024-03-27T19:43:44.836Z]     

[2024-03-27T19:43:44.836Z]                 index = index + 1

[2024-03-27T19:43:44.836Z]         elif (t is dict):

[2024-03-27T19:43:44.836Z]             # The order of key/values is not guaranteed in python dicts, nor are they guaranteed by Spark

[2024-03-27T19:43:44.836Z]             # so sort the items to do our best with ignoring the order of dicts

[2024-03-27T19:43:44.836Z]             cpu_items = list(cpu.items()).sort(key=_RowCmp)

[2024-03-27T19:43:44.836Z]             gpu_items = list(gpu.items()).sort(key=_RowCmp)

[2024-03-27T19:43:44.836Z]             _assert_equal(cpu_items, gpu_items, float_check, path + ["map"])

[2024-03-27T19:43:44.836Z]         elif (t is int):

[2024-03-27T19:43:44.836Z]             assert cpu == gpu, "GPU and CPU int values are different at {}".format(path)

[2024-03-27T19:43:44.836Z]         elif (t is float):

[2024-03-27T19:43:44.836Z]             if (math.isnan(cpu)):

[2024-03-27T19:43:44.836Z]                 assert math.isnan(gpu), "GPU and CPU float values are different at {}".format(path)

[2024-03-27T19:43:44.836Z]             else:

[2024-03-27T19:43:44.836Z]                 assert float_check(cpu, gpu), "GPU and CPU float values are different {}".format(path)

[2024-03-27T19:43:44.836Z]         elif isinstance(cpu, str):

[2024-03-27T19:43:44.836Z] >           assert cpu == gpu, "GPU and CPU string values are different at {}".format(path)

[2024-03-27T19:43:44.836Z] E           AssertionError: GPU and CPU string values are different at [609, 'regexp_extract(a, (abc1a$|ab2ab$), 1)']

[2024-03-27T19:43:44.836Z] 

[2024-03-27T19:43:44.836Z] ../../src/main/python/asserts.py:85: AssertionError

@jlowe jlowe added bug Something isn't working ? - Needs Triage Need team to review and classify labels Mar 27, 2024
@mattahrens mattahrens removed the ? - Needs Triage Need team to review and classify label Apr 2, 2024
@sameerz
Copy link
Collaborator

sameerz commented Apr 24, 2024

Not sure if we should set a fixed seed for this issue, seems like it needs to be investigated and resolved.

@revans2
Copy link
Collaborator

revans2 commented Apr 25, 2024

We should resolve all of the failures. Setting a fixed seed is a way for the build to stop failing while we do it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

5 participants