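The nightly integration test run failed in test_regexp_choice: for regexp_extract(a, "(abc1a$|ab2ab$)", 1) the CPU returned 'ab2ab' while the GPU returned 'ab2ab\r'.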
[2024-03-27T19:43:44.838Z] FAILED ../../src/main/python/regexp_test.py::test_regexp_choice[DATAGEN_SEED=1711561563, TZ=UTC, INJECT_OOM] - AssertionError: GPU and CPU string values are different at [609, 'regexp_extract(a, (abc1a$|ab2ab$), 1)']
Details
[2024-03-27T19:43:44.835Z] =================================== FAILURES ===================================
[2024-03-27T19:43:44.835Z] ______________________________ test_regexp_choice ______________________________
[2024-03-27T19:43:44.835Z] [gw0] linux -- Python 3.9.19 /opt/conda/bin/python
[2024-03-27T19:43:44.835Z]
[2024-03-27T19:43:44.835Z] def test_regexp_choice():
[2024-03-27T19:43:44.835Z] gen = mk_str_gen('[abcd]{1,3}[0-9]{1,3}[abcd]{1,3}[ \n\t\r]{0,2}')
[2024-03-27T19:43:44.835Z] > assert_gpu_and_cpu_are_equal_collect(
[2024-03-27T19:43:44.835Z] lambda spark: unary_op_df(spark, gen).selectExpr(
[2024-03-27T19:43:44.835Z] 'rlike(a, "[abcd]|[123]")',
[2024-03-27T19:43:44.835Z] 'rlike(a, "[^\n\r]|abcd")',
[2024-03-27T19:43:44.835Z] 'rlike(a, "abd1a$|^ab2a")',
[2024-03-27T19:43:44.835Z] 'rlike(a, "[a-c]*|[\n]")',
[2024-03-27T19:43:44.835Z] 'rlike(a, "[a-c]+|[\n]")',
[2024-03-27T19:43:44.835Z] 'regexp_extract(a, "(abc1a$|^ab2ab|a3abc)", 1)',
[2024-03-27T19:43:44.835Z] 'regexp_extract(a, "(abc1a$|ab2ab$)", 1)',
[2024-03-27T19:43:44.835Z] 'regexp_extract(a, "(ab+|^ab)", 1)',
[2024-03-27T19:43:44.835Z] 'regexp_extract(a, "(ab*|^ab)", 1)',
[2024-03-27T19:43:44.835Z] 'regexp_replace(a, "[abcd]$|^abc", "@")',
[2024-03-27T19:43:44.835Z] 'regexp_replace(a, "[ab]$|[cd]$", "@")',
[2024-03-27T19:43:44.835Z] 'regexp_replace(a, "[ab]+|^cd1", "@")'
[2024-03-27T19:43:44.835Z] ),
[2024-03-27T19:43:44.835Z] conf=_regexp_conf)
[2024-03-27T19:43:44.835Z]
[2024-03-27T19:43:44.835Z] ../../src/main/python/regexp_test.py:568:
[2024-03-27T19:43:44.835Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2024-03-27T19:43:44.835Z] ../../src/main/python/asserts.py:595: in assert_gpu_and_cpu_are_equal_collect
[2024-03-27T19:43:44.835Z] _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first, result_canonicalize_func_before_compare=result_canonicalize_func_before_compare)
[2024-03-27T19:43:44.835Z] ../../src/main/python/asserts.py:517: in _assert_gpu_and_cpu_are_equal
[2024-03-27T19:43:44.835Z] assert_equal(from_cpu, from_gpu)
[2024-03-27T19:43:44.835Z] ../../src/main/python/asserts.py:107: in assert_equal
[2024-03-27T19:43:44.835Z] _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
[2024-03-27T19:43:44.835Z] ../../src/main/python/asserts.py:43: in _assert_equal
[2024-03-27T19:43:44.835Z] _assert_equal(cpu[index], gpu[index], float_check, path + [index])
[2024-03-27T19:43:44.835Z] ../../src/main/python/asserts.py:36: in _assert_equal
[2024-03-27T19:43:44.835Z] _assert_equal(cpu[field], gpu[field], float_check, path + [field])
[2024-03-27T19:43:44.835Z] _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
[2024-03-27T19:43:44.835Z]
[2024-03-27T19:43:44.835Z] cpu = 'ab2ab', gpu = 'ab2ab\r'
[2024-03-27T19:43:44.835Z] float_check = <function get_float_check.<locals>.<lambda> at 0x7f86b47c1040>
[2024-03-27T19:43:44.835Z] path = [609, 'regexp_extract(a, (abc1a$|ab2ab$), 1)']
[2024-03-27T19:43:44.835Z]
[2024-03-27T19:43:44.835Z] def _assert_equal(cpu, gpu, float_check, path):
[2024-03-27T19:43:44.835Z] t = type(cpu)
[2024-03-27T19:43:44.835Z] if (t is Row):
[2024-03-27T19:43:44.835Z] assert len(cpu) == len(gpu), "CPU and GPU row have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
[2024-03-27T19:43:44.835Z] if hasattr(cpu, "__fields__") and hasattr(gpu, "__fields__"):
[2024-03-27T19:43:44.835Z] assert cpu.__fields__ == gpu.__fields__, "CPU and GPU row have different fields at {} CPU: {} GPU: {}".format(path, cpu.__fields__, gpu.__fields__)
[2024-03-27T19:43:44.836Z] for field in cpu.__fields__:
[2024-03-27T19:43:44.836Z] _assert_equal(cpu[field], gpu[field], float_check, path + [field])
[2024-03-27T19:43:44.836Z] else:
[2024-03-27T19:43:44.836Z] for index in range(len(cpu)):
[2024-03-27T19:43:44.836Z] _assert_equal(cpu[index], gpu[index], float_check, path + [index])
[2024-03-27T19:43:44.836Z] elif (t is list):
[2024-03-27T19:43:44.836Z] assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
[2024-03-27T19:43:44.836Z] for index in range(len(cpu)):
[2024-03-27T19:43:44.836Z] _assert_equal(cpu[index], gpu[index], float_check, path + [index])
[2024-03-27T19:43:44.836Z] elif (t is tuple):
[2024-03-27T19:43:44.836Z] assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
[2024-03-27T19:43:44.836Z] for index in range(len(cpu)):
[2024-03-27T19:43:44.836Z] _assert_equal(cpu[index], gpu[index], float_check, path + [index])
[2024-03-27T19:43:44.836Z] elif (t is pytypes.GeneratorType):
[2024-03-27T19:43:44.836Z] index = 0
[2024-03-27T19:43:44.836Z] # generator has no zip :( so we have to do this the hard way
[2024-03-27T19:43:44.836Z] done = False
[2024-03-27T19:43:44.836Z] while not done:
[2024-03-27T19:43:44.836Z] sub_cpu = None
[2024-03-27T19:43:44.836Z] sub_gpu = None
[2024-03-27T19:43:44.836Z] try:
[2024-03-27T19:43:44.836Z] sub_cpu = next(cpu)
[2024-03-27T19:43:44.836Z] except StopIteration:
[2024-03-27T19:43:44.836Z] done = True
[2024-03-27T19:43:44.836Z]
[2024-03-27T19:43:44.836Z] try:
[2024-03-27T19:43:44.836Z] sub_gpu = next(gpu)
[2024-03-27T19:43:44.836Z] except StopIteration:
[2024-03-27T19:43:44.836Z] done = True
[2024-03-27T19:43:44.836Z]
[2024-03-27T19:43:44.836Z] if done:
[2024-03-27T19:43:44.836Z] assert sub_cpu == sub_gpu and sub_cpu == None, "CPU and GPU generators have different lengths at {}".format(path)
[2024-03-27T19:43:44.836Z] else:
[2024-03-27T19:43:44.836Z] _assert_equal(sub_cpu, sub_gpu, float_check, path + [index])
[2024-03-27T19:43:44.836Z]
[2024-03-27T19:43:44.836Z] index = index + 1
[2024-03-27T19:43:44.836Z] elif (t is dict):
[2024-03-27T19:43:44.836Z] # The order of key/values is not guaranteed in python dicts, nor are they guaranteed by Spark
[2024-03-27T19:43:44.836Z] # so sort the items to do our best with ignoring the order of dicts
[2024-03-27T19:43:44.836Z] cpu_items = list(cpu.items()).sort(key=_RowCmp)
[2024-03-27T19:43:44.836Z] gpu_items = list(gpu.items()).sort(key=_RowCmp)
[2024-03-27T19:43:44.836Z] _assert_equal(cpu_items, gpu_items, float_check, path + ["map"])
[2024-03-27T19:43:44.836Z] elif (t is int):
[2024-03-27T19:43:44.836Z] assert cpu == gpu, "GPU and CPU int values are different at {}".format(path)
[2024-03-27T19:43:44.836Z] elif (t is float):
[2024-03-27T19:43:44.836Z] if (math.isnan(cpu)):
[2024-03-27T19:43:44.836Z] assert math.isnan(gpu), "GPU and CPU float values are different at {}".format(path)
[2024-03-27T19:43:44.836Z] else:
[2024-03-27T19:43:44.836Z] assert float_check(cpu, gpu), "GPU and CPU float values are different {}".format(path)
[2024-03-27T19:43:44.836Z] elif isinstance(cpu, str):
[2024-03-27T19:43:44.836Z] > assert cpu == gpu, "GPU and CPU string values are different at {}".format(path)
[2024-03-27T19:43:44.836Z] E AssertionError: GPU and CPU string values are different at [609, 'regexp_extract(a, (abc1a$|ab2ab$), 1)']
[2024-03-27T19:43:44.836Z]
[2024-03-27T19:43:44.836Z] ../../src/main/python/asserts.py:85: AssertionError
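The mismatch above (cpu = 'ab2ab', gpu = 'ab2ab\r') looks consistent with how Java's default `$` treats a trailing line terminator: it matches just before the terminator without consuming it. The sketch below is only an illustration in plain Python `re`, not the plugin's code. The input value `'ab2ab\r'`, the helper names, and the "consuming" pattern are all assumptions made for this example; the log does not show the actual input row or how the GPU pattern was built.

```python
import re

# Assumed set of line terminators recognised by Java's default `$`
# (non-MULTILINE, non-UNIX_LINES mode).
TERMINATOR = r"(?:\r\n|[\n\r\u0085\u2028\u2029])"

# Approximation of Java's `$`: zero-width, matches at end of input or just
# before one final line terminator. It does not model every edge case
# (for example, the position between "\r" and "\n").
JAVA_DOLLAR = "(?=" + TERMINATOR + r"?\Z)"

# Hypothetical faulty rewrite: the terminator is consumed inside the group.
CONSUMING_DOLLAR = TERMINATOR + r"?\Z"


def extract_zero_width(value):
    """Group 1 of (abc1a$|ab2ab$) with `$` kept zero-width (CPU-like result)."""
    pattern = "(abc1a" + JAVA_DOLLAR + "|ab2ab" + JAVA_DOLLAR + ")"
    m = re.search(pattern, value)
    # Spark's regexp_extract returns an empty string when nothing matches.
    return m.group(1) if m else ""


def extract_consuming(value):
    """Group 1 when the terminator is captured too (matches the GPU value)."""
    pattern = "(abc1a" + CONSUMING_DOLLAR + "|ab2ab" + CONSUMING_DOLLAR + ")"
    m = re.search(pattern, value)
    return m.group(1) if m else ""


if __name__ == "__main__":
    sample = "ab2ab\r"  # assumed input; the generator allows a trailing "\r"
    print(repr(extract_zero_width(sample)))
    print(repr(extract_consuming(sample)))
```

If the assumption about the input holds, the first helper reproduces the CPU value and the second reproduces the GPU value, which would point at the end-of-line anchor inside the capture group as the place to look.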