[BUG]Test in json_test.py failed: test_from_json_struct_decimal #10349

Closed
nartal1 opened this issue Jan 31, 2024 · 2 comments · Fixed by #10614
Labels
bug Something isn't working

Comments

Collaborator

nartal1 commented Jan 31, 2024

Describe the bug
test_from_json_struct_decimal failed on the Databricks nightly builds.

=================================== FAILURES ===================================
________________________ test_from_json_struct_decimal _________________________

    @allow_non_gpu(*non_utc_allow)
    def test_from_json_struct_decimal():
        json_string_gen = StringGen(r'{ "a": "[+-]?([0-9]{0,5})?(\.[0-9]{0,2})?([eE][+-]?[0-9]{1,2})?" }') \
            .with_special_pattern('', weight=50) \
            .with_special_pattern('null', weight=50)
>       assert_gpu_and_cpu_are_equal_collect(
            lambda spark : unary_op_df(spark, json_string_gen) \
                .select(f.from_json('a', 'struct<a:decimal>')),
            conf={"spark.rapids.sql.expression.JsonToStructs": True})

../../src/main/python/json_test.py:634: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../src/main/python/asserts.py:595: in assert_gpu_and_cpu_are_equal_collect
    _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first, result_canonicalize_func_before_compare=result_canonicalize_func_before_compare)
../../src/main/python/asserts.py:517: in _assert_gpu_and_cpu_are_equal
    assert_equal(from_cpu, from_gpu)
../../src/main/python/asserts.py:107: in assert_equal
    _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
../../src/main/python/asserts.py:43: in _assert_equal
    _assert_equal(cpu[index], gpu[index], float_check, path + [index])
../../src/main/python/asserts.py:36: in _assert_equal
    _assert_equal(cpu[field], gpu[field], float_check, path + [field])
../../src/main/python/asserts.py:36: in _assert_equal
    _assert_equal(cpu[field], gpu[field], float_check, path + [field])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cpu = Decimal('0'), gpu = None
float_check = <function get_float_check.<locals>.<lambda> at 0x7fc6bea6e560>
path = [1438, 'from_json(a)', 'a']

    def _assert_equal(cpu, gpu, float_check, path):
        t = type(cpu)
        if (t is Row):
            assert len(cpu) == len(gpu), "CPU and GPU row have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
            if hasattr(cpu, "__fields__") and hasattr(gpu, "__fields__"):
                assert cpu.__fields__ == gpu.__fields__, "CPU and GPU row have different fields at {} CPU: {} GPU: {}".format(path, cpu.__fields__, gpu.__fields__)
                for field in cpu.__fields__:
                    _assert_equal(cpu[field], gpu[field], float_check, path + [field])
            else:
                for index in range(len(cpu)):
                    _assert_equal(cpu[index], gpu[index], float_check, path + [index])
        elif (t is list):
            assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
            for index in range(len(cpu)):
                _assert_equal(cpu[index], gpu[index], float_check, path + [index])
        elif (t is tuple):
            assert len(cpu) == len(gpu), "CPU and GPU list have different lengths at {} CPU: {} GPU: {}".format(path, len(cpu), len(gpu))
            for index in range(len(cpu)):
                _assert_equal(cpu[index], gpu[index], float_check, path + [index])
        elif (t is pytypes.GeneratorType):
            index = 0
            # generator has no zip :( so we have to do this the hard way
            done = False
            while not done:
                sub_cpu = None
                sub_gpu = None
                try:
                    sub_cpu = next(cpu)
                except StopIteration:
                    done = True

                try:
                    sub_gpu = next(gpu)
                except StopIteration:
                    done = True

                if done:
                    assert sub_cpu == sub_gpu and sub_cpu == None, "CPU and GPU generators have different lengths at {}".format(path)
                else:
                    _assert_equal(sub_cpu, sub_gpu, float_check, path + [index])

                index = index + 1
        elif (t is dict):
            # The order of key/values is not guaranteed in python dicts, nor are they guaranteed by Spark
            # so sort the items to do our best with ignoring the order of dicts
            cpu_items = list(cpu.items()).sort(key=_RowCmp)
            gpu_items = list(gpu.items()).sort(key=_RowCmp)
            _assert_equal(cpu_items, gpu_items, float_check, path + ["map"])
        elif (t is int):
            assert cpu == gpu, "GPU and CPU int values are different at {}".format(path)
        elif (t is float):
            if (math.isnan(cpu)):
                assert math.isnan(gpu), "GPU and CPU float values are different at {}".format(path)
            else:
                assert float_check(cpu, gpu), "GPU and CPU float values are different {}".format(path)
        elif isinstance(cpu, str):
            assert cpu == gpu, "GPU and CPU string values are different at {}".format(path)
        elif isinstance(cpu, datetime):
            assert cpu == gpu, "GPU and CPU timestamp values are different at {}".format(path)
        elif isinstance(cpu, date):
            assert cpu == gpu, "GPU and CPU date values are different at {}".format(path)
        elif isinstance(cpu, bool):
            assert cpu == gpu, "GPU and CPU boolean values are different at {}".format(path)
        elif isinstance(cpu, Decimal):
>           assert cpu == gpu, "GPU and CPU decimal values are different at {}".format(path)
E           AssertionError: GPU and CPU decimal values are different at [1438, 'from_json(a)', 'a']

../../src/main/python/asserts.py:93: AssertionError
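
The value pattern in the failing test is very permissive: every part of it is optional, so it also produces strings that are not numbers at all. The sketch below only illustrates that point. It uses Python's own `re` and `decimal` modules as a stand-in; it does not reproduce Spark's or cuDF's parsing rules, and the report does not say which input produced the mismatch at row 1438. The helper name `python_decimal_or_none` and the candidate list are made up for illustration.

```python
import re
from decimal import Decimal, InvalidOperation

# Value pattern copied from the test's StringGen (the part between the quotes).
VALUE_PATTERN = re.compile(r'[+-]?([0-9]{0,5})?(\.[0-9]{0,2})?([eE][+-]?[0-9]{1,2})?')


def python_decimal_or_none(text):
    """Parse text with Python's Decimal; return None if it is not a number.

    Hypothetical helper: a stand-in only, not Spark or cuDF behavior.
    """
    try:
        return Decimal(text)
    except InvalidOperation:
        return None


# Hand-picked candidates; each one fully matches the pattern.
candidates = ["", "+", ".", "e5", "0", "-12.5", "123e-2"]
for text in candidates:
    assert VALUE_PATTERN.fullmatch(text) is not None
    print(repr(text), "->", python_decimal_or_none(text))
```

Strings such as `""`, `"+"`, `"."` and `"e5"` match the generator pattern but are not valid decimals in this stand-in, which is the kind of edge case where two parsers can easily disagree.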

nartal1 added the labels "bug" (Something isn't working) and "? - Needs Triage" (Need team to review and classify) on Jan 31, 2024
nartal1 changed the title from "[BUG]One test in json_test.py failed: test_from_json_struct_decimal" to "[BUG]Test in json_test.py failed: test_from_json_struct_decimal" on Jan 31, 2024
Contributor

andygrove commented Jan 31, 2024

cpu = Decimal('0'), gpu = None

This is an example of the issue where we let cuDF infer types in from_json rather than ask for primitives as strings and then cast in the plugin, as we do with GpuJsonScan. This is covered in issue #8204.
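
As a rough illustration of the "read primitives as strings, then cast" idea, here is a pure-Python analogy. It is not the plugin's code and not the cuDF API; the function names are hypothetical, and the cast rules (return `None` for anything that is not a valid decimal) are an assumption made for the sketch. The point is only that the JSON reader never decides the type, and a single explicit cast step owns the conversion rules.

```python
import json
from decimal import Decimal, InvalidOperation


def read_field_as_string(json_text, field):
    """Step 1: parse the row but keep the primitive as raw text (hypothetical helper)."""
    try:
        # parse_int/parse_float=str keep unquoted numbers as their original text
        # instead of letting the JSON reader pick a numeric type.
        row = json.loads(json_text, parse_int=str, parse_float=str)
    except json.JSONDecodeError:
        return None
    value = row.get(field) if isinstance(row, dict) else None
    return value if isinstance(value, str) else None


def cast_to_decimal(text):
    """Step 2: one explicit cast whose rules the caller controls (assumed rules)."""
    if text is None:
        return None
    try:
        return Decimal(text)
    except InvalidOperation:
        return None


def from_json_decimal(json_text, field="a"):
    """Read the field as a string first, then cast it to a decimal."""
    return cast_to_decimal(read_field_as_string(json_text, field))


print(from_json_decimal('{ "a": "0" }'))
print(from_json_decimal('{ "a": "+" }'))
print(from_json_decimal('null'))
```

With this split, quoted and unquoted values go through the same cast, so any difference in behavior is confined to one function that can be compared against the reference implementation.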

I will create a PR to use a fixed seed until we resolve this.
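
For context on why a fixed seed works as a stopgap, here is a minimal sketch using only Python's standard `random` module. The generator function is hypothetical and far simpler than the test suite's `StringGen`; it only shows that a privately seeded generator yields the same inputs on every run, so a nightly build sees the same data each time instead of occasionally drawing a problematic string.

```python
import random


def gen_decimal_like_strings(seed, count=5):
    """Generate signed digit strings from a private, seeded RNG (hypothetical helper)."""
    rng = random.Random(seed)  # private RNG: the seed fully determines the output
    values = []
    for _ in range(count):
        sign = rng.choice(["", "+", "-"])
        digits = "".join(rng.choice("0123456789") for _ in range(rng.randint(0, 5)))
        values.append(sign + digits)
    return values


# The same seed always reproduces the same inputs.
first_run = gen_decimal_like_strings(seed=0)
second_run = gen_decimal_like_strings(seed=0)
assert first_run == second_run
print(first_run)
```

The trade-off is reduced coverage: a fixed seed hides the failing inputs rather than fixing them, which is why the random seed should be restored once the underlying issue is resolved.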

@mattahrens
Collaborator

The scope for keeping this bug open as a P1 is to re-enable the random seed once the cuDF dependency is satisfied.

3 participants