[BUG] test_cast_string_ts_valid_format failed at seed = 1701362564 #9916

Closed
winningsix opened this issue Dec 1, 2023 · 0 comments · Fixed by #9918
Labels
bug Something isn't working
Describe the bug
At seed = 1701362564 the random string generator in our integration test produced a date string with year 0. Spark parses it, but PySpark cannot convert the resulting internal timestamp back to a Python datetime (datetime.MINYEAR is 1), so collecting the CPU result fails with ValueError: year 0 is out of range.
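The underlying failure can be reproduced outside Spark using the internal timestamp value from the traceback below. This is a minimal standalone sketch that mirrors the conversion done by pyspark.sql.types.TimestampType.fromInternal (the exact exception type may differ by platform):

    import datetime

    # Internal Spark timestamp (microseconds since the epoch) taken from the
    # traceback below; it falls in year 0, which Python's datetime cannot
    # represent (datetime.MINYEAR is 1).
    ts = -62158435543000000

    # Same conversion as pyspark.sql.types.TimestampType.fromInternal:
    datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
    # ValueError: year 0 is out of range (may surface as OSError on some platforms)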

=========================================================================================== FAILURES ===========================================================================================
__________________________________________________________________________ test_cast_string_ts_valid_format[String0] ___________________________________________________________________________
[gw0] linux -- Python 3.10.12 /usr/bin/python

data_gen = String

    @pytest.mark.parametrize('data_gen', [StringGen('[0-9]{1,4}-[0-9]{1,2}-[0-9]{1,2}'),
                                          StringGen('[0-9]{1,4}-[0-3][0-9]-[0-5][0-9][ |T][0-3][0-9]:[0-6][0-9]:[0-6][0-9]'),
                                          StringGen('[0-9]{1,4}-[0-3][0-9]-[0-5][0-9][ |T][0-3][0-9]:[0-6][0-9]:[0-6][0-9]\.[0-9]{0,6}Z?')],
                            ids=idfn)
    @allow_non_gpu(*non_utc_allow)
    def test_cast_string_ts_valid_format(data_gen):
        # In Spark 3.2.0+ the valid format changed, and we cannot support all of the format.
        # This provides values that are valid in all of those formats.
>       assert_gpu_and_cpu_are_equal_collect(
                lambda spark : unary_op_df(spark, data_gen).select(f.col('a').cast(TimestampType())),
                conf = {'spark.rapids.sql.hasExtendedYearValues': 'false',
                    'spark.rapids.sql.castStringToTimestamp.enabled': 'true'})

../../src/main/python/cast_test.py:157: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../src/main/python/asserts.py:581: in assert_gpu_and_cpu_are_equal_collect
    _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first, result_canonicalize_func_before_compare=result_canonicalize_func_before_compare)
../../src/main/python/asserts.py:486: in _assert_gpu_and_cpu_are_equal
    from_cpu = run_on_cpu()
../../src/main/python/asserts.py:471: in run_on_cpu
    from_cpu = with_cpu_session(bring_back, conf=conf)
../../src/main/python/spark_session.py:106: in with_cpu_session
    return with_spark_session(func, conf=copy)
../../src/main/python/spark_session.py:90: in with_spark_session
    ret = func(_spark)
../../src/main/python/asserts.py:205: in <lambda>
    bring_back = lambda spark: limit_func(spark).collect()
/opt/spark-3.4.1-bin-hadoop3/python/pyspark/sql/dataframe.py:1217: in collect
    return list(_load_from_socket(sock_info, BatchedSerializer(CPickleSerializer())))
/opt/spark-3.4.1-bin-hadoop3/python/pyspark/serializers.py:152: in load_stream
    yield self._read_with_length(stream)
/opt/spark-3.4.1-bin-hadoop3/python/pyspark/serializers.py:174: in _read_with_length
    return self.loads(obj)
/opt/spark-3.4.1-bin-hadoop3/python/pyspark/serializers.py:472: in loads
    return cloudpickle.loads(obj, encoding=encoding)
/opt/spark-3.4.1-bin-hadoop3/python/pyspark/sql/types.py:2008: in <lambda>
    return lambda *a: dataType.fromInternal(a)
/opt/spark-3.4.1-bin-hadoop3/python/pyspark/sql/types.py:1019: in fromInternal
    values = [
/opt/spark-3.4.1-bin-hadoop3/python/pyspark/sql/types.py:1020: in <listcomp>
    f.fromInternal(v) if c else v
/opt/spark-3.4.1-bin-hadoop3/python/pyspark/sql/types.py:668: in fromInternal
    return self.dataType.fromInternal(obj)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = TimestampType(), ts = -62158435543000000

    def fromInternal(self, ts: int) -> datetime.datetime:
        if ts is not None:
            # using int to avoid precision loss in float
>           return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
E           ValueError: year 0 is out of range

/opt/spark-3.4.1-bin-hadoop3/python/pyspark/sql/types.py:280: ValueError
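One way to avoid generating the bad input (a sketch only; the actual fix landed in #9918) is to constrain the year portion of each StringGen pattern so that a zero year can never be produced:

    # Hypothetical adjustment, not necessarily what #9918 does: make the year
    # start with a non-zero digit so values like '0-01-01' cannot be generated.
    year = '[1-9][0-9]{0,3}'
    gens = [StringGen(year + '-[0-9]{1,2}-[0-9]{1,2}'),
            StringGen(year + '-[0-3][0-9]-[0-5][0-9][ |T][0-3][0-9]:[0-6][0-9]:[0-6][0-9]'),
            StringGen(year + r'-[0-3][0-9]-[0-5][0-9][ |T][0-3][0-9]:[0-6][0-9]:[0-6][0-9]\.[0-9]{0,6}Z?')]

An alternative would be to keep the original patterns and re-map zero years after generation; either way the test then only exercises dates that both Spark and Python's datetime can represent.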