[BUG] test_cast_string_ts_valid_format failed at seed = 1701362564 #9916

Closed
winningsix opened this issue Dec 1, 2023 · 0 comments · Fixed by #9918
Labels
bug Something isn't working
Describe the bug
At seed = 1701362564 the random string generator in our integration test produced a date string with year 0. Spark parses it, but PySpark cannot convert the resulting internal timestamp back to a Python datetime (datetime.MINYEAR is 1), so collecting the CPU result fails with ValueError: year 0 is out of range.
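The underlying failure can be reproduced outside Spark using the internal timestamp value from the traceback below. This is a minimal standalone sketch that mirrors the conversion done by pyspark.sql.types.TimestampType.fromInternal (the exact exception type may differ by platform):

    import datetime

    # Internal Spark timestamp (microseconds since the epoch) taken from the
    # traceback below; it falls in year 0, which Python's datetime cannot
    # represent (datetime.MINYEAR is 1).
    ts = -62158435543000000

    # Same conversion as pyspark.sql.types.TimestampType.fromInternal:
    datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
    # ValueError: year 0 is out of range (may surface as OSError on some platforms)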

=========================================================================================== FAILURES ===========================================================================================
__________________________________________________________________________ test_cast_string_ts_valid_format[String0] ___________________________________________________________________________
[gw0] linux -- Python 3.10.12 /usr/bin/python

data_gen = String

    @pytest.mark.parametrize('data_gen', [StringGen('[0-9]{1,4}-[0-9]{1,2}-[0-9]{1,2}'),
                                          StringGen('[0-9]{1,4}-[0-3][0-9]-[0-5][0-9][ |T][0-3][0-9]:[0-6][0-9]:[0-6][0-9]'),
                                          StringGen('[0-9]{1,4}-[0-3][0-9]-[0-5][0-9][ |T][0-3][0-9]:[0-6][0-9]:[0-6][0-9]\.[0-9]{0,6}Z?')],
                            ids=idfn)
    @allow_non_gpu(*non_utc_allow)
    def test_cast_string_ts_valid_format(data_gen):
        # In Spark 3.2.0+ the valid format changed, and we cannot support all of the format.
        # This provides values that are valid in all of those formats.
>       assert_gpu_and_cpu_are_equal_collect(
                lambda spark : unary_op_df(spark, data_gen).select(f.col('a').cast(TimestampType())),
                conf = {'spark.rapids.sql.hasExtendedYearValues': 'false',
                    'spark.rapids.sql.castStringToTimestamp.enabled': 'true'})

../../src/main/python/cast_test.py:157: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../src/main/python/asserts.py:581: in assert_gpu_and_cpu_are_equal_collect
    _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first, result_canonicalize_func_before_compare=result_canonicalize_func_before_compare)
../../src/main/python/asserts.py:486: in _assert_gpu_and_cpu_are_equal
    from_cpu = run_on_cpu()
../../src/main/python/asserts.py:471: in run_on_cpu
    from_cpu = with_cpu_session(bring_back, conf=conf)
../../src/main/python/spark_session.py:106: in with_cpu_session
    return with_spark_session(func, conf=copy)
../../src/main/python/spark_session.py:90: in with_spark_session
    ret = func(_spark)
../../src/main/python/asserts.py:205: in <lambda>
    bring_back = lambda spark: limit_func(spark).collect()
/opt/spark-3.4.1-bin-hadoop3/python/pyspark/sql/dataframe.py:1217: in collect
    return list(_load_from_socket(sock_info, BatchedSerializer(CPickleSerializer())))
/opt/spark-3.4.1-bin-hadoop3/python/pyspark/serializers.py:152: in load_stream
    yield self._read_with_length(stream)
/opt/spark-3.4.1-bin-hadoop3/python/pyspark/serializers.py:174: in _read_with_length
    return self.loads(obj)
/opt/spark-3.4.1-bin-hadoop3/python/pyspark/serializers.py:472: in loads
    return cloudpickle.loads(obj, encoding=encoding)
/opt/spark-3.4.1-bin-hadoop3/python/pyspark/sql/types.py:2008: in <lambda>
    return lambda *a: dataType.fromInternal(a)
/opt/spark-3.4.1-bin-hadoop3/python/pyspark/sql/types.py:1019: in fromInternal
    values = [
/opt/spark-3.4.1-bin-hadoop3/python/pyspark/sql/types.py:1020: in <listcomp>
    f.fromInternal(v) if c else v
/opt/spark-3.4.1-bin-hadoop3/python/pyspark/sql/types.py:668: in fromInternal
    return self.dataType.fromInternal(obj)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = TimestampType(), ts = -62158435543000000

    def fromInternal(self, ts: int) -> datetime.datetime:
        if ts is not None:
            # using int to avoid precision loss in float
>           return datetime.datetime.fromtimestamp(ts // 1000000).replace(microsecond=ts % 1000000)
E           ValueError: year 0 is out of range

/opt/spark-3.4.1-bin-hadoop3/python/pyspark/sql/types.py:280: ValueError
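One way to avoid generating the bad input (a sketch only; the actual fix landed in #9918) is to constrain the year portion of each StringGen pattern so that a zero year can never be produced:

    # Hypothetical adjustment, not necessarily what #9918 does: make the year
    # start with a non-zero digit so values like '0-01-01' cannot be generated.
    year = '[1-9][0-9]{0,3}'
    gens = [StringGen(year + '-[0-9]{1,2}-[0-9]{1,2}'),
            StringGen(year + '-[0-3][0-9]-[0-5][0-9][ |T][0-3][0-9]:[0-6][0-9]:[0-6][0-9]'),
            StringGen(year + r'-[0-3][0-9]-[0-5][0-9][ |T][0-3][0-9]:[0-6][0-9]:[0-6][0-9]\.[0-9]{0,6}Z?')]

An alternative would be to keep the original patterns and re-map zero years after generation; either way the test then only exercises dates that both Spark and Python's datetime can represent.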