[SPARK-47969][PYTHON][TESTS] Make test_creation_index deterministic #46200

Closed

zhengruifeng
Contributor

What changes were proposed in this pull request?

Make `test_creation_index` deterministic.

Why are the changes needed?

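The test may fail in some environments because the index of the result can come back in a different order than the expected one: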

FAIL [16.261s]: test_creation_index (pyspark.pandas.tests.frame.test_constructor.FrameConstructorTests.test_creation_index)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins/python/pyspark/testing/pandasutils.py", line 91, in _assert_pandas_equal
    assert_frame_equal(
  File "/databricks/python3/lib/python3.11/site-packages/pandas/_testing/asserters.py", line 1257, in assert_frame_equal
    assert_index_equal(
  File "/databricks/python3/lib/python3.11/site-packages/pandas/_testing/asserters.py", line 407, in assert_index_equal
    raise_assert_detail(obj, msg, left, right)
  File "/databricks/python3/lib/python3.11/site-packages/pandas/_testing/asserters.py", line 665, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: DataFrame.index are different
DataFrame.index values are different (40.0 %)
[left]:  Int64Index([2, 3, 4, 6, 5], dtype='int64')
[right]: Int64Index([2, 3, 4, 5, 6], dtype='int64')
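The failure above is an ordering mismatch: both indexes hold the same values, but in a different row order. A common way to make such a comparison deterministic is to sort both frames by index before asserting equality. The sketch below illustrates that general technique in plain pandas; the data and the `frames_equal` helper are hypothetical, and this is not necessarily the change made in this pull request.

```python
import pandas as pd

# Hypothetical data: the same rows, with one frame in a different row order,
# as can happen when rows are returned without an ordering guarantee.
expected = pd.DataFrame({"x": [10, 20, 30, 40, 50]}, index=[2, 3, 4, 5, 6])
actual = expected.loc[[2, 3, 4, 6, 5]]


def frames_equal(left: pd.DataFrame, right: pd.DataFrame) -> bool:
    """Hypothetical helper: True if assert_frame_equal passes, else False."""
    try:
        pd.testing.assert_frame_equal(left, right)
        return True
    except AssertionError:
        return False


# Comparing as-is depends on row order, so it fails here.
order_sensitive = frames_equal(actual, expected)

# Sorting both sides by index removes the dependence on row order.
order_insensitive = frames_equal(actual.sort_index(), expected.sort_index())

print(order_sensitive, order_insensitive)
```

Sorting both sides keeps the check strict about values and dtypes while ignoring only the row order, which is the part that varies between environments.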

Does this PR introduce any user-facing change?

No, test only.

How was this patch tested?

CI

Was this patch authored or co-authored using generative AI tooling?

No.

@dongjoon-hyun dongjoon-hyun changed the title from "[MINOR][PYTHON][TESTS] Make test_creation_index deterministic" to "[SPARK-47969][PYTHON][TESTS] Make test_creation_index deterministic" on Apr 24, 2024
Member

@dongjoon-hyun dongjoon-hyun left a comment

+1, LGTM. Thank you, @zhengruifeng .

@dongjoon-hyun
Member

Merged to master for Apache Spark 4.0.0.

@zhengruifeng zhengruifeng deleted the fix_test_creation_index branch April 24, 2024 06:17
@zhengruifeng
Contributor Author

thank you @dongjoon-hyun so much

dongjoon-hyun pushed a commit that referenced this pull request May 4, 2024
…` deterministic

### What changes were proposed in this pull request?
Follow-up to #46200.

### Why are the changes needed?
There is still non-deterministic code in this test:
```
Traceback (most recent call last):
  File "/home/jenkins/python/pyspark/testing/pandasutils.py", line 91, in _assert_pandas_equal
    assert_frame_equal(
  File "/databricks/python3/lib/python3.11/site-packages/pandas/_testing/asserters.py", line 1257, in assert_frame_equal
    assert_index_equal(
  File "/databricks/python3/lib/python3.11/site-packages/pandas/_testing/asserters.py", line 407, in assert_index_equal
    raise_assert_detail(obj, msg, left, right)
  File "/databricks/python3/lib/python3.11/site-packages/pandas/_testing/asserters.py", line 665, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: DataFrame.index are different
DataFrame.index values are different (75.0 %)
[left]:  DatetimeIndex(['2022-09-02', '2022-09-03', '2022-08-31', '2022-09-05'], dtype='datetime64[ns]', freq=None)
[right]: DatetimeIndex(['2022-08-31', '2022-09-02', '2022-09-03', '2022-09-05'], dtype='datetime64[ns]', freq=None)

```

### Does this PR introduce _any_ user-facing change?
No, test only.

### How was this patch tested?
CI

### Was this patch authored or co-authored using generative AI tooling?
no

Closes #46378 from zhengruifeng/ps_test_create_index.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
JacobZheng0927 pushed a commit to JacobZheng0927/spark that referenced this pull request May 11, 2024
…` deterministic

### What changes were proposed in this pull request?
followup apache#46200

### Why are the changes needed?
there is still non-deterministic codes in this test:
```
Traceback (most recent call last):
  File "/home/jenkins/python/pyspark/testing/pandasutils.py", line 91, in _assert_pandas_equal
    assert_frame_equal(
  File "/databricks/python3/lib/python3.11/site-packages/pandas/_testing/asserters.py", line 1257, in assert_frame_equal
    assert_index_equal(
  File "/databricks/python3/lib/python3.11/site-packages/pandas/_testing/asserters.py", line 407, in assert_index_equal
    raise_assert_detail(obj, msg, left, right)
  File "/databricks/python3/lib/python3.11/site-packages/pandas/_testing/asserters.py", line 665, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: DataFrame.index are different
DataFrame.index values are different (75.0 %)
[left]:  DatetimeIndex(['2022-09-02', '2022-09-03', '2022-08-31', '2022-09-05'], dtype='datetime64[ns]', freq=None)
[right]: DatetimeIndex(['2022-08-31', '2022-09-02', '2022-09-03', '2022-09-05'], dtype='datetime64[ns]', freq=None)

```

### Does this PR introduce _any_ user-facing change?
no, test only

### How was this patch tested?
ci

### Was this patch authored or co-authored using generative AI tooling?
no

Closes apache#46378 from zhengruifeng/ps_test_create_index.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>