[SPARK-37511][PYTHON] Introduce TimedeltaIndex to pandas API on Spark #34657
xinrong-meng wants to merge 9 commits into apache:master
Test build #145423 has finished for PR 34657 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
d739a22 to 37d2ff4
Test build #145747 has finished for PR 34657 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
nice!
Test build #145787 has finished for PR 34657 at commit
Kubernetes integration test starting
Considering the "Note" section in the PR description, shall we call the type
    raise TypeError("Index.name must be a hashable type")

    if isinstance(data, (Series, Index)):
        # TODO(SPARK-37512): Support TimedeltaIndex creation given a timedelta Series/Index
Supporting TimedeltaIndex creation from a timedelta Series/Index involves many changes in python/pyspark/pandas/data_type_ops/. Shall we implement that separately in https://issues.apache.org/jira/browse/SPARK-37512?
Test build #145794 has finished for PR 34657 at commit
Kubernetes integration test status failure
Kubernetes integration test starting
Test build #145797 has finished for PR 34657 at commit
Kubernetes integration test starting
Kubernetes integration test status failure
Kubernetes integration test starting
Kubernetes integration test status failure
Kubernetes integration test status failure
Merged to master.
…imedeltaIndex

### What changes were proposed in this pull request?

This PR is a followup of #34657 that adds underline to match with the title.

### Why are the changes needed?

To fix the PySpark documentation build warning:

```
/.../spark/python/docs/source/reference/pyspark.pandas/indexing.rst:340: WARNING: Title underline too short.

TimedeltaIndex
-------------
```

### Does this PR introduce _any_ user-facing change?

No, dev-only.

### How was this patch tested?

Manual build of the PySpark documentation.

Closes #34775 from HyukjinKwon/SPARK-37511.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
What changes were proposed in this pull request?
Introduce TimedeltaIndex to pandas API on Spark.
Properties, functions, and basic operations of TimedeltaIndex will be supported in follow-up PRs.
Note
Please note that PySpark DayTimeIntervalType follows Python's datetime.timedelta, in which the smallest time unit is the
microsecond. However, pandas TimedeltaIndex has nanosecond support. Thus, we may observe the inconsistency as below:
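
A minimal sketch of the resolution gap described in this note, using only the Python standard library rather than pandas or PySpark. The helper `truncate_ns_to_timedelta` is hypothetical and exists only for illustration; it assumes a non-negative nanosecond count and simply drops the sub-microsecond part, which is roughly what a microsecond-based type has to do with a nanosecond-precision value.

```python
from datetime import timedelta

# datetime.timedelta cannot represent anything finer than one microsecond.
print(timedelta.resolution)  # 0:00:00.000001


def truncate_ns_to_timedelta(nanoseconds: int) -> timedelta:
    """Hypothetical helper: convert a non-negative nanosecond count to a
    timedelta, discarding whatever does not fit in whole microseconds."""
    return timedelta(microseconds=nanoseconds // 1000)


# A single nanosecond is lost entirely at microsecond resolution.
one_ns = truncate_ns_to_timedelta(1)
print(one_ns)  # 0:00:00

# 1 s + 1 ms + 1 ns: only the trailing nanosecond is dropped.
mixed = truncate_ns_to_timedelta(1_001_000_001)
print(mixed)  # 0:00:01.001000
```

Values that are whole multiples of a microsecond pass through unchanged, so the inconsistency should only be visible for data that actually carries nanosecond precision.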
To inspect further on the PySpark side:
Why are the changes needed?
Since DayTimeIntervalType is supported in PySpark, we may add TimedeltaIndex support in pandas API on Spark accordingly.
Does this PR introduce any user-facing change?
Yes.
How was this patch tested?
Unit tests.