[SPARK-36470][PYTHON] Implement `CategoricalIndex.map` and `DatetimeIndex.map` #33756
Test build #142521 has finished for PR 33756 at commit
Test build #142523 has finished for PR 33756 at commit
Kubernetes integration test starting
Kubernetes integration test starting
Kubernetes integration test status success
Kubernetes integration test status failure
Test build #142568 has finished for PR 33756 at commit
Test build #142569 has finished for PR 33756 at commit
Kubernetes integration test starting
Test build #142570 has finished for PR 33756 at commit
Kubernetes integration test starting
Kubernetes integration test status success
Kubernetes integration test status success
Test build #142572 has finished for PR 33756 at commit
Kubernetes integration test starting
Kubernetes integration test status success
Otherwise, LGTM.
Test build #142653 has finished for PR 33756 at commit
@@ -516,19 +515,6 @@ def test_missing(self):
getattr(psdf.set_index("c").index, name)()

# CategoricalIndex functions
We should remove from this line to line 525 as well.
Good catch! Removed.
Test build #142654 has finished for PR 33756 at commit
Kubernetes integration test starting
Kubernetes integration test unable to build dist. exiting with code: 1
Kubernetes integration test status failure
LGTM, pending tests.
Test build #142674 has finished for PR 33756 at commit
Kubernetes integration test unable to build dist. exiting with code: 1
Merged to master.
Merging to branch-3.2 too since RC1 failed.
…ndex.map`

Implement `CategoricalIndex.map` and `DatetimeIndex.map`.

`MultiIndex.map` cannot be implemented in the same way as the `map` of other indexes. It should be taken care of separately if necessary.

Mapping values using input correspondence is a common operation that is supported in pandas. We shall support that as well.

Yes. `CategoricalIndex.map` and `DatetimeIndex.map` can be used now.

- CategoricalIndex.map

```py
>>> idx = ps.CategoricalIndex(['a', 'b', 'c'])
>>> idx
CategoricalIndex(['a', 'b', 'c'], categories=['a', 'b', 'c'], ordered=False, dtype='category')
>>> idx.map(lambda x: x.upper())
CategoricalIndex(['A', 'B', 'C'], categories=['A', 'B', 'C'], ordered=False, dtype='category')
>>> pser = pd.Series([1, 2, 3], index=pd.CategoricalIndex(['a', 'b', 'c'], ordered=True))
>>> idx.map(pser)
CategoricalIndex([1, 2, 3], categories=[1, 2, 3], ordered=True, dtype='category')
>>> idx.map({'a': 'first', 'b': 'second', 'c': 'third'})
CategoricalIndex(['first', 'second', 'third'], categories=['first', 'second', 'third'], ordered=False, dtype='category')
```

- DatetimeIndex.map

```py
>>> pidx = pd.date_range(start="2020-08-08", end="2020-08-10")
>>> psidx = ps.from_pandas(pidx)
>>> mapper_dict = {
...     datetime.datetime(2020, 8, 8): datetime.datetime(2021, 8, 8),
...     datetime.datetime(2020, 8, 9): datetime.datetime(2021, 8, 9),
... }
>>> psidx.map(mapper_dict)
DatetimeIndex(['2021-08-08', '2021-08-09', 'NaT'], dtype='datetime64[ns]', freq=None)
>>> mapper_pser = pd.Series([1, 2, 3], index=pidx)
>>> psidx.map(mapper_pser)
Int64Index([1, 2, 3], dtype='int64')
>>> psidx
DatetimeIndex(['2020-08-08', '2020-08-09', '2020-08-10'], dtype='datetime64[ns]', freq=None)
>>> psidx.map(lambda x: x.strftime("%B %d, %Y, %r"))
Index(['August 08, 2020, 12:00:00 AM', 'August 09, 2020, 12:00:00 AM', 'August 10, 2020, 12:00:00 AM'], dtype='object')
```

Unit tests.

Closes #33756 from xinrong-databricks/other_indexes_map.

Authored-by: Xinrong Meng <xinrong.meng@databricks.com>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit 0b6af46)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
What changes were proposed in this pull request?
Implement `CategoricalIndex.map` and `DatetimeIndex.map`.
`MultiIndex.map` cannot be implemented in the same way as the `map` of other indexes. It should be taken care of separately if necessary.
Why are the changes needed?
Mapping values using input correspondence is a common operation that is supported in pandas. We shall support that as well.
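For readers unfamiliar with the term, "input correspondence" here means the mapper can be a dict-like lookup or a function. The sketch below illustrates that idea in plain Python only; `map_values` and its `missing` parameter are hypothetical names, not part of pandas or pandas-on-Spark, and a Series mapper is deliberately left out. It is not how this PR implements `map`.

```python
from datetime import datetime
from typing import Any, Callable, Hashable, List, Mapping, Sequence, Union

# Hypothetical alias: a mapper is either a dict-like lookup or a function.
Mapper = Union[Mapping[Hashable, Any], Callable[[Any], Any]]


def map_values(values: Sequence[Any], mapper: Mapper, missing: Any = None) -> List[Any]:
    """Map each value through a dict-like or a callable (illustrative helper)."""
    if callable(mapper):
        # Function mapper: apply it to every element.
        return [mapper(v) for v in values]
    # Dict-like mapper: values without a correspondence become `missing`,
    # in the spirit of the NaT entry in the DatetimeIndex example.
    return [mapper.get(v, missing) for v in values]


labels = ["a", "b", "c"]
upper = map_values(labels, lambda x: x.upper())
named = map_values(labels, {"a": "first", "b": "second"})

days = [datetime(2020, 8, 8), datetime(2020, 8, 9)]
formatted = map_values(days, lambda d: d.strftime("%Y-%m-%d"))
```

Here `named` ends with the `missing` placeholder because `"c"` has no entry in the dict, which mirrors how an unmapped timestamp shows up as `NaT` in the PR's `DatetimeIndex.map` example.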
Does this PR introduce any user-facing change?
Yes. `CategoricalIndex.map` and `DatetimeIndex.map` can be used now.
How was this patch tested?
Unit tests.