[SPARK-36742][PYTHON] Fix ps.to_datetime with plurals of keys like years, months, days #34182

Closed
dchvn wants to merge 2 commits into apache:master from dchvn:SPARK-36742

@dchvn
Contributor

@dchvn dchvn commented Oct 5, 2021

What changes were proposed in this pull request?

Fix ps.to_datetime so that it accepts plural column names such as years, months, and days, as pandas does.

Why are the changes needed?

pandas accepts plural column names such as years, months, and days in to_datetime, but ps.to_datetime raises a KeyError for them.
Before this PR:

# pandas
import pandas as pd
df_test = pd.DataFrame({'years': [2015, 2016], 'months': [2, 3], 'days': [4, 5]})
df_test['date'] = pd.to_datetime(df_test[['years', 'months', 'days']])
df_test

   years  months  days       date
0   2015       2     4 2015-02-04
1   2016       3     5 2016-03-05


# pandas on spark
import pyspark.pandas as ps
df_test = ps.DataFrame({'years': [2015, 2016], 'months': [2, 3], 'days': [4, 5]})
df_test['date'] = ps.to_datetime(df_test[['years', 'months', 'days']])

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/u02/spark/python/pyspark/pandas/namespace.py", line 1643, in to_datetime
    psdf = arg[["year", "month", "day"]]
  File "/u02/spark/python/pyspark/pandas/frame.py", line 11888, in __getitem__
    return self.loc[:, list(key)]
  File "/u02/spark/python/pyspark/pandas/indexing.py", line 480, in __getitem__
    ) = self._select_cols(cols_sel)
  File "/u02/spark/python/pyspark/pandas/indexing.py", line 325, in _select_cols
    return self._select_cols_by_iterable(cols_sel, missing_keys)
  File "/u02/spark/python/pyspark/pandas/indexing.py", line 1356, in _select_cols_by_iterable
    raise KeyError("['{}'] not in index".format(name_like_string(key)))
KeyError: "['year'] not in index"
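The traceback shows to_datetime selecting the hard-coded singular names year, month, and day, which is why a frame with plural column names fails. One way to accept both spellings is to resolve each column name to a canonical unit before selecting. The following is only a minimal sketch of that idea in plain Python: UNIT_ALIASES, REQUIRED_UNITS, and resolve_date_columns are hypothetical names made up for illustration, and this is not the code merged in this PR.

```python
# Hypothetical sketch, not the implementation merged in this PR.
# Maps each accepted spelling (singular or plural) to a canonical unit name.
UNIT_ALIASES = {
    "year": "year", "years": "year",
    "month": "month", "months": "month",
    "day": "day", "days": "day",
}

REQUIRED_UNITS = ("year", "month", "day")


def resolve_date_columns(columns):
    """Return {canonical unit: original column name} for year, month, day.

    Raises ValueError if a required unit is missing or supplied twice.
    """
    resolved = {}
    for col in columns:
        # Case-insensitive lookup is an assumption of this sketch.
        unit = UNIT_ALIASES.get(str(col).lower())
        if unit is None:
            continue  # not a date component; ignored in this sketch
        if unit in resolved:
            raise ValueError(f"duplicate columns for unit {unit!r}")
        resolved[unit] = col
    missing = [u for u in REQUIRED_UNITS if u not in resolved]
    if missing:
        raise ValueError(f"missing date components: {missing}")
    return resolved


print(resolve_date_columns(["years", "months", "days"]))
# {'year': 'years', 'month': 'months', 'day': 'days'}
```

With such a mapping in hand, the caller could select the original columns by their actual names instead of assuming the singular spelling.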

Does this PR introduce any user-facing change?

Yes. After this PR, the same call succeeds and matches the pandas result:

df_test = ps.DataFrame({'years': [2015, 2016], 'months': [2, 3], 'days': [4, 5]})
df_test['date'] = ps.to_datetime(df_test[['years', 'months', 'days']])
df_test

   years  months  days       date
0   2015       2     4 2015-02-04
1   2016       3     5 2016-03-05

How was this patch tested?

Unit tests

@HyukjinKwon
Member

add to whitelist

@HyukjinKwon
Member

cc @itholic @ueshin @xinrong-databricks FYI

@SparkQA

SparkQA commented Oct 5, 2021

Test build #143843 has finished for PR 34182 at commit a23c74a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48356/

@SparkQA

SparkQA commented Oct 5, 2021

Test build #143845 has finished for PR 34182 at commit a6b6211.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 5, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48358/

@SparkQA

SparkQA commented Oct 5, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48356/

@SparkQA

SparkQA commented Oct 5, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48358/

Member

@srowen srowen left a comment

Seems reasonable to me, pending tests

@srowen
Member

srowen commented Oct 6, 2021

Jenkins test this please

@SparkQA

SparkQA commented Oct 6, 2021

Test build #143886 has finished for PR 34182 at commit a6b6211.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 6, 2021

Kubernetes integration test starting
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48398/

@SparkQA

SparkQA commented Oct 6, 2021

Kubernetes integration test status failure
URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/48398/

@HyukjinKwon
Member

Merged to master.

@dchvn
Contributor Author

dchvn commented Oct 7, 2021

Thanks! @HyukjinKwon @srowen

@SparkQA

SparkQA commented Oct 7, 2021

Test build #143918 has finished for PR 34182 at commit a6b6211.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 7, 2021

Test build #143951 has finished for PR 34182 at commit a6b6211.

  • This patch fails Spark unit tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 7, 2021

Test build #143977 has finished for PR 34182 at commit a6b6211.

  • This patch passes all tests.
  • This patch does not merge cleanly.
  • This patch adds no public classes.

@dchvn dchvn deleted the SPARK-36742 branch October 9, 2021 01:52