[SPARK-8535][PySpark] PySpark: Can't create DataFrame from Pandas dataframe with no explicit column name #7124
Can one of the admins verify this patch?
Jenkins this is ok to test
Merged build triggered.
Merged build started.
Test build #36145 has started for PR 7124 at commit
Test build #36145 has finished for PR 7124 at commit
Merged build finished. Test FAILed.
@@ -342,13 +342,15 @@ def createDataFrame(self, data, schema=None, samplingRatio=None):

>>> sqlContext.createDataFrame(df.toPandas()).collect() # doctest: +SKIP
[Row(name=u'Alice', age=1)]
>>> sqlContext.createDataFrame(pandas.DataFrame([[1, 2]]).collect())
There is no pandas on Jenkins, so we need to skip these tests with `# doctest: +SKIP`.
Hey @shaneknapp, want to help us install Pandas? 😄
But PySpark SQL does not depend on pandas.
@JoshRosen @davies
I'm sorry. I added `# doctest: +SKIP`.
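For readers unfamiliar with the directive discussed in this review thread, here is a minimal sketch of how `# doctest: +SKIP` behaves, using only the standard-library `doctest` module. The docstring and the package name `a_package_that_is_not_installed` are hypothetical stand-ins for a doctest that needs pandas on a machine where pandas is missing; this is not the Spark test harness.

```python
import doctest

# Hypothetical docstring: the first example needs a package that is assumed
# to be missing, so it is marked with the SKIP directive.
DOCSTRING = """
>>> import a_package_that_is_not_installed  # doctest: +SKIP
>>> 1 + 1
2
"""

# Parse the docstring into a DocTest object and run it.
parser = doctest.DocTestParser()
test = parser.get_doctest(DOCSTRING, globs={}, name="skip_demo",
                          filename=None, lineno=0)

runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

# The skipped example is never executed, so it cannot fail;
# only the second example is attempted.
print(results.failed, results.attempted)
```

The skipped line still appears in the rendered documentation, which is presumably why the example was kept in the docstring rather than dropped.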
Merged build triggered.
Merged build started.
LGTM
Test build #36216 has started for PR 7124 at commit
Thank you very much ✨ 🙇 ✨
Test build #36216 has finished for PR 7124 at commit
Merged build finished. Test PASSed.
…ataframe with no explicit column name

The implicit names in `pandas.columns` are Int, but the `StructField` JSON expects `String`. So I think `pandas.columns` should be converted to `String`.

### issue

* [SPARK-8535 PySpark : Can't create DataFrame from Pandas dataframe with no explicit column name](https://issues.apache.org/jira/browse/SPARK-8535)

Author: x1- <viva008@gmail.com>

Closes #7124 from x1-/SPARK-8535 and squashes the following commits:

d68fd38 [x1-] modify unit-test using pandas.
ea1897d [x1-] For implicit name of pandas.columns are Int, so should be convert to String.

(cherry picked from commit b6e76ed)
Signed-off-by: Davies Liu <davies@databricks.com>
Merged into master and the 1.3 and 1.4 branches, thanks!
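The conversion this PR describes (integer column labels coerced to strings before they become field names) can be sketched without Spark or pandas. The list `implicit_columns` stands in for the default labels of `pandas.DataFrame([[1, 2]]).columns`, and `to_field_names` is a hypothetical helper for illustration, not the function name used in the patch.

```python
# Stand-in for pandas.DataFrame([[1, 2]]).columns: with no explicit column
# names, pandas labels the columns with the integers 0 and 1.
implicit_columns = [0, 1]


def to_field_names(columns):
    """Hypothetical helper: coerce every column label to str."""
    return [str(label) for label in columns]


field_names = to_field_names(implicit_columns)
print(field_names)  # ['0', '1']
```

Explicit string labels pass through unchanged, so the coercion only affects frames whose labels are not already strings.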
The implicit names in `pandas.columns` are Int, but the `StructField` JSON expects `String`. So I think `pandas.columns` should be converted to `String`.

### issue