[SPARK-8535][PySpark]PySpark : Can't create DataFrame from Pandas dataframe with no explicit column name #7124

Closed
x1- wants to merge 2 commits

x1-
Contributor

@x1- x1- commented Jun 30, 2015

The implicit names in pandas.columns are Int, but the StructField JSON expects String.
So I think pandas.columns should be converted to String.

issue
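
To illustrate the conversion described above, here is a minimal sketch in plain Python. It is not the patch itself: `to_string_names` is a hypothetical helper, and `range(2)` only stands in for the integer labels 0 and 1 that pandas assigns to a frame created without column names, such as `pandas.DataFrame([[1, 2]])`.

```python
def to_string_names(columns):
    """Return the column labels as strings; string labels pass through unchanged."""
    return [str(name) for name in columns]


# Assumption: range(2) imitates the implicit integer labels of a
# two-column pandas DataFrame built without explicit column names.
implicit_labels = list(range(2))

print(to_string_names(implicit_labels))   # ['0', '1']
print(to_string_names(["name", "age"]))   # ['name', 'age']
```

Explicit string names are left as they are, so a conversion like this should only affect frames that rely on the default labels.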

@AmplabJenkins

Can one of the admins verify this patch?

@JoshRosen
Contributor

Jenkins this is ok to test

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@SparkQA

SparkQA commented Jun 30, 2015

Test build #36145 has started for PR 7124 at commit ea1897d.

@SparkQA

SparkQA commented Jun 30, 2015

Test build #36145 has finished for PR 7124 at commit ea1897d.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test FAILed.

@@ -342,13 +342,15 @@ def createDataFrame(self, data, schema=None, samplingRatio=None):

>>> sqlContext.createDataFrame(df.toPandas()).collect() # doctest: +SKIP
[Row(name=u'Alice', age=1)]
>>> sqlContext.createDataFrame(pandas.DataFrame([[1, 2]])).collect()
Contributor

Pandas is not installed on Jenkins, so we need to skip this test with # doctest: +SKIP.

Contributor

Hey @shaneknapp, want to help us install Pandas? 😄

Contributor

But PySpark SQL does not depend on pandas.

Contributor Author

@JoshRosen @davies
I'm sorry.
I added # doctest: +SKIP.
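
For readers unfamiliar with the directive discussed in this thread, the following is a small sketch of how `# doctest: +SKIP` behaves, using only the standard `doctest` module. The docstrings and the module name `a_module_that_is_not_installed` are made up for illustration and stand in for a dependency that is missing on the build machine; `count_failures` is a hypothetical helper, not part of Spark.

```python
import doctest

# Hypothetical examples: the second one imports a module that does not exist,
# imitating a doctest that needs a package the test machine lacks.
WITH_SKIP = """
>>> 1 + 1
2
>>> import a_module_that_is_not_installed  # doctest: +SKIP
"""
WITHOUT_SKIP = WITH_SKIP.replace("  # doctest: +SKIP", "")


def count_failures(docstring):
    """Run the examples in `docstring` and return how many of them failed."""
    test = doctest.DocTestParser().get_doctest(docstring, {}, "demo", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the textual failure report; only the failure count is used here.
    results = runner.run(test, out=lambda text: None)
    return results.failed


print(count_failures(WITH_SKIP))     # 0: the skipped import is never executed
print(count_failures(WITHOUT_SKIP))  # 1: the import raises ImportError
```

The skipped example still appears in the rendered documentation, which is presumably why the directive suits an optional dependency such as pandas.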

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@davies
Contributor

davies commented Jul 1, 2015

LGTM

@SparkQA

SparkQA commented Jul 1, 2015

Test build #36216 has started for PR 7124 at commit d68fd38.

@x1-
Contributor Author

x1- commented Jul 1, 2015

Thank you very much ✨ 🙇 ✨

@SparkQA

SparkQA commented Jul 1, 2015

Test build #36216 has finished for PR 7124 at commit d68fd38.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@AmplabJenkins

Merged build finished. Test PASSed.

@asfgit asfgit closed this in b6e76ed Jul 1, 2015
asfgit pushed a commit that referenced this pull request Jul 1, 2015
…ataframe with no explicit column name

The implicit names in `pandas.columns` are `Int`, but the `StructField` JSON expects `String`.
So I think `pandas.columns` should be converted to `String`.

### issue

* [SPARK-8535 PySpark : Can't create DataFrame from Pandas dataframe with no explicit column name](https://issues.apache.org/jira/browse/SPARK-8535)

Author: x1- <viva008@gmail.com>

Closes #7124 from x1-/SPARK-8535 and squashes the following commits:

d68fd38 [x1-] modify unit-test using pandas.
ea1897d [x1-] For implicit name of pandas.columns are Int, so should be convert to String.

(cherry picked from commit b6e76ed)
Signed-off-by: Davies Liu <davies@databricks.com>
@davies
Contributor

davies commented Jul 1, 2015

Merged into master and the 1.3 and 1.4 branches, thanks!

5 participants