[SPARK-32686][PYTHON] Un-deprecate inferring DataFrame schema from list of dict #29510

Closed

@nchammas nchammas commented Aug 21, 2020

What changes were proposed in this pull request?

As discussed in #29491 (comment) and in SPARK-32686, this PR un-deprecates Spark's ability to infer a DataFrame schema from a list of dictionaries. This ability is Pythonic and matches functionality offered by pandas.
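For readers comparing the two input styles, here is a minimal sketch of the dict-based form next to the `pyspark.sql.Row` form that the old warning recommended. The helper name `make_frames` is hypothetical, and the sketch assumes an existing `SparkSession` is passed in; the import is deferred so the snippet can be loaded without PySpark installed.

```python
# Plain Python records, the input style this PR un-deprecates.
rows_as_dicts = [{"a": 5}]


def make_frames(spark):
    """Hypothetical helper: build the same DataFrame from dicts and from Rows.

    Assumes `spark` is an existing SparkSession.
    """
    # Deferred import so this module loads even where PySpark is absent.
    from pyspark.sql import Row

    # Schema is inferred from the dictionary keys and values.
    df_from_dicts = spark.createDataFrame(rows_as_dicts)

    # The form the deprecation warning pointed users toward.
    df_from_rows = spark.createDataFrame([Row(**d) for d in rows_as_dicts])

    return df_from_dicts, df_from_rows
```

With a live session, calling `make_frames(spark)` would return two DataFrames built from the same single record.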

Why are the changes needed?

This change clarifies to users that this behavior is supported and is not going away in the near future.

Does this PR introduce any user-facing change?

Yes. There used to be a UserWarning for this, but now there isn't.

How was this patch tested?

I tested this manually.

Before:

>>> spark.createDataFrame(spark.sparkContext.parallelize([{'a': 5}]))
/Users/nchamm/Documents/GitHub/nchammas/spark/python/pyspark/sql/session.py:388: UserWarning: Using RDD of dict to inferSchema is deprecated. Use pyspark.sql.Row instead
  warnings.warn("Using RDD of dict to inferSchema is deprecated. "
DataFrame[a: bigint]

>>> spark.createDataFrame([{'a': 5}])
.../python/pyspark/sql/session.py:378: UserWarning: inferring schema from dict is deprecated,please use pyspark.sql.Row instead
  warnings.warn("inferring schema from dict is deprecated,"
DataFrame[a: bigint]

After:

>>> spark.createDataFrame(spark.sparkContext.parallelize([{'a': 5}]))
DataFrame[a: bigint]

>>> spark.createDataFrame([{'a': 5}])
DataFrame[a: bigint]
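The manual before/after check above could also be expressed as an automated assertion. The following is only a sketch of that idea using the standard `warnings` module: `create_frame_with_warning` and `create_frame_quiet` are hypothetical stand-ins for the pre-patch and post-patch code paths, not Spark functions, and the PR itself does not include such a test.

```python
import warnings


def create_frame_with_warning(data):
    """Hypothetical stand-in for the pre-patch behavior (not Spark code)."""
    warnings.warn("inferring schema from dict is deprecated", UserWarning)
    return list(data)


def create_frame_quiet(data):
    """Hypothetical stand-in for the post-patch behavior (not Spark code)."""
    return list(data)


def user_warnings_from(func, *args):
    """Call func and return the messages of any UserWarning it emitted."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones already shown once.
        warnings.simplefilter("always")
        func(*args)
    return [str(w.message) for w in caught if issubclass(w.category, UserWarning)]


before = user_warnings_from(create_frame_with_warning, [{"a": 5}])
after = user_warnings_from(create_frame_quiet, [{"a": 5}])
```

In a real test, the stand-ins would be replaced by calls to `spark.createDataFrame`, and the assertion would be that the returned list is empty.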

@nchammas
Contributor Author

cc @HyukjinKwon

@SparkQA

SparkQA commented Aug 21, 2020

Test build #127755 has finished for PR 29510 at commit 4e5c365.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@HyukjinKwon HyukjinKwon left a comment

Looks good, judging from the history:

  • Python 2.7's dict does not preserve key order, which was fixed in Python 3 and, in PySpark, by SPARK-29748.
  • It was deprecated when the Row API was introduced in 51aa135 (SPARK-2010). I think we now prioritize Python usability, and allowing only Row doesn't look Python-friendly.
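The ordering point above can be illustrated in plain Python. This is a sketch only, assuming Python 3.7 or later, where dict insertion order is guaranteed; `infer_field_names` is a hypothetical helper and is not how PySpark infers schemas, so PySpark's actual field order may differ.

```python
def infer_field_names(records):
    """Hypothetical helper: collect dict keys in first-seen order.

    Relies on dict preserving insertion order (guaranteed in Python 3.7+).
    """
    names = {}
    for record in records:
        for key in record:
            # setdefault keeps the position of the first occurrence.
            names.setdefault(key, None)
    return list(names)


records = [{"name": "a", "age": 1}, {"age": 2, "city": "x"}]
fields = infer_field_names(records)
```

Under Python 2.7 the iteration order of each dict was arbitrary, so a helper like this could not return a stable field order there.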

@HyukjinKwon changed the title from "[SPARK-32686] Un-deprecate inferring DataFrame schema from list of dict" to "[SPARK-32686][PYTHON] Un-deprecate inferring DataFrame schema from list of dict" on Aug 22, 2020
@HyukjinKwon
Member

cc @BryanCutler fyi

@nchammas
Contributor Author

Yeah, this deprecation has been around for ~6 years. I wonder if we should check with dev@ to make sure there are no surprising consequences of un-deprecating it?

@SparkQA

SparkQA commented Aug 24, 2020

Test build #127845 has finished for PR 29510 at commit 37ceb56.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@BryanCutler BryanCutler left a comment

LGTM

@SparkQA

SparkQA commented Aug 24, 2020

Test build #127851 has finished for PR 29510 at commit 5136829.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@BryanCutler
Member

Merged to master, thanks @nchammas!

@nchammas nchammas deleted the SPARK-32686-df-dict-infer-schema branch August 24, 2020 23:00
4 participants