SPARK-25881 #22888
Conversation
Can one of the admins verify this patch?
I think you can just manually convert it in the Pandas DataFrame, no?
If I'm using toPandas, I don't think converting decimal to object is right.
And this also has no effect on timestamp values.
Then you can convert the type into double or float in the Spark DataFrame. This is super easy to work around in either the Pandas DataFrame or the Spark DataFrame, so I don't think we should add this flag. BTW, the same feature would also have to be added for when Arrow optimization is enabled.
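The pandas-side workaround mentioned above can be sketched as follows (a minimal sketch; the `amount` column name is hypothetical, and the DataFrame stands in for what `toPandas()` would return):

```python
import decimal

import pandas as pd

# A DataFrame as toPandas() would produce it: Spark decimal columns
# arrive as object dtype holding decimal.Decimal values.
pdf = pd.DataFrame({"amount": [decimal.Decimal("1.25"), decimal.Decimal("2.50")]})
assert pdf["amount"].dtype == object

# Workaround: cast the column to float64 after the conversion.
pdf["amount"] = pdf["amount"].astype("float64")
assert pdf["amount"].dtype == "float64"
```

The Spark-side equivalent would be casting the column before conversion, e.g. `df.withColumn("amount", df["amount"].cast("double")).toPandas()`.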
Or can we correct this conversion in dataframe._to_corrected_pandas_type?
You're introducing a flag to convert. I think enabling the flag is virtually the same as calling a function to convert.
I would close this, @351zyf.
OK
Add parameter coerce_float
https://issues.apache.org/jira/browse/SPARK-25881
What changes were proposed in this pull request?
When using PySpark's dataframe.toPandas(), columns of Spark decimal type become object dtype in the resulting Pandas DataFrame.
The coerce_float parameter of pd.DataFrame.from_records converts decimal.Decimal values to floating point during that conversion.
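The effect of `coerce_float` on `pd.DataFrame.from_records` can be illustrated with plain pandas (a minimal sketch; the records and column name are made up for illustration):

```python
import decimal

import pandas as pd

records = [(decimal.Decimal("1.25"),), (decimal.Decimal("2.50"),)]

# Without coerce_float, decimal.Decimal values are kept as object dtype.
pdf_obj = pd.DataFrame.from_records(records, columns=["amount"])
assert pdf_obj["amount"].dtype == object

# With coerce_float=True, decimal.Decimal values are converted to float64.
pdf_flt = pd.DataFrame.from_records(records, columns=["amount"], coerce_float=True)
assert pdf_flt["amount"].dtype == "float64"
```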
How was this patch tested?