[SPARK-41213][CONNECT][PYTHON] Implement `DataFrame.__repr__` and `DataFrame.dtypes` #38735

zhengruifeng · 2022-11-21T05:07:01Z

What changes were proposed in this pull request?

Implement `DataFrame.__repr__` and `DataFrame.dtypes` for the Spark Connect Python DataFrame.

Why are the changes needed?

For API coverage.

Does this PR introduce any user-facing change?

Yes, `DataFrame.__repr__` and `DataFrame.dtypes` become available on the Connect DataFrame.

How was this patch tested?

Added unit tests.

init init init

zhengruifeng · 2022-11-21T05:07:52Z

python/pyspark/sql/connect/dataframe.py

@@ -115,6 +115,9 @@ def __init__(
        self._cache: Dict[str, Any] = {}
        self._session: "RemoteSparkSession" = session

+    def __repr__(self) -> str:
+        return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes))


This follows the default behavior of spark/python/pyspark/sql/dataframe.py, lines 860 to 869 in 40a9a6e:

    def __repr__(self) -> str:
        if not self._support_repr_html and self.sparkSession._jconf.isReplEagerEvalEnabled():
            vertical = False
            return self._jdf.showString(
                self.sparkSession._jconf.replEagerEvalMaxNumRows(),
                self.sparkSession._jconf.replEagerEvalTruncate(),
                vertical,
            )
        else:
            return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes))
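For anyone who wants to check the fallback formatting without a Spark session, here is a minimal sketch. It assumes `dtypes` yields `(column name, type name)` pairs, which matches the output quoted in this thread. `FakeDataFrame` is a hypothetical stand-in written for illustration, not the Connect class from this PR.

```python
from typing import List, Tuple


class FakeDataFrame:
    """Hypothetical stand-in for illustration; not pyspark's DataFrame."""

    def __init__(self, dtypes: List[Tuple[str, str]]) -> None:
        # Assumption: each entry is a (column name, type name) pair.
        self._dtypes = dtypes

    @property
    def dtypes(self) -> List[Tuple[str, str]]:
        return list(self._dtypes)

    def __repr__(self) -> str:
        # Same format expression as the fallback branch: each 2-tuple
        # fills the two %s slots of "%s: %s".
        return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes))


df = FakeDataFrame([("age", "bigint"), ("height", "bigint"), ("name", "string")])
print(repr(df))
```

The expression relies on every `dtypes` entry being exactly a 2-tuple; an entry of any other length would make the `%` formatting raise a `TypeError`.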

Question: is this public API?

it will be invoked here:

    In [1]: df = spark.createDataFrame([(10, 80, "Alice"), (5, None, "Bob"), (None, 10, "Tom"), (None, None, None)], schema=["age", "height", "name"])

    In [2]: df
    Out[2]: DataFrame[age: bigint, height: bigint, name: string]

    In [3]: df.__repr__()
    Out[3]: 'DataFrame[age: bigint, height: bigint, name: string]'

thanks for the example!
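On the public API question, a small sketch of general Python behavior (not taken from this PR): `__repr__` is rarely called by name, but the built-in `repr()`, the `!r` conversion in f-strings, container reprs, and `str()` when no `__str__` is defined all dispatch to it. `Demo` is a hypothetical class used only for illustration.

```python
class Demo:
    """Hypothetical class; defines __repr__ only, no __str__."""

    def __repr__(self) -> str:
        return "Demo[x: bigint]"


d = Demo()

shown_by_repr = repr(d)        # built-in repr() calls __repr__
shown_by_str = str(d)          # falls back to __repr__ since __str__ is not defined
shown_in_fstring = f"{d!r}"    # !r conversion also uses __repr__
shown_in_list = repr([d])      # containers use the repr of their elements

print(shown_by_repr, shown_by_str, shown_in_fstring, shown_in_list)
```

So even though users seldom write `df.__repr__()` themselves, the method determines what they see whenever a DataFrame is echoed or logged.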

zhengruifeng · 2022-11-22T01:06:16Z

@HyukjinKwon @cloud-fan @grundprinzip @amaliujia

HyukjinKwon · 2022-11-22T06:44:14Z

Merged to master.

zhengruifeng · 2022-11-22T06:52:36Z

@HyukjinKwon thanks for the reviews

…taFrame.dtypes`

### What changes were proposed in this pull request?
Implement `DataFrame.__repr__` and `DataFrame.dtypes`

### Why are the changes needed?
For api coverage

### Does this PR introduce _any_ user-facing change?
yes

### How was this patch tested?
added UT

Closes apache#38735 from zhengruifeng/connect_df_repr.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

init

11f0f53

github-actions bot added CONNECT CORE PYTHON SQL labels Nov 21, 2022

zhengruifeng commented Nov 21, 2022

HyukjinKwon approved these changes Nov 22, 2022

HyukjinKwon closed this in 55addb3 Nov 22, 2022

zhengruifeng deleted the connect_df_repr branch November 22, 2022 06:52
