[SPARK-44887][DOCS] Fix wildcard import `from pyspark.sql.functions import *` in `Quick Start` Examples by zhengruifeng · Pull Request #42579 · apache/spark

zhengruifeng · 2023-08-21T02:58:41Z

What changes were proposed in this pull request?

Fix the wildcard import `from pyspark.sql.functions import *` in the Quick Start examples (https://spark.apache.org/docs/latest/quick-start.html).

Why are the changes needed?

To follow PEP 8 - Style Guide for Python Code (https://peps.python.org/pep-0008/):

Wildcard imports (`from <module> import *`) should be avoided, as they make it unclear which names are present in the namespace, confusing both readers and many automated tools. There is one defensible use case for a wildcard import, which is to republish an internal interface as part of a public API (for example, overwriting a pure Python implementation of an interface with the definitions from an optional accelerator module and exactly which definitions will be overwritten isn’t known in advance).
When republishing names this way, the guidelines below regarding public and internal interfaces still apply.
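The "defensible use case" in the quote is the optional-accelerator pattern. CPython's own `heapq` module, for instance, ends with `from _heapq import *` inside a `try`/`except ImportError`. A minimal sketch of the pattern follows; `_hypothetical_speedups` is an invented module name that is not expected to exist, so the pure-Python definition stays in effect.

```python
def checksum(data):
    """Pure-Python fallback: sum of all bytes, modulo 256."""
    return sum(data) % 256


try:
    # Hypothetical optional accelerator. If it were installed, its public
    # names (say, a C implementation of checksum) would replace the
    # definition above without this file having to list them explicitly.
    from _hypothetical_speedups import *  # noqa: F401,F403
except ImportError:
    # Accelerator not available: keep the pure-Python implementation.
    pass

print(checksum(bytes([200, 100])))  # 44, i.e. (200 + 100) % 256
```

This is the one place where not knowing the imported names in advance is the point; ordinary user code, such as the Quick Start examples, has no such need.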

To avoid potential namespace conflicts, since several SQL functions share their names with Python built-ins (e.g. `min`/`max`/`sum`/`hash`).

Does this PR introduce any user-facing change?

yes

How was this patch tested?

CI

Was this patch authored or co-authored using generative AI tooling?

No

zhengruifeng · 2023-08-21T03:01:39Z

There are two wildcard imports under ./docs:

(spark_dev_310) ➜  spark git:(master) ag -i 'import \*' docs
docs/sql-ref-datatypes.md
117:from pyspark.sql.types import *

docs/quick-start.md
133:>>> from pyspark.sql.functions import *
(spark_dev_310) ➜  spark git:(master)
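The search above used `ag`. A rough Python equivalent is sketched below, assuming the docs are Markdown files under a `docs` directory. The helper names are hypothetical, and the pattern is slightly stricter than `'import \*'`: it only matches `from <module> import *` lines (optionally after a `>>>` prompt) and records the module name.

```python
import re
from pathlib import Path

# Matches e.g. "from pyspark.sql.types import *" or
# ">>> from pyspark.sql.functions import *"; group 1 is the module name.
WILDCARD_IMPORT = re.compile(r"^\s*(?:>>>\s*)?from\s+([\w.]+)\s+import\s+\*")


def wildcard_imports_in_text(text):
    """Return (line_number, module) pairs for each wildcard import in text."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = WILDCARD_IMPORT.match(line)
        if match:
            hits.append((lineno, match.group(1)))
    return hits


def find_wildcard_imports(root, pattern="*.md"):
    """Yield (path, line_number, module) for matching files under root."""
    for path in sorted(Path(root).rglob(pattern)):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, module in wildcard_imports_in_text(text):
            yield path, lineno, module
```

For example, `for path, lineno, module in find_wildcard_imports("docs"): print(f"{path}:{lineno}: {module}")` prints one line per hit in a `path:line` layout similar to `ag`.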

As for `from pyspark.sql.types import *` in https://spark.apache.org/docs/latest/sql-ref-datatypes.html#data-types, I don't think we need to touch it, since as far as I know it should not cause any namespace conflicts.
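Whether a wildcard import can clash with built-ins, as weighed here for `pyspark.sql.types`, can be checked mechanically. A minimal sketch using only the standard library follows; the helper names are hypothetical, and it is demonstrated on `math` and a made-up module rather than on PySpark, so it runs without Spark installed. With PySpark available, the same helper could be pointed at `pyspark.sql.types` or `pyspark.sql.functions`.

```python
import builtins
import math
import types


def wildcard_exports(module):
    """Return the set of names that `from module import *` would bind.

    Follows Python's rule: use module.__all__ if it is defined; otherwise
    take every name in the module namespace that does not start with "_".
    """
    names = getattr(module, "__all__", None)
    if names is None:
        names = [name for name in vars(module) if not name.startswith("_")]
    return set(names)


def shadowed_builtins(module):
    """Return, sorted, the built-in names a wildcard import would shadow."""
    return sorted(wildcard_exports(module) & set(vars(builtins)))


# math defines no __all__, and math.pow shares its name with the built-in pow.
print(shadowed_builtins(math))  # includes 'pow'

# A hypothetical module that restricts its wildcard exports via __all__.
demo = types.ModuleType("demo")
demo.__all__ = ["StructType"]
demo.StructType = object
demo.max = max  # not listed in __all__, so a wildcard import skips it
print(shadowed_builtins(demo))  # []
```

An empty result only rules out clashes with built-ins; names can still collide with other wildcard imports or with the user's own variables.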

zhengruifeng · 2023-08-21T03:03:40Z

cc @HyukjinKwon @allisonwang-db @grundprinzip

HyukjinKwon · 2023-08-21T04:24:47Z

Merged to master.

allisonwang-db · 2023-08-21T20:42:03Z

Thanks for the fix! Shall we also merge it, along with all the other docstring fixes and improvements, into Spark 3.5?

HyukjinKwon · 2023-08-22T01:23:50Z

Let's not. In principle, improvements shouldn't go to other branches, and the 3.5 release is coming soon.

Closes apache#42579 from zhengruifeng/docs_avoid_wildcard_imports.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>

init

560d39a

github-actions bot added the DOCS label Aug 21, 2023

HyukjinKwon approved these changes Aug 21, 2023

LuciferYang approved these changes Aug 21, 2023

HyukjinKwon closed this in 04024fd Aug 21, 2023

zhengruifeng deleted the docs_avoid_wildcard_imports branch August 21, 2023 04:39

zhengruifeng wants to merge 1 commit into apache:master from zhengruifeng:docs_avoid_wildcard_imports

4 participants
