[SPARK-44887][DOCS] Fix wildcard import from pyspark.sql.functions import * in Quick Start Examples #42579

Closed
zhengruifeng wants to merge 1 commit into apache:master from zhengruifeng:docs_avoid_wildcard_imports
Contributor

@zhengruifeng zhengruifeng commented Aug 21, 2023

What changes were proposed in this pull request?

Fix wildcard import from pyspark.sql.functions import * in https://spark.apache.org/docs/latest/quick-start.html

Why are the changes needed?

To follow the PEP 8 - Style Guide for Python Code:

Wildcard imports (from <module> import *) should be avoided, as they make it unclear which names are present in the namespace, confusing both readers and many automated tools. There is one defensible use case for a wildcard import, which is to republish an internal interface as part of a public API (for example, overwriting a pure Python implementation of an interface with the definitions from an optional accelerator module and exactly which definitions will be overwritten isn’t known in advance).
When republishing names this way, the guidelines below regarding public and internal interfaces still apply.

To avoid potential namespace conflicts, since several SQL functions share their names with Python built-in functions (e.g. min/max/sum/hash).
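
The namespace conflict described above can be reproduced without a Spark installation. The following is a minimal sketch, not the change made in this PR: `fake_sql_functions` is a hypothetical stand-in module (an assumption for illustration, not the real `pyspark.sql.functions`) that defines `sum` and `max`. It compares a wildcard import with a module-alias import.

```python
import builtins
import sys
import types

# Hypothetical stand-in for a module such as pyspark.sql.functions: it defines
# functions whose names collide with Python built-ins. Not the real PySpark API.
fake_functions = types.ModuleType("fake_sql_functions")


def _column_sum(col_name):
    """Return a placeholder 'column expression' instead of adding numbers."""
    return f"sum({col_name})"


def _column_max(col_name):
    """Return a placeholder 'column expression' instead of comparing values."""
    return f"max({col_name})"


fake_functions.sum = _column_sum
fake_functions.max = _column_max
sys.modules["fake_sql_functions"] = fake_functions

# Wildcard import into a fresh namespace: the built-ins are silently shadowed.
wildcard_ns = {}
exec("from fake_sql_functions import *", wildcard_ns)
shadowed = sorted(
    name
    for name in ("sum", "max", "min", "hash")
    if name in wildcard_ns and wildcard_ns[name] is not getattr(builtins, name)
)

# Module-alias import: built-ins stay intact and module functions are qualified.
alias_ns = {}
exec("import fake_sql_functions as sf", alias_ns)
builtin_total = eval("sum([1, 2, 3])", alias_ns)
qualified_expr = eval("sf.sum('numWords')", alias_ns)
```

With the alias form, `sum([1, 2, 3])` still calls the built-in, while the module's function is reached explicitly through the `sf.` prefix (the alias name is arbitrary).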

Does this PR introduce any user-facing change?

yes

How was this patch tested?

CI

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the DOCS label Aug 21, 2023
@zhengruifeng
Contributor Author

zhengruifeng commented Aug 21, 2023

There are two wildcard imports under ./docs:

(spark_dev_310) ➜  spark git:(master) ag -i 'import \*' docs
docs/sql-ref-datatypes.md
117:from pyspark.sql.types import *

docs/quick-start.md
133:>>> from pyspark.sql.functions import *
(spark_dev_310) ➜  spark git:(master) 
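
For readers without `ag`, the same search can be approximated in Python. This is a rough sketch under stated assumptions: the regular expression and the helper name `find_wildcard_imports` are illustrative, and the sample text is made up rather than read from the Spark repository.

```python
import re

# Matches "from <module> import *", optionally preceded by a REPL prompt (>>>).
WILDCARD_IMPORT = re.compile(r"^\s*(?:>>>\s*)?from\s+([\w.]+)\s+import\s+\*")


def find_wildcard_imports(text):
    """Return (line_number, module) pairs for each wildcard import in text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = WILDCARD_IMPORT.match(line)
        if match:
            hits.append((number, match.group(1)))
    return hits


# Illustrative input only; a real scan would read each file under docs/.
sample = "\n".join(
    [
        ">>> from pyspark.sql.functions import *",
        "from pyspark.sql.types import *",
        "from pyspark.sql import functions as sf",
    ]
)
hits = find_wildcard_imports(sample)
```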

As for the from pyspark.sql.types import * in https://spark.apache.org/docs/latest/sql-ref-datatypes.html#data-types, I don't think we need to touch it, since as far as I know it does not cause any namespace conflicts.
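
Whether a wildcard import of a given module would shadow Python built-ins can be checked mechanically. The sketch below is an approximation, and the helper name `wildcard_builtin_conflicts` is hypothetical. It is demonstrated on the standard-library `os` module, whose wildcard import is known to rebind `open`. The same function could be pointed at `pyspark.sql.types` or `pyspark.sql.functions` where PySpark is installed, but no result for those modules is claimed here.

```python
import builtins
import os


def wildcard_builtin_conflicts(module):
    """List names a wildcard import of module would bind over Python built-ins."""
    # A wildcard import honors __all__ when present; otherwise it takes every
    # public (non-underscore) name. dir() is used here as an approximation.
    exported = getattr(module, "__all__", None)
    if exported is None:
        exported = [name for name in dir(module) if not name.startswith("_")]
    builtin_names = {name for name in dir(builtins) if not name.startswith("_")}
    return sorted(set(exported) & builtin_names)


# Standard-library demonstration: "from os import *" replaces the built-in open.
conflicts = wildcard_builtin_conflicts(os)
```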

@zhengruifeng
Contributor Author

zhengruifeng commented Aug 21, 2023

@HyukjinKwon
Member

Merged to master.

@zhengruifeng zhengruifeng deleted the docs_avoid_wildcard_imports branch August 21, 2023 04:39
@allisonwang-db
Contributor

Thanks for the fix! Shall we also merge it and all other docstring fixes and improvements to Spark 3.5?

@HyukjinKwon
Member

Let's not. Strictly speaking, improvements shouldn't go to other branches, and the 3.5 release is coming soon.

valentinp17 pushed a commit to valentinp17/spark that referenced this pull request Aug 24, 2023
…mport *` in `Quick Start` Examples

### What changes were proposed in this pull request?
Fix wildcard import `from pyspark.sql.functions import *` in https://spark.apache.org/docs/latest/quick-start.html

### Why are the changes needed?
to follow the [PEP 8 - Style Guide for Python Code](https://peps.python.org/pep-0008/)

> Wildcard imports (from <module> import *) should be avoided, as they make it unclear which names are present in the namespace, confusing both readers and many automated tools. There is one defensible use case for a wildcard import, which is to republish an internal interface as part of a public API (for example, overwriting a pure Python implementation of an interface with the definitions from an optional accelerator module and exactly which definitions will be overwritten isn’t known in advance).
When republishing names this way, the guidelines below regarding public and internal interfaces still apply.

to avoid potential namespace conflicts, since several SQL functions share their names with Python built-in functions (e.g. `min`/`max`/`sum`/`hash`)

### Does this PR introduce _any_ user-facing change?
yes

### How was this patch tested?
CI

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#42579 from zhengruifeng/docs_avoid_wildcard_imports.

Authored-by: Ruifeng Zheng <ruifengz@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>