Change Documentation page "Python Package Management" url enclosed #34422

Closed
yetanotherlogonfail wants to merge 3 commits into master from branch-0.5

@yetanotherlogonfail

Proposed change to User Guide documentation page

"Python Package Management"
URL
https://spark.apache.org/docs/latest/api/python/user_guide/python_packaging.html

Reason: The paragraph does not make clear that the three listed options are alternatives
User-facing change: Yes

Change
Paragraph: Using PySpark Native Features

From:
PySpark allows to upload Python files (.py), zipped Python packages (.zip), and Egg files (.egg) to the executors by:

Setting the configuration setting spark.submit.pyFiles

Setting --py-files option in Spark scripts

Directly calling pyspark.SparkContext.addPyFile() in applications

This is a straightforward method to ship additional custom Python code to the cluster. You can just add individual files or zip whole packages and upload them. Using pyspark.SparkContext.addPyFile() allows to upload code even after having started your job.

However, it does not allow to add packages built as Wheels and therefore does not allow to include dependencies with native code.

TO:
PySpark allows to upload Python files (.py), zipped Python packages (.zip), and Egg files (.egg) to the executors by:

Setting the configuration setting spark.submit.pyFiles
OR
Setting --py-files option in Spark scripts
OR
Directly calling pyspark.SparkContext.addPyFile() in applications

This is a straightforward method to ship additional custom Python code to the cluster. You can just add individual files or zip whole packages and upload them. Using pyspark.SparkContext.addPyFile() allows to upload code even after having started your job.

However, it does not allow to add packages built as Wheels and therefore does not allow to include dependencies with native code.
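
To make the "one of the following" reading concrete, here is a minimal sketch of shipping a zipped pure-Python package. The package name `mypkg`, the file name `deps.zip`, and both helper functions are hypothetical and not part of the documentation page. The sketch assumes that a zip shipped this way is made importable in the same way as a zip placed on `sys.path`, which is why the local check at the end is a reasonable stand-in for what happens on the executors.

```python
import os
import sys
import tempfile
import zipfile


def build_zip_package(target_dir: str) -> str:
    """Create deps.zip holding a tiny, hypothetical pure-Python package."""
    zip_path = os.path.join(target_dir, "deps.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        # Pure Python only: native extensions cannot be loaded from a zip.
        zf.writestr("mypkg/__init__.py", "def double(x):\n    return 2 * x\n")
    return zip_path


def ship_with_add_py_file(zip_path: str) -> bool:
    """Use ONE of the three options: SparkContext.addPyFile().

    Returns False when PySpark is not installed, so the sketch stays
    importable without a Spark installation.
    """
    try:
        from pyspark import SparkContext
    except ImportError:
        return False
    sc = SparkContext.getOrCreate()
    sc.addPyFile(zip_path)  # may also be called after the job has started
    return True


if __name__ == "__main__":
    path = build_zip_package(tempfile.mkdtemp())
    # Local check: the archive is importable once it is on sys.path.
    sys.path.insert(0, path)
    import mypkg
    print(mypkg.double(21))
```

The other two options replace the `addPyFile()` call rather than complement it: passing `--py-files deps.zip` to `spark-submit`, or setting `spark.submit.pyFiles` to the archive path in the Spark configuration.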

tomdz and others added 3 commits October 21, 2012 00:03
Conflicts:

	core/src/main/scala/spark/NewHadoopRDD.scala
	core/src/main/scala/spark/PairRDDFunctions.scala
	project/SparkBuild.scala
@AmplabJenkins

Can one of the admins verify this patch?

@yetanotherlogonfail
Author

yetanotherlogonfail commented Oct 28, 2021 via email

HyukjinKwon added a commit that referenced this pull request Oct 29, 2021
…ive Features"

### What changes were proposed in this pull request?

This PR proposes to fix:

```diff
- to the executors by:
+ to the executors by one of the following:
```

to clarify that any one of the listed options is sufficient (rather than all of them having to be done together).

### Why are the changes needed?

To prevent confusion.

### Does this PR introduce _any_ user-facing change?

Yes, this is a user-facing documentation change.

### How was this patch tested?

Manually double checked.

Closes #34422

Closes #34432 from HyukjinKwon/SPARK-37134.

Authored-by: Hyukjin Kwon <gurwls223@apache.org>
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
(cherry picked from commit f258d30)
Signed-off-by: Hyukjin Kwon <gurwls223@apache.org>
sunchao pushed a commit to sunchao/spark that referenced this pull request Dec 8, 2021
…ive Features"
(cherry picked from commit b58db1f)
catalinii pushed a commit to lyft/spark that referenced this pull request Feb 22, 2022
…ive Features"
catalinii pushed a commit to lyft/spark that referenced this pull request Mar 4, 2022
…ive Features"