docs: set --master to local[2] for the spark-shell/pyspark/spark-sql in quickstart examples #14187

Merged
xushiyan merged 1 commit into apache:asf-site from rangareddy:br_add_master_as_local
Nov 3, 2025


@rangareddy rangareddy commented Oct 30, 2025

Describe the issue this Pull Request addresses

This PR standardizes the local development experience and prevents accidental connection to unintended clusters when using interactive Spark shells.

Currently, if spark-shell, pyspark, or spark-sql is launched without an explicit --master flag, Spark falls back to whatever spark.master is configured in the environment (for example in spark-defaults.conf) and otherwise to local[*], so the session may attach to a shared cluster or use every local core. For common development workflows, it is safer and more efficient to guarantee a local, isolated environment.
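
To make the fallback described above concrete, here is a minimal sketch of how the effective master ends up being chosen. It assumes the order: an explicit --master flag first, then a configured spark.master property, then Spark's built-in local[*]. The function name `resolve_master`, its arguments, and the `spark://shared-host:7077` URL are hypothetical and exist only for illustration; the snippet does not invoke Spark.

```shell
# Hypothetical illustration only: mimics the order in which a master is picked.
# $1 = value passed via --master (may be empty)
# $2 = spark.master found in the configuration (may be empty)
resolve_master() {
  flag_value="$1"
  conf_value="$2"
  if [ -n "$flag_value" ]; then
    printf '%s\n' "$flag_value"   # an explicit flag wins
  elif [ -n "$conf_value" ]; then
    printf '%s\n' "$conf_value"   # otherwise the configured default is used
  else
    printf '%s\n' 'local[*]'      # otherwise Spark's built-in fallback
  fi
}

# Without a flag, a configured shared cluster is picked up silently.
resolve_master "" "spark://shared-host:7077"
# With the flag, the session stays local regardless of the configuration.
resolve_master "local[2]" "spark://shared-host:7077"
```

Passing the flag explicitly in every example removes the dependency on whatever the reader's environment happens to contain.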

Summary and Changelog

Summary:
Updates the quickstart examples so that the interactive Spark shell commands (spark-shell, pyspark, spark-sql) are launched with a dedicated local master with two threads (local[2]).

Changelog:

  • Modified the spark-shell, pyspark, and spark-sql commands in the quickstart examples to include --master local[2].
  • Users can still override this setting by explicitly passing a different --master flag.
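
The two changelog items can be sketched as follows. This is an illustration rather than the exact text added to the quickstart pages: `quickstart_cmd` is a hypothetical helper that only prints the command line it would run, and `local[4]` is an arbitrary override value chosen for the example.

```shell
# Hypothetical helper (not part of Spark or of this PR): prints the launch
# command for one of the interactive tools, defaulting the master to local[2].
quickstart_cmd() {
  tool="$1"
  master="${2:-local[2]}"   # an optional second argument overrides the default
  # The master is single-quoted in the output so that shells such as zsh
  # do not try to interpret [2] as a glob pattern.
  printf "%s --master '%s'\n" "$tool" "$master"
}

quickstart_cmd spark-shell             # prints: spark-shell --master 'local[2]'
quickstart_cmd pyspark                 # prints: pyspark --master 'local[2]'
quickstart_cmd spark-sql               # prints: spark-sql --master 'local[2]'
quickstart_cmd spark-shell 'local[4]'  # prints: spark-shell --master 'local[4]'
```

Quoting the value is a defensive choice: an unquoted `local[2]` usually works in bash, but it is a glob pattern and can fail or expand unexpectedly in other shells.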

Impact

The impact is minor and primarily affects the development experience:

  • User-facing Change: Users who follow the quickstart commands for spark-shell (and the related interactive tools) will now launch a 2-thread local instance, which keeps local testing isolated from any shared cluster.

  • Performance Impact: None on actual cluster workloads. Local performance for interactive tasks may be marginally improved due to predictable resource allocation.

Risk Level

low

This change only affects non-cluster, interactive shell execution. It does not impact production configurations, cluster deployments, or submitted application jobs. The risk is limited to local developer environments.

Documentation Update

none

Contributor's checklist

  • Read through contributor's guide
  • Enough context is provided in the sections above
  • Adequate tests were added if applicable

@github-actions github-actions bot added the docs and size:L (PR with lines of changes in (300, 1000]) labels Oct 30, 2025
@rangareddy rangareddy requested a review from xushiyan October 30, 2025 18:03
@rangareddy rangareddy self-assigned this Oct 30, 2025
@xushiyan xushiyan merged commit 698c319 into apache:asf-site Nov 3, 2025
1 check passed
@xushiyan xushiyan changed the title from "docs: Set spark.master=local[2] for spark-shell, pyspark, and spark-sql" to "docs: set --master to local[2] for the spark-shell/pyspark/spark-sql in quickstart examples" Nov 3, 2025
@rangareddy rangareddy deleted the br_add_master_as_local branch November 13, 2025 08:29