
PySparkProcessor - Possibility to choose different instance types for the driver node and the worker nodes #3616

@HarryPommier

Description

Describe the feature you'd like
It would be nice to be able to choose different instance types for the driver node and the worker nodes when using the PySparkProcessor.

How would this feature be used? Please describe.

from sagemaker.spark.processing import PySparkProcessor

pyspark_processor = PySparkProcessor(
    base_job_name=...,
    framework_version=...,
    role=...,
    # Proposed new parameters: a larger instance for the Spark driver
    # and smaller instances for the workers.
    driver_instance_type="ml.m5.4xlarge",
    worker_instance_type="ml.m5.large",
    instance_count=...,
    sagemaker_session=pipeline_session,
    max_runtime_in_seconds=...,
)

Describe alternatives you've considered
It is possible to choose a high-memory instance type for all nodes, but this can be unnecessarily costly for the user.
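
For reference, a minimal sketch of this workaround with the existing API, assuming a SageMaker pipeline session; the framework version, job name, and instance count are illustrative. The single instance_type parameter applies to the driver and every worker, so the whole cluster ends up sized for the driver's peak memory.

from sagemaker.spark.processing import PySparkProcessor

# Current workaround: one instance_type is shared by the driver and all workers.
pyspark_processor = PySparkProcessor(
    base_job_name="spark-workaround",     # illustrative job name
    framework_version="3.1",              # illustrative Spark version
    role=role,
    instance_type="ml.m5.4xlarge",        # sized for the driver's peak memory
    instance_count=4,                     # workers are forced onto the same type
    sagemaker_session=pipeline_session,
)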

Additional context
Some PySpark operations (e.g. .toPandas()) are memory-intensive on the driver node.
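
To make this concrete, here is a minimal PySpark sketch (the S3 path is hypothetical): .toPandas() collects the entire distributed DataFrame into the driver process, so the driver needs enough memory for the whole dataset while the workers sit mostly idle.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The DataFrame is partitioned across the worker nodes while it stays distributed.
df = spark.read.parquet("s3://my-bucket/large-dataset/")  # hypothetical path

# .toPandas() pulls every row back into the driver, so the driver's memory,
# not the workers', becomes the bottleneck.
pdf = df.toPandas()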
