Along the lines of "Building your own algorithm container": is it possible to run Spark code entirely (and in a distributed fashion) on SageMaker? What I gather from the documentation is that I'm supposed to do the ETL in my Spark cluster and then, when fitting the data to a model, use a sagemaker_pyspark estimator that creates a SageMaker training job, serializing the DataFrame to S3 in protobuf format and training on a separate cluster of SageMaker instances.
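If I understand the flow correctly, it would look roughly like the sketch below (the role ARN, instance types, K, feature dimension, and the `training_df` DataFrame produced by my ETL are all placeholders):

```python
from sagemaker_pyspark import IAMRole
from sagemaker_pyspark.algorithms import KMeansSageMakerEstimator

# Placeholder role ARN and instance types -- adjust to the actual account/setup
estimator = KMeansSageMakerEstimator(
    sagemakerRole=IAMRole("arn:aws:iam::123456789012:role/SageMakerRole"),
    trainingInstanceType="ml.m4.xlarge",
    trainingInstanceCount=2,
    endpointInstanceType="ml.m4.xlarge",
    endpointInitialInstanceCount=1,
)
estimator.setK(10)
estimator.setFeatureDim(784)

# fit() uploads the DataFrame to S3 in recordIO-protobuf format and then
# launches a SageMaker training job on a separate instance cluster
model = estimator.fit(training_df)
```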
The question is: if I already have my DataFrame loaded in my distributed cluster, why would I want to use SageMaker at all? I might as well use Spark ML, which has broader algorithm support and avoids spinning up an additional cluster. Maybe I've got the whole thing wrong...
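For contrast, the Spark ML equivalent (again assuming the same hypothetical `training_df` with a "features" column) trains in place on the cluster where the data already lives, with no S3 round trip and no extra instances:

```python
from pyspark.ml.clustering import KMeans

# Trains directly on the existing Spark executors
kmeans = KMeans(k=10, featuresCol="features")
spark_model = kmeans.fit(training_df)
```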