title	titleSuffix	description	author	ms.author	ms.reviewer	ms.date	ms.service	ms.subservice	ms.topic
Use Spark Machine Learning	SQL Server Big Data Clusters	Introducing Spark Machine Learning on SQL Server Big Data Clusters.	HugoMSFT	hudequei	wiassaf	10/05/2021	sql	machine-learning-bdc	conceptual

Introducing Spark Machine Learning on SQL Server Big Data Clusters

[!INCLUDESQL Server 2019]

[!INCLUDEbig-data-clusters-banner-retirement]

This article explains how to effectively use Spark for Machine Learning on [!INCLUDEbig-data-clusters-nover].

Spark Machine Learning in SQL Server Big Data Clusters

SQL Server Big Data Clusters enables machine learning scenarios and solutions using two technology stacks: SQL Server Machine Learning Services and Apache Spark ML.

To better understand when to use each technology stack, refer to the Machine Learning guide for SQL Server Big Data Clusters. This article covers Apache Spark ML.

For machine learning scenarios based on big data, hosting the data in HDFS and using Apache Spark ML capabilities is generally the more cost-effective, scalable, and powerful approach. This article is far from an exhaustive list of what can be achieved with Spark Machine Learning; for a complete list of features, see Spark MLlib.

The next section provides a curated list of scenarios and references for Spark in SQL Server Big Data Clusters.

Building blocks for Spark Machine Learning on SQL Server Big Data Clusters

| Learn | Contents | Link |
|---|---|---|
| SQL Server Big Data Clusters runtime for Apache Spark | Shows what's included with each release | SQL Server Big Data Clusters runtime for Apache Spark Guide |
| The Storage Pool | How to store and use HDFS + Spark together to unlock data for machine learning | [Introducing the storage pool in [!INCLUDEbig-data-clusters-2019]](concept-storage-pool.md) |
| Use notebook-based experiences and your tools of choice | Connect to the Spark-Livy endpoint using your tools of choice | [Submit Spark jobs on [!INCLUDEbig-data-clusters-2019] in Azure Data Studio](spark-submit-job.md)<br>Submit Spark jobs on SQL Server big data cluster in Visual Studio Code<br>Use sparklyr in SQL Server big data cluster |
| How to install extra packages | If a package isn't provided out of the box, install it | Spark library management |
| How to troubleshoot | In case something breaks | Troubleshoot a `pyspark` notebook<br>[Debug and Diagnose Spark Applications on [!INCLUDEbig-data-clusters-2019] in Spark History Server](spark-history-server.md) |
| How to submit machine learning batch jobs | Run ML training and batch scoring from the command line | Submit Spark jobs by using command-line tools |
| How to quickly move data between SQL Server and Spark | Make SQL Server a source and/or destination for your Spark ML scenarios. Usage of HDFS is not mandatory | Use the Apache Spark Connector for SQL Server and Azure SQL |
| Spark model operationalization | After training, operationalize using MLeap | [Create, export, and score Spark machine learning models on [!INCLUDEbig-data-clusters-2019]](spark-create-machine-learning-model.md) |
| Data wrangling | Along with Spark's powerful data wrangling capabilities, we ship PROSE | Data Wrangling using PROSE Code Accelerator |
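
Several of the building blocks above (the Spark-Livy endpoint and batch job submission) come down to sending a job description to Livy. As a rough illustration only, the following Python sketch builds the JSON body that Apache Livy's `POST /batches` API accepts. The helper name `build_livy_batch`, the HDFS paths, and the job name are hypothetical and are not taken from this article. The field names (`file`, `args`, `name`, `executorMemory`, `numExecutors`, `conf`) follow the Apache Livy REST API.

```python
import json


def build_livy_batch(file, args=None, name=None,
                     executor_memory=None, num_executors=None, conf=None):
    """Build the JSON body for a Livy `POST /batches` request.

    Only fields that were explicitly provided are included, so Livy
    falls back to its own defaults for everything else.
    """
    payload = {"file": file}
    if args:
        payload["args"] = [str(a) for a in args]
    if name:
        payload["name"] = name
    if executor_memory:
        payload["executorMemory"] = executor_memory
    if num_executors is not None:
        payload["numExecutors"] = int(num_executors)
    if conf:
        payload["conf"] = dict(conf)
    return payload


# Hypothetical training job: the script and data paths are placeholders.
payload = build_livy_batch(
    file="hdfs:///apps/ml/train_model.py",
    args=["--input", "hdfs:///data/training", "--output", "hdfs:///models/demo"],
    name="train-model-demo",
    executor_memory="4g",
    num_executors=2,
)

body = json.dumps(payload, indent=2)
print(body)
```

The resulting `body` would be sent with `Content-Type: application/json` to the `/batches` path of the cluster's Livy endpoint. The endpoint URL and authentication depend on the deployment and are intentionally left out of this sketch; the command-line tools linked in the table handle both for you.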

Next steps

For more information, see [Introducing [!INCLUDEbig-data-clusters-nover]](big-data-cluster-overview.md).
