title | titleSuffix | description | author | ms.author | ms.reviewer | ms.date | ms.service | ms.subservice | ms.topic |
---|---|---|---|---|---|---|---|---|---|
Machine Learning on SQL Server Big Data Clusters | SQL Server Big Data Clusters | Machine Learning guide for SQL Server Big Data Clusters. | HugoMSFT | hudequei | wiassaf | 10/05/2021 | sql | machine-learning-bdc | conceptual |
[!INCLUDESQL Server 2019]
This article explains how to use [!INCLUDEbig-data-clusters-nover] for machine learning scenarios.
[!INCLUDEbig-data-clusters-banner-retirement]
[!INCLUDEbig-data-clusters-nover] enable machine learning scenarios and solutions through two technology stacks: SQL Server Machine Learning Services and Apache Spark ML.
[!INCLUDEbig-data-clusters-nover] offer machine learning capabilities inside the SQL Server engine through the established SQL Server Machine Learning Services technology stack, which enables high-performance, in-database machine learning inference and scoring.
For machine learning scenarios based on big data, hosting the data in HDFS and using Apache Spark ML is the more cost-effective, scalable, and powerful option.
These machine learning capabilities support applications and solutions such as fraud detection, forecasting, churn analysis, and general classification and regression tasks. It is important, however, to choose the technology that best fits each scenario; the following table compares the two stacks.
Aspect | SQL Server Machine Learning Services | Apache Spark ML |
---|---|---|
Data placement | Leverages tabular data locality on SQL Server. Premium data tier. | Scalable big data tier using HDFS; supports unstructured, semi-structured, and structured data. |
Best for | Low-latency inference and scoring scenarios | 1. Distributed batch training and scoring of machine learning models on top of big data. 2. ETL sinks and large-scale data preparation and featurization for ML. |
Feeds | ML-powered BI dashboards, reports, and applications that require low latency. | Batch-scored data may be promoted to SQL Server to drive ML-powered scenarios. |
Latency | Low latency required | Higher latency acceptable |
Read more | Run Python and R scripts with Machine Learning Services on SQL Server Big Data Clusters | Introducing Spark Machine Learning on SQL Server Big Data Clusters |
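The comparison above can be read as a simple decision rule. The following Python sketch is only an illustration of that reasoning, not part of any SQL Server or Spark API: the `Scenario` class, its field names, and the `recommend_stack` function are hypothetical, and real workloads will usually need to weigh more factors than these three.

```python
from dataclasses import dataclass

SQL_ML = "SQL Server Machine Learning Services"
SPARK_ML = "Apache Spark ML"


@dataclass(frozen=True)
class Scenario:
    """Hypothetical workload description; field names are illustrative only."""
    needs_low_latency: bool     # scores feed dashboards, reports, or applications
    data_in_hdfs: bool          # data is hosted in HDFS rather than SQL Server tables
    distributed_training: bool  # training or batch scoring must scale out over big data


def recommend_stack(s: Scenario) -> str:
    """Map a scenario to the stack suggested by the comparison table."""
    # Big data in HDFS or distributed batch work points to Spark ML.
    if s.data_in_hdfs or s.distributed_training:
        if s.needs_low_latency:
            # Batch-scored results can be promoted to SQL Server for low-latency use.
            return f"{SPARK_ML}, then promote batch-scored data to SQL Server"
        return SPARK_ML
    # Tabular data already in SQL Server: keep inference and scoring in-database.
    return SQL_ML


if __name__ == "__main__":
    dashboard = Scenario(needs_low_latency=True, data_in_hdfs=False,
                         distributed_training=False)
    print(recommend_stack(dashboard))
```

For example, a dashboard that scores tabular data already stored in SQL Server maps to SQL Server Machine Learning Services, while a nightly batch job over files in HDFS maps to Apache Spark ML.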
For more information, see [Introducing [!INCLUDEbig-data-clusters-nover]](big-data-cluster-overview.md).