title | titleSuffix | description | author | ms.author | ms.date | ms.service | ms.subservice | ms.topic |
---|---|---|---|---|---|---|---|---|
Compute pools in SQL Server Big Data Clusters | SQL Server Big Data Clusters | This article describes the compute pool in a SQL Server 2019 big data cluster. | WilliamDAssafMSFT | wiassaf | 10/15/2020 | sql | big-data-cluster | conceptual |
Introducing compute pools in SQL Server Big Data Clusters
Applies to: SQL Server 2019 (15.x)
[!INCLUDEbig-data-clusters-banner-retirement]
This article describes the role of SQL Server compute pools in a SQL Server big data cluster. Compute pools provide scale-out computational resources for a SQL Server big data cluster. They are used to offload computational work, or intermediate result sets, from the SQL Server master instance. The following sections describe the architecture, functionality and usage scenarios of a compute pool.
A compute pool is made of one or more compute pods running in Kubernetes. The automated creation and management of these pods is coordinated by the SQL Server master instance. Each pod contains a set of base services and an instance of the SQL Server database engine.
A compute pool can act as a PolyBase scale-out group for distributed queries over different external data sources such as SQL Server, Oracle, MongoDB, Teradata and HDFS. By using compute pods in Kubernetes, a SQL Server big data cluster can automate creating and configuring compute pods for PolyBase scale-out groups.
Scenarios where the compute pool is used include:
- When queries submitted to the master instance use one or more tables located in the storage pool.
- When queries submitted to the master instance use one or more tables with round-robin distribution located in the data pool.
- When queries submitted to the master instance use partitioned tables with external data sources of SQL Server, Oracle, MongoDB, or Teradata. For this scenario, the query must specify the hint OPTION (FORCE SCALEOUTEXECUTION).
- When queries submitted to the master instance use one or more tables located in HDFS tiering.
Scenarios where the compute pool is not used include:
- When queries submitted to the master instance use one or more tables in an external Hadoop HDFS cluster.
- When queries submitted to the master instance use one or more tables in Azure Blob Storage.
- When queries submitted to the master instance use non-partitioned tables with external data sources of SQL Server, Oracle, MongoDB, or Teradata.
- When the query specifies the hint OPTION (DISABLE SCALEOUTEXECUTION).
- When queries submitted to the master instance apply to databases located on the master instance.
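
The two lists above can be read as a small decision table. The following Python sketch is only an illustrative model of those rules, not the logic the SQL Server query optimizer actually runs. The names `Source`, `Query`, and `uses_compute_pool` are hypothetical, and the hint is represented by the plain strings `"FORCE"` and `"DISABLE"` as stand-ins for OPTION (FORCE SCALEOUTEXECUTION) and OPTION (DISABLE SCALEOUTEXECUTION). The sketch also assumes that the DISABLE hint overrides every other rule and that a FORCE hint has no effect on sources from the second list; the article does not state either point explicitly.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class Source(Enum):
    """Where the tables referenced by a query live (hypothetical labels)."""
    STORAGE_POOL = auto()
    DATA_POOL_ROUND_ROBIN = auto()
    HDFS_TIERING = auto()
    EXTERNAL_RDBMS = auto()      # SQL Server, Oracle, MongoDB, or Teradata
    EXTERNAL_HADOOP = auto()     # external Hadoop HDFS cluster
    AZURE_BLOB_STORAGE = auto()
    MASTER_INSTANCE = auto()     # databases on the master instance itself


@dataclass(frozen=True)
class Query:
    source: Source
    partitioned: bool = False
    hint: Optional[str] = None   # "FORCE", "DISABLE", or None (assumed encoding)


def uses_compute_pool(query: Query) -> bool:
    """Return True if, per the lists above, the compute pool would be used."""
    # Assumption: the DISABLE hint takes precedence over everything else.
    if query.hint == "DISABLE":
        return False
    # Storage pool, round-robin data pool tables, and HDFS tiering.
    if query.source in (
        Source.STORAGE_POOL,
        Source.DATA_POOL_ROUND_ROBIN,
        Source.HDFS_TIERING,
    ):
        return True
    # External relational sources qualify only when the tables are
    # partitioned and the FORCE hint is specified.
    if query.source is Source.EXTERNAL_RDBMS:
        return query.partitioned and query.hint == "FORCE"
    # External Hadoop, Azure Blob Storage, and master-local databases.
    return False
```

For example, `uses_compute_pool(Query(Source.EXTERNAL_RDBMS, partitioned=True, hint="FORCE"))` evaluates to `True`, while the same query without the hint evaluates to `False`.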
To learn more about SQL Server Big Data Clusters, see the following resources:
- [Introducing SQL Server Big Data Clusters](big-data-cluster-overview.md)
- [Workshop: Microsoft SQL Server Big Data Clusters Architecture](https://github.com/microsoft/sqlworkshops-bdc)