Commit d5bb3ca

Merge pull request #14878 from jmart1428/edit-orchestration
Pipeline:[Canopy] Updated 'Choose a data pipeline orchestration technology in Azure
2 parents 42e20ac + 7373d2c commit d5bb3ca

1 file changed

+52
-42
lines changed

Lines changed: 52 additions & 42 deletions
Original file line number | Diff line number | Diff line change
@@ -1,5 +1,5 @@
11
---
2-
title: Choose a data pipeline orchestration technology
2+
title: Choose a Data Pipeline Orchestration Technology
33
description: Choose an Azure data pipeline orchestration technology to automate pipeline orchestration, control flow, and data movement workflows.
44
author: claytonsiemens77
55
ms.author: pnp
@@ -12,81 +12,91 @@ ms.subservice: architecture-guide
1212

1313
# Choose a data pipeline orchestration technology in Azure
1414

15-
Most big data solutions consist of repeated data processing operations, encapsulated in workflows. A pipeline orchestrator is a tool that helps to automate these workflows. An orchestrator can schedule jobs, execute workflows, and coordinate dependencies among tasks.
15+
Most big data solutions consist of repeated data processing operations, encapsulated in workflows. A pipeline orchestrator helps automate these workflows. It can schedule jobs, run workflows, and coordinate dependencies among tasks.
1616

17-
## What are your options for data pipeline orchestration?
17+
## Options for data pipeline orchestration
1818

19-
In Azure, the following services and tools will meet the core requirements for pipeline orchestration, control flow, and data movement:
19+
In Azure, the following services and tools meet the core requirements for pipeline orchestration, control flow, and data movement:
2020

21-
- [Azure Data Factory](/azure/data-factory/)
22-
- [Oozie on HDInsight](/azure/hdinsight/hdinsight-use-oozie-linux-mac)
21+
- [Azure Data Factory](/azure/data-factory)
22+
- [Apache Oozie on Azure HDInsight](/azure/hdinsight/hdinsight-use-oozie-linux-mac)
2323
- [SQL Server Integration Services (SSIS)](/sql/integration-services/sql-server-integration-services)
24+
- [Fabric Data Factory](/fabric/data-factory/data-factory-overview)
2425

25-
These services and tools can be used independently from one another, or used together to create a hybrid solution. For example, the Integration Runtime (IR) in Azure Data Factory V2 can natively execute SSIS packages in a managed Azure compute environment. While there is some overlap in functionality between these services, there are a few key differences.
26+
You can use these services and tools independently or combine them to create a hybrid solution. For example, the integration runtime (IR) in Data Factory V2 can natively run SSIS packages in a managed Azure compute environment. These services share some functionality, but they have a few key differences.
2627

27-
## Key Selection Criteria
28+
## Key selection criteria
2829

29-
To narrow the choices, start by answering these questions:
30+
To narrow your options, consider the following factors:
3031

31-
- Do you need big data capabilities for moving and transforming your data? Usually this means multi-gigabytes to terabytes of data. If yes, then narrow your options to those that best suited for big data.
32+
- Determine whether you need big data capabilities to move and transform your data. These capabilities typically use multiple gigabytes (GBs) to terabytes (TBs) of data. If you require these capabilities, choose a service designed for big data.
3233

33-
- Do you require a managed service that can operate at scale? If yes, select one of the cloud-based services that aren't limited by your local processing power.
34+
- Identify whether you need a managed service that can operate at scale. If you do, choose a cloud-based service that doesn't depend on your local processing power.
3435

35-
- Are some of your data sources located on-premises? If yes, look for options that can work with both cloud and on-premises data sources or destinations.
36+
- Check whether you have data sources located on-premises. If you do, choose a service that supports both cloud and on-premises data sources or destinations.
3637

37-
- Is your source data stored in Blob storage on an HDFS filesystem? If so, choose an option that supports Hive queries.
38+
- Check whether you store source data in blob storage on a Hadoop Distributed File System (HDFS). If you do, choose a service that supports Hive queries.
39+
40+
- Determine whether you need advanced orchestration for complex extract, transform, and load (ETL) workflows across multiple data sources. If you do, choose Fabric Data Factory because it provides a set of connectors, pipeline orchestration, and integration with both on-premises and cloud environments. It's ideal for enterprise-scale data movement and transformation.
3841

3942
## Capability matrix
4043

4144
The following tables summarize the key differences in capabilities.
4245

4346
### General capabilities
4447

45-
| Capability | Azure Data Factory | SQL Server Integration Services (SSIS) | Oozie on HDInsight
46-
| --- | --- | --- | --- |
47-
| Managed | Yes | No | Yes |
48-
| Cloud-based | Yes | No (local) | Yes |
49-
| Prerequisite | Azure Subscription | SQL Server | Azure Subscription, HDInsight cluster |
50-
| Management tools | Azure Portal, PowerShell, CLI, .NET SDK | SSMS, PowerShell | Bash shell, Oozie REST API, Oozie web UI |
51-
| Pricing | Pay per usage | Licensing / pay for features | No additional charge on top of running the HDInsight cluster |
48+
| Capability | Data Factory | SSIS | Oozie on HDInsight | Fabric Data Factory |
49+
| --- | --- | --- | --- | --- |
50+
| Managed | Yes | No | Yes | Yes |
51+
| Cloud-based | Yes | No (local) | Yes | Yes |
52+
| Prerequisite | Azure subscription | SQL Server | Azure subscription, HDInsight cluster | Fabric-enabled workspace |
53+
| Management tools | Azure portal, PowerShell, CLI, .NET SDK | SQL Server Management Studio (SSMS), PowerShell | Bash shell, Oozie REST API, Oozie web user interface (UI) | Copy job, mirroring, pipeline activities, Dataflow Gen2 |
54+
| Pricing | Pay per usage | Licensing, extra features add cost | Included with HDInsight cluster | Included with Fabric capacity |
5255

5356
### Pipeline capabilities
5457

55-
| Capability | Azure Data Factory | SQL Server Integration Services (SSIS) | Oozie on HDInsight
56-
| --- | --- | --- | --- |
57-
| Copy data | Yes | Yes | Yes |
58-
| Custom transformations | Yes | Yes | Yes (MapReduce, Pig, and Hive jobs) |
59-
| Azure Machine Learning scoring | Yes | Yes (with scripting) | No |
60-
| HDInsight On-Demand | Yes | No | No |
61-
| Azure Batch | Yes | No | No |
62-
| Pig, Hive, MapReduce | Yes | No | Yes |
63-
| Spark | Yes | No | No |
64-
| Execute SSIS Package | Yes | Yes | No |
65-
| Control flow | Yes | Yes | Yes |
66-
| Access on-premises data | Yes | Yes | No |
58+
| Capability | Data Factory | SSIS | Oozie on HDInsight | Fabric Data Factory |
59+
| --- | --- | --- | --- | --- |
60+
| Copy data | Yes | Yes | Yes | Yes |
61+
| Custom transformations | Yes | Yes | Yes (MapReduce, Pig, and Hive jobs) | Yes |
62+
| Azure Machine Learning scoring | Yes | Yes (with scripting) | No | Yes (via integration) |
63+
| HDInsight on-demand | Yes | No | No | No |
64+
| Azure Batch | Yes | No | No | Yes |
65+
| Pig, Hive, and MapReduce | Yes | No | Yes | Yes |
66+
| Apache Spark | Yes | No | No | Yes |
67+
| Run SSIS packages | Yes | Yes | No | Yes |
68+
| Control flow | Yes | Yes | Yes | Yes |
69+
| Access on-premises data | Yes | Yes | No | Yes |
6770

6871
### Scalability capabilities
6972

70-
| Capability | Azure Data Factory | SQL Server Integration Services (SSIS) | Oozie on HDInsight
71-
| --- | --- | --- | --- |
72-
| Scale up | Yes | No | No |
73-
| Scale out | Yes | No | Yes (by adding worker nodes to cluster) |
74-
| Optimized for big data | Yes | No | Yes |
73+
| Capability | Data Factory | SSIS | Oozie on HDInsight | Fabric Data Factory |
74+
| --- | --- | --- | --- | --- |
75+
| Scale up | Yes | No | No | Yes |
76+
| Scale out | Yes | No | Yes (by adding worker nodes to cluster) | Yes |
77+
| Optimized for big data | Yes | No | Yes | Yes |
78+
79+
## Alternative approach
80+
81+
In addition to traditional batch-based orchestration, your platform can also use real-time intelligence through the [Fabric Real-Time Intelligence feature](/fabric/real-time-intelligence/event-streams/create-manage-an-eventstream). This approach enables continuous streaming data ingestion, in-flight transformation, and event-driven workflows so that you can respond instantly as data arrives. It supports high-value scenarios such as Internet of Things (IoT) telemetry processing, fraud detection, and operational monitoring.
7582

7683
## Contributors
7784

78-
*This article is maintained by Microsoft. It was originally written by the following contributors.*
85+
*Microsoft maintains this article. The following contributors wrote this article.*
7986

8087
Principal author:
8188

8289
- [Zoiner Tejada](https://www.linkedin.com/in/zoinertejada) | CEO and Architect
8390

91+
*To see nonpublic LinkedIn profiles, sign in to LinkedIn.*
92+
8493
## Next steps
8594

86-
- [Pipelines and activities in Azure Data Factory and Azure Synapse Analytics](/azure/data-factory/concepts-pipelines-activities)
87-
- [Provision the Azure-SSIS integration runtime in Azure Data Factory](/azure/data-factory/tutorial-deploy-ssis-packages-azure)
88-
- [Oozie on HDInsight](/azure/hdinsight/hdinsight-use-oozie-linux-mac)
95+
- [Pipelines and activities in Fabric Data Factory](/fabric/data-factory/data-factory-overview)
96+
- [Provision the Azure-SSIS integration runtime in Data Factory](/azure/data-factory/tutorial-deploy-ssis-packages-azure)
97+
- [Use Oozie to run a workflow on HDInsight](/azure/hdinsight/hdinsight-use-oozie-linux-mac)
98+
- [Medallion architecture in Fabric Real-Time Intelligence](https://blog.fabric.microsoft.com/blog/21597)
8999

90-
## Related resources
100+
## Related resource
91101

92102
- [DataOps for the modern data warehouse](../../databases/architecture/dataops-mdw.yml)
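
The pipeline and scalability tables on the added (`+`) side of this diff can also be read as a lookup structure. The following is a minimal sketch, not part of the commit, that transcribes those rows into Python and filters the services by required capabilities. The names `SERVICES`, `CAPABILITIES`, and `services_supporting` are hypothetical, and each cell is collapsed to a boolean, so qualifiers such as "(with scripting)" and "(via integration)" are lost.

```python
# Hypothetical transcription of the "+" rows of the pipeline and scalability
# tables in this diff. True means the table cell starts with "Yes".
SERVICES = ("Data Factory", "SSIS", "Oozie on HDInsight", "Fabric Data Factory")

CAPABILITIES = {
    "Copy data": (True, True, True, True),
    "Custom transformations": (True, True, True, True),
    "Azure Machine Learning scoring": (True, True, False, True),
    "HDInsight on-demand": (True, False, False, False),
    "Azure Batch": (True, False, False, True),
    "Pig, Hive, and MapReduce": (True, False, True, True),
    "Apache Spark": (True, False, False, True),
    "Run SSIS packages": (True, True, False, True),
    "Control flow": (True, True, True, True),
    "Access on-premises data": (True, True, False, True),
    "Scale up": (True, False, False, True),
    "Scale out": (True, False, True, True),
    "Optimized for big data": (True, False, True, True),
}


def services_supporting(required):
    """Return the services marked "Yes" for every capability in `required`."""
    unknown = [name for name in required if name not in CAPABILITIES]
    if unknown:
        raise KeyError(f"Unknown capability: {unknown}")
    return [
        service
        for index, service in enumerate(SERVICES)
        if all(CAPABILITIES[name][index] for name in required)
    ]


if __name__ == "__main__":
    # Example: Hive jobs plus on-premises data access.
    print(services_supporting(["Pig, Hive, and MapReduce", "Access on-premises data"]))
```

A lookup like this only narrows the candidates. The selection criteria in the article, such as whether a managed service is required, still need to be weighed separately.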
