# Choose a data pipeline orchestration technology in Azure
14
14
15
-
Most big data solutions consist of repeated data processing operations, encapsulated in workflows. A pipeline orchestrator is a tool that helps to automate these workflows. An orchestrator can schedule jobs, execute workflows, and coordinate dependencies among tasks.
15
+
Most big data solutions consist of repeated data processing operations, encapsulated in workflows. A pipeline orchestrator helps automate these workflows. It can schedule jobs, run workflows, and coordinate dependencies among tasks.
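To make the idea of coordinating dependencies concrete, here is a minimal sketch in plain Python. It isn't tied to any of the Azure services discussed in this article. The task names (`extract`, `transform`, `load`) and the `run_workflow` helper are hypothetical and exist only for illustration. The sketch uses the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later) to run each task only after the tasks it depends on.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies):
    """Run every task after the tasks it depends on and return the run order.

    tasks:        hypothetical mapping of task name -> zero-argument callable
    dependencies: mapping of task name -> set of task names it depends on
    """
    order = []
    # static_order() yields each task only after all of its predecessors.
    for name in TopologicalSorter(dependencies).static_order():
        tasks[name]()
        order.append(name)
    return order


# Illustrative three-step workflow; the task bodies only record that they ran.
log = []
tasks = {
    "extract": lambda: log.append("extract"),
    "transform": lambda: log.append("transform"),
    "load": lambda: log.append("load"),
}
dependencies = {"transform": {"extract"}, "load": {"transform"}}

order = run_workflow(tasks, dependencies)
print(order)
```

A real orchestrator adds scheduling, retries, and monitoring on top of this ordering step. The sketch covers only the dependency coordination part.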
16
16
17
-
## What are your options for data pipeline orchestration?
17
+
## Options for data pipeline orchestration
18
18
19
-
In Azure, the following services and tools will meet the core requirements for pipeline orchestration, control flow, and data movement:
19
+
In Azure, the following services and tools meet the core requirements for pipeline orchestration, control flow, and data movement:
20
20
21
-
- [Azure Data Factory](/azure/data-factory/)
22
-
- [Oozie on HDInsight](/azure/hdinsight/hdinsight-use-oozie-linux-mac)
21
+
- [Azure Data Factory](/azure/data-factory)
22
+
- [Apache Oozie on Azure HDInsight](/azure/hdinsight/hdinsight-use-oozie-linux-mac)
23
23
- [SQL Server Integration Services (SSIS)](/sql/integration-services/sql-server-integration-services)
24
+
- [Fabric Data Factory](/fabric/data-factory/data-factory-overview)
24
25
25
-
These services and tools can be used independently from one another, or used together to create a hybrid solution. For example, the Integration Runtime (IR) in Azure Data Factory V2 can natively execute SSIS packages in a managed Azure compute environment. While there is some overlap in functionality between these services, there are a few key differences.
26
+
You can use these services and tools independently or combine them to create a hybrid solution. For example, the integration runtime (IR) in Data Factory V2 can natively run SSIS packages in a managed Azure compute environment. These services share some functionality, but they have a few key differences.
26
27
27
-
## Key Selection Criteria
28
+
## Key selection criteria
28
29
29
-
To narrow the choices, start by answering these questions:
30
+
To narrow your options, consider the following factors:
30
31
31
-
- Do you need big data capabilities for moving and transforming your data? Usually this means multi-gigabytes to terabytes of data. If yes, then narrow your options to those that best suited for big data.
32
+
- Determine whether you need big data capabilities to move and transform your data. These capabilities typically use multiple gigabytes (GBs) to terabytes (TBs) of data. If you require these capabilities, choose a service designed for big data.
32
33
33
-
- Do you require a managed service that can operate at scale? If yes, select one of the cloud-based services that aren't limited by your local processing power.
34
+
- Identify whether you need a managed service that can operate at scale. If you do, choose a cloud-based service that doesn't depend on your local processing power.
34
35
35
-
- Are some of your data sources located on-premises? If yes, look for options that can work with both cloud and on-premises data sources or destinations.
36
+
- Check whether you have data sources located on-premises. If you do, choose a service that supports both cloud and on-premises data sources or destinations.
36
37
37
-
- Is your source data stored in Blob storage on an HDFS filesystem? If so, choose an option that supports Hive queries.
38
+
- Check whether you store source data in blob storage on a Hadoop Distributed File System (HDFS). If you do, choose a service that supports Hive queries.
39
+
40
+
- Determine whether you need advanced orchestration for complex extract, transform, and load (ETL) workflows across multiple data sources. If you do, choose Fabric Data Factory because it provides a set of connectors, pipeline orchestration, and integration with both on-premises and cloud environments. It's ideal for enterprise-scale data movement and transformation.
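The selection criteria above can be sketched as a small checklist function. This is only an illustration of the reasoning, not an official selection tool. The function name `requirements_from_answers` and its parameter names are hypothetical. The requirement strings paraphrase the bullets above, and the single service suggestion comes from the last bullet.

```python
def requirements_from_answers(
    needs_big_data=False,
    needs_managed_scale=False,
    has_on_premises_sources=False,
    stores_data_on_hdfs_blob=False,
    needs_complex_etl=False,
):
    """Translate yes/no answers into the requirements named in the criteria list.

    Returns (requirements, suggestion). The suggestion is set only for the one
    criterion where the list names a specific service.
    """
    requirements = []
    if needs_big_data:
        requirements.append("designed for big data")
    if needs_managed_scale:
        requirements.append("cloud-based managed service")
    if has_on_premises_sources:
        requirements.append("supports cloud and on-premises data sources")
    if stores_data_on_hdfs_blob:
        requirements.append("supports Hive queries")
    suggestion = "Fabric Data Factory" if needs_complex_etl else None
    return requirements, suggestion


# Example: terabyte-scale data with some on-premises sources.
reqs, suggestion = requirements_from_answers(
    needs_big_data=True, has_on_premises_sources=True
)
print(reqs, suggestion)
```

The resulting requirement list can then be checked against the capability matrix to shortlist services.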
38
41
39
42
## Capability matrix
40
43
41
44
The following tables summarize the key differences in capabilities.
42
45
43
46
### General capabilities
44
47
45
-
| Capability | Azure Data Factory | SQL Server Integration Services (SSIS) | Oozie on HDInsight
| Access on-premises data | Yes | Yes | No | Yes |
67
70
68
71
### Scalability capabilities
69
72
70
-
| Capability | Azure Data Factory | SQL Server Integration Services (SSIS) | Oozie on HDInsight
71
-
| --- | --- | --- | --- |
72
-
| Scale up | Yes | No | No |
73
-
| Scale out | Yes | No | Yes (by adding worker nodes to cluster) |
74
-
| Optimized for big data | Yes | No | Yes |
73
+
| Capability | Data Factory | SSIS | Oozie on HDInsight | Fabric Data Factory |
74
+
| --- | --- | --- | --- | --- |
75
+
| Scale up | Yes | No | No | Yes |
76
+
| Scale out | Yes | No | Yes (by adding worker nodes to cluster) | Yes |
77
+
| Optimized for big data | Yes | No | Yes | Yes |
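As a rough illustration of how to use the scalability table, the following sketch encodes its rows as a Python dictionary and filters services by required capability. The values are copied from the updated table above. The names `SCALABILITY` and `services_with`, and the capability keys, are hypothetical and chosen only for this example.

```python
# Scalability capabilities, transcribed from the table above.
SCALABILITY = {
    "Data Factory": {"scale_up": True, "scale_out": True, "big_data": True},
    "SSIS": {"scale_up": False, "scale_out": False, "big_data": False},
    "Oozie on HDInsight": {"scale_up": False, "scale_out": True, "big_data": True},
    "Fabric Data Factory": {"scale_up": True, "scale_out": True, "big_data": True},
}


def services_with(*capabilities):
    """Return the services that have every requested capability."""
    return [
        name
        for name, caps in SCALABILITY.items()
        if all(caps[c] for c in capabilities)
    ]


print(services_with("scale_out", "big_data"))
print(services_with("scale_up"))
```

Scalability is only one dimension. A real shortlist should also consider the general capabilities table.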
78
+
79
+
## Alternative approach
80
+
81
+
In addition to traditional batch-based orchestration, your platform can also use real-time intelligence through the [Fabric Real-Time Intelligence feature](/fabric/real-time-intelligence/event-streams/create-manage-an-eventstream). This approach enables continuous streaming data ingestion, in-flight transformation, and event-driven workflows so that you can respond instantly as data arrives. It supports high-value scenarios such as Internet of Things (IoT) telemetry processing, fraud detection, and operational monitoring.
75
82
76
83
## Contributors
77
84
78
-
*This article is maintained by Microsoft. It was originally written by the following contributors.*
85
+
*Microsoft maintains this article. The following contributors wrote this article.*
79
86
80
87
Principal author:
81
88
82
89
- [Zoiner Tejada](https://www.linkedin.com/in/zoinertejada) | CEO and Architect
83
90
91
+
*To see nonpublic LinkedIn profiles, sign in to LinkedIn.*
92
+
84
93
## Next steps
85
94
86
-
- [Pipelines and activities in Azure Data Factory and Azure Synapse Analytics](/azure/data-factory/concepts-pipelines-activities)
87
-
- [Provision the Azure-SSIS integration runtime in Azure Data Factory](/azure/data-factory/tutorial-deploy-ssis-packages-azure)
88
-
- [Oozie on HDInsight](/azure/hdinsight/hdinsight-use-oozie-linux-mac)
95
+
- [Pipelines and activities in Fabric Data Factory](/fabric/data-factory/data-factory-overview)
96
+
- [Provision the Azure-SSIS integration runtime in Data Factory](/azure/data-factory/tutorial-deploy-ssis-packages-azure)
97
+
- [Use Oozie to run a workflow on HDInsight](/azure/hdinsight/hdinsight-use-oozie-linux-mac)
98
+
- [Medallion architecture in Fabric Real-Time Intelligence](https://blog.fabric.microsoft.com/blog/21597)
89
99
90
-
## Related resources
100
+
## Related resource
91
101
92
102
- [DataOps for the modern data warehouse](../../databases/architecture/dataops-mdw.yml)