Azure-Data-Factory-Cookbook-Second-Edition

This is the code repository for Azure Data Factory Cookbook, Second Edition, published by Packt.

More details are below. Pick up your copy today!

A data engineer's guide to building and managing ETL and ELT pipelines with data integration

The authors of this book are Dmitry Anoshin, Tonya Chernyshova, Dmitry Foshin, and Xenia Hertzenberg.

About the book

This new edition of the Azure Data Factory Cookbook, fully updated to reflect ADF V2, will help you get up and running by showing you how to create and execute your first job in ADF. You’ll learn how to branch and chain activities, create custom activities, and schedule pipelines, as well as discover the benefits of cloud data warehousing, Azure Synapse Analytics, and Azure Data Lake Storage Gen2. With practical recipes, you’ll learn how to actively engage with analytical tools from Azure Data Services and leverage your on-premises infrastructure with cloud-native tools to get relevant business insights. As you advance, you’ll be able to integrate the most commonly used Azure services into ADF and understand how Azure services can be useful in designing ETL pipelines. You’ll familiarize yourself with the common errors that you may encounter while working with ADF and find out how to use the Azure portal to monitor pipelines. You’ll also understand error messages and resolve problems in connectors and data flows with the debugging capabilities of ADF. By the end of this book, you’ll be able to use ADF as the main ETL and orchestration tool for your data warehouse or data platform projects.

Key Takeaways

Create an orchestration and transformation job in ADF
Develop, execute, and monitor Data Flows using Azure Synapse Analytics
Create big data pipelines using Databricks and Delta tables
Work with big data in Azure Data Lake Storage Gen2 using Spark pools
Migrate on-premises SSIS jobs to ADF
Integrate ADF with commonly used Azure services such as Azure ML, Azure Logic Apps, and Azure Functions
Run big data compute jobs within HDInsight and Azure Databricks
Copy data from AWS S3 and Google Cloud Storage to Azure Storage using ADF's built-in connectors

New Edition vs. Previous Edition

The new edition of the Azure Data Factory Cookbook builds on the previous edition by providing updated insights into ADF V2 and emphasizing practical recipes for tackling real-world data challenges. It extends coverage into cloud data warehousing, Azure Synapse Analytics, and Azure Data Lake Storage Gen2, offering readers a more comprehensive resource for mastering orchestration, custom activities, and Azure service integration.

What's New

In this edition, two new chapters on Azure Data Explorer and best practices have been added to enhance the reader's understanding. New recipes have also been added throughout the book, covering pivotal topics such as Azure Synapse, Databricks, and Delta tables for big data pipelines. The book also provides guidance on migrating SSIS jobs to ADF and integrating with AWS S3 and Google Cloud Storage, ensuring comprehensive coverage of modern ETL practices.

Outline and Chapter Summary

Azure Data Factory (ADF) is a modern data integration tool available on Microsoft Azure. This Azure Data Factory Cookbook, Second Edition helps you get up and running by showing you how to create and execute your first job in ADF. You will learn how to branch and chain activities, create custom activities, and schedule pipelines. This book will help you discover the benefits of cloud data warehousing, Azure Synapse Analytics, Azure Data Lake Storage Gen2, and Databricks, which are frequently used for big data analytics. Through practical recipes, you will learn how to actively engage with analytical tools from Azure Data Services and leverage your on-premises infrastructure with cloud-native tools to get relevant business insights. As you advance, you will be able to integrate the most commonly used Azure services into ADF and understand how Azure services can be useful in designing ETL pipelines. The book will take you through the common errors that you may encounter while working with ADF and guide you in using the Azure portal to monitor pipelines. You will also understand error messages and resolve problems in connectors and data flows with the debugging capabilities of ADF. The book also covers Microsoft Fabric, one of Microsoft's latest data technologies, and explores how it enhances data integration and orchestration capabilities. By the end of this book, you will be able to use ADF as the main ETL and orchestration tool for your data warehouse and data platform projects.

Getting Started with ADF
Orchestration and Control Flow
Setting Up Synapse Analytics
Working with Data Lake and Spark Pools
Working with Big Data and Databricks
Data Migration – Azure Data Factory and Other Cloud Services
Extending Azure Data Factory with Logic Apps and Azure Functions
Microsoft Fabric and Power BI, Azure ML, and Cognitive Services
Managing Deployment Processes with Azure DevOps
Monitoring and Troubleshooting Data Pipelines
Working with Azure Data Explorer
The Best Practices of Working with ADF

Chapter 01, Getting Started with ADF

Chapter 1, Getting Started with ADF, will provide a general introduction to the Azure data platform. In this chapter, you will learn about the ADF interface and options as well as common use cases. You will perform hands-on exercises in order to find ADF in the Azure portal and create your first ADF job.

Chapter 02, Orchestration and Control Flow

Chapter 2, Orchestration and Control Flow, will introduce you to the building blocks of data processing in ADF. The chapter contains hands-on exercises that show you how to set up linked services and datasets for your data sources, use various types of activities, design data-processing workflows, and create triggers for data transfers.

Chapter 03, Setting Up Synapse Analytics

Chapter 3, Setting Up Synapse Analytics, covers key features and benefits of cloud data warehousing and Azure Synapse Analytics. You will learn how to connect and configure Azure Synapse Analytics, load data, build transformation processes, and operate data flows.

Chapter 04, Working with Data Lake and Spark Pools

Chapter 4, Working with Data Lake and Spark Pools, will cover the main features of Azure Data Lake Storage Gen2, a multimodal cloud storage solution that is frequently used for big data analytics. We will load and manage the datasets that we will use for analytics in the next chapter.

Chapter 05, Working with Big Data and Databricks

Chapter 5, Working with Big Data and Databricks, will show you how to actively engage with analytical tools from Azure’s data services. You will learn how to build data models in Delta Lake using Azure Databricks and mapping data flows. This chapter will also show you how to set up HDInsight clusters and how to work with Delta tables.

Chapter 06, Data Migration – Azure Data Factory and Other Cloud Services

Chapter 6, Data Migration – Azure Data Factory and Other Cloud Services, will walk you through several illustrative examples of migrating data from Amazon Web Services and Google Cloud. In addition, you will learn how to use ADF’s custom activities to work with providers that are not supported by Microsoft’s built-in connectors.

Chapter 07, Extending Azure Data Factory with Logic Apps and Azure Functions

Chapter 7, Extending Azure Data Factory with Logic Apps and Azure Functions, will show you how to harness the power of serverless execution by integrating some of the most commonly used Azure services: Azure Logic Apps and Azure Functions. These recipes will help you understand how Azure services can be useful in designing Extract, Transform, Load (ETL) pipelines.

Chapter 08, Microsoft Fabric and Power BI, Azure ML, and Cognitive Services

Chapter 8, Microsoft Fabric and Power BI, Azure ML, and Cognitive Services, will teach you how to build an ADF pipeline that operates on a pre-built Azure ML model. You will also create and run an ADF pipeline that leverages Azure AI for text data analysis. In the last three recipes, you’ll familiarize yourself with the primary components of Microsoft Fabric Data Factory.

Chapter 09, Managing Deployment Processes with Azure DevOps

Chapter 9, Managing Deployment Processes with Azure DevOps, will delve into setting up CI and CD for data analytics solutions in ADF using Azure DevOps. Throughout the process, we will also demonstrate how to use Visual Studio Code to facilitate the deployment of changes to ADF.

Chapter 10, Monitoring and Troubleshooting Data Pipelines

Chapter 10, Monitoring and Troubleshooting Data Pipelines, will introduce tools to help you manage and monitor your ADF pipelines. You will learn where and how to find more information about what went wrong when a pipeline failed, how to debug a failed run, how to set up alerts that notify you when there is a problem, and how to identify problems with your integration runtimes.

Chapter 11, Working with Azure Data Explorer

Chapter 11, Working with Azure Data Explorer, will help you to set up a data ingestion pipeline from ADF to Azure Data Explorer: it includes a step-by-step guide to ingesting JSON data from Azure Storage and will teach you how to transform data in Azure Data Explorer with ADF activities.

Chapter 12, The Best Practices of Working with ADF

Chapter 12, The Best Practices of Working with ADF, will guide you through essential considerations, strategies, and practical recipes that will elevate your ADF projects to new heights of efficiency, security, and scalability.

Learn more on the Discord server

Join the community on the Discord server for the latest updates and discussions: Discord

Download a free PDF

If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost. Simply click on the link to claim your free PDF. Free-Ebook

We also provide a PDF file that has color images of the screenshots/diagrams used in this book at GraphicBundle

Get to know the Authors

Dmitry Anoshin is a data-centric technologist and a recognized expert in building and implementing big data and analytics solutions. He has a successful track record when it comes to implementing business and digital intelligence projects in numerous industries, including retail, finance, marketing, and e-commerce. Dmitry possesses in-depth knowledge of digital/business intelligence, ETL, data warehousing, and big data technologies. He has extensive experience in the data integration process and is proficient in using various data warehousing methodologies. Dmitry has consistently exceeded project expectations in the financial, machine tool, and retail industries. He has completed a number of multinational full BI/DI solution life cycle implementation projects. With expertise in data modeling, Dmitry also has a background and business experience in multiple relational databases, OLAP systems, and NoSQL databases. He is also an active speaker at data conferences and helps people adopt cloud analytics.

Tonya Chernyshova is an experienced data engineer with a proven track record of successfully delivering scalable, maintainable, and impactful data products. She is highly proficient in data modeling, automation, cloud computing, and data visualization, consistently driving data-driven insights and business growth.

Dmitry Foshin is a business intelligence team leader whose main goals are delivering business insights to the management team through data engineering, analytics, and visualization. He has led and executed complex full-stack BI solutions (from ETL processes to building DWH and reporting) using Azure technologies, Data Lake, Data Factory, Databricks, MS Office 365, Power BI, and Tableau. He has also successfully launched numerous data analytics projects – both on-premises and cloud – that help achieve corporate goals in international FMCG companies and in the banking and manufacturing industries.

Xenia Hertzenberg is a software engineer at Microsoft and has extensive knowledge in the field of data engineering, big data pipelines, data warehousing, and systems architecture.
