# Microsoft Azure Data Fundamentals: Explore modern data warehouse analytics in Azure

Learn the fundamentals of database concepts in a cloud environment, build basic skills in cloud data services, and develop foundational knowledge of those services within Microsoft Azure. You will explore the processing options available for building data analytics solutions in Azure, including Azure Synapse Analytics, Azure Databricks, and Azure HDInsight.

## Examine components of a modern data warehouse

Examine the components of a modern data warehouse. Understand the role of services like Azure Databricks, Azure Synapse Analytics, and Azure HDInsight. See how to use Azure Synapse Analytics to load and process data.

**Learning objectives**

In this module, you will:

* Describe Azure data services for modern data warehousing
* Describe modern data warehousing architecture and workload
* Explore Azure data services in the Azure portal

### Introduction

The process of combining all of the local data sources is known as data warehousing. The process of analyzing streaming data and data from the Internet is known as Big Data analytics. Azure Synapse Analytics combines data warehousing with Big Data analytics.

### Describe modern data warehousing

**What is modern data warehousing?**

A modern data warehouse might contain a mixture of relational and non-relational data, including files, social media streams, and Internet of Things (IoT) sensor data. Azure provides a collection of services you can use to build a data warehouse solution, including Azure Data Factory, Azure Data Lake Storage, Azure Databricks, Azure Synapse Analytics, and Azure Analysis Services. You can use tools such as Power BI to analyze and visualize the data, generating reports, charts, and dashboards.

### Explore Azure data services for modern data warehousing

**What is Azure Data Factory?**

Azure Data Factory is a data integration service. Its purpose is to retrieve data from one or more data sources and convert it into a format that you can process. You define the work performed by Azure Data Factory as a pipeline of operations.
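
The idea of a pipeline as an ordered set of operations can be sketched in plain Python. This is only an illustration of the concept and does not use the Azure Data Factory API; the activity functions (`drop_incomplete`, `normalize_names`), the `run_pipeline` helper, and the sample records are all hypothetical.

```python
from typing import Callable

Record = dict
Activity = Callable[[list[Record]], list[Record]]


def drop_incomplete(records: list[Record]) -> list[Record]:
    """Remove records that contain a missing (None) value."""
    return [r for r in records if all(v is not None for v in r.values())]


def normalize_names(records: list[Record]) -> list[Record]:
    """Trim and lowercase the 'name' field of every record."""
    return [{**r, "name": r["name"].strip().lower()} for r in records]


def run_pipeline(
    records: list[Record], activities: list[tuple[str, Activity]]
) -> list[Record]:
    """Run each named activity in order, feeding its output to the next one."""
    for name, activity in activities:
        records = activity(records)
        print(f"{name}: {len(records)} record(s)")
    return records


# The pipeline is just data: an ordered list of (name, operation) pairs.
pipeline = [
    ("drop_incomplete", drop_incomplete),
    ("normalize_names", normalize_names),
]

# Hypothetical source data; the second record is incomplete.
source = [
    {"name": " Alice ", "city": "Oslo"},
    {"name": "Bob", "city": None},
]

result = run_pipeline(source, pipeline)
print(result)
```

Keeping the pipeline definition separate from the code that runs it mirrors the split between defining a pipeline and executing it, although a real service adds scheduling, monitoring, and connectors that this sketch leaves out.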

**What is Azure Data Lake Storage?**

A data lake is a repository for large quantities of raw data before it is converted for analysis.

Note: A data warehouse also stores large quantities of data, but the data in a warehouse has been processed to convert it into a format for efficient analysis. A data lake holds raw data, but a data warehouse holds structured information.

![data lake](data-lake.png)

**What is Azure Databricks?**

Azure Databricks is an Apache Spark environment running on Azure to provide big data processing, streaming, and machine learning.

Azure Databricks provides a graphical user interface where you can define and test your processing step by step, before submitting it as a set of batch tasks. You can create Databricks scripts and query data using languages such as R, Python, and Scala. You write your Spark code using notebooks.

**What is Azure Synapse Analytics?**

Azure Synapse Analytics is an analytics engine. It's designed to process large amounts of data very quickly. You can perform complex queries over this data and generate reports, graphs, and charts.

Azure Synapse Analytics leverages a massively parallel processing (MPP) architecture. This architecture includes a control node and a pool of compute nodes.

* The control node is the brain of the architecture: it receives queries and distributes the work across the compute nodes.
* The compute nodes provide the computational power.

Azure Synapse Analytics supports two computational models: SQL pools and Spark pools.

In a SQL pool, each compute node uses an Azure SQL Database and Azure Storage to handle a portion of the data.
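
The division of labor between a control node and compute nodes can be sketched in plain Python. This is a simplified, sequential illustration of the MPP idea, not how Azure Synapse Analytics is implemented; the function names, the choice of `customer_id` as the distribution key, and the sample rows are hypothetical.

```python
from collections import defaultdict


def distribute(rows, node_count):
    """Control-node step: assign each row to a compute node by its key."""
    shards = [[] for _ in range(node_count)]
    for row in rows:
        shards[row["customer_id"] % node_count].append(row)
    return shards


def compute_partial(shard):
    """Compute-node step: total the amounts for the rows this node holds."""
    totals = defaultdict(float)
    for row in shard:
        totals[row["customer_id"]] += row["amount"]
    return dict(totals)


def control_node_query(rows, node_count=3):
    """Split the data, let each node aggregate its portion, merge the results."""
    partials = [compute_partial(shard) for shard in distribute(rows, node_count)]
    merged = {}
    for partial in partials:
        # Keys never overlap: each customer_id lives on exactly one node.
        merged.update(partial)
    return merged


# Hypothetical sales rows.
sales = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 2, "amount": 5.0},
    {"customer_id": 1, "amount": 2.5},
    {"customer_id": 4, "amount": 1.0},
]

totals = control_node_query(sales)
print(totals)
```

In this sketch the nodes run one after another in a single process. In a real MPP system each compute node works on its portion of the data at the same time, which is where the speed comes from.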

![Synapse](synapse.png)

**What is Azure Analysis Services?**

Azure Analysis Services enables you to build tabular models to support online analytical processing (OLAP) queries.
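
As a rough illustration of what an OLAP-style query does (aggregating a measure across one or more dimensions of tabular data), here is a plain-Python sketch. It does not use Azure Analysis Services; the `sales` rows and the `roll_up` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical tabular data: two dimensions (region, year) and one measure.
sales = [
    {"region": "East", "year": 2023, "revenue": 100},
    {"region": "East", "year": 2024, "revenue": 150},
    {"region": "West", "year": 2023, "revenue": 80},
    {"region": "West", "year": 2024, "revenue": 120},
]


def roll_up(rows, dimensions, measure):
    """Sum a measure for every distinct combination of the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


by_region = roll_up(sales, ["region"], "revenue")
by_year = roll_up(sales, ["year"], "revenue")
print(by_region)
print(by_year)
```

The same table answers different questions depending on which dimensions you group by, which is the kind of slicing that reporting tools expose interactively.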

Use Azure Synapse Analytics for:

* Very high volumes of data (multi-terabyte to petabyte sized datasets).
* Very complex queries and aggregations.
* Data mining, and data exploration.
* Complex ETL operations. ETL stands for Extract, Transform, and Load, and refers to the way in which you can retrieve raw data from multiple sources, convert this data into a standard format, and store it.
* Low to mid concurrency (128 users or fewer).
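
The ETL steps described in the list above can be sketched with the Python standard library. This is a minimal sketch under stated assumptions: the two in-memory sources, their field names, and the `readings` table are hypothetical, and an in-memory SQLite database stands in for the destination store.

```python
import csv
import io
import json
import sqlite3

# Hypothetical raw sources in two different formats.
CSV_SOURCE = "sensor,reading\na,1.5\nb,2.0\n"
JSON_SOURCE = '[{"device": "c", "value": "3.25"}]'


def extract():
    """Extract: read the raw records from each source as-is."""
    csv_rows = list(csv.DictReader(io.StringIO(CSV_SOURCE)))
    json_rows = json.loads(JSON_SOURCE)
    return csv_rows, json_rows


def transform(csv_rows, json_rows):
    """Transform: convert both sources into one (sensor, reading) format."""
    standard = [(r["sensor"], float(r["reading"])) for r in csv_rows]
    standard += [(r["device"], float(r["value"])) for r in json_rows]
    return standard


def load(rows, connection):
    """Load: store the standardized rows in a table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS readings (sensor TEXT, reading REAL)"
    )
    connection.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(*extract()), connection)

stored = connection.execute(
    "SELECT sensor, reading FROM readings ORDER BY sensor"
).fetchall()
print(stored)
```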

Use Azure Analysis Services for:

* Smaller volumes of data (a few terabytes).
* Multiple sources that can be correlated.
* High read concurrency (thousands of users).
* Detailed analysis, and drilling into data, using functions in Power BI.
* Rapid dashboard development from tabular data.

**What is Azure HDInsight?**

Azure HDInsight is a big data processing service that provides the platform for technologies such as Spark in an Azure environment.

![hdinsight](hdinsight.png)

## Explore large-scale data analytics

Explore data ingestion options to build a data warehouse with Azure, services to perform data analytics, and features of Azure Synapse Analytics. Create a Synapse Analytics workspace and use it to ingest and analyze data.

**Learning objectives**

In this module, you will:

* Describe data ingestion in Azure
* Describe components of Azure Data Factory
* See how to use Azure Data Factory to load data into a data warehouse
* Describe data processing options for performing analytics in Azure
* Explore Azure Synapse Analytics

