Azure Data Engineer Exam Preparation

Overview of services

Structured vs unstructured data
Data Storage in Azure
Data Lake in Azure
Cosmos DB in Azure
Azure SQL Database
Azure Synapse
Azure Stream Analytics (ASA)
Azure HDInsight
Other Services

Structured vs unstructured data

Structured data: the data structure (schema) is defined at design time, and data is stored in the form of tables.
- Microsoft SQL Server
- Azure SQL Database
- Azure SQL Data Warehouse (now part of Azure Synapse Analytics)
Unstructured data: data stored in non-relational systems, commonly NoSQL stores. Data is typically kept in its raw format.
- Key-value store
- Document databases
- Graph databases
- Column databases
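
The contrast can be sketched locally. This is a minimal illustration, not Azure code: Python's built-in `sqlite3` stands in for a relational store and plain JSON strings stand in for a document store. The table and field names (`customers`, `phone`, `tags`) are made up for the example.

```python
import json
import sqlite3

# Structured: the schema is fixed at design time, before any row is stored.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)
conn.execute(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)", (1, "Ada", "Berlin")
)

# A column that is not part of the schema is rejected by the database.
try:
    conn.execute("INSERT INTO customers (id, name, phone) VALUES (2, 'Bob', '555')")
    schema_enforced = False
except sqlite3.OperationalError:
    schema_enforced = True

# Unstructured / NoSQL style: each document carries its own shape.
documents = [
    json.dumps({"id": 1, "name": "Ada", "city": "Berlin"}),
    json.dumps({"id": 2, "name": "Bob", "phone": "555", "tags": ["vip"]}),
]
fields_per_document = [sorted(json.loads(d).keys()) for d in documents]

print(schema_enforced)
print(fields_per_document)
```

The relational insert fails because `phone` was never declared, while the second JSON document simply adds fields the first one lacks.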

Data Storage in Azure

Azure Blob: A scalable object store for text and binary data.
Azure Files: Managed file shares for cloud or on-premises deployments.
Azure Queue: A messaging store for reliable messaging between application components.
Azure Table: A NoSQL store for schemaless storage of structured data.
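
Each of the four services is reached through its own endpoint under the same storage account. The sketch below only builds the default public endpoint URLs; it does not contact Azure. The account name `examplenotes` and the helper `storage_endpoint` are hypothetical, and custom domains or sovereign clouds would use different hosts.

```python
# Subdomain used by each Azure Storage service in the default public endpoint.
SERVICE_SUBDOMAINS = {
    "blob": "blob",
    "files": "file",
    "queue": "queue",
    "table": "table",
}


def storage_endpoint(account: str, service: str) -> str:
    """Build the default public endpoint URL for one storage service."""
    try:
        subdomain = SERVICE_SUBDOMAINS[service]
    except KeyError:
        raise ValueError(f"unknown storage service: {service!r}") from None
    return f"https://{account}.{subdomain}.core.windows.net"


# "examplenotes" is a placeholder account name, not a real account.
endpoints = {name: storage_endpoint("examplenotes", name) for name in SERVICE_SUBDOMAINS}
for name, url in endpoints.items():
    print(f"{name:6} {url}")
```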

Use cases:

Low-cost storage for data that does not need to be queried in place.
- Secure, scalable, and highly available
- REST APIs and SDKs available
- Security: role-based access control (RBAC) included
Data ingestion: Storage Explorer, AzCopy (up to 1 TB), PowerShell, or Visual Studio.
Querying: to query the data, set the storage account up as a Data Lake Storage account.

Data Lake in Azure

Azure Data Lake Storage is a Hadoop-compatible data repository that can store data of any size or type. It comes in two versions, Gen1 and Gen2. Gen2 builds on Azure Blob Storage and adds a hierarchical file system and performance tuning. Users can access data through the Blob API or the Data Lake File API. It can also serve as the storage layer for other platforms such as Azure Databricks, Hadoop, and Azure HDInsight.

Designed to store massive amounts of data for big-data analytics.

Key Features:

Unlimited scalability
Hadoop compatibility
Security support for access control lists (ACLs)
POSIX compliance
An optimized Azure Blob File System (ABFS) driver that's designed for big-data analytics
Zone-redundant storage
Geo-redundant storage
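
Hadoop-compatible tools usually address Gen2 data through the ABFS driver with a URI of the form `abfss://<file_system>@<account>.dfs.core.windows.net/<path>`. A minimal sketch of building such a URI follows; the helper `abfs_uri`, the account `examplelake`, and the file system `raw` are hypothetical names used only for illustration.

```python
def abfs_uri(account: str, file_system: str, path: str = "", secure: bool = True) -> str:
    """Build an ABFS driver URI for a path in a Data Lake Storage Gen2 account.

    "abfss" is the TLS-secured scheme; "abfs" is the unsecured variant.
    """
    scheme = "abfss" if secure else "abfs"
    clean_path = path.lstrip("/")  # avoid a double slash after the host
    return f"{scheme}://{file_system}@{account}.dfs.core.windows.net/{clean_path}"


# Placeholder account, file system, and path.
uri = abfs_uri("examplelake", "raw", "/sales/2024/data.parquet")
print(uri)
```

A URI like this is what would typically be handed to Spark or Hive as a read or write location; authentication is configured separately and is not shown here.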

To ingest data:

Azure Data Factory
Apache Sqoop
Azure Storage Explorer
AzCopy Tool
PowerShell, Visual Studio

To query data:

Gen1: U-SQL
Gen2: Azure Blob API or Azure Data Lake System API

Cosmos DB in Azure

Globally distributed, multi-model database. You can deploy it using one of several API models:

SQL API
MongoDB API
Cassandra API
Gremlin API
Table API

Use case: you need a NoSQL database in one of the supported API models, at global scale, and with low-latency performance. Cosmos DB currently supports an SLA of 99.999% and response times below 10 ms when configured correctly.

Data ingestion:

Azure Data Factory
Any source application that writes into the database
Uploading JSON documents
Other APIs

To query:

Write stored procedures or user-defined functions (UDFs)
Use the JavaScript API
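
As an illustration of the JSON-document route for the SQL API, the sketch below prepares documents for upload and builds a parameterized query specification. It runs entirely locally and never contacts a Cosmos DB account. The helper `prepare_item`, the sample documents, and the `city` field are assumptions made for the example; the final filter is only a local stand-in for what the query would select.

```python
import json
import uuid


def prepare_item(doc: dict) -> dict:
    """Return a copy of doc with a string "id", which SQL API items are expected to carry."""
    item = dict(doc)
    existing = item.get("id")
    item["id"] = str(existing) if existing is not None else str(uuid.uuid4())
    return item


# Hypothetical source documents; the second one has no id yet.
raw = '[{"id": 7, "city": "Oslo"}, {"city": "Lima"}]'
items = [prepare_item(d) for d in json.loads(raw)]

# Parameterized query: the text and its parameters travel separately.
query_spec = {
    "query": "SELECT * FROM c WHERE c.city = @city",
    "parameters": [{"name": "@city", "value": "Oslo"}],
}

# Local stand-in for the query result, for illustration only.
wanted_city = query_spec["parameters"][0]["value"]
matches = [item for item in items if item.get("city") == wanted_city]

print(json.dumps(matches))
```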

Azure SQL Database

Provides online transaction processing (OLTP) that can scale on demand, while benefiting from the scalability and security of the Azure platform.

Delivers predictable performance, multiple resource types, service tiers, and compute sizes.

Data Ingestion:

Developer SDKs
T-SQL (also used for querying and for writing stored procedures)
Azure Data Factory
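
To make the OLTP idea concrete, here is a minimal sketch of an atomic transaction. It uses Python's built-in `sqlite3` as a local stand-in, so the SQL dialect is SQLite rather than T-SQL, and the `accounts` table and `transfer` helper are hypothetical. The point is only that a group of writes either all commit or all roll back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(db: sqlite3.Connection, src: int, dst: int, amount: int) -> bool:
    """Move amount from src to dst atomically; return True if committed."""
    try:
        with db:  # commits on success, rolls back if an exception is raised
            db.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            db.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the credit above was rolled back too.
        return False


ok = transfer(conn, 1, 2, 30)
rejected = transfer(conn, 1, 2, 500)
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(ok, rejected, balances)
```

The second transfer would overdraw account 1, so both of its updates are undone and the balances stay as the first transfer left them.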

Security:

Advanced threat protection
SQL Database auditing
Data encryption
Azure AD Auth
Multifactor Auth
Compliance certification

Azure Synapse

Used for data warehousing and data analytics. Key feature: SQL pools.

The Data Movement Service (DMS) coordinates and transports data between compute nodes as necessary. Tables can be distributed in three ways: hash, round-robin, and replicated.

Ingestion with:
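
The three distribution strategies can be simulated in a few lines. This is a conceptual sketch, not how Synapse is implemented: the real hash function is internal to the service, so SHA-256 is used here as a stand-in, and 4 distributions are used for readability (a dedicated SQL pool spreads data over 60). The sample rows and function names are made up.

```python
import hashlib
from collections import defaultdict


def hash_distribute(rows, key, n_distributions):
    """Rows with the same key value always land on the same distribution."""
    out = defaultdict(list)
    for row in rows:
        digest = hashlib.sha256(str(row[key]).encode()).digest()
        out[int.from_bytes(digest[:8], "big") % n_distributions].append(row)
    return dict(out)


def round_robin_distribute(rows, n_distributions):
    """Rows are dealt out evenly in turn, regardless of their content."""
    out = defaultdict(list)
    for i, row in enumerate(rows):
        out[i % n_distributions].append(row)
    return dict(out)


def replicate(rows, n_nodes):
    """Every compute node receives a full copy of the table."""
    return {node: list(rows) for node in range(n_nodes)}


rows = [{"customer": c, "amount": a} for c, a in
        [("ann", 10), ("bob", 20), ("ann", 5), ("cid", 7), ("bob", 1), ("ann", 3)]]

hashed = hash_distribute(rows, "customer", 4)
dealt = round_robin_distribute(rows, 4)
copies = replicate(rows, 2)

for name, layout in [("hash", hashed), ("round-robin", dealt), ("replicated", copies)]:
    print(name, {k: len(v) for k, v in sorted(layout.items())})
```

Hash distribution keeps all rows for one key together, which helps joins and aggregations on that key but can skew if one key dominates. Round-robin gives even sizes but may force data movement at query time. Replication suits small lookup tables.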

SqlBulkCopy
PolyBase
Azure Data Factory

Query with T-SQL.

Supports Azure AD auth and high security features.

Azure Stream Analytics (ASA)

Designed for IoT contexts.

Data Ingestion:

Azure Event Hubs
Azure IoT Hub
Azure Blob Storage

Features:

Native columnar storage
Provides support for structured streaming.

Azure HDInsight

Helps at almost all stages of data processing.

Hadoop compatible: Apache Hive, HBase, Spark, and Kafka.
HBase: NoSQL database
Storm: distributed real-time streaming analytics solution
Kafka: open-source streaming platform

Other Services

Databricks
Data Factory: can execute SSIS packages and handle orchestration and workflows for data movement.
Data Catalog (metadata store)
