
Build massive and lightning-fast analytics solutions with Azure Data Explorer

Looking to help your customers make business decisions with immediate impact, based on terabytes or petabytes of real-time data, in seconds? In this session, you will build a near-real-time analytical solution with Azure Data Explorer (ADX), which supports interactive ad hoc queries over terabyte- and petabyte-scale data.

Walk away with a solution for your frustrated customers, so they can make immediate and impactful business decisions from their data using ADX.

Contents

  • Infrastructure

  • Ingestion

  • Exploration

  • Questions

  • KQL

  • Results

  • Self-Study

    • Kusto Query Language (KQL)

    • Questions to try out

    • Power BI

      • Connect to Help cluster
      • Create Power BI report
    • KQL-Results

Infrastructure

Open the lab from http://bit.ly/2WCFDdz if you haven't registered for the lab already.

  1. Open the Azure portal in a private browser window at https://portal.azure.com inside the Virtual Machine (on the left-hand side).

    • Sign in with the Azure credentials from the Environment Details tab.


  • The portal opens with the provided credentials.

  2. Select All services from the left-hand pane, search for data explorer, and click the star icon.

Screenshot: selecting Azure Data Explorer from the All services list.

  3. Drag Azure Data Explorer Clusters to the top of the Favorites menu.

Screenshot: dragging Azure Data Explorer Clusters to the top of the Favorites menu.

  4. Select Azure Data Explorer Clusters from the Favorites menu and select the pre-deployed Kusto cluster.

    Screenshot: selecting the pre-deployed Azure Data Explorer cluster.

  5. Select Databases from the left-hand menu, under Data, and then select + Add Database.

    Screenshot: creating a new database in the cluster.

  6. In the Azure Data Explorer Database window, configure the following settings and select Create:

    Screenshot: the database creation parameters.

    • Database name: alias_adxdb
    • Retention period in days (cold compressed data in Azure Blob Storage): 365
    • Cache period in days (hot compressed data on SSD): 31
  7. In Databases, select your new alias_adxdb database and select Query.

Screenshot: the query pane where the data-ingestion commands are written.

  8. In the Web UI, create a table:
// Create a table
.create table SampleTable
(Timestamp:datetime, ApiVersion:string, User:string, RawHeader:dynamic)
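
To confirm the table was created with the expected columns before moving on, you can run a quick schema check (a standard query operator, not part of the original lab steps):
// Inspect the new table's column names and types
SampleTable
| getschema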

Ingestion

Write the commands that create the ingestion mapping and ingest the sample data.

  1. Create table mapping
// Create a Json ingestion mapping
.create table SampleTable ingestion json mapping 
"Mapping01" '[{"column":"Timestamp","path":"$.header.time"},{"column":"ApiVersion","path":"$.header.api_version"},{"column":"RawHeader","path":"$.header"},{"column":"User","path":"$.payload.user"}]'
  2. View the ingestion mapping
// View ingestion mappings
.show table SampleTable ingestion json mappings  
  3. Ingest from a public blob
// Ingest from public blob
.ingest into table SampleTable
@'https://westuskustopublic.blob.core.windows.net/public/SampleData-500-4394582f-668f-4d03-8bba-58f87a7e48a0.json' with (jsonMappingReference = "Mapping01")  

Note: The commands above must follow the Kusto query syntax exactly.
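
If the ingest command completes but you want to confirm nothing was dropped, one option (a standard diagnostic command, not part of the original lab steps) is to list recent ingestion failures; an empty result means the blob was ingested cleanly:
// Show recent ingestion failures on the cluster (empty result = no failures)
.show ingestion failures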

Exploration

Questions

If you skipped the previous step(s), please click on database: text.

  1. How many lines were ingested?
  2. Add a calculated column TransactionId from the RawHeader column's id attribute
  3. Take a 10-row sample of RawHeader
  4. How many records were ingested from versions 1 and 2?
  5. Create a time chart with 10-minute bins of RawHeader['time']
  6. Drop table SampleTable
  7. Run .show queries

Kusto Query Language (KQL)

  • | count Counts records in input table (e.g. T)

  • | take 10 Gets a few records to become familiar with the data. No order is ensured.

  • | where Timestamp > ago(1d) and UserId == 'abcdef' Filters on specific fields

  • | project Col1, Col2, … Selects some columns (use if the input table has many columns)

  • | extend NewCol1=Col1+Col2 Introduces new calculated columns

  • | render timechart Plots the data (in Kusto Explorer and the Web UI) while exploring

  • | summarize count(), dcount(Id) by Col1, Col2 Analytics: aggregations

  • | top 10 by count_ desc Finds the needle in the haystack

  • | join (…) on Key1, Key2 Joins data sets

  • | mvexpand Col1,Col2 … Turns dynamic arrays to rows (multi-value expansion)

  • | parse Col1 with … Deals with unstructured data
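
As a quick illustration of how these operators chain together, here is a minimal sketch against the SampleTable created earlier (the 10-minute binning mirrors question 5 above):
// Count ingested records per API version in 10-minute bins and plot the result
SampleTable
| extend EventTime = todatetime(RawHeader['time']), Version = tostring(RawHeader.api_version)
| summarize count() by Version, bin(EventTime, 10m)
| render timechart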

Results

  1. SampleTable | count

  2. SampleTable | extend TransactionId = RawHeader.id

  3. SampleTable | extend TransactionId = RawHeader.id
    | take 10

  4. Two options:

    a. SampleTable

    | extend recordversion = tostring(RawHeader.api_version)
    | summarize count() by recordversion

    b. SampleTable

    | where tostring(RawHeader.api_version) has "1"
    | count
    (run again with "2" for version 2)

  5. SampleTable
    | extend x = todatetime(RawHeader['time'])
    | summarize count() by bin(x, 10m)
    | render timechart

  6. Drop table
    // Drop the table
    .drop table SampleTable

  7. Show queries
    .show queries

Self-Study

Kusto Query Language (KQL)

We'll use GitHub public data, query it with Azure Data Explorer (Kusto), and visualize it with Power BI.

  1. Open the browser in the Virtual Machine and sign in with the temporary lab user credentials.


  2. Open https://dataexplorer.azure.com/clusters/demo12.westus/databases/GitHub

Please email tzgitlin@microsoft.com to get temporary permission on the GitHub cluster.

Questions to try out (answers at the bottom; a starter sketch for the first two questions follows this list)

  1. What was the date yesterday?
  2. How many events were in the last 600 days?
  3. Take a sample of 10 rows of your data.
  4. What is the number of Repos overall?
  5. What is the number of unique Repos values?
  6. What is the number of unique Repos names?
  7. How many Torvalds are there? How many events did they produce?
  8. What are the top 10 most watched Repos?
  9. (**) Plot the history of all of the events for the past 2 years for the Repos from #8.
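
As a starter, here is a minimal sketch for the first two questions, assuming the GithubEvent table and its CreatedAt column used in the KQL-Results section below (run each query separately):
// 1. What was the date yesterday?
print Yesterday = startofday(ago(1d))

// 2. How many events were in the last 600 days?
GithubEvent
| where CreatedAt > ago(600d)
| count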

Power BI

Power BI is used to visualize the data. Note that Power BI is a visualization tool with data size limitations: by default, 500,000 records and 700 MB of data.

Connect to Help cluster

  1. Connect with the Azure credentials from the Environment Details tab.


  2. Open Power BI Desktop, select Get Data, then More…, and type Data Explorer in the search box.

Screenshot: Power BI Desktop, where you can do the data analytics.

  3. Select Azure Data Explorer (Kusto) and then Connect.

Screenshot: selecting the database to be analyzed.

  4. Enter the following properties (leave all other fields empty) and then select OK:
    • Cluster: Help
    • Database: Samples
    • Table name or Azure Data Explorer query: StormEvents
    • Data Connectivity mode: Import

Screenshot: the required parameters to analyze the database.
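
Optionally, before loading the table into Power BI, you can preview the same data in the Azure Data Explorer Web UI against the public Help cluster (https://dataexplorer.azure.com/clusters/help/databases/Samples), for example:
// Peek at a few StormEvents rows before importing them into Power BI
StormEvents
| take 10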

  5. Expand the Samples database and select StormEvents. If the table looks OK, select Load. To make changes, select Edit.

Screenshot: a preview of the sample database.

  6. The new StormEvents table is added to the Power BI report.

Screenshot: the StormEvents table added to Power BI.

Create a Power BI report

  1. Create a line chart with the total number of events by putting StartTime in the Axis box (not in Date Hierarchy mode) and EventId in the Values box.

Screenshot: the line chart of events over time.

  2. Add a Map tile by putting BeginLat in the Latitude box and BeginLon in the Longitude box.

Screenshot: the map tile with the latitude and longitude boxes filled in.

  3. Create a clustered column chart by putting EventType in the Axis box and EventId (count) in the Values box.

Screenshot: the clustered column chart of EventType by EventId count.

  4. Create 4 separate card tiles with DeathsDirect, DeathsIndirect, InjuriesDirect, and InjuriesIndirect in the Fields box.

Screenshot: the card tiles for deaths and injuries.

  5. Create a pie chart of reporting sources by putting Source in the Legend box and EventId (count) in the Values box.

Screenshot: the pie chart of reporting sources.

  6. Now arrange the tiles on the canvas and you're ready to slice and dice.

Screenshot: the completed report with all tiles arranged on the canvas.
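
For reference, the aggregations behind the clustered column chart and the card tiles can also be computed directly in KQL (a sketch assuming the standard StormEvents schema on the Help cluster):
// Event counts plus direct deaths/injuries per event type, for the top 10 event types
StormEvents
| summarize Events = count(),
            TotalDeathsDirect = sum(DeathsDirect),
            TotalInjuriesDirect = sum(InjuriesDirect)
    by EventType
| top 10 by Events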

Power BI Connectors

  1. Native Connector for Power BI (a sample query you could feed the connector is sketched after this list)
    • Get Data -> Azure Data Explorer (Kusto) -> Connect -> Preview feature (accept) -> continue
    • Cluster: demo12.westus
    • Database: GitHub
    • Table: GithubEvent
    • Import -> loads the data in advance
      • Seamless browsing experience
      • Data size limitation
    • DirectQuery -> loads data per request
      • Load per request
      • Longer response time
    • Sign in -> Connect
    • Data sample -> Load
    • Drag ID from the Fields pane on the right side of the screen
    • Drag CreatedAt
    • Drag CreatedAt into the ID square
  2. Blank Query for Power BI
    • Get Data -> Blank Query
    • Kusto Explorer -> Tools -> Query to Power BI (Query & PBI adaptor)
    • Connect -> Organizational account -> use your account
    • Click Advanced Editor -> delete everything -> paste everything
  3. MS-TDS (SQL) client for Power BI
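
If you prefer to feed the native connector a query instead of a whole table, a minimal sketch (assuming the GithubEvent table and CreatedAt column used elsewhere in this lab) could be pasted into the "Table name or Azure Data Explorer query" box:
// Daily GitHub event counts for the last 30 days - small enough for Import mode
GithubEvent
| where CreatedAt > ago(30d)
| summarize Events = count() by bin(CreatedAt, 1d)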

KQL-Results

Open: https://dataexplorer.azure.com/clusters/demo12.westus/databases/GitHub

  1. How many events are there in Repos that have ‘Azure’ in their name?
// 1. What is count of events in Repos that have ‘Azure’ word in their name?
GithubEvent
| where Repo has 'Azure'
| count  
  2. What is the total number of Repos?
// 2. What is the amount of the Repos overall?
GithubEvent
| summarize dcount(tostring(Repo))  
  3. What are the top 10 most watched Repos?
// 3. What are the top 10 most watched Repos?
GithubEvent
| where Type == 'WatchEvent'
| summarize count() by tostring(Repo.name)
| top 10 by count_   
  4. Plot the history of all events for the Repos from the answer to #3.
// 4. Plot the history of all events for the past 2 years for Repos coming out #3.
let repos = GithubEvent
| where Type == 'WatchEvent'
| summarize count() by name=tostring(Repo.name)
| top 10 by count_
| project name;
GithubEvent
| extend repo = tostring(Repo.name)
| where repo in (repos)
| summarize count() by bin(CreatedAt, 1d), repo
| render timechart
  5. Show the top 10 repos by WatchEvent, along with their WatchEvent count and their total event count (hint: use join)
// 5. Show top 10 repos with most Watch Event and their total count of events (hint: use join)
GithubEvent
| where Type == 'WatchEvent'
| summarize WatchCounts = count() by name=tostring(Repo.name)
| top 10 by WatchCounts 
| join hint.strategy = broadcast  
(
    GithubEvent 
    | extend repo = tostring(Repo.name)
    | summarize TotalEvents=count() by repo
) on $left.name == $right.repo
| project repo, TotalEvents, WatchCounts 
| order by TotalEvents  