Streaming at Scale with Azure Event Hubs and Stream Analytics
This sample uses Stream Analytics to process streaming data from Event Hubs, and uses another Event Hub as a sink to store JSON data.
This is the most performant way to analyze and stream data out of Stream Analytics.
The provided scripts will create an end-to-end solution, complete with a load test client.
Running the Scripts
Please note that the scripts have been tested on Ubuntu 18.04 LTS, so make sure to use that environment to run them. You can run them using Docker, WSL, or a VM.
The following tools/languages are also needed:
Make sure you are logged into your Azure account:
az login
Also make sure you have selected the subscription you want to use:
az account list
If you want to select a specific subscription, use the following command:
az account set --subscription <subscription_name>
Once you have selected the subscription you want to use, execute the following command:
./create-solution.sh -d <solution_name>
The solution_name value will be used to create a resource group that will contain all resources created by the script. It will also be used as a prefix for every resource created, so, to help avoid name duplicates that would break the script, you may want to generate a name using a unique prefix. Please also use only lowercase letters and numbers, since solution_name is also used to create a storage account, and storage account names must be 3 to 24 characters long and may contain only lowercase letters and numbers.
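Before running the script, you can check a candidate name against the storage account rules. The sketch below uses a hypothetical helper, is_valid_solution_name, which is not part of create-solution.sh; it only checks the storage account limits (3 to 24 lowercase letters and digits). If the script appends characters to the name when deriving resource names, the usable length may be shorter than 24.

```shell
#!/bin/sh
# Hypothetical helper, not part of create-solution.sh.
# Succeeds if the name meets the storage account naming rules:
# 3 to 24 characters, lowercase ASCII letters and digits only.
is_valid_solution_name() {
  # Reject any character outside the explicit allowed set
  # (an explicit list avoids locale-dependent a-z ranges).
  case "$1" in
    *[!abcdefghijklmnopqrstuvwxyz0123456789]*) return 1 ;;
  esac
  # Enforce the length limits (this also rejects the empty string).
  [ "${#1}" -ge 3 ] && [ "${#1}" -le 24 ]
}

for candidate in "mystream01" "My-Stream" "ab"; do
  if is_valid_solution_name "$candidate"; then
    echo "$candidate: ok"
  else
    echo "$candidate: invalid"
  fi
done
```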
To get an overview of all the supported arguments, just run:
Note: To make name collisions unlikely, you should use a random string to name your solution. The following script will generate a random 7-letter lowercase name for you:
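As a stand-in for that script, here is a minimal POSIX shell sketch; the function name generate_solution_name is hypothetical and this is not necessarily how the original script works. It reads random bytes from /dev/urandom, keeps only lowercase ASCII letters, and repeats until it has collected seven of them.

```shell
#!/bin/sh
# Hypothetical stand-in: print 7 random lowercase ASCII letters.
generate_solution_name() {
  name=""
  while [ "${#name}" -lt 7 ]; do
    # Read 64 random bytes; LC_ALL=C makes tr treat a-z as a byte range
    # and -dc deletes every byte that is not a lowercase letter.
    chunk=$(dd if=/dev/urandom bs=64 count=1 2>/dev/null | LC_ALL=C tr -dc 'a-z')
    name="$name$chunk"
  done
  # Keep only the first 7 letters collected.
  printf '%s\n' "$name" | cut -c1-7
}

SOLUTION_NAME=$(generate_solution_name)
echo "Generated solution name: $SOLUTION_NAME"
# Then, for example: ./create-solution.sh -d "$SOLUTION_NAME"
```

With 26 letters in 7 positions there are 26^7 (about 8 billion) possible names, which keeps accidental collisions unlikely.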
The script will create the following resources:
- Azure Container Instances to host the Locust load test clients: by default, two Locust clients will be created, generating a load of 1000 events/second
- Event Hubs Namespace, Hub and Consumer Group: to ingest data incoming from test clients
- Stream Analytics: to process analytics on streaming data
If you want to change some settings of the solution, like the number of load test clients, the Event Hubs throughput units (TUs), and so on, you can do it right in the create-solution.sh script by changing any of these values:
export EVENTHUB_PARTITIONS=2
export EVENTHUB_CAPACITY=2
export PROC_STREAMING_UNITS=3
export TEST_CLIENTS=2
The above settings have been chosen to sustain a 1000 msg/sec stream.
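If you prefer not to edit create-solution.sh by hand, sed can update a single setting. The function below, set_solution_setting, is a hypothetical helper; it assumes each setting appears at the start of its own line in the form `export NAME=value`, that values are plain numbers (a `/` in the value would break the sed expression), and it keeps a copy of the original file with a `.bak` suffix.

```shell
#!/bin/sh
# Hypothetical helper: rewrite one "export NAME=value" line in place.
set_solution_setting() {
  file=$1 setting=$2 value=$3
  # Fail loudly instead of silently doing nothing if the setting is absent.
  if ! grep -q "^export ${setting}=" "$file"; then
    echo "setting ${setting} not found in ${file}" >&2
    return 1
  fi
  # -i.bak edits in place (GNU and BSD sed) and saves the original as FILE.bak.
  sed -i.bak "s/^export ${setting}=.*/export ${setting}=${value}/" "$file"
}

# Example: raise the number of Locust test clients to four.
# set_solution_setting create-solution.sh TEST_CLIENTS 4
```

Keep in mind that the default values were chosen together for a 1000 msg/sec stream, so changing one of them may call for adjusting the others.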
Please use the Metrics pane in Stream Analytics: check "Input/Output Events" for throughput and the "Watermark Delay" metric to see whether the job is keeping up with the input rate. You can also use the Event Hubs "Metrics" pane to see if there are any "Throttled Requests" and adjust the Throughput Units accordingly.
The deployed Stream Analytics solution doesn't do any analytics or projection; these will be added in a .
Processed data is available in the output Event Hub created by the script.