Merge pull request #7119 from carlosasantos/update_batch_scoring_python
Update batch scoring with python architecture
PMEds28 committed Sep 15, 2022
2 parents cb5575a + 39407d2 commit c2d7894
Showing 4 changed files with 39 additions and 32 deletions.
Binary file modified docs/browse/thumbs/batch-scoring-python.png
Binary file modified docs/reference-architectures/ai/_images/batch-scoring-python.png
65 changes: 36 additions & 29 deletions docs/reference-architectures/ai/batch-scoring-python-content.md
@@ -1,49 +1,46 @@
This reference architecture shows how to build a scalable solution for batch scoring many models in parallel using [Azure Machine Learning][amls]. The solution can be used as a template and can generalize to different problems.

A reference implementation for this architecture is available on [GitHub][github].
This architecture guide shows how to build a scalable solution for batch scoring models with [Azure Machine Learning][amls]. The solution can be used as a template and generalizes to different problems.

## Architecture

![Diagram that shows the batch scoring of Python models on Azure.](./_images/batch-scoring-python.png)

### Workflow

This architecture consists of the following components:
This architecture guide applies to both streaming and static data, provided that the ingestion process is adapted to the data type. The following steps and components describe the ingestion of these two types of data.

**Streaming data:**

[Azure Event Hubs][event-hubs]. This message ingestion service can ingest millions of event messages per second. In this architecture, sensors send a stream of data to the event hub.
1. Streaming data originates from IoT sensors, which emit new events at frequent intervals.
2. Incoming streaming events are queued using Azure Event Hubs, and then pre-processed using Azure Stream Analytics.
- [Azure Event Hubs][event-hubs]. This message ingestion service can ingest millions of event messages per second. In this architecture, sensors send a stream of data to the event hub.
- [Azure Stream Analytics][stream-analytics]. An event-processing engine. A Stream Analytics job reads the data streams from the event hub and performs stream processing.

[Azure Stream Analytics][stream-analytics]. An event-processing engine. A Stream Analytics job reads the data streams from the event hub and performs stream processing.
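The windowed pre-processing that a Stream Analytics job performs on the event stream can be sketched in plain Python. This is illustrative only: the event shape `(timestamp, sensor_id, value)` and the 60-second tumbling window are assumptions, not part of the reference implementation.

```python
from collections import defaultdict

# Illustrative sketch of tumbling-window aggregation, similar in spirit to
# what a Stream Analytics job does before the data is stored. The window
# length and the event shape are assumptions for this example.
WINDOW_SECONDS = 60

def aggregate(events):
    """Group (timestamp, sensor_id, value) events into tumbling windows
    and return the mean value per sensor per window."""
    buckets = defaultdict(list)
    for ts, sensor_id, value in events:
        window_start = (ts // WINDOW_SECONDS) * WINDOW_SECONDS
        buckets[(window_start, sensor_id)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in buckets.items()}

events = [(0, "s1", 1.0), (30, "s1", 3.0), (75, "s1", 5.0)]
print(aggregate(events))  # {(0, 's1'): 2.0, (60, 's1'): 5.0}
```

A production job would express the same grouping declaratively in the Stream Analytics query language rather than in application code.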
**Static data:**

[Azure SQL Database][sql-database]. Data from the sensor readings is loaded into SQL Database. SQL is a familiar way to store the processed, streamed data (which is tabular and structured), but you can also use other data stores.
3. Static datasets can be stored as files within [Azure Data Lake Storage][adls] or in tabular form in [Azure Synapse][synapse] or [Azure SQL Database][sql].
4. [Azure Data Factory][adf] can be used to aggregate or pre-process the stored dataset.

[Azure Machine Learning][amls]. Azure Machine Learning is a cloud service for training, deploying, and managing machine learning models at scale. In the context of batch scoring, Azure Machine Learning creates a cluster of virtual machines with an automatic scaling option, where each node in the cluster runs a scoring job for a specific sensor. The scoring jobs are executed in parallel as steps of Python scripts that are queued and managed by the service. These steps are part of a Machine Learning pipeline that is created, published, and scheduled to run on a predefined interval of time.
The remaining architecture, after data ingestion, is the same for both streaming and static data, and consists of the following steps and components:

[Azure Blob Storage][storage]. Blob containers are used to store the pretrained models, the data, and the output predictions. The models are uploaded to Blob Storage in the [01_create_resources.ipynb][create-resources] notebook. These [one-class SVM][one-class-svm] models are trained on data that represents values of different sensors for different devices. This solution assumes that the data values are aggregated over a fixed interval of time.
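The train-then-score shape of these per-sensor models can be sketched with a stand-in: the reference uses pretrained one-class SVMs, but for a self-contained illustration a simple mean/standard-deviation threshold model works the same way. The 3-sigma cutoff and the helper names are assumptions for this sketch.

```python
import statistics

# Stand-in sketch for the per-sensor anomaly models. The reference
# architecture uses pretrained one-class SVMs; here a simple 3-sigma
# threshold model (an assumption for illustration) shows the same
# train-then-score flow.
def train(normal_values):
    """Fit the stand-in model on readings known to be normal."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)

def score(model, value, n_sigma=3.0):
    """Return -1 for an anomaly, 1 for a normal reading (mirroring the
    sklearn OneClassSVM predict() convention)."""
    mean, std = model
    return -1 if abs(value - mean) > n_sigma * std else 1

model = train([10.0, 10.5, 9.8, 10.2, 9.9])
print(score(model, 10.1), score(model, 25.0))  # 1 -1
```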
5. The ingested, aggregated and/or pre-processed data can be stored as documents within [Azure Data Lake Storage][adls] or in tabular form in [Azure Synapse][synapse] or [Azure SQL Database][sql]. This data will then be consumed by Azure Machine Learning.
6. [Azure Machine Learning][amls] is used for training, deploying, and managing machine learning models at scale. In the context of batch scoring, Azure Machine Learning creates a cluster of virtual machines with an automatic scaling option, where jobs are executed in parallel as Python scripts.
7. Models are deployed as [Managed Batch Endpoints][m-endpoints], which are then used to do batch inferencing on large volumes of data over a period of time. Batch endpoints receive pointers to data and run jobs asynchronously to process the data in parallel on compute clusters.
8. The inference results can be stored as documents within [Azure Data Lake Storage][adls] or in tabular form in [Azure Synapse][synapse] or [Azure SQL Database][sql].
9. Visualize: The stored model results can be consumed through user interfaces, such as Power BI dashboards, or through custom-built web applications.
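The fan-out in steps 6 and 7 — one scoring job per sensor, run in parallel, results gathered for storage — can be sketched locally with the standard library. The scoring function and sensor data below are placeholders; on Azure, the batch endpoint distributes this work across the compute cluster.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative sketch of the parallel fan-out performed by the compute
# cluster: one scoring job per sensor, results gathered for storage.
# score_sensor and the data dict are placeholders for this example.
def score_sensor(sensor_id, readings):
    mean = sum(readings) / len(readings)
    return sensor_id, mean  # a real job would invoke the deployed model here

data = {"s1": [1.0, 2.0, 3.0], "s2": [10.0, 20.0]}

with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(score_sensor, sid, vals) for sid, vals in data.items()]
    results = dict(f.result() for f in futures)

print(results)  # {'s1': 2.0, 's2': 15.0}
```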

[Azure Container Registry][acr]. The scoring Python [script][pyscript] runs in Docker containers that are created on each node of the cluster, where it reads the relevant sensor data, generates predictions and stores them in Blob Storage.

### Components

- [Azure Event Hubs](https://azure.microsoft.com/services/event-hubs)
- [Azure Stream Analytics](https://azure.microsoft.com/services/stream-analytics)
- [Azure SQL Database](https://azure.microsoft.com/products/azure-sql/database)
- [Azure Synapse Analytics](https://azure.microsoft.com/services/synapse-analytics/)
- [Azure Data Lake Storage](https://azure.microsoft.com/services/storage/data-lake-storage/)
- [Azure Data Factory](https://azure.microsoft.com/services/data-factory/)
- [Azure Machine Learning](https://azure.microsoft.com/services/machine-learning)
- [Azure Blob Storage](https://azure.microsoft.com/services/storage/blobs)
- [Azure Container Registry](https://azure.microsoft.com/services/container-registry)

## Solution details

This solution monitors the operation of a large number of devices in an IoT setting, where each device sends sensor readings continuously. Each device is associated with pretrained anomaly detection models (one per sensor) to predict whether a series of measurements, which are aggregated over a predefined time interval, correspond to an anomaly or not. In real-world scenarios, this could be a stream of sensor readings that need to be filtered and aggregated before being used in training or real-time scoring. For simplicity, this solution uses the same data file when executing scoring jobs.

### Potential use cases

This reference architecture is designed for scoring scenarios that are triggered on a schedule. Processing involves the following steps:

1. Send sensor readings for ingestion to Azure Event Hubs.
2. Perform stream processing and store the raw data.
3. Send the data to a Machine Learning cluster that's ready to start taking work. Each node in the cluster runs a scoring job for a specific sensor.
4. Execute the [scoring pipeline][batch-scoring], which runs the scoring jobs in parallel using machine learning Python scripts. The pipeline is created, published, and scheduled to run on a predefined interval of time.
5. Generate predictions and store them in Blob Storage for later consumption.
- [Azure Machine Learning Endpoints](https://docs.microsoft.com/azure/machine-learning/concept-endpoints)
- [Microsoft Power BI on Azure](https://azure.microsoft.com/services/developer-tools/power-bi/)
- [Azure Web Apps](https://azure.microsoft.com/services/app-service/web/)

## Considerations

@@ -73,15 +70,20 @@ For convenience in this scenario, one scoring task is submitted within a single

Cost optimization is about looking at ways to reduce unnecessary expenses and improve operational efficiencies. For more information, see [Overview of the cost optimization pillar](/azure/architecture/framework/cost/overview).

The most expensive components used in this reference architecture are the compute resources. The compute cluster size scales up and down depending on the jobs in the queue. Enable automatic scaling programmatically through the [Python SDK][python-sdk] by modifying the compute's provisioning configuration. Or, use the [Azure CLI][cli] to set the automatic scaling parameters of the cluster.
The most expensive components used in this architecture guide are the compute resources. The compute cluster size scales up and down depending on the jobs in the queue. Enable automatic scaling programmatically through the [Python SDK][python-sdk] by modifying the compute's provisioning configuration. Or, use the [Azure CLI][cli] to set the automatic scaling parameters of the cluster.

For work that doesn't require immediate processing, configure the automatic scaling formula so the default state (minimum) is a cluster of zero nodes. With this configuration, the cluster starts with zero nodes and only scales up when it detects jobs in the queue. If the batch scoring process happens only a few times a day or less, this setting enables significant cost savings.

Automatic scaling might not be appropriate for batch jobs that occur too close to each other. Because the time that it takes for a cluster to spin up and spin down incurs a cost, if a batch workload begins only a few minutes after the previous job ends, it might be more cost effective to keep the cluster running between jobs. This strategy depends on whether scoring processes are scheduled to run at a high frequency (every hour, for example), or less frequently (once a month, for example).
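The tradeoff above can be made concrete with a back-of-the-envelope comparison: scaling to zero saves the idle cost of the gap between jobs, but pays the spin-up time of the next job. All numbers below (rates, spin-up time) are illustrative assumptions, not Azure prices.

```python
# Back-of-the-envelope sketch of the keep-warm vs. scale-to-zero tradeoff.
# The rate and spin-up time are illustrative assumptions, not Azure prices.
def cost_scale_to_zero(gap_minutes, node_rate_per_min, spinup_minutes):
    # Pay nothing during the gap, but pay for spinning up the next job's node.
    return spinup_minutes * node_rate_per_min

def cost_keep_warm(gap_minutes, node_rate_per_min):
    # Pay for the idle node across the whole gap between jobs.
    return gap_minutes * node_rate_per_min

rate, spinup = 0.05, 8  # $/node-minute, minutes to provision a node
for gap in (5, 60):
    warm = cost_keep_warm(gap, rate)
    cold = cost_scale_to_zero(gap, rate, spinup)
    print(f"{gap} min gap:", "keep warm" if warm < cold else "scale to zero")
```

With these example numbers, a 5-minute gap favors keeping the cluster warm, while an hourly gap favors scaling to zero — matching the frequency-based guidance above.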

## Deploy this scenario

To deploy this reference architecture, follow the steps described in the [GitHub repo][github].
## Contributors

*This article is maintained by Microsoft. It was originally written by the following contributors.*

Principal author:

* [Carlos Alexandre Santos](https://www.linkedin.com/in/carlosafsantos) | Senior Specialized AI Cloud Solution Architect

## Next steps

@@ -131,3 +133,8 @@ Microsoft Learn modules:
[stream-analytics]: /azure/stream-analytics
[sql-database]: /azure/sql-database
[app-insights]: /azure/application-insights/app-insights-overview
[synapse]: https://azure.microsoft.com/services/synapse-analytics/
[adls]: https://azure.microsoft.com/services/storage/data-lake-storage/
[adf]: https://azure.microsoft.com/services/data-factory/
[m-endpoints]: https://docs.microsoft.com/azure/machine-learning/concept-endpoints
[sql]: https://azure.microsoft.com/products/azure-sql/database
6 changes: 3 additions & 3 deletions docs/reference-architectures/ai/batch-scoring-python.yml
@@ -3,9 +3,9 @@ metadata:
title: Batch scoring of Python models on Azure
titleSuffix: Azure Reference Architectures
description: Build a scalable solution for batch scoring models on a schedule in parallel using Azure Machine Learning.
author: EdPrice-MSFT
ms.author: architectures
ms.date: 07/28/2022
author: carlosasantos
ms.author: cfernandesdo
ms.date: 09/13/2022
ms.topic: conceptual
ms.service: architecture-center
ms.subservice: reference-architecture
