---
description: "Learn more about: Walkthrough: Using BatchBlock and BatchedJoinBlock to Improve Efficiency"
title: "Walkthrough: Using BatchBlock and BatchedJoinBlock to Improve Efficiency"
ms.date: 03/30/2017
dev_langs:
  - "csharp"
  - "vb"
helpviewer_keywords:
  - "Task Parallel Library, dataflows"
  - "TPL dataflow library, improving efficiency"
ms.topic: tutorial
---

Walkthrough: Using BatchBlock and BatchedJoinBlock to Improve Efficiency

The TPL Dataflow Library provides the BatchBlock<T> and BatchedJoinBlock<T1,T2> classes (both in the System.Threading.Tasks.Dataflow namespace) so that you can receive and buffer data from one or more sources and then propagate that buffered data as one collection. This batching mechanism is useful when you collect data from one or more sources and then process multiple data elements as a batch. For example, consider an application that uses dataflow to insert records into a database. This operation can be more efficient if multiple items are inserted at the same time instead of one at a time. This document describes how to use the BatchBlock<T> class to improve the efficiency of such database insert operations. It also describes how to use the BatchedJoinBlock<T1,T2> class to capture both the results and any exceptions that occur when the program reads from a database.

[!INCLUDE tpl-install-instructions]

Prerequisites

  1. Read the Join Blocks section in the Dataflow document before you start this walkthrough.

  2. Ensure that you have a copy of the Northwind database, Northwind.sdf, available on your computer. This file is typically located in the folder %Program Files%\Microsoft SQL Server Compact Edition\v3.5\Samples\.

    [!IMPORTANT] In some versions of Windows, you cannot connect to Northwind.sdf if Visual Studio is running in non-administrator mode. To connect to Northwind.sdf, start Visual Studio or a Developer Command Prompt for Visual Studio in Run as administrator mode.

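This walkthrough contains the following sections:

  - Creating the Console Application
  - Defining the Employee Class
  - Defining Employee Database Operations
  - Adding Employee Data to the Database Without Using Buffering
  - Using Buffering to Add Employee Data to the Database
  - Using Buffered Join to Read Employee Data from the Database
  - The Complete Example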

Creating the Console Application

  1. In Visual Studio, create a Visual C# or Visual Basic Console Application project. In this document, the project is named DataflowBatchDatabase.

  2. In your project, add a reference to System.Data.SqlServerCe.dll and a reference to System.Threading.Tasks.Dataflow.dll.

  3. Ensure that Program.cs (Module1.vb for Visual Basic) contains the following using (Imports in Visual Basic) statements.

    [!code-csharpTPLDataflow_BatchDatabase#1] [!code-vbTPLDataflow_BatchDatabase#1]

  4. Add the following data members to the Program class.

    [!code-csharpTPLDataflow_BatchDatabase#2] [!code-vbTPLDataflow_BatchDatabase#2]

Defining the Employee Class

Add the Employee class to the Program class.

[!code-csharpTPLDataflow_BatchDatabase#3] [!code-vbTPLDataflow_BatchDatabase#3]

The Employee class contains three properties: EmployeeID, LastName, and FirstName. These properties correspond to the Employee ID, Last Name, and First Name columns in the Employees table in the Northwind database. For this demonstration, the Employee class also defines the Random method, which creates an Employee object that has random values for its properties.
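As a rough illustration of the class described above, the following minimal sketch defines the three properties and a Random factory method. The name lists, the use of -1 as a placeholder identifier, and the EmployeeDemo class are assumptions made for this example; the sample's actual code may differ.

```csharp
using System;

// Illustrative Employee type; the name lists are invented for this sketch.
public class Employee
{
    public int EmployeeID { get; set; }
    public string LastName { get; set; }
    public string FirstName { get; set; }

    // Fully qualified because this class also has a member named Random.
    private static readonly System.Random rand = new System.Random();

    private static readonly string[] firstNames = { "Tom", "Mike", "Ruth", "Bob", "John" };
    private static readonly string[] lastNames = { "Jones", "Smith", "Johnson", "Walker" };

    // Creates an Employee with randomly chosen names. EmployeeID is set to -1
    // (an assumption) to mark a record without a database-assigned identifier.
    public static Employee Random()
    {
        return new Employee
        {
            EmployeeID = -1,
            LastName = lastNames[rand.Next(lastNames.Length)],
            FirstName = firstNames[rand.Next(firstNames.Length)]
        };
    }
}

public static class EmployeeDemo
{
    public static void Main()
    {
        Employee e = Employee.Random();
        Console.WriteLine("{0} {1} (ID {2})", e.FirstName, e.LastName, e.EmployeeID);
    }
}
```

System.Random is not thread-safe, so a sketch like this should create its random employees from a single thread.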

Defining Employee Database Operations

Add the InsertEmployees, GetEmployeeCount, and GetEmployeeID methods to the Program class.

[!code-csharpTPLDataflow_BatchDatabase#4] [!code-vbTPLDataflow_BatchDatabase#4]

The InsertEmployees method adds new employee records to the database. The GetEmployeeCount method retrieves the number of entries in the Employees table. The GetEmployeeID method retrieves the identifier of the first employee that has the provided name. Each of these methods takes a connection string to the Northwind database and uses functionality in the System.Data.SqlServerCe namespace to communicate with the database.

Adding Employee Data to the Database Without Using Buffering

Add the AddEmployees and PostRandomEmployees methods to the Program class.

[!code-csharpTPLDataflow_BatchDatabase#5] [!code-vbTPLDataflow_BatchDatabase#5]

The AddEmployees method adds random employee data to the database by using dataflow. It creates an ActionBlock<TInput> object that calls the InsertEmployees method to add one employee entry to the database at a time. The AddEmployees method then calls the PostRandomEmployees method to post multiple Employee objects to the ActionBlock<TInput> object, and finally waits for all insert operations to finish.
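The unbatched pattern described above can be sketched without a database. In this hypothetical example, InsertEmployees is an in-memory stand-in that only records how many records each call received, and PostEmployees replaces PostRandomEmployees with fixed names; neither is the sample's real implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks.Dataflow;

public class Employee
{
    public string LastName { get; set; }
    public string FirstName { get; set; }
}

public static class UnbatchedSketch
{
    // Hypothetical stand-in for the database: records the size of each insert call.
    public static readonly List<int> InsertCallSizes = new List<int>();

    static void InsertEmployees(Employee[] employees)
    {
        InsertCallSizes.Add(employees.Length);
    }

    // Posts 'count' employees with placeholder names to the target block.
    static void PostEmployees(ITargetBlock<Employee> target, int count)
    {
        for (int i = 0; i < count; i++)
        {
            target.Post(new Employee { FirstName = "First" + i, LastName = "Last" + i });
        }
    }

    public static void AddEmployees(int count)
    {
        // The block handles one Employee per invocation, so every item
        // costs one call to InsertEmployees.
        var insertEmployee = new ActionBlock<Employee>(
            e => InsertEmployees(new Employee[] { e }));

        PostEmployees(insertEmployee, count);

        // Signal that no more items will arrive, then wait for the inserts.
        insertEmployee.Complete();
        insertEmployee.Completion.Wait();
    }

    public static void Main()
    {
        AddEmployees(5);
        Console.WriteLine("Insert calls: {0}", InsertCallSizes.Count);
    }
}
```

Posting five employees results in five separate insert calls, which is the per-item overhead that batching is meant to reduce.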

Using Buffering to Add Employee Data to the Database

Add the AddEmployeesBatched method to the Program class.

[!code-csharpTPLDataflow_BatchDatabase#6] [!code-vbTPLDataflow_BatchDatabase#6]

This method resembles AddEmployees, except that it also uses the BatchBlock<T> class to buffer multiple Employee objects before it sends them to the ActionBlock<TInput> object. Because the BatchBlock<T> class propagates multiple elements as a collection, the ActionBlock<TInput> object is modified to act on an array of Employee objects. As in the AddEmployees method, AddEmployeesBatched calls the PostRandomEmployees method to post multiple Employee objects; however, AddEmployeesBatched posts these objects to the BatchBlock<T> object. The AddEmployeesBatched method also waits for all insert operations to finish.
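The batched variant can be sketched the same way. This example again assumes an in-memory InsertEmployees stand-in and placeholder employee names. It links the two blocks with the PropagateCompletion link option so that completing the BatchBlock<T> also completes the ActionBlock<TInput>; that is one possible way to wire completion, not necessarily the one the sample uses.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks.Dataflow;

public class Employee
{
    public string LastName { get; set; }
    public string FirstName { get; set; }
}

public static class BatchedSketch
{
    // Hypothetical stand-in for the database: records the size of each insert call.
    public static readonly List<int> InsertCallSizes = new List<int>();

    static void InsertEmployees(Employee[] employees)
    {
        InsertCallSizes.Add(employees.Length);
    }

    public static void AddEmployeesBatched(int batchSize, int count)
    {
        // Groups incoming Employee objects into arrays of up to batchSize items.
        var batchEmployees = new BatchBlock<Employee>(batchSize);

        // Receives whole arrays, so one insert call covers one batch.
        var insertEmployees = new ActionBlock<Employee[]>(a => InsertEmployees(a));

        // Forward batches and completion from the batch block to the action block.
        batchEmployees.LinkTo(insertEmployees,
            new DataflowLinkOptions { PropagateCompletion = true });

        for (int i = 0; i < count; i++)
        {
            batchEmployees.Post(new Employee { FirstName = "First" + i, LastName = "Last" + i });
        }

        // Completing the batch block flushes any remaining items as a final,
        // possibly smaller, batch.
        batchEmployees.Complete();
        insertEmployees.Completion.Wait();
    }

    public static void Main()
    {
        AddEmployeesBatched(4, 10);
        Console.WriteLine("Insert calls: {0} (sizes {1})",
            InsertCallSizes.Count, string.Join(", ", InsertCallSizes));
    }
}
```

With a batch size of 4 and 10 posted employees, the stand-in is called three times, with batches of 4, 4, and 2 items.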

Using Buffered Join to Read Employee Data from the Database

Add the GetRandomEmployees method to the Program class.

[!code-csharpTPLDataflow_BatchDatabase#7] [!code-vbTPLDataflow_BatchDatabase#7]

This method prints information about random employees to the console. It creates several random Employee objects and calls the GetEmployeeID method to retrieve the unique identifier for each object. Because the GetEmployeeID method throws an exception if there is no matching employee with the given first and last names, the GetRandomEmployees method uses the BatchedJoinBlock<T1,T2> class to store Employee objects for successful calls to GetEmployeeID and System.Exception objects for calls that fail. The ActionBlock<TInput> object in this example acts on a Tuple<T1,T2> object that holds a list of Employee objects and a list of Exception objects. The BatchedJoinBlock<T1,T2> object propagates this data when the sum of the received Employee and Exception object counts equals the batch size.
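The following sketch shows the same result-or-exception pattern with a BatchedJoinBlock<T1,T2>. The dictionary-backed GetEmployeeID, the Lookup method, and the report strings are assumptions for illustration; the sample queries the Northwind database instead.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks.Dataflow;

public class Employee
{
    public int EmployeeID { get; set; }
    public string LastName { get; set; }
}

public static class BatchedJoinSketch
{
    // Hypothetical in-memory table that stands in for the database.
    static readonly Dictionary<string, int> ids =
        new Dictionary<string, int> { { "Smith", 1 }, { "Jones", 2 } };

    // Throws when no employee has the given last name.
    static int GetEmployeeID(string lastName)
    {
        int id;
        if (!ids.TryGetValue(lastName, out id))
        {
            throw new InvalidOperationException("No employee named " + lastName);
        }
        return id;
    }

    public static List<string> Lookup(int batchSize, string[] lastNames)
    {
        var report = new List<string>();

        // Target1 collects successful results, Target2 collects failures.
        var selectEmployees = new BatchedJoinBlock<Employee, Exception>(batchSize);

        // Each batch arrives as a tuple of two lists.
        var summarize = new ActionBlock<Tuple<IList<Employee>, IList<Exception>>>(data =>
        {
            report.Add(string.Format("{0} found, {1} failed",
                data.Item1.Count, data.Item2.Count));
        });

        selectEmployees.LinkTo(summarize,
            new DataflowLinkOptions { PropagateCompletion = true });

        foreach (string name in lastNames)
        {
            try
            {
                int id = GetEmployeeID(name);
                selectEmployees.Target1.Post(new Employee { EmployeeID = id, LastName = name });
            }
            catch (Exception e)
            {
                selectEmployees.Target2.Post(e);
            }
        }

        selectEmployees.Complete();
        summarize.Completion.Wait();
        return report;
    }

    public static void Main()
    {
        var names = new[] { "Smith", "Nobody", "Jones", "Unknown", "Missing", "Jones" };
        foreach (string line in Lookup(3, names))
        {
            Console.WriteLine(line);
        }
    }
}
```

Because the batch size counts items across both targets, the six lookups above produce two batches of three: the first holds two employees and one exception, the second holds one employee and two exceptions.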

The Complete Example

The following example shows the complete code. The Main method compares the time required to perform batched database insertions with the time required to perform non-batched insertions. It also demonstrates the use of buffered join to read employee data from the database and report errors.
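A timing comparison of this kind can be approximated with System.Diagnostics.Stopwatch. In the sketch below, the database round trip is simulated by a fixed Thread.Sleep delay per insert call; that delay, the item counts, and the method names are assumptions, so the measured times only illustrate the shape of the comparison and say nothing about real database performance.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks.Dataflow;

public static class TimingSketch
{
    // Simulated cost of one database round trip (an assumption for this sketch).
    const int RoundTripMilliseconds = 5;

    static int insertCalls;

    // Stand-in for a database insert: one fixed delay per call, regardless of size.
    static void InsertEmployees(string[] names)
    {
        insertCalls++;
        Thread.Sleep(RoundTripMilliseconds);
    }

    // Inserts items one at a time and returns the number of insert calls.
    public static int RunUnbatched(int count)
    {
        insertCalls = 0;
        var insert = new ActionBlock<string>(n => InsertEmployees(new[] { n }));
        for (int i = 0; i < count; i++) insert.Post("Employee" + i);
        insert.Complete();
        insert.Completion.Wait();
        return insertCalls;
    }

    // Inserts items in batches and returns the number of insert calls.
    public static int RunBatched(int count, int batchSize)
    {
        insertCalls = 0;
        var batch = new BatchBlock<string>(batchSize);
        var insert = new ActionBlock<string[]>(names => InsertEmployees(names));
        batch.LinkTo(insert, new DataflowLinkOptions { PropagateCompletion = true });
        for (int i = 0; i < count; i++) batch.Post("Employee" + i);
        batch.Complete();
        insert.Completion.Wait();
        return insertCalls;
    }

    public static void Main()
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        int calls = RunUnbatched(20);
        stopwatch.Stop();
        Console.WriteLine("Unbatched: {0} calls, {1} ms", calls, stopwatch.ElapsedMilliseconds);

        stopwatch.Restart();
        calls = RunBatched(20, 10);
        stopwatch.Stop();
        Console.WriteLine("Batched:   {0} calls, {1} ms", calls, stopwatch.ElapsedMilliseconds);
    }
}
```

In this sketch the unbatched run makes 20 insert calls and the batched run makes 2, so any fixed per-call cost is paid a tenth as often.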

[!code-csharpTPLDataflow_BatchDatabase#100] [!code-vbTPLDataflow_BatchDatabase#100]

See also