title	description	author	ms.service	ms.subservice	ms.topic	ms.date	ms.author	ms.reviewer	ms.custom
Use the graph bulk executor .NET library with Azure Cosmos DB Gremlin API	Learn how to use the bulk executor library to massively import graph data into an Azure Cosmos DB Gremlin API container.	jasonwhowell	cosmos-db	cosmosdb-graph	how-to	05/28/2019	jasonh	sngun	devx-track-csharp

Using the graph bulk executor .NET library to perform bulk operations in Azure Cosmos DB Gremlin API

This tutorial shows how to use the Azure Cosmos DB bulk executor .NET library to import and update graph objects in an Azure Cosmos DB Gremlin API container. The process uses the Graph class in the bulk executor library to create Vertex and Edge objects programmatically and then insert several of them per network request. This behavior is configurable through the bulk executor library to make optimal use of both database and local memory resources.

When you send Gremlin queries to a database, each command is evaluated and then executed one at a time. With the bulk executor library, you instead create and validate the objects locally. After creating the objects, the library allows you to send graph objects to the database service sequentially. Using this method, data ingestion speeds can be increased up to 100x, which makes it an ideal method for initial data migrations or periodic data movement operations. Learn more by visiting the GitHub page of the Azure Cosmos DB Graph bulk executor sample application.

Bulk operations with graph data

The bulk executor library contains a Microsoft.Azure.CosmosDB.BulkExecutor.Graph namespace to provide functionality for creating and importing graph objects.

The following process outlines how data can be migrated into a Gremlin API container:

Retrieve records from the data source.
Construct GremlinVertex and GremlinEdge objects from the obtained records and add them to an IEnumerable data structure. If the data source is not a graph database, this is the part of the application where the logic to detect and add relationships should be implemented.
Use the Graph BulkImportAsync method to insert the graph objects into the collection.

This mechanism improves data migration efficiency compared to using a Gremlin client. Inserting data with Gremlin requires the application to send one query at a time, and each query must be validated, evaluated, and then executed to create the data. The bulk executor library handles the validation in the application and sends multiple graph objects per network request.
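
As a rough illustration of the relationship-detection logic mentioned in step 2, the sketch below turns a foreign key in tabular rows into edge descriptions. The `EmployeeRow` and `EdgeRecord` types, the `reportsTo` label, and the edge ID format are hypothetical and not part of the bulk executor library. In a real migration, each `EdgeRecord` would then be mapped to a `GremlinEdge` object.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical source row: an employee with an optional manager reference.
public sealed class EmployeeRow
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string ManagerId { get; set; } // null when there is no manager
}

// Hypothetical intermediate shape for an edge before it becomes a GremlinEdge.
public sealed class EdgeRecord
{
    public string Id { get; set; }
    public string Label { get; set; }
    public string SourceId { get; set; }
    public string TargetId { get; set; }
}

public static class RelationshipDetection
{
    // Turns each non-empty ManagerId foreign key into a "reportsTo" edge
    // that points from the employee to the manager.
    public static List<EdgeRecord> DetectEdges(IEnumerable<EmployeeRow> rows)
    {
        return rows
            .Where(r => !string.IsNullOrEmpty(r.ManagerId))
            .Select(r => new EdgeRecord
            {
                Id = r.Id + "-reportsTo-" + r.ManagerId,
                Label = "reportsTo",
                SourceId = r.Id,
                TargetId = r.ManagerId
            })
            .ToList();
    }
}
```

Rows without a manager produce a vertex only, so the number of edges can be smaller than the number of rows.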

Creating Vertices and Edges

GraphBulkExecutor provides the BulkImportAsync method, which requires an IEnumerable of GremlinVertex or GremlinEdge objects, both defined in the Microsoft.Azure.CosmosDB.BulkExecutor.Graph.Element namespace. In the sample, we separated the edges and vertices into two BulkExecutor import tasks. See the example below:

IBulkExecutor graphbulkExecutor = new GraphBulkExecutor(documentClient, targetCollection);
// Initialize the executor before the first import call
await graphbulkExecutor.InitializeAsync();

BulkImportResponse vResponse = null;
BulkImportResponse eResponse = null;

try
{
    // Import a list of GremlinVertex objects
    vResponse = await graphbulkExecutor.BulkImportAsync(
            Utils.GenerateVertices(numberOfDocumentsToGenerate),
            enableUpsert: true,
            disableAutomaticIdGeneration: true,
            maxConcurrencyPerPartitionKeyRange: null,
            maxInMemorySortingBatchSize: null,
            cancellationToken: token);

    // Import a list of GremlinEdge objects
    eResponse = await graphbulkExecutor.BulkImportAsync(
            Utils.GenerateEdges(numberOfDocumentsToGenerate),
            enableUpsert: true,
            disableAutomaticIdGeneration: true,
            maxConcurrencyPerPartitionKeyRange: null,
            maxInMemorySortingBatchSize: null,
            cancellationToken: token);
}
catch (DocumentClientException de)
{
    Trace.TraceError("Document client exception: {0}", de);
}
catch (Exception e)
{
    Trace.TraceError("Exception: {0}", e);
}

For more information on the parameters of the bulk executor library, refer to the BulkImportData to Azure Cosmos DB topic.
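
Once an import call returns, it can be useful to log how much was written and at what rate. The helper below is a minimal sketch with a hypothetical name and output format. It takes plain values instead of the response object, so it does not depend on the library. It assumes the response exposes `NumberOfDocumentsImported`, `TotalRequestUnitsConsumed`, and `TotalTimeTaken`, as described in the bulk executor documentation.

```csharp
using System;
using System.Globalization;

public static class ImportSummary
{
    // Formats the document count, the elapsed time, and the derived
    // per-second write and request unit (RU) rates for one import call.
    public static string Describe(
        string kind, long documentsImported, double requestUnits, TimeSpan elapsed)
    {
        // Guard against a zero duration so the rates stay finite.
        double seconds = Math.Max(elapsed.TotalSeconds, 0.001);
        return string.Format(
            CultureInfo.InvariantCulture,
            "{0}: {1} documents in {2:F1} s ({3:F0} writes/s, {4:F0} RU/s)",
            kind,
            documentsImported,
            elapsed.TotalSeconds,
            documentsImported / seconds,
            requestUnits / seconds);
    }
}
```

Under those assumptions, a call could look like `ImportSummary.Describe("vertices", vResponse.NumberOfDocumentsImported, vResponse.TotalRequestUnitsConsumed, vResponse.TotalTimeTaken)`, guarded by a null check because `vResponse` stays `null` when the import throws.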

The payload needs to be instantiated into GremlinVertex and GremlinEdge objects. Here is how these objects can be created:

Vertices:

// Creating a vertex
GremlinVertex v = new GremlinVertex(
    "vertexId",
    "vertexLabel");

// Adding custom properties to the vertex
v.AddProperty("customProperty", "value");

// Partitioning keys must be specified for all vertices
v.AddProperty("partitioningKey", "value");

Edges:

// Creating an edge
GremlinEdge e = new GremlinEdge(
    "edgeId",
    "edgeLabel",
    "targetVertexId",
    "sourceVertexId",
    "targetVertexLabel",
    "sourceVertexLabel",
    "targetVertexPartitioningKey",
    "sourceVertexPartitioningKey");

// Adding custom properties to the edge
e.AddProperty("customProperty", "value");

Note

The bulk executor utility doesn't automatically check for existing Vertices before adding Edges. This needs to be validated in the application before running the BulkImport tasks.
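
One way to perform this check in the application is to compare each edge's endpoints against the set of vertex IDs that are known to exist. The sketch below is hypothetical and library-independent: it works on any edge representation through two selector functions, and it assumes `knownVertexIds` already contains both the vertices in the current import and any that exist in the target graph.

```csharp
using System;
using System.Collections.Generic;

public static class EdgeValidation
{
    // Splits edges into those whose two endpoints are both known vertex IDs
    // and those that reference at least one missing vertex.
    public static void SplitByKnownVertices<TEdge>(
        IEnumerable<TEdge> edges,
        ISet<string> knownVertexIds,
        Func<TEdge, string> sourceId,
        Func<TEdge, string> targetId,
        out List<TEdge> valid,
        out List<TEdge> orphaned)
    {
        valid = new List<TEdge>();
        orphaned = new List<TEdge>();

        foreach (TEdge edge in edges)
        {
            bool bothExist = knownVertexIds.Contains(sourceId(edge))
                          && knownVertexIds.Contains(targetId(edge));
            (bothExist ? valid : orphaned).Add(edge);
        }
    }
}
```

Only the `valid` list would then be converted to `GremlinEdge` objects and imported. The `orphaned` list can be logged or retried after the missing vertices are created.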

Sample application

Prerequisites

Visual Studio 2019 with the Azure development workload. You can get started with the Visual Studio 2019 Community Edition for free.
An Azure subscription. You can create a free Azure account here. Alternatively, you can create a Cosmos database account with Try Azure Cosmos DB for free without an Azure subscription.
An Azure Cosmos DB Gremlin API database with an unlimited collection. This guide shows how to get started with Azure Cosmos DB Gremlin API in .NET.
Git. For more information check out the Git Downloads page.

Clone the sample application

In this tutorial, we'll walk through the steps for getting started with the Azure Cosmos DB Graph bulk executor sample hosted on GitHub. This application consists of a .NET solution that randomly generates vertex and edge objects and then executes bulk insertions to the specified graph database account. To get the application, run the git clone command below:

git clone https://github.com/Azure-Samples/azure-cosmosdb-graph-bulkexecutor-dotnet-getting-started.git

This repository contains the GraphBulkExecutor sample with the following files:

File	Description
`App.config`	This is where the application and database-specific parameters are specified. This file should be modified first to connect to the destination database and collections.
`Program.cs`	This file contains the logic for creating the `DocumentClient` instance, handling the cleanups, and sending the bulk executor requests.
`Util.cs`	This file contains a helper class with the logic for generating test data and for checking whether the database and collections exist.

In the App.config file, the following are the configuration values that can be provided:

Setting	Description
`EndPointUrl`	This is your .NET SDK endpoint found in the Overview blade of your Azure Cosmos DB Gremlin API database account. This has the format of `https://your-graph-database-account.documents.azure.com:443/`
`AuthorizationKey`	This is the Primary or Secondary key listed under your Azure Cosmos DB account. Learn more about Securing Access to Azure Cosmos DB data
`DatabaseName`, `CollectionName`	These are the target database and collection names. When `ShouldCleanupOnStart` is set to `true` these values, along with `CollectionThroughput`, will be used to drop them and create a new database and collection. Similarly, if `ShouldCleanupOnFinish` is set to `true`, they will be used to delete the database as soon as the ingestion is over. Note that the target collection must be an unlimited collection.
`CollectionThroughput`	This is used to create a new collection if the `ShouldCleanupOnStart` option is set to `true`.
`ShouldCleanupOnStart`	This will drop the database and collections before the program is run, and then create new ones with the `DatabaseName`, `CollectionName` and `CollectionThroughput` values.
`ShouldCleanupOnFinish`	This will drop the database and collections with the specified `DatabaseName` and `CollectionName` after the program is run.
`NumberOfDocumentsToImport`	This will determine the number of test vertices and edges that will be generated in the sample. This number will apply to both vertices and edges.
`NumberOfBatches`	This will determine the number of batches in which the generated test vertices and edges are imported.
`CollectionPartitionKey`	This will be used to create the test vertices and edges, where this property will be auto-assigned. This will also be used when re-creating the database and collections if the `ShouldCleanupOnStart` option is set to `true`.
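
To catch a missing or malformed value before any database call is made, the settings can be read and validated up front. The following is a minimal sketch, not the sample's own code. The `SampleSettings` and `SettingsReader` types are hypothetical, and the sketch assumes the values are stored as `appSettings` entries under the names in the table above.

```csharp
using System;
using System.Collections.Specialized;
using System.Globalization;

// Hypothetical typed view over a subset of the App.config values.
public sealed class SampleSettings
{
    public Uri Endpoint { get; set; }
    public string DatabaseName { get; set; }
    public string CollectionName { get; set; }
    public int CollectionThroughput { get; set; }
    public bool ShouldCleanupOnStart { get; set; }
    public bool ShouldCleanupOnFinish { get; set; }
}

public static class SettingsReader
{
    // Reads and parses the settings, failing fast on a missing key.
    public static SampleSettings Read(NameValueCollection appSettings)
    {
        return new SampleSettings
        {
            Endpoint = new Uri(Require(appSettings, "EndPointUrl")),
            DatabaseName = Require(appSettings, "DatabaseName"),
            CollectionName = Require(appSettings, "CollectionName"),
            CollectionThroughput = int.Parse(
                Require(appSettings, "CollectionThroughput"),
                CultureInfo.InvariantCulture),
            ShouldCleanupOnStart = bool.Parse(Require(appSettings, "ShouldCleanupOnStart")),
            ShouldCleanupOnFinish = bool.Parse(Require(appSettings, "ShouldCleanupOnFinish"))
        };
    }

    private static string Require(NameValueCollection settings, string key)
    {
        string value = settings[key];
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException("Missing App.config setting: " + key);
        }
        return value;
    }
}
```

In a .NET Framework application, `ConfigurationManager.AppSettings` from `System.Configuration` is a `NameValueCollection` and could be passed to `SettingsReader.Read` directly.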

Run the sample application

Add your specific database configuration parameters in App.config. This will be used to create a DocumentClient instance. If the database and container have not been created yet, they will be created automatically.
Run the application. This will call BulkImportAsync twice, once to import Vertices and once to import Edges. If any object generates an error when it's inserted, it will be added to either .\BadVertices.txt or .\BadEdges.txt.
Evaluate the results by querying the graph database. If the ShouldCleanupOnFinish option is set to true, then the database will automatically be deleted.

Next steps

To learn about NuGet package details and release notes of bulk executor .NET library, see bulk executor SDK details.
Check out the Performance Tips to further optimize the usage of bulk executor.
Review the BulkExecutor.Graph Reference article for more details about the classes and methods defined in this namespace.
