[cosmosdb] Upgrade SDK to v4 (#1449)
* updated to latest recommended java sdk
* cleaned up binding implementation based on latest sdk
* added a default user agent string of 'azurecosmos-ycsb'

Co-authored-by: Andrew Feldman <Andrew.Feldman@microsoft.com>
armaansood and anfeldma-ms committed Dec 5, 2020
1 parent 5dccbb0 commit 261cc78
Showing 6 changed files with 480 additions and 350 deletions.
1 change: 1 addition & 0 deletions azurecosmos/.gitignore
@@ -0,0 +1 @@
/bin/
94 changes: 55 additions & 39 deletions azurecosmos/README.md
@@ -16,19 +16,19 @@ permissions and limitations under the License. See accompanying
LICENSE file.
-->

## Azure Cosmos Quick Start
## Azure Cosmos DB Quick Start

This section describes how to run YCSB on Azure Cosmos.
This section describes how to run YCSB on Azure Cosmos DB.

For more information on Azure Cosmos see
https://azure.microsoft.com/services/cosmos-db/
For more information on Azure Cosmos DB see
https://azure.microsoft.com/services/cosmos-db/.

### 1. Setup
This benchmark expects you to have pre-created the database "ycsb" and
collection "usertable" before running the benchmark commands. When
prompted for a Partition Key use id and for RUs select a value you
collection "usertable" before running the commands. When
prompted for a Partition Key, use "id". For RUs, select a value you
want to benchmark. [RUs are the measure of provisioned throughput](https://docs.microsoft.com/azure/cosmos-db/request-units)
that Azure Cosmos defines. The higher the RUs the more throughput you will
that Azure Cosmos DB defines. The higher the RUs, the more throughput you will
get. You can override the default database name with the
azurecosmos.databaseName configuration value for side-by-side
benchmarking.
@@ -40,17 +40,15 @@ You must set the uri and the primaryKey in the azurecosmos.properties file in th
Optionally you can set the uri and primaryKey as follows:
$YCSB_HOME/bin/ycsb load azurecosmos -P workloads/workloada -p azurecosmos.primaryKey=<key from the portal> -p azurecosmos.uri=<uri from the portal>
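As a sketch of the command-line override above, the following shell function assembles the load command from environment variables, so the key does not have to be typed inline. The names `COSMOS_URI`, `COSMOS_KEY` and `build_load_cmd` are assumptions made for this illustration; YCSB itself does not define them.

```shell
#!/bin/sh
# Hypothetical helper: prints the YCSB load command using credentials taken
# from environment variables. COSMOS_URI and COSMOS_KEY are assumed names.
build_load_cmd() {
  # Fail with a message if either credential is missing.
  uri="${COSMOS_URI:?set COSMOS_URI to the account URI from the portal}"
  key="${COSMOS_KEY:?set COSMOS_KEY to the primary key from the portal}"
  # Fall back to the current directory when YCSB_HOME is not set.
  ycsb_home="${YCSB_HOME:-.}"
  printf '%s/bin/ycsb load azurecosmos -P workloads/workloada -p azurecosmos.primaryKey=%s -p azurecosmos.uri=%s\n' \
    "$ycsb_home" "$key" "$uri"
}
```

The function only prints the command, so it can be inspected before it is run by hand.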

### 2. DocumenDB Configuration Parameters
### 2. Cosmos DB Configuration Parameters

#### Required parameters

- azurecosmos.uri < uri string > :
- Obtained from the portal and gives a path to your azurecosmos database
account. It will look like the following:
https://<your account name>.documents.azure.com:443/
- Path to your Azure Cosmos DB account and can be obtained from the portal. It will look like the following: https://<your account name>.documents.azure.com:443/

- azurecosmos.primaryKey < key string > :
- Obtained from the portal and is the key to use for benchmarking. The
- Obtained from the portal. The
primary key is used to allow both read & write operations. If you are
doing read only workloads you can substitute the readonly key from the
portal.
@@ -66,41 +64,59 @@ Optionally you can set the uri and primaryKey as follows:
false and a document already exists the insert will fail.
- Default: false

- azurecosmos.connectionMode (DirectHttps | Gateway):
- Some java operations only work when connecting via the gateway. However
the best performance for basic operations like those used by YCSB are
obtained by using direct more where the client will connect directly to the
master server thats is managing the database and collection.
- Default: DirectHttps
- azurecosmos.includeExceptionStackInLog (true | false):
- Determines if the full stack should be included in the log when an error happens.
- The default is false to reduce output size.
- Default: false

- azurecosmos.userAgent < agent string >:
- The value to be appended to the user-agent header.
- In most cases, you should leave this as "azurecosmos-ycsb".
- Default: "azurecosmos-ycsb"

- azurecosmos.useGateway (true | false):
- Specify if connection mode should use gateway as opposed to direct. By default, direct mode will be used, as the performance is generally better.
- Default: false

- azurecosmos.consistencyLevel (Strong | BoundedStaleness | Session | Eventual):
- This setting defined the level on consistency you want for reads/scans
following inserts/updates.
- Default: Session
- azurecosmos.consistencyLevel (STRONG | BOUNDED_STALENESS | SESSION | CONSISTENT_PREFIX | EVENTUAL):
- If not specified, session level will be used by default.
- Default: SESSION

- azurecosmos.maxRetryAttemptsOnThrottledRequests < integer >
- Sets the maximum number of retry attempts for throttled requests
- Set the maximum number of retries in the case where the request fails due to rate limiting.
- Default: uses default value of azurecosmos Java SDK

- azurecosmos.maxRetryWaitTimeInSeconds < integer >
- Sets the maximum timeout to for retry in seconds
- Sets the maximum wait time for retries, in seconds.
- Default: uses default value of azurecosmos Java SDK

- azurecosmos.maxDegreeOfParallelismForQuery < integer >
- Sets the maximum degree of parallelism for the FeedOptions used in Query operation

- azurecosmos.gatewayMaxConnectionPoolSize < integer >
- Set the value of the connection pool size in gateway mode.

- azurecosmos.directMaxConnectionsPerEndpoint < integer >
- Set the value of the max connections per endpoint in direct mode.

- azurecosmos.gatewayIdleConnectionTimeoutInSeconds < integer >
- Sets the value of the timeout in seconds for an idle connection in gateway mode. After that time, the connection will be automatically closed.
- Default: uses default value of azurecosmos Java SDK

- azurecosmos.directIdleConnectionTimeoutInSeconds < integer >
- Sets the value of the timeout in seconds for an idle connection in direct mode. After that time, the connection will be automatically closed.
- Default: uses default value of azurecosmos Java SDK


- azurecosmos.maxDegreeOfParallelism < integer >
- Sets the number of concurrent operations run client side during parallel query execution.
- Default: -1

- azurecosmos.maxBufferedItemCount < integer >
- Sets the maximum number of items that can be buffered client side during parallel query execution.
- Default: 0

- azurecosmos.preferredPageSize < integer >
- Sets the preferred page size when scanning.
- Default: -1

- azurecosmos.includeExceptionStackInLog (true | false):
- Determines if the full stack when and error happens should be included in the log.
The default is false to reduce a lot of log spew.

- azurecosmos.maxConnectionPoolSize < integer >
- This is the number of connections maintained for operations.
- See the JAVA SDK documentation for ConnectionPolicy.getMaxPoolSize

- azurecosmos.idleConnectionTimeout < integer >
- This value is in seconds and determines how quickly a connection is recycled.
- See the JAVA SDK documentation for ConnectionPolicy.setIdleConnectionTimeout.

These parameters are also defined in a template configuration file in the
following location:
@@ -109,4 +125,4 @@ following location:
### 3. FAQs

### 4. Example command
./bin/ycsb run azurecosmos -s -P workloads/workloadb -p azurecosmos.primaryKey=<your key eg:45fgt...==> -p azurecosmos.uri=https://<your account>.documents.azure.com:443/ -p recordcount=100 -p operationcount=100
./bin/ycsb run azurecosmos -P workloads/workloadc -p azurecosmos.primaryKey=<your key eg:45fgt...==> -p azurecosmos.uri=https://<your account>.documents.azure.com:443/ -p recordcount=100 -p operationcount=100
90 changes: 59 additions & 31 deletions azurecosmos/conf/azurecosmos.properties
@@ -13,44 +13,72 @@
# permissions and limitations under the License. See accompanying
# LICENSE file.

# Azure Cosmos host uri (ex: https://p3rf.documents.azure.com:443/) and primary key
#azurecosmos.primaryKey =
#azurecosmos.uri =
# See https://docs.microsoft.com/en-us/azure/cosmos-db/performance-tips-java-sdk-v4-sql for details on some of the options below.

# Databse to be used, if not specified 'ycsb' will be used
#azurecosmos.databaseName = ycsb
# Azure Cosmos DB host uri (ex: https://p3rf.documents.azure.com:443/) and primary key.
# azurecosmos.primaryKey =
# azurecosmos.uri =

# Enable/disable the use of single collection, if not specified a single collection will be used by default
# "true" or "false"
#azurecosmos.useSinglePartitionCollection = true
# Database to be used, if not specified 'ycsb' will be used.
# azurecosmos.databaseName = ycsb

# Specify if upsert should be used instead of createDocument
# If not specified, createDocument will be used by default
# Set to true to allow inserts to update existing documents.
# If this is false and a document already exists, the insert will fail.
# "true" or "false"
#azurecosmos.useUpsert = false
# azurecosmos.useUpsert = false

# Determines if the full stack should be included in the log when an error happens.
# The default is false to reduce output size.
# azurecosmos.includeExceptionStackInLog = false

# The value to be appended to the user-agent header.
# In most cases, you should leave this as "azurecosmos-ycsb".
# azurecosmos.userAgent = azurecosmos-ycsb

# CONNECTION OPTIONS

# Specify if connection mode should use gateway as opposed to direct.
# By default, direct mode will be used, as the performance is generally better.
# Value can be true or false.
# azurecosmos.useGateway = false

# Specify consistency level.
# If not specified, session level will be used by default.
# Values can be STRONG, BOUNDED_STALENESS, SESSION, CONSISTENT_PREFIX, or EVENTUAL.
# azurecosmos.consistencyLevel = SESSION

# Set the maximum number of retries in the case where the request fails due to rate limiting.
# If not specified, default values will be used.
# azurecosmos.maxRetryAttemptsOnThrottledRequests = 0

# Set the maximum retry duration in seconds.
# azurecosmos.maxRetryWaitTimeInSeconds = 30

# Set the value of the connection pool size in gateway mode.
# azurecosmos.gatewayMaxConnectionPoolSize = 30

# Set the value of the max connections per endpoint in direct mode.
# azurecosmos.directMaxConnectionsPerEndpoint = -1

# Sets the value of the timeout in seconds for an idle connection in gateway mode. After that time, the connection will be automatically closed.
# The default is 60 seconds.
# azurecosmos.gatewayIdleConnectionTimeoutInSeconds = 60

# Specify if connection policy should use gateway or not
# If not specified, direct connectivity with better performance will be used by default
# Value can be DirectHttps or Gateway.
#azurecosmos.connectionMode = DirectHttps
# Sets the value of the timeout in seconds for an idle connection in direct mode. After that time, the connection will be automatically closed.
# The default is 60 seconds.
# azurecosmos.directIdleConnectionTimeoutInSeconds = 60

# Specify consistency level, values can be Strong, BoundedStaleness, Session or Eventual
# If not specified, Session will be used by default
azurecosmos.consistencyLevel = Session

# Specify retry options to use in case of throttled request.
# If not specified, default values will be used
#azurecosmos.maxRetryAttemptsOnThrottledRequests = 9
#azurecosmos.maxRetryWaitTimeInSeconds = 30
# QUERY OPTIONS

# Specify if hash query should be used in SCAN operation instead of range query.
# If not specified, range query will be used by default.
#azurecosmos.useHashQueryForScan = true
# Sets the number of concurrent operations run client side during parallel query execution.
# Default value is 0.
# azurecosmos.maxDegreeOfParallelism = 0

# Specify if the 'id' property should be used in SCAN operation.
# If not specified, the 'docid' property will be used by default.
#azurecosmos.useIdPropertyForScan = true
# Sets the maximum number of items that can be buffered client side during parallel query execution.
# Default value is 0.
# azurecosmos.maxBufferedItemCount = 0

# Specify the maximum degree of parallelism for the FeedOptions used in Query operation.
# If not specified it will take 0 as the default value.
#azurecosmos.maxDegreeOfParallelismForQuery = 0
# Sets the preferred page size when scanning.
# Default value is -1.
# azurecosmos.preferredPageSize = -1
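Rather than editing the template in place, a run-specific overrides file can be generated. The sketch below writes three of the options shown above with the same values as the template. The function name and the file path are illustrative, and passing the file through a second `-P` flag assumes your YCSB version accepts more than one properties file.

```shell
#!/bin/sh
# Hypothetical helper: writes a small overrides file to the path given as $1.
# The option names and values are the ones shown in the template above.
write_overrides() {
  cat > "$1" <<'EOF'
azurecosmos.useGateway = false
azurecosmos.consistencyLevel = SESSION
azurecosmos.maxRetryWaitTimeInSeconds = 30
EOF
}
```

A possible use is `write_overrides /tmp/cosmos-overrides.properties`, followed by adding `-P /tmp/cosmos-overrides.properties` to the `ycsb run` command.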
34 changes: 3 additions & 31 deletions azurecosmos/pom.xml
@@ -28,15 +28,11 @@ LICENSE file.
<artifactId>azurecosmos-binding</artifactId>
<name>Azure Cosmos Binding</name>
<packaging>jar</packaging>

<properties>
<checkstyle.failOnViolation>false</checkstyle.failOnViolation>
</properties>


<dependencies>
<dependency>
<groupId>com.microsoft.azure</groupId>
<artifactId>azure-documentdb</artifactId>
<groupId>com.azure</groupId>
<artifactId>azure-cosmos</artifactId>
<version>${azurecosmos.version}</version>
</dependency>
<dependency>
@@ -61,28 +57,4 @@ LICENSE file.
<scope>provided</scope>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-checkstyle-plugin</artifactId>
<version>2.15</version>
<configuration>
<consoleOutput>true</consoleOutput>
<configLocation>../checkstyle.xml</configLocation>
<failOnViolation>true</failOnViolation>
<failsOnError>true</failsOnError>
</configuration>
<executions>
<execution>
<id>validate</id>
<phase>validate</phase>
<goals>
<goal>checkstyle</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>