Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Share throughput over containers #123

Open
mabvanaartrijk opened this issue Oct 29, 2021 · 11 comments
Open

Share throughput over containers #123

mabvanaartrijk opened this issue Oct 29, 2021 · 11 comments
Labels
enhancement New feature or request help wanted Extra attention is needed up-for-grabs 🙏🏽 Happy to consider a pull review to address this issue

Comments

@mabvanaartrijk
Copy link

Is your feature request related to a problem? Please describe.
If ContainerPerItemType = true, the containers are created with their own throughput (autoscale or manual). However, I would like to share the total database throughput over the containers. This is a setting in Cosmos, but I can't get it to work.

Describe the solution you'd like
Add a setting, or make it possible to use manually created containers (see below).

Describe alternatives you've considered
I've tried to manually create the containers in the Data Explorer, and use those as Cosmos repositories, but this doesn't work.

Additional context
Thanks for this great library🙌

@mumby0168
Copy link
Collaborator

Hey, @mabvanaartrijk I like this idea, so what you are saying is rather than setting the throughput at each level of each container, you would like the throughput at the database level that all the containers share? (I do just need to check on how this works in terms of database level throughput not something I have done before)

Around the issue with not allowing manually created container's could you provide more information on this, please?

@mumby0168 mumby0168 added enhancement New feature or request question Further information is requested labels Nov 1, 2021
@mabvanaartrijk
Copy link
Author

Hi @mumby0168 . Yes, if you create a new container in the Data Explorer on Azure Portal you have the option to provision dedicated throughput for the container. So a RepositoryOptions__ShareDatabaseThroughput option would be really helpful.

Steps to reproduce the issue:

  1. Create a container in Data Explorer with the option Provision dedicated throughput... off.
  2. Add a CosmosRepository, for example:

builder.Services.AddCosmosRepository(options => { options.CosmosConnectionString = "CosmosConnectionString"; options.ContainerPerItemType = true; options.DatabaseId = "database"; options.SyncAllContainerProperties = true; options.AllowBulkExecution = true; options.ContainerBuilder.Configure<TestItem>(containerOptions => containerOptions .WithContainer("Test") .WithPartitionKey("/testPk") );})

  1. Create an item using await _repository.CreateAsync(testItem)
  2. This fails with the message: Response status code does not indicate success: NotFound (404); Substatus: 0; ActivityId: ; Reason: (Throughput is not configured for Test);

Thanks!

@mumby0168
Copy link
Collaborator

Hi @mumby0168 . Yes, if you create a new container in the Data Explorer on Azure Portal you have the option to provision dedicated throughput for the container. So a RepositoryOptions__ShareDatabaseThroughput option would be really helpful.

Steps to reproduce the issue:

  1. Create a container in Data Explorer with the option Provision dedicated throughput... off.
  2. Add a CosmosRepository, for example:

builder.Services.AddCosmosRepository(options => { options.CosmosConnectionString = "CosmosConnectionString"; options.ContainerPerItemType = true; options.DatabaseId = "database"; options.SyncAllContainerProperties = true; options.AllowBulkExecution = true; options.ContainerBuilder.Configure<TestItem>(containerOptions => containerOptions .WithContainer("Test") .WithPartitionKey("/testPk") );})

  1. Create an item using await _repository.CreateAsync(testItem)
  2. This fails with the message: Response status code does not indicate success: NotFound (404); Substatus: 0; ActivityId: ; Reason: (Throughput is not configured for Test);

Thanks!

Hey @mabvanaartrijk thanks for the detailed response, ah okay I see what is happening here.

So the option you have set options.SyncAllContainerProperties = true; will be trying to sync all containers with the default throughput value of 400 RU/s and a manual throughput as the properties.

Is the desired behaviour you would expect to be that if you have configured the setting RepositoryOptions__ShareDatabaseThroughput that then the database would be synced effectively and not the containers?

@mumby0168 mumby0168 added the bug Something isn't working label Nov 1, 2021
@mabvanaartrijk
Copy link
Author

Thanks @mumby0168! I've misinterpreted the SyncAllContainerProperties setting. It works perfectly with this setting to false.

So to recap: manually creating the collections in Data Explorer and setting SyncAllContainerProperties to false works. It would be a nice feature if you can set RepositoryOptions__ShareDatabaseThroughput and use a shared database throughput setting for all underling containers, that are created automatically if they don't exist.

@mumby0168
Copy link
Collaborator

Thanks @mumby0168! I've misinterpreted the SyncAllContainerProperties setting. It works perfectly with this setting to false.

So to recap: manually creating the collections in Data Explorer and setting SyncAllContainerProperties to false works. It would be a nice feature if you can set RepositoryOptions__ShareDatabaseThroughput and use a shared database throughput setting for all underling containers, that are created automatically if they don't exist.

Great news @mabvanaartrijk yeah it certainly could be would this be something you'd like to look to implement? We are happy to accept any PR's

@mabvanaartrijk
Copy link
Author

I'll discuss with my team if we can add it to one of our next sprints👍

@mumby0168
Copy link
Collaborator

I'll discuss with my team if we can add it to one of our next sprints👍

Great look forward to hearing from you 👍

@mumby0168 mumby0168 added help wanted Extra attention is needed up-for-grabs 🙏🏽 Happy to consider a pull review to address this issue and removed bug Something isn't working question Further information is requested labels Nov 2, 2021
@totalitydevelopment
Copy link

Hi @mumby0168 @mabvanaartrijk @IEvangelist

I've been looking at exactly this need today.

Currently all our containers are getting their own throughput which is looking like an expensive hobby $!

Database level throughput would be great for test environments in particular but likely also prod until we know what the userbase will be like as cost is a consideration for my client.

I think possibly more than a bool might be a nice option.

Initially I was looking to populate using ARM templates but if we could get it into the repo that would be a much better option for us.

ARM examples are as per:

@description('Throughput value when using Manual Throughput Policy for the container')
@minValue(400)
@maxValue(1000000)
param manualProvisionedThroughput int = 400

@description('Maximum throughput when using Autoscale Throughput Policy for the container')
@minValue(1000)
@maxValue(1000000)
param autoscaleMaxThroughput int = 4000

@description('The throughput policy for the container')
@allowed([
  'Manual'
  'Autoscale'
])
param throughputPolicy string = 'Manual'

var throughput_Policy = {
  Manual: {
    throughput: manualProvisionedThroughput
  }
  Autoscale: {
    autoscaleSettings: {
      maxThroughput: autoscaleMaxThroughput
    }
  }
}

CosmosClient.CreateDatabaseIfNotExistsAsync has an overload to accept ThroughputProperties.

namespace Microsoft.Azure.Cosmos
{
    //
    // Summary:
    //     Represents a throughput of the resources in the Azure Cosmos DB service. It is
    //     the standard pricing for the resource in the Azure Cosmos DB service.
    //
    // Remarks:
    //     It contains provisioned container throughput in measurement of request units
    //     per second in the Azure Cosmos service.
    public class ThroughputProperties
    {
        //
        // Summary:
        //     Gets the entity tag associated with the resource from the Azure Cosmos DB service.
        //
        // Value:
        //     The entity tag associated with the resource.
        //
        // Remarks:
        //     ETags are used for concurrency checking when updating resources.
        [JsonProperty(PropertyName = "_etag", NullValueHandling = NullValueHandling.Ignore)]
        public string ETag { get; }
        //
        // Summary:
        //     Gets the last modified time stamp associated with Microsoft.Azure.Cosmos.DatabaseProperties
        //     from the Azure Cosmos DB service.
        //
        // Value:
        //     The last modified time stamp associated with the resource.
        [JsonConverter(typeof(UnixDateTimeConverter))]
        [JsonProperty(PropertyName = "_ts", NullValueHandling = NullValueHandling.Ignore)]
        public DateTime? LastModified { get; }
        //
        // Summary:
        //     Gets the provisioned throughput for a resource in measurement of request units
        //     per second in the Azure Cosmos service.
        [JsonIgnore]
        public int? Throughput { get; }
        //
        // Summary:
        //     The maximum throughput the autoscale will scale to.
        [JsonIgnore]
        public int? AutoscaleMaxThroughput { get; }
        //
        // Summary:
        //     Gets the self-link associated with the resource from the Azure Cosmos DB service.
        //
        // Value:
        //     The self-link associated with the resource.
        //
        // Remarks:
        //     A self-link is a static addressable Uri for each resource within a database account
        //     and follows the Azure Cosmos DB resource model. E.g. a self-link for a document
        //     could be dbs/db_resourceid/colls/coll_resourceid/documents/doc_resourceid
        [JsonProperty(PropertyName = "_self", NullValueHandling = NullValueHandling.Ignore)]
        public string SelfLink { get; }

        //
        // Summary:
        //     The Throughput properties for autoscale provisioned throughput offering
        //
        // Parameters:
        //   autoscaleMaxThroughput:
        //     The maximum throughput the resource can scale to.
        //
        // Returns:
        //     Returns a ThroughputProperties for autoscale provisioned throughput
        public static ThroughputProperties CreateAutoscaleThroughput(int autoscaleMaxThroughput);
        //
        // Summary:
        //     The Throughput properties for manual provisioned throughput offering
        //
        // Parameters:
        //   throughput:
        //     The current provisioned throughput for the resource.
        //
        // Returns:
        //     Returns a ThroughputProperties for manual throughput
        public static ThroughputProperties CreateManualThroughput(int throughput);
    }
}

Looking at:
Microsoft.Azure.CosmosRepository.Services.DefaultCosmosContainerService

I wondered if that might be the spot...

public async Task<Container> GetContainerAsync<TItem>(bool forceContainerSync = false) where TItem : IItem
        {
            try
            {
                ItemOptions itemOptions = _cosmosItemConfigurationProvider.GetOptions<TItem>();

                Database database =
                    await _cosmosClientProvider.UseClientAsync(
                        client => client.CreateDatabaseIfNotExistsAsync(_options.DatabaseId, SOME THROUGHPUT PROPERTY OVERRIDE OBJECT GOES HERE)).ConfigureAwait(false);

What do we think? I'm struggling for time myself right now as I'll be away for a bit soon but wanted to chip into the discussion with some thoughts.

@IEvangelist
Copy link
Owner

Hi @mabvanaartrijk - sorry for not replying earlier. Yes, that looks like the correct place.

@hades200082
Copy link

I would say that in the absence of config to the contrary you should default to shared throughput.

@kalyankdas
Copy link

Any word on the adding the feature for RepositoryOptions__ShareDatabaseThroughput?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help wanted Extra attention is needed up-for-grabs 🙏🏽 Happy to consider a pull review to address this issue
Projects
None yet
Development

No branches or pull requests

6 participants