The committed block count cannot exceed the maximum limit of 50,000 blocks. #324
Comments
I think this is more related to how the azure blob component works, @omarsmak can you add some hints on this?
By the way, you could also use an aggregationStrategy for this. I'm preparing some documentation on how to do that, and it will be available in the next release.
@nidhijwt the issue here, as I understand it, is the number of records being sent: for every Kafka record there will be an equivalent Azure append block, which can hit the 50,000 limit pretty fast, especially if you have noise records that are not meant to be inserted (you can filter these out with SMTs). However, the way I see it, you will need to somehow aggregate these records and insert them in as few batches as possible, as you mentioned. Off the top of my head, you have these options:
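Whichever option is chosen, the idea behind size-plus-timeout aggregation can be sketched in plain Java. This is an illustrative sketch only; the class and method names below are made up for the example and are not the connector's API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Illustrative sketch: batch records by count or elapsed time before sending,
// so each Azure append block carries many records instead of one.
public class RecordBatcher {
    private final int maxSize;
    private final long timeoutMillis;
    private final Consumer<String> sink;   // e.g. one append-block write
    private final List<String> buffer = new ArrayList<>();
    private long firstRecordAt = -1;

    public RecordBatcher(int maxSize, long timeoutMillis, Consumer<String> sink) {
        this.maxSize = maxSize;
        this.timeoutMillis = timeoutMillis;
        this.sink = sink;
    }

    public void add(String record, long nowMillis) {
        if (buffer.isEmpty()) firstRecordAt = nowMillis;
        buffer.add(record);
        // Flush when the batch is full, or when the oldest record has waited too long.
        if (buffer.size() >= maxSize || nowMillis - firstRecordAt >= timeoutMillis) {
            flush();
        }
    }

    public void flush() {
        if (buffer.isEmpty()) return;
        // Concatenate with a space in between, as described later in this thread.
        sink.accept(String.join(" ", buffer));
        buffer.clear();
    }
}
```

With a batch size of 1000, one block then holds 1000 records instead of one, keeping the blob well under the 50,000-block limit.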
Thanks for your reply. Is documentation the only thing pending, or is the code also incomplete? I can see there are already two aggregationStrategy implementations in the code, StringAggregator and SimpleAggregator. I want to use them but couldn't find a way. I am using the Camel connector with Kafka in distributed mode with the following properties.
Now I am trying to find which property can be used to specify the aggregator; even StringAggregator would serve my purpose.
Try by adding
Where the size is the batch size before sending to Azure, and the timeout is how long to wait in case there aren't enough records to complete the aggregation. This will concatenate the records with a space in between. You'll need to build master and use the generated connector; this is not released yet.
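For reference, the configuration this thread converges on looks roughly like the sketch below. This is a hedged reconstruction: the aggregator class comes from the StringAggregator mentioned earlier in the thread, but the exact property names may differ between releases, so verify them against the build you are using:

```properties
# Hedged sketch: aggregate records before each Azure append,
# instead of writing one block per Kafka record.
# The aggregator bean (StringAggregator is mentioned earlier in this thread):
camel.beans.aggregate=#class:org.apache.camel.kafkaconnector.aggregator.StringAggregator
# Batch size: how many records to concatenate before sending to Azure:
camel.beans.aggregation.size=1000
# Timeout (ms): send a partial batch if not enough records arrive in time:
camel.beans.aggregation.timeout=5000
```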
@oscerd Thanks for your answer. I built the latest code as suggested, but it is not helping to aggregate the data; even with this, it writes 1 record per block. It looks like inserting a collection is not supported with Azure blob.
You're doing something wrong.
That cannot be possible: if data are sent continuously you should see at least a few concatenated records in the block, not 1 record. So I don't know how you built it or what you did, but I don't think you're using the latest version. Also, please use the correct properties: instead of camel.sink.url with the whole endpoint, use the approach of separate per-option properties, which I believe is needed to make the camel.beans stuff work.
@oscerd shouldn't it be this class?
Yes, my bad. Anyway, I believe avoiding camel.sink.url will help in this case. But the correct aggregator is the one reported by @omarsmak.
Thanks, guys. Another error I see in the configuration above is "camel.bean.aggregate"; it should be "camel.beans.aggregate". I made the changes as suggested but it is still not working. Then I checked the logs and found the following:
The configuration I am using is
Observation: the log does not say "Property not auto-configured" for camel.beans.aggregate.
Can you try adding a dynamic name to the file? That way it will go in append mode. By the way, I still don't get whether this problem is in azure blob or in ckc (camel-kafka-connector).
I don't know a way to reproduce this, so I cannot really say anything. The way to aggregate records and send them as a list is through the aggregation; there are no other ways to collect records. I still believe something is wrong anyway: you should see at least 1000 records aggregated with that configuration.
@nidhijwt to narrow down the problem, can you please try a logger sink instead of the azure sink (or any other sink that could help you troubleshoot further) to see whether the data are indeed aggregated?
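As a hedged sketch of that debugging step: the connector class below follows the usual camel-kafka-connector naming pattern, but I have not verified it against this build, so treat both names as assumptions to check:

```properties
# Assumption: swap the Azure sink for a log sink to inspect what actually
# reaches the endpoint (class name follows the usual CKC naming pattern).
connector.class=org.apache.camel.kafkaconnector.log.CamelLogSinkConnector
camel.sink.path.loggerName=aggregation-debug
# Keep the same camel.beans aggregation settings, so the log shows
# whether records arrive batched or one at a time.
```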
ah, the option you're using for camel.beans.aggregate is wrong. It should be |
Yes, aggregation worked. Thanks! When is the next release planned?
We'll start the release cut this week. |
Ok, thanks. |
Overview
I am using CamelAzurestorageblobSinkConnector to archive data from my Kafka topics.
The problem
The data I am sending goes into an append blob in Azure Blob Storage. Every record creates another block at the end and appends it to the append blob. Azure imposes a maximum of 50,000 blocks per append blob, so the blob fills up after 50,000 records, and this happens within a few minutes. An append blob allows each block to be up to 4 MB, but in my case that capacity is not used because only one record is saved per block, and my messages are really small (say 1 KB each). So after 50,000 records my append blob is full and I get an error saying:
com.azure.storage.blob.models.BlobStorageException: Status code 409, "BlockCountExceedsLimit
The committed block count cannot exceed the maximum limit of 50,000 blocks.
RequestId:b112f700-301e-0022-476f-5ea523000000
Time:2020-07-20T08:23:40.1508526Z"
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
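To make the scale of the problem concrete, here is a small back-of-the-envelope sketch using the Azure limits quoted above (the class and method are illustrative, not part of any SDK):

```java
// Illustrative arithmetic: why one small record per block exhausts an append blob quickly.
public class AppendBlobCapacity {
    static final long MAX_BLOCKS = 50_000;                 // Azure append blob block limit
    static final long MAX_BLOCK_BYTES = 4L * 1024 * 1024;  // 4 MB max per appended block

    // Total bytes the blob can hold if every block carries bytesPerBlock of data.
    static long capacityBytes(long bytesPerBlock) {
        return MAX_BLOCKS * Math.min(bytesPerBlock, MAX_BLOCK_BYTES);
    }

    public static void main(String[] args) {
        // One 1 KB record per block: the blob is full after ~50 MB.
        System.out.println(capacityBytes(1024));            // 51200000 (~50 MB)
        // Filling each 4 MB block (e.g. via aggregation): ~195 GB before the limit.
        System.out.println(capacityBytes(MAX_BLOCK_BYTES)); // 209715200000
    }
}
```

So aggregating even a thousand 1 KB records per block pushes the same 50,000-block limit from roughly 50 MB to roughly 50 GB of archived data.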
I do not see a feature where I can buffer a few records and then insert them together using the connector. Please suggest whether any such feature exists, as this seems like a basic thing when archiving with connectors.
If such a thing does not exist, what workaround do other people use?