Skip to content

HDDS-9803. Limit total number of buckets to avoid OutOfMemoryError in OM#5705

Merged
sadanand48 merged 18 commits intoapache:masterfrom
sumitagrawl:HDDS-9803
Dec 7, 2023
Merged

HDDS-9803. Limit total number of buckets to avoid OutOfMemoryError in OM#5705
sadanand48 merged 18 commits intoapache:masterfrom
sumitagrawl:HDDS-9803

Conversation

@sumitagrawl
Copy link
Copy Markdown
Contributor

@sumitagrawl sumitagrawl commented Nov 30, 2023

What changes were proposed in this pull request?

limit the number of buckets with default to 100000, to avoid OOM when huge number of buckets are created. This is as buckets are full cached in OM.

Its observed that 1 bucket Info consume approx 1KB memory, so 100,000 buckets will take up 100MB approx, so 100,000 is chosen as default and handle most of the usecase.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-9803

How was this patch tested?

  • Unit test is added to check bucket limit

@nandakumar131
Copy link
Copy Markdown
Contributor

@sumitagrawl can you check if the test failures are related?

@xichen01
Copy link
Copy Markdown
Contributor

This is a direct and efficient solution.
But wouldn't it be better if we limited the number of buckets in memory instead of just limiting the number of buckets that can be created, since it's possible that more than just buckets have this problem, and volume is FULL_CACHE as well.

@sumitagrawl
Copy link
Copy Markdown
Contributor Author

This is a direct and efficient solution. But wouldn't it be better if we limited the number of buckets in memory instead of just limiting the number of buckets that can be created, since it's possible that more than just buckets have this problem, and volume is FULL_CACHE as well.

Thanks @xichen01 for review, bucket is full cache table, and for performance, it was cached.

  • most of read/write operation involves bucket access / update. making it partial cache may impact performance and have impact to most of code where assumption is done for bucket to be full cache.

So IMO, its not viable to make bucket as partial cache.

Copy link
Copy Markdown
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sumitagrawl for working on this.

Shouldn't we also limit volumes at the same time? Creating too many volumes can also result in OOME.

</description>
</property>
<property>
<name>ozone.om.max.bucket</name>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: ozone.om.max.buckets.


@Override
public long getEstimatedKeyCount() throws IOException {
if (cache instanceof FullTableCache) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer adding CacheType type() in TableCache, and using cache.type() == FULL_CACHE instead of instanceof check.

long nrOfBuckets = ozoneManager.getMetadataManager().getBucketTable()
.getEstimatedKeyCount();
if (nrOfBuckets >= maxBucket) {
throw new OMException("Number of bucket crosses limit " +
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think "Cannot create more than " + maxBucket + " buckets" would be a better message.

@sumitagrawl
Copy link
Copy Markdown
Contributor Author

Shouldn't we also limit volumes at the same time? Creating too many volumes can also result in OOME.

We already have max number of volume per user restriction, "ozone.om.user.max.volume" with default value 1024. May be this is enough.

Copy link
Copy Markdown
Contributor

@adoroszlai adoroszlai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sumitagrawl for updating the patch.

We already have max number of volume per user restriction, "ozone.om.user.max.volume" with default value 1024. May be this is enough.

Yes, I agree, that's probably fine.

@adoroszlai adoroszlai changed the title HDDS-9803. limit number of bucket to avoid OOM HDDS-9803. Limit total number of buckets to avoid OutOfMemoryError in OM Dec 1, 2023
@kerneltime
Copy link
Copy Markdown
Contributor

@sumitagrawl can you please resolve the conflicting files?

@sumitagrawl
Copy link
Copy Markdown
Contributor Author

@sumitagrawl can you please resolve the conflicting files?

done

Copy link
Copy Markdown
Contributor

@sadanand48 sadanand48 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch @sumitagrawl , LGTM

@sadanand48 sadanand48 merged commit a4b2397 into apache:master Dec 7, 2023
@sadanand48
Copy link
Copy Markdown
Contributor

Thanks @sumitagrawl for the change , @xichen01, @adoroszlai for the reviews.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants