
[Broker] Change the default number of namespace bundles to 32 #12137

Conversation

@lhotari (Member) commented Sep 22, 2021

Motivation

Currently, the default number of namespace bundles is 4. This can lead to cases where most topics reside on just 1 or 2 brokers in a 3-broker cluster.
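To make the skew concrete, here is a small standalone simulation (not Pulsar code; the class name, the hash function, and the round-robin bundle-to-broker assignment are illustrative assumptions) of how hashing 100 partitions into only 4 bundles across 3 brokers leaves the load uneven:

```java
import java.util.Arrays;

// Illustrative sketch: Pulsar maps each topic to a bundle by hashing the
// topic name into a fixed hash space split into numBundles ranges, and each
// bundle is owned by exactly one broker. With 4 bundles and 3 brokers, one
// broker necessarily owns 2 bundles while another may own only 1 (or none).
public class BundleSkew {

    public static int[] brokerLoad(int numBundles, int numBrokers, int numTopics) {
        int[] topicsPerBundle = new int[numBundles];
        for (int i = 0; i < numTopics; i++) {
            // stand-in for Pulsar's real hash function
            int hash = ("persistent://public/default/perftest-partition-" + i).hashCode();
            topicsPerBundle[Math.floorMod(hash, numBundles)]++;
        }
        int[] topicsPerBroker = new int[numBrokers];
        for (int b = 0; b < numBundles; b++) {
            // assumption: bundles land on brokers roughly round-robin
            topicsPerBroker[b % numBrokers] += topicsPerBundle[b];
        }
        return topicsPerBroker;
    }

    public static void main(String[] args) {
        // one broker ends up with roughly twice the topics of the others
        System.out.println(Arrays.toString(brokerLoad(4, 3, 100)));
    }
}
```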

Side note:
A good rule of thumb for choosing the number of shards is to have at least 10x more shards than there are nodes (this 10x guideline comes from the Akka Cluster documentation). In Pulsar, a "namespace bundle" behaves like a shard. In addition, there are recommendations to use a power of 2 for the number of shards (a reference for this recommendation was not found).
From these rules, a 3-broker cluster should use 32 bundles, a 4-6 node cluster 64 bundles, and a 7-12 node cluster 128 bundles.
This might not be the best advice, but I haven't seen a better rule of thumb for choosing the number of namespace bundles.
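The two rules of thumb above (at least 10x shards per node, rounded up to a power of 2) can be sketched as a small helper. This is hypothetical code, not part of Pulsar; `recommendedBundles` is an illustrative name:

```java
// Sketch of the sizing rule of thumb described above.
public class BundleSizing {

    public static int recommendedBundles(int brokerCount) {
        int minimum = brokerCount * 10;           // 10x rule
        int bundles = Integer.highestOneBit(minimum);
        if (bundles < minimum) {
            bundles <<= 1;                        // round up to next power of 2
        }
        return bundles;
    }

    public static void main(String[] args) {
        System.out.println(recommendedBundles(3));  // 32
        System.out.println(recommendedBundles(6));  // 64
        System.out.println(recommendedBundles(12)); // 128
    }
}
```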

Modifications

Change the defaultNumberOfNamespaceBundles to 32.

@lhotari lhotari added the doc-not-needed Your PR changes do not impact docs label Sep 22, 2021
@lhotari lhotari added this to the 2.9.0 milestone Sep 22, 2021
@lhotari lhotari self-assigned this Sep 22, 2021
@lhotari lhotari closed this Sep 22, 2021
@lhotari lhotari reopened this Sep 22, 2021
@lhotari lhotari force-pushed the lh-change-defaultNumberOfNamespaceBundles branch from 7341c7d to 284c56d Compare September 22, 2021 16:06
@michaeljmarshall (Member) left a comment

This change makes sense to me. However, it looks like @merlimat's original PR to introduce this configuration, #854, started with 16 bundles and was dropped to 4 after some discussion with @rdhabalia about broker shutdown times. Does the concern still apply?

Before we merge this, I think you'll need to update the website's reference-configuration.md file: https://github.com/apache/pulsar/blob/master/site2/docs/reference-configuration.md.

@rdhabalia (Contributor)

defaultNumberOfNamespaceBundles is a configurable parameter, and it can be set to different values for different use cases. Having a large number of bundles comes with a cost, and the default should not be large for every namespace. We have kept 4 as the default, which should be more than enough. If a specific use case requires 16/32/64 bundles, the user should update the configuration file; we should avoid changing default values based on a specific user's use case.

@lhotari (Member, Author) commented Sep 22, 2021

defaultNumberOfNamespaceBundles is a configurable parameter, and it can be set to different values for different use cases. Having a large number of bundles comes with a cost, and the default should not be large for every namespace. We have kept 4 as the default, which should be more than enough. If a specific use case requires 16/32/64 bundles, the user should update the configuration file; we should avoid changing default values based on a specific user's use case.

It's true that it's configurable, but the default value of 4 often leads to a case where one broker in a 3-broker cluster doesn't get any bundles assigned and idles completely while the other 2 brokers take all of the load. I can provide a repro for this case if that helps to describe the problem.

@rdhabalia (Contributor)

Users can always enable auto bundle unloading in the load balancer. The default of 4 bundles is not intended for a test cluster with 1 namespace; it assumes a normal cluster that serves multiple namespaces with considerable traffic. Again, a higher number of bundles comes with a cost, especially for znodes, and the default behavior would unnecessarily add more bundles for users who are using the default conf file.

@lhotari (Member, Author) commented Sep 22, 2021

Users can always enable auto bundle unloading in the load balancer. The default of 4 bundles is not intended for a test cluster with 1 namespace; it assumes a normal cluster that serves multiple namespaces with considerable traffic. Again, a higher number of bundles comes with a cost, especially for znodes, and the default behavior would unnecessarily add more bundles for users who are using the default conf file.

I have a test cluster on GCP/GKE where the auto bundle unloading never happens and one of the broker pods is completely idling (all 4 bundles get assigned to 2 of the 3 broker pods).

It's using the default settings, which contain:

```
loadBalancerBrokerOverloadedThresholdPercentage=85
loadBalancerAutoUnloadSplitBundlesEnabled=true
loadBalancerNamespaceBundleMaxSessions=1000
```

The test load is created with this type of pulsar-perf command (using the 2.7.3 pulsar-perf, since the current 2.8.0 contains a bug, fixed in master, that causes partitions to stop consuming in pulsar-perf):

```shell
# create a 100 partition topic
pulsar-admin topics create-partitioned-topic -p 100 persistent://public/default/perftest

# produce messages
PULSAR_EXTRA_OPTS="-Xms1500M -Xmx1500M -XX:MaxDirectMemorySize=768M" /pulsar/bin/pulsar-perf produce -u pulsar+ssl://pulsar-broker:6651/ -ioThreads 4 -s 512 -r 50000 persistent://public/default/perftest

# consume messages
PULSAR_EXTRA_OPTS="-Xms1500M -Xmx1500M -XX:MaxDirectMemorySize=768M" /pulsar/bin/pulsar-perf consume -u pulsar+ssl://pulsar-broker:6651/ -s sub persistent://public/default/perftest
```

(These are run on a pod with 2 CPUs + 4 GB RAM requests/limits, using pulsar-perf from the Pulsar 2.7.3 Docker image.)

The traffic can go on for hours and the CPU load stays very low. Since the threshold never gets exceeded, the auto unload and splitting of the bundles never happens.
In this case, the GKE cluster has a node pool with 6 n2-standard-32 VMs (32 cores, 128 GB RAM). Storage for the bookies is on SSD.
The impact of this is that one of the brokers never gets any real load on it.
After changing to a namespace with 32 bundles, the load was evenly distributed. This is why I think defaultNumberOfNamespaceBundles=4 is not a good default.
Bundle splitting has an open bug, #5510, which is another reason to use a higher default number of namespace bundles.

@merlimat (Contributor)

I agree that 16 or 32 are better defaults that would avoid many surprises out of the box. That has been the config I have set in prod for the past many years.

The case of having huge namespaces with limited load on each is far less common, and it can always be adjusted in the config where it poses a problem.

The default should be a good setting for ~80% of users.

@lhotari lhotari force-pushed the lh-change-defaultNumberOfNamespaceBundles branch from 284c56d to e83415b Compare September 23, 2021 10:15
@lhotari (Member, Author) commented Sep 23, 2021

I agree that 16 or 32 are better defaults that would avoid many surprises out of the box. That has been the config I have set in prod for the past many years.

@merlimat Is 32 fine as a default, or should I change it to 16 in this PR?

@lhotari (Member, Author) commented Sep 23, 2021

I now realized that public/default always gets created with 16 bundles, regardless of what defaultNumberOfNamespaceBundles is configured to.

The number of bundles is hard-coded to 16 when public/default gets created during cluster metadata setup; the call site is:

```java
createNamespaceIfAbsent(resources, NamespaceName.get(TopicName.PUBLIC_TENANT, TopicName.DEFAULT_NAMESPACE),
        arguments.cluster);
```

The root cause of the problem where one of the servers was idling must have been something other than the default of 4 bundles, since public/default has 16 by default. I checked with `pulsar-admin namespaces bundles public/default` and there are 44 bundles for public/default in this perf-testing environment.

@merlimat (Contributor)

I checked with `pulsar-admin namespaces bundles public/default` and there are 44 bundles for public/default in this perf-testing environment.

Yes, they should get split as the load increases; still, it takes some time to converge to a reasonable number.

@michaeljmarshall (Member)

Note also that the LoadManager implementation (and, when the LoadManager is the ModularLoadManagerWrapper, the LoadSheddingStrategy implementation) will affect bundle distribution across brokers. In some implementations, a broker's bundles will not be unloaded unless it is overloaded, even though there might be "underloaded" brokers in the cluster. (I've already discussed this with @lhotari in separate channels. Posting here for any other readers who might come across this PR.)
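As a hedged illustration of that point (hypothetical class and method names, not Pulsar's actual LoadSheddingStrategy API), threshold-based shedding only selects bundles from brokers that exceed the overload threshold, so an idle broker never "pulls" load from busy-but-not-overloaded peers:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of threshold-based shedding: only overloaded brokers give up
// bundles, so an idle broker can stay idle indefinitely if no peer
// crosses the threshold (e.g. 85% by default).
public class ThresholdShedding {

    public static List<String> brokersToShed(Map<String, Double> cpuUsageByBroker,
                                             double overloadThresholdPct) {
        List<String> overloaded = new ArrayList<>();
        for (Map.Entry<String, Double> e : cpuUsageByBroker.entrySet()) {
            if (e.getValue() > overloadThresholdPct) {
                overloaded.add(e.getKey()); // only these give up bundles
            }
        }
        return overloaded;
    }

    public static void main(String[] args) {
        Map<String, Double> usage = Map.of(
                "broker-1", 60.0,
                "broker-2", 55.0,
                "broker-3", 0.0); // idle, but nothing triggers a rebalance
        System.out.println(brokersToShed(usage, 85.0)); // []
    }
}
```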

@sijie (Member) commented Sep 24, 2021

Should this be considered for a PIP since we are changing a default value that can potentially impact users that would create a lot of namespaces?

@lhotari (Member, Author) commented Oct 1, 2021

I'm closing this since public/default is currently created without using defaultNumberOfNamespaceBundles at all. It's currently hard-coded to 16, as explained in #12137 (comment). Therefore there isn't a real need to change defaultNumberOfNamespaceBundles, since public/default has 16 in every case.

I had analyzed the problem incorrectly; the actual reason the load didn't get balanced was the behavior of the Pulsar load balancer.

Labels: doc-not-needed, release/2.9.0
6 participants