API gateway: cache topic producer for the same gateway #660

Merged
merged 5 commits into main from cache-producers
Oct 30, 2023

Conversation

nicoloboschi
Member

@nicoloboschi nicoloboschi commented Oct 27, 2023

Changes:

  • We use a Guava cache to keep one producer per gateway. The cache is LRU and every access refreshes the TTL (a minimal configuration sketch follows below).
  • The maximum size is configurable (max 100 across all tenants). The cache can be disabled.
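
For context, here is a minimal sketch of how such a Guava cache can be configured. The key record, class name and the 10-minute TTL are illustrative assumptions, not the PR's actual code:

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

import java.util.concurrent.TimeUnit;

// Illustrative key: one producer per (tenant, application, gateway).
record ProducerKey(String tenant, String application, String gatewayId) {}

class ProducerCacheConfigSketch {
    // Size-bounded cache whose TTL is refreshed on every access.
    static <V> Cache<ProducerKey, V> newProducerCache(int maxSize) {
        return CacheBuilder.newBuilder()
                .maximumSize(maxSize)                      // evicts least-recently-used entries
                .expireAfterAccess(10, TimeUnit.MINUTES)   // every access resets the entry's TTL
                .build();
    }
}
```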

@nicoloboschi nicoloboschi marked this pull request as ready for review October 27, 2023 15:56
try {
    final SharedTopicProducer sharedTopicProducer =
            cache.get(key, () -> new SharedTopicProducer(topicProducerSupplier.get()));
    sharedTopicProducer.acquire();
Member

this is a common pitfall

the "acquire" should be performed in the constructor, because between this line and the previous one the producer may already have been released if the reference count dropped to zero (it is an edge case, but it does happen).

The tricky part is that you still have to call acquire when the cache already holds an existing object.
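
For reference, a minimal sketch of the kind of reference-counted wrapper this comment is about; the names and details are assumptions, not the PR's actual SharedTopicProducer:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reference-counted wrapper around a topic producer.
class RefCountedProducerSketch {
    // Starts at 1: the cache itself holds one reference.
    private final AtomicInteger refCount = new AtomicInteger(1);

    /** Try to take a reference; fails if the producer was already released. */
    boolean acquire() {
        while (true) {
            int current = refCount.get();
            if (current == 0) {
                // Released between the cache lookup and acquire(): the caller
                // must recreate (or re-fetch) the producer.
                return false;
            }
            if (refCount.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    /** Drop a reference; close the underlying producer when the count hits zero. */
    void release() {
        if (refCount.decrementAndGet() == 0) {
            // close the underlying TopicProducer here
        }
    }
}
```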

Member

Also, the producer must be started only once, so you have to start it when you create it.
The cache should not return the producer until it has been fully initialised (started) successfully,
and if the start operation fails it must not be added to the cache.
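
A minimal sketch of the "start inside the loader" idea: Guava only stores the value if the loader returns normally, so a failed start() never ends up in the cache. TopicProducerStub and the key type below are illustrative assumptions:

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

import java.util.concurrent.ExecutionException;
import java.util.function.Supplier;

class StartOnLoadSketch {
    interface TopicProducerStub {
        void start(); // may throw if the broker is down
    }

    private final Cache<String, TopicProducerStub> cache =
            CacheBuilder.newBuilder().maximumSize(100).build();

    TopicProducerStub getOrCreate(String key, Supplier<TopicProducerStub> supplier)
            throws ExecutionException {
        // The loader creates AND starts the producer: every cached entry is already
        // started, and a start failure propagates to the caller without polluting the cache.
        return cache.get(key, () -> {
            TopicProducerStub producer = supplier.get();
            producer.start();
            return producer;
        });
    }
}
```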

Member Author

the producer is started during the load into the cache, so every producer in the cache is already started.

Member Author

the acquire must take into consideration the edge case you mentioned, right. I did it in another way: when you call acquire, if the producer has actually been closed (removed from the cache, for example), we retry, and that triggers the constructor to be called again
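
A hypothetical illustration of that retry: if the cached producer was released between the lookup and acquire(), drop it from the cache and load a fresh one. "SharedProducerStub" is a stand-in, not the PR's real class:

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;

import java.util.concurrent.ExecutionException;
import java.util.function.Supplier;

class AcquireWithRetrySketch {
    interface SharedProducerStub {
        boolean acquire(); // false if already released/closed
    }

    private final Cache<String, SharedProducerStub> cache =
            CacheBuilder.newBuilder().maximumSize(100).build();

    SharedProducerStub getOrCreate(String key, Supplier<SharedProducerStub> supplier)
            throws ExecutionException {
        while (true) {
            SharedProducerStub shared = cache.get(key, supplier::get);
            if (shared.acquire()) {
                return shared; // normal path: the cached producer is still alive
            }
            // Edge case: the entry was released right after the lookup.
            // Invalidate it and loop so the next get() runs the loader again.
            cache.invalidate(key);
        }
    }
}
```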

topicConnectionsRuntime.createProducer(
        null, streamingCluster, Map.of("topic", topic));
producer.start();
topicProducer.start();
Member

this should not be executed here, see the comments above

public interface TopicProducerCache {
    record Key(String tenant, String application, String gatewayId) {}

    TopicProducer getOrCreate(Key key, Supplier<TopicProducer> topicProducerSupplier);
Member

this operation may fail, so it is better to add a "throws" clause. The risk is forgetting about possible failures (which will happen, for instance, if the broker is down)
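
A sketch of that suggestion, making the possible failure explicit in the signature; "TopicProducerStub" and the exception type are illustrative assumptions:

```java
import java.util.function.Supplier;

public interface TopicProducerCacheSketch {
    interface TopicProducerStub {}

    record Key(String tenant, String application, String gatewayId) {}

    // Creating and starting the producer can fail (for instance when the broker
    // is down), so the signature declares it rather than hiding it.
    TopicProducerStub getOrCreate(Key key, Supplier<TopicProducerStub> topicProducerSupplier)
            throws Exception;
}
```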

if (topicProperties.isProducersCacheEnabled()) {
    return new LRUTopicProducerCache(topicProperties.getProducersCacheSize());
} else {
    return (key, topicProducerSupplier) -> topicProducerSupplier.get();
Member

This is trickier than expected, see the comments above:
the producer that is returned should already be ready to accept writes (started)

Member Author

yes it is


application.topics.producers-cache-enabled=true
application.topics.producers-cache-size=100
Member

we should have metrics for this, otherwise it is hard to tell whether the system is close to the threshold.
When the threshold is hit and we stop caching, the system will have very bad performance (latency)
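
One possible way to expose the numbers this comment asks for, assuming the Guava cache from the PR description: enable recordStats() and read size, hit rate and evictions periodically (or bind them to whatever metrics library the gateway already uses). Names below are illustrative:

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheStats;

class CacheMetricsSketch {
    private final long maxSize;
    private final Cache<String, Object> cache;

    CacheMetricsSketch(long maxSize) {
        this.maxSize = maxSize;
        this.cache = CacheBuilder.newBuilder()
                .maximumSize(maxSize)
                .recordStats() // required, stats collection is off by default
                .build();
    }

    void logUsage() {
        CacheStats stats = cache.stats();
        System.out.printf("producers cache: size=%d/%d hitRate=%.2f evictions=%d%n",
                cache.size(), maxSize, stats.hitRate(), stats.evictionCount());
    }
}
```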

@nicoloboschi nicoloboschi reopened this Oct 27, 2023
@nicoloboschi nicoloboschi merged commit c2208a2 into main Oct 30, 2023
9 of 10 checks passed
@nicoloboschi nicoloboschi deleted the cache-producers branch October 30, 2023 17:11
benfrank241 pushed a commit to vectorize-io/langstream that referenced this pull request May 2, 2024