Performance issues with Feign? #1890

Closed
petterasla opened this issue Dec 20, 2022 · 4 comments
Labels
feedback provided, needs info

Comments

@petterasla

Hi,

Has anyone experienced performance issues with Feign?

Our company is using Feign with most of our web services.
New requirements demand that one of our services handle at least 3000 transactions per second (TPS).
So for the past few weeks my team has been performance testing with different code, configuration, and infrastructure.
We stripped out all business logic without luck, but once we removed Feign and used plain OkHttp3 HTTP clients with the HTTP/2 protocol, the performance skyrocketed.

We went from consistently starting to fail at around 1500 TPS, regardless of the number of pods and resources, to managing 8000 TPS.
With Feign we also see a spike in the number of Java threads when we get close to the threshold of 1500 TPS.
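
One way to put numbers on thread growth like this is to sample the JVM's live and peak thread counts while the load test runs. The following is a minimal sketch using the standard `ThreadMXBean`; the class name `ThreadCountProbe`, the three samples, and the 100 ms interval are illustrative assumptions, not part of the setup described here.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadCountProbe {

    /** Number of live threads (daemon and non-daemon) in this JVM right now. */
    public static int liveThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount();
    }

    /** Peak live thread count since JVM start (or since the peak was last reset). */
    public static int peakThreads() {
        return ManagementFactory.getThreadMXBean().getPeakThreadCount();
    }

    public static void main(String[] args) throws InterruptedException {
        // Take a few samples; during a real load test this would run on a schedule
        // and be correlated with the TPS at each point in time.
        for (int i = 0; i < 3; i++) {
            System.out.printf("live=%d peak=%d%n", liveThreads(), peakThreads());
            Thread.sleep(100);
        }
    }
}
```

A thread dump taken at the same moment would show which pools the extra threads belong to, which this counter alone cannot.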

Are we configuring Feign wrong? (see config example at the bottom)

Background info

Infrastructure

  • Web services are running on Kubernetes, on premises
  • Containers
  • Resources examples:
    • Tried 512MB - 4 GB memory
    • 2, 4, 8 CPU cores
    • Pods: 2, 8, 12, 20
    • Heap Size: 128MB - 2GB

Code

  • Java 11
  • Spring Boot 2
  • Feign with all HTTP clients
  • OkHttp3 and java.net.http.HttpClient HTTP clients
  • HTTP/2 protocol

Example Feign with okHttp3 client

// Underlying OkHttp3 client: prefer HTTP/2, fall back to HTTP/1.1.
okhttp3.OkHttpClient okHttp = new okhttp3.OkHttpClient.Builder()
        .protocols(List.of(Protocol.HTTP_2, Protocol.HTTP_1_1))
        .followRedirects(true)
        .connectTimeout(Duration.ofSeconds(10))
        .readTimeout(Duration.ofSeconds(10))
        .retryOnConnectionFailure(false)
        .build();

// Wrap it in Feign's OkHttp adapter (feign-okhttp module) so Feign can use it as its Client.
feign.okhttp.OkHttpClient http2Client = new feign.okhttp.OkHttpClient(okHttp);

return Feign.builder()
        .logger(new FeignNTLogger())
        // FULL logs headers, body, and metadata for every request and response.
        .logLevel(Logger.Level.FULL)
        .client(http2Client)
        .addCapability(new MicrometerCapability(meterRegistry))
        .encoder(new JacksonEncoder(objectMapper))
        .retryer(Retryer.NEVER_RETRY)
        .decoder(new JacksonDecoder(objectMapper))
        .errorDecoder(new WalletApiErrorDecoder(objectMapper))
        .options(new Request.Options(10, TimeUnit.SECONDS, 10, TimeUnit.SECONDS, true))
        .target(WithdrawApi.class, url);
@velo
Member

velo commented Dec 23, 2022

You create the WithdrawApi instance once, right?

You shouldn't re-create it over and over again.

You can either use metrics to figure out where the time is being spent, or generate a flame graph and see if anything suspicious shows up.

Also, there is some code for testing Feign performance in the benchmark module; you could add an OkHttp test there and look at the numbers.
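
As a crude first step before a full benchmark or profiler run, per-call latency for the two code paths can be compared with a small timing loop. This is only a sketch under stated assumptions: `LatencyProbe` and its methods are hypothetical names, the workload in `main` is a placeholder to be replaced with the Feign call and the plain OkHttp call, and a loop like this ignores JIT warm-up and concurrency, so it is no substitute for a proper benchmark harness.

```java
import java.util.Arrays;
import java.util.function.Supplier;

public class LatencyProbe {

    /** Calls the supplier `iterations` times; returns each call's duration in nanoseconds, sorted ascending. */
    public static long[] measure(Supplier<?> call, int iterations) {
        long[] nanos = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            call.get();
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        return nanos;
    }

    /** Nearest-rank percentile (0 < p <= 100) of an ascending-sorted, non-empty array. */
    public static long percentile(long[] sorted, double p) {
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Placeholder workload (assumption): swap in the real client calls to compare them.
        Supplier<String> placeholder = () -> String.valueOf(Math.sqrt(42));
        long[] sorted = measure(placeholder, 10_000);
        System.out.printf("p50=%dns p99=%dns%n", percentile(sorted, 50), percentile(sorted, 99));
    }
}
```

Comparing percentiles rather than averages makes it easier to see whether one path has a long tail, which an average would hide.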

@velo
Member

velo commented Jan 13, 2023

@petterasla are you willing to run flame charts on your project? That would help identify the issue

@velo added the needs info and feedback provided labels on Jan 13, 2023
@steam0

steam0 commented Jan 24, 2023

> You create the WithdrawApi instance once, right?
> You shouldn't re-create it over and over again.

Yes, it is created as a Spring Bean so we would hope it only gets created once per instance of the application.

> @petterasla are you willing to run flame charts on your project? That would help identify the issue

I will forward your question to him.

@petterasla
Author

Hi.
Thank you for your patience. I've been out of office in January.

TL;DR:
We solved our issue by dropping Feign for this particular service and using a plain OkHttp3 client.

Summary:
We have met our requirement for the number of transactions per second.
The sad part is that we were not able to pinpoint the exact problem beyond using or not using Feign.
Our graphs showed an approximately 20-30 ms delay per transaction when using Feign, but that in itself does not explain why our service failed at around 1500 TPS, even when increasing hardware.
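
For a rough sense of scale, Little's Law (average requests in flight = arrival rate × time in system) can be applied to the figures above. This is a back-of-envelope sketch, assuming steady state and taking 25 ms as the midpoint of the reported 20-30 ms; the class name `LittlesLaw` is illustrative, and the result says nothing about where the time was actually spent.

```java
public class LittlesLaw {

    /** Average requests in flight = arrival rate (requests/s) x time each request spends in the system (s). */
    public static double inFlight(double requestsPerSecond, double latencySeconds) {
        return requestsPerSecond * latencySeconds;
    }

    public static void main(String[] args) {
        double tps = 1500;            // failure threshold reported in this thread
        double extraLatency = 0.025;  // assumed midpoint of the reported 20-30 ms
        // Extra concurrent requests attributable to the added delay, summed over all pods.
        System.out.printf("extra in-flight requests: %.1f%n", inFlight(tps, extraLatency));
    }
}
```

Under these assumptions the added delay accounts for roughly 37 to 38 extra concurrent requests across the whole deployment, which fits the observation that the delay alone does not obviously explain the ceiling.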

Since the problem was solved by using only OkHttp3, no more effort was put into finding the exact cause.
Because we didn't find an answer, I should be careful about offering one.
Our suspicions were a combination of overhead such as (de)serializing models and logging, or some part of the setup on our side.

I will close the issue.
