Performance test and improvement of 3.2.6 #1596
Comments
No, that is not important in this case. The report is an agent test, so the backend is mocked; we don't have the infrastructure environment to test against a real one.
We haven't done that yet.
Oh, I see what you mean. But before using it in a production environment, we have to assess the performance of the collector cluster. ES itself is fine: it is widely used as the de facto standard, so we only need to care about tuning its config. Does data transfer between the agent and the collector use a long-lived TCP connection? (It seems to use a long-lived connection based on Jetty.) In an internet production environment, this setup is very common: 100+ microservice instances, around 20K QPS in total, at least 1K QPS on the high-load instances, and of course it will grow in the future. The performance report covers 1K QPS per instance, so we need to assess the collector scale from there. In short: is there any advice or default config for sizing the collector cluster based on the request scale above?
You are right, the performance test report only covers the agent, so using a mock to imitate the collector is appropriate. But I don't think we can make the mock collector and the actual collector equivalent, so the mock may hide potential risks. The efficiency of the collector cluster is very important in a production environment.
I think it should be gRPC over HTTP/2. But a long-lived connection, yes. 3.x doesn't focus on performance so much; SkyWalking 5 and 6 focus on these areas. In SkyWalking 5 beta2 we ran a test, and it looked like 10k~20k per collector if ElasticSearch is big enough.
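For a rough sizing estimate, the numbers in this thread can be combined as in the sketch below. It is only back-of-the-envelope arithmetic, not an official recommendation: it assumes the 10k~20k figure is per-second throughput in the same units as the 20K QPS above (the thread does not confirm this), that the SkyWalking 5 beta2 figure carries over to the version being deployed (untested for 3.2.6), and it introduces a hypothetical `headroom` parameter for how much of the measured capacity each collector should actually use.

```python
import math


def collectors_needed(total_qps, per_collector_qps, headroom=0.5):
    """Estimate how many collectors keep each node at or below `headroom`
    (a fraction of its measured capacity). `headroom` is a hypothetical
    safety margin, not a SkyWalking setting."""
    if total_qps < 0 or per_collector_qps <= 0 or not 0 < headroom <= 1:
        raise ValueError("invalid sizing inputs")
    usable_qps = per_collector_qps * headroom
    # Always deploy at least one collector, even for zero load.
    return max(1, math.ceil(total_qps / usable_qps))


# 20K total QPS against both ends of the 10k~20k range quoted above.
pessimistic = collectors_needed(20_000, 10_000)  # 20000 / 5000
optimistic = collectors_needed(20_000, 20_000)   # 20000 / 10000
print(pessimistic, optimistic)
```

With 50% headroom this gives 4 collectors at the low end of the range and 2 at the high end; real sizing still needs a measurement on the target version and ES cluster.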
Oh, I got it: gRPC with the protobuf protocol over a long-lived connection, is that right? Could you please give an assessment of collector performance in 3.2.6 compared with 5 and 6? I mean, how reliable is the 10K~20K per collector figure for version 3.2.6?
Yes.
We haven't run that before. You'll have to do it yourself :)
5.0.0-beta2 got a big performance upgrade through optimization since alpha. For other versions, especially an old one like 3.2.6, I really don't know.
I did a real performance test on version 3.2.6. Two points:
I think weak references or something similar are used in the agent when collecting trace data. That would explain why no full GC happens, only young GC.
We have a ring buffer to avoid memory overload. I think your result is perfect, and it is what we expected to happen. The core concept is to do our best not to affect the business app.
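To illustrate the idea (this is not the agent's actual code), here is a minimal single-threaded sketch of a fixed-capacity ring buffer that drops new items when full instead of growing or blocking. The class name `DropOnFullRingBuffer` and its methods are hypothetical, and the drop-newest policy is an assumption; the real agent also has to deal with concurrent producers, which this sketch ignores.

```python
class DropOnFullRingBuffer:
    """Fixed-capacity buffer: memory use is bounded by `capacity`,
    and a full buffer rejects new items rather than blocking the caller."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._slots = [None] * capacity
        self._head = 0      # index of the oldest stored item
        self._size = 0      # number of items currently stored
        self.dropped = 0    # how many items were rejected because the buffer was full

    def offer(self, item):
        """Store `item` if there is room; otherwise count it as dropped."""
        if self._size == len(self._slots):
            self.dropped += 1
            return False
        tail = (self._head + self._size) % len(self._slots)
        self._slots[tail] = item
        self._size += 1
        return True

    def drain(self):
        """Remove and return all stored items, oldest first."""
        items = []
        while self._size:
            items.append(self._slots[self._head])
            self._slots[self._head] = None  # release the reference
            self._head = (self._head + 1) % len(self._slots)
            self._size -= 1
        return items


buf = DropOnFullRingBuffer(3)
accepted = [buf.offer(i) for i in range(5)]
print(accepted, buf.dropped, buf.drain())
```

Because the slots are preallocated and overflow is dropped, a slow or crashed collector costs lost trace data rather than heap growth in the business app, which fits the "young GC only" observation above.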
I hope you can share the test result in public, here or in a blog post. It would be another case showing that our project did good work.
This is part of the performance test report. Here is the result, though the description is only in Chinese. The goal of this test was to confirm that SkyWalking has little or no effect on the business application. The result is yes.
This is a very high-value test report from the community. Much appreciated!
Actually, we must acknowledge that this test is not standard. For example, message sizes should be designed as 100B, 1K, 10K, 100K, etc. But limited by time and resources, I couldn't afford that. So why did I still do this work? Because it is necessary: we all use RPC, and the QPS seen by skywalking-collector will be several times larger. I must assess the effect on the business if QPS grows to a massive scale. Later, if I have time, I plan to test what happens to the business when all skywalking-collectors crash.
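A more standard run could enumerate the message sizes mentioned above against a set of load levels. The sketch below only builds such a test matrix; the function names, the QPS levels, and the choice of 1K = 1024 bytes are all assumptions for illustration, and sending the payloads through a real RPC stack is left out.

```python
# Message sizes from the comment above; 1K is taken as 1024 bytes (an assumption).
MESSAGE_SIZES = {"100B": 100, "1K": 1024, "10K": 10 * 1024, "100K": 100 * 1024}


def make_payload(size_bytes):
    """Build a dummy message body of exactly `size_bytes` bytes."""
    return b"x" * size_bytes


def build_test_matrix(qps_levels):
    """Return one (size_label, size_bytes, qps) case per size/load combination."""
    return [
        (label, size, qps)
        for label, size in MESSAGE_SIZES.items()
        for qps in qps_levels
    ]


# Hypothetical load levels; 1000 matches the per-instance QPS discussed in this thread.
for label, size, qps in build_test_matrix([500, 1000, 2000]):
    payload = make_payload(size)
    print(f"case: {label} message ({len(payload)} bytes) at {qps} QPS")
```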
You have done much more than others, and provided a very formal report.
Let's see what is happening there. #1637 is a new patch for the connection.
Do you have any documentation on the performance after adding gRPC log reporting?
I checked the performance test report:
https://skywalkingtest.github.io/Agent-Benchmarks/README_zh.html
I noticed that it only provides the machine configuration for the skywalking-agent.
What about the machine configuration of the collector node or cluster, and of the ES machine or cluster? In my view, this is very important in a performance test.
Is there a test report that covers the collector and ES in the performance test?