Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP]Collect "cpu.user" metric #4195

Closed

Conversation

danielamorais
Copy link

@danielamorais danielamorais commented May 29, 2019

What is the purpose of the change

Collect "cpu.user" for load balancer statistics

  1. Creates a callback service: A thread inside this class is responsible for collecting CPU Usage. When the CPU Usage changes, the listener will receive a notification.
  2. Create unit test

Verifying this change

Run CpuUsageServiceImplTest.java

Follow this checklist to help us incorporate your contribution quickly and easily:

  • Make sure there is a GITHUB_issue field for the change (usually before you start working on it). Trivial changes like typos do not require a GITHUB issue. Your pull request should address just this issue, without pulling in other changes - one PR resolves one issue.
  • Format the pull request title like [Dubbo-XXX] Fix UnknownException when host config not exist #XXX. Each commit in the pull request should have a meaningful subject line and body.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Write necessary unit-test to verify your logic correction, more mock a little better when cross module dependency exist. If the new feature or significant change is committed, please remember to add sample in dubbo samples project.
  • Run mvn clean install -DskipTests=false & mvn clean test-compile failsafe:integration-test to make sure unit-test and integration-test pass.
  • If this contribution is large, please follow the Software Donation Guide.

@codecov-io
Copy link

codecov-io commented May 29, 2019

Codecov Report

Merging #4195 into master will decrease coverage by 0.13%.
The diff coverage is 38.09%.

Impacted file tree graph

@@             Coverage Diff             @@
##             master   #4195      +/-   ##
===========================================
- Coverage     63.34%   63.2%   -0.14%     
+ Complexity      450     448       -2     
===========================================
  Files           770     774       +4     
  Lines         33015   33099      +84     
  Branches       5218    5222       +4     
===========================================
+ Hits          20912   20919       +7     
- Misses         9705    9776      +71     
- Partials       2398    2404       +6
Impacted Files Coverage Δ Complexity Δ
...rpc/cluster/loadbalance/StatisticsLoadBalance.java 0% <0%> (ø) 0 <0> (?)
...adbalance/statistics/CpuUsageAvailabilityTask.java 0% <0%> (ø) 0 <0> (?)
...r/loadbalance/statistics/DubboCpuUsageFactory.java 100% <100%> (ø) 0 <0> (?)
...er/loadbalance/statistics/CpuUsageServiceImpl.java 62.5% <62.5%> (ø) 0 <0> (?)
.../remoting/transport/netty4/NettyServerHandler.java 61.9% <0%> (-9.53%) 0% <0%> (ø)
...e/dubbo/remoting/transport/netty/NettyChannel.java 54.11% <0%> (-8.24%) 19% <0%> (-2%)
...ng/transport/dispatcher/all/AllChannelHandler.java 51.42% <0%> (-5.72%) 0% <0%> (ø)
...ng/exchange/support/header/HeartbeatTimerTask.java 73.68% <0%> (-5.27%) 0% <0%> (ø)
.../org/apache/dubbo/remoting/ExecutionException.java 15.78% <0%> (-5.27%) 0% <0%> (ø)
.../apache/dubbo/remoting/transport/AbstractPeer.java 58.69% <0%> (-4.35%) 0% <0%> (ø)
... and 7 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 82378a1...cacfc50. Read the comment docs.

@danielamorais danielamorais changed the title [WIP]Collecting new metrics Collect "cpu.user" metric Jun 13, 2019
@ralf0131
Copy link
Contributor

UT failed, the reason is:

[ERROR] src/main/java/org/apache/dubbo/monitor/dubbo/CpuUsageGauge.java:[4,8] (imports) UnusedImports: Unused import - com.alibaba.metrics.Clock.

@danielamorais danielamorais changed the title Collect "cpu.user" metric [WIP] Collect "cpu.user" metric Jun 13, 2019
@danielamorais
Copy link
Author

danielamorais commented Jul 11, 2019

@ralf0131

I made a test with 1 provider and 1 consumer. I found 2 problems in my tests:

  1. After the consumer restarts, the provider doesn't collect CPU Usage anymore.
  2. When is the provider side, RpcContext.getServerContext().getRemoteAddressString() returns "null:0".

CPU usage data:
[ "null:0" -> value updated, "some_ip:20880" -> 0.0]

  • The provider is collecting even with no one consumer up (when doesn't have any listeners). When the consumer starts, a new invoker is added in Map<String, Float> cpuUsage. It is ok?

dubbo-consumer.xml

<dubbo:application name="demo-consumer"/>

<dubbo:registry address="multicast://224.5.6.7:1234"/>

<dubbo:protocol name="dubbo" port="20880"/>
 
<dubbo:reference id="demoService" check="false" interface="org.apache.dubbo.demo.DemoService" timeout="20000"/>

dubbo-provider.xml

<dubbo:application name="demo-provider"/>

<dubbo:registry address="multicast://224.5.6.7:1234" />

<dubbo:protocol name="dubbo"/>

<bean id="demoService" class="org.apache.dubbo.demo.provider.DemoServiceImpl"/>
<bean id="cpuUsageService" class="org.apache.dubbo.rpc.cluster.loadbalance.statistics.CpuUsageServiceImpl">
</bean>

<dubbo:service interface="org.apache.dubbo.demo.DemoService" ref="demoService" loadbalance="statistics"/>
<dubbo:service interface="org.apache.dubbo.rpc.cluster.loadbalance.statistics.CpuUsageService" ref="cpuUsageService" connections="1" callbacks="1000">
	<dubbo:method name="addListener">
	<dubbo:argument index="1" callback="true" />
	</dubbo:method>
</dubbo:service>

dubbo.properties (provider side)

dubbo.application.qos.port=22222
time.to.live=1000
mill.to.collect.cpu.usage=1000

urlBuilder.setPath(CpuUsageService.class.getName());

Invoker<CpuUsageService> cpuUsageInvoker = protocol.refer(CpuUsageService.class, urlBuilder.build());
return proxyFactory.getProxy(cpuUsageInvoker);
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ralf0131 The class which implements CpuUsageService has a default constructor that schedule the thread to collect CPU Usage.
Questions:

  1. Should I call the default constructor? Does the proxyFactory do it?
  2. In dubbo-provider.xml I added:
<bean id="cpuUsageService" class="org.apache.dubbo.rpc.cluster.loadbalance.statistics.CpuUsageServiceImpl"/>

Will Spring/Dubbo call the factory class (DubboCpuUsageServiceFactory)?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, why do you need to create CpuUsageFactory? I think right now it is ok to put the logic to StatisticsLoadBalance and refactor it later.

@ralf0131
Copy link
Contributor

I made a test with 1 provider and 1 consumer.

You need to have at least 2 provider and 1 consumer in order to make a load balance to work. Please check the logic of org.apache.dubbo.rpc.cluster.support.AbstractClusterInvoker#doSelect

I found 2 problems in my tests:

After the consumer restarts, the provider doesn't collect CPU Usage anymore.

This is because you registered the listener to org.apache.dubbo.rpc.cluster.loadbalance.statistics.CpuUsageServiceImpl#listeners, but failed to remove it when consumer restart.

When a provider failed to send cpu usage data to a consumer, it should throw an expcetion

When is the provider side, RpcContext.getServerContext().getRemoteAddressString() returns "null:0".

Because the thread has been switched when org.apache.dubbo.rpc.cluster.loadbalance.statistics.CpuUsageServiceImpl#collectCpuUsage is called, RpcContext.getServerContext() is a thread local object. If you switch thread you will lose it. You can consider NetUtils.getLocalHost() instead.

There is a third issue, which is mentioned by @guohao , if the provider restart, the consumer should remove the statistics from the org.apache.dubbo.rpc.cluster.loadbalance.StatisticsLoadBalance

@CLAassistant
Copy link

CLAassistant commented Sep 6, 2019

CLA assistant check
All committers have signed the CLA.

@AlbumenJ
Copy link
Member

@danielamorais hi, thanks for your contribution

Please merge the latest master branch to resolve conflict files.

@AlbumenJ AlbumenJ added the status/waiting-for-feedback Need reporters to triage label Apr 11, 2021
@AlbumenJ
Copy link
Member

Close for long time no response.

Please feel free to reopen if you have any question.

If you think these changes still useful in the latest master branch, please submit a new pull request.

@AlbumenJ AlbumenJ closed this May 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status/waiting-for-feedback Need reporters to triage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants