Skip to content

HDDS-5794. The misleading "No available thread in pool for past * second" log message in DN StateContext#2693

Merged
ChenSammi merged 5 commits intoapache:masterfrom
ChenSammi:HDDS-5794
Oct 21, 2021
Merged

HDDS-5794. The misleading "No available thread in pool for past * second" log message in DN StateContext#2693
ChenSammi merged 5 commits intoapache:masterfrom
ChenSammi:HDDS-5794

Conversation

@ChenSammi
Copy link
Copy Markdown
Contributor

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this be threadPoolNotAvailableCount instead?

Copy link
Copy Markdown
Contributor Author

@ChenSammi ChenSammi Oct 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks for pointing it out. @ayushtkn , could you help to take another look of the change?

Comment on lines 624 to 625
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not able to understand this calculation of unavailableTime
If the lastHeartbeatSent was at 5.
ThreadPool goes unavailable at 8. So the value we set is 5-8=(-3) Negative 3?
In the next iteration say it is now 11, & thread pool isn't available we set -3 -11 = -15 Negative 15?
The pool isn't available from 5 to 11? so should 6 right?
@ChenSammi Can you explain a bit more about the logic here

Copy link
Copy Markdown
Contributor Author

@ChenSammi ChenSammi Oct 18, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, it should be “System.currentTimeMillis() - lastHeartbeatSent.get() ”.

Comment on lines +624 to +625
long unavailableTime = threadPoolNotAvailableTimeSum.addAndGet(
System.currentTimeMillis() - lastHeartbeatSent.get());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to decode the calculation after the change:
Say:
Last Heartbeat sent at : 5
First Unavailable at 8: So here unavailableTime = 8 - 5 = 3
Second Unavailable at 11: So here unavailableTime = 11 - 5 + 3 = 9
Shouldn't be this 6? Last heartbeat was at 5 & we are at 11, the
unavailable time should be 11-5=6
Can you help clarify

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So basically there is no need to sum up the unavailable time. "System.currentTimeMillis() - lastHeartbeatSent.get()" is enough. I will update it accordingly.

Copy link
Copy Markdown
Member

@ayushtkn ayushtkn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks almost cool to me. @adoroszlai you touched this code last, Will you be able to help double check this once

Comment on lines +626 to +630

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit:
Avoid this

threadPoolNotAvailableCount.set(0);
task.execute(service);
lastHeartbeatSent.set(System.currentTimeMillis());
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for not using Time.monotonicNow()? That should be a safer bet IMO?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I see that Time.monotonicNow() used in some places, but it involves a division operation. What's the paticular benifit of using Time.monotonicNow over System.currentTimeMills? I don't know the story behind, and would you like to shed some light on it?

Copy link
Copy Markdown
Member

@ayushtkn ayushtkn Oct 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is to be used if we want to calculate difference in times. The first cut info I got is from the javaoc
https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/Time.java#L51

In hadoop they changed everywhere- https://issues.apache.org/jira/browse/HDFS-6841

Will try to find some more descriptive doc & share

Copy link
Copy Markdown
Contributor Author

@ChenSammi ChenSammi Oct 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote a small test to show the System.currentTimeMills. Between the two runs, I changed the Linux system time. The System.currentTimeMills is really infected by the Linux time. That's the meaning of "because it will be broken by settimeofday". I will change it to Time.monotonicNow(). Thanks @ayushtkn for the info.

@ChenSammi ChenSammi changed the title HDDS-5794. The misleading "No available thread in pool for past * sencond" log message in DN StateContext HDDS-5794. The misleading "No available thread in pool for past * second" log message in DN StateContext Oct 20, 2021
Copy link
Copy Markdown
Member

@ayushtkn ayushtkn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM

@ChenSammi ChenSammi merged commit fbb8cf0 into apache:master Oct 21, 2021
@ChenSammi
Copy link
Copy Markdown
Contributor Author

Thanks @ayushtkn for the code review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants