Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ROCKETMQ-45]Delete hanged consume queue files #39

Closed
wants to merge 5 commits into from
Closed

[ROCKETMQ-45]Delete hanged consume queue files #39

wants to merge 5 commits into from

Conversation

lizhanhui
Copy link
Contributor

@lizhanhui lizhanhui commented Jan 12, 2017

Consume queue file may hang and stop further deleting under concurrent scenario.

If the consume queue file is concurrently held by another thread, mappedFile.destroy(interval) will only set firstShutdownTimestamp to current timestamp and available false, then return false directly.
Check MappedFileQueue.java#deleteExpiredFileByOffset,

if (destroy && mappedFile.destroy(1000 * 60)) {
files.add(mappedFile);
deleteCount++;
} else {
break;
}

the current round of deletion will be stopped and current mapped file becomes handed.

Next time trying to delete mapped file from this mapped file queue, MappedFileQueue.java#deleteExpiredFileByOffset(offset, unitSize) line No.390

MappedFile mappedFile = (MappedFile) mfs[i];
SelectMappedBufferResult result = mappedFile.selectMappedBuffer(this.mappedFileSize - unitSize);

will always return null as the mapped file is no longer available and cannot hold. Current implementation will only log a warn and exit.

@zhouxinyu
Copy link
Member

Hi, @lizhanhui , what's the appearance/effect of this BUG?

Let's start from the next round.

IMO, it's ok when ConsumeQueue-MappedFile hold failed, this file will be released by the last holder, while the empty mapped file will retain in MappedFileQueue, but the min consume queue offset will be corrected.

It seems that the only problem is the hanged consume queue only can be deleted when restart the broker.

@@ -397,11 +397,21 @@ public int deleteExpiredFileByOffset(long offset, int unitSize) {
log.info("physic min offset " + offset + ", logics in current mappedFile max offset "
+ maxOffsetInLogicQueue + ", delete it");
}
} else if (!mappedFile.isAvailable()) { // Handle hanged file.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to add RefCount to the if condition.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why? I do not see the rationale of awaiting RefCount to be 0 after commit log has been deleted after intervalForcibly period of time.

Copy link
Member

@zhouxinyu zhouxinyu Jan 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When MappedFile has been shutdown, no more new holder appear, so the RefCount will to be 0 soon, so I think maybe it's more graceful :), but it's up to you~

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing, can we just mark destroy=true in else if section and next if will destroy it ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another thing, can we just mark destroy=true in else if section and next if will destroy it ?

Yep, I think we can do this.

@lizhanhui
Copy link
Contributor Author

what's the appearance/effect of this BUG?

Reported bug description is consume queue files cannot be automatically deleted and took up large portion of disk.

IMO, it's ok when ConsumeQueue-MappedFile hold failed, this file will be released by the last holder, while the empty mapped file will retain in MappedFileQueue, but the min consume queue offset will be corrected.

ConsumeQueue#correctMinOffset will not work as expected if consume queue file hanged. Double check the source code to figure out.

It seems that the only problem is the hanged consume queue only can be deleted when restart the broker.

Not only this one. As correcting min offset is also affected, consumer client probably experiences long term of "commit log being deleted." before consuming progress catches up. There should be additional potential issues.

@zhouxinyu
Copy link
Member

I checked ConsumeQueue#correctMinOffset, and you are right this method won't work as expected in this situation.

Please @vintagewang @vongosling help review this PR.

@zhouxinyu
Copy link
Member

@lizhanhui Also please add some unit tests -:)..

@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 25.854% when pulling b8e53c9 on lizhanhui:ROCKETMQ-45 into b29c318 on apache:master.

2 similar comments
@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 25.854% when pulling b8e53c9 on lizhanhui:ROCKETMQ-45 into b29c318 on apache:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 25.854% when pulling b8e53c9 on lizhanhui:ROCKETMQ-45 into b29c318 on apache:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 25.814% when pulling 57111a0 on lizhanhui:ROCKETMQ-45 into b29c318 on apache:master.

2 similar comments
@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 25.814% when pulling 57111a0 on lizhanhui:ROCKETMQ-45 into b29c318 on apache:master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 25.814% when pulling 57111a0 on lizhanhui:ROCKETMQ-45 into b29c318 on apache:master.

@lizhanhui lizhanhui closed this Jan 22, 2017
@vongosling
Copy link
Member

Usually, we need 3 committers to review PR. We'd better not merge the PR if the opinion fails to unite

@lizhanhui
Copy link
Contributor Author

lizhanhui commented Jan 22, 2017

We'd better not merge the PR if the opinion fails to unite

Agree. But this is a minor change to fix obvious corner case mis-handling. Proposing review points are carefully considered and mostly accepted. Anyway, Let's wait for three review OK next time before merging.

lizhanhui pushed a commit to lizhanhui/rocketmq that referenced this pull request Jun 25, 2019
pingww pushed a commit that referenced this pull request Aug 26, 2022
* fix #38 support prometheus exporter

* fix #38 support prometheus exporter 1

* fix #38 support prometheus exporter frame

* fix #38 support prometheus collect pull status metrics
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants