Skip to content

ARTEMIS-1570 Flush appendExecutor before take journal snapshot#1742

Closed
shoukunhuai wants to merge 3 commits into
apache:masterfrom
shoukunhuai:flush-journal-executor
Closed

ARTEMIS-1570 Flush appendExecutor before take journal snapshot#1742
shoukunhuai wants to merge 3 commits into
apache:masterfrom
shoukunhuai:flush-journal-executor

Conversation

@shoukunhuai
Copy link
Copy Markdown
Contributor

When live start replication, it must make sure there is
no pending write in message & bindings journal, or we may
lost journal records during initial replication.

So we need flush append executor after acquire StorageManager's
write lock, before Journal's write lock.
Also we set a 10 seconds timeout when flush, the same as
Journal::flushExecutor. If we failed to flush in 10 seconds,
we abort replication, backup will try again later.

When live start replication, it must make sure there is
no pending write in message & bindings journal, or we may
lost journal records during initial replication.

So we need flush append executor after acquire StorageManager's
write lock, before Journal's write lock.
Also we set a 10 seconds timeout when flush, the same as
Journal::flushExecutor. If we failed to flush in 10 seconds,
we abort replication, backup will try again later.

Use OrderedExecutorFactory::flushExecutor to flush executor
@shoukunhuai shoukunhuai force-pushed the flush-journal-executor branch from 3d4c459 to 6f04846 Compare December 27, 2017 06:58
@clebertsuconic
Copy link
Copy Markdown
Contributor

How did u find this ? Nice job.

I will see if I can change the logic to not need the wait.

Any tests you can share.

I’m out this week for the Xmas break. Will take a look on next week. Ok ?

@shoukunhuai
Copy link
Copy Markdown
Contributor Author

@clebertsuconic That will be great if we can avoid wait.
Please do not merge, i will push my test.

@clebertsuconic
Copy link
Copy Markdown
Contributor

@shoukunhuai I thought about developing a byteman test with some rules on stopping during the execution, and then releasing it.

Test consistency between live and backup, espacially on
a slow live.
The test use MessagePersister::encode to simulate slow IO condition.
After live started, we send 5 message with a delay(default 500ms),
then start backup, wait until replicated, then send more message
without delay. If all message sent successfully, the backup should
has the same messages as live. We assert the message number only.

For message's delay, you can set it via system property, i test
it with `-Dmessage.property.delay=5000`, the test will fail.
@shoukunhuai
Copy link
Copy Markdown
Contributor Author

@clebertsuconic just pushed my test for your reference.

@michaelandrepearce
Copy link
Copy Markdown
Contributor

@shoukunhuai looks like some check-style bits need tidying up. looking at the PR build failure.

@jbertram
Copy link
Copy Markdown
Contributor

@clebertsuconic, what's the status of this?

@clebertsuconic
Copy link
Copy Markdown
Contributor

I didn't have the time for it yet.. still working on compatibility issues.

ClientProducer producer = session.createProducer("slow");
ClientMessage message = session.createMessage(true);
// this will make journal's append executor busy
message.putLongProperty("delay", Long.getLong("message.property.delay",500L));
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where? how this property is affecting the semantic of the server. I didn't find it anywhere.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Never mind.. I see.. it's on your extended persister...

I don't think we need a system property.. the test can just keep it constant at 500L

@clebertsuconic
Copy link
Copy Markdown
Contributor

nice job on this one!!! impressed!!!

@clebertsuconic
Copy link
Copy Markdown
Contributor

do you want to contribute more????????? I can help you in any way you need.

@shoukunhuai
Copy link
Copy Markdown
Contributor Author

@clebertsuconic sorry, almost forgot this, being busy on something else.
Looks like this issue was resolved?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants