New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
IGNITE-13489 Clients log in and out of the topology #8362
IGNITE-13489 Clients log in and out of the topology #8362
Conversation
client start and stop in loop
|
||
self.__check_status("IGNITE_APPLICATION_FINISHED", timeout=timeout_sec) | ||
|
||
# pylint: disable=W0221 | ||
def stop_node(self, node, clean_shutdown=True, timeout_sec=10): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I see you just move this method higher. Let's revert it.
Why do you change timeout_sec?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
revert changes
output = node.account.ssh_capture( | ||
"grep '%s' %s" % (name + "->", self.STDOUT_STDERR_CAPTURE), allow_fail=False) | ||
for line in output: | ||
res.append(re.search("%s(.*)%s" % (name + "->", "<-"), line).group(1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Imagine you have multiple application nodes, all of them are failed. res
than contains multiple lines. In method above extract_result
it checks size of the error and raises exception.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I see. Then it is better to build a check based on a dynamic parameter. I mean, instead of the expected 1 line, expect len(self.nodes). Am I getting this right?
/** | ||
* Java client. Tx put operation | ||
*/ | ||
public class SimpleClient extends IgniteAwareApplication { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's rename simple client to smth more meaningful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, what about IgniteCachePutClient?
""" | ||
|
||
CACHE_NAME = "simple-tx-cache" | ||
PACING = 10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why 10?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pacing is an internal delay, the interval between discrete operations. This is a customized parameter, 10 ms is selected so that we know for sure that the average number of operations generated by the client does not exceed 100 operations per second
PACING = 10 | ||
JAVA_CLIENT_CLASS_NAME = "org.apache.ignite.internal.ducktest.tests.start_stop_client.SimpleClient" | ||
|
||
CLIENTS_WORK_TIME_S = 30 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why 30?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CLIENTS_WORK_TIME_S - the working time of the client group. It is selected in such a way as to allow time for the load level to stabilize. A certain number of clients are included in the topology and work in this number of CLIENTS_WORK_TIME_S of time. From my point of view, this operating time is not too high, but it is sufficient for the load level on the cluster to settle after new fat clients enter the cluster. But I admit that I'm wrong.
for i in range(self.ITERATION_COUNT): | ||
self.logger.debug(f'Starting iteration: {i}.') | ||
|
||
time.sleep(self.CLIENTS_WORK_TIME_S) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you sleep on first iteration?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We give the STATIC_CLIENTS_NUM client group time to WORK during the CLIENTS_WORK_TIME_S time for each iteration of the TEMP_CLIENTS_NUM client group entry and exit. This is done in order to get a stable state of the cluster under a stable load after each client login and exit procedure
for node in service.nodes: | ||
service.stop_node(node=node, clean_shutdown=False) | ||
else: | ||
service.stop(clean_shutdown=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do not replace this if-else block with service.stop(clean_shutdown=kill_nodes)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the base timeout of 10 seconds was not enough to register completion in the base method. However, when they are completed sequentially, I fit into the timeout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's because service.stop waits for completion message "IGNITE_APPLICATION_FINISHED" in any case. So if clean_shutdown is False there is no reason to wait the message. So. It's a bug of stop
method. Let's fix it instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guessed, but I was even more afraid to rule. Okay, I'll fix it.
@@ -110,7 +121,8 @@ def extract_result(self, name): | |||
""" | |||
results = self.extract_results(name) | |||
|
|||
assert len(results) <= 1, f"Expected exactly one result occurence, {len(results)} found." | |||
assert len(results) <= len(self.nodes), f"Expected exactly {len(self.nodes)} occurence," \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's replace <= with !=.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only here it is necessary ==
iteration_count=3, | ||
client_work_time=30) | ||
# pylint: disable=R0913 | ||
def test_ignite_start_stop_nodes(self, ignite_version, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To enable run the test in parallel with others please use @cluster decorator. If user wants to run experiments with different number of nodes, the one must comment the decorator in feature branch.
Actually we need a flag for ducktape to ignore this decorator if needed. I will make a ticket for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok. I found your ticket thank you!
fix await finish event for clear_shutdown=False
LGTM |
client start and stop in loop
Thank you for submitting the pull request to the Apache Ignite.
In order to streamline the review of the contribution
we ask you to ensure the following steps have been taken:
The Contribution Checklist
The description explains WHAT and WHY was made instead of HOW.
The following pattern must be used:
IGNITE-XXXX Change summary
whereXXXX
- number of JIRA issue.(see the Maintainers list)
the
green visa
attached to the JIRA ticket (see TC.Bot: Check PR)Notes
If you need any help, please email dev@ignite.apache.org or ask anу advice on http://asf.slack.com #ignite channel.