Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IGNITE-13489 Clients log in and out of the topology #8362

Merged
merged 20 commits into from Oct 23, 2020
Merged

IGNITE-13489 Clients log in and out of the topology #8362

merged 20 commits into from Oct 23, 2020

Conversation

SwirMix
Copy link
Contributor

@SwirMix SwirMix commented Oct 14, 2020

client start and stop in loop

Thank you for submitting the pull request to the Apache Ignite.

In order to streamline the review of the contribution
we ask you to ensure the following steps have been taken:

The Contribution Checklist

  • There is a single JIRA ticket related to the pull request.
  • The web-link to the pull request is attached to the JIRA ticket.
  • The JIRA ticket has the Patch Available state.
  • The pull request body describes changes that have been made.
    The description explains WHAT and WHY was made instead of HOW.
  • The pull request title is treated as the final commit message.
    The following pattern must be used: IGNITE-XXXX Change summary where XXXX - number of JIRA issue.
  • A reviewer has been mentioned through the JIRA comments
    (see the Maintainers list)
  • The pull request has been checked by the Teamcity Bot and
    the green visa attached to the JIRA ticket (see TC.Bot: Check PR)

Notes

If you need any help, please email dev@ignite.apache.org or ask anу advice on http://asf.slack.com #ignite channel.

modules/ducktests/tests/ignitetest/services/ignite_app.py Outdated Show resolved Hide resolved

self.__check_status("IGNITE_APPLICATION_FINISHED", timeout=timeout_sec)

# pylint: disable=W0221
def stop_node(self, node, clean_shutdown=True, timeout_sec=10):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I see you just move this method higher. Let's revert it.
Why do you change timeout_sec?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert changes

output = node.account.ssh_capture(
"grep '%s' %s" % (name + "->", self.STDOUT_STDERR_CAPTURE), allow_fail=False)
for line in output:
res.append(re.search("%s(.*)%s" % (name + "->", "<-"), line).group(1))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Imagine you have multiple application nodes, all of them are failed. res than contains multiple lines. In method above extract_result it checks size of the error and raises exception.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I see. Then it is better to build a check based on a dynamic parameter. I mean, instead of the expected 1 line, expect len(self.nodes). Am I getting this right?

/**
* Java client. Tx put operation
*/
public class SimpleClient extends IgniteAwareApplication {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's rename simple client to smth more meaningful.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, what about IgniteCachePutClient?

"""

CACHE_NAME = "simple-tx-cache"
PACING = 10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 10?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pacing is an internal delay, the interval between discrete operations. This is a customized parameter, 10 ms is selected so that we know for sure that the average number of operations generated by the client does not exceed 100 operations per second

PACING = 10
JAVA_CLIENT_CLASS_NAME = "org.apache.ignite.internal.ducktest.tests.start_stop_client.SimpleClient"

CLIENTS_WORK_TIME_S = 30
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 30?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CLIENTS_WORK_TIME_S - the working time of the client group. It is selected in such a way as to allow time for the load level to stabilize. A certain number of clients are included in the topology and work in this number of CLIENTS_WORK_TIME_S of time. From my point of view, this operating time is not too high, but it is sufficient for the load level on the cluster to settle after new fat clients enter the cluster. But I admit that I'm wrong.

for i in range(self.ITERATION_COUNT):
self.logger.debug(f'Starting iteration: {i}.')

time.sleep(self.CLIENTS_WORK_TIME_S)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you sleep on first iteration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We give the STATIC_CLIENTS_NUM client group time to WORK during the CLIENTS_WORK_TIME_S time for each iteration of the TEMP_CLIENTS_NUM client group entry and exit. This is done in order to get a stable state of the cluster under a stable load after each client login and exit procedure

for node in service.nodes:
service.stop_node(node=node, clean_shutdown=False)
else:
service.stop(clean_shutdown=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do not replace this if-else block with service.stop(clean_shutdown=kill_nodes)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the base timeout of 10 seconds was not enough to register completion in the base method. However, when they are completed sequentially, I fit into the timeout

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's because service.stop waits for completion message "IGNITE_APPLICATION_FINISHED" in any case. So if clean_shutdown is False there is no reason to wait the message. So. It's a bug of stop method. Let's fix it instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guessed, but I was even more afraid to rule. Okay, I'll fix it.

@@ -110,7 +121,8 @@ def extract_result(self, name):
"""
results = self.extract_results(name)

assert len(results) <= 1, f"Expected exactly one result occurence, {len(results)} found."
assert len(results) <= len(self.nodes), f"Expected exactly {len(self.nodes)} occurence," \
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's replace <= with !=.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only here it is necessary ==

iteration_count=3,
client_work_time=30)
# pylint: disable=R0913
def test_ignite_start_stop_nodes(self, ignite_version,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To enable run the test in parallel with others please use @cluster decorator. If user wants to run experiments with different number of nodes, the one must comment the decorator in feature branch.

Actually we need a flag for ducktape to ignore this decorator if needed. I will make a ticket for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I found your ticket thank you!

@timoninmaxim
Copy link
Member

LGTM

@SwirMix SwirMix changed the title Ignite 13489 client in out test IGNITE-13489. Clients log in and out of the topology. To validate the topology. Oct 23, 2020
@nizhikov nizhikov changed the title IGNITE-13489. Clients log in and out of the topology. To validate the topology. IGNITE-13489 Clients log in and out of the topology. To validate the topology. Oct 23, 2020
@nizhikov nizhikov changed the title IGNITE-13489 Clients log in and out of the topology. To validate the topology. IGNITE-13489 Clients log in and out of the topology Oct 23, 2020
@nizhikov nizhikov merged commit bd26242 into apache:ignite-ducktape Oct 23, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants