IGNITE-14385: Add checkpoint information to performance statistics. #8928

Merged: 47 commits, Apr 15, 2021

@Sega76 (Contributor) commented Mar 25, 2021

Thank you for submitting the pull request to the Apache Ignite.

In order to streamline the review of the contribution
we ask you to ensure the following steps have been taken:

The Contribution Checklist

  • There is a single JIRA ticket related to the pull request.
  • The web-link to the pull request is attached to the JIRA ticket.
  • The JIRA ticket has the Patch Available state.
  • The pull request body describes changes that have been made.
    The description explains WHAT and WHY was made instead of HOW.
  • The pull request title is treated as the final commit message.
    The following pattern must be used: IGNITE-XXXX Change summary where XXXX - number of JIRA issue.
  • A reviewer has been mentioned through the JIRA comments
    (see the Maintainers list)
  • The pull request has been checked by the Teamcity Bot and
    the green visa attached to the JIRA ticket (see TC.Bot: Check PR)

Notes

If you need any help, please email dev@ignite.apache.org or ask for advice in the #ignite channel on http://asf.slack.com.

@Sega76 changed the title from "IGNITE-14385: Add start/end time of checkpoint, rebalance, PME to performance statistics." to "IGNITE-14385: Add checkpoint information to performance statistics." on Mar 29, 2021
@@ -120,7 +120,8 @@
 }
 },
 () -> true,
-new DataRegionMetricsImpl(new DataRegionConfiguration(), cctx.metric(), NO_OP_METRICS),
+new DataRegionMetricsImpl(new DataRegionConfiguration(), cctx.metric(), cctx.performanceStatistics(),
Contributor

Should we add PerformanceStatistics instance to ctx on line 68?

Contributor Author

done

@@ -135,7 +135,8 @@
 }
 },
 () -> true,
-new DataRegionMetricsImpl(new DataRegionConfiguration(), cctx.metric(), NO_OP_METRICS),
+new DataRegionMetricsImpl(
+new DataRegionConfiguration(), cctx.metric(), cctx.performanceStatistics(), NO_OP_METRICS),
Contributor

Should we add PerformanceStatistics instance to ctx on line 68?

Contributor Author

done


MetricRegistry mreg = srv.context().metric().registry(DATASTORAGE_METRIC_PREFIX);

AtomicLongMetric lastBeforeLockDuration = mreg.findMetric("LastCheckpointBeforeLockDuration");
Contributor

Let's inline all these variables.

@@ -81,6 +82,18 @@ public static void startCollectStatistics() throws Exception {
waitForStatisticsEnabled(true);
}

/** Starts collecting performance statistics with immediately flush. */
protected static void startCollectStatisticsWithImmediatelyFlush() throws Exception {
Contributor

We should get rid of this method.


forceCheckpoint();

assertTrue(waitForCondition(() -> {
Contributor

Let's rewrite the test as follows:

  1. Start psproc.
  2. forceCheckpoint.
  3. Wait for the checkpoint to finish (based on a metric value or similar).
  4. Stop psproc.
  5. Check that the statistics contain checkpoint data (read the statistics once, not repeatedly).
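
The five steps above could look roughly like the following. This is a self-contained sketch, not the test from the PR: `FakeNode`, its methods, and `CheckpointStatisticsFlow` are hypothetical stand-ins for the Ignite node, the performance statistics processor, and the `LastCheckpointStart` metric, so the flow can run without Ignite on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BooleanSupplier;

/** Hypothetical stand-in for a node; not an Ignite class. */
class FakeNode {
    /** Stand-in for the LastCheckpointStart metric (0 means no checkpoint yet). */
    final AtomicLong lastCheckpointStart = new AtomicLong();

    /** Records written while statistics collection is enabled. */
    private final List<String> statistics = new ArrayList<>();

    private volatile boolean collecting;

    void startStatistics() { collecting = true; }

    void stopStatistics() { collecting = false; }

    /** Runs a fake checkpoint asynchronously: writes a record, then updates the metric. */
    void forceCheckpoint() {
        new Thread(() -> {
            synchronized (statistics) {
                if (collecting)
                    statistics.add("checkpoint");
            }

            lastCheckpointStart.set(System.currentTimeMillis());
        }).start();
    }

    /** Returns a snapshot of the collected records. */
    List<String> readStatistics() {
        synchronized (statistics) {
            return new ArrayList<>(statistics);
        }
    }
}

class CheckpointStatisticsFlow {
    /** Polls the condition until it holds or the timeout expires. */
    static boolean waitForCondition(BooleanSupplier cond, long timeoutMs) {
        long deadline = System.nanoTime() + timeoutMs * 1_000_000L;

        try {
            while (System.nanoTime() - deadline < 0) {
                if (cond.getAsBoolean())
                    return true;

                Thread.sleep(10);
            }
        }
        catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }

        return cond.getAsBoolean();
    }

    static List<String> run() {
        FakeNode node = new FakeNode();

        node.startStatistics();                  // 1. Start collecting.
        node.forceCheckpoint();                  // 2. Trigger the checkpoint.

        // 3. Wait for the checkpoint to finish, based on the metric value.
        if (!waitForCondition(() -> node.lastCheckpointStart.get() != 0, 5_000))
            throw new IllegalStateException("Checkpoint did not finish in time");

        node.stopStatistics();                   // 4. Stop collecting.

        return node.readStatistics();            // 5. Read the statistics once.
    }
}
```

Waiting on the metric before stopping collection means the statistics are read a single time, after the checkpoint record is known to be written, instead of being polled inside the wait condition.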

AtomicInteger cnt = new AtomicInteger();

readFiles(statisticsFiles(), new TestHandler() {
@Override public void checkpoint(UUID nodeId, long beforeLockDuration, long lockWaitDuration,
Contributor

Please, fix the code formatting.

Contributor Author

fixed

@@ -197,6 +210,20 @@ protected static PerformanceStatisticsMBean statisticsMBean(String igniteInstanc
boolean timedOut) {
// No-op.
}

/** {@inheritDoc} */
@Override public void checkpoint(UUID nodeId, long beforeLockDuration, long lockWaitDuration,
Contributor

Please, fix code formatting.

Contributor Author

fixed


startCollectStatisticsWithImmediatelyFlush();

int keysCnt = 1024;
Contributor

These variables can be inlined.

Contributor Author

fixed

private static final long serialVersionUID = 0L;

/** Default checkpoint park nanos. */
private static final int CHECKPOINT_PARK_NANOS = 5_000_000;
Contributor

This should be long.

Contributor Author

fixed
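
For illustration, a minimal sketch of why a `long` fits here. The thread does not state the reviewer's reasoning; this assumes the constant is passed to `LockSupport.parkNanos(long)` and possibly multiplied, where `int` arithmetic can overflow. `CheckpointPark` and its methods are hypothetical names, not code from the PR.

```java
import java.util.concurrent.locks.LockSupport;

/** Hypothetical holder class; only the constant name comes from the PR. */
class CheckpointPark {
    /** Park interval in nanoseconds, typed long to match LockSupport.parkNanos(long). */
    static final long CHECKPOINT_PARK_NANOS = 5_000_000L;

    /** Parks the calling thread for roughly one interval (may return earlier). */
    static void parkOnce() {
        LockSupport.parkNanos(CHECKPOINT_PARK_NANOS);
    }

    /** Total nanoseconds requested over the given number of parks, computed in long arithmetic. */
    static long totalParkNanos(int iterations) {
        return CHECKPOINT_PARK_NANOS * iterations;
    }
}
```

With an `int` constant, `5_000_000 * 1_000` would be evaluated in 32-bit arithmetic and wrap around, while the `long` version yields 5,000,000,000 as intended.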

/**
* Create File I/O that emulates poor checkpoint write speed.
*/
private static class SlowCheckpointFileIOFactory implements FileIOFactory {
Contributor

Let's reuse the same class from PagesWriteThrottleSmokeTest or CheckpointBufferDeadlockTest instead of copy-pasting it.

Contributor Author

fixed
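
One way to make such a helper reusable is to keep it as a shared, non-private class that wraps any writer. The sketch below only illustrates that idea under stated assumptions: `PageWriter` and `SlowWriterFactory` are hypothetical names and deliberately simplified; they are not Ignite's `FileIO` or `FileIOFactory`, and this is not the class from `PagesWriteThrottleSmokeTest` or `CheckpointBufferDeadlockTest`.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

/** Hypothetical minimal write interface; a stand-in, not Ignite's FileIO. */
interface PageWriter {
    /** Writes a page and returns the number of bytes written. */
    int write(byte[] page);
}

/** Shared test helper: wraps any PageWriter and delays writes while the flag is set. */
final class SlowWriterFactory {
    /** Delay applied to each write while slow mode is on, in nanoseconds. */
    private static final long PARK_NANOS = 5_000_000L;

    /** Toggle owned by the test, so each test decides when writes are slow. */
    private final AtomicBoolean slow;

    SlowWriterFactory(AtomicBoolean slow) {
        this.slow = slow;
    }

    /** Returns a decorator that parks before delegating when slow mode is on. */
    PageWriter wrap(PageWriter delegate) {
        return page -> {
            if (slow.get())
                LockSupport.parkNanos(PARK_NANOS);

            return delegate.write(page);
        };
    }
}
```

Because the slow-down flag is passed in rather than held in a static field of one test, several test classes can share the helper without copying it.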


AtomicLongMetric lastStart = mreg.findMetric("LastCheckpointStart");

// wait for checkpoint to finish on node start
Contributor

Capital letter at the start and a period at the end.

Contributor Author

done

@NSAmelchev merged commit 118c64e into apache:master on Apr 15, 2021
xintrian pushed a commit to xintrian/ignite that referenced this pull request May 27, 2022
3 participants