fix: Reduce the stalled logging for snapshot #625

Merged: facetoe merged 2 commits into main from sebinsunny-reduce-the-snapshot-stalled-log on Jun 25, 2024


sebinsunny
Contributor

Reduce the stalled-progress logging for the snapshot process. Previously, last_flush_time was not reset, so if the previous base backup had failed and the current progress surpassed the previous base backup's progress, the snapshot process printed a large number of log lines.
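To make the failure mode concrete, here is a minimal simulation, not pghoard code. It assumes a progress callback fires once per second while progress is stalled, and that a stall message is due whenever at least a fixed interval has passed since last_flush_time. The function name count_stall_messages and the 60-second interval are hypothetical.

```python
def count_stall_messages(duration_s: int, stall_interval_s: int, reset_after_log: bool) -> int:
    """Count stall messages from a hypothetical once-per-second callback.

    Assumes progress never advances during the simulated window.
    """
    last_flush_time = 0
    messages = 0
    for now in range(1, duration_s + 1):
        # A stall message is due once the interval since the last flush has elapsed.
        if now - last_flush_time >= stall_interval_s:
            messages += 1
            if reset_after_log:
                # The fix: restart the interval after logging.
                last_flush_time = now
    return messages


if __name__ == "__main__":
    print(count_stall_messages(300, 60, reset_after_log=False))
    print(count_stall_messages(300, 60, reset_after_log=True))
```

Over five simulated minutes, the version without the reset logs on every callback after the first interval (241 messages). The version with the reset logs once per interval (5 messages).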

About this change - What it does

Resolves: #xxxxx

Why this way

@@ -99,6 +99,7 @@ def progress_callback(progress_step: ProgressStep, progress_data: ProgressMetric
progress_step.value, progress_data["handled"], elapsed
)
else:
self.last_flush_time = time.monotonic()
Contributor


This would be better placed in the outer if; there is no need to repeat it in each branch. Also, can we improve the log messages to better reflect what is happening? Right now they are a bit confusing.
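The reviewer's suggestion can be sketched as follows. This is a hypothetical stand-in, not pghoard's implementation. Only the names progress_callback and last_flush_time come from the PR. The class SnapshotProgressLogger, flush_interval_s, the injectable clock, last_handled, and the "progress"/"stalled" return values are all illustrative. The real callback also receives (progress_step, progress_data) rather than a bare file count.

```python
import logging
import time
from typing import Callable, Optional


class SnapshotProgressLogger:
    """Hypothetical owner of a progress callback; names are illustrative."""

    def __init__(self, flush_interval_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.log = logging.getLogger("snapshot-progress-sketch")
        self.clock = clock
        self.flush_interval_s = flush_interval_s
        self.last_handled = 0
        self.last_flush_time = clock()

    def progress_callback(self, handled: int) -> Optional[str]:
        now = self.clock()
        elapsed = now - self.last_flush_time
        advanced = handled > self.last_handled
        # Outer if: log only when progress advanced or the interval has elapsed.
        if advanced or elapsed >= self.flush_interval_s:
            if advanced:
                self.log.info("Snapshot progress: %d files handled; %.2f s since last report",
                              handled, elapsed)
                self.last_handled = handled
                kind = "progress"
            else:
                self.log.warning("Snapshot progress stalled at %d files for %.2f s",
                                 handled, elapsed)
                kind = "stalled"
            # Reset once here, shared by both inner branches.
            self.last_flush_time = now
            return kind
        return None


if __name__ == "__main__":
    fake_now = [0.0]
    tracker = SnapshotProgressLogger(60.0, clock=lambda: fake_now[0])
    for t, handled in [(61.0, 0), (62.0, 0), (63.0, 5)]:
        fake_now[0] = t
        print(t, tracker.progress_callback(handled))
```

Injecting the clock lets the interval logic be exercised without sleeping. Because the reset sits in the outer if, a stall message at t=61 suppresses another one at t=62.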

@codecov-commenter

Codecov Report

Attention: Patch coverage is 0% with 1 line in your changes missing coverage. Please review.

Project coverage is 90.70%. Comparing base (5505b86) to head (a9c631a).
Report is 13 commits behind head on main.



@@            Coverage Diff             @@
##             main     #625      +/-   ##
==========================================
- Coverage   91.01%   90.70%   -0.32%     
==========================================
  Files          31       31              
  Lines        4917     4957      +40     
==========================================
+ Hits         4475     4496      +21     
- Misses        442      461      +19     
Files                          Coverage Δ
pghoard/basebackup/delta.py    90.51% <0.00%> (-0.70%) ⬇️

... and 9 files with indirect coverage changes
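The project-level percentages in the coverage diff can be recomputed as Hits divided by Lines (Hits + Misses). This assumes there are no partial lines, since the table lists none. The helper name coverage_pct is hypothetical.

```python
def coverage_pct(hits: int, misses: int) -> float:
    """Line coverage in percent, assuming no partially covered lines."""
    lines = hits + misses
    return round(100 * hits / lines, 2)


if __name__ == "__main__":
    print(coverage_pct(4475, 442))  # main: 4917 lines
    print(coverage_pct(4496, 461))  # PR head: 4957 lines
```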

sebinsunny force-pushed the sebinsunny-reduce-the-snapshot-stalled-log branch from a9c631a to 3933b55 on June 25, 2024 00:33
 status = "FAILED" if not result.success else "successfully"
-log_msg = f"{operation_type.capitalize()} of key: {key}, " \
+log_msg = f"{oper.capitalize()} of key: {key}, " \
Contributor Author


Reused oper, which already contains the necessary information.
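A small sketch shows why a separate operation_type is unnecessary: oper already names the operation, and str.capitalize() turns it into the sentence-initial word. The helper build_transfer_log_msg, the sample oper values, and the ", result: {status}" tail are hypothetical. The PR's line is cut off after "{key}, ", so the real message continues differently.

```python
def build_transfer_log_msg(oper: str, key: str, success: bool) -> str:
    # Same status expression as the diff context line.
    status = "FAILED" if not success else "successfully"
    # oper (e.g. "upload") already carries the operation name; capitalize()
    # uppercases the first character and lowercases the rest.
    # The ", result: {status}" tail is a placeholder for the truncated original.
    return f"{oper.capitalize()} of key: {key}, result: {status}"


if __name__ == "__main__":
    print(build_transfer_log_msg("upload", "site/basebackup/x", True))
```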

self.metrics.gauge("pghoard.seconds_since_backup_progress_stalled", 0, tags=tags)
self.log.info(
"Updated snapshot progress for %s to %d files; elapsed time since last check: %.2f seconds.",
Contributor Author


It confuses operators: because it is not logged frequently, we might think the process is stuck, owing to the progress_data["handled"] value.

sebinsunny requested a review from facetoe on June 25, 2024 01:27
sebinsunny force-pushed the sebinsunny-reduce-the-snapshot-stalled-log branch from 3933b55 to 7e3d0bf on June 25, 2024 02:07
sebinsunny force-pushed the sebinsunny-reduce-the-snapshot-stalled-log branch from 7e3d0bf to e1ac0bb on June 25, 2024 02:10
facetoe merged commit 021be69 into main on Jun 25, 2024
7 checks passed
facetoe deleted the sebinsunny-reduce-the-snapshot-stalled-log branch on June 25, 2024 03:59
Labels
None yet
Projects
None yet

3 participants