New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Snapshot removal and storage cleanup logs #8031
Conversation
@blueorangutan package |
@harikrishna-patnala a [SF] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code LGTM
Codecov Report
@@ Coverage Diff @@
## main #8031 +/- ##
============================================
+ Coverage 28.57% 29.03% +0.45%
- Complexity 29784 30281 +497
============================================
Files 5100 5101 +1
Lines 358565 358728 +163
Branches 52316 52353 +37
============================================
+ Hits 102464 104159 +1695
+ Misses 241968 240269 -1699
- Partials 14133 14300 +167
Flags with carried forward coverage won't be shown. Click here to find out more.
... and 221 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 7222 |
...orage/snapshot/src/main/java/org/apache/cloudstack/storage/snapshot/SnapshotServiceImpl.java
Outdated
Show resolved
Hide resolved
@@ -364,7 +364,7 @@ protected Void deleteSnapshotCallback(AsyncCallbackDispatcher<SnapshotServiceImp | |||
SnapshotResult res = null; | |||
try { | |||
if (result.isFailed()) { | |||
s_logger.debug("delete snapshot failed" + result.getResult()); | |||
s_logger.debug(String.format("Failed to delete snapshot [%s] due to: [%s].", snapshot.getUuid(), result.getResult())); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While logging uuid/id, prefix it with [id=
. This is how we are logging at other places as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, it is not a consensus.
s_logger.debug(String.format("Verifying if snapshot [%s] is in destroying state in any image data store.", snapshotUuid)); | ||
SnapshotInfo snapshotInfo = snapshotFactory.getSnapshot(snapshot.getId(), DataStoreRole.Image); | ||
|
||
if (snapshotInfo != null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If snapshot doesn't exist, snapshotInfo will also be null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. But, snapshotInfo
is getting only snapshots from Image data store role. If a snapshot is in destroying state in a Primary data store role and the same snapshot in the Image data store role has already been cleaned up, snapshotInfo
will be null and a NPE will be thrown.
Snapshots from ID 18 & 19 were cleaned up in Image data store but the snapshot still remains in destroying state in Primary data store.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per my understanding, the below code takes care of that. We don't need to check for snapshot
and snapshotInfo
separately. Just a check on snapshotInfo should be fine and the code above (https://github.com/apache/cloudstack/pull/8031/files#diff-cf68e132ad7711cfa9250cda5819c09f51cba9933aa595cd82ad5a02b4edbe64R1319-R1325) can be removed.
Lines 91 to 111 in b4c7705
@Override | |
public SnapshotInfo getSnapshot(long snapshotId, DataStoreRole role, boolean retrieveAnySnapshotFromVolume) { | |
SnapshotVO snapshot = snapshotDao.findById(snapshotId); | |
if (snapshot == null) { | |
return null; | |
} | |
SnapshotDataStoreVO snapshotStore = snapshotStoreDao.findBySnapshot(snapshotId, role); | |
if (snapshotStore == null) { | |
if (!retrieveAnySnapshotFromVolume) { | |
return null; | |
} | |
snapshotStore = snapshotStoreDao.findByVolume(snapshotId, snapshot.getVolumeId(), role); | |
if (snapshotStore == null) { | |
return null; | |
} | |
} | |
DataStore store = storeMgr.getDataStore(snapshotStore.getDataStoreId(), role); | |
SnapshotObject so = SnapshotObject.getSnapshotObject(snapshot, store); | |
return so; | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code you sugested to remove is being used to get the snapshot uuid and to log it in places that are out of snapshotInfo
scope and in cases where snapshotInfo
is null.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hsato03 , if I understand correctly you do a DB query for the sole purpose of logging. That seems a stretch. An operator can do this out of bound. If at all one would want this it should be turn-offable like behind an if (LOGGER.isDebugEnabled()) {}
guard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@DaanHoogland I made the changes to check if debug is enabled. By the way, the error log can no longer use the snapshot uuid.
...orage/snapshot/src/main/java/org/apache/cloudstack/storage/snapshot/SnapshotServiceImpl.java
Outdated
Show resolved
Hide resolved
...orage/snapshot/src/main/java/org/apache/cloudstack/storage/snapshot/SnapshotServiceImpl.java
Outdated
Show resolved
Hide resolved
Co-authored-by: dahn <daan.hoogland@gmail.com>
@blueorangutan package |
@DaanHoogland a [SF] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
@blueorangutan test |
@DaanHoogland a [SF] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
[SF] Trillian test result (tid-7943)
|
@GutoVeronezi @vishesh92 Are you allright with this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CLGTM
Description
The snapshot removal and storage cleanup logs have few information about these processes, which makes troubleshooting difficult.
This PR intends to add new log messages and rewrite the old ones to make troubleshooting easier.
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?
I decreased the value of the
storage.cleanup.interval
configuration and deleted a snapshot. Then I checked the management server logs.