-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CLOUDSTACK-9691: Fixed unhandeled excetion in list snapshot command when a primary store is deleted related to it #1847
Conversation
Can you tell me which line was throwing a NullPointerException in the code that you fixed? Thanks! |
@mike-tutkowski Its in bug description. Are you not seeing this issue on your end? 2016-08-26 14:34:36,709 DEBUG [c.c.a.ApiServlet] (catalina-exec-1:ctx-90c9ba3a) (logid:115e39ad) ===START=== 10.233.88.59 – GET command=listSnapshots&response=json&listAll=true&page=1&pagesize=20&_=1472202277072 |
@mike-tutkowski @anshul-gangwar @nvazquez I believe the same issue is addressed in PR1735 but in a more consistent fashion |
@serg38 seems so both addressing same issue. |
Hi @anshul1886, |
Let me provide a bit of background on this and then we can decide which way we want to correct this side effect. Here is the PR that went in a while ago that enabled CloudStack to support volume snapshots that reside on primary storage: The idea being these types of snapshots are faster than the back-up-to-secondary-storage approach CloudStack does by default and they can be a lot more space efficient, as well. As part of this process, I went through and tried to identify all of the locations where we assumed a volume snapshot resided on secondary storage (and I put in code to see if it really resides there or, instead, if it's on primary storage). As we have noted, a couple places were missed and this PR (as wells as #1735) were opened to address those issues. The way this particular PR's code is written should work fine. In the case where the original primary storage has been removed, an exception will be thrown, caught, logged, and then we will default to returning secondary storage as the location (which it should be). Instead of the try/catch approach, though, it might be better if we see if dataStore is null. DataStore dataStore = dataStoreMgr.getDataStore(storagePoolId, DataStoreRole.Primary); If that comes back null, then we apparently have removed primary storage, which can only be done if your snapshots don't reside on it. If dataStore == null, return DataStoreRole.IMAGE. |
8821a7d
to
e988c26
Compare
@mike-tutkowski was in python mode so used try except mode instead of null check which is recommended in java. Pushed the changes for null check. |
To clarify this line: "If that comes back null, then we apparently have removed primary storage, which can only be done if your snapshots don't reside on it. If dataStore == null, return DataStoreRole.IMAGE." I meant this for managed storage. For unmanaged storage, the original primary storage from which the snapshot came can be deleted and the snapshot can remain. |
Code = LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for code changes.
@karuturi Seems this is ready for merge, with code LGTMs and based on test results published by @cloudmonger ? |
@sateesh-chodapuneedi @anshul1886 @mike-tutkowski |
@anshul1886 can you and @nvazquez work together and create a single PR? |
@nvazquez What would you like to do? |
Hi @anshul1886, |
@nvazquez, Yeah I am fine with it. I will add the test from your PR to mine PR |
@anshul1886 great, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM for code.
Thanks @anshul1886 for fix, and @nvazquez for integration test.
merging |
oops.. will wait for travis |
ACS CI BVT RunSumarry: Link to logs Folder (search by build_no): https://www.dropbox.com/sh/yj3wnzbceo9uef2/AAB6u-Iap-xztdm6jHX9SjPja?dl=0 Failed tests:
Skipped tests: Passed test suits: |
@karuturi Test failing is unrelated to this PR. === TestName: test_add_user_to_project | Status : EXCEPTION === Test failing in travis is | ContextSuite con | exceptions.Excep | 36.025 | test_project_lim | |
Hi @anshul1886, I just observed that the related PR #1735 was for 4.9. Since this bug also exists in 4.9, can you change the base branch of this PR to 4.9 and rebase? |
Ok, Doing that. |
a primary store is deleted related to it
2b8d800
to
3caedb9
Compare
@karuturi, Done |
@blueorangutan package |
@borisstoyanov a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress. |
Packaging result: ✔centos6 ✔centos7 ✔debian. JID-506 |
@blueorangutan test |
@borisstoyanov a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests |
CLOUDSTACK-9691: Fixed unhandeled excetion in list snapshot command when a primary store is deleted related to it@mike-tutkowski After support for snapshots on solidifire there are many places which are prone to these NullPointer exceptions resulting in various issues. Root cause for these issues is that we get the primary storage associated with snapshot and then figure out how to handle but if that store is deleted then it results in NullPointer exceptions. Without solidfire we don't need to access primary storage. Should we handle that as issues are found or could there be other way to fix all of these issues? * pr/1847: CLOUDSTACK-9691: Added test list_snapshots_with_removed_data_store CLOUDSTACK-9691: Fixed unhandeled excetion in list snapshot command when a primary store is deleted related to it Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
Trillian test result (tid-854)
|
@nvazquez @rhtyd @anshul1886 We might need to tweak the test_data.py to add additional nfs mount e.g. nfs1 and use it in this test. E.g adding |
@borisstoyanov can you have a look at ^^, thanks. |
@karuturi @anshul1886 Can we tweak line 278 of test_snaphsot.py from and re-merge this PR so it doesn't conflict with other tests using "nfs" test data. |
@borisstoyanov @rhtyd I was checking BlueOrangutan logs: In
Then, it gets removed, lines 46-48:
Then, on
Please note that although there's almost 3-hour difference between logs, there exists a PS with the same id ("fa44baa2-267d-3f36-88c9-c5c545e0204b"), but creation times are different: So, when |
Hi @nvazquez, let me simplify what B.O. is doing, when it read 'test' it kicks a jenkins job that deploys the pr, deploys a zone and executes all tests in the smoketests directory. |
Fix for test_snapshots.py using nfs2 instead of nfs templateFix for marvin test failure introduced in #1847 Cc: @borisstoyanov @rhtyd @karuturi * pr/1961: Fix for test failure Fix for test_snapshots.py using nfs2 instead of nfs template Signed-off-by: Rajani Karuturi <rajani.karuturi@accelerite.com>
@mike-tutkowski After support for snapshots on solidifire there are many places which are prone to these NullPointer exceptions resulting in various issues. Root cause for these issues is that we get the primary storage associated with snapshot and then figure out how to handle but if that store is deleted then it results in NullPointer exceptions. Without solidfire we don't need to access primary storage.
Should we handle that as issues are found or could there be other way to fix all of these issues?