[WIP] Get Diagnostics: Download logs and diagnostics data from SSVM, CPVM, Router #3350

Open: wants to merge 9 commits into base: master

@PaulAngus
Member

commented May 23, 2019

Description

This implements a new feature to get logs and diagnostics data from system VMs (CPVM, SSVM) and virtual routers as a downloadable zip file from the secondary storage. The diagnostics zip files live in the diagnostics directory of the secondary storage. The feature is supported only for NFS-based secondary storage and only for root admins.

FS: https://cwiki.apache.org/confluence/display/CLOUDSTACK/CloudStack+Get+Diagnostics+Data+API

The feature adds the following global settings:

diagnostics.data.gc.enable
diagnostics.data.gc.interval
diagnostics.data.retrieval.timeout
diagnostics.data.max.file.age
diagnostics.data.disable.threshold
diagnostics.data.systemvm.defaults
diagnostics.data.router.defaults
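
The description does not spell out how these settings interact. As a rough illustration only, the sketch below assumes that diagnostics.data.gc.enable switches cleanup on and that diagnostics.data.max.file.age is an age limit in seconds; both the units and the function names (`collect_expired`, `run_gc_once`) are hypothetical and are not taken from this PR.

```python
import os
import time


def collect_expired(directory, max_file_age_seconds, now=None):
    """Return sorted paths of regular files in `directory` older than the age limit."""
    now = time.time() if now is None else now
    expired = []
    for entry in os.scandir(directory):
        # Assumption: file age is measured from the last modification time.
        if entry.is_file() and now - entry.stat().st_mtime > max_file_age_seconds:
            expired.append(entry.path)
    return sorted(expired)


def run_gc_once(directory, gc_enabled, max_file_age_seconds, now=None):
    """Delete expired files when GC is enabled; return the deleted paths."""
    if not gc_enabled:
        return []
    expired = collect_expired(directory, max_file_age_seconds, now)
    for path in expired:
        os.remove(path)
    return expired
```

A background task could call `run_gc_once` once per interval (presumably what diagnostics.data.gc.interval controls), pointing it at the diagnostics directory on the mounted secondary storage.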

In the UI, the root admin can now see a download button to get the diagnostics data from CPVM, SSVM and VRs.

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)

Screenshots (if appropriate):

Download icon in menu:
image

Prompt to override default files:
image

Download link:
image

Files in the tarball:
image

How Has This Been Tested?

Manual testing of downloading diagnostics for VRs, SSVMs and CPVMs

@DaanHoogland
Contributor

left a comment

One structural comment: the use of polymorphism in the DiagnosticsFiles fileprocessor package may not be the most efficient, but it seems functional. Heavy testing is needed too, but both unit and Marvin tests are provided.

@rhtyd

Member

commented May 24, 2019

Design review:

  • Large archives transported via the cmd-answer pattern can cause OOM in both the management server and the SSVM/KVM agent
  • The feature could benefit from an agnostic distributed file-sharing manager/service (for example, one based on BitTorrent or rsync+ssh)

@DaanHoogland

Contributor

commented May 24, 2019

@blueorangutan package

@blueorangutan

commented May 24, 2019

@DaanHoogland a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

commented May 24, 2019

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-2788

@DaanHoogland

Contributor

commented May 27, 2019

@blueorangutan

commented May 27, 2019

@DaanHoogland a Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

@blueorangutan

commented May 28, 2019

Trillian test result (tid-3591)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 35133 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr3350-t3591-kvm-centos7.zip
Intermittent failure detected: /marvin/tests/smoke/test_privategw_acl.py
Intermittent failure detected: /marvin/tests/smoke/test_vpc_redundant.py
Smoke tests completed. 70 look OK, 0 have error(s)
Only failed tests results shown below:

Test Result Time (s) Test File

@nvazquez

Contributor

commented Jun 8, 2019

@rhtyd the current implementation uses the cmd-answer pattern only to indicate which files need to be copied; the files themselves are copied over SCP using the com.trilead.ssh2.SCPClient library. Can you explain a bit further how you think it should be redesigned?

@nvazquez nvazquez force-pushed the shapeblue:retrieve-diagnostics-data-rebase branch from d2f3096 to a3067f5 Jun 10, 2019

@rhtyd

Member

commented Jun 18, 2019

@nvazquez let me revisit this PR soon; I need to check between which points we're scp-ing. The VR logs can grow to as much as 0.5 GB in size. If we're passing the payload via the cmd-answer pattern, it can potentially cause out-of-memory issues in the JVM process/agent that handles it.
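
To illustrate the concern: if an archive travels as one in-memory payload, peak memory grows with the archive size, whereas a chunked copy holds only a single buffer at a time. The sketch below is generic Python, not code from this PR; `copy_in_chunks` and the 64 KiB buffer size are hypothetical choices made for illustration.

```python
CHUNK_SIZE = 64 * 1024  # hypothetical buffer size, chosen for illustration


def copy_in_chunks(source, destination, chunk_size=CHUNK_SIZE):
    """Copy a binary stream chunk by chunk and return the number of bytes copied.

    At most `chunk_size` bytes are held in memory at once, regardless of
    how large the source is.
    """
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read signals end of stream
            break
        destination.write(chunk)
        total += len(chunk)
    return total
```

With file objects opened in binary mode, the same loop would move a 0.5 GB archive while keeping memory use bounded by the chunk size.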

@rhtyd rhtyd self-assigned this Jun 18, 2019

@borisstoyanov borisstoyanov changed the title Retrieve diagnostics data rebase WIP:Retrieve diagnostics data rebase Jun 21, 2019

@rhtyd

Member

commented Jul 1, 2019

Rebased against master, will test and review the design and implementation.

@rhtyd rhtyd changed the title WIP:Retrieve diagnostics data rebase [WIP] Retrieve diagnostics data rebase Jul 1, 2019

@rhtyd

Member

commented Jul 1, 2019

Fixed several issues with the implementation. Essentially, two global settings configure the lists of CPVM/SSVM files and router files to retrieve; the retrieved files are hosted in the diagnostics directory of the NFS-based secondary storage.

Global settings added by this PR:

diagnostics.data.gc.enable
diagnostics.data.gc.interval
diagnostics.data.retrieval.timeout
diagnostics.data.max.file.age
diagnostics.data.disable.threshold
diagnostics.data.systemvm.defaults
diagnostics.data.router.defaults

@rhtyd rhtyd requested review from GabrielBrascher and nvazquez Jul 1, 2019

@rhtyd

rhtyd approved these changes Jul 1, 2019

Member

left a comment

Works OK; however, it adds a dependency on the SSVM/secondary storage, and the file copy assumes that the secondary storage is NFS-based and accessible. On non-NFS storage, the admin may not be told explicitly that the feature does not work (unlikely in most cases).

Pending testing on XenServer and VMware. KVM LGTM.

@rhtyd rhtyd changed the title [WIP] Retrieve diagnostics data rebase [WIP] Retrieve Diagnostics from SSVM, CPVM, Router Jul 1, 2019

@rhtyd rhtyd changed the title [WIP] Retrieve Diagnostics from SSVM, CPVM, Router [WIP] Get Diagnostics: Download logs and diagnostics data from SSVM, CPVM, Router Jul 1, 2019

@borisstoyanov

Contributor

commented Jul 1, 2019

@blueorangutan package

@rhtyd rhtyd force-pushed the shapeblue:retrieve-diagnostics-data-rebase branch from 595a5b6 to 9836d33 Jul 3, 2019

@blueorangutan

commented Jul 3, 2019

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-79

@borisstoyanov

Contributor

commented Jul 3, 2019

@blueorangutan package

@blueorangutan

commented Jul 3, 2019

@borisstoyanov a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

commented Jul 3, 2019

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-80

@borisstoyanov

Contributor

commented Jul 3, 2019

@blueorangutan test matrix

@blueorangutan

commented Jul 3, 2019

@borisstoyanov a Trillian-Jenkins matrix job (centos6 mgmt + xs71, centos7 mgmt + vmware65, centos7 mgmt + kvmcentos7) has been kicked to run smoke tests

@rhtyd

Member

commented Jul 3, 2019

@borisstoyanov as I've mentioned, it is still work in progress, XenServer support is not implemented yet.

@blueorangutan

commented Jul 3, 2019

Trillian test result (tid-93)
Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7
Total time taken: 27149 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr3350-t93-kvm-centos7.zip
Smoke tests completed. 71 look OK, 0 have error(s)
Only failed tests results shown below:

@blueorangutan

commented Jul 3, 2019

Trillian test result (tid-94)
Environment: vmware-65u2 (x2), Advanced Networking with Mgmt server 7
Total time taken: 31262 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr3350-t94-vmware-65u2.zip
Smoke tests completed. 67 look OK, 4 have error(s)
Only failed tests results shown below:

Dingane Hlaluku and others added some commits Nov 29, 2018

* Complete API implementation
* Complete UI integration
* Complete marvin test
* Complete Secondary storage GC background task
multiple fixes and cleanups
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
fix more bugs, let it return ip rule list in another log file
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
fix missing iprule bug
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>
add support for ARCHIVE type of object to be linked/setup on secstorage
Signed-off-by: Rohit Yadav <rohit.yadav@shapeblue.com>

@rhtyd rhtyd force-pushed the shapeblue:retrieve-diagnostics-data-rebase branch from 9836d33 to c924cae Jul 4, 2019

@rhtyd rhtyd closed this Jul 15, 2019

@rhtyd rhtyd reopened this Jul 15, 2019

@rhtyd

Member

commented Jul 15, 2019

@anuragaw @shwstppr can one of you help fix the issue for XenServer? It requires implementing a method in the vmops file that will scp a zip file from the VR/systemvm to a mounted secondary storage folder/path. For KVM and VMware the feature works.
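
As a rough sketch of what such a method might assemble, the helper below builds an scp argument list that pulls a zip file from a VM into a mounted directory. It is not the vmops implementation: the function name, every parameter, and the choice to disable strict host key checking are assumptions made for illustration.

```python
import posixpath


def build_scp_command(user, vm_ip, ssh_port, key_path, remote_zip, mounted_dir):
    """Build an scp argument list that copies `remote_zip` from the VM into `mounted_dir`."""
    # The local file keeps the remote file's base name.
    local_target = posixpath.join(mounted_dir, posixpath.basename(remote_zip))
    return [
        "scp",
        "-P", str(ssh_port),               # SSH port on the VM
        "-i", key_path,                    # private key used to authenticate
        "-o", "StrictHostKeyChecking=no",  # assumption: host keys are not pinned
        f"{user}@{vm_ip}:{remote_zip}",
        local_target,
    ]
```

The resulting list could be handed to `subprocess.run(cmd, check=True)` so that a failed copy raises an error instead of passing silently.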

@anuragaw

Contributor

commented Jul 15, 2019

I’ll pick it up @rhtyd

@anuragaw

Contributor

commented Jul 18, 2019

Update: I have figured out the solution and am close to finishing the fix for the broken XenServer functionality.

@rhtyd

Member

commented Jul 22, 2019

Any update @anuragaw? Do you want to take over?

@anuragaw

Contributor

commented Jul 22, 2019

@rhtyd I spent some time on it last week but have been away since my last comment on Thursday evening. I'll work on this today and update the PR while testing other possible scenarios.

@anuragaw

Contributor

commented Jul 22, 2019

@blueorangutan package

@blueorangutan

commented Jul 22, 2019

@anuragaw a Jenkins job has been kicked to build packages. I'll keep you posted as I make progress.

@blueorangutan

commented Jul 22, 2019

Packaging result: ✔centos6 ✔centos7 ✔debian. JID-169
