New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance pbs_snapshot to capture remote data for --with-sudo option #825

Merged
merged 1 commit into from Oct 12, 2018

Conversation

Projects
None yet
5 participants
@agrawalravi90
Contributor

agrawalravi90 commented Sep 27, 2018

Feature Description

  • This is the second part of enhancing pbs_snapshot to make it non-root friendly
  • It adds support for capturing remote host information without having to run pbs_snapshot via root

Affected Platform(s)

  • All linux

Design

Solution Description

  • pbs_snapshot creates multiple threads, one for each additional host to capture
  • Each thread runs a pbs_snapshot command on its host and copies back the child snapshot tar file
  • The child snapshot tar files are added to the main snapshot at the first directory level

Caveats

  • When --additional-hosts option is provided, pbs_snapshot will now not compress the data on fly for the primary host. It will compress it once all the child snapshots have been copied to the main snapshot directory, and only then it will compress the main snapshot.
  • PBSSnapUtils Python class can now only capture data from a single host, so for each host who's data needs to be captured, the user should instantiate an instance of PBSSnapUtils with the 'primary_host' argument set to the host to capture (defaults to localhost)

Testing logs/output

Checklist:

***For further information please visit the Developer Guide Home.***

def childsnap_thread(logger, host, acct_logs, daemon_logs, map_file, anonymize,
with_sudo, main_snaphome):
"""
Thread routine for each child snapshot being run on a remote host

This comment has been minimized.

@bhroam

bhroam Sep 28, 2018

Contributor

Add a sphinx header

@bhroam

bhroam Sep 28, 2018

Contributor

Add a sphinx header

Show outdated Hide outdated test/fw/ptl/utils/pbs_snaputils.py Outdated
if self.num_daemon_logs > 0 and with_comm_logs:
self.__capture_comm_logs(host)
self.__capture_comm_logs()

This comment has been minimized.

@bhroam

bhroam Sep 28, 2018

Contributor

Just an idle thought: What happens if the comm drops a core? Should we grab a stack trace like we do for other daemons?

@bhroam

bhroam Sep 28, 2018

Contributor

Just an idle thought: What happens if the comm drops a core? Should we grab a stack trace like we do for other daemons?

This comment has been minimized.

@agrawalravi90

agrawalravi90 Oct 2, 2018

Contributor

Thanks for catching that, comm cores are generated in server_priv, so we should catch server_priv on comm nodes.

@agrawalravi90

agrawalravi90 Oct 2, 2018

Contributor

Thanks for catching that, comm cores are generated in server_priv, so we should catch server_priv on comm nodes.

Show outdated Hide outdated test/tests/functional/pbs_snapshot_unittest.py Outdated
Show outdated Hide outdated test/tests/functional/pbs_snapshot_unittest.py Outdated
Show outdated Hide outdated test/tests/functional/pbs_snapshot_unittest.py Outdated
Show outdated Hide outdated test/tests/functional/pbs_snapshot_unittest.py Outdated
Show outdated Hide outdated test/tests/functional/pbs_snapshot_unittest.py Outdated
@agrawalravi90

This comment has been minimized.

Show comment
Hide comment
@agrawalravi90

agrawalravi90 Oct 2, 2018

Contributor

I've addressed your comments Bhroam, please provide further feedback. Thanks.

Contributor

agrawalravi90 commented Oct 2, 2018

I've addressed your comments Bhroam, please provide further feedback. Thanks.

Show outdated Hide outdated test/fw/bin/pbs_snapshot Outdated
Show outdated Hide outdated test/fw/ptl/utils/pbs_snaputils.py Outdated
@agrawalravi90

This comment has been minimized.

Show comment
Hide comment
@agrawalravi90

agrawalravi90 Oct 2, 2018

Contributor

Thanks for more comments Bhroam, I've addressed them, please let me know if things look okay now.

Contributor

agrawalravi90 commented Oct 2, 2018

Thanks for more comments Bhroam, I've addressed them, please let me know if things look okay now.

@bhroam

bhroam approved these changes Oct 2, 2018

Looks good. I sign off.

@agrawalravi90

This comment has been minimized.

Show comment
Hide comment
@agrawalravi90

agrawalravi90 Oct 2, 2018

Contributor

Thanks Bhroam! Requesting one more review

Contributor

agrawalravi90 commented Oct 2, 2018

Thanks Bhroam! Requesting one more review

@tpai26

This comment has been minimized.

Show comment
Hide comment
@tpai26

tpai26 Oct 3, 2018

Contributor

I have a additional test request, can you add a test to run pbs_snapshot without "--with-sudo" option and see that pbs_snapshot does not fail when it cannot capture something.

Contributor

tpai26 commented Oct 3, 2018

I have a additional test request, can you add a test to run pbs_snapshot without "--with-sudo" option and see that pbs_snapshot does not fail when it cannot capture something.

@arungrover

I have these minor comments

Show outdated Hide outdated test/fw/bin/pbs_snapshot Outdated
"--daemon-logs=" + str(daemon_logs),
"--accounting-logs=" + str(acct_logs)]
if anonymize:
cmd.extend(["--obfuscate", "--map=" + map_file])

This comment has been minimized.

@arungrover

arungrover Oct 4, 2018

Contributor

You should check for map_file to be not-none before using it

@arungrover

arungrover Oct 4, 2018

Contributor

You should check for map_file to be not-none before using it

This comment has been minimized.

@agrawalravi90

agrawalravi90 Oct 8, 2018

Contributor

We do a None check in main, and this function is not really supposed to be used generically outside of the current context. Would you still like me to do a None check?

@agrawalravi90

agrawalravi90 Oct 8, 2018

Contributor

We do a None check in main, and this function is not really supposed to be used generically outside of the current context. Would you still like me to do a None check?

This comment has been minimized.

@arungrover

arungrover Oct 11, 2018

Contributor

In that case, leave it as is.

@arungrover

arungrover Oct 11, 2018

Contributor

In that case, leave it as is.

timestamp = str(int(time.time()))
childsnap_threads = {}
for host in hostnames:
thread = Thread(target=childsnap_thread, args=(

This comment has been minimized.

@arungrover

arungrover Oct 4, 2018

Contributor

You you have to pass all the arguments other than host? wouldn't it just access these variables from the parent just like it does for timestamp?

@arungrover

arungrover Oct 4, 2018

Contributor

You you have to pass all the arguments other than host? wouldn't it just access these variables from the parent just like it does for timestamp?

This comment has been minimized.

@agrawalravi90

agrawalravi90 Oct 8, 2018

Contributor

that's interesting, I didn;t realize that i was not passing in timestamp, I need to test this out, thanks for pointing it out

@agrawalravi90

agrawalravi90 Oct 8, 2018

Contributor

that's interesting, I didn;t realize that i was not passing in timestamp, I need to test this out, thanks for pointing it out

This comment has been minimized.

@arungrover

arungrover Oct 11, 2018

Contributor

Just curious, what was your finding @agrawalravi90?

@arungrover

arungrover Oct 11, 2018

Contributor

Just curious, what was your finding @agrawalravi90?

This comment has been minimized.

@agrawalravi90

agrawalravi90 Oct 11, 2018

Contributor

My finding was that threading functions cannot access function local variables of the functions where they are invoked from, but they can access global variables, AND, and this is a lil weird, they can also access variables which are defined inside the if name == 'main' block. since all of the variables that i was passing the threading function were defined inside the if name == 'main' block, the thread could access them.

@agrawalravi90

agrawalravi90 Oct 11, 2018

Contributor

My finding was that threading functions cannot access function local variables of the functions where they are invoked from, but they can access global variables, AND, and this is a lil weird, they can also access variables which are defined inside the if name == 'main' block. since all of the variables that i was passing the threading function were defined inside the if name == 'main' block, the thread could access them.

@agrawalravi90

This comment has been minimized.

Show comment
Hide comment
@agrawalravi90

agrawalravi90 Oct 10, 2018

Contributor

@arungrover and @tpai26 I've addressed your comments, please let me know if things look ok. Thanks

Contributor

agrawalravi90 commented Oct 10, 2018

@arungrover and @tpai26 I've addressed your comments, please let me know if things look ok. Thanks

@bhroam

bhroam approved these changes Oct 10, 2018

Looks good.

"--daemon-logs=" + str(daemon_logs),
"--accounting-logs=" + str(acct_logs)]
if anonymize:
cmd.extend(["--obfuscate", "--map=" + map_file])

This comment has been minimized.

@arungrover

arungrover Oct 11, 2018

Contributor

In that case, leave it as is.

@arungrover

arungrover Oct 11, 2018

Contributor

In that case, leave it as is.

Show outdated Hide outdated test/fw/bin/pbs_snapshot Outdated
@@ -140,6 +127,7 @@ def childsnap_thread(logger, host, acct_logs, daemon_logs, map_file, anonymize,
# Create a directory on the remote host with a unique name
# We will create the snapshot here
timestamp = str(int(time.time()))

This comment has been minimized.

@arungrover

arungrover Oct 11, 2018

Contributor

This can potentially result into every thread getting a different timestamp. Is that ok?

@arungrover

arungrover Oct 11, 2018

Contributor

This can potentially result into every thread getting a different timestamp. Is that ok?

This comment has been minimized.

@agrawalravi90

agrawalravi90 Oct 11, 2018

Contributor

yes that's fine, the timestamp is only used to name the snapshots uniquely on the remote hosts to avoid name conflicts in case we capture another snapshot on that host in the future. Afterwards, the snapshots from the remote hosts are renamed as "_snapshot.tgz" when we copy them over to the main snapshot.

@agrawalravi90

agrawalravi90 Oct 11, 2018

Contributor

yes that's fine, the timestamp is only used to name the snapshots uniquely on the remote hosts to avoid name conflicts in case we capture another snapshot on that host in the future. Afterwards, the snapshots from the remote hosts are renamed as "_snapshot.tgz" when we copy them over to the main snapshot.

timestamp = str(int(time.time()))
childsnap_threads = {}
for host in hostnames:
thread = Thread(target=childsnap_thread, args=(

This comment has been minimized.

@arungrover

arungrover Oct 11, 2018

Contributor

Just curious, what was your finding @agrawalravi90?

@arungrover

arungrover Oct 11, 2018

Contributor

Just curious, what was your finding @agrawalravi90?

@agrawalravi90

This comment has been minimized.

Show comment
Hide comment
@agrawalravi90

agrawalravi90 Oct 11, 2018

Contributor

I've addressed your comments Arun, please let me know if things look okay now.

Contributor

agrawalravi90 commented Oct 11, 2018

I've addressed your comments Arun, please let me know if things look okay now.

@arungrover

Change looks good Ravi

@agrawalravi90

This comment has been minimized.

Show comment
Hide comment
@agrawalravi90

agrawalravi90 Oct 11, 2018

Contributor

Thanks for your sign-off Arun!
@bhroam can you please review this again once? Thanks

Contributor

agrawalravi90 commented Oct 11, 2018

Thanks for your sign-off Arun!
@bhroam can you please review this again once? Thanks

@bhroam

This comment has been minimized.

Show comment
Hide comment
@bhroam

bhroam Oct 12, 2018

Contributor

I dislike using a try except to grab something out of a dictionary, but I don't dislike it enough to not sign off. I find the format you used before more intuitive. I sign off. Squash and rebase for the merge.

Contributor

bhroam commented Oct 12, 2018

I dislike using a try except to grab something out of a dictionary, but I don't dislike it enough to not sign off. I find the format you used before more intuitive. I sign off. Squash and rebase for the merge.

@agrawalravi90

This comment has been minimized.

Show comment
Hide comment
@agrawalravi90

agrawalravi90 Oct 12, 2018

Contributor

Thanks Bhroam, I'll take it, I'll rebase now.

Contributor

agrawalravi90 commented Oct 12, 2018

Thanks Bhroam, I'll take it, I'll rebase now.

@mike0042 mike0042 merged commit 48a8276 into PBSPro:master Oct 12, 2018

4 checks passed

Codacy/PR Quality Review Up to standards. A positive pull request.
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
license/cla Contributor License Agreement is signed.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment