Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pbs_snapshot --obfuscate doesn't obfuscate everything #1096

Merged
merged 6 commits into from May 10, 2019

Conversation

@agrawalravi90
Copy link
Contributor

commented Apr 16, 2019

Describe Bug or Feature

pbs_snapshot --obfuscate has many issues: it doesn’t obfuscate everything, there are bugs with obfuscating special attributes (like it only obfuscates the first entry in managers, acl attributes etc.), the obfuscated string has the same length as the original string, which could be decrypted back to the original, etc.

Describe Your Change

Suggesting a redesign of pbs_snapshot --obfuscate. Please refer to the linked design for more details.

Link to Design Doc

https://pbspro.atlassian.net/wiki/spaces/PD/pages/1233125400/Redesigning+pbs+snapshot+--obfuscate

Attach Test Logs or Output

Ran pbs_snapshot.unittest.py
Single node test results:
snapshot_test.log

Will post multi-node results soon.

@agrawalravi90 agrawalravi90 changed the title Snapshot --obfuscate doesn't obfuscate everything pbs_snapshot --obfuscate doesn't obfuscate everything Apr 16, 2019

@agrawalravi90

This comment has been minimized.

Copy link
Contributor Author

commented Apr 18, 2019

FYI, I won't fix the rest of Codacy's warnings as they are not relevant.

@agrawalravi90

This comment has been minimized.

Copy link
Contributor Author

commented Apr 22, 2019

@arungrover can you please review this PR?

@arungrover
Copy link
Contributor

left a comment

I do not have any comment on the code, just a suggestion to add job comment to obfuscation list

ATTR_submit_arguments, ATTR_o, ATTR_S]
resv_attrs_del = [ATTR_v]
svr_attrs_del = [ATTR_mailfrom]
job_attrs_obf = [ATTR_euser, ATTR_egroup, ATTR_project, ATTR_A,

This comment has been minimized.

Copy link
@arungrover

arungrover Apr 24, 2019

Contributor

Do we need job comment? sometimes job comment can also have user name

This comment has been minimized.

Copy link
@agrawalravi90

agrawalravi90 Apr 24, 2019

Author Contributor

We do a sed at the end to replace all instances of usernames etc. from all files, so comment will get obfuscated that way.
The idea is: Find all sensitive users, groups, hostnames etc., create obfuscated values for them and then use that to obfuscate all files in the snapshot. So, job_attrs_obf only needs to contain the attributes which can add to the list of users, groups etc. The usernames that the Comment attribute will contain should already be covered by the other attributes in that list. Does that make sense?

@arungrover
Copy link
Contributor

left a comment

Thanks for explanation @agrawalravi90
I sign-off

@agrawalravi90

This comment has been minimized.

Copy link
Contributor Author

commented Apr 25, 2019

Thanks @arungrover !

"""
Helper method to capture snapshot of the local host
:returns Name of the snapshot directory/tar file captured

This comment has been minimized.

Copy link
@bhroam

bhroam May 1, 2019

Contributor

Shouldn't there be a : on the other side of returns?

This comment has been minimized.

Copy link
@agrawalravi90

agrawalravi90 May 3, 2019

Author Contributor

I'm not sure what the Doxygen convention is. @hirenvadalia or @borlesanket do you guys know?

if attrname == attr:
delete_line = True
val_del = attrval
break

This comment has been minimized.

Copy link
@bhroam

bhroam May 1, 2019

Contributor

Do you need to make this a for loop? Can't you just check if attrname in attrs_to_del? (same as below)

try:
rscs_name = line.split(" ", 1)[0]
except IndexError:
continue

This comment has been minimized.

Copy link
@bhroam

bhroam May 1, 2019

Contributor

I'm not sure it is possible for this to happen. You're doing a split(" ", 1). This means you'll get a list with either 1 or 2 elements in it. If you don't pass a string, you get an AttributeError.

This comment has been minimized.

Copy link
@agrawalravi90

agrawalravi90 May 3, 2019

Author Contributor

Ya, I'm not sure why i did that, I'll try to remove it and see if everything still works fine.

sched_logs = []
for dirname in os.listdir(snap_dir):
if str(dirname).startswith(DFLT_SCHED_LOGS_PATH):
dirpath = os.path.join(snap_dir, str(dirname))

This comment has been minimized.

Copy link
@bhroam

bhroam May 1, 2019

Contributor

I believe listdir() returns a list of strings. I don't think you need to str() them.


# Obfuscate values from val_obf_map
for key, val in self.val_obf_map.iteritems():
sedcmd = r"%s -i 's/\b%s\b/%s/g' %s" % (sed_path, key,

This comment has been minimized.

Copy link
@bhroam

bhroam May 1, 2019

Contributor

Why run sed when we can do this directly in python with re.sub()

This comment has been minimized.

Copy link
@agrawalravi90

agrawalravi90 May 3, 2019

Author Contributor

I think I had issues with word boundaries when trying out re.sub(), sed's \b handles things like "key", "key=value", "key: value", "key1, key2, key3", "key1 key2 key3" etc., I don't remember exactly but I was having trouble doing the same with re.sub(). I'm not very good with regex, so please do let me know if you can think of a regex that will work here.

This comment has been minimized.

Copy link
@agrawalravi90

agrawalravi90 May 3, 2019

Author Contributor

One more thing: sed can also obfuscate binary files:

[root@pbspro jobs]# grep "pbspro" 2017.pbspro.JB
Binary file 2017.pbspro.JB matches
[root@pbspro jobs]# sed -i 's/pbspro/blah/g' 2017.pbspro.JB 
[root@pbspro jobs]# grep "pbspro" 2017.pbspro.JB
[root@pbspro jobs]# /opt/pbs/bin/printjob 2017.pbspro.JB
---------------------------------------------------
jobid:	2017.blah
---------------------------------------------------
state:		0x4
substate:	0x2a (42)
svrflgs:	0x1 (1)
ordering:	0
inter prior:	0
stime:		1556891829
file base:	17.blah
queue:		
--bad union type -1408172029
--attributes--
short read of attribute

This comment has been minimized.

Copy link
@bhroam

bhroam May 6, 2019

Contributor

I'm not so sure you want to run sed on a binary file. According to this stack overflow article I just read it is meant for text. It might not work properly on binary files even if there is valid text within the files. You don't know what other non-ascii characters are in there. Even if we do go the sed route, I wouldn't run it on the .JB files. I'd run printjob on each .JB file and obfuscate that output. I still think we should use re.sub(). I just checked and python regex has \b.

This comment has been minimized.

Copy link
@agrawalravi90

agrawalravi90 May 7, 2019

Author Contributor

I had tried \b with Python regex but wasn't happy with it, but I tried it again and it seems to be working fine, so I'll go ahead with it. And I'll take your and @scc138 's advice on how to deal with .JB files, I will take printjob outputs for them, and delete the binaries.

operator = str(OPER_USER) + '@*'
self.server.manager(MGR_CMD_SET, SERVER,
{ATTR_operators: (INCR, operator)},
sudo=True)

This comment has been minimized.

Copy link
@bhroam

bhroam May 1, 2019

Contributor

You don't need to use sudo. PTL makes the user running the tests a manager.


# Create a queue with acls set
a = {ATTR_qtype: 'execution', ATTR_start: 't', ATTR_enable: 't',
ATTR_aclgren: 't', ATTR_aclgroup: TSTGRP0,

This comment has been minimized.

Copy link
@bhroam

bhroam May 1, 2019

Contributor

Spell out 'True'.

# Create a queue with acls set
a = {ATTR_qtype: 'execution', ATTR_start: 't', ATTR_enable: 't',
ATTR_aclgren: 't', ATTR_aclgroup: TSTGRP0,
ATTR_acluser: TEST_USER}

This comment has been minimized.

Copy link
@bhroam

bhroam May 1, 2019

Contributor

I think you need to str() TSTGRP0 and TEST_USER

This comment has been minimized.

Copy link
@agrawalravi90

agrawalravi90 May 3, 2019

Author Contributor

manager() does it already, that's why some tests pass in int values when setting ncpus on vnodes.

@agrawalravi90

This comment has been minimized.

Copy link
Contributor Author

commented May 3, 2019

@bhroam I've addressed your comments, please provide further feedback.
@arungrover request you to look at the new changes as well. Thanks!

@agrawalravi90

This comment has been minimized.

Copy link
Contributor Author

commented May 7, 2019

@bhroam I've addressed your comments, please let me know what you think.
@arungrover request you to review the latest commit as well, I changed the approach from using sed to re.sub()

@bhroam
Copy link
Contributor

left a comment

I'm just confused on one point.

@@ -239,7 +241,7 @@ class ObfuscateSnapshot(object):
ATTR_rescavail + ".vnode"]
sched_attrs_obf = [ATTR_SchedHost]
queue_attrs_obf = [ATTR_acluser, ATTR_aclgroup, ATTR_aclhost]
skip_vals = ["_pbs_project_default"]
skip_vals = ["_pbs_project_default", "*"]

This comment has been minimized.

Copy link
@bhroam

bhroam May 9, 2019

Contributor

I don't understand the '*'. That's a special regular expression character that can't be by itself.

This comment has been minimized.

Copy link
@agrawalravi90

agrawalravi90 May 9, 2019

Author Contributor

I had to add this to skip over obfuscating the "*" from values like "pbsuser@*" for 'managers' attribute

This comment has been minimized.

Copy link
@bhroam

bhroam May 9, 2019

Contributor

Ahh, I understand now. That's a literal '*'.

@bhroam

bhroam approved these changes May 9, 2019

@bhroam

This comment has been minimized.

Copy link
Contributor

commented May 9, 2019

Travis is failing due to pep8 issues. Fix that and we should be good.

@agrawalravi90

This comment has been minimized.

Copy link
Contributor Author

commented May 10, 2019

@bhroam thanks, I've fixed pep8 issues, please merge this if all looks ok.

@arungrover
Copy link
Contributor

left a comment

I apologize for the delay in getting back to this review. I like the change

@agrawalravi90

This comment has been minimized.

Copy link
Contributor Author

commented May 10, 2019

Thanks @arungrover , can you please merge it?

@arungrover

This comment has been minimized.

Copy link
Contributor

commented May 10, 2019

I can merge it, I thought @bhroam might want to have a look again after you resolved pep8 problems.

@bhroam

This comment has been minimized.

Copy link
Contributor

commented May 10, 2019

I'm good with the changes.

@arungrover arungrover merged commit d6ffc90 into PBSPro:master May 10, 2019

3 of 4 checks passed

Codacy/PR Quality Review Not up to standards. This pull request quality could be better.
Details
Travis CI - Pull Request Build Passed
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
license/cla Contributor License Agreement is signed.
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.