-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. Weβll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[git-webkit] Add pre-push hook to prevent publication of security sensative commits #11043
Conversation
5dfede3
to
86d00a9
Compare
EWS run on previous version of this PR (hash 5dfede3) |
EWS run on previous version of this PR (hash 86d00a9) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had questions/comments while reviewing the code, but I don't feel qualified to approve or deny this (as I haven't reviewed related code in the past), so none of my comments should block landing the patch.
In general, I was looking for:
- Ways to improve readability of the code.
- Places where additional sanity checks may be required in the code.
- Places where mistakes could be made or could occur, such as the order of remotes defining security levels, or copying and pasting the same text repeatedly to approve an action.
Also, I may not understand the threat model here, so (for example) the order of remotes matching the security level may be okay. (I was worried about a situation where there was a misconfigured setup, or a setup that had been locally edited for some reason without enforcingβor knowledge ofβthese rules.)
Tools/Scripts/hooks/pre-push
Outdated
SCRIPTS = os.path.dirname(os.path.dirname(LOCATION)) | ||
sys.path.append(SCRIPTS) | ||
|
||
REMOTE_RE = re.compile(r'(?P<protcol>[^@:]+)(@|://)(?P<host>[^:]+)(/|:)(?P<path>\S+)(\.git)?/?') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use urllib.parse
or similar to parse repository URLs instead of a regex? This regex is awfully complex.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to rely on anything outside of the Python standard library here, and urllib.parse
doesn't handle something like git@github.com:WebKit/WebKit.git
or ssh://github.com:WebKit/WebKit.git
, so while we could rely on both urllib.parse
and a regex for git@
and ssh://
, but the reduced regex which can handle git@
and ssh://
can nearly handle urllib.parse
's case too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could do something like:
url = 'git@github.com:WebKit/WebKit.git'
scheme = urllib.parse.urlsplit(url).scheme
if not scheme:
url = 'ssh://' + url.replace(':', '/', 1)
if scheme == 'ssh':
url = url.replace(':', '/', 1)
result = urllib.parse.urlsplit(url)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could do something like this (in a function, presumably), I think the question is: is this any less confusing than a regex?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I think this is easier to follow, and much easier to reason about the correctness of. Of course, it could be made easier to follow with comments, too. But I'm biased because I wrote it :)
Up to you to decide which is better for maintaining.
if not REPOSITORY: | ||
print_error('Checkout too out-of-date to check bug status of novel commits') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the if not REPOSITORY:
block need a sys.exit(1)
call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, if we invoked sys.exit
here, we would prevent users from pushing anywhere if their checkout was too old. Given that your checkout being too old only breaks 1 of our 3 checks (and the other two checks are the ones most relevant to an old checkout anyways), we want to keep this hook useful even in old checkouts
@@ -37,6 +38,8 @@ class Setup(Command): | |||
name = 'setup' | |||
help = 'Configure local settings for the current repository' | |||
|
|||
REMOTE_RE = re.compile(r'(?P<protcol>[^@:]+)(@|://)(?P<host>[^:]+)(/|:)(?P<path>\S+)(\.git)?/?') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As in a previous comment, it would be nice if we could use urllib.parse
or similar to parse git repo URLs rather than relying on a complex regex.
result['fork'] = level | ||
else: | ||
result['{}-fork'.format(key.split('.')[-1])] = level | ||
level += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the security level really just based on the order in which the remotes are listed in a .git/config
file, or is there some magic that happens with git config --get-regexp webkitscmpy.remotes
to ensure the correct order each time it is run?
Maybe I'm not understanding the threat model, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These aren't listed in .git/config
, they come from metadata/git_config_extension
. It is the order in this file (which is checked in) that matters. The nice thing about this approach is that it can allow other workflows to define additional remotes in .git/config
and still have this logic work, but we can discuss alternatives here (like perhaps defining security levels in metadata/git_config_extension
) if this implicit order concept isn't making sense to folks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know there is precedent for this pattern in the code, but I am scared at how magical the ordering is. It seems like any accidental reordering could lead to a leak of security content.
Would you mind putting a comment in metadata/git_config_extension
explaining the significance of the ordering?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emw-apple and I discussed this offline, instead of relying on implicit ordering here, we're going to do #11251, which makes the security level an explicit part of the configuration. Once that PR lands, I'll adopt it here.
with open(os.path.join(self.path, '.git', 'config'), 'a') as f: | ||
f.write('[webkitscmpy "remotes"]\n') | ||
f.write(' origin = git@example.org:project/project\n') | ||
f.write(' security = git@example.org:project/project-security\n') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Per my previous comment, what happens if you reverse origin
and security
? Will security
be at a lower security level than origin
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it would. These are defined in metadata/git_config_extension
, the unit tests using a definition in .git/config
because our mocking isn't smart enough to follow this the way real git
is:
[include]
path = ../metadata/git_config_extension
86d00a9
to
2e1fb34
Compare
EWS run on previous version of this PR (hash 2e1fb34) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still working my way through this, but leaving some comments to start with.
Tools/Scripts/hooks/pre-push
Outdated
print('We have detected you are about to publish security content to a public remote') | ||
print('Unless you are engaged in merge-back or fixing a security bug which has not shipped, you are likely making a mistake') | ||
print('') | ||
sys.stdout.write('Type "MAKE PUBLIC" to continue with publication: ') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the thing GitHub does for risky actions, where you have to type the type out the branch name you've selected, to mentally confirm that you're operating on the right branch. Typing "Make safari-XXXX.X-branch public" or similar seems like it would work well here.
Tools/Scripts/hooks/pre-push
Outdated
SCRIPTS = os.path.dirname(os.path.dirname(LOCATION)) | ||
sys.path.append(SCRIPTS) | ||
|
||
REMOTE_RE = re.compile(r'(?P<protcol>[^@:]+)(@|://)(?P<host>[^:]+)(/|:)(?P<path>\S+)(\.git)?/?') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you could do something like:
url = 'git@github.com:WebKit/WebKit.git'
scheme = urllib.parse.urlsplit(url).scheme
if not scheme:
url = 'ssh://' + url.replace(':', '/', 1)
if scheme == 'ssh':
url = url.replace(':', '/', 1)
result = urllib.parse.urlsplit(url)
2e1fb34
to
424bb7b
Compare
EWS run on previous version of this PR (hash 424bb7b) |
424bb7b
to
62c30ea
Compare
EWS run on previous version of this PR (hash 62c30ea) |
result['fork'] = level | ||
else: | ||
result['{}-fork'.format(key.split('.')[-1])] = level | ||
level += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know there is precedent for this pattern in the code, but I am scared at how magical the ordering is. It seems like any accidental reordering could lead to a leak of security content.
Would you mind putting a comment in metadata/git_config_extension
explaining the significance of the ordering?
if match: | ||
result['{}:{}'.format(match.group('host'), match.group('path'))] = level | ||
if username: | ||
team, repository = match.group('path').split('/', 1) | ||
if team == repository: | ||
forked_name = repository | ||
elif '-' in repository: | ||
forked_name = repository | ||
else: | ||
forked_name = '{}-{}'.format(repository, team) | ||
result['{}:{}/{}'.format(match.group('host'), username, forked_name)] = level | ||
|
||
result[str(key.split('.')[-1])] = level | ||
if username: | ||
if key.split('.')[-1] == 'origin': | ||
result['fork'] = level | ||
else: | ||
result['{}-fork'.format(key.split('.')[-1])] = level |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, this is assumes the naming convention of forks that git-webkit sets up. And if a user has renamed their fork in GitHub or changed their remote name locally, it'll be caught by pre-push as having "no" security level.
Not leaking security content when someone renames their fork is obviously good, but I don't like putting more naming requirements on things which belong an engineer (namely, their repositories and local configuration). For example, I've mentioned to you before that I call my forks emw
, because emw-apple
is superfluous. It seems like I wouldn't be able to push security commits to a similarly-named private remote.
If we changed this, what would you think about:
- using the GitHub API to check whether a repo is private or not
- determining security level by identifying a known branch or tag on the remote
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@emw-apple discussed this offline, and checking using GitHub's API is a good idea, but also an expensive thing to do every push. git-webkit setup
owning the API calls, though, makes a lot of sense. So that's what the most recent iteration of this PR does. That lets us rip out all logic that treats different remote names differently, everything is based on remote URL.
62c30ea
to
e2536e3
Compare
EWS run on previous version of this PR (hash e2536e3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r+ with the pending adoption of #11251
Tools/Scripts/hooks/pre-push
Outdated
# Any future commits (being ancesstors of this one) will nessesarily be on the same remote this commit is | ||
if remotes_with[commit]: | ||
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix typo, plus, isn't this the inverse of the above statement?
# Any future commits (being ancesstors of this one) will nessesarily be on the same remote this commit is | |
if remotes_with[commit]: | |
break | |
else: | |
# Any future commits (being ancestors of this one) will nessesarily be on the same remote this commit is | |
break |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not quite, I made this change, then realized it was a performance regression. We want to early exit the loop the moment we encounter a commit which has remotes, even if the commit in question isn't in the dictionary yet.
Tools/Scripts/hooks/pre-push
Outdated
) | ||
print('') | ||
print('We have detected you are about to publish security content to a public remote') | ||
print('Unless you are engaged in merge-back or fixing a security bug which has not shipped, you are likely making a mistake') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re "which has not shipped": How do contributors know this? If there's a silent "in Safari" in this sentence, then Apple engineers can determine whether a security bug has escaped, but I'm not aware of how to make that call more generally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the short answer here is that a contributor which does not understand how various WebKit ports ship code will be unable to answer this question. I think that's fine, such a contributor would then ask for help (which they would need to post a security PR anyways, since the repository where one does that is not publicly accessible).
The longer answer is that if you are fixing a security bug that has just been introduced in the last week or two, it's a good bet it hasn't shipped yet. Past that, answering the question "has this bug shipped" is a hard question to answer that requires coordination between various WebKit ports, and is not likely to be answerable with public information. While I have Safari in mind when writing this comment, making the statement general was deliberate, it applies to other shipping ports as well.
Tools/Scripts/hooks/pre-push
Outdated
branches = sorted(list(set([branch_for(commit) for commit in prompt_for]))) | ||
expected_response = 'publicize {}'.format( | ||
branches[0] if len(branches) == 1 | ||
else '{} and {}'.format(', '.join(branches[:-1]), branches[-1]), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: oxford comma
else '{} and {}'.format(', '.join(branches[:-1]), branches[-1]), | |
else '{}, and {}'.format(', '.join(branches[:-1]), branches[-1]), |
Tools/Scripts/hooks/pre-push
Outdated
return 1 | ||
|
||
if prompt_for: | ||
branches = sorted(list(set([branch_for(commit) for commit in prompt_for]))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These look like unnecessary intermediate lists:
branches = sorted(list(set([branch_for(commit) for commit in prompt_for]))) | |
branches = sorted({branch_for(commit) for commit in prompt_for]}) |
Tools/Scripts/hooks/pre-push
Outdated
sys.stdout.write('Type "{}" to continue with publication: '.format(expected_response)) | ||
sys.stdout.flush() | ||
response = TTY.readline().rstrip() | ||
if response != expected_response: | ||
print_error('User failed to explicitly approve potentially dangerous push') | ||
return 1 | ||
print("User has explicitly approved potentially dangerous push, continuing...") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few small ideas here:
- Can we write to
TTY
, in case stdout is redirected somewhere weird? - To help users detect typos, we could confirm when a response is recognized but does not match.
sys.stdout.write('Type "{}" to continue with publication: '.format(expected_response)) | |
sys.stdout.flush() | |
response = TTY.readline().rstrip() | |
if response != expected_response: | |
print_error('User failed to explicitly approve potentially dangerous push') | |
return 1 | |
print("User has explicitly approved potentially dangerous push, continuing...") | |
TTY.write('Type "{}" to continue with publication: '.format(expected_response)) | |
TTY.flush() | |
response = TTY.readline().rstrip() | |
if response != expected_response: | |
if response: | |
print("Response does not match prompt") | |
print_error('User failed to explicitly approve potentially dangerous push') | |
return 1 | |
print("User has explicitly approved potentially dangerous push, continuing...") |
Tools/Scripts/hooks/pre-push
Outdated
SCRIPTS = os.path.dirname(os.path.dirname(LOCATION)) | ||
sys.path.append(SCRIPTS) | ||
|
||
REMOTE_RE = re.compile(r'(?P<protcol>[^@:]+)(@|://)(?P<host>[^:]+)(/|:)(?P<path>\S+)(\.git)?/?') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I think this is easier to follow, and much easier to reason about the correctness of. Of course, it could be made easier to follow with comments, too. But I'm biased because I wrote it :)
Up to you to decide which is better for maintaining.
e2536e3
to
3321a75
Compare
β¦sative commits https://bugs.webkit.org/show_bug.cgi?id=253354 rdar://106216593 Reviewed by Elliott Williams. Write a pre-push hook to block or prompt the user in 3 situations to prevent the inadvertent publication of security sensative commits: - Class 1: A commit exists on a remote more secure than the one a contributor is pushing to - Class 2: A commit is a cherry-pick of a commit from a more secure remote - Class 3: The commit references a security bug the target remote is public The goal of this hook is to prevent class 1 and 2 without relying on code in the checkout, while class 3 relies on webkitbugspy to determine if a linked issue is redacted. * Tools/Scripts/hooks/pre-push: Added. * Tools/Scripts/libraries/webkitscmpy/setup.py: Bump version. * Tools/Scripts/libraries/webkitscmpy/webkitscmpy/__init__.py: Ditto. * Tools/Scripts/libraries/webkitscmpy/webkitscmpy/mocks/local/git.py: (Git): Add `git config --get-regexp` * Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/clean.py: (Clean.cleanup): Forward verbosity into `git push`. (Clean.main): Ditto. (DeletePRBranches.main): Ditto. * Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/land.py: (Land.main): Forward verbosity into `git push`. * Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/publish.py: (Publish.main): Change operating mode of our pre-push hook to allow class-1 security publication with a prompt. * Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/pull_request.py: (PullRequest.create_pull_request): Forward verbosity into `git push`. * Tools/Scripts/libraries/webkitscmpy/webkitscmpy/program/setup.py: (Setup._security_levels): Provide a security level for source and fork remotes based on the order of our source remotes. (Setup.git): Pass arguments to template for our pre-push hook. * Tools/Scripts/libraries/webkitscmpy/webkitscmpy/test/git_unittest.py: * Tools/Scripts/libraries/webkitscmpy/webkitscmpy/test/setup_unittest.py: Canonical link: https://commits.webkit.org/261526@main
3321a75
to
604395a
Compare
Committed 261526@main (604395a): https://commits.webkit.org/261526@main Reviewed commits have been landed. Closing PR #11043 and removing active labels. |
EWS run on current version of this PR (hash 3321a75) |
604395a
3321a75
π iosπ macπ wincairoπ mac-AS-debugπ§ͺ wpe-wk2π§ͺ ios-wk2π§ͺ api-macπ§ͺ api-iosπ§ͺ mac-wk1π§ͺ gtk-wk2π tvπ§ͺ mac-wk2π§ͺ api-gtkπ tv-simπ§ͺ mac-AS-debug-wk2π§ͺ servicesπ watchπ§ͺ mac-wk2-stressπ watch-sim