Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a (b)ack option to 'Is this a valid secret?' Closes Issue #63 #72

Merged
merged 8 commits into from
Sep 11, 2018

Conversation

cleborys
Copy link

@cleborys cleborys commented Aug 30, 2018

Add going backwards option to 'Is this a valid secret?', closing Issue #63

This would be my first contribution to open source ever - thank you for the "good first issue" flag!

I have not yet figured out how to run the tests properly, else I would try to be test-driven.

The two directions I could see this go in are currently (preferences welcome):

  1. (the "obvious" one) Make _secret_generator in core/audit.py a list instead of a generator and handle indices in audit_baseline
  2. Cast _secret_generator to a "bidirectional iterator" object (still goes through list). That adds some overhead, but keeps the for ... in _secret_generator of audit_baseline and might make audit.py more readable.

.gitignore Outdated
@@ -1,6 +1,7 @@
*.egg-info
*.py[co]
*.sw[op]
.secrets.baseline
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.secrets.baseline felt like it should only be in a local copy. I might be mistaken.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, that is confusing, sorry about that, it is purposefully meant to be included in the Git repo though.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I re-added it :)

@KevinHock KevinHock self-requested a review September 3, 2018 00:41
@KevinHock
Copy link
Collaborator

This would be my first contribution to open source ever

That's really awesome :D 🎈 🍰 🎉

I'll get back to you on Tuesday or Wednesday with possible implementation preferences.

cleborys added 3 commits September 3, 2018 15:40
next call of `__next__` would look like. Does not work properly yet (can
only step back if last choice was `s` and counter at top does not
decrease properly.
@domanchi
Copy link
Contributor

domanchi commented Sep 4, 2018

Bidirectional iterator looks good! Looking forward to tests!

Copy link
Collaborator

@KevinHock KevinHock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great so far 👍

detect_secrets/core/bidirectional_iterator.py Outdated Show resolved Hide resolved
raise StopIteration
return result

def next(self):
Copy link
Collaborator

@KevinHock KevinHock Sep 4, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Looping should call the __next__ method directly, so no need for a next method 👍

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added next to be python2 compatible (it was renamed form next to __next__ from python2 to 3).
However, this does not feel very clean - perhaps you know of a better way? :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha! Very good point, I only tested on Python 3. My bad. I am impressed by how thorough you are 👍

@cleborys
Copy link
Author

cleborys commented Sep 7, 2018

Is there an assumption that secrets in the results of the baseline are in a fixed order? (I mean the lists baseline['results'][filename])
If not I would re-write merge_results in baseline.py to remove this assumption so that it works with updates to the baseline that are out-of-order.
That would simplify some choices in audit_baseline of audit.py a lot :)

@domanchi
Copy link
Contributor

domanchi commented Sep 7, 2018

@cleborys: currently, no there isn't. It's a Python dictionary for O(1) access by filename (and because the JSON dump is human readable), but that means key iteration cannot be depended on for order.

It's a good point though -- and would simplify testing and other logic. Maybe you would like to work on it as a next PR? =D If we sort the keys before iterating through it, that should fix stuff.

@cleborys
Copy link
Author

cleborys commented Sep 7, 2018

@domanchi Sorry, I am a bit confused 😅
I meant the list that is stored under the filename key and contains the secrets in that file. The merge_results function assumes that updates to that list are in order so that it is easier to update, but that now means I have some problems when stepping back on choices in the audit:

Currently the audit iterates over all secrets and automatically skips these that already had the is_secret flag set. But that means that when I skip back one secret after I previously chose y or n, it will be automatically skipped again, hence the back option currently only works correctly when the last option chosen was skip.
Instead I would like to change the loop to only iterate over secrets without the is_secret flag set and then append all previously known secrets that had there flag set already. But that would shuffle the list of secrets (the lists that are stored by filename) and I'm not sure if it is ok to shuffle this (after removing the "order" assumption of merge_results.

I am not completely confident that that would work as stated, because simply removing the "has is_secret-flag already set, then skip"-condition seems to produce lots of identical secrets to be shown, so there seems to be an extra effect that I don#t understand yet 😉

Copy link
Collaborator

@KevinHock KevinHock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚢 , thanks a bunch for making this 😁

@cleborys
Copy link
Author

cleborys commented Sep 7, 2018

@KevinHock Thanks :) But it is not yet functional! Only the framework with the new iterator works yet, so don't merge 😉

(I don't really know the etiquette yet - I made a pull request already so that you can see whats happening, but I wouldn't remove the [WIP] tag in the title until it is functional and I think I'm done. So I hope your approval doesn't mean that you intend to merge)

@domanchi
Copy link
Contributor

domanchi commented Sep 7, 2018

@cleborys, ah, I understand you better now.

Yes, the secrets in the list are ordered. One benefit of this is comparing side by side diffs of baselines - and it's very easy to see the changes between iterations. You can quite easily see whether a secret has been removed, added, or appropriately labelled. By reordering the contents, it becomes that much harder to see differences, if applicable.

For example, if you start the audit process with pre-existing audited secrets, but not perform any additional labelling, you would expect no changes to the baseline.

If the goal is to allow the back option to iterate over previously labelled secrets, perhaps you can just append that to the if statement?

e.g. something like

if 'is_secret' not in secret or decision == 'b':
    blah

@cleborys
Copy link
Author

cleborys commented Sep 7, 2018

@domanchi Cool, I will not mess around with the order then 😅
Yes, I will try something like this. it is not completely straightforward, because you don't want to have the user step back into secrets that were previously autoskipped and that he is not aware of.
I think I will, before the decision loop runs, split the list into secrets shown to the user and secrets not shown, then iterate over the secrets to be shown, and then interlace the results in the correct order, afterwards :)

…d merged afterwards. Stepping back now functional.
@cleborys
Copy link
Author

cleborys commented Sep 7, 2018

It seems I had forgotten how Python works ("everything is a pointer") and in the end the necessary changes were much easier.

The user decision loop now only loops over secrets which don't have the is_secret flag set and decisions modify them in-place. So the order is kept and there is no need to backtrack changes when they are overridden. The results dict by filename is constructed after the user is done and then merged.

I expect that everything works as it should now, but I'll add some tests to make sure 😄

@cleborys
Copy link
Author

cleborys commented Sep 7, 2018

Tests written and passing, manual sanity checks also successful! 🎉 🎈
Thanks, guys! Ready for final review.

@cleborys cleborys changed the title [WIP] Add a (b)ack option to 'Is this a valid secret?' Closes Issue #63 Add a (b)ack option to 'Is this a valid secret?' Closes Issue #63 Sep 7, 2018
Copy link
Collaborator

@KevinHock KevinHock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, gonna merge :D

@KevinHock KevinHock merged commit f7d05ac into Yelp:master Sep 11, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants