Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Add action to stop passwords leaking #28

Closed
shochdoerfer opened this issue Apr 9, 2019 · 13 comments
Closed

[Feature] Add action to stop passwords leaking #28

shochdoerfer opened this issue Apr 9, 2019 · 13 comments
Assignees

Comments

@shochdoerfer
Copy link
Contributor

Via this blog post I came across this paper which contains a list of regex expressions to characterize potential passwords.

It would be cool to have an action which could check the modified files for potential passwords and report them. Not sure though how accurate the detection is since it's based on regex patterns. Maybe it's needed to define a whitelist of files that are "ok" to commit.

Not sure how many actions you want to provide out-of-the-box, maybe it makes sense to release this as a separate package.

@sebastianfeldmann
Copy link
Collaborator

Sounds great.
I'm alway open to add more hooks to CaptainHook directly as long as it doesn't result in adding a lot of dependencies to the composer.json.
For features that depend heavily on external libraries I would prefer an extra package.

@sebastianfeldmann sebastianfeldmann added this to the v>yx- milestone Apr 11, 2019
@icanhazstring
Copy link
Contributor

With the GitLab security issue and this tweet https://twitter.com/movetodevnull/status/1124924936030699520 I though about the same.

Just to see if the "conditions" fit to add the action directly into captainhook or to create a separate repo.

as @shochdoerfer pointed out, we would need to following things:

  • whitelist -- possibly as options for the action it self, so we have the in captainhook.json
  • tokens -- list of keywords to search for, so you can add the into your library and share it with the team. Also as option for the action itself

Where to look for critical values?

Now the question of "how" to define if there is something committed that shouldn't be. There are many many possibilities which I guess can be quite "overwhelming".

Just to list the formats:

  • .yml
  • .json
  • .php
  • .env
  • .ini
  • others?

Which all have their respective format of defining values.
Should we parse these files in there respective language? Like using php-parser, yaml reader and so on?

How to evaluate a token as critical

The problem is how to define a critical token. Purely by its present? Or do we need to validate the value as well?

This should be a critical:

// credential.config.production.php
return [
    'password' => '123bvazusdv2'
];

But this?

// credential.config.global.php
return [
    'password' => ''
];

As for me, the "easiest" and "dirtiest" way for now is simply use the whitelist and tokens to search by regex. But that would mean you have to commit using -n which disables everything else or you have to add "problems" into the whitelist as well.

@shochdoerfer
Copy link
Contributor Author

Where to look for critical values? I'd say basically any non binary file, not sure though if we can figure that out in PHP.

How to evaluate a token as critical? whitelist + tokens sounds good to me. As an alternative we could give the diffrent regexes names and allow the user to configure an ignore-list of those tokens. Depends a bit if we want to provide a fixed set of regex'es or let the user define them on his own.

@icanhazstring
Copy link
Contributor

We could provide a list of predefined regexes (is that the plural? 😅). But I think I would make them opt-in.

So for example

{
    "regex": [
        "CaptainHook\\...\\RegexA", 
        "CaptainHook\\...\\RegexB" 
    ] 
} 

This way we could provide an interface that people can implement to provide even more custom tests that will trigger a critical.

@sebastianfeldmann sebastianfeldmann removed this from the v>yx- milestone Aug 2, 2019
@sebastianfeldmann sebastianfeldmann self-assigned this Nov 27, 2023
@sebastianfeldmann
Copy link
Collaborator

It just took 4 years but version 5.19.0 will feature a BlockSecrets Action.

I finally decided to go with a simple regex approach instead of implementing some fancy entropy calculation that kind of works but takes for ages to run.

Currently it has some basic regex pattern build in to detect AWS keys, Google keys, GitHub tokens, Stripe keys and general passwords.

If you have very specific keys, passwords or tokens in your project you can add your own patterns.
Or implement your own provider by implementing the Regex::patterns():array<string> interface.
You can add dummy or dev passwords to an allow list.

{
  "action": "\CaptainHook\App\Hook\File\Action\BlockSecrets",
  "options": {
    "providers":  [
      "\\CaptainHook\\App\\Hook\\File\\Regex\\Aws",
      "\\CaptainHook\\App\\Hook\\File\\Regex\\Github",
      "\\CaptainHook\\App\\Hook\\File\\Regex\\Stripe",
      "\\CaptainHook\\App\\Hook\\File\\Regex\\Google",
      "\\CaptainHook\\App\\Hook\\File\\Regex\\Password"
    ],
    "blocked": [
       "#password\": \"[\S]+\"#i"
    ],
    "allowed": [
      "#my-dummy-token#", "#root#"
    ]
  }
}

@sebastianfeldmann
Copy link
Collaborator

Currently still making it work for pre-push and pre-commit at the same time.

@shochdoerfer
Copy link
Contributor Author

Pretty cool! Have had this on my "eventually todo list" for a while now but somehow other things have been more important. Thanks for finally tackling this!

@sebastianfeldmann
Copy link
Collaborator

I'm moving the Interface and default implementations to another Composer package. Maybe other people want to use it differently. So it will take another day or two ;)

@sebastianfeldmann
Copy link
Collaborator

sebastianfeldmann commented Nov 30, 2023

Ok, I think I'm happy now.

The Cap'n is not checking the whole files anymore. I'm using git diff to get the actually added content.
This kind of made it possible to do this entropy stuff again. It is now opt in and tries die detect secrets in .php, .yml, .json, and .ini files (sorry no .env "yet")

Cap't will run the regex, if nothing is found he will do the entropy stuff if the threshold is greater 0.
I found 3.6 - 3.8 reasonable values that had no false positives but found tokens and auto generated passwords pretty reliably.

The final configuration should look like this.

{
  "action": "\CaptainHook\App\Hook\Diff\Action\BlockSecrets",
  "options": {
    "entropyThreshold": 3.7,
    "providers":  [
      "\\CaptainHook\\Secrets\\Regex\\Supplier\\Aws",
      "\\CaptainHook\\Secrets\\Regex\\Supplier\\Github",
      "\\CaptainHook\\Secrets\\Regex\\Supplier\\Stripe",
      "\\CaptainHook\\Secrets\\Regex\\Supplier\\Google",
      "\\CaptainHook\\Secrets\\Regex\\Supplier\\Password"
    ],
    "blocked": [
       "#password\": \"[\S]+\"#i"
    ],
    "allowed": [
      "#my-dummy-token#", "#root#"
    ]
  }
}

What do you think?

@shochdoerfer
Copy link
Contributor Author

The Cap'n is not checking the whole files anymore. I'm using git diff to get the actually added content.

Nice!

It is now opt in and tries die detect secrets in .php, .yml, .json, and .ini files (sorry no .env "yet")

Would it be possible - maybe in a 2nd step - to make that configurable? Or is there a specific reason you hardcoded this?

Maybe it makes sense to also scan Markdown files?

Pretty cool to see this in the Cap'n. One more thing that sets it apart from the competition ;)

@sebastianfeldmann
Copy link
Collaborator

You are right. I limited it to .php, .yml, .json, and .ini files because I could use regex to find variable assignments and only check strings that have risk potential.
But skipping files without that kind of structural advantage is not a good idea.
If you provide the entropyThreshold I will just check everything and do a brute force check for every word in the change set.

Adding a whitelist for checked file types could further improve the performance. I will check after I implemented the brute force check how necessary it might be.

@shochdoerfer
Copy link
Contributor Author

Finding variable assignments sounds like a neat idea!

@shochdoerfer
Copy link
Contributor Author

I guess the issue can be closed now :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants