Update CONTRIBUTING.md to outline detector development process (Yelp#240)

* Update CONTRIBUTING.md to outline detector development process
  Supports git-defenders/detect-secrets-discuss#312
* Minor wording update
* Address comments
IBM · Jan 16, 2020 · 479265e
1 parent d8a6d54
Showing 1 changed file with 25 additions and 0 deletions.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -49,3 +49,28 @@ Work in Progress pull requests are also welcome to get feedback early on, or if
 - [How to Contribute to Open Source](https://opensource.guide/how-to-contribute/)
 - [Using Pull Requests](https://help.github.com/articles/about-pull-requests/)
 - [GitHub Help](https://help.github.com)
+
+## Process for Adding a New Secret Detector to whitewater-detect-secrets
+There are two key steps for developing a new secret detector: secret identification and secret verification.
+It is often easier to review contributions if these two steps are submitted as separate PRs, although this is not mandatory.
+The processes for each of these two steps are outlined below.
+
+### Secret Identification
+- Develop an understanding of all the secret types for a given service. A service may combine basic-auth, IAM auth, tokens, keys, passwords, and/or other proprietary authentication methods.
+- Identify any specification documents from the service provider that describe the use and format of the secret types to be captured.
+- Develop an understanding of how the service's API is called.
+  - Is it purely RESTful, or are there prevalent SDKs which should be accounted for and detected?
+- Search github.ibm.com for examples of the secret's signature and use cases, or create your own.
+  - Iterate until you have sufficient representation of the different ways in which the secret may be used.
+- Using the other detectors under `detect_secrets/plugins` as examples, create a new Python file under that path. The file should contain a new detector class which inherits from `RegexBasedDetector`.
+- Write one or more regexes to match and capture secrets when found within the use cases identified above. Assign a list of regexes to the `denylist` variable. We have created helper functions to make this easier, which may be seen in the existing detectors.
+- If multiple factors exist, identify a primary factor to capture with the `denylist` regexes. Secondary factors will be captured as part of the verification process below.
+- Create test cases to ensure that example secrets matching the primary factor's signature are caught. Use the test files under `tests/plugins` as examples.
+
+### Secret Verification
+- Identify a service endpoint (API or SDK) to which the secret's factors (there may be several) can be presented for verification.
+  - In complex cases (where the service is hosted internally), it's often helpful to identify an IBM SME who can help navigate the API / SDK spec of the service for verification purposes. [w3 ProductPages](https://productpages.w3ibm.mybluemix.net/ProductPages/index.html) is a good resource to help identify an SME.
+  - Note: if there are _many_ signature hits, verifying them may place a heavy load on the verification endpoint, so a key design point is to minimize false positives.
+- Using the existing plugins in `detect_secrets/plugins` as examples, add the `verify()` function to your detector. The `verify` function should validate a found secret with the service endpoint and determine whether it is active or not, returning either `VerifiedResult.VERIFIED_TRUE` or `VerifiedResult.VERIFIED_FALSE`. `verify()` may also return `VerifiedResult.UNVERIFIED` if verification cannot be completed due to issues like endpoint availability, lack of expected data elements, etc.
+- If multiple factors must be found to verify the secret, write an additional helper function to scan the context lines surrounding the primary factor. One or more additional regexes may be required. Context lines are passed to the `verify()` function as `content`. The number of context lines pulled from above and below the primary factor is defined in `plugins/base.py` as the global variable `LINES_OF_CONTEXT`.
+- Using the existing tests in `tests/plugins` as examples, create test cases for positive and negative verification results, considering required factors and return codes. Note that you should mock responses from the service endpoint to avoid actually calling it during tests.
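
The two steps in the added section can be sketched as a small, self-contained Python example. It is an illustration under stated assumptions, not the project's implementation: the `exsvc_` token format, the `account_id` secondary factor, the `ExampleServiceDetector` class, its `find_secrets` and `find_account_ids` helpers, and the injected `call_endpoint` callable are all hypothetical, and the local `VerifiedResult` enum is only a stand-in for the one the guide refers to. A real detector would inherit from `RegexBasedDetector` and follow the `verify()` signature used by the existing plugins rather than this simplified one.

```python
import re
from enum import Enum


class VerifiedResult(Enum):
    # Local stand-in for the three outcomes named in the guide; the numeric
    # values here are arbitrary.
    VERIFIED_FALSE = 1
    UNVERIFIED = 2
    VERIFIED_TRUE = 3


class ExampleServiceDetector:
    """Hypothetical detector for a made-up service (assumption, not a real plugin)."""

    secret_type = 'Example Service Token'

    # Primary factor: an invented token format, "exsvc_" plus 32 hex characters.
    denylist = [re.compile(r'\b(exsvc_[0-9a-f]{32})\b')]

    # Secondary factor: an invented account id, searched for in the context lines.
    account_regex = re.compile(r'account[_-]?id\s*[:=]\s*["\']?(\w+)')

    def __init__(self, call_endpoint):
        # call_endpoint(account_id, token) returns an HTTP-like status code.
        # It is injected so that tests never reach a real service.
        self.call_endpoint = call_endpoint

    def find_secrets(self, line):
        # Identification: return every primary-factor match on one line.
        return [m.group(1) for regex in self.denylist for m in regex.finditer(line)]

    def find_account_ids(self, content):
        # Helper that scans the surrounding context for the secondary factor.
        return self.account_regex.findall(content)

    def verify(self, token, content):
        account_ids = self.find_account_ids(content)
        if not account_ids:
            # A required factor is missing, so verification cannot be completed.
            return VerifiedResult.UNVERIFIED

        rejected = False
        for account_id in account_ids:
            try:
                status = self.call_endpoint(account_id, token)
            except OSError:
                # Endpoint unavailable: neither confirmed nor refuted.
                return VerifiedResult.UNVERIFIED
            if status == 200:
                return VerifiedResult.VERIFIED_TRUE
            if status in (401, 403):
                rejected = True

        # Only report VERIFIED_FALSE when the service explicitly rejected the
        # credentials; any other status leaves the result undetermined.
        return VerifiedResult.VERIFIED_FALSE if rejected else VerifiedResult.UNVERIFIED
```

In a test, a fake callable takes the place of the network call, for example `ExampleServiceDetector(lambda account_id, token: 401)`, which corresponds to the guide's advice to mock responses from the service endpoint.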