
Scan local files and directories

Marco Rosa edited this page Oct 25, 2021 · 4 revisions

Credential Digger can now scan files and directories on the local file system, regardless of whether they belong to a git repository.

How to scan files and directories

  1. Install the dependencies
  2. Instantiate the client (either Postgres or SQLite)
    from credentialdigger import PgClient
    
    c = PgClient(dbhost='xxx.xxx.xxx.xxx', dbport=NUM, dbname='mydbname', dbuser='myusername', dbpassword='mypassword')
    or
    from credentialdigger import SqliteClient
    
    c = SqliteClient(path='/path/to/data.db')
  3. Launch the scan of a directory
    new_discoveries = c.scan_path(scan_path=REPO_PATH,
                                  category=CATEGORY,
                                  models=MODELS,
                                  force=FORCE,
                                  debug=DEBUG,
                                  similarity=SIMILARITY,
                                  max_depth=MAX_DEPTH,
                                  ignore_list=IGNORE_LIST)
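
As an illustration of step 3, the call can be wrapped in a small helper with concrete values filled in. This is only a sketch and not part of Credential Digger: the helper name `scan_local_path`, the category `'password'`, the model name `'PathModel'`, and the ignore patterns are assumptions made for the example, so replace them with values that exist in your own rules database and installation.

```python
def scan_local_path(client, path):
    """Scan a local file or directory with an already-instantiated client.

    `client` is the PgClient or SqliteClient created in step 2.
    Every argument value below is an illustrative assumption.
    """
    return client.scan_path(
        scan_path=path,
        category='password',       # assumed category name
        models=['PathModel'],      # assumed model name
        force=True,                # re-scan even if already scanned
        debug=False,               # no progress bars
        similarity=False,          # skip the embedding model
        max_depth=2,               # assumed depth limit for the example
        ignore_list=['*.md', 'node_modules'],
    )
```

With a client `c` from step 2, `new_discoveries = scan_local_path(c, '/path/to/project')` would then hold the value returned by `scan_path`.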

Arguments

  • scan_path: the path of the directory or file to scan
  • category: if specified, scan using only the rules of this category; otherwise use all the rules in the db
  • models: a list of models for the ML false positives detection
  • force: force a complete re-scan, even if the path has already been scanned
  • debug: flag that controls whether progress bars are displayed during the scan (e.g., during the insertion of the detections in the db)
  • generate_snippet_extractor: generate the extractor model to be used in the SnippetModel. The extractor is generated using the ExtractorGenerator. If False, use the pre-trained extractor model [DEPRECATED IN v4.4]
  • similarity: build the embedding model, then compute and store the embeddings of the discoveries to allow for automatic update of similar discoveries
  • max_depth: the maximum depth to which the subdirectory tree is traversed. A negative value has no effect on the scan (no depth limit is applied).
  • ignore_list: a list of paths to ignore during the scan. This can include file names, directory names, or whole paths. Wildcards are supported as per the fnmatch package.
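
To get a feel for the wildcard syntax accepted by ignore_list, the patterns can be tried against sample paths with Python's standard fnmatch module. The sketch below is not how Credential Digger applies the list internally; the helper `is_ignored` and the rule it uses (test every path component and the whole path) are assumptions made only to demonstrate the pattern syntax.

```python
from fnmatch import fnmatch


def is_ignored(path, ignore_list):
    """Return True if the whole path or any of its components matches a pattern.

    Hypothetical helper: it only demonstrates fnmatch wildcards and does not
    reproduce the scanner's own matching logic.
    """
    parts = path.strip('/').split('/')
    candidates = parts + [path]
    return any(fnmatch(candidate, pattern)
               for candidate in candidates
               for pattern in ignore_list)


ignore_list = ['*.md', 'node_modules', 'tests/*']

print(is_ignored('docs/README.md', ignore_list))             # file name pattern
print(is_ignored('node_modules/pkg/index.js', ignore_list))  # directory name
print(is_ignored('tests/test_a.py', ignore_list))            # whole-path pattern
print(is_ignored('src/main.py', ignore_list))                # no pattern matches
```

In fnmatch, `*` matches any run of characters (including `/`), `?` matches a single character, and `[seq]` matches any character in `seq`.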

Returns

The ids of the discoveries detected by the scanner (excluding those classified as false positives).