ShareClean

Local-first Python CLI for sanitizing logs, stack traces, config snippets, and terminal output before you paste them into GitHub issues, support tickets, Slack, or AI chats.

ShareClean detects common sensitive values, replaces only the risky portion, and reports safe metadata without storing or printing the original secret. It makes no network calls and sends no telemetry.

Try the interactive browser playground to see the redaction rules before installing.

Browser playground shown for illustration; real workflows run locally through the CLI.

Install

With pipx:

pipx install shareclean

From a local checkout:

python -m pip install -e .

Run without installing from the repository root:

python -m shareclean --help

Quick Start

shareclean app.log
shareclean app.log --output app.cleaned.log
shareclean app.log --report
shareclean app.log --report --report-format json
shareclean app.log --check
shareclean app.log --check --fail-on severity:high
shareclean app.log --check --fail-on category:token,rule:SC004
shareclean app.log --check --ignore-for-check category:pii_email
shareclean app.log --private-ip
shareclean app.log --phone
shareclean app.log --custom-pattern "EMP-[0-9]{6}"

--check exits 1 only for findings selected by the check policy and never writes sanitized text to stdout.

Configured fail_on and ignore_for_check policies from config files, profiles, or environment variables apply only in --check mode. Normal sanitization still redacts and reports findings, but those policies do not change the exit decision unless --check is present.
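The selector policy above can be sketched in a few lines of Python. This is an illustration only, not ShareClean's implementation: `Finding`, `parse_selectors`, `matches`, and `check_exit_code` are hypothetical names, each selector is assumed to match by exact equality (the real tool may treat `severity:` as a threshold), and the default policy used when no `fail_on` is configured is not modelled.

```python
from dataclasses import dataclass

VALID_KEYS = {"severity", "category", "rule"}


@dataclass(frozen=True)
class Finding:
    # Hypothetical record mirroring the fields shown in the JSON report.
    rule_id: str
    category: str
    severity: str


def parse_selectors(spec: str) -> list[tuple[str, str]]:
    """Split a string such as 'category:token,rule:SC004' into (key, value) pairs."""
    pairs = []
    for item in spec.split(","):
        key, sep, value = item.strip().partition(":")
        if not sep or key not in VALID_KEYS or not value:
            raise ValueError(f"invalid selector: {item!r}")
        pairs.append((key, value))
    return pairs


def matches(finding: Finding, selectors: list[tuple[str, str]]) -> bool:
    """True if any selector matches the finding (assumed exact equality)."""
    fields = {
        "severity": finding.severity,
        "category": finding.category,
        "rule": finding.rule_id,
    }
    return any(fields[key] == value for key, value in selectors)


def check_exit_code(findings, fail_on, ignore) -> int:
    """Return 1 if any finding is selected by fail_on and not ignored, else 0."""
    selected = [f for f in findings if matches(f, fail_on) and not matches(f, ignore)]
    return 1 if selected else 0
```

With a medium-severity email finding and a critical connection-string finding, `check_exit_code(findings, parse_selectors("rule:SC004"), [])` returns 1, while a `severity:high` selector returns 0 under the exact-match assumption.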

Configuration

ShareClean supports committed project policy in either pyproject.toml or .shareclean.toml.

[tool.shareclean]
redact_email = true
redact_private_ip = false
redact_phone = false
redact_mac_address = false
redaction_label = "[REDACTED]"
profile = "default"
custom_patterns = [
  { name = "Employee ID", pattern = "EMP-[0-9]{6}" },
  { name = "Tenant", pattern = "tenant=(?P<value>[a-z0-9-]+)" },
]

[tool.shareclean.profiles.ci]
redact_email = true
redact_private_ip = true
fail_on = ["severity:high"]

For .shareclean.toml, omit the tool.shareclean prefix:

redact_email = true
redact_private_ip = false
redact_phone = false
redact_mac_address = false
custom_patterns = [
  { name = "Employee ID", pattern = "EMP-[0-9]{6}" },
]

[profiles.ci]
redact_private_ip = true
fail_on = ["severity:high"]

Config location:

1. --config PATH
2. Nearest project directory containing .shareclean.toml or a pyproject.toml with [tool.shareclean]
3. Defaults

Auto-discovery walks upward from the current directory until it reaches the Git root or the filesystem root. It uses only the nearest directory that contains a config and never merges parent configs. If both .shareclean.toml and a ShareClean section in pyproject.toml exist in the selected directory, ShareClean exits with code 2.

Config precedence, highest first:

1. CLI flags
2. Environment variables
3. Selected profile values
4. Base project config
5. Defaults
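One way to picture this layering is a `collections.ChainMap`, which looks keys up in the first mapping that defines them. The sketch below is illustrative only: `effective_config` is a hypothetical helper, and the layer contents are made-up values, not ShareClean's real defaults.

```python
from collections import ChainMap


def effective_config(cli, env, profile, base, defaults):
    """Merge config layers; earlier arguments override later ones."""
    # ChainMap searches its maps left to right, so CLI flags win over everything.
    return dict(ChainMap(cli, env, profile, base, defaults))


config = effective_config(
    cli={"redaction_label": "<hidden>"},          # e.g. --redaction-label
    env={"redact_private_ip": True},               # e.g. SHARECLEAN_REDACT_PRIVATE_IP
    profile={"redact_private_ip": False, "redact_phone": True},
    base={"redact_email": False},
    defaults={
        "redact_email": True,
        "redact_private_ip": False,
        "redact_phone": False,
        "redaction_label": "[REDACTED]",
    },
)
```

Here the environment value for `redact_private_ip` overrides the profile, and the profile's `redact_phone` overrides the default.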

Environment variables:

SHARECLEAN_REDACT_EMAIL
SHARECLEAN_REDACT_PRIVATE_IP
SHARECLEAN_REDACT_PHONE
SHARECLEAN_REDACT_MAC_ADDRESS
SHARECLEAN_REDACTION_LABEL
SHARECLEAN_PROFILE
SHARECLEAN_FAIL_ON
SHARECLEAN_IGNORE_FOR_CHECK

Boolean environment values accept true, 1, yes, on, false, 0, no, and off.
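A parser for these values might look like the sketch below. `parse_bool` is a hypothetical helper; trimming whitespace and ignoring case are assumptions that the text above does not confirm.

```python
TRUE_VALUES = {"true", "1", "yes", "on"}
FALSE_VALUES = {"false", "0", "no", "off"}


def parse_bool(raw: str) -> bool:
    """Map an environment string to a bool, rejecting anything unrecognized."""
    value = raw.strip().lower()  # assumption: case-insensitive, whitespace ignored
    if value in TRUE_VALUES:
        return True
    if value in FALSE_VALUES:
        return False
    raise ValueError(f"not a boolean value: {raw!r}")
```

Rejecting unknown strings, rather than treating them as false, keeps a typo such as `SHARECLEAN_REDACT_EMAIL=ture` from silently disabling a redaction.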

Inspect effective configuration without reading input:

shareclean config show

Detection Rules

Rule ID	Detector	Category	Severity
`SC001`	Key-value secret	`credential`	`high`
`SC002`	Bearer token	`token`	`high`
`SC003`	JWT-like token	`token`	`high`
`SC004`	Connection-string password	`connection_string`	`critical`
`SC005`	Email address	`pii_email`	`medium`
`SC006`	Local user path	`pii_path`	`medium`
`SC007`	Private IP address	`internal_network`	`medium`
`SC008`	PEM private-key block	`private_key`	`critical`
`SC009`	Known provider API token	`token`	`high`
`SC010`	Webhook URL	`token`	`critical`
`SC011`	URL query secret	`credential`	`high`
`SC012`	Cookie secret	`token`	`high`
`SC013`	CLI secret argument	`credential`	`high`
`SC014`	XML secret element	`credential`	`high`
`SC015`	SAML assertion	`token`	`critical`
`SC016`	Docker or Kubernetes secret	`credential`	`high`
`SC017`	Phone number	`pii_phone`	`medium`
`SC018`	MAC address	`pii_hardware`	`low`
`SC019`	Generic sensitive key-value	`credential`	`high`

Private IP detection is off by default; enable it with --redact-private-ip, --private-ip, or config. When enabled, it covers private IPv4 and IPv6 addresses.
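For a rough sense of what "private" means here, Python's standard `ipaddress` module classifies both address families. This is only an approximation and an assumption about scope: `is_private_address` is a hypothetical helper, and `is_private` in the standard library also covers loopback and link-local ranges, which ShareClean may or may not treat the same way.

```python
import ipaddress


def is_private_address(text: str) -> bool:
    """True if text parses as an IPv4 or IPv6 address in a private range."""
    try:
        return ipaddress.ip_address(text).is_private
    except ValueError:
        # Not a syntactically valid IP address at all.
        return False
```

For example, `10.1.2.3`, `192.168.0.10`, and the IPv6 unique local address `fd12:3456::1` are private, while `8.8.8.8` is not.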

Provider-aware token detection covers high-confidence shapes such as OpenAI, Anthropic, GitHub, GitLab, Hugging Face, Stripe, Slack, Telegram, SendGrid, AWS access keys, Google API keys, npm, PyPI, Docker Hub, Netlify, DigitalOcean, and Terraform Cloud tokens. Structured key-value detection preserves JSON, YAML, TOML, INI, and environment-file formatting where possible. A generic sensitive-key fallback redacts assigned values when keys contain secret-bearing segments such as password, token, api_key, or client_secret.

Phone and MAC address detection are off by default to avoid noisy matches; enable them with --redact-phone, --phone, --redact-mac-address, or config.

Custom regex rules are reported as CUSTOM001, CUSTOM002, and so on. If a custom regex defines a named group called value, only that group is redacted; otherwise the whole match is replaced.
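The named-group behaviour can be illustrated with the standard `re` module alone. This is a sketch of the semantics described above, not ShareClean's own code; `redact` is a hypothetical function.

```python
import re


def redact(pattern: str, text: str, label: str = "[REDACTED]") -> str:
    """Replace only the 'value' group when the pattern defines one, else the whole match."""
    regex = re.compile(pattern)
    has_value_group = "value" in regex.groupindex

    def replace(match: re.Match) -> str:
        if has_value_group and match.group("value") is not None:
            # Keep the text around the group and swap in the label for the group itself.
            start, end = match.span("value")
            offset = match.start()
            whole = match.group(0)
            return whole[: start - offset] + label + whole[end - offset :]
        return label

    return regex.sub(replace, text)
```

With the `Tenant` pattern from the configuration example, `redact(r"tenant=(?P<value>[a-z0-9-]+)", "tenant=acme-prod ok")` keeps the `tenant=` key and replaces only the value, whereas the `Employee ID` pattern has no named group and is replaced whole.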

Programmatic use:

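from shareclean import add_custom_regex
from shareclean.detectors import get_rules
from shareclean.redactor import sanitize

# Extend the built-in rules with a custom regex; only the named group "value" is redacted.
rules = add_custom_regex(get_rules(), r"employee=(?P<value>EMP-[0-9]{6})")
# Sanitize a string with the combined rule set.
result = sanitize("employee=EMP-123456", rules)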

When detectors overlap on the same text range, ShareClean emits one finding using the highest-severity rule. If severities match, it uses the most specific detector.
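The tie-breaking rule can be sketched as a ranking over candidates that share a range. Everything here is hypothetical: `Candidate` and `resolve_same_range` are illustrative names, the numeric `specificity` field stands in for however the tool measures "most specific", and only identical ranges are handled (partial overlaps are not modelled). The severity order follows the levels in the rules table.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


@dataclass(frozen=True)
class Candidate:
    rule_id: str
    severity: str
    start: int
    end: int
    specificity: int = 0  # hypothetical: larger means a more specific detector


def resolve_same_range(candidates):
    """Keep one candidate per (start, end) range: highest severity, then specificity."""
    best = {}
    for cand in candidates:
        key = (cand.start, cand.end)
        rank = (SEVERITY_RANK[cand.severity], cand.specificity)
        current = best.get(key)
        if current is None or rank > (SEVERITY_RANK[current.severity], current.specificity):
            best[key] = cand
    return sorted(best.values(), key=lambda c: (c.start, c.end))
```

If `SC019` and `SC001` both match the same range at `high` severity, the candidate with the larger `specificity` survives; a `critical` match on that range would beat both.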

JSON Reports

JSON reports use schema version 1.0 and do not include filenames, paths, matched values, hashes, source snippets, or masked previews.

{
  "schema_version": "1.0",
  "source": "file",
  "summary": {
    "findings": 1,
    "by_category": {
      "credential": 1
    },
    "by_severity": {
      "high": 1
    }
  },
  "findings": [
    {
      "rule_id": "SC001",
      "category": "credential",
      "severity": "high",
      "location": {
        "start": {
          "line": 1,
          "column": 10
        },
        "end": {
          "line": 1,
          "column": 27
        }
      },
      "replacement": "[REDACTED]"
    }
  ]
}

Locations are 1-based. End positions are exclusive. Columns count Unicode code points after treating CRLF as one LF newline for location purposes.
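To make the location rules concrete, the sketch below converts a code-point offset into a line and column. `offset_to_location` is a hypothetical helper written for this explanation; it assumes the offset is a Python string index (which counts code points) and does not fall between the CR and LF of a CRLF pair.

```python
def offset_to_location(text: str, offset: int) -> dict:
    """Convert a 0-based code-point offset into a 1-based line and column."""
    # Treat each CRLF as a single LF so it counts as one newline.
    prefix = text[:offset].replace("\r\n", "\n")
    line = prefix.count("\n") + 1
    # Column is the number of code points since the last newline, plus one.
    column = len(prefix) - (prefix.rfind("\n") + 1) + 1
    return {"line": line, "column": column}


sample = "ok\r\npassword=not-real\r\n"
start = sample.index("not-real")
start_location = offset_to_location(sample, start)
# The end position is exclusive: it points just past the last matched code point.
end_location = offset_to_location(sample, start + len("not-real"))
```

Because the end is exclusive, the length of a single-line match is simply the end column minus the start column.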

CLI Reference

usage: shareclean [-h] [--version] [--check] [--output FILE] [--report]
                  [--report-format {text,json}] [--config FILE]
                  [--profile NAME] [--redact-email] [--no-redact-email]
                  [--redact-private-ip] [--private-ip]
                  [--no-redact-private-ip] [--no-private-ip]
                  [--redact-phone] [--phone] [--no-redact-phone]
                  [--no-phone]
                  [--redact-mac-address] [--no-redact-mac-address]
                  [--redaction-label TEXT] [--fail-on SELECTORS]
                  [--ignore-for-check SELECTORS]
                  [--custom-pattern REGEX]
                  [FILE]

--no-email remains as a deprecated alias for --no-redact-email.

Exit codes:

Code	Meaning
`0`	Completed successfully
`1`	Selected findings detected in `--check` mode
`2`	User, I/O, config, or selector error
`3`	Unexpected internal error
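A wrapper script, for example in CI, could branch on these codes. The sketch below is an assumption about how one might call the CLI from Python, not part of ShareClean: `describe_exit` and `run_check` are hypothetical helpers, and only the exit-code meanings come from the table above.

```python
import subprocess

EXIT_MEANINGS = {
    0: "Completed successfully",
    1: "Selected findings detected in --check mode",
    2: "User, I/O, config, or selector error",
    3: "Unexpected internal error",
}


def describe_exit(code: int) -> str:
    """Translate an exit code into the meaning documented in the table."""
    return EXIT_MEANINGS.get(code, f"Unknown exit code {code}")


def run_check(argv: list[str]) -> tuple[int, str]:
    """Run a command and return its exit code together with the documented meaning."""
    completed = subprocess.run(argv, capture_output=True, text=True)
    return completed.returncode, describe_exit(completed.returncode)


# Example call, assuming shareclean is installed and on PATH:
# code, meaning = run_check(["shareclean", "app.log", "--check"])
```

Treating code 1 as "findings present" and codes 2 and 3 as tool failures lets a pipeline distinguish a leaky log from a broken configuration.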

Safety Model

ShareClean is intentionally local and transparent:

No network calls
No cloud processing
No telemetry
No account or API key required
Original matched secret values are not stored in findings or reports
Input files are never modified in place

Coverage And Limitations

ShareClean is pattern-based. It can miss unusual formats and can redact benign text that resembles a secret. It is not a replacement for repository secret scanners, source-history scanning, or DLP systems.

The test corpus under tests/fixtures/ uses only fake values and is split into generic, cloud, database, CI/CD, SaaS, log, YAML/JSON/env, provider-token, cookie/URL/CLI, and false-positive packs. Bug reports that change detection should add a regression fixture using clearly fake data.

Development

Run the test suite:

python -m unittest discover -s tests -v

Run packaging checks:

python -m compileall -q src tests
python -m build
python -m twine check dist/*

License

ShareClean is released under the MIT License.
