
🔨 ✅ optimize log scrubbing with scrubadub and single-pass regex #88

Merged
Miyamura80 merged 7 commits into main from feat/optimize-log-scrubbing on Jan 26, 2026

@Miyamura80 (Owner)

Summary

  • Integrated scrubadub library for robust PII redaction (email, phone, etc.).
  • Optimized secret/API key matching to a single $O(N)$ pass over each log message (N = message length) using one compiled alternation regex, instead of one pass per pattern.
  • Externalized redaction patterns to global_config.yaml.
  • Extended log scrubbing to include extra context fields.
  • Added and updated unit tests to verify new capabilities.
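A minimal sketch of the single-pass idea, in Python since the PR touches `src/utils/logging_config.py`. The pattern names, the regexes, and the `[REDACTED_<NAME>]` replacement format are hypothetical (the real patterns live in `global_config.yaml` and are not shown on this page), and `build_secret_regex`/`redact_secrets` are illustrative names, not the PR's API. Each pattern becomes a named group in one alternation, so a single `re.sub` call scans the message once instead of once per pattern.

```python
from __future__ import annotations

import re

# Hypothetical patterns for illustration; the PR loads its real list from global_config.yaml.
SECRET_PATTERNS: dict[str, str] = {
    "openai_key": r"sk-[A-Za-z0-9]{20,}",
    "github_token": r"ghp_[A-Za-z0-9]{36}",
    "bearer_token": r"Bearer\s+[A-Za-z0-9._\-]+",
}


def build_secret_regex(patterns: dict[str, str]) -> re.Pattern[str]:
    """Combine all patterns into one alternation, one named group per pattern."""
    # Each pattern is wrapped in its own named group. Because that outer group
    # closes last, match.lastgroup reports which pattern fired.
    combined = "|".join(f"(?P<{name}>{regex})" for name, regex in patterns.items())
    return re.compile(combined)


SECRET_RE = build_secret_regex(SECRET_PATTERNS)


def redact_secrets(text: str) -> str:
    """Redact every secret in a single left-to-right scan of `text`."""
    # One re.sub over the combined pattern replaces K separate re.sub calls.
    return SECRET_RE.sub(lambda m: f"[REDACTED_{m.lastgroup.upper()}]", text)
```

Python's `re` tries the alternatives left to right at each position, so overlapping patterns should be listed from most to least specific. The scan is one pass over the input, but the per-position cost still depends on how much backtracking the individual patterns allow.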

greptile-apps Bot commented Jan 26, 2026

Greptile Overview

Greptile Summary

This PR successfully optimizes log scrubbing by integrating scrubadub for robust PII detection and implementing single-pass regex matching for custom secrets. The refactoring externalizes redaction patterns from hardcoded constants to global_config.yaml, following the project's configuration standards.

Key improvements:

  • Integrated scrubadub library for automatic detection of emails, phone numbers, and other PII patterns
  • Refactored secret matching from multiple sequential regex passes to a single O(N) pass using one combined regex with named groups
  • Externalized all redaction patterns to global_config.yaml with proper pydantic validation
  • Extended scrubbing to cover extra context fields in log records
  • All previous review comments have been addressed: scrubbing order corrected (PII before secrets), regex capturing group fixed, and session files added to .gitignore
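One way the scrubbing order and the extra-field coverage could fit together, sketched with the standard `logging` module. This is an assumption: the page does not show whether `logging_config.py` uses stdlib `logging` or another logger, and the simple email regex here is a stand-in for the scrubadub call the PR actually makes. `ScrubbingFilter` and `scrub` are hypothetical names. The filter redacts PII first and secrets second, then applies the same function to every string attribute passed through `extra=`.

```python
from __future__ import annotations

import logging
import re

# Stand-in for scrubadub: a deliberately simple email regex using the same
# "{{EMAIL}}" placeholder. The PR calls scrubadub here instead.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# Stand-in for the combined secret regex built from global_config.yaml.
SECRET_RE = re.compile(r"(?P<api_key>sk-[A-Za-z0-9]{20,})")

# Attributes every LogRecord carries by default; anything else came from `extra=`.
_STANDARD_ATTRS = set(vars(logging.makeLogRecord({}))) | {"message", "asctime"}


def scrub(text: str) -> str:
    """Redact PII first, then custom secrets."""
    text = EMAIL_RE.sub("{{EMAIL}}", text)
    return SECRET_RE.sub("[REDACTED]", text)


class ScrubbingFilter(logging.Filter):
    """Hypothetical filter that scrubs the formatted message and extra fields."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Format once, scrub the result, and drop args so it is not re-formatted.
        record.msg = scrub(record.getMessage())
        record.args = ()
        # Scrub string values supplied through `extra=`.
        for key, value in list(vars(record).items()):
            if key not in _STANDARD_ATTRS and isinstance(value, str):
                setattr(record, key, scrub(value))
        return True
```

Values passed via `extra=` are set directly as attributes on the `LogRecord`, which is why the sketch finds them by comparing against the attributes of a blank record.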

Tests updated:

  • Test assertions updated to match scrubadub placeholder format ({{EMAIL}}, {{PHONE}})
  • Added new test for phone number redaction capability
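A sketch of what placeholder-based assertions can look like, written with `unittest`. `scrub_pii` and the test names are hypothetical, and its two regexes only mimic scrubadub's `{{EMAIL}}`/`{{PHONE}}` output format; scrubadub's real detectors are far more thorough. This shows the assertion style, not the contents of `tests/test_logging_security.py`.

```python
from __future__ import annotations

import re
import unittest

# Stand-ins for scrubadub's email and phone detectors. They only mimic the
# "{{EMAIL}}" / "{{PHONE}}" placeholder format and are much cruder than the
# real detectors.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def scrub_pii(text: str) -> str:
    """Replace emails, then phone-like digit runs, with placeholders."""
    text = EMAIL_RE.sub("{{EMAIL}}", text)
    return PHONE_RE.sub("{{PHONE}}", text)


class TestScrubPlaceholders(unittest.TestCase):
    def test_email_uses_placeholder(self) -> None:
        out = scrub_pii("contact alice@example.com")
        self.assertEqual(out, "contact {{EMAIL}}")

    def test_phone_number_is_redacted(self) -> None:
        out = scrub_pii("call +1 (555) 123-4567 now")
        # Assert on the placeholder and on the absence of the raw digits.
        self.assertIn("{{PHONE}}", out)
        self.assertNotIn("555", out)
```

Asserting both that the placeholder is present and that the raw value is absent catches a scrubber that appends a placeholder without removing the original text.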

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • All code changes are well-structured with proper error handling, comprehensive test coverage, and all previous review feedback has been addressed. The refactoring improves performance while maintaining functionality, follows project conventions for configuration management, and introduces no breaking changes.
  • No files require special attention

Important Files Changed

| Filename | Overview |
|---|---|
| common/config_models.py | Added RedactionPattern and RedactionConfig models for type-safe redaction configuration |
| common/global_config.yaml | Externalized redaction patterns from code to config with proper regex patterns |
| src/utils/logging_config.py | Refactored to use scrubadub for PII detection and single-pass regex for custom secrets with extra context scrubbing |
| tests/test_logging_security.py | Updated tests for scrubadub placeholders and added phone number test |
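The PR adds pydantic models named `RedactionPattern` and `RedactionConfig`, but their fields are not shown here. The sketch below swaps in standard-library dataclasses so it runs without dependencies; its field names (`name`, `regex`), the `from_dict`/`compile` helpers, and the shape of `RAW_CONFIG` are all assumptions. `RAW_CONFIG` stands in for what a parsed section of `global_config.yaml` might contain. Validating at load time means a malformed pattern fails at startup rather than on the first log call.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RedactionPattern:
    """Dataclass stand-in for the PR's pydantic RedactionPattern (fields assumed)."""

    name: str   # used as the named-group name, so it must be an identifier
    regex: str  # raw pattern text as it appears in the YAML file

    def __post_init__(self) -> None:
        if not self.name.isidentifier():
            raise ValueError(f"pattern name {self.name!r} is not a valid group name")
        re.compile(self.regex)  # raises re.error on a malformed pattern


@dataclass
class RedactionConfig:
    """Dataclass stand-in for the PR's pydantic RedactionConfig (fields assumed)."""

    patterns: list[RedactionPattern] = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw: dict) -> RedactionConfig:
        return cls([RedactionPattern(**item) for item in raw.get("patterns", [])])

    def compile(self) -> re.Pattern[str]:
        if not self.patterns:
            # An empty alternation would match the empty string everywhere,
            # so fall back to a pattern that never matches.
            return re.compile(r"(?!)")
        return re.compile("|".join(f"(?P<{p.name}>{p.regex})" for p in self.patterns))


# Hypothetical parsed config section; the real YAML keys may differ.
RAW_CONFIG = {
    "patterns": [
        {"name": "api_key", "regex": r"sk-[A-Za-z0-9]{20,}"},
        {"name": "aws_key_id", "regex": r"AKIA[0-9A-Z]{16}"},
    ]
}
```

Compiling each pattern during validation surfaces `re.error` with the offending entry, and the empty-config guard keeps a missing section from turning every log line into a wall of placeholders.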

greptile-apps Bot left a comment

1 file reviewed, 1 comment

Comment thread src/utils/logging_config.py Outdated

greptile-apps Bot left a comment

2 files reviewed, 2 comments

Comment thread session-ses_4047.md Outdated
Comment thread common/global_config.yaml Outdated
Miyamura80 merged commit 6756ba5 into main on Jan 26, 2026
11 checks passed
github-actions Bot deleted the feat/optimize-log-scrubbing branch on January 26, 2026 at 21:33

Labels

None yet

Projects

None yet

1 participant