
feat(logs): add pattern clustering, tag management, and JSON processor #49218

Draft
DDuongNguyen wants to merge 1 commit into 04-10-feat_logs_add_generic_eviction_framework_for_pattern_management from 04-10-feat_logs_add_pattern_clustering_tag_management_and_json_processor

DDuongNguyen commented Apr 10, 2026

What does this PR do?

Adds the pattern intelligence layer — clustering, tag management, and JSON processing:

  • Clustering (pkg/logs/patterns/clustering/): ClusterManager groups logs by structural similarity using tokenized signatures. Includes Pattern abstraction, pattern-level eviction (using the eviction framework from PR 3), and a merging/ sub-package for cluster consolidation
  • Tag management (pkg/logs/patterns/tags/): TagManager tracks per-tag pattern state so patterns can be scoped to log sources. Includes tag-level eviction to bound memory per source
  • JSON processor (pkg/logs/patterns/processor/): Extracts and processes JSON-structured logs before tokenization, handling nested fields and array normalization
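To illustrate the clustering idea, here is a minimal sketch of grouping logs by a tokenized signature. It is not the PR's `ClusterManager`: the names `signature`, `clusterIndex`, and `add` are hypothetical, and the tokenizer (any whitespace-separated field containing a digit becomes `<num>`) is a deliberately simple stand-in for the tokenization from PR 2.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// signature reduces a log line to a structural signature. Any field that
// contains a digit is treated as variable and replaced with "<num>".
// Assumption: this stands in for the real tokenizer, which is not shown here.
func signature(line string) string {
	fields := strings.Fields(line)
	for i, f := range fields {
		if strings.IndexFunc(f, unicode.IsDigit) >= 0 {
			fields[i] = "<num>"
		}
	}
	return strings.Join(fields, " ")
}

// clusterIndex is a hypothetical container that counts logs per signature.
type clusterIndex struct {
	counts map[string]int
}

func newClusterIndex() *clusterIndex {
	return &clusterIndex{counts: make(map[string]int)}
}

// add records one log line and returns the signature of its cluster.
func (c *clusterIndex) add(line string) string {
	sig := signature(line)
	c.counts[sig]++
	return sig
}

func main() {
	idx := newClusterIndex()
	idx.add("user 42 logged in from 10.0.0.1")
	idx.add("user 7 logged in from 10.0.0.9")
	idx.add("cache miss for key abc")
	for sig, n := range idx.counts {
		fmt.Printf("%d x %q\n", n, sig)
	}
}
```

The two login lines share one signature and so land in the same cluster, while the cache line forms its own. Scoping by tag, as the tag management bullet describes, could be layered on top by keeping one such index per tag.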

This is PR 4/6 in a stack. Depends on PR 3 (#49217) for the eviction framework. The gRPC sender (PR 5) consumes cluster/tag state for encoding decisions.

Motivation

This is the core intelligence of the stateful encoding feature: deciding which logs share patterns, tracking those patterns per source/tag, and evicting stale patterns to bound memory. These modules sit between raw tokenization (PR 2) and the network sender (PR 5).

Describe how you validated your changes

  • Unit tests for cluster formation, pattern matching, and merging
  • Tag manager tests covering lifecycle, eviction, and multi-tag scenarios
  • Pattern eviction manager tests verifying bounded growth
  • JSON processor tests for nested extraction and edge cases
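As a rough picture of what a nested-extraction case might exercise, the sketch below flattens a JSON log line into dotted keys. It is an assumption-laden illustration, not the code in `processor/json.go`: the function names `flatten` and `flattenJSON` are hypothetical, and collapsing array indices into a single `[]` key is only one possible reading of "array normalization".

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// flatten walks a decoded JSON value and appends leaf values to out under
// dotted keys. All elements of an array share the key "prefix[]", so the
// key set does not depend on array length (an assumed normalization rule).
func flatten(prefix string, v any, out map[string][]any) {
	switch t := v.(type) {
	case map[string]any:
		for k, child := range t {
			key := k
			if prefix != "" {
				key = prefix + "." + k
			}
			flatten(key, child, out)
		}
	case []any:
		for _, child := range t {
			flatten(prefix+"[]", child, out)
		}
	default:
		out[prefix] = append(out[prefix], t)
	}
}

// flattenJSON returns false when the line is not a JSON object, so the
// caller can fall back to treating it as plain text.
func flattenJSON(line string) (map[string][]any, bool) {
	var root map[string]any
	if err := json.Unmarshal([]byte(line), &root); err != nil || root == nil {
		return nil, false
	}
	out := make(map[string][]any)
	flatten("", root, out)
	return out, true
}

func main() {
	line := `{"msg":"hi","http":{"status":200,"tags":["a","b"]}}`
	fields, ok := flattenJSON(line)
	if !ok {
		fmt.Println("not a JSON object")
		return
	}
	keys := make([]string, 0, len(fields))
	for k := range fields {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, fields[k])
	}
}
```

Edge cases worth covering in this style include non-JSON input, a top-level `null`, and empty arrays, all of which the sketch handles without panicking.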

How to Review this PR

  1. clustering/cluster_manager.go — the main entry point for pattern grouping
  2. clustering/pattern.go + pattern_eviction.go — how patterns are tracked and evicted
  3. tags/tag_manager.go — per-source pattern scoping
  4. processor/json.go — JSON log preprocessing
  5. merging/merging.go — cluster consolidation logic

Additional Notes

~6,000 lines including tests. The clustering and tag modules both implement the Evictable interface from PR 3, demonstrating the framework's reuse.
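The kind of reuse described here can be sketched as follows. The actual `Evictable` interface lives in PR 3 and is not shown on this page, so the `evictable` interface, its two methods, `enforceLimit`, and both store types below are hypothetical placeholders chosen only to show one eviction loop serving two different containers.

```go
package main

import "fmt"

// evictable is a hypothetical stand-in for the interface from PR 3.
type evictable interface {
	size() int
	evictOldest() (string, bool) // evicted key, false if nothing to evict
}

// patternStore tracks the logical tick at which each pattern was last seen.
type patternStore struct {
	lastSeen map[string]int
}

func (p *patternStore) size() int { return len(p.lastSeen) }

// evictOldest removes the pattern with the smallest tick; ties are broken
// by key so the result is deterministic despite random map iteration.
func (p *patternStore) evictOldest() (string, bool) {
	oldest, found := "", false
	for k, seen := range p.lastSeen {
		if !found || seen < p.lastSeen[oldest] ||
			(seen == p.lastSeen[oldest] && k < oldest) {
			oldest, found = k, true
		}
	}
	if found {
		delete(p.lastSeen, oldest)
	}
	return oldest, found
}

// tagStore keeps tags in insertion order, oldest first.
type tagStore struct {
	order []string
}

func (t *tagStore) size() int { return len(t.order) }

func (t *tagStore) evictOldest() (string, bool) {
	if len(t.order) == 0 {
		return "", false
	}
	oldest := t.order[0]
	t.order = t.order[1:]
	return oldest, true
}

// enforceLimit evicts until the container holds at most max entries and
// returns the evicted keys in eviction order.
func enforceLimit(e evictable, max int) []string {
	var evicted []string
	for e.size() > max {
		key, ok := e.evictOldest()
		if !ok {
			break
		}
		evicted = append(evicted, key)
	}
	return evicted
}

func main() {
	patterns := &patternStore{lastSeen: map[string]int{"a": 3, "b": 1, "c": 2}}
	tags := &tagStore{order: []string{"service:web", "service:db"}}
	fmt.Println(enforceLimit(patterns, 1))
	fmt.Println(enforceLimit(tags, 1))
}
```

Because the eviction loop only depends on the interface, the pattern-level and tag-level bounds can share one policy implementation while each container decides for itself what "oldest" means.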


DDuongNguyen force-pushed the 04-10-feat_logs_add_pattern_clustering_tag_management_and_json_processor branch from f2d7858 to d0fc212 on April 10, 2026 22:06
DDuongNguyen force-pushed the 04-10-feat_logs_add_generic_eviction_framework_for_pattern_management branch from a46a9dd to 82ac7b1 on April 10, 2026 22:06
DDuongNguyen force-pushed the 04-10-feat_logs_add_generic_eviction_framework_for_pattern_management branch from 82ac7b1 to e4890f9 on April 14, 2026 16:34
DDuongNguyen force-pushed the 04-10-feat_logs_add_pattern_clustering_tag_management_and_json_processor branch from d0fc212 to 27da999 on April 14, 2026 16:34