New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SQL obfuscation with go-sqllexer pkg #19952
Conversation
Go Package Import DifferencesBaseline: 055497d
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exciting to see this change! I just have the one question about the configuration (two fields or just one?).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good for docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks like this is a great idea to reduce duplication! A few comments on testing/perf etc.
Is there a plan or timeline to move fully to this version of obfuscation and delete the duplicate code that exists here? It wouldn't be great to support two different obfuscation paths for SQL without a plan to fully delete one of them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there tests that verify this new version of obfuscation behaves the same as the previous version? Since this obfuscation happens on resource names, any change to the output might break customer monitors / dashboards etc so it's nice to make sure we aren't changing things
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also any benchmark comparisons on the performance differences here? I've seen SQL Obfuscation can be a pretty significant portion of cpu profiles from trace-agents with lots of SQL applications
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Performance benchmark between current obfuscator vs. new one has done here
DataDog/go-sqllexer#7.
Extract the comparison from the PR
go-sqllexer
pkg: github.com/DataDog/go-sqllexer
BenchmarkObfuscator/Escaping/512-10 282336 4441 ns/op 6624 B/op 14 allocs/op
BenchmarkObfuscator/Grouping/199-10 489855 2407 ns/op 3296 B/op 12 allocs/op
BenchmarkObfuscator/Large/3694-10 26586 48246 ns/op 86240 B/op 23 allocs/op
BenchmarkObfuscator/Complex/969-10 111474 11046 ns/op 15584 B/op 18 allocs/op
BenchmarkObfuscator/SuperLarge/4198-10 21439 55041 ns/op 95712 B/op 25 allocs/op
datadog-agent/pkg/obfuscate
pkg: github.com/DataDog/datadog-agent/pkg/obfuscate
BenchmarkObfuscateSQLString/Escaping/512-10 175992 7701 ns/op 1336 B/op 7 allocs/op
BenchmarkObfuscateSQLString/Grouping/199-10 343860 4010 ns/op 648 B/op 7 allocs/op
BenchmarkObfuscateSQLString/Large/3694-10 16992 67959 ns/op 11416 B/op 7 allocs/op
BenchmarkObfuscateSQLString/Complex/969-10 59134 21882 ns/op 3096 B/op 7 allocs/op
BenchmarkObfuscateSQLString/SuperLarge/4198-10 15924 78368 ns/op 14744 B/op 7 allocs/op
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding plan of moving to this obfuscation, we will first start with DBM in postgres check. That way we can scope the change to one of the integration check to verify the obfuscator. If things looking good, will rollout to all DBM checks, tracer and remove old version of the obfuscation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added unit test to verify the output stays between the new obfuscation and existing obfuscation implementation.
https://github.com/DataDog/datadog-agent/pull/19952/files#diff-227b097ab9ca27e34a8da14d557f169e46b7fcda5b199462023d5a5befeb5c55R2323
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah nice, thanks for attaching the output here! Looks like the performance is overall faster but uses a bit more memory which seems fine to me?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes the mem allocation is higher but since this operation is mostly CPU bounded, we think a slight hit on memory with a bit faster runtime is a good tradeoff.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good from APM Agent!
What does this PR do?
This PR imports new go-sqllexer package to the SQL obfuscate package, which allows us to
Motivation
The motivation behind this PR is to keep improving the SQL obfuscation & normalization stability, and also gives us the ability to run normalization to the backend in the future.
Additional Notes
Possible Drawbacks / Trade-offs
Describe how to test/QA your changes
Reviewer's Checklist
Triagemilestone is set.major_changelabel if your change either has a major impact on the code base, is impacting multiple teams or is changing important well-established internals of the Agent. This label will be use during QA to make sure each team pay extra attention to the changed behavior. For any customer facing change use a releasenote.changelog/no-changeloglabel has been applied.qa/skip-qalabel is not applied.team/..label has been applied, indicating the team(s) that should QA this change.need-change/operatorandneed-change/helmlabels have been applied.k8s/<min-version>label, indicating the lowest Kubernetes version compatible with this feature.