SQL obfuscation with go-sqllexer pkg #19952

Merged
merged 17 commits into from Oct 16, 2023

Contributor

@lu-zhengda lu-zhengda commented Oct 4, 2023

What does this PR do?

This PR imports the new go-sqllexer package into the SQL obfuscate package, which allows us to:

  • Use a shared SQL obfuscation & normalization package across agent and backend
  • Move normalization to the backend in the future for DBM integrations
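
To make the obfuscation step concrete, here is a minimal, self-contained sketch of replacing literals in a query with placeholders. It is a toy stand-in, not the go-sqllexer implementation or its API: the function name `obfuscateLiterals` and its rules are assumptions for illustration only, and it ignores comments, dollar-quoted strings, exponent notation, and dialect-specific syntax.

```go
package main

import (
	"fmt"
	"strings"
)

// isDigit reports whether c is an ASCII digit.
func isDigit(c byte) bool { return c >= '0' && c <= '9' }

// isIdentChar reports whether c can appear inside a simple SQL identifier.
func isIdentChar(c byte) bool {
	return c == '_' || isDigit(c) || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// obfuscateLiterals is a hypothetical helper (not part of go-sqllexer).
// It replaces single-quoted string literals and standalone numeric
// literals with "?" and copies everything else through unchanged.
func obfuscateLiterals(sql string) string {
	var out strings.Builder
	i := 0
	for i < len(sql) {
		c := sql[i]
		switch {
		case c == '\'':
			// Skip to the closing quote; a doubled quote ('') is an escape.
			i++
			for i < len(sql) {
				if sql[i] == '\'' {
					if i+1 < len(sql) && sql[i+1] == '\'' {
						i += 2
						continue
					}
					break
				}
				i++
			}
			i++ // step past the closing quote (or past the end if unterminated)
			out.WriteByte('?')
		case isDigit(c) && (i == 0 || !isIdentChar(sql[i-1])):
			// A number that is not the tail of an identifier such as "t1".
			for i < len(sql) && (isDigit(sql[i]) || sql[i] == '.') {
				i++
			}
			out.WriteByte('?')
		default:
			out.WriteByte(c)
			i++
		}
	}
	return out.String()
}

func main() {
	q := "SELECT * FROM users WHERE id = 42 AND name = 'O''Brien'"
	fmt.Println(obfuscateLiterals(q))
	// prints: SELECT * FROM users WHERE id = ? AND name = ?
}
```

A real lexer-based obfuscator tokenizes the query first and handles many more cases; the sketch only shows the kind of input/output transformation being shared between agent and backend.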

Motivation

The motivation behind this PR is to keep improving the stability of SQL obfuscation & normalization, and to give us the ability to run normalization in the backend in the future.

Additional Notes

Possible Drawbacks / Trade-offs

Describe how to test/QA your changes

Reviewer's Checklist

  • If known, an appropriate milestone has been selected; otherwise the Triage milestone is set.
  • Use the major_change label if your change either has a major impact on the code base, impacts multiple teams, or changes important well-established internals of the Agent. This label will be used during QA to make sure each team pays extra attention to the changed behavior. For any customer-facing change, use a releasenote.
  • A release note has been added or the changelog/no-changelog label has been applied.
  • Changed code has automated tests for its functionality.
  • Adequate QA/testing plan information is provided if the qa/skip-qa label is not applied.
  • At least one team/.. label has been applied, indicating the team(s) that should QA this change.
  • If applicable, docs team has been notified or an issue has been opened on the documentation repo.
  • If applicable, the need-change/operator and need-change/helm labels have been applied.
  • If applicable, the k8s/<min-version> label has been applied, indicating the lowest Kubernetes version compatible with this feature.
  • If applicable, the config template has been updated.

@cit-pr-commenter

cit-pr-commenter bot commented Oct 4, 2023

Go Package Import Differences

Baseline: 055497d
Comparison: 90ec667

binary              os       arch   change
agent               linux    amd64  +1, -0  +github.com/DataDog/go-sqllexer
agent               linux    arm64  +1, -0  +github.com/DataDog/go-sqllexer
agent               windows  amd64  +1, -0  +github.com/DataDog/go-sqllexer
agent               windows  386    +1, -0  +github.com/DataDog/go-sqllexer
agent               darwin   amd64  +1, -0  +github.com/DataDog/go-sqllexer
agent               darwin   arm64  +1, -0  +github.com/DataDog/go-sqllexer
heroku-agent        linux    amd64  +1, -0  +github.com/DataDog/go-sqllexer
serverless          linux    amd64  +1, -0  +github.com/DataDog/go-sqllexer
serverless          linux    arm64  +1, -0  +github.com/DataDog/go-sqllexer
trace-agent         linux    amd64  +1, -0  +github.com/DataDog/go-sqllexer
trace-agent         linux    arm64  +1, -0  +github.com/DataDog/go-sqllexer
trace-agent         windows  amd64  +1, -0  +github.com/DataDog/go-sqllexer
trace-agent         windows  386    +1, -0  +github.com/DataDog/go-sqllexer
trace-agent         darwin   amd64  +1, -0  +github.com/DataDog/go-sqllexer
trace-agent         darwin   arm64  +1, -0  +github.com/DataDog/go-sqllexer
heroku-trace-agent  linux    amd64  +1, -0  +github.com/DataDog/go-sqllexer

@lu-zhengda lu-zhengda added this to the 7.50.0 milestone Oct 10, 2023
@lu-zhengda lu-zhengda marked this pull request as ready for review October 10, 2023 18:14
@lu-zhengda lu-zhengda requested review from a team as code owners October 10, 2023 18:14
@lu-zhengda lu-zhengda requested a review from a team October 10, 2023 18:14
Contributor

@alexandre-normand alexandre-normand left a comment

Exciting to see this change! I just have the one question about the configuration (two fields or just one?).

Contributor

@aliciascott aliciascott left a comment

Looks good for docs.

Contributor

@ajgajg1134 ajgajg1134 left a comment

Overall looks like this is a great idea to reduce duplication! A few comments on testing/perf etc.

Is there a plan or timeline to move fully to this version of obfuscation and delete the duplicate code that exists here? It wouldn't be great to support two different obfuscation paths for SQL without a plan to fully delete one of them.

Contributor

Are there tests that verify this new version of obfuscation behaves the same as the previous version? Since this obfuscation happens on resource names, any change to the output might break customer monitors, dashboards, etc., so it's nice to make sure we aren't changing things.

Contributor

Also, are there any benchmark comparisons of the performance differences here? I've seen SQL obfuscation take up a pretty significant portion of CPU profiles from trace-agents handling lots of SQL applications.

Contributor Author

A performance benchmark of the current obfuscator vs. the new one was done in DataDog/go-sqllexer#7.
The comparison, extracted from that PR:

  • go-sqllexer
pkg: github.com/DataDog/go-sqllexer
BenchmarkObfuscator/Escaping/512-10         	  282336	      4441 ns/op	    6624 B/op	      14 allocs/op
BenchmarkObfuscator/Grouping/199-10         	  489855	      2407 ns/op	    3296 B/op	      12 allocs/op
BenchmarkObfuscator/Large/3694-10           	   26586	     48246 ns/op	   86240 B/op	      23 allocs/op
BenchmarkObfuscator/Complex/969-10          	  111474	     11046 ns/op	   15584 B/op	      18 allocs/op
BenchmarkObfuscator/SuperLarge/4198-10      	   21439	     55041 ns/op	   95712 B/op	      25 allocs/op
  • datadog-agent/pkg/obfuscate
pkg: github.com/DataDog/datadog-agent/pkg/obfuscate
BenchmarkObfuscateSQLString/Escaping/512-10         	  175992	      7701 ns/op	    1336 B/op	       7 allocs/op
BenchmarkObfuscateSQLString/Grouping/199-10         	  343860	      4010 ns/op	     648 B/op	       7 allocs/op
BenchmarkObfuscateSQLString/Large/3694-10           	   16992	     67959 ns/op	   11416 B/op	       7 allocs/op
BenchmarkObfuscateSQLString/Complex/969-10          	   59134	     21882 ns/op	    3096 B/op	       7 allocs/op
BenchmarkObfuscateSQLString/SuperLarge/4198-10      	   15924	     78368 ns/op	   14744 B/op	       7 allocs/op
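
As a reading aid for the two result sets above, the following sketch computes per-case ratios from the posted numbers only. The type and function names (`result`, `speedup`, `memRatio`) are assumptions made for this example; the ns/op and B/op figures are copied from the benchmark output above.

```go
package main

import "fmt"

// result holds one benchmark line: time and bytes allocated per operation.
type result struct {
	name       string
	nsPerOp    float64
	bytesPerOp float64
}

// Numbers copied from the go-sqllexer benchmark output above.
var newLexer = []result{
	{"Escaping", 4441, 6624},
	{"Grouping", 2407, 3296},
	{"Large", 48246, 86240},
	{"Complex", 11046, 15584},
	{"SuperLarge", 55041, 95712},
}

// Numbers copied from the datadog-agent/pkg/obfuscate benchmark output above.
var oldObfuscator = []result{
	{"Escaping", 7701, 1336},
	{"Grouping", 4010, 648},
	{"Large", 67959, 11416},
	{"Complex", 21882, 3096},
	{"SuperLarge", 78368, 14744},
}

// speedup is how many times faster the new implementation is per operation.
func speedup(oldR, newR result) float64 { return oldR.nsPerOp / newR.nsPerOp }

// memRatio is how many times more bytes per operation the new one allocates.
func memRatio(oldR, newR result) float64 { return newR.bytesPerOp / oldR.bytesPerOp }

func main() {
	for i := range newLexer {
		o, n := oldObfuscator[i], newLexer[i]
		fmt.Printf("%-10s  %.2fx faster  %.2fx bytes/op\n",
			n.name, speedup(o, n), memRatio(o, n))
	}
}
```

On these figures the new lexer is roughly 1.4x to 2.0x faster per operation, while allocating roughly 5x to 7.6x more bytes per operation.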

Contributor Author

Regarding the plan for moving to this obfuscation, we will first start with DBM in the postgres check. That way we can scope the change to a single integration check to verify the obfuscator. If things look good, we will roll it out to all DBM checks and the tracer, and remove the old version of the obfuscation.

Contributor Author

Added a unit test to verify the output stays the same between the new obfuscation and the existing obfuscation implementation.
https://github.com/DataDog/datadog-agent/pull/19952/files#diff-227b097ab9ca27e34a8da14d557f169e46b7fcda5b199462023d5a5befeb5c55R2323
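
For readers who do not follow the link, the general shape of such a parity check can be sketched as below. This is not the test added in the PR: `compareObfuscators`, `legacyStandIn`, and `candidateStandIn` are hypothetical names, and the two regexp-based stand-ins exist only to give the harness something to compare (the candidate deliberately skips string literals so that a mismatch is reported).

```go
package main

import (
	"fmt"
	"regexp"
)

// obfuscateFunc is the shape both implementations are assumed to share here.
type obfuscateFunc func(query string) string

// mismatch records one query on which the two implementations disagree.
type mismatch struct {
	query, legacy, candidate string
}

// compareObfuscators runs both implementations over the same queries and
// returns every case where their outputs differ.
func compareObfuscators(queries []string, legacy, candidate obfuscateFunc) []mismatch {
	var out []mismatch
	for _, q := range queries {
		l, c := legacy(q), candidate(q)
		if l != c {
			out = append(out, mismatch{query: q, legacy: l, candidate: c})
		}
	}
	return out
}

var (
	numberRe = regexp.MustCompile(`\b\d+\b`)
	stringRe = regexp.MustCompile(`'[^']*'`)
)

// legacyStandIn replaces quoted strings and standalone numbers with "?".
func legacyStandIn(q string) string {
	return numberRe.ReplaceAllString(stringRe.ReplaceAllString(q, "?"), "?")
}

// candidateStandIn only replaces numbers, so it diverges on string literals.
func candidateStandIn(q string) string {
	return numberRe.ReplaceAllString(q, "?")
}

func main() {
	queries := []string{
		"SELECT * FROM users WHERE id = 7",
		"SELECT * FROM users WHERE name = 'bob'",
	}
	for _, m := range compareObfuscators(queries, legacyStandIn, candidateStandIn) {
		fmt.Printf("mismatch for %q:\n  legacy:    %s\n  candidate: %s\n",
			m.query, m.legacy, m.candidate)
	}
}
```

In a real test the list of mismatches would be expected to be empty, since any difference in output could change resource names that monitors and dashboards depend on.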

Contributor

Ah nice, thanks for attaching the output here! Looks like the performance is overall faster but uses a bit more memory which seems fine to me?

Contributor Author

Yes, the memory allocation is higher, but since this operation is mostly CPU-bound, we think a slight hit on memory in exchange for a somewhat faster runtime is a good tradeoff.

Contributor

@ajgajg1134 ajgajg1134 left a comment

Looks good from APM Agent!

@lu-zhengda lu-zhengda merged commit 7ef0b32 into main Oct 16, 2023
132 checks passed
@lu-zhengda lu-zhengda deleted the zhengda.lu/sql-obfuscation branch October 16, 2023 14:23
6 participants