Add defensive ddcommon_ffi checks for Slice#604
Conversation
It seems it is prudent to add checks here. At some point you need to be able to trust the FFI caller to uphold needed invariants, but it's a bit too easy to zero-initialize stuff in C and not fix it up for these Slices.
BenchmarksComparisonBenchmark execution time: 2024-09-06 22:04:50 Comparing candidate commit 63171ce in PR branch Found 0 performance improvements and 0 performance regressions! Performance is the same for 51 metrics, 2 unstable metrics. CandidateCandidate benchmark detailsGroup 1
Group 2
Group 3
Group 4
Group 5
Group 6
Group 7
Group 8
Group 9
Group 10
Group 11
Group 12
BaselineOmitted due to size. |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #604 +/- ##
==========================================
+ Coverage 72.60% 72.62% +0.02%
==========================================
Files 246 246
Lines 34971 35005 +34
==========================================
+ Hits 25390 25423 +33
- Misses 9581 9582 +1
|
|
I'm not a fan of this approach 😓. From our discussion on slack, the underlying assumption here is that we would catch all such crashes in development. And... I'm slightly suspicious that such an assumption would always hold. The penalty of getting the assumption wrong is blowing up the client application, which is quite high... 💥 My suggestion is to change this to reporting back an API failure: This would equally be caught in development (as we're supposed to check errors on all datadog API calls already!), and in the off chance we never trigger the exact set of circumstances that would cause this error to show up, the penalty in production is that the profiler would get an error, which again we should already be handling. (If I'll be honest, I'd actually prefer to go with option 3: Libdatadog should correctly handle A |
|
I experimented with getting rid of most of the existing API and replacing it with a Result return. % git diff --shortstat
22 files changed, 860 insertions(+), 507 deletions(-)I still have pages of errors to fix. I am okay if we go down this route, but it's not a good idea to do that without more discussion. It touches crash tracking, telemetry, profiling, and data pipeline. Maybe more, since I'm not finished. I can open this as a draft PR, if people want. |
|
Fair enough, I agree the error approach "sounds good" has a big ripple effect in a lot of code/APIs. How much do you dislike my suggestion of having the ffi code take care of handling nulls so the resulting API is more friendly to ffi callers? I don't mind taking it up myself and writing such a PR. If we have to, I believe we can live with this PR as-is but... I really do fear it's a really big sharp edge of the "oops you forgot you couldn't do this very natural/straightforward thing (from a caller POV) in this corner case and we blew up in production" kind 😅 |
The PR handles slices with null pointers when the length is zero. It panics when:
So it "handles" null pointers for the only case it could be reasonable. These checks will slow down the code, of course. |
ivoanjo
left a comment
There was a problem hiding this comment.
Ack, I think the current version is quite reasonable. 🙏
LGTM 👍
What does this PR do?
Adds defensive checks in
Sliceinddcommon_ffiand reroutes some code to use it.Motivation
It seems it is prudent to add checks here. At some point you need to be able to trust the FFI caller to uphold needed invariants, but it's a bit too easy to zero-initialize stuff in C and not fix it up for these Slices.
Additional Notes
@AlexJF had some of their time wasted due to not early detecting this mistake..
How to test the change?
Regular testing is fine. This accepts more valid inputs, but if you had invalid inputs for some edge cases, it might crash now.
1 . If you provide a null pointer but non-zero length will now panic.
2. If you provide a pointer that is not aligned correctly for the slice type, it will now panic.