[PROF-9476] Add experimental profiling managed string storage #725
Conversation
Benchmarks
Benchmark execution time: 2025-01-20 11:57:04. Comparing candidate commit 022e6d8 in PR branch: found 0 performance improvements and 0 performance regressions! Performance is the same for 52 metrics, 2 unstable metrics. (Candidate and baseline benchmark details omitted due to size.)
```rust
#[must_use]
#[no_mangle]
/// TODO: @ivoanjo Should this take a `*mut ManagedStringStorage` like Profile APIs do?
pub unsafe extern "C" fn ddog_prof_ManagedStringStorage_advance_gen(
```
What does this function do?
It's not clear from the name what "advance gen" means. Does it mean increasing the reference counter/generation number so it's not collected?
I would appreciate some more exposition here too. This basically seems like a re-implementation of reference-counted strings? Do we not use those std lib types for some specific reason? Edit: I mean internally. Obviously Rust "references" and lifetimes cannot be accurately tracked across FFI. Maybe they just provide the requisite API?
This is an excellent question. Quoting Alex's original notes:
String storage cleanup in the PoC is based on usage counts. The initial implementation was not based on usage counts but on `last_usage_gen`: if we didn't use a string while building the current profile, we can drop it for the next profile. This led to crashes when we implemented the optimization to not report objects with age == 0, though (the interned strings associated with those objects would only be used after a GC, but if a profile flush occurred in between the two events, the interned strings would become invalidated).
A trivial change to that would have been to only clean up when unused for x generations. But this felt brittle (what if we have a use case where we'll skip objects with age < 10?) and memory wasteful.
So yeah the original intent was to automatically clean up unused strings based on "was this used in the last profile or not".
This (almost) matches really well with heap profiling, because heap profiling is all about repeating a sample on every profile until the object gets collected, so this was a somewhat natural fit.
But, as Alex pointed out, this becomes a bit thornier as we have an optimization on the Ruby heap profiler to not report objects that haven't at least survived a single GC cycle.
Thus, in the current version, this is purely done based on the caller doing all the tracking work, rather than the generational approach.
I'd say the code is a bit... weird right now because it's not very confident about what to do here. So here are my questions for y'all:
- Does the purely reference-counted mechanism work for you?
- Would the usage-in-generation one work for you?
I'm thinking we could even easily support both (as a setting when the managed string table gets constructed), but I don't want to get too feature-happy if nobody's interested in it yet.
(I think once we settle this, it does make sense to re-examine if we can clean up the code as much as possible, as Levi pointed out -- I'm happy to have as little custom fancy stuff as possible)
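To make the two policies concrete, here's a tiny self-contained model of the semantics being discussed (illustration only, not libdatadog code):

```c
// Tiny model of the two cleanup policies, for a single string.
#include <stdbool.h>
#include <stdio.h>

static int usage_count = 0;         // reference-counted policy state
static bool used_this_gen = false;  // generation-based policy state

static void intern(void)   { usage_count++; used_this_gen = true; }
static void unintern(void) { usage_count--; }

static void advance_gen(void) {
  // Reference-counted: drop only once nobody holds a reference.
  if (usage_count == 0) printf("refcount policy: dropped\n");
  // Generation-based: drop anything not used while building this profile.
  // This is what invalidated ids for age == 0 objects in the PoC.
  if (!used_this_gen) printf("generation policy: dropped\n");
  used_this_gen = false;
}

int main(void) {
  intern();       // object allocated, its stack's strings interned
  advance_gen();  // profile 1 flushed: both policies keep the string
  advance_gen();  // profile 2 flushed: generation policy drops it too early,
                  // even though the object is alive and its id is still needed
  unintern();     // object finally collected
  advance_gen();  // refcount policy drops it here, safely
  return 0;
}
```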
Forgot to comment sooner, but I tried out using this in the Python profiler. Commit here, very WIP because I was trying out other approaches as well at the time. The biggest issue I ran into was around forking. If the program forks while the lock is held, the child process can end up stuck on it.
Yeap, this is a good point. I think even without the lock, it's probably not a great idea to do anything other than throw away the table after the fork. (It would, on the other hand, be cool to have a lock-free implementation for this, but I fear the cost would probably not be great.) Would having a way to clear the table ignoring the lock work as a temporary solution for Python? How do y'all handle this issue for the regular profile data structures? Reset on fork as well?
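For what it's worth, a caller-side reset-on-fork could look roughly like this sketch (the `string_storage` type and `create_storage` helper are stand-ins, not libdatadog APIs):

```c
#include <pthread.h>
#include <stddef.h>

typedef struct string_storage string_storage; // stand-in for the FFI handle
extern string_storage *create_storage(void);  // hypothetical helper

static string_storage *g_storage = NULL;

static void storage_after_fork_in_child(void) {
  // The table (and its lock) inherited from the parent may be in an
  // arbitrary state, so don't touch it: abandon it and start fresh.
  // All previously interned ids must be treated as invalid from here on.
  g_storage = create_storage();
}

static void storage_init(void) {
  g_storage = create_storage();
  pthread_atfork(NULL, NULL, storage_after_fork_in_child);
}
```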
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

```
@@ Coverage Diff @@
##             main     #725      +/-   ##
==========================================
- Coverage   71.28%   70.72%   -0.57%
==========================================
  Files         319      321       +2
  Lines       46871    47300     +429
==========================================
+ Hits        33414    33453      +39
- Misses      13457    13847     +390
```
Would everyone agree with merging this?
Yeah. I followed up on this with @ivoanjo on Slack but forgot to update here. It turns out we don't need this right now for Python, so nothing about this needs to change on our account :)
```rust
}

#[no_mangle]
/// TODO: @ivoanjo Should this take a `*mut ManagedStringStorage` like Profile APIs do?
```
Ugh, this is something even in C I go back and forth on. It's one of those "do I trust the user or do I be more defensive?" things. Setting the C pointer to null makes it easier to debug when things go wrong, and can sometimes even prevent further things from going wrong. Sometimes it also makes it harder to debug because you don't get a use-after-free warning from ASAN, so that can swing both ways. But it's theoretically wholly wasted work because nobody should use the thing after it's been dropped...
Yeah, the mut option I think is nice since it means we're returning an error back on API calls made wrongly, and since the client should handle those anyway, it means we're turning something that's definitively a bug into a nice error message.
I still don't have the "answer" to this one. All I can say is that I made the others re-assign the pointer null on drop when it was feasible, so I clearly thought at the time that it was a good idea.
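For reference, the pattern being debated, sketched in C with hypothetical names (`handle_drop` stands in for `ddog_prof_ManagedStringStorage_drop`, whose exact signature may differ):

```c
#include <stddef.h>

typedef struct handle handle;       // stand-in for an opaque FFI handle
extern void handle_drop(handle *h); // stand-in for the real drop function

// Taking handle** lets drop null out the caller's pointer, so a later
// use-after-drop shows up as an easy-to-spot NULL (or a clean error from
// the API) instead of undefined behavior. The flip side, as noted above:
// ASAN can no longer flag it as a use-after-free.
static void drop_and_clear(handle **h) {
  if (h == NULL || *h == NULL) return;
  handle_drop(*h);
  *h = NULL;
}
```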
```rust
#[must_use]
#[no_mangle]
/// TODO: Consider having a variant of intern (and unintern?) that takes an array as input, instead
```
Do you have a use case for this in your PoC for Ruby? If so, I'd do it, and if not, I'd pass.
I do! We're literally interning in a loop when we need to consume a whole stack. (Note: `intern_or_raise` here is just a nice helper to call `ddog_prof_ManagedStringStorage_intern` and check if there's an error in the result.)
```c
heap_stack* heap_stack_new(heap_recorder *recorder, ddog_prof_Slice_Location locations) {
  uint16_t frames_len = locations.len;
  // ...some error checking...
  heap_stack *stack = ruby_xcalloc(1, sizeof(heap_stack) + frames_len * sizeof(heap_frame));
  stack->frames_len = frames_len;
  for (uint16_t i = 0; i < stack->frames_len; i++) {
    const ddog_prof_Location *location = &locations.ptr[i];
    stack->frames[i] = (heap_frame) {
      .name = intern_or_raise(recorder->string_storage, location->function.name),
      .filename = intern_or_raise(recorder->string_storage, location->function.filename),
      // The line in ddog_prof_Location is an int64_t. We don't expect to have to profile
      // files with more than 2M lines so this cast should be fairly safe?
      .line = (int32_t) location->line,
    };
  }
  return stack;
}
```

(From my working branch, which is based off of DataDog/dd-trace-rb#3628.)
My thinking is that, unlike most other libdatadog APIs, which either a) are very small but not called very often (e.g. setup and reporting), or b) do a big chunk of work on every call (profile add), this API c) does very little work and gets called many times.
Thus, it seems like a prime candidate to turn c) into b) by having a more coarse-grained call that lowers the overhead cost of the FFI and locking.
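To sketch what that coarse-grained variant could look like -- everything below is hypothetical; this PR only ships the one-string-at-a-time `ddog_prof_ManagedStringStorage_intern`:

```c
#include <stdint.h>

// Hypothetical types, loosely modeled on the existing FFI conventions.
typedef struct { const char *ptr; uintptr_t len; } charslice;
typedef struct { const charslice *ptr; uintptr_t len; } slice_charslice;
typedef struct { uint32_t value; } managed_string_id;
typedef struct string_storage string_storage;

// One FFI call (and one lock acquisition) interns a whole stack's worth
// of strings, writing one id per input string into a caller-provided
// output array. Returns 0 on success, non-zero on error.
extern int managed_string_storage_intern_all(
    string_storage *storage,
    slice_charslice strings,
    managed_string_id *ids_out);
```

With something like that, the interning loop in `heap_stack_new` above would collapse into a single call that fills an array of ids.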
```rust
cached_seq_num_for: Cell<Option<*const StringTable>>,
cached_seq_num: Cell<Option<StringId>>,
```
We are storing the cache on each string, but aren't these all added in a batch to the same string table? I think you said that you add all these managed strings to the Profile's string table just before serialization, right? Couldn't we perform a larger batch operation and store the cache there? That way the memory is only used on serialization rather than kept around but largely not being used.
As I was looking at this, I did a small tweak to store these as a tuple, rather than as separate entries, as they're related anyway -- 1f2b953.
Your suggestion is interesting, but I'm curious how far were you thinking about the "large batch operation". In particular, were you thinking of moving the cache entirely away from the string table, to the profile? Or even to the caller of the profile?
morrisonlevi left a comment:
Not quite finished but publishing what I have.
morrisonlevi left a comment:
Is this still considered "experimental"? What's the rough plan to either revert it or make it non-experimental? Asking mostly because it does intrude, albeit only a little, onto the FFI structs of the "main" thing.
Approved in general, I've blocked this long enough I think ^_^
Good question. I guess I'm calling it "experimental" because it's a feature we may end up iterating on a lot, including breaking APIs and whatnot, so if anyone's building on top of this, I was trying to get that across. Since the early benchmarks we had for Ruby showed good results for this, I wouldn't expect this to be entirely reverted; at most maybe we could throw it in a Ruby-specific folder if it turns out this use-case is too specific and nobody else needs it.
My thinking is that it's still harmless, even if ffi callers don't properly zero out those fields, for two reasons:
To be honest I'm somewhat more annoyed about the duplication in the profile implementation. But there, because I think the right solution is to have separate structures (thus enforcing that you either have ids OR strings), the only way of avoiding the duplication would be to introduce a bunch of macros to "hide" the fact that we'd have two versions of the code with only very minor differences. (At least given the current API...)
I'd say:
cc @danielsn any concerns with me going ahead and merging the PR?
```rust
/// TODO: Consider having a variant of intern (and unintern?) that takes an array as input, instead
/// of just a single string at a time.
/// TODO: @ivoanjo Should this take a `*mut ManagedStringStorage` like Profile APIs do?
pub unsafe extern "C" fn ddog_prof_ManagedStringStorage_intern(
```
Should these have `# Safety` comments?
I'm not sure they'd be especially helpful here.
In particular, the only comment I can think of is "the charslice needs to be valid or empty, and the managed string storage needs to be valid or null", which... IDK... seems to describe every function in our ffi that takes a pointer? 👀
This will later allow us to introduce the new StringId code without disturbing existing API clients. This duplication is intended to be only an intermediate step to introducing support for StringId.
…called with id 0. This is much nicer than having a weird panic in there.
…safer. With the current structure of the code, the `expect` inside `resolve` should never fail; hopefully we don't introduce a bug in the future that changes this. (I know that ideally in Rust we would represent such constraints in the type system, but I don't think I could do so without a lot of other changes, so I decided to go for the more self-contained solution for now.)
In particular, in the unlikely event that we would overflow the id, we signal an error back to the caller rather than impacting the application. The caller is expected to stop using this string table and create a new one instead. In the future we may make it easy to do so, by e.g. having an API to create a new table from the existing strings or something like that.
This will enable us to propagate failures when a `ManagedStringId` is not known, which will improve debuggability and code quality by allowing us to signal the error.
This string is supposed to live for as long as the managed string storage does. Treating it specially in intern matches what we do in other functions and ensures that we can never overflow the reference count (or something weird like that).
Adding it as `pub` was an oversight, since the intent is for this to be an inner helper that's only used by `intern` and by `new`. Having this as `pub` is quite dangerous as this method can easily be used to break a lot of the assumptions for the string storage.
There's currently nothing that can fail in this conversion, so let's take advantage of this in the code. (The `TryFrom` was somewhat of a leftover from copy/pasting these conversion functions from the variants that did need to deal with Strings, but in the id variants we can simplify).
This should be safer than the existing helper, since according to Levi it doesn't rely on the string being null-terminated (and I'm guessing doesn't need to measure it either).
I spotted during code review that this was incorrect -- `unintern` is a mutable operation on the managed string table (it decreases the refcount of items) so the write lock must be used for it.
Not sure if this was there from the beginning or the result of successive refactors, but yeah, we were unpacking and repacking the `ManagedStringId`s uselessly (the input and output types were the same!). I'm pretty sure the compiler would optimize all of this away -- in the end this is a struct with an int -- but our code sure does look better.
…ile_with_string_storage`
I think the commit history for this PR is worth preserving. I'm preparing to merge this to master with a regular merge commit (not with a squash). As part of it, I'll force-push a rebase so that there are no more "merge from master" commits.
(Force-pushed 8de5289 to 022e6d8.)
What does this PR do?
This PR builds on the work started by @AlexJF on #607 to introduce "managed string storage" for profiling.
The idea is to introduce another level of string storage for profiling that is decoupled in lifetime from individual profiles, and that is managed by the libdatadog client.
At its core, managed string storage provides a hashtable that stores strings and returns ids. These ids can then be provided to libdatadog instead of `CharSlice`s when recording profiling samples.

For FFI users, this PR adds the following APIs to manage strings:

- `ddog_prof_ManagedStringStorage_new`
- `ddog_prof_ManagedStringStorage_intern(String)`
- `ddog_prof_ManagedStringStorage_unintern(id)`
- `ddog_prof_ManagedStringStorage_advance_gen`
- `ddog_prof_ManagedStringStorage_drop`
- `ddog_prof_ManagedStringStorage_get_string`

A key detail of the current implementation is that each `intern` call with the same string will increase an internal usage counter, and each `unintern` call will reduce that counter. Then, at `advance_gen` time, if the counter is zero, we get rid of the string.

To interact with profiles, there's a new `ddog_prof_Profile_with_string_storage` API to create a profile with a given `ManagedStringStorage`, and all structures that make up a `Sample` (`Mapping`, `Function`, `Label`, etc.) have been extended so that they take either a `CharSlice` or a `ManagedStringId`.

Thus, after interning all strings for a sample, it's possible to add a sample to a profile entirely by referencing strings by ids, rather than `CharSlice`s.
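As a rough illustration of that flow from C (sketch only: the `storage_*` wrappers and simplified types below stand in for the real generated FFI declarations, which carry proper result/error types):

```c
#include <stdint.h>

// Simplified, assumed declarations -- the real ones come from the
// generated libdatadog headers.
typedef struct { const char *ptr; uintptr_t len; } ddog_CharSlice;
typedef struct { uint32_t value; } ddog_prof_ManagedStringId;
typedef struct ddog_prof_ManagedStringStorage ddog_prof_ManagedStringStorage;

extern ddog_prof_ManagedStringStorage *storage_new(void); // stand-in for _new
extern ddog_prof_ManagedStringId storage_intern(          // stand-in for _intern
    ddog_prof_ManagedStringStorage *s, ddog_CharSlice str);
extern void storage_unintern(                             // stand-in for _unintern
    ddog_prof_ManagedStringStorage *s, ddog_prof_ManagedStringId id);
extern void storage_advance_gen(ddog_prof_ManagedStringStorage *s); // _advance_gen

void example(void) {
  ddog_prof_ManagedStringStorage *storage = storage_new();
  // A profile created via ddog_prof_Profile_with_string_storage can
  // resolve ids from this storage when samples are added and serialized.

  ddog_CharSlice name = { "my_function", 11 };
  ddog_prof_ManagedStringId id = storage_intern(storage, name); // count: 1
  // ...reference `id` in samples, across many profiles, while needed...

  storage_unintern(storage, id); // count: 0
  storage_advance_gen(storage);  // zero-count strings get dropped here
}
```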
Motivation
The initial use-case is to support heap profiling -- "samples" related to heap profiling usually live across multiple profiles (as long as a given object is alive) and so this data must be kept somewhere.
Previously for Ruby we were keeping this on the Ruby profiler side, but having libdatadog manage this instead presents a few optimization opportunities.
We also hope to replace a few other "string tables" that other profilers had to build outside of libdatadog for similar use-cases.
This topic was also discussed in the following two documents (Datadog-only, sorry!):
Additional Notes
In keeping with the experimental nature of this feature, I've tried really hard to not disturb existing profiling API users with the new changes.
That is -- I was going for, if you're not using managed string storage, you should NOT be affected AT ALL by it -- be it API changes or overhead.
(This is why on the pure-Rust profiling crate side, I ended up duplicating a bunch of structures and functions. I couldn't think of a great way to not disturb existing API users other than introducing alternative methods, but to be honest the duplication is all in very simple methods so I don't think this substantially increases complexity/maintenance vs trying to be smarter to bend Rust to our will.)
There's probably a lot of improvements we can make, but with this PR I'm hoping to have something in a close to "good enough" state, that we can merge this in and then start iterating on master, rather than have this continue living in a branch for a lot longer.
This doesn't mean we shouldn't fix or improve things before merging, but I'll be trying to identify what needs to go in now and what can go in as separate, follow-up PRs.
As an addendum, there's still a bunch of `expect`s sprinkled around that should be turned into proper errors. I plan to do a pass on all of those. (But again, none of the panics affect existing code, so they're harmless and inert unless you're experimenting with the new APIs.)

How to test the change?
The branch in https://github.com/DataDog/dd-trace-rb/tree/ivoanjo/prof-9476-managed-string-storage-try2 is where I'm testing the changes on the Ruby profiler side.
It may not be entirely up-to-date with the latest ffi changes on the libdatadog side (I've been prettying up the API), but it shows how to use this concept, while passing all the profiling unit/integration tests, and has shown improvements in memory and latency in the reliability environment.