Helper agg init: remove copy of report share data. #3303

branlwyd · 2024-07-13T00:31:38Z

Previously, we copied report share data into the transaction callback (re-copying it on transaction retry). Notably, this required us to keep two copies of the output shares in memory. This was done to allow modification of the report aggregations based on information retrieved from the datastore, i.e. if the report was replayed.

Now, we remove the copy by passing an Arc of the relevant data into the transaction callback. Modification of the report aggregation is implemented via a Cow, which will be borrowed from the data in the Arc in the common case of no modification. The aggregation job writer is augmented with the ability to accept report aggregations already stored in a Cow, and is taught to be smart enough not to double-wrap Cow around its already-Cow'ed internal state.
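As a rough illustration of the pattern described above, here is a minimal sketch. It assumes a simplified, synchronous transaction runner that may invoke its callback more than once. `run_tx`, `ReportShareData`, `mark_replays`, and the `replayed` flag are hypothetical stand-ins, not Janus's actual types or APIs. Each attempt receives a clone of the `Arc`, which only bumps a reference count, and builds a `Vec<Cow<_>>` that borrows from the shared data unless a given entry needs to change.

```rust
use std::borrow::Cow;
use std::sync::Arc;

// Hypothetical stand-in for per-report data, including a (potentially large) output share.
#[derive(Clone, Debug, PartialEq)]
struct ReportShareData {
    report_id: u64,
    output_share: Vec<u8>,
    replayed: bool,
}

// Hypothetical transaction runner: calls `f` with the attempt number until it
// succeeds or `max_attempts` attempts have failed.
fn run_tx<T, E>(max_attempts: usize, mut f: impl FnMut(usize) -> Result<T, E>) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match f(attempt) {
            Ok(value) => return Ok(value),
            Err(err) if attempt + 1 >= max_attempts => return Err(err),
            Err(_) => attempt += 1,
        }
    }
}

// Builds one attempt's view of the shared data. Every entry starts out borrowed;
// `Cow::to_mut` clones an entry only when it actually needs to change.
fn mark_replays<'a>(data: &'a [ReportShareData], replayed_ids: &[u64]) -> Vec<Cow<'a, ReportShareData>> {
    data.iter()
        .map(|rsd| {
            let mut view = Cow::Borrowed(rsd);
            if replayed_ids.contains(&rsd.report_id) {
                view.to_mut().replayed = true;
            }
            view
        })
        .collect()
}

fn main() {
    let shared = Arc::new(vec![
        ReportShareData { report_id: 1, output_share: vec![0; 1024], replayed: false },
        ReportShareData { report_id: 2, output_share: vec![0; 1024], replayed: false },
    ]);
    let replayed_ids = [2u64];

    let result: Result<usize, &str> = run_tx(3, |attempt| {
        // Cloning the Arc only bumps a reference count; the Vec is not copied.
        let data = Arc::clone(&shared);
        let views = mark_replays(&data, &replayed_ids);
        if attempt == 0 {
            return Err("simulated transaction conflict");
        }
        // Count how many entries had to be cloned in this attempt.
        Ok(views.iter().filter(|v| matches!(v, Cow::Owned(_))).count())
    });

    assert_eq!(result, Ok(1));
    // The shared data is untouched, so every retry starts from the same state.
    assert!(shared.iter().all(|rsd| !rsd.replayed));
}
```

In the common case (no replays), every element stays `Cow::Borrowed`, so no report data is cloned on any attempt.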

Closes #3285.


inahga

Changes LGTM and I think they accomplish the goal.

Do you have memory profiles that show an improvement? A few LGTMs aren't necessarily evidence that it works. More broadly, it would be nice to document how we're profiling, if for no other reason than that I'm unfamiliar with Rust profiling strategies (but I don't think that should hold up this PR).

inahga · 2024-07-15T16:39:01Z

aggregator/src/aggregator/aggregation_job_writer.rs

@@ -507,7 +507,7 @@ where
                            aggregation_job: Cow::Borrowed(aggregation_job),
                            report_aggregations: report_aggregations
                                .iter()
-                                .map(Cow::Borrowed)
+                                .map(RA::borrow)


Check my understanding: at this point, report_aggregations is (roughly) a Vec<impl ReportAggregationUpdate>, which, concretely, could be a struct ReportAggregationMetadata or a Cow of it?

If so, a naive map(Cow::Borrowed) would double-wrap a Cow<'_, ReportAggregationMetadata> into a Cow<'_, Cow<'_, ReportAggregationMetadata>>, so we need a trait method to avoid this.

That's correct. The implementation of ReportAggregationUpdate over Cow exists to allow a Cow<WritableReportAggregation> (or a Cow of anything else that implements ReportAggregationUpdate) to be provided by the user of AggregationJobWriter, without needing to deal with Cow<Cow<T>> internally.
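The idea can be illustrated with a simplified trait. This is a sketch only: `CowView`, `borrow_view`, and `collect_views` are hypothetical names, and Janus's actual `ReportAggregationUpdate` trait and its `borrow` method may be shaped differently. The key point is that the `Cow` impl re-borrows the inner value instead of wrapping the `Cow` again, so callers always get a `Cow<'_, T>`, never a `Cow<'_, Cow<'_, T>>`.

```rust
use std::borrow::Cow;

#[derive(Clone, Debug, PartialEq)]
struct ReportAggregationMetadata {
    report_id: u64,
    ord: u64,
}

// Hypothetical trait: anything that can hand out a Cow view of an underlying value.
trait CowView {
    type Target: Clone;
    fn borrow_view(&self) -> Cow<'_, Self::Target>;
}

// A plain value is simply borrowed.
impl CowView for ReportAggregationMetadata {
    type Target = ReportAggregationMetadata;
    fn borrow_view(&self) -> Cow<'_, ReportAggregationMetadata> {
        Cow::Borrowed(self)
    }
}

// A Cow re-borrows its contents (owned or borrowed) instead of becoming Cow<Cow<T>>.
impl<'a, T: Clone> CowView for Cow<'a, T> {
    type Target = T;
    fn borrow_view(&self) -> Cow<'_, T> {
        Cow::Borrowed(&**self)
    }
}

// Analogous to `.iter().map(RA::borrow)` in the diff: the element type is the
// same whether the caller passed plain values or Cows.
fn collect_views<RA: CowView>(items: &[RA]) -> Vec<Cow<'_, RA::Target>> {
    items.iter().map(RA::borrow_view).collect()
}

fn main() {
    let meta = ReportAggregationMetadata { report_id: 1, ord: 0 };
    let cows = vec![Cow::Borrowed(&meta), Cow::Owned(meta.clone())];
    // The annotation compiles only because no double wrapping occurs.
    let views: Vec<Cow<'_, ReportAggregationMetadata>> = collect_views(&cows);
    println!("{views:?}");
}
```

An associated `Target` type is one way to let both impls report the same underlying element type; it keeps the writer generic over "plain value or already-Cow'ed value" without special-casing either.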

inahga · 2024-07-15T16:42:27Z

aggregator/src/aggregator.rs

@@ -2373,24 +2375,23 @@ impl VdafOps {
                    }

                    // Write report shares, and ensure this isn't a repeated report aggregation.
-                    try_join_all(report_share_data.iter_mut().map(|rsd| {
+                    let report_aggregations = try_join_all(report_share_data.iter().map(|rsd| {
                        let task = Arc::clone(&task);


Can we accomplish the goal of the PR by doing .iter_mut() and modifying the elements of ReportShareData in place? We would need to at least wrap it in a std::sync::Mutex for this to work, but it would avoid copying in all cases and save a bunch of code.

(Just checking whether you've considered it; it's probably a more involved change than I'm thinking.)

We can't modify report_share_data: it's in an Arc to avoid copying the ReportShareData inside the Vec, and Arc doesn't allow mutable references. Or, to look at it another way, we can't modify report_share_data because we might need the unmodified value if we retry the transaction callback.

(We used to be able to modify report_share_data, but that's because we were cloning the entire Vec & its contents with each transaction retry -- which is the copying we are trying to avoid with this PR.)
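The constraint is visible directly in the standard library API: `Arc` hands out only `&T`, and `Arc::get_mut` returns `None` whenever another clone of the `Arc` is alive (for example, the one kept around for a possible retry). `Arc::make_mut` does allow mutation, but only by cloning the shared data, which is exactly the copy this PR avoids. The `Vec<u64>` below is an arbitrary stand-in for `report_share_data`.

```rust
use std::sync::Arc;

fn main() {
    // Stand-in for report_share_data, shared between the caller and one transaction attempt.
    let report_share_data: Arc<Vec<u64>> = Arc::new(vec![1, 2, 3]);
    let mut for_attempt = Arc::clone(&report_share_data);

    // While the caller still holds its Arc (needed for a possible retry), no &mut is available.
    assert!(Arc::get_mut(&mut for_attempt).is_none());

    // make_mut succeeds only by cloning the whole Vec first.
    Arc::make_mut(&mut for_attempt).push(4);
    assert!(!Arc::ptr_eq(&report_share_data, &for_attempt));

    // The original is untouched, so a retry would still see the unmodified data.
    assert_eq!(*report_share_data, vec![1, 2, 3]);
    assert_eq!(*for_attempt, vec![1, 2, 3, 4]);
}
```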

FWIW I was able to coerce my suggestion into "working" by putting report_share_data in an Arc<tokio::sync::Mutex<_>>, but it then becomes difficult to take a .report_aggregation for use in the AggregationJobWriter without .clone()ing it; i.e. I landed on needing to pass around a Cow anyway. So I think your approach is super-correct 👍. (Not that I doubted it, I was just curious where exactly Mutex breaks down.)

Yeah -- N.B. the problem with Arc<Mutex<Vec<T>>> is that, without a Cow, ultimately we would be modifying the report_share_data, which would remain modified in the case of a transaction retry (and on retry, we want to start from the same data as in the previous attempt, to ensure correctness).

As you note, we could do Arc<Mutex<Vec<Cow<T>>>>, but at that point, I think we're better off just generating a new Vec<Cow<T>> based off of the report_share_data since we need to modify the vector anyway.
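A toy illustration of that retry hazard, assuming a transaction callback that is simply invoked again after a failed attempt. `Report`, `attempt_in_place`, and `attempt_with_cow` are hypothetical, and "doubling a value" stands in for whatever modification an attempt makes. Mutating through `Arc<Mutex<Vec<_>>>` leaves the failed attempt's edits visible to the retry, while building a fresh `Vec<Cow<_>>` from unmodified shared data starts every attempt from the same state.

```rust
use std::borrow::Cow;
use std::sync::{Arc, Mutex};

// Hypothetical stand-in for a report aggregation whose state an attempt may update.
#[derive(Clone, Debug, PartialEq)]
struct Report {
    id: u64,
    value: u64,
}

// In-place approach: edits go straight into the shared Vec, so a failed
// attempt's edits are still present when the callback is retried.
fn attempt_in_place(shared: &Arc<Mutex<Vec<Report>>>, to_update: &[u64]) -> Vec<u64> {
    let mut guard = shared.lock().unwrap();
    for report in guard.iter_mut().filter(|r| to_update.contains(&r.id)) {
        report.value *= 2;
    }
    guard.iter().map(|r| r.value).collect()
}

// Cow approach: each attempt builds a fresh Vec<Cow<_>> from unmodified shared data.
fn attempt_with_cow<'a>(shared: &'a [Report], to_update: &[u64]) -> Vec<Cow<'a, Report>> {
    shared
        .iter()
        .map(|r| {
            let mut view = Cow::Borrowed(r);
            if to_update.contains(&r.id) {
                view.to_mut().value *= 2;
            }
            view
        })
        .collect()
}

fn main() {
    let initial = vec![Report { id: 1, value: 10 }, Report { id: 2, value: 20 }];

    // Two attempts (the first one "fails") compound the edit to report 2.
    let shared = Arc::new(Mutex::new(initial.clone()));
    let _failed = attempt_in_place(&shared, &[2]);
    let retried = attempt_in_place(&shared, &[2]);
    assert_eq!(retried, vec![10, 80]); // incorrect: doubled twice

    // The retry starts from the original data and produces the intended result.
    let shared = Arc::new(initial);
    let _failed = attempt_with_cow(&shared, &[2]);
    let retried = attempt_with_cow(&shared, &[2]);
    let values: Vec<u64> = retried.iter().map(|r| r.value).collect();
    assert_eq!(values, vec![10, 40]);
    assert!(matches!(retried[0], Cow::Borrowed(_)));
}
```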

divergentdave · 2024-07-15T16:54:03Z

I re-tested this, and it demonstrated the desired drop in peak memory usage (something a bit below 50%). I merged https://github.com/divergentdave/janus/tree/david/experiment-dhat-harness-2 onto this branch, built it in release mode with debug symbols, populated test fixture files by running the different subcommands several times, and then ran valgrind --tool=dhat --dhat-out-file=dhat.out.helper-arc-cow ./target/profiling/profiler_harness helper. I loaded the resulting file into the DHAT viewer static webapp and looked at the "At t-gmax (bytes)" metric.

branlwyd · 2024-07-15T16:58:44Z

Thank you for profiling this!

branlwyd requested a review from a team as a code owner July 13, 2024 00:31

branlwyd changed the title ~~Helper agg init: Remove copy of report share data.~~ Helper agg init: remove copy of report share data. Jul 13, 2024

tgeoghegan approved these changes Jul 15, 2024


inahga approved these changes Jul 15, 2024


branlwyd merged commit 7e6eca7 into main Jul 15, 2024
9 checks passed

branlwyd deleted the bran/remove-output-share-copy branch July 15, 2024 17:02
